In one of the previous posts about the state of modern web application security, I mentioned GraphQL, a new technology for building APIs developed by Facebook. GraphQL is rapidly gaining popularity, and more and more services, both web and mobile applications, are switching to it. GraphQL users include GitHub, Shopify, Pinterest, HackerOne, and many more. You can find many posts on the internet about the benefits of GraphQL and its advantages over a classic REST API; however, there is not much information about GraphQL security considerations. In this post I would like to elaborate on GraphQL: how it works, what the weak points are, how an attacker can abuse them, and which tools can be used.
Intro to GraphQL
So, what is GraphQL? GraphQL is not a database, nor is it a storage model. According to the official site: “GraphQL is a query language for APIs and a runtime for fulfilling those queries with your existing data”. GraphQL has several distinct key points which make it special. In a REST API you typically have a number of endpoints on the backend that return some data based on the input parameters. In contrast, GraphQL lets a frontend client query whatever data it needs from the backend, provided that this data is defined by the schema. Unlike REST APIs with their varying implementations, GraphQL is based on a well-described specification which clearly defines the rules, so the client already knows everything about the data it is going to retrieve before sending the request. GraphQL is strongly typed, which is great for security. All the relations between objects, their types, and their properties are defined in the schema. The schema itself supports meta properties, which makes it highly introspective: you can retrieve the whole structure of an API with a single request to the GraphQL endpoint. For frontend clients, GraphQL responses are very easy to consume, because GraphQL is hierarchical and returns exactly the same structure that the client requested.
GraphQL is not only about querying data; it also supports data modification through “mutations”. Each mutation and query can be parameterized with variable values, which are also strongly typed.
The syntax of GraphQL queries is easy to grasp. It looks like this:
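As a minimal sketch, here is a query kept in a Python string so it can be reused later. The `user` field, its `id` argument, and the selected subfields are hypothetical and do not come from any real schema.

```python
# Hypothetical query: `user`, `id`, `name`, and `email` are illustrative names only.
QUERY = """
query {
  user(id: 1) {
    name
    email
  }
}
"""

# The selection set between the braces lists exactly the fields we want back.
print(QUERY.strip())
```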
And the response is simple JSON:
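A hedged illustration of the response shape, assuming a query that selected the hypothetical `name` and `email` fields of a `user`. The values are made up; the top-level `data` key is where a GraphQL server places results.

```python
import json

# Hypothetical response body; the values are invented for illustration.
RESPONSE_BODY = '{"data": {"user": {"name": "Alice", "email": "alice@example.com"}}}'

response = json.loads(RESPONSE_BODY)

# Results live under the top-level "data" key and mirror the requested structure.
user = response["data"]["user"]
print(sorted(user))
```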
GraphQL queries are usually wrapped in JSON with two fields: “query” and “variables”. To learn more about GraphQL syntax and its features, you can refer to the official documentation.
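The wrapping described above can be sketched in a few lines of Python. The `build_payload` helper and the `GetUser` query are assumptions made for this example; only the `query` and `variables` keys come from the text.

```python
import json

def build_payload(query, variables):
    """Serialize a GraphQL query and its variables into a JSON request body."""
    return json.dumps({"query": query, "variables": variables})

# Hypothetical parameterized query: $id is declared with the strong type Int!.
QUERY = """
query GetUser($id: Int!) {
  user(id: $id) {
    name
  }
}
"""

body = build_payload(QUERY, {"id": 1})
print(body)
```

The resulting string is what would be sent as the body of a POST request to the endpoint.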
Messing with GraphQL endpoints
When you open Developer Tools and see requests to a “/graphql” endpoint, you already know what’s up. To make the requests easier to understand, you can use the GraphQL-Network Chrome extension.
But we do not want just to watch requests; we want to modify them, right? Burp Suite will not be very helpful, because it does not understand GraphQL yet. You can try to request the “/graphiql” endpoint (pay attention to the “i” character). Sometimes you will find a GraphiQL app left there by developers; it is an in-browser IDE for exploring GraphQL and is useful for debugging.
However, most likely you won’t. This is not a problem, since GraphQL endpoints can be requested from anywhere, provided that the Cross-Origin Resource Sharing policy allows it (and it usually does). We will use a tool called “GraphQL IDE”, an Electron app built around GraphiQL that can send requests to any address. After adding a new environment with the URL of the GraphQL endpoint and all the necessary cookies, tokens, and variables, we are good to go. GraphQL IDE will automatically fetch the schema using the introspection capabilities that GraphQL provides. If you are curious what the query looks like, you can find it here. From now on, you can switch to the “Documentation” page and start exploring the structure of the API.
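If you prefer a script to a GUI, the same introspection request can be sketched with the Python standard library. This is a trimmed-down introspection query, not the full one that GraphiQL-based tools send, and the helper names, URL, and cookie value are placeholders I made up.

```python
import json
import urllib.request

# Minimal introspection query; IDE tools send a much longer version.
INTROSPECTION_QUERY = """
{
  __schema {
    queryType { name }
    mutationType { name }
    types {
      name
      kind
      fields { name }
    }
  }
}
"""

def build_introspection_request(url, headers=None):
    """Prepare (but do not send) a POST request carrying the introspection query."""
    body = json.dumps({"query": INTROSPECTION_QUERY}).encode("utf-8")
    all_headers = {"Content-Type": "application/json"}
    all_headers.update(headers or {})  # e.g. cookies or tokens from the session
    return urllib.request.Request(url, data=body, headers=all_headers, method="POST")

def fetch_schema(url, headers=None):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(build_introspection_request(url, headers)) as resp:
        return json.load(resp)

# Placeholder endpoint and cookie, for illustration only.
req = build_introspection_request("https://example.com/graphql", {"Cookie": "session=PLACEHOLDER"})
```

Calling `fetch_schema` with the same arguments would actually hit the network, so it is left out of the example run.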
There are two root types: query and mutation. You will have to construct queries and mutations manually, but it is really easy once you understand the examples from the docs. So, what shall we look for?
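To get a quick list of what can be called, the root types can be pulled out of an introspection result. The sketch below assumes the result contains `queryType`, `mutationType`, and `types` with their `fields`; the `root_fields` helper and the sample schema are invented for the example.

```python
def root_fields(introspection):
    """Return the field names exposed by the query and mutation root types."""
    schema = introspection["data"]["__schema"]
    types_by_name = {t["name"]: t for t in schema["types"]}
    result = {}
    for key in ("queryType", "mutationType"):
        root = schema.get(key)
        if root is None:  # a schema may define no mutations at all
            result[key] = []
            continue
        fields = types_by_name[root["name"]].get("fields") or []
        result[key] = [f["name"] for f in fields]
    return result

# Hypothetical, heavily shortened introspection result.
SAMPLE = {
    "data": {
        "__schema": {
            "queryType": {"name": "Query"},
            "mutationType": {"name": "Mutation"},
            "types": [
                {"name": "Query", "fields": [{"name": "user"}, {"name": "posts"}]},
                {"name": "Mutation", "fields": [{"name": "updateUser"}]},
                {"name": "String", "fields": None},
            ],
        }
    }
}

print(root_fields(SAMPLE))
```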
Broken Access Control/Insecure Direct Object Reference
GraphQL does not, by design, provide any specific means for securing your data. It is the developer who is in charge of implementing access control. GraphQL is database agnostic, and a developer has to write “resolvers” that map queries to the database of their choice. Resolvers may contain ACL-related flaws and IDORs, so a natural thing to do is to try to retrieve data that is not intended for you. For example:
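A minimal sketch of such a probe, assuming a hypothetical `user(id: ...)` field with `name`, `email`, and `address` subfields. The `idor_probes` helper and the ID values are made up; the idea is simply to ask for IDs other than your own.

```python
import json

# Hypothetical query for a user's personal details.
QUERY = """
query GetUser($id: Int!) {
  user(id: $id) {
    name
    email
    address
  }
}
"""

def idor_probes(own_id, candidate_ids):
    """Build one request body per candidate ID that is not our own."""
    return [
        json.dumps({"query": QUERY, "variables": {"id": uid}})
        for uid in candidate_ids
        if uid != own_id
    ]

# Assume we are logged in as user 1001 and try the neighbouring IDs.
probes = idor_probes(own_id=1001, candidate_ids=range(1000, 1003))
```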
If a custom resolver fails to properly authorize user request, we will get personal details of some other user.
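On the server side, the difference between a vulnerable and a safe resolver can be tiny. The sketch below uses plain Python functions rather than any particular GraphQL library; the `USERS` store, the `viewer_id` argument, and both resolver names are assumptions made for illustration.

```python
# Hypothetical in-memory user store standing in for the real database.
USERS = {
    1: {"name": "Alice", "email": "alice@example.com"},
    2: {"name": "Bob", "email": "bob@example.com"},
}

def resolve_user_insecure(viewer_id, user_id):
    """Returns whatever ID was asked for: a textbook IDOR."""
    return USERS.get(user_id)

def resolve_user_checked(viewer_id, user_id):
    """Only lets the authenticated viewer read their own record."""
    if viewer_id != user_id:
        raise PermissionError("not allowed to read another user's details")
    return USERS.get(user_id)

# Viewer 1 asks for user 2: the insecure resolver happily answers.
leaked = resolve_user_insecure(viewer_id=1, user_id=2)
```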
Developers love detailed debug information, and very often you can see stack traces, full path disclosures and various types of information leakage when you try to perform an illegal GraphQL query:
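A rough sketch of how such leaks can be spotted automatically. It assumes the server reports problems in the top-level `errors` array; the `extensions.stack` key in the sample is hypothetical, and the two regular expressions are simple heuristics of my own, not a complete detector.

```python
import json
import re

# Heuristic patterns; real traces and paths come in many more shapes.
LEAK_PATTERNS = {
    "stack trace": re.compile(r"\bat \S+ \(.+:\d+:\d+\)|Traceback \(most recent call last\)"),
    "file path": re.compile(r"(?:/[\w.-]+){2,}\.\w+"),
}

def find_leaks(response_body):
    """Return the labels of leak patterns found in the response's errors."""
    errors = json.loads(response_body).get("errors") or []
    found = set()
    for error in errors:
        text = json.dumps(error)  # flatten message and any extra keys into one string
        for label, pattern in LEAK_PATTERNS.items():
            if pattern.search(text):
                found.add(label)
    return found

# Hypothetical verbose error returned for an illegal query.
SAMPLE = json.dumps({
    "errors": [{
        "message": 'Cannot query field "passwd" on type "User".',
        "extensions": {"stack": "Error\n    at resolveUser (/srv/app/resolvers/user.js:42:13)"},
    }]
})

print(sorted(find_leaks(SAMPLE)))
```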
Secondly, you can try to retrieve password hashes, hidden data, and fields that are not intended to be accessible. So be sure to examine the GraphQL schema carefully.
GraphQL is a layer between the client and the actual database. The backend database is usually NoSQL, namely MongoDB, but sometimes classic SQL relational databases such as PostgreSQL are used. As you already know, GraphQL has variables that are supplied to the queries. If a resolver does not properly sanitize these variables before using them in the target database query, it is possible to inject malicious operators, which turns the flaw into an SQL injection. If a NoSQL database is used, things are more complicated, since you cannot juggle types due to the schema type definitions; in other words, you won’t be able to convert a string into an array for a MongoDB injection. But with an SQL backend, it is as simple as:
GraphQL is currently a very young technology, but it is sure to become widespread. More GraphQL-specific attack tools and techniques should be expected in the near future. I hope this post helped you understand the ideas behind GraphQL and some of the directions you can follow while investigating GraphQL endpoints.
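Examining a large schema by hand is tedious, so a small script can flag suspicious field names first. The keyword list, the `sensitive_fields` helper, and the sample schema are my own assumptions; the input is expected to have the shape of an introspection result with `types` and their `fields`.

```python
import re

# Heuristic keyword list; extend it for the application under test.
SENSITIVE = re.compile(r"password|passwd|hash|secret|token|salt", re.IGNORECASE)

def sensitive_fields(introspection):
    """List "Type.field" names whose field name looks sensitive."""
    hits = []
    for gql_type in introspection["data"]["__schema"]["types"]:
        if gql_type["name"].startswith("__"):  # skip the introspection types themselves
            continue
        for field in gql_type.get("fields") or []:
            if SENSITIVE.search(field["name"]):
                hits.append(gql_type["name"] + "." + field["name"])
    return hits

# Hypothetical, shortened introspection result.
SAMPLE = {
    "data": {
        "__schema": {
            "types": [
                {"name": "User", "fields": [{"name": "name"}, {"name": "passwordHash"}, {"name": "resetToken"}]},
                {"name": "Post", "fields": [{"name": "title"}]},
                {"name": "__Type", "fields": [{"name": "name"}]},
                {"name": "String", "fields": None},
            ]
        }
    }
}

print(sensitive_fields(SAMPLE))
```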
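To make the idea concrete, here is a self-contained sketch using Python's built-in `sqlite3` module in place of a real backend. The table, the resolver functions, and the payload are all invented for the example. The payload is an ordinary string, so it would pass the GraphQL `String` type check before reaching the resolver.

```python
import sqlite3

def make_db():
    """Create an in-memory database with two hypothetical users."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, secret TEXT)")
    conn.executemany(
        "INSERT INTO users (name, secret) VALUES (?, ?)",
        [("alice", "a-secret"), ("bob", "b-secret")],
    )
    return conn

def resolve_user_vulnerable(conn, name):
    # String concatenation: the GraphQL variable becomes part of the SQL text.
    sql = "SELECT name, secret FROM users WHERE name = '" + name + "'"
    return conn.execute(sql).fetchall()

def resolve_user_safe(conn, name):
    # Parameterized query: the variable is passed as data, never parsed as SQL.
    return conn.execute("SELECT name, secret FROM users WHERE name = ?", (name,)).fetchall()

conn = make_db()
payload = "x' OR '1'='1"  # value supplied in the GraphQL "variables" object

injected = resolve_user_vulnerable(conn, payload)  # condition is always true
blocked = resolve_user_safe(conn, payload)         # no user has this literal name
```

With the vulnerable resolver the `WHERE` clause becomes `name = 'x' OR '1'='1'`, so every row comes back; the parameterized version treats the whole payload as a name and finds nothing.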
Also published on Medium.