Productionizing GraphQL.js: How we protect customer data and site uptime

We recently released Fabric mission control, a new dashboard on fabric.io that shows the pulse of all your apps. To build mission control, we needed to query for metrics across multiple apps, which would have been very difficult to integrate into our existing REST endpoints. We therefore decided to create a new service to handle GraphQL queries (Fin Hopkins went over GraphQL basics and introduced our usage of it in a previous post). We used GraphQL.js, the library that serves as the reference implementation of the GraphQL specification, as our core query resolver. We were incredibly happy with GraphQL.js because it is flexible and easy to work with; however, it prescribes little about how to operate it in production.

Before putting this new tool into production, we needed to solve two common problems that almost all developers face: how can we protect our customers’ data, and how can we be confident that abuse will not impact the availability of the site?

Infrastructure context

Let’s start by defining exactly what proper access control is: knowing both who is making a request (authenticating them) and whether they’re authorized to perform the operation they’re requesting. GraphQL.js can easily be exposed as an HTTP service using express-graphql (an adapter library that provides an Express handler). Express is a popular framework for writing HTTP services with Node.js, so we had plenty of options for common concerns like authentication, logging, and CORS. We used OAuth 2 to authenticate requests, but still needed to figure out how to authorize access to fields in our schema to ensure that we protect our customers’ data.

For some extra context, we store data across a number of backend systems. Rather than embed access rules in each of them, we decided to consolidate authorization logic in our GraphQL server. Having a single set of access rules for resources reduces the responsibilities of the services that own those resources. The flexibility of GraphQL.js allowed us to integrate these rules with our schema to build a single representation of the available data and the conditions that allow access to it.

Schema basics: GraphQL.js 101

Before we dive into our solution, here’s a refresher on GraphQL.js. GraphQL requires a schema that describes the entities in your system and their relationships with each other. You can define a new type with GraphQL.js using the GraphQLObjectType constructor:

    import { GraphQLObjectType, GraphQLString } from 'graphql';

    export default new GraphQLObjectType({
      name: 'App',
      fields: () => ({
        identifier: {
          type: GraphQLString,
        },
      }),
    });

Each field on a GraphQL type has an associated function (called its resolver function) that fetches the value for that field from the input data (called the source). Many resolvers are simple: they just perform a field lookup on the source data (this is the default behavior, and in this case the resolver can be omitted). Resolvers can also perform arbitrarily complex operations, such as making a request to another service and parsing the response.

In addition to the source data, resolvers are also called with a context object. It contains data specific to an individual GraphQL request. We use it to inject caches, a logger, and information about the authenticated account.
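To make the resolver arguments concrete, here is a minimal sketch that calls resolver functions by hand, without GraphQL.js. The names buildContext, resolveDisplayName, and the shape of the context object are hypothetical illustrations, not our actual code.

```javascript
// A resolver is called with (source, args, context).
// Every name in this sketch is illustrative.

// Equivalent to the default behavior: read the field straight off the source.
function resolveIdentifier(source) {
  return source.identifier;
}

// A resolver that also uses the per-request context.
function resolveDisplayName(source, _args, context) {
  context.logger.info(`resolving displayName for account ${context.account.id}`);
  return `${source.name} (${source.identifier})`;
}

// A fresh context is built for each incoming request, so caches and
// account information are never shared between requests.
function buildContext(account) {
  return {
    account,
    cache: new Map(),
    logger: {
      messages: [],
      info(message) {
        this.messages.push(message);
      },
    },
  };
}

const app = { identifier: 'com.example.app', name: 'Example' };
const context = buildContext({ id: 42 });

console.log(resolveIdentifier(app));
console.log(resolveDisplayName(app, {}, context));
```

In a real server the library invokes the resolvers and passes the same context object to every resolver taking part in one request.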

Authorizing access to the GraphQL schema

Now that you’re all caught up on GraphQL.js, let’s take a look at how we added authorization to our schema.

A schema’s resolvers are the key abstraction through which its behavior is defined. So we modified the resolve functions in our schema to check authorization. Given this type in our schema:

    import { GraphQLObjectType, GraphQLString } from 'graphql';

    export default new GraphQLObjectType({
      name: 'App',
      fields: () => ({
        identifier: {
          type: GraphQLString,
          resolve: (source) => source.identifier,
        },
      }),
    });

Here’s how we added authorization checks to the resolve function:

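    import { GraphQLObjectType, GraphQLString } from 'graphql';

    export default new GraphQLObjectType({
      name: 'App',
      fields: () => ({
        identifier: {
          type: GraphQLString,
          resolve: (source, _args, ctx) => {
            // Only return the value when the account may view this app.
            // Falling through returns undefined, which resolves to null.
            if (ctx.account.canViewApp(source)) {
              return source.identifier;
            }
          },
        },
      }),
    });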

And since resolvers are just functions we can modify their behavior using function composition, which allows us to abstract this pattern:

    import { GraphQLObjectType, GraphQLString } from 'graphql';

    export default new GraphQLObjectType({
      name: 'App',
      fields: () => ({
        identifier: {
          type: GraphQLString,
          resolve: makeAuthorizedResolver(
            (source, _args, ctx) => ctx.account.canViewApp(source),
            (source) => source.identifier
          ),
        },
      }),
    });

    // Wraps resolveFunc so that it only runs when the check passes.
    function makeAuthorizedResolver(authorizationCheckFunc, resolveFunc) {
      return function authorizedResolver(...args) {
        if (authorizationCheckFunc.apply(this, args)) {
          return resolveFunc.apply(this, args);
        }
      };
    }

We even factored the authorization checker out so it could be reused:

    import { GraphQLObjectType, GraphQLString } from 'graphql';

    export default new GraphQLObjectType({
      name: 'App',
      fields: () => ({
        identifier: {
          type: GraphQLString,
          resolve: makeAuthorizedResolver(hasAccessToApp, (source) => {
            return source.identifier;
          }),
        },
      }),
    });

    // Reusable check: may the authenticated account view this app?
    function hasAccessToApp(source, _args, ctx) {
      return ctx.account.canViewApp(source);
    }

    // Wraps resolveFunc so that it only runs when the check passes.
    function makeAuthorizedResolver(authorizationCheckFunc, resolveFunc) {
      return function authorizedResolver(...args) {
        if (authorizationCheckFunc.apply(this, args)) {
          return resolveFunc.apply(this, args);
        }
      };
    }

The authorization primitives we wrote are highly reusable, which makes it easy to check access across the schema. Using makeAuthorizedResolver ensures that we’re checking authorization consistently across the code base. We’ve found that the best way to prevent errors is to build systems that make it easy to do the right thing. But we wanted to take things a step further: to not only make it easy to check authorization on every field, but to require it.

Mandating authorization in GraphQL schemas

Requiring explicit authorization on every field in the schema gives us a strong guarantee that we’re controlling access to our customers’ data. To do this, we took advantage of the flexibility of JavaScript to build a new abstraction into our resolve functions. Since functions are objects, we can assign new properties to them at runtime. In this case, we add a property that marks the function as one that checks authorization:

    function makeAuthorizedResolver(authorizationCheckFunc, resolveFunc) {
      const resolver = function authorizedResolver(...args) {
        if (authorizationCheckFunc.apply(this, args)) {
          return resolveFunc.apply(this, args);
        }
      };
      // Tag the resolver so tests can verify that it checks authorization.
      resolver.checksAuthorization = true;
      return resolver;
    }

If a field in a type we define is missing a resolve function tagged as one that checks authorization, then tests will fail. This moves authorization to the forefront of an engineer’s mind as they make changes to the schema.
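As an illustration of what such a test could look like, here is a minimal sketch. It assumes the schema has been reduced to a plain object that maps type names to field definitions; the helper findUntaggedFields and that object shape are hypothetical. With a real GraphQL.js schema, the same walk could be done over schema.getTypeMap() and each object type’s getFields().

```javascript
// Same tagging wrapper as in the article.
function makeAuthorizedResolver(authorizationCheckFunc, resolveFunc) {
  const resolver = function authorizedResolver(...args) {
    if (authorizationCheckFunc.apply(this, args)) {
      return resolveFunc.apply(this, args);
    }
  };
  resolver.checksAuthorization = true;
  return resolver;
}

// Hypothetical helper: returns "Type.field" for every field whose
// resolver is missing or does not carry the given tag.
function findUntaggedFields(types, tag) {
  const missing = [];
  for (const [typeName, fields] of Object.entries(types)) {
    for (const [fieldName, field] of Object.entries(fields)) {
      if (typeof field.resolve !== 'function' || field.resolve[tag] !== true) {
        missing.push(`${typeName}.${fieldName}`);
      }
    }
  }
  return missing;
}

// Simplified stand-in for a schema (an assumption of this sketch).
const types = {
  App: {
    identifier: {
      resolve: makeAuthorizedResolver(
        (source, _args, ctx) => ctx.account.canViewApp(source),
        (source) => source.identifier
      ),
    },
    name: { resolve: (source) => source.name }, // resolver without the tag
    icon: {}, // relies on the default resolver, so no tag either
  },
};

const untagged = findUntaggedFields(types, 'checksAuthorization');
console.log(untagged);
```

A test would assert that the returned list is empty. Here it reports App.name and App.icon, which is the failure an engineer would see after adding a field without an authorization check.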

We even generalized this into a facility for checking arbitrary properties on our resolvers by “tagging” additional fields onto the resolver function and verifying their presence in tests. This is a shared code base (meaning engineers may not have all the necessary context when making changes), and people make mistakes. Verifying our schema in tests helps us ensure that the appropriate guard rails are in place to prevent human error from taking down the site, and has proved to be an effective strategy as we’ve increased the number of developers working on our GraphQL server.

Preventing abuse without sacrificing agility

After implementing authorization in our schema, we moved on to figuring out how to protect our GraphQL server from abuse. Since it handles queries for arbitrary data served by multiple backend services, a bad query could impact other systems that our customers rely on 24/7. Even if no one makes intentionally malicious requests, there’s still the chance that we’d accidentally ship a bad query to production (could you imagine taking down the site due to a preventable mistake?). However, we didn’t want to put up fences that would get in the way of product development. We identified and implemented a couple of key techniques to protect the availability of our system while maintaining the properties that allowed us to work fast with GraphQL.

First, we restricted the total concurrency of a single request. Since our GraphQL server dispatches requests to other services asynchronously, a single query can result in many concurrent downstream requests. We’re primarily concerned with limiting the impact on other services, and preventing requests from being made felt more straightforward than parsing a query to infer its impact statically. Large queries have little effect on our GraphQL service itself, as it’s a service composition layer with little logic of its own. We can easily scale it out if needed since it’s totally stateless (our backend services require more attention, because increasing their capacity also means increasing the capacity of their data stores).

To enforce this restriction, requests to other services must first acquire a permit on a per-query semaphore before executing. The total concurrency per query is limited by the number of available permits.
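The permit scheme can be sketched with a small promise-based semaphore. This is an illustrative sketch, not our production code: the Semaphore class, its run method, and the demo function are hypothetical, and the downstream request is simulated with a timer.

```javascript
// Minimal counting semaphore. One instance would be created per query
// and stored on that query's context object.
class Semaphore {
  constructor(permits) {
    this.available = permits;
    this.waiters = [];
  }

  // Resolves once a permit is held by the caller.
  acquire() {
    if (this.available > 0) {
      this.available -= 1;
      return Promise.resolve();
    }
    return new Promise((resolve) => this.waiters.push(resolve));
  }

  // Hands the permit to the next waiter, or returns it to the pool.
  release() {
    const next = this.waiters.shift();
    if (next) {
      next();
    } else {
      this.available += 1;
    }
  }

  // Runs an async task while holding a permit.
  async run(task) {
    await this.acquire();
    try {
      return await task();
    } finally {
      this.release();
    }
  }
}

// Demo: five simulated downstream requests, at most two in flight.
async function demo() {
  const semaphore = new Semaphore(2);
  let inFlight = 0;
  let peak = 0;
  const fakeRequest = (id) =>
    semaphore.run(async () => {
      inFlight += 1;
      peak = Math.max(peak, inFlight);
      await new Promise((resolve) => setTimeout(resolve, 10));
      inFlight -= 1;
      return id;
    });
  const results = await Promise.all([1, 2, 3, 4, 5].map(fakeRequest));
  return { results, peak };
}

demo().then((outcome) => console.log(outcome));
```

Because the semaphore belongs to a single query, one expensive query queues behind its own permits and does not consume the permits of other queries.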

We also added a limit to the total execution time of a request. Long-running queries are stopped after this timeout elapses. This, along with the concurrency limit, places an upper bound on the concurrent work allowed by a single inbound request, in effect limiting the “breadth” of a query.
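One simple way to express a time limit in JavaScript is to race the query against a timer. The sketch below is an assumption about how this could be done, not a description of our implementation; withTimeout and QueryTimeoutError are hypothetical names. Note that rejecting the promise does not by itself cancel downstream requests that are already in flight, so a real server would also need to stop dispatching new ones once the deadline has passed.

```javascript
// Error type used to tell a timeout apart from other failures.
class QueryTimeoutError extends Error {}

// Rejects with QueryTimeoutError if the promise has not settled within
// timeoutMs. The timer is cleared as soon as either side settles.
function withTimeout(promise, timeoutMs) {
  let timer;
  const timeout = new Promise((_resolve, reject) => {
    timer = setTimeout(
      () => reject(new QueryTimeoutError(`query exceeded ${timeoutMs} ms`)),
      timeoutMs
    );
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Demo: a simulated query that takes 100 ms, limited to 10 ms.
const slowQuery = new Promise((resolve) => setTimeout(resolve, 100, 'late'));
withTimeout(slowQuery, 10).catch((err) => console.log(err.message));
```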

We can use these limits to understand the worst-case performance of our system. Understanding and limiting the work that could be done by the service allows us to be good clients. These restrictions are all instrumented so we can keep track of their impact and understand if we need to adjust them. We’ve found that these techniques have provided a good degree of safety against overwhelming the server, without getting in the way as we write queries. The instrumentation we added makes it clear during development if a query gets restricted, giving the developer the chance to improve it before going to production.

Success!

Overall, we’ve been really pleased working with GraphQL. Being able to quickly integrate GraphQL with our other services let us iterate rapidly on customer features rather than on non-user-facing infrastructure. Head over to fabric.io to see it in action!

We’re helping our customers build the best apps in the world. Want to be a part of the Fabric team and build awesome stuff? Check out our open positions!
