Building Fullstack, Serverless GraphQL APIs in the Cloud

Tyler Charboneau
June 25, 2019

A powerful query language for APIs, GraphQL has long been appreciated for its data-integration capabilities. GraphQL allows developers to extract pertinent data from their APIs without added fluff. In turn, apps can precisely control what data they get. Whereas traditional RESTful setups let the server decide what each endpoint returns, GraphQL hands control of data retrieval to the client. That granular control benefits both developers and end users, and it helps keep apps performant, stable, and scalable as databases grow.

William Lyon, a software developer at Neo4j, regularly helps developers build fullstack GraphQL applications. With proper use of tooling and data visualization, this development avenue holds considerable promise for demanding applications. Companies like Facebook, GitHub, Pinterest, and Intuit incorporate GraphQL platform-wide. Equally, even small teams can harness the language to create capable APIs. During his presentation at our 2019 Austin API Summit, William outlined how developers can do just that.

GraphQL Overview and Development Tools

Lyon kicks things off with a brief rundown of useful development tools for GraphQL API deployment. These include GraphQL, React, Apollo, and Neo4j Database. Each plays a key role in the build process. Together, they form a single fullstack framework called GRANDstack.

To begin, let’s provide an overview of Neo4j, the data layer for this project. Neo4j uses graphs for data models, as opposed to tables or documents. The platform also uses Cypher, a query language that Lyon likens to SQL for graphs. Cypher relies quite heavily on pattern matching within its logic.
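To make the pattern-matching idea more concrete, here is a minimal sketch of a Cypher query held in a JavaScript string. The labels (Movie, Actor), the ACTED_IN relationship type, the property names, and the helper function are assumptions chosen for illustration; they are not taken from the talk.

```javascript
// Hypothetical Cypher query. The ASCII-art pattern
// (a:Actor)-[:ACTED_IN]->(m:Movie) describes the shape of the subgraph
// to match: an Actor node connected to a Movie node.
const findActorsQuery = `
  MATCH (a:Actor)-[:ACTED_IN]->(m:Movie)
  WHERE m.title = $title
  RETURN a.name AS actor
`;

// Parameters travel separately from the query text instead of being
// concatenated into it.
function buildActorLookup(title) {
  return { query: findActorsQuery.trim(), params: { title } };
}

const lookup = buildActorLookup("River Runs Through It, A");
console.log(lookup.query);
console.log(lookup.params);
```

A database driver would typically accept the query text and the parameter object as two separate arguments, which is why the sketch keeps them apart.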
Graph databases such as Neo4j are useful for the following applications:

- Knowledge graphs
- Personalized recommendations
- Master data management
- Network and IT management
- Fraud detection
- Analytics
- Identity and access management
- Privacy and compliance
- Logistics and routing
- Graph-based search

Already, we can observe some overlap with REST’s use cases, though GraphQL approaches these ends using different means. Network and IT management have their place within both methodologies. An interesting parallel lies with identity and access control. We often think of things like Role-Based Access Control (RBAC), which we’ve touched on in a piece regarding REST API best practices. These needs are ubiquitous regardless of API-creation approach, which gives GraphQL considerable enterprise potential.

The Basics of GraphQL

But how is the language structured? GraphQL uses type definitions, which describe the data structures an API exposes. Collectively, these type definitions make up a schema. Those experienced with SQL should find this familiar. For our GraphQL API, the type definitions are written in the Schema Definition Language (SDL). From here, we assess the various data and fields available. These form the specification for our GraphQL API. The client uses the schema to construct queries (the building blocks of GraphQL), which form our API calls. The results of those queries match the fields the client requests.

GraphQL is distinctive because it treats application data as a graph, as its name suggests. Consumers and clients should interpret the data in this fundamental way. Each node of data is interconnected in some manner. The schema also makes a process called introspection possible: developers can query the schema itself and view how types and fields are connected. Information from those queries helps form the documentation for our APIs.
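As a rough illustration of introspection, the sketch below pairs a standard `__type` introspection query with a small helper that turns the response into documentation-style lines. The helper name and the sample response are assumptions written by hand for this example; they are not output from a real server.

```javascript
// Introspection query: list the fields of one named type.
const TYPE_FIELDS_QUERY = `
  query TypeFields($typeName: String!) {
    __type(name: $typeName) {
      name
      fields {
        name
        type { name kind }
      }
    }
  }
`;

// Turn an introspection response into simple "Type.field: Type" lines.
// List and non-null wrappers have no name, so fall back to their kind.
function describeType(response) {
  const type = response.data.__type;
  if (!type) return [];
  return type.fields.map(
    (f) => `${type.name}.${f.name}: ${f.type.name || f.type.kind}`
  );
}

// Hand-written sample response (hypothetical Movie type).
const sample = {
  data: {
    __type: {
      name: "Movie",
      fields: [
        { name: "title", type: { name: "String", kind: "SCALAR" } },
        { name: "actors", type: { name: null, kind: "LIST" } },
      ],
    },
  },
};

console.log(describeType(sample));
```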
Tools like GraphiQL and GraphQL Playground allow us to visualize these type structures, giving documentation greater clarity.

What are the individual components of a GraphQL query? The API has entry points, which begin with an operation name and associated arguments. These arguments form instructions for the API, specifying the portion of the data we want to access. Say we want to retrieve data on a specific movie from a database of different movies. For our argument, we would specify the movie title. Once we’ve determined and entered that, we add the fields we want returned. Those fields collectively form the selection set.

    {
      Movie(title: "River Runs Through It, A") {
        title
        actors(first: 2) {
          name
        }
        genres {
          name
        }
        directors {
          name
          movies(first: 3) {
            title
          }
        }
      }
    }

The fields in this selection set determine the specific data points our API will return. As the query above shows, asking for “River Runs Through It, A” returns the title and the names of the first two actors listed. The query also asks for genres and directors, and we could add any other data pertinent to the film. Conveniently, our API will not retrieve any information that we don’t explicitly request.

This is the beauty of GraphQL. We can keep our queries efficient and clean, without accumulating extraneous API calls. It also alleviates the need to construct numerous, elaborate endpoint structures like we would in REST. Data is returned in JSON format, containing only the fields that we request.

That being said, Lyon stresses that GraphQL is an API query language as opposed to a database query language. Consequently, we have relatively little expressivity within GraphQL. For example, projections, aggregations, and variable-length queries are limited.

GraphQL is also data-layer agnostic. We can use a GraphQL API to access data regardless of how we store it, making the language flexible. A graph database isn’t necessary.
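The data-layer-agnostic point can be sketched with resolver-style functions that read from two unrelated sources. Everything here is hypothetical: the in-memory table, the stand-in rating service, the field names, and the placeholder values are invented for the example, and the small `fetchMovieWithRating` function only imitates what a GraphQL server does when it resolves nested fields.

```javascript
// Two hypothetical data sources with made-up records.
const movieTable = [{ id: 1, title: "River Runs Through It, A" }];
const ratingService = new Map([[1, 5]]); // placeholder rating value

// Each field decides for itself where its data comes from.
const resolvers = {
  Query: {
    movie: (_parent, args) =>
      movieTable.find((m) => m.title === args.title) || null,
  },
  Movie: {
    rating: (movie) => ratingService.get(movie.id) ?? null,
  },
};

// Hand-rolled stand-in for server execution: resolve the root field,
// then resolve a nested field against the parent object.
function fetchMovieWithRating(title) {
  const movie = resolvers.Query.movie(null, { title });
  if (!movie) return null;
  return { title: movie.title, rating: resolvers.Movie.rating(movie) };
}

console.log(fetchMovieWithRating("River Runs Through It, A"));
```

The client sees a single Movie type; it has no way of telling that the title and the rating came from different places.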
Accordingly, GraphQL often draws upon data from multiple sources and pools it together under one API.

Advantages and Disadvantages of GraphQL

We know GraphQL excels at returning only the specific data we request. While other types of APIs might encounter issues with over-fetching, a GraphQL API doesn’t suffer this same fate. This keeps data transmission to a minimum, preserving bandwidth and resources. In that same respect, granular control allows us to prevent under-fetching. If we write a query properly, we can count on the specified fields being returned. We can render app views and complete data requests with one round trip to the API, without clogging things up.

GraphQL is also built upon a predefined specification. REST APIs are more open-ended in this regard, introducing variation and often uncertainty into the build process. These ambiguities can lead to errors or confusion, especially for newer developers. As a result, GraphQL is more user-friendly, given its structured approach to API construction.

Data relationships are based on contextual importance, according to how types and fields relate. For example, it’s easy to look at a blog post and connect it to a given author. We approach all queries in this same way. In REST, we view data points as resources, which are somewhat less clear-cut. GraphQL builds upon the heuristics and logical associations we already draw between items. We can also build our queries around individual components in our applications.

Disadvantages

No system is perfect, however, and we have to illuminate some of GraphQL’s shortcomings. Many of the best practices from REST do not apply, which separates the two approaches. Jumping between REST and GraphQL can be challenging. HTTP status codes carry little meaning, and accordingly, they give little context when something goes wrong. While REST developers can signal errors to clients through status codes, GraphQL APIs typically return a 200 response and report errors in the response body. Web caching isn’t as effective.
However, if we’re dealing with an authenticated API, caching may not be a huge concern.

Lyon also addresses questions associated with arbitrary complexity, and how we handle those repercussions with the client. If the client can produce a query that’s unnecessarily complex, how can we mitigate this? Also, what performance implications might this introduce?

There’s also an n+1 problem for queries, pertaining to list fetching and data matching. Say we retrieve a number of blog posts or a group of movies. Will we then need to make a separate database request for the author or director of each one? These issues can be circumvented, though it does take a little work to do so.

Finally, rate and complexity limiting raise uncertainties. There are solutions and best practices that address these issues, such as query limiting. In these instances, clients can only run queries from a predetermined list, locking requests down and mitigating performance problems.

GraphQL API: Example and Build Process

When first building out your API, using GraphQL Playground as a workspace can be extremely useful. This allows for real-time testing and modifications without pushing changes to your live application. It also allows you to flesh out queries, types, fields, and structured content without risk.

For this example, we’re taking a look at William’s own project, a Discourse forum based around categorized blogs and user posts. Each query is connected to different types and fields relevant to the forum, such as author, username, screen name, and avatar. If we send a query in Playground, we can view the results of our API call in the pane on the right. These results are based on the parameters we set. In William’s case, he wanted to retrieve the first 10 open-source projects from the community.
This query also retrieves the title, URL, author name, and the associated Discourse credentials for that author. This information helps render a view for forum visitors, represented as information on a webpage or associated application. As the database information behind a given query changes, these renders change accordingly. This means a simple database update will push new data to the user, as long as that query remains active within the app. For this reason, GraphQL APIs are good for generating content reliant on a database. This is true for both static and dynamic content.

How to Build a GraphQL Service

We can build instances to collect data from the community, this time from Neo4j. When a community member publishes new content based on Neo4j, this information can be parsed and added automatically to a database. From here, queries can access that data via our GraphQL API. Query and Mutation types help define these API entry points.

After we write these type definitions, we move on to build GraphQL resolvers. These contain the fetch logic for our GraphQL requests, which tells the API how to retrieve each field:

    const resolvers = {
      Query: {
        topCommunityBlogsAndContent: (object, params, context) => {
          // TODO: check auth headers from context
          // TODO: query the database
          // TODO: validate / format response
          // TODO: return results
        }
      }
    };

Resolvers are important for authentication, validation, or any other processes that ensure proper data retrieval. They also make it possible to use ORMs for easy object manipulation. Once we solidify retrieval functionality, we can dictate how this data is formatted upon return. Queries to the database can be written in Cypher. Resolvers also allow us to handle errors should they occur.

    // Connect to Neo4j over Bolt, falling back to a local instance
    const driver = neo4j.driver(
      process.env.NEO4J_URI || `bolt://localhost:7687`,
      neo4j.auth.basic("neo4j", "neo4j")
    );

With Node.js as a backbone, Apollo Server serves our schema (type definitions plus resolvers) on top of this database connection. Type definitions and resolvers are combined into an executable schema. Once Apollo is active, we can start running queries atop our data layer.

This is how we make a GraphQL API in a standard way. However, keep an eye out for schema duplication, unneeded mapping, excess boilerplate code, and n+1 problems.

GraphQL Engines and GRANDstack Starter

GraphQL engines are essentially database integrations that simplify how we work with GraphQL. They cut development time and reduce the learning curve associated with API deployment. Some of these tools, like PostGraphile and Hasura, allow us to work with Postgres more easily. AWS AppSync grants access to Amazon Web Services resources. Prisma is a useful tool for working with multiple databases simultaneously.

Lyon advocates for the Neo4j-GraphQL engine. Its integration comes with some goals and caveats, which we’ll outline here:

- Enable GraphQL-first development, including type definitions and schemas
- Generate Cypher from GraphQL, while ensuring one query accounts for a single round trip using our API
- Generate GraphQL-based CRUD APIs
- Promote a Cypher schema directive by extending GraphQL’s functionality

    type Query {
      sessionsBySubstring(string: String): [Session]
        @cypher(
          statement: """
          MATCH (s:Session)
          WHERE toLower(s.description) CONTAINS toLower($string)
             OR toLower(s.name) CONTAINS toLower($string)
          RETURN s;
          """
        )
    }

These Cypher schema directives allow for custom logic and field annotation, mapping a field to a Cypher query. The engine comes in JavaScript and Java versions, including a database plugin. Requests are generated by the client, processed by the Apollo server, and then sent on to the Neo4j database. Data is fetched and returned to the client.
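To illustrate the goal of generating a single Cypher query per GraphQL request, here is a toy translator. It is not how the Neo4j-GraphQL engine is implemented: the selection-tree format, the function names, and the relationship mapping are all assumptions made for this sketch. It relies on two Cypher features, map projections and pattern comprehensions, to express nested fields inside one statement.

```javascript
// Build a Cypher map projection for one node and, recursively, for the
// related nodes selected beneath it.
function project(node) {
  const scalars = node.fields.map((f) => `.${f}`);
  const nested = (node.relations || []).map(
    (rel) =>
      `${rel.field}: [(${node.variable})${rel.pattern}` +
      `(${rel.variable}:${rel.label}) | ${project(rel)}]`
  );
  return `${node.variable} { ${[...scalars, ...nested].join(", ")} }`;
}

// One MATCH plus one RETURN, however deeply the selection is nested.
function toCypher(root) {
  return (
    `MATCH (${root.variable}:${root.label}) ` +
    `RETURN ${project(root)} AS ${root.variable}`
  );
}

// Hypothetical selection tree for: { Movie { title actors { name } } }
const selection = {
  label: "Movie",
  variable: "movie",
  fields: ["title"],
  relations: [
    {
      field: "actors",
      pattern: "<-[:ACTED_IN]-",
      label: "Actor",
      variable: "movie_actors",
      fields: ["name"],
      relations: [],
    },
  ],
};

console.log(toCypher(selection));
```

Because the nested actors are collected inside the same statement, the database is visited once rather than once per movie, which is one way to sidestep the n+1 problem mentioned earlier.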
GRANDstack Starter makes these code implementations even easier for developers. We begin with a schema and type definitions. We import any necessary libraries and start an Express app, after which we pass type definitions to a makeAugmentedSchema function. This function provisions a CRUD GraphQL API while adding filtering and pagination.

    import { typeDefs } from "./graphql-schema";
    import { ApolloServer } from "apollo-server-express";
    import express from "express";
    import { v1 as neo4j } from "neo4j-driver";
    import { makeAugmentedSchema } from "neo4j-graphql-js";
    import dotenv from "dotenv";

    // set environment variables from ../.env
    dotenv.config();

    const app = express();

    /*
     * Create an executable GraphQL schema object from GraphQL type definitions
     * including autogenerated queries and mutations.
     * Optionally a config object can be included to specify which types to include
     * in generated queries and/or mutations. Read more in the docs:
     * https://grandstack.io/docs/neo4j-graphql-js-api.html#makeaugmentedschemaoptions-graphqlschema
     */

From here, we create a connection with the database. This is connected to the Apollo server, from which we can define various GraphQL endpoints. We can run this and review our schema, including any mutations that may be present. These mutations account for any changes made to information present in a given database. If we design our entry points correctly, we can run any necessary CRUD operations with relative ease.

Once we have everything up and running, processing complex queries is possible. We can write in a plethora of fields and have our GraphQL API go hunting for the relevant data. This can include numerical information, text, and more. Returning to our code allows us to match on different parameters, such as users and the various businesses they’re writing reviews for (to name an example). This filtering allows us to return information more relevant to the client.
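A filtered, paginated request against such an API might look like the sketch below. The Business, Review, and User types and their fields are hypothetical, and the argument names (`filter`, `first`, `offset`) reflect my understanding of what the augmented schema generates; it is worth confirming them against your own generated schema in Playground before relying on them.

```javascript
// Hypothetical query: businesses whose name contains a search term,
// along with their reviews and each review's author.
const BUSINESS_SEARCH = `
  query BusinessSearch($term: String!, $first: Int!, $offset: Int!) {
    Business(filter: { name_contains: $term }, first: $first, offset: $offset) {
      name
      reviews {
        stars
        text
        user { name }
      }
    }
  }
`;

// Build the JSON body carried by a GraphQL-over-HTTP POST request.
// Pages are zero-based: page 2 with 5 items per page skips 10 items.
function buildRequest(term, page = 0, pageSize = 10) {
  return JSON.stringify({
    query: BUSINESS_SEARCH,
    variables: { term, first: pageSize, offset: page * pageSize },
  });
}

const body = buildRequest("coffee", 2, 5);
console.log(JSON.parse(body).variables);
```

The resulting string can be sent as the body of a POST request with a JSON content type to whatever endpoint the API is served from.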
The Benefits of Engines

GraphQL engines are useful because they allow us to build greater functionality into our APIs. For one, we can achieve this via declarative integrations. This allows us to clearly define data-fetching procedures while incorporating middleware. Type definitions allow us to define database models.

Auto-generated GraphQL APIs arise from our type definitions, which we can provision for data retrieval. As part of that, we can also enrich our schemas to present more detailed information to our clients, and this data often holds more contextual importance as well. We can also create resolvers automatically while reducing the quantity of boilerplate code needed to get projects up and running. This saves time and slims things down, two hallmarks of the GraphQL methodology.

So, how do these engines help generate database queries? The resolveInfo resolver argument carries the query’s abstract syntax tree (AST), the schema objects, the selection sets, variables, and more, which the engine uses to build database queries.

Going the Serverless Route

If we’re looking to take a serverless approach to deployment, many services provide key functionality. These include:

- AWS Lambda
- Google Cloud Functions
- Serverless Framework
- Vercel
- Netlify Functions

The tools built on top of these functions provide a unique developer experience. We can combine deployment of static client code with our serverless API using these options, making development easier. There are plenty of resources outlining how we can use these services.

Querying GraphQL for the Client

GRANDstack uses React to achieve this. Apollo provides front-end framework integrations, streamlining this process. We can also use Relay and urql. GraphiQL, Playground, fetch, and HTTP clients are also available as alternatives to Apollo. Our choice of HTTP client, should we go that route, depends on the unique features provided and how they stack up against our project goals.
Caching and integrations are crucial considerations with these options. In the Neo4j community forum, they use fetch requests in conjunction with JavaScript to serve content and return data to the client.

The React integration for Apollo is fairly straightforward. We create a new client instance and point it at our GraphQL endpoint. Authentication needs will determine header requirements. This Apollo instance is injected into the React component tree, similar to a Redux integration.

    import ApolloClient from "apollo-boost";
    import { ApolloProvider } from "react-apollo";

    const client = new ApolloClient({
      uri: "https://graphconnect-2018-graphql-api.now.sh/"
    });

Configuring Apollo Client to integrate with React.

Our <Query> component can then take a GraphQL query, or we can define our own fragments (reusable selection sets for our queries) and co-locate them with React components. This promotes compartmentalization of code and makes management easier in the long run. The children of that component define how the response is rendered, for example as a table, outlining how we present returned data.

Authorization in GraphQL

Luckily, GraphQL provides a fair number of options. We can use our resolvers to achieve these tasks, or create business logic around a data-access layer. This allows granular control over which clients can retrieve which data from our databases. Middleware can provide an added layer of security around our resolvers. Schema directives are also useful. Take the Cypher directive, for example. Annotating our type definitions and creating rules provides quick authorization protocols.

Resolver authorizations are easy to implement and fast to prototype with. However, we can end up duplicating our logic if we aren’t careful.

Data-access layers provide a great deal of flexibility when processing requests, and allow for a single implementation. However, questions are raised when dealing with generated resolvers, especially those produced by GraphQL engine tools.
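Resolver-level authorization can be sketched as a small wrapper function. The `requireRole` helper, the `context.user.roles` shape, and the "member" role are assumptions for illustration; how a user object ends up on the context depends entirely on the authentication setup in use.

```javascript
// Wrap a resolver so it only runs when the request context carries a
// user holding the required role; otherwise throw before any data is read.
function requireRole(role, resolver) {
  return (parent, args, context, info) => {
    const roles = (context.user && context.user.roles) || [];
    if (!roles.includes(role)) {
      throw new Error(`Not authorized: missing role "${role}"`);
    }
    return resolver(parent, args, context, info);
  };
}

const resolvers = {
  Query: {
    // Field name borrowed from the resolver skeleton shown earlier;
    // the returned record is a placeholder.
    topCommunityBlogsAndContent: requireRole("member", () => [
      { title: "Example post" },
    ]),
  },
};

// Simulated calls with and without the required role.
const allowed = resolvers.Query.topCommunityBlogsAndContent(
  null,
  {},
  { user: { roles: ["member"] } },
  null
);
console.log(allowed);

try {
  resolvers.Query.topCommunityBlogsAndContent(null, {}, { user: null }, null);
} catch (err) {
  console.log(err.message);
}
```

Keeping the check in one wrapper avoids copying the same role test into every resolver, which is the duplication risk noted above.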
Wrapping resolvers allows us to define permissions together and unify authentication rules. These permissions must match resolvers, and generated resolvers can make matching trickier.

Schema directives are declarative via annotation and mesh well with generated resolvers, so they work nicely with GraphQL engines. However, authorization rules end up spread across the entire GraphQL schema.