The advent of cloud computing has changed the way applications are built, deployed, and hosted. One important development in recent years has been the emergence of DevOps — a discipline at the crossroads between application development and system administration.
Developers have been empowered with a wide new set of tools that enable:
- Application lifecycle management with continuous integration software like Jenkins, Travis CI, CircleCI, and CodeShip;
- Server provisioning with software and metadata using configuration management tools like Chef, Puppet, Salt, and Ansible;
- Hosting applications in the cloud, whether they use an IaaS provider like Amazon Web Services, Google Compute Engine, or DigitalOcean, or a PaaS solution like Heroku, Google App Engine, or any technology-specific offering.
While these tools are usually wielded day-to-day from the command line, they have all sprouted APIs, and developers are increasingly building API clients to manage the DevOps workflows at technology companies just as they do within their own products.
Out of this set of emerging technologies, one has taken the world of DevOps by storm in the last three years: Docker.
As we’ve previously described, Docker is an open source project that is backed by a company of the same name. It enables one to simplify and accelerate the building, deployment, and running of applications, while reducing tension between developers and traditional Ops departments.
Docker was created by Franco-American developer Solomon Hykes, who was building a deployment engine for the PaaS company dotCloud. The project was developed in the Go programming language and open sourced in 2013.
Docker virtual containers are packaged with everything the application needs to run on the host server and are isolated from anything else they don’t need — containers can be moved from one host to another without any changes. Unlike hypervisor-managed virtual machines, Docker containers are lightweight and quick to start.
Docker also comes with tools to build and deploy applications into Docker containers. Containers can be hosted on regular Linux servers or in the cloud (or pretty much anywhere using Docker Machine).
Each Docker setup includes a Docker client (typically a command line interface, though Docker also features a Remote API) and a daemon, the persistent process that runs on each host and listens for API calls. Both the client and the daemon can share a single host, or the daemon can run on a remote host.
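As a small sketch of this split, the client can be pointed at a remote daemon simply by setting an environment variable (the host name below is a placeholder; 2375 is the daemon's conventional unencrypted TCP port):

```shell
# Talk to the daemon on the local host (the default):
docker info

# Point the client at a daemon running on a remote host instead:
export DOCKER_HOST=tcp://docker-host.example.com:2375
docker info   # now reports on the remote daemon
```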
Docker images are read-only templates from which containers are generated.
An image consists of a snapshot of a Linux distribution like Ubuntu or Fedora — and maybe a set of applications or runtime environments, like Apache, Java, or Elasticsearch. Users can create their own Docker images, or reuse one of the many images created by other users and available on the Docker Hub.
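As an illustrative sketch, a minimal Dockerfile — the recipe from which an image is built — might snapshot an Ubuntu base image and layer a Java runtime and an application on top (the base tag, package, and file names here are assumptions):

```dockerfile
# Start from a public base image pulled from the Docker Hub
FROM ubuntu:14.04

# Layer a runtime environment on top of the base snapshot
RUN apt-get update && apt-get install -y openjdk-7-jre-headless

# Copy the application into the image
COPY app.jar /opt/app/app.jar

# Command run when a container is started from this image
CMD ["java", "-jar", "/opt/app/app.jar"]
```

Running `docker build -t myteam/myapp .` in the directory containing this file produces a reusable, read-only image from which any number of containers can be started.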
Docker registries are repositories from which one can download or upload Docker images. The Docker Hub is a large public registry, and can be used to pull images within a Docker workflow, but more often teams prefer to have their own registry containing the relevant subset of public Docker images they require, along with their own private images.
Docker containers are directories containing everything needed for the application to run, including an operating system and a file system, leveraging the underlying system’s kernel but without relying on anything environment-specific. This enables containers to be created once and moved from host to host without risk of configuration errors. In other words, the exact same container will work just as well on a developer’s workstation as it will on a remote server.
A Docker workflow is a sequence of actions on registries, images and containers. It allows a team of developers to create containers based on a customized image pulled from a registry, and deploy and run them on a host server. Every team has its own workflow — potentially integrating with a continuous integration server like Jenkins, configuration management tools like Chef or Puppet, and maybe deploying to cloud servers like Amazon Web Services. The daemon on each Docker host enables further actions on the containers — they can be stopped, deleted or moved. The results of all of these actions are called lifecycle events.
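A bare-bones version of such a workflow can be sketched as a handful of CLI commands (the registry, image, and container names below are hypothetical):

```shell
# Pull a customized base image from the team's registry
docker pull registry.example.com/myteam/base:latest

# Build an application image on top of it, then publish the result
docker build -t registry.example.com/myteam/app:1.0 .
docker push registry.example.com/myteam/app:1.0

# On a Docker host: run the container, and later stop and delete it
docker run -d --name app registry.example.com/myteam/app:1.0
docker stop app
docker rm app
```

Each of these commands fires a lifecycle event that the daemon reports, which is what makes the workflow observable and automatable.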
Who uses Docker?
Since it arrived on the scene in 2013, Docker has seen widespread adoption at technology companies. Interestingly, whereas early adopters of most new technologies are typically limited to small startups, large enterprises were quick to adopt Docker, as they benefit more from the gains in efficiency that it enables and from the microservices architecture that it encourages. Docker’s adopters include Oracle, Cisco, Zenefits, Sony, GoPro, Oculus and Harvard University.
Where does Docker fit in the DevOps puzzle?
While DevOps has made it easier to provision servers and to master configuration management for developers and ops teams, it can be a bit overwhelming for beginners. The dizzying number of technologies to choose from can be frustrating, and it can invite a lot of complexity into a company’s workflow if the purpose of each component is poorly understood.
Docker doesn’t fall neatly into one of the categories we listed in our introduction. Rather, it can be involved in all areas of DevOps, from the build and test stages to deployment and server management.
Its features overlap with those of configuration management software – Docker can be used as a substitute for Chef or Puppet to an extent. These tools allow you to manage all server configuration in one place instead of maintaining a pile of bash provisioning scripts, an approach that becomes unwieldy once servers number in the hundreds. Complexity invariably starts to creep in when upgrades, installations and changes in configuration take place. The resulting Chef cookbooks and Puppet modules then need to be carefully managed for state changes, which is traditionally a shared task between developers and ops people.
Docker’s philosophy around configuration management is radically different. Proponents of immutable infrastructure love Docker because it encourages the creation of a single, disposable container with all the components of an application bundled together, and deployed as-is to one or more hosts. Instead of modifying these containers in the future (and therefore managing state like you would with Chef or Puppet), you can simply regenerate an entirely new container from the base image, and deploy it again as-is. Managing change therefore becomes simplified with Docker, as does repeating the build and deployment process and aligning development, staging and production environments. As James Turnbull writes in The Docker Book, “the recreation of state may often be cheaper than the remediation of state”.
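The rebuild-instead-of-mutate cycle can be sketched as follows (image and container names are placeholders):

```shell
# A change is needed: rebuild from the base image rather than
# patching the running container in place
docker build -t myteam/app:v2 .

# Throw away the old container and deploy the new one as-is
docker stop app && docker rm app
docker run -d --name app myteam/app:v2
```

There is no configuration drift to remediate, because nothing was ever modified — the old container is simply replaced.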
Of course, Docker lacks the flexibility afforded by tools like Chef and Puppet, and using it by itself assumes that your team operates only with containers. If this isn’t the case and your applications straddle both container-based processes and bare metal or VM-based apps, then configuration management tools retain their usefulness. Furthermore, immutable infrastructure doesn’t work when state is essential to the application, like in the case of a database. It can also be frustrating for small changes.
In these cases, or if Chef or Puppet are an important part of a team’s architecture prior to introducing Docker, it is quite easy to integrate these tools within a Docker container, or even to orchestrate Docker containers using a Chef cookbook or a Puppet module.
Continuous integration software like Jenkins can work with Docker to build images which can then be published to a Docker Registry. Docker also enables artifact management by versioning images. In that way the Docker Hub acts a bit like Maven Central or public GitHub artifact repositories.
All of the events listed in the previous section can be triggered via the Docker command line interface, which remains the weapon of choice for many system engineers.
But daemons can also be accessed through a TCP socket using Docker’s Remote API, enabling applications to trigger and monitor all events programmatically. Docker’s API also exposes container metadata and key performance metrics.
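As a small sketch of what such a client might do, the snippet below builds the URL for the Remote API's "list containers" endpoint (`GET /containers/json`); the host and port are assumptions about where a daemon happens to be listening:

```python
import json
from urllib.parse import urlencode


def list_containers_request(host="localhost", port=2375,
                            all_containers=False, filters=None):
    """Build the URL for Docker's Remote API 'list containers' call.

    GET /containers/json is part of the documented Remote API; the
    default host and port here are assumptions -- point them at
    wherever your daemon actually listens.
    """
    params = {}
    if all_containers:
        params["all"] = "1"  # include stopped containers, not just running ones
    if filters:
        # The API expects filters as a JSON-encoded map,
        # e.g. {"status": ["running"]}
        params["filters"] = json.dumps(filters)
    query = ("?" + urlencode(params)) if params else ""
    return "http://{}:{}/containers/json{}".format(host, port, query)
```

An application could then issue the request with `urllib.request.urlopen(url)` against a live daemon and decode the JSON response to monitor its containers.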
The Remote API is essentially a well-documented REST API that uses an open schema model and supports basic authentication of API clients. The availability of this API has opened the door for creative developers to build tools on top of the Docker stack. We’ll explore awesome tools that consume the Remote API in an upcoming post, so stay tuned and sign up to our newsletter so you won’t miss out!