Docker Swarm: Key Concepts and Use Cases
Let’s quickly recap a few things. You already know how to run containers with individual docker commands, and you’re likely familiar with setting up a complex application that runs multiple services. A better way to do this is with Docker Compose.
Docker Compose is a powerful tool for managing multi-container Docker applications. It allows you to define the services that make up your application, including their dependencies, in a single YAML file. This makes the application easier to deploy, run, and maintain, since the entire configuration lives in one file. Before we dive into Docker Compose, though, let’s first look at how to run a multi-container application using plain docker run commands.
Assume we have an application with five parts: two user interfaces (one written in Python and the other in Node.js), two databases (Redis and PostgreSQL), and a backend worker that transfers data between Redis and PostgreSQL. We’ll use the following docker run commands to start each container:
docker run -d --name=redis redis
docker run -d --name=db postgres:9.4
docker run -d --name=first_interface -p 5000:80 interface1
docker run -d --name=second_interface -p 5001:80 interface2
docker run -d --name=workerback worker
While this successfully runs all the containers, it doesn’t link them together. For example, the first_interface app needs to connect to the Redis service, but it can’t resolve the hostname “redis”. To solve this problem, we can use the --link option, which links two containers together. For example, to link the first_interface container to the Redis container:
docker run -d --name=first_interface -p 5000:80 --link redis:redis interface1
Note: Using links in this way is deprecated and support for it may be removed in the future, because newer concepts in Docker networking and Docker Swarm provide better ways of achieving what links do here.
This creates an entry in the /etc/hosts file of the first_interface container, mapping the hostname “redis” to the internal IP of the Redis container. We need to do the same for the second_interface and workerback containers. Now that we’ve linked all the containers together, our application is up and running. However, managing the dependencies between containers can get complex, especially as our application grows. This is where Docker Compose comes in.
With Docker Compose, we can define all of these services, including their dependencies, in a single YAML file. Let’s look at an example Docker Compose file for our application:
version: '3'
services:
  redis:
    image: redis
  db:
    image: postgres
  first_interface:
    image: interface1
    ports:
      - "5000:80"
    depends_on:
      - redis
  second_interface:
    image: interface2
    ports:
      - "5001:80"
    depends_on:
      - redis
  workerback:
    image: worker
    depends_on:
      - redis
      - db
Here, we define five services: redis, db, first_interface, second_interface, and workerback. The redis and db services are straightforward, while the first_interface and second_interface services specify the ports they expose and their dependency on the redis service. The workerback service depends on both the redis and db services. With this Docker Compose file, we can start the entire application with a single command:
docker-compose up
This starts all the services defined in the Docker Compose file, including their dependencies. We can also use the following command to stop the services:
docker-compose down
This stops all the services defined in the Docker Compose file. In summary, Docker Compose lets you define the services that make up your application, including their dependencies, in a single YAML file, which makes the application much easier to deploy, run, and maintain.
If we would like to instruct Docker Compose to build an image instead of pulling one, we can replace the image line with a build line that points to the directory containing the application code and its Dockerfile. When we run docker-compose up, Compose will first build the images, give them temporary names, and then use those images to run containers with the options specified before.
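For example, a minimal sketch, assuming the code and Dockerfile for the first interface live in a local ./interface1 directory:
version: '3'
services:
  first_interface:
    build: ./interface1
    ports:
      - "5000:80"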
Docker Compose has evolved over time and now supports many more options than it did initially. In version 1, Docker Compose attaches all the containers it runs to the default bridge network and then uses links to enable communication. In version 2, Docker Compose automatically creates a dedicated bridge network for each application and attaches all of that application’s containers to it. This lets all containers reach each other using their service names, making links unnecessary from version 2 onward. Additionally, starting with version 2 we can use the depends_on property to control the startup order of services; for example, if we want the first_interface container to start after Redis, we declare that in the Compose file. Finally, version 3 of the Compose file format adds support for Docker Swarm, which we will discuss later on.
Docker Swarm
Let’s take a closer look at Docker Swarm and its various concepts. Running your containers on a single Docker host may be suitable for development and testing environments, but in a production setup, this may not be a good idea. If the underlying host fails, we will lose all the containers and our application will go down. This is where Docker Swarm comes into play. With Docker Swarm, you can combine multiple Docker machines together into a single cluster, providing high availability (HA) and load balancing across different systems and hardware. Here’s how to set up a Docker Swarm:
- First, make sure you have Docker installed on all the hosts that will be part of the swarm.
- Designate one of the hosts as the swarm manager (the master) and the others as workers.
- On the swarm manager node, run the docker swarm init command. This initializes the swarm manager and prints a command to be run on the worker nodes to join the swarm.
- Copy that command and run it on the worker nodes to join the swarm. Once the workers join the swarm, they are also referred to as nodes, and you’re ready to create services and deploy them on the swarm cluster.
- With Docker Swarm, you can create services that span multiple hosts, providing high availability and load balancing. This means that if one host fails, the service will continue to run on the other hosts, ensuring that your application remains up and running.
- Docker Swarm also provides a way to manage the scaling of your services. You can scale a service up or down with a single command, and add or remove nodes from the cluster as your resource requirements change, ensuring that your application can handle changes in traffic.
- Additionally, Docker Swarm provides a built-in load balancer that can distribute traffic across multiple nodes, further ensuring that your application is highly available and performs well under heavy traffic.
Docker Swarm is a powerful tool for running containerized applications in a production environment. It provides high availability, load balancing, and scalability, making it an ideal choice for applications that require a high level of reliability and performance.
Let’s dive deeper into the details of Docker Swarm and understand how it works.
The Manager Node
The manager node is the master node where the swarm cluster is initiated. It is responsible for maintaining the cluster state: adding and removing workers, and creating, distributing, and reconciling the state of containers and services across the cluster. It’s important to note that a single manager node is not recommended; if it fails, there is no manager left to manage the cluster. To ensure fault tolerance, you can have multiple manager nodes in a single cluster.
However, multiple manager nodes can lead to conflicting decisions. To prevent this, only a single manager node is allowed to make management decisions at any given time. This node is called the leader. Even the leader cannot act entirely on its own: all decisions have to be agreed upon by the majority of the managers in the cluster. This matters because if the leader made a decision and then failed before informing the other managers, the cluster would be left in an inconsistent state.
For example, if a new worker was to be added to the cluster by the leader without updating the other managers, and the leader fails, the other managers would not be aware of the new worker, and the cluster operations would ignore the new worker and the services running on that worker, resulting in an inconsistent application status. This is known as the problem of distributed consensus.
Docker solves this problem by implementing the Raft consensus algorithm. Raft decides which of the managers becomes the leader, and a decision is considered valid only when the leader obtains agreement from a majority of the managers.
In summary, Docker Swarm uses a distributed consensus algorithm, specifically Raft, to ensure that all manager nodes have the same information about the cluster at all times. This ensures that the cluster can continue to operate even if one or more manager nodes fail, and that all decisions are made in a consistent and agreed-upon manner.
The Raft Algorithm
The Raft algorithm uses random timers to initiate leadership requests, which helps prevent a single node from monopolizing the leadership role. For example, with three manager nodes, a random timer is kicked off on each of them; the first to finish its timer sends a request to the other manager nodes asking for their vote to become leader. The other manager nodes receive the request and respond with their vote. If the majority of the manager nodes vote in favor of the requesting node, it assumes the role of the leader.
Once a node assumes the role of the leader, it sends out notifications at regular intervals to the other manager nodes informing them that it is continuing to assume the role of the leader. These notifications serve as a heartbeat, allowing the other nodes to know that the leader is still active and in charge. If a node fails to receive a notification from the leader at some point in time, which could be due to the leader going down or losing network connectivity, the nodes initiate a re-election process among themselves. This process ensures that a new leader is identified and takes over the role of managing the cluster.
It’s important to note that every manager node keeps its own copy of the Raft database, which stores information about the entire cluster. The Raft algorithm ensures that all manager nodes hold the same information and that any change to the database is made with the consent of the majority of the manager nodes: the leader must notify the other managers of any change it plans to make and receive acknowledgments from enough of them to form a majority before committing the change.
In summary, the Raft algorithm uses random timers to initiate requests, ensuring that no single node monopolizes the leadership role. The algorithm also ensures that all manager nodes have the same information and that any changes made to the database are done with consent from the majority of the manager nodes. This ensures that the cluster can continue to operate even if one or more manager nodes fail, and that all decisions are made in a consistent and agreed-upon manner.
Quorum
In a Raft consensus algorithm, every decision must be agreed upon by a majority of the manager nodes. For example, if a new worker is to be added to the cluster, it must be agreed upon by the majority of the manager nodes. This means that if one node fails or is not responding, the remaining nodes can still make decisions as long as they have a quorum. Quorum is defined as the minimum number of members in an assembly that must be present at any of its meetings to make the proceedings of that meeting valid. In the case of a Raft cluster, the quorum is the minimum number of manager nodes that must be available to make decisions. For example, if there are five manager nodes in a cluster, the quorum would be three nodes.
There is a simple formula for the quorum: the total number of manager nodes divided by two (rounded down), plus one. Docker recommends no more than seven manager nodes for a swarm, as adding more managers does not increase the scalability or performance of the cluster. There is no hard limit, however, and you can add more manager nodes if needed.
It’s important to note that the quorum is the minimum number of manager nodes required to keep the cluster operational; the reverse of that is the maximum number of managers that can fail, i.e. the fault tolerance of the cluster. You can calculate the fault tolerance by subtracting the quorum from the total number of managers, which works out to (N - 1) divided by two, rounded down.
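As a worked example:
- 3 managers → quorum 2, tolerates 1 failure
- 5 managers → quorum 3, tolerates 2 failures
- 7 managers → quorum 4, tolerates 3 failures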
Handling Manager Node Failures
What happens when you don’t have enough managers up and the cluster fails? For example, let’s say we have a cluster with three managers and five worker nodes. The worker nodes are running multiple instances of a web application, and our cluster is alive and serving users. We know that since we have three managers, the quorum or the minimum number of managers that should be available at a time is two. What if two managers fail at the same time?
The swarm will no longer be able to perform any management tasks. The existing worker nodes will continue to operate as before, with their current configuration and services running as usual. However, you can’t make any modifications: adding a new worker, updating a service’s configuration, or creating, destroying, or moving an existing service.
To resolve this situation, the best way to recover from losing the quorum is to bring the failed nodes back online. If you can’t bring any of the failed nodes back online to fulfill the quorum, and you have only one manager node left, the only way to recover is to force-create a new cluster.
When we run the docker swarm init command with the --force-new-cluster option, a new cluster is created with the current node as the only manager, and we get a healthy cluster with a single manager node. Since this manager already has information about the services and tasks, the worker nodes remain part of the swarm, and services continue running.
You may later add more manager nodes or promote existing worker nodes to become managers. To promote an existing worker node to become a manager node, run the docker node promote command.
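For example (the node name here is illustrative; docker node ls shows the node names in your cluster):
docker node ls
docker node promote worker-node-1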
Note: Manager nodes are responsible for performing management tasks, such as managing the cluster, creating and managing services, and scaling services. They can also do work like the worker nodes, such as hosting a service. By default, all manager nodes are also worker nodes, meaning when you create a service, the instances are also spread across the manager nodes. However, you can dedicate a node for management purposes alone by disabling the worker functionality on that node. This can be done by running the command:
docker node update --availability drain <node>
This will remove the node from the list of available worker nodes, and it will only be used for management tasks. By dedicating a node for management purposes, you can ensure that the manager nodes are not overwhelmed with work and can focus on their management tasks. This can help improve the performance and reliability of your Docker Swarm cluster.
Docker Service
Now that we’ve learned how to create a swarm cluster, let’s discuss how to use it to run multiple instances of a web server to support a larger number of users. That’s where Docker Swarm orchestration comes in: the Swarm orchestrator manages the tasks needed to achieve our goals with the cluster.
The key component of Swarm orchestration is Docker services. A Docker service is one or more instances of a single application or service that runs across the swarm cluster. For example, you could create a Docker service that runs multiple instances of your web server application across the worker nodes in your swarm cluster.
To create a Docker service, you run the docker service create command on your manager node and specify the image name. You can use the --replicas option to define the number of instances of your application you’d like to run across the cluster. The docker service create command is similar to the docker run command in terms of the options it accepts, such as -p for publishing ports and -e for defining environment variables.
When you run the docker service create command to create three instances of your web server, the orchestrator on the manager node decides how many tasks to create and schedules them on the worker nodes. A task on a worker node is a process that kicks off the actual container instance; a task has a 1:1 relationship with its container. The task is responsible for reporting the status of the container to the manager node, so the manager can keep track of the workers and the instances running on them. If a container fails, its task fails as well; the manager node becomes aware of it and automatically schedules a new task to run a replacement container.
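For example, a sketch combining these options (the image name and environment variable are illustrative):
docker service create --name my-web-server --replicas 3 -p 8080:80 -e APP_ENV=production webserverimage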
There are two types of services in Docker Swarm: replicated and global. Replicated services are created with the --replicas option and a predefined number of replicas. This is useful when you need to run a specific number of instances of a service, regardless of the number of worker nodes available in the cluster.
Global services, on the other hand, are created with the --mode global option on the docker service create command. These services are designed to run on every node in the cluster, with exactly one instance per node. Good examples of a global service are a monitoring agent, a log collection agent, or an antivirus scanner that you want to run on every node in the cluster.
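For example, a sketch of a global monitoring agent (using the Zabbix agent image that appears later in this article):
docker service create --mode global --name monitoring-agent zabbix/zabbix-agent:alpine-5.0-latest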
In summary, replicated services are useful when you need to run a specific number of instances of a service, while global services are useful when you want to run just one instance of a service on every node in the cluster.
Updating Services
For example, let’s say we want to run three instances of a web server, so we run:
docker service create --replicas=3 --name my-web-server webserverimage
At a later point in time, we decide that we must have four instances running, and so we must update the service to add a new one. To do this, run the docker service update command and specify the new number of replicas followed by the service name:
docker service update --replicas=4 my-web-server
Network in Swarm
If you’re familiar with Docker, you may know about network drivers such as bridge, none, macvlan, or host. However, these single-host drivers can’t connect containers running on different machines, so in a Docker Swarm cluster we use the overlay network driver instead. The overlay driver creates a distributed network among multiple Docker daemon hosts, enabling secure communication between the containers connected to it, including swarm service containers, regardless of which node they’re hosted on. This provides greater flexibility and scalability for our containerized applications.
When you create a Docker Swarm cluster, an overlay network named ingress is created by default. However, Docker’s DNS-based service discovery doesn’t work within this network, which can cause name-resolution issues. To avoid this problem, it’s recommended to create a user-defined overlay network for your service before creating the service itself; Docker will then handle name resolution inside that network automatically.
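For example, to create a user-defined overlay network (using the network name from the example that follows):
docker network create --driver overlay webserver_service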
Let’s discuss how the overlay network works in a Docker Swarm cluster. When you create a service attached to a specified overlay network, a virtual IP (VIP) is assigned to the service name. For example, if we create an nginx service with 2 replicas inside a network named webserver_service, each container gets its own IP address, such as 10.0.1.2 and 10.0.1.3, while an additional IP address, such as 10.0.1.1, is assigned to the service itself.
When we send a request to the VIP address, Docker load balances it across the containers. It’s important to remember that these IP addresses are only reachable from containers inside the overlay network. Now assume that we also publish port 80 of the hosts to port 80 of the service:
docker service create --name nginx_application --network webserver_service -p 80:80 nginx
This means that Docker listens on port 80 on every host in the cluster, and any request sent to that port on any host’s IP is routed to the service’s VIP and then load balanced across the containers. This is Docker’s ingress routing mesh.
Additionally, as mentioned earlier, DNS resolution works inside user-defined overlay networks. This means that containers within the overlay network can access other containers or services by their names.
It’s also possible to have multiple services within a single overlay network in Docker Swarm. This allows for better organization and management of your services, as well as easier communication between them.
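For example, a second service attached to the same network can reach the nginx service by its name (the API image name here is hypothetical):
docker service create --name api --network webserver_service my-api-image
# From inside an api container, the nginx service resolves by its service name:
# curl http://nginx_application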
Docker Stacks
In the past, we’ve seen how to run individual containers with the docker run command. Instead of running multiple run commands, we can put together an application stack in a docker-compose file and bring it up with a single command, docker-compose up. With Docker Swarm, we can do the same thing, but with a slight twist.
We can run multiple instances of each service and orchestrate their deployment using services and stacks. Instead of the docker run command, we use the docker service command to create a service for each component of our application. And just as we converted our docker run commands into a docker-compose file, we can capture those docker service commands in a compose file as well.
Finally, we can run the docker stack deploy command to deploy the entire application stack at once. This way, we don’t have to run multiple docker service commands, and the configuration of the entire application lives in a single compose file, making it easier to manage and maintain.
Before proceeding, let’s first clarify what a Docker stack is. To do this, let’s start by examining the building blocks of a Docker stack, beginning with containers.
A container, as you may know, is a packaged form of an application that includes its own dependencies and runs in its own environment.
Next, we have a service, which is one or more instances of the same type of container that runs on a single node or across multiple nodes in a Docker Swarm cluster.
Finally, we have a stack, which is a group of interrelated services that together form a complete application. The stack configuration file is defined in a YAML format, similar to the format used by Docker Compose.
To configure a stack in Docker Swarm, we need to use version 3 of the Docker Compose file format. This version introduces a new property called deploy, which Docker Swarm uses for stack-related configuration. In the compose file, we add a deploy property to each service.
The deploy property has several sub-properties, including replicas and placement. replicas lets us deploy multiple instances of the same application by specifying a count; for example, setting replicas to 3 deploys three instances of a service.
placement is another important sub-property under deploy. By default, Docker Swarm places containers on any available node. With the placement property, we can specify constraints, which act like conditions that determine where a service may be placed.
For instance, we can specify node.hostname == nodeworker1 to ensure that a service is placed on the node with that hostname, or node.role == manager to place a service on a manager node. We can also specify multiple constraints; a service is placed only on a node that satisfies all of them.
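As a minimal sketch, a deploy section combining two such constraints might look like this (hostname and values are illustrative):
deploy:
  replicas: 3
  placement:
    constraints:
      - node.role == worker
      - node.hostname == nodeworker1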
You can find more information about constraint conditions at the following link: Docker Documentation on Placement Constraints.
To limit resource consumption on the underlying hardware in a Docker Swarm cluster, we add the resources property under the deploy section and specify limits for CPU and memory. This ensures that a service does not drain the underlying operating system of resources and impact other, higher-priority services.
Note that resource settings are not fixed forever: they can be updated later (for example, by redeploying the stack with new values) to reflect changes in the cluster’s resource availability, allowing the service to adapt to changing conditions and maintain optimal performance.
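As a sketch, limits and reservations under deploy might look like this (the values are illustrative):
deploy:
  resources:
    limits:
      cpus: '0.50'
      memory: 512M
    reservations:
      cpus: '0.25'
      memory: 256M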
To stay up-to-date with the latest information on configuring resource limits in Docker Swarm, we recommend consulting the official documentation. The documentation provides detailed instructions and examples for configuring resource limits, as well as information on other advanced features and best practices for managing resources in a Docker Swarm cluster.
To work with Docker stacks, it is recommended to create a file named docker-stack.yml and write your configuration in it using Compose file format version 3 or later. Once the file is ready, run the docker stack deploy command to bring up your stack, passing the path of the stack file with the --compose-file flag:
docker stack deploy --compose-file ./docker-stack.yml <stack-name>
The first action this command performs is creating a network for the stack, followed by creating the services defined within the file. If you wish to modify your stack, such as changing the number of replicas of a service, make the change in the stack file and re-run the docker stack deploy command with the same stack name. Docker will update your services without shutting down containers unnecessarily: scaling up simply launches additional containers, while a change such as a new image causes the old containers to be terminated and replaced with new ones.
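After deployment, a few commands are handy for inspecting the stack (use the stack name you passed to docker stack deploy):
docker stack ls
docker stack services <stack-name>
docker stack ps <stack-name>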
For further details on configuring Docker Stacks, please refer to the official Docker documentation: Docker Stacks.
Now let’s see an example. We’ll walk through creating a Docker Swarm cluster with 3 manager nodes and 4 worker nodes, then deploy a stack with the specified services.
Step 1: Setting Up the Docker Swarm Cluster
1.1. Initialize Docker Swarm
First, initialize Docker Swarm on the manager node.
docker swarm init --advertise-addr <MANAGER-IP>
This command will output a token which you will use to join the worker nodes to the swarm.
1.2. Add Manager Nodes
Run the following command on the remaining manager nodes to join them to the swarm:
docker swarm join --token <MANAGER-TOKEN> <MANAGER-IP>:2377
1.3. Add Worker Nodes
Run the following command on each worker node:
docker swarm join --token <WORKER-TOKEN> <MANAGER-IP>:2377
You can retrieve the tokens from the manager node at any time:
docker swarm join-token manager
docker swarm join-token worker
Step 2: Creating the Docker Stack File
Create a docker-compose.yml file with the following content:
version: '3.8'
services:
  frontend:
    image: frontend_image
    deploy:
      replicas: 3
      placement:
        constraints: [node.role == worker]
      resources:
        limits:
          cpus: '0.5'
          memory: 512M
        reservations:
          cpus: '0.25'
          memory: 256M
    ports:
      - "443:443"
    depends_on:
      - redis
  backend:
    image: back_image
    deploy:
      replicas: 3
      placement:
        constraints: [node.role == worker]
      resources:
        limits:
          cpus: '1.0'
          memory: 1024M
        reservations:
          cpus: '0.5'
          memory: 512M
    depends_on:
      - redis
      - postgres
  redis:
    image: redis:alpine
    deploy:
      replicas: 1
      placement:
        constraints: [node.role == worker]
  postgres:
    image: postgres:alpine
    environment:
      POSTGRES_USER: exampleuser
      POSTGRES_PASSWORD: examplepass
      POSTGRES_DB: exampledb
    volumes:
      - pgdata:/var/lib/postgresql/data
    deploy:
      replicas: 1
      placement:
        constraints: [node.role == worker]
  zabbix-agent:
    image: zabbix/zabbix-agent:alpine-5.0-latest
    deploy:
      mode: global
    environment:
      ZBX_SERVER_HOST: zabbix-server
      ZBX_HOSTNAME: "docker-swarm-agent"
volumes:
  pgdata:
Step 3: Deploying the Stack
Deploy the stack using the following command on one of the manager nodes:
docker stack deploy -c docker-compose.yml mystack
Step 4: Explanation of the Configuration
4.1. Frontend Service
Image: frontend_image
Replicas: 3
Port: Exposed on 443
Dependencies: Depends on Redis (note that depends_on is ignored by docker stack deploy in swarm mode, so services should tolerate their dependencies starting in any order)
CPU Limits: The frontend service can use up to 0.5 CPU cores.
Memory Limits: The frontend service can use up to 512MB of memory.
CPU Reservations: The frontend service is guaranteed 0.25 CPU cores.
Memory Reservations: The frontend service is guaranteed 256MB of memory.
4.2. Backend Service
Image: back_image
Replicas: 3
Dependencies: Depends on Redis and PostgreSQL
CPU Limits: The backend service can use up to 1.0 CPU cores.
Memory Limits: The backend service can use up to 1024MB (1GB) of memory.
CPU Reservations: The backend service is guaranteed 0.5 CPU cores.
Memory Reservations: The backend service is guaranteed 512MB of memory.
4.3. Redis Service
Image: redis:alpine
Replicas: 1
Role: Key-value store used by both frontend and backend
4.4. PostgreSQL Service
Image: postgres:alpine
Replicas: 1
Persistent Volume: pgdata
Role: Relational database used by the backend
4.5. Zabbix Agent
Image: zabbix/zabbix-agent:alpine-5.0-latest
Mode: Deployed globally on all nodes
Role: Monitoring agent for the Docker Swarm cluster
Note:
In Docker Swarm, volumes are not automatically shared across all nodes in the swarm. When you create a volume in a Docker Swarm stack, the volume is local to the node where the service is running. This means that if a service is scheduled to run on a different node, it will not have access to the volume created on another node.
To share data between nodes in a Docker Swarm cluster, you have a few options:
- NFS (Network File System): You can use NFS to create a shared storage that is mounted on all nodes. This allows all nodes to access the same data.
- GlusterFS or Ceph: These are distributed file systems that can be used to create a shared storage system that spans across all nodes in your cluster.
- Docker Volume Plugins: Use a Docker volume plugin that supports distributed or shared storage. Some popular options include:
- Rex-Ray: Provides support for shared storage solutions like EBS, EFS, and more.
- Portworx: A cloud-native storage solution that provides high availability and shared volumes across nodes.
- Flocker: Manages Docker volumes and allows them to be shared across multiple nodes.
Note: Important ports to open in the firewall (inbound traffic for swarm management):
- TCP 2377: Cluster management & raft sync communications
- TCP/UDP 7946: Control plane gossip discovery communication between nodes
- UDP 4789: Data plane VXLAN overlay network traffic
- IP Protocol 50 (ESP): Required if using overlay network encryption
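For example, on a host that uses firewalld, you might open these ports as follows (a sketch; adapt it to your distribution's firewall tooling):
firewall-cmd --permanent --add-port=2377/tcp
firewall-cmd --permanent --add-port=7946/tcp
firewall-cmd --permanent --add-port=7946/udp
firewall-cmd --permanent --add-port=4789/udp
# Required only if overlay network encryption is enabled:
firewall-cmd --permanent --add-rich-rule='rule protocol value="esp" accept'
firewall-cmd --reload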