Install a highly available Elasticsearch cluster in Kubernetes


Elasticsearch is a popular open source, distributed search and analytics engine. Its shard and replica management features make it robust and scalable, and if you deploy Elasticsearch on Kubernetes instead of on traditional virtual or physical machines, you'll find that it is easy to install, configure and manage.

For enterprise-level deployments, you need a highly available Elasticsearch cluster spread across multiple zones, so that if one zone goes down, the cluster is still available. In this tutorial, you will learn how to set up such a cluster.

In virtually all cloud environments, you can have a Kubernetes cluster in a region that spans multiple zones, which typically consist of data centers located very close to each other. The goal is to keep the application available even when a few nodes in a zone, or an entire zone, become unavailable.

A typical production-level Elasticsearch cluster on Kubernetes is composed of master pods, data pods and ingest pods, with a Kibana pod providing the visualization component. Master pods control the Elasticsearch cluster: they create and delete indexes, track cluster members and allocate shards to the data pods. Elasticsearch requires a stable master node to operate. Data pods store the data and perform CRUD, search and aggregation operations. Ingest pods transform and enrich data before it is stored in an index as documents. The data pods and master pods require persistent storage and are therefore deployed in Kubernetes as StatefulSets. Kibana and the ingest pods do not require persistent storage and are deployed as Kubernetes Deployments.
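The split between stateful and stateless roles can be sketched in a few lines of Python. This is only an illustration: the role names, replica counts, the `es-<role>` naming scheme and the storage size are assumptions made for the example, and the skeletons contain no containers, so they are not deployable as written.

```python
# Roles described above, mapped to the Kubernetes workload kind that fits them.
# Replica counts are illustrative assumptions, not recommendations.
ROLES = {
    "master": {"stateful": True, "replicas": 3},
    "data": {"stateful": True, "replicas": 3},
    "ingest": {"stateful": False, "replicas": 2},
    "kibana": {"stateful": False, "replicas": 1},
}


def workload_kind(role: str) -> str:
    """Stateful roles need persistent storage, so they map to a StatefulSet."""
    return "StatefulSet" if ROLES[role]["stateful"] else "Deployment"


def manifest_skeleton(role: str) -> dict:
    """Build a bare-bones manifest skeleton (no containers, not deployable as-is)."""
    name = f"es-{role}"  # hypothetical naming scheme
    labels = {"app": name}
    spec = {
        "replicas": ROLES[role]["replicas"],
        "selector": {"matchLabels": labels},
        "template": {"metadata": {"labels": labels}, "spec": {"containers": []}},
    }
    if ROLES[role]["stateful"]:
        # A StatefulSet names a governing Service and creates one
        # PersistentVolumeClaim per pod from volumeClaimTemplates.
        spec["serviceName"] = name
        spec["volumeClaimTemplates"] = [
            {
                "metadata": {"name": "data"},
                "spec": {
                    "accessModes": ["ReadWriteOnce"],
                    "resources": {"requests": {"storage": "10Gi"}},  # placeholder size
                },
            }
        ]
    return {
        "apiVersion": "apps/v1",
        "kind": workload_kind(role),
        "metadata": {"name": name},
        "spec": spec,
    }


if __name__ == "__main__":
    for role in ROLES:
        print(role, "->", workload_kind(role))
```

Only the master and data skeletons carry `volumeClaimTemplates`, which is the practical difference between the two workload kinds for this cluster.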


An important requirement for Elasticsearch is local solid-state drives (SSDs), which provide the storage and improve performance. In this tutorial, you will use local SSDs for Elasticsearch, and the sample solution will achieve both high availability and tolerance of a single-zone failure.

Requirements

Before moving on with this tutorial, make sure you have the following environment:

A Kubernetes cluster that spans three zones. If you are using IBM Cloud, you can easily create a multi-zone cluster with the Kubernetes service.

A minimum of two worker nodes per zone; three worker nodes per zone are recommended.

Worker nodes in the cluster that have local solid-state drives (SSDs).
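As a rough way to check the first two requirements, the sketch below counts worker nodes per zone from their labels. It assumes the nodes carry the well-known `topology.kubernetes.io/zone` label and that you have already collected the labels into a dictionary (for example, from the output of `kubectl get nodes --show-labels`). The function name and the sample node and zone names are hypothetical, and the check does not cover local SSDs.

```python
from collections import Counter

ZONE_LABEL = "topology.kubernetes.io/zone"  # well-known Kubernetes node label


def check_cluster(
    node_labels: dict[str, dict[str, str]],
    required_zones: int = 3,
    min_nodes_per_zone: int = 2,
) -> list[str]:
    """Return a list of unmet requirements; an empty list means all checks pass."""
    # Count worker nodes per zone, skipping nodes without a zone label.
    zones = Counter(
        labels[ZONE_LABEL] for labels in node_labels.values() if ZONE_LABEL in labels
    )
    problems = []
    if len(zones) < required_zones:
        problems.append(f"found {len(zones)} zone(s), need {required_zones}")
    for zone, count in sorted(zones.items()):
        if count < min_nodes_per_zone:
            problems.append(
                f"zone {zone} has {count} worker node(s), need {min_nodes_per_zone}"
            )
    return problems


if __name__ == "__main__":
    # Hypothetical cluster: two worker nodes in each of three zones.
    sample = {
        f"worker-{i}": {ZONE_LABEL: zone}
        for i, zone in enumerate(
            ["zone-a", "zone-a", "zone-b", "zone-b", "zone-c", "zone-c"]
        )
    }
    print(check_cluster(sample) or "all requirements met")
```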

Estimated time

This tutorial should take about 30 minutes to complete.

An architecture overview

The following illustration shows the architecture of this solution. There are three zones, and ideally you should have at least one master pod available in each. Likewise, a minimum of one data pod per zone is recommended. When you need more data pods, add them in multiples of three (one for each zone) so that the zones stay balanced.
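The reasoning behind these numbers can be worked through in a short sketch. It assumes pods are spread evenly (round-robin) across zones, which on a real cluster depends on scheduling rules such as pod anti-affinity or topology spread constraints, and it assumes the cluster needs a majority of master pods to stay operational. The function and zone names are hypothetical.

```python
def spread(pods: int, zones: list[str]) -> dict[str, int]:
    """Distribute pods round-robin across zones and return the count per zone."""
    counts = {zone: 0 for zone in zones}
    for i in range(pods):
        counts[zones[i % len(zones)]] += 1
    return counts


def next_balanced_count(current: int, zones: int = 3) -> int:
    """Smallest pod count above `current` that is a multiple of the zone count."""
    return (current // zones + 1) * zones


def masters_survive_zone_loss(masters: int, zones: list[str]) -> bool:
    """True if a majority of master pods remains after losing any single zone."""
    quorum = masters // 2 + 1
    per_zone = spread(masters, zones)
    return all(masters - lost >= quorum for lost in per_zone.values())


if __name__ == "__main__":
    zones = ["zone-a", "zone-b", "zone-c"]  # hypothetical zone names
    print(spread(6, zones))  # balanced: two data pods per zone
    print(spread(4, zones))  # unbalanced: one zone holds an extra pod
    print(next_balanced_count(3))  # next balanced size after three data pods
    print(masters_survive_zone_loss(3, zones))
```

With three master pods, one per zone, losing any single zone still leaves two of three masters, which is a majority. Adding data pods three at a time keeps every zone holding the same share of the data.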

