Optimizing a Kubernetes Cluster Running MongoDB

The customer is a UK-based cryptocurrency exchange that lets users instantly trade various cryptocurrencies, including Ethereum. The system provides historical price data and updates prices in real time. The customer wanted us to optimize the resource usage of the existing cluster, implement automated MongoDB backups, and enable smart monitoring of key system parameters.

Case info

Location: London, UK

Industry: Cryptocurrency trading

Partnership period: October 2019 – ongoing

Team size: 1 Team Lead, 1 DevOps engineer

Team location: Kharkiv, Ukraine

Services: MongoDB database performance optimization, monitoring implementation, automation of backup/restore operations

Expertise delivered: Cloud infrastructure management and optimization, monitoring implementation, CI/CD configuration

Technology stack: AWS, GitHub, Ansible, Kubernetes, Helm, Terraform, MongoDB, Prometheus, Grafana, AlertManager

Project requirements

The customer requested the following DevOps services:

Optimizing Prometheus+Grafana monitoring to identify slow MongoDB queries
Using this data to optimize Kubernetes cluster performance
Automating backup/restore operations for MongoDB
Enabling archiving and retrieval of historical MongoDB data
Building a staging environment that runs in parallel with production
Minimizing system operating expenses

Challenges and solutions

The customer's system was not performing well. In particular, MongoDB ran some long queries that consumed an exorbitant amount of resources for no clearly visible reason. The Prometheus and Grafana versions were outdated and misconfigured. Finally, the customer lacked an inexpensive staging environment for testing system changes before rolling them out to production.
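Before tuning anything, the long-running queries have to be located. One common starting point is MongoDB's own slow-operation logging. The sketch below assumes MongoDB 4.4 or later, whose structured JSON logs record slow operations with "msg": "Slow query" and the attributes ns and durationMillis; older versions log plain text and would need a different parser. It aggregates slow operations per namespace so the most expensive collections surface first. The function name and threshold are illustrative, not part of the project's tooling.

```python
import json
from collections import defaultdict


def slowest_namespaces(log_lines, threshold_ms=100):
    """Aggregate slow operations per namespace from MongoDB JSON log lines.

    Assumes the MongoDB 4.4+ structured log format. Returns a list of
    (namespace, count, total_ms) tuples sorted by total_ms, largest first.
    """
    stats = defaultdict(lambda: [0, 0])  # namespace -> [count, total duration in ms]
    for line in log_lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not JSON (e.g. pre-4.4 plain-text logs)
        if not isinstance(entry, dict) or entry.get("msg") != "Slow query":
            continue
        attr = entry.get("attr", {})
        duration = attr.get("durationMillis", 0)
        if duration < threshold_ms:
            continue  # ignore operations below our own threshold
        ns = attr.get("ns", "<unknown>")
        stats[ns][0] += 1
        stats[ns][1] += duration
    return sorted(
        ((ns, count, total) for ns, (count, total) in stats.items()),
        key=lambda row: row[2],
        reverse=True,
    )
```

The same per-namespace totals are what a monitoring dashboard would plot over time; the script form is simply easier to run ad hoc against a log file.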

The first step was configuring backup and restore for MongoDB. Fortunately, IT Svit has developed its own solution for automated MongoDB backup and restoration in AWS using Terraform and Jenkins, which worked well for this project. This ensured the data was safe and could be recovered in case of failure.
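The actual IT Svit tooling is not shown here. As a rough illustration of what a scheduled backup job typically does, the sketch below builds a mongodump command that writes a gzip-compressed, timestamped archive and decides which older archives to prune under a simple "keep the last N" retention policy. The file-naming scheme, the retention count, and the connection URI in the usage note are assumptions for illustration only; --uri, --archive and --gzip are standard mongodump options.

```python
from datetime import datetime, timezone

PREFIX = "mongodb-"          # hypothetical naming scheme for backup archives
SUFFIX = ".archive.gz"


def backup_filename(now):
    """Timestamped archive name; expects a UTC datetime."""
    return now.strftime(PREFIX + "%Y%m%dT%H%M%SZ" + SUFFIX)


def mongodump_command(uri, now):
    """Argument list for a single compressed archive dump."""
    return ["mongodump", f"--uri={uri}", f"--archive={backup_filename(now)}", "--gzip"]


def backups_to_prune(filenames, keep=7):
    """Return the oldest archives beyond the newest `keep` ones.

    The timestamp format sorts chronologically as a plain string,
    so no date parsing is needed.
    """
    ours = sorted(f for f in filenames if f.startswith(PREFIX) and f.endswith(SUFFIX))
    return ours[: max(len(ours) - keep, 0)]
```

A job runner such as Jenkins could pass the list to subprocess.run(cmd, check=True), copy the resulting archive to S3 (for example with aws s3 cp), and then delete whatever backups_to_prune returns; restoring uses mongorestore with the matching --archive and --gzip options.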

The next step was to optimize MongoDB performance. We suggested sharding the database and then either re-uploading all data to the existing collection after optimizing MongoDB and the Kubernetes cluster, or starting a new collection from scratch. We tested several shard keys to match different workloads. Another measure was to use dedicated node pools for specific workloads (one each for monitoring, staging, production, and MongoDB). We also configured resource requests and limits. This helped reduce resource usage and lower costs.
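Testing shard keys largely comes down to checking how evenly a candidate key spreads documents across shards. The sketch below is a deliberately simplified model, not MongoDB's real chunk splitting and balancing: it hashes each document's candidate key to one of N shards and reports the ratio of the busiest shard's load to the mean (1.0 means perfectly even). The field names pair and ts are hypothetical, since the customer's schema is not described here.

```python
import hashlib
from collections import Counter


def shard_for(key_value, num_shards):
    """Map a key value to a shard index via a stable hash (simplified model)."""
    digest = hashlib.md5(repr(key_value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def imbalance(docs, key_fields, num_shards):
    """Busiest shard load divided by mean load for a candidate shard key."""
    counts = Counter(
        shard_for(tuple(doc[field] for field in key_fields), num_shards) for doc in docs
    )
    loads = [counts.get(i, 0) for i in range(num_shards)]
    mean = sum(loads) / num_shards
    return max(loads) / mean if mean else 0.0


# Hypothetical price-tick documents: one trading pair, many timestamps.
ticks = [{"pair": "ETH-GBP", "ts": i} for i in range(1000)]
print(imbalance(ticks, ["pair"], 4))        # low-cardinality key: every doc on one shard
print(imbalance(ticks, ["pair", "ts"], 4))  # compound key: close to an even spread
```

The first call shows the classic failure mode of a low-cardinality key, where all writes land on a single shard; the compound key spreads them out. Real shard key selection also weighs query patterns and monotonically increasing values, which this model ignores.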

The next step was improving the efficiency of cluster monitoring. Using Helm charts, we configured an automated process to deploy the Prometheus and Grafana operators, keep them updated to the latest versions, and configure their integration with AlertManager. This made the workloads visible and gave the customer a clear view of system performance.
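Smart alerting mostly depends on not paging for short spikes. In a Prometheus alerting rule, the for clause keeps an alert in the pending state until its expression has been true continuously for the given duration; only then does it fire and get handed to AlertManager. The sketch below models that state machine for a hypothetical slow-query alert evaluated every 60 seconds. It is an illustration of the semantics, not Prometheus code, and the sample data is invented.

```python
def alert_states(samples, for_seconds):
    """Model Prometheus alert states for a rule with a `for` duration.

    samples: (timestamp_seconds, condition_is_true) pairs in evaluation order.
    Returns "inactive", "pending" or "firing" for each evaluation.
    """
    states = []
    active_since = None  # when the condition most recently became true
    for ts, condition in samples:
        if not condition:
            active_since = None  # any false evaluation resets the timer
            states.append("inactive")
            continue
        if active_since is None:
            active_since = ts
        held_for = ts - active_since
        states.append("firing" if held_for >= for_seconds else "pending")
    return states


# A brief spike at t=0..60 never fires; a sustained one from t=180 fires at t=300.
evaluations = [(0, True), (60, True), (120, False), (180, True), (240, True), (300, True)]
print(alert_states(evaluations, for_seconds=120))
```

Choosing the for duration is a trade-off: longer values suppress noise from transient slow queries but delay notification of a real degradation.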

Results

As a result, the customer now has a system running on a Kubernetes cluster with automated backup and restoration, in-depth monitoring, and smart alerting. By sharding the database, we reduced the system load and lowered expenses. We are now working to further improve system performance and reduce costs.

Conclusions

The customer's product operates reliably, the data is backed up at regular intervals, and the customer has in-depth awareness of system performance. We helped significantly lower resource consumption and operating costs and are currently working on further improving system performance.