20 October 2020 - Big Data & Data Science

Business Insights & Data Analytics Platform

We were contacted by a co-founder and COO of a real-time Big Data analytics platform that uses Artificial Intelligence to help businesses get the most out of their data. The product digs valuable insights out of the large volumes of data a business has access to.

Location: Los Angeles, CA

Industry: Business Intelligence and Big Data analytics

Partnership period: December 2018 — ongoing

Team size: 1 team lead, 2 DevOps engineers, 1 Big Data engineer

Team location: Kharkiv, Ukraine

Services: Cloud infrastructure design and optimization, database performance optimization, Machine Learning model training, monitoring implementation, CI/CD implementation

Expertise delivered: App containerization and container management, ML training and management, cloud infrastructure management, monitoring implementation, CI/CD

Technology stack: AWS, Coordinator, Alertmanager, G Suite, GitHub, Jenkins, Kibana, OpenVPN, Prometheus, Spinnaker, Superset.

Project requirements

The customer runs a large Big Data backend on AWS infrastructure. They needed IT Svit to help with the following tasks:

  1. Planning and performing an upgrade of their Druid instance
  2. Implementing enhanced system monitoring and alerting with Amazon CloudWatch, Prometheus, Grafana and Alertmanager
  3. Configuring a staging environment, which required expertise in Kubernetes, Ansible, Docker, Apache Kafka, Hadoop, Druid, Postgres, Node.js, Alertmanager, Prometheus and Grafana
  4. Planning and building a CI/CD pipeline with Jenkins to improve the insights processing pipeline
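For a Druid upgrade like the one in task 1, the Apache Druid documentation recommends a rolling update order that avoids downtime: Historical, then MiddleManager and Indexer, Broker, Router, Overlord and finally Coordinator. The Python sketch below only sorts a node inventory into that order; the inventory format and node names are hypothetical, and it does not perform the upgrade itself or the health checks one would run between steps.

```python
# Rolling-update order for Druid process types, following the Apache Druid
# "Rolling updates" guide; a lower index means "upgrade earlier".
DRUID_ROLLING_UPDATE_ORDER = [
    "historical",
    "middlemanager",
    "indexer",
    "broker",
    "router",
    "overlord",
    "coordinator",
]


def plan_rolling_upgrade(nodes):
    """Return node names sorted into the recommended upgrade order.

    `nodes` is a list of (node_name, process_type) tuples; this inventory
    format is a hypothetical one used only for the sketch.
    """
    rank = {ptype: i for i, ptype in enumerate(DRUID_ROLLING_UPDATE_ORDER)}
    unknown = [name for name, ptype in nodes if ptype not in rank]
    if unknown:
        raise ValueError(f"unknown Druid process type for: {unknown}")
    # sorted() is stable, so nodes of the same type keep their input order.
    ordered = sorted(nodes, key=lambda node: rank[node[1]])
    return [name for name, _ in ordered]


if __name__ == "__main__":
    # Hypothetical inventory for illustration only.
    inventory = [
        ("druid-coord-1", "coordinator"),
        ("druid-broker-1", "broker"),
        ("druid-hist-1", "historical"),
        ("druid-mm-1", "middlemanager"),
        ("druid-hist-2", "historical"),
    ]
    for step, node in enumerate(plan_rolling_upgrade(inventory), start=1):
        print(step, node)
```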

Results

IT Svit completed all the tasks in order of priority:

  1. We moved all the apps from Marathon to Amazon EKS and optimized the system architecture. As a result, Druid cluster upgrade time was reduced fourfold, from 1 hour to 15 minutes.
  2. We found and resolved the issue that caused application workers to restart frequently on legacy clusters.
  3. We configured the CI/CD pipeline using Jenkins to automate daily cluster operations.
  4. We connected Spinnaker to Prometheus using Alertmanager. Abnormal system behavior now triggers an immediate Slack alert, so incidents are resolved much faster.
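Alertmanager can post to Slack directly through its built-in Slack receiver. Where a custom relay is preferred instead, Alertmanager's webhook receiver POSTs a JSON body whose `alerts` array holds items with `status`, `labels` and `annotations`. The minimal Python sketch below turns such a payload into a Slack incoming-webhook body (`{"text": ...}`); the `summary` annotation and the `DruidBrokerDown` alert name are assumptions for illustration, not the customer's actual alerting rules.

```python
import json


def alertmanager_to_slack(payload):
    """Convert an Alertmanager webhook payload (a dict) into a Slack
    incoming-webhook body of the form {"text": ...}."""
    lines = []
    for alert in payload.get("alerts", []):
        labels = alert.get("labels", {})
        annotations = alert.get("annotations", {})
        status = alert.get("status", "unknown").upper()  # "firing" or "resolved"
        name = labels.get("alertname", "unnamed alert")
        # Assumption: alert rules put a human-readable text in "summary".
        summary = annotations.get("summary", "")
        line = f"[{status}] {name}"
        if summary:
            line += f": {summary}"
        lines.append(line)
    return {"text": "\n".join(lines) or "Alertmanager sent an empty notification"}


if __name__ == "__main__":
    # Hypothetical payload trimmed to the fields the function reads.
    sample = {
        "status": "firing",
        "alerts": [
            {
                "status": "firing",
                "labels": {"alertname": "DruidBrokerDown", "severity": "critical"},
                "annotations": {"summary": "Broker has been unreachable for 5m"},
            }
        ],
    }
    print(json.dumps(alertmanager_to_slack(sample)))
```

The resulting dict would be serialized with `json.dumps` and POSTed to a Slack incoming-webhook URL.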

Challenges and solutions

This analytics company had faced several challenges that hindered product growth and reduced operational stability:

  1. Marathon infrastructure was not able to deal with the ever-growing number of requests
  2. Some Big Data input sources were slow and dragged down the performance of the whole platform, and the Druid version in use was badly outdated
  3. There was little to no insight into system performance due to the complexity of the backend architecture
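Getting visibility into problems such as application workers restarting frequently can start with a single PromQL query. The sketch below assumes the cluster runs kube-state-metrics (which exports `kube_pod_container_status_restarts_total`) and that Prometheus is reachable at a hypothetical address; it only builds the URL for Prometheus' `/api/v1/query` instant-query endpoint, and the 1-hour window and threshold of 3 restarts are illustrative values.

```python
from urllib.parse import urlencode

# Hypothetical Prometheus address; replace with the real endpoint.
PROMETHEUS_URL = "http://prometheus.example.internal:9090"


def restart_query(window="1h", threshold=3):
    """PromQL selecting containers that restarted more than `threshold`
    times within `window`. Assumes kube-state-metrics is installed."""
    return (
        f"increase(kube_pod_container_status_restarts_total[{window}]) > {threshold}"
    )


def instant_query_url(base_url, promql):
    """Build a GET URL for Prometheus' /api/v1/query instant-query endpoint."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


if __name__ == "__main__":
    print(instant_query_url(PROMETHEUS_URL, restart_query()))
```

Fetching that URL with `urllib.request.urlopen` returns JSON in which the matching series are listed under `data.result`.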

We solved these issues successfully:

  1. Set up Airflow to automate data analysis.
  2. Updated Druid and Spinnaker to the latest versions.
  3. Set up the EKS cluster and moved the majority of the apps from Marathon to Amazon EKS.
  4. Installed and configured Jenkins pipelines for automated application updates.
  5. Set up monitoring and notifications using Prometheus, Grafana and Alertmanager.
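Jenkins pipelines for automated application updates can also be started from other tools through Jenkins' remote access API, which accepts a POST to `/job/<name>/build` (or `/job/<name>/buildWithParameters` for parameterized jobs) authenticated with HTTP Basic auth and a user's API token. The Python sketch below only builds such a request; the Jenkins URL, job name, user and parameter names are hypothetical, and jobs inside folders would need a nested `/job/<folder>/job/<name>` path that it does not handle.

```python
import base64
from urllib.parse import quote, urlencode
from urllib.request import Request


def build_trigger_request(jenkins_url, job_name, user, api_token, params=None):
    """Build (but do not send) a POST request that triggers a Jenkins job.

    Uses /job/<name>/build, or /job/<name>/buildWithParameters when
    parameters are given, with Basic auth from a user's API token.
    """
    endpoint = "buildWithParameters" if params else "build"
    url = f"{jenkins_url.rstrip('/')}/job/{quote(job_name)}/{endpoint}"
    if params:
        url += "?" + urlencode(params)
    credentials = base64.b64encode(f"{user}:{api_token}".encode()).decode()
    return Request(
        url,
        data=b"",  # empty body; parameters travel in the query string
        method="POST",
        headers={"Authorization": f"Basic {credentials}"},
    )


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    req = build_trigger_request(
        "https://jenkins.example.internal",
        "deploy-analytics-api",
        "ci-bot",
        "example-api-token",
        {"VERSION": "1.2.3"},
    )
    print(req.get_method(), req.full_url)
```

Passing the returned object to `urllib.request.urlopen()` would actually send it.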

Conclusions

By updating Druid and Spinnaker, moving from Marathon to EKS, resolving the issues with Jenkins and implementing in-depth monitoring with smart alerting, IT Svit helped the customer ensure that their systems run at top efficiency and are updated seamlessly, and that any incidents are reported instantly and resolved quickly.
