IT Svit internal monitoring system

IT Svit's cloud infrastructure is diverse, so when something stops working it is hard to identify the issue and react at once. This is why we decided to implement an internal monitoring system for our services with the following components:

Zabbix-server
Prometheus operator
Grafana

Project requirements

We wanted our system to provide the following results:

Automatic monitoring of various cloud infrastructure parameters like CPU usage, bandwidth usage, disk volume usage, etc.
Timely alerts if any issues occur
Informative notifications with screenshots to simplify troubleshooting

Project results

With the monitoring stack of Zabbix + Prometheus + Grafana in place, we receive alert notifications in Telegram whenever something is wrong with a Linux server or Kubernetes cluster. This lets us respond right away and resolve issues quickly. It also saves time, because each alert tells us both what the issue is and what caused it.

Location: Ukraine
Partnership period: 2005-ongoing
Team size: 2 – 4 people
Team location: Kharkiv, Ukraine
Services: Cloud architecture, cloud infrastructure management, cloud monitoring solutions
Expertise delivered: AWS cloud administration, DevOps services, cloud infrastructure management, monitoring solutions configuration
Technologies: Zabbix, Kubernetes, Prometheus, Grafana

Product Overview

Client’s goals

This was an internal project aimed at improving the versatility and performance of the IT Svit DevOps team. We wanted to be better informed about the processes within our IT infrastructure so that we could identify and solve issues faster:

Automatic monitoring of the infrastructure must be performed by Zabbix
Timely alerts sent to Telegram
Informative screenshots with the trouble description to simplify troubleshooting

Project implementation and challenges resolved

In order to achieve the goals set, we implemented the following decisions:

We used Zabbix and Prometheus as data sources for Grafana. This enabled detailed and on-point issue reporting.
Zabbix agents run in every container or instance within our infrastructure and report to a Zabbix server should any issue arise.
A Python script sends all the information about the incident to a Telegram chat, so the admins are notified immediately when something happens, and we know at once exactly what happened.
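
The notification step can be illustrated with a minimal sketch. This is not the actual IT Svit script. It only assumes that the alert is delivered through the Telegram Bot API sendMessage method. The function names, the message layout and the example values (host, trigger, token, chat ID) are hypothetical and chosen for illustration.

```python
import json
import urllib.request

TELEGRAM_API = "https://api.telegram.org"


def format_alert(host, trigger, severity, status="PROBLEM"):
    """Build a short, readable alert text (hypothetical layout)."""
    return f"[{status}] {severity}: {trigger}\nHost: {host}"


def build_request(token, chat_id, text):
    """Prepare a POST request for the Bot API sendMessage method."""
    url = f"{TELEGRAM_API}/bot{token}/sendMessage"
    payload = json.dumps({"chat_id": chat_id, "text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_alert(token, chat_id, text, timeout=10):
    """Send the alert and return True if Telegram reports success."""
    request = build_request(token, chat_id, text)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return bool(json.load(response).get("ok", False))


if __name__ == "__main__":
    # Placeholder values; a real bot token and chat ID are needed to send.
    message = format_alert("web-01", "Disk space is low on /", "High")
    print(message)
```

In a setup like the one described, a script of this kind would typically be called by the monitoring server when a trigger fires, with the host, trigger name and severity passed in as arguments. Screenshots could be attached in a similar way through the Bot API sendPhoto method, which is not shown here.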

Combining Zabbix + Prometheus + Grafana allows us to monitor the IT infrastructure cost-effectively and almost effortlessly, while responding to issues as soon as they arise.