DevOps Challenges and Solutions — IT Svit Team’s notes!
You may already be a fan of our monthly Bullish DevOps news digest, where we follow fresh DevOps headlines, including the latest tools, methodologies, guides, tips, and recommendations for dealing with tough DevOps projects. Today, we’ve prepared something new: a compilation of challenges our customers faced and the solutions we came up with. In this Tips from IT Svit release: how to install and configure a scalable cluster logging system that collects logs and lets you analyze them over a chosen period; how to install and configure the Zimbra mail server in GCP; how to combine remote networks into one broadcast network with MikroTik routers at the edge; and other goodies. Let’s dive in!
Scalable cluster logging system installation and configuration
Challenge — It was necessary to provide developers with an intuitive system for monitoring the cluster resources and the cluster itself.
Solution —
Step 1 – Analysis. We compared two existing open-source logging systems: the ELK stack and Loki.
ELK pros:
- Uses query DSL and Lucene query language, which provide full-text search capabilities
- Includes Logstash and GROK tools for generating metrics from string logs
- Horizontally scalable
- Can ship logs and backups to S3 or Google Cloud Storage
- Has lots of supplementary modules.
ELK cons:
- May require a significant amount of storage space
- In a clustered deployment with large volumes of data, ELK is rather resource-intensive.
Loki pros:
- Horizontally scalable
- Can ship logs and backups to S3 or Google Cloud Storage
- Less resource-intensive than ELK.
Loki cons:
- Displaying and filtering over 10,000 rows takes a long time in Grafana
- Filtering logs and converting them into metrics relies on regular expressions.
Step 2 — Installation. We chose ELK. The installation steps follow.
Add the Elastic Helm charts repo:
helm repo add elastic https://helm.elastic.co
Elasticsearch
with Helm 3:
helm install elasticsearch --version <version> elastic/elasticsearch
with Helm 2 (deprecated):
helm install --name elasticsearch --version <version> elastic/elasticsearch
Logstash
with Helm 3:
helm install logstash --version <version> elastic/logstash
with Helm 2 (deprecated):
helm install --name logstash --version <version> elastic/logstash
Filebeat
with Helm 3:
helm install filebeat --version <version> elastic/filebeat
with Helm 2 (deprecated):
helm install --name filebeat --version <version> elastic/filebeat
Kibana
with Helm 3:
helm install kibana --version <version> elastic/kibana
with Helm 2 (deprecated):
helm install --name kibana --version <version> elastic/kibana
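The four Helm 3 commands above follow one pattern, so they can be wrapped in a small loop. This is a minimal sketch, not a record of our setup: the helper names `run` and `install_elk_stack` and the `DRY_RUN` switch are assumptions made for illustration, and the chart version stays a parameter you supply.

```shell
#!/bin/sh
# Sketch: install the Elastic charts with Helm 3 using one pinned chart version.
# DRY_RUN defaults to 1, so the commands are only printed; set DRY_RUN=0 to run them.

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# $1 is the chart version (the <version> placeholder from the commands above).
install_elk_stack() {
  version="$1"
  for release in elasticsearch logstash filebeat kibana; do
    # Release name and chart name are the same, as in the commands above.
    run helm install "$release" --version "$version" "elastic/$release"
  done
}
```

Calling `install_elk_stack <version>` prints the four commands; run it again with `DRY_RUN=0` once the output looks right.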
Results — The ELK stack can be used for a variety of purposes, providing maximum flexibility and a feature-rich Kibana interface for analytics, visualization, and queries. As a result, developers get an intuitive system for monitoring cluster resources and the cluster itself.
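To illustrate the full-text search that made ELK attractive, here is a hedged sketch of a Lucene-syntax query sent to the Elasticsearch `_search` endpoint. The URL `http://localhost:9200` (for example, behind a port-forward), the `filebeat-*` index pattern, and the field names in the usage note are assumptions, not values from our setup.

```shell
#!/bin/sh
# Sketch: run a Lucene-syntax full-text query against Elasticsearch.
# ES_URL and INDEX are assumed defaults; adjust them to your cluster.
ES_URL="${ES_URL:-http://localhost:9200}"
INDEX="${INDEX:-filebeat-*}"

# Build a query_string request body.
# $1: Lucene expression without double quotes (they would need JSON escaping).
# $2: maximum number of hits to return (defaults to 20).
build_query() {
  printf '{"query":{"query_string":{"query":"%s"}},"size":%s}\n' "$1" "${2:-20}"
}

# Send the query to the _search endpoint and print the raw JSON response.
search_logs() {
  curl -s -H 'Content-Type: application/json' \
    "$ES_URL/$INDEX/_search" -d "$(build_query "$1" "${2:-20}")"
}
```

For example, `search_logs 'message:timeout AND kubernetes.namespace:prod' 5` would ask for up to five matching log entries, assuming your documents carry those fields.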
Zimbra mail server in GCP installation and configuration
Challenge — We needed to install and configure the Zimbra mail server on Google Cloud Platform. Zimbra sends mail over SMTP on port 25, but by default Compute Engine allows outgoing connections on every port except 25, which is blocked because of the risk of abuse. All other ports are open, including the TLS ports 587 and 465. As a result, the Zimbra mail server can receive mail but cannot send it.
Solution —
Step 1 — SMTP relay setup. We used the SendGrid service from Azure as an SMTP relay. This involved generating an API key for the relay configuration and performing domain authentication. Configuration details here.
Step 2 — Relay host configuration. The relay host was configured on the Zimbra mail server to point at the SendGrid SMTP server.
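A relay host of this kind is usually set with `zmprov` on the Zimbra server. The sketch below only prints the steps rather than running them. It follows the outgoing SMTP authentication procedure documented for Zimbra 8.5 and later, so treat the attribute names as something to verify against your version. The host `smtp.sendgrid.net`, port 587, and the literal user name `apikey` are SendGrid's published SMTP settings; the helper names and the `SG.example` key in the usage note are hypothetical.

```shell
#!/bin/sh
# Sketch: print the steps that point Zimbra's MTA at SendGrid as a relay host.
# Nothing is executed here; review the output and run it as the zimbra user.
RELAY_HOST="smtp.sendgrid.net"
RELAY_PORT="587"
MAP_FILE="/opt/zimbra/conf/relay_password"

# Line for the SASL password map file.
# SendGrid SMTP auth uses the literal user name "apikey"; $1 is the API key.
relay_password_line() {
  printf '%s apikey:%s\n' "$RELAY_HOST" "$1"
}

# Commands that build the lookup table and set the relay attributes.
relay_commands() {
  echo "postmap $MAP_FILE"
  for setting in \
    "zimbraMtaSmtpSaslPasswordMaps lmdb:$MAP_FILE" \
    "zimbraMtaSmtpSaslAuthEnable yes" \
    "zimbraMtaSmtpCnameOverridesServername no" \
    "zimbraMtaSmtpTlsSecurityLevel may" \
    "zimbraMtaSmtpSaslSecurityOptions noanonymous" \
    "zimbraMtaRelayHost $RELAY_HOST:$RELAY_PORT"
  do
    # \$(zmhostname) is printed literally so it expands on the Zimbra server.
    echo "zmprov ms \$(zmhostname) $setting"
  done
}
```

A typical flow would be to write `relay_password_line SG.example` into the map file, then run the commands printed by `relay_commands` and restart the MTA.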
Results — The provider’s block on outgoing mail from the newly installed and configured Zimbra mail server was bypassed. In addition, the mail is encrypted. SendGrid is widely used and therefore trusted, which increases the chance that the mail will not be marked as spam.
Restricting access to IAM for pods in EKS
Challenge — We were required to provide our customer with a tool that would restrict access to IAM for apps in the cluster, thereby solving our customer’s main security issue.
Solution —
Step 1 – Analysis. We studied two existing open-source tools that control IAM access for pods: kiam and kube2iam.
Kiam pros:
- Its architecture significantly restricts access to IAM, which makes it a reliable security solution.
- It was designed from the outset with specific security concerns in mind, which means that fewer nodes will be able to take on IAM roles.
- An additional feature restricts access to the EC2 metadata API.
Kiam cons:
- Setting up kiam for a production environment takes significantly longer because of the TLS and mandatory namespace configuration.
Kube2iam pros:
- A simpler architecture that takes much less time to bring into production.
Kube2iam cons:
- A weaker security solution than kiam.
Step 2 — Installation. We chose kiam.
Kiam setup:
- Create the IAM roles that your pods will assume.
- Grant permission to assume these roles only to the nodes on which the kiam server will run (usually master nodes).
- Annotate pods with iam.amazonaws.com/role: <role name> and apply the changes.
- Annotate each namespace with iam.amazonaws.com/permitted: ".*" to allow roles to be assumed in it. This step is required; the value is a regex matched against role names, which is helpful in multi-user clusters.
- Configure TLS for agent-server communication. This step ensures that only agents can contact the server to obtain credentials.
- Create the RBAC resources for the server.
- Deploy the server DaemonSet.
- Deploy the agent DaemonSet.
- Remove the previous IAM access method once you have confirmed that access via kiam works.
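The two annotation steps above can be sketched as follows. The helpers only print a Namespace manifest and a patch body, so nothing is applied until you pipe them to `kubectl`. The names `team-a`, `my-app`, and `my-app-role` in the usage note are hypothetical.

```shell
#!/bin/sh
# Sketch: generate the kiam annotations for a namespace and a workload.

# Namespace manifest that permits roles matching a regex.
# $1: namespace name, $2: regex for permitted role names (for example ".*").
namespace_manifest() {
  cat <<EOF
apiVersion: v1
kind: Namespace
metadata:
  name: $1
  annotations:
    iam.amazonaws.com/permitted: "$2"
EOF
}

# Patch body that sets the role annotation on a pod template.
# $1: IAM role name the pods should assume.
role_patch() {
  printf '{"spec":{"template":{"metadata":{"annotations":{"iam.amazonaws.com/role":"%s"}}}}}\n' "$1"
}
```

For example, `namespace_manifest team-a '.*' | kubectl apply -f -` would annotate the namespace, and `kubectl -n team-a patch deployment my-app -p "$(role_patch my-app-role)"` would add the role annotation to the pods of a Deployment. A narrower regex than `.*` limits which roles a namespace may use.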
Results — The kiam architecture limits IAM access even more than kube2iam does as it was designed with specific security concerns in mind. Fewer nodes will be able to take on IAM roles. Requiring namespace constraints is good practice even if you don’t have a multi-user cluster. The ability to restrict the metadata API is a nice bonus you get when dealing with kiam.
Combining remote networks into one broadcast network
Challenge — With Covid-19 irreversibly changing our world, businesses unexpectedly had to switch to remote and hybrid work models. What if your teams need to work in one broadcast network from their home offices? For example, we have three geographically distant apartments. Each is connected to a different ISP, with a MikroTik router at the edge. OpenVPN tunnels to a central server tie the sites together, and routes to the subnets are configured. As a result, a person in apartment A can connect to a server in apartment B; however, DLNA traffic is visible only within one apartment and does not reach the rest of the network.
Solution —
Step 1 — Analysis. The obvious solution is to combine the apartments into one broadcast network, which we can do with L2 tunnels. Since every site has a MikroTik router at the edge, we can configure either EoIP (Ethernet over IP) or OpenVPN tunnels. In our case, each router has a public IP address, so we chose EoIP, as it is easier to configure.
Step 2 — Implementation.
- Create EoIP tunnels between routers.
- Add them to the local bridges, specifying the same horizon value (any value will do) on each tunnel port to avoid loops.
- Change the netmask on the bridge from /24 to /22 (or another, if necessary).
- Hand out the /22 netmask to clients via DHCP as well.
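On RouterOS, the first two steps might look like the commands printed by the sketch below. It is a shell helper that generates the commands for one peer, not a dump of our configuration: the interface name, the documentation address 203.0.113.2, the tunnel ID, and the bridge name `bridge-local` are hypothetical, and the tunnel ID has to match on both ends of a tunnel.

```shell
#!/bin/sh
# Sketch: print RouterOS commands that create one EoIP tunnel and bridge it.

# $1: tunnel interface name, $2: public IP of the remote router,
# $3: tunnel ID (same on both ends), $4: bridge name, $5: horizon value.
eoip_peer_config() {
  name="$1"; remote="$2"; tunnel_id="$3"
  bridge="${4:-bridge-local}"; horizon="${5:-1}"
  # L2 tunnel to the remote router.
  echo "/interface eoip add name=$name remote-address=$remote tunnel-id=$tunnel_id"
  # Same horizon on every tunnel port: frames arriving from one tunnel
  # are not forwarded into another tunnel, which prevents loops.
  echo "/interface bridge port add bridge=$bridge interface=$name horizon=$horizon"
}
```

For example, `eoip_peer_config eoip-to-b 203.0.113.2 12` prints the two commands for the tunnel towards apartment B; the router in apartment B needs the mirror-image pair with the same tunnel ID.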
Step 3 — Dealing with complications. The broadcast network is live! But now another problem has appeared: there are three DHCP servers in our network, so a user can obtain an IP address from any of them and, accordingly, reach the Internet through a tunnel rather than through their own provider. Bridged traffic does not pass through the IP firewall, because it is handled at L2, which would seem to make the problem worse. It can be solved with a bridge filter on UDP ports 67-68: we add rules that keep DHCP requests (and responses) out of the tunnels, and the problem is solved.
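A bridge filter of this kind could be generated as shown below. This is a sketch under assumptions: the tunnel interface names are hypothetical, and the rules cover only the forward chain, so the input and output chains may need similar rules depending on where the DHCP server runs.

```shell
#!/bin/sh
# Sketch: print RouterOS bridge filter rules that keep DHCP out of the tunnels.

# Arguments: one or more tunnel interface names.
dhcp_tunnel_filter() {
  for tunnel in "$@"; do
    for direction in out-interface in-interface; do
      # DHCP uses UDP ports 67 (server) and 68 (client).
      echo "/interface bridge filter add chain=forward $direction=$tunnel mac-protocol=ip ip-protocol=udp dst-port=67-68 action=drop"
    done
  done
}
```

For example, `dhcp_tunnel_filter eoip-to-b eoip-to-c` prints four rules, two per tunnel, to paste into the router in apartment A.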
Results — We have one broadcast network. Users communicate with each other as if they were connected directly; even a network printer in one apartment is visible from another apartment as a local device, and DLNA broadcasts pass without any problems.
Warning: for security reasons, never do this on office networks!
Wrapping things up
IT companies face tough challenges every day. Harsh reality! In most cases, IT leaders turn to DevOps outsourcing companies to help solve their issues. As you can see, our customers are in good hands. If you have any questions, you are welcome to contact us anytime. A new batch of tips from our cool DevOps team will be on your table shortly.