
The Kubernetes journey

Since 2015, Trustly, Inc. (previously known as PayWithMyBank) has run its entire operation in the cloud, with more than 20 microservices that connect its users to more than 1,000 banks globally. Although this infrastructure was mature and stable, in 2018 Trustly began modernizing its application architecture to run in containers on Kubernetes.

Overview

Trustly's environment runs entirely on Amazon Web Services (AWS). For years, we orchestrated it primarily with CloudFormation and Packer. We used three CloudFormation layers: one for the network, one for the reverse proxy (Nginx), and one for the application. We deployed approximately twice a day, directly through CloudFormation, and this continuous deployment model served us very well. Let's call this old environment classic-infra.

In 2018, Trustly's DevSecOps team began studying Kubernetes (K8s), and in May 2021 we cut production traffic over to the new environment. Early in that study, it became clear that Kubernetes on its own did not provide all the capabilities of our classic-infra, so we researched the tools our processes would need inside the cluster.

But first, let's talk about the Elastic Kubernetes Service (EKS)...

EKS: Since we were already on AWS, it made sense to use EKS to run our K8s environment. EKS is a managed AWS service that runs the Kubernetes control plane and is responsible for its security and availability. When we started in 2018, EKS offered Kubernetes 1.11, which caused many problems and required manual processes to integrate our VPC with the eksctl tool: you could provision the K8s cluster, but it took too many manual steps to make everything work. By the time we released the K8s cluster to production in 2021, we were running Kubernetes 1.20 on EKS, and eksctl offered many more automation capabilities.

Tooling

OpenSearch: With Kubernetes, you need some way to collect and centralize logs, so we needed a tool to manage application logs from within the cluster. One reason the project took three years was the time it took to learn and deploy an Elasticsearch cluster, initially using Open Distro for Elasticsearch. We recently upgraded to OpenSearch, a robust tool with all the features we need for reliable log management, which integrates with Filebeat and Logstash.

Spinnaker: On the way to Kubernetes, we ran into limitations in meeting our deployment requirements. To address them, we adopted Spinnaker, an open-source continuous delivery platform originally developed at Netflix. Spinnaker let us build complex pipelines and run blue/green deployments in our cluster.
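Spinnaker drives blue/green deployments through its own pipeline stages, and the post does not describe how Trustly configured them. As a rough illustration of the mechanism only (not Spinnaker's API), the Python sketch below assumes a plain-Kubernetes setup in which two Deployments, labeled `blue` and `green`, sit behind one Service, so cutting over means changing the Service's label selector. The label keys and the `payments-api` name are hypothetical.

```python
from typing import Dict

# Hypothetical color labels; any two distinct values would work.
COLORS = ("blue", "green")


def other_color(active: str) -> str:
    """Return the idle color, which receives the new release."""
    if active not in COLORS:
        raise ValueError(f"unknown color: {active!r}")
    return "green" if active == "blue" else "blue"


def selector_patch(app: str, color: str) -> Dict[str, dict]:
    """Build a patch body that repoints a Service's spec.selector.

    The 'app' and 'color' label keys are assumptions; use the labels
    your Deployments actually carry.
    """
    return {"spec": {"selector": {"app": app, "color": color}}}


def cut_over(app: str, active: str) -> Dict[str, dict]:
    """Switch traffic from the active color to the idle one.

    Call this only after the idle Deployment's pods are ready; rolling
    back is the same operation in the other direction.
    """
    return selector_patch(app, other_color(active))


if __name__ == "__main__":
    # "payments-api" is a hypothetical Service name.
    print(cut_over("payments-api", active="blue"))
```

The returned dictionary is only a patch body; it could be applied with `kubectl patch service` or with the official Kubernetes Python client's `CoreV1Api.patch_namespaced_service`. Because the old color keeps running until it is scaled down, rolling back is just another selector change.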

Terraform: When we started our journey, our entire environment was defined in CloudFormation, and we had no experience with Terraform. For the new Kubernetes environments, however, we set up the AWS accounts entirely with Terraform, which creates the VPC, security groups, subnets, route tables, NAT gateways, and the other resources needed to run the K8s cluster.
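The post does not show Trustly's Terraform code. Terraform configurations commonly carve a VPC range into subnets with the built-in `cidrsubnet(prefix, newbits, netnum)` function; the Python sketch below mirrors that arithmetic to show how such a layout works. The VPC range, the availability zones, and the public-then-private ordering are assumptions for illustration.

```python
import ipaddress
from typing import Dict, List


def cidrsubnet(prefix: str, newbits: int, netnum: int) -> str:
    """Mirror Terraform's cidrsubnet(prefix, newbits, netnum) for IPv4 or IPv6."""
    network = ipaddress.ip_network(prefix)
    new_prefix = network.prefixlen + newbits
    if new_prefix > network.max_prefixlen:
        raise ValueError("newbits makes the prefix longer than the address")
    if not 0 <= netnum < 2 ** newbits:
        raise ValueError(f"netnum {netnum} does not fit in {newbits} bits")
    # Each child network spans 2^(host bits) addresses; skip netnum of them.
    size = 2 ** (network.max_prefixlen - new_prefix)
    base = int(network.network_address) + netnum * size
    # Reuse the parent's class so IPv6 results are never parsed as IPv4.
    return str(type(network)((base, new_prefix)))


def plan_subnets(
    vpc_cidr: str, azs: List[str], newbits: int = 4
) -> Dict[str, Dict[str, str]]:
    """Hypothetical layout: one public subnet per AZ, then one private per AZ."""
    public = {az: cidrsubnet(vpc_cidr, newbits, i) for i, az in enumerate(azs)}
    private = {
        az: cidrsubnet(vpc_cidr, newbits, len(azs) + i) for i, az in enumerate(azs)
    }
    return {"public": public, "private": private}


if __name__ == "__main__":
    plan = plan_subnets("10.0.0.0/16", ["us-east-1a", "us-east-1b", "us-east-1c"])
    for tier, subnets in plan.items():
        for az, cidr in subnets.items():
            print(tier, az, cidr)
```

With a /16 VPC and `newbits=4`, the three public and three private subnets come out as consecutive /20 networks, from 10.0.0.0/20 through 10.0.80.0/20, leaving the rest of the range free for later growth.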

eksctl: This is an excellent tool for installing and configuring EKS clusters; it simplifies several of the steps involved in creating a new one.
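Terraform creating the network and eksctl creating the cluster fit together naturally, because eksctl can build a cluster inside an existing VPC. The Python sketch below assembles the structure of an eksctl `ClusterConfig` (apiVersion `eksctl.io/v1alpha5`) that points at pre-existing private subnets. The cluster name, region, IDs, and node group settings are hypothetical placeholders; the post does not describe Trustly's actual configuration.

```python
import json
from typing import Dict


def eksctl_cluster_config(
    name: str,
    region: str,
    version: str,
    vpc_id: str,
    private_subnets: Dict[str, str],
) -> dict:
    """Build an eksctl ClusterConfig that reuses an existing VPC.

    private_subnets maps availability zone -> subnet ID (e.g. Terraform outputs).
    """
    return {
        "apiVersion": "eksctl.io/v1alpha5",
        "kind": "ClusterConfig",
        "metadata": {"name": name, "region": region, "version": version},
        "vpc": {
            "id": vpc_id,
            "subnets": {
                "private": {az: {"id": sid} for az, sid in private_subnets.items()}
            },
        },
        "managedNodeGroups": [
            {
                # Node group name, instance type, and sizes are placeholders.
                "name": "ng-default",
                "instanceType": "m5.large",
                "minSize": 2,
                "maxSize": 6,
                "desiredCapacity": 3,
                "privateNetworking": True,
            }
        ],
    }


if __name__ == "__main__":
    config = eksctl_cluster_config(
        name="k8s-prod",
        region="us-east-1",
        version="1.20",
        vpc_id="vpc-0123456789abcdef0",
        private_subnets={"us-east-1a": "subnet-aaaa", "us-east-1b": "subnet-bbbb"},
    )
    print(json.dumps(config, indent=2))
```

Serialized to YAML, the result could be passed to `eksctl create cluster -f cluster.yaml`. In a setup like the one described, the VPC and subnet IDs would typically be read from Terraform outputs (for example, `terraform output -json`) rather than typed by hand.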

With all these tools, we were able to keep the same visibility inside the EKS cluster that we had in classic-infra, while gaining the benefits of Kubernetes in the new architecture.

The turn

To keep up with the fast pace of Kubernetes development, we designed both the VPC and the cluster's supporting components so that our application could run in two clusters simultaneously, with Amazon Route 53 routing traffic between them. This strategy also helped us move from classic-infra to Kubernetes: we split traffic into small percentages and increased them until Kubernetes received 100% of it.

We increased traffic carefully over several weeks, starting with 1% and then moving to 5%. After many adjustments and improvements, we shifted 100% of traffic to Kubernetes. By then we were comfortable with the environment, and we continued to watch its stability over the following days.
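The post does not show how the Route 53 records were defined. One common way to split traffic by percentage is Route 53 weighted routing, where records sharing a name and type each receive a share of DNS answers proportional to their weight. The Python sketch below assumes two CNAME records, one for classic-infra and one for the K8s cluster, and builds the `ChangeBatch` structure that the Route 53 `ChangeResourceRecordSets` API accepts. Hostnames, TTL, and set identifiers are hypothetical.

```python
from typing import Dict

# Route 53 weights are integers from 0 to 255. Splitting a total of 100
# makes each weight read directly as a percentage.
TOTAL_WEIGHT = 100


def weighted_change_batch(
    record_name: str,
    classic_target: str,
    k8s_target: str,
    k8s_percent: int,
    ttl: int = 60,
) -> Dict[str, object]:
    """Build a ChangeBatch that sends roughly k8s_percent of answers to K8s."""
    if not 0 <= k8s_percent <= TOTAL_WEIGHT:
        raise ValueError("k8s_percent must be between 0 and 100")
    # SetIdentifier distinguishes weighted records sharing a name and type.
    targets = {
        "classic-infra": (classic_target, TOTAL_WEIGHT - k8s_percent),
        "k8s": (k8s_target, k8s_percent),
    }
    changes = [
        {
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": record_name,
                "Type": "CNAME",
                "SetIdentifier": set_id,
                "Weight": weight,
                "TTL": ttl,
                "ResourceRecords": [{"Value": target}],
            },
        }
        for set_id, (target, weight) in targets.items()
    ]
    return {"Comment": f"route {k8s_percent}% to k8s", "Changes": changes}


if __name__ == "__main__":
    # 1% and 5% come from the post; the steps between 5% and 100% are not listed.
    for pct in (1, 5, 100):
        batch = weighted_change_batch(
            "api.example.com", "classic-lb.example.com", "k8s-lb.example.com", pct
        )
        weights = {
            c["ResourceRecordSet"]["SetIdentifier"]: c["ResourceRecordSet"]["Weight"]
            for c in batch["Changes"]
        }
        print(pct, weights)
```

The batch could be submitted with boto3's `route53` client via `change_resource_record_sets(HostedZoneId=..., ChangeBatch=batch)`. Because resolvers cache answers for the record's TTL, the observed split lags each change and only approximates the configured percentage, which is one more reason to move in small steps.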

The key to this strategy's success was keeping the logging cluster separate from the K8s cluster. That way, both environments could send logs to OpenSearch, and tags identified the source cluster on every log line, which made troubleshooting easier.
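The post does not say which component adds the cluster tag; it could equally be done in the log shipper. To show the idea in a self-contained form, the Python sketch below attaches a cluster name to every log record on the application side using the standard `logging` module, writes JSON lines to a shared stream that stands in for OpenSearch, and then filters by cluster as you would during troubleshooting. The `cluster` field name and the cluster labels are assumptions.

```python
import io
import json
import logging
from typing import List


class ClusterTagFilter(logging.Filter):
    """Stamp every record passing through a handler with a cluster name."""

    def __init__(self, cluster: str) -> None:
        super().__init__()
        self.cluster = cluster

    def filter(self, record: logging.LogRecord) -> bool:
        record.cluster = self.cluster
        return True  # never drop records; only annotate them


class JsonLineFormatter(logging.Formatter):
    """Render one JSON document per line, a shape a log shipper can forward."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps(
            {
                "cluster": getattr(record, "cluster", "unknown"),
                "level": record.levelname,
                "logger": record.name,
                "message": record.getMessage(),
            }
        )


def build_logger(name: str, cluster: str, stream) -> logging.Logger:
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep the example's output self-contained
    handler = logging.StreamHandler(stream)
    handler.addFilter(ClusterTagFilter(cluster))
    handler.setFormatter(JsonLineFormatter())
    logger.addHandler(handler)
    return logger


def lines_for_cluster(raw: str, cluster: str) -> List[dict]:
    """Parse JSON log lines and keep only those from one cluster."""
    docs = (json.loads(line) for line in raw.splitlines() if line.strip())
    return [doc for doc in docs if doc.get("cluster") == cluster]


if __name__ == "__main__":
    shared = io.StringIO()  # stands in for the central log store
    build_logger("payments.classic", "classic-infra", shared).info("charge accepted")
    build_logger("payments.k8s", "k8s", shared).info("charge accepted")
    for doc in lines_for_cluster(shared.getvalue(), "k8s"):
        print(doc)
```

Because both environments write identically shaped documents to one store, the same query works before, during, and after the cutover; only the value of the cluster field changes.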

Conclusion

Kubernetes is a powerful tool, but it takes extra effort to install, configure, and maintain, and it needs additional components before a cluster is production-ready.

This first article gives an overview of the Kubernetes architecture at Trustly. For more details on autoscaling, pod health-check strategies, and Java best practices for containers, follow this blog for future content.

 

Denner Padilha
Head of Cloud & Security Operations, Trustly America

Denner is Head of Cloud & Security Operations at Trustly. He joined Trustly in early 2018 as IT Operations Manager, responsible for the entire cloud infrastructure and IT operations for North America.
Denner has extensive experience in application performance and in security audits for ISO 27001, PCI, and SSAE 18 SOC Type 2. He previously worked as a professor in Brazil, teaching IT courses to undergraduate and MBA students.
