Kubernetes Infrastructure Security

Aug 18, 2020
6 minutes

Kubernetes has become the de facto standard for running containerized applications at scale. As with any large-scale platform, security is a foundational element of a healthy, robust, and stable platform.

However, while many Kubernetes users are clearly aware of the security concerns around their application configurations – such as network segmentation (network policies), secrets configuration (using one of the available key management systems), and scanning codebases and images for vulnerabilities – there is another important layer of issues: the security of the underlying Kubernetes infrastructure across the two main processes of CI/CD and runtime.

 

CI/CD

The continuous integration process usually gets people thinking about building, verifying, and managing artifacts (JARs, gems, binaries, Docker images, AMIs, and so on), with a variety of tools and products involved. When it comes to infrastructure, the assumption is that it’s already “there.” However, there are several things we should take into consideration.

 

Automatic Management of Development, Staging, and Production Clusters

When managing infrastructure for your Kubernetes cluster, it is important to avoid the misconfiguration of instances connected to the cluster. 

For example, developers might want to test new features in their development clusters. That means they might want to make the cluster a little less secure during development – perhaps opening some ports on the nodes, removing restrictive network policies, running a different AMI than the one used in production, and so on. Making such configuration changes manually can accidentally affect other, “higher environment” clusters.

To avoid such unintended effects, an automated and manageable way of running Kubernetes infrastructure is needed. This is where tools like Ocean by Spot kick in, taking care of all scaling activities for your Kubernetes cluster.
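
As a complementary guardrail, a CI job can also audit for exactly this kind of drift. Below is a minimal sketch using boto3; the `environment` tag key and its values are illustrative assumptions, not a fixed convention. It flags nodes whose security groups expose ports to the entire internet:

```python
# Hypothetical guardrail: flag cluster nodes whose security groups allow
# ingress from anywhere (0.0.0.0/0). The "environment" tag key and its
# values are illustrative assumptions.
import boto3

ec2 = boto3.client("ec2")

def world_open_ports(environment):
    """Yield (instance_id, port) pairs reachable from 0.0.0.0/0."""
    reservations = ec2.describe_instances(
        Filters=[{"Name": "tag:environment", "Values": [environment]}]
    )["Reservations"]
    for reservation in reservations:
        for instance in reservation["Instances"]:
            for sg in instance.get("SecurityGroups", []):
                described = ec2.describe_security_groups(GroupIds=[sg["GroupId"]])
                for perm in described["SecurityGroups"][0]["IpPermissions"]:
                    ranges = perm.get("IpRanges", [])
                    if any(r.get("CidrIp") == "0.0.0.0/0" for r in ranges):
                        yield instance["InstanceId"], perm.get("FromPort", "all")

# A "dev-style" open port on staging or production is a red flag.
for env in ("staging", "production"):
    for instance_id, port in world_open_ports(env):
        print(f"[{env}] {instance_id} exposes port {port} to the world")
```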

 

Central Configuration

When managing cluster infrastructure, we need to make sure that all instances are configured properly. For example, we wouldn’t want to use our staging cluster’s IAM role for our development cluster, or allow debugging traffic intended for our development cluster to reach our production cluster.

This is why central configuration is so important. No one should simply “launch” an instance; all instances should be created from a single configuration. Whether that configuration resides in a CloudFormation template, a Terraform file, or any auto-scaling mechanism that supports blueprints, we’re covered. This ensures that configurations are never mixed across environments.

When multiple configuration blueprints are needed for instances in the same cluster, mechanisms such as kops instance groups, EKS nodegroups, GKE node pools, or Ocean launch specifications are the right way to go.
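
To make the “single configuration” rule enforceable, a periodic check can verify that every node actually came from the blueprint. The sketch below assumes the blueprint is an EC2 launch template and relies on the `aws:ec2launchtemplate:id` tag that EC2 applies automatically to instances launched from a template; the template ID and cluster name are placeholders:

```python
# A minimal drift check, assuming all nodes should come from one EC2
# launch template (the "blueprint"). The template ID and cluster name
# below are placeholders.
import boto3

ec2 = boto3.client("ec2")
EXPECTED_TEMPLATE = "lt-0123456789abcdef0"  # hypothetical blueprint ID

def off_blueprint_nodes(cluster_name):
    reservations = ec2.describe_instances(
        Filters=[{"Name": f"tag:kubernetes.io/cluster/{cluster_name}",
                  "Values": ["owned"]}]
    )["Reservations"]
    for reservation in reservations:
        for instance in reservation["Instances"]:
            tags = {t["Key"]: t["Value"] for t in instance.get("Tags", [])}
            # EC2 tags instances launched from a template with this key.
            template = tags.get("aws:ec2launchtemplate:id")
            if template != EXPECTED_TEMPLATE:
                yield instance["InstanceId"], template

for node, template in off_blueprint_nodes("prod-cluster"):
    print(f"{node} was not launched from the blueprint (found: {template})")
```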

 

Updated OS Image

Using containers does not free us from applying updates and patches to the underlying infrastructure; Kubernetes operators still need to apply those patches and updates regularly. Luckily for us, cloud managed services make our lives much easier by providing up-to-date OS images for our Kubernetes clusters (whether the Amazon EKS optimized AMI, GKE’s Container-Optimized OS image, or another cloud provider-specific image).

However, we still need a process that keeps the AMI up to date. Such a process can either create alerts when new AMIs become available (a minimal sketch of this follows the list below) or rely on automated services such as:

  • Ocean’s AMI-Auto-Update for ECS
  • kops upgrades, which automatically update the AMI configuration on the instance groups that kops manages
  • The official EKS CLI, eksctl, which uses immutable upgrades for node groups (creating a new nodegroup with the updated AMI)
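
For the alerting route, a scheduled job can compare running nodes against the recommended AMI that AWS publishes in SSM Parameter Store. A minimal sketch, assuming an EKS cluster on Amazon Linux 2 nodes; the Kubernetes version and cluster name are placeholders:

```python
# Check whether any running node is behind the latest recommended
# EKS-optimized AMI. Kubernetes version and cluster name are placeholders.
import boto3

ssm = boto3.client("ssm")
ec2 = boto3.client("ec2")

# Public SSM parameter published by AWS with the recommended EKS AMI.
PARAM = "/aws/service/eks/optimized-ami/1.17/amazon-linux-2/recommended/image_id"
latest_ami = ssm.get_parameter(Name=PARAM)["Parameter"]["Value"]

reservations = ec2.describe_instances(
    Filters=[{"Name": "tag:kubernetes.io/cluster/my-cluster", "Values": ["owned"]},
             {"Name": "instance-state-name", "Values": ["running"]}]
)["Reservations"]

stale = {
    instance["InstanceId"]
    for reservation in reservations
    for instance in reservation["Instances"]
    if instance["ImageId"] != latest_ami
}
if stale:
    print(f"{len(stale)} node(s) are not on the latest AMI {latest_ami}: {stale}")
```

Wiring this output into an SNS topic or chat notification turns it into the alert described above.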

 

Runtime

Security does not end when applications have been securely deployed to the cluster. We should make sure that our infrastructure is able to react to security threats on an ongoing basis. Below are some of the infrastructure security concerns that need to be addressed while running applications on Kubernetes.

 

Restricted Access to Infrastructure

Access to the underlying infrastructure should be restricted and management should be fully automated to reduce human errors. With Ocean by Spot.io, all configurations and scaling activities are fully automated, making it easy for the Kubernetes administrator to restrict access to only those who absolutely need it. 

 

Automated Scaling  

When demand for your application increases, your pods start to scale out (usually via the Kubernetes Horizontal Pod Autoscaler). In that case, you should have enough capacity in your cluster for those additional pods to reach a Running state.

However, manually launching additional nodes to get the needed capacity can result in misconfigured nodes due to human error: old OS images, incorrect security group rules, instances launched in public subnets instead of private ones, and other potential security risks. Here too, Ocean by Spot.io can help, with properly configured, container-driven autoscaling that takes into account the requirements of all the pods that are running and all those waiting to launch.
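
To see how much headroom an autoscaler needs to find, you can compare what Pending pods request against what the cluster can allocate. A rough sketch using the official `kubernetes` Python client; for brevity it looks only at CPU requests and ignores what running pods already consume:

```python
# Rough capacity check: sum the CPU requests of Pending pods and compare
# them to the cluster's total allocatable CPU. Simplified: CPU only, and
# resources already consumed by scheduled pods are ignored.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside a pod
v1 = client.CoreV1Api()

def cpu_cores(quantity):
    """Convert a Kubernetes CPU quantity ('500m' or '2') to cores."""
    return float(quantity[:-1]) / 1000 if quantity.endswith("m") else float(quantity)

pending = v1.list_pod_for_all_namespaces(field_selector="status.phase=Pending").items
requested = sum(
    cpu_cores(c.resources.requests.get("cpu", "0"))
    for pod in pending
    for c in pod.spec.containers
    if c.resources and c.resources.requests
)

allocatable = sum(cpu_cores(n.status.allocatable["cpu"]) for n in v1.list_node().items)
print(f"Pending pods request {requested:.2f} CPU; allocatable is {allocatable:.2f}")
```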

 

Reacting to Attacks

When running your infrastructure in an automated way, there is a potential risk of losing control over your infrastructure’s scale if your application comes under attack. The reason is that when your application is under attack, your pods will scale out to support the surge in traffic, which in turn triggers infrastructure scale-out.

To prevent this, Ocean lets you define not only a maximum number of instances, but also cluster-wide resource limits on CPU and memory. This is especially useful when your cluster runs heterogeneous instances of different sizes, and therefore with different pod capacities. When the cluster reaches its allowed CPU threshold, scaling is suspended.
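
This is not Ocean’s actual API, but the underlying logic can be illustrated with the `kubernetes` client and a hypothetical cluster-wide vCPU ceiling:

```python
# Hypothetical illustration of a cluster-wide limit: suspend scale-out
# once total vCPUs across nodes reach the ceiling, even if pods are still
# pending, so an attack-driven surge cannot grow the cluster without bound.
from kubernetes import client, config

CLUSTER_CPU_CAP = 200.0  # hypothetical cluster-wide vCPU ceiling

config.load_kube_config()
v1 = client.CoreV1Api()

def total_cluster_cpu():
    cores = 0.0
    for node in v1.list_node().items:
        cpu = node.status.capacity["cpu"]
        cores += float(cpu[:-1]) / 1000 if cpu.endswith("m") else float(cpu)
    return cores

def may_scale_out():
    return total_cluster_cpu() < CLUSTER_CPU_CAP

print("scale-out allowed" if may_scale_out() else "cap reached; scaling suspended")
```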

 

Unauthorized Instance Types

Managing heterogeneous instances for your Kubernetes cluster requires the administrator to create different blueprints for all of the instance types and sizes the application needs to run on. That means that even with autoscaling in place, the operator still needs to block instance types that should not run in the cluster (whether for cost or performance reasons).

This can be achieved by creating ASGs that support multiple instance types, using GKE node auto-provisioning (which creates and deletes node pools automatically), or using Ocean’s allow/deny list for instance types. With Ocean, you can define a subset of allowed instance types for the cluster while still letting your developers specify any nodeAffinity/nodeSelector on the built-in instance-type label in Kubernetes. Ocean will launch the instance type that satisfies the nodeAffinity/nodeSelector only if that type is on the allow-list.
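
The same gating logic can be sketched outside of Ocean, assuming the `kubernetes` client and an illustrative allow-list: check the instance type a Pending pod pins via the built-in labels before any node is provisioned.

```python
# Sketch of the allow-list idea: only honor an instance type requested
# via the built-in instance-type labels if it is on the allow-list.
# The allowed types below are illustrative.
from kubernetes import client, config

ALLOWED_TYPES = {"m5.large", "m5.xlarge", "c5.2xlarge"}  # illustrative
INSTANCE_TYPE_LABELS = ("node.kubernetes.io/instance-type",
                        "beta.kubernetes.io/instance-type")

config.load_kube_config()
v1 = client.CoreV1Api()

def requested_type(pod):
    """Return the instance type a pod pins via nodeSelector, if any."""
    selector = pod.spec.node_selector or {}
    for label in INSTANCE_TYPE_LABELS:
        if label in selector:
            return selector[label]
    return None

for pod in v1.list_pod_for_all_namespaces(field_selector="status.phase=Pending").items:
    wanted = requested_type(pod)
    if wanted and wanted not in ALLOWED_TYPES:
        print(f"{pod.metadata.namespace}/{pod.metadata.name} requests "
              f"denied instance type {wanted}")
```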

 

Summary

In closing, while security at the container level is of the utmost importance, we must also pay close attention to how we handle the underlying infrastructure of our Kubernetes clusters. Hopefully the concepts and suggestions above will go a long way toward keeping your containerized environment as safe as possible.

