Container Storage Interface

How CSI enables transparent storage service capabilities to Kubernetes clusters

[Image: CSI blog-1]

 

Introduction

First of all, if you don’t know what Kubernetes is: it is the de facto standard for building container platform infrastructures and for scheduling both application and infrastructure resources. Kubernetes enables true cloud-native application stacks running on nearly any type of underlying infrastructure or platform. It is the future of computing and is well on its way to replacing legacy IT infrastructures and platforms, including virtualization as we know it today. If you want to know more about Kubernetes, you can read more about it here: What is Kubernetes? - kubernetes.io

 

Because Kubernetes is seen as the future of IT infrastructure, all vendors do what they can to stay relevant in that context. Kubernetes abstracts resources and schedules them so that they become available to applications where and when they’re needed. This is a very complex task in a containerized infrastructure, and the Kubernetes project moves forward at a very high pace. For that reason, we need well-defined rules for how things connect to Kubernetes and are consumed by applications in a secure and easy-to-maintain way.

 

 

CSI is important - but why?

Storage can be consumed by Kubernetes applications in many ways, but there are currently three major ones: in-tree drivers, Flexvolume drivers, and the Container Storage Interface (CSI). At the beginning of the Kubernetes project, not much thought was given to storage at all. The perception was that containerized cloud-native applications wouldn’t need persistent storage. Reality proved otherwise: some of Kubernetes’ own services need persistence, and many applications, whether modern or containerized legacy applications, need storage services and somewhere to store and share data.

 

The first solution to the problem was to write storage drivers inside the Kubernetes project itself, so-called in-tree drivers. To understand what in-tree drivers are, look at it this way: Kubernetes doesn’t do well on its own; you need to take care of networking, security, application packaging and distribution, infrastructure maintenance, and so on. In that context, Kubernetes is to a container platform what the kernel is to an operating system.

 

Note: A common way to make Kubernetes work seamlessly, as a turn-key solution similar to an operating system, is to buy a Kubernetes distribution where all of the above is taken care of. The most popular one is Red Hat OpenShift Container Platform.

 

In-tree drivers are an integral part of the Kubernetes (“the kernel”) code tree. The problem with that approach is that it’s very hard to maintain, both for the Kubernetes project and for all the vendors that have drivers in-tree. It’s hard to get new features approved, to test every type of storage for current and backward compatibility, to maintain security, and so on. As a first attempt to solve this problem, the Flexvolume project was introduced. Since Flexvolume was considered a bad idea early on and is already regarded as legacy, I’ll skip that part and jump straight to CSI instead.

 

 

CSI is more than just a driver

CSI is now the way to make persistent storage available to Kubernetes services and to applications running on Kubernetes. The other methods of accessing storage are being deprecated, as they make the Kubernetes code tree difficult and inflexible to maintain. They also make life complicated for storage vendors, who need to stay very close to the Kubernetes project and test every version carefully to make sure their storage works and that nothing breaks when either Kubernetes or the storage drivers are upgraded. It’s also much harder to govern security and to get fixes accepted by the Kubernetes community, which makes it hard to support end customers when, for example, a bug needs fixing.

 

CSI is a Kubernetes API framework that enables many different persistent storage and data service capabilities in Kubernetes. Storage vendors write CSI drivers that conform to the CSI framework specification and can choose which of these capabilities they can and will support.

 

[Image: CSI blog-2]

As you can see in the image above, the CSI framework describes different storage service capabilities. Depending on the underlying storage technology and which parts of the specification the vendor has implemented, the level of functionality may vary. With the CSI framework, writing and maintaining all of these capabilities is completely separated from the Kubernetes project. From a Kubernetes and application point of view, all we need to know is how to call the CSI APIs and request the services we need from the underlying storage infrastructure.
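To make this concrete, here is a minimal sketch of how an application requests storage through a CSI driver: a StorageClass names the driver as its provisioner, and a PersistentVolumeClaim against that class causes Kubernetes to issue a CreateVolume call to the driver. The provisioner name csi.example.com and the type parameter are illustrative placeholders, not a real driver.

```yaml
# A StorageClass pointing at a (hypothetical) CSI driver.
# "csi.example.com" is a placeholder; a real driver publishes its own name.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-csi
provisioner: csi.example.com
parameters:
  type: ssd                  # driver-specific, vendor-defined parameter
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
---
# A PersistentVolumeClaim that triggers the driver's CreateVolume call.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: fast-csi
  resources:
    requests:
      storage: 10Gi
```

The application itself only references the claim as a volume; everything below the PVC, including which backend actually provides the disk, stays hidden behind the CSI driver.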

[Image: CSI blog-3]

https://docs.openshift.com/container-platform/4.5/storage/container_storage_interface/persistent-storage-csi.html#persistent-storage-csi-architecture_persistent-storage-csi

 

CSI is typically deployed as containers in the Kubernetes cluster. The image above shows a diagram of an OpenShift cluster with CSI deployed. Here’s a high-level description of how it works:

 

External CSI Controllers

The external CSI controller deploys a pod with three containers:

1. An external CSI attacher container, which translates attach and detach calls from OpenShift Container Platform into the corresponding ControllerPublishVolume and ControllerUnpublishVolume calls to the CSI driver.

2. An external CSI provisioner container, which translates provision and delete calls from OpenShift Container Platform into the corresponding CreateVolume and DeleteVolume calls to the CSI driver.

3. A CSI driver container.
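The controller pod described above can be sketched as a Deployment in which the two community sidecars and the vendor's driver container share a UNIX domain socket. This is an illustrative skeleton, not a real deployment: the driver image and most details are placeholders, and sidecar versions will vary.

```yaml
# Sketch of the external CSI controller pod: attacher and provisioner
# sidecars talk to the vendor's driver over a shared UNIX domain socket.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-csi-controller
spec:
  replicas: 1
  selector:
    matchLabels:
      app: example-csi-controller
  template:
    metadata:
      labels:
        app: example-csi-controller
    spec:
      containers:
        - name: csi-attacher            # translates attach/detach calls
          image: registry.k8s.io/sig-storage/csi-attacher:v4.5.0
          args: ["--csi-address=/csi/csi.sock"]
          volumeMounts:
            - name: socket-dir
              mountPath: /csi
        - name: csi-provisioner         # translates provision/delete calls
          image: registry.k8s.io/sig-storage/csi-provisioner:v4.0.0
          args: ["--csi-address=/csi/csi.sock"]
          volumeMounts:
            - name: socket-dir
              mountPath: /csi
        - name: csi-driver              # the vendor's driver (placeholder image)
          image: example.com/csi-driver:latest
          volumeMounts:
            - name: socket-dir
              mountPath: /csi
      volumes:
        - name: socket-dir
          emptyDir: {}
```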

 

CSI Driver DaemonSet

The CSI driver DaemonSet runs a pod on every node, which allows OpenShift Container Platform to mount storage provided by the CSI driver on the node and use it in user workloads (pods) as persistent volumes (PVs). Each of these pods contains:

1. A CSI driver registrar, which registers the CSI driver with the openshift-node service running on the node. The openshift-node process running on the node then connects directly to the CSI driver over the UNIX domain socket available on the node.

2. A CSI driver container.
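Alongside the controller and DaemonSet, a driver also advertises itself and its capabilities to the cluster through a CSIDriver object. The sketch below assumes a hypothetical driver name; the flags shown tell Kubernetes whether to issue attach calls and whether to pass pod information on mount.

```yaml
# Sketch of a CSIDriver object: how a driver declares its capabilities.
# The name "csi.example.com" is an illustrative placeholder.
apiVersion: storage.k8s.io/v1
kind: CSIDriver
metadata:
  name: csi.example.com
spec:
  attachRequired: true        # cluster should call ControllerPublishVolume
  podInfoOnMount: false       # driver does not need pod metadata on mount
  volumeLifecycleModes:
    - Persistent
```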

 

 

Automating storage services and infrastructure through their lifecycle

 

CSI is the modern storage services framework that lets the Kubernetes project keep developing at a high pace while maintaining high security standards and uninterrupted operations and maintenance, and it allows any storage infrastructure to be treated in exactly the same way. Another Kubernetes best practice is to comply with the Kubernetes Operator framework.

https://kubernetes.io/docs/concepts/extend-kubernetes/operator/#operators-in-kubernetes


In short, Kubernetes operators are like automated infrastructure and application janitors.

Note: Kubernetes operators, just like CSI drivers, are written and maintained by the vendor of each application. Operators may therefore look different from vendor to vendor, with different feature sets and levels of maturity.

 

[Image: CSI blog-4]

CSI driver development is therefore best paired with a high level of operator development to complete the package. Operators are published through the Kubernetes OperatorHub, https://operatorhub.io/. This is basically a catalogue that describes each operator and what it does, including its capability level. If you find an operator of interest, you simply deploy it to the Kubernetes cluster by clicking Install.
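Behind that Install button, Operator Lifecycle Manager (OLM) creates a Subscription object that tells the cluster which operator to pull from which catalogue and which update channel to follow. A hedged sketch, with the operator and catalogue names as illustrative placeholders:

```yaml
# Sketch of the OLM Subscription an OperatorHub install creates.
# The package name "example-storage-operator" is a placeholder.
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: example-storage-operator
  namespace: operators
spec:
  channel: stable                  # update channel to track
  name: example-storage-operator   # package name in the catalogue
  source: operatorhubio-catalog
  sourceNamespace: olm
```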

 

Author:
Johan Robinsson
