Storage
Master Kubernetes storage: Volumes, PersistentVolumes, PersistentVolumeClaims, StorageClasses, and StatefulSets.
1. Why Storage is Different in Kubernetes
A container's filesystem is ephemeral by design: any data written inside it is lost when the container crashes or is restarted. This is fine for stateless web servers, but databases, file storage systems, and other stateful applications need storage that survives container lifecycle events.
Kubernetes solves this with a layered storage system: Volumes → PersistentVolumes → PersistentVolumeClaims → StorageClasses.
2. Volumes: Ephemeral Container Storage
A Volume is a directory accessible to containers in a Pod. Unlike a container's filesystem, a Volume's lifetime is tied to the Pod: it persists through container restarts within the same Pod, but is destroyed when the Pod is deleted.
Common Volume Types
- emptyDir: Created fresh when a Pod is assigned to a node. Empty at start. Can be shared by every container in the Pod that mounts it. Deleted when the Pod is removed. Use it only for scratch space or inter-container communication.
- hostPath: Mounts a file or directory from the host node's filesystem into the Pod. Useful for accessing node-level data (Docker socket, /proc). NOT portable: the data lives on one specific node, so a Pod rescheduled to another node sees different contents.
- configMap / secret: Mounts configuration data or sensitive values into the container as files. Read-only. (The same objects can also be exposed as environment variables, but that does not involve a volume.)
- nfs: Mounts an NFS share from a remote NFS server. The data lives on the server, so it survives Pod deletion, and the share can be mounted ReadWriteMany by multiple pods simultaneously.
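The hostPath and nfs entries above can be sketched as follows. This is a minimal illustration, not a recommended setup: the Pod name, the NFS server address `nfs.example.internal`, and the export path `/exports/shared` are hypothetical placeholders.

```yaml
# Sketch only: server address and export path are hypothetical.
apiVersion: v1
kind: Pod
metadata:
  name: host-and-nfs-demo
spec:
  volumes:
    - name: host-logs
      hostPath:
        path: /var/log                 # Directory on whichever node runs the Pod
        type: Directory                # Fail if the path is not an existing directory
    - name: shared-files
      nfs:
        server: nfs.example.internal   # Hypothetical NFS server
        path: /exports/shared          # Hypothetical export
  containers:
    - name: reader
      image: busybox
      command: ["sh", "-c", "ls /host-logs /shared && sleep 3600"]
      volumeMounts:
        - name: host-logs
          mountPath: /host-logs
          readOnly: true               # Avoid writing to the node's filesystem
        - name: shared-files
          mountPath: /shared
```

Mounting the hostPath volume read-only is a deliberate choice here: writable access to node directories is a common source of security problems.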
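The following Pod shares an emptyDir between two containers and mounts a ConfigMap as files:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: volume-demo
spec:
  volumes:                         # Define volumes at Pod level
    - name: shared-data
      emptyDir: {}                 # Empty braces are required; a bare key is null and fails validation
    - name: config-vol
      configMap:
        name: app-config           # This ConfigMap must already exist in the namespace
  containers:
    - name: app
      image: nginx
      volumeMounts:
        - name: shared-data
          mountPath: /usr/share/nginx/html   # Mount inside container
        - name: config-vol
          mountPath: /etc/config
    - name: sidecar                # Second container in same Pod
      image: busybox
      command: ["sh", "-c", "sleep 3600"]    # Keep the container running; busybox exits immediately otherwise
      volumeMounts:
        - name: shared-data
          mountPath: /data         # Both containers share the same emptyDir
```

3. The PV/PVC System: Cluster Persistent Storage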
For truly persistent storage that outlives Pods, Kubernetes uses a two-tier system that separates storage administration from storage consumption:
- PersistentVolume (PV): The actual storage resource in the cluster. Can be a cloud disk (AWS EBS, GCP PD), an NFS share, or even a local disk. Created manually by a cluster administrator or provisioned dynamically through a StorageClass. Has a capacity, access modes, and a reclaim policy.
- PersistentVolumeClaim (PVC): A request for storage by a developer. Specifies desired size, access mode, and optionally a StorageClass. Kubernetes automatically binds it to a suitable PV.
```yaml
# ADMIN creates PersistentVolume
apiVersion: v1
kind: PersistentVolume
metadata:
  name: my-pv
spec:
  storageClassName: manual         # Must match the PVC; prevents a default StorageClass from taking over the claim
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce                # RWO: only one node can mount r/w
  persistentVolumeReclaimPolicy: Retain   # Don't delete data when PVC is removed
  hostPath:
    path: /mnt/data                # Example only: node-local path, suitable for single-node test clusters
---
# DEVELOPER creates PersistentVolumeClaim
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-pvc
spec:
  storageClassName: manual         # Same class name as the PV above
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi                 # Request 5Gi: binds to the 10Gi PV above and claims the whole volume
---
# POD uses PVC
apiVersion: v1
kind: Pod
metadata:
  name: db-pod
spec:
  volumes:
    - name: db-storage
      persistentVolumeClaim:
        claimName: my-pvc          # Reference the PVC, never the PV directly!
  containers:
    - name: postgres
      image: postgres:15
      env:
        - name: POSTGRES_PASSWORD  # The postgres image refuses to start without it
          value: change-me         # Placeholder; use a Secret in real deployments
      volumeMounts:
        - name: db-storage
          mountPath: /var/lib/postgresql/data
```

Access Modes Explained
| Access Mode | Short | Description |
|---|---|---|
| ReadWriteOnce | RWO | One node mounts r/w. Most cloud disks (EBS, PD). |
| ReadOnlyMany | ROX | Many nodes mount read-only. Good for config/static assets. |
| ReadWriteMany | RWX | Many nodes mount r/w simultaneously. Requires a shared filesystem such as NFS, CephFS, or Azure Files. |
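As an illustration of the RWX row, here is a minimal sketch of a statically provisioned NFS-backed PV and a claim that several Pods on different nodes could mount at once. The object names, the server address `nfs.example.internal`, and the export path are hypothetical.

```yaml
# Sketch only: names, server address, and export path are hypothetical.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: shared-nfs-pv
spec:
  capacity:
    storage: 50Gi
  accessModes:
    - ReadWriteMany                # Many nodes may mount r/w
  persistentVolumeReclaimPolicy: Retain
  nfs:
    server: nfs.example.internal   # Hypothetical NFS server
    path: /exports/shared          # Hypothetical export
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-nfs-pvc
spec:
  storageClassName: ""             # Empty string: bind to a pre-created PV, skip dynamic provisioning
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 50Gi
```

A claim only binds to a PV whose access modes include the ones it requests, so an RWX claim like this one would stay Pending against an RWO-only volume.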
4. StorageClasses: Dynamic Provisioning
Manually creating PVs for every application is tedious. A StorageClass automates PV creation: when a PVC references a StorageClass, the class's provisioner creates a matching PV on demand, for example by allocating a disk from the cloud provider.
```yaml
# Define a StorageClass (usually done by cluster admin or cloud provider)
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: ebs.csi.aws.com       # AWS EBS CSI driver
parameters:
  type: gp3
  encrypted: "true"
reclaimPolicy: Delete              # Delete EBS volume when PVC is deleted
volumeBindingMode: WaitForFirstConsumer   # Delay provisioning until a Pod using the PVC is scheduled
---
# PVC referencing the StorageClass
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: fast-db-pvc
spec:
  storageClassName: fast-ssd       # Reference the StorageClass
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi                # EBS disk is created automatically once a Pod uses this claim
```

5. StatefulSets: Stateful Applications
For stateful applications like databases, Kubernetes provides StatefulSets. Unlike Deployments (where pods are interchangeable), StatefulSet pods have:
- Stable, unique network identities: Pods are named with an ordinal index (db-0, db-1, db-2) and keep those names on restart.
- Stable persistent storage per pod: Each pod gets its own PVC via volumeClaimTemplates. When db-0 is rescheduled to another node, it reattaches to the same PVC, so its data follows it.
- Ordered, graceful deployment: By default, pods start in order (0, 1, 2) and terminate in reverse (2, 1, 0). Critical for leader-follower replication setups like MySQL or Kafka.
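The stable network identities described above depend on a headless Service that the StatefulSet names in its `serviceName` field. A minimal sketch, assuming the Service name `postgres-headless` and the label `app: postgres` used by the StatefulSet manifest in this section, and PostgreSQL's default port 5432:

```yaml
# Sketch only: name and label are assumed to match the StatefulSet's
# serviceName and pod labels.
apiVersion: v1
kind: Service
metadata:
  name: postgres-headless
spec:
  clusterIP: None                  # Headless: no virtual IP, DNS returns the pod addresses directly
  selector:
    app: postgres
  ports:
    - name: postgres
      port: 5432                   # PostgreSQL default port
```

With this Service in place and the default cluster domain, each pod gets a DNS name of the form `postgres-0.postgres-headless.<namespace>.svc.cluster.local`.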
```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
spec:
  serviceName: postgres-headless   # Required for stable DNS; this headless Service must exist
  replicas: 3
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - name: postgres
          image: postgres:15
          env:
            - name: POSTGRES_PASSWORD   # The postgres image refuses to start without it
              value: change-me          # Placeholder; use a Secret in real deployments
            - name: PGDATA              # Subdirectory avoids the lost+found folder at the mount root
              value: /var/lib/postgresql/data/pgdata
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:            # Each pod gets its OWN PVC!
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: fast-ssd
        resources:
          requests:
            storage: 10Gi
# Result: postgres-0 → data-postgres-0 (PVC)
#         postgres-1 → data-postgres-1 (PVC)
#         postgres-2 → data-postgres-2 (PVC)
```

Deployment vs StatefulSet: When to Use Each
Use Deployment for:
- Stateless web servers
- API services
- Batch workers
- When pods are interchangeable
Use StatefulSet for:
- Databases (MySQL, PostgreSQL)
- Distributed stores (Kafka, Zookeeper)
- Search engines (Elasticsearch)
- When each pod needs its own identity/storage
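For contrast with the StatefulSet manifest earlier in this section, here is a minimal sketch of the stateless case. The name `web`, the label `app: web`, and the cache mount path are illustrative assumptions.

```yaml
# Sketch only: names, labels, and the mount path are illustrative.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: nginx
          image: nginx
          volumeMounts:
            - name: cache
              mountPath: /var/cache/nginx   # Scratch space only; nothing here needs to survive
      volumes:
        - name: cache
          emptyDir: {}             # Each replica gets its own throwaway directory
```

There is no `serviceName` and no `volumeClaimTemplates` here: the replicas get generated names, hold no data worth keeping, and can be replaced in any order.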