
Kubernetes Node Internals — Part 2: Bootstrap

Part 2 of a 5-part series: Linux namespaces, cgroups, networking primitives, TLS bootstrap, and how a node earns trust from the cluster.

March 22, 2026 · #tech #kubernetes #linux #containers

Part 1 introduced the main characters on a node: kubelet, the container runtime, runc, kube-proxy, and the Linux kernel underneath them all.

Now we can ask a more interesting question:

How does a machine go from "just another Linux host" to "a trusted Kubernetes node that can run Pods"?

That transformation has two halves:

  1. the machine must have the right Linux primitives for isolation and resource control
  2. the kubelet must complete a trust handshake with the API server

This is where containers stop being abstract and start becoming very concrete.

Series roadmap

  1. Part 1 — The anatomy of a node
  2. Part 2 — Bootstrap and the secret handshake
  3. Part 3 — A pod is born
  4. Part 4 — Keeping the node alive
  5. Part 5 — CSI, volumes, and mounts on the node

Linux namespaces — the isolation primitives

When developers say, "A container thinks it owns the machine," what they really mean is:

the process sees a carefully restricted view of the machine.

That restricted view is created mainly through Linux namespaces.

A namespace says, in effect:

"For this process and its children, show a different version of this system resource."

Here are the most important ones.

| Namespace | What it isolates | What the process sees |
| --- | --- | --- |
| pid | Process IDs | A small process tree instead of the whole host |
| net | Network stack | Its own interfaces, routes, firewall view, and ports |
| mnt | Mount points | Its own filesystem mount layout |
| uts | Hostname and domain name | Its own hostname |
| ipc | Shared memory and IPC objects | Its own IPC world |
| user | User and group ID mappings | A remapped identity model |

How a container "thinks" it owns the machine

Imagine starting a process inside its own namespaces.

  • In its PID namespace, it may see itself as PID 1.
  • In its network namespace, it sees a network interface like eth0 and a default route, but not the host's full networking setup.
  • In its mount namespace, / points to the container's root filesystem, not the host root.
  • In its UTS namespace, it can have its own hostname.

From inside that process, the world looks self-contained.

From the host's perspective, nothing magical happened. It is still just another Linux process, visible in the host PID tree, scheduled by the same kernel, using the same real CPU and memory.

That is the first deep Kubernetes lesson:

A container is not a tiny VM. It is a regular Linux process with a modified view of reality.

[Diagram: namespace isolation per process]

cgroups — the resource accountant

Namespaces provide isolation of view.

cgroups provide control of resources.

If namespaces answer, "What does the process see?" then cgroups answer, "How much CPU, memory, and I/O can the process consume?"

This is how Kubernetes resource requests and limits eventually become enforceable at the Linux level.

What cgroups do

cgroups can track and control resource usage for a group of processes, including:

  • CPU shares or quotas
  • memory limits
  • block I/O behavior
  • process counts

So when a Pod has:

```yaml
resources:
  limits:
    cpu: "1"
    memory: "1Gi"
```

the runtime translates that into cgroup configuration that the kernel can actually enforce.

cgroups v1 vs v2

You will often hear about cgroups v1 and cgroups v2.

At a high level:

| Version | Shape | Mental model |
| --- | --- | --- |
| v1 | Multiple controller hierarchies | Older, fragmented, historically widespread |
| v2 | Unified hierarchy | Cleaner, more consistent, increasingly the default |

You do not need to memorize every file under /sys/fs/cgroup to reason correctly about Kubernetes.

The practical takeaway is simpler:

  • Kubernetes expresses desired CPU and memory policy
  • the container runtime maps that into cgroup settings
  • the kernel enforces those settings

How limits are enforced at the kernel level

If a container tries to use more CPU than its quota allows, the kernel throttles it.

If it tries to use more memory than its cgroup limit allows, the kernel may invoke the OOM killer inside that cgroup context.

That is why Kubernetes resource behavior is not just scheduler bookkeeping. At the end of the chain, there is real kernel enforcement.

Networking groundwork

Before a Pod can talk to anything, the node needs some network plumbing in place.

Kubernetes networking often feels complicated because several layers interact:

  • Linux networking primitives
  • the CNI plugin
  • kube-proxy
  • route tables
  • service VIP translation

Let's break it down into the building blocks.

veth pairs

A veth pair is like a virtual Ethernet cable with two ends.

Packets that enter one end appear on the other.

Typically, one end sits inside the Pod's network namespace as something like eth0, while the other end remains in the host namespace and is attached to some higher-level construct such as a bridge.

Bridges

A Linux bridge acts like a virtual switch.

Multiple Pod-facing veth endpoints can attach to it, allowing local connectivity on the node. Depending on the CNI plugin, this bridge may be a simple Linux bridge, a more advanced virtual device, or a datapath implemented differently.

Route tables

Once a Pod has an IP, the node needs routes that say where packets should go.

That includes:

  • local Pod-to-Pod traffic on the same node
  • traffic to Pods on other nodes
  • traffic to Services
  • traffic out to the internet or VPC network

iptables chains kube-proxy manages

In the classic kube-proxy model, Services are implemented by programming iptables chains.

So when a Pod sends traffic to a Service ClusterIP:

  1. the packet hits rules installed by kube-proxy
  2. the packet is matched to the Service virtual IP
  3. the packet is directed into service-specific chains that kube-proxy installed
  4. one backend Pod endpoint is selected
  5. destination NAT rewrites the packet toward the chosen Pod IP

[Diagram: iptables chain flow for a Service ClusterIP]

The exact chain names and mode can vary, but this is the core idea: a Service IP is a routing trick, not a real process listening on the network.

TLS bootstrapping — the secret handshake

So far we have talked about Linux mechanics. Now we move to the trust relationship.

How does the control plane decide that this kubelet is allowed to join the cluster and report status?

Through TLS bootstrapping.

This is the node's secret handshake with the API server.

The bootstrap token flow

At first, a new kubelet does not yet have a long-lived client certificate. Instead, it may be given a bootstrap token.

That token is only enough to say:

"I am a joining node. Please let me request a proper identity."

The rough flow looks like this:

  1. kubelet starts with bootstrap credentials
  2. kubelet connects to the API server
  3. kubelet submits a CertificateSigningRequest (CSR)
  4. the CSR is approved, often automatically according to cluster policy
  5. kubelet receives a signed client certificate
  6. kubelet now authenticates with that certificate going forward

Certificate Signing Request to the API server

The kubelet asks the control plane for a certificate instead of inventing trust locally.

That matters because the cluster wants a centrally recognized identity. A random process should not be able to say, "Trust me, I am node worker-12."

The API server and certificate approvers participate in deciding whether that identity should be issued.

kubelet's client cert and the kubelet-serving cert

There are two certificates people often mix up.

1. kubelet client certificate

This is the certificate the kubelet uses as a client when talking to the API server.

It proves the kubelet's identity to the control plane.

2. kubelet-serving certificate

This is the certificate the kubelet can use as a server for inbound TLS connections to the kubelet itself.

They solve different problems:

  • client cert: "kubelet proves who it is to the API server"
  • serving cert: "others can securely talk to the kubelet"

Rotation — how certs are renewed automatically

Certificates expire, so Kubernetes supports automatic rotation.

The kubelet can request renewed certificates before the old ones expire, which keeps long-lived nodes from turning into time bombs.

This is one of those details that feels boring until it breaks. If node certificate rotation fails silently, a previously healthy node may suddenly stop being able to authenticate.

[Diagram: TLS bootstrap sequence]

Bootstrap failure symptoms you should recognize fast

Most bootstrap incidents look confusing at first, but they usually collapse into a small set of patterns.

1. CSR stays pending or gets rejected

If the kubelet cannot get a signed client certificate, node trust establishment never completes.

Typical symptoms:

  • Node never becomes Ready
  • CSR objects remain pending for too long
  • kubelet repeatedly retries bootstrap authentication

2. Node object never registers or keeps flapping

Sometimes credentials exist, but node registration still fails or oscillates.

Typical symptoms:

  • node appears briefly, then disappears or becomes NotReady
  • repeated authentication/authorization errors in kubelet logs
  • bootstrap worked once, but subsequent reconnects fail

3. Certificate rotation silently breaks later

A node can join successfully and still fail months later if certificate rotation fails.

Typical symptoms:

  • previously stable node suddenly cannot authenticate
  • heartbeat path degrades despite machine being up
  • expiration windows are crossed without successful renewal

This is why bootstrap is not a one-time ceremony. It is the beginning of an ongoing trust lifecycle.

Node registration and the Node object

Once kubelet has working credentials, it can register the node with the API server.

That means creating or updating the cluster's Node object.

The Node object is Kubernetes' control-plane representation of the machine.

It includes information such as:

  • the node's name
  • labels and annotations
  • capacity
  • allocatable resources
  • addresses
  • runtime details
  • health conditions

What kubelet writes on first join

When a node first appears, the kubelet reports information like:

  • CPU and memory capacity
  • ephemeral storage information
  • supported runtime version
  • OS and architecture
  • node addresses

This lets the rest of the system reason about the machine.

For example, the scheduler later uses node capacity and allocatable information to decide whether a Pod can fit.

Node conditions

The kubelet also updates Node conditions, such as:

  • Ready
  • MemoryPressure
  • DiskPressure
  • PIDPressure
  • NetworkUnavailable

These conditions communicate the node's health to the control plane.

Capacity vs allocatable resources

These two are related but not identical.

| Field | Meaning |
| --- | --- |
| Capacity | The total resources physically available on the node |
| Allocatable | The resources Kubernetes is willing to make available for Pods |

Why the difference?

Because the node itself needs resources too.

The operating system, kubelet, container runtime, system daemons, and reserved slices all consume CPU and memory. Kubernetes therefore reasons with allocatable, not raw capacity, when placing workloads.

Concepts introduced

Part 2 also introduces two security concepts that matter a lot once you think about node trust.

RBAC for the kubelet user: system:node

After successful bootstrap, a kubelet authenticates as a node identity, typically in the system:nodes group.

RBAC rules grant that identity only the permissions a node should have.

That means the kubelet is not a cluster admin by default. It has a narrowly scoped role tied to node operations.

Node Authorizer — why each kubelet can only read its own Pods

RBAC is only part of the story.

Kubernetes also has a Node Authorizer, which further restricts what a kubelet can access.

The intuition is straightforward:

A kubelet should only learn about the Pods and secrets it needs in order to run workloads assigned to its own node.

That design limits blast radius. If one node is compromised, Kubernetes tries to avoid giving that kubelet blanket visibility into everything in the cluster.

This is also the missing link between node identity and secret access:

  • identity says who the kubelet is
  • authorization decides what it can read

So becoming a trusted node does not mean becoming an all-seeing cluster principal.

Final mental model

By the time a machine is a real Kubernetes node, two big things are already true:

  1. Linux isolation and resource primitives are ready underneath it
  2. the API server trusts the kubelet enough to treat it as a node identity

That is the quiet setup work before any application container starts.

In Part 3, we switch from trust to execution: once identity is established, kubelet can begin the runtime path of sandbox creation, networking, and process launch.

And now the stage is set for the fun part.

"The node is registered. The API server trusts it. Now the scheduler has just decided to send it a Pod. What happens next is where things get interesting."

Next: Part 3 — A pod is born