The Architecture of HDFS: NameNode, DataNodes, and Metadata

HDFS (Hadoop Distributed File System) was built to support the reliable storage and access of large datasets distributed across commodity hardware. To make this possible, HDFS relies on a master/slave architecture composed of two main types of nodes: the NameNode and the DataNodes.


1. The NameNode (Master)

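The NameNode is the brain of HDFS. It manages the filesystem namespace (the directory tree, file names, and permissions), the mapping from each file to its blocks, and the record of which DataNodes hold each block.

However, the NameNode does not store the file data itself, only metadata about the data.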

Example:

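If you store a 300 MB file with a 128 MB block size, the NameNode will know that the file consists of three blocks (128 MB, 128 MB, and 44 MB) and which DataNodes hold a replica of each block.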
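
The arithmetic behind this example can be sketched in a few lines of Python. This is only an illustration of how a file size maps to block lengths; the function name `split_into_blocks` is made up for this post and is not part of any Hadoop API.

```python
MB = 1024 * 1024


def split_into_blocks(file_size, block_size=128 * MB):
    """Return (block_index, block_length) pairs for a file of file_size bytes."""
    blocks = []
    offset = 0
    index = 0
    while offset < file_size:
        # The last block only holds whatever bytes remain.
        length = min(block_size, file_size - offset)
        blocks.append((index, length))
        offset += length
        index += 1
    return blocks


blocks = split_into_blocks(300 * MB)
for index, length in blocks:
    print(f"block {index}: {length // MB} MB")
# block 0: 128 MB
# block 1: 128 MB
# block 2: 44 MB
```

Note that the final block is only as large as the data left over, so a 300 MB file takes up 300 MB per replica, not 384 MB.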


2. The DataNodes (Workers)

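DataNodes are responsible for storing blocks on their local disks, serving read and write requests from clients, and sending periodic heartbeats and block reports to the NameNode.

They don't know what they're storing, just that they hold a block identified by a block ID.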
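
To make the "opaque blocks keyed by ID" idea concrete, here is a toy model of a DataNode. The class and method names (`DataNode`, `write_block`, `read_block`, `block_report`) are hypothetical and chosen for illustration; a real DataNode stores blocks as files on disk and talks to the NameNode over RPC.

```python
class DataNode:
    """Toy DataNode: holds opaque byte blocks keyed by block ID."""

    def __init__(self, node_id):
        self.node_id = node_id
        self._blocks = {}  # block_id -> raw bytes, no file names or paths

    def write_block(self, block_id, data):
        self._blocks[block_id] = bytes(data)

    def read_block(self, block_id):
        return self._blocks[block_id]

    def block_report(self):
        """Sorted IDs of every block held, as would be reported to the NameNode."""
        return sorted(self._blocks)


dn = DataNode("dn1")
dn.write_block(1001, b"first chunk of some file")
dn.write_block(1002, b"chunk of another file")
print(dn.block_report())  # [1001, 1002]
```

Nothing in this model records which file a block belongs to; that mapping lives only on the NameNode.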


3. The Client

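The client interacts with both: it asks the NameNode for metadata, such as which blocks make up a file and where their replicas live, and then reads or writes the block data directly with the DataNodes.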

This design reduces the load on the NameNode and allows for high-throughput data transfer.
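
A minimal sketch of the read path, assuming plain dictionaries as stand-ins for the NameNode and DataNodes. The path `/logs/app.log`, the node names, and `read_file` are all invented for this example; the real client also handles checksums, retries, and picking the closest replica.

```python
# Hypothetical in-memory stand-ins for the NameNode and the DataNodes.
namenode = {
    "/logs/app.log": [  # ordered block list for the file
        {"block_id": 1, "locations": ["dn1", "dn2"]},
        {"block_id": 2, "locations": ["dn2", "dn3"]},
    ]
}
datanodes = {
    "dn1": {1: b"hello "},
    "dn2": {1: b"hello ", 2: b"world"},
    "dn3": {2: b"world"},
}


def read_file(path):
    # Step 1: a single metadata lookup on the NameNode.
    block_list = namenode[path]
    data = b""
    # Step 2: fetch each block straight from a DataNode that holds a replica.
    for block in block_list:
        node = block["locations"][0]
        data += datanodes[node][block["block_id"]]
    return data


print(read_file("/logs/app.log"))  # b'hello world'
```

The NameNode is consulted once per file, while the bulk bytes flow only between the client and the DataNodes.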


4. Block-Based Storage

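Files in HDFS are split into large blocks (usually 128 MB or 256 MB). These blocks are distributed across the DataNodes and replicated, three copies by default, so that losing a single node does not lose data.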
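
As a rough sketch of how blocks end up spread over several machines, the snippet below assigns each block to three distinct nodes in round-robin order. This is a deliberate simplification: HDFS's default placement policy is rack-aware, and `place_replicas` is a hypothetical helper, not a Hadoop function.

```python
def place_replicas(num_blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    placement = {}
    for block in range(num_blocks):
        # Consecutive offsets modulo len(nodes) never repeat a node.
        placement[block] = [
            nodes[(block + r) % len(nodes)] for r in range(replication)
        ]
    return placement


nodes = ["dn1", "dn2", "dn3", "dn4"]
placement = place_replicas(3, nodes)
for block, holders in placement.items():
    print(block, holders)
# 0 ['dn1', 'dn2', 'dn3']
# 1 ['dn2', 'dn3', 'dn4']
# 2 ['dn3', 'dn4', 'dn1']
```

With this layout, any single node can fail and every block still has at least two live replicas.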


5. How Metadata Is Stored

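The NameNode stores metadata in memory for fast access, and persists it to disk in two files: the FsImage, a checkpoint of the namespace, and the EditLog, a journal of the changes made since that checkpoint.

On restart, the NameNode loads the FsImage and replays the EditLog on top of it to restore its state.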
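
The restart step can be pictured as replaying a change log on top of a checkpoint, which is how the FsImage (checkpoint) and EditLog (journal) files relate in HDFS. The sketch below assumes a tiny made-up log format with only `create` and `delete` operations; the real on-disk formats are binary and much richer.

```python
def restore_namespace(fsimage, edit_log):
    """Rebuild the in-memory namespace from a checkpoint plus logged changes."""
    namespace = dict(fsimage)  # start from the last checkpoint
    for op, path, block_ids in edit_log:  # replay edits in the order logged
        if op == "create":
            namespace[path] = block_ids
        elif op == "delete":
            namespace.pop(path, None)
        else:
            raise ValueError(f"unknown op: {op}")
    return namespace


# Hypothetical checkpoint: file path -> list of block IDs.
fsimage = {"/a.txt": [1, 2], "/b.txt": [3]}
# Hypothetical edits recorded after that checkpoint.
edit_log = [
    ("create", "/c.txt", [4]),
    ("delete", "/b.txt", None),
]

state = restore_namespace(fsimage, edit_log)
print(state)  # {'/a.txt': [1, 2], '/c.txt': [4]}
```

Keeping a checkpoint plus a log means each change is a cheap append, at the cost of a replay step at startup.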


Summary of HDFS Architecture

Component   Role
NameNode    Stores metadata and controls the system
DataNodes   Store actual file data (blocks)
Client      Reads/writes data by talking to both

This architecture allows HDFS to scale horizontally and handle very large volumes of data reliably, even in the face of hardware failures.

