The Hadoop Distributed File System (HDFS) is the primary storage system used by Hadoop.
HDFS provides high-throughput access to data across Hadoop clusters and has become a key tool for managing pools of big data and supporting big data analytics applications.
HDFS is typically deployed on low-cost commodity hardware, so server failures are common. The system is therefore designed to be highly fault-tolerant: it facilitates rapid data transfer between compute nodes and lets a Hadoop cluster keep running when a node fails, which reduces the risk of catastrophic data loss even when numerous nodes fail.
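That fault tolerance rests on block replication, and the replication factor can be set cluster-wide or raised for an individual file. The sketch below uses the standard Hadoop Java client to do both; the file path is hypothetical, and the factor of 3 simply mirrors HDFS's shipped default rather than anything prescribed here.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SetReplication {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Default replication factor for files created by this client;
        // 3 is the HDFS default, so this line just makes it explicit.
        conf.set("dfs.replication", "3");

        FileSystem fs = FileSystem.get(conf);

        // Raise the replication factor of an existing file. The NameNode
        // schedules the additional copies asynchronously in the background.
        Path file = new Path("/data/events.log"); // hypothetical path
        fs.setReplication(file, (short) 3);
        fs.close();
    }
}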
When HDFS takes in data, it breaks the information down into separate blocks and distributes them to different nodes in the cluster, allowing for parallel processing. The file system also replicates each block, distributing the copies to individual nodes and placing at least one copy on a different server rack. As a result, the data held by a node that crashes can still be found elsewhere in the cluster, and processing continues without interruption.
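To make the splitting and placement concrete, the sketch below (again the stock Hadoop Java client, with a hypothetical path) writes a file with an explicit replication factor and block size, then asks the NameNode which DataNodes hold each resulting block. The 128 MB block size matches the HDFS default.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class InspectBlocks {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path file = new Path("/data/big-input.csv"); // hypothetical path

        // Create the file with 3 replicas and 128 MB blocks; HDFS splits
        // the stream into blocks of this size as it is written.
        try (FSDataOutputStream out = fs.create(
                file, true, 4096, (short) 3, 128L * 1024 * 1024)) {
            out.writeUTF("example payload");
        }

        // Ask the NameNode where each block landed. With rack-aware
        // placement, the hosts for a block span at least two racks.
        FileStatus status = fs.getFileStatus(file);
        BlockLocation[] blocks =
                fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation block : blocks) {
            System.out.println("offset " + block.getOffset()
                    + " -> " + String.join(", ", block.getHosts()));
        }
        fs.close();
    }
}

If any of the printed hosts later fails, the remaining replicas of its blocks are still readable, which is what allows a running job to continue without interruption.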