The NameNode is the most critical resource in a Hadoop cluster. When a large file is loaded into the Hadoop Distributed File System (HDFS), it is broken into block-sized chunks according to the configured block size (64 MB by default in early Hadoop releases). These chunks are then stored as independent units across the DataNodes in the cluster. The DataNodes hold the actual data in the form of blocks, while the NameNode records where every block is located. In other words, the NameNode manages the filesystem namespace: by maintaining the filesystem tree and the metadata for all files and directories in that tree, it acts as the master node of the entire cluster. The NameNode also tracks DataNode locations, replica placement, and related details. If the NameNode crashes or becomes isolated, no operations can be performed on the DataNodes, and the cluster as a whole becomes unusable. Prior to Hadoop 2.0.0, the NameNode was therefore a single point of failure (SPOF) in an HDFS cluster; recognizing its importance, Hadoop 2.0.0 introduced HDFS High Availability with an active/standby NameNode pair.

The Secondary NameNode, by contrast, is not a backup of the active NameNode. It works as a helper to the primary NameNode: it keeps a copy of the FsImage file and the edits log, periodically applies the edits-log records to the FsImage, and truncates the edits log. The NameNode can then load the updated FsImage at startup instead of replaying the full EditLog. If the NameNode crashes, filesystem metadata can be recovered from the last FsImage checkpoint saved on the Secondary NameNode, but the Secondary NameNode itself cannot take over as the primary NameNode.
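The block-splitting behavior described above can be sketched with a little arithmetic. The following is an illustrative Python sketch, not HDFS code; the function name and the 64 MB constant are assumptions for demonstration (in a real cluster the block size comes from the `dfs.blocksize` setting):

```python
# Illustrative sketch: how a file is divided into block-sized chunks.
# 64 MB mirrors the default block size mentioned above; this is NOT
# the actual HDFS implementation, just the splitting arithmetic.
BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB in bytes

def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE):
    """Return (offset, length) pairs for each chunk of a file.

    Every chunk is block_size bytes except possibly the last one,
    which holds whatever remains of the file.
    """
    blocks = []
    offset = 0
    while offset < file_size:
        length = min(block_size, file_size - offset)
        blocks.append((offset, length))
        offset += length
    return blocks

# A 200 MB file yields three full 64 MB blocks plus one 8 MB remainder.
chunks = split_into_blocks(200 * 1024 * 1024)
```

Each resulting chunk would be stored as an independent unit on some DataNode, with the NameNode recording the mapping from file to block locations.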
Gautam is a consultant and educator. He previously worked as a Senior Technical Architect across multiple technologies and business domains, and currently specializes in Big Data processing and analysis, data lake creation, and architecture using HDFS. His work includes HDFS maintenance, loading multiple types of data from different sources, and designing and developing real-time use cases on client demand to demonstrate how data can be leveraged for business transformation and profitability. He is passionate about sharing knowledge through blogs, seminars, and presentations on Big Data technologies, methodologies, real-time projects and their architecture and design, procedures for ingesting high-volume data, and basic data lake creation. He can be contacted for real-time POC development, hands-on technical training, and development or support of Hadoop-related projects. Email: [email protected]