A Soft Introduction to YARN

YARN: The Next-Generation Hadoop Framework

The MapReduce framework for processing big data has undergone a complete change in architecture and functionality from hadoop-0.23 onwards. The new version is popularly known as MapReduce 2.0 (MRv2) or YARN (Yet Another Resource Negotiator).

Resource management and application management have changed significantly, and the new generation addresses problems from the MRv1 era, such as underutilization of cluster resources and the NameNode being a single point of failure (the latter is an HDFS issue, addressed in Hadoop 2 by HDFS high availability rather than by YARN itself). Hadoop 2 (YARN) also lets different processing frameworks, MapReduce being one of them, share cluster resources dynamically.
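To make the underutilization point concrete, here is a toy calculation, not Hadoop code. It assumes the MRv1 model in which each node is configured with a fixed number of map slots and reduce slots, and compares it with a pool of interchangeable containers as in YARN. The slot counts and function names are made up for illustration, and real YARN containers are sized by memory and CPU rather than counted as equal units.

```python
def mrv1_running_tasks(map_slots, reduce_slots, pending_maps, pending_reduces):
    """Tasks that can run at once when map and reduce slots are fixed
    and cannot be used for the other task type (assumed MRv1 model)."""
    return min(map_slots, pending_maps) + min(reduce_slots, pending_reduces)


def yarn_running_tasks(containers, pending_maps, pending_reduces):
    """Tasks that can run at once when any free container can host any task
    (simplified: all containers are assumed to be the same size)."""
    return min(containers, pending_maps + pending_reduces)


# Hypothetical node: 4 map slots + 2 reduce slots, or 6 generic containers.
# Workload: 10 map tasks waiting, no reduce tasks yet.
print(mrv1_running_tasks(4, 2, 10, 0))  # 4: the two reduce slots sit idle
print(yarn_running_tasks(6, 10, 0))     # 6: every container is used
```

With the same hardware, the fixed-slot model leaves a third of the node idle during the map phase, which is the kind of waste the new design is meant to avoid.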

I would like to take you through the details of each subcomponent of the YARN framework and its architecture. Let us first take a quick look at the components of YARN.

Source: Apache

The core components of the YARN architecture are the clients, which submit jobs for processing, the ResourceManager, the NodeManager, and the ApplicationMaster.

The driving idea behind MRv2 is to split the two major functions of the JobTracker, resource management and job scheduling/monitoring, into separate daemons: a global ResourceManager (RM) and a per-application ApplicationMaster (AM).
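The split can be illustrated with a small toy model. This is a sketch only and does not use the Hadoop API: the class and method names are hypothetical, NodeManagers are left out, containers are treated as equal units, and the fact that a real ApplicationMaster itself runs in a container is ignored. The point is simply that one object knows about cluster capacity while each job tracks its own tasks.

```python
from dataclasses import dataclass


@dataclass
class ResourceManager:
    """Global, cluster-wide: only tracks and hands out containers."""
    free_containers: int

    def allocate(self, requested: int) -> int:
        # Grant as many containers as are currently free, up to the request.
        granted = min(requested, self.free_containers)
        self.free_containers -= granted
        return granted

    def release(self, count: int) -> None:
        self.free_containers += count


@dataclass
class ApplicationMaster:
    """Per-application: schedules and monitors the tasks of one job only."""
    app_name: str
    pending_tasks: int
    running_tasks: int = 0
    finished_tasks: int = 0

    def request_containers(self, rm: ResourceManager) -> None:
        # Ask the RM for one container per pending task and start what is granted.
        granted = rm.allocate(self.pending_tasks)
        self.pending_tasks -= granted
        self.running_tasks += granted

    def complete_running_tasks(self, rm: ResourceManager) -> None:
        # Mark running tasks as finished and hand their containers back.
        rm.release(self.running_tasks)
        self.finished_tasks += self.running_tasks
        self.running_tasks = 0

    @property
    def done(self) -> bool:
        return self.pending_tasks == 0 and self.running_tasks == 0


rm = ResourceManager(free_containers=4)
job_a = ApplicationMaster("job-a", pending_tasks=3)
job_b = ApplicationMaster("job-b", pending_tasks=3)

job_a.request_containers(rm)   # job-a gets 3 containers
job_b.request_containers(rm)   # only 1 container is left for job-b
print(job_a.running_tasks, job_b.running_tasks, rm.free_containers)  # 3 1 0

job_a.complete_running_tasks(rm)  # job-a finishes and returns its containers
job_b.request_containers(rm)      # job-b picks up the freed capacity
print(job_a.done, job_b.running_tasks, rm.free_containers)  # True 3 1
```

Note that the ResourceManager here never looks at task state, and neither ApplicationMaster knows about the other job. In MRv1 both concerns lived inside the single JobTracker.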

In subsequent posts we will look at each component in detail, in terms of its responsibilities in processing big data.