![]() ![]() This architecture consist of a single NameNode performs the role of master, and multiple DataNodes performs the role of a slave.īoth NameNode and DataNode are capable enough to run on commodity machines. The Hadoop Distributed File System (HDFS) is a distributed file system for Hadoop. The master node includes Job Tracker, Task Tracker, NameNode, and DataNode whereas the slave node includes DataNode and TaskTracker. The MapReduce engine can be MapReduce/MR1 or YARN/MR2.Ī Hadoop cluster consists of a single master and multiple slave nodes. The Hadoop architecture is a package of the file system, MapReduce engine and the HDFS (Hadoop Distributed File System). ![]() Hadoop Common: These Java libraries are used to start Hadoop and are used by other Hadoop modules.The output of Map task is consumed by reduce task and then the out of reducer gives the desired result. The Map task takes input data and converts it into a data set which can be computed in Key value pair. Map Reduce: This is a framework which helps Java programs to do the parallel computation on data using key value pair.Yarn: Yet another Resource Negotiator is used for job scheduling and manage the cluster.It states that the files will be broken into blocks and stored in nodes over the distributed architecture. Google published its paper GFS and on the basis of that HDFS was developed. Moreover it can be scaled up just by adding nodes in the cluster. It is used for batch/offline processing.It is being used by Facebook, Yahoo, Google, Twitter, LinkedIn and many more. Hadoop is written in Java and is not OLAP (online analytical processing). Hadoop is an open source framework from Apache and is used to store process and analyze data which are very huge in volume.
0 Comments
Leave a Reply. |
AuthorWrite something about yourself. No need to be fancy, just an overview. ArchivesCategories |