Tutorial : Hadoop -- Architecture

For the other tutorials and the index of this series, see : Tutorial : Hadoop -- Main Thread

Now that we are familiar with the common terminology, let's move on to the architecture of Hadoop.

Hadoop has two major components:
-> Distributed File System Component (also called the Hadoop Distributed File System, or HDFS)
-> MapReduce Component (a framework for performing calculations on data stored in the distributed file system).

I am going to explain the MapReduce component briefly in this tutorial and then cover HDFS in detail over two separate threads.

MapReduce Engine:

MapReduce is a programming model that originated at Google. A MapReduce program consists of a map function and a reduce function. A MapReduce program that has been scheduled to run is called a MapReduce job.

A MapReduce job is broken into map tasks that run in parallel and reduce tasks that also run in parallel. That is MapReduce in brief; a detailed explanation will be covered in later tutorials.
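
To make the map and reduce functions more concrete, here is a minimal sketch in plain Python. It does not use Hadoop at all; the names map_fn, reduce_fn and run_job are made up for this illustration, and the word-count task is only an assumed example. It simply imitates the map, group-by-key and reduce steps on one machine.

```python
from collections import defaultdict


def map_fn(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def reduce_fn(word, counts):
    """Reduce: combine all the counts that were emitted for one word."""
    return word, sum(counts)


def run_job(lines):
    """Imitate a MapReduce job on a single machine (illustration only)."""
    # Map phase: every line is independent of the others, which is why
    # a real cluster can run the map tasks in parallel.
    mapped = [pair for line in lines for pair in map_fn(line)]

    # Group the emitted values by key so each reduce call sees one word.
    groups = defaultdict(list)
    for word, count in mapped:
        groups[word].append(count)

    # Reduce phase: every key is independent too, so the reduce tasks
    # could also run in parallel.
    return dict(reduce_fn(word, counts) for word, counts in groups.items())


if __name__ == "__main__":
    print(run_job(["Hadoop stores data", "Hadoop processes data"]))
```

In a real Hadoop job the framework does the grouping and the scheduling for you; you would normally supply only the map and reduce logic.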

Now let's move on to HDFS.

Hadoop Distributed File System (HDFS)

HDFS runs on top of the existing file system on each node of the Hadoop cluster. It is designed to handle very large files with streaming data access patterns.

The larger the file, the less time Hadoop spends seeking to the next data location on the disk, and the more often it can run at the limit of the disks' bandwidth. Seeks are expensive operations and are worthwhile only when you need to analyze a small subset of a data set.

Since Hadoop is designed to run over the entire data set, it is best to minimize seeks by using large files. Hadoop is designed for streaming or sequential data access rather than random access.

Sequential data access means fewer seeks, since Hadoop seeks only to the beginning of each block and reads sequentially from there.
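
The effect of seeks can be estimated with a little arithmetic. The sketch below is only a rough model: the function name seek_overhead is made up, and the default values of 10 ms per seek and 100 MB/s transfer rate are assumed round numbers for illustration, not measurements of any particular disk. It computes what fraction of the read time goes to seeking when one seek is followed by one sequential read of a given size.

```python
def seek_overhead(read_size_mb, seek_ms=10.0, transfer_mb_per_s=100.0):
    """Fraction of total time spent seeking for one seek plus one read.

    seek_ms and transfer_mb_per_s are assumed values, not measurements.
    """
    transfer_ms = read_size_mb / transfer_mb_per_s * 1000.0
    return seek_ms / (seek_ms + transfer_ms)


if __name__ == "__main__":
    # Compare a small random read (4 KB) with large sequential reads.
    for size_mb in (0.004, 1, 64, 128):
        share = seek_overhead(size_mb)
        print(f"{size_mb:>8} MB read: {share:.1%} of the time is seeking")
```

Under these assumptions a tiny read is almost entirely seek time, while a read of tens of megabytes spends nearly all of its time transferring data, which is the point made above about large files and sequential access.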

Hadoop uses blocks to store a file or parts of a file.
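
As a rough sketch of the idea, the Python below cuts a file of a given size into fixed-size blocks. The function name split_into_blocks is hypothetical and this is not HDFS code; the block size is simply a parameter here, and the numbers in the example are small made-up values chosen so the result is easy to read.

```python
def split_into_blocks(file_size, block_size):
    """Return (offset, length) pairs that cover a file of file_size bytes."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    blocks = []
    offset = 0
    while offset < file_size:
        # The last block may be shorter than block_size.
        length = min(block_size, file_size - offset)
        blocks.append((offset, length))
        offset += length
    return blocks


if __name__ == "__main__":
    # A 150-byte file with 64-byte blocks: two full blocks and one partial.
    for offset, length in split_into_blocks(150, 64):
        print(f"block at offset {offset}, length {length}")
```

Note that in this sketch a file smaller than one block produces a single short block, and only the last block of a larger file is partial.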


I am going to cover more about HDFS blocks and replication in the next tutorial.

See the main thread for links to the other tutorials : Tutorial : Hadoop -- Main Thread

