Map Reduce

8/1/2014

MapReduce is a programming model and software framework that lets developers write programs that process massive amounts of unstructured data in parallel across a distributed cluster of processors or stand-alone computers. It was developed at Google, where it was used for indexing Web pages, and was described publicly in 2004.

The framework is divided into two parts:

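Map – A function that is applied to each input record and emits intermediate key/value pairs. The framework parcels the input out to different nodes in the distributed cluster, and each node runs Map on its own share.

Reduce – A function that collates the intermediate values that share a key and resolves them into a single value for that key.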

The MapReduce framework is fault-tolerant because each node in the cluster is expected to report back periodically with completed work and status updates. If a node stays silent for longer than the expected interval, the master node marks it as failed and reassigns its work to other nodes.
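The reassignment rule described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how any real MapReduce implementation is written: the `Master` class, its method names, the worker and task names, and the round-robin choice of replacement worker are all assumptions made for the example, and timestamps are passed in by hand so the behaviour is easy to follow.

```python
class Master:
    """Toy master node (hypothetical): tracks heartbeats and task assignments."""

    def __init__(self, timeout):
        self.timeout = timeout  # longest silence tolerated, in seconds
        self.last_seen = {}     # worker name -> time of last status report
        self.assigned = {}      # task name -> worker name

    def heartbeat(self, worker, now):
        # A worker reports in with a status update.
        self.last_seen[worker] = now

    def assign(self, task, worker):
        self.assigned[task] = worker

    def reassign_silent(self, now):
        """Move tasks away from workers that have been silent for too long."""
        live = sorted(
            w for w, seen in self.last_seen.items() if now - seen <= self.timeout
        )
        moved = []
        if not live:
            return moved  # no live worker can take over; leave tasks in place
        for task in sorted(self.assigned):
            if self.assigned[task] not in live:
                # Assumption: hand tasks to live workers in round-robin order.
                new_worker = live[len(moved) % len(live)]
                self.assigned[task] = new_worker
                moved.append((task, new_worker))
        return moved


master = Master(timeout=10)
master.heartbeat("worker-a", now=0)
master.heartbeat("worker-b", now=0)
master.assign("split-0", "worker-a")
master.assign("split-1", "worker-b")

master.heartbeat("worker-a", now=12)    # worker-a keeps reporting
moved = master.reassign_silent(now=15)  # worker-b has been silent for 15 seconds
```

Here `worker-b` misses the 10-second window, so its task `split-1` is handed to `worker-a`, while `split-0` stays where it was.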

The key to how MapReduce works is that it takes its input as a list of records. These records are split among the different computers in the cluster, and each computer runs Map on its share. The result of the Map computation is a list of key/value pairs. Reduce then takes each set of values that has the same key and combines them into a single value. In other words, Map takes a set of data chunks and produces key/value pairs, and Reduce merges them so that instead of many values per key we get one result per key. MapReduce is intended to provide a lightweight way of programming things so that they can run fast by running in parallel on a lot of machines.
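A word count is the usual way to show this flow. The sketch below runs everything in a single Python process, so it shows the data flow only and not the distribution across machines. The function names (`map_fn`, `shuffle`, `reduce_fn`, `map_reduce`) and the sample records are made up for the example. The grouping step in the middle stands in for what the framework does between Map and Reduce.

```python
from collections import defaultdict


def map_fn(record):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Group intermediate values by key (done by the framework in a real cluster)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_fn(key, values):
    """Reduce: combine all values that share a key into a single value."""
    return key, sum(values)


def map_reduce(records):
    intermediate = []
    for record in records:  # in a cluster, each node would handle some records
        intermediate.extend(map_fn(record))
    grouped = shuffle(intermediate)
    return dict(reduce_fn(key, values) for key, values in grouped.items())


records = ["the quick brown fox", "the lazy dog", "the fox"]
counts = map_reduce(records)
```

With these three records, `counts["the"]` is 3 and `counts["fox"]` is 2: Map produced one pair per word, and Reduce collapsed the pairs for each word into a single total.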
