MapReduce logo

MapReduce in a nutshell

We talked about MapReduce in previous articles. Many people asked me since then what it is. MapReduce is a programming model for processing huge amounts of data in a parallel and distributed. In this model, there are two tasks that are undertaken Map and Reduce and there is a map function that processes a key/value pair to generate a set of intermediate key/value pairs and a reduce function that merges all intermediate values associated with the same intermediate key. MapReduce is used by Hadoop.

example– Consider the problem of counting the number of occurrences of each word in a large collection of documents.

The map function emits each word plus an associated count of occurrences (just ‘R’ in this simple example). The reduce function sums together all counts emitted for a particular word.