神刀安全网

Free Big Data Book

Prepare the Map() input
The system splits the input files into M pieces and then starts Map workers on a cluster of machines, with one map task per piece.
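The splitting step can be sketched as follows. This is a minimal illustration, assuming the input is already an in-memory list of records; real systems usually split files by byte range, and the function name `split_input` is hypothetical.

```python
def split_input(records, m):
    """Divide records into m contiguous splits of nearly equal size.

    Assumption: the whole input fits in memory as a list. This is only
    a sketch of the idea, not how a production system splits files.
    """
    if m <= 0:
        raise ValueError("m must be positive")
    base, extra = divmod(len(records), m)
    splits = []
    start = 0
    for i in range(m):
        # The first `extra` splits each take one additional record.
        size = base + (1 if i < extra else 0)
        splits.append(records[start:start + size])
        start += size
    return splits
```

Each returned split would then be handed to one map task.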
Run the user-defined Map() code
Each Map worker parses key-value pairs out of its assigned split and passes each pair to the user-defined Map function. The intermediate key-value pairs that the Map function produces are buffered in memory. Periodically, the buffered pairs are written to local disk, partitioned into R regions by a partitioning function (the partitioner). The partitioner takes a key and the number of reducers R and returns the index of the reducer that should receive that key.
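A map task with a hash partitioner might look like the sketch below. It assumes string keys and a word-count Map function; the names `partition`, `map_words`, and `run_map_task` are hypothetical, and the R regions are kept in a dictionary instead of being written to local disk. `zlib.crc32` is used in place of Python's built-in `hash()`, which is randomized per process for strings and would send the same key to different reducers on different workers.

```python
import zlib
from collections import defaultdict


def partition(key, r):
    """Return the reducer index in [0, r) for a string key."""
    return zlib.crc32(key.encode("utf-8")) % r


def map_words(_key, line):
    """Example user-defined Map function: emit (word, 1) for each word."""
    for word in line.split():
        yield word.lower(), 1


def run_map_task(split, map_fn, r):
    """Apply map_fn to every (key, value) pair in the split and
    bucket the intermediate pairs into at most r regions."""
    regions = defaultdict(list)
    for key, value in split:
        for out_key, out_value in map_fn(key, value):
            regions[partition(out_key, r)].append((out_key, out_value))
    return dict(regions)
```

Because the partitioner depends only on the key and R, every occurrence of a given key lands in the same region, whichever map task emitted it.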
Shuffle the Map output to the Reduce processors
Once map output is available, a reduce worker reads its share of the buffered data remotely from the local disks of the map workers. After it has read all of its intermediate data, it sorts the data by intermediate key so that all occurrences of the same key are grouped together. The sort is needed because many different keys typically map to the same reduce task.
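The shuffle for a single reduce worker can be sketched like this. The data layout is an assumption for illustration: each map worker's output is modeled as a dictionary from region index to a list of (key, value) pairs, and the remote read is replaced by a dictionary lookup. The function name `shuffle_for_reducer` is hypothetical.

```python
def shuffle_for_reducer(map_outputs, reducer_index):
    """Collect one reducer's region from every map worker, then sort by key.

    map_outputs: list of dicts, one per map worker, mapping a region
    index to a list of (key, value) pairs (assumed layout).
    """
    fetched = []
    for regions in map_outputs:
        # A map worker may have produced nothing for this reducer.
        fetched.extend(regions.get(reducer_index, []))
    # Sorting by key places all occurrences of the same key next to each other.
    fetched.sort(key=lambda pair: pair[0])
    return fetched
```

The sort here happens entirely in memory; when the intermediate data is too large for that, an external sort would be needed instead.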
Run the user-defined Reduce() code
The reduce worker iterates over the sorted intermediate data and, for each unique intermediate key it encounters, passes the key and the corresponding set of intermediate values to the user’s Reduce function.
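Because the data is already sorted by key, the grouping can be done in a single pass. The sketch below uses `itertools.groupby` for that and a summing Reduce function as the example; `reduce_sum` and `run_reduce_task` are hypothetical names.

```python
from itertools import groupby
from operator import itemgetter


def reduce_sum(key, values):
    """Example user-defined Reduce function: total the values for a key."""
    return key, sum(values)


def run_reduce_task(sorted_pairs, reduce_fn):
    """Call reduce_fn once per unique key in key-sorted (key, value) pairs."""
    results = []
    for key, group in groupby(sorted_pairs, key=itemgetter(0)):
        # Pass the values lazily, so a key's values are not all
        # copied into a list before the Reduce function sees them.
        values = (value for _key, value in group)
        results.append(reduce_fn(key, values))
    return results
```

Note that `groupby` only merges adjacent equal keys, so it relies on the sort performed during the shuffle.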
Produce the final output
The final output is available in the R output files (one per reduce task).
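Writing one output file per reduce task could look like the following sketch. The `part-00000` style file names and the tab-separated line format are assumptions chosen for illustration, not something the text above specifies; `write_outputs` is a hypothetical helper.

```python
import os


def write_outputs(results_per_reducer, out_dir):
    """Write one file per reduce task and return the file paths in order.

    results_per_reducer: list of R lists of (key, value) results.
    File naming and the tab-separated format are illustrative assumptions.
    """
    paths = []
    for index, results in enumerate(results_per_reducer):
        path = os.path.join(out_dir, f"part-{index:05d}")
        with open(path, "w", encoding="utf-8") as handle:
            for key, value in results:
                handle.write(f"{key}\t{value}\n")
        paths.append(path)
    return paths
```

A caller that wants a single result can concatenate the R files, or pass them as input to another MapReduce job.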
