Map/Reduce
Map/Reduce is designed for huge data sets that have to be indexed, categorized, sorted, culled, analyzed, and so on. Looking through every record or file in a serial environment can take a very long time. Map/Reduce distributes the data across a large cluster and hands out tasks so that each node works on its own piece of the data set independently and in parallel. This allows big data to be processed in relatively little time.
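As a rough sketch of the idea (plain Python with the standard multiprocessing module and a made-up data set and chunk size, rather than a real cluster or framework), the data is split into pieces, each piece is processed independently in parallel, and the partial results are combined at the end:

from multiprocessing import Pool

def process_chunk(records):
    # Work on one piece of the data set independently (here, just count records).
    return len(records)

if __name__ == "__main__":
    data = list(range(1_000_000))        # stand-in for a huge data set
    chunks = [data[i:i + 100_000] for i in range(0, len(data), 100_000)]

    with Pool() as pool:
        partial_results = pool.map(process_chunk, chunks)   # parallel "map" step

    total = sum(partial_results)         # combine the partial results ("reduce" step)
    print(total)                         # 1000000

On a real cluster the chunks live on different machines and the partial results are combined across the network, but the shape of the computation is the same.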
Laundromat analogy of Map/Reduce
Imagine that your data is a large pile of laundry. You first sort the laundry into loads of similar colors and wash each load; then you dry the washed clothes grouped by similar material (denims, towels, shirts, etc.).
Serial Operation: one washer and one dryer handle every load, one after another, so the total time grows with the number of loads.
Map/Reduce operation: each color load goes into its own washing machine and all of the machines run at the same time (the map step); the washed clothes are then regrouped by material, and each group goes into its own dryer (the reduce step).
Word Count example of Map/Reduce
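The classic illustration is counting how often each word appears in a body of text. The sketch below uses plain Python rather than a particular framework such as Hadoop, with a made-up list of input lines: the map step emits a (word, 1) pair for every word, the pairs are grouped (shuffled) by word, and the reduce step sums the counts for each word.

from collections import defaultdict

def map_phase(line):
    # Emit a (word, 1) pair for every word in the line.
    return [(word.lower(), 1) for word in line.split()]

def reduce_phase(word, counts):
    # Sum all of the partial counts for one word.
    return word, sum(counts)

lines = [
    "the quick brown fox",
    "the lazy dog",
]

# Map: process each line independently (on a real cluster, in parallel).
mapped = [pair for line in lines for pair in map_phase(line)]

# Shuffle: group all of the pairs by word.
grouped = defaultdict(list)
for word, count in mapped:
    grouped[word].append(count)

# Reduce: combine the counts for each word.
results = dict(reduce_phase(word, counts) for word, counts in grouped.items())
print(results)   # {'the': 2, 'quick': 1, 'brown': 1, 'fox': 1, 'lazy': 1, 'dog': 1}

On a real cluster the map calls run on many machines at once and the shuffle moves each word's pairs to the machine that reduces it, but the logic is the same.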
Other Potential uses of Map/Reduce
Since Map/Reduce takes a large data set and breaks it down into smaller pieces that can be processed independently, here are some potential uses:
- indexing large data sets in a database
- image recognition in large images
- processing geographic information system (GIS) data, e.g. combining vector data with point data (Kerr, 2009)
- analyzing unstructured data
- analyzing stock data
- machine learning tasks