
What is Map/Reduce?

Map/Reduce is designed for huge data sets that have to be indexed, categorized, sorted, culled, analyzed, and so on. Looking through each record or file one at a time in a serial environment can take a very long time. Map/Reduce distributes the data across a large cluster and spreads tasks across the data set, so that each task works on its own piece independently and in parallel. This allows big data to be processed in relatively little time.
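To make the idea of working on pieces independently and in parallel concrete, here is a minimal sketch in Python. It is only an illustration, not how a real Map/Reduce framework is implemented: a thread pool on one machine stands in for the cluster, and the function names (`split_into_chunks`, `map_chunk`, `run_job`) and the sample records are made up for this example.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def split_into_chunks(records, chunk_size):
    """Break the full data set into smaller, independent pieces."""
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]


def map_chunk(chunk):
    """Map step: process one piece on its own.

    Here the "work" is counting the records that contain the word 'error'.
    """
    return sum(1 for record in chunk if "error" in record)


def combine(left, right):
    """Reduce step: merge two partial results into one."""
    return left + right


def run_job(records, chunk_size=2, workers=4):
    chunks = split_into_chunks(records, chunk_size)
    # Assumption: a local thread pool stands in for the nodes of a cluster.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = list(pool.map(map_chunk, chunks))
    return reduce(combine, partial_counts, 0)


# Hypothetical sample data, invented for this sketch.
records = ["ok", "error: disk", "ok", "error: net", "ok"]
total = run_job(records)
print(total)  # prints 2
```

Each chunk is processed without looking at any other chunk, which is what lets the work be spread across many machines in a real cluster.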

Laundromat analogy of Map/Reduce

Imagine that your data is laundry. You wash this laundry in loads of similar colors. Then you dry it in loads of similar material (denims, towels, shirts, etc.).

Serial Operation:

Map/Reduce Operation:


Word Count example of Map/Reduce
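Word count is the classic Map/Reduce example: the map step turns each document into (word, 1) pairs, the pairs are grouped by word, and the reduce step adds up the counts for each word. The sketch below runs on a single machine in plain Python, purely to show the three phases. The function names and the sample documents are assumptions made for illustration, not part of any particular framework.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted counts by their word."""
    grouped = defaultdict(list)
    for word, count in pairs:
        grouped[word].append(count)
    return grouped


def reduce_phase(grouped):
    """Reduce: add up the counts collected for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}


def word_count(documents):
    pairs = []
    for document in documents:
        # In a real cluster each document (or block) would be mapped on a different node.
        pairs.extend(map_phase(document))
    return reduce_phase(shuffle_phase(pairs))


# Hypothetical sample input, invented for this sketch.
documents = ["the quick brown fox", "the lazy dog", "the fox"]
counts = word_count(documents)
print(counts["the"], counts["fox"], counts["dog"])  # prints 3 2 1
```

Because each call to `map_phase` looks at only one document, and each word's counts are summed separately, both steps could be handed out to different workers.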


Other Potential uses of Map/Reduce

Because Map/Reduce takes a large data set and breaks it down into smaller data sets that can be processed independently, here are some potential uses:
  •  indexing large data sets in a database
  •  image recognition in large images
  •  processing geographic information system (GIS) data, such as combining vector data with point data (Kerr, 2009)
  •  analyzing unstructured data
  •  analyzing stock data
  •  machine learning tasks
