Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

  1. init converting input text to <key,value> format (my particular choice of the meaning )
  2. mapp and reduce functions, run in pair, multiple iterations
  3. finish function exporting final list of pages ordered by page rank.
  4. I allocated the smallest (least expensive) CPUs at EC2 : ami=ami-6159bf08, instance_type=m1.small
    The goal was to perform all ini + N_iter + fin steps using 10 nodes & hadoop framework.
Test 1: execution of the full chain for

...

ini +2 iter +

...

fin

using a ~10% sub-set of wikipedia pages (enwiki-20090929-one-page-per-line-part3)

...

  1. copy local file to HDFS : ~2 minutes
  2. init : 410 sec
  3. mapp/reduce iter 0 : 300 sec
  4. mapp/reduce iter 1 : 180 sec
  5. finish : 190 sec
    Total time was 20 minutes , 11 CPUs were involved.
Test 2: execution of a single map/reduce step on 27M linked pages,

using full set of wikipedia pages (enwiki-20090929-one-page-per-line-part1+2+3). I did minor modification of map/reduce code which could slow it down by ~20%-30%.

...