This was a 4-hour workshop, which I attended remotely. Instructions for the hands-on exercises are here: [http://tinyurl.com/nerschadoopoct*]
The Hadoop admin page is: [http://maghdp01.nersc.gov:50030/jobtracker.jsp]

*My notes*
* My shell was not bash; I changed it by typing {color:#0000ff}bash \-l{color} (verify with {color:#0000ff}echo $SHELL{color})
* {color:#0000ff}module load tig hadoop{color}
* Generic hadoop command form: *hadoop command \[genericOptions\] \[commandOptions\]*
* Create my hadoop FS home directory: {color:#0000ff}hadoop fs \-mkdir /user/balewski{color}
* List its content (should be empty now, but no error): {color:#0000ff}hadoop fs \-ls{color}

Exercise 1: create, load, and read back text files in HDFS
{code}
$ vi testfile1
This is file 1
This is to test HDFS

$ vi testfile2
This is file 2
This is to test HDFS again

$ hadoop fs -mkdir input
$ hadoop fs -put testfile* input/
$ hadoop fs -cat input/testfile1
$ hadoop fs -cat input/testfile*
$ hadoop fs -get input input
$ ls input/
{code}

Exercise 2: run a Hadoop job from the examples package
{code}
$ hadoop fs -mkdir wordcount-in
$ hadoop fs -put /global/scratch/sd/lavanya/hadooptutorial/wordcount/* wordcount-in/
$ hadoop jar /usr/common/tig/hadoop/hadoop-0.20.2+228/hadoop-0.20.2+228-examples.jar wordcount wordcount-in wordcount-op
$ hadoop fs -ls wordcount-op
$ hadoop fs -cat wordcount-op/p* | grep Darcy
{code}

Monitor its progress from the URLs:
[http://maghdp01.nersc.gov:50030/]
[http://maghdp01.nersc.gov:50070/]

To re-run a job you must first CLEAN UP the old output files: {color:#0000ff}hadoop dfs \-rmr wordcount-op{color}

Next, run Hadoop with 4 reducers (the generic {color:#ff0000}\-Dmapred.reduce.tasks=4{color} option goes before the positional in/out arguments): {color:#0000ff}hadoop jar /usr/common/tig/hadoop/hadoop-0.20.2+228/hadoop-0.20.2+228-examples.jar wordcount{color} {color:#ff0000}\-Dmapred.reduce.tasks=4{color} {color:#0000ff}wordcount-in wordcount-op{color}

A suggestion: change the user permissions to allow me to read the Hadoop output, because Hadoop owns everything by default on the scratch disk, or use the
provided script: *fixperms.sh /global/scratch/sd/balewski/hadoop/wordcount-gpfs/*
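The cleanup-and-rerun step above can be wrapped in one small script. This is only a sketch: the jar path and the {color:#0000ff}wordcount-in{color}/{color:#0000ff}wordcount-op{color} directory names come from the notes, while the {color:#0000ff}rerun_wordcount{color} function name and the {color:#0000ff}DRY_RUN{color} switch are my own additions so the commands can be inspected without a Hadoop cluster.

```shell
#!/bin/bash
# Sketch: remove the old output directory, then rerun wordcount with N reducers.
# Jar path and directory names are taken from the workshop notes; DRY_RUN=1
# (my own addition) prints the commands instead of executing them.

EXAMPLES_JAR=/usr/common/tig/hadoop/hadoop-0.20.2+228/hadoop-0.20.2+228-examples.jar

rerun_wordcount() {
    local reducers=$1 in_dir=$2 out_dir=$3
    # Old output must be removed first, or the job aborts because the
    # output directory already exists.
    local cleanup="hadoop dfs -rmr $out_dir"
    # Generic -D options must precede the positional in/out arguments.
    local run="hadoop jar $EXAMPLES_JAR wordcount -Dmapred.reduce.tasks=$reducers $in_dir $out_dir"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$cleanup"
        echo "$run"
    else
        $cleanup
        $run
    fi
}

# Show the commands that a 4-reducer rerun would issue:
DRY_RUN=1 rerun_wordcount 4 wordcount-in wordcount-op
```

On the cluster itself one would drop {color:#0000ff}DRY_RUN=1{color} to actually execute the cleanup and the job.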