
This was a 4-hour workshop, which I attended remotely.

Instructions for the hands-on exercise are here:

http://tinyurl.com/nerschadoopoct

Hadoop admin page is:

http://maghdp01.nersc.gov:50030/jobtracker.jsp

My Notes

  • My shell was not bash; I switched by typing bash -l (check with echo $SHELL)
  • module load tig hadoop
  • Generic hadoop command syntax: hadoop command [genericOptions] [commandOptions]
  • Create my HDFS home directory: hadoop fs -mkdir /user/balewski
  • List its contents (should be empty now, but no error): hadoop fs -ls (the whole sequence is collected in the session sketch below)
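
As a consolidated sketch, here are the setup steps above as one shell session; the /user/balewski directory reflects my own account and should be replaced with your username:

Code Block
$   bash -l                            # switch to a bash login shell
$   echo $SHELL                        # check the shell, as noted above
$   module load tig hadoop             # load the Hadoop module used in the workshop
$   hadoop fs -mkdir /user/balewski    # create my HDFS home directory
$   hadoop fs -ls                      # list its contents; empty, but no error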

Exercise 1: create, load, and read back a text file in HDFS

Code Block
$   vi testfile1                       # create the first test file with the two lines below
This is file 1
This is to test HDFS
$   vi testfile2                       # create the second test file with the two lines below
This is file 2
This is to test HDFS again
$   hadoop fs -mkdir input             # create an input directory in HDFS
$   hadoop fs -put testfile* input/    # copy both files into HDFS
$   hadoop fs -cat input/testfile1     # read back one file
$   hadoop fs -cat input/testfile*     # read back both files
$   hadoop fs -get input input         # copy the HDFS directory back to local disk
$   ls input/                          # verify the local copy
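
As a convenience (not part of the original exercise), the two test files can also be created non-interactively instead of via vi:

Code Block
$   printf 'This is file 1\nThis is to test HDFS\n' > testfile1
$   printf 'This is file 2\nThis is to test HDFS again\n' > testfile2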

Exercise 2: run a Hadoop job from the examples package

Code Block
$   hadoop fs -mkdir wordcount-in      # create the HDFS input directory
$   hadoop fs -put /global/scratch/sd/lavanya/hadooptutorial/wordcount/* wordcount-in/   # stage the sample texts
$   hadoop jar /usr/common/tig/hadoop/hadoop-0.20.2+228/hadoop-0.20.2+228-examples.jar wordcount wordcount-in wordcount-op   # run the wordcount example
$   hadoop fs -ls wordcount-op         # list the job output
$   hadoop fs -cat wordcount-op/p* | grep Darcy   # check counts for the word "Darcy"

Monitor its progress from the web UIs: http://maghdp01.nersc.gov:50030/ (JobTracker) and http://maghdp01.nersc.gov:50070/ (NameNode).
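
Since I attended remotely, a quick command-line peek at the JobTracker page can be done with curl; the grep pattern below is only a guess at the page text and may need adjusting:

Code Block
$   curl -s http://maghdp01.nersc.gov:50030/jobtracker.jsp | grep -i -E 'running|completed'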

To re-run a job you must first clean up the old output files: hadoop dfs -rmr wordcount-op

Next, run Hadoop with 4 reducers: hadoop jar /usr/common/tig/hadoop/hadoop-0.20.2+228/hadoop-0.20.2+228-examples.jar wordcount -Dmapred.reduce.tasks=4 wordcount-in wordcount-op
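
Collected as one Code Block (same commands as above, in the order you would run them):

Code Block
$   hadoop dfs -rmr wordcount-op       # remove the old output directory first
$   hadoop jar /usr/common/tig/hadoop/hadoop-0.20.2+228/hadoop-0.20.2+228-examples.jar wordcount -Dmapred.reduce.tasks=4 wordcount-in wordcount-op   # re-run with 4 reducers
$   hadoop fs -ls wordcount-op         # should now list one part-* output file per reducer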

A suggestion: change the file permissions so that I can read the Hadoop output, because Hadoop owns everything by default on the scratch disk.
Or use the provided script: fixperms.sh /global/scratch/sd/balewski/hadoop/wordcount-gpfs/
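
In Code Block form; the fixperms.sh line is from the note above, while the hadoop fs -chmod line is a generic HDFS-side alternative I added, not something covered in the workshop:

Code Block
$   fixperms.sh /global/scratch/sd/balewski/hadoop/wordcount-gpfs/    # provided script to fix permissions of Hadoop output on scratch
$   hadoop fs -chmod -R a+r wordcount-op                              # generic alternative for output kept in HDFS: make it world-readable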
