This was a 4-hour workshop; I attended remotely.

Instructions for the hands-on exercise are here: [http://tinyurl.com/nerschadoopoct\*|http://tinyurl.com/nerschadoopoct*]

Hadoop admin page is: [http://maghdp01.nersc.gov:50030/jobtracker.jsp]


*My Notes*

* My shell was not bash; I changed it by typing {color:#0000ff}bash \-l{color}. To check the shell setting, type {color:#0000ff}echo $SHELL{color}
* {color:#0000ff}module load tig hadoop{color}
* generic hadoop command:&nbsp;*hadoop command \[genericOptions\] \[commandOptions\]*

* Create your Hadoop FS directory: {color:#0000ff}hadoop fs \-mkdir /user/balewski{color}
* List its contents (should be empty now, but no error): {color:#0000ff}hadoop fs \-ls{color}

Exercise 1: create, load, and read back a text file in HDFS
{code}
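$ vi testfile1                        # create a local file with these two lines:
This is file 1
This is to test HDFS
$ vi testfile2                        # create a second local file:
This is file 2
This is to test HDFS again
$ hadoop fs -mkdir input              # make a directory in HDFS
$ hadoop fs -put testfile* input/     # upload both local files
$ hadoop fs -cat input/testfile1      # read one file back
$ hadoop fs -cat input/testfile*      # read both files back
$ hadoop fs -get input input          # copy the HDFS directory to local disk
$ ls input/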
{code}
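The put/get round trip in Exercise 1 can also be checked automatically. Below is a minimal sketch, assuming the `hadoop` command is on the PATH (for example after `module load tig hadoop`) and that the target HDFS directory already exists. The helper name `roundtrip_check` and the file `testfile3` in the usage line are my own illustrations, not part of the workshop material.

```shell
#!/bin/sh
# Sketch: upload a local file to HDFS, download it again, and compare.
# roundtrip_check is a hypothetical helper, not a Hadoop command.
roundtrip_check() {
    local_file=$1                      # local file to upload
    hdfs_dir=$2                        # existing HDFS directory, e.g. "input"
    name=$(basename "$local_file")
    tmp_dir=$(mktemp -d) || return 1   # scratch area for the downloaded copy

    # Upload, then download into the scratch area; stop on the first failure.
    hadoop fs -put "$local_file" "$hdfs_dir/" || { rm -rf "$tmp_dir"; return 1; }
    hadoop fs -get "$hdfs_dir/$name" "$tmp_dir/$name" || { rm -rf "$tmp_dir"; return 1; }

    # cmp -s exits 0 only if the two files are byte-for-byte identical.
    cmp -s "$local_file" "$tmp_dir/$name"
    rc=$?
    rm -rf "$tmp_dir"
    return $rc
}
```

Usage: `roundtrip_check testfile3 input && echo "round trip OK"`. Use a file name that is not yet in HDFS, since `-put` refuses to overwrite an existing file.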

Exercise 2: run a Hadoop job from the package

{code}
$   hadoop fs -mkdir wordcount-in
$   hadoop fs -put /global/scratch/sd/lavanya/hadooptutorial/wordcount/* wordcount-in/
$   hadoop jar /usr/common/tig/hadoop/hadoop-0.20.2+228/hadoop-0.20.2+228-examples.jar wordcount wordcount-in wordcount-op
$   hadoop fs -ls wordcount-op
$   hadoop fs -cat wordcount-op/p* | grep Darcy
{code}
Monitor its progress from these URLs: [http://maghdp01.nersc.gov:50030/] and [http://maghdp01.nersc.gov:50070/]
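To see what the wordcount job computes without a cluster, the same map, sort, and reduce steps can be imitated with standard shell tools. This is only a local sketch in plain shell, not Hadoop itself; the function name `local_wordcount` is hypothetical, and splitting on whitespace is my assumption about how the example tokenizes its input.

```shell
#!/bin/sh
# Local imitation of wordcount: prints one "word<TAB>count" line per word.
# local_wordcount is a hypothetical helper; it does not use Hadoop at all.
local_wordcount() {
    # "map":     split the input on whitespace, one word per line
    # "shuffle": sort so that identical words end up next to each other
    # "reduce":  count each run of identical words
    cat "$@" |
        tr -s '[:space:]' '\n' |
        grep -v '^$' |
        LC_ALL=C sort |
        uniq -c |
        awk '{ printf "%s\t%s\n", $2, $1 }'
}
```

Usage: `local_wordcount testfile1 testfile2 | grep This`. For the two test files from Exercise 1 this should report a count of 4 for `This`.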

To re-run a job you must first clean up the old output files: {color:#0000ff}hadoop dfs \-rmr wordcount-op{color}
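The cleanup step above can be wrapped so that it only runs when the old output directory is really there. A minimal sketch, assuming `hadoop fs -test -e` is available in this Hadoop version and returns 0 when the path exists; the helper name `clean_output` is hypothetical.

```shell
#!/bin/sh
# Sketch: remove an old HDFS output directory only if it exists.
# clean_output is a hypothetical helper, not a Hadoop command.
clean_output() {
    out_dir=$1
    # Assumption: "hadoop fs -test -e PATH" exits 0 when PATH exists.
    if hadoop fs -test -e "$out_dir"; then
        hadoop fs -rmr "$out_dir"
    fi
}
```

Usage: `clean_output wordcount-op` before submitting the job again.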

Next, run Hadoop with 4 reducers: {color:#0000ff}hadoop jar /usr/common/tig/hadoop/hadoop-0.20.2+228/hadoop-0.20.2+228-examples.jar wordcount{color} {color:#ff0000}\-Dmapred.reduce.tasks=4{color} {color:#0000ff}wordcount-in wordcount-op{color}
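To try different reducer counts without retyping the long command, the call can be wrapped in a small function. This is a sketch only: the jar path and the `-Dmapred.reduce.tasks` option are taken from the command above, while the names `EXAMPLES_JAR` and `run_wordcount` are hypothetical.

```shell
#!/bin/sh
# Jar path as used in the workshop command above.
EXAMPLES_JAR=/usr/common/tig/hadoop/hadoop-0.20.2+228/hadoop-0.20.2+228-examples.jar

# Sketch: run the wordcount example with a chosen number of reducers.
# run_wordcount is a hypothetical helper, not part of the Hadoop package.
run_wordcount() {
    reducers=$1; in_dir=$2; out_dir=$3
    # Reject anything that is not a plain non-negative integer.
    case "$reducers" in
        ''|*[!0-9]*) echo "reducers must be a number" >&2; return 2 ;;
    esac
    hadoop jar "$EXAMPLES_JAR" wordcount \
        -Dmapred.reduce.tasks="$reducers" "$in_dir" "$out_dir"
}
```

Usage: `run_wordcount 4 wordcount-in wordcount-op`. Afterwards `hadoop fs -ls wordcount-op` should list one part file per reducer. The old output directory must be removed first, as noted above.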

Suggestion: change the permissions so that I can read the Hadoop output, because Hadoop owns all files by default on the scratch disk.


Or use the provided script: *fixperms.sh /global/scratch/sd/balewski/hadoop/wordcount\-gpfs/*
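To find out whether the permissions need fixing at all, one can first list the output files that the current user cannot read. This is a diagnostic sketch only; I do not know what fixperms.sh does internally, and the helper name `list_unreadable` is hypothetical.

```shell
#!/bin/sh
# Sketch: print every regular file under a directory that the current
# user cannot read. list_unreadable is a hypothetical helper.
list_unreadable() {
    find "$1" -type f | while IFS= read -r f; do
        # [ -r FILE ] is true when the current user has read permission.
        [ -r "$f" ] || printf '%s\n' "$f"
    done
}
```

Usage: `list_unreadable /global/scratch/sd/balewski/hadoop/`. Empty output means all files found there are readable.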

