Introduction

I've collected some tricks that a newbie might not know, but which are useful for getting around computational work. I'll try to keep adding stuff as I learn it. Please add your tricks too!

Parallel computing

I'm trying to get a better sense of how to design parallel scripts. Typically, I've inherited someone's code and made it work for me. However, I have been looking for a good basic resource, and I've found at least one site that looks promising. They have a few free courses that look relevant, like "Parallel computing explained", "Introduction to MPI" and "Intermediate MPI". I found the site through an MIT computing course that pointed to it.

http://www.citutor.org/index.php

Although it's got a lot of basic information, it's hard to figure out how it helps, because I'm really not sure what type of clusters I'm actually using (i.e. which parts are relevant to me). It hasn't really helped me do any actual coding yet, although some of the background about computers was semi-interesting.

How to find stuff out about computing clusters

I wanted to know whether there was a website where you could just find out how to run stuff on a computer cluster (e.g. beagle, aces, or coyote). Basically, Scott said that only the sys admin knows all of the specific rules associated with each cluster, and if you don't pick their brain about it, you won't really know how to use it right. I will hopefully pick brains for you and put what I learn on this website in another post about each system. That's a work in progress.

You can find out about specifics of aces queues with:

qstat -Qf | more

or

qstat -q

Which results in this on aces:

server: login

Queue            Memory CPU Time Walltime Node  Run Que Lm  State
---------------- ------ -------- -------- ----  --- --- --  -----
geom                  -        -        -    -    0   0 --  E R
one                   -        - 06:00:00    1    8 319 10  E R
four-twelve           -        - 12:00:00   --    8   4 10  E R
four                  -        - 02:00:00   16    8 437 10  E R
long                  -        - 24:00:00   16    1   0 10  E R
all                   -        - 02:00:00 1024    0   0  4  E R
mchen                 -        - 02:00:00 1024    0   0  4  E R
mediumlong            -        - 96:00:00   30    0   0 10  E R
special               -        -        -   36    0   0  -  E R
toolong               -        - 168:00:0    4    0   0 10  E R
                                               ---- ----
                                                 25 760

And this on coyote:

server: wiley

Queue            Memory CPU Time Walltime Node  Run Que Lm  State
---------------- ------ -------- -------- ----  --- --- --  -----
speedy                -        - 00:30:00    -    0   0  -  E R
short                 -        - 12:00:00    -    2  -2  -  E R
long                  -        - 48:00:00    -   68  46  -  E R
quick                 -        - 03:00:00    -    0   0  -  E R
be320                 -        - 00:30:00    -    0   0  -  E R
ultra                 -        - 336:00:0    -    2   0  -  E R
                                               ---- ----
                                                 72  44
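
If you only care about the queue names and their walltime limits, the qstat -q output can be trimmed down with awk. This is just a minimal sketch: list_queues is a made-up helper name (not part of the queuing system), and it assumes the column layout shown above, with the queue name in the first column, the walltime in the fourth, and "E R" as the last two fields.

```shell
# list_queues: read `qstat -q` output on stdin and print "queue walltime"
# for every queue row whose state is "E R".
# Hypothetical helper; assumes the 10-field row layout shown above.
list_queues() {
    awk 'NF >= 10 && $(NF-1) == "E" && $NF == "R" { print $1, $4 }'
}
```

On a cluster you would run it as: qstat -q | list_queues. The header, separator and totals lines are skipped because they don't end in "E R".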

You can also use this to find more information about qsub (I would run it from somewhere like the head node, because not all nodes have the same qsub information):

man qsub

It describes the various flags you can use with qsub.

Queuing system on clusters

Never run anything on the head node!!! When you log into a cluster, you need to submit jobs to a queue or work interactively on a dedicated interactive node. The dedicated interactive nodes will have different names, so you just have to find them. Sometimes you can request nodes by qsub -I or qsub to a dedicated interactive node (qubert on aces and super-genius on coyote), but these also depend on your system.

So, on any given cluster, there might be different queues (e.g. short, long, ultra-long) that you can submit your jobs to. To find out which ones exist (if you don't know already), run qstat; the queue names will be in the last column. It might be obvious what each of these queues means from its name and from how long things have run in each queue (short < 12 hours, ultra-long > 2500 hours), but this is likely just something you need to find out from someone who knows the cluster, or from the sys admin again. Then, if you want to submit to a specific queue, use something like this (I think, but I actually haven't done it exactly like this):

qsub -q short ....
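
For reference, here is roughly what a job script aimed at a specific queue could look like. It's a sketch, not something taken from aces or coyote: it assumes a PBS/Torque-style scheduler (which is what the qstat -q output looks like), and the file name myjob.pbs, the job name and the resource requests are all placeholders. The queue "short" has to be a queue that actually exists on your cluster. The #PBS lines just put the qsub flags inside the script so you don't have to type them each time.

```shell
# Write a minimal job script. Everything in it (names, queue, limits)
# is a placeholder to adapt to your own cluster.
cat > myjob.pbs <<'EOF'
#!/bin/bash
#PBS -N myjob
#PBS -q short
#PBS -l walltime=01:00:00
#PBS -l nodes=1:ppn=1

# Start in the directory the job was submitted from
cd "$PBS_O_WORKDIR"
echo "Running on $(hostname)"
EOF

# Submit it only if qsub exists on this machine
if command -v qsub >/dev/null 2>&1; then
    qsub myjob.pbs
else
    echo "qsub not found; job script written to myjob.pbs"
fi
```

A -q flag given on the qsub command line takes precedence over the #PBS -q line in the script, so the same script can still be sent to another queue.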

Shortcut for ssh'ing

Scott also told me how to set up your computer to automatically fill in ssh information so you don't have to type it in each time. You have a folder ~/.ssh/ and a file ~/.ssh/config, which should be modified to contain an entry like the following for each host (User is your login name on that host):

Host aces
   Hostname login.acesgrid.org
   User spacocha

Then each time you want to ssh just type:

ssh aces

Works for scp too (and presumably other things).
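
If you have several clusters to add, the config entry can be appended from the command line instead of by hand. A rough sketch, with assumptions flagged: add_ssh_host is a hypothetical helper name, and the optional fourth argument (a different config file path) is only there so you can try it on a scratch file before touching your real ~/.ssh/config.

```shell
# add_ssh_host ALIAS HOSTNAME USER [CONFIG_FILE]
# Append a Host entry to an ssh config file unless the alias is already there.
# Hypothetical helper; CONFIG_FILE defaults to ~/.ssh/config.
add_ssh_host() {
    alias_name=$1
    host=$2
    user=$3
    config=${4:-$HOME/.ssh/config}

    mkdir -p "$(dirname "$config")"
    touch "$config"
    chmod 600 "$config"   # ssh refuses config files that others can write to

    if grep -q "^Host $alias_name\$" "$config"; then
        echo "Host $alias_name is already in $config" >&2
        return 0
    fi
    printf 'Host %s\n   Hostname %s\n   User %s\n' \
        "$alias_name" "$host" "$user" >> "$config"
}
```

For the entry above that would be: add_ssh_host aces login.acesgrid.org spacocha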

Downloading directly to the clusters

You can get stuff from a website using wget, for example:

wget https://github.com/swo/lake_matlab_sens/archive/master.zip
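
Not every cluster has wget installed, and a file called master.zip isn't very descriptive once it lands in your home directory. Here is a small sketch that handles both: fetch is a made-up function name, it falls back to curl if wget is missing, and the optional second argument lets you pick the output file name.

```shell
# fetch URL [DEST]
# Download URL into DEST (default: the last path component of the URL).
# Hypothetical helper; uses wget if present, otherwise curl.
fetch() {
    url=$1
    dest=${2:-$(basename "$url")}
    if command -v wget >/dev/null 2>&1; then
        wget -O "$dest" "$url"
    elif command -v curl >/dev/null 2>&1; then
        curl -L -o "$dest" "$url"    # -L follows redirects
    else
        echo "neither wget nor curl is installed" >&2
        return 1
    fi
}
```

For example: fetch https://github.com/swo/lake_matlab_sens/archive/master.zip lake_matlab_sens.zip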

 

Running something on a detached screen

Use screen. These will help you figure stuff out:

man screen

or

screen --help

This starts screen:

screen -S SPPtest

This detaches but keeps it running:

hold the "control" and "A" keys together, release them, then press "D"

To reattach to a detached screen:

screen -R SPPtest

To get rid of the screen altogether type this from within a screen:

exit
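
If you want to start something in screen without attaching to it at all, screen can launch a session that is already detached. A small sketch, assuming screen is installed wherever you run it: run_detached is a hypothetical wrapper name, the -d -m flags start the session detached, and -S gives it a name.

```shell
# run_detached NAME COMMAND [ARGS...]
# Start COMMAND inside a new, already-detached screen session called NAME.
# Hypothetical wrapper; assumes screen is installed.
run_detached() {
    name=$1
    shift
    if ! command -v screen >/dev/null 2>&1; then
        echo "screen is not installed" >&2
        return 1
    fi
    # -d -m: start the session detached; -S: name the session
    screen -dmS "$name" "$@"
}
```

For example, run_detached SPPtest ./my_script.sh (my_script.sh being whatever you want to run), then screen -ls to list sessions and screen -R SPPtest to attach. The session goes away on its own when the command finishes.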