h2. Queue system

...


h2. To do list

The following is a list of items requiring attention, in order of priority:

# Reinstall OS on Pegasus.
# Measure usage on each machine using PBS/Torque accounting tools.
# Clean up our junk in NE47-181 and E40-008: throw away boxes, remove old Cyrus2 cluster.
# Find a better way to manage user data after users leave the group.

h2. Backup services

Backup service for our four clusters is provided by [MIT TSM|http://ist.mit.edu/services/backup/tsm].  This comes at a cost of $65 per month per system.


h3. Restore procedures

There are two cases for restoring the backed up data:

# When the cluster is accessible and data has been accidentally deleted. The lost data is restored to its original location.
# When the cluster is inaccessible and the data must be restored to a new location, i.e., a drive connected to your computer.


TSM works best for the first case. TSM software for Linux is already installed on our clusters.

For case 1, follow these steps:

First, establish connection to the TSM server.
{code}
$ sudo dsmc
{code}
Next, query the versions of the file stored on the TSM server.
{code}
$ query backup /path/filename
{code}
You can restore the file to its original location, or restore it to a new location. For restoring folders with subdirectories, use the option -subdir=yes.
{code}
$ restore backup /path/filename
OR
$ restore backup "/path/filename" /newpath/newfilename
{code}
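For whole directories, an illustrative session sketch (the paths are placeholders, the command form follows the examples above, and -subdir=yes is the recursion option from the TSM client documentation):

```
$ restore backup "/home/user/data/" /mnt/restore/data/ -subdir=yes
```

The trailing slashes mark the source and destination as directories rather than single files.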

For case 2, the procedure is a little more involved, and there are a few important things to take into account. The exact procedure is included below:


1. The backup can only be restored using a Linux machine, because the restoring machine has to masquerade as the original cluster in order to retrieve its data, and all our clusters run some form of Linux.


2. The filesystem of the disk to which you are writing the retrieved data should be the same as the filesystem used on the cluster. For example, if the files on the cluster live on a drive with an XFS filesystem, the disk on the third machine must also be formatted as XFS.
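A quick way to compare filesystem types is stat -f from GNU coreutils. The paths below are placeholders — substitute the cluster data directory and the restore disk's mount point:

```shell
# Print the filesystem type of the mount containing each path.
# "/" and "/tmp" stand in for the cluster data directory and the
# restore disk's mount point, respectively.
src_fs=$(stat -f -c %T /)
dst_fs=$(stat -f -c %T /tmp)
echo "source=$src_fs target=$dst_fs"
if [ "$src_fs" = "$dst_fs" ]; then
    echo "filesystem types match"
else
    echo "filesystem types differ -- reformat the restore disk first"
fi
```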


3. TSM software for Linux can be downloaded from the IS&T website. The installation procedure for a new Ubuntu version is described [here|http://kb.mit.edu/confluence/display/istcontrib/TSM+for+32+or+64+bit+debathena+or+Ubuntu+-+Install%2C+Configure%2C+Set+Up+and+Confirm+the+Scheduler+for+TSM].

4. The older version of the software is written for RHEL 5 and is available as an rpm. Install the "ksh" package and the "alien" package. ksh is needed since several of the scripts included with TSM use it. More important is "alien", as it allows users to install RPM packages on Ubuntu and other Debian-based distributions.
{code}
$ sudo apt-get install ksh alien
{code}

5. The next step is to use alien to install the appropriate RPMs:

{code}
$ sudo alien -i --scripts TIVsm-API.i386.rpm TIVsm-BA.i386.rpm
{code}

6. There are several other libraries required by TSM, such as libstdc++.so.5. Download and install the required packages with apt-get or from another source.
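One way to find which shared libraries are still missing is ldd. The sketch below runs it on /bin/sh purely as a stand-in; on a real system, point it at the dsmc binary (e.g. /opt/tivoli/tsm/client/ba/bin/dsmc) instead:

```shell
# List the shared libraries a binary needs; any line containing
# "not found" names a library that still has to be installed.
# /bin/sh is used here only as a stand-in for the dsmc binary.
ldd /bin/sh
ldd /bin/sh | grep "not found" || echo "no missing libraries"
```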


7. Change the Nodename, backup server, and error-log file location in dsm.sys. This file is located in the /opt/tivoli/tsm/client/ba/bin/ folder. Settings for each cluster are given on this page.
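As a sketch only, a dsm.sys stanza for Darius1 might look like the following, using the server and nodename from the registration section below and the error-log location from the installation notes. The option names follow the TSM 5.x Unix client, and TCPPort 1500 is the TSM default rather than a confirmed local setting — verify both against the IBM documentation:

```
SErvername        backup-i.mit.edu
   COMMmethod        TCPip
   TCPPort           1500
   TCPServeraddress  backup-i.mit.edu
   NODename          DARIUS1.CSBI
   PASSWORDAccess    generate
   ERRORLOGname      /opt/tivoli/tsm/client/ba/bin/dsmerror.log
```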


8. Follow the instructions from case 1 above to restore files to a new location. Look up the TSM documentation pages for commands like restart restore, cancel restore, etc.

h3. Installing TSM backup software

The TSM 5.4 software has been installed in accordance with the instructions on the [TSM page|http://ist.mit.edu/services/software/tsm/54/linux/setup-backups]. There is a need to install older libraries, namely libstdc\++.so.5. On Darius1 this was done as follows: the compat-libstdc++ package was downloaded from [here|http://rpm.pbone.net/index.php3/stat/4/idpl/13943754/dir/centos_5/com/compat-libstdc++-33-3.2.3-61.x86_64.rpm.html] and [here|http://rpm.pbone.net/index.php3/stat/4/idpl/13943753/dir/centos_5/com/compat-libstdc++-33-3.2.3-61.i386.rpm.html], and then installed using the "yum localinstall" command:
{code}
sudo yum localinstall compat-libstdc++-33-3.2.3-61.x86_64.rpm
sudo yum localinstall compat-libstdc++-33-3.2.3-61.i386.rpm
sudo yum localinstall TIVsm-API.i386.rpm
sudo yum localinstall TIVsm-BA.i386.rpm
{code}

The next steps are to edit the dsm.opt and dsm.sys files as described in the instructions. Those files include the default location for the backup logs:

{code}
/opt/tivoli/tsm/client/ba/bin/dsmsched.log and
/opt/tivoli/tsm/client/ba/bin/dsmerror.log
{code}

Finally, running the _dsmc_ program as root lets the user enter the initial password. A line can then be added to _/etc/inittab_ to automatically start the dsmc scheduler; to start it right after installing, the root user can simply execute the dsmc command with the "sched" argument:

{code}
# nohup /usr/bin/dsmc sched > /dev/null 2>&1 &
{code}
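For the /etc/inittab approach, a conventional entry along these lines could be used — the "tsm" id and empty runlevel field are illustrative choices, not taken from our configuration, so check them against the IST instructions:

```
tsm::once:/usr/bin/dsmc sched > /dev/null 2>&1 # TSM scheduler
```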

h3. TSM registration information

The four clusters backed up with TSM have the following registration information.  The TSM system automatically assigns an initial password (_newpass_), but according to the registration e-mail, this will be automatically changed to a new, encrypted password, and stored on the machine after the first connection to the TSM servers.

*Darius2*
Server: oc11-bk-ent-1.mit.edu
Nodename: DARIUS2.CSBI
Schedule: BUS-0700

*Darius1*
Server: backup-i.mit.edu
Nodename: DARIUS1.CSBI
Schedule: BUS-2400

*Cyrus1*
Server: backup-i.mit.edu
Nodename: CYRUS1.CSBI
Schedule: BUS-2400

*Quantum2*
Server: backup-i.mit.edu
Nodename: QUANTUM2
Schedule: BUS-2400