Queue system administration
To do list
The following is a list of items requiring attention, in order of priority:
- Reinstall OS on Pegasus.
- Measure usage on each machine using PBS/Torque accounting tools.
- Clean up our junk in NE47-181 and E40-008: throw away boxes, remove old Cyrus2 cluster.
- Find a better way to manage users' data after they leave the group.
Backup services
Backup service for our four clusters is provided by MIT TSM. This comes at a cost of $65 per month per system.
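As a quick check of the recurring cost, the two figures above can be multiplied out. This is only a sketch of the arithmetic; the variable names are illustrative.

```shell
#!/bin/sh
# Figures taken from the paragraph above.
systems=4            # clusters backed up by TSM
cost_per_system=65   # US dollars per month per system

monthly_total=$((systems * cost_per_system))
yearly_total=$((monthly_total * 12))

echo "Monthly: \$${monthly_total}"
echo "Yearly:  \$${yearly_total}"
```

With the numbers quoted above, this comes to $260 per month, or $3120 per year.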
Restore procedures
There are two cases for restoring backed-up data:
- Case 1: the cluster is accessible and data was accidentally deleted. The lost data is restored to its original location. TSM works best in this case.
- Case 2: the cluster is inaccessible and the data must be restored to a new location, i.e., a drive connected to your computer.
The TSM software for Linux is already installed on our clusters.
For case 1, follow these steps.
First, open a connection to the TSM server by starting the interactive client:
$ sudo dsmc
Next, check which versions of the file are stored on the TSM server. The commands from here on are typed at the client's tsm> prompt:
tsm> query backup /path/filename
You can restore the file to its original location or to a new location. To restore a folder together with its subdirectories, add the option -subdir=yes.
tsm> restore /path/filename
tsm> restore "/path/filename" /newpath/newfilename
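The same restores can also be issued as one-shot commands instead of through the interactive prompt. The helper below is a minimal sketch that only prints the command line it would use, so it is safe to try anywhere. The function name restore_cmd and the rule "a source ending in / is a folder" are assumptions made for illustration; the commands it prints follow the documented dsmc restore syntax with the -subdir=yes option.

```shell
#!/bin/sh
# restore_cmd SOURCE [DEST]
# Print (do not run) a one-shot dsmc restore command.
# A SOURCE ending in "/" is treated as a folder, so -subdir=yes is appended.
restore_cmd() {
    src=$1
    dest=${2:-}
    cmd="dsmc restore \"$src\""
    if [ -n "$dest" ]; then
        cmd="$cmd \"$dest\""
    fi
    case $src in
        */) cmd="$cmd -subdir=yes" ;;
    esac
    printf '%s\n' "$cmd"
}

restore_cmd /path/filename                       # back to the original location
restore_cmd /path/filename /newpath/newfilename  # to a new location
restore_cmd /path/folder/ /newpath/folder/       # a folder with its subdirectories
```

Once a printed command looks right, it can be run with sudo on the cluster.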
For case 2, the procedure is a little more involved, and there are a few important things to take into account. The exact procedure is as follows:
1. The backup can only be restored from a Linux machine. The downloading machine has to masquerade as the original cluster in order to retrieve the data, and all our clusters run some form of Linux.
2. The filesystem of the disk that receives the restored data should match the filesystem used on the cluster. For example, if the files on the cluster were written to a drive formatted as xfs, the disk on the restoring machine must also be formatted as xfs.
3. The TSM software for Linux can be downloaded from the IS&T website. The installation procedure for a new Ubuntu version is described here.
4. The older version of the software is written for RHEL 5 and is available as an RPM. Install the "ksh" and "alien" packages. ksh is needed because several of the scripts included with TSM use it. More important is alien, which allows users to install RPM packages on Ubuntu and other Debian-based distributions.
$ sudo apt-get install ksh alien
5. The next step is to use alien to install the appropriate RPMs:
$ sudo alien -i --scripts TIVsm-API.i386.rpm TIVsm-BA.i386.rpm
6. TSM requires several other libraries, such as libstdc++.so.5. Install them with apt-get or from some other source.
7. Change the nodename, backup server and error log file location in dsm.sys, which is located in the /opt/tivoli/tsm/client/ba/bin/ folder. The settings for each cluster are given under "TSM registration information" below.
8. Follow the instructions from case 1 above to restore files to a new location. See the TSM documentation for related commands such as restart restore and cancel restore.
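For step 7, the three settings live in a server stanza in dsm.sys. The sketch below prints what such a stanza could look like, so the values can be compared against the existing file before editing it. The function name print_dsm_sys_stanza is hypothetical, and reusing the server hostname as the stanza label is an assumption; any other lines already present in the stanza (communication method, port, password handling) should be left as they are.

```shell
#!/bin/sh
# print_dsm_sys_stanza SERVER NODENAME ERRORLOG
# Print the dsm.sys lines that step 7 asks you to change.
# Assumption: the stanza label (SErvername) is simply the server hostname.
print_dsm_sys_stanza() {
    printf 'SErvername          %s\n' "$1"
    printf '   TCPServeraddress %s\n' "$1"
    printf '   NODename         %s\n' "$2"
    printf '   ERRORLOGName     %s\n' "$3"
}

# Example: masquerade as Darius1 while restoring on another machine.
print_dsm_sys_stanza backup-i.mit.edu DARIUS1.CSBI \
    /opt/tivoli/tsm/client/ba/bin/dsmerror.log
```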
Installing TSM backup software
The TSM 5.4 software has been installed in accordance with the instructions on the TSM page. It also requires an older library, libstdc++.so.5. On Darius1 this was handled as follows: the 64-bit and 32-bit compat-libstdc++ packages were downloaded and then installed, together with the TSM packages, using the "yum localinstall" command:
$ sudo yum localinstall compat-libstdc++-33-3.2.3-61.x86_64.rpm
$ sudo yum localinstall compat-libstdc++-33-3.2.3-61.i386.rpm
$ sudo yum localinstall TIVsm-API.i386.rpm
$ sudo yum localinstall TIVsm-BA.i386.rpm
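Before or after installing the compatibility packages, it can be useful to confirm that the old C++ runtime is visible to the dynamic linker. The following is a minimal sketch that filters the output of ldconfig -p; the helper name lib_listed is an assumption made for illustration.

```shell
#!/bin/sh
# lib_listed NAME
# Read "ldconfig -p" style output on stdin; succeed if NAME appears in it.
lib_listed() {
    grep -F -- "$1" >/dev/null
}

# Only run the check where ldconfig is available.
if command -v ldconfig >/dev/null 2>&1; then
    if ldconfig -p | lib_listed 'libstdc++.so.5'; then
        echo "libstdc++.so.5 is installed"
    else
        echo "libstdc++.so.5 is missing; install the compat-libstdc++ packages"
    fi
fi
```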
The next steps are to edit the dsm.opt and dsm.sys files as described in the instructions. Those files include the default location for the backup logs:
/opt/tivoli/tsm/client/ba/bin/dsmsched.log and /opt/tivoli/tsm/client/ba/bin/dsmerror.log
Finally, running the dsmc program as root will let the user enter the initial password. Next, a line can be added to /etc/inittab to automatically start the dsmc scheduler; to initialize it after installing, the root user can simply execute the dsmc command with the "sched" argument:
# nohup /usr/bin/dsmc sched > /dev/null 2>&1 &
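The inittab line itself is not spelled out above. The sketch below shows one plausible form and checks whether a file already contains a scheduler entry. The entry id "tsm" and the runlevels "2345" are assumptions, as is the helper name check_inittab; adjust them to match the TSM instructions for the system at hand.

```shell
#!/bin/sh
# Hypothetical inittab entry: the id "tsm" and runlevels "2345" are assumptions.
TSM_INITTAB_ENTRY='tsm:2345:once:/usr/bin/dsmc sched > /dev/null 2>&1'

# check_inittab [FILE]
# Report whether FILE (default /etc/inittab) already starts the scheduler.
check_inittab() {
    file=${1:-/etc/inittab}
    if grep -F '/usr/bin/dsmc sched' "$file" >/dev/null 2>&1; then
        echo "scheduler entry already present in $file"
    else
        echo "no scheduler entry in $file; a line such as this could be added:"
        echo "$TSM_INITTAB_ENTRY"
    fi
}

check_inittab
```

After editing /etc/inittab as root, running telinit q makes init re-read the file without a reboot.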
TSM registration information
The four clusters backed up with TSM have the following registration information. The TSM system automatically assigns an initial password (newpass), but according to the registration e-mail, this will be automatically changed to a new, encrypted password, and stored on the machine after the first connection to the TSM servers.
Darius2
Server: oc11-bk-ent-1.mit.edu
Nodename: DARIUS2.CSBI
Schedule: BUS-0700
Darius1
Server: backup-i.mit.edu
Nodename: DARIUS1.CSBI
Schedule: BUS-2400
Cyrus1
Server: backup-i.mit.edu
Nodename: CYRUS1.CSBI
Schedule: BUS-2400
Quantum2
Server: backup-i.mit.edu
Nodename: QUANTUM2
Schedule: BUS-2400
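For scripting, the registration details above can be kept in a small lookup function. This is a sketch only; the function name tsm_settings and the space-separated output format are assumptions made for illustration, while the values are copied from the list above.

```shell
#!/bin/sh
# tsm_settings CLUSTER
# Print "server nodename schedule" for one of the four clusters listed above.
tsm_settings() {
    case $1 in
        Darius2)  echo "oc11-bk-ent-1.mit.edu DARIUS2.CSBI BUS-0700" ;;
        Darius1)  echo "backup-i.mit.edu DARIUS1.CSBI BUS-2400" ;;
        Cyrus1)   echo "backup-i.mit.edu CYRUS1.CSBI BUS-2400" ;;
        Quantum2) echo "backup-i.mit.edu QUANTUM2 BUS-2400" ;;
        *) echo "unknown cluster: $1" >&2; return 1 ;;
    esac
}

# Example: split the fields into named variables.
tsm_settings Darius1 | {
    read -r server nodename schedule
    echo "server=$server nodename=$nodename schedule=$schedule"
}
```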