...

  • Learn Python

  • Learn Eclipse

  • Learn the basics of XML. Try this tutorial, or search online for an XML tutorial.

  • Learn about the RAPID project (see overview)

  • Learn the OODT Python module

  • Solve a test problem using OODT

  • Possibly compare to solving the same problem without OODT

...

  • I installed OODT as best I could on your local machines under /home/pre-col[1,3]/brideout/OODT/OODT-0.5. You should have write permission for that entire directory, because you will ultimately be testing your code by adding it there.

To test the installation, go to http://localhost:8080/my-curator in your browser.

Starting up OODT if you reboot

  • cd /home/pre-col[1,3]/brideout/OODT/OODT-0.5/apache-tomcat-6.0.37/bin

  • ./startup.sh

Projects

Your projects will be to add new metadata parsers to the web example. Basically, you will be writing Python scripts that extract metadata from files and convert it into an XML file, just like in the CAS Curation Guide. You can test a script outside OODT by giving it one argument: the filename to parse. Ultimately, though, the goal is to make your code run inside OODT, just like the mp3 example.

Your Python script will always take one argument: the full path to the input file. It will always write an output XML file whose name is the input file name with .met appended.
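The argument handling described above can be sketched as follows. This is only a minimal illustration: the function name parse_arguments is hypothetical, and writing the XML itself is left out here.

```python
def parse_arguments(argv):
    """Return (input_path, output_path) for a parser script.

    argv is expected to look like sys.argv: the script name followed by
    exactly one argument, the full path to the input file.
    """
    if len(argv) != 2:
        raise SystemExit("usage: %s <full path to input file>" % argv[0])
    input_path = argv[1]
    # The output file is the input file name with ".met" appended.
    output_path = input_path + ".met"
    return input_path, output_path
```

In a real script you would call parse_arguments(sys.argv) after importing sys, then write the XML to the returned output path.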

Here's the XML produced by the example for an mp3 file:

...

Note that the mp3 file was: /home/pre-col1/brideout/OODT/OODT-0.5/staging/products/mp3/Bach-SuiteNo2.mp3

FileLocation is the directory the file is stored in, and Filename is the base name of the file. Use the Python module os.path to get the directory name and base name of a file.
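As a small sketch of the os.path calls involved (the helper name file_location_and_name is made up for illustration):

```python
import os.path


def file_location_and_name(full_path):
    """Split a full path into (FileLocation, Filename)."""
    file_location = os.path.dirname(full_path)   # directory the file is stored in
    filename = os.path.basename(full_path)       # base name of the file
    return file_location, filename
```

For the mp3 file mentioned above, this would give /home/pre-col1/brideout/OODT/OODT-0.5/staging/products/mp3 as the FileLocation and Bach-SuiteNo2.mp3 as the Filename.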

Note also that the character / was replaced with %2F in the XML file above. %2F is the percent-encoded (URL-encoded) form of /, which is how the value appears in the metadata file. Use the Python replace method to do this in your code.
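A minimal sketch of that replacement, assuming / is the only character you need to handle (the function name encode_slashes is hypothetical):

```python
def encode_slashes(text):
    """Replace every / in text with %2F before writing it to the XML file."""
    return text.replace("/", "%2F")
```

If other characters turn out to need encoding as well, the standard library function urllib.parse.quote(text, safe="") in Python 3 is one possible alternative, but check its output against the example XML first.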

...

Lines either begin with a #, in which case they are ignored, or with a name such as MADSERVERCGIABS. That name becomes a new key, and the part after the = is the value. You can use the Python ConfigParser module (named configparser in Python 3) to parse the file.

For the output XML file, the key will be the section name and the key name separated by a colon, for example: Madrigal:madserverdocabs. Here's the XML file this example is expected to produce if it was called test.ini:
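The section:key naming can be sketched like this. It assumes Python 3, where the module is called configparser; the function names are hypothetical. Note that configparser lowercases key names by default, which matches the Madrigal:madserverdocabs example above.

```python
import configparser


def read_ini(path):
    """Parse an ini file; lines starting with # are skipped as comments."""
    # interpolation=None keeps any literal % characters in values untouched.
    parser = configparser.ConfigParser(interpolation=None)
    with open(path) as handle:
        parser.read_file(handle)
    return parser


def section_key_pairs(parser):
    """Return a list of ("Section:key", value) pairs for the output file."""
    pairs = []
    for section in parser.sections():
        for key, value in parser.items(section):
            pairs.append((section + ":" + key, value))
    return pairs
```

Calling section_key_pairs(read_ini("test.ini")) would give one pair per setting, ready to be written out as key/value entries.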

...

In this project you will open a Madrigal Hdf5 data file. You will dynamically determine the parameters it contains, and the date range of the data. You will use the Python h5py module to get information from the binary Hdf5 file. You can download an example Hdf5 file here. If you want to browse inside the Hdf5 file, run /home/pre-col[1,3]/brideout/bin/hdfview, and then open the Hdf5 file.

The key/value pairs you will be using are:

  • parameters: a comma-separated list of the parameters in the Hdf5 file

  • startDate: a time in the form "2013-01-12 00:05:59"

  • endDate: a time in the same form
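One way these three values might be collected is sketched below. The layout details are assumptions, not something this page confirms: the dataset name "Data/Table Layout" and the ut1_unix and ut2_unix columns (seconds since 1970-01-01 UTC) are guesses you should check with hdfview, and the function names are hypothetical.

```python
from datetime import datetime, timezone


def format_time(unix_seconds):
    """Format seconds since 1970-01-01 UTC as 'YYYY-MM-DD HH:MM:SS'."""
    moment = datetime.fromtimestamp(float(unix_seconds), tz=timezone.utc)
    return moment.strftime("%Y-%m-%d %H:%M:%S")


def summarize_hdf5(path, table_name="Data/Table Layout"):
    """Return the parameters, startDate and endDate key/value pairs.

    ASSUMPTION: table_name is a compound (table-like) dataset that has
    ut1_unix and ut2_unix columns. Verify this in hdfview first.
    """
    import h5py  # third-party module; imported here so format_time works without it

    with h5py.File(path, "r") as hdf_file:
        rows = hdf_file[table_name][()]       # read the whole table into memory
    column_names = rows.dtype.names           # one name per column in the table
    return {
        "parameters": ",".join(column_names),
        "startDate": format_time(rows["ut1_unix"].min()),
        "endDate": format_time(rows["ut2_unix"].max()),
    }
```

The column list will probably include bookkeeping columns as well as the science parameters, so you may need to filter it before writing the parameters value.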

...