Learning and using python

You will first need to make sure python is installed on your computer. If it is not, go to the python download page to install it.

Here is a simple example of python that demonstrates lists and loops:

"""This is example python code that demonstrates loops and lists.

The idea is to output a simple version of the story "Goldilocks and the Three Bears" using
lists and loops.
"""

characterList = [('Goldilocks', 'tried'), ('The three bears', 'looked at')]
roomList = [('living room', 'rocking chair'),
            ('kitchen', 'porridge'),
            ('bedroom', 'bed')]
sizeList = ['biggest', 'medium sized', 'smallest']


# loop through each character
for character, action in characterList:
    # loop through each room
    for room, item in roomList:
        print('%s went into the %s.' % (character, room))
        # loop through each size
        for size in sizeList:
            print('%s %s the %s %s.' % (character, action, size, item))

 

The first step in learning python is to go through the python tutorial, found on the main python website.

The next step is to read through the Millstone Hill python coding conventions.

The next step is to get familiar with Eclipse (type eclipse & at the command line). Go through the Workbench Users Guide under Help. The Eclipse program is designed to help you code in a number of languages. You'll be coding in python using an extension to Eclipse called PyDev. Go to http://pydev.sourceforge.net/ for an introduction, and also check out the demo videos at the bottom of the page.

Finally, code the following exercises:

Exercise 1: Write a simple python script to parse a text data file

In this exercise you will write a simple python script to parse a text file containing columns of numeric data. There will be no arguments - everything will be hard-coded. First, save the text file at http://www.haystack.mit.edu/~brideout/test.txt to your local computer. Then write a script that opens that file, and reads all its lines. Have your script simply count how many lines contain just five floating point numbers. It should end by printing that number.
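One way to decide whether a line holds exactly five floats is sketched below. It is only an illustration, not the required solution: it assumes python 3 and whitespace-separated columns, and the helper names parse_floats and has_five_floats and the sample lines are made up for this example.

```python
def parse_floats(line):
    """Return the floats on a whitespace-separated line, or None if any token is not a float."""
    values = []
    for token in line.split():
        try:
            values.append(float(token))
        except ValueError:
            # at least one token is not a number, so reject the whole line
            return None
    return values


def has_five_floats(line):
    """Return True if the line consists of exactly five floats."""
    values = parse_floats(line)
    return values is not None and len(values) == 5


# hypothetical sample lines standing in for the contents of the real file
sample_lines = ['1.0 2.0 3.0 4.0 5.0',
                '# header line',
                '1 2 3',
                '1.5 2.5 x 4.5 5.5']

count = sum(1 for line in sample_lines if has_five_floats(line))
print(count)
```

In your script the lines would come from the saved file rather than from a hard-coded list.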

Exercise 2: Work with dictionaries

Repeat Exercise 1, but this time create a dictionary from all the lines that contain exactly five floats. The key to the dictionary should be the line number in the file, and the value should be a list of five floats. Hint: Use "enumerate" to loop through the lines in the file to get both the lines and the line numbers at the same time. Print the dictionary at the end of the script.
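As a small illustration of the "enumerate" hint, the sketch below builds a dictionary keyed by position. The list contents and the names lines and line_dict are invented for the example, and python 3 is assumed.

```python
# hypothetical stand-in for the lines read from a file
lines = ['alpha', 'beta', 'gamma']

line_dict = {}
# enumerate yields (index, item) pairs; the index starts at 0 by default
for line_number, line in enumerate(lines):
    line_dict[line_number] = line.upper()

print(line_dict)  # prints {0: 'ALPHA', 1: 'BETA', 2: 'GAMMA'}
```

If you would rather count lines from 1, enumerate also accepts a start value, as in enumerate(lines, 1).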

Exercise 3: Work with classes

This will be similar to the exercise above, except this time you will create an object for each line in the file. Call your class "DataLine", and make it have two attributes: lineNumber and floatList. As you parse each line, create a new DataLine object, and set lineNumber to its line number and floatList to the list of five floats. Then add each new DataLine object to a list. At the end of the script, print the list of DataLine objects.

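Then improve your class by giving it a __str__ method that returns a more readable description of the object than the default shown above. You will need to use format characters like %s. Because printing a list shows each element's __repr__ rather than its __str__, either print the objects one at a time or also define __repr__. At the end of the script, print the DataLine objects again - they should look better now.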

Finally, add a comparison method to your class that compares two objects. In python 2 this is __cmp__; python 3 ignores __cmp__, so define __lt__ there instead. Objects should be compared based on the fifth number in the list. Then, before you print your list, sort it. When you print it, check that the fifth number increases from one object to the next.
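The sketch below shows the same ideas on a different, made-up class so that the exercise is still yours to write. It assumes python 3, where sorting relies on __lt__; the class name Measurement and its attributes are hypothetical.

```python
class Measurement(object):
    """Hypothetical example class holding a label and a list of numbers."""

    def __init__(self, label, values):
        self.label = label
        self.values = values

    def __str__(self):
        # readable description built with format characters
        return 'Measurement %s: %s' % (self.label, self.values)

    # printing a list shows each element's __repr__, so reuse __str__ for it
    __repr__ = __str__

    def __lt__(self, other):
        # order objects by the last number in their list
        return self.values[-1] < other.values[-1]


items = [Measurement('a', [1.0, 9.0]), Measurement('b', [2.0, 3.0])]
items.sort()
print(items)  # prints [Measurement b: [2.0, 3.0], Measurement a: [1.0, 9.0]]
```

An alternative that needs no comparison method at all is to pass a key function to sort, for example items.sort(key=lambda m: m.values[-1]).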

Exercise 4: Parse and analyze a text data file

In this exercise you will create a python module containing a class that parses a text file containing columns of numeric data. This class's methods will analyze the file's data in various ways. You will then write a second test script that calls that class's methods. Be sure to follow the recommended documentation standard in your code. If you get stuck, ask for help from a mentor - don't spend too long spinning your wheels.

  1. Create a python file (module) called DataAnalysis.py using Eclipse.

  2. Create a class called TextDataAnalysis in DataAnalysis.py. The init method will take two arguments: a file path, and a delimiter string. The delimiter is the type of character that separates columns (such as space or comma). The init method will parse the text file, and store the result in a python dictionary. The key to the dictionary will be the column number, and the value will be a list of floats. Any line in the text file that begins with a non-number should be ignored. The class should dynamically figure out the number of columns in the text file. If a line begins with a float, but a later column contains a non-float, just that column should be skipped. The rest of the columns in that line should still be included if possible.

  3. Write a method in the TextDataAnalysis class called sort. The one input argument is the column number, and it returns a sorted list of floats found in that column.

  4. Write two methods in the TextDataAnalysis class called average and median. For each, the one input argument is the column number, and each one returns a scalar (either the average or median of that column's values).

  5. Write a separate python script called testDataAnalysis.py. It should parse the file /home/jupiter/brideout/public_html/test.txt, which is space delimited. It should then print out the results of each class method for column 3. It should then do the same for the file /home/jupiter/brideout/public_html/test2.txt.

  6. Modify TextDataAnalysis so that it accepts either urls or file paths. Rerun testDataAnalysis.py for the urls http://www.haystack.mit.edu/~brideout/test.txt and http://www.haystack.mit.edu/~brideout/test2.txt.

  7. Go over final code with Bill Rideout.
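Two of the building blocks used in the steps above can be sketched as standalone functions. This is a rough illustration under stated assumptions, not the expected TextDataAnalysis design: it assumes python 3 (where urlopen lives in urllib.request), assumes the remote text is UTF-8, and the function names read_lines, average, and median are chosen here only for the example.

```python
from urllib.request import urlopen


def read_lines(source):
    """Return the text lines at source, which may be a url or a local file path."""
    if source.startswith(('http://', 'https://')):
        # treat anything that looks like a web address as a url
        with urlopen(source) as response:
            text = response.read().decode('utf-8')
    else:
        with open(source) as handle:
            text = handle.read()
    return text.splitlines()


def average(values):
    """Return the arithmetic mean of a non-empty list of floats."""
    return sum(values) / float(len(values))


def median(values):
    """Return the median of a non-empty list of floats."""
    ordered = sorted(values)
    middle = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[middle]
    # even count: average the two middle values
    return (ordered[middle - 1] + ordered[middle]) / 2.0
```

Inside your class, logic like this would operate on the list of floats stored for the requested column number.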
