Accessing the Web with Python

2016, Feb 23    

Introduction

As a sophomore, it is difficult to get into all of the classes you want. Registration time is determined by class standing, so all of the interesting classes are filled by the time sophomores are able to register. Most classes have a waitlist, but it does not open until weeks after registration opens. During the initial phase of registration, students often enroll in a class and then drop it later. Instead of checking the catalog every five minutes to see if a spot had become available, I wrote a program to check for me. The program goes to the course catalog and sends me an email when someone drops a class.

Requirements of the Project

  • Given a list of classes, the program will check the number of available spots.
  • If a spot becomes available, then the program will send an email notification.
  • The program should be able to run 24/7.

Design

The first step was to create a program that would be able to fetch the correct webpage from Oregon State University’s course catalog. Each course has its own webpage with a table of class times and availability. I started by researching a Python library that would be able to fetch HTML code from the internet. The library I found was urllib2. On the course page there are many inputs that allow a user to filter the information being displayed. The form used GET requests, so I could append data fields to the URL. This was beneficial because less data had to be downloaded and the relevant information could be found faster.
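The GET-based filtering described above can be sketched as follows. This is only an illustration, not the project's actual code: the base URL and the field names (`subjectcode`, `coursenumber`, `term`) are placeholders, since the real catalog's parameters are not listed here. It also uses Python 3, where the functionality of urllib2 lives in `urllib.request`.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical base URL and field names; the real catalog's form may differ.
BASE_URL = "https://catalog.example.edu/CourseDetail.aspx"


def build_course_url(subject, number, term):
    """Return the catalog URL with the filter fields appended as a query string."""
    query = urlencode({"subjectcode": subject, "coursenumber": number, "term": term})
    return BASE_URL + "?" + query


def fetch_course_page(subject, number, term, timeout=10):
    """Download the filtered course page and return its HTML as text."""
    url = build_course_url(subject, number, term)
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

Building the URL in its own function keeps the network call separate, so the query string can be checked without downloading anything.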

Once the HTML file was downloaded, the program searched for specific HTML tags to locate the needed information. The table that contains the class information had a unique ID that was the same for every course, so the start of the table was easy to find. Once all of the HTML elements except for the table were removed, I could use the table row <tr> and table data <td> tags to parse all of the data. The program then stored all of the relevant info.
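One possible way to do this kind of table extraction with only the standard library is shown below. It is a sketch under assumptions: it uses Python 3's `html.parser` rather than manual string searching, and the table ID passed in (`"sections"` in the usage note) is a made-up example, not the catalog's real ID.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td> cell, row by row, inside one table."""

    def __init__(self, table_id):
        super().__init__()
        self.table_id = table_id
        self.depth = 0        # 0 = outside the target table
        self.in_cell = False  # True while inside a <td> of the target table
        self.rows = []

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            if self.depth > 0:
                self.depth += 1  # nested table inside the target
            elif dict(attrs).get("id") == self.table_id:
                self.depth = 1   # found the table with the wanted ID
        elif self.depth > 0 and tag == "tr":
            self.rows.append([])
        elif self.depth > 0 and tag == "td" and self.rows:
            self.in_cell = True
            self.rows[-1].append("")

    def handle_endtag(self, tag):
        if tag == "table" and self.depth > 0:
            self.depth -= 1
        elif tag == "td":
            self.in_cell = False

    def handle_data(self, data):
        if self.in_cell:
            self.rows[-1][-1] += data.strip()


def parse_table(html, table_id):
    """Return the target table as a list of rows, each a list of cell strings."""
    parser = TableRowParser(table_id)
    parser.feed(html)
    # Rows made only of <th> header cells come out empty and are dropped.
    return [row for row in parser.rows if row]
```

Calling `parse_table(html, "sections")` on a downloaded page would give one list per section, from which the CRN and the seat count can be picked out by column position.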

example catalog page

Figure: The webpage for GEO 333. Each section has a unique CRN.

The next step for the program was to be able to remember the number of available seats from the last time it ran. I thought the best solution would be to create a text document for each class that is located in the same folder as the program. The program would be able to read the number of seats available and compare it to the value the program got from the current webpage.
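The read-and-compare step could look something like the sketch below. The file naming and the rule for what counts as "a spot opened" (the count is higher than it was on the previous run) are assumptions made for illustration.

```python
from pathlib import Path


def read_last_seats(path):
    """Return the seat count saved by the previous run, or None if there is none."""
    try:
        return int(Path(path).read_text().strip())
    except (FileNotFoundError, ValueError):
        return None


def seat_opened(path, current_seats):
    """Save the current count and report whether it rose since the last run."""
    previous = read_last_seats(path)
    Path(path).write_text(str(current_seats))
    # The first run has nothing to compare against, so it never triggers an alert.
    return previous is not None and current_seats > previous
```

With one file per class, a call such as `seat_opened("GEO333.txt", 2)` returns True only when the stored value was lower than 2.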

If the program determined that a spot had opened in a class, it needed a way to communicate that information to me. I am not always at my computer, but I usually have my phone with me. I used MIMEText, from Python's standard email package, to build the email the program sends to me. This was an efficient way to send the notification because I can receive email on my phone, and the library is free to use.
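A notification along these lines might be put together as shown below. This is a sketch, not the project's code: the addresses, the wording of the message, and the use of `smtplib` with a mail server on `localhost` are all assumptions.

```python
import smtplib
from email.mime.text import MIMEText


def build_alert(course, crn, seats, sender, recipient):
    """Build a plain-text email announcing the open seat."""
    message = MIMEText(f"{course} (CRN {crn}) now has {seats} open seat(s).")
    message["Subject"] = f"Seat available in {course}"
    message["From"] = sender
    message["To"] = recipient
    return message


def send_alert(message, host="localhost", port=25):
    """Hand the message to an SMTP server; host and port are placeholders."""
    with smtplib.SMTP(host, port) as server:
        server.send_message(message)
```

Keeping the message construction apart from the sending makes it possible to inspect the subject and body without a mail server.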

program flow diagram

Figure: Flow chart of the program

Once I had verified that the program could read and store the number of available seats, I moved to the next step. As a student, I was given space on the school’s server. I set up a cron job on the server to run the Python program every 5 minutes. Testing was difficult with this project because it was hard to write unit tests that would show the program worked in different scenarios. I could not control when a class would fill up or when someone would drop the class. To test for these scenarios, I had to check the catalog by hand and compare the expected result with what the program was returning.

Final Thoughts

The next step for the project would be to modify the code so the program sends text messages instead of emails. This would give a more immediate notification.

https://github.com/mdennis070/python_web_search