There is no single repository for data related to the Athena Computing Environment. We have been collecting usage data for quite some time, but it has not found its way to the appropriate audience. There is also a desire to know not just how many users are using Athena, but what they are using it for. While we can produce numerous anecdotes, the plural of anecdote is not data. Any data collected needs to be presented in an easily accessible format that accurately represents how the MIT Community uses Athena.
We will need to collect the following data:
We currently receive a monthly count of the number of active machines, broken down by platform type. This is posted monthly in sysd_stats.
We currently receive a count of the number of logins, broken down by quickstations, cluster machines, other machines, and dialups. Unique login counts are also available. This is posted weekly in sysd_stats.
We also have data from a script which polls machines via busyd every 5 minutes and thus gets login session duration (accurate to +/- 5 minutes). That data lives on skywest in a SQL db.
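As a rough illustration of how session durations could be derived from 5-minute polls, the sketch below merges consecutive sightings of the same user on the same machine into one session. The tuple layout and the function name estimate_sessions are hypothetical. They do not reflect the actual polling script or the schema of the database on skywest.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Assumption: polls happen every 5 minutes, as described above.
POLL_INTERVAL = timedelta(minutes=5)


def estimate_sessions(observations, poll_interval=POLL_INTERVAL):
    """Group poll sightings into login sessions.

    observations: iterable of (timestamp, host, user) tuples, one per
    poll in which the user was seen logged in (hypothetical layout).
    Returns a list of (host, user, first_seen, last_seen) tuples.
    """
    sightings = defaultdict(list)
    for timestamp, host, user in observations:
        sightings[(host, user)].append(timestamp)

    sessions = []
    for (host, user), stamps in sightings.items():
        stamps.sort()
        start = previous = stamps[0]
        for stamp in stamps[1:]:
            # A gap longer than one poll interval means the user was
            # not seen at an intermediate poll, so a new session begins.
            if stamp - previous > poll_interval:
                sessions.append((host, user, start, previous))
                start = stamp
            previous = stamp
        sessions.append((host, user, start, previous))

    # Sort by start time, then host and user, for stable output.
    return sorted(sessions, key=lambda s: (s[2], s[0], s[1]))
```

The observed duration is last_seen minus first_seen, so a session seen in only one poll comes out with zero observed duration. That is consistent with the +/- 5 minute accuracy noted above.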
For slw-wrapped applications, we have access to logs which tell us when an application was used, for how long, and the hostname of the machine on which it was run. The hostname of the machine can be used to determine whether or not it's a cluster machine. We know that certain wrapped applications provide incorrect duration information, but we can take that into account when compiling the data. Even without duration information, unique launches are still relevant.
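A minimal sketch of how such logs might be summarized is shown below. It assumes each log entry has already been parsed into an (application, hostname, duration in seconds) record and that a list of cluster hostnames is available. Both the record layout and the function name summarize_launches are hypothetical, since the actual slw log format is not described here.

```python
from collections import Counter


def summarize_launches(records, cluster_hosts, unreliable_apps=frozenset()):
    """Summarize wrapped-application usage.

    records: iterable of (app, hostname, duration_seconds) tuples
        (hypothetical layout; duration_seconds may be None).
    cluster_hosts: set of hostnames known to be cluster machines.
    unreliable_apps: apps whose reported durations should be ignored.
    Returns (launches, cluster_launches, total_seconds) as Counters
    keyed by application name.
    """
    launches = Counter()
    cluster_launches = Counter()
    total_seconds = Counter()

    for app, hostname, duration_seconds in records:
        # Every record counts as a launch, even without a usable duration.
        launches[app] += 1
        if hostname in cluster_hosts:
            cluster_launches[app] += 1
        # Skip durations from apps known to report them incorrectly.
        if app not in unreliable_apps and duration_seconds is not None:
            total_seconds[app] += duration_seconds

    return launches, cluster_launches, total_seconds
```

Keeping launch counts separate from durations means an application with bad duration data still shows up in the launch totals.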
For locally installed applications, we have no technical solution in place. Work is ongoing (http://debathena.mit.edu/trac/ticket/340) to gather information on the applications used during the login session, but there are privacy concerns to be addressed.
Note that usage statistics will be skewed once fall term starts and Debathena is used in earnest. For example, users launching OpenOffice from the panel or by typing ooffice will get the locally installed version, and users explicitly using athrun or add -f will get the wrapped version in AFS.
While data at the weekly level may be relevant, there are enough fluctuations that data should likely be presented at the monthly level, possibly with quarterly summaries. We should keep in mind that usage declines somewhat during IAP, and declines significantly in the summer.
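One possible way to roll weekly counts up to monthly and quarterly figures is sketched below. The input layout, the function names, the choice to attribute each week to the month containing its start date, and the use of calendar quarters rather than academic terms are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date


def monthly_totals(weekly_counts):
    """Sum weekly counts into (year, month) buckets.

    weekly_counts: iterable of (week_start, count) pairs, where
    week_start is a datetime.date (hypothetical layout). Each week is
    attributed to the month containing its start date.
    """
    totals = defaultdict(int)
    for week_start, count in weekly_counts:
        totals[(week_start.year, week_start.month)] += count
    return dict(totals)


def quarterly_totals(monthly):
    """Sum (year, month) totals into (year, quarter) buckets.

    Assumption: calendar quarters (Jan-Mar is quarter 1, and so on).
    """
    totals = defaultdict(int)
    for (year, month), count in monthly.items():
        quarter = (month - 1) // 3 + 1
        totals[(year, quarter)] += count
    return dict(totals)
```

This kind of summing is only appropriate for plain login counts. Weekly unique login counts cannot be added together, since a user active in two weeks would be counted twice. Monthly unique figures would have to be recomputed from the underlying data.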