
A. Web Logs from Itinfo

The itinfo web pages have the MIT counter URL in them, but that counter truncates everything after the '?' in the URL, which is where the identifying page information is, so we can't use the counter data to get topic info.  Happily, itinfo.mit.edu runs its own Apache web server and thus generates httpd log files.  We split these into monthly chunks and run each chunk through a web log analyzer to get hits by URL.  Hits by URL are fed to a topics-from-urls spreadsheet engine that assigns a topic keyword to each URL and then sums the hits per keyword.  The URLs we look at are only from users with an 18.* address -- I don't want to have to weed out the lurkers from overseas and the search engines, which generate huge numbers of hits.  By keeping the data set campus-only, we keep year-to-year comparisons as meaningful as possible.

  1. Use SecureCRT to telnet to itinfo.mit.edu; log in as 'root'.
  2. cd /var/log/httpd
  3. grep Jul/2006 access_log > itinfo-2006-07.txt    # (this is by way of example obviously)
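Step 3 can be repeated for a whole run of months with a small loop. A sketch, using the same file naming as the example above (adjust the year and starting month as needed):

```shell
# Cut access_log into one chunk per month, July through December 2006,
# writing itinfo-2006-07.txt ... itinfo-2006-12.txt as in step 3.
cd /var/log/httpd
i=7
for m in Jul Aug Sep Oct Nov Dec; do
    grep "$m/2006" access_log > "$(printf 'itinfo-2006-%02d.txt' "$i")"
    i=$((i + 1))
done
```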

Copies of these log files are simply left on the server, since disk space there doesn't seem to be an issue.  (Log files are running about 100 MB each right now; compression should probably be applied to the older ones.)
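If we do start compressing the older chunks, one way to do it would be the following (a suggestion, not current practice; the 90-day cutoff is arbitrary):

```shell
# Gzip any monthly chunk older than ~90 days, leaving recent ones
# readable; gzip renames each file in place with a .gz suffix.
cd /var/log/httpd
find . -maxdepth 1 -name 'itinfo-*.txt' -mtime +90 -exec gzip {} +
```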
Transfer the chunks to the PC running the web log analyzer using SecureFX.  The analyzer we currently use is WebLog Expert.

WebLog Expert is set up to look at log files in this directory:

  • c:\projects\dashboard\publishing\itinfo-web-logs\fy2007