A. Web Logs from Itinfo

The itinfo web pages have the MIT counter URL in them, but that counter truncates everything after the '?' in the URL, and that is where the identifying page information is.  So we can't use the counter data to get topic info.  Happily, Itinfo.mit.edu runs its own Apache web server, and thus generates httpd log files.  We scissor these apart into monthly chunks and run each chunk through a web log analyzer to get hits by URL.  The hits-by-URL report is fed to a topics-from-URLs spreadsheet engine that assigns a topic keyword to each URL and then totals the hits per keyword.   The URLs we look at are only from users with an 18.* address -- I don't want to have to weed out the lurkers from overseas and the search engines, which generate huge numbers of hits.  By keeping the data set campus-only we keep year-to-year comparisons as meaningful as possible.
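The campus-only filtering and hits-by-URL counting described above can be sketched in a few lines of Python.  This is an illustrative sketch, not the actual analyzer: it assumes a standard Apache combined-format access log, and the sample lines are made up.

```python
import re
from collections import Counter

# Matches the client address and requested URL in an Apache
# combined-format log line (a simplification for illustration).
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "(?:GET|POST|HEAD) (\S+)')

def campus_hits_by_url(lines):
    """Count hits per URL, keeping only requests from 18.* addresses."""
    hits = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if not m:
            continue
        addr, url = m.groups()
        # Campus-only: skip overseas lurkers and search-engine crawlers
        if addr.startswith("18."):
            hits[url] += 1
    return hits

# Hypothetical sample lines -- the second is a crawler and is dropped.
sample = [
    '18.1.2.3 - - [01/Jul/2006:10:00:00 -0400] "GET /topics/email.html HTTP/1.1" 200 512',
    '66.249.0.1 - - [01/Jul/2006:10:00:01 -0400] "GET /topics/email.html HTTP/1.1" 200 128',
    '18.4.5.6 - - [01/Jul/2006:10:00:02 -0400] "GET /topics/printing.html HTTP/1.1" 200 512',
]
print(campus_hits_by_url(sample))
```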

...

  • c:\projects\dashboard\publishing\itinfo-web-logs\fy2007

The log file analyzing script has some settings that are important:

  • look at addresses in 18.* only.
  • show 10000 pages in the "most popular" section of the report, sorted from most hits to least.  (10000 may be the max allowed)
  • export the report in CSV format, for ready inclusion in the topic-tagging spreadsheet engine.
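The topic-tagging step that consumes the CSV export can be sketched as follows.  This is a hypothetical illustration of the spreadsheet engine's logic, not the engine itself: the CSV column names (`url`, `hits`) and the URL-to-keyword table are assumptions.

```python
import csv
import io
from collections import Counter

# Illustrative URL-to-topic table; the real spreadsheet engine's
# mapping is much larger.
TOPIC_BY_URL = {
    "/topics/email.html": "email",
    "/topics/printing.html": "printing",
}

def hits_by_topic(csv_text):
    """Assign a topic keyword to each URL and total the hits per keyword."""
    totals = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        topic = TOPIC_BY_URL.get(row["url"], "other")
        totals[topic] += int(row["hits"])
    return totals

# Hypothetical CSV export from the log analyzer.
export = "url,hits\n/topics/email.html,120\n/topics/printing.html,45\n/unknown.html,7\n"
print(hits_by_topic(export))
```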