This is a list of steps (as of May 23, 2007) to transfer current IS&T web pages into Alfresco:

  •  wget http://web.mit.edu/ist/
  • Copy only the ist directory to a new ist project directory for processing, but keep the other directories wget found for link-management research (to see which MIT pages outside IS&T link into current IS&T pages)
  • Find and delete files with ? or & in their names (Alfresco chokes on these characters)
  • Delete "dontindex" folders (probably not going to migrate these)
  • Search and replace "http://web.mit.edu/ist/" with "/" in internal links (the Alfresco repository requires links to be relative to ROOT for virtualization)
  • Search and replace "http://web.mit.edu/is/" with "/" (an alias folder for web.mit.edu/ist/, kept for historical reasons)
  • Create empty src/templates/ branch template directories and add them to the ist folder (take the current ist directories and empty them, then copy them into the src/templates/whatever_template dir)
  • Run the script (istSiteXmler.cgi) over the site to find html, htm, and shtml IS&T pages, create xml for each template, and place it in the appropriate src branch
  • Run further scripts (checkNoGoFurther.cgi and chkforReDirbluebox.cgi) to find blue-box, redirect, and archived files among the pages that did not translate easily in the previous step (these scripts will be folded into istSiteXmler shortly, which will eliminate this step)
  • Delete the html associated with each xml file (this step may not be necessary in Alfresco 2.0.1+)
  • Make sure appropriate templates are in the Alfresco Data Dictionary
  • WAR up and bulk upload to Alfresco (if creating a new Alfresco web project), or transfer via CIFS to the Content Manager Sandbox (remember to submit to Staging later)
  • Use Steve's app to attach aspects to xml
  • Hand transfer the remaining recalcitrant pages (those not translated to xml by the script) to the appropriate template xml (it is easiest to use the Alfresco x-forms interface for this, as it results in properly escaped characters)
  • Use the R button to regenerate html from xml (by directory in the src branch)
  • Submit to Staging (if regenerated in Content Manager Sandbox)
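
The cleanup steps above (deleting files with ? or & in their names, deleting "dontindex" folders, and replacing the absolute IS&T prefixes with "/") could be scripted roughly as follows. This is only a sketch, not the scripts actually used for the migration: the function names are hypothetical, and it assumes the wget mirror sits in a local directory and that only html, htm, and shtml files need their links rewritten. Files are handled as bytes so that pages in an unknown encoding are not altered beyond the replaced prefixes.

```python
import shutil
from pathlib import Path

# Characters in file names that the steps above say Alfresco cannot handle.
BAD_CHARS = ("?", "&")
# Absolute prefixes to turn into root-relative links (taken from the steps above).
ABSOLUTE_PREFIXES = (b"http://web.mit.edu/ist/", b"http://web.mit.edu/is/")
# Assumption: only these page types carry links that need rewriting.
PAGE_SUFFIXES = {".html", ".htm", ".shtml"}


def delete_bad_names(root: Path) -> list[Path]:
    """Delete files whose names contain ? or & and return the removed paths."""
    removed = []
    # sorted() materializes the listing before anything is deleted.
    for path in sorted(root.rglob("*")):
        if path.is_file() and any(ch in path.name for ch in BAD_CHARS):
            path.unlink()
            removed.append(path)
    return removed


def delete_dontindex_dirs(root: Path) -> list[Path]:
    """Remove every directory named 'dontindex' and return the removed paths."""
    removed = []
    for path in sorted(root.rglob("dontindex")):
        # A nested match may already be gone with its parent; is_dir() is then False.
        if path.is_dir():
            shutil.rmtree(path)
            removed.append(path)
    return removed


def rewrite_links(data: bytes) -> bytes:
    """Replace the absolute IS&T prefixes with '/'."""
    for prefix in ABSOLUTE_PREFIXES:
        data = data.replace(prefix, b"/")
    return data


def rewrite_tree(root: Path) -> int:
    """Rewrite links in every page under root; return how many files changed."""
    changed = 0
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix.lower() in PAGE_SUFFIXES:
            original = path.read_bytes()
            updated = rewrite_links(original)
            if updated != original:
                path.write_bytes(updated)
                changed += 1
    return changed
```

A run would call delete_bad_names, delete_dontindex_dirs, and rewrite_tree in that order on the copied ist directory (for example Path("ist")), working on the copy so the original wget output stays available for the link research.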

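The step that creates empty template directories (copying the ist directory structure without its files) might look like the sketch below. The function name copy_empty_tree is hypothetical, as is the destination path in the usage note; the real template directory names are not given here.

```python
from pathlib import Path


def copy_empty_tree(source: Path, dest: Path) -> int:
    """Mirror the directory structure of source under dest, without any files.

    Returns the number of directories mirrored, including the root.
    """
    # Build the full list first so directories created below are never re-walked.
    dirs = [source] + sorted(p for p in source.rglob("*") if p.is_dir())
    mirrored = 0
    for directory in dirs:
        # Skip the destination itself if it happens to live inside source.
        if directory == dest or dest in directory.parents:
            continue
        (dest / directory.relative_to(source)).mkdir(parents=True, exist_ok=True)
        mirrored += 1
    return mirrored
```

For example, copy_empty_tree(Path("ist"), Path("src/templates/whatever_template")) would produce an empty skeleton of the ist tree under that template directory.
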
istSiteXMLer.cgi can be found in SVN under the aphickey user folder.

A diagram describing this process can be found as an attachment to this document, or directly here: https://wikis.mit.edu/confluence/download/attachments/31137/MapOlists.pdf
