If you performed physical imaging (in other words, created a disk image), there are a few tools you can use to extract the files from the disk images. The main tool for extracting disk images is CCA disk processor. HFS Explorer is used as a backup or to confirm that you created an image correctly. HxC Floppy Emulator is primarily a backup for extracting floppy disk images. If all of these fail, consult with the digital archivist to try other methods.

All of these tools except HxC Floppy Emulator run on BitCurator, a customized Linux operating system that we run on a virtual machine through Windows. To access BitCurator, do the following steps:

  1. Click on the Oracle Virtual Box shortcut on the Windows desktop screen

  2. Virtual Box will open. Click on the BitCurator virtual machine with the latest version number after it.

  3. Wait for the operating system to load. Once you see a screen with a blue background, BitCurator is ready to use.

CCA disk processor

The main tool for extracting disk images is CCA disk processor. It is used on all disk images when possible. It extracts the files, and you can choose whether to retain the disk image. It also creates a number of reports that are used in surveying the collection. Archivematica is not the best tool for extracting and analyzing files within a disk image. We also want to know something about the files we are keeping, and disk images can be opaque. For those reasons we do some processing prior to Archivematica. The Processing tab of this tool has the added advantage of creating a package that Archivematica can understand.

Processing tab

  1. Open the “CCA tools” folder on the BitCurator desktop screen and click the “Disk Processor” icon.
  2. Choose the Processing tab.

  3. For the Source, select the folder where disk image(s) are stored.

  4. For the Destination, create a folder within the folder where the disk image(s) are stored and give it the same name as the parent folder. You can move the folder once the package is created.

  5. Unless you have additional information to add to the metadata/submissionDocumentation folder, select the bag SIPs option.
  6. If you believe that you may not end up keeping the disk image, choose the “Make SIPs from carved files only (no disk image)” option.
  7. Do not select the run bulk_extractor option. (This looks for PII and other restrictions; we will do this later with another tool.)
  8. Click the Start processing button.
  9. If you did not select the bagging option, add your submissionDocumentation to the metadata/submissionDocumentation folder within the appropriate SIP.
  10. If you have not already looked at the reports generated (i.e., you did not use the Analysis tab before), proceed to the Appraisal section for help with appraisal of the items.
  11. If you have already appraised, proceed to the access restrictions section.
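
The folder layout described in step 4 can also be prepared ahead of time. The following is a minimal Python sketch, not part of CCA tools; the function name `make_destination` is hypothetical and is used here only for illustration.

```python
from pathlib import Path


def make_destination(source_dir):
    """Create a folder inside source_dir that has the same name as source_dir.

    Hypothetical helper for illustration; it is not part of CCA tools.
    """
    source = Path(source_dir).resolve()
    destination = source / source.name
    # exist_ok=False raises FileExistsError rather than silently reusing
    # a folder left over from an earlier run.
    destination.mkdir(exist_ok=False)
    return destination
```

For a Source folder named `disk_images`, this creates `disk_images/disk_images`, which you would then select as the Destination.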

Analysis tab

The Analysis tab lets you analyze your files to better understand their contents and to support appraisal, producing the same reports as the Processing tab. In general we do not use the Analysis tab; instead we use the output from the Processing tab for both analysis and extraction (re-running it if necessary).

HFS Explorer

HFS Explorer allows you to mount and export files from HFS disks (older Macs) individually. It is a backup to CCA tools disk processor, or a way to check whether disk imaging was successful. See the documentation on the BitCurator Consortium's wiki for how to use the tool. The ability to open the disk image and export a file is sufficient to determine that imaging was successful.

  1. If you are running this tool to extract the files, go to the CCA tools SIP creator section to run that on the extracted files and the disk image (if you want to keep it). Once that is complete you will proceed to the appraisal section or the reviewing for restrictions section (if you have reason to believe there could be some).

HxC Floppy Emulator

The HxC Floppy Emulator is a tool that allows you to open floppy disk images, browse some types, export files, and convert disk image formats. It also allows you to visualize the tracks of a disk image to see where the errors and data are. If CCA disk processor has failed or you have .raw disk images that you couldn't extract files from, you can load the raw files and try to extract files created on various DOS systems.

  1. In Windows, click the HxC Floppy Emulator on the desktop.
  2. First click Load and select the image from its file location.
  3. Then select Disk Browser to view the contents of a DOS-formatted disk.
  4. You can then use Ctrl-A to select all the files and click the "Get Files" button to extract the files to your processing folder.

    Note

    More advanced users may also try the “Track Analyzer” option, which will allow you to look at a graphical representation of the tracks and the disk as well as a hex editor view of the contents. You can select track mode, which will show the individual tracks close up, or disk mode, which will show you an overall view of a disk.

  5. If you are running this tool to extract the files, go to the CCA tools SIP creator section to run that on the extracted files and the disk image (if you want to keep it). Once that is complete you will proceed to the appraisal section or the reviewing for restrictions section (if you have reason to believe there could be some).

itstar

itstar is a tool for extracting files from ITS backup tape images. This tool is used with the Tapes of Tech Square (ToTS) collection to extract files.

Background

 Tapes are structured into "files", and the files are divided into "records".  The .tap files preserve this structure with some metadata, which encodes the size of each record.

 Each record is stored like this inside the .tap file:

   <record size, four bytes>

   <data, "size" number of bytes>

   <record size again, four bytes> 

 A tape "file" is a series of records, ending with a "tape mark".  The mark is encoded as an empty record, which is just <record size = 0>.

After this comes the next file.  A tape is ended with two consecutive tape marks.  (BUT! this is just a software convention; there may be more data after this.)
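
To make the record layout concrete, here is a minimal Python sketch that walks a .tap image and groups records into tape files. It follows only the layout described above (size, data, size again, with a zero size as the tape mark) and is not how itstar itself is implemented. The function name `read_tape_files` and the `big_endian` parameter are hypothetical, and the sketch assumes no padding bytes after the data, so images that use a padded or otherwise different variant will need adjusting.

```python
import io
import struct


def read_tape_files(stream, big_endian=False):
    """Return a list of tape files, each a list of record payloads (bytes).

    Illustrative sketch only: assumes <size><data><size> with no padding.
    """
    size_format = ">I" if big_endian else "<I"
    tape_files = []
    current = []
    while True:
        header = stream.read(4)
        if len(header) < 4:
            break  # ran out of image data
        (size,) = struct.unpack(size_format, header)
        if size == 0:
            # A tape mark closes the current tape file. Two marks in a row
            # therefore produce an empty tape file, the conventional end of
            # tape; we keep reading because more data may follow.
            tape_files.append(current)
            current = []
            continue
        data = stream.read(size)
        trailer = stream.read(4)
        if len(data) < size or trailer != header:
            raise ValueError("record is truncated or its two sizes disagree")
        current.append(data)
    if current:
        # Records with no closing tape mark are still returned.
        tape_files.append(current)
    return tape_files


# Usage with an in-memory image: one record, then two tape marks.
size = struct.pack("<I", 4)
image = size + b"DATA" + size + struct.pack("<I", 0) * 2
print(read_tape_files(io.BytesIO(image)))  # [[b'DATA'], []]
```

A `ValueError` from this sketch is a hint that the image uses a different record-size byte order or a different variant of the format.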

Note, a "tape file" doesn't have to correspond to the files we see stored on the tape.  An operating system or backup software can read and write tape files and records any way it sees fit.  For example, a tape could be a single "tape file" but store several "software files".  Or it could be many "tape files" but store a single "software file".  ITS does store its "software files" as "tape files", with records limited to 1024 36-bit words, which are either 5120 (9-track) or 6144 (7-track) bytes.
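
The two record sizes follow from simple arithmetic, sketched below in Python. The sketch assumes that each 36-bit word occupies a whole number of tape bytes, with 8 data bits per byte on 9-track tape and 6 data bits per byte on 7-track tape; the function name `record_bytes` is hypothetical.

```python
def record_bytes(words=1024, word_bits=36, data_bits_per_tape_byte=8):
    """Bytes needed for a record, assuming each word uses whole tape bytes."""
    # Ceiling division: a 36-bit word needs 5 eight-bit bytes or 6 six-bit bytes.
    bytes_per_word = -(-word_bits // data_bits_per_tape_byte)
    return words * bytes_per_word


print(record_bytes(data_bits_per_tape_byte=8))  # 9-track: 5120
print(record_bytes(data_bits_per_tape_byte=6))  # 7-track: 6144
```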

Running the tool

  1. Open up the terminal in BitCurator 
  2. Navigate to the folder containing the itstar program

  3. Construct the command you'll use to extract the files.
    1. The first time(s) we run the command we will use the "t" option to only print to the terminal what files would be extracted by the command. This will help when determining the other options to use.

    2. "f" indicates we are using a tape image file, not reading from a tape itself.

    3. The .tap files are not all encoded the same way, and there's no telling how a particular file is encoded except by either 1) looking at the byte codes inside, or 2) trial and error.  We will use trial and error first (with the "t" option passed for print only).

      1. -E is for using the E-11 tape image format or even-sized tape records

      2. Big or little endian record size. Default is little, pass -B for big.

      3. 9 or 7 track tapes.  9-track stores eight bits per byte.  7-track stores 6 bits plus parity.  Default is 9, pass -7 for the latter.

      4. Two of the most common are "-7B" (7-track tape with big endian record sizes) and just "-E" (9-track little endian with an even-sized tape record).

        # Basic elements of the command
        $ itstar -t[tape encoding options] -C /path/where/you/want/the/extracted/files/to/go -f /path/to/tape/image.tap
        # Example of 7-track tape with big endian record sizes
        $ itstar -t7B -C /path/where/you/want/the/extracted/files/to/go -f /path/to/tape/image.tap
        # 9-track little endian with an even-sized tape record
        $ itstar -tE -C /path/where/you/want/the/extracted/files/to/go -f /path/to/tape/image.tap
      5. Replace the t in the command with x in order to extract the files. For instance: "-xE" instead of "-tE"
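
The trial-and-error step can be scripted. The following Python sketch builds one listing command per option combination and prints each result so you can judge which encoding produces a sensible file list. It uses only the options described above; the function names are hypothetical, and it assumes the single options combine with "t" in the same way as the documented "-t7B" and "-tE" examples. What itstar prints or returns for a wrong guess is not documented here, so the output still needs to be read by a person.

```python
import subprocess

# Option combinations from the list above; the two that the text calls
# most common come first, followed by the defaults and single options.
LISTING_FLAGS = ["-t7B", "-tE", "-t", "-tB", "-t7"]


def listing_commands(itstar, image, destination):
    """Build one print-only command per candidate tape encoding."""
    return [[itstar, flags, "-C", destination, "-f", image]
            for flags in LISTING_FLAGS]


def extraction_command(listing_command):
    """Swap the t for x once a listing looks correct."""
    program, flags, *rest = listing_command
    return [program, "-x" + flags[2:], *rest]


def run_listings(itstar, image, destination):
    """Run every candidate listing and print its output for review."""
    for command in listing_commands(itstar, image, destination):
        result = subprocess.run(command, capture_output=True,
                                text=True, errors="replace")
        print("$", " ".join(command))
        print(result.stdout or result.stderr)
```

For example, `run_listings("./itstar", "/path/to/tape/image.tap", "/path/where/you/want/the/extracted/files/to/go")` prints all five listings; pass the command that looked right to `extraction_command` to get the matching extraction command.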

tapeutils

tapeutils is a set of tools for creating and working with various types of tape images.

read20

We use the read20 program within tapeutils for extracting files from the TOPS-20 tape images that are found within the ToTS dataset.

  1. Open up the terminal in BitCurator 
  2. Navigate to the folder where you want the extracted files to end up (you need to b

  3. Construct the command you'll use to extract the files.
    1. The first time(s) we run the command we will use the "t" option to only print to the terminal what files would be extracted by the command to make sure everything looks correct. 
    2. The "b" option is then used to force the extraction of all files.
    3. The final option is the "f" to specify the image file.

      # Elements of the command
      $ /path/to/tapeutils/read20 -t -b -f /path/to/tape/image.tap


    4. Replace the t in the command with an x in order to extract the files.
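
The same steps can be wrapped in a short Python sketch. It uses only the "t", "x", "b", and "f" options described above and runs read20 from the destination folder, since step 2 has you navigate there first. The function names are hypothetical, and the sketch assumes read20 writes extracted files into the current working directory.

```python
import subprocess


def read20_command(read20, image, extract=False):
    """Build the read20 command: -t lists the files, -x extracts them."""
    mode = "-x" if extract else "-t"
    return [read20, mode, "-b", "-f", image]


def run_read20(read20, image, destination, extract=False):
    """Run read20 from the folder where the extracted files should end up."""
    command = read20_command(read20, image, extract=extract)
    # cwd stands in for navigating to the destination folder in the terminal.
    return subprocess.run(command, cwd=destination, check=False)
```

Call `run_read20` once with the default `extract=False` to review the listing, then again with `extract=True` when everything looks correct.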