Reviewing digital material for restrictions is mostly the same as reviewing physical material, except that we also use software to search the files for restricted content.

Reports from the transfer workflow

During the digital transfer workflow, the CCA tools run bulk_extractor in the background. It searches the text of files in the transfer for PII, social security numbers, and keywords related to MIT restrictions. It does not work on audio, video, images, or some other file types (PDFs of scanned files, etc.). You can review its output using the Bulk Reviewer software (see section below).

Bulk Reviewer requires BitCurator, a customized Linux operating system that we run on a virtual machine through Windows. To access it, do the following:

  1. Click on the Oracle Virtual Box shortcut on the Windows desktop screen

  2. When Virtual Box opens, click on the BitCurator entry with the latest version number after it

  3. Wait for the operating system to load. Once you see a screen with a blue background, BitCurator is ready to use.

Bulk Reviewer (pre-existing reports)

Opening the reports

  1. Click on the Bulk Reviewer shortcut on the BitCurator Desktop.
  2. Select the "Scan new directory or disk image" button
  3. For the name, use the package name, such as 2020_029acc, or, if there are individual packages from one accession with their own reports, use the object id, such as 2020_029_001.
  4. Select the source as Directory from the drop down menu
  5. Click the Choose directory button and choose the directory containing the files
  6. Click the "Use existing bulk_extractor reports" box
  7. Click the Choose directory button and select the directory containing the bulk_extractor reports. This will be within the package under /metadata/submissionDocumentation/bulk_extractor
  8. The other options can be ignored and you can now click Start scan (which in this case means, load the report).
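Before loading the reports, it can help to confirm that they exist and that they contain results. The following is a minimal sketch, not part of the documented workflow. It assumes the reports sit under metadata/submissionDocumentation/bulk_extractor within the package and that bulk_extractor wrote its results as .txt files there. The function names `find_report_dir` and `nonempty_reports` are hypothetical.

```python
from pathlib import Path


def find_report_dir(package_dir):
    """Return the bulk_extractor report directory inside a package, or None.

    Assumption: reports live under
    metadata/submissionDocumentation/bulk_extractor within the package.
    """
    report_dir = (
        Path(package_dir) / "metadata" / "submissionDocumentation" / "bulk_extractor"
    )
    return report_dir if report_dir.is_dir() else None


def nonempty_reports(report_dir):
    """Return the sorted names of non-empty .txt files in the report directory.

    Assumption: an empty file means nothing was found for that report type.
    """
    return sorted(
        p.name for p in Path(report_dir).glob("*.txt") if p.stat().st_size > 0
    )
```

If `find_report_dir` returns None, the package has no pre-existing reports at that location, and the reports would need to be created in Bulk Reviewer instead.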

Reviewing the report

  1. The documentation goes over how to review the report. In general you want to go through the different results looking at their context. Open the files if necessary for further context.
  2. Those files that are false positives should be marked as dismissed.
  3. If there are positives that need restriction, add a note using the pencil button, indicating the type of restriction and a short reason, such as "R-75, student health information."
  4. If you can't complete review of the report immediately, click the Save button to save your progress and return later.
  5. When complete, choose the Download CSV option and save it as bulkreviewer.csv in the submissionDocumentation folder.

    This file lists the items found and what was dismissed and what wasn't as well as the note added explaining a restriction. The csv could be used for further description later.

  6. If you encountered files needing restriction, you should also click Download tar exclude file, keep the default name the system generates, and add that file to the submissionDocumentation folder as well.

    This file will let us exclude the restricted files in bulk in the future when providing access to researchers.
  7. When done there are a couple options depending on the situation:
    1. Continue onto the next group of material with a bulk_extractor report if there are additional ones for the accession.
    2. If completely finished, proceed to the Preparing for Archivematica steps.
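To illustrate how the tar exclude file could be used later, here is a minimal sketch that builds an access copy of a directory while leaving out the restricted files. This page does not describe the contents of the file Bulk Reviewer generates, so the format used here (one path per line, relative to the source directory) is an assumption, and the function names `read_exclusions` and `make_access_copy` are hypothetical.

```python
import tarfile
from pathlib import Path, PurePosixPath


def read_exclusions(exclude_file):
    """Read the exclude list, skipping blank lines.

    Assumption: one path per line, relative to the source directory.
    """
    lines = Path(exclude_file).read_text(encoding="utf-8").splitlines()
    return {line.strip() for line in lines if line.strip()}


def make_access_copy(source_dir, exclude_file, output_tar):
    """Write source_dir to output_tar, leaving out the listed paths."""
    source = Path(source_dir)
    excluded = read_exclusions(exclude_file)

    def skip_restricted(member):
        # Member names start with the top-level folder name; strip it so the
        # remainder can be compared with the paths in the exclude list.
        relative = PurePosixPath(member.name).relative_to(source.name)
        # Returning None from a tarfile filter drops that member.
        return None if str(relative) in excluded else member

    with tarfile.open(output_tar, "w") as archive:
        archive.add(source, arcname=source.name, filter=skip_restricted)
```

If a listed path is a directory, everything beneath it is left out as well, because tarfile does not descend into a directory that the filter drops.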

Archivematica

In addition to Bulk Reviewer, you can choose the "examine contents" option when transferring the material through Archivematica. This covers less, mainly surfacing credit card numbers, social security numbers, telephone numbers, and email addresses through the Archivematica interface. More information can be found in the appraisal tab section of the Archivematica documentation. (link)

Creating and reviewing new reports

If you did not use one of the CCA tools, you can create reports that look for restricted materials by running the scan directly through Bulk Reviewer.

Bulk Reviewer (new reports)
