Reviewing for restrictions in digital material is mostly the same as reviewing physical material. You should take a similar approach to the general review guidelines, except that we also use software to search the files for restrictions. It is important to identify any restricted content, as it determines where we store the content in digital preservation storage and what level of security is applied. If the number of files under review is very small, you might just review the folder and file names, and possibly look at the files themselves, without running the tools below.

During the digital transfer workflow when using CCA tools, a tool called bulk_extractor can be run in the background. It looks for PII, social security numbers, and keywords related to MIT restrictions within the text of files in the transfer. It does not work on audio, video, images, or some other types of files (PDFs of scanned files, etc.). You can review the output of this tool using the Bulk Reviewer software (see the Bulk Reviewer (pre-existing reports) section below). You can also run bulk_extractor through Bulk Reviewer if you did not use one of the CCA tools or chose not to create reports at the time of transfer, as outlined in the Creating new reports section.
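To give a feel for the kind of text matching described above, here is a minimal Python sketch. It is not how bulk_extractor is implemented. The SSN pattern is a simplified assumption, the keywords are hypothetical stand-ins for the real terms in InstArch_regex.txt, and the function name `find_features` is invented for illustration.

```python
import re

# Simplified SSN-like pattern: 3 digits, 2 digits, 4 digits, hyphen separated.
# Assumption for illustration only; bulk_extractor applies its own rules.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

# Hypothetical keywords; the real terms live in InstArch_regex.txt.
KEYWORD_PATTERN = re.compile(r"\b(transcript|medical record)\b", re.IGNORECASE)


def find_features(text, context=20):
    """Return (kind, matched text, surrounding context) for each hit in text."""
    results = []
    for kind, pattern in (("ssn", SSN_PATTERN), ("keyword", KEYWORD_PATTERN)):
        for match in pattern.finditer(text):
            start = max(0, match.start() - context)
            end = min(len(text), match.end() + context)
            results.append((kind, match.group(0), text[start:end]))
    return results
```

The surrounding context returned with each hit mirrors why the review step matters: a string of digits in the right shape is only a candidate, and a person still has to read the context to decide whether it is a false positive.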

Bulk Reviewer requires the use of BitCurator, a customized Linux operating system that we run on a virtual machine through Windows. To access it, do the following steps:

...

  1. The documentation goes over how to review the report. In general, you want to go through the different results looking at their context. Open the files if necessary for further context.
  2. Those files that are false positives should be marked as dismissed.
  3. If there are positive matches that need restriction, add a note using the pencil button next to the feature, indicating the type of restriction and a short reason based on the restriction categories, such as "R-75, student health information."
  4. If you can't complete review of the report immediately, click the Save button to save your progress and return later.
  5. When complete, choose the Download CSV option and save it as bulkreviewer.csv in the submissionDocumentation folder.

    Info

    This file lists the items found, which ones were and were not dismissed, and any notes added explaining a restriction. The CSV could be used for further description later.

  6. If you encountered files needing restriction, you should also click the "Download tar exclude file" option, keep the default name the system generates, and add that file to the submissionDocumentation folder as well.

    Info
    This file will allow us to exclude the restricted files in bulk in the future when providing access to researchers.
  7. When done, there are a couple of options depending on the situation:
    1. Continue on to the next group of material with a bulk_extractor report if there are additional ones for the accession.
    2. If completely finished, proceed to the Preparing for Archivematica steps.
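If the downloaded bulkreviewer.csv is later reused for description, a quick tally of the restriction notes can help. The sketch below is a hedged example only: the column name `note` is an assumption, not confirmed by the Bulk Reviewer export, so check the header row of your actual file and pass the real column name. The function name `summarize_notes` is invented for illustration.

```python
import csv
from collections import Counter


def summarize_notes(csv_path, note_column="note"):
    """Count how many rows carry each non-empty note value.

    note_column is an assumed column name; adjust it to match the
    header of the CSV that Bulk Reviewer actually produced.
    """
    counts = Counter()
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        if note_column not in (reader.fieldnames or []):
            raise KeyError(
                f"column {note_column!r} not found; columns are {reader.fieldnames}"
            )
        for row in reader:
            note = (row[note_column] or "").strip()
            if note:  # skip rows without a restriction note
                counts[note] += 1
    return counts
```

For example, `summarize_notes("submissionDocumentation/bulkreviewer.csv")` would return a count per distinct note, such as how many features were marked "R-75, student health information".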

...

  1. Click on the Bulk Reviewer shortcut on the BitCurator Desktop.
  2. Select the "Scan new directory or disk image" button.
  3. For the name, use the package name, such as 2020_029acc, or if there are individual packages from one accession with their own reports, use the object id, such as 2020_029_001.
  4. Select the source as Directory or Disk Image (depending on the type of files being analyzed) from the drop-down menu.
  5. Click the "Choose directory" button and choose the directory containing the files or disk image.
  6. Under "Regular Expressions File" click the "Choose file" button and select the text file called "InstArch_regex.txt" from the top level of the BitCurator shared folder. This file adds regular expression keyword matching for terms related to our restriction categories, which improves the quality of the search.
  7. Under Social Security Identification Mode, choose Medium.
  8. Do not select either of the scanner options.
  9. Click Start scan.
  10. This may take a little while. Once it is complete, go to the reviewing the report section (link) above.
  11. Once done with reviewing for restrictions, you can move the bulk_extractor reports directory generated by Bulk Reviewer to the submissionDocumentation folder. These reports will be in the "/home/bulk-reviewer" directory of BitCurator, in a subdirectory with the name you gave when creating the reports in Bulk Reviewer (i.e. the transfer package name, such as 2020_029acc or 2020_029_001).
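The final move in the last step can be done by hand in the file manager. If you prefer to script it, the following is a minimal sketch using only the Python standard library. The function name `move_reports` and its parameters are hypothetical, and the paths you pass in are whatever applies on your BitCurator machine.

```python
import shutil
from pathlib import Path


def move_reports(reports_root, package_name, submission_docs):
    """Move the reports subdirectory for one package into submissionDocumentation.

    reports_root: directory where Bulk Reviewer wrote its reports
    package_name: name given when creating the reports (e.g. 2020_029_001)
    submission_docs: path to the package's submissionDocumentation folder
    """
    source = Path(reports_root) / package_name
    if not source.is_dir():
        raise FileNotFoundError(f"no reports directory at {source}")

    target_dir = Path(submission_docs)
    target_dir.mkdir(parents=True, exist_ok=True)

    destination = target_dir / source.name
    if destination.exists():
        # Refuse to overwrite reports that were already moved.
        raise FileExistsError(f"{destination} already exists")

    shutil.move(str(source), str(destination))
    return destination
```

The function refuses to overwrite an existing destination, so running it twice for the same package raises an error instead of silently replacing earlier reports.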