See File Naming Scheme - MIT Libraries for a complete version of these guidelines.

Guidelines for filenames

  1. When creating filename standards for a new collection, the standards should be based on existing collections/objects with similar characteristics.
  2. Portions of the filename should indicate more specific detail as they are read from left to right. That is, the far left portion of the filename should indicate the collection name or home library of the item, the next portion should indicate the subcollection or aggregation, followed by a piece (page/section) number, and ending with the indication of derivative size. (Any of these portions that do not apply to the current file may be omitted.)  See examples below.
  3. Whenever possible, the digital object's "primary" identifier (the identifier appearing in the filenames) should correspond to an identifier in use for the original (physical) object, such as the official or unofficial collection name or Archives collection number. If the format of the primary identifier conflicts with the absolute filename requirements, appropriate changes should be made. If the format of the primary identifier conforms to the absolute filename requirements but violates best practices, it may be left intact.
  4. Page numbers should be padded with leading zeros so that all filenames in a collection have the same number of characters for the page number portion. In most cases, this will be two or three digits.  When determining the number of characters, consider how the collection might grow and the number of loose pieces, foldouts, or other physical elements that may amount to more than one image per page.
  5. Filenames must not include spaces.
  6. The first character of the filename must be an ASCII letter ('a' through 'z' or 'A' through 'Z').
  7. The filename may include only ASCII letters ('a' through 'z' and 'A' through 'Z'), ASCII digits ('0' through '9'), hyphens, underscores, and periods. No other characters are permitted.  
    1. While periods are permissible in filenames, it is highly recommended that they be avoided.
    2. It is preferable that all letters in a filename be lowercase. If a filename includes consecutive human-readable words, they may be denoted by CamelCase (e.g., wnp-04-RoyalSociety-ncn-t123.tif). This is expected to be relatively rare, though.
    3. Distinct portions of the filename should be separated by underscores.  Note that it is reasonable for the "identifier" portion of the filename to retain hyphens in identifiers from external sources, as in ihs-SHMU_01_13-01-05.tif.
  8. Filenames should be limited to 31 characters or fewer (including the period and file extension).  Total path length (directories + filename) should not exceed 256 characters.
  9. The filename must be followed by a single period and a suitable extension to specify the type of file. The extension should consist of three letters (e.g., jpg, txt, xml, tif), but longer extensions are permissible if they are widely used (e.g., html, tiff, djvu, aiff).
  10. A derivative file must have the same name as the master file, except that an indication of the derivative's type should be appended (e.g., "full" or "screen" for images, or an indication of the bitrate for audio files). Derivative files will typically have a different file type, and therefore a different extension, than the master file.  Derivative names are based on Stellar: cp = "class projection" (ideal for Flickr), sv = "screen view" (viewing and printing), tm = "thumbnail".
  11. For derivative files intended primarily for Web display, one consideration for naming is that images may need to be cited by users in order to retrieve other higher-quality versions.  If so, the derivative file name should contain enough descriptive or numerical meaning to allow for easy retrieval of the original or other digital versions.
  12. Directory (folder) names should not include periods.
  13. When a vendor sends files on a hard disk, the top-level folder should be an MIT-generated shipment number.  The next level should be the collection number or Aleph system number, with the volume number if applicable.  This folder will contain the image files, with derivatives interfiled.  The vendor will also provide a checksum.md5 file.
  14. Document the file naming scheme for each project (in a Word document or on a project wiki) and explain the decisions that were made.  If one of these Guidelines is not followed, the change should be well documented, with a description of the reasons for not following the guideline.
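
The absolute requirements in guidelines 5 through 9 are mechanical enough to check automatically. The following is a minimal sketch of such a check, not an official MIT Libraries tool. The function name check_filename is hypothetical, and reading "suitable extension" as "three or more ASCII letters" is an assumption made for illustration. The 256-character path limit and the best-practice recommendations (lowercase letters, avoiding periods) are not checked.

```python
import re

MAX_FILENAME_LENGTH = 31  # guideline 8; counts the period and extension


def check_filename(filename):
    """Return a list of guideline violations; an empty list means none were found."""
    problems = []
    if " " in filename:
        problems.append("guideline 5: contains a space")
    if not re.match(r"[A-Za-z]", filename):
        problems.append("guideline 6: first character is not an ASCII letter")
    if re.search(r"[^A-Za-z0-9._-]", filename):
        problems.append(
            "guideline 7: contains a character other than letters, digits, '-', '_', '.'"
        )
    if len(filename) > MAX_FILENAME_LENGTH:
        problems.append(
            f"guideline 8: {len(filename)} characters, limit is {MAX_FILENAME_LENGTH}"
        )
    # Assumption: "suitable extension" is read here as three or more ASCII letters.
    _, dot, extension = filename.rpartition(".")
    if not dot or not re.fullmatch(r"[A-Za-z]{3,}", extension):
        problems.append("guideline 9: missing period or letter-only extension")
    return problems


# Print the findings for one conforming and one non-conforming name.
for name in ["MC025_nb41-mf_017.tif", "017 notebook.tif"]:
    print(name, check_filename(name))
```

A check like this also surfaces length problems that are easy to miss by eye: the CamelCase example in guideline 7, wnp-04-RoyalSociety-ncn-t123.tif, comes to 32 characters including the extension, one over the limit in guideline 8. Deviations of that kind are what guideline 14 asks each project to document.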

Examples

| Collection | Aggregate | Piece | Resulting File Name | Notes |
|---|---|---|---|---|
| Archives: Edgerton Collection (MC0025) | notebook | Sequential image # | MC025_nb41-mf_017.tif | MC025 is the collection number for the Edgerton Collection in the Institute Archives.  nb stands for notebook; nb41-mf stands for the microfilm (mf) of notebook #41.  017 stands for the image of the 17th whole frame on the microfilm (which may or may not be exactly page #17). |
| Archives: Edgerton Collection (MC0025) | notebook | derivative version of image file | MC025_nb41-mf_017-tm.jpg | As above, except this is a derivative file.  017-tm stands for the thumbnail (tm) of the 17th image from the microfilm frames. |
| Generic Archives collection using box and folder numbers | Box, then folder, then item | image | MC###_b06_f021_003_003.tif | MC### stands for the collection number.  b06 stands for box #6.  f021 stands for folder #21.  003 stands for the third item in folder #21.  The last 003 stands for the page/image number from the third item in folder #21. |
| Rotch and Archives "Perceptual Form of the City" (PFC) | City | Image | PFC_Boston_123456.tif | PFC stands for Perceptual Form of the City (collection name).  Boston differentiates this subcollection from images from New York City and other locations.  123456 is the IRIS image number. |
| Multi-volume item from general collection | Title, then volume | image | barker_TrAmSocStTr_v001_0001.tif | barker is the home library collection for this item.  TrAmSocStTr is an abbreviation of the title: Transactions of the American Society for Steel Treating.  v001 stands for volume 1, and 0001 stands for the first image of this volume.  The Aleph system number (00291693) would be in the metadata. |
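
The example names above can be assembled programmatically from their portions. The sketch below is illustrative only: build_filename and derivative_name are hypothetical helper names, and joining the derivative code to the piece number with a hyphen is inferred from the 017-tm example rather than stated as a rule in the guidelines.

```python
def build_filename(portions, number, extension, width=3):
    """Join the portions with underscores and append a zero-padded piece number."""
    padded = str(number).zfill(width)
    if len(padded) > width:
        # The collection has outgrown the padding chosen under guideline 4.
        raise ValueError(f"{number} does not fit in {width} digits")
    return "_".join([*portions, padded]) + "." + extension


def derivative_name(master, code, extension):
    """Append a derivative code (cp, sv, tm) to the master name and swap the extension."""
    stem, _, _ = master.rpartition(".")
    # Assumption: a hyphen joins the code to the name, as in 017-tm.
    return f"{stem}-{code}.{extension}"


master = build_filename(["MC025", "nb41-mf"], 17, "tif")
print(master)
print(derivative_name(master, "tm", "jpg"))
print(build_filename(["barker", "TrAmSocStTr", "v001"], 1, "tif", width=4))
```

Keeping the padding width as an explicit parameter makes the guideline 4 decision (how many digits the collection will ever need) visible in one place, and the ValueError flags the point at which that decision would have to be revisited.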


 





 

   
