Information gathered July 2008:
BCR-CDP's Digital Imaging Best Practices, version 2.0
http://www.bcr.org/cdp/best/index.html
University of Michigan's Digital Image Specifications.
http://www.lib.umich.edu/lit/dlps/dcs/UMichDigitizationSpecifications20070501.pdf
- (Yale) Use the bibliographic record identification number for the specific resource to be scanned, as found in the MARC record for the title in your OPAC. Most of the materials that we are digitally reformatting are cataloged in our OPAC. Call numbers can change, several books can have the same title, and truncated titles used as file names frequently don't offer much information. The bibliographic record number is unique and does not change, and we use it as the persistent identifier for the files. Also, data from OPACs already have a fairly reliable track record of being migrated into the future.
- The New York State Library employs such a convention, with OCLC number or local control number being the major portion of most file names (not all items imaged have been cataloged). Some of our larger items are scanned in parts, some imaging equipment saves raw scans at one file per page (later combined for a use copy), and some of our imaged titles are multi-part sets or serials. And so while the bib record identification number is a good start, we necessarily create file names that may include Volume, Number, Year, Month, Day, Part, Page, etc.
- (Sibley/Univ of Rochester) We base our file names on the bar code assigned to the copy we digitize, since it is logically and permanently linked both to the specific copy we digitized (we retain the original with its bar code attached to the housing) and to the bib record.
- We also use our bib identifier for most special and older projects. The materials being scanned by (or by us for) the Google Michigan Digitization Project use the barcode attached to the bib record.
- I posted a reply to this on the ARSC Associated Audio Archivists Message Board because it was easier to format that way. Here's the link:
http://arsc-aaa.invisionzone.com/forums/index.php?showtopic=38
- It is extremely important to document the file naming scheme somewhere else as well. I keep a running Word document for each collection we load, with a written explanation of the file naming scheme and why we chose it. It is really for in-house use and for future employees.
- (Indiana Univ) The Archives of Traditional Music updated its file naming scheme in 2006, working with our Digital Library Program which was simultaneously developing the recommendations presented by the first link in Nancy's message, below. You can see our implementation for audio files in chapter 3 of the publication Sound Directions: Best Practices for Audio Preservation, available at http://www.dlib.indiana.edu/projects/sounddirections/papersPresent/index.shtml
- (Rochester?) When I was researching file naming some time ago, I had bookmarked these pages (I found the first to be particularly helpful):
http://wiki.dlib.indiana.edu/confluence/display/INF/Filename+Requirements+for+Digital+Objects
http://www.archives.gov/preservation/technical/guidelines.pdf (see page 60)
http://www.controlledvocabulary.com/imagedatabases/filename_limits.html
http://edocs.lib.sfu.ca/projects/Doukhobor-Collection/technical.html
http://staffweb.library.northwestern.edu/dl/adhocdigitization/storage/
- Here at the Northeast Document Conservation Center, we use the following:
for our in-house client work:
first five letters of client name, plus date stamp (yymmdd), plus workstation
code (a, b, c, d, etc.), plus four-digit ascension number (three-digit number
with leading zero)
for our digital conservation treatment photography:
client job number (without spaces or points), department code (p, b, i, f), job
sub number, status code (bt, dt, at), leading zero and three digit ascension
number.
Keeping the name structure consistent (number of characters, code lengths,
etc.) has been helpful for file management, batch processing, and migration.
Underscore (_) separators before and after code segments in the filenames would
have made it slightly easier for someone writing batch processing code in XML,
Java, Perl, etc. We opted for shorter filenames.
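
As an illustration only, the in-house client work convention described above could be sketched in Python roughly as follows. This is not NEDCC's actual code. The function names, the lowercasing, the dropping of non-letter characters from the client name, and the handling of client names shorter than five letters are all assumptions made for the sketch.

```python
import re
from datetime import date

# Assumed shape of a name: up to five letters, yymmdd, one workstation
# letter, four-digit sequence number. Lowercase is an assumption.
NAME_PATTERN = re.compile(r"^([a-z]{1,5})(\d{6})([a-z])(\d{4})$")


def build_client_name(client, scan_date, workstation, sequence):
    """Build a file name stem from the four parts of the convention."""
    # First five letters of the client name (non-letters dropped: assumption).
    prefix = re.sub(r"[^a-z]", "", client.lower())[:5]
    if not prefix:
        raise ValueError("client name contains no letters")
    if not re.fullmatch(r"[A-Za-z]", workstation):
        raise ValueError("workstation code must be a single letter")
    if not 0 <= sequence <= 9999:
        raise ValueError("sequence number must fit in four digits")
    # %y%m%d gives the yymmdd date stamp; 04d pads with leading zeros.
    return f"{prefix}{scan_date:%y%m%d}{workstation.lower()}{sequence:04d}"


def parse_client_name(stem):
    """Split a stem back into its parts, relying on the fixed code lengths."""
    match = NAME_PATTERN.match(stem)
    if match is None:
        raise ValueError(f"not a recognised file name stem: {stem!r}")
    prefix, stamp, workstation, sequence = match.groups()
    return {
        "client": prefix,
        "date": stamp,  # kept as the yymmdd string
        "workstation": workstation,
        "sequence": int(sequence),
    }


print(build_client_name("Smithson Library", date(2008, 7, 15), "b", 42))
# prints smith080715b0042
```

Because the segments alternate between letters and digits and have fixed lengths, the name can be split without separators, which is the trade-off described above: shorter names at the cost of slightly more careful parsing.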
- (Harvard)
Digital preservation is not dependent upon the file name being anything but unique. Therefore a simple number string will suffice as long as metadata is linked to the file. That said, there is a lot of value in having human-readable names that convey information about the file such as catalog number, role, sequence number, etc. These things make the actual preservation workflow easier to follow and debug in case of problems.
In our workflow we use filenames that incorporate the call number, volume number, preservation role, face number, and file sequence number. Upon ingestion into the digital repository, this human-readable name is stored in metadata, and the file is named by the repository automatically with a unique number string which is more efficient for data processing. Upon retrieval from the digital repository, the human-readable name may be restored from the metadata so that humans can work with the file without confusion.
Consistency and uniqueness are most important, regardless of the method used.
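
The round trip described above (human-readable name kept in metadata, repository-assigned number string used as the stored name, original name restored on retrieval) might be sketched like this. It is a toy in-memory model, not Harvard's repository. The class name, the nine-digit identifier width, the metadata key, and the example file name are all hypothetical.

```python
import itertools


class SketchRepository:
    """Toy model of a repository that renames files on ingest."""

    def __init__(self):
        # Simple running counter standing in for the repository's own
        # unique number strings (format is an assumption).
        self._counter = itertools.count(1)
        self._metadata = {}

    def ingest(self, original_name):
        """Assign a unique number string and record the original name."""
        repo_id = f"{next(self._counter):09d}"
        self._metadata[repo_id] = {"original_name": original_name}
        return repo_id

    def original_name(self, repo_id):
        """Restore the human-readable name from metadata on retrieval."""
        return self._metadata[repo_id]["original_name"]


repo = SketchRepository()
stored_as = repo.ingest("example_call_number_v01_master_0001.tif")
print(stored_as)                      # prints 000000001
print(repo.original_name(stored_as))  # prints the name given at ingest
```

The point of the sketch is only that uniqueness lives in the repository identifier, while everything a person needs to read travels in the metadata.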
- (Victoria University of Wellington) We have purposefully used 'dumb' or 'generic' filenames in loading digital material into the research repository, e.g. thesis.pdf; paper.pdf; form.pdf; report.pdf, etc. I expect the metadata that the object is associated with to enable information and object retrieval. That said, the filenames we use are also a means to guide/remind users of the type of file that they are downloading. That in itself 'doubles up' the load on the filename, which then also acts as a resource type label. However, if all the filenames change in a preservation migration or transformation, we have metadata associated with the digital object to identify the resource type.