Dome Metadata Application Rules

MEETING NOTES AND DRAFT IDEAS - BECAME DOME MAP - NO LONGER UPDATED - PROJECT COMPLETE

Contributor Elements - Discussed

  1. Custom contributor fields (dc.contributor.illustrator, dc.contributor.advisor, dc.contributor.author, dc.contributor.editor, dc.contributor.other) will no longer be used in Dome metadata.
  2. Existing custom contributor fields will be consolidated to the dcterms:contributor (dc.contributor.none) element.
  3. Either dcterms:contributor or dcterms:creator (dc.creator.none) must be present in an Item's metadata to satisfy the Dome Core MD Requirement.
  4. A new element will be added to the Dome metadata registry (custommd:namesDisplay(dc.contributor.other)). It will be established in a new "Dome" namespace that we will create for our custom elements. It will aggregate all of the contributor names, dates and roles associated with an Item.  It is intended to be used in brief Item records in DOME.  The aggregated contributor information will be provided by depositing repositories upon import.

Carl's Comments

  1. For rule 2, we need to figure out how we can do this.  We've never done this before. We've changed the contents of a field, but haven't changed the field name without changing the content. Can Bill's ItemUpdater utility do this?
  2. For rule 4, we need to figure out how we can do this as well. Carl will look at Bill's ItemUpdater to see if it can accomplish this.

The way ItemUpdater works is you feed it a dublincore.xml file you prepare in advance that contains the changes you would like to make.  So we would need to reexport all of these records out of IRIS.

? How do we evaluate the ROI(?) value added(?) in making these changes?

How to accomplish rules 2 and 4

Step 1: Enable the field for use by future collections deposited to Dome.

Step 2: On a collection by collection basis, identify the field that currently contains the aggregated contributor information to go into custommd:namesDisplay, if it exists (this should exist for Rotch collections, but not Archives).  Then we need to migrate the contents from the current field to custommd:namesDisplay. Migration means doing the following:

We would accomplish these changes on a collection by collection basis. We would need to reexport metadata from IRIS for those Dome records we would like to adjust. We would identify this IRIS metadata by order number and dspace handle fields.  We would prepare the dublincore.xml file that the ItemUpdater requires from this exported metadata.  This dublincore.xml file would also contain the instructions for removing the metadata fields that we are seeking to replace in existing Dome Item records.
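As a rough illustration of preparing such a file from re-exported metadata, the sketch below builds a small Dublin Core XML document. It assumes the `<dublin_core>` / `<dcvalue element="..." qualifier="...">` layout used by the DSpace Simple Archive Format; the exact file ItemUpdater expects, and how removal instructions are expressed, are not shown in these notes and are not covered here. The function name and sample values are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_dublin_core(fields):
    """Build a <dublin_core> tree from (element, qualifier, value) tuples."""
    root = ET.Element("dublin_core")
    for element, qualifier, value in fields:
        # One <dcvalue> per metadata value; repeat the tuple for repeated fields.
        dcvalue = ET.SubElement(root, "dcvalue", element=element, qualifier=qualifier)
        dcvalue.text = value
    return root


# Hypothetical values standing in for one Item's re-exported IRIS metadata.
fields = [
    ("contributor", "none", "Example, Person A."),
    ("creator", "none", "Example, Person B."),
]
print(ET.tostring(build_dublin_core(fields), encoding="unicode"))
```

In practice each Item would get its own file, matched to the existing Dome record by the order number and handle mentioned above.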

Step 3: For collections that do not have a field with all of the names, roles, dates concatenated, add it.  -- If we reimport we can grab and add where name fields don't exist.
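For collections that lack a concatenated field, the aggregation could be generated during reexport. The sketch below is only one possible shape: the display format (name, then dates, then role in parentheses, entries separated by semicolons) is an assumption, since these notes do not fix a format for custommd:namesDisplay, and the function name and sample values are hypothetical.

```python
def names_display(contributors):
    """Join (name, dates, role) tuples into one display string.

    Empty dates or roles are skipped. The punctuation is an assumed format.
    """
    parts = []
    for name, dates, role in contributors:
        entry = name
        if dates:
            entry += f", {dates}"
        if role:
            entry += f" ({role})"
        parts.append(entry)
    return "; ".join(parts)


# Hypothetical contributor data for one Item.
print(names_display([
    ("Doe, Jane", "1900-1980", "architect"),
    ("Roe, Richard", "", ""),
]))
```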

Step 4: Strip date and role information for all names in dc.contributor.none and dc.creator.none fields. -- Too hard and variable across collections, just replace all the name metadata instead.

The easiest way to accomplish all of this is to reimport names for all Items in Dome by grabbing the right fields from IRIS or AT and replacing the current contributor and creator fields with the newly exported values from IRIS or AT.  We still want to do this collection by collection.

Note: We need to discuss the use of birth and death dates in the contributor field that is intended to contain only a contributor's name.  These dates may be useful to distinguish between contributors that share a name.  They may also be included in LC NAF versions of names. We think that the MIT Libraries are moving towards asserting LC NAF control over names in DSpace/Dome.  We should also consider using dates when they are used in other authority files (for example, ULAN Getty Union List of Artist Names).

Note: Do we want to officially recommend, as the Metadata Operations Team, that we start enforcing compliance with LC NAF in the choice of form of author name?

Example: http://dome.mit.edu/handle/1721.3/18867 (see DomeMetadata_ex1.xlsx)

Date Elements - Discussed

  1. dcterms:created (dc.date.created) will be used for the creation, publication, or origination date of all Items. If an Item has multiple origination dates (created, modified, published, etc.), it is up to the collection administrator to identify the primary origination date for this field and put all others in dcterms:date (dc.date.none).
  2. dcterms:created must be present in an Item's metadata to satisfy the Dome Core MD Requirement.
  3. When describing the coverage of an Item (the date as the subject) use dcterms:temporal (dc.coverage.temporal).
  4. dcterms:dateCopyrighted (dc.date.copyright) will be used for Items with different copyright and publication dates.
  5. dcterms:available (dc.date.available) will be used for embargoed Items.
  6. dcterms:accessioned (dc.date.accessioned) will only be used by the DSpace system.
  7. dcterms:issued (dc.date.issued) will only be used by the DSpace system.
  8. dcterms:date (dc.date.none) will be used for all other dates.  It may be repeated for multiple dates.
  9. A new element will be added to the Dome metadata registry (dome:datesDisplay).  It will be established in a new "Dome" namespace that we will create for our custom elements. It will aggregate all the various kinds of dates associated with an Item.
  10. When including multiple kinds of dates (for example, the date a photograph was taken and the date the content of the photograph was created) choose the most important date and put it in dcterms:created.  Put other dates in dcterms:date or in dcterms:temporal.  dcterms:created or dcterms:issued should be reserved for the date of origin of the object and dcterms:temporal for the date of the subject matter of the object.  If we just use dcterms:created for all origin dates, collections would be free to choose the most significant date associated with the origination of the object.  All other dates should go into dcterms:date.
  11. dcterms:temporal and dome:datesDisplay will have no enforcement of date format. We will enforce a consistent format for dcterms:created and dcterms:date. We should consider ISO 8601 or Fuzzy Dates (need to find a standard or authority for Fuzzy Dates).  Both can handle ranges and other kinds of complex date statements.

We need to determine how prescriptive we need to be in controlling the syntax of the various date fields.  THIS IS THE MOST IMPORTANT QUESTION.  We need to ask Carl how best to use date ranges and other kinds of date expressions (for example, ca. 1976) in indexed date fields. Carl thinks this field is indexed as a string of text.  We are leaning towards not worrying about the indexing of date fields and allowing whatever kind of date expression is most appropriate for the content being described.  We are also considering restricting dcterms:created and dcterms:issued to single dates and ranges, reserving dome:datesDisplay for ca., approx., etc.
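If we do restrict dcterms:created to single dates and ranges, a check along the following lines could flag values that need review. This is a minimal sketch that assumes ISO 8601 calendar dates (YYYY, YYYY-MM, or YYYY-MM-DD) and the ISO 8601 "/" separator for ranges; it does not check that a range ends after it starts, and the function names are hypothetical.

```python
import re
from datetime import date

# Year, optional month, optional day.
_SINGLE = re.compile(r"(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?")


def is_valid_single(value):
    """True for YYYY, YYYY-MM, or YYYY-MM-DD naming a real calendar date."""
    match = _SINGLE.fullmatch(value)
    if not match:
        return False
    year, month, day = match.groups()
    try:
        # Missing month or day defaults to 1 so date() can validate the rest.
        date(int(year), int(month or 1), int(day or 1))
    except ValueError:
        return False
    return True


def is_valid_created(value):
    """Accept a single date or a start/end range separated by '/'."""
    parts = value.split("/")
    if len(parts) > 2:
        return False
    return all(is_valid_single(part) for part in parts)


for candidate in ["1976", "1976-05", "1970/1976", "ca. 1976", "1976-02-30"]:
    print(candidate, is_valid_created(candidate))
```

Expressions that fail the check, such as "ca. 1976", would then be candidates for the unenforced display field rather than the controlled one.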

? What steps need to be taken to bring the existing use of date fields in Dome into compliance with the above rules?

Carl's Comments

1. Current DSpace system functionality is as follows: If an issue date is provided upon ingest it is put in dc.date.issued. If one is not provided, DSpace will put the date of deposit in the dc.date.issued field.  The question is can we change DSpace to not automatically populate this field.

An alternative is to put either a publication or a creation date into dc.date.created, ignoring the distinction between formally published and informally distributed items.  This would allow DSpace to do its own thing with dc.date.issued.  We would then ignore dc.date.issued for browse and search indices. WE ELECTED TO ADOPT THIS ALTERNATIVE.  ALL ORIGINATION (CREATION OR ISSUANCE) DATES WILL BE PUT INTO DCTERMS:CREATED AND DCTERMS:ISSUED WILL BE LEFT TO THE DSPACE SYSTEM TO POPULATE AUTOMATICALLY WITH THE DATE OF DEPOSIT.

2. Question for Richard: If we format our dates to conform to ISO 8601 how will that affect how DSpace indexes these dates for searching and browsing?  Will it be able to parse these dates to present them in a meaningful way in a search or browse index?

3. Data cleanup will follow the same general procedure as names, that is, reexporting dates from IRIS and AT in the format that we need them and then reimporting the metadata to existing Items and replacing the current date fields. Date field cleanup (reexport and reimport) will be more complicated and less of a priority than names.

Example: http://dome.mit.edu/handle/1721.3/18867 (see DomeMetadata_ex1.xlsx)

Notes Elements - Discussed

  1. Use dcterms:abstract (dc.description.abstract) for formal abstracts.
  2. Use dcterms:tableOfContents (dc.description.tableofcontents) for formal tables of contents.
  3. Use dcterms:description (dc.description.none) for all other notes or descriptions.
  4. At least one of the above three elements is required to satisfy the Dome Core MD Requirement.
  5. The following fields will no longer be used in Dome (dc.description.sponsorship, dc.description.statementofresponsibility, dc.description.uri).
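The required-element rules stated in these notes (a contributor or creator, a created date, and at least one of the three note elements) could be checked with something like the sketch below. It covers only those three requirements, not the full Dome Core element set, and it assumes a record is held as a dictionary mapping DSpace-style field names to lists of values; that representation and the function name are hypothetical.

```python
def missing_core_requirements(record):
    """Return a list of unmet requirements for one record.

    `record` maps field names such as "dc.date.created" to lists of values.
    """
    def has_any(*names):
        # A field counts only if it holds at least one non-blank value.
        return any(any(v.strip() for v in record.get(n, [])) for n in names)

    problems = []
    if not has_any("dc.contributor.none", "dc.creator.none"):
        problems.append("need dc.contributor.none or dc.creator.none")
    if not has_any("dc.date.created"):
        problems.append("need dc.date.created")
    if not has_any(
        "dc.description.abstract",
        "dc.description.tableofcontents",
        "dc.description.none",
    ):
        problems.append("need an abstract, table of contents, or description")
    return problems


# Hypothetical record missing its created date.
print(missing_core_requirements({
    "dc.creator.none": ["Example, Person"],
    "dc.description.none": ["A sample note."],
}))
```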

?What steps need to be taken to bring the existing use of note fields in Dome into compliance with the above rules, especially rule 5?

Need Usage Reports for

  1. dc.description.sponsorship -- Has been used once (We should just move this by hand).
  2. dc.description.provenance -- Has been used  (This is the provenance statement, we will let DSpace use this as it pleases).
  3. dc.description.statementofresponsibility -- Not been used
  4. dc.description.uri -- Not been used

All we need to do is move the one dc.description.sponsorship to dc.description.none and sort the 1719 existing dc.description.abstract fields, leaving the proper abstracts in the field and moving all others to dc.description.none.  We sorted these on 2011-05-20, found that all 1719 uses of the field occurred in the Project Whirlwind collection and decided to just leave them there.

Example: N/A

Identifier Elements - Discussed

  1. dcterms:identifier (dc.identifier.none) will be used for an Item's primary identifier, assigned prior to deposit in Dome.
  2. Custom identifier fields (dc.identifier.govdoc, dc.identifier.isbn, dc.identifier.ismn, dc.identifier.issn, dc.identifier.sici, dc.identifier.vendorcode) will no longer be used in Dome.
  3. Existing custom identifier fields will be consolidated to a new element that will be added to the Dome metadata registry (custommd:otherIdentifier(dc.identifier.other)). The new element will be established in a new "Dome" namespace that we will create for our custom elements.
  4. dc.identifier.uri will only be used by the DSpace system to record the Handle it assigns to the Item. Can it be mapped to a new element (dome:URI(?)) in the newly established "Dome" namespace that we will create for our custom elements? Answer: No. Will moving this field to a new namespace cause problems for external harvesters of Dome metadata?  Answer: Carl says we'd probably run into problems moving this field; he can't say for certain, but it is highly likely.  Solution: We will leave handles in dc.identifier.uri.

?What steps need to be taken to bring the existing use of identifier fields in Dome into compliance with the above rules, especially multiple instances of dcterms:identifier (dc.identifier.none)? 

Existing practice in Dome almost matches our rules. Minimally: All we need to do is map the custom identifier fields to dc.identifier.other. Preferably: Move dc.identifier.other to custommd:otherIdentifier.

Example: N/A (same as subjects in Example: http://dome.mit.edu/handle/1721.3/18867 (see DomeMetadata_ex1.xlsx))

Language Elements - Discussed

  1. dc.language.iso will no longer be used in Dome.
  2. Existing dc.language.iso fields will be mapped to dcterms:language (dc.language.none).

We haven't yet actually proposed keeping the contents of a field and just changing the name. Do we want to follow the name/date procedure and replace current language fields with new fields from a reexport from contributing repositories? Certainly, this field's data cleanup is not the highest priority. Carl says if we are already reexporting/reimporting name and date fields, especially since we'll probably have to do this for every record, it will not add much effort or complexity to add this field in with the reexport/reimport.

Example: N/A (same as subjects in Example: http://dome.mit.edu/handle/1721.3/18867 (see DomeMetadata_ex1.xlsx))

Relation Elements -- Discussed

  1. dc.relation.ispartofseries will be mapped to dcterms:isPartOf (dc.relation.ispartof)
  2. The following relation elements will be added to the dcterms (or dc.) namespace in the Dublin Core schema: dcterms:conformsTo (dc.relation.conformsTo), dcterms:hasFormat (dc.relation.hasFormat), dcterms:isRequiredBy (dc.relation.isRequiredBy), and dcterms:references (dc.relation.references).  These elements exist in the Dublin Core standard, but are missing from the Dome metadata registry.
  3. dc.relation.isbasedon will be mapped to dcterms:source (dc.relation.source).  There is no dcterms:isbasedon element.  We will leave dc.source.uri in the dc metadata schema for DSpace to use during OAI PMH export or harvesting.  DSpace uses the field to attach its own name as the source for items it exports or has harvested.
  4. dc.relation.uri will be mapped to dcterms:relation (dc.relation.none)
  5. dc.relation.type is not used and will be removed from the schema.  The information for which the field was intended will be assigned to a worktype element in a new VRA Core schema that will be established in the Dome metadata registry.

Our approach to enforcing Rules 1 and 4 will be similar to the custom identifier fields, dc.language.iso, and custom subject fields.

Example: N/A (same as subjects in Example: http://dome.mit.edu/handle/1721.3/18867 (see DomeMetadata_ex1.xlsx))

Rights Elements -- Discussed

  1. dc.rights.uri will no longer be used in Dome (there are currently no populated dc.rights.uri fields).
  2. dc.rights.copyright will be mapped to dcterms:rights (dc.rights.none).

We will need to adjust our rights metadata requirement in the MIT Libraries Dome Collections DomeCore Metadata Element Set.  It should read, follow the guidelines in the Flickr Team Report and Recommendations (current version 05 May 2011).

Our approach to enforcing Rule 2 will be similar to the custom identifier fields, custom relation fields, dc.language.iso, and custom subject fields.

Example: N/A (same as subjects in Example: http://dome.mit.edu/handle/1721.3/18867 (see DomeMetadata_ex1.xlsx))

Subject Elements -- Discussed

  1. Custom subject fields (dc.subject.lcsh, dc.subject.classification, dc.subject.ddc, dc.subject.lcc, dc.subject.mesh, dc.subject.other) will no longer be used in Dome metadata.
  2. Existing custom subject fields will be consolidated to the dcterms:subject (dc.subject.none) element.

The only complication with this is shared with language.  We would like to change the name of a field without changing the contents and we don't have as nice a way to reexport subjects from IRIS/AT and then reimport to Dome, replacing the current subject fields.  Carl says if we are already doing this for names and dates, especially since we'll probably have to do this for every record, it will not add much effort or complexity to throw this field and the dc.language.iso, the custom identifier fields, dc.rights.copyright, dc.relation.ispartofseries, and dc.relation.uri fields in with the reexport/reimport.
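The field renames listed in the paragraph above can be summarized as a lookup table applied to each record during the reexport/reimport. The sketch below assumes a record is a dictionary of field names to lists of values and uses the minimal identifier option (dc.identifier.other); the table and function names are hypothetical, and this only illustrates the mapping, not how the change would be pushed into DSpace.

```python
FIELD_MAP = {
    "dc.language.iso": "dc.language.none",
    "dc.rights.copyright": "dc.rights.none",
    "dc.relation.ispartofseries": "dc.relation.ispartof",
    "dc.relation.uri": "dc.relation.none",
}
# Custom subject fields all consolidate to dc.subject.none.
for qualifier in ("lcsh", "classification", "ddc", "lcc", "mesh", "other"):
    FIELD_MAP[f"dc.subject.{qualifier}"] = "dc.subject.none"
# Custom identifier fields consolidate to dc.identifier.other (minimal option).
for qualifier in ("govdoc", "isbn", "ismn", "issn", "sici", "vendorcode"):
    FIELD_MAP[f"dc.identifier.{qualifier}"] = "dc.identifier.other"


def rename_fields(record):
    """Return a new record with retired field names mapped to their targets."""
    renamed = {}
    for field, values in record.items():
        target = FIELD_MAP.get(field, field)
        # Several retired fields can share one target, so extend, never overwrite.
        renamed.setdefault(target, []).extend(values)
    return renamed


# Hypothetical record with two retired fields.
print(rename_fields({
    "dc.subject.lcsh": ["Architecture"],
    "dc.language.iso": ["en"],
    "dc.title.none": ["Sample title"],
}))
```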

Example: http://dome.mit.edu/handle/1721.3/18867 (see DomeMetadata_ex1.xlsx)

Type Elements -- Discussed

  1. We will identify the one vocabulary that we will use for the Dome Core type field (we choose DCMI Type Vocabulary).  This vocabulary will be mapped to dcterms:type (dc.type.none).
  2. When a type vocabulary is defined within a specification (for example, VRA Core type), we will use the identified field within that schema.  We will add the schema and field to the Dome metadata registry.
  3. For type vocabularies that are not defined within a metadata specification (for example, local type vocabularies) we will use a 'localType' (need to find a better name) element in our homegrown Dome metadata schema (current prefix: custommd).

Currently type values from all vocabularies are entered in a dc.type.none element. Separating the values by vocabulary from existing Dome records will be tricky.  It would have to be done collection by collection.

For the Visual Collections community all values in the type field are VRA worktypes.

For the archives collections most values in the type field are MODS Type Values.

It looks like the workflow for updating the existing type values will be:

1) Move existing values from dc.type.none to another appropriate element (either from another schema/vocabulary, or to a custom type element)

2) Add a DCMI type vocabulary value to the record.
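The two-step workflow above might look like the following for a single record. This is a sketch only: the record is assumed to be a dictionary of field names to lists of values, the target field name passed in (here "custommd.localType", a name these notes flag as tentative) is a placeholder, and the function name is hypothetical. The set of allowed values is the DCMI Type Vocabulary.

```python
DCMI_TYPES = {
    "Collection", "Dataset", "Event", "Image", "InteractiveResource",
    "MovingImage", "PhysicalObject", "Service", "Software", "Sound",
    "StillImage", "Text",
}


def update_type(record, target_field, dcmi_type):
    """Move existing dc.type.none values to target_field, then set a DCMI type."""
    if dcmi_type not in DCMI_TYPES:
        raise ValueError(f"not a DCMI Type Vocabulary term: {dcmi_type!r}")
    # Copy the lists so the caller's record is left untouched.
    updated = {field: list(values) for field, values in record.items()}
    # Step 1: move every existing type value (no attempt to sort by vocabulary).
    existing = updated.pop("dc.type.none", [])
    if existing:
        updated.setdefault(target_field, []).extend(existing)
    # Step 2: dc.type.none now carries only the DCMI value.
    updated["dc.type.none"] = [dcmi_type]
    return updated


# Hypothetical record whose type value comes from a non-DCMI vocabulary.
print(update_type({"dc.type.none": ["photograph"]}, "custommd.localType", "StillImage"))
```

Because the existing values come from different vocabularies in different collections, the target field would have to be chosen collection by collection.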

Example: (Mikki will provide)

Metadata Manipulation Notes

There are a few different kinds of actions:

  1. Move data from one field to another within the existing dc namespace (for example custom contributor fields).
  2. Move data from a field within the existing dc namespace to a field in a newly created namespace (for example custom identifier fields).
  3. Import new metadata for existing Items into fields in a newly created namespace (for example concatenated/aggregated dates).
  4. Parse data in existing fields and separate it into multiple fields, some into existing fields in the dc namespace, some into new fields in a newly created namespace (for example identifier and type fields).
  5. Move all of the elements in the existing dc namespace into a new dcterms namespace.