These details start with matching.  Before that point, there are some steps we run with every loader (e.g., move 001->035, 949 expansion, conversion from MARC to Aleph Sequential format).

Following are the steps that are particular to the e-book loader:

The script has four configuration options, selected with the --config parameter:

--Match Only (with generic URL fix routine)
--Match and Load (with generic URL fix routine)
--Match Only (with special fix routine)
--Match and Load (with special fix routine)
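The four options combine two independent choices: match only vs. match and load, and generic vs. special fix routine. Below is a minimal Python sketch of that mapping, assuming --config takes a mode name. The mode strings and the `parse_args` helper are hypothetical, and the real script may instead point --config at a provider config file.

```python
import argparse

# Hypothetical mode names for the four configurations listed above.
MODES = {
    "match-generic": {"load": False, "fix": "generic"},
    "load-generic": {"load": True, "fix": "generic"},
    "match-special": {"load": False, "fix": "special"},
    "load-special": {"load": True, "fix": "special"},
}


def parse_args(argv=None):
    """Parse --config and return (file, load?, fix routine)."""
    parser = argparse.ArgumentParser(description="E-book loader (sketch)")
    parser.add_argument("--config", required=True, choices=sorted(MODES))
    parser.add_argument("file", help="incoming MARC file")
    args = parser.parse_args(argv)
    settings = MODES[args.config]
    return args.file, settings["load"], settings["fix"]
```

For example, `parse_args(["--config", "match-special", "ebooks.mrc"])` selects a match-only run with the special fix routine.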

PRE-1.  (While files are still in MARC format)
    Clear out the load files and the lists of piggyback system numbers, because various steps in the load append to them and we don't want duplicates.
    This matters mainly when a load run follows a match-only run on the same file.

PRE-2.  Run a fix routine, which may be provider-specific via the config file.
Params: MIT01,$file.035,$file,$file.rejt,,,EBKU,,Y,
tab_fix: EBKU  fix_doc_do_file_08             ebookurl.fix
         EBKU  fix_doc_do_file_08             ebook2.fix

    Specific fixes may strip out other providers' URLs or do other desired record cleanup.
    The generic fix adds 856 $$z MIT Access Only.
    The generic fix also copies the main 035 into a virtual field and flags records that have no 856 tag or multiple 856 tags.
    Generally, call the generic fixes AFTER any specific fixes, which
    may have removed all 856s.
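The 856 flagging can be sketched in Python as follows, assuming the records are already in Aleph Sequential form (9-digit system number in columns 0-8, tag in columns 10-12). The flag names NO856 and MULTI856 are hypothetical; the actual fix writes its own flag into the record. Flagged records are only reported, never rejected.

```python
def flag_856(lines):
    """Return {sysno: flag} for records with no 856 or more than one 856."""
    counts = {}
    for line in lines:
        sysno, tag = line[0:9], line[10:13]
        counts.setdefault(sysno, 0)      # every record starts at zero 856s
        if tag == "856":
            counts[sysno] += 1
    flags = {}
    for sysno, n in counts.items():
        if n == 0:
            flags[sysno] = "NO856"
        elif n > 1:
            flags[sysno] = "MULTI856"
    return flags
```

Running this after the specific fixes matches the ordering advice above: a specific fix that removes every 856 shows up here as NO856.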

    Report, but do not reject, records with the no 856 flag.
    Report, but do not reject, records with the multi 856 flag.

1.  Match incoming 035 $$a to existing 035 $$a and 776 $w.  p_manage_36 OCC index.  Report Aleph match result counts.
Params: MIT01,$file,$file.nooclc,$file.oclcmatch,$file.multio,EBK1,MLT, >$file.oclclog
tab_match: EBK1  match_doc_gen                  TYPE=IND,TAG=035##,CODE=OCC,SUBFIELD=a

      A.  Match.  

            i. Remove the 035 and change the 776 to an E76, so that these records are not treated like print records having e-format added (OCLC match on 035 means this record IS an e-version!)
            ii. Compare incoming/existing 856 $u. (check_url)
            iii. If 856 $u matches, stop processing.  

**** Report OCLC #?  We need to count these records among the rejects.
**** Will it be confusing if there are 8 rejects in the final count,
**** but only 7 OCLC #s get reported from the other steps?

            iv. If no 856 $u match, piggyback load when script is run with load option.
            

      B.  No match.  

            i.  Goto step 2.
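The check_url comparison in step 1.A can be sketched in Python as follows. This sketch assumes Aleph `$$`-delimited field data and uses an exact string comparison of $u values. The real check_url may normalize URLs differently, and `subfield_values` and `step1_action` are hypothetical helpers.

```python
def subfield_values(data, code):
    """Return all values of subfield `code` in $$-delimited field data."""
    values = []
    for chunk in data.split("$$")[1:]:
        if chunk and chunk[0] == code:
            values.append(chunk[1:].strip())
    return values


def check_url(incoming_856s, existing_856s):
    """True if any incoming 856 $u equals any existing 856 $u."""
    def urls(fields):
        return {u for f in fields for u in subfield_values(f, "u")}
    return bool(urls(incoming_856s) & urls(existing_856s))


def step1_action(incoming_856s, existing_856s):
    # 856 $u match: stop processing; no match: piggyback (load option only).
    return "stop" if check_url(incoming_856s, existing_856s) else "piggyback"
```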

2.  Match incoming 776 $$w to existing 035 $$a and 776 $w.  p_manage_36 OCC index.  Report Aleph match result counts.
Params: MIT01,$file.nooclc,$file.no776,$file.776match,$file.multi776,EBK2,MLT, >$file.776log
tab_match: EBK2  match_doc_gen                  TYPE=IND,TAG=776##,CODE=OCC,SUBFIELD=EACHw

      A.  Match.  

            i. Piggyback load when script is run with load option.

      B.  No match.  

            i.  Goto step 3.

**** Is there any possibility of a multi-match here?  
**** Currently those records would be rejected with no report!  
**** This has not happened yet.
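To make the multi-match question concrete, here is a simplified Python model of steps 1 and 2. Each incoming record is sorted into no-match, match, or multi-match. With SUBFIELD=EACHw, every 776 $w is tried as a key, and this sketch assumes the hits from all keys are pooled. The in-memory `index` dict is a hypothetical stand-in for the OCC index searched by p_manage_36, not how that service works internally.

```python
def route_matches(records, index):
    """Sort incoming records into no-match / match / multi-match buckets.

    records: {incoming_sysno: [match keys]}, e.g. every 776 $w value.
    index:   {key: set of existing system numbers} (hypothetical OCC stand-in).
    """
    buckets = {"nomatch": [], "match": {}, "multi": {}}
    for sysno, keys in records.items():
        hits = set()
        for key in keys:
            hits |= index.get(key, set())
        if not hits:
            buckets["nomatch"].append(sysno)
        elif len(hits) == 1:
            buckets["match"][sysno] = hits.pop()
        else:
            # Keep multi-matches in their own bucket so they can be reported.
            buckets["multi"][sysno] = sorted(hits)
    return buckets
```

Under this model, a record with two $w values that point at different existing records lands in the multi-match bucket, which is the case the note above says would currently be rejected without a report.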

3.  Match incoming 245 $$ab to existing 245 $$ab.  p_manage_36 TTL index.  Report Aleph match result counts.
Params: MIT01,$file.no776,$file.nottl,$file.ttlmatch,$file.multittl,EBK3,MLT, >$file.ttllog
tab_match: EBK3  match_doc_gen                  TYPE=ACC,TAG=245##,CODE=TTL,SUBFIELD=ab

      A.  Match.

            i.  Export matched records for additional checking.  print_03
            ii. Goto step 4.

      B.  No match.

            i.  Load as new when the script is run with load option.

      C.  Multi match.

            i.   List all system numbers from the MLT tag.
            ii.  Export these records.
            iii. Report the incoming OCLC # and 260 $c, plus the 260 $c from each matched record.
            iv.  Load as new when the script is run with load option. (Changed 3/2012; previously these were not loaded.)

4.  Check for 245 $n or $p in both incoming and existing records.

      A.  Found 245 $n $p.

            i.  Report OCLC # (245 $n $p found)
            ii. Stop processing.

      B.  No 245 $n $p.

            i. Goto step 5.
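Step 4 can be sketched in Python as follows, assuming `$$`-delimited 245 data. The sketch reads "check both records" as meaning that a $n or $p in either the incoming or the existing 245 stops the record. If the intent is to require both, change `or` to `and`.

```python
def has_part_subfields(field_245):
    """True if a 245 field ($$-delimited data) contains $n or $p."""
    return any(chunk[:1] in ("n", "p") for chunk in field_245.split("$$")[1:])


def step4(incoming_245, existing_245):
    """Return 'report' (report OCLC #, stop processing) or 'step5'."""
    if has_part_subfields(incoming_245) or has_part_subfields(existing_245):
        return "report"
    return "step5"
```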

5.  Match the publisher in existing 260 $$b (from title-matched records) to incoming 533 $c using a local script.
If there is no incoming 533 tag, match using incoming 260 $b.
Report counts of matched/non-matched records.
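The local script's exact publisher rules are not described here. The Python sketch below assumes a simple normalization (lowercase, drop punctuation, collapse whitespace) followed by an equality test, with the 533 $c to 260 $b fallback described above. `normalize_publisher` and `publisher_matches` are hypothetical names.

```python
import re


def normalize_publisher(name):
    """Lowercase, replace punctuation with spaces, collapse whitespace (assumed rules)."""
    name = re.sub(r"[^\w\s]", " ", name.lower())
    return " ".join(name.split())


def publisher_matches(existing_260b, incoming_533c=None, incoming_260b=None):
    """Compare existing 260 $b to incoming 533 $c, or to 260 $b if there is no 533."""
    incoming = incoming_533c if incoming_533c is not None else incoming_260b
    if incoming is None:
        return False
    return normalize_publisher(existing_260b) == normalize_publisher(incoming)
```

Stripping trailing ISBD punctuation matters here, because a 260 $b usually ends in a comma while a 533 $c often does not.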

      A. Check whether the existing record is EEBO, has format BR, or contains any STA tag.  (As of 3/2012, Books24x7 records may be merged.)
            i.   Report SKIP (reason).
            ii.  Report to Charlene any records that are format BR, have no STA tag, and have tag 981.
            iii. Load as new when script is run with load option.

      B.  Match 533$c/260$b.

            i.   Match incoming 533 $$d year to existing 260 $$c year, using a 4-digit string only, nothing longer.
            ii.  When 533 $d does not match 260 $c, match incoming 260 $c year to existing 260 $c year, using a 4-digit string only, nothing longer.
            iii. Report any numeric year strings longer than 4 digits.

                  a.  Match 533$d/260$c or 260$c/260$c.
                        a).  Report COUNT (260 $c match)
                        b).  Piggyback load when script is run with load option.

                  b.  No match 533$d/260$c nor 260$c/260$c.
                        a).  Report COUNT (260 $c no match)
                        b).  Load as new when script is run with load option.

      C.  No match 533$c/260$b, or no 533.

            i.   Match incoming 260 $$c year to existing 260 $$c year, using a 4-digit string only, nothing longer.
            ii.  Report any numeric year strings longer than 4 digits.

                  a.  Match 260$c/260$c.  Compare 260$b/260$b.
                        a).  Match 260$bc/260$bc.
                              1).  Report COUNT (260 $c match)
                              2).  Piggyback load when script is run with load option.
                        b).  No match 260$b/260$b.
                              1).  Report OCLC # (no match 533$c/260$b nor 260$b/260$b)
                              2).  Stop processing.

                  b.  No match 260$c/260$c.
                        a).  Report COUNT (260 $c no match)
                        b).  Load as new when script is run with load option.

      D.  No match 533$c/260$b nor 260$b/260$b.

            i.   Report OCLC # (no match 533$c/260$b nor 260$b/260$b)
            ii.  Stop processing.
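The year rule in step 5 ("4-digit string, nothing longer") can be sketched in Python as follows. Only runs of exactly four digits count as years, and longer digit runs are collected for the report. The order tries 533 $d first and falls back to incoming 260 $c, as in branch B. For branch C, omit `incoming_533d`. The function names are hypothetical.

```python
import re


def years(text):
    """Return (4-digit years, longer numeric runs) found in a date field."""
    runs = re.findall(r"\d+", text)
    return [r for r in runs if len(r) == 4], [r for r in runs if len(r) > 4]


def year_match(existing_260c, incoming_533d=None, incoming_260c=None):
    """Try incoming 533 $d, then incoming 260 $c, against existing 260 $c."""
    existing, long_runs = years(existing_260c)
    for candidate in (incoming_533d, incoming_260c):
        if candidate is None:
            continue
        found, extra = years(candidate)
        long_runs += extra               # collect runs to report
        if set(found) & set(existing):
            return True, long_runs
    return False, long_runs
```

Extracting digit runs first means that bracketed or prefixed dates such as "[c2005]" still yield 2005, while a run like "20052006" is reported instead of being silently truncated.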

**Match Only stops here**

After review, run script with Match and Load.

While preparing piggyback records for load, check whether any matched records are SE format.
If so, move the record from the piggyback file to the new-load file and report it to Ben.
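The SE check amounts to a split of the piggyback list. This Python sketch assumes each piggyback entry pairs an incoming system number with the matched existing system number, and that each existing record's format (FMT) has already been exported into a dict. `split_serials` is a hypothetical name.

```python
def split_serials(piggyback, existing_formats):
    """Move entries whose matched existing record is format SE to the new-load list.

    piggyback:        [(incoming_sysno, existing_sysno), ...]
    existing_formats: {existing_sysno: format code}
    Returns (still piggyback, incoming sysnos to load as new, report lines for Ben).
    """
    keep, to_new, report = [], [], []
    for incoming, existing in piggyback:
        if existing_formats.get(existing) == "SE":
            to_new.append(incoming)
            report.append((incoming, existing))
        else:
            keep.append((incoming, existing))
    return keep, to_new, report
```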

For all records to piggyback, run a fix to: copy the 1st 035 to 776 $w; change all 035 $a to $l to allow linking from WorldCat; turn the E76s back into 776s after the other 776 processing is done; and change the 2nd indicators on URLs to 1.
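Here is a Python sketch of those four edits on one record's Aleph Sequential lines. It assumes columns 0-8 hold the system number, 10-12 the tag, 13-14 the indicators, and the field data starts at column 18 with `$$` subfield delimiters. The real work is done by the Aleph fix (035to776.fix under EBPGY). The blank indicators on the new 776 are an assumption, and "URLs" is taken to mean 856 and 956, per step 2.C.iv below.

```python
def piggyback_fix(lines):
    """Apply the piggyback edits to one record (list of Aleph Sequential lines)."""
    out = []
    first_035 = None
    for line in lines:
        sysno, tag, ind, data = line[0:9], line[10:13], line[13:15], line[18:]
        if tag == "035":
            if first_035 is None:
                first_035 = data                 # remember the 1st 035 as-is
            data = data.replace("$$a", "$$l")    # 035 $a -> $l
        elif tag == "E76":
            tag = "776"                          # turn E76 back into 776
        elif tag in ("856", "956"):
            ind = ind[0] + "1"                   # 2nd indicator -> 1
        out.append(f"{sysno} {tag}{ind} L {data}")
    if first_035 and "$$a" in first_035:
        oclc = first_035.split("$$a", 1)[1].split("$$", 1)[0]
        # New 776 $w from the 1st 035 $a (blank indicators assumed).
        out.append(f"{lines[0][0:9]} 776   L $$w{oclc}")
    return out
```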

*** Display and verify total record counts prior to the p_manage_18 load steps ***
 
1.  New records to load.  (Combine all records to be loaded as new, with sequential incoming system numbers.)

      A.  URL cleanup, e.g., remove any 856 URL that is not doi.org.  (Aleph fix EBKU, or other provider-specific fix, see above)

      B.  Add 856 $$z MIT Access Only.  (Aleph fix EBKU, see above)

      C.  Add 910 $aBatchloadYYMMDD $iBatchload $dYYMMDD $nn
      (done in bib_convert script, prior to matching steps)

      D.  Strip out the temporary E35 and MLT tags that were added for processing.

      E.  No overlay.  Load as NEW.  p_manage_18
Params: MIT01,$file.newload,newbad,$file.newload.sysno,NEW,NEW,,FULL,REP,M,OCLC_TO_UTF,,ebook,00,1995, > $file.newlog

      F.  Build NET holdings.  p_manage_50
Params: MIT01,$file.newload.sysno,000000000,999999999,mit50,mit60,,949##,tab_hol_item_create,tab_hol_item_map,A,M,N,loader,00,loader,00,Y,Y, > $file.new.hollog
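Steps 1 and 1.D above (combine the load-as-new files with sequential system numbers, strip E35 and MLT) can be sketched in Python as follows. The sketch assumes Aleph Sequential lines where a change of system number marks a new record. `combine_new` is a hypothetical helper, not part of the loader.

```python
def combine_new(files):
    """Concatenate load-as-new files, renumber records sequentially,
    and drop the temporary E35 and MLT fields."""
    out = []
    seq = 0
    for lines in files:
        current = None                   # each file starts a new record
        for line in lines:
            sysno, tag = line[0:9], line[10:13]
            if sysno != current:
                current = sysno
                seq += 1
            if tag in ("E35", "MLT"):
                continue
            out.append(f"{seq:09d}{line[9:]}")
    return out
```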

2.  Piggyback load.  (Combine all records for piggybacking.)

      PRE-A.  Export all matched records to be piggybacked.

      A.  Check whether existing records to be merged have NET holdings.  p_print_03
            Use the list of system numbers from p_manage_36 as input to p_print_03.
            Use the fix that makes PST (GUI-BRIEF).
            Export PST and 006.
            Run script make_overlay.pl.

            i.  NET hol exists.

                  a.  Remove 949 from incoming record.

            ii. No NET hol.

                  b.  Convert 949 from incoming to 959.

      B.  Check whether existing records to be merged have 006.  (using p_print_03 already run above)

            i.  006 exists.

                  a.  Remove 006 from incoming record.

            ii. No 006.

                  b.  Load incoming 006.    
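The decisions in A and B can be sketched in Python as follows. This is not the logic of make_overlay.pl itself. The two booleans stand in for what the p_print_03 export (PST, 006) shows about the matched database record, and the sketch assumes Aleph Sequential lines with the tag in columns 10-12.

```python
def make_overlay(incoming, existing_has_net, existing_has_006):
    """Adjust an incoming record's lines before the merge."""
    out = []
    for line in incoming:
        tag = line[10:13]
        if tag == "949":
            if existing_has_net:
                continue                           # NET hol exists: remove 949
            line = line[:10] + "959" + line[13:]   # no NET hol: 949 -> 959
        elif tag == "006" and existing_has_006:
            continue                               # existing 006 wins: remove incoming 006
        out.append(line)
    return out
```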

      C.  URLs  

        i. Keep only the incoming 856 URL containing doi.org.  Discard all other 856 fields. (Aleph fix EBKU)

        ii. Add $$z MIT Access Only to doi.org URL. (Aleph fix EBKU)

        iii. Load incoming 956 as-is. (Aleph fix EBKU)

        iv. Change second indicator in 856 and 956 to 1.  (Aleph fix EBPGY)

Params for p_manage_25: MIT01,$file.overlay,$file.matchload,$file.pgy.rejt,,,EBPGY,,Y,
tab_fix: EBPGY fix_doc_do_file_08             035to776.fix

      D.  Merge/overlay.  p_manage_18
Params: MIT01,$file.matchload,matchbad,$file.load.sysno,OLD,NEW,,FULL,MERGE,M,OCLC_TO_UTF,OVERLAY-37,ebook,00,1995, > $file.matchlog
tab_merge:
37 2 Y #####
37 2 N 959##
37 1 Y 006##
37 1 Y 776##
37 1 Y 856##
37 1 Y 946##
37 1 Y 956##
37 1 Y 959##

            i.  Add tags:

                  a. 946 $$m (from incoming record)

                  b. Add incoming 035 OCLC # to 776 $$w.  (Aleph fix EBPGY)
                     If a 776 already exists, remove the incoming 776 and make a new 776 $$w; if not, add 776 $$w.

                  c. 006, 856, 956, 959 (as corrected above)

      E. Build NET holdings.  p_manage_50 (tag 959)
Params: MIT01,$file.load.sysno,000000000,999999999,mit50,mit60,,959##,tab_hol_item_create,tab_hol_item_map,A,M,N,loader,00,loader,00,Y,Y, > $file.hollog
 

There is no special step to get URLs based on series at this point.  This could perhaps be
added via global change after LTI processing if it is deemed
important.  (If this has to be done, note that the result won't be consistent.  In addition, extra steps will have to be created to check whether the series URL(s) are already in the existing records for the piggyback load.)

(NOTE: I forgot to mention that the script should completely ignore the records from Books24x7 and EEBO.  For now, our incoming records may relate only to Books24x7 records, but it's good to include EEBO for potential future use.  Would the title match be faster if there were a way to exclude records from these two providers?)

All bks records currently have 035 $$abk* (some are bks and some are bke, followed by the number).  At some point we may try to connect our holdings for these records in WorldCat, and then we might get OCLC #s into them, but for now they are completely safe from OCLC # matching.

EEBO records all have 035 $$a(EEBO) in them.  They have traces of OCLC numbers, but NOT in the fields we use for OCLC # matching.  I haven't checked whether they have OCLC numbers in the fields that are included in the OCC index, though.  (This index is where the 776 $w in existing records is matched.)

 
April 20, 2010
added params and table settings: May 4, 2011
updated for recent script changes: May 23, 2012
