Roles Database Data Feed Programs

Last modified 6/04/2010

Connecting to the Roles production or test server

When maintaining or debugging data feed programs, connect to either roles.mit.edu or roles-test.mit.edu as user rolesdb.

The crontab file

On the production Roles server (roles.mit.edu, aka cloverleaf) and the test Roles server (roles-test.mit.edu, aka parsley), there are several sets of data feed jobs that are automatically run each day. The overall schedule can be found in the crontab file:

  • Production: ~rolesdb/cronjobs/crontab.cloverleaf
  • Test: ~rolesdb/cronjobs/crontab.parsley

If you ever need to change the crontab file, do the following:

  1. Connect to the server machine using telnet or SSH as user rolesdb.
  2. Go to the cronjobs directory
    cd cronjobs
  3. Check out the crontab file from RCS using the alias "checkout"
    checkout crontab.cloverleaf
    (or)
    checkout crontab.parsley
  4. Use emacs or vi to make the desired changes
  5. Check in the new crontab file into RCS using the alias "checkin"
    checkin crontab.cloverleaf
    (or)
    checkin crontab.parsley
  6. *** Don't forget this step! *** Run the crontab command.
    crontab crontab.cloverleaf
    (or)
    crontab crontab.parsley

You can display the currently installed crontab with the command crontab -l.
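
Put together, a typical crontab editing session on the test server might look like the following sketch (use crontab.cloverleaf instead on production):

    ssh rolesdb@roles-test.mit.edu
    cd cronjobs
    checkout crontab.parsley     # alias: check the file out of RCS
    vi crontab.parsley           # make the desired changes
    checkin crontab.parsley      # alias: check the file back into RCS
    crontab crontab.parsley      # install the new schedule -- the step that is easy to forget
    crontab -l                   # confirm what cron is now using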

Shell scripts run by the crontab file

Each crontab file runs several shell scripts that, in turn, run individual programs. These high-level shell scripts include:

morning_jobs_early

- Extracts most data from the Warehouse, including most types of Qualifiers, and does some processing for the Master Department Hierarchy
 (Runs in early morning)

morning_jobs_late

- Runs some steps that depend on PERSON data from the Warehouse including (a) loading the PERSON table from krb_person@warehouse, (b) loading EHS-related Room Set data from the Warehouse into RSET qualifiers, (c) processing externally-derived authorizations (table EXTERNAL_AUTH).
(Runs in the morning, not quite so early)

cron_run_exception_notify

- Generates Email about Authorizations that still exist for people with deactivated Kerberos usernames
(Runs after morning_jobs_late)

weekend_jobs

- Runs a procedure to run an Oracle ANALYZE on all the tables in the Roles DB.
(Runs each Saturday.)

hourly_jobs

- Currently runs only once a day, not hourly. Updates derived database tables ("shadow" tables) for the Master Department Hierarchy

cron_roles_cleanup_archive

- Cleans up some old files in the archive directory

cron_run_sapfeed

- Runs a job to create several daily files about SAP-related authorizations and uses the scp command to copy them to a directory on one of the SAP servers where they are picked up, further processed, and loaded into the appropriate SAP objects. The files built and sent are incremental, including authorization information only for those people whose "expanded" authorizations (after including all child qualifiers) have changed since the previous run.
(Runs once each morning, only on the production Roles server)

cron_pdorg_prog.sh

- Compares SAP-related approver authorizations in the Roles DB with parallel information in the pd org structures in SAP, and generates a file indicating differences.
(Runs once each morning)

cron_pddiff_feed.sh

- Sends the file pdorg_roles.compare, generated from cron_pdorg_prog.sh, to the SAP dropbox
(Runs each morning after cron_pdorg_prog.sh)

cron_ehs_extract

- Runs ~rolesdb/bin/ehs/run_ehs_role_prog.pl to compare DLC-level EHS roles (e.g., DEPARTMENTAL EHS COORDINATOR) to their equivalent Authorizations in the Roles DB (which is the system of record), generates a differences file, and sends it to the SAP dropbox so the changes can be applied.
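
When checking whether these jobs ran cleanly, a reasonable first step is to look at the most recent files in the log directory (~rolesdb/log, described under Directories below). The exact log file names vary by feed, so the commands below are only a sketch:

    cd ~/log
    ls -lt | head -20                # most recently written log files first
    grep -li error * 2>/dev/null     # names of any log files that mention "error"
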

Directories

~rolesdb/archive - Some compressed historical files from previous days' data feed runs

~rolesdb/bin - Generic data feed perl scripts and other program files

~rolesdb/bin/ehs - EHS-related data feed programs

~rolesdb/bin/extract - Programs related to outgoing data for DACCA, LDS (SAP component being phased out), etc.

~rolesdb/bin/pdorg - Programs related to outgoing data for updating PD Org entries in SAP related to APPROVER authorizations

~rolesdb/bin/repa_feed - Temporary or test versions of programs

~rolesdb/bin/roles_feed - Most data feed programs for data coming into the Roles DB

~rolesdb/data - Data files used by data feed programs. Most data files are temporary, but some, such as roles_person_extra.dat, are permanent.

~rolesdb/doc - Miscellaneous notes and documentation

~rolesdb/extract - Empty

~rolesdb/lib - A few generic perl modules and some config files

~rolesdb/log - Most recent log files from data feed programs

~rolesdb/misc - Miscellaneous notes and working files

~rolesdb/sap_feed - Obsolete versions of Roles->SAP data feed programs

~rolesdb/sql - SQL source files for creating tables, views, and stored procedures. Files for creating tables (new_schema*.sql) are preserved for documentation purposes and should NOT be rerun -- tables should never be dropped and recreated since we do not want to lose the data. Files for creating stored procedures and views can be modified and rerun.

~rolesdb/sql/frequently_run_scripts - Special SQL scripts that are run periodically, e.g., to analyze tables

~rolesdb/xsap_feed/bin - Programs for the Roles->SAP data feed

~rolesdb/xsap_feed/config - Config files for the Roles->SAP data feed

~rolesdb/xsap_feed/data - Nightly data files for the Roles->SAP data feed

Extract, Prepare, and Load steps

Most data feed programs are perl modules for maintaining one type of data in Roles DB tables, such as people in the PERSON table or one type of Qualifier (e.g., Funds/Funds Centers) in the QUALIFIER table. Each perl module has separate subroutines for the Extract, Prepare, and Load steps.

The steps do the following:

Extract

Extract a full set of data for a particular type of object from the external source, generally the Data Warehouse. The data are written to a flat file in the ~/data directory, generally a file with a name ending in ".warehouse". The "warehouse" suffix is used even if the source of the data is something other than the Warehouse.

Prepare

(a) Select a full set of parallel data for a particular type of object from Roles DB tables into a flat file in the ~/data directory with a name ending in ".roles".
(b) Compare the data from the Extract and Prepare steps and produce a flat file "*.actions" listing actions to be applied to the Roles DB tables to synchronize them with the source data.

Load

Apply the actions from the previous step's "*.actions" file to update the data in the Roles DB tables.
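
For a hypothetical feed whose files are named foo.* ("foo" is just a placeholder here), the three steps leave the following trail of flat files in the ~/data directory. The person feed is a slight variation -- it writes separate roles_person.INSERT, roles_person.UPDATE, and roles_person.DELETE action files (see Example 2 below):

    ~/data/foo.warehouse    # written by Extract: full set of data from the external source
    ~/data/foo.roles        # written by Prepare: parallel data currently in the Roles DB
    ~/data/foo.actions      # written by Prepare, read by Load: actions needed to synchronize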

Running Extract, Prepare, and Load steps by hand

Normally, the 3 steps of most data feed processes are run automatically by shell scripts (~/cronjobs/morning_jobs_early, ~/cronjobs/morning_jobs_late, ~/cronjobs/evening_jobs, etc.)

However, when debugging or correcting a problem, it is possible to run any or all steps manually, and check the results after each step. There are two ways to do this:

  • Edit the appropriate shell script (e.g., ~/cronjobs/morning_jobs_early), and cut-and-paste a line into the Unix shell
  • Use one of two "menu" scripts in the ~/bin directory: run_procedure.pl or test_procedure.pl. These have different sets of data feed programs available. The "run" and "test" parts of the names are artifacts of previous work -- they both are equally useful for running jobs in production or test mode and just happen to contain references to different data feed programs.

It is often useful to follow this sequence:

  1. Run the Extract step to see if errors occur, and if not, look at the *.warehouse file in the ~/data directory to examine the intermediate results.
  2. Run the Prepare step to produce a "*.actions" file, and look at the resulting file. You can also use an editor to modify lines in the actions file, e.g., to remove one or more problematic lines that are preventing the Load step from running properly.
  3. Run the Load step to apply the modified set of actions.

The most common problem with nightly feeds is that there may be too many changes since the previous night, exceeding the max-records setting for the particular object type. We will describe techniques for dealing with this below.
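
A quick way to gauge the size of a night's change set, once the Prepare step has run, is to look at the feed's actions file, since each line is one pending action. For example, for the Room Set feed used in Example 1 below:

    wc -l ~/data/rset.actions    # how many actions the Prepare step produced
    head ~/data/rset.actions     # eyeball the first few pending actions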

Example 1:
To run the steps of the EHS PIs, Room Sets, and Rooms feed program on the production Roles database, connect to roles.mit.edu as user rolesdb and do the following:

  • From any directory, run test_procedure.pl (as mentioned above, different
    data feed steps are available in run_procedure.pl and test_procedure.pl)
  • Enter the number 22 to run the extract step
  • Enter 23 to run the prepare step
  • Enter 0 to stop
  • cd data (to get to the data directory)
  • tp (a little command wrapped around ls to list the most recent files
    in the current directory)
  • use emacs or vi to look at the files rset.actions, rset.warehouse, or
    rset.roles (the *.actions file lists the transactions to be performed)
  • Make adjustments in the rset.actions file, or use the web interface to
    change the MAX_RSET parameter
  • test_procedure.pl
  • Enter 24 to run the load step
  • Enter 0 to stop

Example 2:
To run the steps of the person feed program on the production Roles database, connect to roles.mit.edu as user rolesdb and do the following:

  • From any directory, run run_procedure.pl
  • Enter the number 16 to run the person extract step
  • Enter 17 to run the prepare step
  • Enter 0 to stop
  • cd data (to get to the data directory)
  • tp (a little command wrapped around ls to list the most recent files
    in the current directory)
  • use emacs or vi to look at files roles_person.DELETE, roles_person.UPDATE,
    or roles_person.INSERT
  • Make adjustments in one or more files, or use the web interface to
    change the MAX_PERSON parameter (max no. of person changes)
  • run_procedure.pl
  • Enter 18 to run the load step
  • Enter 0 to stop

Notification Email

There are three different ways that notification Email gets generated by the Roles DB nightly feed programs.

  1. Messages about Authorizations for inactive Kerberos usernames

If an Authorization exists where the person (Kerberos_name column) represents a username that is no longer active (no longer included in the KRB_PERSON table from the Warehouse), then notification Email will be sent.

The people who should be notified are specified by setting up authorizations in the Roles Database where the function is "NOTIFICATION - INACTIVE USERS" and the qualifier represents the function_category (application area) for which the person should receive notification.

The program ~/bin/roles_feed/exception_mail.pl finds all categories (column function_category in the authorization table) for which authorizations exist for deactivated Kerberos_names. The program then sends Email to the appropriate recipients (based on NOTIFICATION - INACTIVE USERS authorizations), one piece of Email per recipient for each function_category where there are one or more authorizations for inactive Kerberos_names. The Email lists the inactive usernames and the number of authorizations in the category that should be deleted or reassigned.

2. Errors detected by various data feed programs

Some errors detected by various data feed programs, usually in the LOAD step, result in Email being sent to a list of Email addresses stored in the file ~/lib/roles_notify. Currently, the addresses on the list are warehouse@mit.edu and repa@mit.edu.
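
To see who is currently on that list, you can simply display the file (the exact layout of the file is an assumption here, but it is just a list of Email addresses):

    cat ~/lib/roles_notify    # current recipients of data feed error notifications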

3. Full log files sent from various data feed programs

The full log from the LOAD step of most data feed programs, and the full log from data feed programs that do not have a separate LOAD step, are sent to one or more Email recipients. Within the cronjobs directory, the scripts morning_jobs_early, morning_jobs_late, evening_jobs, and weekend_jobs include steps that send out this Email. Currently repa@mit.edu is the only recipient.

These log files do not need to be examined every day. It is useful to periodically examine them for warning messages. It is also useful to have them available if a problem is detected. However, usually the Email sent out for detected error messages is sufficient for detecting problems.

Data feed errors that can occur

Most common errors:

  • Too many changes in one night
    o More than the expected number of transactions, due to a batch load of
      student data, EHS data, or some other source with an unusually high
      volume of changes.
      Common examples:
      - The WHCOST feed exceeds its limit because a change has been made to
        all cost objects related to one of the profit centers under the
        Department of Facilities. A couple of profit centers under the
        Department of Facilities have tens or hundreds of thousands of cost
        objects, and all of these cost objects tend to have the same
        supervisor and addressee. If the supervisor or addressee changes, or
        if the room number or other info about the supervisor or addressee
        changes for all cost objects under one of these huge profit centers,
        then the number of changes exceeds the 10000-changes-per-night
        limit. The solution is to increase the MAX_WHCOST parameter from its
        default to a number high enough to accommodate the changes for the
        day. This is described in the solution section below.
      - The person feed exceeds its limit each year on the morning following
        the annual purge of old Kerberos principals. The solution is to
        increase the MAX_PERSON parameter from its default (3000) to a
        higher number that accommodates the day's set of changes, as
        described in the solution below.
      - The person feed exceeds its limit each spring (May or June) when the
        graduating seniors lose their student status. The solution is to
        increase the MAX_PERSON parameter from its default (3000) to a
        higher number that accommodates the day's set of changes, as
        described in the solution below.
      Solution: Examine the data changes (see the section "Running Extract,
      Prepare, and Load steps by hand"). Determine whether the changes are
      legitimate (coming from source data) or the result of a system
      problem. If the changes are legitimate, increase the appropriate value
      in the ROLES_PARAMETERS table (see the section "Adjusting max-actions
      per day in the ROLES_PARAMETERS table" below in this document), and
      either rerun the 3 data feed steps by hand or wait for tomorrow
      morning's cron job.
    o EXTRACT step failed due to a network or server problem, so there was
      no data for the PREPARE and LOAD steps.
  • Deactivated users have one or more authorizations
    Solution: No immediate action is necessary. Peggie takes care of these
    issues over time.

Less frequent problems:

  • Run out of qualifier_ids in the range allocated for this type of
    qualifier. (This should now be very rare -- there should be lots of room
    for qualifiers of each type. However, each qualifier type has its own
    limited contiguous range of Qualifier IDs, so it is possible for any
    qualifier_type to overrun its limits. You'll see a unique index error if
    this happens.)
    Solution: See rolesweb.mit.edu/sys_admin_tasks.html for this task and
    others.
  • EXTRACT step failed because the format of the data in the Warehouse has
    changed without corresponding changes in the Roles data feed program.
    Solution: You might need to alter the data feed program in the Roles DB,
    in the bin/roles_feed directory.

Adjusting max-actions per day in the ROLES_PARAMETERS table

From rolesweb.mit.edu, click on "System Administrator tools", then click on "Update Roles DB parameters for data feeds and other processes".

  • The ROLES_PARAMETERS table contains records that define the maximum
    number of allowable changes for each of the incoming data feed programs.
    You can use the web UI to temporarily increase a limit if it appears
    that the incoming data is valid.
  • The web interface for maintaining data in the ROLES_PARAMETERS table can
    be found by starting at rolesweb.mit.edu, clicking "System
    administrators tasks", and clicking "Update Roles DB parameters for data
    feeds and other processes".
  • Pick the parameter you want to change, e.g., "MAX_PERSON", click
    "Change Parameter Value", and follow the instructions.
  • There are two values for each parameter, the so-called "parameter value"
    and the "default value". If you change a "parameter value", it will be
    reset to its default in the evening after the next day's data feeds. An
    authorization for the Roles DB function "MAINTAIN PARAMETER VALUES" is
    required for changing just the current value of a parameter. An
    authorization for the Roles DB function "MAINTAIN ALL PARAMETER DATA" is
    required for changing default values and other fields.

Other Roles system administrators' documentation

See rolesweb.mit.edu/sys_admin_tasks.html for documentation on various system administrator tasks.