Last modified 6/04/2010
When maintaining or debugging data feed programs, connect to either roles.mit.edu or roles-test.mit.edu as user rolesdb.
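For example, using ssh (the connection method may vary with your local setup):

```
# Production Roles server
ssh rolesdb@roles.mit.edu

# Test Roles server
ssh rolesdb@roles-test.mit.edu
```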
On the production Roles server (roles.mit.edu, aka cloverleaf) and the test Roles server (roles-test.mit.edu, aka parsley), there are several sets of data feed jobs that are automatically run each day. The overall schedule can be found in the rolesdb user's crontab file on each server.
If you ever need to change the crontab file, first display the current entries with the command crontab -l, then edit and reinstall them as sketched below.
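A minimal sketch using standard crontab commands (the backup file name is illustrative):

```
# Show the current crontab for the rolesdb user
crontab -l

# Save a backup copy before changing anything (file name is illustrative)
crontab -l > ~/crontab.bak

# Either edit in place with $EDITOR...
crontab -e

# ...or edit the saved copy and install it
crontab ~/crontab.bak
```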
Each crontab file runs several shell scripts that, in turn, run individual programs. These high-level shell scripts include:
| Script | Description |
|---|---|
| morning_jobs_early | Extracts most data from the Warehouse, including most types of Qualifiers, and does some processing for the Master Department Hierarchy. |
| morning_jobs_late | Runs some steps that depend on PERSON data from the Warehouse, including (a) loading the PERSON table from krb_person@warehouse, (b) loading EHS-related Room Set data from the Warehouse into RSET qualifiers, and (c) processing externally derived authorizations (table EXTERNAL_AUTH). |
| cron_run_exception_notify | Generates Email about Authorizations that still exist for people with deactivated Kerberos usernames. |
| weekend_jobs | Runs a procedure that performs an Oracle ANALYZE on all the tables in the Roles DB. |
| hourly_jobs | Currently runs only once a day, not hourly. Updates derived database tables ("shadow" tables) for the Master Department Hierarchy. |
| cron_roles_cleanup_archive | Cleans up some old files in the archive directory. |
| cron_run_sapfeed | Runs a job to create several daily files about SAP-related authorizations and uses the scp command to copy them to a directory on one of the SAP servers, where they are picked up, further processed, and loaded into the appropriate SAP objects. The files built and sent are incremental, including authorization information only for those people whose "expanded" authorizations (after including all child qualifiers) have changed since the previous run. |
| cron_pdorg_prog.sh | Compares SAP-related approver authorizations in the Roles DB with parallel information in the PD Org structures in SAP, and generates a file indicating differences. |
| cron_pddiff_feed.sh | Sends the file pdorg_roles.compare, generated by cron_pdorg_prog.sh, to the SAP dropbox. |
| cron_ehs_extract | Runs ~rolesdb/bin/ehs/run_ehs_role_prog.pl to compare DLC-level EHS roles (e.g., DEPARTMENTAL EHS COORDINATOR) with their equivalent Authorizations in the Roles DB (which is the system of record), generates a differences file, and sends it to the SAP dropbox so the changes can be applied. |
| Directory | Description |
|---|---|
| ~rolesdb/archive | Some compressed historical files from previous days' data feed runs |
| ~rolesdb/bin | Generic data feed perl scripts and other program files |
| ~rolesdb/bin/ehs | EHS-related data feed programs |
| ~rolesdb/bin/extract | Programs related to outgoing data for DACCA, LDS (an SAP component being phased out), etc. |
| ~rolesdb/bin/pdorg | Programs related to outgoing data for updating PD Org entries in SAP related to APPROVER authorizations |
| ~rolesdb/bin/repa_feed | Temporary or test versions of programs |
| ~rolesdb/bin/roles_feed | Most data feed programs for data coming into the Roles DB |
| ~rolesdb/data | Data files used by data feed programs. Most data files are temporary, but some, such as roles_person_extra.dat, are permanent. |
| ~rolesdb/doc | Miscellaneous notes and documentation |
| ~rolesdb/extract | Empty |
| ~rolesdb/lib | A few generic perl modules and some config files |
| ~rolesdb/log | Most recent log files from data feed programs |
| ~rolesdb/misc | Miscellaneous notes and working files |
| ~rolesdb/sap_feed | Obsolete versions of Roles->SAP data feed programs |
| ~rolesdb/sql | SQL source files for creating tables, views, and stored procedures. Files for creating tables (new_schema*.sql) are preserved for documentation purposes and should NOT be rerun; tables should never be dropped and recreated, since we do not want to lose the data. Files for creating stored procedures and views can be modified and rerun. |
| ~rolesdb/sql/frequently_run_scripts | Special SQL scripts that are run periodically, e.g., to analyze tables |
| ~rolesdb/xsap_feed/bin | Programs for the Roles->SAP data feed |
| ~rolesdb/xsap_feed/config | Config files for the Roles->SAP data feed |
| ~rolesdb/xsap_feed/data | Nightly data files for the Roles->SAP data feed |
Most data feed programs are perl modules, each maintaining one type of data in Roles DB tables, such as people in the PERSON table or one type of Qualifier (e.g., Funds/Funds Centers) in the QUALIFIER table. Each perl module has a separate subroutine for the Extract, Prepare, and Load steps.
The steps do the following:
| Step | Description |
|---|---|
| Extract | Extracts a full set of data for a particular type of object from the external source, generally the Data Warehouse. The data are written to a flat file in the ~/data directory, generally a file with a name ending in ".warehouse". The ".warehouse" suffix is used even if the source of the data is something other than the Warehouse. |
| Prepare | (a) Selects a full set of parallel data for the same type of object from the Roles DB tables into a flat file in the ~/data directory with a name ending in ".roles"; (b) compares the ".warehouse" and ".roles" data and records the differences as the "*.actions" set of changes to be applied. |
| Load | Applies the actions from the previous step's "*.actions" table to actually update the data in the Roles DB tables. |
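Between steps, the intermediate files in ~/data can be used to sanity-check the results. A minimal sketch (the "person" file names are illustrative; actual names vary by feed):

```
# Snapshot extracted from the external source (Extract step)
ls -l ~/data/person.warehouse

# Parallel snapshot selected from the Roles DB (Prepare step)
ls -l ~/data/person.roles

# Rough consistency check: compare record counts
wc -l ~/data/person.warehouse ~/data/person.roles
```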
Normally, the 3 steps of most data feed processes are run automatically by shell scripts in the ~/cronjobs directory (morning_jobs_early, morning_jobs_late, evening_jobs, etc.).
However, when debugging or correcting a problem, it is possible to run any or all steps manually, and check the results after each step. There are two ways to do this:
It is often useful to follow this sequence:
The most common problem with nightly feeds is that there may be too many changes since the previous night, exceeding the max-actions-per-day setting for the particular object type. We will describe techniques for dealing with this below.
Example 1: To run the steps of the EHS PIs, Room Sets, and Rooms feed program on the production Roles database, do the following:

- Connect to roles.mit.edu as user rolesdb
Example 2: To run the steps of the person feed program on the production Roles database, do the following:

- Connect to roles.mit.edu as user rolesdb
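The remaining steps vary by feed. As a hedged sketch of running the three steps by hand for the person feed (the driver script name, step arguments, and log file name below are hypothetical; check ~/bin and ~/bin/roles_feed for the actual programs):

```
# Connect to the production server first
ssh rolesdb@roles.mit.edu

# Hypothetical driver script and step arguments; check ~/bin/roles_feed
# for the real program names before running anything
cd ~/bin/roles_feed
./run_feed.pl person extract    # writes ~/data/person.warehouse
./run_feed.pl person prepare    # writes ~/data/person.roles and the actions
./run_feed.pl person load       # applies the actions to the Roles DB tables

# Inspect the most recent log after each step (log file name also illustrative)
less ~/log/person.log
```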
There are three different ways that notification Email gets generated by the Roles DB nightly feed programs.

1. Authorizations for people with deactivated Kerberos usernames

If an Authorization exists where the person (Kerberos_name column) represents a username that is no longer active (no longer included in the KRB_PERSON table from the Warehouse), then notification Email will be sent.
The people who should be notified are specified by setting up authorizations in the Roles Database where the function is "NOTIFICATION - INACTIVE USERS" and the qualifier represents the function_category (application area) for which the person should receive notification.
The program ~/bin/roles_feed/exception_mail.pl finds all categories (column function_category in the authorization table) for which authorizations exist for deactivated Kerberos_names. The program then sends Email to the appropriate recipients (based on NOTIFICATION - INACTIVE USERS authorizations), one piece of Email per recipient for each function_category where there are one or more authorizations for inactive Kerberos_names. The Email lists the inactive usernames and the number of authorizations in the category that should be deleted or reassigned.
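The underlying check is essentially a set difference between the authorization and krb_person tables. A hedged sketch of the idea (the sqlplus connect string is illustrative, and the exact schema may differ from the column names described above):

```
# Hedged sketch: list authorizations whose kerberos_name is no longer active,
# grouped by function_category (connect string is illustrative)
sqlplus rolesdb@roles <<'EOF'
SELECT a.function_category, a.kerberos_name, COUNT(*) AS num_auths
  FROM authorization a
 WHERE NOT EXISTS (SELECT 1
                     FROM krb_person k
                    WHERE k.kerberos_name = a.kerberos_name)
 GROUP BY a.function_category, a.kerberos_name
 ORDER BY a.function_category, a.kerberos_name;
EOF
```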
2. Errors detected by various data feed programs
Some errors are detected by various data feed programs, usually in the LOAD step; these result in Email being sent to a list of Email addresses stored in the file ~/lib/roles_notify. Currently, the list includes warehouse@mit.edu and repa@mit.edu.
3. Full log files sent from various data feed programs
The full log from the LOAD step of most data feed programs, and the full log from other data feed programs that do not have a separate LOAD step, are sent to one or more Email recipients. Within the cronjobs directory, the scripts morning_jobs_early, morning_jobs_late, evening_jobs, and weekend_jobs include steps that send out this Email. Currently repa@mit.edu is the only recipient.
These log files do not need to be examined every day, but it is useful to check them periodically for warning messages and to have them available when a problem is detected. Usually, though, the Email sent for detected errors is sufficient for catching problems.
o Too many changes since the previous night, exceeding the max-actions-per-day setting

Solution: Examine the data changes (see the section "Running Extract, Prepare, and Load steps by hand"). Determine whether the changes are legitimate or due to source data or system problems. If the changes are legitimate, increase the appropriate value in the ROLES_PARAMETERS table (see the "Adjusting max-action per day in the ROLES_PARAMETERS table" section below in this document), and either rerun the 3 data feed steps by hand or wait for tomorrow morning's cronjob.
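The normal way to adjust these values is the web interface described below. Purely as a hedged illustration of what such an adjustment involves (the column and parameter names here are hypothetical; inspect the actual ROLES_PARAMETERS rows first):

```
# Hypothetical column and parameter names; inspect the real table first
sqlplus rolesdb@roles <<'EOF'
-- Look at the current per-feed limits before changing anything
SELECT * FROM roles_parameters;

-- Raise the max-actions-per-day limit for one object type (names are illustrative)
UPDATE roles_parameters
   SET parameter_value = 5000
 WHERE parameter_name = 'PERSON_MAX_ACTIONS';
COMMIT;
EOF
```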
o EXTRACT step failed due to a network or server problem, and there was no data for the PREPARE and LOAD steps
From rolesweb.mit.edu, click on "System Administrator tools", then click on "Update Roles DB parameters for data feeds and other processes".
See rolesweb.mit.edu/sys_admin_tasks.html for documentation on various system administrator tasks.