IPS team members should consult this playbook every time they participate in issue-resolution activity on behalf of ISDA.

Issue Definition

  1. Our primary point of contact for support issues is the isda-ops mailing list (isda-ops@mit.edu). Everyone on the team should be monitoring this mailing list and the ISDA::Admin RT queue on help.mit.edu (https://help.mit.edu/Search/Results.html?Order=DESC&Query=(%20Queue%20%3D%20'ISDA%3A%3AAdmin'%20)%20and%20(%20Status%20%3D%20'new'%20or%20Status%20%3D%20'open'%20or%20Status%20%3D%20'stalled'%20)&Rows=50&OrderBy=id&Page=1&Format=%0A%20%20%20'%3Cb%3E%3Ca%20href%3D%22%2FTicket%2FDisplay.html%3Fid%3D__id__%22%3E__id__%3C%2Fa%3E%3C%2Fb%3E%2FTITLE%3A%23'%2C%0A%20%20%20'%3Cb%3E%3Ca%20href%3D%22%2FTicket%2FDisplay.html%3Fid%3D__id__%22%3E__Subject__%3C%2Fa%3E%3C%2Fb%3E%2FTITLE%3ASubject'%2C%0A%20%20%20Status%2C%0A%20%20%20QueueName%2C%20%0A%20%20%20OwnerName%2C%20%0A%20%20%20Priority%2C%20%0A%20%20%20'__NEWLINE__'%2C%0A%20%20%20''%2C%20%0A%20%20%20'%3Csmall%3E__Requestors__%3C%2Fsmall%3E'%2C%0A%20%20%20'%3Csmall%3E__CreatedRelative__%3C%2Fsmall%3E'%2C%0A%20%20%20'%3Csmall%3E__ToldRelative__%3C%2Fsmall%3E'%2C%0A%20%20%20'%3Csmall%3E__LastUpdatedRelative__%3C%2Fsmall%3E'%2C%0A%20%20%20'%3Csmall%3E__TimeLeft__%3C%2Fsmall%3E'). Issues might also come in by phone call or other email.
    1. Email representing immediate operational issues should be forwarded to RT (via the isda-admin-rt list), while email representing bug reports should be filed in Jira.
    2. If someone makes a support request
    3. If someone contacts IPS through another channel, we email the issue to isda-ops (and file it appropriately in RT or Jira) to make sure the rest of the team is notified.
    4. Automated responses from monitoring software come through to isda-ops@mit.edu, where they can be monitored and forwarded to the RT queue if necessary.
  2. All team members in an Operations role who are on the floor must meet to discuss the issue and choose who will lead the issue-resolution cycle. This is the Tech Lead.
    1. If possible, they must include non-operational staff in the discussion who are assigned to, or conversant with, the system in question.
    2. The Tech Lead is not necessarily operations personnel. This assignment is up to staff available at the time of the issue report.
  3. The Tech Lead responds to the initiators of the message, alerting those parties that resolution is underway.
  4. The Tech Lead must define the type of issue and then proceed accordingly. Be sure to file the issue appropriately (Jira or RT) depending on type:
    1. Urgent Response: A system, whether that is a whole server or a particular application, is unresponsive.
    2. Bug Report: An application is not doing the right thing in some particular case, but it is not generally broken; a system is not down.
    3. N.B.: for Operations staff, both kinds of issues are priority #1. It is acceptable for all other responsibilities to be on hold until resolution (system down) or handoff (bug report).
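
The filing rule above can be sketched in a few lines of Python. This is only an illustration, not a tool the team is known to use: the names `IssueType`, `classify`, and `tracker_for` are hypothetical, and the single yes/no input is a simplifying assumption. The only part taken from the playbook is the mapping itself: a system that is down is an immediate operational issue and goes to RT, while a bug report goes to Jira.

```python
from enum import Enum


class IssueType(Enum):
    """Hypothetical labels for the two kinds of issue the playbook describes."""
    SYSTEM_DOWN = "system down"  # a whole server or an application is unresponsive
    BUG_REPORT = "bug report"    # wrong behavior in a particular case; nothing is down


def classify(system_unresponsive: bool) -> IssueType:
    """Assumed simplification: decide the type from one yes/no question."""
    if system_unresponsive:
        return IssueType.SYSTEM_DOWN
    return IssueType.BUG_REPORT


def tracker_for(issue_type: IssueType) -> str:
    """Return where the Tech Lead files the issue: RT or Jira."""
    if issue_type is IssueType.SYSTEM_DOWN:
        return "RT"
    return "Jira"
```

For example, `tracker_for(classify(True))` gives `"RT"`, and `tracker_for(classify(False))` gives `"Jira"`. In practice the Tech Lead makes this call by judgment; the sketch only records where each outcome is filed.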

...

Urgent Response

...

(Resolution)

  1. For an unresponsive component, either the Server Operations team or an automated process should have attempted to restart the component.
    1. Someone familiar with the system must check whether restart procedures occurred and whether that temporarily resolved the problem.
    2. If not, the Tech Lead or a designee restarts the component manually and determines whether this resolves the issue or whether a more persistent problem exists.
  2. Notification: preliminary problem description (and resolution, if applicable) sent to Recipient List:
    initiator of problem ticket, zips@mit.edu, isda-leaders@mit.edu, isda-integrators@mit.edu, isda-ops@mit.edu, and if any end-user applications could have been affected, computing-help@mit.edu
  3. In conjunction with managers currently present, Tech Lead forms Team to troubleshoot issue.
    1. It is the ZIPS expectation that Emergency Response takes precedence over other project work.
  4. SCRUM: No resolution work should proceed until SCRUM is performed with available resources to discuss process and possible resolutions.
    1. Tech Lead is project manager for duration of issue resolution. Tech Lead is final arbiter for delegation of tasks, priorities, and timing.
  5. Notification: If resolution is lengthy, the Tech Lead will update the Recipient List on the status of the resolution at least once per day.
  6. Post Mortem: Tech Lead reviews response with IPS team lead. If emergency response offers the opportunity for improvement of process, Tech Lead calls a post-mortem with parties who participated in the resolution.
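
The Recipient List used in the notification steps above could be assembled as in the following sketch. The addresses are the ones listed in this playbook; the function name `recipient_list`, its parameters, and the de-duplication step are assumptions added for illustration, not an existing script.

```python
from typing import List

# Addresses that always receive the notification, as listed in the playbook.
BASE_RECIPIENTS = [
    "zips@mit.edu",
    "isda-leaders@mit.edu",
    "isda-integrators@mit.edu",
    "isda-ops@mit.edu",
]

# Added only when end-user applications could have been affected.
END_USER_HELP = "computing-help@mit.edu"


def recipient_list(initiator: str, end_users_affected: bool) -> List[str]:
    """Build the Recipient List for a problem ticket (hypothetical helper)."""
    recipients = [initiator] + BASE_RECIPIENTS
    if end_users_affected:
        recipients.append(END_USER_HELP)
    # Assumption: drop duplicates (e.g. the initiator is isda-ops itself)
    # while preserving order.
    return list(dict.fromkeys(recipients))
```

Calling `recipient_list("someone@example.edu", end_users_affected=True)` returns the initiator, the four standing addresses, and computing-help@mit.edu, in that order. The same list would be reused for the daily status updates during a lengthy resolution.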

...