IPS team members should consult this playbook every time they participate in issue-resolution activity on behalf of ISDA.
Issue Definition
- Our primary point of contact for support issues is the isda-ops mailing list, isda-ops@mit.edu. Everyone in the team should be monitoring this mailing list and the ISDA::Admin RT queue on help.mit.edu (https://help.mit.edu/Search/Results.html?Order=DESC&Query=(%20Queue%20%3D%20'ISDA%3A%3AAdmin'%20)%20and%20(%20Status%20%3D%20'new'%20or%20Status%20%3D%20'open'%20or%20Status%20%3D%20'stalled'%20)&Rows=50&OrderBy=id&Page=1&Format=%0A%20%20%20'%3Cb%3E%3Ca%20href%3D%22%2FTicket%2FDisplay.html%3Fid%3D__id__%22%3E__id__%3C%2Fa%3E%3C%2Fb%3E%2FTITLE%3A%23'%2C%0A%20%20%20'%3Cb%3E%3Ca%20href%3D%22%2FTicket%2FDisplay.html%3Fid%3D__id__%22%3E__Subject__%3C%2Fa%3E%3C%2Fb%3E%2FTITLE%3ASubject'%2C%0A%20%20%20Status%2C%0A%20%20%20QueueName%2C%20%0A%20%20%20OwnerName%2C%20%0A%20%20%20Priority%2C%20%0A%20%20%20'__NEWLINE__'%2C%0A%20%20%20''%2C%20%0A%20%20%20'%3Csmall%3E__Requestors__%3C%2Fsmall%3E'%2C%0A%20%20%20'%3Csmall%3E__CreatedRelative__%3C%2Fsmall%3E'%2C%0A%20%20%20'%3Csmall%3E__ToldRelative__%3C%2Fsmall%3E'%2C%0A%20%20%20'%3Csmall%3E__LastUpdatedRelative__%3C%2Fsmall%3E'%2C%0A%20%20%20'%3Csmall%3E__TimeLeft__%3C%2Fsmall%3E'). Issues might also come in by phone call or other email.
- Email representing immediate operational issues should be forwarded to RT (via the isda-admin-rt list), while email representing bug reports should be filed in Jira.
- If someone contacts IPS through another channel, we email the issue to isda-ops ourselves (and file it appropriately in RT or Jira) to make sure the rest of the team is notified.
- Automated responses from monitoring software come through to isda-ops@mit.edu, where they can be monitored and forwarded to the RT queue if necessary.
- All team members in an Operations role who are on the floor must meet to discuss the issue and choose who will lead the issue-resolution cycle. This is the Tech Lead.
- If possible, they must include non-operational staff in the discussion who are assigned to, or conversant with, the system in question.
- The Tech Lead is not necessarily operations personnel. This assignment is up to staff available at the time of the issue report.
- The Tech Lead responds to the initiators of the message, alerting those parties that resolution is underway.
- The Tech Lead must define the type of issue and then proceed accordingly. Be sure to file the issue appropriately (Jira or RT) depending on type:
- Emergency Response: A system, whether that is a whole server or a particular application, is unresponsive.
- Bug Report: An application is not doing the right thing in some particular case, but it is not generally broken; a system is not down.
- N.B.: for Operations staff, both kinds of issues are priority #1. It is acceptable for all other responsibilities to be on hold until resolution (system down) or handoff (bug report).
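The filing rule above can be summarized as a small lookup. What follows is only a minimal Python sketch of that rule, not existing ISDA tooling: the names `IssueType`, `TRACKER_FOR`, and `route_issue` are hypothetical, and the single yes/no question ("is the system unresponsive?") is a simplification of the judgment the Tech Lead actually makes.

```python
from enum import Enum
from typing import Tuple


class IssueType(Enum):
    # A whole server or a particular application is unresponsive.
    EMERGENCY_RESPONSE = "emergency response"
    # The application misbehaves in a particular case, but the system is up.
    BUG_REPORT = "bug report"


# Per the playbook: immediate operational issues go to RT, bug reports to Jira.
TRACKER_FOR = {
    IssueType.EMERGENCY_RESPONSE: "RT",
    IssueType.BUG_REPORT: "Jira",
}


def route_issue(system_unresponsive: bool) -> Tuple[IssueType, str]:
    """Return the issue type and the tracker it should be filed in.

    Hypothetical helper: the boolean stands in for the Tech Lead's
    assessment of whether a system is down.
    """
    if system_unresponsive:
        issue_type = IssueType.EMERGENCY_RESPONSE
    else:
        issue_type = IssueType.BUG_REPORT
    return issue_type, TRACKER_FOR[issue_type]


if __name__ == "__main__":
    for down in (True, False):
        issue_type, tracker = route_issue(down)
        print(f"unresponsive={down}: {issue_type.value} -> file in {tracker}")
```

Keeping the type-to-tracker mapping in one table means a change in filing policy touches a single place.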
...
Urgent Response
...
(Resolution)
- For an unresponsive component, either the Server Operations team or an automated process should have attempted to restart the component.
- Someone familiar with the system must check to see if restart procedures occurred and if that temporarily resolved the problem.
- If not, Tech Lead or designee restarts component manually, determines if this resolves issue or if a more persistent problem exists.
- Notification: preliminary problem description (and resolution, if applicable) sent to Recipient List: initiator of problem ticket, zips@mit.edu, isda-leaders@mit.edu, isda-integrators@mit.edu, isda-ops@mit.edu, and if any end-user applications could have been affected, computing-help@mit.edu
- In conjunction with managers currently present, Tech Lead forms Team to troubleshoot issue.
- It is ZIPS's expectation that Emergency Response takes precedence over other project work.
- SCRUM: No resolution work should proceed until SCRUM is performed with available resources to discuss process and possible resolutions.
- Tech Lead is project manager for duration of issue resolution. Tech Lead is final arbiter for delegation of tasks, priorities, and timing.
- Notification: If resolution is lengthy, Tech Lead will update Recipient List at least once per day on the status of the resolution.
- Post Mortem: Tech Lead reviews response with IPS team lead. If emergency response offers the opportunity for improvement of process, Tech Lead calls a post-mortem with parties who participated in the resolution.
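The Recipient List rule from the Notification steps above can be sketched as a short function. This is an illustrative Python sketch only: the addresses are the ones listed in the playbook, while `build_recipient_list`, its parameters, and the de-duplication behavior are assumptions added for the example.

```python
from typing import List

# Addresses that always receive notifications, as listed in the playbook.
BASE_RECIPIENTS = [
    "zips@mit.edu",
    "isda-leaders@mit.edu",
    "isda-integrators@mit.edu",
    "isda-ops@mit.edu",
]

# Added only when end-user applications could have been affected.
END_USER_HELP = "computing-help@mit.edu"


def build_recipient_list(initiator: str, end_user_apps_affected: bool) -> List[str]:
    """Assemble the Recipient List for a notification (hypothetical helper)."""
    candidates = [initiator] + BASE_RECIPIENTS
    if end_user_apps_affected:
        candidates.append(END_USER_HELP)

    # Drop duplicates (case-insensitively) while preserving order, in case
    # the initiator wrote from one of the list addresses.
    seen = set()
    recipients = []
    for address in candidates:
        key = address.strip().lower()
        if key and key not in seen:
            seen.add(key)
            recipients.append(address.strip())
    return recipients


if __name__ == "__main__":
    # "reporter@example.com" is a placeholder initiator address.
    print(build_recipient_list("reporter@example.com", end_user_apps_affected=True))
```

The same list can be reused for the daily status updates during a lengthy resolution, so every party receives the whole thread.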
...