...
- Issues should be emailed directly to RT via isda-support@mitops@mit.edu. Everyone on the team should be monitoring the ISDA::customer-support Admin queue. Issues might also come in by phone or through other email addresses.
- If someone contacts IPS through another channel, we email the issue to the Request Tracker list ourselves, to make sure the rest of the team is notified.
- Automated responses from monitoring software come through to isda-ops@mit.edu, where they can be triaged by ISDA Ops and emailed to the RT queue if necessary.
- Team members in an Operations role who are on the floor discuss the issue and choose who will lead the issue-resolution cycle. This is the Tech Lead.
- The Tech Lead responds to the initiators of the message, alerting those parties that resolution is underway.
- The Tech Lead must define the type of issue and then proceed accordingly. Be sure to flag the issue in Request Tracker as the type that you have determined:
- Emergency Response: A system, whether that is a whole server or a particular application, is unresponsive.
- Bug Report: An application is not doing the right thing in some particular case, but it is not generally broken; a system is not down.
- For Operations staff, both kinds of issues are priority #1. It is acceptable for all other responsibilities to be on hold until resolution (system down) or handoff (bug report).
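
The triage rule above can be sketched in a few lines of Python. This is only an illustration of the decision, not a tool that exists in Request Tracker or at ISDA; the names `IssueType`, `classify_issue`, and `work_may_resume` are hypothetical, and the single `system_responsive` flag is a simplifying assumption.

```python
from enum import Enum


class IssueType(Enum):
    # The two issue types defined in the procedure above.
    EMERGENCY_RESPONSE = "Emergency Response"
    BUG_REPORT = "Bug Report"


def classify_issue(system_responsive: bool) -> IssueType:
    """Hypothetical helper: an unresponsive server or application is an
    Emergency Response; anything else is treated as a Bug Report."""
    if not system_responsive:
        return IssueType.EMERGENCY_RESPONSE
    return IssueType.BUG_REPORT


def work_may_resume(issue_type: IssueType, resolved: bool, handed_off: bool) -> bool:
    """Other responsibilities stay on hold until resolution (system down)
    or handoff (bug report). Assumption: a resolved bug report also
    releases the hold."""
    if issue_type is IssueType.EMERGENCY_RESPONSE:
        return resolved
    return resolved or handed_off
```

For example, `classify_issue(system_responsive=False)` yields `IssueType.EMERGENCY_RESPONSE`, and the Tech Lead would flag the ticket accordingly.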
...
- For an unresponsive component, either the Server Operations team or an automated process should have attempted to restart the component.
- Check to see if restart procedures occurred and if that temporarily resolved the problem.
- If not, Tech Lead restarts the component manually and determines whether this resolves the issue or whether a more persistent problem exists.
- Notification: preliminary problem description (and resolution, if applicable) sent to Recipient List: initiator of problem ticket, zips@mit.edu, isda-leaders@mit.edu, isda-integrators@mit.edu, isda-ops@mit.edu, and, if any end-user applications could have been affected, computing-help@mit.edu.
- In conjunction with managers currently present, Tech Lead forms Team to troubleshoot issue.
- It is the ZIPS expectation that Emergency Response takes precedence over other project work.
- SCRUM: No resolution work should proceed until SCRUM is performed with available resources to discuss process and possible resolutions.
- Tech Lead is project manager for duration of issue resolution. Tech Lead is final arbiter for delegation of tasks, priorities, and timing.
- Notification: If resolution is lengthy, Tech Lead will update the Recipient List on the status of the resolution at least once per day.
- Post Mortem: Tech Lead reviews response with IPS team lead. If emergency response offers the opportunity for improvement of process, Tech Lead calls a post-mortem with parties who participated in the resolution.
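
As a rough sketch of how the Recipient List from the Notification step might be assembled, the following Python builds the list from the addresses named above and adds computing-help@mit.edu only when end-user applications could have been affected. The function name `build_recipient_list` and its parameters are hypothetical; nothing here is an existing ISDA or Request Tracker script.

```python
from typing import List

# Addresses taken from the Notification step above.
BASE_RECIPIENTS = [
    "zips@mit.edu",
    "isda-leaders@mit.edu",
    "isda-integrators@mit.edu",
    "isda-ops@mit.edu",
]
END_USER_HELP = "computing-help@mit.edu"


def build_recipient_list(initiator: str, end_user_apps_affected: bool) -> List[str]:
    """Hypothetical helper: initiator first, then the standing lists,
    plus the help desk if end-user applications could have been affected."""
    recipients = [initiator, *BASE_RECIPIENTS]
    if end_user_apps_affected:
        recipients.append(END_USER_HELP)
    # Remove duplicates while preserving order, in case the initiator
    # is already one of the standing lists.
    return list(dict.fromkeys(recipients))
```

The same list can be reused for the daily status updates, so the initial notification and later updates reach the same parties.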
...