Procedure to restore or restart Thalia or Alfresco services

When the system is pingable and you can log in as root, but Nagios reports that the Thalia, Alfresco, or MySQL services are down, or customers report that Thalia is down, restart the Thalia cluster:

  1. Identify the Thalia IME servers in the current cluster and stop the Thalia service. As of 24 July 2007, these are isda-thalia5 and isda-thalia8. If there are none, proceed to step 2.
    1. Log in to each system as root and stop the Thalia services with the following command:
      /etc/init.d/web stop
    2. Wait for the initialization script to report that the services are stopped.
    3. Repeat on all Thalia IME servers in this cluster.
  2. Identify the Alfresco servers in the current cluster and stop the Alfresco service. As of 24 July 2007, this is isda-thalia6.
    1. Log in to each system as root and stop the Alfresco services with the following command:
      /etc/init.d/web stop
    2. Wait for the initialization script to report that the services are stopped.
    3. Repeat on all Alfresco servers in this cluster.
  3. Identify the MySQL servers in the current cluster and stop the MySQL service. As of 24 July 2007, this is isda-thalia7.
    1. Log in to each system as root and stop MySQL with the following command:
      /etc/init.d/mysql stop
    2. Wait for the initialization script to report that the services are stopped.
    3. Repeat on all MySQL servers in this cluster.
  4. Start the MySQL servers in the current cluster.
    1. Log in to each system as root and start MySQL with the following command:
      /etc/init.d/mysql start
    2. Wait for the initialization script to report that the services are started.
    3. Check that MySQL has restarted successfully with the Nagios test plugin for this server. If this step fails, escalate to ISDA.
    4. Repeat on all MySQL servers in this cluster.
  5. Start the Alfresco servers in the current cluster.
    1. Log in to each system as root and start Alfresco with the following command:
      /etc/init.d/web start
    2. Wait for the initialization script to report that the services are started.
    3. Check that Alfresco has restarted successfully with the Nagios test plugin for this server. If this step fails, escalate to ISDA.
    4. Repeat on all Alfresco servers in this cluster.
  6. Start the Thalia IME servers in the current cluster.
    1. Log in to each system as root and start Thalia IME with the following command:
      /etc/init.d/web start
    2. Wait for the initialization script to report that the services are started.
    3. Check that Thalia has restarted successfully with the Nagios test plugin for this server. If this step fails, escalate to ISDA.
    4. Repeat on all Thalia IME servers in this cluster.
  7. Check the Nagios reports for this cluster. If Nagios continues to report errors for this cluster, escalate to ISDA.
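
The ordering in the steps above (stop Thalia IME, then Alfresco, then MySQL; start in the reverse order) can be sketched as a small shell script. This is only an illustration and not an official tool: it assumes root SSH access from an admin host, which this page does not document, and the names RUN, service_all, and restart_cluster are hypothetical. The host lists are those given as of 24 July 2007. The script does not run the Nagios test plugins, so the checks in steps 4 to 7 still have to be made by hand.

```shell
#!/bin/sh
# Sketch of the restart ordering only; not a replacement for the manual checks.
# Host lists as of 24 July 2007; update them if the cluster changes.
THALIA_HOSTS="isda-thalia5 isda-thalia8"
ALFRESCO_HOSTS="isda-thalia6"
MYSQL_HOSTS="isda-thalia7"

# Hypothetical dry-run hook: by default each command is only printed.
# Export RUN as an empty string to execute for real (assumes root SSH access).
RUN="${RUN-echo}"

# Run one init script action on every host in a list, stopping at the first failure.
service_all() {
    script="$1"
    action="$2"
    hosts="$3"
    for host in $hosts; do
        $RUN ssh "root@$host" "/etc/init.d/$script $action" || return 1
    done
}

restart_cluster() {
    # Stop from the front end to the database.
    service_all web stop "$THALIA_HOSTS" || return 1
    service_all web stop "$ALFRESCO_HOSTS" || return 1
    service_all mysql stop "$MYSQL_HOSTS" || return 1
    # Start from the database to the front end.
    service_all mysql start "$MYSQL_HOSTS" || return 1
    service_all web start "$ALFRESCO_HOSTS" || return 1
    service_all web start "$THALIA_HOSTS" || return 1
}

restart_cluster
```

Run as written, the script prints the commands it would issue, in order, so the sequence can be reviewed before anything is touched. Because each ssh call returns only when the init script exits, the "wait for the initialization script" steps are implicit.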

If the system is unpingable, or you are unable to log in as root, diagnose the system, network, or hardware as you would for any down or crashed system.

Escalate to ISDA by calling the following people, in order:

Hunter Heinlen
Catherine Iannuzzo
Andrew Boardman
