Monitoring the Thalia production environment with Nagios

With check_http:

To see if the entire stack including the F5 is functional:

fetch http://hst.thalia.mit.edu/libraries
assertion: HTTP OK and result contains "<library-list>"

To see if the -5 IME is functional:

fetch http://isda-thalia5.mit.edu/libraries
assertion: HTTP OK and result contains "Unknown Domain"

To see if the -8 IME is functional:

fetch http://isda-thalia8.mit.edu/libraries
assertion: HTTP OK and result contains "Unknown Domain"

To see if the Alfresco application is running:

fetch http://isda-thalia6.mit.edu:8080/alfresco/faces/jsp/dashboards/container.jsp
assertion: HTTP OK and result contains "My Tasks To Do"
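
For anyone who wants to try these checks by hand before wiring them into Nagios, the four fetch/assertion pairs above can be sketched as a small Python script. This is only an illustration, not how check_http itself works: the names CHECKS, evaluate and run_check are made up for this sketch, "HTTP OK" is assumed to mean status 200, and any failure is assumed to map to the Nagios CRITICAL exit code (2).

```python
import urllib.error
import urllib.request

# Standard Nagios plugin exit codes: 0 = OK, 2 = CRITICAL.
OK, CRITICAL = 0, 2

# URL and expected substring for each check, copied from the list above.
# The dictionary keys are labels invented for this sketch.
CHECKS = {
    "full stack via F5": ("http://hst.thalia.mit.edu/libraries", "<library-list>"),
    "-5 IME": ("http://isda-thalia5.mit.edu/libraries", "Unknown Domain"),
    "-8 IME": ("http://isda-thalia8.mit.edu/libraries", "Unknown Domain"),
    "Alfresco": (
        "http://isda-thalia6.mit.edu:8080/alfresco/faces/jsp/dashboards/container.jsp",
        "My Tasks To Do",
    ),
}


def evaluate(status, body, expected):
    """Apply the assertion: HTTP 200 (assumed meaning of "HTTP OK") and body contains expected."""
    if status != 200:
        return CRITICAL, f"CRITICAL: HTTP status {status}"
    if expected not in body:
        return CRITICAL, f"CRITICAL: response does not contain {expected!r}"
    return OK, "OK: HTTP 200 and expected string found"


def run_check(url, expected, timeout=10):
    """Fetch url and return (exit_code, message) in Nagios plugin style."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            return evaluate(resp.status, body, expected)
    except urllib.error.HTTPError as exc:
        # Non-2xx responses are raised as HTTPError by urllib.
        return CRITICAL, f"CRITICAL: HTTP status {exc.code}"
    except OSError as exc:
        # Covers connection refused, DNS failure and timeouts.
        return CRITICAL, f"CRITICAL: {exc}"
```

Calling run_check(*CHECKS["Alfresco"]) returns the exit code and a one-line message; a wrapper script would print the message and exit with the code, which is the convention Nagios plugins follow.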

With check_db:

To see if MySQL is happy, point check_db at isda-thalia7.mit.edu (whoever sets up the check will need a database password); the plugin has a basic built-in database sanity check.

Notes:

We could also check things like basic Apache health, but as far as I know the app is either working or it's not. If it's not, we really don't care (for operational purposes) whether Apache is up: it's time to restart the entire Apache/Tomcat/app stack and pray.
