Many of our projects use the Alfresco repository for data storage and retrieval, so we need to keep our Alfresco repository instances as robust as possible.  Here is our plan for doing that.

First, a very basic overview of the building blocks of Alfresco: the Java application itself, which answers requests for content; a filesystem, which the application uses to store the content files; and a database, which the application uses to store content metadata.  (Alfresco also sells a Web Content Management application built on the basic Alfresco repository; that additional application is not considered here.)

The application will run on multiple servers, with load-balancing and failover managed by an F5 BIG-IP.  (Locking is arranged through the database, so race conditions between application instances aren't a concern.)
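
To make "load-balancing and failover" concrete, here is a toy sketch of the behavior we expect from the load balancer: requests rotate across the application instances, and an instance that fails its health check is skipped until it recovers.  This is only a model of the idea, not BIG-IP configuration; the `Pool` class and the host names are hypothetical.

```python
class Pool:
    """Toy model of a load-balanced pool with failover (not BIG-IP config)."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.healthy = set(self.nodes)  # nodes currently passing health checks
        self._next = 0                  # round-robin cursor into self.nodes

    def mark_down(self, node):
        """Record that a node failed its health check."""
        self.healthy.discard(node)

    def mark_up(self, node):
        """Record that a known node is passing health checks again."""
        if node in self.nodes:
            self.healthy.add(node)

    def pick(self):
        """Return the next healthy node in round-robin order."""
        for _ in range(len(self.nodes)):
            node = self.nodes[self._next]
            self._next = (self._next + 1) % len(self.nodes)
            if node in self.healthy:
                return node
        raise RuntimeError("no healthy Alfresco instances available")


# Hypothetical host names, for illustration only.
pool = Pool(["alfresco-1", "alfresco-2"])
pool.mark_down("alfresco-1")
print(pool.pick())  # every request now goes to alfresco-2
```

Because locking lives in the database, the model does not need to care which instance handled a client's previous request.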

The filesystem will be a shared network filesystem used by all instances of the application.  We tentatively plan to use the Acopia network storage switch to provide transparent replication and failover for this filesystem, which will look to the Alfresco application servers like a normal NFS filesystem.

The database will be an Oracle 10g installation.  Oracle offers mechanisms for running a database that spans multiple servers, which gives us redundancy and failover.  These mechanisms (http://www.oracle.com/technology/deploy/availability/htdocs/HA_Overview.htm) are moderately opaque to the casual reader, which probably explains why good Oracle DBAs are in such high demand.

Single points of failure under this plan:

  • fundamental infrastructure (the MIT power grid, MITnet)
  • the F5 BIG-IP (though redundancy is possible with multiple instances, this may be beyond scope)
  • the Acopia storage switch (again, though redundancy is probably possible with multiple instances, this may be beyond scope)
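
The list above can be derived mechanically: any component deployed as a single instance is a single point of failure.  The sketch below shows that bookkeeping; the instance counts are illustrative assumptions (we have not settled on how many application or database servers to run), and the function name is our own.

```python
def single_points_of_failure(components):
    """Return the names of components that have fewer than two instances."""
    return sorted(name for name, count in components.items() if count < 2)


# Instance counts are assumptions for illustration, not decisions from the plan.
plan = {
    "Alfresco application server": 2,
    "Oracle database server": 2,
    "F5 BIG-IP": 1,
    "Acopia storage switch": 1,
    "MIT power grid": 1,
    "MITnet": 1,
}

for name in single_points_of_failure(plan):
    print(name)
```

Raising the count for the BIG-IP or the Acopia switch to 2 removes it from the output, which is the redundancy option noted as possibly beyond scope.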