IS&T, as far as I know, does not have an across-the-board standard, with the possible exception of letting TSM back up everything on a server.

ASST (Server Ops) manages the TSM backup service.  The automatic system they set up on all of their servers nightly backs up all files on a server that have changed, including marking when files are deleted.  They have a staged system, where things are stored in a 'ready reserve' RAID array, then a slower file store system, and then a tape archive.  Things get transferred to tape after about (I believe) a year, and the tapes are kept for 7 years, since that is the period required for business records.  If we are going to use their service for our archives, then the archives will be kept for 7 years by default.  I am not certain whether they will hold on to the tapes for less time, since separating our files on a tape from the files of other systems on the same tape takes work, and the process is largely automatic.  The Enterprise solution, which will store up to 100 Tbytes, costs $65 a month.
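The nightly changed-files backup that TSM performs can be sketched roughly as follows. This is a minimal illustration of the idea (back up what changed since last night, and record deletions), not TSM itself; the JSON manifest and the paths are my own invention.

```python
import json
import shutil
from pathlib import Path


def nightly_backup(source: Path, backup: Path, manifest_path: Path):
    """Copy files changed since the last run and return the list of
    files that were deleted since then -- roughly what an incremental
    nightly backup does."""
    previous = {}
    if manifest_path.exists():
        previous = json.loads(manifest_path.read_text())

    current = {}
    for f in source.rglob("*"):
        if f.is_file():
            rel = str(f.relative_to(source))
            current[rel] = f.stat().st_mtime
            # Back up files that are new or changed since last night.
            if previous.get(rel) != current[rel]:
                dest = backup / rel
                dest.parent.mkdir(parents=True, exist_ok=True)
                shutil.copy2(f, dest)

    # Mark files that existed last night but are gone now.
    deleted = sorted(set(previous) - set(current))
    manifest_path.write_text(json.dumps(current))
    return deleted
```

A real service layers the staged storage (RAID, file store, tape) and retention policy on top of this basic changed-since-last-run loop.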

As for what Stellar does, I have asked Craig, and this is his response:

Quick answer: layers of sometimes complex stuff.  You can check with ops on the details of what they handle.

Database: live replication to standby databases; GQL can explain.

Files: Stellar stores them twice on separate partitions just in case, and a script copies the new files to the backup server.

That's the main approach: trying to be sure nothing gets lost and that recovery from disaster doesn't require reloading gigabytes.

But ops also runs TSM on the machines, backing up the files and the Oracle database logs in a way which in principle allows recovery.  Check with them on the details of scheduling etc.
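The "stores them twice on separate partitions" step Craig describes could look roughly like this. This is a minimal sketch, not Stellar's actual script; the read-back verification is my own addition, and the function name and layout are hypothetical.

```python
import hashlib
import shutil
from pathlib import Path


def store_twice(data: bytes, name: str, primary: Path, mirror: Path) -> None:
    """Write the same content to two locations (ideally on separate
    partitions), verifying each copy, so a single-disk failure or a
    partial write cannot silently lose the file."""
    digest = hashlib.sha256(data).hexdigest()
    for root in (primary, mirror):
        target = root / name
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(data)
        # Re-read and verify so corruption is caught at write time.
        if hashlib.sha256(target.read_bytes()).hexdigest() != digest:
            raise IOError(f"verification failed for {target}")
```

The separate script that copies new files to the backup server would then only need to sweep one of the two copies.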

As for what we are currently doing: we are building an archive file of all of the files in the Alfresco repository weekly, with changed files put into an incremental archive file daily, and those plus a transactional dump of the database get stored onto our archive server daily.  The next day, all of those files get stored into TSM for long-term storage.  They are currently held on the local machines for 2 weeks, and on our archive server for 6 months, but I will soon be deploying a new backup script that, instead of building an archive file, builds a copy of the directory structure on the archive server, eliminates the local archive stores, and decreases the storage time on our storage server to 4 months.  Everything would still be stored into TSM the next day.  I am also planning to switch over to keeping a 'living' backup on the recovery servers, to shorten the mean time from the emergency outage phone call to being back online.  Unless we decide that something different should be done, that is the plan for the next iteration of updates to the Thalia service readiness configuration.

Technically, all files on the Thalia repository servers are being

stored into TSM on the day they are created/changed, and their deletions are marked.  However, because Alfresco needs to keep the state of the database synchronized with the state of the file store, these file stores are not reliable on their own, and may not work if used to restore a Thalia repository.  If we could synchronize the database backup with the TSM backup, then we might be able to avoid most of this mess altogether, but at this point in time we do not have that ability.

We'd like to keep them for as long as is prudent, from a systems administration standpoint, and to cover any business needs.

Well, as it stands now, we are able to keep the archives on our servers for 4 to 6 months, and in TSM for up to 7 years, without changing the technology we are using.  Storing archives for less time merely saves disk/tape space, which is already purchased or provided for.

The only downside, which I forgot to include previously, is that the archives are being stored on one of our systems, not one provided to us by Server Ops.  Wilson had previously stated that this was not acceptable as a long-term strategy, because we do not have access to the colo 24/7, though we can get a Server Ops technician over a weekend for a leased system.

Also, the system in question, al-dente.mit.edu, only has a 2-Terabyte RAID array in it.  While this means that 2 drives would need to fail before we lose data, it also means that this setup will not see us through the end of 2009 if we continue to perform complete backups on a weekly basis and only archive changes during the week (and if our current projections of storage usage are accurate).  Given that the amount of storage space provided to us by Server Ops on the Alfresco repositories is also below the projected 664 Gbytes, some plans will need to be made in the not-distant future.
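The capacity concern is simple arithmetic. The helper below is illustrative only: the 2 Tbyte capacity and the 664 Gbyte full-archive size come from the figures above, but the incremental size and retention window are assumed values, and the function itself is not part of our tooling.

```python
import math


def weeks_until_full(capacity_gb: float, full_gb: float,
                     incremental_gb: float, retained_weeks: int):
    """Estimate weeks until an archive store overflows, assuming one
    full archive per week plus six daily incrementals, with archives
    expired after retained_weeks.  Returns None if expiry keeps pace
    and the store never fills."""
    weekly = full_gb + 6 * incremental_gb
    if retained_weeks * weekly <= capacity_gb:
        return None  # steady-state usage fits within capacity
    # Before any expiry kicks in, usage grows by `weekly` per week.
    return int(capacity_gb // weekly) + 1
```

With a 2,000 Gbyte array, 664 Gbyte weekly fulls, and (assumed) 5 Gbyte daily incrementals retained for roughly 4 months (17 weeks), the array overflows in the third week, which is why moving away from weekly full archive files matters.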
