Concurrent Database Updates

${renderedContent}

Quick Links to:

About this Page

This document outlines the issues involved in dealing with concurrent database updates in our web apps - the situation where more than one user tries to update the same piece of data at the same time. See "Chosen Solution" below for... our chosen solution.

Example Situation

To set the scene, here's a typical situation in our web applications:

David pulls up data in his browser. Goes to lunch.
Felicia pulls up same data, makes a change and saves. Felicia's changes are committed to the DB.
David comes back from lunch, changes the data on his screen and saves. David's changes are committed, overwriting Felicia's changes.

Obviously this isn't good; when David finally submits changes, we need the app to detect that someone else has changed the data, and to handle the situation appropriately. Two aspects to doing this are:

Detection - how to detect a concurrent update
Handling - what to do if we detect a concurrent update

We'll take these in reverse order:

Handling a Concurrent Update Situation

There are three approaches to handling a concurrent update situation:

Last commit wins. This is the situation described above, where David wipes out Felicia's changes. This is what we're doing most of the time in our apps, and it does not involve any detection of a concurrent update.
First commit wins. In this scenario, David would come back from lunch, attempt to save his changes, and see an error message "Someone else has changed the data. Your changes have not been saved". David's changes would not be saved to the DB. This does require detection of a concurrent update.
Merge. Here, David would be presented with a message telling him that someone else had changed the data, and would be provided with a UI to give him a way of merging his changes with Felicia's changes. This does require detection of a concurrent update.

Unless there's a clear business requirement for the merge option, I think we need to take the second approach, "first commit wins". In order to pursue this approach, we do need to have a mechanism for detecting concurrent updates. Which leads to...

Detection

Hibernate does offer a facility for detecting concurrent updates by providing a version attribute. A database column (usually integer or timestamp) can be designated as a version field - for new tables we typically use the integer type.

However, the way most of our web apps are written, this offers only limited protection. Generally, when a user updates data in a web app, there are two steps:

User requests the data; we read the data from the database and present it on the screen.
User changes data on the screen and submits.

There can be a significant period of time between these two steps.

In step 2 (posting of data to be updated), we typically do this in the same HTTP request:

re-read the data from the database
apply the changes from the web form to this data
save the modified data & commit

Because we re-read the data, the version number check offered by Hibernate only will protect us from concurrent updates taking place during this request. As the request is typically very fast (<1s), this is unlikely to happen.

What we want to do is to detect any changes in the database that occurred between the data-retrieval step (step 1) and the data-posting step (step 2) (i.e. while David is at lunch). For this to happen, step 2 needs to know what the version field was in step 1. So far I've read about three techniques for doing this:

The web app stores the version number in the web page as a hidden field in step 1. The form submission in step 2 then submits this version number along with the rest of the data, and then in 2b ("apply the changes from the web form to this data") the version number from the form is copied into the domain object. When the data is comitted, Hibernate should then pick up the version mismatch.
At the end of step 1, the domain object is stored in the user's HttpSession - it becomes a "detached object" once the Hibernate Session is closed at the end of the request. Then step 2, instead of re-reading the data from the database, would get the object from the HttpSession and attach it to the newly created Hibernate Session. In this way, the object would contain the original version number, and Hibernate would catch version mismatch on commit.
At the end of step 1, the Hibernate Session is stored in the user's HttpSession. The Hibernate Session remains active for the entire "conversation" - this technique is described as "session-per-conversation". Step 2 would retrieve the Hibernate Session from the user's HttpSession and all domain objects associated with the Hibernate Session in step 1 would be available to step 2. In this approach, it would be essential for the end of step 1 to disconnect the JDBC connection; and for the beginning of step 2 to establish a new JDBC connection.

Chosen Solution

For Education Systems web apps we have decided to use detection option 1 - passing the version number to the JSP in the GET request, and moving the version number into the retrieved Hibernate entity during the POST request. Hibernate itself will then take care of checking version number on update. If a concurrency problem is encountered, Spring should throw a org.springframework.orm.hibernate3.HibernateOptimisticLockingFailureException. This exception should be intercepted by the web app and an appropriate UI action should take place (for example, forwarding to an error page, or displaying a message on the current page).

There is a simple web app demonstrating this scheme. It uses an in-memory H2 database so should be self-contained :

svn+ssh://svn.mit.edu/csf-playground/web-concurrency

Also there is a non-web project that uses a unit test [TestConcurrency.testConcurrentUpdate()] to illustrate the concurrent update situation:

svn+ssh://svn.mit.edu/csf-playground/hibernate-concurrency

NOTE: if the web page displays multiple objects that can be updated, the version numbers of all the objects must be included in the JSP, and must all be applied to the relevant entity objects when the form data is posted.

IMPORTANT: Load vs Get

There is a difference between Hibernate's load and get methods for retrieving data that could be important in checking for concurrent updates:

load does not issue a SQL SELECT statement to retrieve data. It simply creates a proxy object that follows the structure of the relevant domain class.
get issues an immediate SQL SELECT statement, creating a new domain class containing the data read from the database.

The reason this is important for concurrent update checking is that a proxy object generated by a load will not contain the version information we need.

It probably makes sense then to always use a hibernate get for retrieval.

This outline shows the issue - two sessions working with an object with the same identifier. Initial version for object in DB is 10.

    Session 1                  Session 2

    ---------                  ---------

    Load object

    Wait a while..

                               Load object

                               Modify object property

                               [triggers db 'select' -

                                version read as 10]

                               Commit

                               [triggers db update,

                                version modified to 11]

    Modify object property

      [triggers db 'select' -

      version read as 11]

    Commit

      [triggers db update,

      version modified to 12]

We actually want session 1's commit to fail with an optimistic lock exception, but it will succeed here.

Using "get" instead of "load" works around the problem, because get will immediately issue a select, and the version numbers will be loaded at the correct times for the optimistic lock checking.