This document outlines the issues involved in dealing with concurrent database updates in our web apps - the situation where more than one user tries to update the same piece of data at the same time. This is an ongoing discussion, and the document is an attempt to gather thoughts and ideas together.
Example Situation
To set the scene, here's a typical situation in our web applications:
- User A pulls up data in their browser. Goes to lunch.
- User B pulls up same data, makes a change and saves. Changes are committed to the DB.
- User A comes back from lunch, changes the data on their screen and saves. Changes are committed, overwriting User B's changes.
Obviously this isn't good; when User A finally submits changes, we need the app to detect that someone else has changed the data, and to handle the situation appropriately. Two aspects to doing this are:
- Detection - how to detect a concurrent update
- Handling - what to do if we detect a concurrent update
We'll take these in reverse order:
Handling a Concurrent Update Situation
There are three approaches to handling a concurrent update situation:
- Last commit wins. This is the situation described above, where User A wipes out User B's changes. This is what we're doing most of the time in our apps, and it does not involve any detection of a concurrent update.
- First commit wins. In this scenario, User A would come back from lunch, attempt to save their changes, and see an error message "Someone else has changed the data. Your changes have not been saved". User A's changes would not be saved to the DB. This does require detection of a concurrent update.
- Merge. Here, User A would be presented with a message telling them that someone else had changed the data, and would be provided with a UI to give them a way of merging their changes with User B's changes. This does require detection of a concurrent update.
Unless there's a clear business requirement for the merge option, I think we need to take the second approach, "first commit wins". In order to pursue this approach, we do need to have a mechanism for detecting concurrent updates.
Detection
Hibernate does offer a facility for detecting concurrent updates be providing a version attribute. A database column (integer or timestamp) can be designated as a version field - for new tables we typically use the integer type.
However, the way most of our web apps are written, this offers only narrow protection. Generally, when a user updates data in a web app, there are two steps:
1. User requests the data; we read the data from the database and present it on the screen.
2. User changes data on the screen and submits.
There can be a significant period of time between these two steps.
In step 2 (posting of data to be updated), we typically do this in the same HTTP request:
a) re-read the data from the database
b) apply the changes from the web form to this data
c) save the modified data & commit
Because we re-read the data, the version number check offered by Hibernate only will protect us from concurrent updates taking place during this request. As the request is typically very fast (<1s), this is unlikely to happen.
What we want to do is to detect any changes in the database that occurred between the data-retrieval step (step 1) and the data-posting step (step 2). For this to happen, step 2 needs to know what the version field was in step 1. So far I've read about three techniques for doing this:
1. The web app stores the version number in the web page as a hidden field in step 1. The form submission in step 2 then submits this version number along with the rest of the data, and then in 2b ("apply the changes from the web form to this data") the version number from the form is copied into the domain object. When the data is comitted, Hibernate should then pick up the version mismatch.
2. At the end of step 1, the domain object is stored in the user's HttpSession - it becomes a "detached object" once the Hibernate Session is closed at the end of the request. Then step 2, instead of re-reading the data from the database, would get the object from the HttpSession and attach it to the newly created Hibernate Session. In this way, the object would contain the original version number, and Hibernate would catch version mismatch on commit.
3. At the end of step 1, the Hibernate Session is stored in the user's HttpSession. The Hibernate Session remains active for the entire "conversation" - this technique is described as "session-per-conversation". Step 2 would retrieve the Hibernate Session from the user's HttpSession and all domain objects associated with the Hibernate Session in step 1 would be available to step 2. In this approach, it would be essential for the end of step 1 to disconnect the JDBC connection; and for the beginning of step 2 to establish a new JDBC connection.