Overview of MIT Modifications to Alfresco

This is an overview of the work that we've done extending the Alfresco codebase. These modifications were intended, in general, to improve the WCM portion of the web client. We have our own subversion repository of the Alfresco Enterprise code which we have modified. In some cases, it would be possible to separate out this work into its own AMP module. In others, it is really intricately tied to the code and would possibly be encumbered due to our use of the Enterprise product. Right now the code base we're using is a beta version of Alfresco Enterprise v2.1.1.

The following is listed in approximately the chronological order in which the features were implemented.

TinyMCE Upgrade & Plugins

This is a small but important change for content authors: We upgraded TinyMCE to version 2.1.0 and installed all the plugins. This allowed us to configure web form TinyMCE windows to include important functionality like the HTML editor view.

Workflow

Another small piece: We added the "no workflow" workflow, which does nothing. This was necessary to work around several issues (some bug-like behavior shows up if certain assets are submitted without a configured workflow, at least in some versions of Alfresco).

We wrote other workflows as well, specifically to attempt a proper parallel review (that is, one that globally rejects the submission upon the first rejection by a reviewer), but never got something seamless.

Regenerate Button

In the web client web project browse view, we placed a Regenerate button action on every form asset, folder, and sandbox. If Regenerate is pressed on a form instance data document, all renditions of this document are regenerated. If Regenerate is pressed on a rendition document, it is as if Regenerate were pressed on the rendition's primary form instance data document (i.e., it and all related renditions are regenerated). If Regenerate is pressed on a folder, it is as if Regenerate were pressed on all subfolders and form assets in the folder (i.e., every form asset in this folder and all descendant folders are regenerated.) If Regenerate is pressed on the sandbox itself, all form assets in the entire sandbox are regenerated.
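The rules above boil down to "work out which form instance data documents are affected, then regenerate all of their renditions." A minimal sketch of that resolution step follows. The Node and Kind types are hypothetical in-memory stand-ins invented for illustration, not Alfresco's AVM classes, and a sandbox is treated as just another folder.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical in-memory model for illustration; these are not Alfresco's AVM node types. */
final class RegenerateSketch {
    enum Kind { FOLDER, FORM_INSTANCE_DATA, RENDITION, OTHER }

    static final class Node {
        final String name;
        final Kind kind;
        final List<Node> children = new ArrayList<>();
        Node primaryFormInstanceData; // only meaningful for renditions

        Node(String name, Kind kind) {
            this.name = name;
            this.kind = kind;
        }
    }

    /** Returns the form instance data documents whose renditions should all be regenerated. */
    static Set<Node> targets(Node pressedOn) {
        Set<Node> result = new LinkedHashSet<>();
        collect(pressedOn, result);
        return result;
    }

    private static void collect(Node node, Set<Node> result) {
        switch (node.kind) {
            case FORM_INSTANCE_DATA:
                result.add(node);
                break;
            case RENDITION:
                // Same as pressing Regenerate on the rendition's primary form instance data.
                if (node.primaryFormInstanceData != null) {
                    result.add(node.primaryFormInstanceData);
                }
                break;
            case FOLDER:
                // A folder (or the sandbox root) covers everything beneath it.
                for (Node child : node.children) {
                    collect(child, result);
                }
                break;
            default:
                break; // non-form assets are ignored
        }
    }
}
```

Using a set means a form instance data document is only regenerated once, even when both it and several of its renditions sit in the same folder.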

We needed this functionality for these reasons:

When first creating a web project and importing many form assets, we need to do a "bulk generation" of the content in order to start off in a consistent state.
If form assets are ever uploaded via CIFS, the renditions need to be regenerated. Since form asset regeneration is apparently a function of the web client (!), and no rules/triggers will be available via the AVM API until at least Alfresco v3.0 (!!), this cannot yet be fixed without a manual regeneration step like a Regenerate button.
The model of one-form-instance-data-to-multiple-renditions is a bit limiting. We sometimes want, for example, a single rendition to be dependent on multiple form instance data documents. For example, imagine a single "news-article-index.html" rendition dependent on all "news article" form instance data files. Ideally, whenever one of the news article xml files is updated, the index rendition should be updated. (Yes, this effect could be obtained via server-side includes or other dynamic behavior of the web server at runtime. But why? It is static information that does not change between deployments.) This cannot happen easily with the current model. Therefore, the Regenerate button is a workaround whereby we may possibly have to resync such form assets which are out-of-date.

In all cases, the situation is that there are times when the form asset contents of a web project sandbox are inconsistent and need to be resynced.

The Regenerate Renditions Wizard introduced in Alfresco v2.1 partially addresses these needs, but not well enough for us to eliminate the Regenerate button yet:

The wizard is inconveniently located on individual Web Forms within the Data Dictionary. Although it may be useful for this functionality to be available there, it should also be available in some form in an actual Web Project which needs regeneration.
The Regenerate Wizard only works on a per-form basis, not a per-folder / per-item basis in a web project. If you only want to regenerate, say, the five assets you updated via CIFS, you may have to regenerate all 1,000 assets which use the same form. Maybe it's not a big deal, but it could represent an unacceptable time delay.
The showstopper: the wizard only works in the Staging Sandbox. Since regeneration is needed to resync inconsistent form assets, it means that in order to use the wizard, inconsistent form assets must first be promoted to Staging. Thus, a content author can no longer do this without polluting the entire web project and every other user sandbox in the project with an inconsistent state. Plus, presumably only a Content Manager could use the wizard, so a content author must work in concert with a Content Manager, including jumping through whatever workflow/review process hoops are present in the web project. The wizard is therefore only really useful for the content manager who first creates a web project for "bulk generation", and not for incremental changes.

We imagine that ultimately the Regenerate Button will not be necessary, but not until the Regenerate Renditions Wizard is altered to meet our needs.

Regenerate Output Path Pattern

We originally changed the way assets were regenerated to update the output path patterns. This issue is described in http://issues.alfresco.com/browse/WCM-860. Unfortunately, starting in Alfresco 2.1, this exposed a race condition. Alfresco Support did not want to help solve the problem since we had changed their regenerate code (even though the race condition is, in my opinion, merely exposed and not actually caused by our code. Oh well.) Instead, they told us to report the behavior we wanted... hence WCM-860. You can see how far that's gotten.

Deployment by Script

Originally, the deployment feature was introduced without a way to deploy a web site to a file system on a remote server. Instead, the only way was to deploy a web site to an Alfresco repository (which, as far as I know, is not a web server.) We extended the deployment feature so that it could kick off an external process which handles deployment, whatever that entails. The external process is handed some environment variables describing which version of which web project should be deployed.

So, if web-client-config-custom.xml contains the following information:

    <config>
      <wcm>
        <deployment>
          <commands>
            <command name="deploy-to-foobar" program="/home/alfresco/scripts/deploytofoobar.sh" />
          </commands>
        </deployment>
      </wcm>
    </config>

Then, if a web project is configured with a deployment server named "command:deploy-to-foobar", deployment means that the process "/home/alfresco/scripts/deploytofoobar.sh" is started with the DEPLOY_SRC and DEPLOY_VERSION environment variables set.
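To make the hand-off concrete, here is a minimal sketch of how a configured command might be launched with those two environment variables. It is not our actual patch: the class and method names are made up for illustration, and the values placed in DEPLOY_SRC and DEPLOY_VERSION (a source path string and a version number) are assumptions about what "which version of which web project" looks like.

```java
import java.io.IOException;
import java.util.Map;

/** Illustrative sketch only; class and method names are hypothetical. */
final class CommandDeploymentSketch {
    static final String PREFIX = "command:";

    /** Returns the command name for a "command:<name>" deployment server, or null otherwise. */
    static String commandName(String deploymentServer) {
        return deploymentServer.startsWith(PREFIX)
                ? deploymentServer.substring(PREFIX.length())
                : null;
    }

    /** Builds (but does not start) the external process with the deployment variables set. */
    static ProcessBuilder prepare(String program, String deploySrc, int deployVersion) {
        ProcessBuilder pb = new ProcessBuilder(program);
        Map<String, String> env = pb.environment();
        env.put("DEPLOY_SRC", deploySrc);                            // assumed format
        env.put("DEPLOY_VERSION", Integer.toString(deployVersion));  // assumed format
        pb.inheritIO(); // let the script's output show up in the server's own output
        return pb;
    }

    /** Starts the process and waits for it; returns the script's exit code. */
    static int run(ProcessBuilder pb) throws IOException, InterruptedException {
        return pb.start().waitFor();
    }
}
```

Passing the details through the environment rather than as arguments keeps the script's command line fixed, so the same configuration entry works however the deployment parameters evolve.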

In our case, we used this to write a script which uses rsync to copy files through the CIFS interface onto a remote web server's file system. Starting in Alfresco v2.1, we don't need to use this method for such a deployment that simply copies files to a remote server. However, it is possible that in general "deployment" can mean something more complex than just file copies. For example, as part of a deployment, a system may need to update the contents of a database. Or it may need to transform some Alfresco metadata into a format which can be used on the remote system. Therefore, we think that the idea of a general deployment plug-in is useful.

XSLT Extensions

Since there is no rules engine which works on the AVM side, we decided to use the web form engine to do some of this work for the itinfo part of the IS&T web site. Specifically, we needed some way to index itinfo data so that the web server at runtime could quickly process keyword and topic searches. We did this by creating an XSL template which produces a dummy rendition file but primarily maintains a bunch of index files.

We used Xalan's extension mechanism for calling custom Java class methods. The XSLT Extensions class is mostly a wrapper for AVMService which handles exceptions by storing them and providing a getLastException() method, instead of throwing them (which kills the XSLT engine).
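The store-the-exception pattern looks roughly like the sketch below. Everything other than the getLastException() name is an assumption made for illustration: the Repository interface is a hypothetical stand-in for AVMService, readText() is an invented method, and returning the exception as a string (empty when there was none) is just one convenient choice for XSLT callers. How the class is bound to a stylesheet namespace is omitted.

```java
/** Illustrative sketch of an exception-swallowing wrapper for use from XSLT. */
final class XsltExtensionSketch {
    /** Hypothetical stand-in for the repository service being wrapped (e.g. AVMService). */
    interface Repository {
        String readText(String path) throws Exception;
    }

    private final Repository repository;
    private Exception lastException;

    XsltExtensionSketch(Repository repository) {
        this.repository = repository;
    }

    /** Returns the content at the given path, or an empty string if the lookup failed. */
    public String readText(String path) {
        lastException = null; // each call starts with a clean slate
        try {
            return repository.readText(path);
        } catch (Exception e) {
            // Remember the failure instead of throwing, so the transform keeps running.
            lastException = e;
            return "";
        }
    }

    /** Returns a description of the most recent failure, or an empty string if there was none. */
    public String getLastException() {
        return lastException == null ? "" : lastException.toString();
    }
}
```

A template can then call the wrapped method, check getLastException(), and decide for itself whether to skip the item or emit an error marker in the output.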

This workaround is the closest we could come to using WCM to maintain database-like information. Because we want to use the Virtualization server and a remote web server, the webapp must get all its data via ServletContext.getResource(), and not from an external database. It is definitely a shoe-horn solution.

Data Pager with Find Page

One of the bigger complaints we got from potential content authors is that folders with many files or subfolders are simply impossible to navigate in Alfresco, even when you know the exact filename you're looking for.

The data pager that ships with Alfresco has clickable page number links which are always clustered around page 1 (!!!). This makes it impossible to go to page, say, 125, without at least 26 clicks (100, Next, Next, Next....) and, of course, the page loads between clicks. We updated the pager so that the clickable page links are always clustered around the current page number, so that you can get to page 125 with three clicks (100, 120, 125.)
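One way to get that behavior is to build the set of clickable links in tiers around the current page. The sketch below is not the code we shipped: the tier sizes (five neighbors on each side, multiples of 10 within 50 pages, every multiple of 100) are assumptions chosen so that the 100, 120, 125 path works.

```java
import java.util.SortedSet;
import java.util.TreeSet;

/** Illustrative sketch; the tier sizes are hypothetical. */
final class PagerLinksSketch {
    static final int NEIGHBOURS = 5;  // pages shown on each side of the current page
    static final int TENS_RANGE = 50; // how far away multiples of 10 are still shown

    /** Returns the page numbers to render as links, for 1-based pages and pageCount >= 1. */
    static SortedSet<Integer> links(int current, int pageCount) {
        SortedSet<Integer> out = new TreeSet<>();
        out.add(1);
        out.add(pageCount);
        // Immediate neighbors of the current page.
        int lo = Math.max(1, current - NEIGHBOURS);
        int hi = Math.min(pageCount, current + NEIGHBOURS);
        for (int p = lo; p <= hi; p++) {
            out.add(p);
        }
        // Multiples of 10 that are reasonably close to the current page.
        for (int p = 10; p <= pageCount; p += 10) {
            if (Math.abs(p - current) <= TENS_RANGE) {
                out.add(p);
            }
        }
        // Every multiple of 100, for long jumps.
        for (int p = 100; p <= pageCount; p += 100) {
            out.add(p);
        }
        return out;
    }
}
```

With these assumed tiers and 300 pages, the links on page 1 include 100, the links on page 100 include 120, and the links on page 120 include 125.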

We also updated the Data Pager so that the user can type the desired page number directly into a text input box. So you can get to page 125 in one page load.

Finally, it is rare that the content author knows the number of the page containing the asset they want. If you're looking for "foobar.xml", which page is that? You might guess it would be somewhere in the middle, but why should you be manually performing a binary search of a sorted list? To address this, we added the Find Page button, which allows the user to enter the name of the desired asset. The page is then changed to the one containing that asset, if it exists. Otherwise, the page is changed to the one which would contain the asset if it did exist (for example, if you're looking for "foobar.xml", you'd be brought to the page containing "fonts.txt" and "football.html").
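The lookup itself is the binary search the user would otherwise be doing by hand. A minimal sketch, assuming the listing is sorted by plain String ordering (the real view may sort differently, for example case-insensitively) and that pages are numbered from 1; the class and method names are made up for illustration:

```java
import java.util.Collections;
import java.util.List;

/** Illustrative sketch of the Find Page lookup. */
final class FindPageSketch {
    /**
     * Returns the 1-based page that contains the wanted name, or the page
     * where it would appear if it existed. The list must already be sorted.
     */
    static int pageFor(List<String> sortedNames, String wanted, int pageSize) {
        if (sortedNames.isEmpty()) {
            return 1;
        }
        int index = Collections.binarySearch(sortedNames, wanted);
        if (index < 0) {
            // Not found: binarySearch returns -(insertion point) - 1.
            index = -index - 1;
        }
        // A name that sorts after everything belongs on the last page.
        index = Math.min(index, sortedNames.size() - 1);
        return index / pageSize + 1;
    }
}
```

For a listing of about.html, contact.html, fonts.txt, football.html, index.html with two items per page, looking for "foobar.xml" lands on page 2, the page holding fonts.txt and football.html.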

WCM Search Wizard

With the release of Alfresco v2.1, WCM Search was introduced in the Staging sandboxes of Web Projects, but this was not exposed in the UI. This is unfortunate because our content authors need to be able to find content when they don't know the exact path to it in the web project. This was one of the big requirements we had for the WCM web client since we started working with Alfresco.

When Alfresco v2.1 came out, we first wrote a WCM Search Web Script, but this was not easy to integrate well into the web client. So we then wrote a WCM Search Wizard, which can be called on a Web Project or sandbox. If the user calls the search wizard on a Web Project, she must choose which sandbox* within that project to search. If the user calls the search wizard on a particular sandbox, that sandbox is searched. The search query is (currently) a single text input which is passed as-is to Lucene.

When assets in a sandbox are found that satisfy the search query, they are shown to the user in a view that looks like the website-browse view, with all the same actions (edit, copy, preview, etc.).

*I said that a particular sandbox is searched, but in fact only the Staging sandbox is indexed by Lucene. We work around this by performing the search within the Staging sandbox, and then converting the results into sandbox-specific paths, discarding any entries which for some reason are not present in the sandbox (e.g., the user recently deleted them). This means the search isn't perfect: it can never find sandbox-local modifications. Hopefully a future version of Alfresco provides true sandbox-level search functionality.
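The conversion step in that workaround can be sketched as follows. This assumes paths have the form "store:/path" and that switching sandboxes only means swapping the store name; the store names in the usage note and the existence check passed in as a predicate are hypothetical stand-ins for the real repository lookup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Illustrative sketch of mapping Staging search hits into a user sandbox. */
final class SandboxResultsSketch {
    /**
     * Rewrites each Staging hit to the equivalent sandbox path and keeps only
     * the ones that the (hypothetical) existence check confirms are present.
     */
    static List<String> toSandboxPaths(List<String> stagingHits,
                                       String stagingStore,
                                       String sandboxStore,
                                       Predicate<String> existsInSandbox) {
        String prefix = stagingStore + ":";
        List<String> out = new ArrayList<>();
        for (String hit : stagingHits) {
            if (!hit.startsWith(prefix)) {
                continue; // not a Staging path; nothing sensible to convert
            }
            String candidate = sandboxStore + ":" + hit.substring(prefix.length());
            if (existsInSandbox.test(candidate)) {
                out.add(candidate); // still present in the sandbox
            }
        }
        return out;
    }
}
```

For example, with a Staging store called "mysite" and a sandbox store called "mysite--alice", a hit on "mysite:/www/avm_webapps/ROOT/a.html" becomes "mysite--alice:/www/avm_webapps/ROOT/a.html", and is dropped if the user has deleted that file in her sandbox. Files that exist only in the sandbox never show up, since they were never in the Staging results to begin with.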

Jpeg Metadata Extractor & Content Transformer

The Alumni Association will have a large number of jpeg images in their web project. They need to be able to locate an image by searching its metadata (EXIF/IPTC data). Our WCM Search Wizard allows the searching of metadata, but there is no actual extraction of JPEG metadata by default. We wrote* a Jpeg Metadata Extractor to populate Alfresco properties corresponding to this metadata, as well as a content transformer to allow this data to be found by a default (i.e., TEXT) Lucene search.

This also involved turning on the AVM Metadata Extraction in Alfresco v2.1, which is off by default.

*We are aware of an Alfresco Forge project to extract EXIF metadata from JPEGs. Unfortunately, the source code is not made available by the project author, and we are not interested in closed source solutions.

File Picker Widget: Search & Preview

A major complaint from the pubs team is that the File Picker Widget is unusable if the web project contains a large amount of data. This applies both in Web Forms that use xs:anyURI fields and when browsing the repository from the link/image buttons of TinyMCE. That is, the same web project navigation problem crops up in the File Picker. We extended the File Picker to allow the content author to perform a WCM Search.

Plus, we added Preview buttons to the File Picker so that the content author can actually look at an item before selecting it. This is not as easy to use as a thumbnail view, but at least makes image linking bearable, especially for the Alumni Association.
