- DRAFT DRAFT DRAFT

Project Outline for Book Reader Add-On to DSpace

TEST SAMPLES:

Eliot Bible 100 dpi

Eliot Bible 300 dpi

Older Pres Reports 100 dpi

Older Pres Reports 300 dpi

Newer Pres Reports 100 dpi

Newer Pres Reports 300 dpi

Carl Jones 

+ Project Description/Overview
- BookReader as a DSpace add-on
- BookReader is open source and freely available from the Internet Archive's
OpenLibrary (http://openlibrary.org)
- Integrating OpenLibrary BookReader functionality with Dome
(dome.mit.edu) for displaying multi-page content

+ Describe behavior and benefits of the BookReader
- OpenLibrary BookReader consists of server and client components;
see
https://github.com/openlibrary/bookreader/blob/master/BookReaderIA/inc/BookReader.inc
for more background
- performs functions such as single-page, two-page, and multi-page view,
page-turning, zoom in/out, pan, jump to page, and full-text search; features
listed at http://openlibrary.org/dev/docs/bookreader:
Single-Page, Two-page, and Thumbnail view
Zoom
Right-to-left page progression (e.g. for Yiddish and Chinese)
Full-text search with highlighting of search results
Support for foldouts and variable page size
In-Browser Text-To-Speech
Embeddable
Bookmark-friendly URLs
Works with a variety of image servers, or a simple directory of images
Simple access control
- Typically used with JPEG files that the client gets from a
JP2 image server such as Djatoka (which handles JP2-to-JPEG conversion
efficiently), or with static files on a local or remote
filesystem
- BookReader client displays JPEG or PNG images
- BookReader client can load page images from anywhere, including remote servers
- either as files 'published' by a JP2 image server (Djatoka), as files
sitting on the file system, or from a remote URL
- Describe what types of content BookReader could be useful for...
[ fill in the blanks... ]
- More and more digital library web applications are using 'page-turner'
type functionality to display complex, multi-page content. For example,
documents may contain images and text, may benefit from additional
navigational aids because of their length (e.g. hundreds of pages), or may
have a complex structure where the user benefits from being able to move
easily and quickly from one part of the document to the next (the document
does not need to be downloaded in its entirety in order to view the
document 'map').
- There are a few existing DSpace models, such as the Brasiliana site
(http://www.brasiliana.usp.br/bbd) that have made admirable progress
towards integrating BookReader with DSpace in some fashion
- More on the Brasiliana site: the BookReader has been modified to work with
TIFF and PDF files rather than only with JP2 files; it also directly
accesses certain DSpace programming 'hooks' in ways that may be considered
contrary to recommended practice
- BookReader functionality not needed for all items in all collections

+ MIT Content Questions
+ Identify collections and/or groups of individual items that would
benefit from a 'book reader' view
- Collections currently under consideration include:
- Future projects: track potential content that would benefit from a
bookreader view?
+ List characteristics that render an item (or collection of items)
suitable for JP2 conversion
- Multi-page (navigation would be enhanced by ability to rapidly move
around the document)
- Why isn't PDF necessarily good enough?

- mimics the natural way of browsing through a physical document
- complex formatting: documents may contain text, images, hand-written
notes
- zoomable content (illustrations, small details of printed text,
handwritten notes, fragments, anything where the user might want a closer
view)
- What kinds of text-only documents should be included as suitable?
- Does an all-text document qualify (e.g. see Presidents Reports below)?
- Will we do retrospective conversion of older material to JP2, or are we
only talking about using it for future projects?
+ Edgerton Books(?) and Notebooks
- brief description: approx. 36 notebooks, 400 dpi, ca. 45-50MB .tif's;
.pdf and .tif master scans
- mixed text, some typed, some handwritten notes, sketches, photographs
- cataloged metadata in Archivists' Toolkit
- See current MIT Edgerton site:
http://edgerton-digital-collections.org/ where page-turning view has
already been implemented:
http://edgerton-digital-collections.org/notebooks/11

+ Eliot Bible
- brief description (metadata not started?)
- general description: approx. 1227 individual pages, 600 dpi .tif's,
ca. 46MB per page

+ Vail Balloon Collection
- general description: newspaper, broadsides, other, on ballooning in
19th century
- approx. 1331 .tif files, 1 - 3 pages per item, 300 dpi, approx. 46MB
per page, PDF for each item
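To get a feel for the volume of TIFF masters that a JP2 conversion would start from, the approximate figures above can be multiplied out. This is back-of-the-envelope arithmetic using only the counts and per-page sizes quoted in this outline, assuming one .tif file per page image and decimal units (1 GB = 1000 MB); it says nothing about the size of the resulting JP2 files, which depends on the conversion parameters.

```python
# Back-of-the-envelope size of the TIFF masters, using the approximate
# counts and per-page sizes quoted in this outline.
COLLECTIONS = {
    # name: (number of page images, approx. MB per page image)
    "Eliot Bible": (1227, 46),
    "Vail Balloon Collection": (1331, 46),  # assumes one .tif per page
}

def estimate_gb(count: int, mb_each: float) -> float:
    """Approximate total size in decimal GB (1 GB = 1000 MB)."""
    return count * mb_each / 1000

for name, (count, mb_each) in COLLECTIONS.items():
    print(f"{name}: ~{estimate_gb(count, mb_each):.1f} GB of TIFF masters")
```

This prints roughly 56.4 GB for the Eliot Bible and 61.2 GB for the Vail Balloon Collection.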

+ Misc. digital objects from Institute Archives collection(s)
- These may be single instances of unique objects
+ Other collections? [ Ask Beverly ]

+ MIT Course catalogs (historical to present; scanned from microfiche to
born digital)
- more description

+ MIT Faculty Working Papers
- typically text only; we only have PDFs and TIFF masters (are 2-bit
text scans not ideal or preferred candidates for JP2?)

+ President's Reports (70 - 700 pages, text only), more recent reports
are very large in size and
- description

+ JPEG 2000
+ Working with JP2
- JP2 is not suitable for all types of scanned images [ describe what
we mean ]
- What are the criteria for identifying and analyzing collections or
individual images that have the appropriate characteristics for conversion?
- structural criteria
- other criteria
- benefits
- drawbacks
- system-wide workflow implications of using JP2?

+ Converting TIFFs to JP2
- Work with Jenn Morris to test creating JP2 files
- JP2 conversion can be more complicated because of the many parameters
that need to be set (or can be set?) in order to do a successful conversion.
How expert do we have to be to generate JP2s to begin testing?
- conversion parameters
- tiling considerations
- what about pages that don't need tiling (is tiling optional?)

+ List recommended tools for batch conversion (here are a few; there must
be more)
- Kakadu
- OpenJPEG
- ImageMagick
- TotalImageConverter
- others
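To make the parameter and tiling questions concrete, here is a sketch that builds kdu_compress command lines for a directory of TIFF masters. It only builds the argument lists; running them needs an installed, licensed Kakadu. The parameter values are placeholders for testing, not recommendations. Creversible=yes with no -rate should give lossless coding. Clevels sets the number of wavelet resolution levels. Corder=RPCL together with ORGgen_plt=yes is a combination commonly recommended for JP2s that an image server such as Djatoka decodes region by region. Tiling is optional and is added only when a tile size is given. The helper names and the directory layout are hypothetical; check every option against kdu_compress -usage for the installed Kakadu version.

```python
import subprocess
from pathlib import Path
from typing import List, Optional

def kdu_compress_args(src: Path, dst: Path, levels: int = 6,
                      tile: Optional[int] = None) -> List[str]:
    """Build a kdu_compress argument list for one TIFF -> JP2 conversion.

    Placeholder parameters for testing only (not a recommendation).
    """
    args = [
        "kdu_compress", "-i", str(src), "-o", str(dst),
        "Creversible=yes",      # reversible transform; no -rate => lossless
        f"Clevels={levels}",    # number of wavelet decomposition levels
        "Corder=RPCL",          # resolution-first progression order
        "ORGgen_plt=yes",       # packet length markers for faster random access
    ]
    if tile is not None:
        # Tiling is optional: omit Stiles to encode the page as a single tile.
        args.append(f"Stiles={{{tile},{tile}}}")
    return args

def batch_commands(src_dir: Path, dst_dir: Path, **kwargs) -> List[List[str]]:
    """One command per .tif/.tiff file in src_dir, sorted by file name."""
    tiffs = sorted(p for p in src_dir.iterdir()
                   if p.suffix.lower() in {".tif", ".tiff"})
    return [kdu_compress_args(p, dst_dir / (p.stem + ".jp2"), **kwargs)
            for p in tiffs]

def run_batch(commands: List[List[str]]) -> None:
    """Run each conversion; stops at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Passing each list straight to subprocess.run (no shell) matters here: in a shell such as bash, Stiles={1024,1024} would otherwise be brace-expanded into two separate arguments.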

+ BookReader as DSpace Add-On: Some Implementation Options
+ Context (general considerations)
- whatever programming changes are proposed should adhere to existing
DSpace software development best-practice guidelines
- changes should be kept as modular as possible
- bitstream-specific delivery modes

+ Scenario A: add a BOOKREADER 'Bundle' type for items containing JP2
bitstreams which will invoke the BookReader application on user demand
- BookReader image server can convert input files in JP2, TIFF, JPEG, or
PNG format to jpeg or png files which can be displayed by a web browser
- brief description of how BookReader interacts with supporting JP2
software (what software does what, how are files served, etc.)
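As a rough illustration of "how files are served": Djatoka makes regions of a JP2 addressable through OpenURL requests to its resolver, and the BookReader client only needs one JPEG URL per page. The sketch below builds such a request URL. It assumes a Djatoka instance at a hypothetical http://localhost:8080/adore-djatoka/resolver and a hypothetical image URL as rft_id. The svc.* parameter names follow Djatoka's getRegion service as we understand it; they should be checked against the Djatoka documentation before use.

```python
from typing import Optional
from urllib.parse import urlencode

# Hypothetical resolver location; adjust to wherever Djatoka is deployed.
DJATOKA_RESOLVER = "http://localhost:8080/adore-djatoka/resolver"

def djatoka_region_url(rft_id: str, level: Optional[int] = None,
                       region: Optional[str] = None, rotate: int = 0,
                       resolver: str = DJATOKA_RESOLVER) -> str:
    """Build an OpenURL getRegion request asking Djatoka for a JPEG.

    rft_id: identifier/URL of the source JP2 (assumed resolvable by Djatoka)
    level:  resolution level to decode (see the Djatoka docs for how levels
            map to image size); None leaves it to the server default
    region: optional "y,x,height,width" crop; None requests the whole page
    rotate: rotation in degrees
    """
    params = {
        "url_ver": "Z39.88-2004",
        "rft_id": rft_id,
        "svc_id": "info:lanl-repo/svc/getRegion",
        "svc_val_fmt": "info:ofi/fmt:kev:mtx:jpeg2000",
        "svc.format": "image/jpeg",      # what the browser will display
        "svc.rotate": str(rotate),
    }
    if level is not None:
        params["svc.level"] = str(level)
    if region is not None:
        params["svc.region"] = region
    return resolver + "?" + urlencode(params)
```

In a BookReader client, a URL of this form is what the per-page image hook (getPageURI in the BookReader demo code, as we understand it) would return. How BookReader's zoom/reduction factor maps onto a Djatoka level is left open here.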

+ Bill suggests implementing the BookReader view as a display option,
similar to the existing 'Simple' and 'Full' item views; this would be
the 'BookReader' view
- Simple view
- Full view
- BookReader view
- The proposed BOOKREADER bundle would typically consist of JP2 bitstreams.
Will it consist of JP2 bitstreams only, or can other file types
potentially be supported as well?
- Available bundle types include ORIGINAL, HIDDEN, TIFF, TEXT,
THUMBNAIL, and LICENSE bundles.
On Dome, all of our bitstreams are currently tagged as belonging to
one of the ORIGINAL, HIDDEN, TIFF, or THUMBNAIL bundles
- ORIGINAL bundle typically consists of .jpg or .pdf files displayed by
default
- All items that contain JP2 (and potentially other file types)
bitstreams assigned to the BOOKREADER bundle would show the option to
display the BookReader view
- When this view is selected the BookReader app starts in the existing
browser window or in a new browser window (we have to decide which we want)
- If an item has bitstreams tagged as belonging to the BOOKREADER
bundle then the BookReader view becomes an available display option,
similar to 'full' or 'brief' item display
- By contrast, for example, on the Brasiliana site it appears all
content is given the BookReader view (they only have
BookReader-compatible content collections)
- On Dome, we have mixed collections of varied kinds of material where
the BookReader view would not always be the appropriate default display
choice.
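The proposed display rule is to offer the BookReader view only when an item has bitstreams in a BOOKREADER bundle, and never to force it as the default. It can be sketched in a few lines. This is a hypothetical Python model, not DSpace or Manakin code: an item is represented as a mapping from bundle name to bitstream file names. The allowed_suffixes parameter reflects the open question of whether the bundle holds JP2 bitstreams only.

```python
from typing import List, Mapping, Sequence, Tuple

# Hypothetical name for the proposed bundle; not a stock DSpace bundle.
BOOKREADER_BUNDLE = "BOOKREADER"

def available_views(bundles: Mapping[str, Sequence[str]],
                    allowed_suffixes: Tuple[str, ...] = (".jp2",)) -> List[str]:
    """Return the display views to offer for one item.

    bundles:          bundle name -> bitstream file names (a simplified
                      model of a DSpace item, not the DSpace API)
    allowed_suffixes: lowercase file extensions the BookReader view accepts;
                      JP2 only by default, widen it if other types are allowed
    """
    views = ["simple", "full"]          # always available, as today
    names = bundles.get(BOOKREADER_BUNDLE, [])
    if any(name.lower().endswith(allowed_suffixes) for name in names):
        views.append("bookreader")      # offered as an option, not the default
    return views
```

With this rule, a mixed Dome collection keeps the Simple view as its default, and only items that have been given a BOOKREADER bundle gain the extra option.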

+ Manakin
- Add new functionality to Manakin so that it can call the BookReader
when it encounters items containing bitstreams in the BOOKREADER
bundle
- For Manakin changes, specify what the BookReader display view needs
to do
- For example, need to think about general location and size of the
book reader window, button behavior and placement, etc.
+ Scenario 'B': Using Static Files with BookReader
- Point BookReader at static (jpeg) files: BookReader navigation
behavior similar to above but points instead to static files on local or
remote file system
- Does this necessarily preclude the use of DSpace, or is there some
variation where we hand BookReader the assetstore location of the
file(s)?
- more detail needed to describe how this might work
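For Scenario 'B', what BookReader ultimately needs is an ordered list of page-image URLs. Here is a minimal sketch. It assumes that page images are exported to a directory which a web server publishes under a known base URL; both the directory and the base URL are hypothetical. Whether such a directory could be derived from the DSpace assetstore remains the open question above. Files are ordered with a natural sort so that page10 follows page9 rather than page1.

```python
import re
from pathlib import Path
from typing import List, Union
from urllib.parse import quote

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}   # formats BookReader can display

def _natural_key(name: str) -> List[Union[int, str]]:
    # Split out digit runs so "page10.jpg" sorts after "page9.jpg".
    return [int(t) if t.isdigit() else t.lower()
            for t in re.split(r"(\d+)", name)]

def static_page_urls(image_dir: Path, base_url: str) -> List[str]:
    """Ordered URLs for the page images in image_dir, under base_url."""
    names = [p.name for p in image_dir.iterdir()
             if p.suffix.lower() in IMAGE_SUFFIXES]
    names.sort(key=_natural_key)
    return [base_url.rstrip("/") + "/" + quote(n) for n in names]
```

The resulting list could be written out as JSON for the client, or served by a small web service.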

+ Scenario 'C'
- Other alternatives?
- non-DSpace solutions?

+ Other software needed for BookReader implementation
- BookReader can use Kakadu, OpenJPEG, or the Djatoka image
server for rendering JP2 into JPEGs for display in a web browser
- Explore what these packages do in slightly more detail
- do they need to run on the DSpace server?
+ Kakadu (proprietary, but non-commercial license available)
- encode and decode JP2 - compression/decompression and rendering
(viewing)
+ Djatoka image server (opensource)
- Djatoka image server and viewer: Djatoka provides compression and
region extraction of JPEG 2000 images, URI-addressability of regions, and
support for a rich set of input/output image formats (e.g., BMP, GIF, JPG,
PNG, PNM, TIF, JPEG 2000). Djatoka also comes with a plug-in framework
that allows transformations to be applied to regions and resolutions
(e.g., watermarking).
Djatoka handles web server requests to convert JP2 images and display them as JPEGs
- OpenJPEG (determine whether there is any need for this)
- Supporting metadata files
- Are there other files that BookReader may need to display
pages correctly? For example, in its overview of
BookReader, the Internet Archive documentation lists XML files that contain
information about the page size, and we're not sure whether these are essential
or not
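BookReader needs each page's width and height for its layout. Instead of maintaining separate XML files, one possible option is to read the dimensions directly from each JP2's header. A JP2 file is a sequence of boxes, and the image header box (ihdr) inside the JP2 header box (jp2h) stores the height and then the width as 32-bit big-endian integers. The sketch below is a minimal, standard-library-only reader built on that layout. It does not validate the rest of the file, and the function names are hypothetical. Whether this can replace the Internet Archive's XML files is still to be determined.

```python
import struct
from typing import Iterator, Tuple

def _boxes(buf: bytes, start: int, end: int) -> Iterator[Tuple[bytes, int, int]]:
    """Yield (box type, body start, box end) for each box in buf[start:end]."""
    pos = start
    while pos + 8 <= end:
        length, box_type = struct.unpack(">I4s", buf[pos:pos + 8])
        header = 8
        if length == 1:                  # 64-bit extended length follows
            length = struct.unpack(">Q", buf[pos + 8:pos + 16])[0]
            header = 16
        elif length == 0:                # box runs to the end of its container
            length = end - pos
        if length < header:
            raise ValueError("malformed JP2 box length")
        yield box_type, pos + header, pos + length
        pos += length

def jp2_dimensions(data: bytes) -> Tuple[int, int]:
    """Return (width, height) from the ihdr box inside the jp2h box."""
    for box_type, body, box_end in _boxes(data, 0, len(data)):
        if box_type == b"jp2h":
            for sub_type, sub_body, _ in _boxes(data, body, min(box_end, len(data))):
                if sub_type == b"ihdr":
                    height, width = struct.unpack(">II", data[sub_body:sub_body + 8])
                    return width, height
    raise ValueError("no jp2h/ihdr box found")

def jp2_file_dimensions(path: str, max_bytes: int = 1 << 20) -> Tuple[int, int]:
    # Assumes the jp2h box appears before the (large) codestream, so reading
    # the first megabyte is usually enough; this is not guaranteed.
    with open(path, "rb") as f:
        return jp2_dimensions(f.read(max_bytes))
```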

+ Other DSpace sites with similar page navigation with BookReader
initiatives include:
- Brasiliana: http://www.brasiliana.usp.br/bbd
- Bill says Marvin Pollard(?) at CalState - proprietary
+ Ohio State started but then stopped
- they did mock-ups and some code but didn't get to rendering in DSpace
- Need OSU project information URL, see Github ?
- Other BookReader type software (equivalent functionality)?

+ Useful URLs
- http://en.wikipedia.org/wiki/JPEG_2000
- Introducing Djatoka:
http://www.dlib.org/dlib/september08/chute/09chute.html
- How to serve IA-style books from your own cluster:
http://raj.blog.archive.org/2011/03/17/how-to-serve-ia-style-books-from-your-own-cluster/

- HathiTrust page turner

http://www.hathitrust.org/pageturner

http://staff.lib.muohio.edu/~tzocea/files/dspace/
