Published: March 15, 2001.
Requirements and Architecture for Administration and Maintenance of Digital Object Libraries
Manfred Bogen, Marion Borowski, Stephan Heisterkamp, Dieter Strecker, GMD - German National Research Center for Information Technology, Germany
Abstract
With the digitization of original documents, cultural heritage institutions such as museums and archives can make their valuable collections accessible to, and thus better enjoyed by, the public. Digitizing, however, is only a first step. Many libraries, archives, and other cultural heritage institutions already have a well-functioning Library Information System (LIS) for managing the metadata of their conventional objects. These systems work satisfactorily for standard museum functions such as cataloging, indexing, acquisition, and the OPAC (Online Public Access Catalog). What is needed now, however, is the administration and maintenance of the digital objects themselves, and in this respect traditional Library Information Systems have limitations and shortcomings.
In our paper we present a global requirement list for a Digital Library System and develop a generic architecture. We put special emphasis on an integrated solution that addresses all aspects of a publication chain, from digitization through indexing, administration, Web presentation, and printing to electronic commerce. We describe state-of-the-art Digital Library Systems with their benefits, disadvantages, limitations, and possible fields of application. On this basis, we describe our approach to finding an integrated Digital Library Solution.
Introduction: Needs of Libraries, Archives, and Museums
At different places in the world, museums, archives, and libraries have started to digitize their valuable stock for preservation and value-adding applications (Bogen & Bonkowski & Borowski & Löffler, 2000). The result of a digitization process is a collection of digital objects and metadata. First, there are the bibliographic metadata. In addition, new metadata concerning the digital material itself must be captured and made accessible: technical metadata such as format, size, resolution, access rights, and copyright. Finally, structural metadata describing the structure of a document must be taken into account, for example the assignment of images to the parts of a music piece (title page, first movement, second movement, etc.).
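The three kinds of metadata described above can be pictured as a simple data model. The following Python sketch is illustrative only: all class and field names are hypothetical and are not taken from any system discussed in this paper. Structural metadata is modeled, as an assumption, as a tree whose nodes point to the page images of each part.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BibliographicMetadata:
    title: str
    creator: str
    date: Optional[str] = None


@dataclass
class TechnicalMetadata:
    file_name: str
    file_format: str                 # e.g. "image/tiff"
    size_bytes: int
    resolution_dpi: Optional[int] = None
    access: str = "public"           # access conditions
    copyright_holder: Optional[str] = None


@dataclass
class StructureNode:
    """One part of a document, e.g. 'First movement', with its page images."""
    label: str
    image_files: List[str] = field(default_factory=list)
    children: List["StructureNode"] = field(default_factory=list)

    def all_images(self) -> List[str]:
        """Collect the images of this part and all sub-parts in document order."""
        images = list(self.image_files)
        for child in self.children:
            images.extend(child.all_images())
        return images


@dataclass
class DigitizedDocument:
    bibliographic: BibliographicMetadata
    technical: List[TechnicalMetadata]
    structure: StructureNode


# Hypothetical example: a digitized music score with three structural parts.
score = DigitizedDocument(
    bibliographic=BibliographicMetadata(title="Example score", creator="Example composer"),
    technical=[TechnicalMetadata("p001.tif", "image/tiff", 24_000_000, 400)],
    structure=StructureNode("Score", children=[
        StructureNode("Title page", ["p001.tif"]),
        StructureNode("First movement", ["p002.tif", "p003.tif"]),
        StructureNode("Second movement", ["p004.tif"]),
    ]),
)
print(score.structure.all_images())  # ['p001.tif', 'p002.tif', 'p003.tif', 'p004.tif']
```

Keeping the three kinds of metadata in separate structures mirrors the distinction drawn in the text: bibliographic data may come from an existing LIS, while technical and structural data arise only during digitization.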
To enable well-structured and economical handling of digital objects and related metadata, a Digital Library System is needed as a prerequisite for enhanced functionality in a museum and for new applications in general. Such Digital Library Systems have to meet the special requirements of digitized material.
This paper is structured as follows. In the first section, we introduce a few definitions and a generic digital library architecture to explain our notion of Library Information Systems, Museum Management Systems, and Digital Library Systems, and the differences between them. In the next section, we present our selection from the market of Digital Library Products and Systems, together with our evaluation criteria. The last section provides the results of our evaluation and a list of candidates that we will inspect thoroughly in the near future as a basis for a Digital Library Solution.
Definitions and Model
To establish a common understanding of the application domain, we first introduce a few definitions, since different interpretations of these terms are found in the related literature.
Library Information System (LIS)
Library Information Systems are computer-based systems that automate internal library procedures (technical services) and public services, including those that connect users to electronic resources. Technical services support core professional library tasks such as acquisition of materials, cataloging, authority control, and circulation control. Public services include online public access catalogs (OPACs) and access to licensed electronic resources.
Synonyms for Library Information Systems include any permutation of the words integrated, online, library, management, automation, and systems (http://www.coe.missouri.edu/~is334/projects/Project_LIS/faq1.html).
A Library Information System is not necessarily a digital library, since a library consisting entirely of conventional physical material (such as printed books only) may be very highly automated. This automation does not make it 'digital' in the sense we are considering here.
Museum Management System
Museum Management Systems are Library Information Systems that offer additional functionality. Their services include collection administration, exhibition management, events and public relations, administration and shipping, and documentation. Typical features are inventories of objects, library holdings, and photo stocks; various search capabilities; ease of operation; generation of reports and statistics; and Internet access.
Digital Library Systems
There are many definitions of a digital library. Terms such as electronic library and virtual library are often used synonymously. The elements that have been identified as common to these definitions are (http://sunsite.berkeley.edu/ARL/definition.html):
The important point is that a digital library stores its material in a computer system in digital form. This allows manipulation (e.g., for improved retrieval) and delivery (for instance, as a sound file for playing on a computer) in ways that are not possible with the conventional version of the material (Noerr, 2000).
Because the material is in digital (or computer-readable) form, new possibilities open up for the digital library that are not available to a conventional library, even one holding the same material.
Figure 1: Functionality Overlap
Figure 1 shows that the functionality offered by the system categories mentioned overlaps. In the following, however, we concentrate on Digital Library Systems and solutions since, being digital, they are the systems of the near future for any museum, library, or collection.
A Generic Architecture for a Digital Library System
A Digital Library System may consist of four layers (see figure 2). The digital objects themselves and their metadata are stored in a database (DB), or more precisely in a Database Management System (DBMS).
A so-called Digital Objects Server handles all requests to this DBMS concerning digital objects. Depending on the media type, a request is passed on to an Image Server, a Text Server, an Audio Server, a Video Server, a 3D Server, or a server for other digitized objects.
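The storage layer can be illustrated with a minimal sketch that keeps digital objects and their metadata in two related tables. It assumes an in-memory SQLite database as a stand-in for the DBMS; the table names, columns, and helper functions are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for the DBMS layer (an assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE digital_object (
    id         INTEGER PRIMARY KEY,
    media_type TEXT NOT NULL,   -- e.g. 'image/tiff'
    content    BLOB NOT NULL
);
CREATE TABLE metadata (
    object_id  INTEGER NOT NULL REFERENCES digital_object(id),
    name       TEXT NOT NULL,   -- e.g. 'title', 'resolution'
    value      TEXT NOT NULL
);
""")


def store(media_type: str, content: bytes, meta: dict) -> int:
    """Store one digital object and its metadata in a single transaction."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "INSERT INTO digital_object (media_type, content) VALUES (?, ?)",
            (media_type, content))
        object_id = cur.lastrowid
        conn.executemany(
            "INSERT INTO metadata (object_id, name, value) VALUES (?, ?, ?)",
            [(object_id, name, str(value)) for name, value in meta.items()])
    return object_id


def metadata_of(object_id: int) -> dict:
    """Return all metadata of one object as a name -> value dictionary."""
    rows = conn.execute(
        "SELECT name, value FROM metadata WHERE object_id = ?", (object_id,))
    return dict(rows.fetchall())


oid = store("image/tiff", b"placeholder image bytes", {"title": "Title page", "resolution": 400})
print(metadata_of(oid))  # {'title': 'Title page', 'resolution': '400'}
```

The generic name/value metadata table keeps the schema open for new kinds of technical metadata; the price is that all values are stored as text, so a production system might prefer dedicated typed columns for frequently queried fields.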
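The routing performed by the Digital Objects Server can be sketched as a dispatch table keyed by media type. In this sketch the media servers are hypothetical stand-ins that only report which object they would handle, and the use of MIME top-level types (with "model" for 3D data) as routing keys is an assumption.

```python
from typing import Callable, Dict

Handler = Callable[[int], str]


def make_server(name: str) -> Handler:
    """Create a stand-in media server that only reports which object it handles."""
    def handle(object_id: int) -> str:
        return f"{name} handles object {object_id}"
    return handle


class DigitalObjectsServer:
    """Passes each request on to a media-specific server based on its media type."""

    def __init__(self) -> None:
        # Keys are MIME top-level types; 'model' is the MIME type family for 3D data.
        self._servers: Dict[str, Handler] = {
            "image": make_server("Image Server"),
            "text": make_server("Text Server"),
            "audio": make_server("Audio Server"),
            "video": make_server("Video Server"),
            "model": make_server("3D Server"),
        }
        self._fallback = make_server("Other Server")

    def handle(self, media_type: str, object_id: int) -> str:
        top_level = media_type.split("/", 1)[0].lower()  # 'image/tiff' -> 'image'
        return self._servers.get(top_level, self._fallback)(object_id)


server = DigitalObjectsServer()
print(server.handle("image/tiff", 7))       # Image Server handles object 7
print(server.handle("application/pdf", 8))  # Other Server handles object 8
```

A table-driven dispatch makes adding a server for a new media type a one-line change, which fits the open-ended "server for other digitized objects" in the architecture.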
Between the Digital Objects Server and the Digital Library Clients lies a layer called Digital Library Middleware. Middleware was once defined as the glue that holds networks and applications together. In our case, it includes toolkits for developing interfaces and clients, directory and security services, and management facilities.
The Digital Library Clients form the top layer of this architecture. They are the interface for an entity (e.g., a user) requesting or storing information about digital objects. The Z39.50 interface (Z39.50, 1998) deserves special mention, as it provides access to the conventional Library Information Systems found in many museums today.
Figure 2: Digital Library Architecture
Digital Library Systems: a Market Overview
The goal of our market overview is to find a suitable system for the administration and maintenance of digital objects. Figure 3 illustrates our approach. In a first step, we searched the market for possible candidates and compiled a list of 70 products, based on publications dealing with the application scenarios of museums, archives, and libraries (Noerr, 2000; Thomson, 1998; Knirim & Graf, 1998; Chin, 2000).
In a second step, we classified the systems into the three categories defined in the section Definitions and Model: Library Information Systems (30 systems), Museum Management Systems (35 systems), and Digital Library Systems (9 systems). Some products covered several areas, so the category counts add up to more than 70.
Figure 3: Steps 1-3 of our market overview
In a third step, we took a closer look at the systems with potential as Digital Library Systems and inspected them with a detailed checklist covering the following requirement areas:
Product: general information about the system.
Collection Management: the administration of digital objects, including metadata.
Web Presentation: public access. A Web-based digital library enables the distribution of digital material independently of location, time, and the number of available copies.
Supported Steps towards a Digital Library: an ideal Digital Library System supports the whole object path from digitization to sales on the Internet; where steps are not offered, other software packages and extensions have to be integrated.
Technical Requirements: the hardware and software environment.
Comments: additional important aspects.
All criteria are enumerated in detail in the summary section.
The Digital Library Systems and their main features are described in the following paragraphs. For the most promising candidates, the complete results are summarized in table 1 (see the summary section).
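A checklist of this kind can be evaluated mechanically. The sketch below is hypothetical throughout: only the category names follow the checklist above, the individual criteria are invented examples, and the two systems are made up rather than ratings of any product reviewed here. It computes per-category coverage and ranks systems by the number of criteria met.

```python
from typing import Dict, List, Set

# Hypothetical criteria; only the category names follow the checklist in the text.
# 'Product' and 'Comments' are descriptive and therefore not scored here.
CHECKLIST: Dict[str, List[str]] = {
    "Collection Management": ["digital objects", "Dublin Core", "structural metadata"],
    "Web Presentation": ["Web search", "Web update"],
    "Supported Steps": ["digitization", "indexing", "printing", "e-commerce"],
    "Technical Requirements": ["Unix server", "Z39.50", "XML/SGML"],
}


def coverage(features: Set[str]) -> Dict[str, float]:
    """Fraction of the criteria met in each category."""
    return {category: sum(c in features for c in criteria) / len(criteria)
            for category, criteria in CHECKLIST.items()}


def total_met(features: Set[str]) -> int:
    """Number of criteria met over all categories."""
    return sum(c in features for criteria in CHECKLIST.values() for c in criteria)


def rank(systems: Dict[str, Set[str]]) -> List[str]:
    """System names ordered by the number of criteria met, best first."""
    return sorted(systems, key=lambda name: total_met(systems[name]), reverse=True)


# Made-up systems, not ratings of any product reviewed in this paper.
systems = {
    "System A": {"digital objects", "Web search", "Z39.50"},
    "System B": {"digital objects", "Dublin Core", "Web search", "indexing", "XML/SGML"},
}
print(rank(systems))                                      # ['System B', 'System A']
print(coverage(systems["System A"])["Web Presentation"])  # 0.5
```

Unweighted counting is the simplest choice; in practice some criteria (for example, support for the whole object path) would likely carry more weight than others.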
allegro-C was released by the Technical University of Braunschweig in Germany (http://www.biblio.tu-bs.de/allegro; allegro-C, 1997; Eversberg, 1997; Tews, 1997). It is essentially a programmable database system for DOS and Unix, designed to meet the requirements of museums and libraries in building a catalog. More than 70 installations exist, mainly in Germany and Austria.
allegro-C provides some important features for libraries: an OPAC, a loan module, support for the Z39.50 standard, the MAB data structure, and data access rights at file level.
A set of data structures for the administration of print media is part of the system, but digital multimedia objects have to be handled by external modules, which can be called from within allegro-C depending on the media type. The Dublin Core and RDF standards are not supported.
allegro-C lacks XML and SGML interfaces as well as interfaces to other database systems, and there is no ready-to-use Web interface, only a set of modules that assist in implementing an interface for data manipulation and retrieval.
The software architecture of allegro-C is still based on the first version, released in 1980. It is a collection of separate programs, integrated in a common user interface, that store data in and are controlled by many files of different types. No API for direct access to the database contents is provided.
allegro-C is a Library Information System and has no additional functionality for managing a museum. It also has no potential as a Digital Library System.
KE EMu (http://www.ke.com.au/emu/index.html; KE EMu, 1999; KE Software, 2000) is a multilingual Museum Management System based on the KE Texpress object-oriented database system (http://kestrel.mel.kesoftware.com/texpress/). It was developed in 1997, and there are 16 installations in Australia, Canada, and the USA. KE EMu is a client/server application for Windows 95/98/2000/NT clients and Unix or Windows NT servers. It supports the following museum management tasks:
KE EMu provides many features of a Digital Library System. Digital multimedia objects can be administered and maintained. The complete Dublin Core element set is recorded for each resource, but the RDF standard (http://www.w3.org/TR/REC-rdf-syntax) is not supported. Almost all image formats can be displayed, and audio and video data can be handled by external applications.
Extensive retrieval functionality is provided. The user can choose between query-by-example and the powerful Texql query language, a superset of SQL. Context-sensitive and phonetic search are possible. Search results are displayed in table form or as single datasets. The Web interface of KE EMu is aimed at query-only access to the published catalog and related data: it provides a simple keyword query facility and does not allow updates.
KE EMu has database interfaces to Oracle and Sybase, and the Z39.50 and XML standards are supported. The Texapi API is part of the software package.
This software is a Museum Management System, but it comes very close to a Digital Library System, lacking only the following features:
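To make the notion of a Dublin Core record concrete, the sketch below serializes a record as plain XML using the element namespace of the Dublin Core Metadata Element Set, version 1.1. It does not reproduce KE EMu's internal storage or export format; the wrapping `record` element and the function name are hypothetical, and, in line with the text, no RDF is involved.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
# The 15 elements of the Dublin Core Metadata Element Set, version 1.1.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher", "contributor",
    "date", "type", "format", "identifier", "source", "language",
    "relation", "coverage", "rights",
}


def dublin_core_xml(record: dict) -> str:
    """Serialize a dict of Dublin Core elements as XML; other keys are rejected."""
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("record")  # hypothetical wrapper element
    for name, value in record.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {name}")
        ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


print(dublin_core_xml({"title": "Title page", "format": "image/tiff"}))
```

Rejecting unknown keys keeps records within the 15-element set, which is what makes Dublin Core useful as a common denominator for exchange between systems.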
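Phonetic search matches names that sound alike but are spelled differently. The text does not say which phonetic algorithm KE EMu uses; the sketch below substitutes the classic American Soundex code as a well-known example of the technique and assumes ASCII names.

```python
# American Soundex digit classes; vowels, 'y', 'h', and 'w' have no digit.
SOUNDEX_CODES = {letter: digit
                 for digit, letters in {"1": "bfpv", "2": "cgjkqsxz", "3": "dt",
                                        "4": "l", "5": "mn", "6": "r"}.items()
                 for letter in letters}


def soundex(name: str) -> str:
    """Four-character American Soundex code, e.g. 'Robert' -> 'R163'."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    code = letters[0].upper()
    last = SOUNDEX_CODES.get(letters[0], "")
    for c in letters[1:]:
        digit = SOUNDEX_CODES.get(c, "")
        if digit:
            if digit != last:          # adjacent letters with the same digit count once
                code += digit
                if len(code) == 4:
                    break
            last = digit
        elif c not in "hw":            # vowels separate equal digits; 'h' and 'w' do not
            last = ""
    return (code + "000")[:4]


def phonetic_search(query: str, names: list) -> list:
    """Return the names that sound like the query according to Soundex."""
    key = soundex(query)
    return [n for n in names if soundex(n) == key]


print(soundex("Robert"), soundex("Tymczak"))                     # R163 T522
print(phonetic_search("Rupert", ["Robert", "Rubin", "Rupert"]))  # ['Robert', 'Rupert']
```

Soundex was designed for English surnames; for multilingual catalogs such as those KE EMu targets, language-specific phonetic codes would likely give better results.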
Museum Information Management System MIMS
MIMS (MIMS, 2000) by XWAVE Solutions (http://www.imx2.com) is a client/server collection management system for use with Oracle or Microsoft SQL Server databases. It was released in 1998. It offers no museum administration facilities to support loans, accessioning, event management, and condition checks. MIMS's digital library features are limited to the storage of binary objects, which have to be handled by external applications.
MIMS uses object-oriented data structures that can be modified online, with data integrity checked during modification. No predefined data structures are provided, and the Dublin Core and RDF standards are not supported.
Extensive search functionality is implemented. A user can choose between query-by-example and the powerful SQL query language. Search results are displayed as a table, as single datasets, or as a user-definable report, and they can be saved for future use.
The MIMS system provides a browser-based Web interface to the search tool, through which specially classified records can be searched for and displayed. XML/SGML import and export are not supported, and there is no Z39.50 interface.
MIMS is a Museum Management System. It lacks important features that would qualify it as a Digital Library System:
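Query-by-example lets a user fill in a form shaped like a record instead of writing SQL. The sketch below shows one way such a form could be translated into a parameterized SQL query; it is not MIMS's implementation, and the wildcard convention, function name, and sample table are assumptions.

```python
import sqlite3
from typing import Dict, List, Tuple


def qbe_to_sql(table: str, example: Dict[str, str]) -> Tuple[str, List[str]]:
    """Translate a filled-in example form into a parameterized SELECT statement.

    Convention assumed here: a value containing '*' is a wildcard pattern (LIKE),
    any other value must match exactly. Empty form fields are ignored.
    Literal '%' or '_' characters in a value are not escaped in this sketch.
    """
    if not table.isidentifier():
        raise ValueError(f"invalid table name: {table}")
    conditions, params = [], []
    for column, value in example.items():
        if not column.isidentifier():      # names cannot be bound as parameters
            raise ValueError(f"invalid column name: {column}")
        if not value:
            continue
        if "*" in value:
            conditions.append(f"{column} LIKE ?")
            params.append(value.replace("*", "%"))
        else:
            conditions.append(f"{column} = ?")
            params.append(value)
    where = " AND ".join(conditions) if conditions else "1 = 1"
    return f"SELECT * FROM {table} WHERE {where}", params


# Hypothetical object table with made-up records.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE objects (title TEXT, material TEXT)")
conn.executemany("INSERT INTO objects VALUES (?, ?)",
                 [("Bronze bowl", "bronze"), ("Bronze lamp", "bronze"), ("Clay jug", "clay")])

sql, params = qbe_to_sql("objects", {"title": "Bronze*", "material": "bronze"})
print(sql)  # SELECT * FROM objects WHERE title LIKE ? AND material = ?
rows = conn.execute(sql, params).fetchall()
print(rows)
```

Binding the form values as parameters rather than splicing them into the SQL string protects the database from malformed or malicious input, which matters once the search tool is exposed through a Web interface.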
Agora from SRZ
The Digital Library System Agora was developed by the Satz-Rechen-Zentrum SRZ (Berlin, Germany) in collaboration with the Göttinger Digitalisierungszentrum (Göttingen, Germany) for the storage of and access to digital documents, including their structural, bibliographic, and content metadata (http://www.agora.de/). The system complies with the specifications for a distributed digital scientific library as defined by the Deutsche Forschungsgemeinschaft (Mittler, 1998).
The software is still under development; the first version was announced for August 2000. Beta versions are in use in several German and Austrian libraries. A relational database (supporting SQL92 and ODBC/JDBC) and a Web server (supporting Java Servlets) are required as additional components.
As a digital library system, Agora provides components for scanning, indexing, image processing and conversion, document preparation, administration, document management, Web presentation, printing on demand, and storage on CD-R. The Agora XML Editor supports the structuring of digitized print material, including bibliographic and technical metadata; its output consists of XML files.
Batch import tools are provided to import XML files as well as digital material. Metadata are extracted from the XML tags and from the TIFF headers and imported into a relational database, which forms the center of the digital library. Images can be converted into other image formats and sizes.
One main part of Agora is a Web application based on Java Servlets. It includes adaptable HTML templates that provide navigation features (catalog data, SGML/XML-based tables of contents, indexes, lists of illustrations, thumbnails, etc.) as well as simple and extended searches. The integrated Verity search engine (http://www.verity.com) offers full-text retrieval. The system supports SGML/XML and RDF. However, there is no Z39.50 interface.
Agora is neither a Library Information System nor a Museum Management System, but it supports the handling of digital objects. The Agora concept is promising. Nevertheless, it still lacks two features:
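The XML part of such a batch import can be sketched in a few lines: metadata are read from the XML tags and written into relational tables. The XML layout, table names, and functions below are hypothetical (Agora's actual schema is not reproduced), SQLite stands in for the relational database, and the TIFF header extraction mentioned in the text is not covered.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical XML layout; Agora's actual XML schema is not reproduced here.
SAMPLE = """<document id="doc1">
  <title>Example score</title>
  <page file="p001.tif" label="Title page"/>
  <page file="p002.tif" label="First movement"/>
</document>"""

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE document (id TEXT PRIMARY KEY, title TEXT);
CREATE TABLE page (doc_id TEXT REFERENCES document(id), file TEXT, label TEXT);
""")


def import_document(xml_text: str) -> str:
    """Extract metadata from the XML tags and insert them in one transaction."""
    root = ET.fromstring(xml_text)
    doc_id = root.get("id")
    with conn:
        conn.execute("INSERT INTO document (id, title) VALUES (?, ?)",
                     (doc_id, root.findtext("title")))
        conn.executemany("INSERT INTO page (doc_id, file, label) VALUES (?, ?, ?)",
                         [(doc_id, p.get("file"), p.get("label"))
                          for p in root.iter("page")])
    return doc_id


def batch_import(xml_texts) -> list:
    """Import files one by one; each is committed separately, so documents
    imported before a failing file are kept when the exception stops the batch."""
    return [import_document(text) for text in xml_texts]


batch_import([SAMPLE])
print(conn.execute("SELECT COUNT(*) FROM page").fetchone()[0])  # 2
```

Committing per document is a deliberate choice for batch jobs: a single malformed file then does not roll back hours of already imported material.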
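Full-text retrieval engines typically rest on an inverted index that maps each term to the documents containing it. The sketch below shows this general technique with Boolean AND queries; it is not a description of the Verity engine's internals, and tokenization is deliberately simplistic (no stemming, no stop words, no ranking).

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Lower-case word tokens; no stemming or stop words in this sketch."""
    return re.findall(r"\w+", text.lower())


class InvertedIndex:
    """Maps each term to the set of documents that contain it."""

    def __init__(self) -> None:
        self._postings: Dict[str, Set[str]] = defaultdict(set)

    def add(self, doc_id: str, text: str) -> None:
        for term in tokenize(text):
            self._postings[term].add(doc_id)

    def search(self, query: str) -> Set[str]:
        """Ids of documents containing every query term (Boolean AND)."""
        terms = tokenize(query)
        if not terms:
            return set()
        result = set(self._postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self._postings.get(term, set())
        return result


index = InvertedIndex()
index.add("d1", "Title page and first movement")
index.add("d2", "Second movement, Andante")
print(sorted(index.search("movement")))        # ['d1', 'd2']
print(sorted(index.search("first movement")))  # ['d1']
```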
ADLIB from ADLIB Information Systems
ADLIB Information Systems is a company with over 15 years of experience in information and collection management for libraries, museums, and archives. Since 1992 it has offered the following software modules (http://www.disbv.com/):
ADLIB Library: 'The Integrated Library Application', designed specifically for information and catalog management in libraries.
ADLIB Museum: 'The Collections Management Program' that features comprehensive functionality for professional collections management.
ADLIB Archive: designed especially for archive management.
These modules are used by several international museums, archives, and libraries and are available in German, English, and Dutch.
The ADLIB applications are based on a proprietary database. Internet users can search the catalog by any field or combination of fields, using Boolean and logical operators. Internet access requires Microsoft or Netscape Web server software and the ADLIB Internet Server.
ADLIB Information Systems is a member of the Consortium for the Computer Interchange of Museum Information (CIMI). Accordingly, ADLIB supports standards such as MARC (ISO 2709 MARC exchange format), Dublin Core (export routines), Z39.50, and CIDOC (Guidelines for Museum Object Information). ADLIB has built-in image storage and retrieval capabilities.
ADLIB can be rated as both a Library Information System and a Museum Management System. As a Digital Library System, the ADLIB software lacks the following features:
Storage and retrieval of hierarchically structured digital objects (title page, first movement, second movement, etc.)
Storage and retrieval of other media types (video, audio, full text, 3D models, etc.)
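The first missing feature, storage and retrieval of hierarchically structured objects, can be realized in a relational database with a parent reference per structural unit and a recursive query. The sketch below assumes SQLite (recursive common table expressions are available since version 3.8.3) and uses a hypothetical table layout; sibling positions are zero-padded to four digits, which assumes fewer than 10,000 siblings per level.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE structure (
    id        INTEGER PRIMARY KEY,
    parent_id INTEGER REFERENCES structure(id),  -- NULL for the root
    label     TEXT NOT NULL,
    position  INTEGER NOT NULL                   -- order among siblings
);
INSERT INTO structure VALUES
    (1, NULL, 'Score', 0),
    (2, 1, 'Title page', 0),
    (3, 1, 'First movement', 1),
    (4, 1, 'Second movement', 2),
    (5, 3, 'Exposition', 0);
""")

# Walk the tree below a given node; 'path' concatenates zero-padded sibling
# positions so that sorting by it yields document order.
SUBTREE = """
WITH RECURSIVE subtree(id, label, depth, path) AS (
    SELECT id, label, 0, printf('%04d', position) FROM structure WHERE id = ?
    UNION ALL
    SELECT s.id, s.label, t.depth + 1, t.path || '.' || printf('%04d', s.position)
    FROM structure AS s JOIN subtree AS t ON s.parent_id = t.id
)
SELECT label, depth FROM subtree ORDER BY path
"""

for label, depth in conn.execute(SUBTREE, (1,)):
    print("  " * depth + label)
```

Printed with indentation, the result reproduces the outline of the document (Score, Title page, First movement, Exposition, Second movement); querying from node 3 instead would return only the first movement and its subsection.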
Voyager from Endeavor Information Systems Inc.
The Voyager (http://www.endinfosys.com) integrated library management system consists of several software modules for public access, OPAC, cataloging, acquisition, circulation, reporting, and system administration. Voyager offers access to images, full text, and other local and remote resources. More than 400 installations exist, mainly in the US, Canada, and the UK.
Voyager is a client/server application with a clear separation between three components: client functions, server functions, and database functions. It runs on Unix (Solaris, AIX) and Microsoft Windows NT/2000. Voyager's server software uses Oracle as its relational database management system (RDBMS) to store and manage all of the library's bibliographic data, and it supports SQL as query language.
The Voyager client software runs under Microsoft Windows. The Web client, WebVoyage, comes as part of the base Voyager package and allows Web browsers to query the Voyager database. It is a functional Web OPAC capable of handling all types of data. All other functions, such as cataloging, acquisition, and administration, can only be performed through the MS Windows client software.
ImageServer, an additional Voyager module, manages electronic resources, special collections, archives, and digital resources. It supports scanning, indexing, access, and printing of electronic documents and other library objects. A Document Management System with copyright control is integrated.
The Cataloging module provides the user with a MARC21 interface and batch import functionality. It is possible to import or export records through a USMARC or Z39.50 interface.
Voyager is both a Library Information System and a Museum Management System. It lacks the following features to qualify as a Digital Library System:
Virtua ILS from VTLS Inc.
Virtua ILS (http://www.vtls.com) is client/server software that offers numerous features in an integrated solution for managing library collections. The basic subsystems of Virtua include OPAC, Web Gateway, Circulation, Cataloging, Serials, Acquisitions, System Administration, and Statistics and Reports. There are more than 300 installations in 24 countries worldwide.
Virtua's OPAC offers four types of search: Browse, Keyword, Control Number, and Expert. To further refine search results, users can apply filters in Keyword and Expert searches. Virtua's OPAC toolbar and views are configurable through system permissions and parameters.
The Chameleon Gateway is the Web Gateway of Virtua ILS. It allows libraries to offer a variety of customized interfaces and lets end users choose the one that best meets their individual needs. By creating or modifying a skin, a library can make the gateway match the appearance of its Web site, so every library can build its own personalized Web portal. The Chameleon Gateway is designed for multimedia access through the Internet: any user with an Internet connection and an appropriate WWW browser can search the image collection or other Z39.50-compliant databases. Records cannot be imported or exported in XML/SGML.
Virtua enables the definition of an unlimited number of original cataloging workforms for various record types and materials formats. Workforms can also be designed for specialized needs, such as monographic series, metadata (such as Dublin Core), and institutional publications.
Virtua runs on Unix (Solaris, AIX) and Microsoft Windows (NT/2000) and uses Oracle as its relational database. The client software is available only for Windows 95/98/2000.
Virtua ILS can be classified as both a Library Information System and a Museum Management System. It lacks the following features to be rated as a Digital Library System:
Content Manager by IBM
IBM's DB2 Digital Library is now integrated in a product called Content Manager (CM). Its use is not restricted to the cultural/museum industry (http://www-4.ibm.com/software/is/dig-lib/). Content Manager manages all kinds of digital information: scanned images, workgroup business documents, computer-generated reports, Web content management (XML/HTML), and more. Its key functionality for digital libraries covers creation and capture, storage and management, search and access, distribution, and rights management. Different scanner types are supported, and new search technologies such as parametric, content-based (text, image, video, audio), multi-search, navigation, and even intelligent agents are embedded into the system.
A client that needs access to a digital collection is served by a library server, which provides the search capabilities mentioned, and by one or more object servers, which provide storage. The library server is the single point of control for the system: it holds index information, controls security and access, dispatches requests to object servers to accept new objects or to move objects to a client, and ensures database integrity. The objects themselves are stored in Content Manager object servers, which may be located centrally or decentralized close to the users.
CM supports a broad variety of standard formats, ranging from TIFF, JPEG, XML, MPEG, and QuickTime 4 to hundreds of office document types, including Microsoft Word documents and Lotus 1-2-3 spreadsheets. Content can be scanned or imported via out-of-the-box clients or capture solutions and is then organized in a DB2 or Oracle relational database.
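The division of labor between the library server and the object servers can be modeled in a short sketch. This is a conceptual illustration of the roles described above, not IBM's API: all class names, method names, and parameters are hypothetical, access control is reduced to a per-object reader list, and object servers are in-memory dictionaries.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set


@dataclass
class ObjectServer:
    """Holds object content; may be located centrally or close to the users."""
    name: str
    store: Dict[str, bytes] = field(default_factory=dict)


class LibraryServer:
    """Single point of control: keeps the index, checks access, and dispatches
    requests to the object server that holds an object."""

    def __init__(self, object_servers: List[ObjectServer]) -> None:
        self._servers = {s.name: s for s in object_servers}
        self._location: Dict[str, str] = {}      # object id -> object server name
        self._index: Dict[str, Set[str]] = {}    # keyword -> object ids
        self._readers: Dict[str, Set[str]] = {}  # object id -> users allowed to read

    def accept(self, object_id: str, content: bytes, server_name: str,
               keywords: Iterable[str], readers: Iterable[str]) -> None:
        """Store a new object on the given object server and index it."""
        self._servers[server_name].store[object_id] = content
        self._location[object_id] = server_name
        self._readers[object_id] = set(readers)
        for keyword in keywords:
            self._index.setdefault(keyword.lower(), set()).add(object_id)

    def search(self, user: str, keyword: str) -> Set[str]:
        """Object ids matching the keyword that the user is allowed to see."""
        hits = self._index.get(keyword.lower(), set())
        return {oid for oid in hits if user in self._readers[oid]}

    def retrieve(self, user: str, object_id: str) -> bytes:
        """Fetch an object's content from the object server that stores it."""
        if user not in self._readers.get(object_id, set()):
            raise PermissionError(f"{user} may not read {object_id}")
        return self._servers[self._location[object_id]].store[object_id]


central, branch = ObjectServer("central"), ObjectServer("branch")
library = LibraryServer([central, branch])
library.accept("img-1", b"placeholder", "branch", ["Portrait"], readers=["alice"])
print(library.search("alice", "portrait"))  # {'img-1'}
print(library.search("bob", "portrait"))    # set()
print(library.retrieve("alice", "img-1"))   # b'placeholder'
```

Because only the library server knows where an object lives, objects can be placed on whichever object server is closest to their users without changing how clients search for them.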
A Content Manager server operates on AIX, AS/400, OS/390, Windows NT, and Windows 2000 and supports clients for Windows 95, 98, 2000 and NT. Content Manager comes with a client toolkit for building Internet or desktop applications that access information stored in Content Manager. A sample Java applet demonstrates how searches can be performed. APIs include Java, Java Beans, C++, and ActiveX Controls.
IBM's Content Manager is a well-known, powerful Digital Library System. Its main disadvantage is that it runs only on Windows and on IBM's own platforms (AIX, AS/400, OS/390), not on other Unix systems. Customization requires staff trained in the appropriate skills.
Dienst by CDLRG (Cornell Digital Library Research Group)
Dienst is a system for configuring a set of individual services running on distributed servers to cooperate in providing the services of a digital library. Interoperability among Dienst servers provides the user with a single logical document collection, even though the actual collection is distributed across multiple servers (http://www.cs.cornell.edu/cdlrg/dienst/DienstOverview.htm; Dienst, 1995).
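The idea of presenting a distributed collection as one logical collection can be illustrated by a federated search that queries several servers in parallel and merges their hits. The sketch does not implement the Dienst protocol: the "servers" are local functions standing in for remote services reached over HTTP, and all names and document identifiers are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

SearchService = Callable[[str], List[str]]


def make_server(documents: Dict[str, str]) -> SearchService:
    """Stand-in for one remote server: substring search over document titles."""
    def search(term: str) -> List[str]:
        return [doc_id for doc_id, title in documents.items()
                if term.lower() in title.lower()]
    return search


def federated_search(servers: List[SearchService], term: str) -> List[str]:
    """Query all servers in parallel and merge the hits into one result list."""
    with ThreadPoolExecutor() as pool:
        per_server = list(pool.map(lambda server: server(term), servers))
    merged = set()
    for hits in per_server:
        merged.update(hits)
    return sorted(merged)


# Hypothetical collections on two servers.
server_a = make_server({"a/1": "Digital library architectures", "a/2": "Network protocols"})
server_b = make_server({"b/1": "Library automation"})
print(federated_search([server_a, server_b], "library"))  # ['a/1', 'b/1']
```

Querying the servers concurrently means the response time is bounded by the slowest server rather than the sum of all servers, which matters when the collection is spread across many sites.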
The functionality of a Dienst digital library includes storage and access to resources (digital objects), deposit of new resources, discovery and browsing of those resources, and user registration. Dienst supports indexing, administration and Web publishing of documents. Dienst does not offer collection management facilities such as purchasing, e-commerce, or copyright management. The distributed Dienst software is configured to handle textual resources (documents) in a variety of formats. However, the Dienst architecture includes a sophisticated document model that accommodates a wide variety of digital resources. Using the Dienst software for these other resources will require some programming.
The entire package of Dienst software (all services) is supported only on Unix (including Linux, Solaris, AIX, Ultrix, and HP-UX). The Dienst software is designed to run in conjunction with an HTTP server such as the Apache Web server (http://www.apache.org/). The Dienst protocol uses HTTP as its transport layer, making Dienst servers accessible from any WWW client. Neither Z39.50 nor SQL is used for querying. The only restriction on metadata in Dienst is that they are formatted in XML. Examples use three metadata standards: rfc1807 (ftp://nic.merit.edu/documents/rfc/rfc1807.txt), Dublin Core (http://purl.org/DC/), and Open Archive Metadata Set (http://www.openarchive.org/).
Dienst is not a Digital Library Product in the classical sense: it is a protocol and server that provides distributed document libraries over the World Wide Web. It is thus more an infrastructure than a single-hosted Digital Library Solution.
Summary: Our Proposal for an Integrated Digital Library Solution
Table 1 describes the four most promising Digital Library Systems after our detailed review, i.e., the systems that best met our checklist of quality criteria.
Agora is a special development for digitized objects. It takes into account the requirements that German working groups have defined for the management of digitized material. The concept of Agora is very strong; the implementation less so. The main disadvantage of Agora is that it has not yet left beta status and that the product release has been postponed several times.
CM is a very comprehensive Digital Library System offering many features for the storage and maintenance of digital objects. It is a widely used, powerful, and well-known solution. Its main disadvantage is that it runs only on Windows and on IBM's own platforms (AIX, AS/400, OS/390), not on other Unix systems. The included toolkit offers great flexibility but requires staff trained in the appropriate skills.
KE EMu focuses on collection management for museums, where its strengths lie, but its capabilities extend to the administration of digital objects. Its weak point is the Web interface, which only provides a simple keyword query facility.
Virtua ILS has its strengths in the field of Library Information and Museum Management Systems. Its main disadvantages are that Web access is restricted to the Web OPAC and that XML/SGML data cannot be used. Nevertheless, with a few enhancements it can be used as a Digital Library System.
Figure 4: Steps 4-6 of our market overview
So far, our work has been a paper study. In a next step, we will run practical tests with the selected systems to check whether the vendors' promises hold. We will also talk to customers and take their special needs into account, since their criteria will ultimately be decisive. No system will cover all features; we therefore plan further development to turn the selected Digital Library System into a Digital Library Solution.
References
allegro-C (1997). Systemhandbuch Version 15.
Bogen, M. & Bonkowski, C. & Borowski, M. & Löffler, J. (2000). Digitizing a Cultural Heritage - The Key Issue for Preservation and Electronic Publishing. In: Proceedings of WebNet'2000, San Antonio, Texas.
Chin (2000). The Canadian Heritage Information Network. Collections Management Software Review. http://www.chin.gc.ca/Resources/Publications/e_publications.html#CMSR.
Dienst (1995). Implementation reference manual. Cornell Computer Science Technical Report.
Eversberg, B. (1997). allegro-C: Partitur für Einsteiger.
KE EMu (1999). KE Software. KE EMu User Guide.
KE Software (2000). KE EMu Electronic Museum Collections Management System Hardware and Network Requirements.
Knirim, H. & Graf, B. (1998). Softwarevergleich in der Fachgruppe Dokumentation des Deutschen Museumsbundes. http://www.museumsbund.de/fgdoku/ag-softwarevergleich/inhalt.html#Inhalts-verzeichnis.
MIMS (2000). XWAVE Solutions. Client/Server Operations Manual For The Museum Information Management System (MIMS).
Mittler, E. (Ed.) (1998). Retrospektive Digitalisierung von Bibliotheksbeständen. Berichte der von der Deutschen Forschungsgemeinschaft einberufenen Facharbeitsgruppen "Inhalt" und "Technik". Berlin: Deutsches Bibliotheksinstitut.
Noerr, P. (2000). The Digital Library Toolkit. Second Edition. http://www.sun.com/products-n-solutions/edu/libraries/noerrfinal.pdf.
Tews, A. (1997). allegro-C: allegro-Ouvertüre.
Z39.50 (1998). ISO 23950. Information and documentation -- Information retrieval (Z39.50) -- Application service definition and protocol specification.