Pictures and People: Distributed Query Database Collaboration
Edward W. Earle, International Center of Photography, New York; Roger Bruce, George Eastman House, Rochester, NY, USA
In 1998, the George Eastman House (GEH) and International Center of Photography (ICP) formed an alliance to create joint exhibitions and share collections and collection information. To strengthen the collection management data infrastructure, GEH and ICP studied a range of database solutions for managing the artifact collections. Eastman House had a legacy VAX system that desperately needed to be migrated to a modern database, and ICP was prepared to begin its first serious cataloging project. In 2000 and 2001, the two museums jointly licensed The Museum System from Gallery Systems. As part of this effort, the two institutions planned to design a joint Web site as a portal into databases representing the photographic collections of both organizations. The two museums are now working with Gallery Systems to test a distributed query system between the two institutions allowing Web visitors to enter a single query at the joint Web site (www.photomuse.org), which will spawn queries to both ICP in New York City and Eastman House in Rochester, NY.
Keywords: distributed query, image retrieval, database, search, multi-institutional collections
When new technology enters a culture, there is an initial struggle to domesticate the foreign. We need to use analogies to understand how the new technology works and to make it less threatening. For example, the horse was the main mode of transportation during much of American history, so when the steam locomotive was invented it was called an "iron horse." Similarly the phrase "horseless carriage" was applied to the first automobiles. These terms helped to domesticate the potentially threatening technology and make it palatable to the populace.
The language of assimilation has been used for today's technology as well. Fortunately, very few people, except perhaps politicians, talk about the "information super highway" any longer. However, technological change within the information sciences has also often required bringing along old legacy forms to instill comfort. How many database programs used the index card analogy to convince the user that this new technology at least looked like that which was known and understood?
Just as the culture of 19th-century America resisted new technology, museums were slow to reach the level of comfort that libraries have long had with information systems. Even when databases were introduced, there were legendary fights among museum departments over conflicting use of field structures and nomenclature. Early on, many museums, and even departments within a museum, went it alone, developing complex databases suited to their local idiosyncratic needs (often defined by the need of the moment rather than longer-term assessment). At the same time, these early efforts helped to define the unique needs of museums as distinct from the hierarchy of library science. Early work at the George Eastman House resulted in one of the first networked image databases (on a very experimental level). In the mid-1980s the Eastman House and the California Museum of Photography, University of California, Riverside, experimented with networked data via a 1200-baud dial-up modem connection to a VAX, with a local videodisk in California displaying images based on records accessed from Rochester. By the turn of the century, this data and videodisk represented a legacy in both senses of the word.
Conditions In The Field Of Photography
Today, the number of museums and galleries exhibiting photography is testimony to the medium's broad acceptance within the art museum and visual arts fields. However, the very incorporation of photography within the institutionalized art world has, in some ways, limited general understanding of the unique nature of the medium. Photography as a medium can participate in many fields of study simultaneously. While an image can be understood to be an exemplary piece of artwork made by a noted photographer, it can also be seen as an agent of social change, or as a document that records a cultural or historical phenomenon, or as a tool for scientific, historical, or educational research. This is a level of complexity rarely claimed by other means of visual expression. The various strata of meanings present a challenge to all custodians of photographic collections. Reflecting both institutions' mission to advance the understanding of photography and to promote scholarly research and recreational inquiry into photography, the alliance is well-suited to employ information science and new database tools to help meet this challenge.
To truly exploit the value of photography collections for this expanded research agenda requires yet another level of institutional commitment. Over the past four years, ICP and GEH have dedicated themselves to establishing good internal collection management standards, while installing a solid information infrastructure and accessible Web site to disseminate collection information. As intended, the development of collection-based information will lend itself to a shared field-centric structure which features distributed search and retrieval through both collections via the www.photomuse.org Web site. While it is a descriptive and cataloguing problem to bring new layers of information to better understand a single collection, it is a very different priority to work to support the larger field of knowledge. Herein lies the challenge to the partners' multi-institutional plan and the area of greatest potential change throughout the museum and photography fields.
Museums, by their very nature, are often institutional versions of the very artists they exhibit: possessing a level of institutional ego which emphasizes the singular museum above others. This self-service may be a consequence of the competitive nature of the business, but this process can ultimately be a disservice to the field that the museum purports to serve. As collection-centric organizations, museums rarely follow a cooperative field-centric model, supported by a consortium of institutions, which could bring a richer range of resources to the public. The partners hope not only to promote the ICP-GEH alliance as a model for integrated information systems, but also to encourage other institutions to share and participate within this framework, thereby supporting and emphasizing the field-centric model.
Unlike other art forms, photography is uniquely suited to being represented in digital media and within a distributed query mechanism. While many photographs at ICP and Eastman House are fine art works which need to be experienced as elegant prints, others exist only as film or glass plate negatives. Here the transformation of the negative image into a high-resolution digital image file has helped rescue the image from obscurity and given it a new life performing two functions: preservation and access. There is also the recognition that photographic works often exist in many variants. Photographers will revisit their work over time and print very different renderings from the same negative. A rendering made decades later by Edward Weston, Ansel Adams, W. Eugene Smith, or Berenice Abbott can look very different from the original vintage print. For these many reasons, a distributed query mechanism across institutions will allow for discovery of new work and also comparison of similar works created over time. While museums of photography generally collect artifacts, be it a fine art print by Cindy Sherman or a daguerreotype by Southworth and Hawes, they will also have to contend with work that is born digital. Today we might collect a fine archival print made on an Epson 2200 printer, but in the future that photographer or her estate might offer an archive to a museum in the form of digital image files.
Institutions have employed various methods to share and collate databases about object collections. Historically there is the catalogue raisonné, which is dependent not upon any special technological tool but only on the resolute hard work of a scholar seeking out all instances of an artist's work among public and private collections. There are concerted efforts among consortia of institutions in the form of union lists, which are also often dependent upon a central editor/publisher and a lot of individual gumshoe work. Within advanced technology there are hierarchical efforts such as AMICO (Art Museum Image Consortium) where organizations contribute to a central repository of both data and images. To simplify integration, that data is provided in strict fields and the images in specific formats. As a subscription model, AMICO's users are higher education institutions, allowing for controlled access and carefully maintaining copyright or proprietary rights. This model brings back revenue to AMICO and then to the participating collections over time – an excellent community model, but very complex; thus it has been hard to grow within the current economy. The next exciting centralized model, also with education as the target audience, is ArtStor from the Mellon Foundation. ArtStor takes a much broader view of the image to go beyond a representation of an artifact with secondary material such as large-lot slide collections used in art history curricula.
All of these methods require highly structured central repositories of information and images. AMICO and ArtStor have shown great promise partly due to their structure and goal of servicing a known audience of scholars and students. This is well suited to art historical research and classroom presentation on university campuses. While George Eastman House was a charter member of AMICO and an avid contributor, there was the recognition that not all photographs are limited to art historical analysis. Photographs participate in many histories, ranging from social to cultural, technological, political, regional, and local history. ICP and George Eastman House also realized that our audience is beyond the academic researcher or the student in an art history class. We view the "recreational scholar" as both a consumer and potential contributor to the history of photography. For these many reasons, a third way was needed - one between the individual museum repository accessible on the Web and the large-scale hierarchical database delivered by a central authority.
The model of the Internet itself is not a hierarchical structure but one with many nodes and self-mending routes from machine to machine. After all, the Arpanet - the original Defense Department "internet" from 1969 and into the 1970s - was conceived as a network that could survive a nuclear attack by allowing nodes to continue to communicate when one is down. In the 1980s and 90s, there were many models for sharing information among computers, including WAIS (Wide Area Information Servers), the Gopher menu-driven document server from the University of Minnesota, the library machine query protocol (Z39.50), and, of course, the Web.
The most spirited form of a sharing protocol was Napster, which started in 1999. Despite the lawsuits by the Recording Industry Association of America (RIAA), the genie cannot be put fully back into the bottle. The success of iTunes suggests that many people are quite willing to pay a reasonable fee for a good product, but Napster epitomized the basic concept of file sharing and demonstrated the degree to which a loose amorphous community can be formed using networked systems. This concept employs a central server to point to content on a myriad of large and small computer nodes operating on a peer-to-peer level.
Gnutella is a more generic and pure peer-to-peer (P2P) file sharing protocol. Gnutella combines client and server functions into one program (sometimes called "servent" for SERVer and cliENT).
Of course, intellectual property issues abound on the subject of peer-to-peer file sharing. The great debate is where to place the responsibility and liability. The central issue for museums interested in forming a distributed query collaboration, or more informal circles of common interest, is that each institution will still be responsible for properly representing control over copyrighted works in its collection. An institution can always choose to share only the text citation for a work and not an image.
While a P2P system returns to the spirit of the early Internet, it also presents a cacophony of voices and chaos of information. A third way for museum collections combines the authority of the individual archive with centralized services to assist in discovery and retrieval. The project that the International Center of Photography and George Eastman House are conducting with Gallery Systems provides authority without authoritarianism.
Distributed Query System: Plans
The Museum System (TMS) from Gallery Systems is one of the most widely used integrated collection management systems. It is based on an SQL engine, generally Microsoft SQL Server, but it also runs on Oracle. It is a rigorous and complex program. While some have criticized it for offering a one-size-fits-all approach, the standardization of most fields was an asset for ICP and George Eastman House. There are still many user-defined fields, but knowing that the core information is in largely the same place facilitates sharing database information between the two museums. At the outset we planned a four-phase project:
This project is individually and collectively funded by grants from the Institute of Museum and Library Services (IMLS), the National Endowment for the Arts (NEA), and the support of Trustees and individuals at both museums.
Use of Standards
While ICP and GEH decided to license The Museum System (TMS) as part of our long-term strategy, Gallery Systems realized that it could not assume all participants in a distributed query mechanism would be running TMS. It would be delightful for their bottom line, but not realistic. For this reason the company adopted a more open system that fits well with the goals of ICP and GEH to create an enhanced distributed query mechanism for the www.photomuse.org Web host. While TMS itself is written in Visual Basic and runs on the Microsoft Windows operating system, the distributed query Central Server and individual nodes are Java applications. This allows the query nodes to run on many different hardware configurations with a number of operating systems. A field mapping strategy allows each distributed query node to communicate with its home database, whether that database is managed by TMS or by another software application.
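The server-node arrangement with field mapping can be illustrated with a minimal sketch. It is written in Python for brevity and is not the Gallery Systems implementation: the class `Node`, the function `distributed_query`, the shared field names (`artist`, `title`), the local field names, and the placeholder records are all assumptions made for illustration, and in-process calls stand in for whatever network transport the real nodes use.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical placeholder records; each node uses its own local field names.
NODE_A_RECORDS = [
    {"maker": "Example Photographer A", "title": "Untitled 1"},
    {"maker": "Example Photographer B", "title": "Untitled 2"},
]
NODE_B_RECORDS = [
    {"ArtistName": "Example Photographer A", "ObjectTitle": "Untitled 3"},
    {"ArtistName": "Example Photographer C", "ObjectTitle": "Untitled 4"},
]


class Node:
    """One institution's query node (hypothetical): maps shared fields to local ones."""

    def __init__(self, name, records, field_map):
        self.name = name
        self.records = records
        self.field_map = field_map  # shared field name -> local field name

    def search(self, shared_field, term):
        local_field = self.field_map[shared_field]
        reverse = {local: shared for shared, local in self.field_map.items()}
        hits = []
        for rec in self.records:
            if term.lower() in rec.get(local_field, "").lower():
                # Translate the local record back into shared field names.
                hit = {reverse[k]: v for k, v in rec.items() if k in reverse}
                hit["source"] = self.name
                hits.append(hit)
        return hits


def distributed_query(nodes, shared_field, term):
    """Send one query to every node in parallel and merge the answers."""
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        # pool.map preserves the order of `nodes` in its results.
        result_lists = list(pool.map(lambda n: n.search(shared_field, term), nodes))
    return [hit for hits in result_lists for hit in hits]


NODES = [
    Node("ICP", NODE_A_RECORDS, {"artist": "maker", "title": "title"}),
    Node("GEH", NODE_B_RECORDS, {"artist": "ArtistName", "title": "ObjectTitle"}),
]

if __name__ == "__main__":
    for hit in distributed_query(NODES, "artist", "photographer a"):
        print(hit)
```

The point of the sketch is that the visitor's query is phrased once in shared field names, while each node alone knows how those names correspond to its home database. A node that does not run TMS would need only a different field map.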
Gallery Systems' prototype is code-named "Latitude." It employs several standardized software components:
The Procedures For Implementing This Server-Node Model
Most collection management databases offer the salient information about the catalogued object. Some also have provisions for more extended text (wall label), bibliography, etc. To give the photographic objects within the ICP and GEH collections greater context, the two museums plan to develop "information collections." These collections will afford the Web visitor an opportunity to understand the object in a larger context or to conduct research across very different disciplines. One example is the large repository of biographical information at the George Eastman House. This data was compiled initially for their own collection, then later to support work on the Index to American Photographic Collections (1982, 1990, 1998). This information could become part of a series of tangential tables related to the core database and retrieved as part of a distributed query. Another example is a chronology of the history of photography prepared by ICP as part of the original prototype for www.photomuse.org. This chronology will be an information collection and a separate node in the distributed query database.
Users as Authors
To develop these information collections, colleagues in photography museums and teachers of the history of photography will be invited to participate. Areas of research will be divided among participants. For example, a class in the history of photography at the University of New Mexico might be asked to conduct chronological research on the two decades from 1880 to 1899. The chronology is a series of structured tables, so it will be a simple task to organize the research accordingly. Using fill-out Web forms, the classes will upload chronological submissions. These will be reviewed, edited, and then added to the live database if appropriate. This will also allow the project to expand the chronology to other nations using a similar approach. For example, a group in the Netherlands might be asked to submit information on milestones in the history of photography in the Low Countries. While our initial work will be only in English, we are preparing the tables to accept text in Unicode format to accommodate other languages. (This also brings up complex questions about future search strategies.)
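The submit, review, and publish cycle described above can be sketched in a few lines. This is an illustration only, under the assumption that each chronology row carries a year, a region, a text, and a contributor. The names `ChronologyEntry`, `Chronology`, `submit`, and `review` are hypothetical and do not describe the project's actual tables or Web forms.

```python
from dataclasses import dataclass, field


@dataclass
class ChronologyEntry:
    """One row of the chronology (hypothetical structure)."""
    year: int
    region: str
    text: str          # Python 3 strings are Unicode, so other languages fit
    contributor: str


@dataclass
class Chronology:
    pending: list = field(default_factory=list)  # uploaded, awaiting review
    live: list = field(default_factory=list)     # visible to Web visitors

    def submit(self, entry):
        """Accept a Web-form submission into the review queue."""
        if not entry.text.strip():
            raise ValueError("entry text is empty")
        self.pending.append(entry)

    def review(self, entry, approve, edited_text=None):
        """An editor approves (optionally after editing) or rejects an entry."""
        self.pending.remove(entry)
        if approve:
            if edited_text is not None:
                entry.text = edited_text
            self.live.append(entry)
            self.live.sort(key=lambda e: e.year)  # keep the chronology in order


if __name__ == "__main__":
    chrono = Chronology()
    a = ChronologyEntry(1895, "United States", "(placeholder milestone)", "class-1")
    b = ChronologyEntry(1882, "Netherlands", "(tijdelijke mijlpaal)", "group-2")
    chrono.submit(a)
    chrono.submit(b)
    chrono.review(a, approve=True, edited_text="(placeholder milestone, edited)")
    chrono.review(b, approve=True)
    print([e.year for e in chrono.live])
```

Keeping pending and live entries in separate collections means nothing reaches the public chronology without passing through an editor, which is the safeguard the text calls for.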
The International Center of Photography and George Eastman House also plan to collect and help refine existing information collections. For example, there are several large-scale bibliography projects that have been developed by private individuals. We hope to be able to reward this hard work through selective acquisitions (or licensing). These larger long-term aspects will require additional fundraising, but we hope that offering these services to the larger field of photographic research will lead to outside support.
Where It's Headed: Texts and Contexts
Given the prevalence of Napster, Gnutella, and Grokster among young people growing up in a computer culture, in the future it will be expected that museums offer services that create extended communities of interest. One way is to move beyond a "collection centric" approach to a "field centric" model for understanding artifacts in context.
We also need to move students from a reliance on "Googling," which will only expose material within a statistical arc, to more studious means of research that bring them closer to "getting dusty" in a library. Collaborative Distributed Query mechanisms, linked not only to objects but also to rich affiliated primary documents and secondary contextual sources, will give both researchers and the "recreational scholar" a closer approximation of the surprising discoveries afforded by getting lost in the stacks.
Without making a virtue of reporting on a decidedly unfinished project, we are operating with the optimistic assumption that we will continue to encounter new applications for information tools conceived originally as merely collection management catalogues. We view distributed query as but one of several initial steps in the transformation of what is fundamentally an inventory control system, albeit digitally enhanced for museums. Photography archives may present exceptional opportunities for their custodians who have come to understand that their images contain significantly more than has been described or referenced in their catalogue. Specifically, photographic meaning and content are proving to be resistant to terms of classification. Especially problematic would be terms that could enable retrieval by users untutored in the specifics of a collection's structure, a photographer's oeuvre, or the historical/social contexts from which particular images may have been derived. Populating the subject fields in photography data records presents an especially confounding problem. A controlled vocabulary of descriptors for a medium that continues to map the entire visible universe is a conundrum - even if we assume a single language and an infinite labor pool for data entry.
But the fact remains that searches among photography archives, especially those that increasingly derive from inter-collection search strategies, will target what we generally think of as subject matter. Anticipating the increasing demand for subject-related search results born of the virtual merging of photography archival data, we are looking beyond the subject field as an organizing principle for the grouping of images. Is it possible that a solution to the conundrum of subject searching in photography archives may rest in directly mapping relationships among images and between images and texts?
For the moment, forget about naming the subject, and look instead to the less precise but less demanding process of affiliating photographs by using texts, of any kind, that have the capacity to link one image to another. Such texts, existing outside the image object record, might be an exhibition checklist, a list of images licensed for a particular publication, an essay referencing collection images, or any other document that could be marked up in ways that could be used to obtain an image. It would even be possible to employ the "activity" of individual images in a collection to generate useful affiliations for browsers of the archives. There is a kind of automatic authority in the taxonomically dumb, yet statistically powerful, merchandising engine used by Amazon.com: "Customers who purchased this book, also purchased…."
Implementation of affiliations, such as those hypothesized above, would have the effect of generating multiple overlapping lots. And systems left to the accumulation of automatic links of affiliations could easily introduce data noise into a system that is already rather imprecise.
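One way to picture affiliation by shared texts is a simple co-occurrence count: two images are affiliated whenever the same document cites both. The sketch below is an assumption-laden illustration rather than a planned implementation. The document names, the image identifiers, and the functions `build_affiliations` and `affiliated_with` are invented for the example, and the `min_shared` threshold is just one possible way to damp the data noise mentioned above.

```python
from collections import Counter
from itertools import combinations

# Hypothetical texts, each reduced to the list of image IDs it references.
DOCUMENTS = {
    "exhibition_checklist": ["img-001", "img-002", "img-003"],
    "publication_license_list": ["img-002", "img-003"],
    "essay": ["img-003", "img-004"],
}


def build_affiliations(documents):
    """Count how many documents cite each pair of images together."""
    pair_counts = Counter()
    for image_ids in documents.values():
        # sorted(set(...)) removes repeats and gives each pair one canonical order.
        for a, b in combinations(sorted(set(image_ids)), 2):
            pair_counts[(a, b)] += 1
    return pair_counts


def affiliated_with(image_id, pair_counts, min_shared=1):
    """Images linked to image_id, strongest link first.

    Pairs sharing fewer than min_shared documents are dropped as likely noise.
    """
    scores = {}
    for (a, b), n in pair_counts.items():
        if n < min_shared:
            continue
        if a == image_id:
            scores[b] = n
        elif b == image_id:
            scores[a] = n
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    counts = build_affiliations(DOCUMENTS)
    print(affiliated_with("img-003", counts))
    print(affiliated_with("img-003", counts, min_shared=2))
```

No subject term is ever assigned here. The links come entirely from documents that already exist, which is why such a system is cheap to feed but also why overlapping lots and spurious links accumulate unless some threshold or editorial review is applied.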
Even with all of these liabilities, the potential utility of affiliation tools remains sufficiently compelling to warrant a more aggressive program to digitally acquire texts relevant to key groups of images. The Alliance of ICP and Eastman House has already begun to photograph and transcribe the correspondence of daguerreotypists Southworth and Hawes, as a function of the routine preparation of the exhibition treating these important 19th-century photographers. As we refine our systems, both institutions plan to incorporate this kind of document capture as a standard function of project development. By taking every opportunity to feed the potential of image affiliation, these two great photography venues will at least be making an investment in expanding access to collections via connections that we may never be able to name.
TMS, The Museum System: a collection management database designed and distributed by Gallery Systems of New York City.
SOAP, Simple Object Access Protocol: used to communicate and to deliver or retrieve data between many types of computers or operating systems by encapsulating the data into a command within the HTTP Web standard.
Distributed Query, cross collection: this can be applied within a single institution with multiple collections or, as in the case of the International Center of Photography and George Eastman House, across similar materials in administratively separate organizations.
Distributed Query, cross domain: domain in the sense of type of collection; for example, a query to a photograph collection for works by Lorna Simpson and a query to a library database for books about Lorna Simpson, and perhaps exhibition texts for an exhibition of her work. Not to be confused with Internet domain names or Microsoft local network domains.
eMuseum: a software product from Gallery Systems for search and retrieval of database records and associated images on a Web host.
XML, Extensible Markup Language: while HTML (HyperText Markup Language) describes the format and look of the data, XML can describe the type of data, establishing a standard by which to search and represent content within a body of text.
Gnutella: a peer-to-peer file sharing protocol, not limited to music. This protocol is in effect a form of personal publishing. Of course, the personal publisher rarely owns the rights to the shared material. The inappropriate use of the technology does not change the relevance of it as part of a sea change in the culture.