Published: March 15, 2001.
Historical Map Collection Web Site
David Rumsey, Cartography Associates and Luna Imaging Incorporated, USA
The David Rumsey Historical Map Collection Web site includes over 4,400 high-resolution images of rare maps from Rumsey's collection of more than 150,000 maps of the Americas, one of the largest private map collections in the U.S. The site is powered by Luna Imaging's Insight software. Launched on March 15, 2000, Rumsey's site was the first Luna-designed site to go public with unlimited, free access for users. This paper will discuss issues surrounding the site's purpose, design, and operation, including metadata management, digital image creation and preservation, site usage and economics, sharing Web-based archives, and future directions for online collections.
Keywords: historical maps, Web site design, metadata, online libraries, digital images, site access, Java Clients, Luna Imaging Insight, browser image viewers, collection sharing, search engines, image compression, copyright, intellectual property, virtual libraries.
Only a short time ago, the mere idea of a link between maps printed centuries ago and today's digital world of the Internet might have seemed ridiculous. However, these two different worlds contained the seeds of amazing collaborative possibilities, which are just now being realized. To understand how this has come about, it is helpful to look at the broader changes that are occurring today with regard to libraries and the Internet. Until recently, libraries only made catalogs of their materials available on the Web. Now libraries are starting to actually deliver text and visual materials to their Web visitors via high-resolution images of maps, photographs, prints, books, and other cultural heritage objects (including film, video, and audio) along with metadata linked to each image or information file. This is leading to the development of large virtual libraries accessible to the public, worldwide, for the first time.
Who is doing the work of creating digital libraries? Evidence of serious institutional commitment to digitizing our cultural heritage abounds. The Library of Congress has been an early leader with its American Memory project that includes photographs, maps, text, manuscripts, film, audio, and ephemera. The Mellon Foundation has launched several important initiatives, including JSTOR (a huge archive of scholarly journals), digitization of MOMA's design collection and the ancient scrolls of Dunhuang, China, and the formation of a sister virtual archive to JSTOR tentatively called ARTSTOR that will focus on visual resources for scholars. The Research Libraries Group (RLG) recently announced a major effort to create an integrated digital collection of cultural materials held by its members. And the New York Public Library has begun the implementation of its Visual Treasures project, which will create an immense virtual library of 600,000 images documenting social and cultural history. NYPL also has several other digital projects either on the Web or soon to be launched, some on its own and some in collaboration with other libraries and universities. Of special interest to me is NYPL's project to digitize over 1,300 early maps of the Middle Atlantic Seaboard. One can reasonably predict that the number of new digital initiatives will increase exponentially over the next five years, coming not only from the larger institutions, but also from smaller archives and private holdings, and from many parts of the world.
What does this mean for the future? Because these new virtual libraries will be digital in structure, they can be linked and shared with other virtual libraries, including museums and other archives. This will "break down the walls" between content types and lessen the traditional separation of paintings from maps, film and video from sculpture, photography from music, and so on. Geographical separation of archives will be eliminated by having simultaneous Web access to collections in North America, Europe, Asia or anywhere on the globe. Historical maps will play an important role in the new digital libraries because their content is uniquely suited to digitization: high resolution scanning of historic maps reveals a richness of detail that makes the map content "expand" as one zooms into finer and finer levels of information. Maps can be seen as kinds of dense systems of information that the process of digitizing and then displaying with powerful software tools can make visible -- in fact, make visible many things that we may have missed when viewing the originals. Maps are like virtual libraries in themselves: they hold huge amounts of information that is visually cataloged on their surface by location, symbol, color, type, and scale.
Historical maps will benefit in several ways from these developments. They will become far more accessible to the general public as well as scholars and students. In this process, they will become part of our public historical memory, in the same way paintings and photographs are. These maps will begin to be seen in their cultural context. By being included in large digital libraries of contemporary cultural heritage materials, we will find new meaning in maps, and maps will add new meanings to our understanding of the culture of their times. Maps will be able to be compared and studied as part of a vast shared virtual library of map holdings worldwide. Scholars and students will have greatly increased simultaneous access to the holdings of different institutions from many countries without having to travel great distances -- and they will have access all the time, 24 hours a day. Attendance at map libraries will actually increase as people become more familiar with historic maps over the Internet. People will always value seeing the "real" thing.
Against this background, I would like to share some specific details and lessons learned from the development of the Rumsey online collection, as an example of the process of moving cultural heritage materials from their, in my case, rather private physical existence, to their public digital life.
I had 20 years of preparation for going "digital." I used a database as soon as I began building the physical collection of my library. This database shaped the growth of the collection and allowed for truly contextual collecting -- seeing the relationships between the collected maps by subject, graphic type, time period, author or publisher, or geographic location. The use of a database allowed me to visualize the collection in a "data space," and this facilitated the growth of the collection into a coherent whole (in my case, maps of North and South America from about 1700 to the present). As the collection grew to more than 150,000 maps, the database was essential in maintaining depth to the various parts of the collection by showing which sections of the collection needed additions and how new items might flesh out the collection -- such as letters of a map publisher that revealed his cartographic and business practices.
I began to formulate a definition of what was required to create a virtual library on the Web: high-resolution images in sufficient number and in meaningful relationships (in other words, a strong collection) that would allow for serious scholarly research and educational use, as well as use by the general public; detailed metadata for each image that could also express relationships between images (such as the parts of an atlas in order); powerful onsite searching and sorting capabilities; effective printing tools; and excellent viewing software that allows one to examine the online images and create the feeling that you are holding them in your hands.
My familiarity with (in fact, dependence on) a collection catalog database made it natural for me to add high-resolution digital images of the maps to the metadata. I began the imaging process in 1997 around the time that high-resolution camera scanners became available, along with the development of image compression software called MrSid, by Lizardtech. Also at this time I established a relationship with Luna Imaging and started using an early Java Client version of their Insight software to display my first map images on the Web. I began the process of choosing which maps to digitize first - the existence of the catalog database was critical in both selecting which parts of the collection to start with and build from, and in avoiding the traps of either choosing only the "greatest hits" or beginning with the "a's" and arriving at the "z's" years later. Using the database, I broke the collection into its major categories and then began to select items for digitizing from each of those categories in sufficient number to provide depth. Some category examples are globes, civil war maps, maps of states, counties, and cities, important atlases, maps in Americana books, and so on.
On the technical side, Cartography Associates immediately established scanning standards of at least 300 pixels per inch (ppi), measured against the object's physical size (and in many cases, up to 600 ppi if the smallest detail on the map justified it). We decided to use camera scanners instead of flatbed scanners because camera scanners gave us the depth of field required for images of books and atlases. Furthermore, the use of a copy-stand digital camera with side lighting results in a stronger sense of the map as an object, which is essential in today's 2-D Web space. Color management was accomplished by calibrating all monitors to a standard with colorimeters, using color bars in saved image files as well as creating color profiles for all parts of the production line (scanners, monitors, and printers) and saving color profiles in the images themselves. Lenses were tested for maximum depth of field. Fluorescent lights were chosen for color consistency and minimum heat on the objects. The MrSid compression software was tested for compression levels that created the least artifacting in the MrSid image. Light smoothing software was employed to even out variations in image brightness -- especially important in knitting together multiple shots of large originals. Once all of these elements were in place, we were able to produce between 50 and 100 images a day, including derivative production and archiving the originals.
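To make the resolution standard concrete, the following minimal sketch computes the pixel dimensions and uncompressed size of a scan. The 20 x 30 inch sheet and the 3 bytes per pixel (24-bit color) are assumptions chosen for illustration, not figures from our production line.

```python
def scan_dimensions(width_in, height_in, ppi=300, bytes_per_pixel=3):
    """Return (width_px, height_px, uncompressed_bytes) for a scan.

    ppi is measured against the object's physical size; bytes_per_pixel=3
    assumes 24-bit RGB, which is an illustrative assumption.
    """
    width_px = round(width_in * ppi)
    height_px = round(height_in * ppi)
    return width_px, height_px, width_px * height_px * bytes_per_pixel


# Hypothetical 20 x 30 inch map at the two resolutions mentioned in the text.
for ppi in (300, 600):
    w, h, size = scan_dimensions(20, 30, ppi=ppi)
    print(f"{ppi} ppi: {w} x {h} pixels, {size / 1e6:.0f} MB uncompressed")
```

Doubling the resolution quadruples the uncompressed size, which is one reason 600 ppi would be reserved for maps whose finest detail justifies it.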
For our metadata format we chose initially to create our own "Rumsey" set of fields in the Access database to allow for the fairly complex relationships of objects and containers: an atlas had to be represented as a whole by a "container" field, and in its parts by other fields, and all the parts had to be in a proper sequence, i.e. the same sequence as in the book. MARC format is not intended to easily handle this kind of problem, although we are currently cataloging in both "Rumsey" and MARC because of MARC's ubiquity in the library world. And the Luna software can now translate our data structure on the fly to Dublin Core or VRA. I suspect that over time certain consistent metadata standards will evolve for Web-based online libraries. For the present, one has to contend with several competing standards.
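The container-and-parts relationship can be sketched roughly as follows. This is not the actual "Rumsey" field set or the Luna translation logic; the class names, field names, and sample titles are all hypothetical, and the Dublin Core mapping covers only three elements (title, creator, date).

```python
from dataclasses import dataclass, field


@dataclass
class MapRecord:
    title: str
    creator: str
    year: int
    sequence: int = 0  # position of this sheet within its container


@dataclass
class Container:
    """An atlas or book represented as a whole, holding its parts."""
    title: str
    parts: list = field(default_factory=list)

    def ordered_parts(self):
        # Return the parts in the same sequence as in the physical book.
        return sorted(self.parts, key=lambda part: part.sequence)


def to_dublin_core(record):
    """Translate a record to a small Dublin Core dictionary on the fly."""
    return {
        "dc:title": record.title,
        "dc:creator": record.creator,
        "dc:date": str(record.year),
    }


# Hypothetical atlas whose sheets were cataloged out of order.
atlas = Container("Example Atlas")
atlas.parts.append(MapRecord("Plate 2", "Example Publisher", 1850, sequence=2))
atlas.parts.append(MapRecord("Title Page", "Example Publisher", 1850, sequence=1))

for part in atlas.ordered_parts():
    print(to_dublin_core(part))
```

Keeping the sequence on each part, rather than relying on insertion order, is the design choice that lets an atlas be reassembled in book order no matter how its sheets were scanned or cataloged.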
On March 15th of last year we launched davidrumsey.com with 2,300 online images and data. The site increased its holdings to more than 3,000 images in July 2000. By the beginning of 2001 we had about 4,400 images online, and expect to have 10,000 by the end of 2001. The collection is accessible to the public in two formats: a browser version that can be easily accessed by Netscape or Internet Explorer, and a downloadable version called a Java Client that, while it takes time to download, offers advanced functionality.
When you make the decision to bring a private collection into the public domain of the Internet, anything can happen. I had high hopes of reaching a new, undiscovered audience when I launched davidrumsey.com--but I certainly didn't expect my antiquated, somewhat academic endeavor to be covered by Wired magazine, let alone to be chosen as the Site of the Week by South Korea's Ginseng Chicken Factory Web site.
The reality of going online and maintaining an online library produced some interesting and revealing experiences for me.
First, we reached a much larger audience than we expected. In the 12 months that we have been online we have had over 200,000 total visitors and about 10,000,000 hits. Average visit length is about 10 minutes. United States visitors make up about two-thirds of total visits, with international visitors accounting for the balance. ".Com" visitors are about 25%, ".Net" about 25%, ".Edu" about 25%, and all others about 25%. Visits per day ran from 500 to 6,000. We were astonished at this volume of visits because typically, map libraries draw around 2,000 to 6,000 visitors per year (UC Berkeley around 6,000, Stanford about 2,000). In my own library I had about 200 visitors over 20 years. Our visitor rate for the past year, 200,000 annual visitors, is about the number of visitors to the typical university map library over a period of 50 years. Is the quality of the visit as good as the "real visit" to the map library? I think so. The Web-site visitor enjoys easier access to the material, high-resolution images, comparisons, printing copies, searching, speed, etc. Does it take the place of a real visit? Not at all. Exposure to the material on the Web-site library should encourage more visits to the physical library by allowing people to first search and focus on what they want to see via the Web, then go and see the actual object at the physical library.
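The "50 years" comparison is simple arithmetic, sketched below. The 4,000 figure is only the midpoint of the 2,000 to 6,000 range quoted above, used here as a stand-in for a "typical" library; the function name is hypothetical.

```python
def years_to_match(online_visitors_per_year, library_visitors_per_year):
    """Years a physical library would need to equal one year of online visits."""
    return online_visitors_per_year / library_visitors_per_year


# Low end, midpoint, and high end of the annual range for map libraries.
for annual in (2000, 4000, 6000):
    print(annual, round(years_to_match(200_000, annual), 1))
```

Depending on where a library falls in that range, one year of online traffic corresponds to roughly 33 to 100 years of physical visits.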
Our browser software was successful in getting us directly and quickly to our users. Users did not have to download any software -- they could access the collection directly using Netscape or Internet Explorer browsers. Many of our users downloaded the Java Clients after they had experienced the collection with the browser, for greater functionality and the ability to download high-resolution image files for printing. Users have downloaded over 8,000 Java Clients, about 4% of the total visits. However, the mix of users who visited was different from what we expected: 50% were from the "general public" with about 25% from schools and universities and the other 25% from ".gov", ".org", ".mil" and unknown. We expected the general public percentage to be smaller. To us this indicated that the public is hungry for interesting and well-presented material on the Web. Users found us easily on the Web because we appeared in the major directories and search engines with good reviews (Best of Net at Yahoo and About.com, Editors Choice at Netscape, Lycos, and DMOZ, Site of the Day at USA Today, and others).
The Web site has created a public presence for the collection. We are a private collection that is not open to the general public. We needed a Web "house" to get our maps out for public use. Luna's Insight software architecture creates the feeling of a "place" where people can go and examine things as if they were holding them in their hands. We are now listed with all the major institutional map libraries in the Web directories. Our online map library is equaled in size only by the Library of Congress Map site.
We have been able to function as a model for other collections headed online. Many institutional map libraries that want to go online have approached us for information. I think most of them are convinced by our site that the experience of using a map online with good software is very close to working with the real thing and has the obvious advantages of easy delivery anywhere, combined with eliminating the wear and tear involved in working with the original maps. Private collectors are also interested, but in most cases they will need support from major map libraries in going online because of the technical requirements. This should be an exciting area of collection development for many institutions: by helping private collectors move their materials online, they are more likely to be the recipients of those actual materials as donations in the future.
We have been able to keep the site free to all users. Users, to our knowledge, have been respecting our copyright rules and there has not been a mass downloading of images with the Java Clients. Users are very appreciative of the fact that the site is free and open. Several interesting revenue opportunities have arisen to license the images and data to large Web sites that want to display the images within their own browser environments. We are currently working out an arrangement with Ancestry.com, the largest genealogy site on the Web with over 100,000 visitors a day. Another company would like to do high-resolution art printing of the maps to sell site visitors. Revenues from these arrangements will allow the free site to continue.
The site has been in operation 24 hours a day, 7 days a week, since March 15, 2000, the day we launched. We have been down a few times, but never for very long, and most of those problems have been solved. The software scales well, working fine with hundreds of simultaneous users. We have not had to provide a huge amount of tech support and the vast majority of users figure out how to use the site without problems. What problems we did have invariably resulted from users with older, slower computers and modems slower than 56k. Broadband is becoming more widely accessible and we will benefit from this. Our high morning and afternoon hourly usage patterns show that many people are signing on from work where they have fast lines, but the evening traffic is picking up as more cable and DSL lines are installed in homes.
Our users found interesting applications for the site. Many K-12 teachers have used the site with their classes and we were chosen as "school site of the day" on Discovery.com. The intuitive design of the Luna software allows for easy use by young students and advanced researchers alike, although it appears that most students use the simpler browser version while scholars eventually switch to the Java Client version for its advanced and somewhat more complex features. Civil War researchers are major users of the more than 100 Civil War maps on the site. Genealogists use the site to find where their ancestors lived. Map collectors and map dealers use and link to the site. Many of the major map libraries and collections link to the site and several feature it prominently as a resource. Several Web design sites, including the prominent "Project Cool" have featured the site as a demonstration to designers of what is possible with sophisticated software design. The site has been used by home school organizations. We had more international use than we expected: we were featured in prominent news Web sites in England, France, Portugal, Brazil, Sweden, Estonia, Slovakia, Norway, and certain factories in South Korea already mentioned. And most of all, the general North American public has used the site far more than we expected, and judging from the emails we got, many of these users had not been exposed to historic maps before. Returning visitors numbered about 15% per day indicating roughly that visitors returned to the site about once a week. In order to maintain that we will have to continue to add material to the site. However, map image databases have the advantage of containing an extraordinary amount of visual information in each image: currently we have about a terabyte of information on the site, with average image size of over 200 megabytes. It will take a user many visits to examine everything on the site in detail.
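As a rough check on the storage figure, the sketch below multiplies the approximate image count reported earlier (about 4,400) by the stated average image size. It assumes decimal units (1 TB = 1,000,000 MB); the function name is hypothetical.

```python
def collection_size_tb(image_count, avg_image_mb):
    """Estimate total storage in decimal terabytes (1 TB = 1,000,000 MB)."""
    return image_count * avg_image_mb / 1_000_000


# About 4,400 images at an average of 200 MB each.
print(collection_size_tb(4400, 200))
```

The result is a little under one terabyte, consistent with "about a terabyte" given that the average is stated as "over 200 megabytes."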
Putting digitized cultural heritage materials on the Web offers us the opportunity to significantly increase the number of people using these collections. From our experience, we feel that there is a potentially huge audience on the Web for interesting cultural content of all kinds, as long as it is well documented and well presented. Focused or specialized content can find large numbers of users on the Web - historical maps are certainly in this category, not being generally recognized as "art" objects. Through the vast reach of the Web, you can find your audience for almost anything, and it will be much larger than you think.
However, collections can't just launch their sites and sit back and wait for the traffic to appear. Over one million new pages are added to the Web each day, and if online collections do not take the time to become a part of the immense Web cataloging system, their audience will not see them. Collections all take the time to catalog their individual collection items; similarly, they must take the time to see that their Web collection is cataloged as a whole and in its significant parts by the search engines and directories. This is done by structuring the collection Web-site home pages so that they best reflect what the collection is about, as this will be seen by the search-engine spiders and is the basis of their indexing; by submitting the site to all the major search engines and directories; and by creating additional site pages that break down the collection into broad themes - this is especially important for dynamic sites like ours that have most of their content hidden in an internal database that is not visible to the search-engine spiders. With the increasing use of Limited Area Search Engines (or LASE's), which are noncommercial, come from the academic world, and are usually focused on specific subjects (hence limited), there is hope that the secret algorithms of the commercial search engines, and the inevitable frustration of trying to figure out how they will catalog your site, will be replaced with transparent and rational cataloging methodologies. Examples of LASE's are Argos, Hippias, and DevSearch.
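One way to expose database content to spiders is to generate a static page per theme. The sketch below illustrates the idea only and is not how our site pages were built; the function name, theme, and item titles are hypothetical.

```python
from html import escape


def theme_page(theme, item_titles):
    """Build a plain static HTML page that lists the items in one theme."""
    items = "\n".join(f"    <li>{escape(title)}</li>" for title in item_titles)
    return (
        "<html>\n"
        f"<head><title>{escape(theme)}</title></head>\n"
        "<body>\n"
        f"  <h1>{escape(theme)}</h1>\n"
        "  <ul>\n"
        f"{items}\n"
        "  </ul>\n"
        "</body>\n"
        "</html>\n"
    )


# Hypothetical theme and titles; in practice these would come from the
# collection database, grouped by its major categories.
page = theme_page("Civil War Maps", ["Example Map A & B", "Example Map C"])
print(page)
```

Because the output is ordinary HTML with the theme in the title and heading, a spider can index it even though the underlying records live in a database it cannot reach.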
Working with user profile information generated by the site server log is very important. Using good Web log analysis tools such as Webtrends allows collection managers to understand who is coming to the collection site, what they are viewing, where they were referred from, what search engine they used, how long they stayed, and so on. With log analysis software it is actually possible to understand the activity of site visitors as well as, or better than, that of a collection's physical visitors. This information can be useful in decision-making concerning the growth and direction of an online library.
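The kind of question a log analysis tool answers can be sketched in a few lines. This is not Webtrends and does not reproduce its reports; it assumes the server writes the common "combined" log format, and the sample lines use made-up hosts and referrers.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Combined log format: host, identity, user, [time], "request", status,
# size, "referrer", "user agent".
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def referrer_counts(lines):
    """Count visits by referring domain; unparseable lines are skipped."""
    counts = Counter()
    for line in lines:
        match = LOG_PATTERN.match(line)
        if not match:
            continue
        referrer = match.group("referrer")
        domain = urlparse(referrer).netloc if referrer not in ("", "-") else ""
        counts[domain or "(direct)"] += 1
    return counts


# Made-up sample entries for illustration.
sample = [
    '192.0.2.1 - - [15/Mar/2001:10:00:00 -0800] "GET /index.html HTTP/1.0" 200 1024 "http://search.example.com/?q=maps" "Mozilla/4.0"',
    '192.0.2.2 - - [15/Mar/2001:10:05:00 -0800] "GET /index.html HTTP/1.0" 200 1024 "http://search.example.com/?q=atlas" "Mozilla/4.0"',
    '192.0.2.3 - - [15/Mar/2001:10:06:00 -0800] "GET /index.html HTTP/1.0" 200 1024 "-" "Mozilla/4.0"',
]
print(referrer_counts(sample))
```

The same parsed fields (host, time, request) could be grouped differently to answer the other questions mentioned above, such as what visitors viewed and how long they stayed.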
There is rising concern among collection managers about copyright issues associated with going public on the Web. I think the best way to deal with copyright and security concerns is to create different groups or levels of the collection. One level could be all pre-1923 material that is clearly in the public domain (this is the case with almost my entire collection, fortunately) and that can be taken public without problems. The next level could be post-1923 material or material that is for other reasons not suitable for the public domain, allowing this level to be used within a limited community only or with other institutions on site only. Another level could be for use only within the museum or library, and so on. The legal status of copyright of the digital images themselves remains in flux. The Bridgeman case (Bridgeman Art Library v. Corel Corp.) held that exact photographic reproductions of public-domain works lack the originality required for copyright, which casts doubt on enforceable copyright in faithful digital images of such works. Undoubtedly there will be further rulings and appeals. My view is that ultimately, the legislative process will newly define copyright and other intellectual property issues in the near future. As a practical matter, we have found that the vast majority of users respect our copyright and work with us to license the images.
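The levels described above could be expressed as a simple rule. This is a sketch only, not legal advice and not the access logic of any particular software; the level names, function name, and parameters are hypothetical, and a real policy would need review of each item.

```python
PUBLIC, COMMUNITY, ON_SITE = "public", "community", "on-site"


def access_level(year, otherwise_restricted=False, community_ok=True):
    """Assign an access level from publication year and two policy flags.

    otherwise_restricted marks material unsuitable for open release for
    reasons other than date; community_ok marks material that may be shared
    within a limited community rather than on site only.
    """
    if year < 1923 and not otherwise_restricted:
        return PUBLIC
    return COMMUNITY if community_ok else ON_SITE


print(access_level(1861))                      # public
print(access_level(1950))                      # community
print(access_level(1950, community_ok=False))  # on-site
```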
Going on the Web may hold real financial benefits to collection owners, though this remains to be proven. Several possibilities exist: substantially increased Internet use and attendance may make the collections more able to attract government and corporate funding; judicious licensing of content to appropriate online educational/information sites can produce revenues; printing of reproductions is a possible revenue source. Collection development may benefit by collaborating with private collectors to digitize their holdings as a first step in bringing their materials into an institution. I suspect other possibilities exist that we can't see today. Currently, in most cases, charging for access by the general public to an online collection is not feasible. In the future, it may be. Ancestry.com provided an interesting recent example of charging the general public for access -- they launched their 10-million image database of the US census from 1790 to 1920 and promptly sold more than 30,000 one-year subscriptions for about $30 each, bringing in more than 1 million dollars in revenue.
Going public on the Web with high-resolution image access to cultural materials today is still seen as only an option by most libraries and museums. I suspect that in a few years, it will become as necessary as keeping an institution's doors open to the public. The public will get a taste of the wonder and excitement of Internet access to great collections and they will want more. Those institutions that get in early on this phenomenon will be able to shape the agenda and create the early "brands" of the great online collections. Those that wait too long will have to play catch up later.
We are working on several new initiatives for our site that are worth mentioning. We are proceeding with a project with UC Berkeley to take all of our metadata and upload it to their online Web library catalog called Melvyl, with a URL in each record's MARC 856 field, that when clicked will open up the map detailed on that record, directly from our online collection. This will allow catalog searchers at the University to find us via the university catalog and at the individual map level, as opposed to coming in our "front door" and then searching for that map with our own search function. These will be records for the digital images of the maps as opposed to records of the physical maps, and as such will be unique records created within OCLC and thus available for downloading by participating OCLC members.
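For readers unfamiliar with the 856 field, the sketch below formats one in a common human-readable display form. In MARC 21, subfield $u carries the URI and $z a public note; first indicator 4 denotes HTTP access and second indicator 0 denotes the resource itself. The URL and record identifier are placeholders, not our actual link structure, and the function name is hypothetical.

```python
def marc_856(url, public_note=None):
    """Format a MARC 856 field (electronic location) as display text.

    Indicators "40": access method HTTP, link is to the resource itself.
    """
    subfields = [("u", url)]
    if public_note:
        subfields.append(("z", public_note))
    return "856 40 " + " ".join(f"${code} {value}" for code, value in subfields)


# Placeholder URL; the real per-map link format is not specified here.
print(marc_856("http://www.example.org/maps/1234", "High-resolution image"))
```

A catalog that renders $u as a hyperlink then takes the searcher straight from the record to the individual map.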
Another new project will be MapLibraries.com, a collaborative site of the Rumsey Collection and the Library of Congress Map Division. Initially the site will have about 8,000 images, half from each collection. We hope to add other participants in the future. As part of the project, we have constructed a Luna site for the LOC. From the homepage of MapLibraries one can access either the Rumsey Collection or the LOC collection individually, or together. Instead of browsing online map resources at separate Web sites or traveling great distances to view maps at different libraries, all the image holdings of participating institutions can be viewed and searched together. This allows for comparison of map images side by side, while the actual physical maps being viewed may be in libraries separated by thousands of miles. The newest version of the Luna Insight software will support searching across sites with different metadata structures, and allows for a distributed model in that data and images can be accessed from separate collection servers. Currently, participating sites must be using the Luna platform. In the future, it may be possible to include participants in some fashion that are on non-Luna platforms. Image duplication is a real issue and initially we will favor non-duplication of shared image content to conserve limited resources for scanning. Access schemes may vary among participants, with some participants having entirely open access and others having access only within certain communities. The software can also allow for these different access levels.
We will be starting our own Limited Area Search Engine (or LASE) for maps. The purpose of this LASE will be to catalog online map collections at the item level, instead of at the collection level. In other words, it will be a catalog of actual digital map images available on the Web. We will incorporate this into the MapLibraries.com site. We would like this project to be a demonstration project that would encourage others to create LASE's for other subject areas. Eventually, individual LASE's could be combined.
Also we will continue to work closely with Luna Imaging on enhancements to the software. Coming soon in version 3 of the Insight software: an image annotation tool, multiple rotating image views, drag zooming, cross collection combining and searching including support for mapping between different data structures (MARC, VRA, Dublin Core, etc.), support for video and audio, linking from images to other resources, measurement and scaling tools, defining different access and permissions for a collection, and changing to a fully relational database allowing the utilization of tables such as Getty's Thesaurus of Geographic Names.
We will georeference our maps so that they can be displayed in GIS systems and used by GIS-based historians and others. This means establishing the correct latitude and longitude of the map's four corners and its scale. The map can then be downloaded from our site and displayed in GIS programs like ArcView or ArcInfo. Then historical data such as population, climate, or locations can be displayed as layers on top of the historical map; modern data can be used in the same way. One can imagine using GPS devices to show one's exact position in a georeferenced historic map. Historical data can also be extracted from the maps through digitizing, and combined with other data and displayed. Working with GIS in this manner raises some concerns that a map's identity as a cultural object will give way to its use as a data source. I think it is important to always identify which maps have been "distorted" into correctness through georeferencing and to always consider the original ungeoreferenced image as the primary image to be archived, with the georeferenced images classified as derivatives.
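In its simplest form, corner-based georeferencing is linear interpolation between the known edges of the image. The sketch below assumes a north-up, unrotated map whose longitude and latitude vary linearly across the image; real historical maps generally need a proper projection and multiple control points, so this is an illustration rather than a production method. The function name, image size, and coordinates are hypothetical.

```python
def pixel_to_lonlat(col, row, width_px, height_px, west, east, south, north):
    """Convert a pixel position to (longitude, latitude).

    Assumes pixel (0, 0) is the top-left (north-west) corner and that the
    image is north-up with linear scaling in both directions.
    """
    lon = west + (col / width_px) * (east - west)
    lat = north - (row / height_px) * (north - south)
    return lon, lat


# Hypothetical 6000 x 9000 pixel image spanning 125W to 115W and 32N to 42N.
bounds = dict(west=-125.0, east=-115.0, south=32.0, north=42.0)
print(pixel_to_lonlat(0, 0, 6000, 9000, **bounds))        # top-left corner
print(pixel_to_lonlat(3000, 4500, 6000, 9000, **bounds))  # image center
```

The inverse of the same mapping would place a GPS reading on the image, which is the use imagined above.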
Finally, we will participate with other online cultural collections of non-map images such as Yale's Imaging America project, in order to integrate our map collections with paintings, photographs, art and architecture historical images, and other cultural heritage objects. We hope that this will allow historical maps to function in a larger context and perhaps move maps more into the cultural mainstream.
In conclusion, our experience with the Rumsey online map collection indicates that even focused, specialty archives such as ours can have great usefulness to the public when they are made freely accessible on the Internet using high-resolution images, solid metadata, and advanced viewing software. One can reasonably forecast that the growth of online archives of all types and sizes will increase exponentially over the next five years. The challenge then will be to avoid the "balkanization" of separate resources by providing powerful linking and cross-collection searching mechanisms that will effectively unite all the archives into a digital "Alexandria" while maintaining their separateness and uniqueness as collections.