Museums and the Web

An annual conference exploring the social, cultural, design, technological, economic, and organizational issues of culture, science and heritage on-line.

Radically Open Cultural Heritage Data on the Web

Jon Voss, We Are What We Do, United States


What happens when hundreds of thousands of archival photos are shared with open licenses, then mashed up with geolocation data and current photos? Or when app developers can freely utilize information and images from millions of books? Like the web of documents that became the World Wide Web, a web of data is the goal of Linked Open Data.  This paper examines how a cultural, technological, and legal environment is enabling a growing ecosystem of open historical data.  We explore the fundamental elements of Linked Open Data, how a global community within libraries, archives and museums is beginning to play a critical role, and how we are moving toward broader adoption.


Figure 1: Canvas, by Tim Wray. Linked Open Data to bring virtual museums to life (Wray, 2011)

1.     Introduction

In 2008, an independent technology consultant in San Francisco with a passion for local history set out to build a simple website that would allow the public to view historical photos on a map, regardless of where the photo was hosted.  Essentially a geographic web search of historical photos, it allowed, and required, manual identification of photos on the web and ascribing geographic data via the Google Maps API.  The metadata was added to a MySQL database, and the stored photo location would allow a fair use image to be displayed on the site, always linking back to the source photo page. That history geek was me, and the project was called LookBackMaps.

It was a working prototype and was successful in creating a view of place through time while drawing from disparate historical photo databases.  It was picked up by a few big blogs and started getting used more and more.  I had to upgrade to a more scalable hosting environment. I started to tackle mobile apps that would give an even more immersive experience of history. 

But another problem was clearly evolving. While I had created a rudimentary solution to the problem of accessing and displaying data from silo databases, I now had that data in another database, no more accessible than any of the databases we were drawing from and consolidating.  It felt like I was back where I started. 

I began dreaming of a database that others could contribute to, edit, and use for their own projects, similar to Wikipedia.  Fortunately, many people smarter than me were already pouring countless hours and millions of dollars into creating a similar solution.  I learned about DBpedia, structured data culled from Wikipedia; and Freebase, a community-edited database, which could be exactly what we needed.  I started to discuss the possibilities with colleagues in various technology sectors and in archives and libraries as well.  In November of 2009, at the Internet Archive, we pulled together a small working group of engineers, developers, archivists, product managers, and a copyright lawyer.   We called the group the Open Archives Metadata Mapping Project (“Open AMMP”), and we had the goal of identifying interoperable standards that would allow computer developers to build with data from libraries and archives, and freely share metadata improvements.

Working with a team from Freebase, we realized that the potential solution was already well under development, that a framework for shared data had been built, and that Linked Data, a protocol for sharing data on the World Wide Web, could very well be utilized to reach our goals. There were big questions of adoption, openness, access, and education, but the basic building blocks were already in place.

What’s more, a mashup culture (counter culture?) built on shared resources was already very alive in the world of libraries, archives, and museums.  In increasingly lean economic times, many cultural heritage institutions were more open to collaborative innovation than ever before, realizing that in many cases their very survival may be dependent on proving their relevance to patrons and donors alike.

Combine this with a growing global push for open government data; the technology to make that data increasingly useful; a World Wide Web in which national boundaries and traditional barriers were all but a thing of the past; and a global social graph that enabled organizing like never before…

… and you’ve got all the ingredients for something pretty amazing.

2.     Linked Open Data in Libraries, Archives & Museums

Our initial working group turned quickly toward demonstrating a use case for Linked Open Data, in order to illustrate the large scale potential for practical implementation, particularly within the humanities side of libraries, archives, and museums.  We launched Civil War Data 150, a collaborative project between the Archives of Michigan, the Internet Archive, Freebase, then LookBackMaps, and the University of Richmond Digital Scholarship Lab.  Yet the first responses to our grant proposals made very clear that Linked Open Data was an idea that the funding community and practitioners of the digital humanities were not ready to get behind just yet.  It was a very new idea, and we certainly faced then, even as we do now, the “chicken or the egg” problem, in that many funders were not willing to invest in Linked Open Data projects until they could see a viable use case; and we were unable to fund a viable use case until funders could be persuaded that it was a worthwhile investment.  More than anything else, this was a branding problem.

Above all, we found that in the United States, most people in libraries, archives and museums didn’t know what Linked Open Data was, so we had to start by defining the field and explaining exactly what it was that we were talking about.  Fortunately, Tim Berners-Lee, often referred to as the “father of the World Wide Web,” had begun to evangelize the idea of Linked Data in a TED talk in 2009 (Berners-Lee, 2009), asking everyone to share their “Raw Data Now!”  While he had originally introduced the technological framework for Linked Data in 2006, his evangelizing now made the idea much easier for a general audience to comprehend, and it was beginning to pick up steam in the World Wide Web community.

For instance, in the commercial sector, there were already examples we could look to. Enabled by the tools and techniques of Linked Data, a web of data is evolving as news outlets such as the BBC and The New York Times release their metadata to the world with open licensing.  Major movie and music databases like IMDb and MusicBrainz, and social networking sites like Facebook have done the same, allowing developers to discover and build new and useful ways to connect and present information on the web.  These tools span multiple databases and will radically change how we use the web. A critical element of this web of data is data aggregators such as Freebase, which serve as a clearinghouse, index, and easy access point to millions of named entities across thousands of data sources.

So What is Linked Open Data?  Some Definitions

In the context of libraries, archives and museums, we’re particularly interested in metadata, or information about collection holdings. Metadata is differentiated from data or assets in this case, which refer to the holdings themselves, be they digital surrogates of photographs, objects, books or digitized books, etc. The metadata may describe what the book or photo refers to, where an image is located on the web, what the copyright or licensing restrictions are, etc. How this metadata is made available to the public, and to the World Wide Web or software development communities determines whether it is “Open,” or “Linked.”  The phrases are capitalized because they connote specific requirements.  Without going into much technical detail, these are the requirements:

  • Open Data refers to data or metadata that is made freely available to the public with the express permission to reuse freely for any purpose, though publishers may require attribution.
  • Linked Data refers to data or metadata that is made available on the web in a format that utilizes generally accepted markup and World Wide Web protocol, much the way web pages utilize a code that allows them to be read by web browsers.
  • Finally, Linked Open Data refers to data or metadata made freely available on the World Wide Web with a standard markup format.

It should be noted that metadata can be licensed separately from data or assets. For instance, an institution may have a digitized photo collection of buildings in New Orleans in the 1940s, which the donor gave to a museum with certain restrictions. The institution has published these photographs on the web through a content management system and the public can access the photos through a finding aid, and see that there are restrictions for reuse, a way to contact the institution to request higher resolution images for a fee, etc. It’s typically not possible for search engines to index or “crawl” (i.e. make it available for searching) the metadata itself, so these images will rarely show up in a web search.

Despite the donor restrictions on the photographs in this example, the institution can still publish the metadata about the collection items with an open license. Let’s say they publish a simple comma-separated values file (.csv) which has a header row with metadata fields and information such as headings, titles, categories, image locations, copyright status, etc. If they publish this structured data with a Creative Commons 0 license, developers can now utilize this information to map the photograph locations, use fair use versions of the images to display them on the map and link back to the photographs in the collection, sort them by keywords, and otherwise build tools and visualizations to make the collection much more useful and accessible.
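To make the CSV scenario concrete, here is a minimal sketch in Python of how a developer might read such a file and pull out the records that can be placed on a map. The column names, sample rows, and example.org addresses are all invented for illustration; a real institution's export would have its own fields.

```python
import csv
import io

# Hypothetical sample of an openly licensed metadata file. The column
# names and values are invented for illustration only.
SAMPLE_CSV = """title,category,image_url,latitude,longitude,copyright_status
Canal Street storefronts,Streets,http://example.org/photos/1,29.9537,-90.0715,restricted
St. Louis Cathedral,Churches,http://example.org/photos/2,29.9580,-90.0637,restricted
Unidentified warehouse,Industry,http://example.org/photos/3,,,restricted
"""

def load_records(csv_text):
    """Parse the CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(csv_text)))

def mappable(records):
    """Keep only records that have both coordinates, converted to floats."""
    points = []
    for row in records:
        if row["latitude"] and row["longitude"]:
            points.append({
                "title": row["title"],
                "link": row["image_url"],  # always link back to the source page
                "lat": float(row["latitude"]),
                "lon": float(row["longitude"]),
            })
    return points

records = load_records(SAMPLE_CSV)
points = mappable(records)
for point in points:
    print(point["title"], point["lat"], point["lon"], point["link"])
```

Note that the photographs themselves remain restricted in this sketch; only the descriptive metadata is being reused.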

The institution, or someone in the general public for that matter, can make this Open Data even more useful by translating the metadata into a format that can be more easily read by machines, using markup and standards agreed upon by the World Wide Web community. When shared in this way, it’s now considered Linked Open Data, and there are a number of ways that it can be published, queried, browsed, and discovered.
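As a rough illustration of that translation step, the sketch below turns one metadata record into N-Triples, one of the W3C's plain-text serializations of RDF. It assumes the Dublin Core Terms vocabulary for the property names and a DBpedia address for the place; the record itself and its example.org address are hypothetical, and a real project would choose vocabularies to suit its own data.

```python
# Dublin Core Terms namespace, used here for the property names.
DCTERMS = "http://purl.org/dc/terms/"

def escape_literal(text):
    """Escape backslashes, double quotes, and newlines for an N-Triples literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")

def record_to_ntriples(record):
    """Return a list of N-Triples statements describing one record."""
    subject = "<%s>" % record["uri"]
    return [
        # A plain-text title, expressed as a literal value.
        '%s <%stitle> "%s" .' % (subject, DCTERMS, escape_literal(record["title"])),
        # A link to another dataset: this is what makes the data "Linked".
        "%s <%sspatial> <%s> ." % (subject, DCTERMS, record["place_uri"]),
    ]

# Hypothetical record; the example.org address is invented for illustration.
record = {
    "uri": "http://example.org/photos/1",
    "title": 'The "Crescent" City',
    "place_uri": "http://dbpedia.org/resource/New_Orleans",
}

lines = record_to_ntriples(record)
print("\n".join(lines))
```

The second statement is the important one: rather than storing the place name as text, it points at an address that other datasets also point to, which lets software connect this photograph to everything else known about that place.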

Also note that Linked Data is considered a part of the “Semantic Web,” and the two terms are often used almost interchangeably. Essentially, the Semantic Web is how machines can derive meaning and context from structured data, which can be read by machines when published as Linked Data. According to the World Wide Web Consortium (W3C), “The Semantic Web provides a common framework that allows data to be shared and reused across application, enterprise, and community boundaries.” So when we talk about “semantic browsers,” we are talking about web browsers that build visualizations or compile links from web pages using Linked Data.

3.     Catalyzing A Global Community

In 2009, Linked Open Data was already beginning to take hold in the sciences, with groundwork being laid for metadata publishing and open licensing options by organizations like Creative Commons and the Open Knowledge Foundation.  Across Europe, countries were exploring Linked Open Data in the push toward open government, and even making significant inroads in the humanities.  For example, at about the same time as the UK government was getting ready to launch a new website to make non-personal government data available as Linked Open Data, the UK’s Joint Information Systems Committee (JISC) was getting serious about pursuing Linked Open Data in universities.  By early 2010, JISC had funded and published the Linked Data Horizon Scan, followed shortly thereafter by nearly $1.2 million in grants to Linked Data projects at UK universities.  We’re already beginning to see the fruits of this in projects such as LOCAH, which is creating a model for expressing EAD (Encoded Archival Description) as Linked Data.

The job ahead of us was to familiarize practitioners and decision-makers in libraries, archives, and museums with the technology behind and potential of Linked Open Data, if applied to the vast amounts of structured data collected, stored and maintained in institutions around the world.

Kris Carpenter Negulescu, of the Internet Archive, and I turned our attention to the idea of a focused gathering of practitioners and decision makers from libraries, archives, and museums around the world. By identifying and gathering catalysts that played an influential role in their own communities, and by bringing together professionals in both the sciences and the humanities, we knew we could make a major push toward global familiarity with the basics of Linked Open Data.

We chose a date just prior to the annual Semantic Technology conference that takes place in San Francisco, and cast a wide net to see if we could attract 50 attendees, building budgets appropriately. We recruited a global organizing committee to help us spread the word, vet applications, and choose attendees.  In addition to Kris Carpenter Negulescu and me, our organizing committee consisted of the following people, together with their titles and affiliations at the time: Lisa Goddard, Acting Associate University Librarian for Information Technology, Memorial University Libraries; Martin Kalfatovic, Assistant Director, Digital Services Division at Smithsonian Institution Libraries and the Deputy Project Director of the Biodiversity Heritage Library; Mark Matienzo, Digital Archivist in Manuscripts and Archives at the Yale University Library; Mia Ridge, Lead Web Developer & Technical Architect, Science Museum/NMSI (UK); Tim Sherratt, National Museum of Australia & University of Canberra; MacKenzie Smith, Research Director, MIT Libraries; Adrian Stevenson, Research Officer, UKOLN, and Project Manager, LOCAH Linked Data Project; John Wilbanks, VP of Science, Director of Science Commons, Creative Commons.

Thanks to the generous support of the National Endowment for the Humanities and the Alfred P. Sloan Foundation, the Internet Archive convened a two-day summit in San Francisco, June 2-3, 2011 to foster greater public access to metadata in the world’s libraries, archives, and museums through increased adoption and implementation of Linked Open Data.

The International Linked Open Data in Libraries, Archives, and Museums Summit (“LODLAM”) convened leaders in their respective areas of expertise from the humanities and sciences to catalyze practical, actionable approaches to publishing Linked Open Data, through:

  • Identification of the tools and techniques for publishing and working with Linked Open Data.
  • Drafting of precedents and policy for licensing and copyright considerations regarding the publishing of library, archive, and museum metadata.
  • Publishing of definitions and promotion of use cases that give LAM staff the tools they need to advocate for Linked Open Data in their institutions.

Participants had to apply to attend, and we cast a wide net to reach our target audience, including specific communities, like the W3C Library Linked Data Incubator Group, mailing lists of various technology groups, and individual recruitment. The ideal attendee was a programmer, administrator, lawyer, LAM professional, or other professional with at least a working understanding of Linked Open Data, if not some direct experience with the technology or policies involved.  Participants needed to have authority in their position to implement policy or technology, or influence decision makers in their institution or sector. The organizing committee looked for people who had organized others in their field around Linked Open Data and who had a wide sphere of influence. They actively sought representative candidates from a broad range of institutions with diverse levels of leadership and technical expertise. Application submissions opened at 8am PST, February 1, 2011, and closed at 5pm PST, February 28, 2011.  Over 150 applications were received for 50 available slots. Given the enthusiastic response, the event budget was reworked to accommodate up to 100 participants. Participants were selected and notified by 5pm PST, March 7. Ultimately, 100 participants from 85 institutions and 17 countries attended the summit.

The LODLAM Summit utilized the Open Space Technology meeting format to give this group of expert innovators the time and space to freely identify and address the most pressing issues related to advancing Linked Open Data in libraries, archives, and museums.  The summit was convened to address one specific question: “How do we expand adoption of Linked Open Data amongst Libraries, Archives, and Museums?” It was based on two primary principles, passion and responsibility: passion to jump in and play an active role; and responsibility to lead, and follow through with action.

The summit opened with a session in which the participants collaboratively created the agenda for breakout sessions for the first day.  Because the LODLAM Summit was action-oriented, a similar process occurred on the second day, but with a focus on deliverables, documentation, and collaboration thought to be achievable during the calendar year following the summit.

4.     Immediate Steps Toward Greater Adoption of Linked Open Data

After two intense days, there were a few things that were immediately clear. 

First, the collaborative atmosphere and global connections made at the Summit ensured that the seeds of a global community committed to further pursuing Linked Open Data in Libraries, Archives & Museums would continue to grow.  Follow-up meetings were planned, panels were proposed for upcoming conferences and annual meetings, and this group of well-connected catalysts left with a better understanding of the complexity and possibilities of Linked Open Data. In the following weeks and months, this would play out around the world as Linked Open Data became a common phrase at meetings and conferences, with Summit participants and others presenting on the issue, and convening working groups and exploratory committees.  With some support from the LODLAM Summit, regional meetings were organized and well attended in New York City, Washington DC, London, Atlanta, and Wellington, New Zealand.  A representative list of public report-outs by Summit participants includes: SemTech 2011 in San Francisco; Linked Data and Libraries 2011 in London; the Society of American Archivists annual meeting in Chicago; the 1st International Workshop on Semantic Digital Archives in Berlin; the American Library Association annual (New Orleans) and midwinter (Dallas) meetings; the Semantic Web in Libraries conference in Hamburg; the Museum Computer Network conference in Atlanta; the Digital Library Federation Fall Forum in Baltimore; and the National Digital Forum in New Zealand.

A second takeaway from the Summit was that some very preliminary and immediate tools were needed to support the evangelization of Linked Open Data in libraries, archives, and museums.  While technology staff at several university libraries and federal agencies were already well on their way to adapting or moving toward systems supporting Linked Open Data, most institutions around the world were just learning about the technology and required education and basic information on the topic. Furthermore, the concept of Open Data publishing was new to many institutions and there continues to be a need for supportive documentation, clear descriptions, and examples of precedent. To this end, two immediate products of the Summit were a video-recorded talk covering the basics of Linked Open Data and how it applies to libraries, archives and museums, and the Proposed 4-star Classification Scheme for Linked Open Cultural Metadata, which attempts to point institutions toward appropriate open licenses for metadata.

A third and final takeaway from the Summit was that we still have a long way to go.  While the possibilities for Linked Open Data in Libraries, Archives, and Museums are inspiring and encouraging, we are still in what can be compared to the early days of the World Wide Web.  This will require experimentation and failures, small lightweight implementations that demonstrate the possibilities, and use cases that highlight just what we can build with Linked Open Data.  Before we can have Linked Open Data, we need Open Data, and that process of education and data publishing with open licenses has been slow going.  Easy tools for publishing and ingesting Linked Open Data are not yet readily available or easy to use.  Semantic web browsers currently harken back to Netscape in 1994, a far cry from the web browsing experience we’re all used to today.

5.     Building Momentum

Yet despite these obstacles, the last year has seen the building of serious momentum in the field, and I fully expect that to continue.  In 2010, for instance, the Linked Open Data cloud grew by an astounding 300%, but the amount of data relevant for libraries grew by 1000% (Pohl, 2011). 

The amount of Open Data and Linked Open Data available is increasing daily.  In libraries in particular, great strides are being made in publishing bibliographic records as Linked Open Data.  The Stanford University Libraries and Academic Information Resources (SULAIR) with the Council on Library and Information Resources (CLIR) conducted a week-long workshop with 20 participants on the prospects for a large scale, multi-national, multi-institutional prototype of a Linked Data environment for discovery of and navigation among the rapidly, chaotically expanding array of academic information resources. Due to the diversity of knowledge, experience, and views of the potential of Linked Data approaches, the workshop participants focused on two primary goals: building common understanding and enthusiasm, and identifying opportunities and challenges to be addressed in the definition, development and operation of a LOD prototype. A technology plan was also produced as an output of the workshop six months following the event.  Stanford also published millions of bibliographic records as LOD in 2011. Stanford Libraries plans to continue an alliance with Metaweb/Freebase in transcoding bibliographic facts to Uniform Resource Identifiers (“URIs”), unique strings or ‘addresses’ used to identify and locate a resource on the World Wide Web, and will continue to send HighWire micro-data to, which then finds its way to DBpedia. They are also exploring linked data collaborations with the British Library, the British Museum, and JISC.

In October of 2011, the Library of Congress announced The Bibliographic Framework Initiative General Plan, an ambitious effort to move the U.S. library community toward a modern method of exchanging bibliographic data, including the use of the World Wide Web Consortium's (W3C) Resource Description Framework as a data model, which is the preferred method for publishing Linked Data, and to implant libraries in an environment conditioned by the technologies of the Semantic Web and linked data principles (Kelley, 2011).

Also in October of 2011, the final report of the W3C Library Linked Data Incubator Group urged the library community to reconceptualize metadata and publish it to the Web using Linked Data technologies so that it will be compatible with non-library datasets on the Web. Several authors of that report attended the LODLAM summit. The “Use Cases” report describes library applications that showcase the benefits of adopting Semantic Web standards and linked data principles to publish library assets such as bibliographic data, concept schemes, and authority files. The “Datasets, Value Vocabularies, and Metadata Element Sets” report provides a snapshot of key resources available for creating library Linked Data today.

In the cultural heritage realm, Europeana has built Linked Open Data into its framework, with the goal of publishing metadata on 2.3 million texts, images, video and sounds as Linked Open Data.  To begin with, Europeana plans to publish all their metadata under a CC0 License by June of 2012 (Keller, 2011).

6.     The (Near) Future

Where is all of this leading us?  Increasingly, collaborative projects built on Linked Open Data are beginning to make their way through the funding pipelines.  The Institute of Museum and Library Services included two Linked Open Data projects in their September 2011 Leadership grants (IMLS, 2011). The National Endowment for the Humanities has funded at least three Linked Open Data projects since the LODLAM Summit (Linked Ancient World Data Institute, Improving Digital Record Annotation Capabilities, and Linking and Populating the Digital Humanities). And JISC continues to fund leadership in Linked Open Data in the UK, with projects like JISC Step change, creating Linked Data architecture for the UK archive sector.

With projects like Historypin, we’re working with hundreds of partners in libraries, archives, and museums around the world and following the lead of Europeana to move toward publishing metadata as Linked Open Data.  We’re developing ways to ingest Linked Open Data to include content or historical data from UK archives as part of the JISC Step change project.

Other developers are beginning to reimagine museum and library collections, utilizing APIs and Linked Open Data.  Tim Sherratt’s Invisible Australians project hints at the power of dynamic displays based on data (Sherratt, 2011).  Tim Wray’s Canvas project dramatically brings the experience of browsing museum hallways online, and creates the possibilities for creating virtual museums of like items utilizing Linked Open Data across collections (Wray, 2011).

Specific domains continue to lend themselves well to collaborative efforts to create Linked Open Data utilizing metadata from multiple collections.  Civil War Data 150 moves slowly along, while the model is beginning to inform similar work around the First World War Centenary on a global scale.

7.     Join Us

A global movement is afoot to create radically open cultural heritage data on the web.  It’s rooted in the culture of the Web, utilizing technology to create a web of data, and made possible by standardized licensing and shared metadata.  It may be a dream, or an ethos, but it’s happening, and we’re building it together. 

There are a number of ways that you can get involved in the LODLAM community now, without knowing any code at all. And if you are a developer or interested in utilizing Linked Open Data in mashups or visualizations, there’s even more you can get your hands dirty with.

Educate yourself about LODLAM. We’ve made a lot of up-to-date information available on the LODLAM blog. There are videos, reading lists, and more. Remember, Linked Open Data is not just a libraries, archives, and museums thing; it’s a World Wide Web thing. There are developers and enthusiasts around the world learning, experimenting and building. A great way to learn more is to participate in a Semantic Web meetup, which may already be happening in your city.

Be a part of the LODLAM community. There is a low-traffic Google Group with public postings (you need to join to post), and you can also follow the active Twitter hashtag: #LODLAM. Remember, we’re early on with this effort, and you don’t need to be an expert to participate!

Help publish Open Data. Before we can get to Linked Open Data, we need Open Data, which requires a lot of thought, and a lot of work in libraries, archives, and museums. We’re seeing a growing number of precedents every day, so please join in. Consider publishing datasets with Creative Commons 0 licensing. Maybe you can create a “data” subdomain on your institutional website and publish information about your collections and actual datasets as CSV or XML files. You don’t need to wait until you have money for developer resources to create an API; just publish it with an open license. In addition to posting the datasets on your own site, you can also publish to the Data Hub by the Open Knowledge Foundation.

Encourage your vendors to support Linked Open Data. The more we approach the various content management system vendors about Linked Open Data, the more they will put resources toward developing tools to publish and utilize Linked Open Data. Find out which vendors support Linked Data protocols, and which open source projects may support it (like Drupal 7). Share what you find with the LODLAM community.

Create or participate in Linked Open Data pilot projects. Projects like the Civil War Data 150 Project involve discrete domains to which you can contribute metadata and around which you can join a community. We’re seeing funding and interest going into Linked Open Data projects with World War One as we’re approaching the centenary. Reach out to the LODLAM list and see if you can connect with others working in a specific area.

Build data mashups into your projects. As you develop new online projects, think about how Linked Open Data could be incorporated to improve your project. If you’re curating an online exhibit on a WWI battalion for instance, think about how you could visualize data about other battalions, or draw on content from other institutions about this particular battalion. Think of how you might browse your online collections interactively, based on keywords or semantic terms that draw out connections that might not have been obvious. While the implementation may be out of reach at the moment, it’s exactly these kinds of creative use cases that will help us keep pushing the boundaries and make the web of data a reality.
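To give a sense of how small the core of such a mashup can be, here is a minimal Python sketch that pulls together records from two institutions about the same battalion. It assumes both institutions have tagged their records with a shared subject address (URI); the institutions, records, and example.org addresses are all invented for illustration.

```python
from collections import defaultdict

# Hypothetical shared identifier. In practice this would come from a common
# authority or aggregator that both institutions link to.
BATTALION = "http://example.org/id/battalion/42"

museum_records = [
    {"source": "Museum A", "title": "Battalion group portrait", "subject": BATTALION},
    {"source": "Museum A", "title": "Field kitchen",
     "subject": "http://example.org/id/battalion/7"},
]
archive_records = [
    {"source": "Archive B", "title": "Battalion war diary", "subject": BATTALION},
]

def group_by_subject(*collections):
    """Index records from any number of collections by their subject URI."""
    grouped = defaultdict(list)
    for records in collections:
        for record in records:
            grouped[record["subject"]].append(record)
    return dict(grouped)

grouped = group_by_subject(museum_records, archive_records)
related = grouped[BATTALION]
for record in related:
    print(record["source"], "-", record["title"])
```

The matching here works only because both sources use exactly the same address for the battalion. Agreeing on those shared identifiers is the hard part, and it is where aggregators and pilot projects earn their keep.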

8.     Acknowledgements

I’d like to give special thanks to Kris Carpenter Negulescu, the amazing Project Director for the International LODLAM Summit; Josh Greenberg at the Alfred P. Sloan Foundation; Jennifer Serventi at the National Endowment for the Humanities, Office of Digital Humanities; Brewster Kahle at the Internet Archive; the many enthusiastic participants at the LODLAM Summit and meetups around the world; Historypin and We Are What We Do for their continued investment in Linked Open Data; and my family, who’ve made it possible for me to follow my dreams and passion.

9.     References

Berners-Lee, Tim (2009). TED Talk February, 2009. Consulted January 31, 2012.


Miller, Paul (2010). February 24, 2010. Consulted January 31, 2012.


Stevenson, Adrian (2011). Last updated, August 31, 2011. Consulted January 31, 2012.


Voss, Jon (2011). September 15, 2011. Consulted January 31, 2012.


Smith, Mackenzie (2011).  June 6, 2011. Consulted January 31, 2012.


Pohl, Adrianne (2011).  November 30, 2011. Consulted January 31, 2012.


Marcum, Deanna (2011). October 31, 2011. Consulted January 31, 2012.


Kelley, Michael (2011). October 31, 2011. Consulted January 31, 2012.


Keller, Paul (2011). Post to LOD-LAM Google Group October 3, 2011. Consulted January 31, 2012.


IMLS (2011).  September, 2011. Consulted January 31, 2012.

Browell, Geoffrey (2012). January 10, 2012. Consulted January 31, 2012.


Sherratt, Tim (2011). Presentation at National Digital Forum, Wellington, New Zealand, November 30, 2011. See also full text:’s-all-about-the-stuff-collections-interfaces-power-and-people. Consulted January 31, 2012.


World Wide Web Consortium (2012). Consulted February 20, 2012.


Wray, Tim (2011). December 14, 2011. Consulted January 31, 2012.