

Reprogramming The Museum

Luke Dearnley, Powerhouse Museum, Australia

Abstract

This paper examines how the Powerhouse Museum's collection data API, launched in 2010, quantitatively and qualitatively improves upon the access provided by the previously offered downloadable dataset, and how tracking methods were built into the API so that the project can best adapt to the needs of the developers who use it. It details the lessons learned and suggests best practices for API development in the cultural sector.

Keywords: Web 2.0, API, collection access, Flickr, semantic web, Creative Commons

Introduction

"… see Jon Udell's 1996 Byte article 'On-Line Componentware' … 'A powerful capability for ad hoc distributed computing arises naturally from the architecture of the Web.' That's from 1996, folks."

Richardson and Ruby, 2007

According to the Wikipedia page for APIs, "An Application Programming Interface (API) is a particular set of rules and specifications that a software program can follow to access and make use of the services and resources provided by another particular software program that implements that API."

In other words, an API is an organised way for two machines (or programs, websites, apps, etc.) to talk to each other or exchange data.

This paper tells the story of the Powerhouse Museum collection data API so far: how we became familiar with other APIs, the journey and experiences of making our own, what we learned, and what we recommend to others considering the same move.

The old REST vs SOAP debate

Further down that Wikipedia page is the definition of a web API: "When used in the context of web development, an API is typically a defined set of Hypertext Transfer Protocol (HTTP) request messages, along with a definition of the structure of response messages, which is usually in an Extensible Markup Language (XML) or JavaScript Object Notation (JSON) format. While "Web API" is virtually a synonym for web service, the recent trend (so-called Web 2.0) has been moving away from Simple Object Access Protocol (SOAP) based services towards more direct Representational State Transfer (REST) style communications. Web APIs allow the combination of multiple services into new applications known as mashups."

When building an API to work over HTTP, the current trend is to build a REST API. Web services from Flickr to Facebook to Yahoo and beyond use REST APIs. Within the sector we have also seen several museums such as Brooklyn Museum and the Victoria and Albert Museum, UK, choosing to use RESTful designs for their APIs (Morgan, 2009). Not so long ago SOAP was the automatic choice. Let us examine some of the differences between SOAP and REST.

The response from a SOAP API is more complex, requiring more work to implement applications on the client end. REST responses are practically human-readable, so your average web developer, already familiar with XML or JSON, is more likely to feel at home implementing against a RESTful API.

REST responses are smaller, not being laden with the data overhead of the SOAP envelope. Nobody wants their service to appear sluggish, and a leaner response will of course arrive faster. This is increasingly pertinent in the growing world of the mobile app in which the effects of slower bandwidths must not be ignored or audiences will turn to other services that deliver seamlessly and fast.

There is a certain neatness to REST APIs that appeals to the ordered mind, with a unique URL for each object or resource and the HTTP verb (GET, PUT, POST etc) being meaningful in the interaction unlike in SOAP.

Furthermore, the established protocols that already operate on web server infrastructure to handle caching content and authenticating users can be automatically brought into play when using a RESTful API. So when creating a web API intended for public consumption, REST seems to be a wise choice (Singh, 2009).
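To make the contrast concrete, here is a minimal sketch of what consuming a RESTful, JSON-returning service looks like from the client side; the endpoint and field names are hypothetical, not those of any particular museum API.

    import json
    import urllib.request

    # Hypothetical RESTful endpoint: one URL per resource, a plain HTTP GET,
    # and a JSON response with no SOAP envelope to unwrap.
    url = "https://api.example-museum.org/objects/12345?format=json"

    with urllib.request.urlopen(url) as response:
        record = json.loads(response.read().decode("utf-8"))

    # The response is practically human-readable, so a web developer already
    # familiar with JSON can work with it directly.
    print(record.get("title"))
    print(record.get("description"))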

Powerhouse experience with other APIs

The Powerhouse Museum uses several of the APIs which have increasingly been offered over recent years, including the Flickr API, Thomson Reuters' Open Calais, and OCLC's WorldCat API.

As the first museum to join the Commons on Flickr in April 2008, we started off with an initial batch of a few hundred images with no known copyright restrictions. The initial images were from the Tyrrell Collection of glass plate negatives from the studios of Charles Kerry (1857-1928) and Henry King (1855-1923). Kerry and King ran two of Sydney's principal photographic studios in the late 1800s and early 1900s, and these historical photographs provide insight into life in Sydney and around New South Wales at that time. We now regularly submit groups of ten to twenty images to the Commons using the Flickr API.

The process is partially manual and partially automated. Our Image Services Manager, Paula Bray, chooses the next images to be included and exports them. A PHP script examines each image in turn, retrieving metadata from our online public access collection (OPAC) database and uses the Flickr API to submit the image along with that metadata to Flickr. Lastly the script uses data returned by the API to store the Flickr ID for that image back in our OPAC. That ID allows us to provide a link from our OPAC website to the image and associated community-driven input - further images, comments, tags, etc on Flickr. Part of the metadata added to the image is a 'machine tag' containing the reciprocal link to the image's record in our OPAC. Once the images are added, Paula and Seb (Sebastian Chan, Head of Digital, Social and Emerging Technologies at the Powerhouse Museum) log on to Flickr, double-check the data added, editing as necessary, and make the images public.
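The upload script itself is written in PHP, but the shape of the process can be sketched in Python using the third-party flickrapi package; the OPAC helper functions and the machine-tag namespace below are hypothetical placeholders, not our production code.

    import flickrapi  # third-party wrapper around the Flickr REST API

    API_KEY = "your-flickr-api-key"        # placeholder credentials
    API_SECRET = "your-flickr-api-secret"

    flickr = flickrapi.FlickrAPI(API_KEY, API_SECRET)

    for image in images_chosen_for_upload():          # hypothetical helper
        meta = fetch_opac_metadata(image.object_id)   # hypothetical helper

        # Upload the image with its OPAC metadata, including a 'machine tag'
        # that links back to the object's record on our OPAC website.
        resp = flickr.upload(
            filename=image.path,
            title=meta["title"],
            description=meta["description"],
            tags='phm:objectid=%s' % image.object_id,  # hypothetical namespace
            is_public=0,  # kept private until staff have checked the record
        )

        # Store the Flickr ID returned by the API back in the OPAC so the
        # object page can link to the image and its community activity.
        flickr_id = resp.findtext("photoid")
        store_flickr_id_in_opac(image.object_id, flickr_id)  # hypothetical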

Before we began our work on the Commons on Flickr, some museum colleagues were concerned that engaging with the Flickr community would increase workloads greatly. While the monitoring of the site does take some work, the value gained via the users has far outweighed any extra effort. In some cases, users have dated images for us. In other cases, when all we knew was that an image was taken somewhere in New South Wales (which has an area of over 800,000 sq km!), people were able to give us much more accurate location information. Also users have provided us with photographs they have taken of the same site, from the same angle, to show the changes that have taken place at the location.

Figure 1: An example of a historical Tyrrell Collection image at left; contemporary image by Flickr user 'lifeasdaddy' at right.

Figure 2: An example of Paul Hagon's Then & Now Google mashup.

Being involved with the Flickr Commons has led to several consequential developments.

In subsequent use of the Flickr API, we appropriated tags users had added to our images, and now include them in our own collection database website (OPAC). We also retrieved geo-location data added to our images for use in third party apps like Sepiatown and Layar.

We have made even more use of the Flickr API to obtain original-sized versions of all images submitted to groups specially set up for competitions or exhibitions. So we have used the Flickr API quite extensively: to upload and download images, and to submit and retrieve metadata.
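The read side of this is a similarly small amount of code; again a Python sketch with the flickrapi package, where the identifiers and the OPAC update function are hypothetical placeholders.

    import flickrapi

    flickr = flickrapi.FlickrAPI(API_KEY, API_SECRET)  # placeholder credentials

    flickr_photo_id = "1234567890"   # example: the Flickr ID stored in our OPAC
    object_id = 12345                # example: the corresponding OPAC object

    # Pull back the community-added tags and, where present, the
    # user-supplied geo-location for the image.
    info = flickr.photos.getInfo(photo_id=flickr_photo_id)
    tags = [tag.text for tag in info.findall(".//tag")]

    try:
        geo = flickr.photos.geo.getLocation(photo_id=flickr_photo_id)
        location = geo.find(".//location")
        lat, lon = location.get("latitude"), location.get("longitude")
    except flickrapi.FlickrError:
        lat = lon = None  # not every image has been geo-tagged by users

    update_opac_with_flickr_data(object_id, tags, lat, lon)  # hypothetical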

When Reuters (now Thomson Reuters) released their Open Calais semantic tagging API (http://www.opencalais.com/), we immediately wrote some code to parse all the descriptive text of our objects from our online collection database. More than 75,000 museum objects were parsed for semantic entities such as people, companies, organisations, cities, states, countries and so on.

After two weeks of processing we had mostly great results. Some amusing ones did occur such as Ray Oscilloscope being identified as a person! This led us to attempt to cross-reference the already extracted person entities with OCLC's WorldCat bibliographic records using their API (http://www.oclc.org/worldcatapi).
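The parsing loop amounted to little more than posting each object's description text to the Calais REST interface and keeping the entity hits. A Python sketch follows; the endpoint URL, header names and response shape are our recollection of the interface as documented at the time and should be treated as assumptions.

    import json
    import urllib.request

    # Endpoint and header names are assumptions based on the OpenCalais REST
    # interface as it was documented at the time.
    CALAIS_URL = "http://api.opencalais.com/tag/rs/enrich"

    def extract_entities(text, api_key):
        request = urllib.request.Request(
            CALAIS_URL,
            data=text.encode("utf-8"),
            headers={
                "x-calais-licenseID": api_key,
                "Content-Type": "text/raw",
                "Accept": "application/json",
            },
        )
        with urllib.request.urlopen(request) as response:
            result = json.loads(response.read().decode("utf-8"))

        # Keep only the entity hits (people, companies, cities and so on),
        # discarding the document-level metadata also returned.
        return [value for value in result.values()
                if isinstance(value, dict)
                and value.get("_typeGroup") == "entities"]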

Through these activities we have gained valuable experience of what it is like to be a client of an API, and this experience has informed our approach to developing our own APIs.

What does your soul look like?

In our case the purpose of creating an API was to allow others to use our content. An essential part of our planning process was to consider how our content might be most useful to our audiences. Here are some of the questions we believe it is important to consider if you are planning to develop an API:

  • How suitable is the writing style of the content?
  • How suitable is the level of technical complexity of the content?
  • Will it suit potential users?
  • Are there staff in your organisation who will be able to re-write the thousands (or millions) of pieces of content?
  • If so, who should rewrites be tailored for - the researcher, the tertiary or post-graduate student, the school pupil, the pupil's teacher, the layperson?

After considering these questions, you may be feeling somewhat negative about the idea of letting the public scrutinise your data via an API. Certainly, at the time we were working simply to get our collection online on our OPAC, there was resistance from certain areas, driven by the fear that the data was incomplete or otherwise unsuitable. The years since have very definitely shown that it was a good idea to release our data regardless. So consider the questions above not in the context of whether or not to put your data online (via an API or otherwise), but rather in the context of managing expectations of the data's uptake.

If one of the main reasons for building an API is to enable users to provide automatic inclusion of, for example, object description text in their site or app, then the issues of writing style and technical complexity arise. It is important to consider how suitable the writing style is and whether the level of technical complexity of the content will suit potential users.

Marshall Kirkpatrick, referring to just this problem when using Wikipedia content, wrote recently on ReadWriteWeb, "Unfortunately, Wikipedia articles ... are often far too technical. They aren't written with re-use in mind, for example, they often don't put the most accessible content at the top." (Kirkpatrick, 2011)

If this is the case then it is likely sites wishing to re-use museum content via museum APIs will have the same problem with object descriptions, most of which were written by museum staff well before the content was put online, let alone available for re-use in the public sphere.

In 2009 the Powerhouse Museum was involved in a project run by The Learning Federation (TLF - a consortium of state and federal government education bodies) to build a (rather complex) data feed which supplied a hand-selected set of object data to the TLF's central pool of data. This then fed an educational portal used by schools around the country in their lesson plans. Seb Chan blogged about results of the pilot, saying, "The obvious hurdles of copyright, content suitability, writing style at the museum end, and the teacher training at the schools end were far greater than any of the technical data supply issues." (Chan, July 2009)

Even in this trial where the objects were selected by Powerhouse Museum Registration staff to fit the subject areas required, and where the data was quality checked by TLF staff, 7% of teachers "noted the overuse of 'technical language'". (TLF, 2009.)

So while an API provides access to one's content, the style of the content will determine to a large extent whether a potential audience chooses to use it or not.

Steps to an API

While we had been considering making an API for some years and watching others in the sector do just that, there were (although we probably didn't realise it at the time) several important things which had to happen before we could provide a public web API. The first was the need to determine the licence status of our content.

The drive to open up the licensing of our content came when, on a tour we conducted of the Museum's collection storage facilities for some Wikipedians, they pointed out that the copyright notice standard across our site (© Trustees, Museum of Applied Arts and Sciences) meant it was forbidden for Wikipedia contributors to quote any part of our site in a Wikipedia article without officially seeking permission first. This prompted Seb Chan to make the changes required to make our online collection documentation available under a mix of Creative Commons licences. (Chan, April 2009)

Since the Museum adopted Creative Commons licences for online collection documentation, it has not been quoted very much in Wikipedia, but it certainly has been used. A six-month audit of copying and pasting from our object view pages showed more than twelve thousand copies containing nearly three million words. (Chan, March 2010)

Opening up the licensing had another benefit: it meant that we had already cleared one hurdle in the path to creating an API.

The Government 2.0 Taskforce (http://gov2.net.au/about/) was the driver leading us to take the next step. In its endeavour to catalyse "increasing the openness of government through making public sector information more widely available to promote transparency, innovation and value adding to government information", it created a new potential audience for using our collection data. Thus we became the first cultural institution in Australia to provide a bulk data dump of any sort.

This data dump was not particularly complex. It was just a big list of objects (and a subset of the fields describing them) shown on our collection database web pages. There was no explanation of what the fields meant or the type of data they might contain.

While it was really an opportunity at state and federal level to build political capital and awareness of the Museum's open access initiatives contained within its new strategic plan (http://www.powerhousemuseum.com/dmsblog/index.php/2009/09/16/a-new-strategic-plan-for-the-powerhouse/), it also meant that we could, for the first time, watch other people come to grips with using our data and benefit others by making new web apps with it.

The download was linked from both the NSW state government and federal government's data catalogues (http://www.data.nsw.gov.au/catalogue and http://data.australia.gov.au/102 respectively). This led to it being used in the Mashup Australia contest (http://mashupaustralia.org/) and related Hack Day events where around five per cent of the 82 groups used our dataset. It was also used in the Apps4NSW contest (http://www.information.nsw.gov.au/apps4nsw) and the Govhack event (http://govhack.org/) in Canberra.

It was also used in tertiary education when Senior Lecturer in Database Systems, Dr Uwe Roehm, from the School of Information Technologies at the University of Sydney asked to use our dataset in assignments he was setting for his undergraduate classes. He asked students to model a relational database schema for the provided data set and to code up a web interface to maintain the collection data. His main motivation for using our dataset for this was that it was a 'real world' example of a database system and moreover one which was more interesting than a standard financial database system.

The great thing about this use is that it exposes the Museum and its collection to the academic sector, enlightening them regarding potential career options in the cultural sector. A future with more computer scientists working in museums, or at least some temporary interns, would be a desirable one indeed.

An interesting point to consider is that the data dump had no value to us internally. We made it for others to use and had no need to try using it ourselves since our sites are connected to the more comprehensive databases already. But other people certainly did use it.

One person who has used it is Malcolm Tredinnick, one of the core Django developers. In a talk called 'Displaying Australian Datasets With Django' at the 2010 Pycon (a conference for the Python Community), he noted several obvious and large hitches anyone trying to use our dataset would encounter.

He highlighted issues such as the Record_ID field not being unique, with more than 70 objects repeated once, twice, or in one case even five times. Date fields were troublesome, with no format specified: some were date ranges, some specific dates, some ranges were reversed, and there was the curious use of '0 AD'. No information accompanied our dataset about how long fields would be, and some fields contained unmatched delimiters, causing havoc with attempts to script the import of the data.
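None of these problems is hard to detect once you actually try to load the file. A few lines of Python along the following lines (the file name and column name are assumptions) would have flagged the duplicate Record_IDs and the ragged rows before release.

    import csv
    from collections import Counter

    # File name and the 'Record_ID' column name are assumed for illustration.
    with open("phm-collection.tsv", newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f, delimiter="\t")
        rows = list(reader)
        field_count = len(reader.fieldnames)

    # Duplicate identifiers: Record_ID should be unique but was not.
    id_counts = Counter(row["Record_ID"] for row in rows)
    duplicates = {rid: n for rid, n in id_counts.items() if n > 1}
    print("%d Record_IDs appear more than once" % len(duplicates))

    # Ragged rows: unmatched delimiters inside a field shift later columns,
    # which DictReader reports as extra (None key) or missing (None value) fields.
    ragged = [row for row in rows
              if None in row or any(value is None for value in row.values())]
    print("%d rows do not have exactly %d fields" % (len(ragged), field_count))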

The critique continues and is well worth a watch for anyone considering supplying a similar dataset. Seb Chan's blog includes the video of the session here: http://www.powerhousemuseum.com/dmsblog/index.php/2010/07/05/malcolm-tredinnick-on-some-problems-with-working-with-our-collection-dataset/ (Chan, July 2010) and mentions the point that the data is 'messy', which is certainly true. I am sure we would have cleaned up the data had we been forced to use it ourselves! That said, had we not released it, we would not have had these issues with the data so succinctly pointed out, nor would we have seen people make new apps with our data.

The apps people built using our dataset would have benefitted from an API. They needed to ingest our entire dataset to build their app regardless of whether they were interested in every field or even every record. Using the API they could have pulled in merely what they needed. Furthermore the data in these apps gradually becomes out of date and, once again, if using the API they would be drawing on the most up-to-date version of our collection data instead.
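As a sketch of the difference, an API client can ask for just the slice it needs on each request; the endpoint path, parameter names and response shape below are illustrative only, not the documented routes of the Powerhouse API.

    import json
    import urllib.request

    API_KEY = "your-api-key"  # issued at sign-up

    # Illustrative only: ask for one category, a handful of fields and the
    # first fifty records, rather than ingesting the entire dataset.
    url = ("https://api.example-museum.org/objects"
           "?category=photography&fields=title,description,thumbnail"
           "&limit=50&api_key=" + API_KEY)

    with urllib.request.urlopen(url) as response:
        objects = json.loads(response.read().decode("utf-8"))

    # Each call returns current data, so an app built this way never drifts
    # out of date the way a one-off copy of the downloadable dataset does.
    for obj in objects:
        print(obj["title"])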

In the second half of 2010 we very fortunately found ourselves in the position of being able to hire a new developer. We were even more fortunate to have recruited Carlos Arroyo to join us at the Powerhouse Museum. Carlos brought a lot of experience to the job, having coded numerous APIs previously, though mostly for 'walled-garden' situations where you are the only client of your own API.

Having introduced Creative Commons licences for our content, seen the interesting apps others have made with our data, and with a new developer on board, it was time to forge ahead with our own API.

We had keenly observed the National Library of New Zealand develop their DigitalNZ project (http://www.digitalnz.org/) and describe how APIs allow the institution to passively or indirectly engage the skills of external developers to create new spaces and tools that benefit the community at large (blog post: http://www.digitalnz.org/blog/case-studies/article-test).

We also learned from Brooklyn Museum's experiences releasing their API. Shelley Bernstein, Chief of Technology at Brooklyn Museum, describes being inspired by seeing developers doing interesting things with content people had put on the Flickr Commons. She also says, "One thing we do know is people within our own industry have been working to create various pan-institution collection databases. By releasing our API, Brooklyn Museum data can now be included in these endeavors without requiring more staff time from us", and talks of, "allowing outside developers the chance to add their own talent and wealth to our data." (Bernstein, 2010)

Apart from being inspired by all these plusses, we also saw the API as an opportunity to release more data about each object and a more complexly structured set of data including the meta-level constructs of 'themes', 'collections' and the like. Seb Chan said in a blog post about our new API, "the API gives access to roughly three times the volume of content for each object record - as well as structure and much more. Vitally, the API also makes internal Powerhouse web development much easier and opens up a plethora of new opportunities for our own internal products." (Chan, October 2010)

One big job was developing the terms and conditions that people would need to agree to before getting access to the API. We carefully considered other people's documents and decided we would pay a legal firm to help us compose our own specific set of terms and conditions. Having these locked in place and 'digitally signed' (agreed to by checkbox) during the sign-up process meant that we could, for the first time, include access to thumbnails with our API. The issue of image rights is a complex and messy one; however, for some objects where we know we may, we also release medium-sized images. Of course, where the images are on the Flickr Commons, the necessary ID to locate them there is included too. The API was specifically designed so that in future, as finer grained permission data emerges for images, it can easily slot into the API.

Technical details of the API

I will briefly mention some of the technical aspects of the API for those interested. In line with industry best practice, the Powerhouse Museum is moving more and more to open-source based hosting, and so we chose a Linux platform for serving the API. It is written in Python/Django, and during the design, consideration was given to enabling the possibility of moving the site out to Google App Engine, depending on load requirements. Apache runs the scripted pages and nginx serves the static content. The database is PostgreSQL, and a very large amount of time went into developing complex scripts and models to harvest and store the data from our rather less strictly defined OPAC database tables.

This process exposed where a lot of the issues with our downloadable dataset came from: some were traceable back to 'messiness' of data within the collection management system (CMS) itself, and others were inherent in the design of the OPAC database, necessitated by the monthly harvest process between it and the CMS. We owe a large debt to Carlos here for being so insistent that we end up with proper, consistent, relational models, and for being so patient when we repeatedly expounded the strange world of museum collection data. His experience at his craft was invaluable. He initially considered using a Django-based API framework called Piston; however, it fell short of what we wanted and we quickly moved on to building a completely bespoke solution.
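To give a flavour of what those models look like, here is a heavily simplified sketch in Django; the field names are illustrative only, not the actual Powerhouse schema.

    from django.db import models

    # Illustrative only: a simplified version of the kind of consistent,
    # relational models the harvest scripts populate from the OPAC tables.
    class Theme(models.Model):
        title = models.CharField(max_length=255)

    class MuseumObject(models.Model):
        record_id = models.IntegerField(unique=True)  # enforced, unlike the dump
        title = models.CharField(max_length=255)
        description = models.TextField(blank=True)
        production_date_earliest = models.DateField(null=True, blank=True)
        production_date_latest = models.DateField(null=True, blank=True)
        themes = models.ManyToManyField(Theme, related_name="museum_objects",
                                        blank=True)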

Images are served from the cloud as we had already moved them there for our OPAC, to reduce outgoing bandwidth from the Museum's network.

After the launch of the API, we considered ceasing maintenance of other forms of the same data, such as the downloadable dataset. Ingrid Mason, who has been involved with the Museum via projects such as the Collections Australia Network and the Museums Metadata Exchange, suggested to us that this would be unwise. She said we should keep the dataset, as it has greater value than the API to data analysts working in the digital humanities: for them, needing to write code to access our data is a significant barrier. Furthermore, the dataset lends itself much better to things like data visualisation, where the entire scale of aspects of the collection is the focus rather than the selection of items the API is geared toward providing. A wise point, and we are now committed to maintaining both.

Once we had the API up and running, we realised it would not be too much work to make a WordPress plug-in which allowed bloggers to add objects from our collection to their blogs or blog posts. Once built, this was tested internally on our own blogs. Then in early 2011 we added it to the WordPress plugin directory: http://wordpress.org/extend/plugins/powerhouse-museum-collection-image-grid/

This path we have traversed, from having the data only within pages on our website, to a downloadable dataset, then an API, followed by our first product of the API (the WordPress plugin), may appear to be one of increasing access to our collection data. On the contrary, with each new phase the actual number of people able to access collection data via that method decreases; the exception is the WordPress plugin, where the audience size increases again.

These audiences do however become more 'identifiable' and 'promotable to'. Consider the following diagram.

Figure 3: Relative audience sizes for each phase of collection data release.

In the Creative Commons licensing phase, everyone (represented by the very, very large circle, only a detail of which, appearing as a slightly curved line, is visible above) could access the data using copy and paste. However, we had very little idea who they were or how to reach them.

In the data dump phase, the group of people (represented by the largest circle fully visible above) who could use this tab-separated data file was smaller: developers, data analysts and digital humanities researchers. Some of these people would tell us when they had used the data, so we knew something of who they were and for what purposes they had used the data.

In the API phase, the potential user base is smaller still, as only serious web developers have the specialised skills to write code to query the API. But since these people had to sign up for an account and apply for an API key, we know exactly how many of them there are and we can contact them directly if we need to (for example, when new API versions or features are released).

Curiously, with our first product of the API (the WordPress plugin), the potential user base starts to expand again. All these people need to apply for an API account and generate a key as well, but they need no developer experience to include items from our collection on their websites. Of course, with every new phase the old methods of access do not go away, so the phases co-exist.

Tracking usage

One of the main advantages the API has over the data dump is the ability to track use. We do this in several ways. We considered logging every request made to the API in its database, as we do with our collection database website (where we log every search term, every object view, and the relationships between them), but decided against it for several reasons. One was that, since it is a RESTful API, we can always analyse the web server logs to get most of the information we need. The other was that we feared slowing down the API: we knew it was going to be very heavily used right after launch during the AmpedWeb hack day, and thought that if every database read also generated a write, it might slow things down too much. We could always add such logging later.

It is also worth noting that since the API requests usually do not generate pages that are rendered in a browser it is not possible to embed Google Analytics tracking scripts in the API's output.

By requiring people to sign up using a valid email address before requesting an API key we are able to track API use back to individuals or organisations. The model we decided upon was one where users are encouraged to create a new API key for each application.

We considered requiring keys to be approved before use, but decided that would severely hamper uptake of the API: from our own experience, when the light bulb goes on and you want to jump in and try an idea, you want to do it immediately, while enthusiasm is at its peak. Concerns that people would use the API inappropriately were dealt with by adding a limit to the number of requests per hour each key can generate. This can be adjusted on a per-key basis. Furthermore, each user can only create a small number of keys by default; we encourage people to contact us if they want that adjusted in their case.
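The throttle itself is conceptually simple. A minimal sketch of the per-key hourly limit follows, using an in-memory store for illustration rather than our production implementation.

    import time
    from collections import defaultdict

    DEFAULT_HOURLY_LIMIT = 1000          # illustrative figure, adjustable per key
    _recent_requests = defaultdict(list) # api_key -> request timestamps this hour

    def allow_request(api_key, limit=DEFAULT_HOURLY_LIMIT):
        """Return True if this key is still under its hourly quota."""
        now = time.time()
        window_start = now - 3600
        recent = [t for t in _recent_requests[api_key] if t > window_start]
        if len(recent) >= limit:
            _recent_requests[api_key] = recent
            return False   # the API responds with an error instead of data
        recent.append(now)
        _recent_requests[api_key] = recent
        return True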

On top of this we track any traffic back to our collection database web pages from people's sites or apps that use the API. This is done by ensuring the PURLs for objects, collections, themes and so on all include an identifier which ties them back to a particular API key.

We had previously set up our own URL-shortening service so we could have Powerhouse Museum 'branded' short URLs for use in social media and the like, and it was trivial to extend this to redirect short URLs of a slightly different form to full URLs with the identifier tacked on as a Google Analytics (GA) campaign code. We felt it was ideal to integrate this tracking into GA because of its simplicity and because it meshes with our existing analysis and reporting framework.
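Conceptually, the redirect does something like the following sketch; the campaign parameter names follow Google Analytics conventions, while the key identifier scheme and example object URL are simplified for illustration.

    from urllib.parse import urlencode

    def expand_short_url(full_url, api_key_id):
        """Build the redirect target for an API-issued short URL: the object's
        full OPAC URL with Google Analytics campaign parameters identifying
        which API key the click-through came from."""
        params = {
            "utm_source": "api",
            "utm_medium": "referral",
            "utm_campaign": "api-key-%s" % api_key_id,
        }
        separator = "&" if "?" in full_url else "?"
        return full_url + separator + urlencode(params)

    # Example: a click-through to an object page, issued to API key 42.
    print(expand_short_url(
        "http://www.powerhousemuseum.com/collection/database/?irn=12345", 42))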

Given that we had little idea how much the API would be used initially and into the future it was specifically designed to operate on technologies which would allow us to easily relocate it from local servers to cloud-based hosting. So one big reason to track the API's usage trends is to help determine on-going hosting requirements. Tracking is also necessary for providing evidence of usage to support the business case and funding of these projects.

Launching the API and working with the developer community

As we began developing our API, we realised Web Directions South (http://www.webdirections.org/), an international conference for web professionals, was on the horizon, and if our API was ready by then we could tell everyone there about it. We had six weeks. As things turned out, the Powerhouse Museum was to be the site for the free AmpedWeb (http://ampedweb.org/) 'hack day' event rounding off Web Directions South for 2010. At AmpedWeb there were to be several web app programming challenges attendees could take part in, and we launched the API there, making building something with it one of the challenges for the day of hacking.

More than 250 people were at the day-long event with 24 teams submitting web apps. Of those teams, 13 decided to use our API in their apps. (Chan, 2010)

This was a great opportunity for us, not only to launch the API to such a large, concentrated group of web application developers, but also to see first-hand how they went about the business of using what we had just built. Carlos, Seb and I were on hand not only to answer questions and provide support for people using the API, but also to get real-world feedback from users as they used our product. This direct interaction trumped any marketing opportunity we could have dreamt of, as promotion is not just about getting the product used but about getting feedback on problems and ideas for improvements.

Immediate, on-the-day feedback meant we could make much-requested changes, such as turning on a JSONP output format alongside the existing JSON one. Numerous teams also mentioned that our API would benefit from extra fields in the image information returned with each object; otherwise their apps needed to make additional API calls just to find basic information about an object's images and display the thumbnails. We quickly added these fields on the day.
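For those unfamiliar with it, JSONP simply wraps the JSON body in a caller-named callback function so that pages on other domains can load it via a script tag; a sketch of the idea, with illustrative data:

    import json

    def render_response(data, callback=None):
        """Return a JSON body, or a JSONP body if a callback name was supplied
        in the query string (e.g. ?callback=handleObjects)."""
        body = json.dumps(data)
        if callback:
            return "%s(%s);" % (callback, body), "application/javascript"
        return body, "application/json"

    # JSON:  {"title": "example object"}
    # JSONP: handleObjects({"title": "example object"});
    print(render_response({"title": "example object"}, "handleObjects"))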

The ability to respond to these requests in such an agile fashion meant many teams kept working with the API instead of abandoning it for a different challenge. Of course, this is only possible if one has an in-house development team onsite and still engaged with the project.

Supporting the API

Certain responsibilities emerge once the task of creating an API is complete. The API is now a product, used by the public, who expect a certain level of support. People will want clarification of the documentation, will suggest or request features, criticise design decisions, ask for increased query-per-hour quotas, and generally expect someone to be at the other end of the email address at all times.

Furthermore it is unlikely anyone's first release of an API will end up being perfect. Apart from bugs, in the haste to get to launch, some features will have been pushed into the 'version two' pile. As likely the API's biggest user, you yourself will feel an urgency to deploy these extra features. Despite competing pressures from other projects, it is important to maintain support of the API and to continue to schedule upgrades, test, document and release future versions.

An unexpected bend in the road

One of the most interesting side effects of creating the API is that it has changed our whole medium- to long-term development roadmap. Until that point we had the situation illustrated in the following diagram.

Figure 4: Current internal data harvesting model.

Here the Museum's central collection management system (CMS) feeds the client CMS applications on staff's desktop computers. They access these to go about their curatorial or registration work. We run a script approximately monthly which harvests updated collection data into a database designed to feed our online collection database website (OPAC). The main reason for this is to ensure a busy website does not impact curatorial or registration staff by causing them to compete for resources with public web users.

In the diagram it can be seen that the OPAC database's collection tables feed the OPAC website, which in turn is accessed by both the public and Museum staff. On top of those tables the OPAC database also has tables accruing data on the use of the OPAC website itself. We log search terms, objects viewed and relationships between search terms and object views, among other things.

To the right of all this is the API infrastructure. Yet another database is filled from the OPAC database via more harvest scripts and this store feeds the API website, which in turn interacts with any sites or apps made with it. These are of course viewed by the public and in some cases generate traffic (or 'click-throughs') back to the OPAC site. There is a small amount of traffic to the API site that does not feed API powered sites and web apps but is in fact developers reading the documentation, signing up for accounts and so on. This is represented by the cloud of 'users' in the diagram.

So that is the situation we currently have, excluding the further complications of where images are stored and so on. There is clearly repetition here in server infrastructure (albeit virtual), in data storage, and in harvest processes. Given that we can use our own API, the future of the OPAC database looks decidedly uncertain. Our likely new roadmap is mocked up in the next diagram.

Figure 5: Future internal data harvesting model.

The OPAC website would be powered directly by the API. The API database would harvest directly from the CMS database and would now also do the job of logging all public interaction with the OPAC. Both these types of data would be inserted using the API itself. Currently there are no PUT or POST actions available for storing data in our API. However, being RESTful, it has been designed with this capability from the beginning, and the idea at this stage is that the harvest script would use actual PUT/POST API calls to populate the API database, rather than the low-level, behind-the-scenes database fill it currently is. This of course requires the more advanced API authentication to be operational; the fact that Piston did not allow us enough control over this feature was the main reason we abandoned it.
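Under that model, the monthly harvest might look something like the following sketch; the endpoint, payload shape and authentication scheme are all speculative at this stage, and the helper function is a hypothetical placeholder.

    import json
    import urllib.request

    API_ROOT = "https://api.example-museum.org/objects/"   # hypothetical
    HARVEST_API_KEY = "internal-harvest-key"                # placeholder

    def push_object(record):
        """Speculative: write an updated CMS record through the API's own
        (not yet public) PUT interface instead of a low-level database fill."""
        request = urllib.request.Request(
            API_ROOT + str(record["record_id"]),
            data=json.dumps(record).encode("utf-8"),
            headers={
                "Content-Type": "application/json",
                "Authorization": "Key %s" % HARVEST_API_KEY,  # placeholder scheme
            },
            method="PUT",  # create or update the resource at this URL
        )
        with urllib.request.urlopen(request) as response:
            return response.status

    for record in records_updated_since_last_harvest():  # hypothetical helper
        push_object(record)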

Statistics and graphs

The following graph of click-through traffic to our OPAC website from anything made using the API shows a few interesting things. We see a peak of fewer than 20 hits at the launch of the API in mid-October, when it was used heavily at the AmpedWeb hack day. Next we see a similar peak in traffic around the late November to early December period, and another peak right at the start of 2011.

Figure 6: Traffic to our OPAC generated by API-powered sites and apps.

While we cannot know exactly how busy the API itself was with queries generated by apps, we must assume some proportionality between 'click-throughs' and actual API queries. It is likely that Seb Chan's blog post of January 3rd interviewing Mia Ridge (Lead Web Developer for the London Science Museum) about the game she made using our API for her Masters research (http://www.powerhousemuseum.com/dmsblog/index.php/2011/01/03/interview-with-mia-ridge-on-museum-metadata-games/) caused the spike in traffic seen in the new year. I suspect the rise in traffic 4-5 weeks earlier was also due to Mia's game, as she promoted it via her own channels.

The graph certainly shows there has not been a massive number of click-throughs and, despite our having had a near-ideal situation in which to promote the API at the AmpedWeb hack day, this small amount of traffic reinforces the reality that increasing access does not increase demand.

There is a small possibility someone has created an app or site which is very busily using the API but one which is designed in such a way that it does not generate these click-throughs to OPAC which we are tracking. However there is nothing in our server load history to suggest this.

The next level of detail in tracking is using Google Analytics' campaign tracking to work out which API keys generate the most click-throughs. See the graph of per-key use (all time) below.

Figure 7: Traffic by API key.

As expected, Mia's game's API key is at the top of the list with 105 visits, followed by one of our own WordPress-powered sites, The 80s Are Back (http://from.ph/80s), with 64. Third, with 38 visits, is the most popular and visually striking app to emerge from the AmpedWeb hack day (play it here: http://powerhouse.nf.id.au/), which was written by Andrew Gerrand from Google in the Go language. In fourth spot we see Jeremy Ottevanger (http://doofercall.blogspot.com/) from the Imperial War Museum (http://www.iwm.org.uk/) in London and his Mashificator experiment, which you can read about in Seb's blog here (http://www.powerhousemuseum.com/dmsblog/index.php/2010/11/04/making-use-of-the-powerhouse-museum-api-interview-with-jeremy-ottevanger/).

The fifth key is the last one to generate more than a handful of click-throughs, and here we discover the people at culture360.org (a unique online platform that connects the people of Asia and Europe through Arts and Culture) doing what they describe as a 'functionality test'.

In the short time the WordPress plugin has been available, it has already been responsible for the second-highest amount of traffic. With the plugin only very recently (at the time of writing) released to the public, it is likely this type of traffic will soon account for the largest share. Until, of course, we power our OPAC with the API.

Another important reason to track use of your data via your API is to ensure people are adhering to the licence terms. In reality, people are unlikely to read the boring terms and conditions they agree to hurriedly while keen to sign up and start using your exciting new API. The Brooklyn Museum experienced this when a user breached the Creative Commons licence under which the museum had released its data by selling an app they had made (Bernstein, 2010).

Conclusion

On our API journey, we learned that it is wise to consider the nature of your data, especially in the context of who might use it and how much. We learned that, no matter what you think of your data, you should release it anyway (under a suitable licence, and in numerous forms), and you will likely be surprised by the innovative ways others use it, returning value to your institution.

It is wise to use other people's APIs to get a better understanding of them. Make an API by all means, but make it RESTful. Make one you would use and then use it! As Seb Chan put it, "you must eat your own dog food" (http://en.wikipedia.org/wiki/Eating_your_own_dog_food). Don't forget that your API is a product and as such the community using it requires support.

Track the use of your API using whatever cunning ways you can come up with, not just in volume but by user if possible. When launching your API hold a hack day or better still piggyback on someone else's!

But most of all do not expect increasing access to your data this way will necessarily increase demand for it.

I will finish up by briefly mentioning some possible future directions regarding our API. At the time of writing there is a hefty collection of new features jostling for inclusion into the next version. On top of that we have several internal projects we need to find development cycles for which will make use of the API as a data source, including a complete overhaul of our OPAC site.

Having found it useful on another project, we want to experiment with connecting the Apache Solr indexing engine to the API's database to unleash faceted result sets on users. Perhaps even a 'proper' semantic web RDF endpoint for our collection data will be offered, but that is definitely in the future. As Mike Ellis and Dan Zambonini (2009) point out, it is likely such technologies will remain niche for some time, and APIs are a second-best option to proper semantic web endpoints.

Acknowledgements

This paper would not have been possible without having been exposed to the boundless, infectious energy of Sebastian Chan. His guidance and clarity of vision have been invaluable. I also owe a large debt to the hard working and patient Carlos Arroyo who did all the hard work on the API code. Many thanks also to Irma Havlicek for her advice, patience and proofreading.

References

Bernstein, S. (2010). http://www.brooklynmuseum.org/community/blogosphere/2009/03/04/brooklyn-museum-collection-api/

Bernstein, S. (2010). http://www.brooklynmuseum.org/community/blogosphere/2010/12/01/app-store-confusion-necessitates-api-changes/

Chan, S. (April 2009). 'Powerhouse collection documentation goes Creative Commons', 2 April 2009. http://www.powerhousemuseum.com/dmsblog/index.php/2009/04/02/powerhouse-collection-documentation-goes-creative-commons/

Chan, S. (July 2009). 'Will schools use collection content? The Learning Federation Pilot Report', DMS blog, 15 July 2009. http://www.powerhousemuseum.com/dmsblog/index.php/2009/07/15/will-schools-use-collection-content-the-learning-federation-pilot-report/

Chan, S. (March 2010). 'Spreadable Collections: Measuring the Usefulness of Collection Data'. In J. Trant and D. Bearman (eds). Museums and the Web 2010: Proceedings. Toronto: Archives & Museum Informatics. Published March 31, 2010. http://www.archimuse.com/mw2010/papers/chan/chan.html

Chan, S. (July 2010). http://www.powerhousemuseum.com/dmsblog/index.php/2010/07/05/malcolm-tredinnick-on-some-problems-with-working-with-our-collection-dataset/

Chan, S. (October 2010). http://www.powerhousemuseum.com/dmsblog/index.php/2010/10/18/launch-of-the-powerhouse-museum-collection-api-v1-at-amped/

Ellis, M. and D. Zambonini. (2009). 'Hoard.it: Aggregating, Displaying and Mining Object-Data Without Consent (or: Big, Hairy, Audacious Goals for Museum Collections On-line)'. In J. Trant and D. Bearman (eds). Museums and the Web 2009: Proceedings. Toronto: Archives & Museum Informatics. Published 31 March 2009. http://www.archimuse.com/mw2009/papers/ellis/ellis.html

Kirkpatrick, M. (2011). 'Something's Keeping Wikipedia from Becoming a Platform', 14 January 2011. http://m.readwriteweb.com/archives/why_wikipedia_struggles_to.php

Morgan, R. (2009). 'What is Your Museum Good at, and How Do You Build an API for It?'. In J. Trant and D. Bearman (eds). Museums and the Web 2009: Proceedings. Toronto: Archives & Museum Informatics. Published March 31, 2009. http://www.archimuse.com/mw2009/papers/morgan/morgan.html

Richardson, L. and S. Ruby (2007). RESTful Web Services. O'Reilly Media Inc.

Singh, T. (2009). 'REST vs. SOAP - The Right WebService', 24 August 2009. http://www.taranfx.com/rest-vs-soap-using-http-choosing-the-right-webservice-protocol.

TLF (2009). 'Museum & Education Digital Content Exchange - Final Report'. PDF available from: http://www.thelearningfederation.edu.au/for_jurisdictions/planning,_reports_and_research/research2009.html

Cite as:

Dearnley, L., Reprogramming The Museum. In J. Trant and D. Bearman (eds). Museums and the Web 2011: Proceedings. Toronto: Archives & Museum Informatics. Published March 31, 2011. Consulted

http://conference.archimuse.com/mw2011/papers/reprogramming_the_museum
