
Museums and the Web

An annual conference exploring the social, cultural, design, technological, economic, and organizational issues of culture, science and heritage on-line.

Authority Records, Future Computers and Other Unfinished Histories

Aaron Straup Cope, Stamen Design, USA

Abstract

What becomes of the role of institutions and scholars charged with the study and safe-keeping of the past and the near-future when traditional methodologies like "authority records" are forced to compete with automated data collection, machine learning, the now-suddenly-practical reality of "big data" and the rise of broad communities of participation?

The breadth and reach of the Internet and the availability of alternative data sources, whether they are harvested programmatically or fashioned by amateur communities of interest, have created a world where both the conceptual and financial economics of traditional scholarship are rapidly being undermined. Further, in the absence of a way for non-experts to feel as though they can participate in the discourse outside of established venues and vocabularies, the opinions and assumed meritocracies of experts are increasingly being overlooked entirely.

What would it mean to change the role of digital preservation and scholarly interpretation from one that looks and feels, to those on the outside, like castle walls to one that is more like a rough guide composed of road signs and fence-posts? To consider a project whose goal is no longer to weave elaborate tapestries of past facts but to produce textiles, and patterns, to be fashioned into reflections of the present?

Keywords: digital preservation, linked data, mashups, scholarship

"In other words, if you could easily forget the masses of institutions, skills, conventions and instruments that went into the making of a beautifully printed atlas, it is much more difficult to do so now that we are constantly reminded of the number of satellites presiding over our GPS, of the sudden disappearance of network coverage, of the variations in data quality, of the irruption of censorship, of the inputs of final users sending back data, and so on. As usual, far from increasing the feeling of dematerialization, digital techniques have rematerialized the whole chain of production." (Camacho-Hubner & Latour)

"We're entering a world where we can all leave as much of a legacy as George Bush or Bill Clinton. Maybe that's the ultimate democratization." (Walker)

1. Carbonite or methanal

The most interesting question facing scholars and archivists today is what to do about communities of amateurs. This is no longer about institutions adopting social media or participating in social networking sites. It is about the ability of communities of interest to organize themselves, asynchronously and across geographies, around a subject and create their own authoritative datasets alongside, and occasionally in opposition to, those that have been painstakingly nurtured by experts. It is about the de facto commoditization of facts, whether they are harvested by companies like Google, actively unearthed by open data initiatives (or freedom of information requests), or simply collected brute-force by enthusiasts who lack the financial means or academic accreditation to access information.

In effect, the entire question of historical stewardship is, once more, being actively redefined. In many cases the question is evolving simply as a by-product of communities forming to address an unmet need, or of communities realizing that gaps in the historical record can be filled with their own efforts.

The good news is that these projects share many, if not most, of the same sensibilities that professional scholars do in collecting information and interpreting its meaning and veracity. The bad news, perhaps, is that as often as not they also adhere to the adage that "Good enough is perfect".

This last point needs to be emphasized: What has fundamentally changed is, largely thanks to the Internet, people's ability to organize themselves. In the absence of any other avenues for the incorporation of their interests (whether they are so-called "low-brow" culture yet to be acknowledged or simply artifacts not considered worthy to be part of an academic canon of works), that self-organizing also fosters an awareness and understanding that they no longer require the blessing of an established elite to create their own record(s) of authority.

People are actively creating parallel "registries" that mirror those traditionally hosted by scholars and experts, if for no other reason than that they offer a more inclusive definition of what constitutes "important". Most, if not all, of these projects have suffered, or will suffer, from mistakes long since made, and learned from, by others, but those that are outwardly celebrated will be heralded as proof that ad-hoc structure emphasizing convention, open debate and the shared responsibility of many eyes watching over a project ensures its resilience. Whatever else these projects accomplish, they create a zone of safe-keeping, however tenuous, for works not yet deemed worthy of scholarly attention but which are the fabric of contemporary life.

As these efforts mature, many are taking on overtly political overtones as individuals are able to imagine meeting these needs as a social good rather than an economic resource: Wikipedia (http://www.wikipedia.org/), Open Street Map (OSM) (http://www.openstreetmap.org/), Creative Commons (http://creativecommons.org/) being the most visible. Put another way: something is better than nothing.

Meanwhile, the potentially very bad news is that companies like Google (http://www.google.com/) and, more recently, Facebook (http://www.facebook.com) have begun to focus the tools and technologies they've developed for their core businesses at the broad arena of cultural heritage. Whether or not anyone in the humanities realizes or wants it, they are now competing with these same companies.

These are companies whose entire business rests on their ability to build a better bucket-sorting framework, developing and refining the formal taxonomies and infrastructure designed for search and retrieval across a global space, both literal and metaphorical. They have, in effect, created a system for minting and maintaining authority records in the service of purely commercial needs. The consequence shouldn't be underestimated: the authority records produced by the cultural heritage community become no more difficult to absorb, to manage, to supersede than any other "product line".

In Google's case, this means combining the strength of their existing search algorithms with massive computing power and a willingness to, literally, pay the salaries of people who are sent out to explore new ideas and inroads. In Facebook's case, it will likely be an extension of their OpenGraph initiative (http://blog.facebook.com/blog.php?post=383404517130) which claims to "incorporate web pages into the social graph" by encouraging the explicit addition of metadata to web pages, which is then harvested, filtered and vetted using a combination of computer algorithms and community decision-making (through the willingness of websites to also include the Facebook 'Like' button). As a result, every webpage becomes a kind of weighted authority record.

Although no one from Facebook has ever discussed the OpenGraph initiative this way, it is not difficult to see the OpenGraph as a low-intensity battle with Wikipedia to redefine the entire question of what a dictionary is and to become the locus for meaning and ideas for the Web, the Internet and entire societies. And it is worth asking the question: Why shouldn't they?

If these organizations are already acting as the sense-making tools for the highly focused needs of individuals and tightly knit groups, then what is the harm in applying the lessons they've learned to the humanities? If facts are reduced to the square pegs of a database schema, why wouldn't Amazon (http://www.amazon.com/), for example, choose to deliver them as a faster and cheaper service?

An obvious risk in this scenario is that either service becomes a kind of gravity well for ideas, sucking them in with a force that prevents them from ever escaping. The danger is that, acting as commercial enterprises, they either abuse their position of trust or are unable to provide assurances that the bodies of knowledge they accumulate will be adequately safe-guarded. This is as much a problem caused by any one organization's desire to be the dominant actor in an endeavor as it is a natural human failing: putting all of our eggs in one basket. (Wikipedia occupies a similar space to the degree that it wants its project to be an actual reference and more than just a catalog of links to other sites; there is always going to be the potential for it to become a giant single conceptual point of failure.)

Still, if the "authority record" really is the scholarly pillar on which the humanities rest, then it is not hard to look at the contemporary landscape and the approaching horizon and conclude that it will continue to be attacked by computer programs and individuals alike, eventually being surpassed in both breadth and coverage.

This might sound like a special kind of grim meat-hook future, but it needn't be.

2. Bias is a four-letter word

In 2010, the Museums and the Web conference held a one-day workshop (http://www.archimuse.com/mw2010/abstracts/prg_335002379.html) between members of the museum and the Wikipedia communities with the goal of trying to establish a better working relationship. I attended, and from an outsider's perspective it was both a frustrating and fascinating event to participate in. Neither group seemed to fully understand the other, and when things weren't going well, I observed the following dynamics:

Wikipedia authors were encouraged to "buddy up" with professional curators to help write authoritative articles in a way that presumed whatever had been written so far was charming but most likely unworthy of scholarly consideration, somehow ignoring or conveniently side-stepping everything that Wikipedia has accomplished in its short ten-year existence.

Wikipedia authors essentially told the museum community to give up now and hand over all their data, and that the many eyes of the Wikipedia community would "make shallow" any and all questions of authority around a subject, producing a single objective article encompassing all possible points of view and interpretations.

Although the meeting itself produced little in the way of concrete results, I think that it was a success because it forced both communities to confront the scent of each other's bias.

All organizations live and die by the oral culture they produce. Those stories are the collective and historical statements of bias of a group, and they serve as the sign-posts of the past that guide its current members forward. All organizations, sooner or later, struggle with the task of marshaling that oral tradition into a more rigid framework that aims to capture the essence of the history, but in a controlled and easily repeatable fashion. Paramount in many of these systems is the idea of complex search and database facilities to answer the multitude of questions that may exist.

The problem with this scenario is that stories evolve and databases don't (or when they do, not nearly fast enough). Rather than the systems adapting to the needs of the users, what ends up happening is a kind of intellectual body-modification in the service of the framework. This often leads to a perverse language of expertise geared towards the needs of a database that only a few people may have mastered, and without any of the underlying richness of the stories first told. Conversational shortcuts, in the end, become an entire controlled vocabulary unto themselves.

This, in many ways, describes the situation that cultural heritage institutions find themselves in today. There is a very real tension between the evolution and measure of scholarship in institutions and their perceived role as keepers of histories - of being the story-tellers - and the fact that, quite literally, people who aren't involved in the day-to-day operations of an institution understand less and less of what is being said.

The authority record is not solely to blame for this situation and will continue to play a role, but it is no longer the governing concern it might have once been. To mistake the rigors of convention and best practices around sharing information as an end unto itself instead of simply being tools in the service of making sense of the world is, put bluntly, a recipe for irrelevance.

If the task of accumulating and vetting raw facts is being "automated", whether it be by machine learning or communities of interest, then why not embrace the moment as a chance to once again champion that which cannot be codified? Why not, then, redefine the question from one of authority records and lists to one of genuine interpretation and bias? Why not embrace language and rediscover an oral history outside of the confines of controlled vocabularies and the cognitive overhead of metadata standards?

Computers, despite their relentless advances, are still pretty dumb, and language is still magic. Among the most impressive contributions of the Wikipedia project are the "history" (http://secure.wikimedia.org/wikipedia/en/wiki/Help:Page_history) and "talk" pages (http://secure.wikimedia.org/wikipedia/en/wiki/Help:Using_talk_pages). The former simply list every change to a record in reverse chronological order, while the latter are forums to discuss improvements or changes to an article. These "addenda" become, in reality, as much a part of an article as the text of the subject they describe; their omission not only lessens the value of an article but also calls into question its legitimacy. (The best example of this idea, to date, has been James Bridle's Iraq War "historiography" http://booktwo.org/notebook/wikipedia-historiography/).

Why not seize the opportunity this model presents to change the focus of the authority record from one of considered fact to one that continues to present the consensus around interpretation, but actively encourages the dispute not as an abstract idea but as fundamental to how a work is understood? Shift the emphasis from selfish search and retrieval to the kind of oral histories and cautionary tales - camp fire stories - that seep into collective memory and motive. After all, if everything is just a record in a lookup table, doesn't that make history little more than a calculator?

Beginning in 2005, the United States of America mandated that the "811" phone number be reserved for use as a resource for companies and individuals to inquire about underground public utilities, specifically in order that they not damage existing pipes or cables. Cities in the USA have also been implementing a "311" phone service for citizens to report non-emergency problems with the public works department. What would it look like to imagine a similar "811" or "311" service designed for authority records, not so much to mirror their functionality as to reflect the intent of harnessing the good will of the community and of using it as an invitation to participate in the larger project?

In their paper on reconstructing 3D models from ad hoc photo collections, "Building Rome on a Cloudless Day" (http://grail.cs.washington.edu/rome/), the authors talk about generating "local iconic scene graphs" composed of distinct clusters of photographs, saying: "[W]e intentionally construct multiple sub-models that may share some images. We use these images to merge newly completed sub-models with existing ones whenever sufficient 3D matches exist." This seems like a more useful way to consider the authority record moving forward: as a sort of constantly moving skeleton, or scaffolding, over which we lay the richer stories that tell the meaning of a thing. Those stories are what people look to scholars and experts for; they are the music that does not lend itself to codification, that leaves room for the unexpected and that breathes life into the very idea of research itself.

3. History has always been "lossy"

One of the things that struck me the first time I visited Rome is just how bad many of the Renaissance painters actually were. It turns out that there were a lot of ceilings and walls to paint back then, and into this vacuum came an army of craftsmen equipped with both technique and the philosophy of the moment. Still, despite the obvious skill of execution, much of the work fails to inspire. In contrast are the works of Caravaggio, scattered across Rome, no less versed in technique, but whose inky black shadows consume the rooms in which they hang. What separates Caravaggio's work is that for all its realism and Renaissance "street cred," it depicts a world that does not exist. Put simply: light, in the real world, just doesn't do the sorts of things that happen in Caravaggio's paintings.

The phrase "here be dragons" refers to the practice of marking unknown or uncharted territories on a map. It's a lovely expression because it admits what we do not know while at the same time encouraging (taunting, perhaps) further discovery. What would it mean to take all these amazing tools we've created to chase away the "dragons" and use them instead to actively look for new dragons, to shine the light around and expose all those other places left uncharted?

We are fortunate to be living in a time when both technology and individuals are predisposed to experimentation that makes what was once a seemingly impossible problem merely difficult. The exciting part about working in the Now-ish is that the barriers to entry across disciplines are coming down with an almost frightening speed, and the disciplines themselves are opening up beyond the histories of their specializations. As daunting as these new realities are, and the challenges they pose, they are also a kind of super-power. How will we do with these new abilities what Caravaggio did with light and shadows?

4. References

Building Rome on a Cloudless Day. http://grail.cs.washington.edu/rome/

Camacho-Hubner, Eduardo & Bruno Latour. "Entering a risky territory: space in the age of digital navigation".

Walker, Rob. "Cyberspace When You're Dead". http://www.nytimes.com/2011/01/09/magazine/09Immortality-t.html

Cite as:

Cope, A., Authority Records, Future Computers and Other Unfinished Histories. In J. Trant and D. Bearman (eds). Museums and the Web 2011: Proceedings. Toronto: Archives & Museum Informatics. Published March 31, 2011. Consulted http://conference.archimuse.com/mw2011/papers/authority_records_future_computers