Museums and the Web

An annual conference exploring the social, cultural, design, technological, economic, and organizational issues of culture, science and heritage on-line.

Computational Linguistics in Museums: Applications for Cultural Datasets


As museums continue to develop more sophisticated techniques for managing and analyzing cultural data, many are beginning to encounter challenges when trying to deal with the nuances of language and automated processing tools.  How might user-generated comments be harvested and processed to determine the nature of the comment?  Is it possible to use existing collection documentation to derive relations between similar objects?  How can we train systems to automatically recognize (disambiguate) different meanings of the same word? Can automated language processing lead to more compelling browsing interfaces for online collections?

Luckily, the field of computational linguistics offers a good deal of expertise and many tools that can be applied to these problems to achieve meaningful results. Informed by previous work in computational linguistics and relevant project experience, the authors will address a number of these questions and offer insight into how answers that affect museum practice might be found. The authors will share tools and resources that museum software developers can use to prototype and experiment with these techniques without being experts in language processing themselves. In addition, the authors will describe the work of the T3: Text, Tags, Trust research project and how they have applied these tools to a large shared dataset of object metadata and social tags collected by the project.

Specific challenges regarding batch-processing tools and large datasets will be addressed. Best practices and algorithms will be shared for dealing with a number of sticky issues. Directions for future research and promising application areas will also be discussed.


Paper - in formal session


Judith L. Klavans, Ph.D. has more than 30 years of experience in many aspects of language processing including text-mining, machine translation, text-to-speech, health informatics and language standards for interoperability. She has worked in many application areas, and is noted for her creative...
Robert Stein is the Deputy Director for Research, Technology and Engagement at the Indianapolis Museum of Art (IMA). In that role, Stein leads a wide range of activities for the museum including an extensive effort in media, web technology and software. Since 2006, he has played a significant role...
Susan Chun is a researcher and consultant to cultural heritage organizations specializing in publishing; intellectual property policy and open content initiatives; information management; visualization; advanced search strategies; and multilingual content development and management. She leads...
Raul Guerra is a second year Ph.D. student in the Computer Science Department at the University of Maryland. His research interests lie in finding ways to make sense, to find insights, and to help humans analyze the enormous amounts of data that are being generated by computers by coming up with...