published: April, 2002
Evaluating the Features of Museum Websites: The Bologna Report
Nicoletta Di Blas, HOC-DEI, Politecnico di Milano, Maria Pia Guermand, IBC, Istituto Beni Culturali, Emilia Romagna, Carolina Orsini, Università di Bologna, Paolo Paolini, Politecnico di Milano, Italy
MiLE (Milano-Lugano Evaluation Method) is an innovative method for evaluating the quality and usability of hypermedia applications. This paper focuses on the module of MiLE concerning cultural heritage applications, synthesizing the results of research carried out by a group of seven museum experts in Bologna (Italy), under the joint coordination of IBC (Institute for the Cultural Heritage of the Emilia-Romagna Region) and Politecnico di Milano. The Bologna group brings together different professional figures working in the museum domain: curators of artistic, archaeological and historical heritage; museum communication experts; and experts in communication for the Web sites of cultural institutions.
After illustrating the general features of MiLE and the specific features for Cultural Heritage, we will briefly show a few of the results which are to be published in the Bologna Report.
Keywords: usability, inspection method, cultural heritage, user scenarios
1. MiLE in a nutshell
MiLE is based upon a combination of Inspection (an expert evaluator systematically exploring the application) and Empirical Testing (a panel of end users actually using the application, under the guidance and observation of usability experts). While this combination of the two methods is not new (several usability methods propose, in fact, a similar combination), the innovation of MiLE lies in the set of guidelines used to make both inspection and empirical testing more effective and reliable. In brief, we introduce two specific heuristic concepts:
Abstract Tasks, ATs in short, used for inspection. They are a list of generic actions (generic in that they can be applied to a wide range of applications) capable of leading the inspector, like Ariadne's thread, through the maze of the different parts and levels an application is made of. MiLE, in fact, provides inspectors with guidelines that draw their attention to the most relevant features of the application.
Concrete Tasks, CTs in short. They are a list of specific actions (specific in that they are defined for a single application) which users are required to perform while exploring the application during empirical testing.
Inspection is the focus of this paper, and we will not further explore the issues concerning empirical testing. One contribution of MiLE is its emphasis on the need to separate different levels of analysis: technology, navigation, content, illocutionary force, graphics, etc. For each level a library of Abstract Tasks has to be prepared, when building the method, in order to support the inspection. The Abstract Tasks are nothing but the distilled knowledge of each level's experts. For some levels (e.g. graphics or navigation), the abstract tasks can be largely independent of the specific application domain; for other levels (e.g. content) we have different tasks according to the application domain (i.e., specific tasks for the cultural heritage domain, for the e-commerce domain, and so on).
The inspector has to understand the client's communicative goals, combine them with the intended users' probable requirements, and then select the appropriate set of tasks to perform. If, for example, we have to evaluate a museum's site that is specifically meant to attract visitors to the real museum, we as inspectors concentrate on all those tasks that, at the content level, involve the practical-services parts of the site (opening hours, buying a ticket, etc.).
When performing an inspection, the inspector has to check a list of attributes concerning the different facets of usability/quality (e.g. richness, completeness, etc.). For each attribute (in relation to a specific AT), a score must be given. After the scoring phase is over, the set of collected scores is analyzed through weights which define the relevance of each attribute for a specific goal (or, technically speaking, for a user scenario).
Weighting allows a clean separation of the scoring phase (using the application, performing the tasks, and examining them) from the evaluation phase in a strict sense, where different possible usages are considered. Let us introduce a simple example: assume that a navigation feature (e.g. using indexes) is not very powerful, but very easy to learn. What should the evaluation be? With MiLE the inspector can provide a score for the navigation (e.g. 9/10 for predictability and 2/10 for powerfulness). Later, considering two different user scenarios (e.g. casual users and professional users), the evaluator (possibly different from the inspector) can assign two different pairs of weights to the attributes predictability and powerfulness. The weights, for example, could be <0.8 (predictability), 0.2 (powerfulness)> for casual users, or <0.1 (predictability), 0.9 (powerfulness)> for professional users. The weighted scores for the navigation feature are of course very different (7.6 for casual users and 2.7 for professional users), but they reflect the different user scenarios. The inspector could therefore conclude that the application (at least for this feature) is well suited for casual users, while it is somewhat ineffective for professional users. Trying different weighting systems allows the evaluator to test different user scenarios using the same set of scores derived from the inspection.
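The separation between scoring and weighting in the example above can be sketched in a few lines of code. This is an illustrative sketch only: the function and variable names are ours, not part of the published method; it simply reproduces the arithmetic of the example.

```python
# Illustrative sketch of MiLE's score/weight separation; the helper
# name and data layout are our own assumptions, not part of MiLE.

def weighted_score(scores, weights):
    """Combine fixed inspection scores with scenario-specific weights."""
    return sum(scores[attr] * weights[attr] for attr in scores)

# Scoring phase: the inspector rates the navigation feature once.
scores = {"predictability": 9, "powerfulness": 2}

# Evaluation phase: each user scenario supplies its own weights,
# reusing the same inspection scores.
casual = {"predictability": 0.8, "powerfulness": 0.2}
professional = {"predictability": 0.1, "powerfulness": 0.9}

print(round(weighted_score(scores, casual), 2))        # 7.6
print(round(weighted_score(scores, professional), 2))  # 2.7
```

Changing only the weight dictionaries, never the scores, is what lets an evaluator test a new user scenario without repeating the inspection.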
In short, an inspection with MiLE requires the following steps:
The reliability of the method has proved to be very high. Execution of the Abstract Tasks (at the navigation and content levels) produces more reliable evaluation results and helps spot unexpected usability problems (inconsistencies, lack of clarity, etc.). Even sites that seem agreeable at first sight, when put to the test through a systematic inspection à la MiLE, may reveal weaknesses and defects.
Inspection already provides valuable evaluations; in some cases, however, panels of users may be required for double-checking. When empirical testing is required, users are given a list of concrete tasks, i.e. a list of specific actions that they are asked to perform. The definition of concrete tasks (different for each case) is based upon the results of the inspection, which has identified portions of the application, tasks and attributes that need special attention. Real users can fine-tune the inspector's observations, confirming them (very often), dissenting from them (seldom), or spotting additional problems. We now discuss the approach outlined above as applied to cultural heritage applications.
2. Guidelines for Evaluating Cultural Heritage Applications
Some features of an application (such as navigation or layout) can be examined largely independently of the specific application domain; other features, such as content or the functions offered to users, require a different evaluation schema for each application domain. In order to explore functions and contents for museum Web sites (a specific sub-domain within the larger domain of cultural heritage applications), a specific panel of experts (the so-called Bologna group) has been created, through a partnership between Politecnico di Milano and Istituto Beni Culturali, a regional organization supervising cultural heritage activities in a large region, Emilia-Romagna, with headquarters in Bologna. The group is composed of museum curators (of archaeology museums as well as modern and contemporary art museums and galleries), museum communication experts, and researchers of new technologies for cultural heritage.
The first step of the Bologna group was to identify the main pieces of a generic museum Web site. In order to avoid the danger of wish-listing the sum of what everybody could foresee as the ideal Web site, we took an empirical stance: we selected a large number of sites and considered them to be the universe of discourse. The resulting model (shown in the Appendix) is therefore a synthesis of contents and features found in those sites. At this stage of the research, we have listed more than a hundred elementary constituents, organized into three main groups:
A. site presentation: general information about the Web site;
B. museum presentation: contents and functions referring to a physical museum (like arrows pointing to the real world);
C. the virtual museum: contents and functions exploiting the communicative strength of the medium.
A further analysis has allowed us to detect high-level constituents, such as collections, services, and promotion, which gather the elementary constituents (a full account of all the pieces of the model can be found in the Appendix to this paper).
The next job has been to define a set of user scenarios as a way to build a library of suitable ATs. A user scenario, in this context, is a pair <user profile, operation (that users may wish to perform)>. In simple words, we tried to identify a number of user profiles (culture, expertise, interests, etc.), and for each of them we tried to provide a set of meaningful answers to this simple question: what might a user with this profile want to do with the application?
The tasks are therefore coupled to user profiles, in the sense that a given task may be interesting for one profile and meaningless (or irrelevant) for another. When inspectors perform an inspection, they will learn from their customers who the intended users of the application are, and will concentrate on those tasks likely to be performed by these users. In any case, inspectors are free to create new tasks that better fit the communicative goals of the application, as long as they follow the guidelines of the method and its philosophy.
Overall we have classified ATs according to two different dimensions: scope and concern. Possible values for scope are the following: Narrow (a specific item is interesting), Complex (several items are interesting), and General (a generic overview). Possible values for concern are Practical Info (the user wants to gather useful information), Operational (the user wants to do something) and Cognitive (the user wishes to learn something).
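The two-dimensional classification of ATs can be represented as a small task library. The sketch below is illustrative only: the two task descriptions are taken from the examples discussed later in this paper, but the scope values assigned to them are our own guesses, not the Bologna group's official classification.

```python
# Illustrative sketch of the scope/concern classification of Abstract
# Tasks; the scope assignments below are assumptions of ours.

from dataclasses import dataclass

SCOPES = ("Narrow", "Complex", "General")
CONCERNS = ("Practical Info", "Operational", "Cognitive")

@dataclass
class AbstractTask:
    description: str
    scope: str     # one of SCOPES
    concern: str   # one of CONCERNS

library = [
    AbstractTask("Find the events occurring on a specific date in the real museum",
                 "Narrow", "Practical Info"),
    AbstractTask("Find all the works of an artist shown in the site",
                 "Complex", "Cognitive"),
]

# An inspector can filter the library along either dimension when
# selecting the tasks relevant to a given user scenario.
cognitive_tasks = [t.description for t in library if t.concern == "Cognitive"]
print(cognitive_tasks)
```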
The following table shows some examples of AT, classified accordingly:
Regarding users, we took into consideration a number of variables, such as age, expertise, and professional interest (e.g. school students, fine arts students, fine arts experts, tourists, etc.). Each relevant user profile is based upon a number of these variables.
Ongoing research work consists of identifying the largest possible number of ATs. More than fifty ATs have been identified so far (while 49 ATs were identified for navigation, in another research effort), and more than double that number is the likely final result.
Regarding the list of attributes to be scored during the inspection, we started with the idea that they would be different for each AT. At the moment, however, we have developed the following list, which seems to be applicable (with minor problems only) to virtually any AT:
3. Some Examples
In this section we introduce a few examples of inspection to help the reader grasp how our method works. The examples are very simple, and are taken from actual Web sites. We hope that in the period between the writing of this paper and its reading, the Web sites will not be modified, so that readers may try to inspect them directly. (The impossibility of freezing Web sites, in practice, makes it difficult to develop examples of inspection that maintain their validity over a long span of time.)
Example 1 (Practical Info AT)
find the events/exhibitions/lectures occurring on a specific date in a real museum
The user scenario for this task is that of well-educated French-speaking tourists (who can speak English too), first-time visitors to the site, who know that on Saturday, March 9th, 2002, they will be in the town where the real museum is located. They would therefore like to know what special exhibitions or activities of any kind (lectures, guided tours, concerts) will take place on that day.
We performed this task on many different Web sites, and we describe here our findings for the Louvre site (www.louvre.fr) and the Royal Ontario Museum site (www.rom.on.ca), on the basis of an inspection that took place on February 13th, 2002. The focus of our attention is the section named information about museum activities and events in our schema (see the Appendix for the details).
The relevant attributes that we use for this brief example are the following:
(A1) currency of the information;
(A2) quality of the organization of the information, since users are looking for operational support;
(A3) multilinguisticity, fundamental for an international audience;
(A4) richness of the information provided, very important in order to make the potential interest of the events understandable.
The table below synthesizes our scoring and evaluation.
a) We are evaluating a specific task rather than expressing a global evaluation; in addition, we are scoring each single attribute. This level of detail brings two advantages: precise feedback to application designers and the possibility of pinpointing the causes of possible discrepancies among different inspectors.
b) Through weights we can take into account the specific objectives for the (portion of the) application. In the example above, we gave great relevance to attributes A2 and A1, and minor relevance to A3 and A4.
c) A concise global evaluation can be obtained by combining the evaluations for each attribute (as in the above table), and/or by combining the evaluations for the different ATs (again using weights in order to give different relevance to each AT).
d) Different systems of weights can be used in order to take into account different user profiles.
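Points (c) and (d) can be made concrete with a short sketch: attribute scores are first combined into a per-AT evaluation, and per-AT evaluations are then combined into a global one, with a separate weight system at each step. All numeric values below are illustrative assumptions of ours, not the scores actually assigned in our tables.

```python
# Sketch of points (c) and (d): two-level weighted combination.
# All scores and weights are illustrative assumptions.

def combine(scores, weights):
    """Weighted sum; the weights encode relevance for a user scenario."""
    return sum(scores[k] * weights[k] for k in scores)

# Step 1 (point c): combine the attribute scores A1-A4 of one AT.
attr_scores  = {"A1 currency": 8, "A2 organization": 6,
                "A3 multilinguisticity": 4, "A4 richness": 7}
attr_weights = {"A1 currency": 0.35, "A2 organization": 0.35,
                "A3 multilinguisticity": 0.15, "A4 richness": 0.15}
events_eval = combine(attr_scores, attr_weights)

# Step 2 (point c, continued): combine the evaluations of several ATs.
at_evals   = {"find events on a date": events_eval,
              "find all works of an artist": 5.5}
at_weights = {"find events on a date": 0.6,
              "find all works of an artist": 0.4}
global_eval = combine(at_evals, at_weights)
print(round(global_eval, 2))
```

Point (d) corresponds to re-running the same computation with different weight dictionaries, one per user profile, while keeping every score unchanged.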
Example 2 (Cognitive AT)
find all the works of an artist shown in the site
This task might be performed by a high-school student looking for some information about an artist he's currently studying at school; let's say Giovanni Battista Tiepolo. He finds out that some of Tiepolo's works are kept by the Metropolitan Museum of Art (www.metmuseum.org) and by the Hermitage Museum (www.hermitagemuseum.org).
The relevant attributes that we will use for this brief example are the following:
(A1) effectiveness of the information;
(A2) completeness of the information;
(A3) richness of the information;
(A4) navigation organization.
Using the Metropolitan Museum's Web site, we have two choices: use the search engine, or navigate the site. Typing the name Tiepolo in the search window of the home page, we get a list of more than 200 records, many of which refer to the online shop. If we leave this overly long list aside and enter the collections section, we can use the tool "search the collection", again inserting the name Tiepolo (or the full name Giovanni Battista Tiepolo, in order to avoid mixing Giovanni Battista's and Giovanni Domenico's works). This more precise search tool returns the list of the 23 works by Giovanni Battista Tiepolo shown on the Web site. For each of the works, we also have the basic data, a description, and the possibility of zooming the image.
If we decide to ignore the search and to find what we're looking for by navigating the site, we have to reach the sub-section "European paintings" of the section "the collection"; here we find a brief introduction with a mention of our author and a link. Following the link we are shown a single work by Tiepolo (Allegory of the Planets and Continents); but having entered the guided tour of the department's highlights, if we click on the "next" or "previous" buttons, we find other artists' works but no further Tiepolos. In order to perform our task, we have either to check one by one the 2275 items preserved by the department (clicking on "entire department") or, if we don't want to spend too many hours in front of the screen, to turn again to the search tool.
The Hermitage Web site offers similar functionality, as there is a search tool right on the home page. Inserting the name Tiepolo, we are given a list of 10 works of art (each with a zoom); one of them (neither the first nor the last, so it takes a careful reading to notice it) is in fact not by Giovanni Battista but by Giovanni Domenico Tiepolo; moreover, we are not allowed to search by the full name (Giovanni Battista Tiepolo). There is no description of the works.
If instead we navigate the site, we can choose either "collection highlights" or "digital collection". In the former case, we have to further select the option "western European art" and eventually "painting"; at the end of this path we are given an introduction to Western European painting with some small icons on the right side, one of them representing the painting by Tiepolo, Maecenas Presenting the Liberal Arts to Emperor Augustus. A link leads to a bigger image and a description. As an alternative, we can browse the digital collection by type of art work (paintings, prints and drawings) and artist, getting nine results (corresponding in this case to the nine works by Giovanni Battista Tiepolo).
On the whole, we can say that both sites allow users to reach the desired information only by using the search engines: this can be considered a sign of poor organization of the information.
For the Metropolitan, the collection's search engine must be used rather than the main search; otherwise the user may get completely lost. The Hermitage search engine doesn't distinguish between the works of Giovanni Battista and Giovanni Domenico Tiepolo. The Metropolitan Museum offers a good description of all the items, whilst the Hermitage offers a description for only one work (the others are simply shown).
The table below synthesizes our scoring and evaluation.
Table 3: The scores and the evaluation for the task "find all the works of a given artist"
We should first note that in this case the weights do not change the relative evaluation of the two sites, but rather reduce both, given the high relevance assigned to A4. Secondly, if more detail about navigation is wanted, a different level of analysis should be entered: we have devised nearly 50 different tasks in order to inspect navigation features precisely.
4. Conclusions and Future Work
The general distinctive features introduced by MiLE can be synthesized as follows:
The specific contribution of the Bologna group (there is also another research group, coordinated by the Museum of Science and Technology of Milan, examining the same issues for scientific and technical museums) is described in this paper. Our task has been the identification of a general framework for defining a set of ATs suitable for art museum Web sites. The framework is the result of an extensive analysis of several Web sites, which are now the object of our trial inspections.
The current work consists of identifying, through the ATs, the universe of possible functions that a museum Web site should support; the next step will be to pair user profile features with ATs. The goal is to generate an overall schema showing what type of user is interested and in what information/action. The combination of user-profile/AT is what we mean by User Scenario; therefore, we could also say that we are trying to build a large set of possible user scenarios for museum Web sites.
We aim at providing a contribution to the community of people interested in museum Web sites (museum curators, designers, Web managers, etc.), sharing our understanding of what it means to evaluate quality and usability of virtual artifacts.
Since the amount of work to be performed is immense, and we would like to generate a discussion in a large community, we encourage all interested persons to contact us in order to enlarge the scope and the validity of this research in evaluation.
We wish to acknowledge the work of the other members of the Bologna group, who made this (still ongoing) research effort possible. We therefore warmly thank Dede Auregli (Galleria d'Arte Moderna di Bologna), Gilberta Franzoni (Musei Civici di Arte Antica di Bologna), Paola Giovetti (Museo Civico Archeologico di Bologna), Laura Minarini (Museo Civico Archeologico di Bologna), Federica Liguori (Politecnico di Milano), and Uliana Zanetti (Galleria d'Arte Moderna di Bologna).
Appendix: the Contents Survey Schema for Museum Web sites
1. First section: site presentation
This part of the schema describes the information about the Web site structure.
2. Second section: the real museum
3. Third section: the virtual museum