Evaluating Context-Aware Mobile Applications In Museums: Experiences from the MUSE Project
Tullio Salmon Cinotti, Giuseppe Raffa, and Luca Roffia, Università di Bologna; Franca Garzotto, Politecnico di Milano; Rossana Muzii, Viviana Varlese, Soprintendenza Speciale per il Polo Museale Napoletano; Maurizio Malavasi and Stefania Galasso, Ducati Sistemi S.p.A., Italy
The MUSE project has built a proprietary, context-aware wearable terminal called WHYRE® and has developed, for the domain of cultural tourism, a general framework for implementing multichannel Web applications, i.e. applications that can deliver multimedia content and services on different devices (both stationary and mobile). The MUSE technology is currently applied to three cultural contexts: Il Museo e Certosa di San Martino - a primary institution dedicated to the history and the artistic traditions of the city of Naples, the Institute and Museum of the History of Science in Florence, and the archeological site of Pompeii. This paper discusses the evaluation of the mobile context-aware multimedia version of the application developed for the Museum and Charterhouse of San Martino. The main goal of our research is to evaluate: i) general user satisfaction; ii) the multimedia content design (i.e., the soundness of the different media and contents in different situations within the museum); iii) the ergonomics of WHYRE® (i.e., how comfortable users felt in using the mobile device); iv) the usability of the navigation and interaction design (focusing, in particular, on those aspects that are peculiar to context-aware mobile systems, such as the understandability of the context-aware behavior and the effectiveness of a multi-modal approach). Evaluation has been carried out using two complementary methods: questionnaire-based user testing (involving representative samples of end users) and heuristic inspection (performed by usability experts and based on the MiLE evaluation technique).
Keywords: Mobile Computing, context-awareness, multi-channel multimedia, usability inspection, user testing, historical museum, Charterhouse, cultural tourism
Mobile context-aware technology is a promising support for applications exploiting innovative paradigms of human-computer interaction. While many mobility support products already exist for geographic areas (for example, car navigators), applications intended for deciphering confined spaces are less common, and there are few studies that evaluate their quality and their added value (Proctor, 2003 and Gleue, 2002 are two of these). This paper presents an on-going usability study of context-aware applications in which a mobile terminal acts as a personal guide in museums and archaeological sites.
The work reported on is carried out within the MUSE project at Il Museo e Certosa di San Martino (Museum and Charterhouse of San Martino), a primary history and art Museum in Naples. MUSE is an industrial research project managed by Ducati Sistemi, a Bologna-based company, within the Italian National Research Program on Cultural Heritage Parnaso. MUSE was first presented in Malavasi, 2000; Salmon Cinotti, 2001; its multichannel nature and its application design framework were presented at Museums and the Web 2003 (Garzotto, 2003), and the pilot application for the Museum and Charterhouse of San Martino was demonstrated at ICHIM'03 (Muzii, 2003).
The main project goal is the development of a software and hardware platform for multi-channel, context-aware applications in the field of cultural tourism. Pilot applications of the platform are already installed not only in Naples, but also at the Institute and Museum of the History of Science in Florence (Barattin, 2003), and the platform will soon be demonstrated in Pompeii.
MUSE design solutions aim to extend the visitors' perception and memory, and to increase their consciousness level, in order to create new expectations, new demand and consequently new business opportunities. Basically the idea is to use multimedia and Internet technologies to pull the public to the museums (Samis, 2001; Woodruff, 2001), by strengthening the cultural objects' readability, their contextualization and their thematic analysis (Sacco, 2003). Mobile context-aware technology is used to simplify navigation and to provide immediate access to the relevant multimedia contents on site, in front of the exhibits.
After a quick review of the MUSE system architecture and its application design solutions, this paper concentrates on the evaluation of WHYRE®, the mobile context-aware proprietary device developed within MUSE, and of the currently available WHYRE®-based multimedia personal guide to the Museum and Charterhouse of San Martino.
The MUSE System in San Martino
For many years La Soprintendenza Speciale per il Polo Museale di Napoli has been active in setting up dynamic and effective initiatives aiming to enhance the value of its Museums; namely, il Museo di Capodimonte, la Certosa e museo di San Martino, il Castel Sant'Elmo, il museo Duca di Martina, il museo Diego Aragona Pignatelli Cortes in Napoli and la Certosa di San Giacomo in Capri.
In this context the Soprintendenza was very pleased to host MUSE and to concentrate its application on La Certosa e museo di San Martino, the most impressive, charming and evocative of its monuments.
Facing the entire Parthenopean Gulf, the Charterhouse is located in natural and enchanting scenery. The complex structure of its architectural space (Figure 1) includes a sumptuous church with chapels, solemn cloisters, hanging gardens and balconies. Its new museum ordering - unveiled in the year 2000 and made possible by the work of architect Adele Pezzullo (Spinosa, 2000) - enriches visitor experience with large and differentiated artistic collections.
As a first action to support MUSE, the Soprintendenza authorized the wiring of the Museum with optical fibres and cables. This was a remarkable problem - the need to be non-invasive with respect to the monument's architecture being the Soprintendenza's major concern - and the solution required the joint effort of a team of architects, engineers and telecommunication experts. It was decided to demonstrate MUSE inside the museum section called Immagini e Memorie della Città (Images and Memories of the City) along the historical tour segment starting from the XVth Century, as illustrated by the famous Tavola Strozzi, up to the beginning of the XVIIIth Century, with the Veduta della Darsena by Gaspar Van Wittel. Based on this choice, search for documentation and the design of multimedia contents kept museum management busy for approximately two years, requiring the services of several art historians, architects, engineers and computer science experts. The need to get feedback from the audience, as well as the desire to evaluate the communication impact of the project, led Museum management and the design team to set up the questionnaire analyzed in this paper. User satisfaction testing is on-going, and the Soprintendenza recently applied for funding from its local district (Regione Campania) in order to enlarge the museum area provided with mobile multimedia educational support.
MUSE Main Features
The MUSE framework consists of a set of cooperating stationary and mobile devices linked by wire or by air (Malavasi, 2000). The system supports four application channels: an on-site mobile context-aware channel, an on-site stationary channel, a memories channel, and the conventional Web channel. The mobile channel is based on a dedicated device called WHYRE®, designed to act as an interactive, context-aware, multimedia personal guide. The on-site stationary channel is based on large graphic displays located in dedicated halls and controlled by WHYRE®. The memories channel currently is an optional and personalized CD automatically built during the visit and collected by the visitors when they leave the site.
In this paper we concentrate on evaluation of the user experience on the mobile channel. The WHYRE®-based mobile channel is multimodal, in that it supports multiple forms of user interaction, as discussed in Garzotto, 2003. Presently, the following three interaction modes are implemented: Web (or free navigation) mode, pure context-aware mode, and guided tour mode. At any point, users can switch from one mode to a different one. The Web mode provides the conventional link-based navigation paradigm: through WHYRE®, multimedia contents are accessed during the site visit, but are used as in any standard Web site. In context-aware mode, museum visitors are dynamically prompted with contextualized multimedia information about the area where they are located and the artwork they are looking at; furthermore, they can get directions on how to continue their tour. Context-aware navigation is enabled by the position and orientation information provided by the WHYRE® multi-sensor facilities. In indoor areas, the user's approximate position is detected by a positioning algorithm based on the signals received by the WLAN access points. Visitor orientation is tracked by a combination of inertial and geomagnetic sensors. The WHYRE® multi-sensor outfit also supports the guided tour navigation mode, and enables logistic services; for example, self-orienting on the site. Additional services defined within the mobile channel include picture-taking and real time access to dynamic information provided by the curator or by management.
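The paper does not specify how the WLAN-based positioning algorithm works. As a purely illustrative sketch, one common approach is nearest-fingerprint matching: a signal-strength reading is compared with reference readings recorded once per area, and the closest area wins. The area names, access point names, dBm values and the `MISSING_DBM` default below are all assumptions made for the example, not MUSE data.

```python
import math

# Hypothetical calibration data: mean signal strength (dBm) per access point,
# recorded once in each area. Names and values are illustrative only.
FINGERPRINTS = {
    "hall_1": {"ap_a": -45.0, "ap_b": -70.0, "ap_c": -82.0},
    "hall_2": {"ap_a": -68.0, "ap_b": -48.0, "ap_c": -75.0},
    "cloister": {"ap_a": -85.0, "ap_b": -72.0, "ap_c": -50.0},
}

MISSING_DBM = -100.0  # assumed value for an access point with no reading


def signal_distance(observed, reference):
    """Euclidean distance between two signal-strength readings."""
    access_points = set(observed) | set(reference)
    return math.sqrt(sum(
        (observed.get(ap, MISSING_DBM) - reference.get(ap, MISSING_DBM)) ** 2
        for ap in access_points
    ))


def estimate_area(observed, fingerprints=FINGERPRINTS):
    """Return the area whose stored fingerprint is closest to the reading."""
    return min(
        fingerprints,
        key=lambda area: signal_distance(observed, fingerprints[area]),
    )
```

A reading such as `{"ap_a": -47.0, "ap_b": -69.0, "ap_c": -80.0}` would be matched to `hall_1` in this toy data set. The result is only an approximate, area-level position, which is consistent with the text's description of the position as approximate and of orientation being tracked separately by inertial and geomagnetic sensors.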
Our Approach To Evaluation
Our evaluation focuses on the WHYRE®-based multimedia application for the Museum and Charterhouse of San Martino in Naples. The current infrastructure and the available multimedia contents cover a significant portion of the Museum (Fig. 2) - a court, two cloisters, and the first nine halls of the museum section called Immagini e Memorie della Città (Images and Memories of the City). But not all of the MUSE services discussed in the previous sections are ready to be evaluated by the public and by the usability experts.
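The main goals of our study are to evaluate:
1. General user satisfaction
2. The multimedia content design (i.e., the soundness of the different media and contents in different situations within the museum)
3. The ergonomics of WHYRE® (i.e., how comfortable users felt in using the mobile device)
4. The usability of the navigation and interaction design (focusing, in particular, on those aspects that are peculiar to context-aware mobile systems, such as the understandability of the context-aware behavior and the effectiveness of a multi-modal approach)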
Evaluation is carried out using two complementary methods: user testing and inspection. The term user testing (Dix, 1998; Preece, 1994) refers to any evaluation technique that involves some representatives of real users, who perform real tasks and operate on a physical artifact, be it a prototypal or final system. User testing is aimed at collecting data about user behaviour, user performance, or user feelings about a system, all of which can then be analysed to identify problems in the system design. User testing techniques differ in the type of data that are investigated and in the way they are collected. Data collection techniques include questionnaires and interviews, focus groups, user observations or tape/audio recordings of user behaviours while on the system, and on-going tools (i.e., automatically recording, at execution time, user actions on the system).
In contrast, usability inspection (Nielsen and Mack, 1994) involves inspectors only, who examine usability-related aspects of a system, trying to detect violations of established usability principles. The inspectors can be usability specialists, or designers and engineers with special expertise (e.g., knowledge of specific domains). Typically, a usability inspection is aimed at finding usability problems in an existing system, and then using this information to generate recommendations for designers, thus improving the usability of the design. Of course, the application of inspection methods relies on a good understanding of usability principles (i.e., how they apply to the specific application to be analysed), and on the particular ability of the evaluators in discovering critical situations where problems occur. Different methods can be used for inspecting an application. Among these, the most commonly used is heuristic evaluation (Nielsen, 1993), in which usability specialists judge whether the system's properties conform to established usability principles or heuristics. Inspection methods also include the cognitive walkthrough (Polson et al., 1992; Wharton et al., 1994), which uses detailed procedures for simulating users' problem-solving processes, trying to see if the functions provided by the system are efficient for users and lead them to the next correct actions, and the pluralistic walkthrough, in which developers, designers, or human factors people walk through a scenario, discussing usability issues related to the dialogue elements involved in the scenario steps (Bias, 1994).
In MUSE, user testing is adopted to address goals 1, 2 and 3, and it is carried out using questionnaires. Inspection focuses on goal 4, and it is performed using a heuristic evaluation technique called MiLE, as discussed in the following sections.
User testing sessions
The questionnaire includes 28 questions (in some cases, binary questions; in others, multiple-choice questions or open questions). They aim at different goals:
1. Identifying the user profile (so, indirectly, the museum target)
2. Evaluating the novelty of the proposed approach
3. Evaluating the capability of the approach to modify visitors' expectations, and from there, its potential to create new demand and a new market
4. Evaluating how the proposed development approach fits the specific museum nature
5. Evaluating the WHYRE® device ergonomics (audio quality, screen visibility, shape and buttons design, weight)
6. Evaluating the effectiveness of the multimedia contents
7. Understanding how much a visitor could afford to pay for WHYRE® services.
The user sample consists of museum visitors and persons specifically invited for the test runs (mainly university students in cultural heritage fields).
Five test sessions were carried out before this report was written. The 49 collected questionnaires were divided into two distinct sets: Set 1 includes 32 questionnaires filled out by people visiting the museum, while Set 2 includes 17 questionnaires filled out by invited persons. Since their responses are very similar, we shall concentrate our analysis on Set 1, and then briefly summarize the few differences found with Set 2.
Set 1 Responses: Museum Visitors
Goal 1: Identifying the user profile (and, indirectly, the museum target)
Set 1 includes 15 men and 17 women. We feel that most respondents were kind in their questionnaire answers because they wanted to encourage any initiative in favor of their country, their history and their cultural heritage. In fact, over 90% of the interviewees are Italian (the questionnaire is written in Italian), and 70% come from Naples. This is probably due to the interview season (November 2003, December 2003 and January 2004, i.e., low season in Italy).
Of the total, 12 interviewees are students, 11 employed people, 7 independent professionals, 1 unemployed and 1 retired. Their ages range from 21 to over 65; nearly 66% are under 40 years old, 25% are over fifty and only 10% are in their forties; none is a teenager, the youngest being 21. Their educational qualifications are high; over 85% either have a university degree or are university students. Over 50% do not work or study in the Cultural Heritage sector: some are physicians, others are, for example, lawyers, architects, legal or financial advisers, artisans and shopkeepers. About 47% of the interviewees already knew the Museum in its current set up, having visited the site at least once during the last five years; therefore they may be considered friends of the museum and its establishment.
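The occupation counts above can be turned into shares of Set 1 with a few lines of arithmetic. The sketch below uses only the counts reported in the text; the `share` helper and the rounding to one decimal place are choices made for this illustration.

```python
# Counts reported for Set 1 (32 questionnaires from museum visitors).
SET1_SIZE = 32
occupations = {
    "students": 12,
    "employed": 11,
    "independent professionals": 7,
    "unemployed": 1,
    "retired": 1,
}


def share(count, total):
    """Percentage of `total` represented by `count`, rounded to one decimal."""
    return round(100.0 * count / total, 1)


# The reported categories should account for every questionnaire in the set.
assert sum(occupations.values()) == SET1_SIZE

shares = {name: share(count, SET1_SIZE) for name, count in occupations.items()}
for name, pct in shares.items():
    print(f"{name}: {pct}%")
```

With a sample of 32, each respondent accounts for about 3 percentage points, which is worth keeping in mind when reading the percentages quoted in the following sections.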
Over 95% are interested in technology (computers, mobile phones, ...); therefore, if on one hand, we can trust that the interface evaluation is not affected by prejudice, on the other hand, we must admit that we do not have data on the reaction of people totally unused to Information Society habits. We can conclude that the museum's main target is made up of middle class people with a medium-to-high level of education. Thus it is reasonable to conceive that one of the potential goals of the MUSE approach could be to expand this target to include teenagers and people with a broader education level range and a more international profile.
Goal 2: Evaluating the novelty of the proposed approach
All of the answers to the set of questions related to this goal confirm the novelty of the proposed system: none of the interviewees had seen a similar device before.
Goal 3: Evaluating the capability of the approach to modify the visitors' expectations, and consequently its potential to create new demand and a new market
This is a strategic goal, which was addressed mainly by the following question: Would you like other Museums to offer this or a similar service? The answer to this question was unanimously affirmative.
Goal 4: Evaluating how the proposed development approach fits the specific museum nature
The answers to the first two questions relating to this goal (Figure 3) show that 84% of the responders consider the visit to the Immagini e Memorie section to be more involving with WHYRE®, and that they would expect a better quality of visit if WHYRE® were available also in other museum sections. These results confirm WHYRE®'s ability to address the specific needs of visitors to this museum and to create new cultural interests and the will for new cultural experiences in the public.
In reply to the question, Does WHYRE® improve the quality of the visit to this Museum?, only 4 out of 32 answers are no. The major reasons for negativity were: it is uncomfortable (2), or it took too much time to understand how it works (1). The last no was given by a student in communications based on the supposition that a user unfamiliar with IT devices could find some difficulties, and that proper training could make heavy demands on museum personnel and slow down museum admission. Luckily, this dangerous assumption was not borne out in the interface evaluation (goal 5); still it needs to be taken into account when planning admission procedures. In any case, 100% of the questionnaires agree with the decision to provide Museum goers with such an innovative service.
Trying to look into the visitors' likes and interests, the last question of this goal asks for a list of the most interesting exhibits. The response: 14 people mentioned La Tavola Strozzi, the most famous painting of the section; 10 people mentioned two XVIIth Century ivory and ebony cabinets (Gli Stipi); and 7 people did not provide any answer. More than 10 other precious exhibits were also mentioned, including the famous Presepi (cribs) still not included in the WHYRE® contents, as well as the panorama of Naples as it may be enjoyed from the Loggia of one of the Museum halls.
An attempt to interpret these answers leads us to guess that the Tavola Strozzi - a XVth Century view of Naples from the sea, with the Aragonian fleet sailing back to the harbor after winning the Ischia Battle - was chosen not only because of its beauty, but also because WHYRE® offers a large set of presentations, including a first level multimedia introduction and several detailed thematic presentations: one on the culture of the XVth Century in Naples, one on the city structure at the same time, and one on the painting and its history.
The Stipi, on the other hand, were chosen because WHYRE® offers a 3D model enabling the visitor to interactively look inside and outside the cabinets, from any direction (Figure 4).
Although the authors intended the VRML 3D model only as a demonstration, and its quality can definitely be improved, the visitors were fascinated by the details of the cabinets' interiors, which cannot be enjoyed without WHYRE® because the real cabinets are closed and cannot be touched.
It must also be mentioned that the 3D model is offered as a second-level detail. When the visitor accesses the Stipi, WHYRE® provides also a first-level audio-video introduction that plays an important role in letting the user discover and understand the Stipi history and meaning - otherwise unreadable by most of the visitors without a suitable multimedia aid.
Since many other exhibits are mentioned in the questionnaire, and all of them but one (i.e. excluding the presepi) are extensively supported by WHYRE®, we may also expect that WHYRE® really helps the visitor in learning, enjoying and remembering the individual exhibits as well as their meaning and their context.
Goal 5: Evaluating WHYRE® device ergonomics (audio quality, screen visibility, shape and buttons design, weight)
Figure 5 shows WHYRE® worn by illustrious guests at Il Museo e Certosa di San Martino, while Figure 6 shows that both audio quality and screen visibility are considered for the most part good or excellent.
From Figure 7 we get confirmation that the main disadvantage of WHYRE® is its weight, at least as long as it is held with a strap around the neck. WHYRE® is heavy, rather like an old Reflex Camera with a telephoto lens: its weight is nearly three pounds. Even if 55% of the responders found WHYRE®'s weight more than bearable, it should not be forgotten that only 7 out of the 32 questionnaires (22%) explicitly suggested making WHYRE® lighter.
The last question concerning ergonomics asks the user to provide a few adjectives (maximum three) to describe WHYRE®'s design. Only two people did not answer. Figure 8 shows the resulting list of WHYRE® design qualifiers. It is comforting to note that the responder who used the word heavy joined it to the words useful and communicative.
Based on the evaluation of WHYRE®'s ergonomics, a redesign of the carry strap is underway, to permit carrying the device hands-free.
Goal 6: Evaluating the effectiveness of the multimedia contents
Figure 9 shows that the multimedia contents were definitely clear to all of the interviewees; this is a significant acknowledgement of the museum team, who believed in the project and put a big effort into content design.
Figure 10 shows that 62% of the users enjoyed all content formats, both textual and audio-video, while 38% liked the audio-video contents better. This response is interpreted by the authors as an indication that the multimedia approach is correct if quality meets minimum requirements, but that combining it with written text may lead to an optimum result.
The third and last question related to goal 6 asks visitors which additional type of information they would like to get from WHYRE®: 46% did not answer this question, 13% answered nothing else, while 40% left interesting and somewhat conflicting indications: some, for example, call for fast-guided tours for non experts, while others ask for additional second-level thematic details. Some are interested in additional information about the city, some in the museum structure or its logistics, some in the exhibits themselves, some in the video-clips music. Two asked for more connections to other museums and to city monuments. One reported her desire to know how long the visit was going to last. All of these indications are well focused; their correct interpretation is considered key in planning WHYRE®'s transition from an experimental to industrial state.
Figure 11 illustrates visitor reaction to the WHYRE® interface:
The WHYRE® interface is context-aware. For example, Figure 11 top is an unsolicited screen telling the visitors that they have entered the Gaspar Van Wittel hall, while Figure 11 bottom shows the screen visitors get when they are in front of Gaspar Van Wittel's beautiful Darsena, just after their decision to focus their attention on the painting itself. On both screens the interface is basically the same: the top left key activates the short first-level presentation of the focused cultural object (the hall first and the painting afterwards, in this example) while the left side keys lead to their Dublin Core record and to deeper thematic presentations if available on the museum server. Even if 50% of viewers would like to get some written instruction on the screen, the questionnaires show that WHYRE® is a friendly device. In fact, over 90% found that operating WHYRE® is easy or very easy (Figure 12), and 90% of those who used the SOS key (60% of the interviewees) found the answers provided by WHYRE® useful or very useful.
At the end, the questionnaire leaves some space for hints and comments. Less than 50% took the time to fill this field. Of these, as already pointed out, 7 suggested making WHYRE® lighter. Two explicitly asked for a quick user guide; one recommended improving WHYRE®'s orientation facilities (Figure 13).
One person found wearing/using WHYRE® uncomfortable, and one person made the point that it may be difficult to concentrate both on the exhibit and on the related multimedia content. This last is an interesting consideration that should affect decisions about the content structure and its information density. Some asked for additional functionalities, such as a personalized memory of the visit as well as some interaction facilities between groups of users. Both of these functionalities are currently under development. Finally, the most encouraging comment was an invitation to promote WHYRE® and extend its service to other Museums.
Set 2 Responses: Selected Responders
Additional testing was carried out by a second set of 17 university students (one man and 16 women) explicitly invited to the Museum to evaluate WHYRE® (Set 2). All of these students have their curricula related to the Cultural Heritage domain: 5 are graduate students in cultural heritage management (1 man and 4 women), while 12 are undergraduate students in museology (all women).
Set 2 provided feedback very similar to Set 1 with the following small differences: WHYRE®'s weight was found slightly less intolerable, as shown in Figure 14 reporting the comments on WHYRE® design provided by this set of young users. Some of the students enjoyed the textual contents more than the audio-video clips; they found WHYRE®'s impact on the visit quality and its potential even higher than Set 1, and the list of their preferred contents was quite a bit longer. In their final comments, they asked for the promotion of WHYRE® also in smaller Museums, and suggested extending its thematic contents as well as including touristic, logistic and commercial information.
Figure 14: WHYRE® design from the students' point of view
The inspection technique adopted in MUSE is MiLE (Milano Lugano Evaluation Method) (Garzotto, 1995; Di Blas, 2002; Paolini, 2002) - a heuristic evaluation method for assessing the usability of multimedia applications. MiLE separates different dimensions of analysis: navigation, interaction, contents, graphics and cognitive features. For each dimension, MiLE provides a set of usability attributes (i.e., usability heuristics that must be evaluated) and procedural guidelines to carry out the inspection activity, called Inspection Tasks. MiLE recognizes that, in addition to generic usability heuristics defined on the basis of general design principles, the inspection of an application must consider a broader spectrum of usability paradigms that are specific to the type of application (e.g., mobile context-aware application) or domain (e.g., museum). MiLE therefore distinguishes between generic and specific attributes and inspection tasks. Specific usability attributes are identified by analyzing user goals, i.e., the expectations and objectives that users want to meet with a given kind of application, possibly in a specific context of use. They define the properties that the application should have in order to satisfy user goals. Inspection tasks (both generic and specific) define activities that an inspector can perform on an application in order to efficiently evaluate the usability attributes and to detect potential usability problems.
So far, MiLE has been adopted in MUSE to inspect the interaction and navigation aspects of the Museum and Charterhouse of San Martino multimedia guide. In the rest of this section, we discuss the evaluation of the specific usability attributes induced by mobile and location-dependent interaction, which represents the most innovative and original contribution of our work. Indeed, there are few works that have explored the usability of mobile context-aware applications (Bontrager, 2003; Bradley, 2002; Chincholle, 2002; Ciavarella, 2003; Fithian, 2003). All of them adopt pure user-testing methods, and no one has attempted to apply systematic inspection techniques.
The specific usability attributes that we consider (identified through the preliminary analysis of MUSE users and goals) are multimodality evidence, synchronicity, fault evidence, and recovery/task resumability. In the rest of this section, we discuss the meaning of these attributes and the Inspection Tasks (ITs) provided by MiLE to help inspectors to evaluate them. We also report the results of the evaluation of these attributes in the MUSE system, as defined during the inspection sessions carried out in the Museum and Charterhouse of San Martino.
For scoring each attribute, the reference range is 0 - 10 where:
As discussed in a previous section, MUSE is multimodal in that it offers various interaction modes. In the pure context-aware mode, museum visitors are dynamically prompted with contextualized multimedia information about the area or room where they are located and the artwork they are looking at. In the guided tour mode, the application guides the users along a path on the site they are visiting, pinpointing the current position on a map and the directions to continue the visit. The Web, or free-exploration mode, is a conventional hyper-textual navigation - users can navigate from the current presentation of the current topic to multimedia details of the same topic or to related subjects. Different modes provide different degrees of user control on the application. In the context-aware mode and the guided tour mode, the application has total control, based on the user's current position. In the Web mode, what is presented on the device is totally under the user's control and is not affected by the current position in the physical space. There are a number of usability issues related to multimodality: Is it evident for the user what the current mode is? Can the user easily understand what can or cannot be done while a specific mode is active? Are the mode transition mechanisms clear?
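The division of control between the three modes can be pictured as a small state machine. The sketch below is an illustration only: `Mode`, `GuideSession` and its methods are hypothetical names, not part of the MUSE software, and ignoring link navigation outside Web mode is a simplification assumed for the example.

```python
from enum import Enum


class Mode(Enum):
    WEB = "web"                      # free navigation, user follows links
    CONTEXT_AWARE = "context-aware"  # content follows the visitor's position
    GUIDED_TOUR = "guided tour"      # application leads along a path


# Modes in which the application, not the user, selects the content.
POSITION_DRIVEN = {Mode.CONTEXT_AWARE, Mode.GUIDED_TOUR}


class GuideSession:
    """Hypothetical session object used only to illustrate mode control."""

    def __init__(self, mode=Mode.CONTEXT_AWARE):
        self.mode = mode
        self.current_topic = None

    def switch_mode(self, new_mode):
        # Users can switch from one mode to another at any point.
        self.mode = new_mode

    def on_position_change(self, topic_at_position):
        # Position updates change the content only in position-driven modes.
        if self.mode in POSITION_DRIVEN:
            self.current_topic = topic_at_position
        return self.current_topic

    def follow_link(self, linked_topic):
        # Link navigation is the Web-mode paradigm; ignored elsewhere
        # (a simplifying assumption of this sketch).
        if self.mode is Mode.WEB:
            self.current_topic = linked_topic
        return self.current_topic
```

In this picture, the usability questions raised above map onto concrete properties: whether `mode` is visible to the user at all times, whether the user can tell which of `on_position_change` and `follow_link` is currently effective, and whether `switch_mode` is easy to find and understand.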
MiLE Inspection Task for Multimodality effectiveness (IT1)
Evaluation results for MUSE: 7
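An inspection result of this kind can be kept as a simple record per usability attribute. The sketch below assumes a hypothetical `AttributeResult` structure; only the attribute names, the IT1 label, the 0 to 10 reference range and the score of 7 come from the text, and attributes not yet scored in this section are left empty rather than guessed.

```python
from dataclasses import dataclass
from typing import Optional

SCORE_MIN, SCORE_MAX = 0, 10  # reference range stated in the text


@dataclass
class AttributeResult:
    attribute: str                         # usability attribute inspected
    inspection_task: Optional[str] = None  # task label, if given in the text
    score: Optional[int] = None            # None until the attribute is scored

    def __post_init__(self):
        # Reject scores outside the 0-10 reference range.
        if self.score is not None and not SCORE_MIN <= self.score <= SCORE_MAX:
            raise ValueError(f"score must be in {SCORE_MIN}..{SCORE_MAX}")


results = [
    AttributeResult("multimodality evidence", "IT1", 7),
    AttributeResult("synchronicity"),
    AttributeResult("fault evidence"),
    AttributeResult("recovery/task resumability"),
]

# Attributes that already have a score recorded.
scored = [r for r in results if r.score is not None]
```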