Evaluating the Practical Applications of Eye Tracking in Museums
Edward Bachta, Robert J. Stein, Silvia Filippini-Fantoni, Tiffany Leason, Indianapolis Museum of Art, USA
Abstract
Funded by an Institute for Museum and Library Services Sparks! Ignition grant, the Indianapolis Museum of Art is exploring whether eye-tracking technology can be useful to museums seeking to better understand how in-gallery visitors actually “see” the objects in our collection. Through a set of three experiments, the project seeks to understand whether eye tracking can be used to measure visitor attention to artworks, to understand the correlation between guided interpretation and visitor comprehension, and to trigger interpretive content delivery. In this paper, the authors review the relevant literature in the field that connects gaze detection and cognition; explain in detail the experimental methodology used in the first experiment to determine the practicality of adopting these techniques in museums; and report on initial conditions and factors discovered during the project’s initial research.
Keywords: eye tracking, gaze tracking, visitor attention, user interface, research
1. Introduction and background
Every year, museums welcome millions of visitors to galleries and exhibitions hoping that many of them will find meaningful experiences there that will help them understand the world in new ways. Museum staff spends enormous amounts of time and effort studying the ways visitors experience works of art in their collections, hoping to increase the visitor’s engagement with them. Research conducted at the Indianapolis Museum of Art (IMA) and at other museums based on Falk’s model (2009) of identity-related motivations for visiting shows that visitors to art museums see them as places for inspiration and contemplation, among other things. However, it’s difficult to get to a more concrete understanding of what aspects of a patron’s visit are inspiring and how museums can actively promote and encourage those experiences without digging deeper.
While some visitors clearly have deep and engaging experiences, research shows that the average visitor spends only seconds in front of a work of art. In Learning in the Museum, Hein (1998) states:
Empirical data supports the view that visitors spend little time at individual exhibit components (often a matter of a few seconds and seldom as much as one minute); seldom read labels; usually stop at less than half the components at an exhibit; are more likely to use trial-and-error methods at interactive exhibits than to read instructions; that children are more likely to engage with interactive exhibits than adults, and that attention to exhibits declines sharply after about half an hour.
In fact, studies of 150 visitors at the Metropolitan Museum of Art (Smith & Smith, 2001) found a mean time of less than 30 seconds to be typical, with most spending significantly less time. Worts (2003) summarizes this behavior as “grazing” and theorizes that the pattern may arise from a mismatch in the goals of curators and visitors:
Audience research across the field commonly reveals the characteristic behavior of ‘grazing’ – or wandering slowly past many artworks, spending only seconds looking at any work in particular. It is relatively rare to watch a visitor spend more than a minute with any individual artwork.
These texts have also motivated IMA’s own examination of viewing patterns in the permanent collection galleries in a multi-year effort called the Viewing Project (http://www.imamuseum.org/art/research/viewing-project), which seeks to encourage active looking to support visitor creativity and engagement, and to present objects from the permanent collection in new ways.
Evaluations from the project’s installations studied in-gallery viewing behaviors and found that “time spent looking” at an artwork resulted in median times between 4 and 31 seconds covering ten different installations. While improvements in engagement have been realized for some of the Viewing Project installations, a quantitative link between looking and engagement remains elusive, and measuring that “time spent looking” is a detailed and time-intensive human process.
Research by Housen (1999) and Yenawine (1997) asserts that deep looking, and the critical thinking that results from facilitated conversation about works of art, results in richer connections and increased engagement with those works. This deep looking is therefore an important skill to encourage in our visitors—but looking at what and in what context? Can understanding what a visitor actually sees when he or she looks at a museum object help us to provide more engaging experiences for that visitor, inform the creation of gallery interpretation, and have implications for the design of those in-gallery experiences?
Potential for eye tracking
Techniques for measuring gaze have been an important part of cognitive psychology and many other fields of study since the early 1960s. Environmental scans by Rayner (1978, 1998) summarize the scope and evolution of research linking eye tracking and cognition. Agreement in the research suggests that gaze and attention are tightly coupled (Hoffman, 1998), implying a direct relationship between how we look at museum objects and our thinking about them. Automated scientific equipment for eye tracking became more widely available in the 1970s but involved complex and expensive hardware and often constrained the user’s head movement. More advanced eye-tracking systems developed later were head-mounted and worn like goggles or glasses. These systems allowed users to move their heads freely and supported a more mobile study of eye tracking. While these systems were an important improvement over immobilizing the user’s head, they still required detailed calibration and required visitors to wear cumbersome equipment.
Research by Wooding (2002) examines the use of eye-tracking systems and art from the collection of the National Gallery in London. While the data seems promising, Wooding’s work focused more on a generalized method for visualizing eye-tracking data and not on specific applications of these techniques for art history or museology. Milekic (2010) published an overview of gaze tracking and its potential applications for museums, highlighting the advent of cheap and commercially available equipment to support the use of these tools in a museum setting.
While still somewhat expensive, eye-tracking technology offers the potential for museums to directly study what our visitors are looking at when they spend time with a work of art. Future eye-tracking technology will likely include software-only systems that will run on common laptops and desktop computers. Several academic software tools already exist that attempt to track gaze in this way. These systems are still largely experimental at this stage and lack the accuracy and ease of use for routine deployment in galleries.
Museums now have the opportunity to explore and model a number of ways in which eye-tracking techniques can be used to improve visitor experience, allowing them to exploit those advances as hardware costs continue to fall and software-based systems become more common. Eye tracking has the potential to transform the ways we understand visual processing in the arts and at the same time offers a direct way of studying several important factors of a museum visit.
Project design
Seeking to explore useful and practical means of applying eye-tracking technology to common problems faced by museums, the IMA proposed a series of three experiments to be conducted as part of a research project. The project’s work is funded in part by a 2010 Sparks! Ignition grant from the Institute for Museum and Library Services.
Staff from the museum’s technology, audience engagement, and media departments collaborated on the design, creation, execution, and evaluation of each experiment. This paper will describe the instrumentation to be used for these experiments and the methodology, analysis, and summary of the first experiment.
Museums have many different ways to measure attendance in our galleries. From hand clickers to beam counters and even thermal cameras, museums are quite sophisticated in how we count feet through the door; however, museums have made little progress towards understanding just what those visitors do once they enter that door. As we learned earlier, museums are already studying the amount of time visitors spend with works in their collections, but these studies require a set of observational rubrics that are labor intensive and prone to human error. The ability to measure the attention of visitors automatically in front of a museum object would be a transformational metric for gallery design, collection management, and interpretive development in museums. As a step in this direction, the goal of the first experiment is to assess whether an eye tracker can be used to accurately measure the amount of time a visitor spends looking at a painting.
2. Instrumentation
In determining which hardware technology to use for our research, head-mounted trackers were ruled out because one of the project goals is to determine whether an eye-tracking system can be used to detect the gaze of a visitor without encumbrance or requiring calibration. Of the models available, the EyeTech Digital Systems VT2 eye tracker was chosen for the experiments (http://www.eyetechds.com/). This device is designed to sit on the included stand or be mounted on the bottom of a display. During operation, the eye tracker emits infrared light toward the viewer and captures a series of images (called frames) with a camera. Each frame is analyzed by the tracker to determine whether a pair of eyes is looking toward the device. When the tracker is calibrated with a reference rectangle, it is able to compute gaze coordinates relative to that area. The software library supplied can report these and other parameters during operation.
Figure 1: The EyeTech Digital Systems VT2
The device was tested prior to planning experiments to determine the design constraints for setup in the gallery. Due to the attenuation of intensity from the infrared emitters and the characteristics of the lens, the tracker has an ideal viewing range that was found to be approximately 25 inches from the front panel. It is also unable to detect the eyes if they are looking too far past the left or right edges of the device (the field of view is about 36 degrees at this distance), or too high above the device.
Unfortunately, it is not reasonable to attempt to detect the gaze at standing height with these limitations, as variations in height would likely place the eyes in a position that cannot be tracked. As a result, it was decided to have participants sit during the experiment to reduce variation in the position of the eyes. It is also important to consider that the area to be viewed must lie within the field of view that can be tracked once the tracker has been set up.
3. Experiment One
After considering the limitations of the device, the objective of this experiment was refined to assessing how accurately the amount of time a seated person’s eyes spend looking at a work of art can be derived by using an eye tracker. In a live deployment, it would be ideal to avoid disrupting a visitor’s normal patterns of viewing the art by requiring calibration, so a key component of the evaluation is to test whether the device performs well when not calibrated for each participant.
Figure 2: Edward Hopper, American (1882–1967), Hotel Lobby, 1943, oil on canvas, 32 1/4 x 40 3/4 in. (image),
40 1/2 x 48 1/2 in. (framed), William Ray Adams Memorial Collection, 47.4 © Edward Hopper.
Edward Hopper’s Hotel Lobby was chosen for this experiment based on the parameters determined during preliminary testing. The eye tracker was set up in the gallery and calibrated once to register the position of the painting. The person for whom this calibration was performed was also a participant in the last session of the experiment.
The experiment consisted of two phases. In phase one, the seat was placed at the distance determined to be ideal based on the preliminary tests and was not moved from that position. In phase two, before beginning the session the location of the seat was adjusted for each participant to test whether results could be improved without tilting the camera. Twenty-two IMA staff participated in the experiment, representing a range of heights, and some wore glasses. Ten people participated in phase one of the experiment, and twelve people participated in phase two.
Figure 3: A session underway in the gallery
Subjects’ standing and seated heights were measured, as well as the distance from the floor to mid-eye level when seated. Over a period of sixty seconds, participants were asked to look within the field of the painting including the frame (referred to here as “in bounds”), then outside it, and were then directed back inside. Simultaneously, two research assistants observed the participants and used stopwatches to manually track the time of gaze in bounds. The two recorded times were then averaged and compared with the time tracked by the device to gauge accuracy. Research assistants also noted any movement by the participants.
In phase two, the procedure was as detailed above, but if the device did not pick up participants’ gazes when first getting into position, participants were asked to move the seat until they could see their eyes reflected back in the device and the system reported that it could detect the eyes fairly consistently. The distance from the device to the new position of the seat was recorded, and the data recording session for the participant began. A Python script was written using the C API provided by EyeTech to record various data reported by the tracker for future analysis. The script produced auditory feedback (which was output only to the computer operator via headphones) and a simple summary at the end of each session so that the process could be monitored.
Some participants from phase two were also asked to take part in a secondary experiment in which the ability of the device to accurately track gaze location when not calibrated for each viewer was assessed. Participants were instructed to look at six different areas within the painting (i.e., the blond woman’s hair, the man behind the desk, the old man’s face, the old woman’s shoes, the old woman’s hat, and the painting on the wall) for 10 seconds each. Tracker data was logged in the same manner as the other sessions.
4. Analysis of results
Gaze-duration measurements
The logs generated during the experiment were first processed using Python scripts to calculate aggregate figures. In this aggregation, it was assumed that the eyes were looking at the location reported for frame n for the duration between frame n-1 and frame n, rather than interpolating. The calculated figures were then compared with the manually recorded data as outlined below.
Quantity of valid gaze data
The first measurement considered was the amount of valid gaze coordinate data that the tracker was able to collect during the session. If the tracker was unable to detect eyes in the frame, it was flagged as being an invalid frame in the reported metadata. To calculate the total period during which valid frames were captured, the sum of the intervals between each frame with valid gaze data and their previous frame was computed (i.e., each frame was taken to represent the interval of time since the previous frame).
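The aggregation described above can be sketched as follows. This is a minimal illustration rather than the project's actual script: the `Frame` record and its field names are hypothetical, and the sample log is made up.

```python
from typing import List, NamedTuple


class Frame(NamedTuple):
    """One tracker sample (hypothetical record layout)."""
    t: float      # capture time in seconds
    valid: bool   # True if the tracker detected eyes in this frame


def valid_gaze_period(frames: List[Frame]) -> float:
    """Sum the intervals that end in a valid frame.

    Each frame is taken to represent the time elapsed since the
    previous frame, so the first frame contributes nothing.
    """
    total = 0.0
    for prev, cur in zip(frames, frames[1:]):
        if cur.valid:
            total += cur.t - prev.t
    return total


# Made-up log: four frames one second apart, the third one invalid.
log = [Frame(0.0, True), Frame(1.0, True), Frame(2.0, False), Frame(3.0, True)]
print(valid_gaze_period(log))  # prints 2.0
```

Attributing each interval to the frame that ends it mirrors the assumption stated earlier that no interpolation is performed between frames.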
Figure 4: Valid gaze data analysis results
The figure above plots the difference between the total valid gaze period described above and the time spent looking in bounds as manually recorded. The results are normalized as a percentage of the session time. Positive results indicate that when instructed to look outside the painting, the participant may have still looked close enough to the painting that their eyes were detected by the tracker. Negative results indicate that invalid frames were reported during a greater amount of time than the period in which the participant was asked to look away from the painting. For reference, the average manually observed time spent looking out of bounds was 23 percent.
Overall, invalid frames surpassed the amount of time spent looking away by 20 percent of the session time for six of the participants. While this was the case during phase one (participants 1-10) for four people (in fact, no valid frames were reported during two of these sessions), an improvement can be seen in phase two (participants 11-22), where the seat position was adjusted.
Comparing the quantity of valid gaze data to seated eye level
The duration represented by invalid frames surpassed the amount of time spent looking away by over 40 percent of the session time for four of the six participants whose eye level was below 50 inches from the ground. The tracker was not able to register any valid data for two of these participants in phase one, and two participants in phase two had difficulty getting into a good position. For seated eye levels between 50 and 53.75 inches (the highest seated eye level in the study), there did not appear to be a correlation between eye level and the quantity of valid data.
Comparison of tracker data and manually recorded data
The amount of time that the participants spent looking in bounds was calculated based on the raw data as described above for measuring the amount of valid data, but only including frames where the tracker data indicated that the gaze was within the outside edge of the frame. This value differed from the average manually observed time by less than 5 percent of the session time for five of the twenty-two participants. For nine participants, the tracker data differed from observation by over 20 percent of the session time.
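As an illustration of this comparison, the sketch below computes the in-bounds time from a frame log and expresses its difference from the observed time as a percentage of the session. It assumes the tracker's normalized coordinates, in which 0 to 100 on each axis spans the calibrated boundary (here, the outside edge of the frame). The record layout, function names, and sample values are hypothetical.

```python
from typing import List, NamedTuple


class Frame(NamedTuple):
    """One tracker sample (hypothetical record layout)."""
    t: float      # capture time in seconds
    valid: bool   # True if the tracker detected eyes
    x: float      # normalized gaze x, 0..100 across the calibrated area
    y: float      # normalized gaze y, 0..100 down the calibrated area


def is_in_bounds(f: Frame) -> bool:
    """A frame counts only if it is valid and lies inside the boundary."""
    return f.valid and 0.0 <= f.x <= 100.0 and 0.0 <= f.y <= 100.0


def in_bounds_time(frames: List[Frame]) -> float:
    """Sum the intervals that end in a valid, in-bounds frame."""
    total = 0.0
    for prev, cur in zip(frames, frames[1:]):
        if is_in_bounds(cur):
            total += cur.t - prev.t
    return total


def difference_percent(tracked_s: float, observed_s: float,
                       session_s: float = 60.0) -> float:
    """Tracker time minus observed time, as a percentage of the session."""
    return 100.0 * (tracked_s - observed_s) / session_s


# Made-up log: one frame out of bounds, one invalid frame reported at (0, 0).
log = [
    Frame(0.0, True, 50.0, 50.0),
    Frame(1.0, True, 50.0, 50.0),
    Frame(2.0, True, 120.0, 50.0),
    Frame(3.0, False, 0.0, 0.0),
    Frame(4.0, True, 10.0, 90.0),
]
tracked = in_bounds_time(log)
print(tracked, difference_percent(tracked, observed_s=5.0))
```

The validity check matters because an invalid frame reported at (0,0) would otherwise fall on the corner of the boundary and be counted as in bounds.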
Figure 5: Results from the raw tracker data
In an attempt to improve this performance, a gap-handling algorithm was implemented that included the period between a valid frame and the next valid frame if the gaze was directed in bounds in both frames, and if the frames occurred within a given period of time. This algorithm was applied to the recorded data with thresholds of 100 milliseconds, 500 milliseconds, and 1 second.
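One possible reading of this gap-handling rule is sketched below. It is an interpretation, not the script used in the project: it keeps the raw per-frame accounting and, when a run of invalid frames separates two in-bounds frames that are no more than `max_gap_s` apart, counts the whole gap as time in bounds. The record layout and names are hypothetical, and the sample log is made up.

```python
from typing import List, NamedTuple, Optional


class Frame(NamedTuple):
    """One tracker sample (hypothetical record layout)."""
    t: float      # capture time in seconds
    valid: bool   # True if the tracker detected eyes
    x: float      # normalized gaze x, 0..100 across the calibrated area
    y: float      # normalized gaze y, 0..100 down the calibrated area


def is_in_bounds(f: Frame) -> bool:
    return f.valid and 0.0 <= f.x <= 100.0 and 0.0 <= f.y <= 100.0


def in_bounds_time_with_gaps(frames: List[Frame], max_gap_s: float) -> float:
    """In-bounds time, bridging short runs of invalid frames."""
    total = 0.0
    prev: Optional[Frame] = None        # immediately preceding frame
    last_valid: Optional[Frame] = None  # most recent valid frame
    for cur in frames:
        if prev is not None and is_in_bounds(cur):
            start = prev.t  # raw rule: interval since the previous frame
            bridged = (
                not prev.valid
                and last_valid is not None
                and is_in_bounds(last_valid)
                and cur.t - last_valid.t <= max_gap_s
            )
            if bridged:
                # Count the whole gap back to the last valid frame.
                start = last_valid.t
            total += cur.t - start
        if cur.valid:
            last_valid = cur
        prev = cur
    return total


# Made-up log with a short gap (0.5 s) and a long gap (3.5 s).
log = [
    Frame(0.0, True, 50.0, 50.0),
    Frame(0.25, False, 0.0, 0.0),
    Frame(0.5, True, 50.0, 50.0),
    Frame(2.0, False, 0.0, 0.0),
    Frame(4.0, True, 50.0, 50.0),
]
for threshold in (0.1, 0.5, 1.0):
    print(threshold, in_bounds_time_with_gaps(log, threshold))
```

With a threshold of zero the function reduces to the raw calculation, which makes it straightforward to compare thresholds on the same log.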
Figure 6: Results from the gap-handling algorithm
The algorithm appears to improve the accuracy if a threshold between 100 and 500 milliseconds is used. With the improvement seen when applying the 500-millisecond threshold, the difference from the manually recorded time comes within 5 percent of the session time for nine of the participants, and within 10 percent for half of the participants.
When participants were asked to adjust the position of the seat in phase two, the tracker produced results that more closely matched observation. An accuracy better than 5 percent was reported for five phase two participants, versus none for phase one. Improvement can be seen with the gap-handling algorithm applied as well.
It can also be seen that the last participant, who calibrated the system, was among those for whom the system was very accurate. This provides confidence that the tracker was not nudged out of alignment during the experiment.
Considering glasses
Ten of the twenty-two participants wore glasses. Of the seven for whom the raw tracker accuracy was within 10 percent, only one wore glasses, and this was the person for whom the device was calibrated. Of the eleven for whom the accuracy was within 10 percent after applying the 500-millisecond threshold, only three wore glasses.
Gaze location measurements
Results for the calibrated viewer
To analyze the data from the secondary experiment, the gaze points from each session were plotted using a scatterplot technique. Shown below is the scatterplot for the participant with whom the system was calibrated:
Figure 7: Gaze points for the calibrated viewer
When the gaze data is valid, the location of the gaze is reported as an x-y pair, normalized such that (0,0) corresponds to the upper left of the calibration boundary and (100,100) corresponds to the lower right. For these experiments, the system was calibrated such that the boundary corresponds to the outside edge of the frame of the painting. The image below shows the scatterplot superimposed on a photograph of the painting, such that the gaze locations appear brighter:
Figure 8: Composited gaze points with calibration
As can be seen, the reported locations match with the actual locations that the participant was asked to look at: the blond woman’s hair, the man behind the desk, the old man’s face, the old woman’s shoes, the old woman’s hat, and the painting on the wall. The distribution of points in each cluster is roughly 5 percent of the width and height of the painting, which at the distance used in the experiment (70 inches from painting to tracker, and about 25 inches from tracker to eye), corresponds to 1.5 degrees of the field of view. As described by Rolfs (2009), the eye tends to microsaccade once or twice per second when fixating on a point, and it is common to use a threshold of 1 to 2 degrees to distinguish between these involuntary eye movements and voluntary saccades. Considering this, the results for the calibrated viewer are about as good as can be expected.
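The conversion from cluster size to visual angle can be checked with a short calculation. The sketch below assumes that the eye, tracker, and painting lie roughly on one line, so the viewing distance is about 70 + 25 = 95 inches, and that the calibrated area is the framed size given in the Figure 2 caption (40 1/2 by 48 1/2 inches). The function name is ours.

```python
import math


def visual_angle_deg(extent_in: float, distance_in: float) -> float:
    """Angle subtended by an extent centered on the line of sight."""
    return math.degrees(2.0 * math.atan(extent_in / (2.0 * distance_in)))


FRAMED_WIDTH_IN = 48.5    # from the Figure 2 caption
FRAMED_HEIGHT_IN = 40.5   # from the Figure 2 caption
# Assumption: eye, tracker, and painting are roughly collinear.
EYE_TO_PAINTING_IN = 70.0 + 25.0

cluster_width_deg = visual_angle_deg(0.05 * FRAMED_WIDTH_IN, EYE_TO_PAINTING_IN)
cluster_height_deg = visual_angle_deg(0.05 * FRAMED_HEIGHT_IN, EYE_TO_PAINTING_IN)
print(round(cluster_width_deg, 2), round(cluster_height_deg, 2))
```

Under these assumptions, 5 percent of the width subtends a little under 1.5 degrees and 5 percent of the height a little over 1.2 degrees, which is consistent with the figure quoted above and with the 1 to 2 degree threshold.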
Results for uncalibrated viewers
Two approaches were used to compare the results for the calibrated participant with the others. The first was a visual comparison using the same scatterplot technique. Shown below is a composite of a typical uncalibrated participant’s results with the photograph of the painting.
Figure 9: Composited gaze points without calibration
The result shows that the horizontal component seems to be fairly accurate, but that the system has perceived the participant to be looking significantly lower than he or she actually was. There were no cases where the gaze was perceived to be higher. There was also some variation in the distribution of the clusters, but because this may be due to the viewer choosing to look fixedly at one point or at the general area described by the instruction, it is not possible to evaluate the precision.
To quantify the accuracy, a second approach was taken. Using the R statistical analysis package, the gaze points from each session were clustered with the k-means algorithm into seven groups (an extra group was required to contain invalid points reported at 0,0). Plots of the output validate that the algorithm identified the appropriate clusters.
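The analysis itself was done in R. For readers who want to see the idea in code, the following is a minimal k-means sketch in Python using only the standard library; it is not the script used in the project. The deterministic farthest-point initialization is our choice for reproducibility, and the sample points are made up, with three groups rather than seven for brevity.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def kmeans(points: List[Point], k: int, iterations: int = 50) -> List[Point]:
    """Lloyd's algorithm with farthest-point initialization."""
    # Start from the first point, then repeatedly add the point that is
    # farthest from all centroids chosen so far.
    centroids: List[Point] = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        updated = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c))
            if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids


# Made-up gaze points: invalid samples at (0, 0) plus two gaze targets.
sample = [(0.0, 0.0), (0.0, 0.0), (19.0, 40.0), (21.0, 40.0),
          (79.0, 60.0), (81.0, 60.0)]
print(sorted(kmeans(sample, k=3)))
```

For a session like those described here, `k` would be 7: six gaze targets plus one group that absorbs the invalid points reported at (0,0).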
Figure 10: K-means clustering
The centroids of the clusters for the uncalibrated participants were then compared with those for the calibrated viewer. Reported across all uncalibrated viewers as a percentage of the respective dimension of the painting, the horizontal component differed by 3.9 percent on average, and the vertical component differed by 24.4 percent on average. There was also variance between people with a similar seated eye level. Interestingly, the range of the vertical component of the cluster centroids also differed significantly between viewers. The range of the calibrated user was [37.9,63.8], or about 26 percent, which corresponds to the proportion of the painting in which the gaze targets lie. The widest range among the other participants was 39.2 percent, while the narrowest was 15.2 percent. This means that it would not be possible to correct for the error in vertical position with only a translation operation (i.e., adding a constant value to the coordinate).
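To make the last point concrete: adding a constant shifts every centroid equally and therefore leaves the range unchanged, so a viewer whose vertical range is 39.2 or 15.2 percent cannot be mapped onto the calibrated interval [37.9,63.8] by an offset alone. At minimum a scale factor would also be needed (roughly 0.66 and 1.7 for those two extremes). The sketch below fits such a scale and offset by least squares. It is an illustration only; the centroid values are hypothetical, and nothing in the data establishes that a linear correction would be adequate in practice.

```python
from typing import List, Tuple


def fit_scale_offset(uncal: List[float], cal: List[float]) -> Tuple[float, float]:
    """Least-squares fit of cal ~ scale * uncal + offset."""
    n = len(uncal)
    mean_u = sum(uncal) / n
    mean_c = sum(cal) / n
    sxx = sum((u - mean_u) ** 2 for u in uncal)
    sxy = sum((u - mean_u) * (c - mean_c) for u, c in zip(uncal, cal))
    scale = sxy / sxx
    offset = mean_c - scale * mean_u
    return scale, offset


def value_range(values: List[float]) -> float:
    return max(values) - min(values)


# Hypothetical vertical centroids (percent of painting height).
uncalibrated = [10.0, 20.0, 30.0]
calibrated = [25.0, 40.0, 55.0]

# A pure translation cannot change the range of the centroids.
shifted = [v + 15.0 for v in uncalibrated]
print(value_range(uncalibrated), value_range(shifted), value_range(calibrated))

# A scale plus an offset can.
scale, offset = fit_scale_offset(uncalibrated, calibrated)
print(scale, offset)
```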
5. Summary of results
From the preliminary tests alone, it is apparent that the device used in the experiment is not capable of tracking the gaze for casual visitors walking through the gallery. The constraints on viewing distance and angle make it unlikely that the data recorded would correlate well with the attention paid to an object by the average visitor.
Analyzing the results of the experiment tells us more about the potential for learning from visitors who may casually sit in front of a tracking system while viewing an object (i.e., without requiring calibration). The results indicate that in general, the tracker was unable to continuously capture gaze location data for the duration of each session. Allowing the viewer to adjust the position of the seat improves the accuracy of computed gaze duration within a boundary, and for some participants the results were quite good. However, it does not appear possible to measure this reliably for all viewers. Even with participants seated, there is still enough variation in height that the eyes may not be in a good position for tracking, and it is likely that the frames of glasses interfere with the ability of the tracker to detect the eyes.
From these results, it would seem that perhaps by giving the viewer more instructions to pay attention to his or her position while adjusting the height of the seat, it would be possible to improve the performance of the system. However, the gaze location results show that without being calibrated, the eye tracker would not be able to determine whether the gaze has crossed the upper or lower boundary of the physical target area. Because the horizontal measurement does appear to be accurate, it may be interesting to consider whether this is an issue in the gallery setting, as it is not common to place paintings above and below each other.
In conclusion, we find that it is not possible to compute gaze duration accurately within a specific zone without calibrating the tracker for each viewer, and it may not be possible even if calibrated unless an algorithm is applied to handle gaps in the data. Analyzing the results of the secondary experiment has shown that the tracker was able to report gaze location accurately in a gallery setting for the viewer who performed calibration. For uncalibrated viewers, the system was only accurate in tracking the horizontal component of gaze location.
6. Future work
To prepare for the second experiment proposed in the grant, a phase will be included in which each participant will first calibrate the system and then look at a set of points. The procedure will be similar to the gaze-location experiment described earlier, but more precisely identified points will be used. The results of this phase will be analyzed both to compare with the results of the first experiment and to verify that the gaze location data will be accurate for each participant in the main phase of the experiment. If the reported gaze coordinates are still found to be unreliable after calibrating for each person, the plan for experiment two will be modified.
In the main phase of the second experiment, the eye-tracking equipment will be used to monitor a user’s gaze during a typical Visual Thinking Strategies (VTS) session facilitated by an IMA staff member. Using the VTS method, educators regularly engage groups in interactive discussions seeking to draw out visitor thoughts regarding a work of art. While such discussions often provide unique insight into an individual’s thoughts about a work of art, direct measurements of the connection between viewing and thinking are often difficult and subjective.
A video recording of the session will be made and synchronized with the data stream from the eye-tracking hardware, allowing museum staff to examine the connection between gaze and response. The experiment may reveal a valuable new technique to study the ways in which visitors with varying levels of art experience approach artworks in museum collections.
7. Acknowledgements
We would like to thank the representatives from EyeTech who informed us about the capabilities of the tracker and spent a day familiarizing our team with the system. We would also like to acknowledge the researchers and engineers who have brought eye-tracking technology to the point where cultural institutions can begin to consider its use in evaluation.
8. References
Falk, J.H. (2009). Identity and the Museum Visitor Experience. Walnut Creek, California, USA: Left Coast Press.
Hein, G.E. (1998). Learning in the Museum. London: Routledge, 138.
Hoffman, J.E. (1998). “Visual attention and eye movements.” In H. Pashler (ed.), Attention. Hove, UK: Psychology Press, 119–154.
Housen, A. (1999). “Eye of the Beholder: Research, Theory, and Practice.” Presented at the conference of Aesthetic and Art Education: a Transdisciplinary Approach. September 27–29, 1999, Lisbon, Portugal. Consulted January 20, 2012. Available at: http://www.vtshome.org/system/resources/0000/0006/Eye_of_the_Beholder.pdf
Milekic, S. (2010). “Gaze-Tracking and Museums: Current Research and Implications.” In J. Trant & D. Bearman (eds.), Museums and the Web 2010: Proceedings. Toronto: Archives & Museum Informatics. Consulted September 28, 2011. Available at: http://www.archimuse.com/mw2010/papers/milekic/milekic.html
Rayner, K. (1978). “Eye movements in reading and information processing.” Psychological Bulletin. 85(3).
Rayner, K. (1998). “Eye movements in reading and information processing: 20 years of research.” Psychological Bulletin. 124(3).
Rolfs, M. (2009). “Microsaccades: Small steps on a long way.” Vision Research, 49. Elsevier Ltd., 2415–2441. Consulted January 29, 2012. Available at: http://www.martinrolfs.de/Rolfs_MicrosaccadeReview.pdf
Smith, J.K., & Smith, L.F. (2001). “Spending Time on Art.” Empirical Studies of the Arts, 19(2).
Wooding, D. (2002). “Fixation maps: quantifying eye-movement traces.” In Proceedings of the 2002 symposium on Eye tracking research & applications (ETRA '02). New York, NY, USA: ACM, 31–36.
Worts, D. (2003). “On the Brink of Irrelevance? Art Museums in Contemporary Society.” In L. Tickle, V. Sekules, & M. Xanthoudaki (eds.), Researching Visual Arts Education in Museums and Galleries: An International Reader. Dordrecht, Netherlands: Kluwer Academic Publishers.
Yenawine, P. (1997). “Thoughts on Visual Literacy.” In J. Flood, S.B. Heath, & D. Lapp (eds.), Handbook of Research and Teaching Literacy through the Communicative and Visual Arts. Consulted January 20, 2012. Available at: http://www.vtshome.org/system/resources/0000/0005/Thoughts_Visual_Literacy.pdf