Published: March 1999.
Meta-Data Resource Discovery and Educational Information on the Internet: The Gateway to Educational Materials (GEM) Project
Carrie Lowe, ERIC Clearinghouse on Information & Technology, USA
Introduction
The growth of the Internet, and specifically, the World Wide Web (web), has provided users with access to an unprecedented amount and variety of information. Twenty-four hours a day, seven days a week, anyone with a computer, dial-up access, and a certain amount of searching savvy can access information on virtually any topic.
Just as the Internet has increased access to information, it has also created a publishing explosion. The production and publication of Internet resources is not limited to publishing houses or even commercial elements. It is now possible for any individual with server space allotted by his or her ISP to create and maintain Internet resources on any topic.
At best, the Internet in its current form is the world's greatest library, providing a wealth of information on any topic to any person with just a few keystrokes at any time, day or night. At worst, the Internet is a confused mess of worthless resources wasting server space and quality resources that are impossible to find because available search technologies are inadequate. The truth lies somewhere in between.
The Gateway to Educational Materials (GEM) project is solving this resource discovery problem using meta-data. GEM allows educators to bypass frustrating search engines and connect directly with a wide range of educational resources through The Gateway. 1999 marks the beginning of the third year of the GEM project, and the second year that The Gateway has been accessible to users through the web.
Meta-data is often cited as the technological development which may someday help information professionals to make sense of the Internet. GEM provides an example of the challenges and successes presented by a wide-scale meta-data effort; challenges for the project's developers, the collection holders, and the end-users.
The Gateway to Educational Materials (GEM) Project
The GEM project, a consortium effort funded by the U.S. Department of Education's National Library of Education and a special project of the ERIC Clearinghouse on Information & Technology (which also runs the AskERIC project, a well-known question and answer service for educators), seeks to make finding educational materials on the Internet faster and easier. The GEM Consortium includes both user groups and collection holders.
The GEM Project has several components. The term GEM (www.geminfo.org) refers to the project itself, including all of its parts. The Gateway (www.thegateway.org) is the catalog of educational resources, the final product of the GEM project (see Figure 1). The GEM Profile refers to the meta-data elements which have been chosen to describe the resources in The Gateway. GEMCat is the software developed by the GEM research team, which is used to create and imbed meta-data elements. Each of these terms will be explained and explored in more detail below.
GEM's goal is to provide easier access to the distributed collections of lesson plans, activities, and other education resources found on commercial, nonprofit and government Internet sites. Organizations join GEM as collection holders or user groups. Collection holders are trained to "catalog" their collections (create and imbed meta-data), and are responsible for maintaining their collections and publicizing the project. User groups are also responsible for publicity, and provide the GEM team with feedback to help us improve the tools. It is important to note, however, that it is not necessary to join the GEM consortium to use The Gateway to access educational materials. Consortium membership is only for user groups who would like to take part in governing the project and for collection holders who would like to make their collections available through The Gateway.
User groups and collection holders benefit from joining the GEM consortium in several ways. For user groups, making members aware of The Gateway means helping them in their quest to pinpoint the types of educational materials they seek. User groups are also able to participate in governance activities to influence the development of GEM. For collection holders who make their educational materials available through The Gateway, this means their target audience is able to find their resources more easily. This is especially true for organizations with small collections of educational resources which may not be well publicized, such as those belonging to museums.
There are several technologies in the GEM toolbox which have contributed to its success (Figure 2). The first is GEMCat, the GEM meta-data cataloging software. With GEMCat, catalogers create meta-data records (the individual packets of meta-data which describe and point to objects, something like a catalog card) for Internet resources. The meta-data is imbedded in the resource itself, or saved to a separate file if necessary. GEMCat is available free of charge from the GEM Developers' Workbench; support and training materials are also available.
Whether or not GEM meta-data can be imbedded directly into a resource depends largely on what type of resource it is. If the resource is an HTML document, then imbedding is no problem, since meta-data tags can be put into the document's header without altering the resource. However, if it is a visual or sound resource, then it may not be possible to imbed meta-data without changing it in some way. In this case, it is necessary to create a separate meta-data record which points to the resource.
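To make the HTML case concrete, here is a minimal sketch in Python of placing meta-data tags in a document's header while leaving the body untouched. It is not GEMCat's actual output: the function names and the element names `DC.Title` and `GEM.Grade` are illustrative assumptions, not taken from the GEM specification.

```python
from html import escape


def build_meta_tags(record):
    """Render a dict of element name -> value as HTML <meta> tags."""
    return "\n".join(
        f'<meta name="{escape(name, quote=True)}" content="{escape(value, quote=True)}">'
        for name, value in record.items()
    )


def embed_in_head(html_text, record):
    """Insert the meta tags just before </head>; the body is left unchanged."""
    position = html_text.lower().find("</head>")
    if position == -1:
        # No header to write into: a separate record file would be needed instead.
        raise ValueError("document has no </head> tag")
    return html_text[:position] + build_meta_tags(record) + "\n" + html_text[position:]


page = "<html><head><title>Volcano Lesson</title></head><body><p>Lesson text</p></body></html>"
# Hypothetical element names, used here for illustration only.
record = {"DC.Title": "Volcano Lesson", "GEM.Grade": "5"}
tagged = embed_in_head(page, record)
```

A resource with no HTML header, such as an image or sound file, would take the error branch here, which corresponds to the case where a separate meta-data record is created.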
After the meta-data is imbedded or saved to a separate file, the Harvest program is run to gather and compile each meta-data record at a particular site and add the list to The Gateway. This step in the process is essential; Harvest makes a copy only of the meta-data record, not of the resource (lesson plan, curriculum unit) itself. Harvest collects all of the GEM records together at a central location, forming The Gateway.
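The gathering step can be sketched as follows. This is not the Harvest software itself, only an illustration in Python of the idea that the meta-data is copied and the resource is not. The class name, function name, URLs, and element names are all hypothetical.

```python
from html.parser import HTMLParser


class MetaRecordParser(HTMLParser):
    """Collects name/content pairs from <meta> tags and ignores everything else."""

    def __init__(self):
        super().__init__()
        self.record = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content is not None:
            self.record[name] = content


def gather_records(pages):
    """pages maps URL -> HTML text; returns meta-data records only, never the page body."""
    records = []
    for url, html_text in pages.items():
        parser = MetaRecordParser()
        parser.feed(html_text)
        if parser.record:
            # Keep a pointer to the resource plus its meta-data.
            records.append({"url": url, **parser.record})
    return records


# Hypothetical site content for illustration.
site_pages = {
    "http://example.org/lesson1.html": (
        '<html><head><meta name="DC.Title" content="Volcano Lesson"></head>'
        "<body>Long lesson text</body></html>"
    ),
    "http://example.org/about.html": "<html><head></head><body>No metadata here</body></html>",
}
gateway_records = gather_records(site_pages)
```

Pages without meta-data tags contribute nothing to the collected list, and the lesson text itself never leaves the collection holder's site.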
The third GEM tool is Browse Builder. Browse Builder uses the meta-data records collected by the Harvest program to create simple HTML pages, which are the individual records in The Gateway (Figure 3). Users of The Gateway can search for educational materials to meet their needs using the PL Web searching software, or they can browse The Gateway by keyword or subject.
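As a rough illustration of turning collected records into simple HTML pages and a browse list, consider the Python sketch below. It is not Browse Builder's actual code. The record fields (`url`, `title`, `subject`) and the function names are assumptions made for the example.

```python
from collections import defaultdict
from html import escape


def render_record_page(record):
    """Render one meta-data record as a simple HTML page that links to the resource."""
    rows = "\n".join(
        f"<dt>{escape(name)}</dt><dd>{escape(value)}</dd>"
        for name, value in record.items()
        if name != "url"
    )
    link = escape(record["url"], quote=True)
    return f'<html><body><dl>\n{rows}\n</dl><a href="{link}">Go to resource</a></body></html>'


def group_by_subject(records):
    """Build a browse index: subject -> alphabetical list of titles."""
    index = defaultdict(list)
    for record in records:
        index[record.get("subject", "Unclassified")].append(record["title"])
    return {subject: sorted(titles) for subject, titles in sorted(index.items())}


# Hypothetical records for illustration.
records = [
    {"url": "http://example.org/volcano.html", "title": "Volcano Lesson", "subject": "Science"},
    {"url": "http://example.org/fractions.html", "title": "Fractions Unit", "subject": "Mathematics"},
    {"url": "http://example.org/rocks.html", "title": "Rock Cycle", "subject": "Science"},
]
browse_index = group_by_subject(records)
first_page = render_record_page(records[0])
```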
One aspect of GEM that sets it apart from similar projects is the emphasis on training and support. The GEM team includes several training specialists who conduct training sessions with the catalogers of collection holders to help them prepare to catalog efficiently using GEMCat. GEM has also prepared training materials which are available online. Users of The Gateway and collection holders have access to a toll-free support line, as well, which they can use to get help with any part of the GEM Project.
The Importance of Meta-data
Meta-data is structured information that describes, manages, and organizes Internet resources. The easiest way to understand this concept is to think about a library catalog card. The card describes a resource (it may be a book, videotape, or CD), listing its title, author, location, and other information. Catalog cards can then be filed, and used to locate resources throughout the library. Meta-data works much the same way. It can be used to describe an object on the Internet (such as an HTML document), and much like in a card catalog, can be used to pinpoint information on the Internet (Lowe, 1999).
Not all meta-data is the same. Meta-data description lies upon a continuum, ranging from the very detailed to the simple and bare. One example of highly detailed meta-data is MARC (MAchine-Readable Cataloging). MARC allows catalogers to create a complex and complete portrait of an item. Cataloging in MARC requires trained professionals who are familiar with the scheme's standards and syntax. Miller (1996) describes MARC as "expert" meta-data, requiring special knowledge to create and use.
Meta-data can also be extremely simple. Many Internet search engines, including the extremely popular AltaVista, are able to read meta-data tags that some web authors include in their resources. These meta-data tags, which are imbedded into the header of the document, give only a partial portrait of the document they describe. Many search engines recognize the KEYWORDS meta-data tag, and can use these keywords to rank a search.
In order to organize the vast Internet, a new approach to meta-data was needed. Clearly, the overly simplistic approach to cataloging taken by many search engines would not give a complete picture of the document being described. Conversely, it is unreasonable to expect web authors untrained in cataloging to use a complex approach like MARC. What was needed was a meta-data cataloging scheme which would allow untrained catalogers to create useful cataloging records for an endless variety of documents. Besides being a more useful tool for authors and catalogers of web documents, a new meta-data approach would also need to provide enough information to be useful for people searching for information; it would need to allow them to access enough information about a document to make an informed decision about whether or not they should retrieve the item.
The solution to the problem of meta-data on the Internet arrived in the form of a fifteen-element set known as Dublin Core. Says Clarke (1997), "The Dublin Core's purpose is to enable searching in a more sophisticated manner than mere free-text indexing and search engines can support, without requiring professional cataloging effort to be invested." Dublin Core was created at the 1995 OCLC/NCSA Metadata Workshop by a group of interested stakeholders, and subsequent Dublin Core meetings have refined and further developed the element set.
The Dublin Core was designed to be useful for "document like objects" from nearly any subject area (Heery, 1996). The Dublin Core allows for materials of a range of formats (PDF files, sound files, HTML documents); it is also general enough to accommodate nearly any intellectual topic. The Dublin Core is:
The Dublin Core itself is too general to describe resources in many subject areas, including education, in a useful way. Miller (1996) writes:
The [fifteen] elements of the Dublin Core are not capable of describing all eventualities. If the core element set were extended to attempt this, it would rapidly become large and unwieldy, and ultimately one of the incomprehensibly complex meta-data schemes that Dublin Core was created to avoid.
This represented a problem for the creators of Dublin Core. They needed to find a way to allow Dublin Core to adequately describe resources in a variety of subject areas, while retaining its essential simplicity.
The solution to this problem came in the form of the Canberra Qualifiers, released after a subsequent Dublin Core meeting. The Canberra Qualifiers allow other elements to be added to the core elements, and its existing elements to be expanded to better describe information resources from various subject areas (Sutton, in press). The GEM Project utilized the flexibility added to the Dublin Core through the Canberra Qualifiers in the creation of its element set.
GEM added eight meta-data elements to the Dublin Core which are tailored to the information needs of teachers. The elements include description elements, evaluative elements, and a meta-meta-data element (Sutton, in press).
The five descriptive elements include information which is important to educators in evaluating educational resources, but is not included in Dublin Core. The descriptive elements are:
By combining the Dublin Core element set and the GEM elements, a tool is created which allows catalogers to create a complete description of an educational resource on the Internet - the GEM meta-data profile.
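One way to picture a profile built from a core set plus domain-specific additions is as the union of two sets of element names. In the Python sketch below, the fifteen Dublin Core element names are the standard ones. The two extension names, by contrast, are hypothetical stand-ins and should not be read as the actual list of GEM elements.

```python
# The fifteen standard Dublin Core element names.
DUBLIN_CORE = frozenset({
    "Title", "Creator", "Subject", "Description", "Publisher",
    "Contributor", "Date", "Type", "Format", "Identifier",
    "Source", "Language", "Relation", "Coverage", "Rights",
})

# Hypothetical stand-ins for domain-specific additions (NOT the real GEM list).
EXTENSION_EXAMPLE = frozenset({"Grade", "Audience"})

# A profile is the core set plus the extension.
PROFILE = DUBLIN_CORE | EXTENSION_EXAMPLE


def unknown_elements(record, profile=PROFILE):
    """Return the element names in a record that the profile does not define."""
    return sorted(set(record) - profile)


sample = {"Title": "Volcano Lesson", "Grade": "5", "Mood": "cheerful"}
problems = unknown_elements(sample)
```

Because the core set is left unchanged, a record that uses only Dublin Core elements remains valid under the extended profile.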
Research Foundations of GEM
From its genesis, GEM has been a project grounded in research. The GEM elements that were added to the Dublin Core element set were chosen as a result of research conducted by the GEM team.
In order to create a useful Internet resource for educators, several basic questions needed to be answered. These were (Small et al., in press):
To explore the question of what types of educational resources are available on the Internet, the research team began with a sample of 95 teaching resources. A content analysis was then performed on these resources to identify the types of educational materials found on the Internet, and the information elements they contain.
Despite the small sample size, the results of this study gave the researchers a general idea of the scope of educational materials on the Internet. The researchers found that 76% of the teaching resources were lesson plans, 23% of the resources were classified as unit plans, and 1% were activities. Although the researchers realized that this sample did not represent the full scope of educational materials on the Internet, lesson plans appear to be one of the most common types of instructional materials available. This gave the researchers an idea of how to describe the types of educational materials available on the Internet.
Identifying the information elements of the sample of Internet-based education resources was an important part of the development of the GEM project. Before it is possible to classify resources by the information contained within them, the kinds of information present within them must be known. This content analysis allowed the researchers to get a picture of the types of information found in educational resources on a wider scale.
To explore the information needs of teachers and the ways they satisfy those needs, the researchers turned to one of the most widely used sources of educational information - the AskERIC question and answer service for teachers. AskERIC is the preeminent source of educational information on the Internet; on average, it receives over 900 questions per week. AskERIC provided the researchers with a way to unobtrusively observe the information-seeking behaviors of teachers.
The AskERIC service receives questions via E-mail from teachers looking for educational information. The researchers performed a content analysis of 161 AskERIC questions to get an idea of the information needs of K-12 educators. The research revealed that the queries posted by teachers contained two different types of information: asked-for information and known information (Small et al., in press). The content analysis revealed that the most frequently asked-for information is resource type, with lesson plans being the most frequently sought in that category (33% of the time). The most frequently known elements were subject area, grade, and topic. Grade appeared as both a range and as a specific level. This information was extremely significant to the GEM research team, since it informed them that, to meet the information needs of teachers, GEM would have to provide access to lesson plans.
The way that users of the AskERIC service formulate their questions (the pieces of information that they consider important in meeting their query) gives an indication of the way that they search for information in general. Teachers feel that the most important pieces of information they have in satisfying their need are resource type, subject area, grade, and topic. This information contributed directly to the creation of the GEM meta-data elements.
The GEM team also gathered information by interviewing educators. These consultations occurred in two phases: first, a series of semi-structured interviews, and second, an on-line questionnaire. Five educators were interviewed about their information seeking strategies. They were asked about the ways that they search the Internet, the resources that they rely upon on-line, the elements within resources which they consider to be most important, and whether the list of elements produced by the content analysis is complete.
The results of the interviews were also used to develop an online questionnaire for AskERIC users. The questionnaire consisted of 32 items, with 21 multiple choice and Likert scales (questions that ask subjects to evaluate an item on a sliding scale) and 11 open-ended questions. The questions covered many of the same topics as the interviews and content analyses: demographic information, Internet access, preferred sources of information for planning curriculum, information search processes, and important elements of lesson-planning resources.
The questionnaire was emailed to a sample of people who had used the AskERIC service in a particular three-month period. AskERIC users were chosen because they are likely to be involved in some way with K-12 education, and are also somewhat familiar with technology. Of the 2,135 people questioned, 260 returned questionnaires that the research team judged to be valid (23 were found to be invalid because the respondent was not a K-12 educator).
The results of the questionnaire create an interesting portrait of educators in the information age. Teachers reported that they most often used non-human channels (databases, textbooks, Internet resources, magazines) for lesson planning, followed by human channels (librarians, colleagues, family and friends). Teachers rely on group forums (bulletin boards, listservs, workshops) less often than other sources. It is important to remember that these results may not give an accurate picture of the use of technology by K-12 teachers, since the recipients of this questionnaire were chosen through their use of AskERIC, an online service.
The majority of the respondents reported experiencing a moderate degree of difficulty in the Internet search process. Math/science and social studies instructors experience the greatest ease in their search experiences, while pre-K teachers have the most trouble. Teachers in all subject areas were able to find at least some of what they were looking for on the Internet, with social studies teachers experiencing the greatest success.
The final section of the questionnaire asked teachers for feedback on the meta-data elements selected through the content analysis. Respondents replied that the elements they consider to be most important when they are evaluating a resource are topic and subject, with author description and publisher considered the least important (Small et al., in press). The researchers used the ranking of meta-data elements as a guideline in creating the GEM meta-data profile. Many of the elements in the list correspond to Dublin Core elements; those that do not became the GEM elements which were added to the Dublin Core to enable it to better describe educational materials.
Besides being the guiding force behind the creation of the GEM meta-data profile, the preliminary research also helped the GEM team to design user-friendly interfaces. The Gateway currently has one simple search interface; through the research, the GEM team determined that it is most important to allow users to search using free-text terms and grade level. The simple search interface has so far produced satisfactory results. In addition to the simple search interface, The Gateway also allows users to browse the collection by keyword and subject.
Progress of the GEM Project
Fall 1997 marked the completion of year one for GEM. The emphasis in year one was on developing the architecture of the project and the tools necessary to implement it. The effort that was concentrated on creating a firm foundation for GEM paid off - on January 30, 1998, The Gateway was made freely available on the World Wide Web.
Year two began with a meeting of GEM Consortium members in Washington, D.C. in February, 1998. The consortium defined collection-building as a goal for the next year of the project. They also adopted a governance document to guide the GEM Project.
Adding resources to The Gateway requires contacting possible consortium members, training them to catalog their resources, and adding the new GEM records to The Gateway using the Harvest program. The Gateway began in January of 1998 with 700 resources. Currently, The Gateway contains over 3700 resources, and is updated weekly with new records.
Challenges Facing GEM
GEM has experienced great success in its first few years of existence. Through its consortium, it has created a meta-data framework for describing educational materials on the Internet. It then created the technical tools necessary to create and collect GEM meta-data. Most recently, GEM has created what is becoming one of the most popular websites for teachers looking for educational materials on the Internet.
In order to encourage such rapid growth, GEM has helped its new members with cataloging by cataloging a portion of their resources for them. This allows the collection holders to see their resources in The Gateway before deciding to invest the time and effort into doing the cataloging themselves. Obviously, this is not a long-term solution. As The Gateway grows and GEM meta-data is more widely adopted and used, it will become impossible for GEM to catalog a portion of every collection holder's resources.
For future collection building GEM will rely on collection holders to do the majority of the meta-data cataloging. For collections held in a dynamic database, this is a fairly easy process; the database fields must be mapped to the GEM elements, and a script can be written which will allow the database to output GEM meta-data. For collections of static HTML pages, the process is more complex. A person, using the GEMCat cataloging module, must create each individual record.
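For the database case, the mapping can be sketched in a few lines of Python. The column names, the element names (`DC.Title`, `DC.Creator`, `GEM.Grade`), and the function name are hypothetical; a real collection would substitute its own schema and the element names defined by the GEM profile.

```python
# Hypothetical mapping from a collection's database columns to element names.
FIELD_MAP = {
    "lesson_name": "DC.Title",
    "written_by": "DC.Creator",
    "grade_level": "GEM.Grade",
}


def row_to_record(row, field_map=FIELD_MAP):
    """Convert one database row (a dict) into a meta-data record.

    Columns that are unmapped, missing, or empty are skipped.
    """
    return {
        element: str(row[column])
        for column, element in field_map.items()
        if row.get(column) not in (None, "")
    }


# One illustrative row; "internal_id" has no mapping and is therefore dropped.
row = {"lesson_name": "Volcano Lesson", "written_by": "", "grade_level": 5, "internal_id": 17}
record = row_to_record(row)
```

Once the mapping is written, every row in the database can be converted the same way, which is why this route takes far less effort than cataloging static pages one at a time.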
There has been a great deal of discussion about whether or not this is a realistic expectation. Thomas and Griffin (1998) feel that many organizations (particularly commercial organizations) will, in fact, find disincentives to meta-data cataloging. These disincentives spring from the fact that meta-data cataloging can be time consuming, and may not create immediate financial benefit. The failure of organizations to embrace and implement meta-data, they argue, will make the Internet increasingly difficult to navigate.
In the traditional library model, information creators (authors) are a separate entity from information describers (catalogers). The meta-data model changes this, as most meta-data efforts assign responsibility for cataloging to web page authors. Thomas and Griffin (1998) feel that information creators may balk at the idea of becoming information describers. This is another challenge that the GEM project faces.
GEM creates solutions for these problems through ease-of-use. The GEMCat cataloging module is simple and can be programmed to automatically create information repeated in several records (such as Cataloging Agency or Creator). Additionally, GEM offers training materials and assistance in training. Finally, GEM technical support is available to all collection holders and users.
While the importance of meta-data for resource discovery is an inarguable fact, GEM cannot force collections of educational materials to implement GEM meta-data. Organizations that do choose to become a part of the GEM project have an opportunity to make their resources available through a service which is rapidly becoming the first stop on the Internet for educators searching for educational resources.
User Response
The response of teachers to GEM has been overwhelmingly positive. Audiences of educators have received the project very warmly, and they particularly value the ability to search the collection by standard. Teachers have become one of the project's greatest sources of publicity, informing their colleagues of the great collection of resources available to them on the Internet.
Virtual response has been equally positive. The installation of a feedback form on The Gateway has allowed users to communicate directly with the project, and the messages have been extremely positive. One user wrote:
I feel I have found one of the most important sites on the web. Thank you very much for providing such a gold mine of information! Well, I thought I was going to get some sleep tonight; however, by the look in my eyes I'm going to be on the web checking your sight out for a little while longer.
Working in a web environment gives little opportunity for contact with users. GEM is proud to have created a service valued enough to compel users to submit positive feedback.
The feedback form also provides a mechanism for users to ask questions about the project. Most users are unfamiliar with the idea of seeing a bibliographic record on the Internet, such as the GEM record. Users are occasionally confused about how to link to the resource itself from the GEM record. The GEM staff responds to every query, working with users to ensure that they understand how to search The Gateway and connect to resources.
Conclusions
In the din of many conversations about meta-data on the Internet, the name Dublin Core is heard over and over again. Dublin Core provides the kind of flexible, extensible framework necessary to describe the broad range of information resources on the Internet. The Gateway to Educational Materials project provides a working example of the promise that meta-data holds - it makes thousands of educational resources from many different collections available and easily retrievable. GEM proves that meta-data - Dublin Core meta-data - can deliver on its promise to make document retrieval easier and more efficient for users.
Meta-data promises to be the technology capable of untangling an Internet that many users find bewildering. Whether or not organizations and individuals producing resources choose to employ meta-data to make their documents more retrievable is their own choice; some doubt the ease of convincing them to do so, but most agree that meta-data is necessary to make the web usable.
In the next year, the World Wide Web will experience another period of exponential growth. As the number of Web resources increases, it will be increasingly difficult for users to find what they need. GEM will also continue to grow over the next year; we expect our resources to double in year three. The strengths of meta-data will ensure that it will remain easy to find resources in The Gateway.
References
Clarke, R. (1997). Beyond the Dublin Core: Rich Meta-Data and Convenience-of-Use Are Compatible After All. Consulted January 5, 1999. Available: http://www.anu.edu.au/people/
Heery, R. (1996). "Review of Metadata Formats." Program 30(4): 345-373.
Lowe, C. (in press). "The Gateway to Educational Materials: Meeting the Needs of Teachers in the Information Age". Meridian: A Middle School Computer Technology Journal. Available: http://www.ncsu.edu/meridian/jan99/gem/index.html.
Miller, P. (1996). Metadata for the Masses. Consulted January 6, 1999. Available: http://www.ariadne.ac.uk/issue5/metadata-masses/.
Small, R. et al. (in press). "Information-Seeking for Instructional Design: An Exploratory Study." Journal of Research on Computing in Education.
Sutton, S. (in press). "Conceptual Design and Deployment of a Metadata Framework for Educational Resources on the Internet." Journal of the American Society for Information Science.
Thomas, C. and Griffin, L. (1998). Who Will Create the Metadata for the Internet? Consulted January 5, 1999. Available: http://www.firstmonday.dk/issues