
Current Approach for Searching the Portal

Currently, the Museumsportal Berlin provides its visitors with a simple keyword-based search that finds web pages presenting museums, exhibitions, or events in which the specified term appears either in the textual description or among the tags associated with a page (see Figure 1). The advantage of this approach lies in its simplicity and in the fact that users are familiar with this kind of search. However, the main drawback of keyword search in general is that results are obtained merely on the basis of a syntactic match (i.e. the exact occurrence of a given term). This problem is especially evident in cases of:
  • misspelling
  • alternative spelling (e.g. Sandro Botticelli vs. Il Botticello)
  • aliases (Alessandro di Mariano di Vanni Filipepi known as Botticelli)
  • synonyms (words having the same meaning, e.g. fix and repair)
  • homonyms (the same word having different meanings, e.g. bank meaning either a river bank or a financial institution)
Those cases may lead to a situation where the museums of interest to the user are not found, even though they exist in the portal's database, simply because the searched keyword does not exactly match the words used in the museums' descriptions.
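The effect of purely syntactic matching can be illustrated with a minimal Python sketch. The page ids and descriptions below are invented for illustration and are not real portal data; `keyword_search` is a hypothetical helper, not the portal's actual search code.

```python
# Toy corpus; ids and descriptions are illustrative, not real portal data.
PAGES = {
    "museum-a": "Paintings by Sandro Botticelli and his workshop.",
    "museum-b": "Early works attributed to Il Botticello.",
    "museum-c": "Drawings by Alessandro di Mariano di Vanni Filipepi.",
}


def keyword_search(term: str, pages: dict[str, str]) -> list[str]:
    """Return ids of pages whose text contains the term (case-insensitive)."""
    needle = term.lower()
    return sorted(pid for pid, text in pages.items() if needle in text.lower())


# All three pages concern the same artist, but only one matches syntactically.
print(keyword_search("Botticelli", PAGES))
# A misspelled query matches nothing at all.
print(keyword_search("Boticelli", PAGES))
```

In this sketch the pages using the alternative spelling and the alias are missed, although they describe the same artist.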
Figure 1: Webpage describing a museum.
Another problem arises from the fact that the search engine does not "understand" the meaning (semantics) of the search keyword and therefore cannot relate it to other terms that might also yield valid results. Consider the following example: users of the Museumsportal Berlin who are looking for museums related to "impressionism" and merely search for this particular keyword will find only two entries. There are, however, many more museums presenting paintings by impressionist artists, for example pieces by Claude Monet, Max Liebermann, or Karl Hagemeister (see Figure 2). Unless the user performs multiple iterative searches for all those related terms, a tedious task that requires some knowledge of the arts domain, many museums of interest will not be found.
Figure 2: Example of a search expansion. The search for "impressionism" is expanded into multiple searches for artists belonging to this art movement, like Claude Monet, Max Liebermann and Karl Hagemeister. The blue boxes represent additional search resul
Moreover, keyword search proves rather inefficient if users introduce additional constraints into their queries. For example, if someone is looking for museums or events related to "impressionism" that are open on Tuesday, charge an entrance fee of less than 10€, and offer audio guidance in English, the museums of interest can hardly be found by a simple enumeration of keywords. Instead, a mixed approach of searching and navigation is required (as depicted in Figure 3). First, the user has to perform a query for the key concept (i.e. impressionism, see Fig. 3.a); then he or she has to examine each museum or event found by following links to subpages containing information on opening hours, prices, and services (see Fig. 3.b-d). In this concrete example, the user of the Museumsportal Berlin would have to follow a navigation path of 45 clicks while evaluating and aggregating all the information "manually" and memorizing which museums satisfy his or her preferences.
Figure 3: A navigation path for a complex query.
The problems associated with keyword search described above are mainly caused by the fact that most of the information available on portals such as the Museumsportal Berlin is represented in the form of textual descriptions designed to be read by humans. Although machines can parse web pages for layout processing, they do not understand the semantics of the data. We therefore propose enhancements to the portal that rely on a formal representation of information from the arts domain using Semantic Web technologies.

Enhancing the Portal with Semantic Web Technologies

In the following we want to present some ideas on how the search and navigation on the portal may be improved by the application of Semantic Web technologies.

Museum Ontology

As pointed out above, the main problem with accessing information on the Museumsportal arises from the fact that the information is represented as text whose semantics can hardly be understood by computers. Our proposed solution is to formalize the portal data in the form of a museum ontology consisting of two sub-ontologies:
  • Museum Description Ontology defining the semantic structure and key concepts used for describing cultural institutions as well as events and exhibitions offered by them.
  • Arts Domain Ontology capturing the general knowledge from the arts domain including information on artists, art movements, etc.
The former sub-ontology is populated with instances of museums which are present on the portal. We convert all the available data about each museum into the schema of our ontology. Since most of the information is provided by those institutions themselves, through a simple input form, the data is rather weakly structured. Therefore, we additionally apply Named Entity Recognition techniques for the extraction of artist names, etc. as well as identify catchwords belonging to the arts domain. The found names and catchwords are, in turn, mapped onto concepts from the latter sub-ontology, thereby connecting the information about museums with a broader knowledge base of semantic relations from the arts domain. (Compare integration component in Figure 4.)

Since ontology development and maintenance is a rather complex and costly task, especially for a domain as broad as the arts, we try to reuse existing knowledge provided by other communities such as Wikipedia. At this point, it is important to note that we use this particular information source only as a practical example in order to illustrate the potential benefits of semantic technologies. In fact, there exist several classifications and thesauri, for example the Art & Architecture Thesaurus (AAT) or the Union List of Artist Names (ULAN), which could be used as a foundation for our domain ontology as well.
Figure 4: A schematic view of the system architecture.

Information Integration

There is, however, one important issue with integrating information from Wikipedia into the Museumsportal Berlin: Wikipedia itself is a collection of documents in textual form, targeted at human readers, and thus can only be queried by keywords. As argued before, we need a well-structured and semantically rich representation of data in order to overcome the limits of keyword search. This is even more important if we want to integrate the relevant information from Wikipedia into the Museumsportal Berlin automatically. Fortunately, the DBpedia project, a community effort that extracts structured data from Wikipedia and represents it with Semantic Web technologies, makes this integration task easy.

For each catchword or named entity (e.g. an artist name) found either in the museum description or among its tags, we perform a lookup in DBpedia in order to check whether the given concept belongs to the arts domain. This is determined from the category of the DBpedia resource corresponding to the concept in question. For example, the catchword "bauhaus" has a corresponding DBpedia resource dbpedia:Bauhaus, which belongs to the category yago:ArtMovements (indicated by the property rdf:type), as shown in Figure 5. If the catchword is validated, additional information describing this resource (in this example, painters associated with this movement, etc.) is integrated into our ontology and stored in a local triple store for improved performance (see Figure 4).
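The validation step might look roughly like the following Python sketch. It works on a tiny in-memory set of triples instead of a real DBpedia endpoint. Only the Bauhaus triple and the two arts categories are taken from the text; the second resource, its category, and the naive `to_resource` name mapping are assumptions made for illustration.

```python
RDF_TYPE = "rdf:type"

# Tiny stand-in for DBpedia data; only the Bauhaus triple comes from the text.
TRIPLES = {
    ("dbpedia:Bauhaus", RDF_TYPE, "yago:ArtMovements"),
    ("dbpedia:Berlin", RDF_TYPE, "yago:Cities"),  # hypothetical non-arts resource
}

# Categories treated as belonging to the arts domain (assumed, incomplete list).
ARTS_CATEGORIES = {"yago:ArtMovements", "yago:GermanPainters"}


def to_resource(catchword: str) -> str:
    """Naive mapping from a catchword to a resource name (an assumption)."""
    return "dbpedia:" + catchword.strip().title().replace(" ", "_")


def is_arts_concept(catchword: str, triples: set[tuple[str, str, str]]) -> bool:
    """Check whether the catchword's resource has an arts-domain rdf:type."""
    resource = to_resource(catchword)
    return any(
        s == resource and p == RDF_TYPE and o in ARTS_CATEGORIES
        for s, p, o in triples
    )


print(is_arts_concept("bauhaus", TRIPLES))  # catchword from the example above
print(is_arts_concept("berlin", TRIPLES))   # not typed with an arts category
```

A positively validated catchword would then trigger the import of further triples about the resource into the local store; that step is omitted here.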
Figure 5: The DBpedia resource for Bauhaus.
By linking domain concepts on the Museumsportal with DBpedia resources (which also point to human-readable Wikipedia articles), we are able to enrich the content presented on the portal by embedding additional information on catchwords and entities found in museum descriptions. Consequently, visitors of the portal are provided with comprehensive information on the subject of museum exhibitions without having to leave the Museumsportal and consult other sources for more details on the keywords they encounter (see Figure 6).
Figure 6: A popup for the entity surrealism providing additional information.

Improved Search and Navigation

Apart from enriching the information presented in the front-end of the Museumsportal Berlin, we also use semantic relations between concepts from the arts domain in order to overcome the limits of keyword search discussed earlier.

Since the domain ontology extracted from DBpedia contains information on synonyms and alternative spellings of arts concepts, e.g. impressionism and impressionist art, as well as on aliases of artist names, e.g. Sandro Botticelli or Il Botticello (both indicated by the property dbprop:redirect), we exploit this data through query expansion. Each search for a keyword specified by the user is complemented with queries for all of its synonyms and spelling variations from our ontology.
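As a rough illustration, this lexical expansion could be sketched in Python as follows. The `REDIRECTS` table is a hand-written stand-in for the dbprop:redirect data, and `expand_lexical` is a hypothetical helper name.

```python
# Illustrative redirect data: alias -> canonical name
# (a stand-in for what dbprop:redirect provides).
REDIRECTS = {
    "il botticello": "sandro botticelli",
    "impressionist art": "impressionism",
}


def expand_lexical(term: str) -> list[str]:
    """Return the term plus all names that share its canonical form."""
    key = term.lower()
    canonical = REDIRECTS.get(key, key)
    variants = {canonical} | {a for a, c in REDIRECTS.items() if c == canonical}
    # Keep the user's own term first, then the other variants alphabetically.
    return [key] + sorted(variants - {key})


print(expand_lexical("Il Botticello"))
print(expand_lexical("impressionism"))
```

Each returned variant would then be submitted as an additional keyword query.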

Moreover, this simple mechanism is also applied to provide cross-lingual search. Although most of the museum and exhibition descriptions, which are delivered by the institutions themselves, are available in German as well as in English, there are still some exceptions where only a German version is available, especially in the case of tags. However, since the concepts in our ontology are associated with their names in different languages (indicated by the rdfs:label property, see Fig. 5), we are able to map the search keyword specified by the user to the same ontology concept regardless of the language used and to expand the query into other languages. For example, the search for the English term impressionism is realized by mapping this keyword to the concept dbpedia:Impressionism and searching for both the English term and its German expansion (i.e. Impressionismus).
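A minimal sketch of the cross-lingual mapping, assuming the language-tagged labels have already been loaded into a dictionary. The `LABELS` table and the function name are illustrative; only the concept and its two labels come from the example above.

```python
# Illustrative label table in the spirit of rdfs:label with language tags.
LABELS = {
    "dbpedia:Impressionism": {"en": "impressionism", "de": "Impressionismus"},
}


def cross_lingual_terms(keyword: str) -> list[str]:
    """Map a keyword to its concept and return that concept's labels in all languages."""
    wanted = keyword.casefold()
    for labels in LABELS.values():
        if any(label.casefold() == wanted for label in labels.values()):
            return sorted(labels.values(), key=str.casefold)
    return [keyword]  # unknown concept: fall back to the keyword itself


print(cross_lingual_terms("impressionism"))
print(cross_lingual_terms("Impressionismus"))
```

Both calls resolve to the same concept, so the German and the English query end up searching for the same set of terms.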

The examples so far deal with improving the keyword search for a particular concept from the arts domain by considering its different lexical representations (synonyms, alternative spelling, translations in different languages, etc.). The mechanism of query expansion, however, may go one step further by additionally taking into consideration semantic relations between different concepts. For example, based on our ontology, we are able to expand the search for an art movement into queries for artists belonging to (indicated by the property dbprop:movement) this particular style, or in the case of artists additionally search for their style and other artists they are related to in various ways (indicated by properties like dbprop:influencedBy or dbprop:training). Those kinds of semantic relations are the most interesting ones from the users' point of view.
Figure 7: An expansion rule for the query expansion.
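To make the idea of an expansion rule more concrete, the following Python sketch models a rule as a set of applicable resource types plus a SPARQL template. The rule name, the type yago:Creator109614315 and the property dbprop:movement are taken from the text; the class layout, the `$resource` placeholder and the exact query body are assumptions, since the actual rules are defined in an RDF file.

```python
from dataclasses import dataclass
from string import Template
from typing import Optional


@dataclass
class ExpansionRule:
    name: str
    applies_to: frozenset  # resource types the rule matches
    pattern: Template      # SPARQL query with a $resource placeholder


# Hypothetical rule body; the real rule is stored as RDF, not as Python.
EXTEND_ARTIST_WITH_MOVEMENT = ExpansionRule(
    name="rule:ExtendArtistWithMovement",
    applies_to=frozenset({"yago:Creator109614315"}),
    pattern=Template(
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "PREFIX dbprop: <http://dbpedia.org/property/>\n"
        "SELECT DISTINCT ?label WHERE {\n"
        "  <$resource> dbprop:movement ?movement .\n"
        "  ?movement rdfs:label ?label .\n"
        "}"
    ),
)


def build_query(rule: ExpansionRule, resource_uri: str, resource_types: set) -> Optional[str]:
    """Return the rule's SPARQL query for the resource, or None if the rule does not apply."""
    if rule.applies_to.isdisjoint(resource_types):
        return None
    return rule.pattern.substitute(resource=resource_uri)


print(build_query(
    EXTEND_ARTIST_WITH_MOVEMENT,
    "http://dbpedia.org/resource/Paul_Klee",
    {"yago:Creator109614315"},
))
```

Keeping the pattern as data rather than code mirrors the motivation given for the RDF rule file: rules can be changed or added without touching the search logic.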
To give a better understanding of our approach to query expansion, we illustrate the workflow of a search in our system. The search is initiated by a user entering a search term, e.g. "Paul Klee". The system first performs a normal keyword-based search, like any classical information retrieval system. It then searches the Museum Description Ontology for a resource with the label "Paul Klee". If it finds one or more, in this example dbpedia:Paul_Klee of rdf:type yago:GermanPainters, the search term is mapped to this resource. Next, the system looks through a set of predefined semantic rules for query expansion. These rules are defined in an RDF file, which provides flexibility: rules can easily be modified if the schema changes, and new rules can be added. Every rule has a set of resource types it applies to (because the DBpedia ontology was restructured in version 3.6, we match rules not only by means of the rdf:type property but by arbitrary properties, especially dcterms:subject). In this example the system finds, among others, the rule rule:ExtendArtistWithMovement, which applies to resources of the supertype yago:Creator109614315, as shown in Figure 7. To determine the supertype of a resource we use inference. A rule also contains an expansion pattern, a SPARQL query whose result consists of the labels of resources that the rule considers related to the searched resource. In this case the expansion pattern returns labels of art movements, e.g. "expressionism", "surrealism" and "bauhaus". The system then performs a keyword-based search for every additional label delivered by the expansion query. In this example the classical search, in our system as well as in the original Museumsportal, finds three results. Our system additionally finds nine more museums that have exhibits from the art movements of Paul Klee, e.g. expressionism.
The advantage of our hybrid approach to search (first performing a normal keyword-based search and then expanding the search based on semantic relations between domain concepts) is that, on the one hand, the system still finds all the results the original Museumsportal Berlin finds. On the other hand, it is able to find more relevant results, thus improving on the normal search.
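The hybrid workflow can be sketched in Python as below. This is a simplified model under several assumptions: the pages, the intermediate class yago:Painter and the `EXPANSIONS` table (which stands in for executing a rule's SPARQL expansion pattern) are all invented for illustration, and inference is reduced to following subclass links.

```python
from typing import Optional

# Illustrative data only. "yago:Painter" is a made-up intermediate class.
SUBCLASS_OF = {
    "yago:GermanPainters": "yago:Painter",
    "yago:Painter": "yago:Creator109614315",
}
RESOURCE_TYPE = {"paul klee": "yago:GermanPainters"}
# Stand-in for running a rule's expansion pattern:
# rule type -> searched label -> labels of related resources.
EXPANSIONS = {
    "yago:Creator109614315": {"paul klee": ["expressionism", "surrealism", "bauhaus"]},
}
PAGES = {
    "museum-a": "Watercolours by Paul Klee.",
    "museum-b": "Masters of expressionism.",
    "museum-c": "Bauhaus design and architecture.",
    "museum-d": "Dinosaur skeletons.",
}


def supertypes(rdf_type: str) -> set:
    """Follow subclass links upwards (a very small form of inference)."""
    found = {rdf_type}
    while rdf_type in SUBCLASS_OF and SUBCLASS_OF[rdf_type] not in found:
        rdf_type = SUBCLASS_OF[rdf_type]
        found.add(rdf_type)
    return found


def keyword_search(term: str) -> list:
    needle = term.lower()
    return sorted(pid for pid, text in PAGES.items() if needle in text.lower())


def hybrid_search(term: str) -> list:
    """Return (page id, via) pairs; via is None for exact keyword matches."""
    key = term.lower()
    results: dict = {pid: None for pid in keyword_search(key)}  # exact matches first
    rdf_type: Optional[str] = RESOURCE_TYPE.get(key)
    if rdf_type is not None:
        types = supertypes(rdf_type)
        for applies_to, table in EXPANSIONS.items():
            if applies_to in types:
                for label in table.get(key, []):
                    for pid in keyword_search(label):
                        results.setdefault(pid, label)  # never overwrite an exact match
    return list(results.items())


print(hybrid_search("Paul Klee"))
```

Because the plain keyword results are collected first and never overwritten, the expanded result set is always a superset of what the unmodified keyword search returns.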

Because expanding a query into semantically related concepts increases the number of answers, it is important to present the search result in a way that is manageable and comprehensible to users. Here, once again, the semantic relations used in the process of query expansion can be used to generate explanations of the result set. One possible way of doing this is to first list the exact matches of the searched keyword, followed by the results obtained through query expansion, each with a dynamically generated explanation, as shown in Figure 8.
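One way such a presentation could be produced is sketched below in Python. The input format (pairs of a title and the expansion term that led to it) and the wording of the explanations are assumptions for illustration; the museum names are placeholders.

```python
def render_results(results: list, query: str) -> list:
    """Order results and attach explanations.

    `results` holds (title, via) pairs; `via` is None for exact matches and
    otherwise the expansion term that produced the hit.
    """
    exact = [title for title, via in results if via is None]
    expanded = [(title, via) for title, via in results if via is not None]
    lines = [f"{title}: matches '{query}'" for title in exact]
    lines += [
        f"{title}: shown because '{via}' is related to '{query}'"
        for title, via in expanded
    ]
    return lines


hits = [("Museum B", "expressionism"), ("Museum A", None)]
for line in render_results(hits, "Paul Klee"):
    print(line)
```

Exact matches are listed first regardless of their position in the input, followed by the expanded results with their explanations.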
Figure 8: An explanation for an extended result telling the user why this result is shown.
Another advantage of a well-structured, ontology-based representation of museum data is that it makes complex queries, such as the one discussed earlier, possible. Users can specify their search constraints (e.g. desired services) through facets that correspond to the possible values of the ontology properties describing museums and exhibitions. The preferences provided by visitors of the portal are then translated into a formal query (i.e. SPARQL, see Figure 4). As a consequence, the number of clicks currently required to find the desired information, as shown in the example in Figure 3, can be reduced significantly.
Figure 9: A complex search without a long navigation path.
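The translation of facet selections into a formal query might be sketched as follows in Python. The `mo:` prefix, its namespace URI and every property name are hypothetical, since the vocabulary of the Museum Description Ontology is not given here; only the constraints themselves come from the earlier example.

```python
# Hypothetical vocabulary: the "mo:" prefix and all property names are assumptions.
FACET_PATTERNS = {
    "topic": "?m mo:topic {value} .",
    "open_on": "?m mo:openOn {value} .",
    "audio_guide": "?m mo:audioGuideLanguage {value} .",
    "max_fee": "?m mo:entranceFee ?fee . FILTER(?fee < {value})",
}


def _literal(value) -> str:
    """Render a Python value as a SPARQL literal (numbers bare, strings quoted)."""
    if isinstance(value, (int, float)):
        return str(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def facets_to_sparql(facets: dict) -> str:
    """Translate selected facet values into one SPARQL SELECT query."""
    unknown = set(facets) - set(FACET_PATTERNS)
    if unknown:
        raise ValueError(f"unknown facets: {sorted(unknown)}")
    body = "\n".join(
        "  " + FACET_PATTERNS[name].format(value=_literal(value))
        for name, value in facets.items()
    )
    return (
        "PREFIX mo: <http://example.org/museum#>\n"
        "SELECT DISTINCT ?m WHERE {\n" + body + "\n}"
    )


print(facets_to_sparql({
    "topic": "impressionism",
    "open_on": "Tuesday",
    "max_fee": 10,
    "audio_guide": "en",
}))
```

All constraints end up in a single query, so the aggregation that the user previously performed by hand across many subpages is left to the triple store.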

Team

Radoslaw Oldakowski (contact) and Dennis Hartrampf

Publications

  • Adrian Paschke, Radoslaw Oldakowski, Johannes Krug: Demonstrator eines Semantischen Museumsportals für Berlin. EVA Berlin 2009 (Electronic Information, the Visual Arts and Beyond), Special Topic Session, 13 November 2009.
  • Radoslaw Oldakowski: Semantische Datenintegration und Suche im Museumsportal Berlin. In: Johann-Christoph Freytag, Robert Tolksdorf (eds.): Tagungsband Xinnovations 2009, Berlin, 14-16 September 2009. ISBN: 978-3-00-028902-6.

© 2008 FU Berlin
This work has been partially supported by the InnoProfile-Corporate Semantic Web project funded by the German Federal Ministry of Education and Research (BMBF) and the BMBF Innovation Initiative for the New German Länder - Entrepreneurial Regions.