European Thematic Network for Doctoral Education in Computing

     In the last decade the increasing popularity of the World Wide Web has led to an exponential
growth in the number of pages available on the Web. This huge number of Web pages makes
it increasingly difficult for users to find required information. In searching the Web for specific
information, one gets lost in the vast number of irrelevant search results and may miss relevant
material. Current Web applications provide Web pages in HTML format that represent the content
in natural language only, so the semantics of the content is not accessible to machines.
To enable machines to support the user in solving information problems, the Semantic Web
proposes an extension to the existing Web that makes the semantics of the Web pages machine-
processable. The semantics of the information of a Web page is formalized using RDF meta-data
describing the meaning of the content. The existence of semantically annotated Web pages is
therefore crucial in bringing the Semantic Web into existence.
     Semantic annotation addresses this problem and aims to turn human-understandable content
into a machine-processable form by adding semantic markup. Many tools have been developed
that support the user during the annotation process. The annotation process, however, is a
separate task and is not integrated into the Web engineering process. Web engineering proposes
methodologies to design, implement and maintain Web applications, but these methodologies
do not cover the generation of meta-data.
     In this thesis we introduce a technique to extend existing XML-based Web engineering
methodologies to develop semantically annotated Web pages. The novelty of this approach is
the definition of a mapping from XML Schema to ontologies, called WEESA, that can be used
to automatically generate RDF meta-data from XML content documents. We further demonstrate
the integration of the WEESA meta-data generator into the Apache Cocoon Web development
framework to easily extend XML-based Web applications to semantically annotated Web
applications.
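     The idea of generating RDF meta-data from an XML content document through a mapping can
be sketched as follows. This is a minimal illustration only, not the WEESA mapping format or
implementation: the XML document, the element names, the `MAPPING` dictionary, the example.org
URIs and the helper functions are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML content document; element names are illustrative only.
XML_DOC = """
<event id="e1">
  <title>Opening Concert</title>
  <location>Vienna</location>
</event>
"""

# Hypothetical mapping from an element path to an ontology property URI.
# The actual WEESA mapping definition is not shown in this abstract.
MAPPING = {
    "title": "http://example.org/onto#hasTitle",
    "location": "http://example.org/onto#hasLocation",
}

# Assumed base URI used to build the subject of each statement.
BASE = "http://example.org/event/"


def generate_triples(xml_text, mapping, base):
    """Return (subject, predicate, literal) triples for the mapped elements."""
    root = ET.fromstring(xml_text)
    subject = base + root.get("id")
    triples = []
    for path, predicate in mapping.items():
        for node in root.findall(path):
            triples.append((subject, predicate, node.text.strip()))
    return triples


def to_ntriples(triples):
    """Serialize triples with plain literal objects in N-Triples style."""
    return "\n".join(f'<{s}> <{p}> "{o}" .' for s, p, o in triples)


if __name__ == "__main__":
    print(to_ntriples(generate_triples(XML_DOC, MAPPING, BASE)))
```

     The sketch keeps the mapping separate from the content document, so the same XML source can
serve both the HTML page and its meta-data description.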
     Looking at the meta-data of a single Web page gives only a limited view of the
information available in a Web application. For querying and reasoning purposes it is better to
have the full meta-data model of the whole Web application as a knowledge base at hand. In this
thesis we introduce the WEESA knowledge base, which is generated on the server side by
accumulating the meta-data from individual Web pages. The WEESA knowledge base is then offered
for download and querying by software agents.
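     Accumulating per-page meta-data into one queryable model can be sketched as below. This is
an assumption-laden illustration, not the WEESA knowledge base itself: the `KnowledgeBase`
class, its methods and the example.org URIs are hypothetical, and triples are held in a plain
in-memory set rather than an RDF store.

```python
class KnowledgeBase:
    """Hypothetical in-memory store that merges triples from many pages."""

    def __init__(self):
        self.triples = set()

    def add_page(self, page_triples):
        # Merging into a set removes statements repeated across pages.
        self.triples.update(page_triples)

    def query(self, subject=None, predicate=None, obj=None):
        """Return sorted triples matching the pattern; None is a wildcard."""
        return sorted(
            t for t in self.triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        )


if __name__ == "__main__":
    ONTO = "http://example.org/onto#"  # assumed namespace
    kb = KnowledgeBase()
    kb.add_page([("http://example.org/event/e1", ONTO + "hasLocation", "Vienna")])
    kb.add_page([
        ("http://example.org/event/e1", ONTO + "hasLocation", "Vienna"),
        ("http://example.org/event/e2", ONTO + "hasLocation", "Vienna"),
    ])
    # Query across the whole application: all events located in Vienna.
    for triple in kb.query(predicate=ONTO + "hasLocation", obj="Vienna"):
        print(triple)
```

     A query of this kind spans several pages at once, which the meta-data of a single page
cannot answer.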
     Finally, the Vienna International Festival industry case study illustrates the use of WEESA
within an Apache Cocoon Web application in real life. We discuss the lessons learned while
implementing the case study and give guidelines for developing Semantic Web applications using
WEESA.

PhD DATABASE