Semantic Web
An-Najah National University
Dr. Amjad Hawash
Summary
This document provides an introduction to the Semantic Web. It explains the need for machine-readable information, and how concepts from semiotics, such as syntax, semantics, and pragmatics, apply to the Web. It also touches on the concepts of metadata and how to create and extract semantic metadata.
Knowledge Representation & Reasoning: Semantic Web
Dr. Amjad Hawash - KRA 2 - An-Najah N. University
INTRODUCTION
The World Wide Web is primarily composed of documents written in HTML. During the first decade of its existence, most of the information on the Web was designed only for human consumption. Humans can read Web pages and understand them, but the meaning of those pages is not expressed in a way that allows computers to interpret it.
The information on the Web can be defined so that computers can use it not only for display purposes, but also for interoperability and integration between systems and applications. One way to enable machine-to-machine exchange and automated processing is to provide the information in a form that computers can understand. This is precisely the objective of the Semantic Web: to make possible the processing of Web information by computers.
The Semantic Web is being built through incremental changes, by bringing machine-readable descriptions to the data and documents already on the Web. The Web is currently evolving, as illustrated in Figure 1-1, and different approaches are being explored to add semantics to Web resources.
On the left side of Figure 1-1, a graph representation of the syntactic Web is given. Resources are linked together, forming the Web.
There is no distinction between resources or between the links that connect them. To give meaning to resources and links, new standards and languages are being investigated and developed. The rules and descriptive information made available by these languages make it possible to characterize, individually and precisely, the type of each resource on the Web and the relationships between resources, as illustrated on the right side of Figure 1-1.
Because integration and interoperability are so important for intra- and inter-business processes, the research community has tackled this problem and developed semantic standards such as the Resource Description Framework (RDF) and the Web Ontology Language (OWL). These standards enable the Web to become a global infrastructure for sharing both documents and data, which makes searching for and reusing information easier and more reliable.
RDF is the W3C standard for creating descriptions of information, especially information available on the World Wide Web, and for describing its semantics in a way that supports reasoning. What XML is for syntax, RDF is for semantics. The two share a unified model and together provide a framework for developing Web applications that deal with both data and semantics.
Relationships are at the heart of semantics. Perhaps the most important characteristic of RDF is that it elevates relationships to first-class objects, providing the first representational basis for semantic description. OWL provides a language for defining structured, Web-based ontologies, which allows richer integration and interoperability of data among communities and domains.
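To make "relationships as first-class objects" concrete, here is a minimal sketch that stores RDF-style statements as plain Python (subject, predicate, object) tuples and queries them by pattern. This is an assumption-laden illustration, not the API of any RDF library (a real application would more likely use one, such as rdflib). The URIs under http://example.org/ are hypothetical. Only the rdf:type and rdfs:label URIs are the standard W3C ones.

```python
# A minimal sketch of RDF-style statements using plain Python tuples.
# Assumptions: this is not the API of any RDF library, and the URIs under
# http://example.org/ are hypothetical. The rdf:type and rdfs:label URIs are
# the standard W3C ones.
from typing import NamedTuple, Optional

EX = "http://example.org/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Every relationship is an explicit statement, and a relationship (here the
# predicate ex:worksFor) can itself be the subject of further statements.
graph = {
    Triple(EX + "John", EX + "worksFor", EX + "ABCCompany"),
    Triple(EX + "John", RDF_TYPE, EX + "Customer"),
    Triple(EX + "worksFor", RDFS_LABEL, "works for"),
}


def match(graph, s: Optional[str] = None, p: Optional[str] = None,
          o: Optional[str] = None) -> list:
    """Return, sorted, the triples that agree with every non-None pattern position."""
    return sorted(
        t for t in graph
        if (s is None or t.subject == s)
        and (p is None or t.predicate == p)
        and (o is None or t.obj == o)
    )


if __name__ == "__main__":
    for triple in match(graph, s=EX + "John"):
        print(triple.predicate, "->", triple.obj)
    # Query the relationship itself, treated as a first-class resource.
    print(match(graph, s=EX + "worksFor"))
```

Because ex:worksFor is itself a resource, the last query returns a statement about the relationship rather than about John or the company.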
SEMIOTICS – SYNTAX, SEMANTICS, AND PRAGMATICS
Semiotics is the general science of signs (such as icons, images, objects, tokens, and symbols) and of how their meaning is transmitted and understood. A sign is generally defined as something that stands for something else. Human language is a particular case of semiotics: a language is a system of conventional spoken or written symbols by means of which people communicate.
Formal languages, such as logic, are also based on symbols and are therefore also studied by semiotics. Compared to human language, formal languages have precise construction rules for the syntax and semantics of programs. Semiotics is composed of three fundamental components: syntax, semantics, and pragmatics.
Syntax
Syntax deals with the formal or structural relations between signs (or tokens) and the production of new ones. For example, grammatical syntax is the study of which sequences of symbols are well formed according to the recursive rules of a grammar. The set of allowed reserved words, their parameters, and the correct word order in an expression is called the syntax of a language. In computer science, if a program is syntactically correct, the compiler will validate its syntax and will not generate error messages. This, however, does not ensure that the program is semantically correct (i.e., that it returns the expected results).
For example, when XML is used to achieve interoperability and integration of information systems, the data exchanged between systems must follow a precise syntax. If the rules of the syntax are not followed, a syntactical error occurs.
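Here is a minimal sketch of how an XML parser enforces syntax, using Python's standard xml.etree.ElementTree module. The element names and sample messages are hypothetical. Note that ElementTree checks only well-formedness and does not validate a document against an XML Schema.

```python
# A minimal sketch of syntax checking for exchanged XML data, using only
# Python's standard-library parser. Assumption: the element names and the
# sample messages below are hypothetical. ElementTree checks well-formedness
# only; it does not validate a document against an XML Schema.
import xml.etree.ElementTree as ET


def syntax_error(message: str):
    """Return None if the XML is well formed, otherwise the parser's error text."""
    try:
        ET.fromstring(message)
        return None
    except ET.ParseError as err:
        return str(err)


messages = {
    "well formed": "<customer><firstName>John</firstName></customer>",
    "mismatched tag": "<customer><firstName>John</fistName></customer>",
    "missing closing tag": "<customer><firstName>John</firstName>",
}

if __name__ == "__main__":
    for label, xml_text in messages.items():
        print(f"{label}: {syntax_error(xml_text) or 'OK'}")
```

A tag that is misspelled consistently in both its start and end tags is still well formed. Catching that kind of mistake requires validation against the schema, which this sketch does not perform.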
For example, a misspelled tag name, an omitted closing tag, or a document that does not follow its XML Schema will generate a syntactical error. Note that syntax does not include the study of things such as “truth” and “meaning.”
Semantics
Semantics is the study of the relations between a system of signs (such as words, phrases, and sentences) and their meanings. As this definition shows, the objective of semantics is entirely different from that of syntax. Semantics concerns what something means, while syntax pertains to the formal structure or patterns in which something is expressed.
Semantics is also distinct from ontology. Semantics is about the use of a word, whereas ontology is related to the nature of the entity or domain referenced by the word. One important question in semantics research is whether meaning is established by looking at the neighborhood of the word in the ontology of which it is part, or whether the meaning is already contained in the word itself.
A second important question is which formal representation language can capture semantics so that it is machine processable with a consistent interpretation. A third is the expressiveness of this representation language, which must balance computability against capturing the true richness of the real world being modeled.
Correspondingly, the following three forms of semantics have been defined:
– Implicit semantics: semantics that is implicit in data and is not represented explicitly in any machine-processable syntax.
– Formal semantics: semantics that is represented in some well-formed syntactic form (governed by syntax rules).
– Powerful (soft) semantics: efforts related to formal semantics have usually involved limiting expressiveness to allow for acceptable computational characteristics. Since most knowledge representation (KR) mechanisms and the relational data model are based on set theory, they lack the ability to represent and use knowledge that is imprecise, uncertain, partially true, or approximate, at least in their base (standard) models. Representing and using these more powerful kinds of knowledge is, in our opinion, critical to the success of the Semantic Web. Soft computing has explored these types of powerful semantics. We consider these powerful (soft) semantics as distinguished from formal and implicit semantics, albeit not distinct from or orthogonal to them.
Pragmatics
Pragmatics is the study of natural language understanding, and specifically of how context influences the interpretation of meaning. Pragmatics is interested predominantly in utterances, made up of sentences, usually in the context of conversations (Wikipedia 2005). The context may include any social, environmental, and psychological factors. Pragmatics includes the study of the relations among signs, their meanings, and the users of the signs, and of the repercussions of sign interpretations for the interpreters in the environment. While semantics deals with the meaning of signs, pragmatics deals with the origin, uses, and effects of signs within the content, context, or behavior in which they occur.
SEMANTIC HETEROGENEITY ON THE WEB
Problems that can arise from the heterogeneity of data on the Web are already well known within the distributed database systems community.
Heterogeneity occurs when there is a disagreement about the meaning, interpretation, or intended use of the same or related data. As with distributed database systems, four types of information heterogeneity may arise on the Web: system heterogeneity, syntactic heterogeneity, structural or schematic heterogeneity, and semantic heterogeneity.
– System heterogeneity: applications and data may reside on different hardware platforms and operating systems.
– Syntactic heterogeneity: information sources may use different representations and encodings for data. Syntactic interoperability can be achieved when compatible forms of encoding and access protocols are used to allow information systems to communicate.
– Structural heterogeneity: different information systems store their data in different document layouts and formats, data models, data structures, and schemas.
– Semantic heterogeneity: the meaning of the data can be expressed in different ways, leading to heterogeneity. Semantic heterogeneity considers the content of an information item and its intended meaning.
Approaches to the problems of semantic heterogeneity should equip heterogeneous, autonomous, and distributed software systems with the ability to share and exchange information in a semantically consistent way. The W3C recommends several representation languages in support of the Semantic Web approach. XML provides the ability to deal with syntactic heterogeneity, and XML, XPath, and XQuery provide the ability to transcend certain structural heterogeneity. RDF and OWL (or other ontology representation languages) provide a key approach to dealing with semantic heterogeneity.
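Path expressions can transcend some structural heterogeneity. The sketch below illustrates this with the limited XPath support in Python's standard xml.etree.ElementTree, reading the same fields from two hypothetical sources that lay out the same customer data differently. The documents, element names, and per-source paths are all assumptions chosen for illustration.

```python
# A minimal sketch of transcending structural heterogeneity with path
# expressions. Assumptions: the two source documents, their layouts, and the
# per-source paths are hypothetical; ElementTree supports only a subset of
# XPath, so a full XPath/XQuery engine would be needed for real integrations.
import xml.etree.ElementTree as ET

# The same customer data, stored with different layouts by two systems.
SOURCE_A = "<customer><name>John</name><city>Nablus</city></customer>"
SOURCE_B = (
    "<client><details><fullName>John</fullName>"
    "<address><city>Nablus</city></address></details></client>"
)

# For each source, a path (relative to the root) locating each common field.
PATHS = {
    "A": {"name": "name", "city": "city"},
    "B": {"name": "details/fullName", "city": "details/address/city"},
}


def extract(xml_text: str, paths: dict) -> dict:
    """Read each field through its source-specific path."""
    root = ET.fromstring(xml_text)
    return {field: root.findtext(path) for field, path in paths.items()}


if __name__ == "__main__":
    print(extract(SOURCE_A, PATHS["A"]))
    print(extract(SOURCE_B, PATHS["B"]))
```

Both calls yield the same record. The path table is still written by hand for each source, however. The paths bridge the differing layouts, but nothing in them records that "name" and "fullName" denote the same concept, which is the semantic part of the problem.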
One solution is for developers to write code that translates between the terminologies of pairs of systems. When only a small number of systems need to interoperate, this may be a useful solution. However, it does not scale, because development costs increase as more systems are added and as the degree of semantic heterogeneity increases.
Assume the development of bidirectional translators, i.e., translators that enable interoperation both from system A to system B and from system B to system A. Under that assumption, the interoperability of n systems requires (n-1) + (n-2) + ... + 1 = n(n-1)/2 translators. Figure 1-3 shows the translators required to integrate 6 systems, that is, 5 + 4 + 3 + 2 + 1 = 15 translators.
A more suitable solution to the problem of semantic heterogeneity is to rely on the technological foundations of the Semantic Web. More precisely, the meaning of each distributed system's terminology is defined using the concepts of a shared ontology, which makes clear the relationships and differences between concepts.
Figure 1-4 shows a possible architecture that achieves interoperability using the Semantic Web and ontologies. This solution requires the development of only n links to interconnect the systems (6 links for the 6 systems above, instead of 15 translators).
METADATA
Metadata can be defined as “data about data.” The goal of incorporating metadata into data sources is to enable the end user to find items and contextually relevant information.
Data sources are generally heterogeneous and can be unstructured, semi-structured, or structured. In the Semantic Web, a data source is typically a document, such as a Web page, containing textual content or data. Other types of resources, such as records from a digital library, may also include metadata.
Metadata can exist at several levels. These “levels of metadata” are not mutually exclusive. On the contrary, the cumulative combination of the types of metadata provides a multifaceted representation of the data, including information about its syntax, structure, and semantic context.
The process of attaching semantic metadata to a document or any piece of content is called semantic annotation. Metadata extraction is the process of identifying the metadata for that document or content. This process can be manual, semi-automatic, or fully automatic.
Semantic applications are created by exploiting metadata and ontologies with an associated knowledge base. In essence, documents in the Semantic Web are marked up with machine-understandable semantic metadata about their human-readable content. Other, less expressive approaches consist of using purely syntactic or structural metadata.
Syntactic Metadata
The simplest form of metadata is syntactic metadata. It describes non-contextual information about content and provides very general information, such as a document's size, location, or date of creation. Syntactic metadata attaches labels or tags to data.
The following example shows syntactic metadata describing a document:
Most documents have some degree of syntactic metadata. E-mail headers, for example, provide author, recipient, date, and subject information. These headers provide very little or no contextual understanding of what the document says or implies. This assumes the value of the author field is treated as a string or an ordered set of words, rather than with its full semantics, such as modeling the author as a person who authored the document. Nevertheless, this information is useful for certain applications. For example, a mail client may constantly monitor incoming e-mail to find messages related to a particular subject the user is interested in.
Structural Metadata
Structural metadata provides information regarding the structure of content: it describes how items are put together or arranged. The amount and type of such metadata vary widely with the type of document. For example, an HTML document has a set of predefined tags, but these exist primarily for rendering purposes. They are therefore not very helpful in providing contextual information about the content.
Nevertheless, the positional or structural placement of information within a document can be used to further embellish metadata. For example, terms or concepts that appear in a title may be given a higher weight than those appearing in the body. XML, on the other hand, makes it possible to enclose content within more meaningful tags. This is clearly more useful for determining context and relevance than syntactic metadata, which is limited to providing information about the document itself.
For example, a DTD or an XSD outlines the structural metadata of a particular document. It lists the elements, attributes, and entities in a document and defines the relationships between the different elements and attributes.
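The sketch below illustrates the kind of information structural metadata carries. It walks a small hypothetical XML instance with Python's standard xml.etree.ElementTree and records, for each element name, the ordered child-element sequences observed under it. The "library" document is an assumption, and this is not DTD or XSD processing. It only observes what one instance happens to contain.

```python
# A minimal sketch: observe how elements are arranged in one XML instance.
# Assumptions: the "library" document is hypothetical, and this is not DTD or
# XSD processing; it only records what a single instance happens to contain.
import xml.etree.ElementTree as ET

DOCUMENT = """
<library>
  <book><title>First Title</title><author>Author A</author></book>
  <book><title>Second Title</title><author>Author B</author></book>
</library>
"""


def element_structure(xml_text: str) -> dict:
    """Map each element name to the distinct child-name sequences seen under it."""
    structure: dict = {}
    root = ET.fromstring(xml_text.strip())
    for element in root.iter():  # document order, starting at the root
        children = tuple(child.tag for child in element)
        sequences = structure.setdefault(element.tag, [])
        if children not in sequences:
            sequences.append(children)
    return structure


if __name__ == "__main__":
    for tag, sequences in element_structure(DOCUMENT).items():
        print(tag, sequences)
```

A DTD or XSD goes further than this sketch. It states which arrangements are permitted (for example, that an element may repeat), rather than listing the arrangements one instance happens to have.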
A DTD declares a set of XML element names and how they can be used in a document. The following lines, extracted from a DTD, describe a set of valid XML documents:
<!ELEMENT contacts (contact+)>
<!ELEMENT contact (name, birthdate)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT birthdate (#PCDATA)>
Structural metadata tells us how data are grouped and put in ordered arrangements with other data. This DTD sample indicates that a “contacts” element contains one or more “contact” elements. A “contact” element contains the elements “name” and “birthdate”, and the “name” and “birthdate” elements contain data.
Semantic Metadata
Semantic metadata adds relationships, rules, and constraints to syntactic and structural metadata. This metadata describes contextually relevant or domain-specific information about content, based on a domain-specific metadata model or ontology, and thus provides a context for interpretation. In a sense, it captures a meaning associated with the content. If a formal ontology is used to describe and interpret this type of metadata, then it lends itself to machine processing and hence to higher degrees of automation.
Semantic metadata provides a means for high-precision searching and, perhaps most importantly, it enables interoperability among heterogeneous data sources. Semantic metadata is used to give meaning to the elements described by the syntactic and structural metadata. These metadata elements allow applications to “understand” the actual meaning of the data.
By creating a metadata model of data, information, and relationships, we can use reasoning capabilities to draw conclusions and make discoveries. These include inference engines that draw logical conclusions from the metadata model, and graph-based processing for path identification and ranking, which leads to mining and discovery.
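Both kinds of reasoning mentioned above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not an inference engine. The facts are hypothetical (subject, predicate, object) triples modeled on the gift and money-laundering scenarios in the example that follows. The single rule is hard-coded, and forward chaining re-applies it until no new fact appears. A breadth-first search then looks for a chain of relationships between two entities. All names and predicates are illustrative.

```python
# Minimal sketches of (1) rule-based inference by forward chaining and
# (2) graph path identification over a metadata model. Assumptions: the facts,
# predicate names, and the rule are hypothetical illustrations, not the API of
# any real reasoner.
from collections import deque

facts = {
    ("ABCCompany", "sendsYearlyGiftTo", "VeryGoodCustomer"),
    ("John", "isA", "VeryGoodCustomer"),
    ("Alice", "businessPartnerOf", "Bob"),
    ("Bob", "onList", "MoneyLaunderingList"),
}


def gift_rule(known):
    """If company C gifts members of class K and X is a K, infer (C, willShipGiftTo, X)."""
    derived = set()
    for company, p1, klass in known:
        if p1 != "sendsYearlyGiftTo":
            continue
        for person, p2, klass2 in known:
            if p2 == "isA" and klass2 == klass:
                derived.add((company, "willShipGiftTo", person))
    return derived


def forward_chain(known, rules):
    """Apply every rule repeatedly until no new fact is produced (a fixpoint)."""
    known = set(known)
    while True:
        new = set().union(*(rule(known) for rule in rules)) - known
        if not new:
            return known
        known |= new


def find_path(known, start, goal):
    """Breadth-first search for a shortest chain of triples from start to goal.

    Edges are followed only in their stated subject-to-object direction.
    """
    edges = {}
    for s, p, o in known:
        edges.setdefault(s, []).append((p, o))
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for p, o in edges.get(node, []):
            if o not in seen:
                seen.add(o)
                queue.append((o, path + [(node, p, o)]))
    return None


if __name__ == "__main__":
    inferred = forward_chain(facts, [gift_rule])
    print(("ABCCompany", "willShipGiftTo", "John") in inferred)
    print(find_path(facts, "Alice", "MoneyLaunderingList"))
```

In reality, a business partnership is symmetric, but this search follows edges only in one direction. A fuller model would declare such a property symmetric (OWL provides owl:SymmetricProperty for this) and let the reasoner add the reverse edges.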
For instance, if we know that the ABC Company sends a gift every year to its very good customers, and that John is a very good customer, then by inference we know that the company will ship a gift to John next year. Similarly, suppose a potential customer has a business partnership with a person who is on the Bank of England list of people involved in money laundering. That potential customer is then a suspect according to the government's anti-money-laundering regulations.
Figure 1-5 shows the types of metadata we have discussed.
Creating and Extracting Semantic Metadata
To extract optimal value from a document and make it usable, the document needs to be effectively tagged by analyzing it and extracting the relevant information of semantic interest. Many techniques can be used to achieve this, based on extracting syntactic and semantic metadata from documents. These include:
– Semantic lexicons, nomenclatures, reference sets, and thesauri: match words, phrases, or parts of speech against a static or periodically maintained dictionary and thesaurus. WordNet is one such semantic lexicon. It groups English words into sets of synonyms called synsets and records semantic relations between these synsets. A lexicon like this can be used to identify and match terms in different directions, finding words that mean the same thing or that are more general or more specific. WordNet supports various types of relationships, such as synonymy, hypernymy, hyponymy, holonymy, and meronymy. These can be used effectively to find relationships between words and extract their meaning.
– Document analysis: look for patterns and co-occurrences, and apply predefined rules to find interesting patterns within and across documents. Regular expressions and relationships between words can be used to understand the meaning of documents.
– Ontologies: capture domain-specific (application or industry) knowledge, including entities and relationships. This knowledge is captured both at a definitional level (e.g., a company has a CEO) and, as real-world facts (e.g., Meg Whitman is the CEO of eBay), at an instance or assertional level. If the deployed ontology is "one size fits all" and not domain-specific, the full potential of this approach cannot be exploited.
The last option, also known as ontology-driven metadata extraction, is the most flexible (assuming the ontology is kept up to date to reflect changes in the real world). It is also the most comprehensive, since it allows modeling of fact-based, domain-specific relationships between entities, which are at the heart of semantic representations.
EMPIRICAL CONSIDERATIONS ON THE USE OF SEMANTICS AND ONTOLOGIES
References
Jorge Cardoso (University of Coimbra) and Amit Sheth (University of South Carolina). The Semantic Web and Its Applications.