CSBP 461 Internet Computing: XML Basics (Part1) PDF

Document Details

2020

Dr. M. Elarbi Badidi

Tags

XML Internet computing Web development Computer science

Summary

This document is a lecture presentation on XML basics for the course CSBP 461 Internet Computing, taken from Dr. M. Elarbi Badidi's lecture notes for Fall 2020.

Full Transcript

CSBP 461 Internet Computing: XML Basics (Part1)
Dr. M. Elarbi Badidi
Fall 2020

Objectives
▪ Introduce XML concepts
▪ Introduce the technologies for describing XML: DTD and XML Schema

XML Overview
▪ When people refer to XML, they are typically referring to XML and its related technologies.

XML Resources
▪ XML 1.0 Specification: http://www.w3.org/TR/REC-xml
▪ WWW Consortium's Home Page on XML: http://www.w3.org/XML/
▪ Apache XML Project: http://xml.apache.org/
▪ XML Resource Collection: http://xml.coverpages.org/xml.html
▪ O'Reilly XML Resource Center: http://www.xml.com/

XML Overview
▪ EXtensible Markup Language (XML) is a meta-language that describes the content of the document (self-describing data)
  – Java = Portable Programs
  – XML = Portable Data
▪ XML does not specify the tag set or grammar of the language
  o Tag set: markup tags that have meaning to a language processor
  o Grammar: defines correct usage of a language's tags

XML Overview, cont.
▪ eXtensible Markup Language (XML) is a language for defining markup languages
▪ HTML is an example of a well-known markup language
▪ Tags in XML are defined by the author, whereas tags in HTML are predefined by the W3C standard
▪ XML provides a portable (cross-platform) method for encapsulating and describing data
▪ An XML document is composed of elements, each consisting of an opening tag, the enclosed data, and a closing tag

Simple XML Document (1)
[Example markup lost in extraction; surviving text: Larry Brown Marty Hall...]

Simple XML Document (2)
[Example markup lost in extraction; surviving text: XML Developer's Guide Fabio Arciniegas 2001 26.95...]

XML Components
▪ Prolog
  – Defines the XML version, entity definitions, and DOCTYPE
▪ Components of the document
  – Tags and attributes
  – CDATA (character data)
  – Entities
  – Processing instructions
  – Comments

XML Overview, cont.
▪ The first line is the document prolog
▪ A single root element forms the base of the tree
▪ Tag names describe the data
▪ Additional information can be provided via attributes
▪ Applications exchanging XML need a common understanding of the semantic information provided by descriptive tag names and attributes.
▪ Advanced example: XHTML is HTML restructured to conform to the rules of XML.

XML Prolog
▪ XML files should start with a prolog (the XML declaration is optional in XML 1.0 but recommended)
▪ When the XML declaration is present, the version of XML is required
▪ The encoding identifies the character set (default UTF-8)
▪ The value standalone identifies whether an external document is referenced for DTD or entity definitions
  – Note: the prolog can contain entities and DTD definitions

XML Elements
▪ An XML element is an XML tag and the data it encapsulates
▪ Tag names
  – Are case sensitive
  – Start with a letter or underscore
  – After the first character, digits, hyphens (-), and periods (.) are allowed
  – Cannot contain whitespace
  – Avoid use of the colon except for indicating namespaces (discussed later)
▪ Element contents must be character data in the encoding character set (no binary data)
▪ For a well-formed XML document
  – Every start tag must have a matching end tag
  – All tags must be properly nested (tags cannot overlap)

XML Elements, cont.
▪ Tags can also have attributes
[Example markup lost in extraction; surviving text: I have started XML basics. did you reach this chapter?]

XML Element Attributes
▪ The opening tag of an XML element may contain attributes
  – Attributes provide metadata for the element.
  – Attribute names must adhere to the same rules as element names.
  – Attribute values are separated from the attribute name by an equal sign and must be enclosed in quotation marks, either the straight double quote (") or the apostrophe ('). Multiple attributes are separated by whitespace, with no commas in between.
▪ For every attribute there must be a value, even if the value is an empty string
▪ No duplicate attributes within a single element

XML Comments
▪ Comments start with <!-- and end with -->,
▪ Comments cannot include a string of consecutive dashes (--), and
▪ They may appear almost anywhere in the document outside other markup (not inside a tag and not before the XML declaration), because they are not XML elements.

Processing Instructions
▪ XML processing instructions begin with <? and end with ?>. XML processors are designed to recognize certain targets and execute specific logic.
▪ Example
[Example markup lost in extraction; surviving text: 37 49.99 0130897930 Core Web Programming Second Edition Marty Hall Larry Brown]

Document Entities
▪ Entities refer to a data item, typically text
  – General entity references start with & and end with ;
  – The entity reference is replaced by its true value when parsed
  – The characters < > & " ' require entity references to avoid conflicts with the XML application (parser): &lt; &gt; &amp; &quot; &apos;
▪ Entities are user definable
[Example DOCTYPE markup lost in extraction; surviving text: Core Web Programming, &COPYRIGHT;]

Document Entities, cont.
▪ Character entities represent a single character for which, possibly, no keyboard combination exists (such as à). They can be used only in text, not in element or attribute names. They can be numbered (e.g., &#224;) or named (e.g., &agrave;); a named character entity must be declared in a DTD, because XML predefines only the five entities listed above. The number in a numbered entity represents a code point in the Unicode set.
▪ Enclosing text and possibly markup in a CDATA section instructs the XML parser not to attempt to parse it. A CDATA section begins with <![CDATA[ and ends with ]]>. A CDATA section may contain any characters except the CDATA ending sequence.

Well-Formed versus Valid
▪ An XML document is well-formed if it follows the basic syntax rules.
▪ An XML document is valid if its structure matches a Document Type Definition (DTD).
▪ Unlike HTML parsers, XML parsers must report errors and may not replace missing quotes, close unclosed tags, or silently rearrange overlapping tags based on an assumption about the intended meaning.
▪ Some commonly abused XML syntax rules are:
  1) Element and attribute names must be legal XML names;
  2) The characters < and & must be escaped as entity references when used in text;
  3) Every element must be closed;
  4) Attributes must have values, and values must be delimited with quotation marks;
  5) Every element except the root element must be the child of exactly one element;
  6) Comments must be properly formed; in particular, a comment may not contain the string "--".

Namespaces
▪ Use XML Namespaces to prevent name collisions among element and attribute names, which can be caused by designers choosing their own element names that conflict with imported elements defined in other XML documents.
▪ Namespaces are declared by adding an xmlns attribute to an element, where the value of the xmlns attribute is a unique URI (not necessarily a valid URL).
▪ The element with the xmlns attribute and all of its children (nested elements) inherit the namespace; others that are not nested are not affected.
▪ xmlns can also be used repeatedly with different qualifiers (prefixes), in the form xmlns:prefix="URI". Then use the prefix to associate a namespace with a specific element (qualify it).

Example Name Conflicts
▪ In XML, element names are defined by the developer. This often results in a conflict when trying to mix XML documents from different XML applications.
▪ This XML carries HTML table information:
[Example markup lost in extraction; surviving text: Apples Bananas]
▪ This XML carries information about a table (a piece of furniture):
[Example markup lost in extraction; surviving text: African Coffee Table 80 120]
▪ If these XML fragments were added together, there would be a name conflict. Both contain a <table> element, but the elements have different content and meaning.
▪ An XML parser will not know how to handle these differences.

Example, cont.
Solving the Name Conflict Using a Prefix
▪ Name conflicts in XML can easily be avoided using a name prefix.
▪ This XML carries information about an HTML table and a piece of furniture:
[Example markup lost in extraction; surviving text: Apples Bananas African Coffee Table 80 120]
▪ In the example above, there will be no conflict because the two <table> elements have different names.

Example, cont.
XML Namespaces - The xmlns Attribute
▪ When using prefixes in XML, a so-called namespace for the prefix must be defined.
[Example markup lost in extraction; surviving text: Apples Bananas African Coffee Table 80 120]
▪ In the example above, the xmlns attributes in the <table> tags give the h: and f: prefixes a qualified namespace.
▪ When a namespace is defined for an element, all child elements with the same prefix are associated with the same namespace.

Example, cont.
▪ Namespaces can be declared in the elements where they are used or in the XML root element:
[Example markup lost in extraction; surviving text: Apples Bananas African Coffee Table 80 120]
▪ Note: The namespace URI is not used by the parser to look up information. The purpose is to give the namespace a unique name. However, companies often use the namespace as a pointer to a web page containing namespace information.

Namespaces, cont.
▪ A default namespace can be defined by using an unqualified xmlns attribute, in the form xmlns="URI", on the root element of the XML document. Unprefixed elements fall under the default namespace; unprefixed attributes do not.
▪ Namespace qualification is not required for well-formedness, so a document may mix qualified and unqualified elements. Best practice is to make sure that all of the elements in an XML document are qualified, either by the default namespace or explicitly by a prefix.
▪ Support for namespaces has to be built into the application that processes the XML. It is up to the application processing the XML to recognize namespaces, map the namespace URI to the identifying prefix, and process elements correctly depending upon their namespace.

Validating XML Documents
▪ A well-formed document conforms to the syntax rules of XML, but it is not necessarily valid in the context of a particular application.
  For instance, a well-formed XML document describing an invoice is probably not valid in the context of an application dealing with a catalog of books.
▪ If no formal document model is defined for an XML document, the document must still be well-formed, but there are no limits on the element names used, the structure or contents of the elements, or the use of attributes. For complex documents or documents that will be used across organizational boundaries, a more formal definition of validity is needed.
▪ Two popular solutions are Document Type Definition (DTD) and XML Schema

Document Type Definition (DTD)
▪ Defines the structure of the document
  – Allowable tags and their attributes
  – Attribute value constraints
  – Nesting of tags
  – Number of occurrences for tags
  – Entity definitions
▪ A DTD is a sequence of these declarations enclosed in a DOCTYPE declaration or stored separately and referred to from a DOCTYPE

XML DOCTYPE
▪ Document Type Declarations
  – Specify the location of the DTD defining the syntax and structure of elements in the document
  – Common forms:
      <!DOCTYPE root [DTD]>
      <!DOCTYPE root SYSTEM "URL">
      <!DOCTYPE root PUBLIC "FPI" "URL">
  – The root identifies the starting element (root element) of the document
  – The DTD can be external to the XML document, referenced by a SYSTEM or PUBLIC URL
    o SYSTEM URL refers to a private DTD, located on the local file system or an HTTP server
    o PUBLIC URL refers to a DTD intended for public use

DTD in XML Prolog (Internal Subset)
[Example markup lost in extraction; surviving text: Boss Troops 15 April 1951 The buck stops here.]
▪ Parentheses are grouping operators and commas are sequence ("and") operators
▪ #PCDATA means parsed character data

DTD in XML Prolog (Internal Subset)
[Example markup lost in extraction; surviving text: Ahmed Salem...]

External Subset DTD
▪ An external subset DTD is specified in the DOCTYPE declaration using the SYSTEM keyword
▪ The DTD definition is stored in its own file, and the XML document looks like the following:
[Example markup lost in extraction; surviving text: Boss Troops 15 April 1951 The buck stops here.]
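
The internal-subset and user-defined entity slides above can be illustrated with a short Python sketch. This is only an illustration under stated assumptions: the slides name the entity COPYRIGHT and the title "Core Web Programming", but the element names (book, title), the ELEMENT declarations, and the entity's replacement text below are made up for the example. Python's standard xml.etree.ElementTree parser is non-validating, so it expands the entity declared in the internal subset but does not check the document against the ELEMENT declarations.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: only the entity name COPYRIGHT and the book title
# come from the slides. The element names and the entity's replacement text
# are assumptions made for this sketch.
DOC = """<?xml version="1.0"?>
<!DOCTYPE book [
  <!ELEMENT book (title)>
  <!ELEMENT title (#PCDATA)>
  <!ENTITY COPYRIGHT "(c) 2001 Example Press">
]>
<book>
  <title>Core Web Programming, &COPYRIGHT;</title>
</book>"""

# The parser replaces &COPYRIGHT; with the text declared in the internal subset.
root = ET.fromstring(DOC)
title = root.find("title").text
print(title)

# The five predefined entities need no declaration at all.
escaped = ET.fromstring("<p>1 &lt; 2 &amp; 3</p>").text
print(escaped)
```

Checking the document against its DTD (validity, as opposed to well-formedness) would need a validating parser, which this standard-library module is not.
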
Specifying a PUBLIC DTD
▪ The Formal Public Identifier (FPI) has four parts:
  1) Connection of the DTD to a formal standard
     o - if you are defining the DTD yourself
     o + if a nonstandards body has approved the DTD
     o ISO if approved by a formal standards committee
  2) Group responsible for the DTD
  3) Description and type of document
  4) Language used in the DTD

PUBLIC DOCTYPE Examples
[Example markup lost in extraction]

Defining Elements
▪ <!ELEMENT name definition/type>
▪ Types
  – ANY: any well-formed XML data
  – EMPTY: element cannot contain any text or child elements
  – #PCDATA: character data only (should not contain markup)
  – elements: list of legal child elements (no character data)
  – mixed: may contain character data and/or child elements (cannot constrain order and number of child elements)

Defining Elements, cont.
▪ Cardinality
  – [none]: default (one and only one instance)
  – ?: 0 or 1
  – *: 0, 1, …, N
  – +: 1, 2, …, N
▪ List operators
  – , (comma): sequence (in order)
  – | (vertical bar): choice (one of several)

Grouping Elements
▪ A set of elements can be grouped within parentheses
  – (Elem1?, Elem2?)+
    o Elem1 can occur 0 or 1 times, followed by 0 or 1 occurrences of Elem2
    o The group (sequence) must occur 1 or more times
▪ OR
  – ((Elem1, Elem2) | Elem3)*
    o Either the group of Elem1, Elem2 is present (in order) or Elem3 is present, 0 or more times

Element Example
[Example markup lost in extraction; surviving text: Ali Mubarak]

Defining Attributes
▪ <!ATTLIST element attrName type modifier>
▪ Examples
[Example markup lost in extraction]

Attribute Types
▪ CDATA: essentially anything; simply unparsed character data
▪ Enumeration: attribute (value1|value2|value3) [Modifier]
▪ Eight other attribute types: ID, IDREF, IDREFS, NMTOKEN, NMTOKENS, ENTITY, ENTITIES, NOTATION

Attribute Modifiers
▪ #IMPLIED: attribute is not required
▪ #REQUIRED: attribute must be present
▪ #FIXED "value": attribute is present and always has this value
▪ Default value (applies to enumeration)

Limitations of DTDs
▪ The DTD itself is not in XML format (more work for parsers)
▪ Does not express data types (weak data typing).
▪ DTDs do not support data types such as integers, decimals, booleans, or dates (enumerations are available only for attribute values)
▪ DTDs do not allow one to specify that the data appear in a specific format.
▪ No namespace support
▪ A document can override external DTD definitions
▪ No DOM support
▪ XML Schema is intended to resolve these issues, but DTDs are going to be around for a while.
▪ An XML schema is an XML document that conforms to the XML Schema specification

XML Schema
▪ W3C recommendation released May 2001
  – http://www.w3.org/TR/xmlschema-0/
  – http://www.w3.org/TR/xmlschema-1/
  – http://www.w3.org/TR/xmlschema-2/
  – Depends on the following specifications
    o XML-Infoset, XML-Namespaces, XPath
▪ Benefits:
  – Standard and user-defined data types
  – Express data types as patterns
  – Higher degree of type checking
  – Better control of occurrences

XML Schema, Example
[Example markup lost in extraction]

XML Schema, cont.
▪ The root XML element for the XML schema definition is <xsd:schema>.
▪ The xmlns:xsd attribute of the schema definition binds the namespace prefix xsd to the version of XML Schema being used, in this case http://www.w3.org/2001/XMLSchema.
▪ The targetNamespace for the XML schema is the namespace for the elements and attributes defined by the schema definition. When this schema is referenced by another XML document, the targetNamespace will be used to qualify the elements defined by this schema.
▪ The elementFormDefault attribute set to "qualified" indicates that nested elements in the XML document instance must be namespace qualified; the default is unqualified.

XML Schema, cont.
▪ XML elements are defined using the <xsd:element> tag and XML element attributes are defined using the <xsd:attribute> tag.
▪ The name and type attributes are used to define the element/attribute name and data type, respectively.
▪ Elements can be defined as either complexType or simpleType; attributes can only be simpleType. Simple types can have neither attributes nor child elements. Complex types can have either or both.
▪ XML Schema defines many built-in atomic types including strings, numbers, dates, and times.
▪ The built-in atomic types can be further constrained by a derived simple type specifying facets using the <xsd:restriction> element, e.g., minLength and maxLength.

XML Schema, cont.
▪ Example complexType and simpleType:
[Example markup lost in extraction]

Summary
▪ XML is a self-describing meta-language for data
▪ DOCTYPE defines the root element and location of the DTD
▪ Document Type Definition (DTD) defines the grammar of the document
  – Required to validate the document
  – Constrains grouping and cardinality of elements
▪ DTD processing is expensive
▪ Schema uses XML to specify the grammar
  – More complex to express but easier to process
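
As a closing illustration of the well-formedness rules and the prefixed-namespace example from the earlier slides, here is a minimal Python sketch using the standard xml.etree.ElementTree module. It rests on assumptions: the h: and f: prefixes and the text values come from the slides, but the namespace URIs, the enclosing root element, and the helper name is_well_formed are hypothetical and chosen only for this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        # An XML parser must report the error rather than guess a repair.
        return False


# Placeholder URIs (assumed): a namespace URI only has to be unique,
# the parser never fetches it.
HTML_NS = "http://example.org/html"
FURNITURE_NS = "http://example.org/furniture"

DOC = f"""<root xmlns:h="{HTML_NS}" xmlns:f="{FURNITURE_NS}">
  <h:table>
    <h:tr><h:td>Apples</h:td><h:td>Bananas</h:td></h:tr>
  </h:table>
  <f:table>
    <f:name>African Coffee Table</f:name>
    <f:width>80</f:width>
    <f:length>120</f:length>
  </f:table>
</root>"""

# The application supplies its own prefix-to-URI map when querying.
NS = {"h": HTML_NS, "f": FURNITURE_NS}

root = ET.fromstring(DOC)
cells = [td.text for td in root.findall("h:table/h:tr/h:td", NS)]
furniture_name = root.find("f:table/f:name", NS).text

# ElementTree stores each qualified name as "{namespace-uri}localname",
# so the two table elements no longer collide.
tags = [child.tag for child in root]

print(cells, furniture_name, tags)
print(is_well_formed("<a><b></a></b>"))  # overlapping tags
print(is_well_formed("<a x=1/>"))        # unquoted attribute value
```

The prefix map passed to find and findall is independent of the prefixes used in the document, which reflects the point above that the application, not the prefix, decides how a namespace URI is handled.
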
