CSBP 461 Internet Computing: XML Basics (Part 1)
2020
Dr. M. Elarbi Badidi
Summary
These are lecture notes about XML basics. The document provides an overview of XML and describes XML components, including tags, attributes, and comments. Includes examples of XML code.
CSBP 461 Internet Computing: XML Basics (Part 1)
Dr. M. Elarbi Badidi
Fall 2020

Objectives
▪ Introduce XML concepts
▪ Introduce the technologies for describing XML: DTD and XML Schema

XML Overview
▪ When people refer to XML, they typically mean XML together with its related technologies.

XML Resources
▪ XML 1.0 Specification: http://www.w3.org/TR/REC-xml
▪ World Wide Web Consortium's home page on XML: http://www.w3.org/XML/
▪ Apache XML Project: http://xml.apache.org/
▪ XML Resource Collection: http://xml.coverpages.org/xml.html
▪ O'Reilly XML Resource Center: http://www.xml.com/

XML Overview
▪ eXtensible Markup Language (XML) is a meta-language that describes the content of a document (self-describing data).
  Java = Portable Programs
  XML = Portable Data
▪ XML does not specify the tag set or the grammar of the language:
  o Tag set: the markup tags that have meaning to a language processor
  o Grammar: defines the correct usage of a language's tags

XML Overview, cont.
▪ eXtensible Markup Language (XML) is a language for defining markup languages.
▪ HTML is an example of a well-known markup language.
▪ Tags in XML are defined by the author, whereas tags in HTML are predefined by the W3C standard.
▪ XML provides a portable (cross-platform) method for encapsulating and describing data.
▪ An XML document is composed of elements, each consisting of an opening tag, a closing tag, and the data between them.

Simple XML Document (1)
Larry Brown Marty Hall...

Simple XML Document (2)
XML Developer's Guide Fabio Arciniegas 2001 26.95...

XML Components
▪ Prolog: defines the XML version, entity definitions, and DOCTYPE
▪ Components of the document:
  o Tags and attributes
  o CDATA (character data)
  o Entities
  o Processing instructions
  o Comments

XML Overview, cont.
▪ The first line is the document prolog.
▪ A single root element forms the base of the tree.
▪ Tag names describe the data.
▪ Additional information can be provided via attributes.
▪ Applications exchanging XML need a common understanding of the semantic information provided by descriptive tag names and attributes.
▪ Advanced example: XHTML is HTML restructured to conform to the rules of XML.

XML Prolog
▪ An XML file begins with a prolog. Its first line is normally the XML declaration, which is optional in XML 1.0 but recommended.
▪ In the XML declaration, the XML version is required.
▪ The encoding identifies the character set (default UTF-8).
▪ The standalone value indicates whether the document depends on external markup declarations (an external DTD or entity definitions).
▪ Note: the prolog can also contain entity and DTD definitions.

XML Elements
▪ An XML element is an XML tag together with the data it encapsulates (e.g., 5.50).
▪ Tag names:
  o Are case sensitive
  o Start with a letter or underscore
  o After the first character, may also contain digits, hyphens (-), and periods (.)
  o Cannot contain whitespace
  o Should not use colons except to indicate namespaces (discussed later)
▪ Element content must be character data in the document's encoding. Binary data is not allowed.
▪ For a well-formed XML document:
  o Every tag must have an end tag …
  o All tags must be properly nested (tags cannot overlap)

XML Elements, cont.
▪ Tags can also have attributes:
  I have started XML basics. Did you reach this chapter?

XML Element Attributes
▪ The opening tag of an XML element may contain attributes. Attributes provide metadata for the element.
▪ Attribute names must follow the same rules as element names.
▪ An attribute value is separated from the attribute name by an equals sign. It must be enclosed in quotation marks, either straight double quotes (") or apostrophes ('). Multiple attributes are separated by whitespace, not commas.
▪ Every attribute must have a value, even if the value is an empty string.
▪ An element cannot contain duplicate attributes.

XML Comments
▪ Comments start with <!-- and end with -->.
▪ Comments cannot contain two consecutive hyphens (--).
▪ Comments are not elements, so they can appear almost anywhere in the document outside other markup. They cannot appear inside a tag or before the XML declaration.

Processing Instructions
▪ XML processing instructions begin with <? and end with ?>. XML processors are designed to recognize certain targets and execute specific logic for them.
▪ Example
  37 49.99 0130897930 Core Web Programming Second Edition Marty Hall Larry Brown

Document Entities
▪ Entities refer to a data item, typically text.
  o General entity references start with & and end with ;
  o When the document is parsed, the entity reference is replaced by its actual value.
  o The characters < > & " ' have predefined entity references: &lt; &gt; &amp; &quot; &apos;
  o The characters < and & must always be escaped in text. The others are needed only where they would conflict with the parser, such as a quote inside an attribute value delimited by the same quote.
▪ Entities are user definable
  Core Web Programming, &COPYRIGHT;

Document Entities, cont.
▪ Character entities represent a single character for which no keyboard combination may exist (such as à).
  o They can be used only in text content and attribute values, not in element or attribute names.
  o They can be numbered (e.g., &#224;) or named (e.g., &agrave;). A named entity such as &agrave; must first be declared in a DTD, because XML predefines only the five entities above.
  o The number in a numbered entity is the character's Unicode code point.
▪ Enclosing text, and possibly markup, in a CDATA section instructs the XML parser not to parse it.
  o A CDATA section begins with <![CDATA[ and ends with ]]>.
  o It may contain any characters except the ending sequence ]]>.

Well-Formed versus Valid
▪ An XML document is well-formed if it follows the basic XML syntax rules.
▪ An XML document is valid if it is well-formed and its structure matches a Document Type Definition (DTD).
▪ Unlike HTML parsers, XML parsers must report errors. They must not supply missing quotes, close unclosed tags, or silently rearrange overlapping tags based on an assumption about the intended meaning.
▪ Some commonly abused XML syntax rules are:
  1) Element and attribute names must be legal XML names.
  2) The characters < and & must be escaped as entity references when used in text.
  3) Every element must be closed.
  4) Attributes must have values, and values must be delimited with quotation marks.
  5) Every element except the root element must be the child of exactly one element.
  6) Comments must be properly formed. In particular, a comment may not contain the string "--".

Validating XML Documents
▪ A well-formed document conforms to the syntax rules of XML, but it is not necessarily valid in the context of a particular application. For instance, a well-formed XML document describing an invoice is probably not valid in the context of an application dealing with a catalog of books.
▪ If no formal document model is defined for an XML document, the document must still be well-formed. However, there are then no limits on the element names used, the structure or contents of the elements, or the use of attributes. Complex documents, or documents that will be used across organizational boundaries, need a more formal definition of validity.
▪ Two popular solutions are Document Type Definition (DTD) and XML Schema.

Document Type Definition (DTD)
▪ A DTD defines the structure of the document:
  o Allowable tags and their attributes
  o Constraints on attribute values
  o Nesting of tags
  o Number of occurrences of tags
  o Entity definitions
▪ A DTD is a sequence of these declarations. It is either enclosed in a DOCTYPE declaration, or stored separately and referenced from a DOCTYPE declaration.

XML DOCTYPE
▪ The document type declaration specifies the location of the DTD that defines the syntax and structure of elements in the document.
▪ Common forms:
  <!DOCTYPE root [DTD]>
  <!DOCTYPE root SYSTEM URL>
  <!DOCTYPE root PUBLIC FPI URL>
▪ root names the starting element (root element) of the document.
▪ The DTD can be external to the XML document, referenced by a SYSTEM or PUBLIC URL:
  o A SYSTEM URL refers to a private DTD, located on the local file system or an HTTP server.
  o A PUBLIC URL refers to a DTD intended for public use.

DTD in XML Prolog (Internal Subset)
  Boss Troops 15 April 1951 The buck stops here.
▪ Parentheses are grouping operators. Commas separate the items of a sequence, so they act as "and" operators.
▪ #PCDATA means parsed character data.

DTD in XML Prolog (Internal Subset)
  Ahmed Salem...

External Subset DTD
▪ An external subset DTD is specified in the DOCTYPE declaration using the SYSTEM keyword.
▪ The DTD definition is stored in its own file, and the XML document looks like the following:
  Boss Troops 15 April 1951 The buck stops here.
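A non-validating parser shows the well-formed versus valid distinction in practice. The following is a minimal sketch, assuming Python 3 and its standard-library xml.etree.ElementTree module (which uses the expat parser). It reports a violation of each of several syntax rules listed above as a parse error. It nonetheless accepts a document whose content breaks its own internal-subset DTD, because expat checks well-formedness only. The helper name check_well_formed, the sample documents, and the entity text "(c) placeholder" are hypothetical.

```python
import xml.etree.ElementTree as ET


def check_well_formed(text: str) -> str:
    """Return "well-formed", or the parser's error message.

    ElementTree uses expat, a non-validating parser: it enforces the XML
    syntax rules but never checks elements against a DTD's content model.
    """
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        return f"not well-formed: {err}"
    return "well-formed"


# Hypothetical samples, mostly one per commonly abused syntax rule.
samples = {
    "unclosed tag": "<book><title>Core Web Programming</book>",
    "unescaped &": "<title>Tom & Jerry</title>",
    "unquoted attribute value": "<book isbn=0130897930/>",
    "duplicate attribute": '<book isbn="1" isbn="2"/>',
    "'--' inside a comment": "<book><!-- a -- b --></book>",
    "undeclared entity": "<title>Core Web Programming, &COPYRIGHT;</title>",
    "two root elements": "<memo/><memo/>",
    # Well-formed but invalid: the DTD requires <from> before <to>.
    # A non-validating parser accepts it anyway.
    "well-formed but invalid": (
        "<!DOCTYPE memo [\n"
        "  <!ELEMENT memo (from, to)>\n"
        "  <!ELEMENT from (#PCDATA)>\n"
        "  <!ELEMENT to (#PCDATA)>\n"
        "]>\n"
        "<memo><to>Troops</to></memo>"
    ),
    # Declaring the entity in the internal subset makes the reference legal;
    # expat expands it while parsing.
    "declared entity": (
        '<!DOCTYPE title [<!ENTITY COPYRIGHT "(c) placeholder">]>\n'
        "<title>Core Web Programming, &COPYRIGHT;</title>"
    ),
}

for label, doc in samples.items():
    print(f"{label}: {check_well_formed(doc)}")
```

Python's standard library has no DTD-validating parser, so checking validity needs a validating tool. Python can still confirm well-formedness, which every valid document also requires.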
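The parser also records the DOCTYPE declaration, so an application can see which DTD a document names even though the external DTD itself is never loaded. The following is a minimal sketch, assuming Python's standard-library xml.dom.minidom, whose DocumentType node exposes name, publicId, and systemId. The file name memo.dtd and the helper names describe_doctype and split_fpi are hypothetical. The XHTML 1.0 Strict identifiers are the ones published by the W3C. The sketch also splits a PUBLIC identifier on // into the four parts of a Formal Public Identifier, the format covered on the "Specifying a PUBLIC DTD" slide.

```python
from xml.dom import minidom

# Hypothetical document whose external subset is referenced with SYSTEM.
SYSTEM_DOC = (
    '<?xml version="1.0"?>\n'
    '<!DOCTYPE memo SYSTEM "memo.dtd">\n'
    "<memo><from>Boss</from><to>Troops</to></memo>"
)

# A document referencing a public DTD (the W3C XHTML 1.0 Strict identifiers).
PUBLIC_DOC = (
    '<?xml version="1.0"?>\n'
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"\n'
    '  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">\n'
    '<html xmlns="http://www.w3.org/1999/xhtml"/>'
)


def describe_doctype(text: str):
    """Return (root name, public ID, system ID), or None without a DOCTYPE.

    minidom records the declaration but neither fetches nor validates
    against the external DTD it points to.
    """
    doctype = minidom.parseString(text).doctype
    if doctype is None:
        return None
    return doctype.name, doctype.publicId, doctype.systemId


def split_fpi(fpi: str) -> list:
    """Split a Formal Public Identifier into its '//'-separated parts."""
    return fpi.split("//")


print(describe_doctype(SYSTEM_DOC))
print(describe_doctype(PUBLIC_DOC))
print(split_fpi("-//W3C//DTD XHTML 1.0 Strict//EN"))
```

For the XHTML identifier, the four parts are "-" (not approved by a formal standards body), "W3C" (the responsible group), "DTD XHTML 1.0 Strict" (description and type), and "EN" (language).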
Specifying a PUBLIC DTD
▪ The Formal Public Identifier (FPI) has four parts, separated by //:
  1) Connection of the DTD to a formal standard:
     - if you are defining the DTD yourself
     + if a nonstandards body has approved the DTD
     ISO if the DTD was approved by a formal standards committee
  2) Group responsible for the DTD
  3) Description and type of document
  4) Language used in the DTD

PUBLIC DOCTYPE Examples

Defining Elements
▪ <!ELEMENT name content-model>
▪ Types:
  – ANY: any well-formed XML data
  – EMPTY: the element cannot contain any text or child elements
  – #PCDATA: character data only (should not contain markup)
  – elements: a list of legal child elements (no character data)
  – mixed: may contain character data and/or child elements (the order and number of child elements cannot be constrained)

Defining Elements, cont.
▪ Cardinality:
  [none] default (one and only one instance)
  ? 0, 1
  * 0, 1, …, N
  + 1, 2, …, N
▪ List operators:
  , sequence (in order)
  | choice (one of several)

Grouping Elements
▪ A set of elements can be grouped within parentheses.
▪ (Elem1?, Elem2?)+
  o Elem1 can occur 0 or 1 times, followed by 0 or 1 occurrences of Elem2.
  o The group (sequence) must occur 1 or more times.
▪ ((Elem1, Elem2) | Elem3)*
  o Either the group Elem1, Elem2 (in order) or Elem3 is present, 0 or more times.

Element Example...
  Ali Mubarak

Defining Attributes
▪ <!ATTLIST element attrName type modifier>
▪ Examples

Attribute Types
▪ CDATA: essentially any character data
▪ Enumeration: (value1|value2|value3) [Modifier]
▪ Eight other attribute types: ID, IDREF, IDREFS, NMTOKEN, NMTOKENS, ENTITY, ENTITIES, NOTATION

Attribute Modifiers
▪ #IMPLIED: the attribute is not required
▪ #REQUIRED: the attribute must be present
▪ #FIXED "value": the attribute is present and always has this value
▪ Default value: a literal value used when the attribute is omitted (for an enumeration, one of the listed values)

Limitations of DTDs
▪ A DTD itself is not in XML format, which means more work for parsers.
▪ DTDs do not express data types (weak data typing).
  o DTDs do not support data types such as integers, decimals, booleans, or dates. Enumerations are available only for attribute values, not for element content.
  o DTDs do not allow one to specify that the data appear in a specific format.
▪ No namespace support
▪ A document's internal subset can override external DTD definitions
▪ No DOM support
▪ XML Schema is intended to resolve these issues, but … DTDs are going to be around for a while.
▪ An XML schema is an XML document that conforms to the XML Schema specification.
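The weak typing listed among the DTD limitations shows up directly in application code. A DTD can only say that an element holds #PCDATA, so the parser hands back strings and any numeric interpretation is left to the application. The following is a minimal sketch, assuming Python's standard-library xml.etree.ElementTree and decimal modules. The element names book, price, and isbn and the helper read_price are hypothetical. The values 49.99 and 0130897930 come from the processing-instruction example earlier in these notes.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal, InvalidOperation

# Hypothetical document: the DTD can only declare both fields as #PCDATA.
DOC = """<!DOCTYPE book [
  <!ELEMENT book (price, isbn)>
  <!ELEMENT price (#PCDATA)>
  <!ELEMENT isbn (#PCDATA)>
]>
<book><price>49.99</price><isbn>0130897930</isbn></book>"""


def read_price(xml_text: str) -> Decimal:
    """Convert the <price> text to a Decimal; the DTD cannot enforce a number."""
    text = ET.fromstring(xml_text).findtext("price")
    try:
        return Decimal(text)
    except InvalidOperation:
        raise ValueError(f"price is not a number: {text!r}") from None


root = ET.fromstring(DOC)
isbn_text = root.findtext("isbn")  # a str, like every #PCDATA value

print(read_price(DOC))            # the application, not the DTD, does the typing
print(isbn_text, int(isbn_text))  # int() would drop the ISBN's leading zero

# "#PCDATA" accepts any text, so this price satisfies the DTD just as well;
# only the application-level conversion catches the problem.
try:
    read_price(DOC.replace("49.99", "forty-nine"))
except ValueError as err:
    print(err)
```

XML Schema addresses this by letting the schema itself declare types such as decimals, so a validating parser can reject a non-numeric price. With a DTD, that check stays in application code, as in read_price.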