Cloud Computing Architecture - Deployment Models
Document Details
Uploaded by RightfulCoralReef
Summary
This document describes different cloud computing deployment models, including public, private, hybrid, and community clouds. It details the characteristics, examples, and potential risks associated with each model, covering topics like workload relocation and security considerations.
Full Transcript
**CLOUD COMPUTING** **ARCHITECTURE - Deployment Models**

- Public Cloud
- Private Cloud
- Hybrid Cloud
- Community Cloud

### Public Cloud

- Cloud infrastructure is provisioned for open use by the general public. It may be owned, managed, and operated by a business, academic, or government organization, or some combination of them. It exists on the premises of the cloud provider.
- Examples of Public Cloud:
  - Google App Engine
  - Microsoft Windows Azure
  - IBM Smart Cloud
  - Amazon EC2
- In a public setting, the provider's computing and storage resources are
- In the public scenario, a provider may migrate a subscriber's workload, whether processing or data, at any time.
  - Workloads can be transferred to data centres where cost is low.
  - Workloads in a public cloud may be relocated anywhere at any time.
- A single machine may be shared by the workloads of any combination of subscribers (a subscriber's workload may be co-resident with the workloads of competitors or adversaries).
  - This introduces both reliability and security risk.
- Organizations considering the use of an on-site private cloud
- Subscribers connect to providers via the public Internet.
  - The connection depends on Internet infrastructure such as:
    - Domain Name System (DNS) servers
    - Router infrastructure
    - Inter-router links
- **Limited visibility and control over data regarding security (public):**
  - The details of provider system operation are usually considered proprietary information and are not divulged to subscribers.
  - In many cases, the software employed by a provider is usually proprietary and not
  - A subscriber cannot verify that data has been completely deleted from a provider's systems.
- **Elasticity: illusion of unlimited resource availability (public):**
  - Public clouds are generally unrestricted in their location or size.
  - Public clouds potentially have a high degree of flexibility in the movement of subscriber workloads to correspond with available resources.
- **Restrictive default service level agreements (public):**
  - The default service level agreements of public clouds specify

### Private Cloud

![](media/image6.jpeg)

- The cloud infrastructure is provisioned for exclusive use by a single organization comprising multiple consumers (e.g., business units). It may be owned, managed, and operated by the organization, a third party, or some combination of them, and it may exist on or off premises.
- Examples of Private Cloud:
  - Eucalyptus
  - Ubuntu Enterprise Cloud (UEC)
  - Amazon VPC (Virtual Private Cloud)
  - VMware Cloud Infrastructure Suite
  - Microsoft ECI data center
- Contrary to popular belief, a private cloud may exist off premises and can be managed by a third party. Thus, two private cloud scenarios exist, as follows:
  - On-site Private Cloud
    - Applies to private clouds implemented at a customer's premises.
  - Outsourced Private Cloud
    - Applies to private clouds where the server side is outsourced to a hosting company.

### On-site Private Cloud

- The security perimeter extends around both the subscriber's on-site
- The security perimeter does not guarantee control over the private cloud's resources, but the subscriber can exercise control over the resources.

![](media/image7.jpeg)
### On-site Private Cloud

Organizations considering the use of an on-site private cloud should consider:

- **Network dependency (on-site-private):**
- **Subscribers still need IT skills (on-site-private):**
  - Subscriber organizations will need the traditional IT skills required to manage user devices that access the private cloud, and will require cloud IT skills as well.
- **Workload locations are hidden from clients (on-site-private):**
  - To manage a cloud's hardware resources, a private cloud must be able to migrate workloads between machines without inconveniencing clients. With an on-site private cloud, however, a subscriber organization chooses the physical infrastructure, but individual clients still may not know where their workloads physically exist within the subscriber organization's infrastructure.
- Workloads of different clients may reside concurrently on the same systems and local networks, separated only by access policies implemented by a cloud provider's software. A flaw in the software or the policies could compromise the security of a subscriber organization by exposing client workloads to one another.
- On-demand bulk data import/export is limited by the on-site private cloud's network capacity, and real-time or critical processing may be problematic because of networking limitations.
- **Potentially strong security from external threats (on-site-private):**
  - In an on-site private cloud, a subscriber has the option of implementing an appropriately strong security perimeter to protect private cloud resources against external threats to the same level of security as can be achieved for non-cloud resources.
- **Significant-to-high up-front costs to migrate into the cloud (on-site-private):**
  - An on-site private cloud requires that cloud management software be installed on computer systems within a subscriber organization.
If the cloud is intended to support process-intensive or data-intensive workloads, the software will need to be installed on numerous commodity systems or on a more limited number of high-performance systems. Installing cloud software and managing the installations will incur significant
- An on-site private cloud, at any specific time, has a fixed computing and storage capacity that has been sized to correspond to anticipated workloads and cost restrictions.

### Outsourced Private Cloud

- Outsourced private cloud has two security perimeters, one implemented
- Two security perimeters are joined by a protected communications link.
- The security of data and processing conducted in the outsourced private cloud depends on the strength and availability of both security perimeters and of the protected communication link.

![](media/image8.jpeg)

Organizations considering the use of an outsourced private cloud should

- In the outsourced private scenario, subscribers may have an option to provision unique protected and reliable communication links with the provider.
- **Risks from multi-tenancy (outsourced-private):**
  - The implications are the same as those for an on-site private cloud.
- On-demand bulk data import/export is limited by the network capacity between a provider and subscriber, and real-time or critical processing may be problematic because of networking limitations. In the outsourced private cloud scenario, however, these limits may be adjusted, although not eliminated, by provisioning high-performance and/or high-reliability networking between the provider and subscriber.
- As with the on-site private cloud scenario, a variety of techniques exist to harden a security perimeter.
The main difference with the outsourced private cloud is that the techniques need to be applied both to a subscriber's perimeter and provider's perimeter, and that the communications link needs to be protected.
- **Modest-to-significant up-front costs to migrate into the cloud (outsourced-**
  - In the outsourced private cloud scenario, the resources are provisioned by the provider
  - Main start-up costs for the subscriber relate to:
    - Negotiating the terms of the service level agreement (SLA)
    - Possibly upgrading the subscriber's network to connect to the outsourced private cloud
    - Switching from traditional applications to cloud-hosted applications
    - Porting existing non-cloud operations to the cloud
    - Training
- In the case of the outsourced private cloud, a subscriber can rent resources in any quantity offered by the provider. Provisioning and operating computing equipment at scale is a core competency of providers.

### Community Cloud

- Cloud infrastructure is provisioned for exclusive use by a specific community of
- Examples of Community Cloud:
  - Google Apps for Government
  - Microsoft Government Community Cloud

![](media/image9.jpeg)

### On-site Community Cloud

- Community cloud is made up of a set of participant organizations. Each participant
  - At least one organization must provide cloud services
  - Each organization implements a security perimeter
- The participant organizations are connected via links between the
- Access policy of a community cloud may be complex
  - Example: if there are N community members, a decision must be made, either implicitly or explicitly, on how to share a member's local cloud resources with each of the other members
  - Policy specification techniques like role-based access control (RBAC),

Organizations considering the use of an on-site community cloud should consider:

- **Network Dependency (on-site community):**
  - The subscribers in an on-site community cloud need to either provision controlled inter-site communication links or use cryptography over a less controlled communications medium (such as the public Internet).
  - The reliability and security of the community cloud depends on
- Organizations in the community that provide cloud resources require IT skills similar to those required for the on-site private cloud scenario, except that the overall cloud configuration may be more complex and hence require a higher skill level.
  - Identity and access control configurations among the participant organizations may be complex
- The communication links between the various participant organizations in a community cloud can be provisioned to various levels of performance, security and reliability, based on the needs of the participant organizations. The network-based limitations are thus similar to those of the outsourced-private cloud scenario.
- The security of a community cloud from external threats depends on the security of all the security perimeters of the participant organizations and the strength of the communications links. These dependencies are essentially similar to those of the outsourced private cloud scenario, but with possibly more links and security perimeters.
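The access-policy point made for the on-site community cloud (N members, each deciding how to share its local resources with every other member) can be made concrete with a small sketch. This is only an illustration under stated assumptions: the member names, role names, and permissions are hypothetical, and assigning one role per (owner, consumer) pair is a deliberately simplified stand-in for a full RBAC model, not a description of any particular product.

```python
from itertools import permutations

# Hypothetical community members, for illustration only.
MEMBERS = ["agency_a", "agency_b", "agency_c"]

# Hypothetical roles and the actions each role permits.
ROLE_PERMISSIONS = {
    "none": set(),
    "reader": {"read"},
    "operator": {"read", "launch_vm"},
}

# Explicit decisions: (resource owner, consuming member) -> role granted.
POLICY = {
    ("agency_a", "agency_b"): "operator",
    ("agency_a", "agency_c"): "reader",
    ("agency_b", "agency_a"): "reader",
}


def sharing_decisions(members):
    """Every ordered (owner, consumer) pair that needs a sharing decision."""
    return list(permutations(members, 2))


def is_allowed(owner, consumer, action):
    """Default-deny check: pairs without an explicit decision get no access."""
    role = POLICY.get((owner, consumer), "none")
    return action in ROLE_PERMISSIONS[role]


pairs = sharing_decisions(MEMBERS)
undecided = [pair for pair in pairs if pair not in POLICY]
print(len(pairs))      # N * (N - 1) ordered pairs for N members
print(len(undecided))  # pairs that fall back to the implicit default
```

With three members there are already six ordered pairs to decide, which is why the number of decisions grows quickly as the community grows. The default-deny fallback is one way of making the "implicit" decision explicit.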
### On-site Community Cloud

- The up-front costs of an on-site community cloud for a participant organization depend greatly on whether the organization plans to consume cloud services only or also to provide cloud services. For a participant organization that intends to provide cloud services within the community cloud, the costs appear to be similar to those for the on-site private cloud scenario (i.e., significant-to-high).

### Outsourced Community Cloud

![](media/image11.jpeg)

Organizations considering the use of an on-site community cloud

- The network dependency of the outsourced community cloud is similar to that of the outsourced private cloud. The primary difference is that multiple protected communications links are likely from the community members to the provider's facility.
- Same as the outsourced private cloud
- Same as the on-site community cloud
- Same as outsourced private cloud
- Same as the on-site community cloud
- Same as outsourced private cloud
- Same as outsourced private cloud

### Hybrid Cloud

- The cloud infrastructure is a composition of two or more distinct cloud
- Examples of Hybrid Cloud:
  - Windows Azure (capable of Hybrid Cloud)
  - VMware vCloud (Hybrid Cloud Services)
- A hybrid cloud is composed of two or more private, community, or public
- They have significant variations in performance, reliability, and security properties depending upon the type of cloud chosen to build hybrid cloud.

![](media/image13.jpeg)

- A hybrid cloud can be extremely complex
- A hybrid cloud may change over time with constituent clouds joining and leaving.
![](media/image1.jpeg)

**CLOUD COMPUTING** Virtualization

### IaaS -- Infrastructure as a Service

- What does a subscriber get?
  - Access to virtual computers, network-accessible storage, network infrastructure components such as firewalls, and configuration services.
- How are usage fees calculated?
  - Typically, per CPU hour, data GB stored per hour, network bandwidth consumed, network infrastructure used (e.g., IP addresses) per hour, value-added services used (e.g., monitoring, automatic scaling)

### IaaS Provider/Subscriber Interaction Dynamics

- The provider has a number of available virtual machines
- Client A has access to vm1 and vm2, Client B has access to vm3 and Client C has access to vm4, vm5 and vm6
- Provider retains only vm7 through vmN

![](media/image15.jpeg)

### IaaS Component Stack and Scope of Control

- The IaaS component stack comprises hardware, operating system,
- The operating system layer is split into two layers.
  - The lower (and more privileged) layer is occupied by the Virtual Machine Monitor (VMM), which is also called the Hypervisor
  - The higher layer is occupied by an operating system running within a VM, called a guest operating system
- In IaaS, the cloud provider maintains total control over the physical hardware and administrative control over the hypervisor layer
- The subscriber controls the Guest OS, Middleware and Applications layers.
- The subscriber is free (using the provider's utilities) to load any supported operating system software desired into the VM.
- The subscriber typically maintains complete control over the operation of the guest operating system in each VM.
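Returning to the question "How are usage fees calculated?" from the IaaS overview, the metered dimensions listed there (CPU hours, GB stored per hour, bandwidth consumed, IP addresses per hour) can be combined into a simple bill. The sketch below is only an illustration: every rate and usage figure is a made-up assumption, the class and function names are hypothetical, and value-added services are left out for brevity.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Rates:
    """Hypothetical unit prices in dollars; real providers publish their own."""
    cpu_hour: Decimal
    gb_stored_hour: Decimal
    gb_transferred: Decimal
    ip_address_hour: Decimal


@dataclass(frozen=True)
class Usage:
    """Metered quantities for one billing period (all values assumed)."""
    cpu_hours: int
    gb_stored: int
    hours_stored: int
    gb_transferred: int
    ip_addresses: int
    ip_hours: int


def usage_fee(rates: Rates, usage: Usage) -> Decimal:
    """Sum the per-dimension charges and round to whole cents."""
    compute = usage.cpu_hours * rates.cpu_hour
    storage = usage.gb_stored * usage.hours_stored * rates.gb_stored_hour
    network = usage.gb_transferred * rates.gb_transferred
    addresses = usage.ip_addresses * usage.ip_hours * rates.ip_address_hour
    return (compute + storage + network + addresses).quantize(Decimal("0.01"))


RATES = Rates(
    cpu_hour=Decimal("0.05"),
    gb_stored_hour=Decimal("0.0001"),
    gb_transferred=Decimal("0.09"),
    ip_address_hour=Decimal("0.005"),
)

# One VM running for a 720-hour month with 100 GB stored and 50 GB transferred.
USAGE = Usage(cpu_hours=720, gb_stored=100, hours_stored=720,
              gb_transferred=50, ip_addresses=1, ip_hours=720)

print(usage_fee(RATES, USAGE))  # 51.30
```

`Decimal` is used instead of floating point so that the cent-level total is exact, which matters once many small per-hour charges are summed.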
### IaaS Component Stack and Scope of Control

- A hypervisor uses the hardware to synthesize one or more Virtual Machines (VMs); each VM is "an efficient, isolated duplicate of a real machine".
- A subscriber rents access to a VM; the VM appears to the subscriber as actual computer hardware that can be administered (e.g., powered on/off, peripherals configured) via commands sent over a network to the provider.

### IaaS Cloud Architecture

- Logical view of IaaS cloud structure and operation
- Three-level hierarchy of components in IaaS cloud systems
  - *Top level* is responsible for *central control*
  - *Middle level* is responsible for *management of possibly large computer clusters* that may be *geographically distant* from one another
  - *Bottom level* is responsible for *running the host computer systems* on which virtual machines are created.
- Subscriber queries and commands generally flow into the system at the top and are forwarded down through the layers that either answer the queries or execute the commands
- Cluster Managers can be geographically distributed
- Within a Cluster Manager, the Computer Managers are connected via a high-speed network.

### Operation of the Cloud Manager

- The Cloud Manager is the public access point to the cloud where subscribers sign up for accounts, manage the resources they rent from the cloud, and access data stored in the cloud.
- The Cloud Manager has mechanisms for:
  - Authenticating subscribers
  - Generating or validating access credentials that a subscriber uses when communicating with VMs.
  - Top-level resource management.
    - For a subscriber's request, the Cloud Manager determines if the cloud

### Data Object Storage (DOS)

- DOS generally stores the subscriber's metadata, such as user credentials and operating system images.
- There is (usually) a single DOS service per cloud.

### Operation of the Cluster Managers

- Each *Cluster Manager* is responsible for the operation of a collection of computers that are connected via high-speed local area networks
- The *Cluster Manager* receives resource allocation commands and queries from the *Cloud Manager*, and calculates whether part or all of a command can be satisfied using the resources of the computers in the cluster.
- The *Cluster Manager* queries the *Computer Managers* for the computers in the cluster to determine resource availability, and returns messages to the *Cloud Manager*
- Directed by the Cloud Manager, a Cluster Manager then instructs the Computer Managers to perform resource allocation, and reconfigures the virtual network infrastructure to give the subscriber uniform access.
- Each Cluster Manager is connected to Persistent Local Storage (PLS)
- PLS provides persistent disk-like storage to Virtual Machines

### Operation of the Computer Managers

- At the lowest level in the hierarchy, a Computer Manager runs on each computer system and uses the concept of virtualization to provide Virtual Machines to subscribers
- The Computer Manager maintains status information, including how many virtual machines are running and how many can still be started
- The Computer Manager uses the command interface of its hypervisor to start, stop, suspend, and reconfigure virtual machines

### Virtualization {#virtualization-1}

- Virtualization is a broad term (virtual memory, storage, network, etc.)
- Focus: **Platform virtualization**
- Virtualization basically allows one computer to do the job of multiple computers, by sharing the resources of a single piece of hardware across multiple environments

### Virtualization {#virtualization-2}

- Virtualization is a way to run **multiple operating systems** and **user applications** on the
  - E.g., run both Windows and Linux on the same laptop
- How is it different from **dual-boot**?
  - Both OSes run **simultaneously**
  - The OSes are completely **isolated** from each other

![](media/image35.jpeg)

- *Equivalence*: The VM should be indistinguishable from the underlying hardware.
- *Resource control*: The VMM should be in complete control of any virtualized resources.
- *Efficiency*: Most VM instructions should be executed directly on the underlying CPU without involving the hypervisor.

- **privileged** instructions, which cause a trap if executed in user mode, and
- **sensitive** instructions, which change the underlying resources (e.g. doing I/O or changing the page tables) or observe information that indicates the current privilege level (thus exposing the fact that the guest OS is not running on the bare hardware).
- The former class of sensitive instructions is called **control sensitive** and the latter **behavior sensitive** in the paper, but the distinction is not particularly important.

### VMM and VM

![](media/image37.jpeg)

- For any conventional third generation computer, a VMM may be constructed if the set of sensitive instructions for that computer is a subset of the set of privileged instructions
- A conventional third generation computer is recursively virtualizable if it is virtualizable and a VMM without any timing dependencies can be constructed for it.

### Approaches to Server Virtualization

- 1^st^ Generation: Full
  - Software Based
  - VMware and Microsoft
- 2^nd^ Generation:
- 3^rd^ Generation: Silicon-
  - Unmodified guest
  - VMware and Xen on virtualization-aware hardware platforms

![](media/image39.png)

Virtualization Logic

### Full Virtualization

- 1^st^ Generation offering of x86/x64 server virtualization
- Dynamic binary translation
  - Emulation layer talks to an operating system which
  - Guest OS doesn't see that it is used in an emulated environment
- All of the hardware is emulated including the CPU
- Two popular open source emulators are QEMU and Bochs

### Full Virtualization - Advantages

- Emulation layer
  - Isolates VMs from the host OS and from each other
  - Controls individual VM access to system resources, preventing an unstable VM from impacting system performance
- Total VM portability
  - By emulating a consistent set of system hardware, VMs have the ability to transparently move between hosts with dissimilar hardware without any problems
  - It is possible to run an operating system that was developed for another
  - A VM running on a Dell server can be relocated to a Hewlett-Packard server

### Full Virtualization - Drawbacks

- Hardware emulation comes with a performance price
- In traditional
x86 architectures, OS kernels expect to run privileged code in Ring 0
- However, because Ring 0 is controlled by the host OS, VMs are forced to execute
- Due to these performance limitations, para-virtualization and hardware-

### Para-Virtualization

![](media/image48.png)

- The guest OS is modified and thus runs kernel-level operations at Ring 1 (or 3)
  - The guest is fully aware of how to process privileged instructions
  - Privileged instruction translation by the VMM is no longer necessary
  - The guest operating system uses a specialized API to talk to the VMM and, in this way, execute the privileged instructions
- The VMM is responsible for handling the virtualization
- Today, VM guest operating systems are para-virtualized using two different approaches:
  - ***Recompiling the OS kernel***
    - Para-virtualization drivers and APIs must reside in the guest operating system kernel
    - You do need a modified operating system that includes this specific API, which requires the operating system to be recompiled to be virtualization-aware
  - ***Installing para-virtualized drivers***
    - In some operating systems it is not possible to use complete para-virtualization, as it requires a specialized version of the operating system
    - To ensure good performance in such environments, para-virtualization can be applied for individual devices
    - For example, the instructions generated by network boards or graphical interface cards can be modified before they leave the virtualized machine by using para-virtualized drivers

### Hardware-assisted virtualization

![](media/image48.png)

- The guest OS runs at ring 0
- The VMM uses processor extensions (such as Intel® VT or AMD-V) to intercept and emulate privileged operations in the guest
- Hardware-assisted virtualization removes many of the problems that make writing a VMM a challenge
- VMM runs in a more privileged ring than
0, a

### Hardware-assisted virtualization

- **Pros**
  - It allows unmodified OSs to be run (so legacy OS can be run without
- **Cons**
  - Speed and Flexibility
    - An unmodified OS does not know it is running in a virtualized environment, and so it can't take advantage of any of the virtualization features

### Network Virtualization

![](media/image66.png)

### Why Virtualize?

- Internet is *almost* "paralyzed"
  - Lots of makeshift solutions (e.g. overlays)
  - A new architecture (aka clean-slate) is needed
- Hard to come up with a *one-size-fits-all* architecture
  - Almost impossible to predict what future might unleash
- Why not create an *all-sizes-fit-into-one* instead!
  - Open and expandable architecture
- Testbed for future networking architectures and protocols

### Related Concepts

- Virtual Private Networks (VPN)
  - Virtual network connecting distributed sites
  - Not customizable enough
- Active and Programmable Networks
  - Customized network functionalities
  - Programmable interfaces and active codes
- Overlay Networks
  - Application layer virtual networks
  - Not flexible enough

### Network Virtualization Model

- Business Model
- Architecture
- Design Principles
- Design Goals

### Business Model

- Infrastructure Providers (*InPs*)
  - Manage underlying physical networks
- Service Providers (*SPs*)
  - Create and manage virtual networks
  - Deploy customized end-to-end services
- End Users
  - Buy and use services from different service providers
- Brokers
  - Mediators/Arbiters

### Architecture

![](media/image96.png)

### Design Principles
- *Concurrence* of multiple
- *Recursion* of virtual networks
- *Inheritance* of architectural attributes
- *Revisitation* of virtual nodes

Hierarchy of Roles

### Design Goals (1)

- Flexibility
  - Service providers can choose
    - arbitrary network topology,
    - routing and forwarding functionalities,
    - customized control and data planes
  - No need for co-ordination with others
    - IPv6 fiasco should never happen again
- Manageability
  - Clear separation of policy from mechanism
  - Defined *accountability* of infrastructure and service providers
  - Modular management

### Design Goals (2)

- Scalability
  - Maximize the number of co-existing virtual networks
  - Increase resource utilization and amortize CAPEX and OPEX
- Security, Privacy, and Isolation
  - Complete isolation between virtual networks
    - *Logical* and *resource*
  - Isolate faults, bugs, and misconfigurations
  - Secured and private

### Design Goals (3)

- Programmability
  - Of network elements e.g. routers
  - Answer *"How much"* and *"how"*
  - Easy and effective without being vulnerable to threats
- Heterogeneity
  - Networking technologies
    - Optical, sensor, wireless etc.
  - Virtual networks

### Design Goals (4)

- Experimental and Deployment Facility
  - PlanetLab, GENI, VINI
  - Directly deploy services in real world from the testing phase
- Legacy Support
  - Consider the existing Internet as a member of the collection of multiple virtual Internets
  - Very *important* to keep all concerned parties satisfied

### Definition

### Typical Approach

- Networking technology
  - IP, ATM
- Layer of virtualization
- Architectural domain
  - Network resource management, Spawning networks
- Level of virtualization
  - Node virtualization, Full virtualization

### Introduction to XML: eXtensible Markup Language

### XML ??

- Over time, the acronym "XML" has evolved to imply a growing family of
  - How XML data can be represented and processed
  - application frameworks (tools, dialects) based on XML
- Most "popular" XML discussion refers to this latter meaning
- We'll talk about both.

### Presentation Outline

- What is XML (basic introduction)
  - Language rules, basic XML processing
- Defining language dialects
  - DTDs, schemas, and namespaces
- XML processing
  - Parsers and parser interfaces
  - XML-based processing tools
- XML messaging
  - Why, and some issues/example
- Conclusions

### What is XML?

- ***A syntax*** for "encoding" text-based data (words, phrases, numbers, ...)
- ***A text-based syntax.*** XML is written using ***printable Unicode*** characters (no explicit
- ***Extensible***. XML lets you define your own ***elements*** (essentially ***data types***), within the constraints of the syntax rules
- ***Universal format***.
The syntax rules ensure that all XML processing software ***MUST***

### Example Revisited

quantity element

- There are explicit syntax rules for DTD content -- well-formed XML must be correct here also.
- The parser then replaces every occurrence of an ***entity reference*** by the referenced entity (and does so recursively within entities)
- The "resolved" data object is then made available to the XML application

### XML Processing Rules: External Entities

*Location given*

- The parser processes the DTD content, identifies the external entities, and "tries" to
- The parser then replaces every occurrence of an ***entity reference*** by the referenced entity, and does so recursively within all those entities (as with internal entities)
- But... what if the parser can't find the external entity (firewall?)?
  - That depends on the application / parser type
- There are ***two types of XML parsers***: one that MUST retrieve all entities, and one that can ignore them (if it can't find them)
  - **Validating parser**
    - ***Must*** retrieve all entities and must process ***all*** DTD content. Will stop processing and indicate a failure if it cannot
    - There is also the implication that it will test for compatibility with other things in the DTD -- instructions that define syntactic rules for the document (allowed elements, attributes, etc.). We'll talk about these parts in the next section.
  - **Non-validating parser**
    - Will try to retrieve all entities defined in the DTD, but will ***cease processing the DTD*** content at the first entity it can't find. But this is not an error -- the parser simply makes available the XML data (and the names of any unresolved entities) to the application.
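The entity-replacement step described above (each entity reference is replaced by the referenced entity, recursively) can be observed with a non-validating parser. The sketch below uses Python's standard `xml.etree.ElementTree`; the document, the entity names `name` and `company`, and the `note` element are all made up for illustration. Only internal entities are shown; how external entities are fetched or skipped varies by parser and is not covered here.

```python
import xml.etree.ElementTree as ET

# A hypothetical document whose internal DTD subset declares two entities.
# The second entity refers to the first, so resolving it is recursive.
DOC = """<?xml version="1.0"?>
<!DOCTYPE note [
  <!ENTITY name "Example">
  <!ENTITY company "&name; Corp">
]>
<note>From &company;</note>"""

# The parser expands &company; (and, inside it, &name;) before the
# application ever sees the text content.
root = ET.fromstring(DOC)
print(root.text)  # From Example Corp
```

The application only ever receives the "resolved" text, which matches the slide's point that entity handling is the parser's job rather than the application's.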
**DTD**

### Special Issues: Characters and Charsets

- The XML specification defines what characters can be used as whitespace in tags:
  - ***You cannot use EBCDIC character 'NEL' as whitespace***
  - Must make sure to not do so!
- What if you want to include characters not defined in the encoding charset (e.g., Greek characters in an ISO-Latin-1 document):
  - Use ***character references***. For example:
- Also, binary data must be encoded as ***printable characters***

### Presentation Outline

- What is XML (basic introduction)
  - Language rules, basic XML processing
- Defining language dialects
  - DTDs, schemas, and namespaces
- XML processing
  - Parsers and parser interfaces
  - XML-based processing tools
- XML messaging
  - Why, and some issues/example
- Conclusions

### How do you define language dialects?

- Two ways of doing so:
  - **XML Document Type Declaration (DTD)** -- Part of core XML spec.
  - **XML Schema** -- New XML specification (2001), which allows for stronger constraints on XML documents.
- Adding dialect specifications implies ***two classes*** of XML data:
  - ***Well-formed***: An XML document that is syntactically correct
  - ***Valid***: An XML document that is both well-formed *and*
- What DTDs and/or schema specify:
  - Allowed element and attribute names, hierarchical nesting rules; element content/type restrictions
- Schemas are more powerful than DTDs. They are often used for ***type validation***, or for relating database schemas to XML models

### Example DTD (as part of document)

### Example "External" DTD

- Reference is using a variation on the DOCTYPE:
- Of course, the DTD file must be there, and accessible.
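The well-formed versus valid distinction drawn in the dialects slide can be illustrated with a minimal check. The sketch below tests only well-formedness, using Python's standard `xml.etree.ElementTree`, which is a non-validating parser and does not check a document against a DTD or schema. The `order` and `item` element names and the helper `is_well_formed` are hypothetical, chosen for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as well-formed XML.

    This says nothing about validity: no DTD or schema rules are checked.
    """
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


good = "<order><item qty='2'>widget</item></order>"
bad = "<order><item>widget</order>"  # <item> is never closed

print(is_well_formed(good))  # True
print(is_well_formed(bad))   # False
```

A document that passes this check could still be invalid for a given dialect, for example by using an element name the DTD does not allow; catching that requires a validating parser.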
### XML Schemas

- A new specification (2001) for specifying validation rules for XML
- Uses ***pure XML*** (no special DTD grammar) to do this.
- Schemas are more powerful than DTDs - can specify things like integer types, date
- They are often used for ***type validation***, or for relating database schemas to XML models
- They don't, however, let you declare entities -- those can only be done in DTDs.
- The following slide shows the XML schema equivalent to our DTD

### XML Schema version of our DTD (Portion)

### XML Namespaces

- Mechanism for identifying different "spaces" for XML names
  - That is, ***element*** or ***attribute*** names
- This is a way of identifying different ***language dialects***, consisting of names that have specific semantic (and processing) meanings.
- Thus
- SAX: Simple API for XML (event-based)
- DOM: Document Object Model (object/tree based)
- JDOM: Java Document Object Model (object/tree based)
- Lots of XML parsers and interface software available (Unix, Windows, OS/390 or Z/OS, etc.)
- SAX-based parsers are fast (often as fast as you can stream data)
- DOM is slower and more memory intensive (it creates an in-memory version of the entire document)
- And, validating can be ***much slower*** than non-validating

### XML Processing: SAX

A. **SAX: Simple API for XML**
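To make the "event-based" description of SAX concrete, here is a minimal sketch using Python's standard `xml.sax` module. The handler receives a callback for each start tag as the document streams past, so no in-memory tree is built (in contrast to DOM). The class `ElementCounter`, the function `count_elements`, and the sample document are hypothetical names chosen for illustration.

```python
import xml.sax


class ElementCounter(xml.sax.ContentHandler):
    """Counts how often each element name occurs, one start-tag event at a time."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        # Called by the parser for every opening tag it encounters.
        self.counts[name] = self.counts.get(name, 0) + 1


def count_elements(xml_text: str) -> dict:
    handler = ElementCounter()
    # parseString feeds the document to the parser, which fires the callbacks.
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.counts


sample = "<order><item>widget</item><item>gadget</item></order>"
print(count_elements(sample))  # {'order': 1, 'item': 2}
```

Because the handler keeps only the running counts, memory use stays flat however large the document is, which is the trade-off the slide contrasts with DOM.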