Distributed Systems Exercise 01
University of Zurich
2023
Bruno Rodrigues, Burkhard Stiller
Summary
These are lecture notes and exercises on distributed systems, covering distributed systems architecture, encoding in distributed systems (including language-specific formats, JSON, XML, binary formats, and ASN.1), the InterPlanetary File System (IPFS), unreliable communication systems, and consistency and consensus in distributed systems. The document is from the University of Zurich, Switzerland.
This is background and exercise material; only the content of the teaching slides and the book's chapters constitutes the valid class information.

Distributed Systems Exercise 01
Dr. Bruno Rodrigues, Prof. Dr. Burkhard Stiller
Department of Informatics IfI, Communication Systems Group CSG, University of Zurich UZH
[rodrigues|stiller]@ifi.uzh.ch
© 2023 UZH, CSG@IfI

Outline
– Summary Lecture 03
– E01 Questions
– Release E02

Content DS
Tour d'Horizon on Distributed Systems and Encoding in Distributed Systems
– Language-specific Formats
– JSON, XML, Binary, ASN.1
– IPFS
Unreliable Communication Systems
– Faults
– Unreliable Networks and Clocks
– Knowledge, Truth, Lies
Consistency and Consensus in Distributed Systems
– Guarantees
– Linearizability
– Ordering

Summary DS Part 1
Tour d'Horizon on Distributed Systems and Encoding in Distributed Systems

Distributed Systems
Distributed systems come in many flavors!
– Computers connected by a network and
– Spatially separated by any distance
Definition: “A collection of independent computers that appears to its users as a single coherent system”
– Hardware: All machines are fully autonomous
– Software: Users think they deal with a single system
Consequences
– Concurrency
– No global clock
– Independent failures

Structured, Semi-structured, Unstructured Data (1)
Cardoso, Jorge. "Developing Dynamic Packaging Applications Using Semantic Web-Based Integration." Semantic Web Technologies and E-Business: Toward the Integrated Virtual Organization and Business Process Automation, edited by A.F. Salam and Jason Stevens, IGI Global, 2007, pp. 1-39.
http://doi:10.4018/978-1-59904-192-6.ch001

Structured, Semi-structured, Unstructured Data (2)
(Figure omitted)
Image source: https://www.iunera.com/kraken/fabric/an-easyguide-to-structured-unstructured-and-semi-structured-data/

Encoding Advantages
Save/persist the state of an object (write to file)
– Efficient usage of storage (remove data redundancy)
  • Save the object state instead of recreating the object
Transmit objects across a network
– Efficient usage of bandwidth (remove data redundancy)
  • Useful for Remote Procedure Calls (RPC)
[Figure: Object → Application Encoding → Compressed Byte Stream → Network or Storage → Compressed Byte Stream → Application Decoding → Object]

Encoding Drawbacks
Encoding also raises concerns
– Lock-in
  • Encoding methods are often tied to a particular programming language, and reading the data in another language is difficult
– Security
  • To restore data into the same object types, the decoding process needs to be able to instantiate arbitrary classes
– Versioning
  • Such encodings are often intended for quick and easy encoding of objects, neglecting problems of forward/backward compatibility
– Efficiency
  • CPU usage is often an afterthought; Java's built-in serialization is notorious for its bad performance

ASN.1
Defines the syntax of messages to be exchanged between peer applications independently of their local representation, i.e., cross-platform.
Types, structure, (many) rules
– Basic-ER, Packed-ER, XML-ER, Distinguished-ER
Widely used in telecommunications

ASN.1 Structure
    Contact ::= SEQUENCE {
        name VisibleString,
        phone NumericString
    }
Basic Encoding Rules (BER)
    30 19 80 0A 4A6F686E20536D697468 81 0B 3938372036353433323130
Packed Encoding Rules (PER)
    0A 4A 6F 68 6E 20 53 6D 69 74 68 0B A9 80 76 54 32 10
XML Encoding Rules (XER)
    <?xml version="1.0" encoding="UTF-8"?>
    <Contact>
      <name>John Smith</name>
      <phone>987 6543210</phone>
    </Contact>
JSON Encoding
    { "name" : "John Smith", "phone" : "987 6543210" }

InterPlanetary File System (IPFS)
Definition
– A protocol defining a decentralized storage and delivery network built upon peer-to-peer (p2p) and content-addressed file-system principles
Goal
– Replace the centralized HTTP (HyperText Transfer Protocol) as the backbone of the Web, providing a more open, available, and faster Internet
  • Open: decentralization of services and content
  • Available: content is decentralized in the network
  • Faster: pages are loaded from multiple peers instead of a single host

Centralized, Distributed, Decentralized
Centralized | Distributed | Decentralized
1 server – 1 organization | N servers – 1 organization | N servers – N organizations
Simple web pages hosted by a server | Complex services hosted by distributed servers (typically cloud-based) | Peer-to-peer services

Feature-based Comparison of HTTP and IPFS
Feature | IPFS | HTTP
Architecture | Decentralized | Centralized
Addressing | Content-based: files are identified by a hash | Location-based: hosts are identified by an IP address
Bandwidth | Higher, as data is retrieved from multiple peers | Lower, since data retrieval relies on the host's upload bandwidth
Availability | Higher, since data is decentralized across multiple peers | Lower, since data is stored at a single host
Privacy | Lower, since data is decentralized across multiple unknown peers | Higher, since data is stored at known hosts
Adoption | Slower, since it is a relatively new protocol | Established, industry-standard protocol
Costs | Lower, since every peer can host content | Higher, as a single host must provide the entire content/service

E01 Questions
Solutions

E01 Questions 1

E01 Questions 2
Encoding:
- Data representation and conversion in a standard manner (including serialization)
- Applicable to simpler data structures (strings, arrays)
- Examples: Base64, ASCII, UTF-8
Serialization:
- Save/persist the state of an object (write to file)
- Transmit objects across a network
- Applicable to more complex data structures (trees, graphs, hash tables)
- Examples: JSON, Protobuf

E01 Questions 3
• Serialization concerns a standardized approach to representing data for transport and storage.
• Compression aims at reducing file size by eliminating redundancies in the data, which serialization alone does not necessarily remove.
• Encryption focuses on confidentiality. It requires keys (PKI or shared keys) to make transmitted or stored content confidential, and it can be combined with compression and serialization.
Interesting link on the differences between encoding, encryption, hashing, and obfuscation: https://danielmiessler.com/study/encoding-encryption-hashing-obfuscation/

E01 Questions 4
• A data schema describes how data elements are structured and interconnected. Defining a schema that is fully backward and forward compatible is both highly relevant and difficult.
• Applications evolve over time, and so does their data.
• Hardware and software change often, as do the requirements that drove their initial versions.
• It is beneficial to ensure that all previous versions of the software remain supported in future upgrades.
• Otherwise, the effect is negative if changes are made frequently and (relatively) recent versions are no longer supported.
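The compatibility requirements in E01 Questions 4 can be illustrated with a small Python sketch. It assumes a hypothetical JSON "contact" record: version 1 writes name and phone, and version 2 adds an optional address. The newer reader stays backward compatible by filling in defaults for missing fields; the older reader stays forward compatible by ignoring fields it does not know. Renaming a required field breaks the old reader. All field values and function names are illustrative only.

```python
import json

# Hypothetical "contact" schema versions (illustration only):
#   v1 writes {"name", "phone"}; v2 adds an optional "address" field.
V1_FIELDS = ("name", "phone")
V2_DEFAULTS = {"address": None}


def read_contact_v2(raw: str) -> dict:
    """New reader: backward compatible, fills in defaults for fields old writers never wrote."""
    record = json.loads(raw)
    return {**V2_DEFAULTS, **record}


def read_contact_v1(raw: str) -> dict:
    """Old reader: forward compatible, keeps the fields it knows and ignores the rest."""
    record = json.loads(raw)
    return {field: record[field] for field in V1_FIELDS}


def try_read(reader, raw: str):
    """Report an incompatibility instead of crashing when a required field is missing."""
    try:
        return reader(raw)
    except KeyError as missing:
        return f"incompatible: missing field {missing}"


old_data = json.dumps({"name": "John Smith", "phone": "987 6543210"})
new_data = json.dumps({"name": "John Smith", "phone": "987 6543210", "address": "Zurich"})
# A careless change: renaming "phone" breaks readers that still expect it.
renamed = json.dumps({"name": "John Smith", "mobile": "987 6543210"})

print(read_contact_v2(old_data))           # new code reading old data
print(read_contact_v1(new_data))           # old code reading new data
print(try_read(read_contact_v1, renamed))  # old code reading incompatible data
```

Filling defaults and ignoring unknown fields is one common way to tolerate additive schema changes; it does not help when a required field is renamed or removed.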
E01 Questions 5
Property | JSON | XML | Protobuf | CSV
Brevity | 7/10 | 3/10 | 9/10 | 9/10
Human Readability | 5/10 | 5/10 | 1/10 | 5/10
Language Support | 9/10 | 9/10 | 7/10 | 9/10
Data Structure Support | 6/10 | 9/10 | 8/10 | 3/10
Speed | 9/10 | 8/10 | 9/10 | 10/10
Standardization | 9/10 | 10/10 | 9/10 | 3/10

JSON: no support for comments, standardized and fast. Semi-structured in a well-defined format. For example:
    { "type1": "field", "type2": "field" }
XML: human readable, but not concise due to the excessive verbosity of its syntax. Semi-structured. For example:
    <example>
      <type1>field</type1>
      <type2>field</type2>
    </example>
CSV: not suited for objects or hash tables, and not well standardized. The fastest encoding format. For example:
    type1,field
    type2,field
Protobuf: not human readable. Concise and fast. A structure must be declared first; messages following that structure can then be encoded/decoded. For example (the numbers are field tags used in the binary encoding, not values):
    message Example {
      string type1 = 1;
      string type2 = 2;
    }

E01 Questions 6
Telecommunications is a highly diverse sector (i.e., it comprises many applications), and its infrastructure is also highly heterogeneous. Thus, the ITU-T proposed ASN.1 as a means to ensure compatibility between applications (and associated hardware) from different vendors.

E01 Questions 7
ASN.1 is a standard for defining data schemas. JSON is an encoding format that represents data conforming to a schema.
• The two are not mutually exclusive; they can be combined. Examples: https://asn1.io/dynamic/whyasn1.html

E01 Questions 8
• Ensure (backward and forward) data compatibility.
• A schema can be implemented in/represented by different encoding algorithms as long as the data structure remains compatible.
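E01 Questions 7 and 8 state that one schema can be represented by different encodings. As a minimal Python sketch, the encoders below re-derive the byte strings from the ASN.1 slide for Contact ::= SEQUENCE { name VisibleString, phone NumericString }, plus a JSON encoding of the same record. Assumptions (our reading of the slide's bytes, not stated on the slide): the module uses AUTOMATIC TAGS, so the fields carry context-specific tags [0] (0x80) and [1] (0x81); all lengths stay below 128 bytes; and the PER bytes follow aligned PER with unconstrained lengths. The function names are hypothetical; a real system would use an ASN.1 compiler or library rather than hand-written encoders.

```python
import json


def tlv(tag: int, value: bytes) -> bytes:
    """BER tag-length-value triple; short-form length only (value < 128 bytes)."""
    if len(value) >= 128:
        raise ValueError("long-form lengths are not handled in this sketch")
    return bytes([tag, len(value)]) + value


def encode_contact_ber(name: str, phone: str) -> bytes:
    # Assumes AUTOMATIC TAGS: name -> [0] (0x80), phone -> [1] (0x81), both primitive.
    body = tlv(0x80, name.encode("ascii")) + tlv(0x81, phone.encode("ascii"))
    return tlv(0x30, body)  # 0x30 = universal, constructed SEQUENCE


def encode_contact_per(name: str, phone: str) -> bytes:
    # VisibleString: one length octet, then one octet per character (aligned PER).
    name_part = bytes([len(name)]) + name.encode("ascii")
    # NumericString alphabet {" ", "0".."9"} fits in 4 bits: " " -> 0, "0" -> 1, ..., "9" -> 10.
    nibbles = [0 if ch == " " else int(ch) + 1 for ch in phone]
    if len(nibbles) % 2:
        nibbles.append(0)  # pad the last octet
    packed = bytes((hi << 4) | lo for hi, lo in zip(nibbles[0::2], nibbles[1::2]))
    # The length determinant counts characters, not octets.
    return name_part + bytes([len(phone)]) + packed


def encode_contact_json(name: str, phone: str) -> bytes:
    return json.dumps({"name": name, "phone": phone}).encode("utf-8")


if __name__ == "__main__":
    encoders = [("BER", encode_contact_ber), ("PER", encode_contact_per), ("JSON", encode_contact_json)]
    for label, encoder in encoders:
        data = encoder("John Smith", "987 6543210")
        print(f"{label:4} {len(data):2} bytes  {data.hex(' ').upper()}")
```

The BER and PER outputs should match the slide byte for byte. Comparing the lengths shows the brevity trade-off from the Questions 5 table: the same Contact takes 27 bytes in BER, 18 bytes in PER, and 46 bytes as JSON text.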
E01 Questions 9
Centralized | Distributed | Decentralized
1 server – 1 organization | N servers – 1 organization | N servers – N organizations
A standardized schema is more relevant in decentralized systems, in which different organizations may interpret data differently (i.e., use different data schemas). For example, they can map the fields of a «contact» differently: organization A may require «name, phone, and address», whereas organization B may require only «ID and phone». Thus, the data schemas of organizations A and B are not fully compatible.

E01 Questions 10
High availability of peers holding the requested content, and high performance in uploading/serving that content. File size is also relevant: larger files may show a greater difference in performance because they are scattered across more nodes (in contrast to a centralized server).

E01 Questions 11
Centralized storage performs better for relatively small files.
– No additional overhead to distribute/partition a file across the network nodes
Decentralized storage performs better for relatively large files.
– Large files are divided into fixed-size blocks, and the content is distributed across the network. Multiple nodes can provide different blocks simultaneously.

Release E02
Today, I promise

Questions?
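The fixed-size block distribution described in E01 Questions 11 can be sketched in Python. This is a simplified model under stated assumptions, not IPFS's actual data format: peers are in-memory dictionaries, a block's address is a plain SHA-256 hex digest (IPFS itself uses CIDs and a Merkle DAG), the 4-byte block size is chosen only for readability, and the names publish, fetch and BLOCK_SIZE are hypothetical.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 4  # hypothetical, tiny block size so the example stays readable


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Divide a file into fixed-size blocks (the last block may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def block_id(block: bytes) -> str:
    """Content address: a block is identified by the hash of its bytes, not by a host."""
    return hashlib.sha256(block).hexdigest()


def publish(data: bytes, peers: list) -> list:
    """Spread blocks over the peers round-robin and return the ordered block IDs."""
    manifest = []
    for index, block in enumerate(split_into_blocks(data)):
        cid = block_id(block)
        peers[index % len(peers)][cid] = block
        manifest.append(cid)
    return manifest


def fetch_block(cid: str, peers: list) -> bytes:
    """Get a block from any peer that holds it and verify it against its address."""
    holder = next(peer for peer in peers if cid in peer)
    block = holder[cid]
    if block_id(block) != cid:
        raise ValueError("block content does not match its address")
    return block


def fetch(manifest: list, peers: list) -> bytes:
    """Request all blocks concurrently; map() keeps the original block order."""
    with ThreadPoolExecutor(max_workers=len(peers)) as pool:
        return b"".join(pool.map(lambda cid: fetch_block(cid, peers), manifest))


peers = [{}, {}, {}]  # three in-memory "peers", each a dict from block ID to block
data = b"Large files are divided into fixed-size blocks."
manifest = publish(data, peers)
assert fetch(manifest, peers) == data
print(f"{len(manifest)} blocks spread over {len(peers)} peers")
```

Because every block is checked against its own hash, a client can accept blocks from unknown peers, and the concurrent fetch mirrors the point that multiple nodes can provide different blocks simultaneously. For a small file, the extra hashing and lookups are pure overhead, which matches the first half of the answer.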