Message Brokers: Concepts, Benefits, and Implementation
Yuval Wilf
Summary
This document is a module on message brokers, discussing messaging concepts, loosely coupled applications, and enterprise messaging systems. It covers topics such as JMS (Java Message Service), AMQP (Advanced Message Queuing Protocol), Kafka and RabbitMQ, the benefits, and the problems associated with message brokers. It is suitable for undergraduate level students wanting to improve their knowledge of architecture.
Full Transcript
Module 4: Message Brokers
Copyright © Yuval Wilf
Architecture 1

Agenda
1. General messaging concepts
2. Message brokers
3. SOA architecture
4. Messaging systems
   1. JMS – Java Message Service
   2. AMQP – Advanced Message Queuing Protocol
   3. Big Data message broker

What is Messaging?
Messaging is a method of communication between software components or applications.
A message is just arbitrary data that we want to exchange between processes.
A broker is a piece of software that brokers messages.
A messaging system is a peer-to-peer facility.
A messaging client can send messages to, and receive messages from, any other client.
Each client connects to a messaging server that provides facilities for creating, sending, and receiving messages.
A message can be in many formats; the most common is JSON.
Messages have structure: headers and a body.

Distributed Systems – Synchronous Programming
1. LPC vs. RPC
   Local Procedure Call – within the same process, possibly between threads
   Remote Procedure Call – between processes, which can be deployed on different servers
   RPC is not LPC with a longer wire
2. Today almost all applications are distributed; think of cloud-native characteristics
3. Synchronous programming relies on applications being connected through some form of RPC
4. 
RPC is unreliable

Loosely and Tightly Coupled Applications
Messaging enables distributed communication that is loosely coupled.
[Diagram: tightly coupled applications connected point to point over RPC client/server APIs, requiring N * (N-1)/2 connections, versus loosely coupled applications connected through message client APIs to a single message server/broker.]

Messaging – General Terms
Enterprise Messaging Systems
Message-oriented middleware (MOM) provides:
   A mechanism for integrating applications in a loosely coupled manner
   Asynchronous delivery of data between applications, with assured delivery of messages*
MOM relieves application programmers from knowing the details of RPC and networking/communications protocols.
[Diagram: Application A and Application B each use a messaging client API (send/receive) to exchange messages through message-oriented middleware.]

Messaging Benefits and Problems
1. Benefits
   a) Scale – many publishers can serve many consumers
   b) Failure – producers can publish to the broker even while consumers are unavailable
   c) Retry – message delivery can be retried
2. 
Problems
   a) Disconnected – no automatic way to know whether a message was processed
   b) No easy correlation mechanism between messages
   c) Harder to get a response

Messaging – JMS Concept
Publish/Subscribe Domain – 1 to Many
The destination in the Pub/Sub domain is called a Topic.
Each message may have multiple consumers.
After clients subscribe to topics they can receive messages, subject to quality of service, connectivity and selection.
[Diagram: a publisher client publishes messages to a topic; subscriber clients 1, 2 and 3 subscribe to the topic and each receives the messages.]

Messaging – JMS Concept
Point-to-Point – 1 to 1
The destination in the PTP domain is called a Queue.
Each message is addressed to a specific queue.
One client obtains each message, sequentially, in FIFO order.
Queues retain all messages sent to them until:
   Messages are consumed
   Messages expire
[Diagram: a sender client sends messages M1, M2 and M3 to a queue; receiver client 1 receives M1 and receiver client 2 receives M2, each sending an ack.]

Messaging – JMS Implementation Example
JMS Message Types
Message Type | Message Body
TextMessage | Standard java.lang.String
MapMessage | A set of name/value pairs, with names as Strings and values as primitive types. The entries can be accessed sequentially by enumerator or randomly by name. The order of the entries is undefined.
BytesMessage | A stream of uninterpreted bytes
StreamMessage | A stream of primitive values, filled and read sequentially
ObjectMessage | A Serializable Java object
Message | Empty: no body, composed of header fields and properties only

Messaging – JMS Implementation Example
Consuming Messages in Queue and Topic
Asynchronously – by registering a message listener with a consumer
Synchronously – by explicit retrieval of a message from a destination, which blocks the consumer
[Diagram: a producer client (QueueMessageSender) sends messages to a JMS queue; client 2 (QueueAsynchReceiver) receives them through onMessage(Message) and client 3 (QueueSynchReceiver) through receive().]

Messaging – JMS Implementation Example
JMS Transactions: Effects on Producers
On commit() – the JMS provider sends the set of staged messages
On rollback() – the broker disposes of the set of staged messages
[Diagram: a producer with a transacted session stages messages at the JMS broker before commit(); on commit() they go on to the consumer, on rollback() they are discarded.]

Messaging – JMS Implementation Example
JMS Transactions: Effects on Consumers
On commit() – the JMS provider disposes of the set of staged messages
On rollback() – the broker resends the set of staged messages
[Diagram: the JMS broker resends staged messages to the consumer on rollback() and discards them on commit().]

Once upon a time… Service Oriented Architecture – SOA
This is an architectural style that focuses on discrete services instead of a monolithic design.
Evolution of distributed computing based on the request-reply paradigm for synchronous and asynchronous applications.
Application business logic and functions are modularized and presented as services for consumer/client applications.
A flexible method of structuring and managing heterogeneous component-based systems.

Once upon a time… Characteristics of SOA Services
Loosely coupled
Services have platform-independent, self-describing interfaces (XML)
Messages are formally defined (XSD)
Services can be discovered (UDDI)
QoS – quality of service characteristics are defined in policies (security, authentication, authorization, reliability, etc.)
Services can be provided on any platform
[Diagram: SOA surrounded by its properties: composable, interoperable, re-usable, loosely coupled.]

Once upon a time… Anatomy of a Service
[Diagram labels: service consumer, service interface, interface proxy, service implementation, new service, wrapped legacy, composite service.]

Once upon a time… SOA Evolution (1) – Plate of Spaghetti
Heterogeneous systems with proprietary interfaces
[Diagram: applications wired point to point through screen scraping, download files, transaction files, message queues, sockets, RPC, ORB, APPC and CICS gateways. Source: Gartner]

Once upon a time… SOA Evolution (2) – Service Based Integration
Heterogeneous systems sharing data and functionality using a standard message format
Legacy systems are “wrapped” by a service
At the time SOA was common, SOAP was the protocol in use
Messaging – SOAP – Simple Object Access Protocol
   Platform and language independent; messages are carried over HTTP(S)
Description – WSDL – Web Services Description Language
   Interface, protocol bindings and deployment details of web services, described in XML
Discovery – UDDI – Universal Description, Discovery and Integration
   Acts as a directory for storing information about web services
[Diagram: service consumers (applications A and C) call service providers (applications B and D, backed by a DB and a legacy system) over SOAP/HTTP.]

Once upon a time… SOA Evolution (3) – Service Oriented Architecture
Dynamic discovery
Security
Management and governance
[Diagram: service consumers and service providers communicate over SOAP/HTTP through service brokers/ESB, which provide transformation, security, registry, policies and monitoring.]

Once upon a time… SOA Evolution (4) – Business Oriented SOA
Business Process Management
Just-in-time integration
B2B on demand
[Diagram: service consumers and service providers communicate over SOAP/HTTP through service brokers/ESB, which provide business process flows, business rules, adaptors, transformation, security, registry, policies and monitoring.]

Message Broker Types
1. JMS implementations
   Common API but no common protocol
   Examples:
2. AMQP – Advanced Message Queuing Protocol
   Common protocol, no common API
   Example:
3. Big Data message brokers
   Support the 3 V’s
   Example:

RabbitMQ
1. Open source
2. Written in Erlang, a language from Ericsson that is highly scalable and robust
3. Supports AMQP, a standard messaging protocol

RabbitMQ – Message Broker
Built around 3 primary items: Exchange, Binding, Queue
Publishers send messages to exchanges; exchanges are bound to queues
Consumers consume from queues
[Diagram: inside the broker, a publisher sends to exchanges, which route to queues, each read by a consumer.]

Rabbit Exchanges
There are 4 exchange types that deal with messages differently:
1. Direct – the message is routed to the queues whose binding key matches the message's routing key
2. Fanout – transmits messages to all of the queues bound to it – it is similar to?
3. Topic – matches the routing key against the routing pattern specified in the binding
4. Headers – uses message header attributes for routing

Rabbit Queues & Bindings
1. Queues can have different properties
   Name, durability, exclusivity, delete semantics, other arguments
2. Bindings – queues are bound to exchanges via a routing key
   1. Queues are used by consumers
   2. Exchanges are used by publishers (consumers do not care about exchanges)
   3. A binding specifies 3 things:
      1. The exchange to bind to
      2. The queue that we want to bind to that exchange
      3. The routing key
3. Different exchange types process bindings differently
   For example, a fanout exchange ignores the key
4. 
Bindings and exchanges give a lot of flexibility

Rabbit Management

Rabbit Management

Direct Exchange – messages go directly to a queue
1. When we send a message to an exchange we give it a routing key, which tells the exchange which queue the message should go to
2. Direct exchange
   a) The routing key is mapped directly onto a single queue
   b) There is no wildcard
3. The default exchange is also a direct exchange; in that case the routing key (RK) is the queue name
[Diagram: a publisher sends messages to an exchange in the broker; any queue matching the RK gets the message and passes it to its consumer.]

Rabbit API
1. ConnectionFactory – creates connections
2. Connection – manages a single connection
3. Channel – a logical connection to Rabbit
4. exchangeDeclare – declares a new exchange
5. queueDeclare – declares a new queue (with or without a name)
6. queueBind – binds a queue to an exchange
7. basicPublish – publishes a message
8. basicConsume – consumes messages, using a callback

Publishing messages to RabbitMQ
What exchange is that?

Messages in Rabbit Queue

Messages in Rabbit
What is this IP?

Consuming messages in Rabbit

Rabbit – Message ack and durability
Message acknowledgment
1. Is sent by the consumer to signal the broker that it can delete the message
2. If the connection is closed before the ack, the broker re-queues the message for other consumers
3. There is a default timeout of 30 minutes for message acks from consumers
4. There is also manual ack by consumers
Message durability
1. If the broker goes down, messages can be lost
2. The broker can persist messages across crashes; we need to mark both the queue and the message as durable
   1. The queue declaration needs to be in both the publisher and the consumer
   2. 
Marking messages as persistent – delivery mode

Publish and Consume durable messages with ack

Publishing Durable Messages with Ack
Durable

Consuming Durable Messages with Ack

Consuming Durable Messages with Ack

Fanout Exchange – Deliver to multiple consumers
A fanout exchange broadcasts every message it receives to all the queues it knows
Commonly known as publish/subscribe (pub/sub)
No routing – all consumers get the message
Any queue that should receive the messages must be bound to the exchange
[Diagram: a publisher sends messages to an exchange in the broker, which copies each message to three queues, each with its own consumer.]

Publish – Fanout
What queue?

Consuming from Fanout Exchange
Why?

Auto-generated Queue

RabbitMQ vs. JMS Providers
1. JMS supports two models: queue (1:1) and publish/subscribe (1:many)
2. RabbitMQ supports the AMQP model, which has 4 exchange types: direct, fanout, topic, headers
3. Data types – JMS supports 5 different data types, but RabbitMQ supports only binary data
4. In AMQP, producers send to an exchange, which routes to queues; in JMS, producers send to the queue or topic directly
5. JMS is specific to Java, but RabbitMQ supports many technologies

What is a Distributed System?
1. A distributed system is a collection of resources/independent components located on different machines that are instructed to achieve a specific goal or function
2. It consists of multiple workers or nodes
   Examples: K8S, Big Data, Kafka, etc.
   Why multiple nodes? (to spread the work around)
3. The system of nodes requires coordination to ensure consistency and progress toward a goal
4. Coordination is not possible without proper communication between all components within the system

Controller – Roles and Responsibilities of Components
1. 
There is a hierarchy; at the top is the controller or supervisor
   It is a worker node that is selected from among its peers to perform the administrative role
   Usually the controller that is selected is the one that has been around the longest
   It has some critical responsibilities and maintains the following:
   1. An inventory of which workers are available to take on work
   2. A list of work items that have been committed to and assigned to workers
   3. The active status of the staff and their progress on assigned tasks
2. Once the controller is established, and workers are assigned and available, a team is formed and work can now be distributed and executed

Controller in a distributed cluster
1. In Kafka, for example, this team formation is the cluster; its members are brokers that have assigned themselves to a designated controller within the cluster
2. When a task comes in, the controller has to decide which workers should take it
   Who is available and in good health?
   What risk policy should govern this assignment (i.e. redundancy level, how many replicas)?
3. If work is assigned to a worker and that worker becomes unavailable, the work is not lost
[Diagram: the controller (worker A) keeps attendance, jobs and status records for workers B to F.]

Distributed Systems – Reliability – Step 1 – Choosing a Leader
1. Each task given to a worker must also be given to at least one of the worker’s peers, in the event of failure
2. The controller determines the redundancy level
   a) It will promote a worker to a leader that will take direct ownership of the tasks assigned
   b) The leader will recruit two of its peers to take part in the replication
   c) In Kafka, for example, risk policy = replication factor
[Diagram: redundancy level == 3 replicas; the controller (worker A) hands tasks to leaders (workers B and C), with workers D, E and F as peers.]

Distributed Systems – Reliability – Step 2 – Having a Quorum
1. 
Once peers have committed to a leader, a quorum is formed
2. The committed peers become followers
3. If a leader cannot get a quorum, the controller will assign the task to leaders that can
4. In Kafka, the work that the cluster of brokers performs is receiving messages, categorizing them into topics and reliably persisting them for consumers
[Diagram: the controller (worker A) hands tasks to leaders (workers B and C); workers D and E are followers forming a quorum, worker F is a peer.]

Distributed Systems: Communication and Consensus
1. All components must communicate among themselves for a distributed system to operate
2. This communication is called a consensus or gossip protocol; it includes:
   Events related to workers becoming available and requesting cluster membership
   Worker name lookups
   Retrieving configuration settings and changes
   Controller and leader election
   Health status, such as heartbeats
[Diagram: events flowing between the controller (worker A) and workers B to F.]

Apache ZooKeeper
1. ZooKeeper serves as a centralized service for metadata about vast clusters of distributed workers
2. Such metadata can include:
   Bootstrap and runtime configuration
   Health and sync status
   Cluster and quorum group membership – roles of elected nodes
3. ZooKeeper is itself a distributed system: multiple workers that form a ZooKeeper ensemble (cluster)
4. Distributed systems that use ZooKeeper to enable scale-out, fault-tolerant capabilities: Hadoop, Spark, HBase, Mesos, Solr, Redis, Neo4J

Kafka – High-throughput distributed messaging system
1. Kafka is all about data, and getting large amounts of it from one place to another
   a) Rapidly
   b) Scalably
   c) Reliably
2. It is an event streaming platform, developed by
3. It enables users to collect, store and process data to build real-time event-driven applications
4. Written in Java and Scala; has lots of APIs to work with
5. 
Kafka supports the Big Data 3 V’s
   a) Volume – 1.4 trillion messages/day, 175 TB/day, 650 TB messages consumed, 433 M users
   b) Velocity – 13 M messages/second, 2.75 GB/sec
   c) Variety – multiple RDBMSs, NoSQL

Apache Kafka
1. Is publish-subscribe messaging rethought as a distributed commit log
2. Kafka is really just a highly distributed raw database that brokers reads and writes using pub/sub semantics
3. Kafka use cases:
   Metrics collection – operational monitoring data
   Log aggregation
   Stream processing – using Storm or Spark Streaming to read data from a topic, process it and write the processed data to a new topic where it is available to other apps

Pre-2010 Data Architecture
[Diagram: web sources (posts, ads, mail, search, network updates, news, jobs, people, profile, recommendation, groups) feed queues, RDBMS, NoSQL and logs, which in turn feed DW, Hadoop, BI, analysis and search.]

Next Generation Messaging Goals
High throughput
Horizontally scalable
Reliable and durable
Loosely coupled producers and consumers
Flexible publish-subscribe semantics

Post-2010 Data Architecture
[Diagram: RDBMS, logs, NoSQL and analysis systems publish to and consume from topics; DW, Hadoop, BI, analysis and search consume from those topics.]

Kafka
1. Kafka is immune to one of the challenges of most messaging systems: slow consumers
   Kafka can retain messages for a long period of time
   Retention is configurable per topic
2. Kafka addresses shortcomings of traditional data movement tools and approaches
3. Unlike other messaging systems, Kafka is tailored for high-throughput use cases, where vast amounts of data need to be moved in a scalable, fault-tolerant way
4. Kafka is not a processing engine – it is not Hadoop or Big Data where we save large files
5. We cannot query Kafka’s topics
6. In Kafka there is no single master that producers write to

Kafka – Concepts
1. 
Topic
   a) Logical entity – a collection or grouping of messages – physical containers of data
   b) Has a specific name that can be defined in advance or created ad hoc
   c) Kafka topics behave like both JMS queues and JMS topics
   d) Producers produce to a topic; consumers consume from a topic
   e) Topics are similar to tables in a database, but Kafka maintains topics as logs: ordered collections of append-only events
   f) Topics can be stored as a single log or scaled horizontally into partitions
2. Partitions
   a) A topic is represented by one or more physical log files called partitions

Kafka – Concepts
1. Broker
   a) A software process, an executable or daemon service running on a VM or physical machine, that persists part of the data
   b) Highly scalable; has access to the file system
   c) Broker == Kafka server
   d) Kafka brokers are what differentiates Kafka from other messaging systems
2. Cluster
   a) A grouping of multiple Kafka brokers
3. ZooKeeper
   a) Centralized service for maintaining metadata about a cluster of distributed nodes
      Configuration information, health status, group membership
   b) Brokers are stateless – they use ZooKeeper for maintaining their cluster state
   c) Performs leader election of a broker

Kafka
[Diagram: a cluster of size 3; brokers A, B and C each store a replica of topic_1 and topic_2 on their own storage, with broker A as leader and brokers B and C as followers; producers append new messages and the consumers of a consumer group read them, ordered from old to new.]

Kafka Messaging System – Topics
1. A topic is a logical entity – it spans the entire cluster of brokers
2. Producers and consumers need to know which topic they work with
3. A topic is physically represented as a log
4. Messages that are published are appended to a time-ordered sequential stream
5. Each message represents an event, or fact, that needs to be persisted and made available to consumers
6. Events are immutable: once received, they cannot be changed
7. 
Message content:
   Timestamp
   Unique ID
   Payload (binary)
8. Consumers can receive the same messages at the same or different times; if a consumer has an error processing a message, it should not affect any other consumers or producers
[Diagram: a producer appends to the append-only log “topic_01_metrics” (positions 0 to 7); consumers A and B read from it.]

Kafka – Consuming Messages
1. The design goal was to make message consumption possible for an unlimited number of consumers that are all interested in receiving the same message at the same or different times
2. If a consumer has an error processing a message, that should not affect any other consumers or producers
3. How do consumers maintain their autonomy? The message offset
4. Message offset:
   a) A placeholder – the position of the last read message
   b) Maintained by the Kafka consumer
   c) Corresponds to the message ID
5. When messages arrive, the connected consumer receives an event for the new message; it can then advance its position after retrieving the message
[Diagram: a producer appends to “topic_01_metrics” (positions 0 to 7); consumer A is at { offset: 1 }, consumer B at { offset: 5 } and consumer C at { offset: 6 }.]

Message Retention Policy
1. Kafka retains all published messages regardless of consumption
   It does not delete messages after they are consumed!
2. The retention period is configurable – default 7 days
3. The retention period is defined on a per-topic basis
4. Retention can also be set according to partition size – calculate the physical storage needed for retention!

Kafka Partitions
1. Each topic has one or more partitions
2. Partitions are mutually exclusive of one another
3. A partition is the basis on which Kafka can:
   a) Scale – determined by the number of partitions being managed by multiple brokers
   b) Become fault-tolerant and achieve higher levels of throughput
4. Each partition is maintained on one or more brokers
5. 
Each partition is located on one machine; a single partition cannot be split across more than one machine
[Diagram: a producer writes to the topic named “my_topic”; on the broker, partition 0 is stored under /tmp/kafka-logs/my_topic-0 as a .index file and a .log file holding positions 0 to 7; consumer A reads from it.]

Kafka Partitions
A partition is a log
[Diagram: a producer writes to “my_topic”, which has partition 0 (positions 0 to 7), partition 1 (positions 0 to 7) and partition 2 (positions 0 to 6).]

Kafka Consumer Group
1. A consumer group is a group of consumers that share the same group id
2. Within a group, each message is delivered to only one consumer
   Kafka ensures that each partition is consumed by only one consumer from the same group
[Diagram: three cases for “my_topic” with partitions 0 and 1 and consumer group 1: (A) one consumer, (B) two consumers, (C) three consumers.]

Kafka vs. RabbitMQ – both excellent solutions
Different design philosophies
1. Kafka – producer centric – based around partitioning a fire hose of event data into durable message brokers with cursors; more a circular persistence buffer than a queue system
   1. If you have a fire hose of events (20K+/sec per producer) – higher performance than Rabbit
   2. Kafka delivers “at least once”
   3. Messages can be re-read
   4. Consumers manage the state of consumption
2. Rabbit – broker centric – focused on delivery guarantees between producers and consumers
   1. 20K+/sec events per queue
   2. Routing to consumers according to complex rules
   3. Per-message delivery guarantee
   4. Broker manages the state of consumers
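
The consumer-group rule from the Kafka slides (each partition is consumed by only one consumer of a group) and the idea that consumers keep their own offset into an append-only log can be sketched in a few lines of Python. This is a toy in-memory model, not the Kafka client API: the `Partition` class, the `assign_partitions` helper, the round-robin strategy and the consumer names are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    """An append-only log; a message's offset is its index in the list."""
    messages: list = field(default_factory=list)

    def append(self, payload):
        self.messages.append(payload)
        return len(self.messages) - 1  # offset of the message just written

    def read_from(self, offset):
        # Reading never deletes anything; it only returns a slice of the log.
        return self.messages[offset:]


def assign_partitions(partition_ids, consumers):
    """Hypothetical round-robin: each partition gets exactly one owner in the group."""
    assignment = {consumer: [] for consumer in consumers}
    for i, pid in enumerate(sorted(partition_ids)):
        owner = consumers[i % len(consumers)]
        assignment[owner].append(pid)
    return assignment


# Topic "my_topic" with two partitions, as in the consumer-group slide.
topic = {0: Partition(), 1: Partition()}
for n in range(4):
    topic[n % 2].append(f"event-{n}")

# Case A: a single consumer owns both partitions.
case_a = assign_partitions(topic.keys(), ["consumer-1"])
# Case C: three consumers but only two partitions, so one consumer owns nothing.
case_c = assign_partitions(topic.keys(), ["consumer-1", "consumer-2", "consumer-3"])

# Every consumer tracks its own offset, so readers do not affect each other.
offsets = {"consumer-A": {0: 0}, "consumer-B": {0: 1}}
reads_a = topic[0].read_from(offsets["consumer-A"][0])
reads_b = topic[0].read_from(offsets["consumer-B"][0])
offsets["consumer-A"][0] += len(reads_a)  # advance only after retrieving

print(case_a)
print(case_c)
print(reads_a, reads_b)
```

In this model a slow or failed reader only leaves its own offset behind; the log and the other readers are untouched, which is the autonomy the Consuming Messages slide describes. A real cluster would also replicate each partition and rebalance assignments when consumers join or leave, which the sketch leaves out.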