System Design - High Level PDF
Shalini Goyal
Summary
This document is a high-level overview of system design. It covers topics including the introduction to system design, key characteristics of distributed systems, and other relevant concepts in system design. The document is intended for a technical audience.
Comprehensive System Design Roadmap For Beginners
Created By Shalini Goyal

This roadmap will serve as a high-level syllabus for mastering system design. It covers all essential topics in distributed systems, networking, storage, and scalability, with proper context to help you build scalable, fault-tolerant systems. Each section builds on the previous one to ensure a gradual, deep understanding.

1. Introduction to System Design

The system design phase in software development takes high-level requirements, both functional and non-functional, and transforms them into a blueprint that guides the construction of the system. This process involves making strategic decisions about how different components of the system will interact, what technologies will be used, and how to ensure the system meets its performance, scalability, and reliability goals.

What is System Design?
System design is the process of defining the architecture, modules, interfaces, components, and data flow to meet specific business or technical goals. It helps create systems that are scalable, reliable, secure, and easy to maintain.

Why is it important?
System design is crucial for large-scale systems to handle increased load, avoid downtime, and maintain fast response times.

How to start learning?
Start by breaking down the problem into components. Then, identify communication paths, select storage mechanisms, and think about scalability and failure handling.

Horizontal vs. Vertical Scaling
○ Horizontal Scaling: Adding more servers to distribute load (e.g., load balancers distributing traffic across multiple servers).
○ Vertical Scaling: Adding more resources (CPU, memory) to a single server (e.g., upgrading a database server).

High-level Design vs. Low-level Design
○ High-level Design: Focuses on the overall architecture and how components interact (e.g., database, services, APIs).
○ Low-level Design: Delves into specific modules, class diagrams, and algorithms used within components.

Monolith vs. Microservices
○ Monolith: One large application managing all logic (easier to start but harder to scale).
○ Microservices: Separate smaller services for managing specific tasks (increases flexibility but introduces complexity).

Logging and Metrics Calculation
○ Use logging to track important events and metrics to monitor system health and performance (e.g., error rates, latency).

Software Decoupling and Extensibility
○ Decouple components for independent updates and scaling. Use APIs and queues to avoid tight coupling between modules.

Primary-Secondary Architecture
○ A common database architecture where the primary handles writes, and the secondary replicates data and handles reads to improve performance.

2. Key Characteristics of Distributed Systems

Distributed systems allow multiple servers to work together as one logical unit. Here are the core traits that define them:

Resource Sharing
Components share hardware or software resources like memory, CPU, or files.

Openness
Use of open standards (e.g., HTTP, REST) ensures interoperability with other systems.

Concurrency
Multiple requests are handled simultaneously. Systems need synchronization techniques to avoid conflicts (e.g., locking mechanisms).

Scalability
Ability to grow with increasing demand. Includes scaling storage, compute, and network resources.

Fault Tolerance
Systems need to detect failures and continue to operate (e.g., using replicas or redundant services).

Transparency
Users shouldn't notice the distributed nature of a system (e.g., location transparency ensures users don’t know where data is stored).

3. Networking and Protocols

Understanding networking is essential for communication between distributed components.

IP, TCP, UDP
○ IP: Routing packets across networks.
○ TCP: Ensures reliable delivery.
○ UDP: Faster but does not guarantee delivery (useful for streaming).
HTTP/HTTPS
○ Used for communication between clients and servers on the web. HTTPS adds security with encryption.

DNS
○ Resolves domain names (e.g., google.com) into IP addresses.

REST vs RPC APIs
○ REST: Stateless, resource-oriented APIs.
○ RPC: Remote Procedure Calls, which are function-oriented and useful in microservices.

4. Storage Systems

Data is the backbone of every system. Understanding storage solutions helps you build resilient systems.

Shapes of Data
○ Structured (SQL), Semi-structured vs Unstructured (NoSQL) data.

Sorts of Availability
○ Techniques like replication and failover ensure high availability.

Consistency and Scalability
○ CAP theorem: A distributed system can guarantee at most two of Consistency, Availability, and Partition tolerance. Because network partitions cannot be ruled out in practice, the real choice during a partition is between consistency and availability.

SQL vs NoSQL Databases
○ SQL: Structured queries with relational tables (e.g., MySQL).
○ NoSQL: Flexible schemas (e.g., MongoDB) for handling large datasets.

Memory Estimation
○ Estimate memory needs for operations to avoid bottlenecks.

Sharding and Partitioning
○ Divide data across servers to spread load (e.g., dividing user data by region).

Database Replication
○ Replicate data across nodes for redundancy.

Map-Reduce
○ A framework for processing large data sets across multiple nodes (used in Hadoop).

ACID Properties
○ Ensure reliability: Atomicity, Consistency, Isolation, Durability.

5. Metrics and Performance Monitoring

Performance metrics help you evaluate the system's responsiveness.

Latency
○ Time taken for a request to travel from source to destination.

Throughput
○ Number of requests processed in a given time frame.

Availability
○ Uptime of the system, avoiding Single Points of Failure (SPOF).

6. Proxies and Load Balancing

Proxies and load balancers distribute traffic efficiently across multiple servers.

Forward and Reverse Proxies
○ Forward proxies act on behalf of clients, while reverse proxies act on behalf of servers (e.g., NGINX, HAProxy).
Load Balancing Strategies
○ Round Robin: Hand requests to servers in turn so that each receives an equal share.
○ Weighted Round Robin: Send more requests to more powerful servers.
○ IP Hashing: Route requests based on the client’s IP.
○ Layer 4 vs Layer 7 Load Balancing: Distribute based on network or application layer information.

7. Caching Mechanisms

Caching helps reduce latency and offload work from databases.

Cache-aside
○ Application loads data into the cache when needed.

Write-through and Write-behind
○ Sync data to cache and database in different ways to maintain consistency: write-through updates both on every write, while write-behind updates the cache first and the database later.

Application, Database, and In-Memory Caching
○ Use Redis or Memcached to cache frequently accessed data.

8. Consistency and CAP Theorem

Consistency is crucial for maintaining the accuracy of data across nodes.

Consistent Hashing
○ Efficiently distributes load across servers by hashing keys, so that adding or removing a server remaps only a small share of the keys.

CAP Theorem
○ Balances trade-offs between Consistency, Availability, and Partition Tolerance.

9. Content Delivery Networks (CDN)

CDNs speed up content delivery by caching it near the user.

Push CDN
○ Preload content to CDN nodes.

Pull CDN
○ Content is fetched only when needed.

10. Logging, Monitoring, and Alerting

Observability ensures the system operates smoothly and issues are resolved quickly.

Log Collection, Transport, and Storage
○ Use tools like ELK stack for managing logs.

Analysis and Alerting
○ Identify patterns and set alerts for abnormal activity.

11. Rate Limiting

Rate limiting controls traffic to prevent abuse.

Token Bucket and Leaky Bucket Algorithms
○ Manage request flow and prevent overload.

12. Polling and Streaming

Use the appropriate method for fetching updates.

Polling
○ Regularly checking for new data (inefficient but simple).

Streaming
○ Continuous flow of data in real-time (e.g., Kafka for event streaming).
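To make the token bucket algorithm named under Rate Limiting more concrete, here is a minimal single-process sketch in Python. The class name `TokenBucket`, its parameters, and the injectable `clock` argument are illustrative assumptions rather than part of the roadmap; a production limiter would usually keep its counters in a shared store and handle concurrent access.

```python
import time


class TokenBucket:
    """Token bucket: tokens refill at a fixed rate up to a maximum capacity."""

    def __init__(self, capacity, refill_rate, clock=time.monotonic):
        self.capacity = capacity        # maximum burst size, in tokens
        self.refill_rate = refill_rate  # tokens added per second
        self.clock = clock              # hypothetical hook so tests can fake time
        self.tokens = float(capacity)   # the bucket starts full
        self.last_refill = clock()

    def allow(self, cost=1):
        """Return True and spend `cost` tokens if the request may proceed."""
        now = self.clock()
        elapsed = now - self.last_refill
        # Add the tokens earned since the last call, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

With `capacity=2` and `refill_rate=1.0`, two back-to-back requests are allowed, a third is rejected, and one more is allowed after a second has passed. The capacity sets how large a burst is tolerated, while the refill rate sets the sustained request rate.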
System Design Exercises and Learning Plan

Purpose of the Exercises

The following exercises challenge learners to build large-scale systems. They offer practice with architectural decision-making, system scalability, fault tolerance, and network management. These challenges simulate real-world problems, encouraging learners to implement distributed design principles. So tickle your brain with them, and don’t start by looking for solutions on the internet.

Try to create your designs on paper first, then look up a solution online and compare the differences.

Exercise Plan Overview

Each design exercise is divided into four key components:
1. Problem Definition: What is the system supposed to achieve?
2. Core Functional Requirements: Features necessary to make the system useful.
3. Non-Functional Requirements: Performance, scalability, availability, etc.
4. High-Level Solution Design: Architectural decisions, components, and trade-offs.

1. How to Design a URL Shortening Service like bit.ly?

Key Topics:
○ Database design (NoSQL or Relational)
○ Hashing algorithms for generating short URLs
○ Handling high read-write traffic
○ Caching (in-memory cache like Redis)

Focus Areas:
○ Core requirements: Generate, store, and retrieve short URLs.
○ Non-functional needs: High availability, low latency.
○ Scalability: Handle millions of URL requests per day.

2. How to Design a Website like Pastebin?

Key Topics:
○ Text storage and retrieval
○ Security (private/public pastes)
○ Data expiration (auto-deletion)
○ Database schema design

Focus Areas:
○ Data structure: Use relational databases to store user text snippets.
○ Expiration policies: Ensure old posts are removed automatically.
○ Authentication: Add user management with login features for private pastes.

3. How to Design a Chat Application like WhatsApp or Telegram?
Key Topics:
○ Real-time messaging protocols
○ Consistent and eventual message delivery
○ Offline message storage
○ Encryption (end-to-end security)

Focus Areas:
○ Message queues: Use Kafka or RabbitMQ for message handling.
○ Database: Use NoSQL for scalability (MongoDB or Cassandra).
○ Real-time updates: Implement WebSockets for live chat.

4. How Would You Create Your Own Instagram?

Key Topics:
○ Media storage (images/videos)
○ Content feed algorithms
○ User profile and authentication
○ Caching and CDN for media delivery

Focus Areas:
○ User feed: Design algorithms to personalize the feed.
○ Storage: Use AWS S3 or similar services for media storage.
○ Performance: Employ CDNs for fast image loading.

5. How to Create Your Own Twitter?

Key Topics:
○ Handling real-time tweets and replies
○ Follower-following relationship model
○ Timeline generation
○ Rate limiting for APIs

Focus Areas:
○ Timeline: Use caching for quick timeline retrieval.
○ Search: Implement text search with Elasticsearch.
○ Follower model: Use graph-based data structures.

6. How to Design a File Sharing System like Google Drive or Dropbox?

Key Topics:
○ File storage and synchronization
○ User access control and permissions
○ File versioning
○ Distributed storage

Focus Areas:
○ Storage: Use cloud services with replication for redundancy.
○ Sync logic: Handle conflicts between versions.
○ Access control: Ensure secure sharing of files.

7. How to Design a Global Video Streaming Service like Netflix?

Key Topics:
○ Content delivery using CDNs
○ Video encoding and storage
○ Subscription management
○ Load balancing for large-scale traffic

Focus Areas:
○ Load management: Use load balancers to distribute traffic.
○ Video storage: Store different resolutions for bandwidth optimization.
○ CDN: Use edge servers for content delivery.

8. How to Design an ATM System?
Key Topics:
○ Transactional operations (withdrawal, deposit)
○ Network security
○ Data synchronization between banks
○ Fault-tolerant design

Focus Areas:
○ Availability: Handle offline transactions.
○ Consistency: Ensure account balance consistency.
○ Security: Encrypt transactions.

9. How to Design a Web Crawler like Google?

Key Topics:
○ Crawling strategies (BFS vs. DFS)
○ URL prioritization
○ Handling duplicate content
○ Scalable storage of crawled data

Focus Areas:
○ Efficiency: Avoid duplicate crawling.
○ Scalability: Use distributed crawling with multiple bots.
○ Storage: Store metadata with MongoDB or Elasticsearch.

10. How to Design an API Rate Limiter?

Key Topics:
○ Token bucket and leaky bucket algorithms
○ Monitoring traffic per IP
○ Throttling strategies

Focus Areas:
○ API limits: Enforce request limits to prevent abuse.
○ Storage: Use Redis to manage tokens.
○ Monitoring: Provide analytics for rate-limited endpoints.

11. How to Design a News Feed like Facebook?

Key Topics:
○ Feed ranking algorithms
○ Real-time updates
○ Social graph management

Focus Areas:
○ Ranking: Use algorithms to rank posts.
○ Caching: Cache user feeds to minimize recomputation.
○ Real-time: Use WebSockets for live updates.

12. How to Create a Deployment System?

Key Topics:
○ Continuous Integration/Continuous Deployment (CI/CD)
○ Rollbacks and versioning
○ Deployment automation tools

Focus Areas:
○ Automation: Use Jenkins or GitHub Actions.
○ Version control: Integrate with Git repositories.
○ Rollback: Ensure fast recovery from failed deployments.

13. How to Design a Multiplayer Card Game?

Key Topics:
○ Game state synchronization
○ Real-time communication
○ Handling latency issues

Focus Areas:
○ Game logic: Sync states across players.
○ Communication: Use WebSockets for real-time gameplay.
○ Scalability: Ensure servers can handle concurrent players.

14. How to Design a Ride-Hailing App like Uber?
Key Topics:
○ Real-time GPS tracking
○ Driver-rider matching algorithms
○ Payment processing
○ Load balancing for surge requests

Focus Areas:
○ Location tracking: Use GPS APIs for tracking.
○ Matching system: Optimize rider-driver matching.
○ Payment: Secure payment gateway integration.

15. Design an Application like Google Docs

Key Topics:
○ Collaborative editing
○ Conflict resolution in real-time edits
○ Document storage and permissions

Focus Areas:
○ Collaboration: Implement operational transforms (OT) for edits.
○ Storage: Use cloud storage with versioning.
○ Permissions: Manage shared access.

Learning Path and Plan of Execution

1. Week 1-2:
○ Start with simpler systems like URL shortener and Pastebin.
○ Focus on database design and caching mechanisms.
2. Week 3-4:
○ Move to chat applications and social media apps like Instagram.
○ Dive deep into real-time communication and feed generation.
3. Week 5-6:
○ Work on file sharing systems and video streaming platforms.
○ Study CDNs and load balancing techniques.
4. Week 7-8:
○ Design complex systems like Uber and Google Docs.
○ Learn synchronization algorithms and handle payment integration.
5. Week 9-10:
○ Build deployment systems and multiplayer games.
○ Practice CI/CD workflows and real-time state management.
6. Final Week:
○ Take on large-scale challenges like news feed and web crawlers.
○ Present the designs and discuss trade-offs in group reviews.

How to Solve System Design Problems: Example – URL Shortener

Step 1: Requirement Clarification

Before diving into design, it is essential to understand what the system needs to do and what trade-offs we are willing to make.

Questions to Ask:
1. What are the features the system must support?
   Example: Users can input a long URL and receive a shortened version.
   Should we allow URL expiry? (Optional feature)
2. Should we focus only on the backend, frontend, or both?
   Backend: Core URL shortening logic.
   Frontend: Simple UI to input and display URLs.
3. What scale of users are we designing for?
   Estimate based on millions of users (high traffic).
4. Are there any extended requirements?
   Example: API access for third-party services.
5. Should the system prioritize availability or consistency?
   Availability is often prioritized in web services with high traffic.

Step 2: Capacity Estimations

This helps us decide the resources the system will need and the appropriate architecture to handle load.

Questions to Answer:
1. What scale is expected from the system?
   Example: 500 million URLs per year.
2. Define the read/write ratio.
   Example: 80% reads (accessing URLs) and 20% writes (creating new URLs).
3. Traffic Estimation:
   Assume 10 million requests per day (about 116 requests per second on average).
   Peak traffic: 500 requests per second.
4. Storage Estimation:
   Assume 1 KB per URL entry (including metadata).
   For 5 years: 500 million × 5 = 2.5 billion URLs ≈ 2.5 TB of storage.
5. Bandwidth Estimation:
   Example: 50 GB/day for incoming/outgoing data requests.

Step 3: System API (System Interface) Design

This step involves defining how external users or systems will interact with your service.

API Design Approach:
Use RESTful APIs since they are stateless and easy to integrate with web services.

Endpoints:
○ POST /shorten: Accepts a long URL and returns a short version.
○ GET /{shortURL}: Redirects to the original URL.
○ GET /info/{shortURL}: Returns metadata (e.g., number of hits).
○ Optional: API for bulk URL shortening for enterprise use.

Step 4: Database Schema (Data Modeling)

Choosing the correct database and schema is crucial to meet system performance and storage requirements.

Database Choice:
Use NoSQL databases (like Redis or DynamoDB) for fast reads and writes. NoSQL also supports scalability.

Schema:
○ shortURL (Primary Key): Unique short identifier.
○ longURL: Original long URL.
○ createdAt: Timestamp of creation.
○ expiryDate: Optional field for URL expiration.
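The four schema fields above could be modelled in application code roughly as follows. This is only a sketch: the class name `UrlMapping`, the snake_case field names, and the `is_expired` helper are assumptions made for illustration, and how the record is serialized into Redis or DynamoDB is left open.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Optional


def _utc_now():
    """Current time as a timezone-aware UTC datetime."""
    return datetime.now(timezone.utc)


@dataclass
class UrlMapping:
    short_url: str                               # shortURL: primary key
    long_url: str                                # longURL: original long URL
    created_at: datetime = field(default_factory=_utc_now)  # createdAt
    expiry_date: Optional[datetime] = None       # expiryDate: None = never expires

    def is_expired(self, now=None):
        """Return True once the optional expiry date has been reached."""
        if self.expiry_date is None:
            return False
        if now is None:
            now = _utc_now()
        return now >= self.expiry_date
```

For example, a mapping created with `expiry_date=created + timedelta(days=30)` reports `is_expired()` as False on day 29 and True from day 30 onward, which is the check a redirection service would run before issuing the redirect.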
Block Storage:
If storing URLs and metadata becomes large, you can use cloud storage systems (like AWS S3) for archival purposes.

Step 5: High-Level Design

At this stage, create a block diagram showing the core components.

1. Components:
○ Frontend: Web interface to enter URLs and retrieve short ones.
○ Backend Services:
   URL Shortening Service: Generates short URLs.
   Redirection Service: Maps short URLs to original URLs.
   Analytics Service: Tracks URL clicks and metadata.
○ Database: Stores the mapping between long and short URLs.
○ Cache Layer: Stores frequently accessed URLs (e.g., Redis).
2. Diagram: Include the following:
○ User → Web Frontend → API Layer → URL Shortener → Database.
○ Cache sits between the API and the database for performance.

Step 6: Detailed Design

Delve into the technical details such as caching strategies, partitioning, and handling high traffic.

Partitioning Strategy:
Use sharding on URL keys to split the database load across multiple servers.

Handling Hot URLs:
Cache popular URLs in Redis to avoid frequent database lookups.

Load Balancing:
Use a round-robin load balancer to distribute traffic across multiple servers.

Caching Strategy:
○ Store short-to-long URL mappings in Redis.
○ Implement cache-aside pattern: Check cache first; if not found, query the database and update the cache.

Step 7: Handling Bottlenecks and Single Points of Failure

Design for reliability by eliminating SPOFs and introducing redundancy.

Questions to Ask:
1. Are there any single points of failure?
   Example: If the database crashes, all requests will fail.
2. How to remove these SPOFs?
   Use database replication to ensure availability.
3. Do we have enough data replicas?
   Maintain multiple replicas across regions to prevent data loss.
4. How are we monitoring performance?
   Use monitoring tools like Prometheus to set up alerts for high latencies or errors.

Step 8: Performance Monitoring and Scaling
Step 8: Performance Monitoring and Scaling Created By Shalini Goyal Monitoring helps identify bottlenecks early and allows proactive scaling. Metrics to Track: ○ Latency:How fast the service returns a shortenedURL. ○ QPS (Queries per Second):How many requests are handledconcurrently. ○ Cache Hit Rate:Percentage of requests served fromcache. Scaling Approach: ○ Horizontal Scaling:Add more servers as the user basegrows. ○ Auto-scaling Policies:Use cloud-based auto-scalingto handle traffic spikes. Final Solution Summary: al The URL Shortener system will have: . 1 rontend:Simple interface to input URLs. F oy 2. Backend Services:Stateless API services for URL shorteningand redirection. 3. Database:NoSQL for fast access with sharding. 4. Cache:Redis for hot URLs and quick lookups. 5. Monitoring Tools:Prometheus or Grafana to track performanceand set alerts. G 6. Load Balancer:Distribute requests across servers. i in Conclusion y following the structured approach outlined above, we designed a highly scalable, available, B al and reliable URL shortening service. This problem-solving framework can be applied to other system design challenges, ensuring you address both functional and non-functional requirements systematically. Sh Created By Shalini Goyal