DDR: A Deadline-Driven Routing Protocol for Delay Guaranteed Service PDF
Document Details
Uploaded by CuteWatermelonTourmaline
University of Victoria
Pu Yang, Tianfang Chang, Lin Cai
Tags
Summary
This paper proposes a new deadline-driven routing protocol (DDR) for delay-guaranteed service in computer networks. It leverages real-time traffic conditions to optimize routing decisions. DDR is designed to ensure on-time packet delivery while accommodating heterogeneous latency requirements and offering backward compatibility with existing routing protocols.
Full Transcript
DDR: A Deadline-Driven Routing Protocol for Delay Guaranteed Service Pu Yang, Tianfang Chang, Lin Cai Department of Electrical & Computer Engineering, University of Victoria, Victoria, Canada...
DDR: A Deadline-Driven Routing Protocol for Delay Guaranteed Service Pu Yang, Tianfang Chang, Lin Cai Department of Electrical & Computer Engineering, University of Victoria, Victoria, Canada [email protected], [email protected], [email protected] Abstract— Time-sensitive applications have become increas- engineering, leveraged by pruning linear programming prob- ingly prevalent in modern networks, necessitating the develop- lems while adding artificial constraints. However, the above ment of Delay-Guaranteed Routing (DGR) solutions. Finding solutions require manual intervention by experienced experts an optimal DGR solution remains a challenging task due to the NP-hard nature of the problem and the dynamic nature and require lengthy iterations, which is not feasible when of network traffic. In this paper, we propose Deadline-Driven routers require real-time results. Routing (DDR), a distributed traffic-aware adaptive routing To address the aforementioned limitations, we propose protocol that addresses the DGR problem. Inspired by online a distributed traffic-aware adaptive routing protocol named navigation techniques, DDR leverages real-time traffic conditions Deadline-Driven Routing (DDR). To ensure delay-guaranteed to optimize routing decisions and ensure on-time packet delivery. By combining network topology-based path generation with real- services, each packet carries its delay limit in the network- time traffic knowledge, each router can adjust packet forward- layer packet header, making it accessible to all routers. In- ing directions to meet its heterogeneous latency requirements. spired by online navigation , which recommends routes Comprehensive simulations on real-world network topologies and alternatives based on real-time traffic conditions, DDR demonstrate that DDR can consistently provide delay-guaranteed optimizes routing solutions for punctual delivery by leveraging service in different network topologies with varying traffic conditions. In addition, DDR ensures backward compatibility the inherent synergy between network capacity and local traffic with legacy devices and existing routing protocols, making it a information. viable solution for supporting delay-guaranteed service. To achieve efficient path selection, the proposed DDR Index Terms—Delay guaranteed routing, Adaptive routing. utilizes both the network topology information and the traffic dynamics information. In particular, DDR benefits from a I. I NTRODUCTION local complete Forwarding Information Base (FIB), which Delay-Guaranteed Routing (DGR) is a fundamental and provides a wealth of routing information. This FIB data can crucial problem in computer networking. From advanced vir- be used by the proposed approach to make intelligent and tual reality (VR) and digital twin technologies to traditional informed route selections. Finally, the path selection process is financial transactions and online meetings, the demand for optimized based on real-time network conditions and packet- DGR keeps growing [1, 2]. Existing traffic engineering [3, 4] specific requirements. and software-defined networking (SDN) methods [5, 6] can To gain information about traffic dynamics, DDR uses a improve network performance in terms of delay, jitter, and locally adaptive forwarding strategy. As a result, routers can throughput. However, they cannot effectively solve the DGR dynamically adjust their forwarding decisions based on the problem due to limited flexibility and scalability. immediate network status and traffic conditions in their local To provide delay-guaranteed services, both network capacity vicinity. Using locally adaptive forwarding, the proposed ap- and traffic dynamics should be considered for routing proto- proach can efficiently and effectively handle traffic variations cols. In this context, the DGR problem can be formulated as and congestion at a fine-grained level, ensuring timely packet an integer programming problem, with the objective of mini- delivery while efficiently utilizing network resources. This mizing the percentage of packets delivered that exceeds their feature fits well with the needs to make fast and real-time designed budgets. However, how to develop effective routing routing decisions for the DGR problem. solutions for delay-guaranteed services remains a challenging From the above analysis, an accurate estimation of the problem. neighborhood traffic state is important for routing decisions. Existing solutions to integer programming problems are Ideally, the router can promptly identify alternative routes limited by their scalability, which is particularly challenging in response to congestion. However, in real-world networks, in large-scale dynamic network environments. To address traffic volume may change within tens of milliseconds, while this limitation, researchers introduce constraints to reduce the the neighbor information delivery delay ranges from a few search space. For example, Yiyang et al. developed an op- milliseconds to tens of milliseconds. To minimize the negative timization framework that can account for variations in traffic impact caused by traffic information delay, DDR adopts a route demand and potential link failures by manually introducing selection mechanism based on queue status estimation. constraints and multiple relaxations. Bogle et al. proposed The main contributions of this work are summarized below: TEAVAR, which takes a risk management approach to traffic First, we present a formulation of the DGR problem in the context of an autonomous system (AS) network. The Therefore, the DGR solution requires a cross-layer approach problem is formulated to address the challenges of delay that involves both network and link layers. At the network guarantee for time-sensitive applications. layer, the DGR solution determines appropriate paths to route Second, to effectively tackle the DGR problem, we packets while meeting the specified latency requirements. At introduce DDR, a novel distributed traffic-aware dy- the link layer, it implements priority queues to accommodate namic routing protocol. DDR explores a large set of packets based on their latency budgets and prevailing network feasible paths while leveraging local traffic conditions conditions. This cross-layer approach enables DGR to make and network topology knowledge to ensure packet-level well-informed and coordinated routing decisions to achieve delay guarantees, and optimizes the routing decisions in efficient and punctual packet delivery. real time. Through careful design, we have ensured its Furthermore, the DGR solution in an AS network should be compatibility with legacy devices in AS networks. This addressed using distributed methods. Instead of relying on a feature makes DDR a practical and viable solution that centralized controller to make routing decisions for the entire can be deployed without significant disruption to the network, the DGR solution allows individual routers to make existing infrastructure. localized and adaptive decisions based on real-time observa- Third, we have developed and implemented a working tions of traffic conditions and queue status. This decentralized prototype of the DDR protocol. Extensive evaluations approach makes it possible to deploy in traditional networks. have been conducted on real network topologies and traf- It also streamlines the routing process, reducing overhead and fic data extracted from production networks. The results complexity, resulting in faster and more efficient decision- show that DDR consistently provides delay-guaranteed making. In addition, local optimization allows each router services in a variety of network and traffic scenarios. This to make routing decisions independently without extensive performance underscores the adaptability and reliability communication with other routers or controllers, improving of DDR. scalability and robustness in large networks. The rest of the paper is organized as follows: Sec. II provides a primer on the DGR solution, addressing its key Node Normal Busy Congested concepts and challenges. In Sec. III, we discuss the pain points in the existing approaches to DGR. The system design is illustrated in Sec. IV. The detailed implementation of the DDR B C protocol is presented in Sec. V. Evaluation results, including performance comparisons and simulations, are discussed in Sec. VI. We discuss future research directions in Sec. VII, A D followed by conclusions Sec. VIII. Open-source. The code of DDR is publicly available at F E http://tinyurl.com/yowaq7th. II. DGR P RIMER Fig. 1: An illustrative example Large Internet Service Providers (ISPs) operate AS net- We illustrate a delay-guaranteed routing service example, works consisting of tens of servers. These AS networks must assuming that a packet is to be delivered from A to D within continuously adapt to meet evolving user demands, services, the given latency limit, as shown in Fig. 1. There are four and traffic patterns, requiring optimization for efficient and alternative routes from A to D, Route 1 (A-B-C-D), Route reliable communications. In this context, routing protocol 2 (A-F-E-D), Route 3 (A-F-C-D), and Route 4 (A-B-C-F-E- plays a crucial role in achieving efficient and reliable packet D), but different routes may suffer from different levels of delivery for latency-sensitive applications. The main objective congestion. To satisfy the DGR requirement, Node A needs of DGR is to identify paths that can route packets within a to know the topology of the subnet as well as the delay upper given latency budget. bound of each link along the paths and select an appropriate The high-level goal of DGR is to provide efficient and direction to route the packet considering the neighborhood link reliable routing mechanisms that ensure timely packet delivery traffic status. for latency-sensitive applications. First, DGR represents a challenging combinatorial opti- III. E XISTING A PPROACH AND C HALLENGES mization problem, with the primary objective of identifying an efficient path for packet routing within a given latency In this section, we formulate the DGR problem as an opti- budget. In practical networks, fluctuating traffic patterns can mization problem and analyze the challenges it poses. We pro- cause variations in edge delay, making static routing protocols vide an overview of the pain points of the existing techniques unable to achieve optimal performance. The decision-making in traffic engineering, resource reservation, and deterministic process must consider both static link properties and real-time scheduling used to tackle this problem. These techniques, traffic conditions to handle latency requirements. while valuable, have certain limitations in effectively solving the DGR problem, which motivates our exploration of more static routing table potentially suboptimal. On the other hand, intelligent and systematic solutions in the subsequent sections. capturing the traffic state of the entire network is prohibitively expensive and will suffer from long delays. Therefore, how to A. Problem Formulation consider both network topology and traffic dynamics to solve Symbol Description the DGR problem remains a challenging open issue. G A directed graph with m edges and n nodes V Node set B. Pain Points in Today’s Approach E Edge set τi Latency of packet i We formulate the DGR problem above as an integer pro- Li Latency upper limit of packet i gramming problem above, but it could not be solved directly Pi Set of edges that generates a path to deliver packet i with an off-the-shelf integer programming solver. The main de The delay of edge e ∈ Pi , de is dynamic zi Binary vector edge indicator of path Pi problem lies in the computational complexity of Integer Pro- I(·) Indicator function of the on-time delivery gramming solvers, making it challenging to directly apply off- the-shelf solvers such as Gurobi and MOSEK. In TABLE I: Key notations in problem formulation addition, the iterative computation of the integer programming approach results in high computational resource consumption The AS network topology is abstracted as a directed graph, and time overhead, which becomes even more problematic where each router is represented as a node, and the links under dynamic traffic conditions. between them form the edges. Traffic is represented as packets Constrained by hardware processing capabilities, the early between different nodes with various latency budgets based networks were designed to provide best-effort services. Over on the service requirements. The objective of DGR is to the past few decades, an increasing number of concepts and maximize the number of packets delivered on time while methods for optimizing network performance have been pro- satisfying all constraints imposed by hardware and operational posed to meet various Quality of Service (QoS) requirements. requirements. The problem can be formulated as an optimiza- The Internet deployed simple and efficient distributed rout- tion problem under a set of constraints. The mathematical ing algorithms based on static routing tables OSPF , formulation is given below. Table I summarizes the key BGP , and IS-IS. They simplify the routing prob- notations of the problem formulation. lem by ignoring traffic conditions and transforming it into Consider N packets are scheduled to be delivered on a a shortest path problem that can be solved efficiently using directed graph G = (V, E) with m edges and n nodes. Let algorithms such as Dijkstra’s algorithm and Bellman-Ford Pi be the path that delivers packet i from its source to its algorithm. However, table-based routing schemes cannot ef- destination. Let de be the delay of edge e ∈ Pi , which changes fectively handle traffic dynamics in the network. dynamically over time. The cumulative latency τi of packet i Multipath routing protocols like Equal-cost Multi-path over the path Pi is given by: Routing (ECMP) , and traffic engineering systems [16, 17] X can respond to regular network dynamics. However, when τi = de. (1) congestion occurs along a path, the routers are unable to e∈Pi change the forwarding direction and search for a new route. Therefore, the objective of the DGR is to maximize the This limitation can affect network reliability as well as delay number of packets that can be delivered within their respective and jitter, and make the Internet incapable of guaranteeing delay requirements (called delay budgets in this paper), Li , latency for time-sensitive applications. i.e., The inadequacy of traditional Internet interior gateway rout- ing systems led the interest in traffic engineering. Multiproto- N X col Label Switching (MPLS) employs reactive traffic en- maximize I(τi < Li ), (2) gineering, which allows routers to make forwarding decisions i=1 m based on labels rather than destination addresses, resulting in subject to: zi ∈ {0, 1} , more efficient packet delivery. NetPlumber introduces a where Pi = ziT E. The decision vector zi of length m policy-checking tool that allows routers to evaluate network represents the selection of edges. Using matrix-vector multi- policies without delay and make adaptive routing decisions. plication, path Pi can be denoted by the binary vector zi. The However, current traffic engineering methods can only op- inequality constraint I(τi < Li ) ensures that the cumulative timize a constant scenario but lack robustness. Some even delay of each packet τi is less than its delay budget Li. The rely on a manual intervention based on analysis of historical indicator function I(·) returns 1 if the condition inside the experience. We need more automated and intelligent solutions parentheses is true, and 0 otherwise. that adaptively adjust routing decisions based on real-time This problem is an integer programming problem and is network conditions to ensure efficient and reliable packet an NP-hard problem, making it challenging to find an opti- delivery for time-sensitive applications. mal solution using traditional iterative optimization methods. More recently, SDN has applied centralized control to Furthermore, in real networks, traffic variability causes fluc- network management, where the separation of control and tuations in edge delay, making the shortest path chosen by a data planes allows for more flexible control of network management and configuration. B4 employs hierarchical routing and centralized control based on global network state DDR Framework information to achieve dynamic routing optimization and load balancing. Microsoft proposed SWAN , a system that Routing increases network utilization by reconfiguring traffic via the Algorithm central controller, while LiveNet applies the centralized Link State Database approach to global routing computation and path assignment, improving performance for large-scale and low-latency stream- Route Local Complete FIB Selection ing. In addition, Kumar et al. formulated the end-to-end Traffic Condition delay in SDN as a multi-constraint optimization problem and Database used heuristic algorithms to solve it. Although the centralized approach offers advantages in SDN due to the logically centralized control plane and regular topologies, it faces scal- ability and complexity limitations when applied to large-scale Packet Queue Forwarding Process Discipline Decision networks. Lin proposes the SET , which aims to meet the diverse requirements of network applications in various environments through intelligent configuration. However, it Fig. 2: Framework of DDR has only been proposed as an architecture and urgently awaits implementation. Advances in hardware performance have led to the explo- The DDR framework, as depicted in Fig. 2, requires the ration and extension of the capabilities of routing devices. cooperation of the link layer and network layer. A queuing One of the approaches is the programming slick network discipline is used to quantify queuing delay, while a routing al- functions , which enables dynamic and flexible network gorithm identifies possible forwarding directions. In addition, a control by defining and deploying network functions on-the- route selection algorithm dynamically makes forwarding deci- fly. In addition, Kinetic et al. used machine learning sions based on real-time traffic conditions. This collaborative to optimize network routing and improve performance in approach ensures that DDR can efficiently find high-quality dynamic environments. routing solutions while adapting to the dynamics of network In addition to routing, scheduling is critical for service traffic. quality. Priority Queues can classify packets into different priorities based on their importance or time sensitivity to en- A. Traffic Condition Database sure that high-priority packets can access high-priority queues The Traffic Condition Database (TCDB) is designed to as- to be forwarded ahead of lower-priority packets to reduce sist the router in estimating the delay of routes under different their queuing delay. The scheduling scheme could improve the traffic conditions, which consists of two main functions: delay performance of time-sensitive packets, but cannot thoroughly quantization and delay estimation. solve the DGR problem in the network. Yang proposed Delay quantization: To meet the latency requirements of a delay-guaranteed routing protocol (DGRP) that uses real- delay-sensitive applications, we introduce a priority queue time neighbor congestion information to adjust forwarding discipline, as shown in Fig. 3. This priority queue consists of directions. However, the study only considers ideal neighbor two virtual queues: a high-priority queue for delay-sensitive traffic conditions obtained from a simulator and does not con- packets and a low-priority queue for normal best-effort pack- sider the impact of delay to obtain such traffic conditions. To ets. The length of the high-priority queue is configured in fully satisfy the DGR service, we need cross-layer cooperation milliseconds, representing the amount of time it takes to drain between the network layer and the link layer. the queue. By measuring the traffic condition reflected in the To address these challenges, we adopt a greedy routing length of the high-priority queue, we can assign a congestion strategy to ensure timely packet delivery in the presence of state to the queue. The local router periodically broadcasts this traffic fluctuations. In the next section, we show why the congestion state to neighboring nodes. Subsequently, neighbor- central idea of DDR - local traffic condition-based adaptive ing routers can estimate the queueing delay and intelligently routing - can be a powerful tool to improve the on-time arrival choose an appropriate forwarding direction for packets. This rate. approach enables routers to make informed and adaptive routing decisions based on real-time traffic conditions. IV. DDR D ESIGN Delay estimation: In real-world networks, the delays for We propose DDR, a hybrid approach that leverages the the control packet which carries the traffic state to arrive advantages of table-based routing and a local traffic state at neighboring routers can cause discrepancies between the adaptive method to solve the DGR problem. By exploiting observed traffic conditions and the actual conditions at the the strengths of both methods, DDR achieves a millisecond- time of arrival. To address this issue, the TCDB implements a level latency guarantee, enabling efficient and reliable packet latency estimation process that allows routers to estimate the delivery under practical network conditions. current traffic conditions of their neighbors based on historical High Priority Queue under real-time network conditions. S X Dq = pij · dj (3) j=1 Classify ··· Link By leveraging the Markov model, the latency estimation Low Priority Queue process in TCDB provides routers with a reliable and efficient means to estimate their neighbor’s future traffic conditions. Fig. 3: DDR Queue Discipline This estimation contributes to better adaptive behavior of the DDR protocol, enabling timely and efficient packet delivery while considering real-time network conditions. records, while taking into account potential delays between neighbors. B. Local Complete FIB The delay estimation problem can be formulated as a Time Series Estimation (TSE) problem, which is commonly encountered in various domains such as signal processing, Node Normal Busy Congested control engineering, and traffic engineering. Techniques such as Markov models , Hidden Markov models , and B C B C Reinforcement Learning can be applied to solve TSE problems. In the context of DGR, the goal is to estimate A D A D the neighbor’s future traffic conditions in a few milliseconds, which is closely associated with the link delay. While machine F E F E learning and reinforcement learning methods can achieve accurate estimations, they often demand substantial comput- Fig. 4: Feasible Path Candidate Set ing resources and extensive training datasets. We employ a straightforward yet effective Markov model for the latency OSPF uses the Dijkstra’s algorithm to compute the shortest estimation process. path tree, also known as the Shortest-Path First (SPF) algo- rithm. However, relying on a single path may lead to network Actual level under-utilization and degraded performance with unbalanced Predicted level S1 S2 S3 S1 p11 p12 p13 traffic. Therefore, for DDR, we aim to identify all potential S2 p21 p22 p23 paths that can meet the delay requirement. To achieve it, a S3 p31 p32 p33 DDR router recursively uses the SPF algorithm to generate a TABLE II: Transition Matrix for Markov Model recursive feasible path candidate set. While the RecursiveSPF algorithm efficiently generates the time-oriented routing table To represent the traffic state, we utilize the high-priority based on the static delay component, there are opportunities queue to define the state. The Markov model captures the tran- to optimize its performance. By leveraging parallel computing sition probabilities between different traffic condition states. techniques, such as parallelization across multiple processing By observing the current state, the routers can estimate the units or distributed computing across a network of routers, probabilities of transitioning to different states in the future. the algorithm can be executed efficiently. This reduces the This information enables the routers to make better decisions computational complexity and enables faster generation of the about the most appropriate forwarding directions, taking into deadline-driven routing table. account the expected queuing delays and avoiding congestion. The transition matrix, as presented in Table II, is a crucial Algorithm 1: Feasible Path Candidate Set tool used for estimating and evaluating the Markov model. Data: A graph G(V, E), a root node root The values in the matrix represent the probabilities of state Result: A feasible path candidate set F for the node transitions. root The estimated queueing delay (Dq ) in each neighbor can 1 Let depth be the depth of the forest. be calculated using (3), where S represents the total number 2 Function RecursiveSPF(G, root, depth, F ): of states in the Markov model. pij denotes the transition 3 if depth == 0 then probability from the ith state to the jth state, while dj 4 F.push(Dijkstra(G, root)) represents the expected queuing delay associated with the 5 end jth state. The delay expectation (Dq ) is a crucial metric 6 for each neighbor router i of root do that enables routers to make intelligent decisions regarding 7 RecursiveSPF(G, i, depth − 1, F ) forwarding directions. This information plays a significant role 8 end in the adaptive and dynamic behavior of the DDR protocol, 9 return ultimately contributing to efficient and reliable packet delivery To explore more alternative forwarding paths, we generate potential routes to the destination, and a loop avoidance check a feasible path candidate set for each node. The pseudo-code function ensures that the forwarding direction is closer to the is given in Algorithm 1. For a directed graph G(V, E), we destination. Each route is then analyzed based on its static assign any node as the root and initialize an empty tree set latency and the estimated local queueing delay. Routes that F to store the feasible path candidate set. Here we introduce can deliver the packet within its budget time are selected to depth to represent the number of hops from the root node to its the candidate route set S. The M inimumDelay(·) function neighbors, which also determines the number of shortest-path then selects the route with the shortest estimated delay from trees in the current forest. Then we design a recursive function this set. RecursiveSPF with a core of the Dijkstra’s algorithm to The adaptive route selection function plays a critical role calculate the feasible path candidate set. Recursive depth in ensuring timely packet delivery while considering both represents the range of traffic conditions observed by the router the dynamic traffic conditions and the link capacity. By when making routing choices. For instance, as shown in Fig. 1, intelligently selecting the most appropriate route, the algorithm we set depth to 1. At node A, there are two neighbors in 1 helps achieve millisecond-level delay guarantees in the DDR hop, B and F, so the feasible path candidate set includes two system. shortest-path trees rooted at node B and node F. The recursive depth can be flexibly set according to the research needs, V. I MPLEMENTATION which can gather more neighbor information across multiple hops and optimize the algorithm. In this section, we describe our implementation of DDR By exploring alternative paths beyond the shortest route, using the ns-3 simulator and discuss some of the imple- routers can dynamically adapt to changing network condi- mentation issues that arose. The packet-level simulation used tions, spread traffic load more evenly, and optimize resource to evaluate the performance of DDR is illustrated in the next utilization. This approach ensures efficient data transmission, section. minimizes congestion and enhances the overall performance The implementation of DDR is based on an OPSF link- of the time-oriented routing system. state routing protocol that generates the LSDB by link-state advertisement (LSA). The route computation function for C. Adaptive Route Selection generating the local complete FIB and the route selection The adaptive route selection function aims to find a suitable function has been modified according to Sec. IV. However, forwarding direction for a packet within its budget time. to ensure the optimal performance of DDR, two configuration Algorithm 2 presents the pseudo-code for adaptive route issues need to be addressed: determining how to set the traffic selection. It uses two functions related to the previous section. condition scale and selecting the appropriate sampling rate. Function D(r) returns the cost of route r, which is described In the proposed priority queue discipline for latency use in Sec. IV-B. Function Di (r) returns the estimated queueing delay before quantization, we define the maximum queueing delay based on local traffic condition, which is described in delay for the priority queue as Qmax. The queue is divided Sec. IV-A. into K states, from 1 to K, and each state k represents a range of queueing delays. Specifically, the queueing delay in Algorithm 2: Adaptive Route Selection state K is between (k−1)Q K max and kQKmax. Data: Forwarding Table T , Destination Address d, The scale traffic condition state setting is crucial and sig- Latency budget L nificantly impacts the protocol’s performance. If the range for Result: A forwarding direction R each state is too small, the Markov Model may necessitate 1 Let D(r) return the cost of route r. predicting more steps to estimate the current state, poten- 2 Let Dl (r) estimate the local traffic delay of route r. tially decreasing the accuracy due to the increased prediction 3 Let M inimumDelay(·) return the index of the complexity. Conversely, using a large granularity and fewer estimated fastest path of Set (·). states might result in higher estimation errors for certain 4 Let S be an empty set of routes. states. Thus, finding the right balance between granularity and 5 S ←∅ prediction steps is essential to achieve accurate and efficient 6 for each route r ∈ T to d do traffic condition estimations in the Markov Model for the 7 if LoopAvoidCheck(r) priority queue. Properly selecting the traffic condition scale 8 && D(r) + Dl (r)