Q-learning-based Smart Clustering Routing in Flying Ad Hoc Networks PDF
Document Details
Uploaded by CleverParallelism
2024
Mehdi Hosseinzadeh, Jawad Tanveer, Amir Masoud Rahmani, Khursheed Aurangzeb, Efat Yousefpoor, Mohammad Sadegh Yousefpoor, Aso Darwesh, Sang-Woong Lee, Mahmood Fazlali
Tags
Summary
This paper proposes a Q-learning-based smart clustering routing method for flying ad hoc networks. The method dynamically adjusts clustering parameters based on factors like energy, centrality, and speed to improve network efficiency and lifespan. Simulation results are compared to other clustering schemes, highlighting its performance aspects.
Full Transcript
Journal of King Saud University - Computer and Information Sciences 36 (2024) 101894 Contents lists available at ScienceDirect Journal of King Saud University - Computer...
Journal of King Saud University - Computer and Information Sciences 36 (2024) 101894 Contents lists available at ScienceDirect Journal of King Saud University - Computer and Information Sciences journal homepage: www.sciencedirect.com Full length article A Q-learning-based smart clustering routing method in flying Ad Hoc networks Mehdi Hosseinzadeh a,b ,1 , Jawad Tanveer c ,1 , Amir Masoud Rahmani d , Khursheed Aurangzeb e , Efat Yousefpoor f , Mohammad Sadegh Yousefpoor f , Aso Darwesh g , Sang-Woong Lee h ,∗, Mahmood Fazlali i ,∗ a Institute of Research and Development, Duy Tan University, Da Nang, Viet Nam b School of Medicine and Pharmacy, Duy Tan University, Da Nang, Viet Nam c Department of Computer Science and Engineering, Sejong University, Seoul 05006, Republic of Korea d Future Technology Research Center, National Yunlin University of Science and Technology, Yunlin, Taiwan e Department of Computer Engineering, College of Computer and Information Sciences, King Saud University, P.O. Box 51178, Riyadh 11543, Saudi Arabia f Department of Computer Engineering, Dezful Branch, Islamic Azad University, Dezful, Iran g Department of Information Technology, University of Human Development, Sulaymaniyah, Iraq h Pattern Recognition and Machine Learning Lab, Gachon University, 1342 Seongnamdaero, Sujeonggu, Seongnam 13120, Republic of Korea i Cybersecurity and Computing Systems Research Group, School of Physics, Engineering and Computer Science, University of Hertfordshire, AL10 9AB Hertfordshire, UK ARTICLE INFO ABSTRACT Keywords: Flying ad hoc networks (FANETs) have particular importance in various military and civilian applica- Flying ad hoc networks (FANETs) tions due to their specific features, including frequent topological changes, the movement of drones in a Clustering three-dimensional space, and their restricted energy. These features have created challenges for designing Unmanned aerial vehicles (UAVs) cluster-based routing protocols. In this paper, a Q-learning-based smart clustering routing method (QSCR) is Reinforcement learning (RL) suggested in FANETs. In QSCR, each node discovers its neighbors through the periodic exchange of hello Machine learning (ML) messages. The hello time interval is different in each cluster, and cluster leaders determine this interval based on the average speed similarity. Next, an adaptive clustering process is presented for categorizing drones in the clusters. In this step, the cluster leader is selected based on a new parameter called merit value, which includes residual energy, centrality, neighbor degree, speed similarity, and link validity time. Then, a centralized Q- learning model is presented to tune weight coefficients related to merit parameters dynamically. In the last step, the routing process is done using a greedy forwarding technique. Finally, QSCR is run on NS2, and the simulation results of QSCR are compared with those of ICRA, WCA, and DCA. These results show that QSCR carries out the clustering process rapidly but has less cluster stability than ICRA. QSCR gets energy efficiency and improves network lifetime. In the routing process, QSCR has a high packet delivery rate compared to other schemes. Also, the number of isolated clusters created in QSCR is less than other clustering methods. However, the proposed scheme has a higher end-to-end delay than ICRA. Also, this scheme experiences more communication overhead than ICRA slightly. 1. Introduction aerial photos, 5G technology, wildfire monitoring, road traffic monitor- ing, intelligent UAV-assisted agriculture, UAV-assisted connections, and Recently, unmanned aerial vehicles (UAVs) have been successfully search and rescue (Alam and Moh, 2022; Alam et al., 2022). In these used in military and civilian areas for different applications, including ∗ Corresponding authors. E-mail addresses: [email protected] (S.-W. Lee), [email protected] (M. Fazlali). 1 These authors contributed equally to this work. Peer review under responsibility of King Saud University. https://doi.org/10.1016/j.jksuci.2023.101894 Received 1 November 2023; Received in revised form 14 December 2023; Accepted 15 December 2023 Available online 8 January 2024 1319-1578/© 2024 The Author(s). Published by Elsevier B.V. on behalf of King Saud University. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/). M. Hosseinzadeh et al. Journal of King Saud University - Computer and Information Sciences 36 (2024) 101894 applications, reliable and fast wireless communication between UAVs utilize previous experiences to make more smart decisions and increase must be formed using a network. This is a motivation to carry out drone reward (Khan et al., 2022; Lansky et al., 2022a; Hosseinzadeh et al., missions in the form of a flying ad hoc network (FANET) (Hossein- 2023b). RL is employed for many applied scenarios such as forecast- zadeh et al., 2023c; Yousefpoor et al., 2021). In this network, drones ing network topology, estimating communication channels, optimizing cooperate with each other, and consequently, FANET is more effective flight trajectory, and designing routing. A well-known reinforcement and efficient than a single-UAV system in terms of cost development, learning technique is Q-learning (QL) whose features are off-policy, scalability, survivability, and efficiency (Messaoudi et al., 2023; Hos- model-free, and value-based (Rahmani et al., 2022c; Lansky et al., seinzadeh et al., 2023d). In addition, FANET includes distinct features 2022b). It can be used for multi-objective optimization goals in FANET. that make it different from other ad hoc networks such as mobile ad QL examines the expected cumulative reward and adopts an optimal hoc networks (MANETs) or vehicular ad hoc networks (VANETs). Some policy according to the obtained experiences. In addition, QL is a of these features are highly moving UAVs, very dynamic topology, successful solution to choose an optimal path in the data transfer sparse network, and limited energy capacity (Darabkh et al., 2022; process to the destination (Rahmani et al., 2022a; Wang et al., 2022). Rahmani et al., 2022b). It is essential to pay attention to these spe- In this paper, a Q-learning-based smart clustering routing method cific features when creating routing protocols in these networks. For (QSCR) is suggested for flying ad hoc networks. QSCR includes four this reason, many routing protocols proposed for MANET and VANET phases: dynamic neighbor discovery, adaptive clustering, dynamic and cannot be used directly in FANET (Lee et al., 2021; Lansky et al., 2023). smart adjustment of clustering parameters, and greedy routing. In Generally, routing protocols presented in FANET can be categorized QSCR, the clustering phase is responsible for constructing network into two groups: single-path routing and multi-path routing (Almeida topology, and the greedy routing process is responsible for creating et al., 2021; Rovira-Sugranes et al., 2021). Single-path routing protocols paths. In the clustering process, cluster leaders are chosen based on consider a static routing table, which contains routing paths calculated the weighted sum of five merit parameters, including residual energy, before network operation begins. These routes are unchanged at all centrality, neighbor degree, speed similarity, and link validity time. times. Multi-path routing protocols guide data packets step-by-step toward the destination (Zhang et al., 2022b; Yang et al., 2022). A key The weight coefficient related to each merit parameter is tuned up subject in the route discovery process of these routing protocols is how based on a centralized QL-based learning model so that these weight to single out the next-hop node. In addition, routing protocols can be coefficients change according to the conditions of UAVs in the network, grouped into two categories: topology-based routing and position-based and consequently, the importance of each parameter will be different routing. The first group includes three classes: proactive, reactive, and accordingly. QSCR adjusts Q-learning parameters to be more consistent hybrid (Liu et al., 2023; Jin et al., 2023). with the dynamic topology of FANET. In general, the main innovations To transfer data packets between UAVs in a high-dense FANET, in this paper are as follows: researchers have proposed cluster-based routing protocols. In these In QSCR, the neighbor discovery process focuses on the content of methods, FANET includes a hierarchical structure in which UAVs act each hello packet and its propagation time interval. Each cluster as cluster heads (CHs) or cluster members (CMs) (Dhall and Dhongdi, leader is responsible for specifying the hello propagation time 2023; Hosseinzadeh et al., 2023e). The selection of cluster heads (also interval in its cluster based on the average similarity between its called cluster leaders) is a major challenge in the clustering process. velocity vector and cluster members’ velocity vector. This role is determined based on different factors such as residual In QSCR, an adaptive clustering process is proposed for FANET energy, centrality, neighbor degree, and so on (Arafat and Moh, 2019; to distribute energy consumption in the network uniformly and Abdulhae et al., 2022). Note that some clustering techniques are single- criterion methods, meaning that, cluster leaders are chosen only based increase network longevity. At this phase, cluster leaders are cho- on one criterion such as energy, while other clustering techniques are sen based on a new parameter called merit, which includes five multi-criteria methods, meaning that, they consider different criteria parameters, namely residual energy, centrality, neighbor degree, to choose cluster leaders (Hosseinzadeh et al., 2023a; Mansoor et al., speed similarity, and link validity time. 2023). Multi-criteria clustering methods allocate a specific weight to In QSCR, a QL-based learning model is presented to tune weight each clustering criterion, and the weighted sum of these clustering coefficients related to the merit parameters and determine their criteria is considered as the final score for determining cluster lead- effect on the merit value in a dynamic manner and according to ers (Gharib et al., 2022; Khedr et al., 2023). In the clustering process, FANET conditions. In this learning process, the reward function an important challenge is how to choose a suitable weight coefficient is also calculated based on three metrics, namely the balance for each clustering criterion. Most studies only emphasize that the sum of energy consumption, the number of isolated clusters, and the of these weight coefficients should be equal to one (Khan et al., 2019, distribution of cluster leaders in the network. 2020). Most research studies consider the same value for these weight In QSCR, learning parameters are chosen adaptively. QSCR cal- coefficients, while some protocols also determine different weights for culates the learning rate based on the stability of clusters. In each clustering criterion. In a clustered network, CMs are responsible addition, the discount coefficient is calculated based on the sim- for transmitting data collected from the environment to its cluster ilarity between the velocity vectors of cluster members and the leader (Jawhar et al., 2017; Shahzadi et al., 2021). In addition, clus- cluster leader. ter leaders are responsible for managing intra-cluster communication In QSCR, the inter-cluster routing process follows a greedy routing and intra-cluster data aggregation. They act as relay nodes and form technique so that the cluster leader examines the positions of inter-cluster communication. Cluster leaders are also responsible for its adjacent clusters and selects the closest cluster leader to the communicating with the ground control station (GCS) (Alzahrani et al., destination as its next-hop node. 2020; Hadi et al., 2023). Due to the dynamic nature of UAVs in FANET, intra or inter-cluster communication links have a very short lifetime, The structure of this paper includes the following sections: Section 2 and the stability of clusters is very weak. These affect the reliability of introduces some related works in flying ad hoc networks. Section 3 the data transmission process and, consequently, cause routing ineffi- provides the concepts of reinforcement learning, especially Q-learning, ciency and weaken the quality of services (QoS) in FANET (Fanian and because of its use in QSCR. Section 4 expresses the assumptions of the Rafsanjani, 2019; Cheng et al., 2023). network model. Section 5 describes the details of the proposed method. Today, reinforcement learning (RL) is widely applied for improving In Section 6, the simulation and evaluation results of the proposed wireless communication paths in ad hoc networks. In RL, agents exe- method are compared to other schemes. Finally, this paper is concluded cute various actions in the dynamic environment such as FANET, and in Section 7. 2 M. Hosseinzadeh et al. Journal of King Saud University - Computer and Information Sciences 36 (2024) 101894 2. Related works of links connected to their neighbors with regard to their 3D movement and transmit data packets to the destination. Each UAV makes its In Ergenç et al. (2019), a dependability-based clustering algorithm routing decisions in accordance with the path construction part in 3DQ. (DCA) is offered in MANETs. It can properly deal with the challenge The evaluation results show that 3DQ is suitable for FANET and is more that originates from the lack of a central administrator and centralized successful than other routing schemes. infrastructure in MANETs. This scheme makes a clustered topology, In Alam and Moh (2023), a Q-learning routing scheme based on which has high scalability and reliability. DCA pays attention to the adaptive flocking control (QRIFC) in FANETs. This scheme provides an dependency of clusters along with individual nodes to stabilize the adaptive flocking control system to manage the optimized movement clustered network structure. In addition, DCA analyzes various criteria of UAVs with regard to the distance traveled by each UAV. Hence, this and introduces a complete optimization structure to choose CHs and system can handle the density of UAVs in FANET. Furthermore, it relies calculate the dependency of clusters in a weighted clustering algorithm. on the information of two-hop neighbors to calculate the upper and In DCA, the weight coefficient related to each clustering metric is lower boundaries of distance between UAVs in the network. This causes computed by the moment-independent Delta analysis technique. DCA a balance between area coverage and network connectivity, increases makes a flexible clustering structure, which is stable, energy-efficient, the stability time of communication links between neighboring nodes, and provides quality of services (QoS) in the network. Each node and decreases control overhead in the network. A multi-objective Q- employs the dependency related to each cluster for joining a suitable learning model is designed to carry out the exploration and exploitation cluster. Simulation results indicate the superiority of DCA compared to operations for finding the best path in terms of delay, stability, and other approaches for fixed and dynamic scenarios. energy efficiency. The evaluation results express the superiority of In Chatterjee et al. (2002), an on-demand decentralized clustering QRIFC compared to other methods. scheme called WCA is suggested in MANET. In these dynamic networks, In Cui et al. (2022), a topology-aware and flexible Q-learning rout- nodes are mobile in nature. Consequently, frequent connections and ing algorithm (TARRAQ) is presented in FANETs. It tracks topology disconnections of nodes to/from the clusters threaten network stability, changes, controls overhead, and chooses routing paths in an indepen- so the system needs to be configured again. In WCA, CHs create a dent and distributed manner. TARRAQ checks the behavior of UAVs dominant set to guarantee network scalability and stability. WCA takes using queue theory and gets two scales, namely closed-form solutions into consideration several metrics, including ideal degree, movement of neighbor change rate (NCR) and neighbors’ change inter-arrival time of nodes, residual energy, and transmission power, to decide on cluster (NCIT). According to NCR and NCIT, a flexible evaluation time interval heads. It benefits from a weighted approach to choose CHs based on is calculated in accordance with a new metric called the expected their connection and energy levels. In WCA, the number of nodes perception delay of events that occur in the network. In addition, a around a cluster must be more than a threshold value to improve Q-learning-based routing model is suggested to find different paths the performance of the media access control (MAC) protocol. WCA between UAVs in a distributed, independent, and adaptive manner. employs a non-periodic CH selection process to lower computational The expected perception delay obtained in this scheme can lower and communication costs. CHs are responsible for making inter-cluster communication costs when updating the action set. The evaluations communications in the network. The experimental results show that performed on this scheme confirm the high accuracy and successful WCA works better than other approaches. performance of TARRAQ in FANET. In Guo et al. (2022), an intelligent clustering routing approach In Arafat and Moh (2021), a Q-learning-based topology-aware rout- (ICRA) is presented for FANETs. ICRA constitutes three processes, ing (QTAR) protocol for FANET. It uses the information of two-hop namely clustering procedure, clustering adjustment procedure, and neighbors to get local network topology and create reliable routes routing procedure. The first procedure calculates the fitness value of between UAVs in the network. To find the most suitable paths in the each node to determine its state, namely CH or CM. Furthermore, to network, QTAR employs the information of two-hop neighbors and extend topology stability and network lifespan in diverse states, rein- considers several metrics such as residual energy, location, speed, and forcement learning assists the clustering adjustment procedure to learn delay. QTAR extends its knowledge about local network topology due constantly the network environment and obtain the best strategy by to the use of two-hop neighbor information and, consequently, makes taking different actions to calculate the fitness of UAVs in each network optimal routing decisions in the network. In addition, these routing state. In accordance with this knowledge, the clustering adjustment decisions are made based on a Q-learning system to manage topological procedure aids the clustering technique to be compatible with the changes adaptively. The evaluation results confirm the efficiency of current network state. In addition, in the last procedure, ICRA employs QTAR compared to other routing schemes. a few gateways in each cluster to facilitate the inter-cluster routing Table 1 expresses the most important strengths and weaknesses of procedure. This shortens end-to-end delay and increases the packet scheme mentioned in this section. delivery ratio. The simulation results emphasize that ICRA is better than other clustering methods. 3. Basic concept In Lansky et al. (2022c), a Q-learning-based routing approach called QFAN to control weather conditions using flying ad hoc networks. Reinforcement learning (RL) belongs to the family of machine learn- In QFAN, there are two phases, namely the discovery and the main- ing (ML) techniques in artificial intelligence (AI). This science can be tenance of paths created in the network. The first phase constitutes explained using a simple example. Assume that a person did not test a Q-learning-aided routing model. In this learning model, the state a particular food previously. Now, he (she) wants to test this food for space is restricted by calculating a filtering parameter, which removes the first time. As a result, this person detects whether this food has some UAVs from this space. The second phase repairs the paths, which a good taste or not. Accordingly, he (she) acquires knowledge about may fail in the near future. Evaluation results confirm that QFAN various foods, and this acquired knowledge can be applied to decide has a superior performance than other routing approaches. However, whether this person will again eat this food or not in the future (da overhead in QFAN is slightly high. Costa et al., 2021; Adams et al., 2022). In computer science, RL is In Zhang et al. (2022a), a three-dimensional Q-learning-based rout- used in algorithms, which need knowledge about their task. These ing (3DQ) approach is offered to provide QoS requirements, especially algorithms utilize RL to make the best decision for doing this task. PDR, in flying ad hoc networks. 3DQ includes a learning model, which Furthermore, RL can be used in FANETs to improve and enhance benefits from Q-learning to make routing decisions on the networks. applications, for example, resource management, channel modeling, Two main parts of 3DQ are link-state prediction and path construction. localization, network security aspects, and routing. RL is made up of The first part, link-state prediction, enables UAVs to forecast the status various elements, namely agent, learning environment, action, state, 3 M. Hosseinzadeh et al. Journal of King Saud University - Computer and Information Sciences 36 (2024) 101894 Table 1 Comparison of related works. Approach Strengths Weaknesses DCA (Ergenç et al., 2019) Making reliable communication links between nodes, improving High time complexity, need to high time for creating clusters due energy efficiency, getting high scalability, prioritizing nodes in the to high computational complexity, lack of adaptability to the CH selection process, reducing communication overhead dynamic environment of FANET WCA (Chatterjee et al., 2002) Increasing energy efficiency, scalability, high fault tolerance High communication overhead due to the exchange of control messages between CHs, lack of adaptation to the very dynamic topology of FANET because of its high topology changes ICRA (Guo et al., 2022) Improvement of energy consumed by nodes in the network, Use of fixed learning parameters in the proposed Q-learning model, enhancing the quality of services, growing throughput in the not calculating a propagation time interval for disseminating hello clustering procedure, stabilizing network topology, shortening messages in the network end-to-end delay, rising packet delivery rate, lowering routing overhead QFAN (Lansky et al., 2022c) Filtering the search space, converging to the optimal solution Considering the fixed values for Q-learning parameters, namely rapidly, shortening delay, increasing the data transfer rate, learning rate and discount factor, increasing communication improving network scalability overhead 3DQ (Zhang et al., 2022a) Suitable throughput, improving QoS requirements, including delay Considering two fixed values for learning rate and discount factor in and PDR in the network, avoiding congestion in the routing Q-learning, ignoring the energy consumed by UAVs, low scalability process, preventing routing holes QRIFC (Alam and Moh, 2023) Decreasing delay in the network, creating stable routes, getting High time complexity, the possibility of falling the adaptive energy efficiency, attention to learning parameters and updating flocking control algorithm in a local optimum them constantly, improving throughput, PDR, and reliability TARRAQ (Cui et al., 2022) Improving data transmission rate, minimizing energy consumption, Low scalability, converging to optimal solution slowly because of increasing reliability and throughput, adaptability to topological having a large Q-table, high time and computational complexities changes in FANET, calculating the hello propagation time interval adaptively QTAR (Arafat and Moh, 2021) Adaptability to the dynamic environment of FANET, high High time complexity, having a large Q-table, and converging to scalability, decreasing delay in the routing process, increasing the optimal solution slowly throughput and PDR in the network, tuning learning parameters (𝛼 and 𝛾) adaptively and reward. Continuous interaction with the environment causes the Discount factor (𝛾): 𝛾 is a coefficient limited to the interval [0, 1]. agent to learn its best behavior in a learning environment. During Q-learning algorithms often assume that the value of future re- the learning process, the agents acquire diverse experiences by ex- wards is less than that of current rewards. As a result, Q-learning periencing different scenarios on the environment. Each scenario is function discounts them. called a state. However, in each state, the agent can single out an Learning rate (𝛼): 𝛼 shows( the knowledge ) acquired from the action from a set of permissible actions. The selected action affects environment for updating 𝑄 𝑆𝑡 , 𝐴𝑡. the environment and produces a result. Accordingly, the agent gets a 𝜖-Greedy policy: It is a simple solution to choose actions by reward or penalty from the environment. Over time, the agent tries to considering the estimations of Q-values. Accordingly, the next increase the rewards gotten from the environment, and consequently, it action is chosen based on the highest Q-value by considering the probability (1 − 𝜖) or it is selected randomly based on the will learn its best behavior in each state (Prudencio et al., 2023; Elallid probability 𝜖. et al., 2022). Q-learning is a subset of RL, which uses a new concept called Q- 4. Network model value to optimize the agent’s behavior using an iterative manner. This ( ) algorithm allocates Q-values (i.e. 𝑄 𝑆𝑡 , 𝐴𝑡 ) to different states and In QSCR, the flying ad hoc network includes a number of flying ( ) actions. 𝑄 𝑆𝑡 , 𝐴𝑡 estimates how desirable it is to do the action 𝐴𝑡 in the nodes and a ground station control (GCS). In this scheme, each flying state 𝑆𝑡. According to Q-learning, the agent is in an initial state. Then, node is displayed as 𝑈𝑖 where 𝑖 = 1, 2, … , 𝑛. 𝑈𝑖 has a direct com- it is transferred from the current state to the next state. Each transition munication with flying nodes available in its communication range occurs because of an action performed by the agent when interacting (𝑟𝑖 ) while it has created an indirect or multi-hop communication with with the environment. Therefore, at each step, the agent applies its other flying nodes that are out of 𝑟𝑖. In QSCR, 𝑈𝑖 has access to a selected action to the environment and gets a reward from it. Then, positing system and can obtain its position and speed at any moment. it goes to another state. This process continues until the agent reaches In this network, UAVs are connected to each other through UAV-to- ( ) UAV communication (U2U) in the air. However, UAVs and GCS are the goal. In this situation, an episode is completed. In QL, 𝑄 𝑆𝑡 , 𝐴𝑡 is connected to each other through UAV-to-GCS communication (U2G) or estimated based on Eq. (1) to calculate Q-values when interacting the GCS-to-UAV communication (G2U). These UAVs are categorized into 𝑚 agent with the environment (Ganesh and Xu, 2022; Sutton and Barto, groups using a dynamic and intelligent clustering method. Each group 2018). [ is called cluster (𝐶𝑘 ) so that 𝑘 = 1, 2, … , 𝑚. 𝐶𝑘 includes a cluster ( ) ( ) ( ) ( )] 𝑄 𝑆𝑡 , 𝐴𝑡 ← 𝑄 𝑆𝑡 , 𝐴𝑡 + 𝛼 𝑅𝑡+1 + 𝛾max 𝑄 𝑆𝑡+1 , 𝐴 − 𝑄 𝑆𝑡 , 𝐴𝑡 (1) leader (𝐿𝑈𝑘 ) and a number of cluster member UAVs (𝑀𝑈𝑗𝑘 so that 𝐴 𝑗 = 1, 2, … , 𝑁𝑘 ). In each 𝐶𝑘 , some cluster members close to the cluster 𝑆𝑡 and 𝑆𝑡+1 show the current and next states of the agent, respec- border also play the role of inter-cluster gateways (𝐺𝑈𝑔𝑘 ) and are used tively. 𝐴𝑡 indicates the present action selected based on a particular to connect with adjacent clusters. This network model is shown in policy. 𝐴′ is the best action selected based on the current Q-value. 𝑅𝑡+1 Fig. 1. In the following, the task of each node is explained in the expresses the reward obtained from the environment when performing network. ( ) the current action. max 𝑄 𝑆𝑡+1 , 𝐴 indicates the maximum Q-value GCS: It benefits from an unlimited and stable energy source. 𝐴 under the next state. In the following, some learning parameters related GCS is responsible for monitoring the network, controlling flight to QL are described briefly: routes, determining the mission of flying nodes, and transmitting 4 M. Hosseinzadeh et al. Journal of King Saud University - Computer and Information Sciences 36 (2024) 101894 Fig. 1. Network model in QSCR. various commands to UAVs. In QSCR, the most important task of 5. Proposed scheme GCS is to tune clustering parameters using a Q-learning algorithm intelligently and dynamically. Here, a Q-learning-based smart clustering routing method (QSCR) 𝐋𝐔𝐤 : In QSCR, the role of cluster leader periodically rotates is described for flying ad hoc networks. QSCR includes four phases: between flying nodes so that energy consumption is uniformly Dynamic neighbor discovery distributed between UAVs. This increases network lifespan. Each Adaptive clustering 𝐿𝑈𝑘 is responsible for managing and monitoring its clusters. It Smart and dynamic adjustment of clustering parameters creates two types of communication, namely intra-cluster and Greedy routing inter-cluster communications on the network. See the schematic design of QSCR in Fig. 2. In addition, Table 2 𝐌𝐔𝐤𝐣 : In QSCR, each 𝑀𝑈𝑗𝑘 is responsible for sensing the sur- presents the symbols used in the proposed scheme. rounding environment and transmitting this information to it. These nodes support only one type of communication, namely, 5.1. Dynamic neighbor discovery intra-cluster communication. 𝐆𝐔𝐤𝐠 : In QSCR, each 𝐺𝑈𝑔𝑘 must perform the tasks of a cluster In QSCR, the neighbor discovery process gives 𝑈𝑖 this opportunity member node, namely sensing the environment and sending infor- to get a local network topology and use this local information in mation to 𝐿𝑈𝑘. Also, it participates in inter-cluster routing process clustering and routing processes. In this process, hello packets play an and sends the required information to adjacent clusters. These important role in finding neighboring nodes and building a neighbor nodes support two types of communication, namely inter-cluster table. The hello propagation operation between neighboring UAVs is shown in Fig. 3. In QSCR, the hello packet related to 𝑈𝑖 is marked as and intra-cluster communications. 𝐻𝑖. The content of 𝐻𝑖 and its propagation period are two important 5 M. Hosseinzadeh et al. Journal of King Saud University - Computer and Information Sciences 36 (2024) 101894 Fig. 2. Schematic design of QSCR. points in the neighbor discovery process. The content of 𝐻𝑖 specifies When 𝑈𝑖 gets 𝐻𝑗 from one of the adjacent UAVs, such as 𝑈𝑗 , then that 𝑈𝑖 wants to access what information from its neighboring nodes, 𝑈𝑖 examines the following conditions for storing the content of and the propagation period determines what time interval is suitable 𝐻𝑗 in its neighbor table (𝑇 𝑎𝑏𝑙𝑒𝑁𝑒𝑖𝑔ℎ𝑏𝑜𝑟 ). 𝑖 for broadcasting these packets on the network regularly. Algorithm 1 provides the pseudo-code related to this process. – If 𝑈𝑖 has information about 𝑈𝑗 in 𝑇 𝑎𝑏𝑙𝑒𝑁𝑒𝑖𝑔ℎ𝑏𝑜𝑟 𝑖 , then 𝑈𝑖 ( ) examines the new content of 𝐻𝑗 and updates the entry Content: 𝐻𝑖 includes 𝐼𝐷𝑖 , spatial coordinates 𝐿𝑜𝑐𝑖 = 𝑥𝑡𝑖 , 𝑦𝑡𝑖 , 𝑧𝑡𝑖 , related to 𝑈𝑗 in 𝑇 𝑎𝑏𝑙𝑒𝑁𝑒𝑖𝑔ℎ𝑏𝑜𝑟. ( 𝑡 𝑡 𝑡) 𝑖 speed and movement angle 𝑀𝑜𝑏𝑖 = 𝑉𝑖 , 𝜃𝑖 , 𝜑𝑖 , identifier of – If 𝑈𝑖 has no information about 𝑈𝑗 in 𝑇 𝑎𝑏𝑙𝑒𝑁𝑒𝑖𝑔ℎ𝑏𝑜𝑟 , then 𝑈𝑖 𝑖 cluster leader (𝐼𝐷𝐿𝑈𝑘 ), and the propagation period of 𝐻𝑖 (𝐻𝑇𝑖 ). extracts the content of 𝐻𝑗 and stores it in a new entry added The structure of 𝐻𝑖 is stated in Eq. (2). Note that if 𝑈𝑖 is a cluster to 𝑇 𝑎𝑏𝑙𝑒𝑁𝑒𝑖𝑔ℎ𝑏𝑜𝑟 𝑖. leader, it inserts its ID into the field 𝐼𝐷𝐿𝑈𝑘. Otherwise, it inserts – If 𝑈𝑖 has information about 𝑈𝑗 in 𝑇 𝑎𝑏𝑙𝑒𝑁𝑒𝑖𝑔ℎ𝑏𝑜𝑟 𝑖 but it does the identifier of its cluster leader in this field. ⟨ ⟩ not get any new 𝐻𝑗 from 𝑈𝑗 , then 𝑈𝑖 deletes 𝑈𝑗 from its ‖ ‖ ‖ ‖ 𝐻𝑖 = 𝐼𝐷𝑖 ‖𝐿𝑜𝑐𝑖 ‖𝑀𝑜𝑏𝑖 ‖𝐼𝐷𝐿𝑈𝑘 ‖𝐻𝑇𝑖 (2) neighbor list by removing its information from 𝑇 𝑎𝑏𝑙𝑒𝑁𝑒𝑖𝑔ℎ𝑏𝑜𝑟. ‖ ‖ ‖ ‖ 𝑖 6 M. Hosseinzadeh et al. Journal of King Saud University - Computer and Information Sciences 36 (2024) 101894 Table 2 of the speed vector on the plane 𝑋𝑌 and the positive axis 𝑋, and Symbols used in QSCR. the angle between the velocity vector and the positive axis 𝑍. Symbol Description 𝑈𝑖 Flying node 𝑖 in the network Then, 𝐿𝑈𝑘 calculates the average similarity of its velocity vector to 𝑛 Total number of flying nodes in the network cluster members’ velocity based on Eq. (7). 𝑟𝑖 Communication range of 𝑈𝑖 𝑁𝑘 𝑁𝑖 Number of neighbors of 𝑈𝑖 in 𝑇 𝑎𝑏𝑙𝑒𝑁𝑒𝑖𝑔ℎ𝑏𝑜𝑟 𝑖 1 ∑ 𝑡 𝐶𝑘 Cluster 𝑘 𝑆𝐶𝑘𝑡 = 𝑆𝐶𝑘𝑗 (7) 𝑁𝑘 𝑗=1 𝑚 Total number of clusters in the network 𝐿𝑈𝑘 Cluster leader in 𝐶𝑘 So that 𝑁𝑘 indicates the number of cluster members in 𝐶𝑘. 𝑀𝑈𝑗𝑘 𝑗th cluster member in 𝐶𝑘 Then, 𝐿𝑈𝑘 must obtain 𝐻𝑇𝑘 from Eq. (8). 𝑁𝑘 Total number of cluster members in 𝐶𝑘 0 𝐺𝑈𝑔𝑘 𝑔th inter-cluster gateways in 𝐶𝑘 ⎧ 𝐻𝑇𝑘 , 𝑡=0 𝐻𝑖 Hello packet related to 𝑈𝑖 𝑡 ⎪ ⎛ 𝑆𝐶 𝑡 −𝑆𝐶 𝑡−1 ⎞ 𝐻𝑇𝑘 = ⎨ ⎜ 𝑘 𝑘 ⎟ (8) 𝐻𝑇𝑖 Propagation time interval corresponding to 𝐻𝑖 ⎪ ⎜ 𝐻𝑇 𝑡−1 ⎟ 𝐼𝐷𝑖 Identifier of 𝑈𝑖 ⎩𝐻𝑇𝑘𝑡−1 𝑒⎝ 𝑘 ⎠, 𝑒𝑙𝑠𝑒 ( ) 𝐿𝑜𝑐𝑖 = 𝑥𝑡𝑖 , 𝑦𝑡𝑖 , 𝑧𝑡𝑖 Spatial coordinates of 𝑈𝑖 ( 𝑡 𝑡 𝑡) So that, 𝑀𝑜𝑏𝑖 = 𝑉𝑖 , 𝜃𝑖 , 𝜑𝑖 Speed and movement angle of 𝑈𝑖 ( 𝑡 ) 𝑉 −𝑉 𝑇 𝑎𝑏𝑙𝑒𝑁𝑒𝑖𝑔ℎ𝑏𝑜𝑟 𝑖 Neighbor table stored in 𝑈𝑖 − 𝐿𝑈𝑘 min 𝑉max −𝑉min 𝑉max Upper boundary of UAVs’ speed in the network 𝐻𝑇𝑘0 = 𝐻𝑇 𝐷𝑒𝑓 𝑎𝑢𝑙𝑡 𝑒 (9) 𝑉min Lower boundary of UAVs’ speed in the network 𝑚𝑡𝑖 Merit of 𝑈𝑖 Here, 𝐻𝑇 𝐷𝑒𝑓 𝑎𝑢𝑙𝑡 represents a default value, for example, one second, 𝐸𝑖 Residual energy of 𝑈𝑖 for broadcasting hello messages in the network. 𝑉max and 𝑉min indicate 𝐸𝑖𝑛𝑖𝑡𝑖𝑎𝑙 Initial energy capacity of UAVs in the network 𝐶𝑁𝑅𝑖 Centrality of 𝑈𝑖 the upper and lower boundaries related to the speed of UAVs in the 𝑡 network. 𝑉𝐿𝑈 is the velocity length of 𝐿𝑈𝑘. 𝐷max Maximum distance between UAVs in the network 𝑘 𝐷𝑒𝑔𝑖 Neighbor degree of 𝑈𝑖 Algorithm 1 Dynamic neighbor discovery 𝑆𝐶𝑖𝑡 Average similarity between the velocity vector of 𝑈𝑖 and those of its neighbors Input: 𝑈𝑖 : Flying node 𝑖 so that 𝑖 = 1, 2,..., 𝑛 𝐿𝑇𝑖𝑗 Validity time of link between 𝑈𝑖 and 𝑈𝑗 𝑁𝑖 : Number of neighbors of 𝑈𝑖 𝑖 𝑀𝑚𝑒𝑟𝑖𝑡 Merit message of 𝑈𝑖 𝐻𝑖 : Hello message related to 𝑈𝑖 𝐼𝐷𝑖 : Identifier of 𝑈𝑖 𝑖 𝑀𝐿𝑈 Leader message sent by 𝐿𝑈𝑘 ( ) 𝐿𝑜𝑐𝑖 = 𝑥𝑡𝑖 , 𝑦𝑡𝑖 , 𝑧𝑡𝑖 : Location information of 𝑈𝑖 𝑘 ( ) 𝑖 𝑀𝑀𝑈 𝑘 Connection message from 𝑀𝑈𝑗𝑘 to 𝐿𝑈𝑘 𝑀𝑜𝑏𝑖 = 𝑉𝑖𝑡 , 𝜃𝑖𝑡 , 𝜑𝑡𝑖 : Mobility information of 𝑈𝑖 𝑗 𝐼𝐷𝐿𝑈 : Identifier of the leader cluster corresponding to 𝑈𝑖 𝐴𝑐𝑐𝑒𝑝𝑡 𝑘 𝑀𝐿𝑈 Accept message send by 𝐿𝑈𝑘 𝑁𝑘 : Number of members of the cluster 𝑘 𝑘 𝐺𝑎𝑡𝑒𝑤𝑎𝑦 𝑀𝐺𝑈 𝑘 Gateway message from 𝐺𝑈𝑔𝑘 to 𝐿𝑈𝑘 𝐻𝑇𝑖 : Hello dissemination period 𝑔 𝑡𝑁𝑒𝑡 : A timer for measuring the network time. 𝑅𝑡 (𝐹 ) Reward function Output: 𝑇 𝑎𝑏𝑙𝑒𝑖 𝑁𝑒𝑖𝑔ℎ𝑏𝑜𝑟 : Neighboring table stored in 𝑈𝑖 𝛼 Learning rate Begin 𝛾 Discount factor 1: repeat 𝐶𝑘𝑐ℎ𝑎𝑛𝑔𝑒 Stability value of 𝐶𝑘 2: if 𝑡𝑁𝑒𝑡 mod 𝐻𝑇𝑖 = 0 then 𝑁𝐶 𝑐ℎ𝑎𝑛𝑔𝑒 Number of role changes in 𝐶𝑘 3: 𝐔𝐢 : Obtain 𝐿𝑜𝑐𝑖 and 𝑀𝑜𝑏𝑖 from a positioning system (GPS); 𝑘 4: if 𝑈𝑖 plays the role of 𝐿𝑈𝑘 then 5: for 𝑗 = 1 to 𝑁𝑘 do 6: 𝐔𝐢 : Obtain the cosine similarity (𝑆𝐶𝑘𝑗 𝑡 ) between the velocity vectors of 𝑈 𝑖 and 𝑀𝑈𝑗𝑘 from Equation (3); Propagation time interval (𝐇𝐓𝐤 ): In QSCR, the cluster leader 7: end for 𝐿𝑈𝑘 is responsible for specifying the hello propagation time 8: 𝐔𝐢 : Calculate the average cosine similarity (𝑆𝐶𝑘𝑡 ) based on Equation (7); interval (𝐻𝑇𝑘 ) in its cluster (𝐶𝑘 ) and updating it at each step. 9: 𝐔𝐢 : Refresh 𝐻𝑇𝑖 based on Equation (8); Then, 𝐿𝑈𝑘 must notify 𝐻𝑇𝑘 to the cluster member UAVs (𝑀𝑈𝑗𝑘 10: 𝐔𝐢 : Send 𝐻𝑇𝑖 to all 𝑀𝑈𝑗𝑘 in its cluster; 11: else so that 𝑗 = 1, 2, … , 𝑁𝑘 ), and 𝑀𝑈𝑗𝑘 must adhere to this propagation 12: 𝐔𝐢 : Send a request message to 𝐿𝑈𝑘 to obtain updated 𝐻𝑇𝑖 ; time interval. To determine 𝐻𝑇𝑘 in 𝐶𝑘 , 𝐿𝑈𝑘 calculates the cosine 13: end if similarity between two velocity vectors related to 𝐿𝑈𝑘 and 𝑀𝑈𝑗𝑘 14: 𝐔𝐢 : Insert 𝐼𝐷𝑖 , 𝐿𝑜𝑐𝑖 , 𝑀𝑜𝑏𝑖 , 𝐼𝐷𝐿𝑈 , and 𝐻𝑇𝑖 into 𝐻𝑖 based on Equation (2); 𝑘 based on Eq. (3). 15: 𝐔𝐢 : Broadcast 𝐻𝑖 to the neighboring nodes; 16: end if 𝑥 𝑉𝐿𝑈 𝑦 𝑉 𝑥 𝑘 + 𝑉𝐿𝑈 𝑉 𝑦 𝑘 + 𝑉𝐿𝑈 𝑧 𝑉𝑧 𝑘 17: for 𝑗 = 1 to 𝑁𝑖 do 𝑘 𝑀𝑈𝑗 𝑘 𝑀𝑈𝑗 𝑘 𝑀𝑈𝑗 𝑁𝑒𝑖𝑔ℎ𝑏𝑜𝑟 𝑡 18: if 𝑈𝑖 has information about 𝑈𝑗 in 𝑇 𝑎𝑏𝑙𝑒𝑖 and 𝑈𝑗 sends a new 𝐻𝑗 to 𝑈𝑖 𝑆𝐶𝑘𝑗 = √ √( ) ( )2 ( )2 ( )2 ( )2 ( )2 2 then 𝑥 𝑦 𝑧 𝑥 𝑦 𝑧 19: 𝐔𝐢 : Refresh the entry related to 𝑈𝑗 based on this 𝐻𝑗 ; 𝑉𝐿𝑈 + 𝑉𝐿𝑈 + 𝑉𝐿𝑈 × 𝑉𝑀𝑈 𝑘 + 𝑉𝑀𝑈 𝑘 + 𝑉𝑀𝑈 𝑘 𝑘 𝑘 𝑘 𝑗 𝑗 𝑗 𝑁𝑒𝑖𝑔ℎ𝑏𝑜𝑟 20: else if 𝑈𝑖 has information about 𝑈𝑗 in 𝑇 𝑎𝑏𝑙𝑒𝑖 and 𝑈𝑗 does not send any new 𝐻𝑗 to 𝑈𝑖 then (3) ( ) 21: 𝐔𝐢 : Remove the entry related to 𝑈𝑗 from 𝑇 𝑎𝑏𝑙𝑒𝑖 𝑁𝑒𝑖𝑔ℎ𝑏𝑜𝑟 ; 𝑥 ,𝑉 𝑦 ,𝑉 𝑧 So that 𝑉𝐿𝑈 is the velocity vector of 𝐿𝑈𝑘 calculated 𝑁𝑒𝑖𝑔ℎ𝑏𝑜𝑟 22: else if 𝑈𝑖 has not any information about 𝑈𝑗 in 𝑇 𝑎𝑏𝑙𝑒𝑖 and 𝑈𝑗 sends a 𝑘 𝐿𝑈𝑘 𝐿𝑈𝑘 ( ) new 𝐻𝑗 to 𝑈𝑖 then through Eqs. (4), (5), and (6). 𝑉 𝑥 𝑘 , 𝑉 𝑦 𝑘 , 𝑉 𝑧 𝑘 is also the 23: 𝐔𝐢 : Add a new entry related to 𝑈𝑗 to 𝑇 𝑎𝑏𝑙𝑒𝑖 𝑁𝑒𝑖𝑔ℎ𝑏𝑜𝑟 ; 𝑀𝑈𝑗 𝑀𝑈𝑗 𝑀𝑈𝑗 𝑁𝑒𝑖𝑔ℎ𝑏𝑜𝑟 24: 𝐔𝐢 : Insert the information of 𝑈𝑗 into 𝑇 𝑎𝑏𝑙𝑒𝑖 based on this new 𝐻𝑗 ; velocity vector of 𝑀𝑈𝑗𝑘 that is obtained using a similar technique. 25: end if 26: end for 𝑥 𝑡 27: until Simulation time is finished 𝑉𝐿𝑈 = 𝑉𝐿𝑈 sin 𝜑𝑡𝐿𝑈 cos 𝜃𝐿𝑈 𝑡 (4) End 𝑘 𝑘 𝑘 𝑘 𝑦 𝑡 𝑉𝐿𝑈 = 𝑉𝐿𝑈 sin 𝜑𝑡𝐿𝑈 𝑡 sin 𝜃𝐿𝑈 (5) 𝑘 𝑘 𝑘 𝑘 𝑧 𝑡 5.2. Adaptive clustering process 𝑉𝐿𝑈 = 𝑉𝐿𝑈 cos 𝜑𝑡𝐿𝑈 (6) 𝑘 𝑘 𝑘 where 𝑉𝐿𝑈𝑡 , 𝜃𝑡 , and 𝜑𝑡𝐿𝑈 indicate the speed information of In this section, QSCR proposes an adaptive clustering process for 𝑘 𝐿𝑈𝑘 𝑘 𝐿𝑈𝑘 , namely the velocity length, the angle between the projection flying ad hoc networks to categorize 𝑛 flying nodes into 𝑚 groups. Each 7 M. Hosseinzadeh et al. Journal of King Saud University - Computer and Information Sciences 36 (2024) 101894 energy value to [0, 1]. This leads to the same effect of merit parameters with different units on 𝑚𝑡𝑖. 𝐸𝑖𝑅𝑒𝑠𝑖𝑑𝑢𝑎𝑙 𝐸𝑖 = (10) 𝐸𝑖𝑛𝑖𝑡𝑖𝑎𝑙 where 𝐸𝑖𝑅𝑒𝑠𝑖𝑑𝑢𝑎𝑙 and 𝐸𝑖𝑛𝑖𝑡𝑖𝑎𝑙 present the current and initial energies of 𝑈𝑖 , respectively. Centrality (𝐂𝐍𝐑𝐢 ): It is very important to consider the centrality parameter when determining the merit value of 𝑈𝑖 for playing the role of 𝐿𝑈𝑘. If 𝑈𝑖 is close to the center of the cluster, the distance between it and cluster members is short. As a result, 𝑈𝑖 requires less energy to form intra-cluster communication with 𝑀𝑈𝑗𝑘. Eq. (11) calculates 𝐶𝑁𝑅𝑖 that indicates the average dis- tance between 𝑈𝑖 and its neighbors, such as 𝑈𝑗 , in 𝑇 𝑎𝑏𝑙𝑒𝑁𝑒𝑖𝑔ℎ𝑏𝑜𝑟 𝑖. √ 𝑁𝑖 ( )2 ( )2 ( )2 1 ∑ 𝐶𝑁𝑅𝑖 = 𝑥𝑡𝑖 − 𝑥𝑡𝑗 + 𝑦𝑡𝑖 − 𝑦𝑡𝑗 + 𝑧𝑡𝑖 − 𝑧𝑡𝑗 (11) 𝑁𝑖 𝑗=1 ( ) ( ) where 𝐿𝑜𝑐𝑖 = 𝑥𝑡𝑖 , 𝑦𝑡𝑖 , 𝑧𝑡𝑖 and 𝐿𝑜𝑐𝑗 = 𝑥𝑡𝑗 , 𝑦𝑡𝑗 , 𝑧𝑡𝑗 shows the location information of 𝑈𝑖 and 𝑈𝑗 in the current time 𝑡, respectively. Furthermore, 𝑁𝑖 indicates total number of adjacent UAVs around 𝑈𝑖 in 𝑇 𝑎𝑏𝑙𝑒𝑁𝑒𝑖𝑔ℎ𝑏𝑜𝑟 𝑖. As mentioned above, QSCR carries out nor- malization operation for limiting 𝐶𝑁𝑅𝑖 to the interval [0, 1]. This Fig. 3. Hello propagation operation between neighboring UAVs. causes the same effect of merit parameters with different units on 𝑚𝑡𝑖. 𝐶𝑁𝑅𝑖 𝐶𝑁𝑅𝑛𝑜𝑟𝑚 𝑖 = (12) 𝐷max group is called a cluster 𝐶𝑘 so that 𝑘 = 1, 2, … , 𝑚. 𝐶𝑘 includes a cluster leader (𝐿𝑈𝑘 ) and a number of cluster members (𝑀𝑈𝑗𝑘 ) so that 𝑗 = So that 𝐷max indicates the longest distance between UAVs in the 1, 2, … , 𝑁𝑘. Note that 𝑁𝑘 indicates the total number of cluster members network. It is determined by Eq. (13) and is dependent on the in 𝐶𝑘. In QSCR, the role of 𝐿𝑈𝑘 periodically rotates between UAVs so dimensions of the network. that energy consumption is uniformly distributed between all UAVs in √ 𝐷max = 𝑋 2 + 𝑌 2 + 𝑍 2 (13) the network. This leads to the improvement of network lifespan. 𝐿𝑈𝑘 is responsible for managing and monitoring its clusters. It provides As, 𝑋, 𝑌 , and 𝑍 represent the length, width, and height of the two types of communication, namely intra-cluster and inter-cluster desired network. communications, on the network. Furthermore, some cluster member Neighbor degree (𝐃𝐞𝐠𝐢 ): This parameter is very important for nodes close to the cluster border play the role of inter-cluster gateways determining the merit value of 𝑈𝑖 and accepting the role of 𝐿𝑈𝑘 (𝐺𝑈𝑔𝑘 ) to connect with the adjacent cluster. Algorithm 2 expresses the because if 𝑈𝑖 has many neighbors, it has high communication clustering process in QSCR. Furthermore, three following steps, namely capability. As a result, it can create stable and suitable communi- merit value, cluster formation, and cluster support, explain how to cation with its cluster members as well as other adjacent cluster select and support these nodes. leaders. However, if 𝑈𝑖 does not have a high neighbor degree, it may not be able to accept any node as its cluster member, and 5.2.1. Merit value consequently, an isolated cluster is formed in the network. 𝐷𝑒𝑔𝑖 In this step, each 𝑈𝑖 firstly finds its neighbors and establishes is calculated based on the number of neighbors in 𝑇 𝑎𝑏𝑙𝑒𝑁𝑒𝑖𝑔ℎ𝑏𝑜𝑟. 𝑇 𝑎𝑏𝑙𝑒𝑁𝑒𝑖𝑔ℎ𝑏𝑜𝑟 𝑖 in accordance with the dynamic neighbor discovery pro- 𝑖 Note that QSCR performs normalization operation to limit 𝐷𝑒𝑔𝑖 cess in Section 5.1. Then, 𝑈𝑖 must prove its merit (𝑚𝑡𝑖 ) to obtain the role to [0, 1]. of the cluster leader 𝐿𝑈𝑘 among other neighboring nodes. To get this purpose, 𝑈𝑖 calculates its merit with regard to five merit parameters, 𝑁𝑖 𝐷𝑒𝑔𝑖 = (14) namely remaining energy, centrality, neighbor degree, speed similarity, 𝑛−1 and link validity time. Where 𝑁𝑖 is the number of adjacent flying nodes around 𝑈𝑖 stored in 𝑇 𝑎𝑏𝑙𝑒𝑁𝑒𝑖𝑔ℎ𝑏𝑜𝑟 and 𝑛 represents the total number of flying nodes Remaining energy (𝐄𝐢 ): It is essential to pay attention to energy 𝑖 in the network. in determining the merit value of flying nodes to accept the role Speed similarity (𝐒𝐂𝐭𝐢 ): The importance of speed similarity to of cluster leader. If 𝑈𝑖 has little remaining energy, then it cannot the merit value is that if the velocities of 𝑈𝑖 and its neighbors play the role of 𝐿𝑈𝑘 well because the cluster leader has important and decisive tasks, namely cluster management, determination are similar, then the cluster 𝐶𝑘 is stable, and 𝑈𝑖 can play the of the hello propagation time interval, intra-cluster communi- role of 𝐿𝑈𝑘 for a long period. However, if the difference between cation, data aggregation, and inter-cluster communication. As a the speed of 𝑈𝑖 and its neighbors is high, cluster members may result, it deals with a lot of control and computational overhead. speedily leave the cluster area of 𝐶𝑘 , if 𝑈𝑖 is selected as 𝐿𝑈𝑘. Therefore, 𝐿𝑈𝑘 needs more energy than normal nodes. In order As a result, the need to rebuild the cluster will increase. This to balance energy consumption in the network and prevent the imposes a lot of communication and computational overhead. In death of low-energy UAVs in FANET, 𝑈𝑖 should consider its Section 5.1, Eq. (3) explains how to calculate 𝑆𝐶𝑖𝑡. The only point remaining energy for calculating its merit. 𝑈𝑖 knows about its that is important here is that 𝑆𝐶𝑖𝑡 has a value in [−1, 1]. Hence, energy level at any moment and normalizes it through Eq. (10). the normalization operation is performed on it using Eq. (15). In Note that QSCR carries out normalization operation to restrict this case, the normalized value is limited to [0, 1]. 8 M. Hosseinzadeh et al. Journal of King Saud University - Computer and Information Sciences 36 (2024) 101894 𝑆𝐶𝑖𝑡 + 1 𝑆𝐶𝑖𝑛𝑜𝑟𝑚 = (15) 2 Link validity time (𝐋𝐓𝐭𝐢 ): The importance of the link validity time in determining the merit value of 𝑈𝑖 is that a flying node with powerful and stable communication links can play the role of 𝐿𝑈𝑘 better, and this increases the stability of 𝐶𝑘. However, if there are weak connections between 𝐿𝑈𝑘 and its cluster members, these connections will be broken rapidly, and consequently, 𝐶𝑘 needs to be reconstructed. To calculate the link validity time, the distance between 𝑈𝑖 and 𝑈𝑗 in moment 𝐿𝑇𝑖𝑗 is obtained from Eq. (16). 2 2 ⎛ ⎞ ⎛ ⎞ ( )2 ⎜( ) ( ) ⎟ ⎜( ) ( ) ⎟ = ⎜ 𝑥𝑡𝑖 − 𝑥𝑡𝑗 + 𝑉𝑖𝑥 − 𝑉𝑗𝑥 𝐿𝑇𝑖𝑗 ⎟ + ⎜ 𝑦𝑡𝑖 − 𝑦𝑡𝑗 + 𝑉𝑖𝑦 − 𝑉𝑗𝑦 𝐿𝑇𝑖𝑗 ⎟ 𝐿𝑇𝑖𝑗 𝐷𝑖𝑗 ⎜ ⎟ ⎜ ⎟ ⎜⏟⏞⏞⏟⏞⏞⏟ ⏟⏞⏞⏞⏞⏟⏞⏞⏞⏞⏟ ⎟ ⎜⏟⏞⏞⏟⏞⏞⏟ ⏟⏞⏞⏞⏞⏟⏞⏞⏞⏞⏟ ⎟ ⎝ 𝛼1 𝛽1 ⎠ ⎝ 𝛼2 𝛽2 ⎠ 2 ⎛ ⎞ Fig. 4. Broadcasting merit messages for neighboring UAVs. ⎜( ) ( ) ⎟ + 𝑧𝑖 − 𝑧𝑗 + 𝑉𝑖 − 𝑉𝑗 𝐿𝑇𝑖𝑗 ⎟ ⎜ 𝑡 𝑡 𝑧 𝑧 (16) ⎜ ⎟ ⎜⏟⏞⏞⏟⏞⏞⏟ ⏟⏞⏞⏞⏞⏟⏞⏞⏞⏞⏟ ⎟ ⎝ 𝛼3 𝛽3 ⎠ ( ) So that 𝑉𝑖𝑥 , 𝑉𝑖𝑦 , 𝑉𝑖𝑧 is the velocity ∑ ( vector of)𝑈𝑖 obtained through coefficients are limited to [0, 1] so that 5𝑖=1 𝜔𝑖 = 1. In QSCR, GCS is Eqs. (17), (18), and (19). Also, 𝑉𝑗𝑥 , 𝑉𝑗𝑦 , 𝑉𝑗𝑧 shows the velocity responsible for adjusting these weight coefficients using a Q-learning vector of 𝑈𝑗 and is calculated in a similar manner. algorithm, which will be detailed in Section 5.3. 𝑉𝑖𝑥 = 𝑉𝑖𝑡 sin 𝜑𝑡𝑖 cos 𝜃𝑖𝑡 (17) 5.2.2. Cluster formation process Here, the cluster construction steps, including cluster leader selec- 𝑉𝑖𝑦 = 𝑉𝑖𝑡 sin 𝜑𝑡𝑖 sin 𝜃𝑖𝑡 (18) tion, connection to the cluster, and gateway selection are explained. Step (1) Cluster leader selection: To determine 𝐿𝑈𝑘 in 𝐶𝑘 , 𝑈𝑖 𝑉𝑖𝑧 = 𝑉𝑖𝑡 cos 𝜑𝑡𝑖 (19) calculates its merit value 𝑚𝑡𝑖 in accordance with Eq. (25) in Section 5.2.1 and then broadcasts this information through the The general form of Eq. (16) is stated below. 𝑖 merit message 𝑀𝑚𝑒𝑟𝑖𝑡 to surrounding nodes in the network. See ( 𝐿𝑇 )2 ∑ 3 ( )2 ∑ 3 ( )2 ∑ 3 ( )2 Fig. 4. 𝑀𝑚𝑒𝑟𝑖𝑡 includes 𝐼𝐷𝑖 , 𝑚𝑡𝑖 , and a timestamp 𝑇𝑚𝑒𝑟𝑖𝑡. Its structure 𝑖 𝐷𝑖𝑗 𝑖𝑗 = 𝛼𝑞 + 𝐿𝑇𝑖𝑗 2𝛼𝑞 𝛽𝑞 + 𝐿𝑇𝑖𝑗 𝛽𝑞 (20) is stated in Eq. (26). 𝑞=1 𝑞=1 𝑞=1 ⟨ ⟩ 𝑖 ‖ ‖ 𝑀𝑚𝑒𝑟𝑖𝑡 = 𝐼𝐷𝑖 ‖𝑚𝑡𝑖 ‖𝑇𝑚𝑒𝑟𝑖𝑡 (26) On the other hand, the distance between 𝑈𝑖 and 𝑈𝑗 in moment ‖ ‖ 𝐿𝑇 𝐿𝑇𝑖𝑗 equals their communication radius, i.e. 𝐷𝑖𝑗 𝑖𝑗 = 𝑟𝑖. Hence, it Now, 𝑈𝑖 waits for a certain time interval to get merit messages can be concluded that: from other UAVs. After receiving these messages, 𝑈𝑖 extracts the (( 3 ) ) merit values of UAVs from their merit messages and compares 𝑚𝑡𝑖 ∑ ( )2 ( )2 ∑ 3 ( )2 ∑ 3 ( )2 𝛼𝑞 − 𝑟𝑖 +𝐿𝑇𝑖𝑗 2𝛼𝑞 𝛽𝑞 + 𝐿𝑇𝑖𝑗 𝛽𝑞 = 0 (21) with these values. Here, there are two modes. 𝑞=1 𝑞=1 𝑞=1 – First mode: If 𝑈𝑖 has the greatest merit value among other Now, the Delta method is used to solve the above two-order neighboring nodes, it introduces itself as 𝐿𝑈𝑘 and transmits equation to calculate 𝐿𝑇𝑖𝑗 (Eq. (22)). a leader message 𝑀𝐿𝑈 𝑖 to its adjacent nodes. The structure 𝑘 √( )2 (∑ of 𝑀𝐿𝑈 is stated in Eq. (27) and contains 𝐼𝐷𝑖 , 𝑚𝑡𝑖 , the role 𝑖 ∑3 ∑ 3 3 ( )2 ) ((∑3 ( )2 ) ( )2 ) 𝑘 − 𝑞=1 2𝛼𝑞 𝛽𝑞 + 𝑞=1 2𝛼𝑞 𝛽𝑞 −4 𝑞=1 𝛽𝑞 𝑞=1 𝛼𝑞 − 𝑟𝑖 of 𝑈𝑖 (i.e. 𝐿𝑈𝑘 ), and a timestamp 𝑇𝐿𝑈𝑘. 𝐿𝑇𝑖𝑗 = ∑3 ( )2 ⟨ ⟩ 2 𝛽𝑞 𝑖 ‖ ‖ ‖ 𝑞=1 𝑀𝐿𝑈 = 𝐼𝐷𝑖 ‖𝑚𝑡𝑖 ‖𝐿𝑈𝑘 ‖𝑇𝐿𝑈𝑘 (27) 𝑘 ‖ ‖ ‖ (22) – Second mode: If the merit value of 𝑈𝑖 is less than other Then, 𝑈𝑖 obtains the average link time from Eq. (23). adjacent nodes, 𝑈𝑖 waits to get a leader message from other 𝑁𝑖 flying nodes and goes to Step 2. 1 ∑ 𝐿𝑇𝑖𝑡 = 𝐿𝑇 (23) 𝑁𝑖 𝑗=1 𝑖𝑗 Step (2) Connection to the cluster: In this step, 𝑈𝑖 waits to get a leader message from other nodes. Now, three modes can occur. so 𝑁𝑖 represents the number of surrounding flying nodes around This process is shown in Fig. 5. 𝑈𝑖 in 𝑇 𝑎𝑏𝑙𝑒𝑁𝑒𝑖𝑔ℎ𝑏𝑜𝑟 𝑖. Eq. (24) is also used to normalize 𝐿𝑇𝑖𝑡 and limit it to [0, 1]. – First mode: If 𝑈𝑖 gets only a leader message from 𝐿𝑈𝑘 , then 𝑈𝑖 acts as a cluster member 𝑀𝑈𝑗𝑘 and transfers a connection 𝐿𝑇𝑖𝑡 message 𝑀 𝑖 𝑘 to 𝐿𝑈𝑘. This message contains 𝐼𝐷𝑖 , 𝑀𝑈𝑗𝑘 𝐿𝑇𝑖𝑛𝑜𝑟𝑚 = (24) 𝑀𝑈𝑗 𝐿𝑇max (role of 𝑈𝑖 ), 𝐼𝐷𝐿𝑈𝑘 , and timestamp 𝑇𝑀𝑈 𝑘 , and its structure 𝑗 Finally, 𝑈𝑖 uses Eq. (25) to calculate 𝑚𝑡𝑖. is presented in Eq. (28). ( ) ⟨ ⟩ ‖ ‖ ‖ 𝑚𝑡𝑖 = 𝜔1 𝐸𝑖 + 𝜔2 1 − 𝐶𝑁𝑅𝑛𝑜𝑟𝑚 + 𝜔3 𝐷𝑒𝑔𝑖 + 𝜔4 𝑆𝐶𝑖𝑛𝑜𝑟𝑚 + 𝜔5 𝐿𝑇𝑖𝑛𝑜𝑟𝑚 (25) 𝑀 𝑖 𝑘 = 𝐼𝐷𝑖 ‖𝑀𝑈𝑗𝑘 ‖𝐼𝐷𝐿𝑈𝑘 ‖𝑇𝑀𝑈 𝑘 (28) 𝑖 𝑀𝑈𝑗 ‖ ‖ ‖ 𝑗 where 𝜔𝑖 (𝑖 = 1, 2, 3, 4, 5) represents the weight coefficient that de- When this message reaches 𝐿𝑈𝑘 , it sends an accept message 𝐴𝑐𝑐𝑒𝑝𝑡 termines the effect of the relevant parameter on 𝑚𝑡𝑖. These weight 𝑀𝐿𝑈 to 𝑈𝑖 and accepts it as 𝑀𝑈𝑗𝑘 in the cluster 𝐶𝑘. 𝑘 9 M. Hosseinzadeh et al. Journal of King Saud University - Computer and Information Sciences 36 (2024) 101894 Algorithm 2 Adaptive clustering Input: 𝑈𝑖 : Flying node 𝑖 so that 𝑖 = 1, 2,..., 𝑛 𝑁𝑒𝑖𝑔ℎ𝑏𝑜𝑟 𝑇 𝑎𝑏𝑙𝑒𝑖 : Neighboring table stored in 𝑈𝑖 𝑁𝑖 : Number of neighbors of 𝑈𝑖 𝐶𝑘 : Cluster 𝑘 so that 𝑘 = 1, 2,..., 𝑚 𝐿𝑈𝑘 : Leader of 𝐶𝑘 𝑀𝑈𝑗𝑘 : Cluster member UAV so that 𝑗 = 1, 2,..., 𝑁𝑘 𝐺𝑈𝑔𝑘 : Gateway UAV in 𝐶𝑘 𝑡𝑁𝑒𝑡 : A timer for measuring the network time. Output: 𝐶1 ,..., 𝐶𝑘 ,..., 𝐶𝑚 : Clusters in the network Begin 1: repeat 2: if 𝑡𝑁𝑒𝑡 mod 𝐶𝑙𝑢𝑠𝑡𝑒𝑟𝑖𝑛𝑔 𝑝𝑒𝑟𝑖𝑜𝑑 = 0 then 3: 𝐔𝐢 : Obtain the weight coefficients (𝜔𝑖 , where 𝑖 = 1, 2, 3, 4, 5) using the learning model designed in GCS; 4: 𝐔𝐢 : Calculate 𝑚𝑡𝑖 based on Equation (25); 5: 𝐔𝐢 : Insert 𝑚𝑡𝑖 into the message 𝑀𝑚𝑒𝑟𝑖𝑡 𝑖 based on Equation (26); 6: 𝐔𝐢 : Broadcast 𝑀𝑚𝑒𝑟𝑖𝑡 𝑖 to its neighboring nodes; 7: for 𝑗 = 1 to 𝑁𝑖 do 𝑗 8: 𝐔𝐢 : Wait to receive 𝑀𝑚𝑒𝑟𝑖𝑡 from 𝑈𝑗 ; 𝑗 9: 𝐔𝐢 : Extract 𝑚𝑡𝑗 from 𝑀𝑚𝑒𝑟𝑖𝑡 ; 10: if 𝑚𝑡𝑖 > 𝑚𝑡𝑗 then 11: 𝑚max = 𝑚𝑡𝑖 ; 12: else 13: 𝑚max = 𝑚𝑡𝑗 ; 14: end if 15: end for 16: if 𝑚max = 𝑚𝑡𝑖 then 17: 𝑖 𝐔𝐢 : Play the role of 𝐿𝑈𝑘 and adjust the message 𝑀𝐿𝑈 based on Equation 𝑘 (27); 18: 𝑖 𝐔𝐢 : Broadcast 𝑀𝐿𝑈 to its neighboring nodes; Fig. 5. Determining cluster members. 𝑘 19: if 𝐿𝑈𝑘 receives a message 𝑀 𝑖 𝑘 from a 𝑀𝑈𝑗𝑘 then 𝑀𝑈 𝑗 𝐴𝑐𝑐𝑒𝑝𝑡 20: 𝐋𝐔𝐤 : Set a message 𝑀𝐿𝑈 based on Equation (29); 𝑘 𝐴𝑐𝑐𝑒𝑝𝑡 21: 𝐋𝐔𝐤 : Send 𝑀𝐿𝑈 to the relevant 𝑀𝑈𝑗𝑘 ; 𝑘 22: end if This message contains the cluster leader ID, the cluster 23: else member ID, and timestamp 𝑇𝐴𝑐𝑐𝑒𝑝𝑡 , and its structure is stated 24: 𝑖 𝐔𝐢 : Wait to receive the messages 𝑀𝐿𝑈 from different 𝐿𝑈𝑘 nodes; 𝑘 in Eq. (29). 25: 𝑖 if 𝑈𝑖 receives only one 𝑀𝐿𝑈 from a 𝐿𝑈𝑘 node then 𝑘 ⟨ ⟩ 26: 𝐔𝐢 : Play the role of 𝑀𝑈𝑗𝑘 and adjust a message 𝑀 𝑖 based on Equation