Traffic Weaver: Semi-Synthetic Time-Varying Traffic Generator (PDF)
Document Details
Piotr Lechowicz, Aleksandra Knapińska, Adam Włodarczyk, Krzysztof Walkowiak
Summary
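This paper describes Traffic Weaver, a Python package for generating semi-synthetic time-varying traffic in telecommunication networks. Starting from an averaged time series, it recreates a finer-grained signal using recreation from average, integral matching, interpolation, smoothing, repetition, trends, and noise, so that the result, once averaged, closely matches the original signal. The generated data is intended to support the development and validation of traffic prediction models and network optimization algorithms. The paper primarily focuses on the methodology and applications of the software.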
Full Transcript
SoftwareX 28 (2024) 101946
Contents lists available at ScienceDirect
SoftwareX
journal homepage: www.elsevier.com/locate/softx

Original software publication

Traffic weaver: Semi-synthetic time-varying traffic generator based on averaged time series

Piotr Lechowicz a,b,∗, Aleksandra Knapińska a, Adam Włodarczyk a, Krzysztof Walkowiak a
a Department of Systems and Computer Networks, Wrocław University of Science and Technology, Wrocław, Poland
b Department of Electrical Engineering, Chalmers University of Technology, Gothenburg, Sweden

ARTICLE INFO

Keywords: Time varying traffic; Telecommunication network; Semi-synthetic traffic generator

ABSTRACT

Traffic Weaver is a Python package developed to generate a semi-synthetic signal (time series) with finer granularity, based on averaged time series, in a manner that, upon averaging, closely matches the original signal provided. The key components utilized to generate the signal encompass oversampling, recreating from average with a given strategy, stretching to match the integral of the original time series, interpolating, smoothing, repeating, applying trend, and adding noise. The primary motivation behind Traffic Weaver is to furnish semi-synthetic time-varying traffic in telecommunication networks, facilitating the development and validation of traffic prediction models, as well as aiding in the deployment of network optimization algorithms tailored for time-varying traffic.
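∗ Corresponding author at: Department of Systems and Computer Networks, Wrocław University of Science and Technology, Wrocław, Poland.
E-mail address: [email protected] (Piotr Lechowicz).
https://doi.org/10.1016/j.softx.2024.101946
Received 26 March 2024; Received in revised form 2 October 2024; Accepted 20 October 2024; Available online 30 October 2024
2352-7110/© 2024 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

Code metadata

Current code version: 1.5.0
Permanent link to code/repository used for this code version: https://github.com/ElsevierSoftwareX/SOFTX-D-24-00196
Permanent link to Reproducible Capsule: https://codeocean.com/capsule/8804531/tree/v3
Legal Code License: GNU AGPL
Code versioning system used: git
Software code languages, tools, and services used: Python
Compilation requirements, operating environments & dependencies: Python ≥ 3.9
Link to developer documentation/manual (if available): http://w4k2.github.io/traffic-weaver/
Support email for questions: [email protected]

1. Motivation and significance

In telecommunication networks, such as backbone optical networks, many small end-to-end transmissions between individual users and devices combine into time-varying traffic, representing aggregated traffic over time. Thus, daily and weekly patterns can be observed in network traffic due to increased user activity in certain periods. Driven by the paradigm of self-driving and self-healing networks, traffic prediction and anomaly detection have gained significant attention from the research community in recent years. However, the community faces a lack of real data that would allow for thorough experiments. Network operators are often constrained by legal aspects and cannot share the details of traffic generated by their customers. In turn, many researchers have access either to small exemplary data or to averaged data without sufficient quality. To this end, the community relies on artificially generated data with various distributions and patterns based on their domain knowledge (e.g., [1–6]). Exemplary packet generating tools are iPerf and Cisco TRex. TrafPy generates data-center network traffic. Several works present only the methodology of generating the traffic without supplying the software. A 5G network traffic generator and a self-similar traffic generator with heavy tails based on wavelet theory have been described. AI-based generators have been presented in [11,12]. However, predicting and detecting changes in real data can bring significantly more challenges than in artificially generated data. Additionally, extensive experiments performed on a large pool of appropriately diverse datasets are necessary for the development and thorough evaluation of the designed algorithms.

Therefore, some of the tools rely on captured real data statistics. One packet generator is based on on/off sources with various distributions. Swing is a tool capturing stationary empirical cumulative distributions of packet parameters from packet traces and generating traffic according to them. However, the available solutions do not recreate dynamic properties of the traffic over time.

The purpose of Traffic Weaver is to generate new time-varying data based on an already available sample of data, i.e., to create semi-synthetic data when the size of real data is insufficient or the time points at which the data were measured are too sparse. In consequence, it can generate larger and diverse datasets with similar traffic patterns based on the original traffic. In particular, the software has been used in scientific research to create semi-synthetic time-varying traffic to develop and evaluate traffic prediction models [16,17] and multilayer network optimization algorithms using generated time-varying connection requests (intents) [18,19]. Semi-synthetic data allowed a thorough evaluation of the developed algorithms in real-world settings and various desired characteristics.

The aim of Traffic Weaver is to read averaged time series and to create a semi-synthetic signal with finer granularity that, after averaging, matches the original signal provided. The following tools are applied to recreate the signal: recreating from average with a given strategy, stretching to match the integral of the original time series, interpolating, smoothing, repeating, applying trend, and adding noise. Software users may provide exemplary data for the investigated problem or use one of the available datasets to create semi-synthetic time-varying traffic by applying various trends and noise profiles. The increased input data size allows for a more thorough investigation of the problem. Moreover, the ability to create datasets with specific characteristics enables detailed testing of the developed algorithms in various conditions.

2. Software description

2.1. Software architecture

Fig. 1 presents an overview of the software architecture. Weaver, located in the weaver.py module, wraps the supplied signal (time series) data, and provides an interface for processing functionalities. The time series can be either specified by the user or obtained from the embedded example datasets. Individual functionalities provided by the Weaver are delegated to other modules, e.g., recreating from average functionality is located in the rfa.py module. However, it is possible to use individual functionalities from the corresponding modules without wrapping the time series into Weaver. Weaver allows retrieving the processed data either as sampled points or as a continuous spline function.

Fig. 1. Software architecture.

2.2. Software functionalities

This section describes the main functionalities provided by the Traffic Weaver. In the description below, the term interval refers to the distance between two sampled points in the input time series. The aim of the Weaver is to create an output time series with multiple points inserted in each interval.

Class Weaver(x, y)

Weaver is an interface for recreating the signal. It takes as input a time series provided as two lists containing values of independent and dependent variables. It delegates processing to other modules and allows retrieving the recreated signal either as lists of values of independent and dependent variables or as a spline, using the get() and to_function() methods, respectively.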
Recreating from average

Recreating from average is a recreation of a signal with finer sampling granularity based on the supplied strategy. The number of points created in each interval (between a pair of points in the original time series) is provided as a parameter. The strategy determines how the created time series transitions between points, i.e., how the new points are located. The software provides several strategies, namely, ExpAdaptiveRFA(), ExpFixedRFA(), LinearAdaptiveRFA(), LinearFixedRFA(), PiecewiseConstantRFA(), and CubicSplineRFA(). For example, ExpAdaptiveRFA() creates an adaptive transition window for each interval by combining linear and exponential functions. The size of the window is inversely proportional to the change of the function value on both edges of the interval, i.e., if the function value changes more on the right side of the interval than on the left side, the right-side transition window is smaller than the left one.

The Weaver class provides the recreate_from_average(n, rfa_class) method that delegates the execution to the rfa.py module and takes as input the number of samples n in each interval after oversampling, the recreating from average strategy rfa_class inheriting from the AbstractRFA() class, and a dictionary of parameters passed to the selected strategy.
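The idea of recreating a finer signal whose averages still match the input can be illustrated with a minimal pure-Python sketch. It is not one of the package's RFA strategies: the helper names recreate_linear_mean_preserving and average_blocks are hypothetical, and the linear tilt toward neighbouring intervals is an assumption chosen only because it keeps each interval's mean unchanged.

```python
def recreate_linear_mean_preserving(y, n):
    """Create n points per interval, tilted toward neighbouring intervals.

    Hypothetical helper for illustration; it is not one of the package's
    RFA strategies.
    """
    fine = []
    for i, value in enumerate(y):
        left = y[i - 1] if i > 0 else value
        right = y[i + 1] if i < len(y) - 1 else value
        slope = (right - left) / (2 * n)  # change per fine-grained sample
        for k in range(n):
            offset = k - (n - 1) / 2  # offsets are symmetric, so they sum to zero
            fine.append(value + slope * offset)
    return fine


def average_blocks(y, n):
    """Average consecutive blocks of n samples (used as the inverse check)."""
    return [sum(y[i:i + n]) / n for i in range(0, len(y) - n + 1, n)]


hourly = [10.0, 14.0, 9.0]          # averaged input, one value per interval
fine = recreate_linear_mean_preserving(hourly, 4)
restored = average_blocks(fine, 4)  # block averages of the finer signal
```

Because the offsets inside each interval sum to zero, averaging the finer series over blocks of n samples returns the input values, which is the property the recreation step is meant to keep.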
In consequence, to the change of the function value on both edges of the interval, i.e., if it can generate larger and diverse datasets with similar traffic patterns the function value has a higher change on the right side than on the left based on the original traffic. In particular, the software has been used side of the interval, the right side transition window is smaller than the in scientific research to create semi-synthetic time-varying traffic to left one. develop and evaluate traffic prediction models [16,17] and multi- The Weaver class provides the recreate_from_average(n, rfa_class) layer network optimization algorithms using generated time-varying method that delegates the execution to the rfa.py module and takes as connection requests (intents) [18,19]. Semi-synthetic data allowed a thorough evaluation of the developed algorithms in real-world settings an input number of samples n in each interval after oversampling, recre- and various desired characteristics. ating from average strategy rfa_class inheriting AbstractRFA() class, and The aim of Traffic Weaver is to read averaged time series and a dictionary of parameters passed to the selected strategy. to create a semi-synthetic signal with finer granularity that, after Integral matching averaging, matches the original signal provided. The following tools It aims to reshape the time series to match its integral to the integral are applied to recreate the signal: recreating from average with a of the reference function over the same domain (the original time given strategy, stretching to match the integral of the original time series). It does that by stretching the signal in intervals such that the series, interpolating, smoothing, repeating, applying trend, and adding integral in the interval of the current time series is equal to the integral noise. Software users may provide exemplary data for the investigated of the same interval in the reference function. 
Smoothing

It smooths a function using smoothing splines.

The Weaver class provides the smooth(s) method to delegate the execution to the smoothing function and takes s as an argument. The argument s is a smoothing condition that controls the tradeoff between closeness and smoothness of the fit. Larger s means more smoothing, while smaller values of s indicate less smoothing. If s is None, its "good" value is calculated based on the number of samples and standard deviation.

Repeating

It repeats the time series a given number of times, resulting in a long-term time series containing periodic, e.g., daily or weekly, patterns.

The Weaver class provides the repeat(n) method to repeat the time series. n is an argument passed to the function, defining how many times to repeat the time series.

Trending

It applies a trend to the time series according to the specified function. It allows adding a long-term trend to the time series, e.g., a constant increase of the dependent variable over time.

The Weaver class provides the trend(trend_func, normalized) method to apply a trend to the processed time series. The argument trend_func is a callable that shifts the value of the dependent variable based on the value of the independent variable. The callable takes one argument, the value of the independent variable, and has to return the shift value for the dependent variable. The argument normalized is a boolean determining if the trend function is normalized to the range of [0, 1].
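Repeating a period and shifting it with a trend callable can be sketched as follows. The helpers repeat_series and apply_trend are hypothetical, and the handling of normalized is an assumed interpretation: here it means the independent variable is rescaled to [0, 1] before being passed to the callable.

```python
import math


def repeat_series(x, y, times):
    """Tile one period `times` times, shifting x by the period length each time.

    Assumes equally spaced x with at least two points.
    """
    step = x[1] - x[0]
    period = x[-1] - x[0] + step
    new_x = [xi + r * period for r in range(times) for xi in x]
    new_y = [yi for _ in range(times) for yi in y]
    return new_x, new_y


def apply_trend(x, y, trend_func, normalized=True):
    """Add trend_func(t) to every y value.

    Assumption: with normalized=True, t is x rescaled to the range [0, 1];
    otherwise t is the raw x value.
    """
    x0, span = x[0], x[-1] - x[0]

    def argument(xi):
        return (xi - x0) / span if normalized and span else xi

    return [yi + trend_func(argument(xi)) for xi, yi in zip(x, y)]


day_x = [0, 1, 2, 3]
day_y = [5.0, 8.0, 6.0, 4.0]
week_x, week_y = repeat_series(day_x, day_y, 7)

# linear growth plus one sinusoidal cycle over the whole repeated series
trended = apply_trend(week_x, week_y, lambda t: 2.0 * t + math.sin(2 * math.pi * t))
```

Passing the trend as a callable keeps the shape of the long-term change (linear, sinusoidal, or a sum of both) entirely in user code.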
Noising

It applies a constant or time-varying Gaussian noise to the time series, expressed as a signal-to-noise ratio.

The Weaver class provides the noise(snr) method to apply noise to the signal. The argument snr defines the signal-to-noise ratio of a function either as a scalar value or as a list of values changing over time whose size matches the size of the independent variable.
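One possible reading of noise expressed as a signal-to-noise ratio is sketched below. Several points are assumptions that the text does not confirm: the ratio is taken in decibels, the signal power is the mean of the squared samples, and add_gaussian_noise is a hypothetical helper rather than the package's function.

```python
import math
import random


def add_gaussian_noise(y, snr_db, seed=None):
    """Add zero-mean Gaussian noise scaled to a signal-to-noise ratio.

    Assumptions (not confirmed by the source): snr_db is in decibels and
    the signal power is the mean of the squared samples.
    snr_db may be a scalar or a list with one value per sample.
    """
    rng = random.Random(seed)
    snr_values = list(snr_db) if isinstance(snr_db, (list, tuple)) else [snr_db] * len(y)
    if len(snr_values) != len(y):
        raise ValueError("snr list must have one value per sample")
    signal_power = sum(v * v for v in y) / len(y)
    noisy = []
    for value, snr in zip(y, snr_values):
        noise_std = math.sqrt(signal_power / 10 ** (snr / 10))
        noisy.append(value + rng.gauss(0.0, noise_std))
    return noisy


signal = [10.0, 12.0, 11.0, 9.0]
# a decreasing ratio means the noise grows over time
increasing_noise = add_gaussian_noise(signal, [40, 35, 30, 25], seed=42)
```

A list of ratios that decreases over time gives noise that starts small and grows, which corresponds to the changing-over-time case described above.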
Interpolating

It applies an interpolation of the time series using specified points.

The Weaver class provides the interpolate(n, new_x, method) method to interpolate the time series. The argument n is the number of equally spaced samples in the new interpolated function. new_x is a list of points at which to evaluate the interpolated function; it overrides the n parameter, and its range should be the same as the original function domain. Interpolation is done according to the method parameter. Supported strategies are linear, constant, cubic and spline.

Truncating

It truncates a time series to a specified range. If the specified points are not present in the time series, the closest points are selected such that the specified range is included.

The Weaver class provides the truncate(x_left, x_right, x_left_as_ratio, x_right_as_ratio) method to truncate the time series. Arguments x_left and x_right are values in the independent variable array to which the content is truncated from the left and right side, respectively. Arguments x_left_as_ratio and x_right_as_ratio are booleans that determine if x_left and x_right are treated as ratios of the independent variable range to truncate from the left and right, respectively.

Normalizing

It normalizes the independent and dependent variables to the specified range.

The Weaver class provides the normalize_x(min_val, max_val) and normalize_y(min_val, max_val) methods to normalize the independent and dependent variable, respectively. Arguments min_val and max_val are the minimum and maximum values for normalization.
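Truncation to an enclosing range and min-max normalization can be sketched in a few lines. The helpers truncate and normalize below are hypothetical stand-ins that follow the behaviour described in the text; they assume x is sorted in ascending order and do not cover the ratio-based arguments.

```python
def truncate(x, y, x_left, x_right):
    """Keep the smallest sampled range that still contains [x_left, x_right].

    Assumes x is sorted ascending. If a bound is not a sampled point, the
    closest point outside the requested range is kept so the range is included.
    """
    start = max((i for i, xi in enumerate(x) if xi <= x_left), default=0)
    end = min((i for i, xi in enumerate(x) if xi >= x_right), default=len(x) - 1)
    return x[start:end + 1], y[start:end + 1]


def normalize(values, min_val=0.0, max_val=1.0):
    """Min-max rescaling of values to the range [min_val, max_val]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [min_val for _ in values]  # flat input maps to the lower bound
    scale = (max_val - min_val) / (hi - lo)
    return [min_val + (v - lo) * scale for v in values]


x = [0, 1, 2, 3, 4, 5]
y = [3.0, 5.0, 4.0, 6.0, 2.0, 1.0]
cut_x, cut_y = truncate(x, y, 1.5, 3.2)   # neither bound is a sampled point
scaled = normalize([2.0, 4.0, 6.0], 0.0, 1.0)
```

Here the requested bounds 1.5 and 3.2 fall between samples, so the kept range widens to the sampled points 1 and 4.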
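Datasets

The datasets module provides collected network traffic datasets. Network operators often collect data about traffic generated by their customers. However, due to legal aspects, exact values are not shared with the public. Still, the community can access averaged or summary data presented in the form of plots. This module provides a set of datasets recreated from graphical plots, which can be further resampled and regenerated using Traffic Weaver.

3. Illustrative examples

Fig. 2 shows a general usage example. Based on the provided original averaged time series (a), the signal is n-times oversampled and recreated from average values with a predefined strategy (b). Next, it is stretched to match the integral of the input time series function (c). Further, it is smoothed with a spline function (d). In order to create weekly semi-synthetic data, the signal is repeated seven times (e), and a long-term trend consisting of sinusoidal and linear functions is applied (f). Finally, noise is introduced to the signal, starting from small values and increasing over time (g). To validate the correctness of the applied processing, (h) presents the averaged two periods of the created signal, showing that they closely match the original signal (except the applied trend).

Fig. 2. Regenerating time-varying traffic from the averaged traffic sample: original traffic (a); recreated from average (b); matched with the integral of the original signal (c); smoothed (d); repeated 7 times (e); trended with sinusoidal and linear function (f); noised (g); averaged (h).

3.1. Minimal processing example

Traffic Weaver is an open-source Python module released under MIT license and versioned in the public Python Package Index (PyPI) repository. It can be installed using the pip package manager.

    pip install traffic-weaver

Traffic Weaver import is done with the standard import command.

    import traffic_weaver

To load one of the exemplary datasets, use the load_dataset function specifying the name of the dataset.

    # load example dataset with average measurements over 1h
    data = traffic_weaver.datasets.load_dataset('sandvine_tiktok')

The traffic_weaver module provides the Weaver class that serves as an API to other processing capabilities. The Weaver(x, y) constructor takes the time series independent and dependent variables as arguments, denoted as x and y, respectively. Alternatively, the Weaver.from_2d_array(data) factory method takes a 2-D array as argument containing dependent and independent variables.

    # create Weaver instance
    wv = traffic_weaver.Weaver.from_2d_array(data)

Further signal processing is applied through the Weaver methods. Most of the methods return the Weaver instance itself, allowing for chaining the processing commands.

    # process it creating samples every minute
    wv.recreate_from_average(60).integral_match().smooth(1.0).noise(snr=30)

To obtain the created new time series, call either Weaver's get() or to_function() methods. Next, visualize the time series with matplotlib: the original data, the created semi-synthetic traffic, and the averaged semi-synthetic traffic, to verify that the integral of the semi-synthetic traffic does not differ much from the one in the original signal. The result of the listing below is presented in Fig. 3.

    import matplotlib.pyplot as plt

    fig, axes = plt.subplots(nrows=1, ncols=3, figsize=(14, 4))

    # plot original signal
    axes[0].plot(*wv.get_original(), drawstyle="steps-post")

    # plot modified signal
    axes[1].plot(*wv.get())

    # plot averaged signal
    x, y = traffic_weaver.process.average(*wv.get(), 60)
    axes[2].plot(x, y, drawstyle="steps-post")

    axes[0].set_title("a) Original", loc="left")
    axes[1].set_title("b) Processed", loc="left")
    axes[2].set_title("c) Averaged", loc="left")
    plt.show()

Fig. 3. Minimal processing example of recreating time-varying traffic: original traffic (a); processed by recreating from average, matching integral with the original signal, smoothing and noising (b); averaged (c).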
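Besides the visual comparison, the agreement between the original averages and the averaged semi-synthetic signal can be checked numerically. The sketch below is a minimal pure-Python illustration; max_relative_deviation is a hypothetical helper, not part of the package, and the sample values are invented for the example.

```python
def max_relative_deviation(original, fine, n):
    """Largest relative gap between original averages and block averages of `fine`.

    `fine` holds n samples per original interval; intervals whose original
    value is zero are skipped to avoid division by zero.
    """
    averaged = [sum(fine[i:i + n]) / n for i in range(0, len(original) * n, n)]
    return max(
        (abs(a - o) / abs(o) for a, o in zip(averaged, original) if o != 0),
        default=0.0,
    )


original = [10.0, 14.0, 9.0]               # invented averaged input
fine = [9.0, 11.0, 13.0, 14.3, 9.5, 8.5]   # invented finer signal, two samples per interval
deviation = max_relative_deviation(original, fine, 2)
```

A small value indicates that the processing steps preserved the averages of the input; a threshold for acceptable deviation would depend on the amount of noise and trend applied.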
4. Impact

The networking community lacks a public data repository for research purposes and the development of new optimization methods based on network traffic. Existing analyses of real traffic data (e.g., [21–23]), collected by the authors over long periods, usually stop at the data characterization stage and are not further used in networking research, nor are they easily accessible. Traffic Weaver closes this gap, allowing easy access to data and enabling a thorough evaluation of the developed algorithms. Using various options provided in Traffic Weaver, the created methods can be tested in diverse traffic conditions representing actual traffic patterns. In turn, the package allows fair and versatile algorithm development, evaluation, and comparison with the existing solutions. It also helps in gaining insights into the operation of various methods in specific traffic conditions considering parameters such as noise levels, trends, and traffic types. These parameters are impossible to steer using the available sparse raw data. Moreover, Traffic Weaver is implemented in Python, which is the primary programming language used for the development of machine learning methods.

The proposed tool does not contain an embedded traffic generator based on analytical formulas. Therefore, it requires as input either user-provided traffic or one of the supplied datasets. Moreover, signal recreation from average with user-specified parameters for probability distributions is not possible.

5. Conclusions

This article presents Traffic Weaver, a semi-synthetic time-varying traffic generator. The software creates new datasets based on either existing examples of real data supplied as datasets or user-specified data and enables adding desired characteristics. Through a variety of processing methods, including recreating from average, interpolating, integral matching, smoothing, repeating, trending, and noising, the package allows a thorough evaluation of created optimization and prediction methods based on the network traffic. The available example datasets provide a versatile entry point for networking research. In the future, we plan to extend the software with signal recreation with various probability distributions varying over time.

CRediT authorship contribution statement

Piotr Lechowicz: Writing – review & editing, Writing – original draft, Visualization, Validation, Software, Methodology, Investigation, Formal analysis, Data curation, Conceptualization. Aleksandra Knapińska: Writing – review & editing, Writing – original draft, Visualization, Validation, Resources, Methodology, Investigation, Data curation, Conceptualization. Adam Włodarczyk: Writing – review & editing, Writing – original draft, Visualization, Validation, Methodology. Krzysztof Walkowiak: Writing – review & editing, Writing – original draft, Visualization, Validation, Supervision, Project administration, Methodology, Funding acquisition, Formal analysis, Conceptualization.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported by the National Science Center, Poland under Grant 2019/35/B/ST7/04272.

Data availability

Data included in the package.

References

[1] Parsonson CW, Benjamin JL, Zervas G. Traffic generation for benchmarking data centre networks. Opt Switch Netw 2022;46:100695. http://dx.doi.org/10.1016/j.osn.2022.100695.
[2] Valkanis A, Papadimitriou G, Nicopolitidis P, Beletsioti GA, Varvarigos E. A traffic prediction assisted routing algorithm for elastic optical networks. In: International conference on communications, computing, cybersecurity, and informatics. IEEE; 2021, p. 1–6. http://dx.doi.org/10.1109/CCCI52664.2021.9583188.
[3] Włodarczyk A, Lechowicz P, Szostak D, Walkowiak K. An algorithm for provisioning of time-varying traffic in translucent SDM elastic optical networks. In: 22nd international conference on transparent optical networks. IEEE; 2020, p. 1–4. http://dx.doi.org/10.1109/ICTON51198.2020.9203045.
[4] Petale S, Lin S-C, Matsuura M, Hasegawa H, Subramaniam S. PRODIGY: A progressive upgrade approach for elastic optical networks. In: IEEE global communications conference. IEEE; 2023, p. 2129–34. http://dx.doi.org/10.1109/GLOBECOM54140.2023.10437935.
[5] Han Y, Yoo J-H, Hong JW-K. Poisson shot-noise process based flow-level traffic matrix generation for data center networks. In: 2015 IFIP/IEEE international symposium on integrated network management. 2015, p. 450–7. http://dx.doi.org/10.1109/INM.2015.7140322.
[6] Han Y, Seo S-s, Hwang C, Yoo J-H, Hong JW-K. Flow-level traffic matrix generation for various data center networks. In: 2014 IEEE network operations and management symposium. 2014, p. 1–6. http://dx.doi.org/10.1109/NOMS.2014.6838394.
[7] iPerf3. https://software.es.net/iperf/. [Accessed 21 September 2024].
[8] Cisco TRex. https://trex-tgn.cisco.com. [Accessed 21 September 2024].
[9] Ziazet JM, Jaumard B, Duong H, Khoshabi P, Janulewicz E. A dynamic traffic generator for elastic 5G network slicing. In: 2022 IEEE international symposium on measurements & networking (M&N). 2022, p. 1–6. http://dx.doi.org/10.1109/MN55117.2022.9887734.
[10] Savu-Jivanov A, Isar A, Stolojescu-Crisan C, Gal J. Network self-similar traffic generator with variable hurst parameter. In: 2020 international symposium on electronics and telecommunications. 2020, p. 1–4. http://dx.doi.org/10.1109/ISETC50328.2020.9301120.
[11] Alsulami K, Zhang J, Ye F. Improvement on a traffic data generator for networking AI algorithm development. In: 2021 IEEE global communications conference. 2021, p. 1–6. http://dx.doi.org/10.1109/GLOBECOM46510.2021.9685616.
[12] Bikmukhamedov RF, Nadeev AF. Multi-class network traffic generators and classifiers based on neural networks. In: 2021 systems of signals generating and processing in the field of on board communications. 2021, p. 1–7. http://dx.doi.org/10.1109/IEEECONF51389.2021.9416067.
[13] Hoffmann F, Bertram T, Mikut R, Reischl M, Nelles O. Benchmarking in classification and regression. Wiley Interdiscip Rev Data Min Knowl Discov 2019;9(5):e1318. http://dx.doi.org/10.1002/widm.1318.
[14] Varet A, Larrieu N. How to generate realistic network traffic? In: 2014 IEEE 38th annual computer software and applications conference. 2014, p. 299–304. http://dx.doi.org/10.1109/COMPSAC.2014.40.
[15] Vishwanath KV, Vahdat A. Swing: Realistic and responsive network traffic generation. IEEE/ACM Trans Netw 2009;17(3):712–25. http://dx.doi.org/10.1109/TNET.2009.2020830.
[16] Knapińska A, Lechowicz P, Spadaro S, Walkowiak K. Agnostic prediction of multiple types of time-varying traffic in optical networks. In: IEEE global communications conference. IEEE; 2023, p. 1125–30. http://dx.doi.org/10.1109/GLOBECOM54140.2023.10436763.
[17] Ułanowicz B, Dopart D, Knapińska A, Lechowicz P, Walkowiak K. Combining random forest and linear regression to improve network traffic prediction. In: 23rd international conference on transparent optical networks. IEEE; 2023, p. 1–4. http://dx.doi.org/10.1109/ICTON59386.2023.10207506.
[18] Knapińska A, Lechowicz P, Spadaro S, Walkowiak K. On advantages of traffic prediction and grooming for provisioning of time-varying traffic in multilayer networks. In: 27th international conference on optical network design and modeling. IEEE; 2023, p. 1–6.
[19] Knapińska A, Lechowicz P, Spadaro S, Walkowiak K. Performance analysis of multilayer optical networks with time-varying traffic. In: 23rd international conference on transparent optical networks. IEEE; 2023, p. 1–4. http://dx.doi.org/10.1109/ICTON59386.2023.10207179.
[20] Stapor K, Ksieniewicz P, García S, Woźniak M. How to design the fair experimental classifier evaluation. Appl Soft Comput 2021;104:107219. http://dx.doi.org/10.1016/j.asoc.2021.107219.
[21] García-Dorado JL, Finamore A, Mellia M, Meo M, Munafo M. Characterization of ISP traffic: Trends, user habits, and access technology impact. IEEE Trans Netw Serv Manag 2012;9(2):142–55. http://dx.doi.org/10.1109/TNSM.2012.022412.110184.
[22] Jurkiewicz P, Rzym G, Boryło P. Flow length and size distributions in campus internet traffic. Comput Commun 2021;167:15–30. http://dx.doi.org/10.1016/j.comcom.2020.12.016.
[23] Goścień R, Knapińska A, Włodarczyk A. Modeling and prediction of daily traffic patterns—WASK and SIX case study. Electronics 2021;10(14):1637. http://dx.doi.org/10.3390/electronics10141637.
[24] Nguyen G, Dlugolinsky S, Bobák M, Tran V, López García Á, Heredia I, et al. Machine learning and deep learning frameworks and libraries for large-scale data mining: a survey. Artif Intell Rev 2019;52:77–124. http://dx.doi.org/10.1007/s10462-018-09679-z.