University of Strathclyde
2021
Pitpimon Choorod, George Weir
Summary
This document presents a novel approach to classifying Tor traffic based on encrypted payload characteristics, using machine learning techniques. It analyzes encrypted packets from various applications. The study suggests that encrypted payload content can be used to distinguish between Tor and nonTor packets.
Tor Traffic Classification Based on Encrypted Payload Characteristics

Pitpimon Choorod and George Weir
Department of Computer and Information Sciences, University of Strathclyde, Glasgow, United Kingdom

2021 National Computing Colleges Conference (NCCC) | 978-1-7281-6719-0/20/$31.00 ©2021 IEEE | DOI: 10.1109/NCCC49330.2021.9428874
Authorized licensed use limited to: Högskolan Väst. Downloaded on October 01,2024 at 16:58:28 UTC from IEEE Xplore. Restrictions apply.

Abstract: Tor is increasingly used on the Internet as a means of accessing illicit or illegal services. If enacted by employees, such use may have a negative impact on the organization. By its nature, Tor traffic is encrypted multiple times before being sent across networks to reach a destination. Therefore it may be impossible to detect the nature of a Tor user's online activities. Nevertheless, such users cannot hide the fact that they are using Tor. This paper proposes a novel data payload analysis as a means of classifying Tor traffic using machine learning. To this end, we consider the characteristics of the encrypted data payload for Tor and encrypted nonTor packets from 8 different applications and extract features to train our machine learning model. Our results indicate that, contrary to the commonsense assumption that Tor packets resemble other encrypted packets, such payload content can be used to distinguish between Tor and nonTor packets.

Index Terms: Tor, Traffic classification, Payload features, Machine learning

I. INTRODUCTION

Tor is a popular means of providing privacy and anonymity to network users. The initial purpose of Tor's creator, the United States Naval Research Lab, was to conceal the identity of Internet users from traffic monitoring and surveillance. Consequently, Tor has become a significant tool for people who seek liberty in online activities, because their identities are completely obscured while they access the Internet via Tor. Meanwhile, these benefits have made Tor a double-edged sword through its possible misuse by malefactors. Several studies illustrate the diverse illegal services that exist because of Tor, including drugs, fraud, counterfeiting, weapons, terrorism, child abuse and access to other illicit content such as stolen data. The Silk Road darknet marketplace, the Mevade Botnet, the ChewBacca malware, Cryptowall 2.0 ransomware and Wannacry ransomware are notorious examples of cybercriminal activities enabled by the Tor network. As a result, there is a need for law enforcement or governments to seek methods to block illicit usage of the Tor network.

Tor is a sophisticated network in which there are no easy ways to determine a user's identity, location, and activities, yet it cannot hide the network traffic generated through Tor usage. A simple and straightforward way to prevent Tor usage is to block the public IP addresses of Tor relays and Tor directory authorities, but these techniques fail when users connect to Tor using Tor bridges and pluggable transports. In this case, means of identifying Tor communication become important, and techniques for network traffic classification play a vital role in detecting and monitoring Tor traffic.

Network traffic classification describes a range of techniques for investigating the packets passing through a computer network. At its most basic, based upon 'well-known' ports, traffic can be classified by port number and presumed protocol. This approach has the advantage of being simple and fast. However, since there is no compulsory adherence to the use of assigned ports, and many applications now employ random ports, port-based classification is not reliable. For this reason, other classification techniques using payload-based inspection or Deep Packet Inspection (DPI) are preferred. DPI relies upon examining the characteristic signatures or patterns of strings found in the packet payloads of applications. While this technique usually classifies network traffic very accurately, it fails with encrypted packets. To address the problem of encrypted traffic, there has been a move to statistical classification, which relies on machine learning techniques. Many studies show the effectiveness of classifying network traffic using such an approach.

Commonly, the features employed in machine learning classification are flow-based and depend upon connection characteristics in the communication. They have been applied to various objectives, including malware detection, network intrusion detection and Botnet detection, as well as protocol-based classification.

Our target is also the classification of Tor traffic using machine learning, but we adopt a novel strategy that does not rely upon traffic flow characteristics. Instead, we employ features derived solely from the encrypted payload. Given the focus on protocol payload, this may appear similar to DPI, but our approach operates seamlessly with encrypted packets and does not compromise the confidentiality of the packet payload. In fact, the characteristics considered in this approach are precisely the contents of the encrypted payload. Features are measured frequencies of the possible hexadecimal digits (0-F) which belong to the encrypted payload in the TCP or TLS/SSL layer. These counts are converted to ratios in order to
ensure normalisation in the event of different sizes of payload.

The remainder of this paper is organized as follows: Section II, Related work; Section III, Payload analysis framework; Section IV, Experiments and results; Section V, Conclusion and future work.

II. RELATED WORK

A. Tor background

Tor traffic starts when the Tor Browser is used to request a website. Encryption occurs before the packets are sent out. At this point, a Tor circuit is created by randomly selecting three Tor routers from information retrieved from the Tor directory service. Once a connection is established, Tor traffic is encrypted three times by the Tor client and routed to the first virtual routing point (the entry node). The encrypted packet is decrypted here, revealing only the next destination IP address, which is the second virtual router (the intermediate node). Once again, the encrypted packet (the payload resulting from the previous decryption) is decrypted to reveal the next destination IP address, which is the third virtual router (the exit node). At this last node, the encrypted packet is fully decrypted to reveal both the plain-text payload and the originating IP header. Once the destination has been reached, the response goes through the reverse encryption process and is returned via the same Tor circuit until the requested data is received and decrypted at the originating Tor client.

A network packet consists of two parts: header and payload. The header, located at the beginning of the packet, contains metadata which is always in plain-text, since this is necessary for communication between source and destination. The payload is the actual data part of the transmitted message. Many Internet protocols, such as HTTP, SMTP and FTP, use payload data that is plain-text. More recent protocols, such as HTTPS, SSH and Tor, employ encrypted payload data. The encryption mechanism ensures that the transferred payload content cannot be read if it is snooped by a hacker (or other unintended recipient). A variety of encryption technologies are deployed by different applications. Tor uses five main cryptographic components to transmit its data:

1) Public-key cryptography: Tor uses 1024-bit RSA and 256-bit elliptic-curve Diffie–Hellman based on Curve25519;
2) Symmetric encryption: Tor uses 128-bit AES in CTR mode;
3) Digital signatures: Tor uses 1024-bit RSA digital signatures and elliptic-curve digital signatures based on Curve25519;
4) Hash functions: Tor uses SHA-1 and SHA-256;
5) SSL/TLS: Tor uses either AES or 3TDES encryption in CBC mode, and Diffie–Hellman key exchange.

Different applications will choose a set of encryption components that is suitable for their purposes. Although encryption algorithms vary in their ability to obscure messages, most modern approaches aim to disperse the components of encrypted data in order to maximise entropy and guard against attacks based upon frequency analysis. Thus, "A main focus of testing the randomness of a ciphertext is its distribution. Ideally, this distribution of ciphertext is uniform, because a uniform distribution implies that the each symbol in ciphertext is equally important. As a result, the cipher is invulnerable to statistical attacks" [7, p. 2]. Our assumption is that while modern encryption algorithms will produce different distributions of ciphertext letters, they will aim to maximise entropy, as effected by randomness in the resultant ciphertext. This assumption is the basis for our null hypothesis that characteristics of the encrypted data payload should not allow discrimination between two varieties of encrypted payload (Tor and encrypted nonTor). Before describing our approach in detail, we review related work in Tor traffic classification.

B. Tor traffic classification

As noted above, network traffic analysis is the key to detecting Tor traffic. Some researchers use DPI techniques to identify such traffic by looking at details in the associated TLS certificates, e.g., hostname, TLS handshake, process, certificate size and port number. Since Tor uses different TLS characteristics from nonTor traffic, these features can discriminate Tor usage. While effective, this method is slow and requires considerable processing power. For these reasons, machine learning approaches have been explored in order to overcome these drawbacks. The basis for such techniques is sometimes called statistical analysis because it relies on the analysis of statistical attributes such as packet size, packet length, flow segment size, round trip time, duration, etc. Such statistical information enables the machine to learn and improve its performance through experience.

Among the many publications that report Tor traffic classification using machine learning, two scenarios are most commonly addressed, using a variety of classifiers: (a) seeking to detect Tor traffic, and (b) seeking to categorize Tor-based applications. Lashkari et al., for instance, whose Tor dataset has been widely used by other researchers, implemented 4 classifiers using traffic flow features in order to classify Tor. For scenario (a), the C4.5 algorithm gave their highest performance, with 96.4% weighted average recall and 97% weighted average precision for Tor and nonTor. For scenario (b), the Random Forest algorithm gave their highest performance, with 84.3% weighted average recall and 83.8% weighted average precision.

The same dataset and flow-based features were employed by Cuzzocrea et al., in conjunction with different classifiers. Their results showed that JRip provided the best performance in scenario (a), with 100% for both precision and recall, and that J48 gave the best performance in scenario (b), with 99.8% for both weighted average recall and weighted average precision.

A third example, using the same dataset, sought to detect nonTor traffic in a Tor traffic dataset. Their results are based on two algorithms and show that a CFS-ANN hybrid classifier performs better than SVM in detecting nonTor traffic, with overall accuracies of 99.8% and 94% respectively.

The research closest to our approach in using payload features is that of Kim and Anpalagan. Their work exploits the first 54 bytes of TCP packets (TCP/IP header and Ethernet II header) by converting the hexadecimal values to decimal values as features. Their two-aim work classified Tor and nonTor traffic and also sought to identify 8 specific varieties of application. They reported that a Convolutional Neural Network model gave the highest performance, with 100% in both precision and recall and 99.3% accuracy, respectively, for the two aims. Their approach showed that raw packet header features with this model are able to classify Tor traffic effectively.

Given the recognised efficacy of modern encryption algorithms, determining traffic type solely by looking at SSL/TLS packets should be infeasible, and a frequency-based analysis should not be successful with encryption such as AES, 3TDES, or RSA. Yet, our approach suggests that the characteristics of Tor SSL/TLS packets differ from those of nonTor encrypted packets, and that they give very high accuracy in Tor traffic classification.

III.
PAYLOAD ANALYSIS FRAMEWORK

To achieve our goal we combine two methods, DPI and machine learning, to help identify packets that originate from the Tor network. As noted above, we employed the dataset of network traffic introduced by Lashkari et al., which is available from https://www.unb.ca/cic/datasets/tor.html.

A. Dataset collection

The datasets, originally generated by Lashkari et al., comprise Tor traffic data collected from a Whonix installation (a Linux-based, Tor-integrated, open-source operating system). The collection mechanism used two virtual machines, a workstation and a gateway, running Debian GNU/Linux. All traffic captured from the Whonix workstation is nonTor, while traffic captured from the Whonix gateway is Tor. As noted above, they used traffic flows or time-related features to identify Tor and nonTor traffic and to differentiate between 8 different application types of Tor traffic. Consequently, their dataset includes 8 types of traffic (Audio-streaming, Web browsing, Chat, Email, File transfer, P2P, Video-streaming and VoIP) generated from 18 representative software applications (e.g., Firefox, Chrome, Facebook, Skype, Gmail, etc.).

Since our approach centres solely on features extracted from encrypted data payloads, we selected data files captured from 8 different applications (Audio, Browser, Chat, Email, FTP, P2P, Video and VoIP) from both the Tor and nonTor datasets. This reflected our aim of contrasting encrypted payloads from Tor and nonTor contexts.

To appreciate the characteristics of these Tor and nonTor data files more clearly, descriptive statistics for their payloads are shown in Table I. The frequency distribution of total characters for Tor and nonTor payloads, in pairs for the 8 applications, is illustrated in Figure 1.

Fig. 1. The frequency distribution of total characters of Tor and nonTor payloads

TABLE I
DESCRIPTIVE STATISTICS OF TOTAL CHARACTERS OF TOR AND NONTOR PAYLOAD

Application  Type    Total*   Mean*      S.D.*     Min*  Max*
Audio        Tor     13727    1131.414   257.1375  58    2816
             nonTor  13727    2445.669   757.1146  4     2920
Browser      Tor     39323    1280.045   417.0376  66    2772
             nonTor  39323    659.0165   830.5042  48    2910
Chat         Tor     3419     1191.452   330.9949  128   2264
             nonTor  3419     687.7929   763.2851  4     2920
Email        Tor     5076     1075.155   435.3447  128   2904
             nonTor  5076     461.0977   854.5757  50    2822
FTP          Tor     271804   996.5985   539.8975  60    2904
             nonTor  271804   2580.936   834.2934  4     2920
P2P          Tor     228300   1154.696   297.8678  52    2830
             nonTor  228300   2526.189   878.8907  2     2920
VDO          Tor     16923    1144.218   274.7713  128   2718
             nonTor  16923    765.755    973.9731  48    2880
VoIP         Tor     684601   1098.986   152.7305  66    2264
             nonTor  684601   269.1999   247.0036  2     2920
*in characters

Table I shows that the mean values for Tor payloads are larger than those for nonTor payloads, except for packets from Audio, FTP and P2P. However, the standard deviation of total characters is lower for Tor payloads than for nonTor payloads in every application. This shows that the distribution of total characters for Tor payloads has less variance than for nonTor payloads. The minimum number of payload characters for Tor is larger than for nonTor in every application, while the maximum numbers of payload characters for Tor and nonTor are similar.

From Figure 1, it is clear that most nonTor encrypted payloads for Audio, FTP and P2P applications are larger than encrypted Tor payloads, which is in accordance with the mean values shown in Table I. Most samples of encrypted nonTor payloads in Audio, FTP and P2P have the same total of 2920 characters (63.23%, 83.83% and 79.45% of samples respectively). In contrast, most encrypted nonTor payloads from Browser, Chat, Email, VDO and VoIP are smaller than their Tor counterparts, with the most frequent totals being 228 (56.17%), 170 (40.45%), 196 (74.52%), 172 (53.73%) and 388 (87.90%) characters respectively. For Tor, most encrypted payloads have a fixed length of 1076 characters, accounting for 92.30%, 79.18%, 88.27%, 45.51%, 66.84%, 90.28%, 92.35% and 97.74% of samples for the eight applications respectively.

B. Data pre-processing

Following data collection, preprocessing is applied in order to transform the raw data into a format compatible with machine learning. The raw data comprises packets captured travelling in and out of the network interface card, including TCP handshake flows. In preprocessing, the samples were cleaned to leave only packets containing application data. This filtering was accomplished using Wireshark. The process eliminated unnecessary parameters from the data packets and formatted the packets for subsequent use in classification. Thereby, 16 features are extracted from each packet. These features are the basis for training the classification model.

C. Feature extraction

Each payload sample was processed to provide a set of 16 features using the Charcount option of the Posit Text Profiling Toolset. The function of Charcount is to count the occurrences of individual characters in a string. In our case, a data payload string comprises combinations of 16 possible individual characters from 0 to F (the character set of the hexadecimal digits). Having generated counts for these 16 unique characters, we convert the absolute values into ratio values. The final feature output is in .arff (Weka file) format.

D. Classification

Before the input was fed into Weka to train the model, we addressed any imbalanced datasets by applying the SpreadSubsample (distributionSpread 1.0, randomSeed 1) filter to equalize the number of instances for each pair of datasets in each application. Thereafter, we used several classification algorithms (J48, Random Forest and KNN) with 10-fold cross-validation to train and test our datasets. We focus here on J48, since this gave the best classification results in our tests. Results for all three algorithms are included in Table II.

J48 is a supervised learning algorithm used to build a decision tree. At each step, it uses the concept of information gain to decide which attribute to split the data on, in order to obtain the best split.

Cross-validation is a common re-sampling method for training and testing models. There are many cross-validation techniques, but we use K-fold cross-validation, which is simple and less biased than other methods. The K parameter is the number of divided sets, which is normally 5 or 10. The larger the value of K, the smaller the bias but the more time-consuming the processing. To train and validate the model with K-fold cross-validation, the dataset is split equally and randomly into K folds (or groups) to reduce bias when building the model. In each of K iterations, K-1 folds are used to train the model and the remaining fold is used to validate it, in rotation. The output is the model performance evaluation, calculated as the average of the performance measures over the K test sets.

E. Evaluation

To evaluate the model's performance we considered the following measures: Accuracy, Precision, Recall and F1 score, which are derived from a confusion matrix, i.e., an N x N table that summarizes model performance. The matrix comprises the counts of predicted classes against actual classes, to give insight into the errors, and the types of error, generated by the classifier.

Fig. 2. Confusion matrix table

Here, with Tor taken as the positive class: True Positive (TP) is when the sample is Tor and the model predicts Tor (correct prediction). True Negative (TN) is when the sample is nonTor and the model predicts nonTor (correct prediction). False Positive (FP) is when the sample is nonTor and the model predicts Tor (incorrect prediction). False Negative (FN) is when the sample is Tor and the model predicts nonTor (incorrect prediction).

IV. EXPERIMENTS AND RESULTS

These classification experiments focused on payload features, which were derived from the .pcap format data files provided as part of the original dataset. As noted, the .pcap files were assembled to represent the Tor and nonTor encrypted traffic. For each data sample, we count the characters present in the payload and compute the corresponding ratio values. With $i \in \{0, 1, 2, \ldots, 9, A, B, C, \ldots, F\}$, the features are calculated by the following equation:

$$R_i = \frac{N_i}{T} \times 100 \tag{1}$$

where $R_i$ is the percentage ratio of character $i$ in a payload string, $N_i$ is the total number of occurrences of $i$ in the payload string, and $T$ is the total number of all individual characters in the payload string.

Once the feature extraction step is completed, the traffic classification is implemented using a machine learning model with the Weka software application. Each of the 8 pairs of application datasets was trained using 3 machine learning algorithms: J48, Random Forest and KNN, all with 10-fold cross-validation.

Table II shows a comparison of the 3 models' performance in Tor traffic classification, measured using 16 payload features across 8 different applications. All dataset pairs are balanced, so only the accuracy is noted in comparing their performance. The table indicates that J48 produces the best performance amongst the 3 classifiers in every application except Email, where Random Forest is marginally higher (93.37% against 92.99%). The highest accuracy is for VoIP, at 99.73%. The second best is Browser, with 99.13%. The poorest performance is in the Chat application, at 92.46%. However, training the models with more data samples may achieve better performance for Chat and Email. Table III gives further detail on the best performing J48 model. All results are above 0.9, with the highest for VoIP (0.997) and Browser (0.991), and the poorest for Chat (0.925).

TABLE II
THE COMPARISON OF ACCURACY RESULTS (%) FROM J48, RANDOM FOREST AND KNN MODELS FROM 8 APPLICATIONS

Application  Total instances  J48    RF     KNN
Audio        27454            96.83  90.34  65.58
Browser      78645            99.13  95.44  71.59
Chat         6838             92.46  84.51  64.08
Email        10152            92.99  93.37  83.16
FTP          543608           97.49  94.12  72.79
P2P          456600           98.64  96.81  72.06
VDO          33846            97.83  93.04  69.39
VoIP         1369202          99.73  98.68  95.38

TABLE III
PRECISION, RECALL AND F1 SCORE RESULTS OF J48 MODEL FROM 8 APPLICATIONS

Application  Precision  Recall  F1 score
Audio        0.968      0.968   0.968
Browser      0.991      0.991   0.991
Chat         0.925      0.925   0.925
Email        0.930      0.930   0.930
FTP          0.975      0.975   0.975
P2P          0.986      0.986   0.986
VDO          0.978      0.978   0.978
VoIP         0.997      0.997   0.997

We note that other work using the Tor connection TLS/SSL certificate to identify Tor network traffic reports high accuracy. However, this technique relies upon the characteristics of TLS certificates to achieve its goal, and this type of packet can only be captured at the early stage of the TLS handshake. In contrast, our technique is able to detect Tor packets from a single payload, regardless of the sequence position of the captured packet.

Other work identifies Tor traffic using TCP/IP flow features and reports best results with both precision and recall above 0.9, as does work that employs features close to our approach and uses a deep learning model (CNN) to classify Tor and nonTor traffic. Both of these papers agree with our conclusion that J48 provided the best results, with both precision and recall above 0.9.

V. CONCLUSION AND FUTURE WORK

This paper presents a character frequency approach to the classification of Tor traffic based solely upon characteristics of the encrypted payload. Features are extracted from the encrypted payload in hexadecimal format by splitting the strings into individual characters (0-F), counting the frequency of each character and converting these counts to a ratio of the total number of characters in the payload instance. Classifiers were trained with a variety of algorithms and tested with 10-fold cross-validation, with J48 giving greater than 90% accuracy in classifying 8 (nonTor) applications (Audio, Browser, Chat, Email, FTP, P2P, VDO and VoIP) against Tor packets. This shows that the encrypted content of the packets can be used broadly as a basis for distinguishing between Tor and nonTor packets.

While the approach described here relies upon unigram payload features, i.e., individual characters, as a basis for discriminating between Tor and nonTor, future work will explore the use of an extended feature set based upon other values of n-gram (character combinations).

REFERENCES

[1] J. Reynolds and J. Postel, "Assigned numbers," in STD 2, RFC 1700, USC/Information Sciences Institute, 1992.
[2] G. R. S. Weir, "The posit text profiling toolset," in Proceedings of the 12th Conference of Pan-Pacific Association of Applied Linguistics, 2007.
[3] G. R. S. Weir, "Corpus profiling with the posit tools," in Proceedings of the 5th Corpus Linguistics Conference. University of Liverpool, 2009.
[4] A. O. Granerud, "Identifying tls abnormalities in tor," M.S. thesis, 2010.
[5] D. Schatzmann, W. Mühlbauer, T. Spyropoulos, and X. Dimitropoulos, "Digging into https: Flow-based classification of webmail traffic," in Proceedings of the 10th ACM SIGCOMM conference on Internet measurement, 2010, pp. 322–327.
[6] A. Dainotti, A. Pescapé, and C. Sansone, "Early classification of network traffic through multi-classification," in International Workshop on Traffic Monitoring and Analysis, Springer, 2011, pp. 122–135.
[7] Y. Wu, J. P. Noonan, and S. Agaian, "Shannon entropy based randomness measurement and test for image encryption," arXiv preprint arXiv:1103.5520, 2011.
[8] P. Gogoi, M. H. Bhuyan, D. Bhattacharyya, and J. Kalita, "Packet and flow based network intrusion dataset," in IC3, 2012.
[9] N. Bhargava, G. Sharma, R. Bhargava, and M. Mathuria, "Decision tree analysis on j48 algorithm for data mining," Proceedings of International Journal of Advanced Research in Computer Science and Software Engineering, vol. 3, no. 6, 2013.
[10] C. Guitton, "A review of the available content on tor hidden services: The case against further development," Comput. Hum. Behav., vol. 29, pp. 2805–2815, 2013.
[11] M. Kuhn, K. Johnson, et al., Applied predictive modeling. Springer, 2013, vol. 26.
[12] A. Moore, D. Zuev, and M. Crogan, "Discriminators for use in flow-based classification," Tech. Rep., 2013.
[13] G. Owen and N. Savage, The tor dark net, https://www.cigionline.org/sites/default/files/no20 0.pdf, Accessed: 2021-03-12, Sep. 2015.
[14] R. B. Yetter, "Darknets, cybercrime & the onion router: Anonymity & security in cyberspace," Ph.D. dissertation, Utica College, 2015.
[15] L. Lee, D. Fifield, N. Malkin, G. Iyer, S. Egelman, and D. Wagner, "Tor's usability for censorship circumvention," Ph.D. dissertation, Master's thesis, EECS Department, University of California, Berkeley, 2016.
[16] mrphs, Breaking through censorship barriers, even when tor is blocked, Aug. 2016. [Online]. Available: https://blog.torproject.org/breaking-through-censorship-barriers-even-when-tor-blocked.
[17] F. A. Saputra, I. U. Nadhori, and B. F. Barry, "Detecting and blocking onion router traffic using deep packet inspection," in 2016 International Electronics Symposium (IES), IEEE, 2016, pp. 283–288.
[18] A. Cuzzocrea, F. Martinelli, F. Mercaldo, and G. Vercelli, "Tor traffic analysis and detection via machine learning techniques," in 2017 IEEE International Conference on Big Data (Big Data), IEEE, 2017, pp. 4474–4480.
[19] E. Hodo, X. Bellekens, E. Iorkyase, A. Hamilton, C. Tachtatzis, and R. Atkinson, "Machine learning approach for detection of nontor traffic," in Proceedings of the 12th International Conference on Availability, Reliability and Security, 2017, pp. 1–6.
[20] A. H. Lashkari, G. Draper-Gil, M. S. I. Mamun, and A. A. Ghorbani, "Characterization of tor traffic using time based features," in ICISSP, 2017, pp. 253–262.
[21] K. M. Martin, Everyday cryptography: Fundamental principles and applications, Jul. 2017. DOI: 10.1093/oso/9780198788003.001.0001. [Online]. Available: https://oxford.universitypressscholarship.com/view/10.1093/oso/9780198788003.001.0001/oso-9780198788003-chapter-12.
[22] C. Sanders, Practical packet analysis: Using Wireshark to solve real-world network problems. No Starch Press, 2017.
[23] R. Vinayakumar, K. Soman, and P. Poornachandran, "Secure shell (ssh) traffic analysis with flow based features using shallow and deep networks," in 2017 International Conference on Advances in Computing, Communications and Informatics (ICACCI), IEEE, 2017, pp. 2026–2032.
[24] S.-C. Hsiao and D.-Y. Kao, "The static analysis of wannacry ransomware," in 2018 20th International Conference on Advanced Communication Technology (ICACT), IEEE, 2018, pp. 153–158.
[25] M. Kim and A. Anpalagan, "Tor traffic classification from raw packet header using convolutional neural network," in 2018 1st IEEE International Conference on Knowledge Innovation and Invention (ICKII), IEEE, 2018, pp. 187–190.
[26] E. Mahdavi, H. Hassannejad, et al., "Classification of encrypted traffic for applications based on statistical features," ISeCure-The ISC International Journal of Information Security, vol. 10, no. 1, pp. 29–43, 2018.
[27] M. Mirea, V. Wang, and J. Jung, "The not so dark side of the darknet: A qualitative study," English, Security Journal, Aug. 2018, 12 month embargo, ISSN: 0955-1662. DOI: 10.1057/s41284-018-0150-5.
[28] B. Monk, J. Mitchell, R. Frank, and G. Davies, "Uncovering tor: An examination of the network structure," Security and communication networks, vol. 2018, 2018.
[29] G. Weir, K. Owoeye, A. Oberacker, and H. Alshahrani, "Cloud-based textual analysis as a basis for document classification," in 2018 International Conference on High Performance Computing & Simulation (HPCS), IEEE, 2018, pp. 672–676.
[30] M. Yeo, Y. Koo, Y. Yoon, T. Hwang, J. Ryu, J. Song, and C. Park, "Flow-based malware detection using convolutional neural network," in 2018 International Conference on Information Networking (ICOIN), IEEE, 2018, pp. 910–913.
[31] M. Faizan and R. A. Khan, "Exploring and analyzing the dark web: A new alchemy," First Monday, 2019.
[32] R. Koch, The list of countries that have banned vpns, Oct. 2019. [Online]. Available: https://protonvpn.com/blog/are-vpns-illegal/.
[33] V. Lapshichyov and O. Makarevich, "Tls certificate as a sign of establishing a connection with the network tor," in Proceedings of the 12th International Conference on Security of Information and Networks, 2019, pp. 1–6.
[34] A. Pektaş and T. Acarman, "A deep learning method to detect network intrusion through flow-based features," International Journal of Network Management, vol. 29, no. 3, e2050, 2019.
[35] N. P. Tran, D. T. Nguyen, H. H. Le, N. T. Nguyen, and N. B. Nguyen, "An efficient algorithm to extract control flow-based features for iot malware detection," The Computer Journal, 2020.
[36] W. Wang, Y. Shang, Y. He, Y. Li, and J. Liu, "Botmark: Automated botnet detection with hybrid analysis of flow-based and graph-based traffic behaviors," Information Sciences, vol. 511, pp. 284–296, 2020.
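
As an illustration of the feature extraction described in Section III-C and Equation (1), the following is a minimal Python sketch and not the authors' tooling: the paper uses the Charcount option of the Posit toolset and Weka .arff output, neither of which is reproduced here. The function name `hex_digit_ratios`, the choice to take the payload as raw bytes, and the upper-case hexadecimal rendering are assumptions made for the example.

```python
from collections import Counter
from typing import Dict

HEX_DIGITS = "0123456789ABCDEF"  # the 16 possible characters i in Eq. (1)


def hex_digit_ratios(payload: bytes) -> Dict[str, float]:
    """Return R_i = N_i / T * 100 for every hexadecimal digit i.

    Assumption: the payload is given as raw bytes and is rendered as an
    upper-case hex string, so each byte contributes two characters.
    """
    hex_string = payload.hex().upper()
    total = len(hex_string)  # T: total number of characters in the payload
    if total == 0:
        # An empty payload has no characters; report all ratios as zero.
        return {digit: 0.0 for digit in HEX_DIGITS}
    counts = Counter(hex_string)  # N_i for each digit that occurs
    return {digit: counts.get(digit, 0) * 100.0 / total for digit in HEX_DIGITS}


# Hypothetical 4-byte payload, i.e. the 8-character string "00FF10A1".
ratios = hex_digit_ratios(bytes.fromhex("00ff10a1"))
print(ratios["0"], ratios["F"], ratios["1"], ratios["A"])
```

Because the 16 ratios always sum to 100 for a non-empty payload, payloads of different lengths become directly comparable, which is the normalisation the paper describes.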
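
The evaluation measures named in Section III-E can be sketched from a two-class confusion matrix as follows. This is a hedged illustration only: the paper obtains its figures from Weka (the related work it cites reports weighted averages), whereas this sketch computes the measures for a single positive class, assumed here to be Tor. The function name `binary_metrics` and the sample labels are hypothetical.

```python
from typing import Dict, Sequence


def binary_metrics(actual: Sequence[str], predicted: Sequence[str],
                   positive: str = "Tor") -> Dict[str, float]:
    """Compute confusion-matrix counts and derived measures for one class."""
    tp = fp = tn = fn = 0
    for truth, guess in zip(actual, predicted):
        if guess == positive:
            if truth == positive:
                tp += 1  # actual Tor, predicted Tor
            else:
                fp += 1  # actual nonTor, predicted Tor
        else:
            if truth == positive:
                fn += 1  # actual Tor, predicted nonTor
            else:
                tn += 1  # actual nonTor, predicted nonTor
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"tp": tp, "fp": fp, "tn": tn, "fn": fn, "accuracy": accuracy,
            "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels for eight packets.
actual = ["Tor", "Tor", "Tor", "nonTor", "nonTor", "nonTor", "nonTor", "Tor"]
predicted = ["Tor", "Tor", "nonTor", "nonTor", "nonTor", "Tor", "nonTor", "Tor"]
metrics = binary_metrics(actual, predicted)
print(metrics)
```

On a balanced dataset, such as the equalized Tor and nonTor pairs used in the paper, accuracy is a reasonable headline measure, while precision and recall show which kind of error the classifier makes.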
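
The K-fold cross-validation procedure described in Section III-D can be sketched as an index splitter. This is a simplified stand-in and not Weka's implementation (for example, it does not stratify by class). The function name `k_fold_indices`, the seed value and the round-robin fold assignment are assumptions for illustration.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(n: int, k: int = 10,
                   seed: int = 1) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists for K-fold cross-validation.

    The n sample indices are shuffled once, dealt into k near-equal folds,
    and each fold serves as the test set exactly once.
    """
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # random assignment to reduce bias
    folds = [indices[i::k] for i in range(k)]  # fold sizes differ by at most 1
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Hypothetical dataset of 25 samples split into 5 folds.
splits = list(k_fold_indices(25, k=5))
for train, test in splits:
    print(len(train), len(test))
```

A model would be trained on each `train` list and scored on the matching `test` list, and the K scores averaged to give the reported performance.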