Automatic Modulation Classification Using CNN-LSTM Based Dual-Stream Structure
Document Details
Chongqing University of Posts and Telecommunications
Zufan Zhang, Hao Luo, Chun Wang, Chenquan Gan, and Yong Xiang
Summary
This paper proposes an automatic modulation classification (AMC) method based on a CNN-LSTM dual-stream structure. Raw complex signals are converted into in-phase/quadrature (I/Q) and amplitude/phase (A/P) representations, each processed by a CNN-LSTM stream, and the features from the two streams interact in pairs so that both spatial and temporal properties are exploited. Experiments on the RadioML2016.10a dataset show improved classification accuracy compared with existing deep learning and machine learning baselines.
Full Transcript
IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, VOL. 69, NO. 11, NOVEMBER 2020

Automatic Modulation Classification Using CNN-LSTM Based Dual-Stream Structure
Zufan Zhang, Hao Luo, Chun Wang, Chenquan Gan, and Yong Xiang, Senior Member, IEEE

Abstract—Deep learning (DL) has recently aroused substantial attention due to its successful implementations in many fields. Currently, there are few studies on the application of DL to automatic modulation classification (AMC), which plays a critical role in non-cooperative communications. Besides, most previous work ignores the feature interaction and considers only spatial or temporal attributes of signals. Combining the advantages of the convolutional neural network (CNN) and the long short-term memory (LSTM), this paper addresses AMC using a CNN-LSTM based dual-stream structure, which efficiently explores the feature interaction and the spatial-temporal properties of raw complex temporal signals. Specifically, a preprocessing step is first implemented to convert signals into the temporal in-phase/quadrature (I/Q) format and the amplitude/phase (A/P) representation, which facilitates the acquisition of more effective features for classification. To extract features from each signal pattern, each stream is composed of CNN and LSTM (denoted as CNN-LSTM). Most importantly, the features learned from the two streams interact in pairs, which increases the diversity of features and thereby further improves the performance. Finally, some comparisons with previous work are performed. The experimental results not only demonstrate the superior performance of the proposed method compared with the existing state-of-the-art methods, but also reveal the potential of DL-based approaches for AMC.

Index Terms—Automatic modulation classification, convolutional neural network, dual-stream structure, long short-term memory network, signal representation.

Manuscript received June 3, 2020; revised September 21, 2020 and October 6, 2020; accepted October 6, 2020. Date of publication October 12, 2020; date of current version November 12, 2020. This work was supported in part by the Natural Science Foundation of China under Grants 61702066 and 11747125, in part by the Project of Science and Technology Research Program of Chongqing Education Commission of China under Grant KJZD-M201900601, in part by the Chongqing Research Program of Basic Research and Frontier Technology under Grants cstc2017jcyjAX0256 and cstc2018jcyjAX0154, and in part by the Project Supported by Chongqing Municipal Key Laboratory of Institutions of Higher Education under Grant cqupt-mct-201901. The review of this article was coordinated by Dr. B. Mao. (Corresponding author: Yong Xiang.)

Zufan Zhang, Hao Luo, Chun Wang, and Chenquan Gan are with the School of Communication and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China, with the Chongqing Key Laboratory of Mobile Communications Technology, Chongqing 400065, China, and also with the Engineering Research Center of Mobile Communications, Ministry of Education, Chongqing 400065, China (e-mail: [email protected]; [email protected]; [email protected]; [email protected]).

Yong Xiang is with the School of Information Technology, Deakin University, Victoria 3125, Australia (e-mail: [email protected]).

Digital Object Identifier 10.1109/TVT.2020.3030018

I. INTRODUCTION

WITH the remarkable evolution of wireless communication technology, the categories of modulation schemes in wireless communication are moving towards diversification, and the number of wireless devices is mushrooming rapidly, resulting in an increasingly complex communication environment. Consequently, the ability to recognize and classify communication signals quickly and automatically becomes increasingly important. Automatic modulation classification (AMC), as an intermediate process of blind signal processing, is of great significance for many fields, such as cognitive radios and software-defined radios.

In general, two categories of AMC approaches can be identified: likelihood-based (LB) methods and feature-based (FB) methods. In the former, AMC is treated as a multiple composite hypothesis test problem, which makes different assumptions for different modulation modes, computes the likelihood function of the received signals under different hypotheses, and then compares its value with a predefined threshold to obtain the possible category of the received signals. Even though the LB methods are optimal in the Bayesian sense, they are usually accompanied by considerable computational complexity, demand complete prior knowledge, and are sensitive to unknown channel conditions.

Alternatively, the FB methods have been proved to provide suboptimal solutions that can be conducted more easily with a low computational complexity. The modulation type is distinguished based on appropriate features extracted from the received signals. Undoubtedly, higher-quality features can yield robust performance with a low complexity. Indeed, a variety of features for AMC have already been explored in previous work, such as wavelet transforms, cyclostationary features, and higher-order cumulants, to name a few. In the classification phase, the traditional classifiers include random forest, k-nearest neighbors (KNN), Gaussian Naive Bayes, and the support vector machine (SVM). Unfortunately, these methods have proved to be time-consuming because they usually depend on the extraction of handcrafted features that require extensive domain knowledge and engineering.

Recently, the explosive development of deep learning (DL) has grabbed widespread attention due to its significant success in various applications like computer vision, sentiment analysis, and speech recognition. Most importantly, compared with traditional methods of data analysis and processing, DL can automatically obtain better representations of complex high-dimensional data without designing manual features. For this reason, it has also been gradually applied in various aspects of communications. Significantly, some engineers have successfully utilized DL to address the tasks of AMC. O'Shea et al. presented a convolutional neural network (CNN) model for AMC, in which the CNN is trained by using time-domain in-phase/quadrature (I/Q) data, and it is superior to conventional approaches based on expert features. Since then, Kulin et al. adopted a CNN model to operate on the amplitude and phase information of the received signals and obtained a slight performance improvement. Very recently, Yashashwi et al. introduced a signal distortion correction module co-trained with a CNN model, which can remove the effect of random frequency and phase offsets. However, these works ignored the persistent properties, since the CNN is not good at learning temporal information in time series. Additionally, an AMC method was proposed that trains a CNN on eye diagrams of the modulated signals. In other work, a CNN-based model for recognizing high-frequency radio signals was developed, in which features from spectrum images of the received signals can be automatically extracted and a sparse-filtering criterion is introduced to improve the performance. Similarly, Peng et al. investigated several approaches to convert the complex signals into images with grid-like topologies, and utilized two CNN models (AlexNet and GoogLeNet) to learn features from these images for AMC. In another study, all signals were transformed into images by adopting different time-frequency transformations, and feature fusion was presented to enhance performance, but it is only valid for the additive white Gaussian noise (AWGN) signal model. It is found that these approaches all subtly transform AMC into well-researched image recognition problems, but introduce a complex image processing step.

Note that a long short-term memory (LSTM) network is skilled in learning persistent features from series data. Rajendran et al. trained an LSTM network by using the amplitude and phase information of the received signals. However, the spatial properties of the signals are neglected. Additionally, other authors trained a model (CLDNN) based on CNN and LSTM by only using the I/Q representation, ignoring the interaction of different features. So far, it can be seen from the above-mentioned work that both CNN and LSTM methods have their own flaws. The common defect is that they only consider spatial or temporal attributes of signals and neglect the feature interaction. Therefore, it is worthwhile to study an AMC method that integrates the different features of signals.

In this paper, an AMC method using a CNN-LSTM based dual-stream structure, which considers the feature interaction and combines the advantages of CNN and LSTM, is proposed. Firstly, signal preprocessing, in which signals are respectively denoted by the temporal I/Q format and the amplitude/phase (A/P) representation, is implemented to achieve more effective features. Afterwards, the CNN and the LSTM are entirely combined (called CNN-LSTM) in each stream to collect effective characteristics from each signal representation, which facilitates the exploration of the spatial and temporal correlation of signals. Besides, an effective operation is employed to strengthen the interactions between the different features extracted from each representation for further improvement. Finally, the effectiveness of the proposed method is verified by simulations under various signal-to-noise ratios (SNRs), and the experimental results imply that our method surpasses the existing state-of-the-art methods.

The main contributions of our work can be summarized as follows.
1) The CNN-LSTM based dual-stream structure is proposed for AMC. One stream extracts the local raw temporal features from the raw signals, and the other stream learns knowledge from the amplitude and phase information.
2) To learn spatial and temporal information from each signal representation, each stream consists of CNN and LSTM (denoted as CNN-LSTM), which combines the excellent performance of CNN in spatial feature extraction and the superior capacity of LSTM in processing time series data.
3) Through an effective operation, the features trained from the two streams interact in pairs, enriching the diversity of properties and thereby further enhancing the performance.

The rest of this paper is organized as follows. Section II introduces the signal model and preprocessing. Section III describes the proposed method in detail. Section IV gives some experiments to verify the proposed method. Finally, Section V summarizes this paper.
II. SIGNAL MODEL AND PREPROCESSING

In this section, the signal model is briefly introduced, and the signal preprocessing is implemented to acquire a good signal representation.

A. Signal Model

As usual, based on the wireless impulse response model, a general representation for the received signal r(t) can be expressed as

r(t) = e^{\tilde{j}\,\Delta f_o(t)} \int_0^{\tau_0} s(\Delta c_o(t - \tau))\, h(t, \tau)\, d\tau + \tilde{n}(t),    (1)

where \tilde{j} stands for the imaginary unit, \Delta f_o(t) and \Delta c_o(t) are the carrier frequency offset and sampling rate offset, respectively, \tau_0 is the maximum delay spread, s(t) denotes the noise-free complex transmitted signal obtained from the transmitted binary data sequence, h(t, \tau) is the Rayleigh fading channel realized by a sum of sinusoids, and \tilde{n}(t) represents the AWGN with zero mean and variance \sigma_{\tilde{n}}^2.

In this work, the received signal r(t) is transformed into a discrete version r[n] with a sampling rate f_s = 1/T_s. It is comprised of the in-phase component r_I \in \mathbb{R} and the quadrature component r_Q \in \mathbb{R}. Thus, the discrete signal r[n] can be described as

r[n] = r_I[n] + \tilde{j}\, r_Q[n].    (2)

Assume that a total of N samples can be collected in a sampling period, and the samples r[n] \in \mathbb{C}, n = 0, 1, \ldots, N - 1, can be deemed a data vector. Write the j-th data vector as

r_j = [r_j[0], r_j[1], \ldots, r_j[N-1]]^T,    (3)

where the superscript T denotes the transpose operator. To be more suitable for training, the data vector r_j is converted to new representations, so signal preprocessing is performed.

B. Signal Preprocessing

In the signal preprocessing, two simple data conversions are adopted. One is denoted as the I/Q format, which is identical to the raw complex samples. The other is inspired by work on recognizing radar signals through the use of signal amplitude and phase information. Mathematically, the j-th I/Q and A/P vectors of r_j \in \mathbb{C}^N can be denoted as r_j \to x_j^{I/Q} and r_j \to x_j^{A/P}, respectively. For each data representation, the corresponding transformation is introduced as follows.

1) Conversion 1 (I/Q vector): The I/Q vector is obtained from the raw complex samples. It is comprised of two data vectors, the in-phase component x_j^I and the quadrature component x_j^Q, i.e.,

x_j^{I/Q} = \begin{bmatrix} (x_j^I)^T \\ (x_j^Q)^T \end{bmatrix},    (4)

where x_j^I, x_j^Q \in \mathbb{R}^N and x_j^{I/Q} \in \mathbb{R}^{2 \times N}.

2) Conversion 2 (A/P vector): The A/P vector consists of two real-valued vectors, the amplitude x_j^A and the phase x_j^P, i.e.,

x_j^{A/P} = \begin{bmatrix} (x_j^A)^T \\ (x_j^P)^T \end{bmatrix},    (5)

where x_j^A, x_j^P \in \mathbb{R}^N and x_j^{A/P} \in \mathbb{R}^{2 \times N}. Based on Eq. (2), the translation to magnitude and phase vectors can be expressed as

x_j^A[n] = \sqrt{(r_j^I[n])^2 + (r_j^Q[n])^2},    (6)

x_j^P[n] = \arctan\left( r_j^Q[n] / r_j^I[n] \right),    (7)

where r_j^I[n] and r_j^Q[n] are the in-phase and quadrature components of the n-th sample of r_j, respectively. Similarly, x_j^A[n] and x_j^P[n] are elements of x_j^A and x_j^P, respectively.

One can observe that the modulation signals are interfered with by the impairments of the wireless channel, but there still exist some typical patterns that are beneficial for extracting more useful features for modulation classification. Therefore, the data vector r_j \in \mathbb{C}^N can be seen as a feature vector (e.g., x_j^{I/Q} or x_j^{A/P}) with a uniform vector shape, which facilitates the application of the same model structure to collect effective features.
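The two conversions above amount to a few lines of array manipulation. The following is a minimal NumPy sketch of Eqs. (4)-(7), not code from the paper; the example burst, the noise level, and the use of the quadrant-safe arctan2 (in place of the plain arctan of Eq. (7)) are illustrative assumptions.

```python
import numpy as np

def to_iq_and_ap(r):
    """Convert one complex sample vector r of shape (N,) into the
    I/Q and A/P representations of Eqs. (4)-(7), each of shape (2, N)."""
    x_iq = np.stack([r.real, r.imag])          # Eq. (4): in-phase / quadrature rows
    amplitude = np.abs(r)                      # Eq. (6): sqrt(I^2 + Q^2)
    phase = np.arctan2(r.imag, r.real)         # Eq. (7), using the quadrant-safe arctan2
    x_ap = np.stack([amplitude, phase])        # Eq. (5): amplitude / phase rows
    return x_iq, x_ap

# Illustrative example: one noisy QPSK-like burst of N = 128 complex samples.
rng = np.random.default_rng(0)
symbols = rng.choice(np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2), size=128)
r = symbols + 0.1 * (rng.standard_normal(128) + 1j * rng.standard_normal(128))
x_iq, x_ap = to_iq_and_ap(r)
print(x_iq.shape, x_ap.shape)   # (2, 128) (2, 128)
```

Each example then has the uniform 2 × N shape mentioned above, so the same network structure can be applied to either representation.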
III. THE PROPOSED MODULATION CLASSIFICATION METHOD

This section explains the proposed method in detail. Firstly, the motivation for choosing the dual-stream structure and the CNN-LSTM model is explained. Next, the structure of the proposed method is introduced. Finally, the concrete training and testing steps are described.

A. Motivation

In wireless communication systems, AMC can be treated as a classification problem in machine learning. Given the superiority of CNN in spatial feature extraction, it can convolve signals and generate useful high-level information. Similarly, the LSTM is beneficial for exploring temporal correlations of signals, and can learn gate weights to address the specified task by considering the previous cell state and the input data. However, in the case of uncertain factors (e.g., different sampling rates), the CNN model for AMC may be inaccurate, because it only considers the spatial features and neglects the temporal attributes. Likewise, the LSTM model for AMC may ignore the spatial features. Considering the model complementarity, our work attempts to combine CNN and LSTM for AMC, which enables the joint exploitation of both spatial and temporal features.

Previous work claimed that I/Q and A/P are both effective for mapping signal features. Specifically, the I/Q format can be utilized to solely learn the raw temporal characteristics of the raw complex signals. Likewise, the A/P representation provides effective temporal and spatial features related to the amplitude and phase of the raw complex signals. So far, previous work operates on I/Q or A/P data individually. Besides, in the case of low SNR, the ability of the I/Q vector to accurately represent signal characteristics may be affected, and the A/P vector can complement some signal features directly from the amplitude and the phase, but it is still affected by wireless interference. Our work tries to consider both I/Q and A/P data. At the same time, it was found through experiments that the dual-stream structure is better than the single-stream structure. Consequently, this paper adopts the CNN-LSTM based dual-stream structure to get more accurate signal features.

B. The CNN-LSTM Based Dual-Stream Structure

As shown in Fig. 1, the CNN-LSTM based dual-stream structure combines the above-mentioned advantages of CNN and LSTM. Its input comprises two parts: the j-th I/Q vector x_j^{I/Q} and the j-th A/P vector x_j^{A/P}. Each input part is processed by a CNN-LSTM stream. Since these two streams are identical in structure, for simplicity, only the structure of Stream 1 is described in detail.

Fig. 1. The CNN-LSTM based dual-stream architecture.

In Stream 1, three convolution layers are adopted to learn spatial features from the different representations of the modulation signals by using a set of learnable filters and activation functions. To take advantage of the temporal correlation, two LSTM layers are added after the convolution layers.
For the convolution layers, the convolution of the feature map with the convolution kernel is the key step of feature extraction. The numbers of kernels used in the three convolution layers are 256, 256 and 80, with kernel sizes of 1 × 3, 2 × 3 and 1 × 3, respectively. The convolution layer processing is the same as traditional convolution processing, and the rectified linear unit (ReLU) is used as the activation function. Besides, zero-padding is imposed to regulate the output shape before each convolution layer.

After learning the features of the three convolution layers, an 80 × 134 feature map is obtained. It is used as the input of the LSTM layers, which are composed of LSTM 1 and LSTM 2 with 100 and 50 memory units, respectively. These LSTM units connected in series use the input gate and the forget gate to extract the time correlation. In the connection process, a 100 × 134 feature map is the collection of the feature vectors of all units in LSTM 1, and is also the input of LSTM 2. In LSTM 2, retaining the output of the last cell, the structure obtains a 50 × 1 feature vector as the element of the feature interaction. Additionally, the dropout method is employed on both the CNN layers and the LSTM layers with a dropout rate p = 0.5 to avoid overfitting. In Stream 2, the operation is the same as that in Stream 1.

To increase the diversity of features, an effective operation inspired by previous work is performed to make the features learned from the two streams interact in pairs. Concretely, let f_A and f_B respectively denote the feature functions of Stream 1 and Stream 2 with the corresponding inputs, the I/Q vector x_j^{I/Q} and the A/P vector x_j^{A/P}; then the final features can be obtained by

z = f_A(x_j^{I/Q})\, f_B(x_j^{A/P})^T,    (8)

i.e., the outer product of the two stream feature vectors. This operation may result in a large memory overhead when the dimensions of f_A(x_j^{I/Q}) and f_B(x_j^{A/P}) are high. For instance, the outer product of 512-dimensional features brings about a 512 × 512 dimensional representation. However, the sizes of the features extracted from x_j^{I/Q} and x_j^{A/P} are both only 50 × 1 in our work. Intuitively, the outer product adjusts the outputs of the feature functions f_A and f_B on each other by interacting in pairs. This is similar to the feature expansion in a quadratic kernel. After the outer product, the output of size 50 × 50 is reshaped into 2500-dimensional features via a flatten layer, which contains various effective extracted information. Then the feature vector is combined with a dense softmax layer to map the collected features into a candidate modulation scheme. Hence, the output of the model can be depicted as

\hat{y} = F(x_j^{I/Q}, x_j^{A/P}; \theta),    (9)

where \theta is the model parameter obtained by training, and F(\cdot) denotes the overall function of the classifier based on the dual-stream structure. Alternatively, the output can also be represented as

\hat{y} = [\hat{y}_1, \hat{y}_2, \cdots, \hat{y}_K]^T,    (10)

where K is the number of modulation schemes, and the element \hat{y}_k, k = 1, 2, \ldots, K, is defined as

\hat{y}_k = e^{x_k} / \sum_{i=1}^{K} e^{x_i},    (11)

where x_i denotes the i-th pre-activation output.

In this work, the feature extraction is achieved via CNN-LSTM. Most importantly, an effective operation is employed to strengthen the interactions between the different features extracted from the two representations for further improvement.
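The layer-by-layer description above maps onto a compact Keras definition. The sketch below is one possible reading under TensorFlow/Keras, not the authors' code: the kernel counts (256, 256, 80), kernel sizes (1×3, 2×3, 1×3), LSTM widths (100, 50), dropout rate 0.5, the 80 × 134 intermediate map, the 50 × 50 outer product and the 2500-way flatten follow the text, while the exact padding scheme, dropout placement, and the 2 × 128 × 1 input layout are assumptions chosen so that the stated shapes work out.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def cnn_lstm_stream(name):
    """One CNN-LSTM stream: three zero-padded conv layers (256, 256, 80 kernels
    of size 1x3, 2x3, 1x3, ReLU) followed by LSTM(100) and LSTM(50)."""
    inp = layers.Input(shape=(2, 128, 1), name=f"{name}_in")     # one I/Q or A/P example
    x = layers.ZeroPadding2D((0, 2))(inp)
    x = layers.Conv2D(256, (1, 3), activation="relu")(x)         # -> (2, 130, 256)
    x = layers.Dropout(0.5)(x)
    x = layers.ZeroPadding2D((0, 2))(x)
    x = layers.Conv2D(256, (2, 3), activation="relu")(x)         # -> (1, 132, 256)
    x = layers.Dropout(0.5)(x)
    x = layers.ZeroPadding2D((0, 2))(x)
    x = layers.Conv2D(80, (1, 3), activation="relu")(x)          # -> (1, 134, 80)
    x = layers.Reshape((134, 80))(x)                             # the 80 x 134 map as a sequence
    x = layers.LSTM(100, return_sequences=True)(x)               # LSTM 1: 100 outputs per step
    x = layers.Dropout(0.5)(x)
    x = layers.LSTM(50)(x)                                       # LSTM 2: keep only the last cell
    return inp, x

iq_in, f_a = cnn_lstm_stream("iq")    # Stream 1 on the I/Q vector
ap_in, f_b = cnn_lstm_stream("ap")    # Stream 2 on the A/P vector

# Eq. (8): pairwise (outer-product) interaction of the two 50-d stream features.
z = layers.Lambda(lambda t: tf.einsum("bi,bj->bij", t[0], t[1]))([f_a, f_b])
z = layers.Flatten()(z)                              # 50 x 50 -> 2500 features
out = layers.Dense(11, activation="softmax")(z)      # Eqs. (9)-(11), 11 modulation classes

model = models.Model([iq_in, ap_in], out)
```

With this padding choice, each padded "valid" convolution lengthens the time axis by two samples, which is one way to arrive at the 134-step sequence the paper reports from 128-sample inputs.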
In dimensions of fA (xj ) and fB (xj ) are high. For instance, each epoch, the training samples are shuffled and the mini-batch the outer product of 512-dimensional features brings about a is imposed to speed up the process of parameters learning. In 512 × 512 dimensional representation. However, the sizes of addition, the categorical cross entropy error is adopted as the I/Q A/P features extracted from xj and xj are both only 50 × 1 loss function, which can obtain accurate classification results in our work. Intuitively, the outer product adjusts the outputs of with less calculation and fast convergence. The loss function feature functions fA and fB on each other by interacting in pairs. can be rewritten as This is similar to the feature expansion in a quadratic kernel. Nb After the outer product, the output of size 50 × 50 will be 1 L(θ) = − yi log ŷi + (1 − yi ) log(1 − ŷi ), (12) reshaped into 2500-dimensional features via a flatten layer, Nb i=1 which contains various effective extraction information. Then the feature vector is combined with a dense softmax layer to where yi represents the target output, and Nb is the number of map the collected features into a candidate modulation scheme. samples in a training batch. Authorized licensed use limited to: INDIAN INST OF INFO TECH AND MANAGEMENT. Downloaded on July 28,2024 at 10:34:34 UTC from IEEE Xplore. Restrictions apply. ZHANG et al.: AUTOMATIC MODULATION CLASSIFICATION USING CNN-LSTM BASED DUAL-STREAM STRUCTURE 13525 The adaptive moment estimation (Adam) , which is an TABLE I RADIOML2016.10A PARAMETERS extension of stochastic gradient descent (SGD) algorithm, em- ploys default settings to minimize the loss function in this work. The network parameters are updated in an iterative manner with the expression as θt+1 = θt − η∇L(θt ), (13) where η is the learning rate, and ∇ represents the gradient operator. The Adam optimizer is verified to be efficient in deep learning models, because it enables the independent design of adap- tive learning rate η for different parameters by calculating the gradient estimates of the first and second moments, which is conducive to accelerate the convergence process. The parameter set θ and validation loss are updated continuously with each iteration, and the optimal model is obtained until the validation loss becomes minimum. Consequently, the specific training process are described as follows. Step 1: Randomly initialize model parameter θ. Step 2: Randomly separate training set X into mini-batches. delay profile, local oscillator offset, and frequency selective Step 3: Randomly choose one mini-batch Xb. fading. Meanwhile, it has 11 different modulation signals with Step 4: Convert signal Xb into I/Q and A/P representations SNR ranging from -20 to 18 dB in 2 dB increments. These mod- for using as model input. ulation formats, including 8PSK, AM-DSB, AM-SSB, BPSK, Step 5: Utilize model to obtain final predictive label distri- CPFSK, GFSK, PAM4, QAM16, QAM64, QPSK, and WBFM, bution. are the most representative in modern mobile communications Step 6: Calculate the error between predictive and true label and are widely used in wireless communication systems around distributions by Eq. (12). us. Among them, four analog modulations (AM-DSB, AM-SSB, Step 7: Update parameter θ on Xb in the process of reverse PAM4 and WBFM) are added to the modulation types to improve propagation by Eqs. (12) and (13). 
IV. EXPERIMENTS

In this section, some experiments are implemented to illustrate the effectiveness of the proposed method. Since each method has its own advantages, it is effective for recognizing particular types of modulations under a given signal model, but may not be suitable for other types of modulations. For instance, some prior work achieved a good performance under an AWGN signal model. However, this is an idealized model hypothesis, and it is not competent for the complex signals studied in this paper. For a fair comparison, the proposed method is compared with the latest research methods based on DL (CNN-I, CNN-II, CM-CNN and CLDNN) and with classical methods based on machine learning (random forest (with 150 trees), KNN, Gaussian Naive Bayes, SVM, and Expert-SVM). Also, the ablation analysis and the run time are included. Finally, the flexibility of the proposed method is evaluated on the recognition of orthogonal frequency division multiplexing (OFDM) digital modulation.

A. Experimental Dataset

All comparisons are implemented with the publicly available RadioML2016.10a dataset. The dataset is synthetically generated by using GNU Radio. Specifically, the RadioML2016.10a dataset contains some practical impairments, such as power delay profile, local oscillator offset, and frequency-selective fading. Meanwhile, it has 11 different modulation signals with SNR ranging from -20 to 18 dB in 2 dB increments. These modulation formats, including 8PSK, AM-DSB, AM-SSB, BPSK, CPFSK, GFSK, PAM4, QAM16, QAM64, QPSK, and WBFM, are the most representative in modern mobile communications and are widely used in wireless communication systems around us. Among them, four analog modulations (AM-DSB, AM-SSB, PAM4 and WBFM) are added to the modulation types to improve the processing performance of the proposed structure in actual scenarios. As a benchmark for modulation classification, these modulation formats with similar characteristics are extremely challenging for classification and recognition, which can effectively reflect the performance of the proposed method. Furthermore, this dataset has a total of 220000 examples, each modulation format has 1000 examples per SNR, and each example has 8 symbols and 128 samples. Considering imperfect transmission, the dataset contains actual channel defects, such as channel frequency offset, sampling rate offset, and noise. The explicit parameters used for signal generation are listed in Table I, and more detailed specifications of the dataset generation model are available in the original dataset publication.

TABLE I. RadioML2016.10a parameters.
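For readers who want to reproduce the setup, the public RadioML2016.10a release is commonly distributed as a pickled Python dictionary keyed by (modulation, SNR) pairs, each holding an array of shape (1000, 2, 128) of raw I/Q samples. The loading and seeded split below is an illustrative sketch under that assumption; the file name, the 50/50 split ratio, and the seed are not specified in the paper.

```python
import pickle
import numpy as np

with open("RML2016.10a_dict.pkl", "rb") as f:          # hypothetical local file name
    data = pickle.load(f, encoding="latin1")

examples, mods, snrs = [], [], []
for (mod, snr), x in sorted(data.items()):
    examples.append(x)                                  # (1000, 2, 128) raw I/Q per key
    mods += [mod] * x.shape[0]
    snrs += [snr] * x.shape[0]

X = np.vstack(examples)[..., None]                      # (220000, 2, 128, 1) for the Conv2D input
snrs = np.array(snrs)
classes = sorted(set(mods))
y = np.eye(len(classes))[[classes.index(m) for m in mods]]   # one-hot labels

# Seeded, mutually exclusive train/test indices, as described in Section III-C.
rng = np.random.default_rng(2016)
idx = rng.permutation(len(X))
train_idx, test_idx = idx[: len(X) // 2], idx[len(X) // 2 :]
```

The A/P inputs for the second stream can then be derived from `X` with the conversion sketched after Section II.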
B. Results and Discussions

Due to the severe distortion of signals under low SNRs, the models in all methods are unable to learn meaningful features, resulting in little differentiation and unsatisfactory performance. This fact can also be seen from the subsequent figures. Therefore, in our simulations, the performance of all methods with SNR ranging from -6 to 18 dB is displayed.

Firstly, the recently studied method CM-CNN is considered, which introduced a signal distortion correction module to eliminate the influence of random frequency and phase noise and is co-trained with a CNN. To enhance the diversity of comparisons, CNN-I, CNN-II and CLDNN are also included in the experiments. Specifically, CNN-I utilized the time-domain IQ data to train a CNN model and was proved to offer a significant improvement upon expert-feature based approaches. CNN-II used a CNN model to operate on the amplitude and phase information of the received signals. CLDNN adopted three convolution layers and a recurrent layer trained on the time-domain IQ data. Besides, the classical methods, such as random forest (with 150 trees), KNN, Gaussian Naive Bayes, SVM, and Expert-SVM, are also compared. The classification accuracies of the above-mentioned methods are displayed in Figs. 2 and 3, respectively.

Fig. 2. Comparison of the proposed method and DL-based methods at different SNRs.
Fig. 3. Comparison of the proposed method and machine learning-based methods at different SNRs.

It is noteworthy in Fig. 2 that the proposed method performs best among all methods under different SNRs. Among the DL-based methods, even the current best method, CM-CNN, achieves a best performance of about 88%, while our method is above 85% over the SNR range of 0 to 18 dB, and its average accuracy is over 90% over the SNR range of 4 to 18 dB. This implies that our method has higher classification accuracy under high SNRs.

Fig. 3 illustrates the comparison of the proposed method and machine learning-based methods under different SNRs. The performance of our method is far superior to that of the machine learning-based methods. It is easy to see that the best machine learning-based method, Expert-SVM, obtains about 75% accuracy at high SNR, which is far from 90%. Thus, it is safe to conclude that the DL-based methods bring a significant improvement compared with the others when the raw complex signals are used as input. Also, this justifies that DL models have tremendous advantages in classifying modulation signals with a short sample length compared with expert-feature based models, which need thousands of samples to average over.

In summary, our method has shown its advantages, whether compared with the DL-based methods or the machine learning-based methods. To find out the reasons that affect performance, the ablation analysis is performed (please refer to Figs. 4-8).

Fig. 4. The combined effect of CNN and LSTM on classification accuracy at different SNRs.

Fig. 4 demonstrates the combined effect of CNN and LSTM on classification accuracy under different SNRs. In particular, CNN-LSTM-IQ indicates that only time-domain IQ data is used to train the CNN-LSTM model, like CNN-I; both of them take IQ data as input. Likewise, CNN-LSTM-AP and CNN-II both use amplitude and phase information as input. It can be seen that using IQ samples to train the combined CNN-LSTM model achieves a significant enhancement of 9% in accuracy compared with CNN-I. Similarly, adopting the amplitude and phase representation for the CNN-LSTM model also obtains a slight improvement compared with CNN-II. Hence, the combination of CNN and LSTM is a practical way to enhance the classification performance. Moreover, it is evident that the classification accuracies of CNN-LSTM-IQ and CNN-LSTM-AP are both inferior to the proposed method, because our superior performance is obtained via the interaction of the features extracted from the two different representations.

From the above discussion, the excellent performance of the proposed method can be attributed to two reasons. One is the combination of CNN and LSTM, which can extract more effective features. The other is the special interaction mechanism, which increases the diversity of features.

To better understand what limits the average accuracy, the confusion matrices for various SNRs and the impact of model architectures with different numbers of CNN layers and LSTM layers are visualized in Figs. 5 and 6, respectively.
Fig. 5. Confusion matrices for the proposed method at different SNRs. (a) SNR = 8 dB. (b) SNR = 0 dB. (c) SNR = −6 dB.

As shown in Fig. 5, three asymmetric confusion matrices are obtained, because error identification is irreversible in AMC: for example, 8PSK is misjudged as AM-SSB, but AM-SSB is not judged as 8PSK. From Fig. 5(a), it can be noticed that the classification accuracies of most modulation schemes are at a high level, excluding WBFM, QAM16 and QAM64, in the case of SNR = 8 dB. The main discrepancy is that WBFM is misidentified as AM-DSB. This can be explained by the fact that there is only one carrier tone in the silent period of the analog signal, which makes these instances indiscernible. Furthermore, Fig. 5(b) shows that, at the SNR of 0 dB, there exists a certain extent of confusion between QAM16 and QAM64. This can be attributed to the fact that QAM16 is a subset of QAM64. As depicted in Fig. 5(c), the diagonal becomes less sharp and the degree of confusion becomes increasingly serious as the SNR decreases. Notably, AM-DSB, AM-SSB and PAM4 can be accurately identified at high or low SNR. This implies that amplitude modulation can be effectively identified by the proposed method. As a result, the above analysis clearly reveals that our method can recognize most modulation types with high accuracy at high SNR.

Fig. 6. Performance of different model architectures at different SNRs.

Fig. 6 shows the influence of model architectures with different numbers of CNN layers and LSTM layers. For simplicity, the classification performance of the different models at SNRs of 8 dB and -6 dB is displayed. The number of CNN layers increases from 1 to 4, and the number of filters used in each respective layer is 256, 256, 80, 80 with sizes of 1×3, 2×3, 1×3 and 1×3, respectively. More concretely, 1-layer indicates only one LSTM layer with 50 cells; 2-layer denotes two LSTM layers with 100 and 50 cells, respectively; 3-layer represents three LSTM layers with 100, 50 and 50 cells, respectively. From the experimental results in Fig. 6, it can be clearly seen that the accuracy increases with the number of CNN layers and reaches saturation at three CNN layers. This phenomenon can be explained by the fact that the model is unable to learn sufficiently effective features when using fewer CNN layers. Besides, it can also be observed that the performance of two LSTM layers is slightly superior to the others at the SNR of 8 dB. Nevertheless, the classification performance of these model architectures is poor at the low SNR of -6 dB, and increasing the number of LSTM layers does not ensure a certain performance improvement. This is mainly because the severe distortion of the signals makes it more difficult for these models to extract powerful representations. Therefore, based on the aforementioned discussion, three CNN layers and two LSTM layers are chosen as our final model architecture in this paper.
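The per-SNR accuracy curves and confusion matrices discussed above (Figs. 2-5) can be reproduced from a trained model with a short evaluation loop. This is an illustrative sketch, assuming the trained `model`, the test arrays and `classes` list from the earlier sketches, a per-example `snr_test` array, and scikit-learn for the confusion matrix.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

y_pred = np.argmax(model.predict([x_iq_test, x_ap_test]), axis=1)
y_true = np.argmax(y_test, axis=1)

for snr in range(-6, 20, 2):                 # the SNR range reported in the experiments
    sel = snr_test == snr
    acc = np.mean(y_pred[sel] == y_true[sel])
    cm = confusion_matrix(y_true[sel], y_pred[sel], labels=range(len(classes)))
    cm = cm / np.maximum(cm.sum(axis=1, keepdims=True), 1)   # row-normalize, as in Fig. 5
    print(f"SNR {snr:+d} dB: accuracy {acc:.3f}")
```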
To maximize the accuracy and optimize the training time, the sensitive deep learning parameters, including the learning rate and batch size, are considered. Figs. 7 and 8 display the impact of learning rate and batch size on classification accuracy under different SNRs.

Fig. 7. Performance of the proposed method with different learning rates versus different SNRs.

Fig. 7 illustrates the performance of the proposed method at different SNRs when the learning rate is changed. It can be seen that, in the off-line training phase with the limited 400 epochs, the proposed method with a learning rate of 0.0001 has the highest accuracy, and the performance decreases at a learning rate of 0.001 or 0.00001. This is because a low learning rate (such as 0.00001) requires many iterations and much training, which leads to slow convergence and may not reach the optimal solution after reaching a sub-optimal solution. Similarly, a high learning rate (such as 0.001) results in quick convergence, but it is possible to skip the optimal solution, especially in the case of low SNRs. Thus, increasing the learning rate will reduce the classification accuracy. Therefore, the learning rate of 0.0001 is selected, and the batch size is analyzed below for this case.

Fig. 8. The sensitivity of the proposed method versus batch sizes.

Fig. 8 displays the average classification accuracy of the proposed method under different batch sizes. It can be observed that all batch sizes can eventually ensure convergence to the optimal performance within 400 epochs. As the batch size increases, more epochs of training are required to achieve convergence. Considering the total time cost (epochs × training time per epoch), the batch size is set to 64 in our proposed method. The main reason is that small batch sizes (like 32) lead to quick convergence but require more time in each epoch (120 × 132 s), while large batch sizes (like 128 or 256) do just the opposite (250 × 49 s or 390 × 36 s). The trade-off is most favorable (160 × 75 s) only when the batch size is equal to 64, so this configuration is chosen for the proposed method.

Next, the run time is estimated by using Python on an NVIDIA DGX server with an Intel E5 CPU at 2.20 GHz, a V100 GPU, and 32 GB of video memory. The average training and testing run time per instance of several approaches is demonstrated in Fig. 9. It can be noticed that the training and testing times of all methods are on the order of milliseconds, and the differences are small. Our method is slightly more time-consuming than the other methods, mainly because the feature interaction consumes a little time. But for most communication systems, such a time demand can be tolerated. With the rapid development of graphics processing unit (GPU) technology and the continuous evolution of DL, it is believed that the feasibility of the proposed method will be further improved.

Fig. 9. Comparison of run time of different methods.

Finally, the flexibility of the proposed method is explored on the recognition of OFDM digital modulation. In the verification experiment, the commonly used digital modulation types for OFDM are selected, including BPSK, QPSK, 8PSK, QAM16, QAM64 and PAM4. Following the source generation, signal modulation, channel simulation and data standardization of prior work on OFDM signal recognition, a dataset of OFDM signals is produced. Specifically, the OFDM signal contains 16 subcarriers, 6 symbols per subcarrier, a cyclic prefix of length 2, and 256 fast Fourier transform points. Furthermore, the channel used is a Rician fading channel with a Rician K-factor of 4, which is corrupted by AWGN.
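The OFDM verification set itself comes from the external generation procedure the paper cites, which is not reproduced in this transcript. Purely as an illustration of the stated parameters (16 active subcarriers, 6 OFDM symbols, a length-2 cyclic prefix, a 256-point FFT, and a Rician K = 4 channel with AWGN), a NumPy sketch of one burst might look as follows; the subcarrier placement, QPSK mapping, and single-tap fading model are assumptions, not the authors' generator.

```python
import numpy as np

rng = np.random.default_rng(1)
N_FFT, N_SC, N_SYM, CP, K_FACTOR = 256, 16, 6, 2, 4   # parameters stated in the text

def ofdm_burst(sym_grid):
    """Build one OFDM burst from an (N_SYM, N_SC) grid of modulation symbols."""
    grid = np.zeros((N_SYM, N_FFT), dtype=complex)
    grid[:, :N_SC] = sym_grid                          # active subcarriers (placement assumed)
    time = np.fft.ifft(grid, n=N_FFT, axis=1)          # per-symbol IFFT
    with_cp = np.concatenate([time[:, -CP:], time], axis=1)   # prepend length-2 cyclic prefix
    return with_cp.reshape(-1)

qpsk = np.exp(1j * (np.pi / 4 + np.pi / 2 * rng.integers(0, 4, size=(N_SYM, N_SC))))
tx = ofdm_burst(qpsk)

# Single-tap Rician fading (K = 4) followed by AWGN at a chosen SNR.
los = np.sqrt(K_FACTOR / (K_FACTOR + 1))
nlos = np.sqrt(1 / (2 * (K_FACTOR + 1))) * (rng.standard_normal() + 1j * rng.standard_normal())
h = los + nlos
snr_db = 10
noise_power = np.mean(np.abs(h * tx) ** 2) / 10 ** (snr_db / 10)
noise = np.sqrt(noise_power / 2) * (rng.standard_normal(tx.shape) + 1j * rng.standard_normal(tx.shape))
rx = h * tx + noise
```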
The performance of each modulation classification is shown in Figs. 10 and 11.

Fig. 10. Performance of common OFDM digital modulation types at different SNRs.
Fig. 11. Confusion matrices for the proposed method under common OFDM digital modulation types at different SNRs. (a) SNR = 18 dB. (b) SNR = 2 dB. (c) SNR = −6 dB.

In Fig. 10, it can be observed that the classification accuracy gradually improves as the SNR increases. Clearly, OFDM-based digital modulation signals at low SNRs are difficult to identify accurately. At high SNRs, the recognition accuracy of BPSK and PAM4 can be close to 100%, while the recognition accuracy of 8PSK, QAM16 and QAM64 is approximately 75%, and the recognition accuracy of QPSK is about 65%. This result indicates that the proposed method can effectively identify most digital modulation types of OFDM at high SNRs.

In Fig. 11, the confusion matrices of common OFDM digital modulation types under SNRs of 18 dB, 2 dB and -6 dB are given. These confusion matrices show that QPSK and 8PSK are easily confused, and QAM64 is easily misjudged as QAM16. This is mainly because the proposed method cannot fully extract the characteristics of the OFDM signal after the inverse fast Fourier transform. However, it can accurately classify the phase modulation, amplitude modulation, and pulse modulation of OFDM, showing its effectiveness in classifying modulation types.

V. CONCLUSION

In this paper, an AMC method using a CNN-LSTM based dual-stream structure, which takes the feature interaction into account and combines the advantages of CNN and LSTM, has been proposed. Specifically, the signals are preprocessed into the temporal I/Q format and the A/P representation, which is beneficial for obtaining more effective features for classification. Then, CNN and LSTM are fully combined (denoted as CNN-LSTM) to extract the spatial and temporal features from each signal representation, which takes full advantage of CNN in spatial feature extraction and of LSTM in processing time series data. Most importantly, an effective operation has been performed to strengthen the interactions between the different features extracted from the two representations to further improve performance. Finally, the effectiveness of the proposed method has been discussed under various SNRs. The experimental results have also revealed that the combination of CNN and LSTM, along with the interaction of features learned from different representations, can achieve a significant improvement compared with the existing state-of-the-art methods.

In the future, the study could be continued in the following aspects. On the one hand, this paper mainly focuses on the classification of a single source. However, there usually exist situations in which multiple modulation signals overlap in non-cooperative communication. Therefore, it is of great significance to explore AMC for overlapped sources. On the other hand, there is almost no a priori information about the received signals in actual communication. Thus, it is essential to develop more effective AMC methods for unknown channel conditions and modulation parameters. Additionally, OFDM is widely used in practice, but the classification accuracy of the proposed method for OFDM still needs to be improved, and it is necessary to extend the proposed method to OFDM in future work. Regarding the modulation identification of OFDM, a feasible solution is to add a separate carrier estimation module and then use the proposed method to identify the modulation of a single carrier. Of course, how to deal with the above issues will be the focus of our next work.
REFERENCES

B. Ramkumar, "Automatic modulation classification for cognitive radios using cyclic feature detection," IEEE Circ. Syst. Mag., vol. 9, no. 2, pp. 27–45, Jun. 2009.
J. L. Xu, W. Su, and M. C. Zhou, "Software-defined radio equipped with rapid modulation recognition," IEEE Trans. Veh. Technol., vol. 59, no. 4, pp. 1659–1667, Feb. 2010.
O. A. Dobre, A. Abdi, Y. Bar-Ness, and W. Su, "Survey of automatic modulation classification techniques: Classical approaches and new trends," IET Commun., vol. 1, no. 2, pp. 137–156, Apr. 2007.
F. Hameed, O. A. Dobre, and D. Popescu, "On the likelihood-based approach to modulation classification," IEEE Trans. Wireless Commun., vol. 8, no. 12, pp. 5884–5892, Dec. 2009.
J. L. Xu, W. Su, and M. C. Zhou, "Likelihood ratio approaches to automatic modulation classification," IEEE Trans. Syst. Man Cybern. Part C, vol. 41, no. 4, pp. 455–469, Jul. 2011.
J. Zheng and Y. Lv, "Likelihood-based automatic modulation classification in OFDM with index modulation," IEEE Trans. Veh. Technol., vol. 67, no. 9, pp. 8192–8204, May 2018.
H. C. Wu, M. Saquib, and Z. Yun, "Novel automatic modulation classification using cumulant features for communications via multipath channels," IEEE Trans. Wireless Commun., vol. 7, no. 8, pp. 3098–3105, Aug. 2008.
W. Su, "Feature space analysis of modulation classification using very high order statistics," IEEE Commun. Lett., vol. 17, no. 9, pp. 1688–1691, Aug. 2013.
D. Boutte and B. Santhanam, "A hybrid ICA-SVM approach to continuous phase modulation recognition," IEEE Signal Proc. Lett., vol. 16, no. 5, pp. 402–405, May 2009.
L. Xie and Q. Wan, "Cyclic feature based modulation recognition using compressive sensing," IEEE Wireless Commun. Lett., vol. 6, no. 3, pp. 402–405, Apr. 2017.
L. Han, F. Gao, Z. Li, and O. Dobre, "Low complexity automatic modulation classification based on order-statistics," IEEE Trans. Wireless Commun., vol. 16, no. 1, pp. 400–411, Jan. 2017.
S. Huang, Y. Yao, Z. Wei, Z. Feng, and P. Zhang, "Automatic modulation classification of overlapped sources using multiple cumulants," IEEE Trans. Veh. Technol., vol. 66, no. 7, pp. 6089–6101, Jul. 2017.
L. Breiman, "Random forests," Mach. Learn., vol. 45, no. 1, pp. 5–32, Oct. 2001.
T. Cover and P. Hart, "Nearest neighbor pattern classification," IEEE Trans. Informat. Theory, vol. 13, no. 1, pp. 21–27, Jan. 1967.
N. Friedman, D. Geiger, and M. Goldszmidt, "Bayesian network classifiers," Mach. Learn., vol. 29, no. 2-3, pp. 131–163, Nov. 1997.
C. Cortes and V. Vapnik, "Support-vector network," Mach. Learn., vol. 20, no. 3, pp. 273–297, Sep. 1995.
O. Russakovsky et al., "ImageNet large scale visual recognition challenge," Int. J. Comput. Vis., vol. 115, no. 3, pp. 211–252, Jan. 2015.
Z. Zhang, Y. Zou, and C. Gan, "Textual sentiment analysis via three different attention convolutional neural networks and cross-modality consistent regression," Neurocomputing, vol. 275, pp. 1407–1415, Jan. 2018.
C. Gan, L. Wang, Z. Zhang, and Z. Wang, "Sparse attention based separable dilated convolutional neural network for targeted sentiment analysis," Knowl.-Based Syst., vol. 188, no. 1, pp. 1–10, Jan. 2020.
Y. Tokozume and H. Tatsuya, "Learning environmental sounds with end-to-end convolutional neural network," in Proc. IEEE Int. Conf. Acoust., Speech Signal Process., Mar. 2017, pp. 2721–2725.
C. Luo, K. Zhang, S. Salinas, and P. Li, "SecFact: Secure large-scale QR and LU factorizations," IEEE Trans. Big Data, early access, Dec. 13, 2017, doi: 10.1109/TBDATA.2017.2782809.
C. Luo, J. Ji, Q. Wang, X. Chen, and P. Li, "Channel state information prediction for 5G wireless communications: A deep learning approach," IEEE Trans. Netw. Sci. Eng., vol. 7, no. 1, pp. 227–236, Jan.–Mar. 2020.
T. J. O'Shea, J. Corgan, and T. C. Clancy, "Convolutional radio modulation recognition networks," in Proc. Int. Conf. Eng. Appl. Neural Netw., 2016, pp. 213–226.
M. Kulin, T. Kazaz, I. Moerman, and E. D. Poorter, "End-to-end learning from spectrum data: A deep learning approach for wireless signal identification in spectrum monitoring applications," IEEE Access, vol. 6, pp. 18484–18501, Mar. 2018.
K. Yashashwi, A. Sethi, and P. Chaporkar, "A learnable distortion correction module for modulation recognition," IEEE Wireless Commun. Lett., vol. 8, no. 1, pp. 77–80, Feb. 2019.
D. Wang et al., "Modulation format recognition and OSNR estimation using CNN-based deep learning," IEEE Photon. Technol. Lett., vol. 29, no. 19, pp. 1667–1670, Aug. 2017.
R. Li, L. Li, S. Yang, and S. Li, "Robust automated VHF modulation recognition based on deep convolutional neural networks," IEEE Commun. Lett., vol. 22, no. 5, pp. 946–949, Feb. 2018.
S. Peng et al., "Modulation classification based on signal constellation diagrams and deep learning," IEEE Trans. Neural Netw. Learn. Syst., vol. 30, no. 3, pp. 1–10, Jul. 2018.
Z. Zhang, C. Wang, C. Gan, S. Sun, and M. Wang, "Automatic modulation classification using convolutional neural network with features fusion of SPWVD and BJD," IEEE Trans. Signal Inf. Proc., vol. 5, no. 3, pp. 469–478, Sep. 2019.
S. Rajendran, W. Meert, D. Giustiniano, V. Lenders, and S. Pollin, "Deep learning models for wireless signal classification with distributed low-cost spectrum sensors," IEEE Trans. Cognit. Commun. Netw., vol. 4, no. 3, pp. 433–445, May 2018.
N. E. West and T. O'Shea, "Deep architectures for modulation recognition," in Proc. IEEE Int. Symp. Dyn. Spectr. Access Netw., 2017, pp. 1–6.
A. Selim, F. Paisana, J. A. Arokkiam, Y. Zhang, L. Doyle, and L. A. Dasilva, "Spectrum monitoring for radar bands using deep convolutional neural networks," in Proc. IEEE Global Commun. Conf., 2017, pp. 1–6.
Y. LeCun et al., "Backpropagation applied to handwritten zip code recognition," Neural Comput., vol. 1, no. 4, pp. 541–551, Dec. 1989.
F. A. Gers, J. Schmidhuber, and F. A. Cummins, "Learning to forget: Continual prediction with LSTM," Neural Comput., vol. 12, no. 10, pp. 2451–2471, Oct. 2000.
T. Y. Lin, A. Roychowdhury, and S. Maji, "Bilinear convolutional neural networks for fine-grained visual recognition," IEEE Trans. Pattern Anal., vol. 40, no. 6, pp. 1309–1322, Jun. 2018.
D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," 2014, arXiv:1412.6980.
T. J. O'Shea and N. West, "Radio machine learning dataset generation with GNU Radio," in Proc. GNU Radio Conf., 2016, pp. 1–6.
S. Hong et al., "Convolutional neural network aided signal modulation recognition in OFDM systems," in Proc. IEEE 91st Veh. Technol. Conf. (VTC2020-Spring), 2020, pp. 1–5.

Zufan Zhang received the B.E. and M.E. degrees from the Chongqing University of Posts and Telecommunications (CQUPT), Chongqing, China, in 1995 and 2000, respectively, and the Ph.D. degree in communications and information systems from the University of Electronic Science and Technology of China (UESTC), Chengdu, China, in 2007. He is currently a Professor with the School of Communication and Information Engineering, CQUPT. He was a Visiting Professor with the Centre for Wireless Communications (CWC), University of Oulu, Finland, from 2011 to 2012. His current main research interests include wireless and mobile communication networks.

Hao Luo received the B.E. degree in 2018 from the Chongqing University of Posts and Telecommunications, Chongqing, China, where he is currently working toward the master's degree in information and communication engineering. His research interests include deep learning and wireless communication.

Chun Wang received the B.E. degree in 2016 from the Chongqing University of Posts and Telecommunications, Chongqing, China, where he is currently working toward the master's degree in information and communication engineering. His research interests include mmWave mobile communications and machine learning.

Chenquan Gan received the B.S. degree from the Department of Mathematics, Inner Mongolia Normal University, Huhhot, China, in 2010, and the Ph.D. degree from the Department of Computer Science, Chongqing University, Chongqing, China, in 2015. He is currently an Associate Professor with the School of Communication and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing, China. His research interests include difference equations, computer virus propagation dynamics, deep learning, and blockchain.

Yong Xiang (Senior Member, IEEE) received the Ph.D. degree in electrical and electronic engineering from The University of Melbourne, Melbourne, VIC, Australia. He is currently a Professor with the School of Information Technology, Deakin University, Melbourne, VIC, Australia. His research interests include information security and privacy, signal and image processing, data analytics and machine intelligence, Internet of Things, and blockchain. He has authored or coauthored five monographs, more than 150 refereed journal articles, and numerous conference papers in these areas. He is the Senior Area Editor of the IEEE Signal Processing Letters, an Associate Editor of the IEEE Communications Surveys and Tutorials, and a Guest Editor of the IEEE Transactions on Industrial Informatics. He was an Associate Editor of the IEEE Signal Processing Letters and IEEE Access, and a Guest Editor of IEEE MultiMedia. He has served as Honorary Chair, General Chair, Program Chair, TPC Chair, Symposium Chair, and Track Chair for many conferences, and was invited to give keynotes at a number of international conferences.