Intrusion detection and traffic classification using application-aware traffic profiles

Universidade de Aveiro, Departamento de Electrónica, Telecomunicações e Informática. 2017.

Hassan Alizadeh

Deteção de Intrusões e Classificação de Tráfego usando Perfis de Tráfego de Aplicações

Intrusion Detection and Traffic Classification using Application-aware Traffic Profiles

Thesis submitted to the Universities of Minho, Aveiro and Porto in fulfilment of the requirements for the degree of Doctor in Electrical Engineering (Engenharia Eletrotécnica) within the MAP-tele doctoral programme, carried out under the scientific supervision of Doctor André Zúquete, Assistant Professor (Professor Auxiliar) at the Departamento de Eletrónica, Telecomunicações e Informática of the Universidade de Aveiro.

This research was funded by National Funds through FCT - Fundação para a Ciência e a Tecnologia, in the context of the PhD scholarship SFRH/BD/84037/2012.

To Samaneh for her love, continuous support and encouragement.

The jury (o júri)

President: Prof. Doutor Carlos Alberto Diogo Soares Borrego, Full Professor (Professor Catedrático), Universidade de Aveiro

Examiners committee:
Prof. Doutor Rui Jorge Morais Tomaz Valadas, Full Professor (Professor Catedrático), Universidade de Lisboa
Prof. Doutor Paulo Jorge Salvador Serra Ferreira, Assistant Professor (Professor Auxiliar), Universidade de Aveiro
Prof. Doutor Alexandre Júlio Teixeira dos Santos, Associate Professor with Habilitation (Professor Associado com Agregação), Universidade do Minho
Prof. Doutor Miguel Nuno Dias Alves Pupo Correia, Associate Professor (Professor Associado), Universidade de Lisboa
Prof. Doutor André Ventura da Cruz Marnoto Zúquete (member and advisor), Assistant Professor (Professor Auxiliar), Universidade de Aveiro

Acknowledgements
First and foremost, I am very grateful to my advisor, Professor André Zúquete, for sharing his knowledge, time, encouragement, and support through every stage of this journey. I am indeed fortunate to have had him as my advisor.

I am very thankful to Professor Luis Almeida for his valuable support during the preparation of this thesis, which will always be remembered with gratitude. Thank you so much.

Next, I must thank all the people who have created such a warm and friendly atmosphere at IEETA and IT Porto.

At Ryerson University, I had the honour to work with Professor Ali Miri. Thank you so much for your valuable suggestions and fruitful discussions.

Thanks to the MAP-Tele doctoral programme for giving me the chance to meet such wonderful people from around the world, who helped me grow not only professionally but also on a personal level.

I would like to express my deepest love to my wife Samaneh, whose love, care, support and encouragement have always been there for me. Thank you for being there in my life.

I express my respect and thanks to my parents for their blessings and prayers for me.

Above all, countless thanks to God for all the blessings bestowed upon me.

Palavras-chave (keywords)

Applications’ traffic profiles, Intrusion detection, Network traffic classification, Anomaly detection, Gaussian Mixture Model, Universal Background Model.

Resumo

In parallel with the growing number of applications and end-users, online attacks on the Internet and advanced generations of malware have proliferated continuously. Many studies have addressed intrusion detection by inspecting aggregated network traffic, without knowledge of the responsible applications/services. Such systems can detect abnormal traffic, but fail to detect intrusions in applications whenever their abnormal traffic fits the network normality profiles. Moreover, they cannot identify the intrusion-infected applications that are responsible for the abnormal traffic.

This work addresses the detection of intrusions in applications when their traffic exhibits anomalies. To do so, we need to: (1) bind traffic to applications; (2) have per-application traffic profiles; and (3) detect deviations from the profiles given a set of traffic samples. The first requirement was addressed in our previous work. Assuming that this binding is available, the work of this thesis addresses the last two topics in detecting abnormal traffic and thereby identifying its source application (possibly infected by malware).

Applications’ traffic profiles are not a new concept, since researchers in the area of Traffic Identification and Classification (TIC) use them in their systems to identify and categorize traffic samples by application type (types of interest). However, they do not seem to have received much attention in the scope of intrusion detection systems (IDS). Thus, we first provide a survey of TIC strategies, within a taxonomic framework, focusing on how existing TIC techniques could help us deal with applications’ traffic profiles. As a result of this study, we found that most TIC methodologies rely on some (well-known) statistical assumptions extracted from different traffic sources and use machine learning techniques to build the models (profiles) for recognizing either types of interest or application protocols. Furthermore, the traffic classification literature has analysed some traffic sources (for example, the first packets of flows and multiple sub-flows) that do not seem to have received much attention in the scope of IDS. An IDS can take advantage of these traffic sources to provide timely detection of intrusions before they propagate their infected traffic.

First, we used conventional Gaussian mixture models (GMMs) to build per-application profiles. No prior information about the data distribution of each application was available. Despite the improvement in performance, stability with high-dimensional data and the calibration of a suitable threshold for intrusion detection remained a problem. Consequently, we improved the detection framework by introducing the universal background model (UBM) to make the learning of each application-specific model more robust. The modelling approaches we propose can also be used in traffic classification scenarios, where the goal is to assign each specific flow to an application (type of interest).

The proposed anomaly detection systems are based on class-specific and global thresholding mechanisms, in which a threshold is set at the Equal Error Rate (EER) operating point to determine whether a flow claimed by an application is genuine.

We also investigate the suitability of the proposed approaches with only a few initial packets of a traffic flow, in order to provide a more efficient and timely detection system.

To evaluate the effectiveness of the approaches taken, we carried out several tests with multiple public datasets collected in real networks. In the numerous experiments reported, evidence of the effectiveness of the proposed approaches is provided.

Keywords

Applications’ traffic profiles, Intrusion detection, Network traffic classification, Anomaly detection, Gaussian Mixture Model, Universal Background Model.

Abstract

Along with the ever-growing number of applications and end-users, online network attacks and advanced generations of malware have continuously proliferated.
Many studies have addressed the issue of intrusion detection by inspecting aggregated network traffic with no knowledge of the responsible applications/services. Such systems may detect abnormal traffic, but fail to detect intrusions in applications whenever their abnormal traffic fits into the network normality profiles. Moreover, they cannot identify the intrusion-infected applications responsible for the abnormal traffic.

This work addresses the detection of intrusions in applications when their traffic exhibits anomalies. To do so, we need to: (1) bind traffic to applications; (2) have per-application traffic profiles; and (3) detect deviations from profiles given a set of traffic samples. The first requirement has been addressed in our previous work. Assuming that such binding is available, this thesis addresses the last two requirements in order to detect abnormal traffic and thereby identify its source (possibly malware-infected) application.

Applications’ traffic profiles are not a new concept, since researchers in the field of Traffic Identification and Classification (TIC) use them as a baseline of their systems to identify and categorize traffic samples by application (types-of-interest). However, they do not seem to have received much attention in the scope of intrusion detection systems (IDS). We first provide a survey of TIC strategies, within a taxonomy framework, focusing on how the surveyed TIC techniques could help us build applications’ traffic profiles. As a result of this study, we found that most TIC methodologies are based on some (well-known) statistical assumptions extracted from different traffic sources and make use of machine learning techniques to build models (profiles) for the recognition of either application types-of-interest or application-layer protocols. Moreover, the traffic classification literature has examined some traffic sources (e.g. the first few packets of flows and multiple sub-flows) that do not seem to have received much attention in the scope of IDS research. An IDS can take advantage of such traffic sources in order to provide timely detection of intrusions before they propagate their infected traffic.

First, we utilize conventional Gaussian Mixture Models (GMMs) to build per-application profiles. No prior information on the data distribution of each application is available. Despite the improvement in performance, stability with high-dimensional data and the calibration of a proper threshold for intrusion detection remain the main concerns. Therefore, we improve the framework by introducing the universal background model (UBM) to robustly learn application-specific models.

The proposed anomaly detection systems are based on class-specific and global thresholding mechanisms, where a threshold is set at the Equal Error Rate (EER) operating point to determine whether a flow claimed by an application is genuine.

Our proposed modelling approaches can also be used in a traffic classification scenario, where the aim is to assign each specific flow to an application (type-of-interest).

We also investigate the suitability of the proposed approaches with just a few initial packets from a traffic flow, in order to provide a more efficient and timely detection system.

Several tests are conducted on multiple public datasets collected from real networks. In the numerous experiments that are reported, evidence of the effectiveness of the proposed approaches is provided.

Contents

1 Introduction
    1.1 Motivation and Problem Definition
    1.2 Objectives
    1.3 Main Contributions
    1.4 Document Structure
    1.5 Main Achievements
    References
2 Traffic Classification for Managing Applications’ Networking Profiles
    2.1 Introduction
    2.2 A Taxonomy of TIC Methodologies
        2.2.1 Classification types
        2.2.2 Level of traffic classification
        2.2.3 Performance Evaluation
        2.2.4 Real-time
        2.2.5 Robustness against evasion
        2.2.6 Robustness against encryption
        2.2.7 Portability
    2.3 State of the Art
        2.3.1 Packet matching Methods
        2.3.2 Statistical-Based
        2.3.3 Connection Patterns
    2.4 Overview of Survey on TIC
    2.5 Discussion
    2.6 A Combination of Methods
        2.6.1 The need of a combination method
        2.6.2 A blend of real-time approaches
        2.6.3 A possible detection system
    2.7 Wrap Up
    References
3 Tools, Datasets and Performance Criteria
    3.1 Features Extraction Tools
    3.2 Description of the datasets used
        3.2.1 UNIBS Dataset
        3.2.2 Measurement Dataset
    3.3 Flow Definition and Feature Extraction
    3.4 Performance Criteria
        3.4.1 Evaluation Metrics in Traffic Classification
        3.4.2 Evaluation Metrics in Intrusion Detection
    3.5 Wrap up
    References
4 Using Classic Learning of Gaussian Mixture Model
    4.1 Model Design
        4.1.1 Why GMM?
        4.1.2 Model Description
        4.1.3 Traffic Classification
        4.1.4 Application-aware traffic anomaly detection
    4.2 Experimental Methodology
        4.2.1 Verification Setup and Results
        4.2.2 Classification Setup and Results
            4.2.2.1 Classification Results
    4.3 Wrap up
    References
5 Using an Unsupervised Learning of Gaussian Mixture Models
    5.1 Problem Definition
    5.2 Model Description
    5.3 GMM Parameter Estimations
        5.3.1 The EM algorithm
        5.3.2 Estimating the number of components
    5.4 Traffic Classification and Verification
        5.4.1 Traffic Classification
        5.4.2 Traffic Verification
    5.5 Experimental Results
        5.5.1 Building Models
            5.5.1.1 Traffic Verification Results
            5.5.1.2 Traffic Classification Results
            5.5.1.3 On Measurement dataset
            5.5.1.4 On UNIBS dataset
    5.6 Wrap Up
    References
6 Using Universal Background Model
    6.1 Universal Background Model
        6.1.1 Hypothesis Modeling
        6.1.2 UBM Parameter Estimation
        6.1.3 Application Specific Model
        6.1.4 Decision Making
    6.2 Experimental Results
        6.2.1 Verification Results
        6.2.2 Classification Results
    6.3 Wrap Up
    References
7 Timely Traffic Classification and Verification using uGMM
    7.1 Experimental Methodology
        7.1.1 Flow Definition and Feature Extraction
        7.1.2 Feature Selection
        7.1.3 Model design setup
        7.1.4 Traffic Classification Experiments
        7.1.5 Traffic Verification Experiments
    7.2 Experimental Results
        7.2.1 Traffic Classification Results
        7.2.2 Traffic Verification Results
    7.3 Wrap Up
    References
8 Conclusions and Future Work
    References
Appendices
A Table of Features

List of Figures

3.1 Basic evaluation metrics. A good TIC method aims at minimizing FP and FN (the narrow ellipse). The broader ellipse contains more FP and FN and represents a poor TIC method.
3.2 Illustration of FAR and FRR for a given threshold τ over the genuine and impostor score distributions.
3.3 Illustration of FAR and FRR curves as a function of threshold settings. The decision threshold is set to the value corresponding to EER, where the curves of FAR and FRR intersect each other.
3.4 An illustration of the system performance using a detection error tradeoff (DET) curve. The closeness of the DET curve to the origin is measured by the EER, the point where FRR = FAR.
4.1 The steps for training per-application GMMs to be used in both traffic verification and classification systems.
4.2 A matrix of scatter plots for 5,000 randomly selected flow samples from the Measurement dataset, illustrating the distinguishing capability of the reduced flow feature subset obtained by the SU feature selection method.
4.3 Impact of the number of GMM components on the verification results for the Measurement dataset.
4.4 Effect of the number of GMM components on the verification results for the UNIBS dataset.
4.5 Effect of the number of per-class GMM components on the results of the classification experiment performed on the Measurement dataset by Flow (a), Packet (b) and Byte (c).
4.6 Impact of the number of per-class GMM components on the results of the classification experiment performed on the UNIBS dataset by Flow (a), Packet (b) and Byte (c).
6.1 The block diagram of the proposed system.
6.2 Impact of the number of mixture components on the results of the verification experiment performed on the Measurement dataset.
6.3 Impact of the number of mixture components on the results of the verification experiment performed on the UNIBS dataset.
6.4 Impact of the number of mixture components on the results of the classification experiment performed on the Measurement dataset by Flow (a), Packet (b) and Byte (c) after removing the few Elephant flows specified in Table 4.5.
6.5 Impact of the number of mixture components on the results of the classification experiment performed on the UNIBS dataset by Flow (a), Packet (b) and Byte (c).
7.1 Illustration of the distinguishing capability of the reduced flow feature subset obtained by the SFS method using “Andrews plot” (top) and “Parallel coordinates plot” (bottom).
7.2 Assessment of the classifiers through their Overall Accuracy (left) and Average F-measure (right) by Flow (top row), Byte (middle row) and Packet (bottom row) for different RPC values. RPC refers to the total packet count in both forward and backward directions (ith packet) of a flow.
7.3 Average overall accuracy vs. size of training set.
7.4 HTER as a function of RPC for both the evaluation and test subsets using Global Threshold (GT) and Class-specific Thresholds (CST). These results were obtained using GT- and CST-dependent feature subsets shown in the 2nd and 3rd columns of Table 7.5, respectively for (a) and (b).

List of Tables

2.1 Assessment of TIC methods.
2.2 Examples of features describing source in a full-flow [12].
2.3 Strengths, weaknesses and challenges of real-time TIC methods for developing per-applications’ profiles.
3.1 The Measurement dataset: TCP flows/packets/bytes break-down by applications.
3.2 The UNIBS dataset: TCP flows/packets/bytes break-down by application categories for each day and all days, alongside examples of protocols allocated to each category.
4.1 The first 9 most discriminative features derived using the SU algorithm performed on the training subset of the Measurement dataset.
4.2 The first 9 most discriminative features derived using the SU algorithm performed on the training subset of the UNIBS dataset.
4.3 Results obtained by the global thresholding technique (GT) for different applications in the Measurement dataset. 16-component GMMs were used to build per-application profiles.
4.4 Results obtained by the class-specific threshold technique for different applications in the UNIBS dataset. Each application category was trained using a 16-component GMM.
4.5 Count of Elephant flows (observed in the test subset) for each application along with the percentage of Packets and Bytes they carry for that application.
4.6 Flow-based assessment, in terms of Average and Per-class F-Score along with Overall Accuracy, of the classifiers on Measurement dataset (the best and second best results per application are shown in bold and red, respectively).
4.7 Packet-based assessment.
4.8 Byte-based assessment.
4.9 Flow-based assessment, in terms of Average and Per-class F-Score along with Overall Accuracy, of the classifiers on Measurement dataset (the best and second best results are shown in bold and red, respectively).
4.10 Packet-based assessment, in terms of Average and Per-class F-Score along with Overall Accuracy, of the classifiers on UNIBS dataset (the best and second best results are shown in bold and red, respectively).
4.11 Byte-based assessment, in terms of Average and Per-class F-Score along with Overall Accuracy, of the classifiers on UNIBS dataset (the best and second best results are shown in bold and red, respectively).
5.1 Number of mixture components after training uGMM for each application on Measurement dataset.
5.2 Number of mixture components for each application category on UNIBS dataset.
5.3 Comparison of GT and CST techniques performance results on Evaluation and Test subsets of Measurement dataset.
5.4 Comparison of GT and CST techniques performance results on Evaluation and Test subsets of UNIBS dataset.
5.5 Comparison of cGMM and uGMM approaches on HTER of Evaluation and Test subsets in Measurement dataset.
5.6 Comparison of cGMM and uGMM approaches on HTER of Evaluation and Test subsets in UNIBS dataset.
5.7 Flow-based assessment of cGMM and uGMM classification approaches on Measurement.
5.8 Packet-based assessment of cGMM and uGMM classification approaches on Measurement.
5.9 Byte-based assessment of cGMM and uGMM classification approaches on Measurement.
5.10 Flow-based assessment of cGMM and uGMM classification approaches on UNIBS.
5.11 Packet-based assessment of cGMM and uGMM classification approaches on UNIBS.
5.12 Byte-based assessment of cGMM and uGMM classification approaches on UNIBS.
6.1 Verification results obtained by the GT for different applications in Measurement dataset. A mixture of 512 components was used to build the UBM and per-application models.
6.2 Comparison between GMM-UBM and uGMM on the results (HTER) of the verification experiment performed on Evaluation and Test subsets of Measurement dataset.
6.3 Verification results obtained by the CST thresholding technique for different applications in UNIBS dataset. A mixture of 512 components was used to build the UBM and per-application models.
6.4 Comparison between GMM-UBM (with 512 components) and uGMM on the results of the verification experiment performed on Evaluation and Test subsets of UNIBS dataset.
6.5 Flow-based assessment of uGMM and GMM-UBM (with 128 components) classification approaches with the Measurement dataset.
6.6 Packet-based assessment of uGMM and GMM-UBM (with 128 components) classification approaches with the Measurement dataset.
6.7 Byte-based assessment of uGMM and GMM-UBM (with 128 components) classification approaches with the Measurement dataset.
6.8 Flow-based assessment of uGMM and GMM-UBM classification approaches on UNIBS dataset.
6.9 Packet-based assessment of uGMM and GMM-UBM classification approaches on UNIBS dataset.
6.10 Byte-based assessment of uGMM and GMM-UBM classification approaches on UNIBS dataset.
7.1 The best feature subsets obtained for each dataset (RPC) using the SFS method applied in the proposed uGMM Traffic Classification approach.
7.2 Flow-based assessment of the classifiers through Per-class F-measure and Overall Accuracy. The feature set was extracted from the 9th packet of each flow (the best results are highlighted in bold).
7.3 Byte-based assessment of the classifiers through Per-class F-measure and Overall Accuracy. The feature set was extracted from the 9th packet of each flow (the best results are highlighted in bold).
7.4 Packet-based assessment of the classifiers through Per-class F-measure and Overall Accuracy. The feature set was extracted from the 9th packet of each flow (the best results are highlighted in bold).
7.5 The optimal feature subset selected by the SFS algorithm on the evaluation subset for each RPC when the minimization of EER was the target function for obtaining the optimal GT (2nd column) and CST (3rd column).
7.6 Class-specific HTER of GT and CST on the test subset when RPC is 8.
A.1 Table of Features.

Acronyms
ANIDS Anomaly-based NIDS APSM Application Specific Model BNM Bayesian Neural Networks CEM Component-wise Expectation Maximization cGMM Classic GMM CST Class-Specific Threshold DET Detection Error Trade-off DNS Domain Name System DPI Deep Packet Inspection DT Decision Tree EER Equal Error Rate EM Expectation Maximization FA False Acceptance FAR False Acceptance Rate FN False Negatives FP False Positives FR False Rejection FRR False Rejection Rate FTP File Transfer Protocol GMM Gaussian Mixture Models GT Global Threshold HMM Hidden Markov Model HTER Half Total Error Rate HTTP Hypertext Transfer Protocol IANA Internet Assigned Numbers Authority k-NN k-Nearest Neighbor LI Lawful Interception MAP Maximum A Posteriori ML-GMM Maximum Likelihood GMM MML Minimum Message Length NB Naive Bayes NBKE Naive Bayes with Kernel Estimation. xix. xx Acronyms. NGFW Next Generation FireWall NIDS Network Intrusion Detection System P2P Peer-to-Peer RBF Radial Basis Function RFC Reference Packet Count SFS Sequential Forward Selection SMTP Simple Mail Transfer Protocol SNIDS Signature-based NIDS SSH Secure Shell SSL Secure Sockets Layer SU Symmetrical Uncertainty SVM Support Vector Machines TAG Traffic Activity Graph TDG Traffic Dispersion Graph TIC Traffic Identification and Classification TLS Transport Layer Security TN True Negatives TP True Positives UBM Universal Background Model uGMM Unsupervised learning of GMM UTM Unified Threat Management. CHAPTER 1. Introduction. The world is increasingly getting more dependent on the Internet, which is being used by the already huge and continuously increasing number of persons through vulnerable machines. The worldwide, Internet-connected population has grown from less than 49 million in 1995 to an estimate of more than 3.4 billions in 2016 2. 
Such widespread use of the Internet, along with its consequent ubiquity and the development of information and communication technology, has encouraged companies to rely on an ever-growing number of Internet applications to offer services to their customers.

Besides companies, this huge and ever-growing Internet-connected population has also attracted network criminals and hackers, who exploit the Internet as a favourite platform for compromising vulnerable Internet-connected machines and performing illegal activities, such as stealing information or enslaving machines in botnets. Accordingly, online network attacks and advanced generations of malware are continuously emerging and rapidly proliferating. This means that Internet traffic has become more and more complex, and understanding the type of traffic flowing over networks remains the main issue for network managers.

Traffic Identification and Classification (TIC) and Network Intrusion Detection Systems (NIDS), two parallel research lines in the field of traffic monitoring and analysis, both aim at mitigating this problem in order to support a wide set of network management tasks such as resource allocation, accounting, traffic scheduling, quality of service, lawful interception (LI) of IP traffic, and network security.

Footnote 2: http://www.internetlivestats.com/internet-users/

Even though TIC, and particularly NIDS, are well-established research areas with long histories, there are still many open research problems. In this thesis, we study traffic classification techniques and intrusion detection systems, analyse their methodologies, and propose novel approaches that can be used for practical purposes.

1.1 Motivation and Problem Definition.

Most intrusions result in network activities. An effective strategy to detect those intrusions is to have robust techniques for inspecting network traffic.
Detection of intrusions from traffic inspection, as part of a NIDS, can be performed by matching observed traffic against a pre-configured set of (well-known) intrusion signatures (signature-based NIDS), or by noticing deviations from pre-defined models (profiles) describing normal traffic (behaviour-based or anomaly-based NIDS). Unlike signature-based NIDS (SNIDS), anomaly-based NIDS (ANIDS) may detect new (unknown) types of intrusions and tackle the so-called zero-day attacks, although adapting them is difficult because profiles change constantly. Due to the advantages of ANIDS over SNIDS regarding zero-day attacks, most existing research studies have focused on ANIDS.

Much of the research on ANIDS was devoted to building network normality profiles from aggregated network traffic to answer the following question:

Is this traffic sample normal?

In such scenarios the main challenge is that, given the enormous number of different traffic types of interest, building a well-defined profile encompassing all normal traffic is extremely difficult, which causes high false alarm rates. The general trend observed in the literature for reducing such rates is that “anomalies must be identified within each separate traffic class of interest” [2,3].

No matter which of the above-mentioned methods is adopted to detect a network anomaly (possibly caused by an intrusion), they fail to detect the source application responsible for the anomaly, since they cannot extract the source application from the inspected traffic.

Moreover, an application under the influence of a sophisticated intrusion may generate well-formed (normal) traffic of other applications present in the network. Such abnormality fits into the network normality profiles and cannot be detected, which contributes to an increased rate of false negatives (absence of alarms in the presence of real intrusions).
To detect this kind of abnormality, as well as to achieve better accuracy, a more fine-grained question must be answered:

Is this traffic sample normal for its source application?

This means that the process of anomaly detection should be applied separately to each application’s traffic. To this end, an ANIDS needs to:

1. Identify the claimant (source application) responsible for the traffic;
2. Have accurate traffic profiles (or models) for each genuine application present in the network; and
3. Detect deviations from applications’ traffic profiles given a set of traffic samples.

To meet the first requirement, the architectures presented in [4,5] provide a reliable binding between network traffic and source application. Such binding is marked on each outgoing packet of a surveyed host by extending its IP Option, indicating the “identity” of the claimed application responsible for the packet. By leveraging such a binding, or similar ones, it is possible to:

1. Simply check whether the claimed application exists in the list of applications expected to run in the surveyed host, which results in the identification of unwanted applications in the host once they generate network traffic.
2. Determine whether a traffic sample claimed by an application conforms to its expected traffic model. It should be noted that such a goal resembles a typical authentication or verification system in the field of biometrics, where the claimed identity of a person is confirmed or rejected by the use of his/her biometric characteristics.

Therefore, assuming that the claimant (application) is known to detection systems (the first requirement), the fulfilment of the two latter requirements, namely building accurate application-aware traffic models and forming a detection system, is one of the main objectives of this thesis.

1.2 Objectives.
The main goal of this thesis was to develop a set of effective application-aware traffic anomaly detection methodologies, which enable a network auditor to monitor the network traffic generated by each application in order to detect traffic anomalies (possibly caused by intrusions) in that application. The network auditor is assumed to have access to a reliable binding between network traffic and the source application responsible for it.

In the development of detection methodologies, we propose multiple learning algorithms for building application-aware traffic profiles.

The application-aware traffic profiles can also be used in the design of methodologies for traffic classification, which is the second main goal of this thesis. The aim of traffic classification is to assign each unseen traffic sample to an application (type of interest).

1.3 Main Contributions.

In this thesis, we address two main issues in the domain of network monitoring and security. We put forward a framework to effectively learn a network model for individual applications in order to:

1. Detect abnormal traffic generated by malware-infected applications.
2. Classify traffic to applications.

Initially, a survey on Traffic Identification and Classification (TIC) was conducted [1], a field in which applications’ traffic profiles have been widely used as a baseline to identify and categorize traffic flows by application (types-of-interest). Several TIC methodologies, within a proposed taxonomy framework, were studied and assessed in terms of their capabilities, limitations and challenges for use in the definition of applications’ traffic profiles aimed at detecting abnormal traffic (possibly caused by intrusions) in applications.

A framework was proposed for traffic inspection (both for traffic anomaly detection and traffic classification), where an individual model is learnt for each application (type of interest).
In the first step, the most popular group of generative models, the Gaussian Mixture Model (GMM), is employed to build per-application traffic models.

The framework was altered using an unsupervised learning of GMM (uGMM), where the number of components is unknown beforehand. In fact, the framework does not require prior knowledge of the distribution of the data and attempts to find and fit the most appropriate model.

The framework was further extended using generative models built by a Maximum a Posteriori (MAP) adaptation of the Universal Background Model (UBM). The adapted approaches enabled us to achieve accurate and stable decisions.

The uGMM framework was extended to provide efficient and timely decision-making. To this end, the effectiveness of different numbers of initial packets of traffic flows was explored.

1.4 Document Structure.

Chapter 2 reviews the background and related work relevant to this thesis. Chapter 3 introduces tools and datasets and describes the evaluation metrics used in this thesis. Our intrusion detection strategy and traffic classification using GMMs are presented in Chapter 4. We provide a detailed presentation of the uGMM algorithm in Chapter 5. Chapter 6 presents our GMM-UBM strategy to address the traffic inspection problems. A framework for time-effective decisions is detailed in Chapter 7. Finally, conclusions are drawn in Chapter 8.

1.5 Main Achievements.

This section lists the publications produced during the PhD period and reported in journals and conferences.

Papers in Journals.

1. Hassan Alizadeh, André Zúquete, Ali Miri. Timely Traffic Classification and Verification using an Unsupervised Learning GMM. Under review.
2. Hassan Alizadeh, André Zúquete. Traffic classification for managing applications’ networking profiles. Security and Communication Networks, vol. 9, no. 14, p. 2557-2575, September 2016.

Papers in Conferences.

1. Hassan Alizadeh, André Zúquete.
Anomaly-based intrusion detection using application-specific traffic profiles. Proceedings of 22nd RecPad, Aveiro, Portugal, October 2016.
2. Hassan Alizadeh, Abdolrahman Khoshrou, André Zúquete. Traffic Classification and Verification using Unsupervised Learning of Gaussian Mixture Models. 3rd IEEE International Workshop on Measurement & Networking (M&N 2015), Coimbra, Portugal, October 2015.
3. Hassan Alizadeh, Samaneh Khoshrou, André Zúquete. Application-Specific Traffic Anomaly Detection Using Universal Background Model. Proceedings of the 2015 ACM International Workshop on Security and Privacy Analytics (IWSPA ’15), p. 11-17, March 2015.
4. André Zúquete, Pedro Correia, Hassan Alizadeh. Packet Tagging System for Enhanced Traffic Profiling. 3rd IEEE Workshop on Collaborative Security Technology (CoSec 2011), Bangalore, India, December 2011.

References.

[1] Hassan Alizadeh and André Zúquete. Traffic classification for managing applications’ networking profiles. Security and Communication Networks, 9(14):2557–2575, 2016. SCN-14-0826.R3.
[2] Jack W Stokes, John C Platt, Joseph Kravis, and Michael Shilman. Aladin: Active learning of anomalies to detect intrusions. Technical Report WA 98052, 2008.
[3] A. Valdes. Detecting novel scans through pattern anomaly detection. In Proceedings of the DARPA Information Survivability Conference and Exposition, volume 1, pages 140–151, April 2003.
[4] André Zúquete, Pedro Correia, and Hassan Alizadeh. Packet tagging system for enhanced traffic profiling. In 5th IEEE International Conference on Internet Multimedia Systems Architecture and Application, pages 1–6. IEEE, 2011.
[5] André Zúquete and Miguel Rocha. Identification of source applications for enhanced traffic analysis and anomaly detection. In First IEEE International Workshop on Security and Forensics in Communication Systems, pages 6694–6698. IEEE, 2012.

CHAPTER 2.
Traffic Classification for Managing Applications’ Networking Profiles.

Chapter Overview.

The previous chapter raised the problem of detecting intrusions in applications when their traffic exhibits anomalies. It also emphasised that building traffic profiles for each application is the main challenge that needs to be addressed. The concept of applications’ traffic profiles is not new and has been researched in the field of Traffic Identification and Classification (TIC) to identify and categorize traffic flows by application (types-of-interest). This chapter provides a structured overview of the research on TIC methodologies, within a taxonomy framework, to find out how they could help us answer the following question: given a traffic sample generated by a particular application, does it conform to the expected traffic of that application? With this question in mind, after introducing the fundamental concepts of TIC and describing some of its key properties, the taxonomy framework categorizes existing methodologies into multiple groups and discusses the techniques under each group. Then, each group is assessed in terms of its capabilities, limitations and challenges for use in managing applications’ profiles.

2.1 Introduction.

Due to the constantly growing number of applications and the variety of end-users, Internet traffic is becoming increasingly complex and voluminous. Consequently, network operators continuously face network management issues, and understanding what type of traffic is flowing over a network is an important task of such network management.

Over the last decade, the area of TIC has accumulated a long record of work, in both research and industry, on identifying and categorizing traffic by application type.
The various methodologies developed may be used as a core part in several areas of network management, such as resource management (e.g. resource allocation, control, and accounting), IDS, traffic scheduling, quality of service, and lawful interception (LI) of IP traffic [13].

Traffic classification is the process of identifying or classifying network traffic as part of a specific application or a group of applications of interest. According to the TCP/IP model, network traffic can be grouped into one or more data flows. Each flow can be defined by a (unidirectional or bidirectional) sequence of IP packets sharing typically a 5-tuple identifier (source IP, destination IP, source port, destination port, IP protocol type) within a certain period of time. Thus, network traffic generated by different individual applications running on one or different hosts can be separated by traffic flows. The aim of TIC is to assign each specific flow to the individual application (fine-grained) or a group of applications of interest (coarse-grained). Hence, in TIC methodologies, a traffic flow is the fundamental object.

TIC methods make use of prior knowledge (traffic protocols) and classification algorithms to reach a proper decision on the classification of data flows. The knowledge comes from one or more elements of the flow object: transport-layer ports in a single packet, payload patterns, the whole or some parts of a single flow, connection patterns between hosts from a set of flows.

Chapter Outline.

The rest of this chapter is organized as follows. Section 2.2 introduces a taxonomy of TIC techniques with the attributes that are relevant for building applications’ profiles. Section 2.3 presents the state of the art in the TIC area. Section 2.4 presents related reviews.
In Section 2.5, we discuss how these TIC methods can be used to manage applications’ traffic profiles, aiming at identifying those with the potential to provide real-time detection of intrusions in applications. Section 2.6 suggests a combination of methods and a possible detection system. Section 2.7 concludes the chapter.

2.2 A Taxonomy of TIC Methodologies.

Based on the properties of source objects, various methods have been proposed in the literature for traffic classification. The existing works can be divided into 3 main groups: Packet-matching-based, Statistical-based and Connection-patterns-based.

Table 2.1 presents a taxonomy of existing methodologies, categorizing them into different groups based on the underlying technique adopted by each methodology along with the properties of source objects. Table 2.1 also assesses each group with regard to the most important properties that are relevant in the definition of desirable traffic profiles for applications. According to [55,64], these properties are the following: classification types, level of traffic classification, performance evaluation, real-time, robustness against evasion, robustness against encryption and portability. The rest of this section describes these properties.

2.2.1 Classification types.

Based on the classification type, we grouped TIC methods into recognition and identification.

Recognition.

TIC methods based on recognition simply try to assign a flow to the most probable class from a set of predefined classes. They mostly employ multi-class classifiers, where flows are compared against templates of all available classes. Taking a decision consists in assigning the flow to the class with the highest confidence level.

Identification.

TIC methods based on identification determine whether a given flow belongs to a single application (or an individual group of applications) of interest.
They only have a model for a target application of interest (type) and hunt for the flows most similar to the model of the target (e.g. Skype vs. others, P2P vs. others).

2.2.2 Level of traffic classification.

The results of traffic classification techniques can be categorized into coarse-grained and fine-grained according to their granularity. Coarse-grained algorithms can recognize groups of applications sharing common characteristics (such as using a special application-layer protocol), whereas fine-grained algorithms can identify individual applications.

Table 2.1: Assessment of TIC methods.
Column groups: Classification type (Recognition, Identification); Level of traffic classification, coarse grained (Protocol type, App type) and fine grained (Specific App); Performance evaluation (Overall accuracy; Classification timeline: 1st packet, 1st payload, After a few packets, End of the flow, After a time period; Computational cost). A "?" marks a value that could not be recovered.

Method | Source object | Recognition | Identification | Protocol type | App type | Specific App | Overall accuracy | 1st packet | 1st payload | After a few packets | End of the flow | After a time period | Computational cost | Real-time | Robust to evasion | Robust to encrypted traffic | Portability
Packet matching | Transport Layer Port | [51,53] | [51,53] | [51,53] | - | - | Low | [51,53] | - | - | - | - | Light | Yes | Not | Not | Yes
Packet matching | Payload Signature | [51,62] | [51,62] | [51] | - | [48,62] | Very High | - | [51,62] | - | - | - | Moderate | Yes | Not | Not | Yes
Statistical | Payload characteristics | [28,39,40,59,66] | [29,59,66] | [28,29,40,66] | [39,40] | [28,59] | [40,66]: High; others: Very High | - | - | [28,29,39,40,59,66] | [28] | - | High | [29,39,40,59,66]: ?; [28]: ? | [59,66]: Not; others: Moderate | [28,59,66]: ? | ?
Statistical | Full-Flow characteristics | [12,25,26,49,52,68] | [19] | [25] | [12,26,49,52,68] | [19] | High | - | - | - | √ | - | Light | No | Moderate | Moderate | No
Statistical | Truncated Flow characteristics | [16,17,22,32,46] | - | [16,17,22,32] | [46] | - | High | - | - | √ | - | - | Light | Near | Low | Moderate | No
Statistical | Multiple Sub-Flows characteristics | [54,56–58] | [54,56–58] | - | [54,56–58] | [54,56–58] | High | - | - | √ | - | - | Light | Near | Moderate | Moderate | No
Connection Patterns | Host | [37] | [36,60] | - | [36,37,60] | - | High | - | - | - | - | √ | Light | No | Moderate | Moderate | No
Connection Patterns | Endpoint | [14,65] | - | [14,65] | - | - | High | - | - | - | - | √ | Light | No | Moderate | Moderate | No
Connection Patterns | Host Community | [10,33–35] | - | - | [10,33–35] | - | High | - | - | - | - | √ | Light | No | High | Moderate | No

Coarse-grained traffic classifications.

Coarse-grained approaches fall into two categories: those that categorize network traffic according to application-layer protocols (such as HTTP, SMTP, FTP, DNS) and those that classify aggregated traffic according to high-level, applicational functionalities (such as email, Web, P2P, chat, game).

Fine-grained traffic classifications.

Classifiers using the fine-grained approach have the ability to identify the specific applications generating the traffic (such as Skype, or a specific P2P-TV application [14]).

2.2.3 Performance Evaluation.

Classification accuracy is the most important criterion for evaluating the performance of any classification technique. However, classification timeline (latency) and computational cost are also important properties in traffic classification [55].

Classification accuracy.

One of the most obvious criteria for ranking classification approaches is how accurately they are able to classify a flow in a given network traffic dataset. The most commonly used accuracy measures in the literature are the following: false and true negatives/positives, precision, recall, F-measure and overall accuracy [23, 63]. Chapter 3 presents a definition of these measures.

This chapter reports the accuracy results as presented in the original papers. However, in Table 2.1 the results of overall accuracy, as the most common and “generally accepted evaluation metric” [23], are used to compare various methods.
The terms “Low”, “High” and “Very High” indicate an overall accuracy below 85%, between 85% and 95%, and above 95%, respectively.

Classification timeline.

Early detection of intrusions (in known applications) before the propagation of infected payloads is essential, in order to trigger efficient and timely countermeasures. Therefore, taking the classification timeline into consideration is important. Different classification techniques complete their decision-making process at different points in time (first packet of a flow, first payload of a flow, after a few flow packets, after observing the complete flow, or after a certain period of time upon the occurrence of traffic).

Computational cost.

The efficiency of every method is a function of the algorithm applied in the framework, as well as of the nature and number of features fed to the system. Each of these factors imposes a burden on the system and influences its overall performance. Computational cost highlights the amount of resources (either RAM or CPU) needed in a real (or experimental) scenario.

2.2.4 Real-time.

A TIC approach is considered real-time when it can classify or identify a flow as early as possible. This means that its decision-making process should be performed within a flow’s lifetime, using information from as few packets as possible, with reasonable computational cost. Therefore, the real-time capability is related to the classification timeline as well as to computational cost (see 2.2.3 for both).

2.2.5 Robustness against evasion.

This capability of classifiers is of utmost importance when choosing techniques for managing applications’ network profiles in order to detect intrusions in those applications. In fact, sophisticated intrusions can capitalize on know-how about the common methods used in TIC to evade detection.
In other words, such intrusions may attempt to produce well-formed traffic, similar to that generated by another, ordinary application, in order to mask abnormal traffic flows as normal flows. As a consequence, they are able to hide the abnormality of their traffic and avoid detection by IDSs using TIC methods. This also has a negative impact on the system’s accuracy.

2.2.6 Robustness against encryption.

As users become more security-savvy, there is a growing interest in developing and using applications that rely on encryption or secure encapsulation methods to obfuscate traffic. Secure communication mechanisms such as SSL/TLS and SSH are available to application-layer protocols and give application developers the ability to exchange encrypted payloads, which are unusable for payload inspection strategies (for instance, HTTP over SSL).

2.2.7 Portability.

A model produced by a classifier in one network is portable if it can be used in any other network. Although this ability eliminates the need for re-learning in a variety of network locations, evasion is the main concern when using such models.

2.3 State of the Art.

This section reviews the State of the Art in TIC methodologies according to the taxonomy presented in Table 2.1, and highlights their limitations and advantages.

2.3.1 Packet matching Methods.

Port-based.

As a conventional approach, a port-based method provides a classification mechanism for “standard” applications, where the service port numbers involved are well-known, fixed port numbers preassigned by IANA (Internet Assigned Numbers Authority) (e.g. 20/21 for FTP data/commands, 22 for SSH, 23 for Telnet, etc.).

Direct inspection of each flow packet, together with the small amount of memory needed to keep the bindings between standard application names and their port numbers, provides quick, real-time classification.
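The port-to-application binding described above can be sketched as a plain table lookup. This is a minimal illustration in Python, not code from the thesis: the table `WELL_KNOWN_PORTS` and the function `classify_by_port` are hypothetical names, and only the ports quoted in the text are listed.

```python
# Hypothetical port-to-application table holding only the examples quoted in
# the text; a real classifier would load the full IANA registry instead.
WELL_KNOWN_PORTS = {
    20: "FTP-data",
    21: "FTP-control",
    22: "SSH",
    23: "Telnet",
}


def classify_by_port(src_port, dst_port):
    """Return the application bound to either port, or 'unknown'."""
    # The destination port is checked first because it usually carries the
    # service port when the packet travels from client to server.
    for port in (dst_port, src_port):
        app = WELL_KNOWN_PORTS.get(port)
        if app is not None:
            return app
    return "unknown"
```

The decision needs a single packet header and one dictionary access, which is why the cost is so low. It also shows the weakness of the approach: the label depends only on the port number, whatever application actually produced the packet.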
However, the free access to arbitrary port numbers provided by operating systems allows applications, notably P2P applications and intruders, to evade detection by overloading well-known, common ports. This undermines the port-to-application assignment and renders the port identification method impractical [31]. In [51], the authors have shown that this approach often cannot classify flows with an accuracy above 70%.

Portability and real-time classification with low computational cost are the main advantages of this approach. However, it suffers from evasion attempts and low accuracy. To overcome the limitation of low accuracy, the payload-signature-based method has been proposed.

Payload-Signature-based.

Also known as Deep Packet Inspection (DPI), this method relies on a meticulous inspection of the contents of TCP or UDP payloads, looking for manually predefined, unique signatures of specific applications or protocols [1]. For example, DPI may use “GET.*HTTP” as a signature for HTTP. As DPI has shown highly accurate and fine-grained identification (low percentages of false positives and negatives), it has been widely employed in multiple commercial devices such as UTM (Unified Threat Management) [3, 4] and NGFW (Next Generation FireWall) [2], and in open source projects [6], including the Linux kernel firewall [1]. Regular-expression and string-based matching, used in L7-filter [1] and OpenDPI [6], respectively, are the two techniques generally employed in this approach. In [21], the authors evaluated and compared several well-known DPI tools (PACE, OpenDPI, L7-filter, NDPI, Libprotoident, and NBAR) in terms of accuracy by applying them to an available and reliable labeled dataset of applications’ traffic with full packet payloads [7].
PACE, as a commercial tool, and Libprotoident, as an open source tool, achieved the highest precision, 94.22% and 93.86%, respectively.

The extremely reliable results of DPI make this approach a ground truth for verifying other classification approaches. However, it has some problems:

• A huge burden (complexity, processing load) is placed on the classification system.
• As extracting signatures from encrypted flows is very complex (almost impractical), this method cannot properly tackle encrypted traffic.
• Intrusions can easily clone signatures of well-known applications/protocols in order to evade and confuse classifiers.
• Privacy concerns are raised by the examination of users’ payloads.
• Keeping signatures up-to-date is necessary to maintain the system’s accuracy.

To overcome the problems raised by this method, different techniques based on statistical data have been proposed.

2.3.2 Statistical-Based.

Due to the increasing proliferation of applications that no longer comply with the IANA standard, as well as the growing desire of some application developers to use encryption methods, packet-based methods can no longer be relied upon. Hence, new methods relying on higher-level information have been developed to classify traffic. This information can be extracted from observable statistical attributes of different sources, either packet-level properties (packet header or packet payload) or flow-level characteristics. The main assumption behind such methods is that the statistical characteristics (also known as features or discriminators) of network traffic are distinct for different applications and can be used to distinguish applications from each other.

Given the off-the-shelf availability of many Machine Learning (ML) techniques, as well as their high efficiency in dealing with statistical data, they have been extensively used in the traffic classification literature [55].
They generally fall into two categories: supervised learning and unsupervised learning (or clustering). Their ultimate goal is to either cluster together network flows with similar characteristics (clustering), or classify one or more applications of interest.

An unsupervised learning algorithm is applied to break down network traffic flows into groups (showing similar statistical characteristics) without requiring a labelled training dataset. This means that an unsupervised learning process is able to infer correlations among flows without knowing the actual class of the application of interest. This capability allows such algorithms to be applied to samples of completely unknown traffic as a first step of classification [16,26,49,69].

In contrast, a supervised learning algorithm needs a training dataset created in advance with traffic from the classes of interest. With such a dataset, the algorithm is trained to develop a model of associations between sets of features and traffic from the classes of interest. Once trained, the model can then be used to classify unseen traffic under examination [22,52,54,61].

Herein we categorize statistical approaches into four groups according to the data sources they employ: Payload characteristics, Full-Flow characteristics, Truncated Flow characteristics and Multiple Sub-flows characteristics.

Payload characteristics.

The statistical properties of the whole (or part) of per-flow payloads can be used for automatic signature generation. Such automation avoids resorting to tedious, time-consuming and highly complex manual approaches for extracting accurate signatures for payload-signature-based methods.

In [29], the authors proposed to use a few of the first payload bytes of a reassembled, unidirectional TCP flow to build a binary vector (ordered by the position of the
byte in the flow payload) as the input feature of 3 ML algorithms (Naive Bayes, AdaBoost and Maximum Entropy) in order to develop signatures of application-layer protocols (FTP control, SMTP, POP3, IMAP, HTTPS, HTTP and SSH). Using the first 64 bytes of flows in each direction, results showed that both Naive Bayes and Maximum Entropy are effective, achieving a precision of up to 99% for all protocols and a recall of up to 99% for all protocols except SSH, which showed a lower recall (86.6%). However, one specific classifier had to be trained to represent the signature of each particular protocol. The authors also argued that their method has a reasonable computational cost. Besides, the method reaches decisions within the first 64 bytes of flows’ payloads. Thus, such a method can provide real-time identification of per-application profiles. However, if the intrusion evidence only appears after those 64 bytes, the system would fail to detect it.

In [59], the authors used the LASER algorithm to automatically generate the payload signature of one particular flow from a few of its first packet payloads. The LASER algorithm is capable of providing the longest sequence of common substrings between two samples. The algorithm is initialized with two distinct flows generated by one particular application. Using LASER, the first few packets in each flow that have a similar size range are compared to each other to collect the longest sequence of common substrings between them. From the collection, the longest one is chosen and iteratively compared with other flows in order to refine it. The final refinement is considered the signature. To evaluate the reliability of this approach, signatures generated from 3 P2P applications were applied in a backbone network, reaching an overall accuracy of 97.39% with no FP and 10% of FN.

Classifying flows from the payload of their first few packets enables this method to provide near real-time classification.
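The iterative refinement described above for LASER can be illustrated with a much simpler stand-in. The sketch below is not the LASER algorithm itself: it assumes that a signature is a single contiguous byte run, finds the longest common substring of two payloads by dynamic programming, and then narrows that candidate against further payloads. The function names and the sample payloads in the usage note are hypothetical.

```python
def longest_common_substring(a: bytes, b: bytes) -> bytes:
    """Longest contiguous byte run shared by a and b (dynamic programming)."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the common run that ended at a[i-2] and b[j-2].
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return a[best_end - best_len:best_end]


def refine_signature(payloads):
    """Narrow a candidate signature by comparing it with each further payload."""
    signature = payloads[0]
    for payload in payloads[1:]:
        signature = longest_common_substring(signature, payload)
    return signature
```

For instance, `refine_signature([b"HELLO-APP v1 id=17", b"xxHELLO-APP v1 id=42", b"HELLO-APP v1 id=99 tail"])` keeps only the bytes that all three made-up payloads share. An empty result means that no common run exists, which is one way to picture why this family of methods struggles with encrypted payloads.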
Moreover, updatability and portability are other advantages of LASER signatures. However, the complexity of a particular signature (e.g. a sequence of several substrings) may be higher than that of others (e.g. one substring). This may increase the computational cost of the classifier. The authors have also shown that LASER may not be able to find signatures for applications with encrypted payloads (e.g. Skype). Furthermore, intrusions can evade detection simply by cloning applications' signatures.

In [28,40], the authors proposed the use of entropy-based measures in the analysis of per-flow payloads.

In [28], KISS was introduced to automatically discover signatures from the header format of UDP-based application-layer protocols, without considering their semantic values or their communication rules. The key idea behind KISS is that different protocols produce different randomness in the first N bytes of each UDP packet, since the header information of these UDP-based protocols is typically included in those N bytes: “constant identifiers, counters, words from a small dictionary (message/protocol type, flags, etc.), or truly random values (coming from encryption or compression algorithms)”. Such randomness can be distinguished simply by measuring the entropy of the values observed in a sequence of packets. The authors performed a Chi-square-like test to extract the statistical signature of each particular protocol. Two different supervised classifiers (Euclidean distance and Support Vector Machine (SVM)) were trained with signatures of 3 protocols (RTP, eMule and DNS).

Results demonstrated that SVM performed better than Euclidean, with TP varying between 99.3% and 99.9% and an FP of 0.15% at worst. The flexibility of KISS signatures was demonstrated by an overall accuracy of up to 99.15% when SVM was adopted for different kinds of protocols (such as SopCast, TVAnts, PPLive, Skype, DNS, etc.).
The authors state that KISS signatures are quickly updatable, easily portable, robust to packet arrival problems (e.g. due to congestion, path changes, etc.), partially robust to encrypted payloads (e.g. Skype), and suitable for both per-flow and per-endpoint classification.

Intrusions may face a complex task when emulating KISS signatures in order to evade detection systems that use them. However, as the authors recognize, KISS fails to classify short-lived UDP flows/endpoints. Moreover, in spite of its reasonable computational cost (under certain conditions), it may not be capable of providing real-time classification, because some signatures require the payloads of numerous packets of a flow. Since KISS can automatically extract signatures from both specific applications (e.g. Skype) and application-layer protocols (e.g. DNS), it may yield both fine-grained and coarse-grained results in terms of classification level. But since one single SVM must be trained for all of them, we have placed it on the recognition list in terms of classification type.

In [40], the authors introduced Iustitia, in which the entropy of the first N payload bytes of a reassembled flow is used as a feature to train two ML algorithms (SVM and CART Decision Tree) in order to infer the nature of flows (text, binary or encrypted). The idea behind this approach is twofold: 1) text flows have the lowest entropy and encrypted flows have the highest entropy, while the entropy of binary flows lies in between; 2) the randomness of the first bytes represents the randomness of the whole payload. Moreover, they extended Iustitia to classify the types of binary content in flows (such as images, videos and executables) and even the file formats (such as JPEG and GIF for images, MPEG and AVI for videos) carried by binary flows.
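The entropy feature behind this idea can be sketched as follows. This is a minimal illustration and not the implementation of [40]: it assumes plain byte-level Shannon entropy over the first n payload bytes, and the function name payload_entropy and the default prefix length of 1024 bytes are hypothetical choices made for the example.

```python
import math
from collections import Counter


def payload_entropy(payload: bytes, n: int = 1024) -> float:
    """Shannon entropy, in bits per byte, of the first n payload bytes.

    Assumption for illustration: entropy is computed over single-byte
    values only; the result ranges from 0.0 to 8.0.
    """
    prefix = payload[:n]
    if not prefix:
        return 0.0  # an empty payload carries no information
    counts = Counter(prefix)  # occurrences of each byte value (0..255)
    total = len(prefix)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

Under this sketch, a repetitive textual payload scores well below the 8 bits per byte reached by uniformly distributed bytes, which is the ordering (text lowest, encrypted highest) that the approach relies on. A classifier such as SVM or a decision tree would then be trained on such values rather than on fixed thresholds.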
Using a buffer size of 1 KB, 91.2% of flows were correctly classified, with an overall accuracy of 88.27%. Such a buffer size is a tradeoff between computational cost and accuracy, chosen in order to provide real-time classification using the first N payload bytes of each flow.

Since ML algorithms must be trained for multiple classes of interest, Iustitia was placed in the recognition list. In some way, the nature of flows mentioned above refers to the type of applications responsible for them. For this reason, Iustitia gives coarse-grained results in terms of classification level. Moreover, given its accuracy results and the fact that detecting encrypted traffic is among its goals, Iustitia is partially robust to encrypted traffic. However, it is vulnerable to evasions; a malware-infected application can stay undetected if it engages in abnormal traffic patterns during the on-going flow after the first N payload bytes. Furthermore, the property of portability was not validated in the work.

KISS [28] uses the entropy of a piece of each payload across a stream of packets, whereas Iustitia [40] calculates the entropy over a piece of the payload within the reassembled flow. Hence, unlike KISS, Iustitia is capable of classifying flows before their termination.

In [66], the authors demonstrated, both theoretically and empirically, that given a set of training flows, the occurrence of special terms and their appropriate weights (instead of the sequences of terms used in [59]) in the flows' payloads can be a distinctive hint for identifying the application-layer protocol of a traffic flow. This is the main idea behind CUTE, a method that automatically extracts the weighted terms for various protocols from the payloads of a set of training flows. Based on these weighted terms (i.e., signatures), it estimates the similarity of a flow to each protocol, and classifies the flow accordingly.
The experimental results showed that CUTE signatures are portable, effective (achieving up to 90% for both precision and recall rates) and capable of classifying flows using their first few bytes (50-100 bytes) with more than 80% precision and recall in most cases, refining the classification as new packets arrive. The authors argue that CUTE signatures are much more efficient than the LASER ones in terms of computational cost. However, CUTE suffers from high FP rates, leading to low precision, for protocols containing terms shared with other protocols (e.g. IRC).

Similarly to LASER, CUTE can be employed in both recognition and identification activities targeting mainly coarse-grained results, but it cannot handle encrypted traffic and is vulnerable to evasion strategies. To perform evasions, intrusions must disguise themselves with the statistical characteristics of the flows' payloads; this is a much harder task than for the two previous approaches (i.e., port-based and payload-signature-based).

These methods have the ability to identify and classify traffic flows with high accuracy levels. However, besides their high computational cost, privacy assurance is one of the main challenges when using them. To overcome this challenge, alternative approaches relying on the statistical characteristics of full flows, truncated flows or multiple sub-flows have been proposed.

Full-Flow characteristics. The fundamental object of this method is a flow associated with a specific application and represented by one packet or a set of consecutive packets travelling between two endpoints (defined by srcIP:srcPort and dstIP:dstPort) using a particular protocol (TCP, UDP) within a finite period of time. According to the TCP/IP model, a TCP flow is started by a 3-way handshake (SYN, SYN-ACK and ACK) and terminated either by FIN/RST packets or by a 4-way handshake (ACK-FIN, ACK, ACK-FIN and ACK).
However, in practice a flow may be created upon the observation of a packet with a previously unseen 5-tuple, and is then reconstructed from packets with a previously seen 5-tuple. A timeout mechanism may be used to determine the end of a flow when a termination is not observed. Unlike in UDP, the observation of the SYN packet of a TCP flow is sufficient, and may be used, to determine which network endpoint is the client (TCP flow initiator) and which is the server.

A flow associated with a particular application can be described by a number of statistical properties parametrizing its behavior. In this approach, the statistical characteristics of a flow are extracted from packet information regardless of payloads, and are computed after the end of the flow (full-flow statistical characteristics). These features can be extracted directly from parts of the packet headers (such as average packet size and total transferred bytes) as well as indirectly from packet arrival times (i.e., inter-packet timings such as the min/max/average/variance of packet inter-arrival time, and flow duration). The technology for producing such features is supported by most network equipment providers, though under different names (NetFlow for Cisco, J-Flow/cflowd for Juniper Networks or NetStream for 3Com/HP), while sFlow is an industry standard technology. Flow information has been extensively explored in the TIC literature [12,50,52,68] as well as in anomaly-based IDSs [11,30].

In [50], the authors provided a manually classified dataset and comprehensively described 248 per-flow features. Table 2.2 describes some examples of those features.

Table 2.2: Examples of features describing sources in a full flow [12].

Sources                          Features
Flow                             Packet count, total bytes, duration, total PUSHED packets, ...
Packet inter-arrival time        Mean, variance, minimum, maximum, ...
Size of TCP/IP control fields    Mean, variance, minimum, maximum, ...
Payload size                     Mean, variance, minimum, maximum, ...
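The extraction of a few such full-flow features can be sketched as follows. This is a minimal illustration under stated assumptions, not the feature extractor of [50] or [12]: the Packet record, the function names flow_key and full_flow_features, and the choice of a direction-independent 5-tuple key are all hypothetical, and neither TCP termination nor the timeout mechanism is modelled (all packets sharing a key are treated as one flow).

```python
from collections import defaultdict
from statistics import mean, pvariance
from typing import NamedTuple


class Packet(NamedTuple):
    """Hypothetical header-only packet record (no payload is kept)."""
    ts: float      # arrival time in seconds
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    proto: str     # e.g. "TCP" or "UDP"
    size: int      # packet size in bytes


def flow_key(p: Packet) -> tuple:
    """Direction-independent 5-tuple: both directions map to the same flow."""
    a, b = (p.src_ip, p.src_port), (p.dst_ip, p.dst_port)
    return (p.proto,) + (a + b if a <= b else b + a)


def full_flow_features(packets) -> dict:
    """Group packets by 5-tuple and compute per-flow statistics.

    Features are computed only once all packets of a flow are available,
    which is what distinguishes full-flow from truncated-flow approaches.
    """
    flows = defaultdict(list)
    for p in sorted(packets, key=lambda p: p.ts):
        flows[flow_key(p)].append(p)

    features = {}
    for key, pkts in flows.items():
        times = [p.ts for p in pkts]
        sizes = [p.size for p in pkts]
        # Inter-arrival times between consecutive packets of the flow.
        gaps = [t2 - t1 for t1, t2 in zip(times, times[1:])]
        features[key] = {
            "packet_count": len(pkts),
            "total_bytes": sum(sizes),
            "mean_packet_size": mean(sizes),
            "duration": times[-1] - times[0],
            "iat_mean": mean(gaps) if gaps else 0.0,
            "iat_var": pvariance(gaps) if gaps else 0.0,
            "iat_min": min(gaps) if gaps else 0.0,
            "iat_max": max(gaps) if gaps else 0.0,
        }
    return features
```

Only header fields and arrival times are used, so the sketch works unchanged on encrypted traffic. The price is that no feature vector exists until the flow has ended.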
In [52], the authors used the dataset described in [50] and its whole variety of flow features to train a basic form of the Naive Bayes (NB) algorithm in order to classify traffic by application type of interest (WWW, bulk, database, interactive, email, services, P2P, attack, games and multimedia). Results showed that 65% of the flows were correctly classified. By combining the Fast Correlation-Based Filter (FCBF) as a feature selection method with the Naive Bayes Kernel Estimation (NBKE) method, they improved the classifier to achieve up to 95% overall accuracy, and acknowledged that this algorithm is sensitive to its initial assumptions. Using Bayesian Neural Networks (BNN) instead of NB, they further extended their work [12] and improved the overall accuracy to up to 99% (for data trained and tested on the same day) and up to 95% (for data trained on one day and tested eight months later). They also provided ranked lists of the top 20 and top 10 features for the BNN and NB methods, respectively.

In [68], the authors used 4 common SVM kernel functions and two feature selection algorithms (for choosing the optimal subset of 19 different per-flow features) in order to propose an accurate classifier that sorts traffic flows into broad application categories (e.g. bulk traffic, interactive, WWW, service, P2P, email and other). Using an RBF kernel function with SVM, an accuracy of 97.17% was achieved with a set of 9 features.

Most approaches based on full-flow statistical characteristics have been shown to provide high accuracy with a moderate level of robustness to evasions. Moreover, having no access to payload contents makes these approaches very lightweight and robust when dealing with encrypted payloads. However, the decision-making process cannot reach a verdict on a given flow before its end. This requirement prevents the use of this approach for real-time or continuous operation when facing long-lived flows containing thousands of packets.
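The supervised classification of flows from full-flow features, as discussed above, can be illustrated with a small Naive Bayes sketch. It is only an illustration under stated assumptions and does not reproduce the classifiers of [52] or [12]: it assumes a Gaussian distribution per feature and per class, adds a small variance floor to avoid division by zero, and the function names train_gaussian_nb and classify are hypothetical.

```python
import math
from collections import defaultdict


def train_gaussian_nb(samples, labels, var_floor=1e-9):
    """Fit per-class priors and per-feature Gaussian parameters.

    samples: list of equal-length numeric feature vectors (one per flow).
    labels:  list of class labels, aligned with samples.
    Assumption for illustration: features are independent given the class
    and normally distributed (the "naive" Gaussian model).
    """
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)

    model = {}
    n = len(samples)
    for y, rows in by_class.items():
        cols = list(zip(*rows))  # one tuple of values per feature
        means = [sum(c) / len(c) for c in cols]
        variances = [
            sum((v - m) ** 2 for v in c) / len(c) + var_floor
            for c, m in zip(cols, means)
        ]
        model[y] = (math.log(len(rows) / n), means, variances)
    return model


def classify(model, x):
    """Return the class with the highest log-posterior for feature vector x."""
    def log_posterior(y):
        log_prior, means, variances = model[y]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
            for v, m, var in zip(x, means, variances)
        )
    return max(model, key=log_posterior)
```

The feature vector of a flow is only available once the flow has ended, so a classifier of this kind inherits the latency limitation described above regardless of how cheap the prediction step itself is.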
In addition, it may be hard for an intrusion to manipulate flow-level statistical parameters to make them similar to the features of flows produced by a trusted application. Therefore, intrusions may be forced to adopt more rigorous procedures to evade this detection method, especially if more features with high complexity are exploited for classification. However, packets due to intrusions cannot be detected before the end of the flow.

Although exploiting more features with high complexity may protect classifiers from evasions, it also increases their computational cost. Hence, classifiers may use Feature Selection (FS) methods as a preprocessing step of ML tasks to choose the best and smallest (optimized) subsets of features that can efficiently and effectively describe a class. By using FS, one is able to discard redundant and irrelevant features from network traffic data. Consequently, classification systems can exploit such methods to improve classification accuracy and computational performance, as well as to reduce storage requirements [18,43]. Some feature selection methods were introduced in [24,44,47,67,71]. The authors of [27] proposed three new performance metrics (goodness, stability and similarity) in order to evaluate FS methods. Since no existing individual method performs well on all three metrics, they developed a hybrid approach to obtain a better and smaller set of features.

Some other approaches relying on the statistical characteristics of either truncated flows or multiple sub-flows were proposed to provide real-time classifiers.

Truncated Flow Characteristics. Statistical knowledge of the first few packets of traffic flows (Truncated Flow) is used as a distinctive property to separate applications from each other, since it captures the applica