Combining MLP and Using Decision Tree in Order to Detect the Intrusion into Computer Networks

A. Saba Sedigh Rad, B. Alireza Zebarjad

Young Researchers and Elite Club, Baft Branch, Islamic Azad University, Baft, Iran sabasedighrad@yahoo.com

Department of Computer, Islamic Azad University, Andimeshk branch, Andimeshk, Iran

Ali_reza_zebarjad@yahoo.com

Abstract

The security of computer networks plays an important role in computer systems. The increasing use of computer networks exposes systems to penetration and destruction by intruders. To protect systems from these hazards, it is essential to use an intrusion detection system (IDS). Intrusion detection aims to detect illicit use and misuse and to prevent damage to systems and computer networks by both external and internal intruders. This paper proposes an intrusion detection system based on a combination of experts: MLP neural networks assisted by a mediator expert implemented as a Decision Tree. The proposed method has been tested on the KDDCup99 dataset. The results show an increase in attack detection precision. Compared with previous methods, the method also performs well in detecting R2L attacks, which have few training samples.

Keywords: Intrusion detection system, combining experts, MLP neural networks, Decision Tree, KDDCup99.

I. Introduction

Access to information and fast processing of that information are essential. This has led to data being processed, stored and retrieved in computer systems, and the data must be shared in order to be used. The main challenge in using these systems is keeping the data safe and protected, because alongside the fast development of computer networks, the number of attacks and intrusions has increased significantly. For this reason, we need a security system which analyzes all the internal and external events and activities related to the system. This need resulted in the creation of the Intrusion Detection System (IDS). An intrusion detection system is one that observes the signs of intrusion and alerts the system manager when an intrusion or misuse is detected (Srinoy et al, 2005). The aim of an intrusion detection system is therefore not to prevent attacks but to alert managers to intrusions into computer systems and networks so that they can make a good decision about them. The aim of this paper is to design an intrusion detection system with good efficiency and detection rate based on machine learning algorithms.

II. Introducing KDDCup99 dataset

In 1998, an intrusion detection evaluation program called DARPA 1998 was run by a laboratory at MIT (Sheikhan et al, 2012). The dataset was collected in a simulated environment of a local network of a US air base over 9 weeks (Sima, 1998). This dataset was used in the KDDCup99 intrusion detection competition (Zeng et al, 2009). The standard dataset includes around 5 million records, and each record includes 41 features (Peddabachigaria et al, 2007). Because the dataset contains so many records, usually 10% of them are used for training and testing (Chennoufi et al, 2012). The number of records of each class in the datasets is shown in table (1). The classes comprise a normal class, which represents the normal activity of users, and the following four attack classes:

• Probing: This attack prepares the ground for later intrusions; the intruder scans computers to find their vulnerabilities.

• DOS: In this kind of attack, the aim is to exhaust the host's resources. The intruder tries to cut off legitimate users' access to the server (Sadiq et al, 2010).

• U2R: In this attack, the intruder, who already has access to the system as an ordinary user, tries to take control of the system root.

• R2L: The intruder, working from an external machine, uses the system's vulnerabilities to gain the access of a local user (Arumugam et al, 2010).
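As an illustration of how raw connection labels fall into these classes, the sketch below maps the label names found in the public 10% KDDCup99 training file to the five classes. The mapping table and the helper name `to_class` are assumptions added for illustration; the paper does not list them.

```python
# Mapping from raw KDDCup99 labels (10% training file) to the five classes.
# This table is an illustrative assumption, not taken from the paper.
ATTACK_CLASS = {
    "normal": "Normal",
    # Probing
    "ipsweep": "Probe", "nmap": "Probe", "portsweep": "Probe", "satan": "Probe",
    # Denial of service
    "back": "DOS", "land": "DOS", "neptune": "DOS", "pod": "DOS",
    "smurf": "DOS", "teardrop": "DOS",
    # User to root
    "buffer_overflow": "U2R", "loadmodule": "U2R", "perl": "U2R", "rootkit": "U2R",
    # Remote to local
    "ftp_write": "R2L", "guess_passwd": "R2L", "imap": "R2L", "multihop": "R2L",
    "phf": "R2L", "spy": "R2L", "warezclient": "R2L", "warezmaster": "R2L",
}


def to_class(raw_label):
    """Return the class of a raw label such as 'smurf.' (trailing dot allowed)."""
    return ATTACK_CLASS[raw_label.strip().rstrip(".")]
```

Labels that appear only in the test file are not covered by this table and would need to be added before use.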

TABLE I

The number of attack and normal instances in the training and testing datasets.

Class   Class name   Training dataset      Testing dataset
                     (No. of instances)    (No. of instances)
1       Normal       97271                 60593
2       Probe        4107                  4166
3       DOS          391458                231455
4       U2R          59                    88
5       R2L          1119                  14727

III. Suggested method

The diagram of the suggested method is shown in figure (1); its steps are explained below.

Figure 1. The suggested method

A. Normalization

The features span very different ranges, and features with large values have a greater effect on the cost function even though this does not reflect their importance in detecting the attack class. Normalization is used to avoid this adverse effect. Assuming that the minimum and maximum values of the i-th feature X_i are X_{i,min} and X_{i,max} respectively, equation (1) is used for normalization, so that the data are linearly mapped into the interval [0, 1] (Sadiq et al, 2010). Here X_i' denotes the normalized value.

$$X_i' = \frac{X_i - X_{i,\min}}{X_{i,\max} - X_{i,\min}} \qquad (1)$$
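A minimal Python sketch of the min-max scaling used in this step, applied column by column. The function name and the handling of constant features are assumptions for illustration; the paper does not say how a feature with equal minimum and maximum is treated.

```python
def min_max_normalize(rows):
    """Scale every feature (column) of `rows` linearly into [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    normalized = []
    for row in rows:
        new_row = []
        for x, lo, hi in zip(row, lows, highs):
            # A constant feature has hi == lo; map it to 0.0 to avoid 0/0
            # (this convention is an assumption, not stated in the paper).
            new_row.append((x - lo) / (hi - lo) if hi > lo else 0.0)
        normalized.append(new_row)
    return normalized


# Two toy features with very different ranges.
sample = [[0.0, 200.0], [5.0, 1000.0], [10.0, 600.0]]
scaled = min_max_normalize(sample)
```

After scaling, both toy features lie in [0, 1], so neither dominates the cost function merely because of its range.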

B. Reducing dimensions

Reducing dimensions eliminates the features that have little effect on diagnosing attacks and thereby reduces the data overhead (Sadiq et al, 2013). The Relief feature selection algorithm was therefore applied to the normalized data of the previous stage using the Weka software, and features number 10, 29, 39, 18, 13, 8, 20, 21, 7, 15 and 9, which were less than 10 percent effective in attack detection, were eliminated from the dataset, leaving 30 of the 41 features.
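The removal step itself can be sketched as follows. The sketch assumes that the feature numbers in the text are 1-based positions in the 41-feature record; it does not reproduce the Relief ranking, which the paper computed in Weka.

```python
# Feature numbers removed in the text, assumed to be 1-based positions.
REMOVED_FEATURES = {10, 29, 39, 18, 13, 8, 20, 21, 7, 15, 9}


def drop_features(record, removed=REMOVED_FEATURES):
    """Return `record` without the features whose 1-based index is in `removed`."""
    return [value for index, value in enumerate(record, start=1)
            if index not in removed]


record = list(range(1, 42))      # stand-in record: feature i holds the value i
reduced = drop_features(record)  # 41 - 11 = 30 features remain
```

The 30 remaining values match the 30 input neurons of the networks described in the following steps.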

C. Detecting normal or attack groups

After the dimensions of the dataset have been reduced, the samples are divided into two sets: attack and normal. All attack samples are grouped into class zero, and class 1 is used for the normal samples. Then, an MLP neural network is trained on these two classes to decide whether a sample is an attack or normal. This network has 30 input neurons, two hidden layers of 50 neurons each, and 2 output neurons. After training, the network is able to separate normal samples from attacks but cannot detect the type of attack; in effect, this expert network is used to detect normal samples. The next step describes the suggested method for detecting the type of attack.
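To make the 30-50-50-2 architecture concrete, the following sketch builds a network of that shape and runs one forward pass. The sigmoid activation, the random initial weights and the rule "the index of the larger output neuron is the class" are assumptions; the paper states only the layer sizes.

```python
import math
import random

LAYER_SIZES = [30, 50, 50, 2]   # input, two hidden layers, output (from the text)


def init_network(sizes, seed=0):
    """Create one (weights, biases) pair per layer with small random values."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
                   for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(layers, features):
    """Propagate one feature vector through all layers."""
    activations = features
    for weights, biases in layers:
        activations = [sigmoid(sum(w * a for w, a in zip(row, activations)) + b)
                       for row, b in zip(weights, biases)]
    return activations


def predict(layers, features):
    """Return 0 (attack) or 1 (normal): the index of the larger output neuron."""
    outputs = forward(layers, features)
    return outputs.index(max(outputs))


network = init_network(LAYER_SIZES)
# Count of trainable weights and biases: 1550 + 2550 + 102.
n_params = sum(len(w) * len(w[0]) + len(b) for w, b in network)
```

The network here is untrained, so `predict` only demonstrates the data flow, not a meaningful decision.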

D. Adding rare data and creating a suitable dataset

The attack data are divided into 4 classes according to the 4 types of attack (DOS, PROBE, R2L, U2R). The R2L and U2R classes contain too few samples for training, so training would not proceed properly. Following (Graupe, 2007), the samples of the classes with less data are repeated in order to obtain good training. For example, if class 1 has 3 samples and class zero has 100, each sample of class 1 is repeated 10 times. In this research, we do this for the R2L and U2R classes. To train an expert to detect a specific attack, we consider the samples of that attack as class 1 and the samples of the three other attacks as class zero. In this way, 4 datasets were created as follows:

• Dataset to train the expert which detects the DOS attack: data of the DOS attack are considered as class 1 and the data of the three other classes (PROBE, R2L, U2R) are considered as class zero.

• Dataset to train the expert which detects the PROBE attack: data of the PROBE attack are considered as class 1 and the data of the three other classes (DOS, R2L, U2R) are considered as class zero.

• Dataset to train the expert which detects the R2L attack: data of the R2L attack are considered as class 1 and the data of the three other classes (DOS, PROBE, U2R) are considered as class zero.

• Dataset to train the expert which detects the U2R attack: data of the U2R attack are considered as class 1 and the data of the three other classes (DOS, PROBE, R2L) are considered as class zero.
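The two operations of this step, repeating the rare classes and relabeling the data once per expert, can be sketched as below. The function names and the toy samples are hypothetical, and the factor of 10 is taken from the example in the text; the paper does not state the factor actually used for R2L and U2R.

```python
ATTACKS = ["DOS", "PROBE", "R2L", "U2R"]


def oversample(samples, rare_classes, factor):
    """Repeat every sample of a rare class `factor` times; keep the rest once."""
    result = []
    for features, label in samples:
        copies = factor if label in rare_classes else 1
        result.extend([(features, label)] * copies)
    return result


def one_vs_rest(samples, target):
    """Label samples of `target` as class 1 and all other attacks as class 0."""
    return [(features, 1 if label == target else 0)
            for features, label in samples]


# Toy attack samples: (feature vector, attack type).
attack_samples = [([0.1], "DOS"), ([0.2], "DOS"), ([0.3], "PROBE"),
                  ([0.4], "R2L"), ([0.5], "U2R")]
balanced = oversample(attack_samples, rare_classes={"R2L", "U2R"}, factor=10)
expert_datasets = {attack: one_vs_rest(balanced, attack) for attack in ATTACKS}
```

Each entry of `expert_datasets` contains the same samples with different binary labels, one labeling per expert.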

E. Creating certified specialists

Using the 4 abovementioned datasets, 4 MLP neural networks are trained as separate experts, each of which is able to detect one specific type of attack according to its training dataset. The expert networks were implemented in MATLAB. The weights of the neural networks are adjusted by the gradient descent algorithm, which tends to minimize the error. The training samples are presented to the network one after another, the error is calculated, and the weights are adjusted so that the error falls to an acceptable minimum over all training samples. After the training step, the four experts (DOS, PROBE, R2L, U2R) are able to detect their respective attacks. If an expert does not recognize a sample as its attack, its output is class zero, indicating its inability to detect the attack.
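The weight update can be illustrated on the smallest possible case, a single sigmoid neuron trained by per-sample gradient descent on squared error. This is a simplified stand-in, not the authors' MATLAB implementation: the learning rate, the number of epochs and the toy data are assumptions, and a full MLP would propagate the same kind of update backwards through its hidden layers.

```python
import math


def train_neuron(samples, learning_rate=0.5, epochs=2000):
    """Fit one sigmoid neuron by per-sample gradient descent on squared error."""
    n_features = len(samples[0][0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        for features, target in samples:      # samples presented in succession
            z = sum(w * x for w, x in zip(weights, features)) + bias
            output = 1.0 / (1.0 + math.exp(-z))
            # Gradient of 0.5 * (output - target)^2 with respect to z.
            delta = (output - target) * output * (1.0 - output)
            weights = [w - learning_rate * delta * x
                       for w, x in zip(weights, features)]
            bias -= learning_rate * delta
    return weights, bias


def predict(weights, bias, features):
    """Return class 1 when the neuron's net input is positive, else class 0."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if z > 0 else 0


# Toy, linearly separable data: class 1 when the single feature is large.
toy = [([0.0], 0), ([0.2], 0), ([0.8], 1), ([1.0], 1)]
weights, bias = train_neuron(toy)
```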

F. Decision Tree

If the four experts from the previous step are unable to detect the attack, we make use of the decision tree expert. This expert is used when the certified specialists cannot determine the attack, and in that case the decision of the decision tree is taken as the final result. The decision tree is trained to detect all four classes of attacks. A decision tree consists of nodes, leaves and edges. Each feature by which the data are separated is a node of the tree. Each node is joined to the next node or leaf by edges. The edges are labeled with different values, and the labels guide a sample to its class according to its feature values. For a discrete feature, a branch is created for each value; for a continuous feature, an interval is determined and divided into two branches.
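The traversal described here can be sketched with a small hand-built tree. The tree below is hypothetical: its feature names, branch values and threshold are invented for illustration and are not learned from KDDCup99, and the sketch covers only classification, not tree induction.

```python
def classify(node, sample):
    """Follow edges from the root until a leaf (a plain class label) is reached."""
    while isinstance(node, dict):
        value = sample[node["feature"]]
        if "threshold" in node:
            # Continuous feature: two branches split at a threshold.
            node = node["low"] if value <= node["threshold"] else node["high"]
        else:
            # Discrete feature: one branch per possible value.
            node = node["branches"][value]
    return node


# Hypothetical tree; names and threshold are invented for illustration.
tree = {
    "feature": "protocol",
    "branches": {
        "icmp": "DOS",
        "udp": "PROBE",
        "tcp": {"feature": "duration", "threshold": 2.0,
                "low": "R2L", "high": "U2R"},
    },
}
```

For example, `classify(tree, {"protocol": "tcp", "duration": 5.0})` follows the discrete `tcp` branch and then the upper branch of the continuous split.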

G. Combining the results

In this step, the testing dataset is applied to the suggested algorithm. If a new sample is recognized as normal, the algorithm ends. If the sample is recognized as an attack, it is applied simultaneously and in parallel to the 4 certified specialists, and each of these experts gives its opinion. If only one certified specialist outputs class 1, its opinion indicates the type of attack and the algorithm ends. But if all the certified specialists output class zero, or more than one expert outputs class 1 (for example, both the U2R and DOS experts show class 1), the sample is given to the decision tree, and the opinion of this expert, which is trained to recognize all the attacks, is taken as the final result.
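The decision logic of this step can be written compactly as below. The callables `is_normal`, `experts` and `decision_tree` are hypothetical stand-ins for the trained models, and the stub models exist only to exercise the three possible paths.

```python
ATTACKS = ["DOS", "PROBE", "R2L", "U2R"]


def combine(sample, is_normal, experts, decision_tree):
    """Return 'NORMAL' or an attack type for one test sample.

    is_normal(sample) -> bool, experts[attack](sample) -> 0 or 1 and
    decision_tree(sample) -> attack type are hypothetical callables
    standing in for the trained models.
    """
    if is_normal(sample):
        return "NORMAL"
    # In the paper the four experts run in parallel; a loop gives the same votes.
    votes = [attack for attack in ATTACKS if experts[attack](sample) == 1]
    if len(votes) == 1:
        return votes[0]               # exactly one expert claims the sample
    return decision_tree(sample)      # no claim or conflicting claims


# Stub models for a quick check.
experts = {
    "DOS": lambda s: int(s["kind"] in ("dos", "ambiguous")),
    "PROBE": lambda s: int(s["kind"] == "probe"),
    "R2L": lambda s: 0,
    "U2R": lambda s: int(s["kind"] == "ambiguous"),
}
is_normal = lambda s: s["kind"] == "normal"
decision_tree = lambda s: "R2L"
```

With these stubs, a sample claimed by both the DOS and U2R experts, or by no expert at all, is passed to the decision tree.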

IV. Results

The results obtained after applying the entire test dataset to the suggested algorithm are shown in table (2).

Table II

The results obtained from the suggested method.

One of the strengths of the suggested method is that the normal samples are diagnosed. In addition, although there are not enough training samples for the R2L attack, the method detects this kind of attack with suitable precision. The reason is that the various experts make different errors, so the experts can cover each other's weak points. Table (3) compares the results of the proposed method with those of other algorithms, and the better results of this method can be observed in figure (2).

Table III

The comparison of the proposed method with the other methods.

Metric       Class   C4.5    MLP     SVM     Proposed Approach
Precision%   DOS     99.69   99.4    99.86   99.22
Precision%   PROBE   52.20   64.81   77.72   89.21
Precision%   R2L     30.32   90.84   62.39   91.82
Precision%   U2R     9.35    10.3    53.49   18.18
Recall%      DOS     96.99   97.10   97.65   99.38
Recall%      …       81.88   …       …       …

The diagram of the mean results of the abovementioned methods for attack detection is shown in figure (2).

Figure 2. Diagram for comparing the mean results of the proposed method with the other methods.
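The comparison is reported as per-class precision and recall. The paper does not state how these were computed; the sketch below assumes the standard definitions, precision = TP / (TP + FP) and recall = TP / (TP + FN), and uses invented labels.

```python
def precision_recall(true_labels, predicted_labels, target):
    """Return (precision %, recall %) of class `target`."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == target and p == target)
    fp = sum(1 for t, p in pairs if t != target and p == target)
    fn = sum(1 for t, p in pairs if t == target and p != target)
    precision = 100.0 * tp / (tp + fp) if tp + fp else 0.0
    recall = 100.0 * tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Invented labels: 3 true positives, 1 false positive, 1 false negative for R2L.
true = ["R2L", "R2L", "R2L", "R2L", "DOS", "DOS"]
pred = ["R2L", "R2L", "R2L", "DOS", "R2L", "DOS"]
r2l_precision, r2l_recall = precision_recall(true, pred, "R2L")
```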

V. Conclusion and suggestions

In this paper, we used a novel method for combining MLP and Decision Tree experts. The research revealed that the classifiers could not detect all the attacks by themselves, but combining them leads to a better result. We therefore used the idea of combining experts so that the attack classes can be diagnosed with less error. It should be mentioned that training the experts in this method is time consuming, but this difficulty does not exist after training: during testing, the samples are applied to the experts in parallel, so the results are obtained faster.

References

i. D. Graupe, "Principles of Artificial Neural Networks", Advanced Series on Circuits and Systems, Vol. 6, 2007.

ii. J. Sima, "Introduction to Neural Networks", Technical Report, 1998.

iii. J. Zeng, X. Liu, T. Li, G. Li, H. Li, J. Zeng, "A novel intrusion detection approach learned from the change of antibody concentration in biological immune response", Appl Intell, 2009.

iv. M. Sheikhan, Z. Jadidi, A. Farrokhi, "Intrusion detection using reduced-size RNN based on feature grouping", Neural Comput & Applic, Vol. 21, pp. 1185–1190, 2012.

v. M. Chennoufi, F. Bendella, "New Approach for Detecting Intrusions", International Journal of Scientific & Engineering Research, Vol. 3, Issue 1, ISSN 2229-5518, January 2012.

vi. M. Sadiq Ali Khan, S. M. Aqil Burney, "Feature Deduction and Ensemble Design of Parallel Neural Networks for Intrusion Detection System", IJCSNS International Journal of Computer Science and Network Security, Vol. 10, No. 10, 2010.

vii. M. Arumugam, P. Thangaraj, P. Sivakumar, P. Pradeepkumar, "Implementation of Two Class Classifiers for Hybrid Intrusion Detection", Proceedings of the International Conference on Communication and Computational Intelligence, pp. 486-490, 2010.

viii. M. Sadiq Ali Khan, S. M. Aqil Burney, "Feature Deduction and Ensemble Design of Parallel Neural Networks for Intrusion Detection System", IJCSNS International Journal of Computer Science and Network Security, Vol. 10, No. 10, 2010.

x. S. Srinoy, W. Kurutach, W. Chimphlee, S. Chimphilee, World Academy of Science, Engineering and Technology, pp. 140-144, 2005.
