Combining MLP and Using Decision Tree in
Order to Detect the Intrusion into Computer
Networks
A. Saba Sedigh Rad, B. Alireza Zebarjad
Young Researchers and Elite Club, Baft Branch, Islamic Azad University, Baft, Iran
sabasedighrad@yahoo.com
Department of Computer, Islamic Azad University, Andimeshk Branch, Andimeshk, Iran
Ali_reza_zebarjad@yahoo.com
Abstract
Network security plays an important role in computer systems. The growing use of computer networks exposes systems to penetration and destruction. To protect systems from these hazards, it is essential to use an intrusion detection system (IDS). Intrusion detection identifies illicit use and misuse and helps avoid damage to systems and computer networks by both external and internal intruders. This paper proposes an intrusion detection system based on a combination of experts: MLP neural networks are combined with the aid of a mediator expert implemented as a Decision Tree. The proposed method has been tested on the KDDCup99 dataset. The results show an increase in attack detection precision. The method also performs well compared with previous methods in detecting R2L attacks, which have few training samples.
Keywords: Intrusion detection system, combining experts, MLP neural networks, Decision Tree, KDDCup99.
I. Introduction
Fast access to information and fast processing of it are essential. This has led to data being
processed, stored, and retrieved in computer systems, and the data must be shared in order to be
useful. The main challenge in using these systems is keeping the data safe, because alongside the
rapid development of computer networks the number of attacks and intrusions has increased
significantly. For this reason, we need a security system that analyzes all internal and external
events and activities related to the system. This need led to the creation of the Intrusion
Detection System (IDS). An intrusion detection system observes the signs of intrusion and alerts
the system manager when intrusion or misuse is detected (Srinoy et al, 2005). The aim of an
intrusion detection system is therefore not to prevent attacks but to alert the managers to
intrusions into computer systems and networks so that they can decide how to respond. The aim of
this paper is to design an intrusion detection system with good efficiency and detection rate
based on machine learning algorithms.
II. Introducing the KDDCup99 dataset
In 1998, an intrusion detection evaluation program called DARPA 1998 was run by the MIT Lincoln
Laboratory (Sheikhan et al, 2012). The dataset was collected over 9 weeks in a simulated
environment modeled on the local network of a US Air Force base (Sima, 1998) and was later used in
the KDDCup99 intrusion detection competition (Zeng et al, 2009). This standard dataset contains
about 5 million records, each with 41 features (Peddabachigaria et al, 2007). Because the number
of records is so large, usually 10% of them are used for training and testing (Chennoufi et al,
2012). Table (1) shows the number of samples in each class. There is one normal class, which
represents the normal activity of users, and four attack classes:
• Probing: This attack prepares the ground for later intrusions by scanning computers to find their vulnerabilities.
• DOS: In this kind of attack, the aim is to exhaust the host resources. The intruder tries to cut off the legitimate access of users to the server (Sadiq et al, 2010).
• U2R: In this attack the intruder, who already has access as an ordinary user, tries to take control of the system root.
• R2L: The intruder, working from an external machine, exploits the system vulnerabilities to gain access as a local user (Arumugam et al, 2010).
TABLE I
The number of attack and normal groups in learning and testing datasets.
Class   Class name   Dataset train (No. of instances)   Dataset test (No. of instances)
1       Normal       97271                              60593
2       Probe        4107                               4166
3       DOS          391458                             231455
        U2R          59                                 88
        R2L          1119                               14727
III. Suggested method
The diagram of the suggested method is shown in figure (1). It includes several steps, which are
explained in the following subsections.
Figure 1. The suggested method
A. Normalization
The features span very different ranges, and features with large values have a greater effect on
the cost function even though their magnitude does not indicate their importance in detecting the
attack class. Normalization is used to avoid this effect. Let X_{i,min} and X_{i,max} be the
minimum and maximum values of the i-th feature, respectively. Equation (1) then converts the data
linearly onto the [0, 1] interval (Sadiq et al, 2010).
$$ X_i' = \frac{X_i - X_{i,\min}}{X_{i,\max} - X_{i,\min}} \qquad (1) $$
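The min-max scaling described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `min_max_normalize`, the toy records, and the choice to map a constant feature to 0.0 are not taken from the paper.

```python
def min_max_normalize(rows):
    """Scale every feature column of `rows` linearly onto [0, 1]."""
    columns = list(zip(*rows))
    mins = [min(col) for col in columns]
    maxs = [max(col) for col in columns]
    normalized = []
    for row in rows:
        scaled = []
        for x, lo, hi in zip(row, mins, maxs):
            # A constant feature has hi == lo; mapping it to 0.0 is an assumption
            scaled.append((x - lo) / (hi - lo) if hi > lo else 0.0)
        normalized.append(scaled)
    return normalized


# Hypothetical toy records with two features on very different scales
records = [[0.0, 100.0], [5.0, 300.0], [10.0, 500.0]]
scaled_records = min_max_normalize(records)
```

After scaling, both features of the toy records occupy the same [0, 1] range, so neither dominates the cost function merely because of its units.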
B. Reducing dimensions
Reducing the dimensions eliminates the features that contribute little to diagnosing attacks and
so removes overhead from the data (Sadiq et al, 2013). The Relief feature selection algorithm was
therefore applied to the normalized data of the previous stage using the Weka software. Features
number 10, 29, 39, 18, 13, 8, 20, 21, 7, 15 and 9, which were less than 10 percent effective in
attack detection, were eliminated from the dataset, leaving 30 of the 41 features.
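Applying the outcome of the feature selection step amounts to dropping eleven columns from each record. The sketch below does not implement Relief itself (the paper ran it in Weka); it only removes the listed features, assuming that the feature numbers are 1-based positions in the 41-feature record. The helper name `drop_features` is hypothetical.

```python
# 1-based feature numbers listed in the text as eliminated
REMOVED_FEATURES = {10, 29, 39, 18, 13, 8, 20, 21, 7, 15, 9}


def drop_features(record, removed=REMOVED_FEATURES):
    """Return the record without the features whose 1-based number is in `removed`."""
    return [value for number, value in enumerate(record, start=1)
            if number not in removed]


# Hypothetical record in which the value k stands for feature number k
full_record = list(range(1, 42))
reduced_record = drop_features(full_record)
```

The reduced record has 30 entries, which matches the 30 input neurons of the networks described in the next step.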
C. Detecting normal or attack groups
After the dimensions of the dataset have been reduced, the samples are divided into two sets:
attack and normal. All attack samples are grouped into class zero, and class 1 is assigned to the
normal samples. An MLP neural network is then trained on these two classes to separate attack
samples from normal ones. This network has 30 input neurons, two hidden layers of 50 neurons each,
and 2 output neurons. After training, the network can distinguish normal samples from attacks but
cannot identify the type of attack; this expert network is used only to detect the normal group.
The next steps describe the suggested method for detecting the type of attack.
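To make the 30-50-50-2 architecture concrete, the following sketch builds a network of that shape and runs one forward pass. It is not the authors' MATLAB implementation: the sigmoid activation, the random initial weights, and the rule of picking the strongest output neuron are assumptions, and the network here is untrained.

```python
import math
import random

LAYER_SIZES = [30, 50, 50, 2]  # input, two hidden layers, output (sizes from the text)


def init_network(sizes, seed=0):
    """Create one (weights, biases) pair per layer with small random weights."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(layers, features):
    """Propagate one sample through the network and return the output activations."""
    activations = features
    for weights, biases in layers:
        activations = [sigmoid(sum(w * a for w, a in zip(row, activations)) + b)
                       for row, b in zip(weights, biases)]
    return activations


def predict_class(layers, features):
    """Index of the strongest output neuron: 0 = attack, 1 = normal."""
    outputs = forward(layers, features)
    return max(range(len(outputs)), key=outputs.__getitem__)


network = init_network(LAYER_SIZES)
sample = [0.5] * 30  # hypothetical normalized, reduced record
outputs = forward(network, sample)
n_parameters = sum(len(w) * len(w[0]) + len(b) for w, b in network)
```

With these layer sizes the network has 4202 adjustable parameters (weights plus biases), all of which are set during training.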
D. Adding rare data and creating a suitable dataset
The dataset is divided into 4 classes according to the 4 types of attack (DOS, PROBE, R2L, U2R).
The R2L and U2R classes contain too few samples for training, so training on them directly would
not be effective. Following (Graupe, 2007), the samples of the classes with less data are repeated
so that these classes are trained well. For example, if class 1 has 3 samples and class zero has
100, each sample of class 1 is repeated 10 times. In this research, we do this for the R2L and U2R
classes. To train an expert to detect a specific attack, we treat the samples of that attack as
class 1 and the samples of the three other attacks as class zero. In this way, 4 datasets were
created as follows:
• Dataset to train the expert which detects the DOS attack: the DOS data are considered as class 1 and the data of the three other classes (PROBE, R2L, U2R) together are considered as class zero.
• Dataset to train the expert which detects the PROBE attack: the PROBE data are considered as class 1 and the data of the three other classes (DOS, R2L, U2R) together are considered as class zero.
• Dataset to train the expert which detects the R2L attack: the R2L data are considered as class 1 and the data of the three other classes (DOS, PROBE, U2R) together are considered as class zero.
• Dataset to train the expert which detects the U2R attack: the U2R data are considered as class 1 and the data of the three other classes (DOS, PROBE, R2L) together are considered as class zero.
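The two operations of this step, repeating the rare samples and relabeling the data once per attack, can be sketched as follows. The function names, the toy samples, and the replication factor of 10 (borrowed from the worked example in the text, not from the experiment) are assumptions.

```python
ATTACK_CLASSES = ["DOS", "PROBE", "R2L", "U2R"]


def oversample(samples, label, times):
    """Repeat every sample carrying `label` so that it appears `times` times."""
    result = []
    for features, sample_label in samples:
        copies = times if sample_label == label else 1
        result.extend([(features, sample_label)] * copies)
    return result


def one_vs_rest(samples, target):
    """Relabel: 1 for the target attack, 0 for the three other attacks."""
    return [(features, 1 if label == target else 0) for features, label in samples]


# Hypothetical toy attack set: (feature vector, attack name)
attacks = [([0.1], "DOS"), ([0.2], "DOS"), ([0.3], "PROBE"),
           ([0.4], "R2L"), ([0.5], "U2R")]

# Factor 10 follows the example in the text; the experimental factor is not stated
balanced = oversample(oversample(attacks, "R2L", 10), "U2R", 10)
expert_datasets = {name: one_vs_rest(balanced, name) for name in ATTACK_CLASSES}
```

Each entry of `expert_datasets` holds the same samples with a different binary labeling, one per expert.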
E. Creating certified specialists
Using the 4 datasets above, 4 MLP neural networks are trained as separate experts, each able to
detect one particular type of attack according to its training dataset. The expert networks were
implemented in MATLAB. The weights of the neural networks are adjusted by the gradient descent
algorithm, which tends to minimize the error. The training samples are presented to the network
successively, the errors are calculated, and the weight vector is adjusted so that the error falls
to an acceptable minimum over all training samples. After the training step, the four experts
(DOS, PROBE, R2L, U2R) are able to detect their respective attacks. If an expert does not
recognize its attack, class zero is shown at its output, indicating the inability to detect the
attack.
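The weight adjustment can be illustrated on the smallest possible case. The sketch below trains a single sigmoid neuron by per-sample gradient descent on a squared error; it is a simplification of the multi-layer experts, and the learning rate, epoch count, and toy data are assumptions chosen only to show the error shrinking.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def total_error(weights, bias, data):
    """Sum of squared errors over all training samples."""
    return sum((target - sigmoid(sum(w * x for w, x in zip(weights, xs)) + bias)) ** 2
               for xs, target in data)


def train(data, epochs=2000, rate=0.5):
    """Per-sample gradient descent for one sigmoid neuron."""
    weights, bias = [0.0] * len(data[0][0]), 0.0
    for _ in range(epochs):
        for xs, target in data:  # samples are presented one after another
            out = sigmoid(sum(w * x for w, x in zip(weights, xs)) + bias)
            # Gradient of (target - out)^2 / 2 with respect to the pre-activation
            delta = (out - target) * out * (1.0 - out)
            weights = [w - rate * delta * x for w, x in zip(weights, xs)]
            bias -= rate * delta
    return weights, bias


# Hypothetical linearly separable toy data: label 1 only when both inputs are 1
data = [([0.0, 0.0], 0), ([0.0, 1.0], 0), ([1.0, 0.0], 0), ([1.0, 1.0], 1)]
error_before = total_error([0.0, 0.0], 0.0, data)
weights, bias = train(data)
error_after = total_error(weights, bias, data)
```

Comparing `error_before` with `error_after` shows the effect of the repeated small corrections; in a multi-layer network the same rule is applied to every layer through backpropagation.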
F. Decision Tree
If the four experts from the previous step are not able to detect the attack, we make use of the
decision tree expert. We use this expert when the certified specialists cannot determine the
attack, and in that case the decision of the decision tree is considered as the final result. This
expert is trained to detect all four classes of attacks. The decision tree consists of nodes,
leaves and edges. Each feature by which the data are separated is a node of the decision tree.
Each node is joined to the next node or leaf by edges. The edges are labeled with feature values,
and the labels guide a sample toward its class according to the value of the feature. For a
discrete feature a branch is created for each value; for a continuous feature a threshold is
determined and the range is divided into two branches.
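The two kinds of branching described above can be sketched as a pair of split functions. This is not a full tree learner: the threshold is supplied by hand rather than chosen by a criterion such as information gain, and the records are made up. The field names `protocol_type` and `duration` follow KDDCup99 feature names.

```python
def split_discrete(samples, feature):
    """One branch per distinct value of a discrete feature."""
    branches = {}
    for sample in samples:
        branches.setdefault(sample[feature], []).append(sample)
    return branches


def split_continuous(samples, feature, threshold):
    """Two branches for a continuous feature: value <= threshold, value > threshold."""
    left = [s for s in samples if s[feature] <= threshold]
    right = [s for s in samples if s[feature] > threshold]
    return {"<= %s" % threshold: left, "> %s" % threshold: right}


# Hypothetical records with made-up values
samples = [
    {"protocol_type": "tcp", "duration": 0.0},
    {"protocol_type": "udp", "duration": 0.2},
    {"protocol_type": "tcp", "duration": 0.9},
]
by_protocol = split_discrete(samples, "protocol_type")
by_duration = split_continuous(samples, "duration", 0.5)
```

The dictionary keys play the role of the edge labels: a sample follows the edge whose label matches its feature value.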
G. Combining the results
In this step, the testing dataset is applied to the suggested algorithm. If a new sample is
recognized as normal, the algorithm ends. If the sample is recognized as an attack, it is applied
simultaneously and in parallel to the 4 certified specialists, and each of these experts gives its
opinion. If exactly one certified specialist reports class 1, its opinion indicates the type of
attack and the algorithm ends. If all the certified specialists report class zero, or more than
one expert reports class 1 (for example, both the U2R and DOS experts report class 1), the sample
is given to the decision tree, and the opinion of this expert, which is trained to recognize all
the attacks, is taken as the final result.
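The combination rule above can be written as a short decision procedure. In this sketch the trained models are replaced by hypothetical stand-in functions, and the experts are queried one after another rather than in parallel; only the control flow follows the text.

```python
def classify(sample, is_normal, experts, decision_tree):
    """Combine the normal/attack network, the four experts, and the decision tree."""
    if is_normal(sample):
        return "NORMAL"
    # Each expert returns 1 if it recognizes its own attack, otherwise 0
    votes = [name for name, expert in experts.items() if expert(sample) == 1]
    if len(votes) == 1:
        return votes[0]
    # No expert, or more than one expert, answered 1: the decision tree decides
    return decision_tree(sample)


def make_expert(*kinds):
    """Hypothetical stand-in for a trained expert: fires on the listed sample kinds."""
    return lambda sample: 1 if sample["kind"] in kinds else 0


def is_normal(sample):
    return sample["kind"] == "normal"


def decision_tree(sample):
    return "U2R"  # hypothetical fixed answer of the mediator expert


experts = {
    "DOS": make_expert("dos", "ambiguous"),
    "PROBE": make_expert("probe"),
    "R2L": make_expert("r2l"),
    "U2R": make_expert("u2r", "ambiguous"),
}

results = [classify({"kind": kind}, is_normal, experts, decision_tree)
           for kind in ("normal", "probe", "ambiguous", "unknown")]
```

The four toy samples exercise each path: a normal sample, a single confident expert, two conflicting experts, and no expert at all, with the last two cases falling through to the decision tree.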
IV. Results
The results obtained after applying the entire test dataset to the suggested algorithm are shown
in table (2).
Table II
The results obtained from the suggested method.
One of the strengths of the suggested method is that the normal samples are recognized, so common
or legal users have no difficulty accessing the data. Also, although there are not enough training
samples for the R2L attack, the method detects this kind of attack with suitable precision. The
reason is that the various experts make different errors, so they can cover the weak points of
each other. Table (3) compares the results of the proposed method with those of the other
algorithms, and the better results of this method can be observed in figure (2).
Table III
The comparison of the proposed method with the other methods.
Measure      Class   C4.5    MLP     SVM     Proposed Approach
Precision%   DOS     99.69   99.4    99.86   99.22
Precision%   PROBE   52.20   64.81   77.72   89.21
Precision%   R2L     30.32   90.84   62.39   91.82
Precision%   U2R     9.35    10.3    53.49   18.18
Recall%      DOS     96.99   97.10   97.65   99.38
Recall%              81.88
The diagram of the mean results of the abovementioned methods for attack detection is shown in
figure (2).
Figure 2. Diagram for comparing the mean results of the proposed method with the other methods.
V. Conclusion and suggestions
In this paper, we used an innovative method for combining MLP and Decision Tree experts. The
research showed that the individual classifiers could not detect all the attacks by themselves,
but that combining them leads to a better result. We therefore used the idea of combining experts
in order to diagnose the attack classes with less error. It should be mentioned that expert
training in this method is time consuming. This difficulty does not remain after training: during
testing, the samples are applied to the experts in parallel, so the results are obtained faster.
References
i. D. Graupe, "Principles of Artificial Neural Networks", Advanced Series on Circuits and Systems, Vol. 6, 2007.
ii. J. Sima, "Introduction to Neural Networks", Technical Report, 1998.
iii. J. Zeng, X. Liu, T. Li, G. Li, H. Li, J. Zeng, "A novel intrusion detection approach learned from the change of antibody concentration in biological immune response", Appl Intell, 2009.
iv. M. Sheikhan, Z. Jadidi, A. Farrokhi, "Intrusion detection using reduced-size RNN based on feature grouping", Neural Comput & Applic, Vol. 21, pp. 1185–1190, 2012.
v. M. Chennoufi, F. Bendella, "New Approach for Detecting Intrusions", International Journal of Scientific & Engineering Research, Vol. 3, Issue 1, ISSN 2229-5518, January 2012.
vi. M. Sadiq Ali Khan, Dr. S. M. Aqil Burney, S. M. Aqil Burney, "Feature Deduction and Ensemble Design of Parallel Neural Networks for Intrusion Detection System", IJCSNS International Journal of Computer Science and Network Security, Vol. 10, No. 10, 2010.
vii. M. Arumugam, P. Thangaraj, P. Sivakumar, P. Pradeepkumar, "Implementation of Two Class Classifiers for Hybrid Intrusion Detection", Proceedings of the International Conference on Communication and Computational Intelligence, pp. 486-490, 2010.
viii. M. Sadiq Ali Khan, Dr. S. M. Aqil Burney, S. M. Aqil Burney, "Feature Deduction and Ensemble Design of Parallel Neural Networks for Intrusion Detection System", IJCSNS International Journal of Computer Science and Network Security, Vol. 10, No. 10, 2010.
ix. S. Sedigh Rad, A. Zebarjad, M. M. Lotfinejad, "A Combination of the MLP Neural Networks to Determine the Level of Happiness with the Parameters of the General Health", Journal of Academic and Applied Studies, Vol. 3(3), March, pp. 40-45, 2013.
x. S. Srinoy, W. Kurutach, W. Chimphlee, S. Chimphilee, World Academy of Science, Engineering and Technology, pp. 140-144, 2005.