UNIVERSIDADE FEDERAL DO RIO GRANDE DO NORTE

DEPARTMENT OF INFORMATICS AND APPLIED MATHEMATICS

BACHELOR'S IN SOFTWARE ENGINEERING

Analysis of Feature Selection on the Performance of Multimodal Keystroke Dynamics Biometric Systems

Brenda Vasiljevic


Brenda Vasiljevic

Analysis of Feature Selection on the Performance of Multimodal Keystroke Dynamics Biometric Systems

Course Conclusion Report

Report presented to the Universidade Federal do Rio Grande do Norte (UFRN) as one of the prerequisites to obtain a Bachelor’s Degree in Software Engineering.

Universidade Federal do Rio Grande do Norte - UFRN

Supervisor: Márjory da Costa Abreu

Natal

August 2017


Mendes, Brenda Vasiljevic Souza.

Analysis of feature selection on the performance of multimodal keystroke dynamics biometric systems / Brenda Vasiljevic Souza Mendes. - Natal, 2017.

53 pp.: ill.

Undergraduate monograph - Universidade Federal do Rio Grande do Norte. Centro de Ciências Exatas e da Terra. Departamento de Informática e Matemática Aplicada. Bacharelado em Engenharia de Software.

Supervisor: Márjory da Costa Abreu.

1. Keyboard keystroke dynamics. 2. Touch keystroke dynamics. 3. Biometrics. 4. Feature selection. 5. Classification accuracy. I. Abreu, Márjory da Costa. II. Title.

RN/UF/CCET CDU 004:57.087.1

Universidade Federal do Rio Grande do Norte - UFRN Sistema de Bibliotecas - SISBI


Abstract

New security systems, methods, and techniques need to have their performance evaluated under conditions that closely resemble real-life situations. In particular, biometric systems need a realistic set of biometric data to test their accuracy in classifying individuals as legitimate users or impostors. Similar modalities may suggest the use of the same features; however, there is no indication that both modalities will perform well using the same feature set. This report aims to be the first to investigate the impact of feature selection on two similar yet distinct biometric modalities: keyboard keystroke dynamics and touchscreen keystroke dynamics. We have found that an efficient feature selection method, chosen to suit the classification algorithm employed by the system, can substantially improve accuracy rates while reducing the number of features to be processed to a small subset, which also improves the system's processing time and overall usability.

Keywords: keyboard keystroke dynamics. touch keystroke dynamics. biometrics. feature selection. classification accuracy.


Contents

Introduction
  Motivation
  Objectives
  Structure

1 KEYSTROKE DYNAMICS
1.1 Keyboard Keystroke Dynamics
1.2 Touch Keystroke Dynamics
1.3 Chapter Summary

2 TYPING PATTERNS DATABASES
2.1 Data from Physical Keyboards
2.2 Data from Touch Screens
2.3 Chapter Summary

3 FEATURE SELECTION
3.1 Chapter Summary

4 METHODOLOGY AND RESULTS
4.1 Database
4.2 Classifiers
4.3 Feature Selectors
4.4 Results and Analysis

Conclusion and Future Work


Introduction

The huge amount of personal and/or confidential information we keep in our electronic devices - both personal computers and smartphones - requires that we take measures to protect ourselves against unauthorized access (1). As virtual environments became responsible for storing sensitive information and performing critical actions, security services that aim to counter the threat of impersonation have been developed, and they now need to evolve faster than the methods to bypass them. In order to test the accuracy and overall performance of security systems, it is necessary to subject them to conditions similar to those they would find in a real situation (2).

Authentication is a security service to counter the threat of impersonation - that is, somebody claiming to be someone else - and its objective is to protect the system from unauthorized access. There are three approaches to authentication that differ in how the user proves they are who they claim to be: proof by possession, by knowledge, and by property (3). The possession factor is something the user has, like a magnetic card; the knowledge factor is something the user knows, like a username and password; and the property factor is a human characteristic (biometrics).

Knowledge-based and possession-based methods of access control have their weaknesses. Passwords can be forgotten, overheard, or guessed through methods such as dictionary and brute-force attacks, and diminishing the risk requires that users memorize lengthy passwords and change them regularly (4, 5). Smart cards can be lost, stolen, or cloned (5). Due to these limitations, biometric authentication methods provide an alternative or an added layer of security over the above-mentioned methods, because they rely on traits that are inherent to the person and thus cannot be taken from them or easily imitated, while still being intuitive to the user, keeping the process practical and effortless (4).

Biometrics, or biometry, provides methods to automatically recognize or verify the identity of a person through the capture of the biometric trait, extraction of characteristic features, and comparison with a database of features extracted previously (6). The process of matching a sample of biometric data with the user that most closely displays the same pattern is called classification (7). Classification methods perform a similarity measure of some sort and, depending on whether the similarity between a sample and a user's regular pattern falls within the established parameters, recognize the user and perform the suitable actions (such as allowing/denying entrance to the system or blocking the workstation).


Depending on the application, the system may attempt to identify the user among those enrolled (identification) or confirm that they are who they claim to be (authentication).

Biometric traits are divided into physiological traits (such as fingerprint, palm print, face, and iris) and behavioral traits (such as gait, pitch and amplitude of the voice, handwritten signature, and typing patterns). Behavioral biometrics have an added layer of complexity when compared to physiological biometrics due to the inevitable changes of behavior a human being is prone to, be they intentional or not (1). Biometric traits must be unique to the user and consistent enough that a user's biometric signature can be built from them.

Keystroke dynamics are the unique typing pattern of a user (7), and their adoption as a behavioral biometric trait allows authentication to occur through the analysis of the user's usual typing style. The latencies between successive keystrokes, hold durations, finger pressure, and hand positioning make up a fairly consistent pattern, especially for regularly typed strings, and a person's typing pattern has been observed to be as unique as their written signature (1).

As a biometric trait, keystroke dynamics has many advantages: it is practical and inexpensive, since all mobile and non-mobile devices are equipped with some kind of keyboard; the data is often available throughout the entire session, allowing both static and dynamic authentication (the static approach granting access to the system and the dynamic approach ensuring the user's identity throughout the session) (8); and it is non-intrusive, so combining it with other methods of authentication and/or performing it dynamically does not diminish usability or increase user annoyance significantly. Keystroke dynamics can be extracted from the typing of fixed-text or free-text - that is, from a predetermined string of characters such as a password, or from an arbitrary piece of text that does not need to match the text typed at enrollment (4).

Authentication in a biometric system happens as follows: classification algorithms, commonly referred to as classifiers, are trained on the user's usual typing style by receiving a number of samples from the user during the enrollment phase. In the verification phase, a new sample is compared to the recorded pattern in order to determine the authenticity of the user (9). Each sample is a vector of features, not all of which contribute to accurate classification of the user. Efforts have been made towards faster, more accurate keystroke dynamics authentication through feature subset selection - searching the feature space for a subset of relevant, non-redundant features that optimizes performance (9, 10, 11, 12, 13, 14).
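The enrollment/verification flow described above can be sketched with a minimal template-matching verifier. The mean-vector template and the Manhattan-distance threshold below are illustrative choices for this sketch, not the method of any work cited in this report.

```python
import numpy as np

def enroll(samples):
    """Build a simple template from enrollment feature vectors
    (rows = samples, columns = features): the mean vector plus a
    distance threshold derived from the training samples themselves."""
    s = np.asarray(samples, dtype=float)
    mean = s.mean(axis=0)
    dists = np.abs(s - mean).sum(axis=1)          # Manhattan distance per sample
    return mean, dists.mean() + 2 * dists.std()   # tolerant acceptance threshold

def verify(sample, template):
    """Accept the claimed identity when the new sample is close enough
    to the stored template."""
    mean, threshold = template
    return np.abs(np.asarray(sample, dtype=float) - mean).sum() <= threshold
```

A genuine sample close to the enrollment vectors falls under the threshold and is accepted; a distant (impostor-like) vector is rejected.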

Due to the increasing integration of touch screen devices in everyday life and the subsequent need to develop security measures to protect these devices (15), keystroke dynamics have branched out into two different modalities according to the type of keyboard (physical or virtual) used for data collection. Typing patterns on touch screens, or touch keystroke dynamics, have arisen as a biometric trait with authentication effectiveness comparable to keyboard keystroke dynamics (16). Combining these two modalities has


been reported as a potentially effective solution to enhance the robustness of keystroke dynamics-based authentication systems (17); in order to fully grasp this potential, the present report provides an analysis of the biometric data's behavior under different classification and feature selection methods while trying to find reliable system configurations that maximize the accuracy of multimodal and unimodal keystroke dynamics.

Motivation

The design of a biometric-based classification system is a particularly challenging pattern recognition task (3). The fundamental nature of this type of data, as well as the application domain where it is utilized, makes it highly specialized. The effectiveness with which individual identity can be predicted in different scenarios can benefit from seeking a broad base of identity evidence. Many approaches to the implementation of biometric-based identification systems are possible, and different configurations are likely to generate significantly different operational characteristics.

The choice of implementation structure is therefore very dependent on the performance criteria that are most important in any particular task scenario. Keystroke dynamics have, so far, been used mostly as a supplement to more reliable security measures, since performance is often compromised by diverse variables (3) such as hand injury, fatigue, and hand positioning, among others. The issue of improving performance can be addressed in a number of ways, but system configurations based on integrating different information sources are a widely adopted means of achieving this (6).

Despite all efforts to improve the performance of biometric systems, a single biometric trait (or modality) is sometimes insufficient to create a reliable system. Some of the drawbacks of unimodal systems include vulnerability to spoofing attacks, performance degradation due to noisy data, and limited applicability, since the modality is unavailable to some users and in some situations. A number of these limitations are inherent to the biometric trait, and the use of multiple modalities provides broader population coverage, wider applicability, and an extra layer of protection against attacks (18).

Multimodal systems, those that use two or more different biometric traits for authentication, also present disadvantages, such as lower usability and an increased difficulty in optimizing performance measures such as error rates and processing times (19). As a system gathers more features when contemplating more modalities, selecting a subset of features that allows for better classification accuracy and/or decreases processing time becomes paramount to the creation of a system that is both effective and usable. In the particular field of keystroke dynamics, the amount of data required to establish a user's pattern during enrollment, and the amount required before detection occurs, demands careful consideration (9), as in a realistic situation users are not likely to


either type texts long enough to gather an abundant amount of data or accept below-average accuracy as a consequence, making feature selection even more relevant, as it optimizes classification with a smaller number of features.

Feature selection is the preprocessing of data through the removal of irrelevant or redundant features from the database. Classification algorithms function on the premise that a user's pattern is consistent; a user's typing inconsistency, be it from fatigue, pauses, or other external factors, may generate noise and outliers that contribute to misclassification (10). Selecting a subset of relevant features and ignoring the rest has been used as a way to improve classification accuracy and diminish processing time (10, 20, 9).

The objective of a feature selection algorithm is either to identify a smaller subset of features with equal or better ability to discriminate between users than the whole feature set, or to rank the features according to their discriminative ability (9). Algorithms for subset selection or feature ranking can be classified as filters, which discard features deemed irrelevant by some chosen criterion, or wrappers, which evaluate the performance of the classification algorithm on several subsets of features in order to determine the best one. A wrapper-based approach presents better accuracy rates than a filter approach, although generally also worse processing times and a dependency on the efficiency of the classifier (11). Filter-based approaches are independent of the classification algorithm, and thus may be employed in systems that use more than one; moreover, filters provide much faster selection.
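The filter/wrapper distinction can be illustrated with two small sketches. The Fisher-like scoring criterion and the greedy forward search below are generic illustrations, not the selectors evaluated later in this report.

```python
import numpy as np

def filter_select(X, y, k):
    """Filter: rank features by a class-separation criterion computed
    independently of any classifier (here a Fisher-like score: variance of
    class means divided by within-class variance), and keep the top k."""
    classes = np.unique(y)
    scores = []
    for j in range(X.shape[1]):
        means = [X[y == c, j].mean() for c in classes]
        within = sum(X[y == c, j].var() for c in classes) + 1e-12
        scores.append(np.var(means) / within)
    return np.argsort(scores)[::-1][:k]

def wrapper_select(X, y, k, accuracy):
    """Wrapper: greedily add, one at a time, the feature whose inclusion
    maximizes the accuracy of the actual classifier (passed as a callback),
    so the selection depends on the classifier being used."""
    selected = []
    while len(selected) < k:
        best = max((j for j in range(X.shape[1]) if j not in selected),
                   key=lambda j: accuracy(X[:, selected + [j]], y))
        selected.append(best)
    return selected
```

The wrapper re-trains the classifier for every candidate subset, which is why it is slower and classifier-dependent, while the filter's scores are computed once from the data alone.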

While theoretically similar, keystroke dynamics extracted from physical and virtual keyboards are two different modalities, with an unknown level of correlation between them and great potential for combined use in a multimodal system accessed through platforms with both a physical keyboard and a touch screen, or through multiple platforms that have either one. Any such system would acquire a lot of data, not all of which would be relevant or even possible to process in a timely fashion; just the time a user takes to hold a key, or to release one and press another, multiplied by the number of keys on a keyboard (physical or virtual) and the ways they can be combined in sequences of two or three, can fill a very large database with numerous features that must be processed during the verification phase and may delay recognition or cause misclassification due to noisy data.

Multimodal databases combining these two modalities have been scarcely studied (17), and so has their applicability in real-world situations. Although extensive research has been performed to determine optimal classification methods and approaches for both keyboard-based and touch-based keystroke dynamics, it is yet unclear whether such methods will have similar results when applied to a database containing both. Moreover, few approaches to feature selection have been investigated in the context of typing features as a whole, making it difficult to discern which method is more efficient for one or both modalities, let alone their combination.

With that in mind, the present work aims to analyse a series of classification and feature selection methods, and their combinations, to determine their impact on the performance of a multimodal biometric system that uses both physical and virtual keyboards to perform user authentication through keystroke dynamics. The key aspects studied are how different classification algorithms and their reported accuracy are affected by different feature selection approaches when applied to the multimodal database or to each of the "sub-databases" within it - that is, those containing only keyboard-based or touch-based keystroke dynamics.

Objectives

The general objectives of the present work have been summarized below.

• Analyse how feature selection impacts the classification accuracy of a multimodal keystroke dynamics database and its unimodal sub-databases;

• Analyse the combination of different methods of classification and feature selection and their impact on classification accuracy;

• Identify potentially reliable combinations of databases, classifiers and feature selection methods to improve the performance of multimodal and unimodal biometric systems;

• Determine if the impact of feature selection on individual modalities can predict the behavior of a multimodal database after optimization.

In order to achieve the general objectives, a set of specific objectives was determined: conduct experiments on a keystroke dynamics multimodal database and its sub-databases by putting them through all possible combinations of three classification algorithms and four feature selection methods (selectors); compare the performance of each classifier-selector combination among the different datasets; and draw conclusions from how the use of a selector has affected the performance of the authentication system relative to the initial performance and to the performance of other combinations of datasets, classifiers, and selectors.
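The specific objectives amount to a full grid of experiments over every dataset, classifier, and selector. A sketch of that loop (with hypothetical placeholder names; the actual classifiers and selectors are presented in Chapter 4) could look like:

```python
from itertools import product

def run_experiments(datasets, classifiers, selectors, evaluate):
    """Evaluate every dataset x classifier x selector combination.
    Each selector maps a dataset to a chosen feature subset, and
    `evaluate` is expected to train the classifier on those features
    and return its accuracy."""
    results = {}
    for (d, data), (c, clf), (s, select) in product(
            datasets.items(), classifiers.items(), selectors.items()):
        features = select(data)
        results[(d, c, s)] = evaluate(clf, data, features)
    return results
```

With three datasets (multimodal, keyboard-only, touch-only), three classifiers, and four selectors, this grid yields 36 classifier-selector-dataset results to compare.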

Structure

This work is structured as follows: three chapters are dedicated to the literature review conducted to establish a theoretical foundation for the present work. Chapter 1 discusses related work in the field of keystroke dynamics, covering different approaches, methods, and techniques employed in the task of analysing and enhancing keyboard-based and touch-based keystroke dynamics systems. Chapter 2 is about biometric databases of keystroke dynamics data and includes databases with varying features, types of entry, and methods and platforms of data extraction. Chapter 3 presents related work in the field of biometric feature selection.

Chapter 4 contains the methodology employed in the development of the present work - the database, the classification algorithms and the feature selection methods - as well as the obtained results and their analysis. The final part of this report summarizes conclusions, limitations and future work.


1 Keystroke Dynamics

Keystroke dynamics analysis is a behavioral biometric technique - often described as something a person is, instead of something a person has (like a magnetic card) or something a person knows (like a password). In (21), Araújo et al. introduce keystroke dynamics analysis as a low-cost, non-intrusive authentication method with a clear advantage over password-based authentication: a typing pattern cannot be lost, forgotten, or stolen. While the same can be said of physiological techniques, these generally require expensive and possibly intrusive hardware to collect the biometric data.

As explained by Polemi in (3), the analysis of typing rhythms can be made statically or dynamically. In the static approach, the system verifies the identity of the user before granting them access to the system, generally using typing features collected during the typing of the password or other form of fixed-text. In the dynamic approach, the typing patterns are analyzed continuously during the work session with data being extracted from arbitrary text input (free-text).

According to Morales et al. in (22), the most popular typing pattern features are as follows:

• Hold time: the difference between the press time and the release time of the same key.

• Release-Press latency (RP-latency): the difference between the release time of a key and the press time of the next one.

• Press-Press latency (PP-latency): the difference between the press time of a key and the press time of the next one.

• Release-Release latency (RR-latency): the difference between the release time of a key and the release time of the next one.

• Press-Release latency (PR-latency): the difference between the press time of a key and the release time of the next one.

These naming conventions will be used throughout the report, even when discussing work that uses a different nomenclature for the same feature.
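As an illustration, all five features can be computed from raw key event timestamps. The `(key, press_time, release_time)` event format below is an assumption made for this sketch only:

```python
from typing import Dict, List, Tuple

# Assumed raw event format: (key, press_time_ms, release_time_ms).
KeyEvent = Tuple[str, float, float]

def extract_features(events: List[KeyEvent]) -> Dict[str, List[float]]:
    """Compute hold times and the four inter-key latencies defined above."""
    hold = [release - press for _, press, release in events]
    rp, pp, rr, pr = [], [], [], []
    for (_, p1, r1), (_, p2, r2) in zip(events, events[1:]):
        rp.append(p2 - r1)  # RP: release of key i to press of key i+1
        pp.append(p2 - p1)  # PP: press to press
        rr.append(r2 - r1)  # RR: release to release
        pr.append(r2 - p1)  # PR: press of key i to release of key i+1
    return {"hold": hold, "RP": rp, "PP": pp, "RR": rr, "PR": pr}
```

For the two key presses `("a", 0, 100)` and `("b", 150, 260)`, for example, this yields hold times of 100 and 110 ms and an RP-latency of 50 ms.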

The performance of biometric systems is most often measured by the False Acceptance Rate (FAR) and the False Rejection Rate (FRR). These measure, respectively, the probability that an impostor will be mistaken for a legitimate user and the probability that a legitimate user will be mistaken for an impostor. The point where the two rates are equal (or as close to equal as possible) is called the Equal Error Rate (EER), which is also used to compare the performance of biometric systems, as it is where the balance between security and usability can be found.
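For concreteness, these rates can be estimated from sets of genuine and impostor match scores. The sketch below assumes higher scores mean a closer match to the template:

```python
import numpy as np

def far_frr(genuine, impostor, threshold):
    """FAR: fraction of impostor scores accepted (score >= threshold).
    FRR: fraction of genuine scores rejected (score < threshold)."""
    far = np.mean(np.asarray(impostor) >= threshold)
    frr = np.mean(np.asarray(genuine) < threshold)
    return far, frr

def eer(genuine, impostor):
    """Sweep candidate thresholds and return the operating point where
    FAR and FRR are closest, approximating the Equal Error Rate."""
    thresholds = np.unique(np.concatenate([genuine, impostor]))
    rates = [far_frr(genuine, impostor, t) for t in thresholds]
    far, frr = min(rates, key=lambda r: abs(r[0] - r[1]))
    return (far + frr) / 2
```

Raising the threshold trades a lower FAR (more security) for a higher FRR (less usability); the EER marks the crossover of the two curves.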


This chapter discusses related work in the field of keystroke dynamics, or more specifically authentication methods based on typing pattern recognition. The discussion is divided into two sections: the first, named Keyboard Keystroke Dynamics, discusses research and experiments in the field done with data collected from physical keyboards. The second section, called Touch Keystroke Dynamics, discusses research and experiments about the use of keystroke dynamics-based biometric systems on touch screen devices.

1.1 Keyboard Keystroke Dynamics

In (21), a static approach to keystroke dynamics authentication is used to improve the conventional login-password authentication method, analyzing a combination of three features: hold time, PP-latency, and RP-latency. An adaptation mechanism was employed to take human learning into account and substitute old samples of typing patterns with newer samples extracted after successful authentication attempts. The best FRR and FAR obtained in the experiment were 1.45% and 1.89%, with the FAR reaching 3.66% when impostors were allowed to watch the legitimate user type their password. Some interesting observations were that familiarity with the password impacts performance, the FRR increasing when a password is imposed on the user instead of chosen by them; and that, although the adaptation mechanism decreased both rates, keeping it always active increased the FAR, because impostors' samples were used to update the set of samples that make up the legitimate user's template.

Regarding the impact of human learning on the performance of keystroke dynamics authentication techniques, in (23) Ngugi, Kahn, and Tremaine found evidence for the hypothesis that the hold time and RP-latency of a given user decrease over time, and that this negatively impacts the performance of classifiers (algorithms that determine whether a typing pattern sample belongs to the legitimate user or to an impostor). Using a four-digit PIN, and intervals of a week and a month between extractions of typing patterns, the experiment verified that the time it takes a user to type the PIN shortens with each practice session, and the template created from the initial typing pattern samples collected on enrollment day becomes less reliable, causing the FRR to increase significantly over time.

The works discussed above propose security systems that combine keystroke dynamics with a knowledge-based authentication method. In (22), the proposed system does not require passwords, and instead collects keystroke patterns from users typing their personal data. The reasoning behind this proposal was that passwords must be complex to be secure and are therefore hard to remember, while users will hardly forget their own family name or ID. The experiment found a promising Equal Error Rate of 2.3%. It was also found that the nature of the data impacts the performance of the system, with fields like family name, email, and ID performing better than given name, for example, because of their relatively longer strings and the consequently larger amount of typing patterns collected. A relevant observation during the experiment was the superior performance obtained when combining hold time and RP-latency, which suggests a correlation between features.

On the subject of keystroke dynamics features, in (24) rough sets were used to investigate which features were significant to the authentication process. Although 47 attributes were extracted from the typing of the ID and password, one of the conclusions of the experiment is that not all of them are necessary: reducing the number of attributes by 60% increased the accuracy of the authentication method and reduced the execution time by up to 40%.

Up to this point, only experiments that take a static approach to user authentication were discussed. Typing patterns have the advantage of still being available for collection after the access control phase has passed, allowing the system to perform user authentication continuously throughout the work session. In (5), Bergadano, Gunetti, and Picardi extract keystroke dynamics attributes of individuals from free-text, or more specifically from recurring digraphs and trigraphs (combinations of two and three letters) in free-text. The relative and absolute PP-latencies of digraphs in a word determine the "distance" between two typing samples, and that distance must fall inside an acceptance threshold for the samples to be considered as belonging to the same individual. The proposed system shows accuracy higher than 90%, with hopes that a larger amount of available information would provide even better accuracy.
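The distance idea in (5) can be approximated as follows. This is a hedged reconstruction of the "degree of disorder" notion, with a simplified normalization, not the authors' exact measure:

```python
def disorder_distance(sample_a, sample_b):
    """Each sample maps digraphs to typing latencies. Shared digraphs are
    ranked by latency in both samples, and the distance is the normalized
    'degree of disorder' of one ranking relative to the other
    (0.0 = identical ordering, 1.0 = maximally scrambled)."""
    shared = sorted(set(sample_a) & set(sample_b))
    order_a = sorted(shared, key=lambda d: sample_a[d])
    order_b = sorted(shared, key=lambda d: sample_b[d])
    pos_b = {d: i for i, d in enumerate(order_b)}
    disorder = sum(abs(i - pos_b[d]) for i, d in enumerate(order_a))
    n = len(shared)
    max_disorder = n * n // 2  # maximum total displacement for n items
    return disorder / max_disorder if max_disorder else 0.0
```

Because only the relative ordering of latencies matters, the measure is tolerant to a user typing uniformly faster or slower than at enrollment.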

In the context of continuous user authentication, (8) contains an analysis of the performance of four keystroke dynamics features on the most frequent letters, digraphs, and words in English. The results show that some features perform better than others, confirming hold time as the most effective feature for classification and identifying the total typing time of the most used words in English texts as a relevant feature for classification algorithms.

In 2006, Clarke and Furnell studied the feasibility of using keystroke analysis to continuously authenticate users on mobile devices (25). Since touch screen technology was not widespread at the time, the experiment was based on a physical numeric keypad as found on a Nokia 5110, and focused on whether a typing pattern-based authentication system could perform well extracting data from the two most typical interactions with a mobile device: entering telephone numbers and writing text messages. Clarke and Furnell conclude that performance is largely dependent on the user and on how often they interact with the mobile device in ways that allow keystroke data extraction. Users that do not communicate through text messages and/or prefer to select a saved contact rather than typing their number may provide too little data for accurate authentication.


While promising, the results of (25) are of questionable relevance to the present situation due to the vast changes in mobile devices since then. The following section discusses work related to keystroke dynamics-based authentication systems on more modern devices that make use of touch screen technology.

1.2 Touch Keystroke Dynamics

In (16), Jeanjaitrong and Bhattarakosol found that keystroke dynamics can be used as an authentication method on touch screen devices with accuracy rates similar to those on physical keyboards. The comparison was made using only features that can be measured on both virtual and physical keyboards, namely hold time and RP-latency. Using only these two features, an accuracy rate of 76.44% was obtained; but since the method gets more precise as more variables are considered during classification, features that can only be extracted from touch screens can increase the rate considerably. One such feature taken into consideration in this work is the distance between buttons, under the hypothesis that different users press the same button in different spots on the screen. The best accuracy rate found combining the conventional and touch screen-specific features was 82.18%, which is acceptable considering only up to four variables were taken into account during the experiment.

Touch-specific keystroke features are evaluated in (2). Examining them separately, the lowest EERs were observed for hold time, touch-down pressure and size, and touch offsets/locations, indicating that spatial touch features should be considered for mobile keystroke biometrics - these touch features often outperformed the RP-latency, one of the most used features in related work. Examining feature sets led to the following conclusions: spatial touch features are superior to temporal features, with significantly lower EERs; and feature sets combining temporal and spatial features outperform feature sets containing only one or the other, indicating they should be combined to optimize the authentication method.

As the current work analyzes how less-than-realistic testing scenarios affect the results of an experiment, it is relevant to mention the work of Antal and Szabó (26), which evaluates the use of two-class and one-class classification algorithms for keystroke dynamics authentication on touch screen devices. Two-class classification algorithms, when choosing whether or not to accept a sample as belonging to the legitimate user, compare each sample against two classes of samples: a positive class containing samples from the legitimate user and a negative class containing samples from impostors. One-class algorithms perform the comparison with positive samples only. While the two-class classification algorithms had better performance, it is unlikely that data from other users will always be available, making the less accurate category of algorithms the one with more real-life applicability.

There have been many proposed systems for authentication via keystroke dynamics, using different techniques to classify users as legitimate or impostors by analyzing their typing patterns. For touch screen keystroke dynamics, Alghamdi and Elrefaei propose in (27) a classifier based on median vector proximity. During enrollment, the median vector is calculated over the features collected from a given user, as is the standard deviations vector. The standard deviation of a feature is used as the distance threshold from the median for classification purposes: if a sample's feature falls within the distance threshold, its Feature-Score is set to one; otherwise, it is set to zero. The Test-Score is the sum of the Feature-Scores, and in order for a user to be authenticated into the system, the Test-Score must be greater than the Pass-Score - the minimum number of features within the distance threshold required to classify a user as legitimate. The method, which extracted up to 33 touch keystroke features from free-text, had an average EER of 12.2% when requiring that at least 50% of a user's features be within the proximity threshold.
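A minimal sketch of this median-proximity scheme follows; the plain feature vectors and the one-standard-deviation acceptance band are illustrative simplifications of the paper's thresholds, not its exact formulation.

```python
import numpy as np

def enroll_median(samples):
    """Template: per-feature medians and standard deviations computed over
    the enrollment samples (rows = samples, columns = features)."""
    s = np.asarray(samples, dtype=float)
    return np.median(s, axis=0), s.std(axis=0)

def authenticate(sample, template, pass_score):
    """Feature-Score is 1 when a feature lies within one standard deviation
    of its median, else 0; the Test-Score (their sum) must be greater than
    the Pass-Score for the authentication claim to be accepted."""
    medians, stds = template
    feature_scores = np.abs(np.asarray(sample, dtype=float) - medians) <= stds
    return int(feature_scores.sum()) > pass_score
```

Setting the Pass-Score to half the number of features reproduces the "at least 50% of features within the threshold" rule mentioned above.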

Another classifier model for typing pattern-based authentication on touch screen devices was proposed by Mahmood Al-Obaidi and Al-Jarrah in (15). Results of the experiment, which collected up to 71 features, show an EER of 6.79%, significantly better than what was obtained in previous experiments using the same dataset to evaluate other classifier models (in those experiments, the EER was between 12.9% and 16.6%). The model, called Med-Min-Diff, stores in a user's template the median, minimum, and maximum values obtained during the enrollment phase for each feature. The minimum and maximum values are the basis of the lower and upper thresholds; from there, the Test-Score is calculated as in (27). The work also briefly analyzes the average coefficient of variation of each feature, in the hope of determining which of them are most efficient at distinguishing between different typists.

The system proposed in (28) differs from the previously mentioned ones in more than the classifier model used during the experiment; because of the inconsistency in size and layout of keypads across different mobile devices, it is argued that the utility of keystroke dynamics authentication systems is lower on mobile devices than on hardware keyboards (in cases when the system the user wants to be granted access to is hosted in the cloud or otherwise able to be used from different devices). The work proposes a graphical-based password as a better alternative in such circumstances, because the size and layout of the interface are the same on every device. They were able to obtain a 12.2% EER analyzing the time features extracted from clicking and touching an apparently undivided image. The EER lowered to 6.9% when combining the time and pressure features.



1.3 Chapter Summary

This chapter is about related work in the field of keystroke dynamics as a tool for biometric authentication. These papers report investigation on both modalities of keystroke dynamics in order to determine if building security measures based on typing patterns is feasible and advantageous.

Keystroke-based authentication is approached heterogeneously among the papers. Each work proposes and studies biometric systems that vary in the employed classification algorithms, feature extraction methods, type and size of input, data preprocessing, among other aspects. Analysing the performance of keystroke dynamics systems with different configurations, intents and priorities allows for the identification of how different variables may impact a system’s reliability and usability, and for the study of how to optimize systems of a specific configuration.

The analysis of a biometric system relies heavily on the data used for testing. Throughout the chapter we observed how human learning, adaptation mechanisms and familiarity with the input affect the data and consequently the system’s training, testing and performance. The next chapter will discuss keystroke dynamics biometric databases and how they aim to provide a realistic testing scenario and accurately represent a real-life application.



2 Typing Patterns Databases

Biometric systems are the technologies that extract and analyze human physiological or behavioral features for authentication purposes. To evaluate these systems in terms of performance, security and usability, it is necessary to create a realistic testing scenario through a dataset that closely resembles the data the system would deal with if it were already available to the public.

Multimodal biometric databases can be seen as a combination of different biometric databases - one for each modality. Each database provides the user’s characteristics in one modality, and combined they make up the set of characteristics of a user - what we call the user’s template. Such combination can be done at feature level, classification level or decision level, and the differences between them can have a significant impact on the performance of the systems (29).
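These fusion levels can be illustrated schematically (a minimal sketch of our own, with hypothetical function names, not code from (29)):

```python
def fuse_features(vectors):
    """Feature-level fusion: concatenate per-modality feature vectors into a
    single template that one classifier consumes."""
    return [x for v in vectors for x in v]

def fuse_decisions(decisions):
    """Decision-level fusion: majority vote over per-modality accept/reject
    decisions (classification-level fusion would combine match scores instead)."""
    return sum(decisions) > len(decisions) / 2
```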

Few if any keystroke dynamics databases contain data extracted from both physical keyboards and touch screens. In order to test the performance of an authentication system that works on both personal computers and smart phones it would be necessary to combine two or more databases, each made up of data from one type of device.

Even if the two or more databases had similar procedures for data extraction, a user’s template in the combined database would be made up of touch screen keystroke dynamics from one person and keyboard keystroke dynamics from another. This would create an unrealistic testing scenario in which, for example, the alleged user would have the typing speed of an experienced typist on mobile devices and type as slowly on keyboards as if they had just learned how to use a computer.

Yang et al. (29) proved the existence of feature dependency in multimodal biometric systems, concluding that it is important to gather physiological features from the same person to create a multibiometric database that, when used to test a system, shows the performance it would have in a real situation as accurately as possible. Similarly, if a person’s typing patterns in different kinds of devices are dependent on each other, the performance of these mixed databases would not be simply a combination of the performance of the unimodal databases they are made of.

Indeed, all relevant issues must be considered, such as wasting important information (non-biometrics, very often) when this could influence the decision making of the identification process. Likewise, the use of several modalities is not always the best solution, and similar performance rates might be reached with unimodal systems when well-balanced or more intelligent structures are used. Even the choice of the best modalities to compose a multimodal system might be unhelpfully influenced by similarities in the data itself as well as the features used.

This chapter discusses unimodal databases of keystroke dynamics and their ability to provide realistic testing scenarios for the evaluation of typing patterns-based biometric systems. Sections 2.1 and 2.2 discuss existing databases of keystroke dynamics data collected through physical keyboards and touch screens, respectively.

2.1 Data from Physical Keyboards

Killourhy and Maxion (30) collect data from 51 subjects, each providing 400 samples (the password was typed 50 times per session, during eight sessions). The password chosen for all users (".tie5Roanl") is supposedly representative of a typical strong password - a 10-character string containing letters, numbers and punctuation. The features extracted from the raw data were hold time, PP-latency and RP-latency. The realism was compromised in the following ways: users do not naturally type their passwords 50 times per session; they do not all use the same keyboard; and they do not all have the same password. Thus, while this dataset may be useful for its original purpose (evaluating anomaly-detection algorithms for arbitrary passwords), it does not represent a realistic scenario to test the security and usability of an authentication system.

Loy, Lai and Lim (31) also provide all users with the same password ("try-4mbs") and keyboard, this one modified to be sensitive to pressure, thus capturing typing pressure patterns along with the typing latency. A hundred users participated in the data collection, providing 10 samples each - a much more reasonable number of samples to require from an actual user without causing annoyance.

Vural et al. (32) collect data from three typing activities: password entries, freestyle typing and transcription of fixed text. Among the discussed databases, this one has the largest amount of text per user. It also recorded users during the data collection, providing video and sound data that can be used to enhance the performance of user authentication.

The BeiHang database (33) was acquired under real application assumptions: the user may choose their own password, repeating it during the enrollment phase four or five times; the data collection is done without supervision in two different environments; and the number of samples collected varies from subject to subject.

The works discussed in the previous chapter mostly use their own databases to test their proposed systems. Araújo et al. (21) conducted their experiments on three machines with 30 test subjects; the users’ samples were collected over different periods of time, instead of all at once. Each user tried to authenticate themselves into their own account 15 to 20 times; each account was attacked by an impostor 80 to 120 times and attacked by an observer impostor (someone that had watched the legitimate user type their password before) 12 to 20 times.

Paper of Origin                  Subjects  Type of Entry    Error Rate
Killourhy and Maxion (30)        51        fixed password   9.6% EER
Loy, Lai and Lim (31)            100       fixed password   11.78% EER
Vural et al. (32)                39        many             0.75% FAR / 3.93% FRR
Li et al. (33)                   117       user’s password  11.83% EER
Araújo et al. (21)               30        user’s password  1.6% EER
Ngugi, Kahn and Tremaine (23)    12        fixed PIN        2.0% EER
Morales et al. (22)              63        personal data    2.3% EER
Clarke and Furnell (25)          32        fixed PINs       12.8% EER
Clarke and Furnell (25)          30        fixed text       17.9% EER

Table 1 – Keystroke Dynamics Databases (Physical Keyboards)

Ngugi, Kahn and Tremaine (23) obtained their data through the following procedure: the system would present the user with a trivia question and the option to answer true or false. To confirm their answer, the user had to type a provided personal identification number (PIN), "1234". There were twelve participants, all undergraduate students at the same American university, and they were asked to repeat the experiment two more times, a week and then a month after the first session.

Morales et al. (22) do not provide in their database the strings the 63 participants entered into the system; this is because the input text is not a password, but the user’s personal information, including name, email, nationality and national ID number. The data acquisition was divided into two sessions: during the first, the user typed their personal data six times to enroll, and after at least 24 hours, they typed it six more times to try to access their own accounts, plus twelve times to try to access someone else’s account.

The data for (25) was acquired through a mobile phone stripped of all its parts except for the keypad interface and connected to a PC that received and interpreted the data. The experiment involved 32 participants and two scenarios: entry of a 4-digit PIN, and entry of an 11-digit "telephone number". Each participant repeated both entries 30 times in a single session. Another dataset was generated for the same work, this one requiring 30 subjects to enter a total of 30 "text messages" (longer passages of text) split across three sessions.

Table 1 summarizes the databases discussed above, including the number of test subjects each employed and the type of text entry required from them - whether the subjects had to type a password, a PIN or a longer text, if it was chosen by the user or if a fixed entry was provided to all users. The table also shows the equal error rate of the proposed system that used that database to measure their performance.



2.2 Data from Touch Screens

El-Abed, Dafer and Khayat present a public benchmark in (34) called RHU Keystroke. It contains keystroke dynamics conventional timing features collected from touch screen devices. While being the most easily available mobile-centered database published at the moment, it does not contain touch-specific features.

The use of touch screen devices for keystroke dynamics data collection allows the acquisition of additional typing pattern characteristics. Although Antal and Szabó (26) collected data in almost the same way Killourhy and Maxion did in (30) - by extracting 51 samples per user in a controlled environment through the typing of the same fixed password - their database contains extra features provided by the pressure and size of finger area at the moment of key press. The same database was used in (15), and it is from this experiment that the EER shown in Table 2 was obtained - their results show a significant difference in EER between the two datasets (with and without touch-specific features), going from 8.53% when the authentication is done with only timing features to 6.79% when using timing, pressure and size of touch area features.

With the goal of analyzing tapping behaviors during the input of PINs, Zheng et al. (35) collect data from over 80 users in a Samsung Galaxy Nexus. Aside from traditional timing features, among the data collected are acceleration, pressure and finger area extracted through an Android application.

Data collected from iPhones was gathered by Jeanjaitrong and Bhattarakosol in (16). The subjects were 10 randomly selected iPhone users, and to account for the difference in size of different iPhone models, the web application that extracted the data had a fixed size. Participants had to enter the password 10 times per round, for ten rounds, the password being 4 symbols from a matrix containing 16 symbols (in four rows of four). Aside from the press and release times, the database stores the coordinates where the participant touched each symbol, allowing the analysis of another touch-specific feature: the distance between touch events.

Buschek, De Luca and Alt (2) collected two independent variables: hand gestures and password. As dependent variables, keystroke timing and touch location features were recorded. 28 participants were asked to repeatedly enter six passwords that covered two lengths (6 and 8 characters) and three styles (dictionary words, pronounceable passwords and randomly generated passwords). All participants were undergraduate or graduate students; they were invited to two sessions at least a week apart, and in each session they had to type the passwords using three different hand postures. The EER of their proposed system as reported in Table 2 is the lowest equal error rate found when the user’s hand posture was one during enrollment and another during authentication.

Alghamdi and Elrefaei (27) collected free-text data from 17 users, all of whom were required to be familiar with mobile phones with touch screens before the experiment. During five sessions, each user had to enter a message of their choosing. Their proposed system obtained an EER of 12.9% using only timing features, and 12.2% when including size of touch area and pressure.

Paper of Origin                      Subjects  Type of Entry    Error Rate
El-Abed, Dafer and Khayat (34)       53        fixed password   -
Antal and Szabó (26)                 42        fixed password   6.79% EER (in (15))
Zheng et al. (35)                    80+       fixed PIN        3.65% EER
Jeanjaitrong and Bhattarakosol (16)  10        fixed password   2.0% FAR / 1.78% FRR
Buschek, De Luca and Alt (2)         28        fixed passwords  33.05% EER
Alghamdi and Elrefaei (27)           17        user’s text      12.2% EER
Chang, Tsai and Lin (28)             100       user’s password  6.9% EER

Table 2 – Keystroke Dynamics Databases (Touch Screens)

Finally, Chang, Tsai and Lin (28) asked 100 participants to enter a graphical-based password of their choice 5 times; the password was a sequence of taps on an apparently undivided image chosen by the user (both the image and the sequence). Then, 10 participants in possession of the other users’ passwords tried to attack the system, entering each legitimate user’s password 5 times. The database collected timing and pressure features, and their proposed system had an EER of 6.9% combining both.

Table 2 summarizes the databases discussed in this section including their paper of origin, the number of test subjects, type of entry and equal error rate of a system that used such database to provide their testing data.

2.3 Chapter Summary

This chapter discussed numerous databases of keystroke dynamics. Each of them had its own data extraction hardware and methods; many chose the more traditional approach of collecting data from the typing of a predetermined password, but some included less traditional or novel types of entry like free-text and graphic passwords; each collected a different set of features, including unique combinations of time-based, spatial-based and/or pressure-based features; each dealt with a different number of test subjects, varying from 10 to 100. Almost every database was put through one or more classifiers, and their performance was reported in terms of error rates that led to an analysis of how the databases’ set-up influenced the authentication performance.

Improving the performance of biometric systems can be achieved by preprocessing the data to make it more predictive. One of the ways of doing that is selecting features that are relevant to the classification process and discarding features that are not. Feature selection performed in different biometric modalities will be discussed in the following chapter.



3 Feature Selection

Biometric authentication systems must achieve high classification accuracy and meet usability requirements that make the system feasible and applicable to real-life situations, like reasonable (or, better yet, fast) data processing time (9). The complexity of training a system to recognize user patterns is proportional to the number of features the pattern is made up of, and so feature selection techniques reduce the number of features prior to classification in order to lower the system’s time and memory requirements without compromising the predictive ability of the patterns (12).

This chapter is about the selection of biometric features, employed to identify an optimal or near-optimal subset of features to enhance the performance of an authentication system. What follows is the presentation and analysis of scholarly works on the application of feature selection to keystroke dynamics and, when the number of papers was insufficient to cover many feature selection methods, to other biometric modalities.

The huge amount of features that can be extracted from a user’s typing pattern often leads to degradation of the performance of a keystroke dynamics biometric system. In (13), three different optimization algorithms are experimented with in order to discover which of them achieves a better feature reduction rate given a database of keystroke-based features. The algorithms analysed were Particle Swarm Optimization (PSO), Genetic Algorithm (GA) and Ant Colony Optimization (ACO). Each algorithm is briefly explained below.

PSO is initialized with a population of possible solutions (called particles) randomly distributed in an N-dimensional search space. PSO’s objective is to optimize a fitness function; each particle "moves" around the search space, at every iteration attempting to update the values of pBest and gBest - the best position found in the vicinity of the particle and the best position found globally - and being updated by them, as the particles are pulled towards the best local and global positions.
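A binary-PSO variant commonly used for feature selection can be sketched as follows (a simplified illustration under our own assumptions, not the implementation from (13); a particle's position is a 0/1 mask over the features):

```python
import math
import random

def pso_select(fitness, n_features, n_particles=10, iters=30,
               c1=1.5, c2=1.5, seed=1):
    """Binary PSO sketch: maximize fitness(mask) over 0/1 feature masks."""
    rng = random.Random(seed)
    sigmoid = lambda v: 1.0 / (1.0 + math.exp(-v))

    positions = [[rng.randint(0, 1) for _ in range(n_features)]
                 for _ in range(n_particles)]
    velocities = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [p[:] for p in positions]               # best position per particle
    pbest_fit = [fitness(p) for p in positions]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]    # best position globally

    for _ in range(iters):
        for i, pos in enumerate(positions):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                v = velocities[i][d] + c1 * r1 * (pbest[i][d] - pos[d]) \
                                     + c2 * r2 * (gbest[d] - pos[d])
                velocities[i][d] = max(-4.0, min(4.0, v))  # keep sigmoid sane
                # a feature is selected with probability sigmoid(velocity)
                pos[d] = 1 if rng.random() < sigmoid(velocities[i][d]) else 0
            fit = fitness(pos)
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[:], fit
    return gbest, gbest_fit
```

In a wrapper setting, fitness would train a classifier on the masked features and return its validation accuracy.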

GA initializes a population of chromosomes, conceptual representations of candidate solutions. The fitness of every chromosome (or more accurately, the fitness of the feature subset it represents) is calculated in every generation. Fit subsets are chosen from the existing population to contribute to the new generation by being recombined or mutated, until the maximum number of generations has been reached - and the best subset may have been found.
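A compact GA for feature selection along these lines might look like this (our own sketch, not the experiment's code; truncation selection and one-point crossover are our simplifying choices):

```python
import random

def ga_select(fitness, n_features, pop_size=10, generations=30,
              p_mut=0.1, seed=1):
    """Genetic-algorithm sketch: chromosomes are 0/1 feature masks."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]          # keep the fittest half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)     # one-point crossover
            child = a[:cut] + b[cut:]
            # each bit may flip with probability p_mut (mutation)
            children.append([bit ^ 1 if rng.random() < p_mut else bit
                             for bit in child])
        pop = parents + children
    best = max(pop, key=fitness)
    return best, fitness(best)
```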

ACO searches for an optimal path in a graph, based on the behavior of ants looking for food; they leave pheromone trails in the traveled path that will make it more attractive to other ants, and the strength of the pheromone trails decay over time, making shorter paths - that may lead to a smaller feature subset - more attractive than longer paths.



In (13), an Extreme Learning Machine is used as the fitness function for all three algorithms. The Ant Colony algorithm was able to provide an optimal or near-optimal subset with the lowest number of features (23 out of 43 features, a reduction rate of 46.51%), but the experiment did not reveal how well it did in improving classification accuracy or any other measure of the system’s performance.

An evaluation of PSO and GA for feature selection and the impact of these optimization techniques in error rates and processing time as well as feature reduction can be found in (11). PSO was tested adopting two acceleration coefficients (1.5 and 2.0) and three threshold values (0.5, 0.6 and 0.7); those indicate, respectively, how fast or thoroughly the particles search the feature space and the minimum likelihood that qualifies a feature to be chosen for the selected subset. PSO outperformed the Genetic Algorithm in all evaluation criteria; the PSO with 1.5 acceleration and a threshold of 0.7 achieved the best classification accuracy; the best processing time was also achieved by PSO 1.5; and PSO 2.0 obtained the best feature reduction rate, at an impressive 77.59%.

PSO, GA, the Best First Selection algorithm and the Greedy algorithm are wrapped with three different classifiers in (9) - Support Vector Machine, Naive Bayesian and K-Nearest Neighbor - in an attempt to find a combination that provides a balance between high accuracy rates and low processing time.

Greedy Selection can be approached as Forward Selection or Backward Elimination; one approach starts with an empty feature set, the other with all features, and they add/remove features until the subset has a predefined number of features or until accuracy can no longer be improved. Best-First Selection explores a graph for the most promising nodes (as defined by an evaluation function as the graph is explored), and may also search forward, starting with an empty feature set, or backwards, starting with the entire feature set. All four approaches are experimented with in (9).
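The Forward Selection variant can be sketched as follows (our own illustration; the score function stands in for, e.g., a classifier's validation accuracy on the candidate subset):

```python
def greedy_forward_select(score, n_features, max_features=None):
    """Forward Selection sketch: start empty and repeatedly add the single
    feature that most improves score(subset). Stops when no addition helps,
    or when max_features is reached."""
    selected, best = [], float("-inf")
    remaining = list(range(n_features))
    while remaining and (max_features is None or len(selected) < max_features):
        candidate = max(remaining, key=lambda f: score(selected + [f]))
        cand_score = score(selected + [candidate])
        if cand_score <= best:
            break  # accuracy can no longer be improved
        selected.append(candidate)
        remaining.remove(candidate)
        best = cand_score
    return selected, best
```

Backward Elimination mirrors this, starting from the full feature set and removing the least useful feature at each step.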

High accuracy rates and low processing time have proven throughout the literature review to be both desirable and contradictory requirements. In (9), the biggest feature reduction rate (82.5%) slightly decreases the accuracy rate of Greedy Forward Selection wrapped with K-Nearest Neighbor, and the biggest accuracy rate (94.64%) is achieved with an insignificant feature reduction rate for the combination of PSO and Naive Bayesian; but a significant feature reduction of more than 62% of the original number of features is able to slightly improve accuracy for the Best First (Forward) Selection/K-Nearest Neighbor configuration, and thus achieves the goal of the investigative process.

A GA-Support Vector Machine wrapper approach was proposed in (10), with a considerable decrease in FRR after feature subset selection (as much as 93%) and reduction rates as high as 76%. The fitness function combined three different criteria: classification accuracy, training time required and feature reduction rate, in that order of relevance. It is usual to consider only the accuracy as a performance measure, but including the other two performance measures in the equation prioritizes solutions with better applicability (lower processing time) over more time-consuming solutions with comparable accuracy.

Statistical feature selection techniques and their relationship with the system’s accuracy rates have been studied in (14). Four feature ranks were generated, and features were extracted from digraphs chosen for either (a) being typed faster than average; (b) presenting very little deviation from the average typing time; (c) presenting great deviation from the average typing time or (d) being typed more frequently. The last category contains the digraphs most representative of the users, but among the other ranking criteria, digraphs and n-graphs typed with consistent time by most users (criterion b) showed promising results.

Singh and Sinha (12) achieved feature reduction without compromising classification accuracy by selecting what they denominate Specific Features - features extracted from digraphs that are highly representative of a specific user because the values of their features are vastly different from those obtained from other users. Specific features were shown to be stable through multiple samples provided by the same user and the subset of features extracted from the specific digraphs provided a significant reduction from the original feature set.

Another hand-based biometric that shares similarities with keystroke dynamics is fingerprint dynamics - patterns of behavior during the collection of multiple fingerprint scans. A modified version of Principal Component Analysis (PCA) is employed in (36) to reduce the dimensionality of the feature vector. Some of the fingerprint dynamics features are the hold time and the latency between finger release events and finger press events, which prompts a comparison with keystroke dynamics, especially touch keystroke dynamics.

Regular PCA involves transforming the data in order to rank the features with the largest variance, following the rationale that those features have better discriminative power; the proposed version of the algorithm maintains the original values of the features, thus eliminating the computational cost of transforming the data without compromising the feature ranking. After feature selection, the K-Nearest Neighbor classifier’s performance was equivalent to its performance prior to feature selection with a reduction rate of 62%, although the Support Vector Machine Classifier saw only a decrease in accuracy after PCA-based feature reduction.
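In the same spirit (a generic sketch of our own, not the modified PCA from (36)), ranking the original features by their variance avoids any data transformation:

```python
import statistics

def top_variance_features(samples, k):
    """Keep the k features with the largest variance across samples,
    on the rationale that high-variance features discriminate better."""
    per_feature = list(zip(*samples))
    variances = [statistics.pvariance(f) for f in per_feature]
    ranked = sorted(range(len(variances)), key=lambda i: variances[i],
                    reverse=True)
    return ranked[:k]
```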

On image-based biometrics, Bashir et al. (37) propose two methods for feature selection in gait recognition systems: a supervised method performs selection through a wrapper cross-validation algorithm, focusing a greedy search on the bottom half of the images, known for containing more dynamic information; and an unsupervised method selects the pixels whose intensity levels show larger variations. Both methods performed better than existing solutions when there are variances to the person’s shape (i.e. when they’re wearing a coat or carrying a bag), with the supervised method providing the best overall performance. The unsupervised method is attractive for its superior performance on test subjects without shape alterations (99.4%), its ability to perform without accessing labeled samples and, consequently, its lower computational cost.

Liau and Isa (38) present a face-iris biometric system with a feature selection method based on minimizing the area under the DET curve. The Detection Error Trade-off curve plots the FRR on the vertical axis and the FAR on the horizontal axis, and minimizing the area underneath it is equivalent to reducing either one of the error rates or both. The PSO algorithm uses a direct computation of the area as the fitness function and managed to improve the performance of iris and face recognition individually using only 70% and 60% of the features, respectively.
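Both the EER reported throughout this review and the DET-area criterion derive from the same FAR/FRR trade-off. A minimal sketch (our own, assuming higher match scores indicate a more likely genuine user):

```python
def far_frr(genuine, impostor, threshold):
    """FAR: fraction of impostor scores accepted at this threshold;
    FRR: fraction of genuine scores rejected."""
    far = sum(s >= threshold for s in impostor) / len(impostor)
    frr = sum(s < threshold for s in genuine) / len(genuine)
    return far, frr

def eer(genuine, impostor):
    """Approximate equal error rate: the operating point where FAR and FRR
    are closest (minimizing the DET-curve area tunes thresholds similarly)."""
    def gap(t):
        far, frr = far_frr(genuine, impostor, t)
        return abs(far - frr)
    best = min(sorted(set(genuine + impostor)), key=gap)
    far, frr = far_frr(genuine, impostor, best)
    return (far + frr) / 2
```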

A kernel machine-based discriminant analysis method is proposed in (39) to deal with the nonlinearity of face images and small sample sizes. The kernel function maps the input vector into a high-dimensional feature space; that way, the non-linear and complex distributions of face patterns (caused by variations in viewpoint, illumination, facial expression, etc.) are simplified so that linear feature selection methods, like PCA and LDA (Linear Discriminant Analysis, a classification method based on finding the best linear combination of features to separate two or more classes in the sample space), can be applied to the data. The proposed method’s error rates were found to be 34% to 48% of the error rates of other commonly used approaches to face recognition.

In (40), Teodoro, Peres and Lima analyse the impact of three feature selectors on four classifiers. The studied biometric modality was electrocardiogram (ECG) signals. The selectors were PSO, GA and MA (Memetic Algorithm, a variation of the GA that includes local optimization during the evolution stage of the algorithm) and the classifiers were the Euclidean Classifier, K-Nearest Neighbor, Support Vector Machines and Optimum Path Forests. The GA failed to select a small subset of features; the PSO and Support Vector Machine combination achieved high accuracy with a very small number of features (13); but MA was the one that achieved the best classification performance: nearly 94% using the Optimum Path Forest classifier on a database of ECG signals from 290 subjects (33 selected features), and 100% using the K-Nearest Neighbor classifier on a database of 47 subjects (11 selected features).

Komeili et al. (41) proposed a feature selection method for two non-stationary biometric modalities. ECG and TEOAE (transient evoked otoacoustic emission) are two types of biomedical signals that vary across sessions and postures, presenting a challenge to their adoption as biometric modalities. The proposed method, Multisession Feature Selection (MSFS), determines a subset of more consistent features across multiple sessions from an auxiliary database with multiple recorded sessions - thus allowing one-session enrolment for an across-session test scenario. The Across-Session Experiment adopted such a scenario, and MSFS obtained 6.9% EER for ECG signal classification and 3.3% EER for TEOAE signal classification. Across-Posture ECG recognition testing was done with the subjects being enrolled in a sitting posture and tested in a standing posture or vice versa, yielding a 3.7% EER. All results were considerably better than those obtained by previous state-of-the-art selectors.

When feature selection is performed in multimodal biometric systems, one effect that may be undesirable is one of the modalities being overrepresented in the chosen feature subset, thus making authentication too heavily dependent on a single modality. In (42), Awang et al. propose feature-level fusion of two biometric modalities during feature selection, including measures to ensure a similar quantity of features is extracted from each of the two modalities (face and signature). The selector is a wrapper GA, and the fitness of a chromosome is determined by the sum of the accuracy achieved by an LDA classifier and a balance coefficient that leads the solution towards a feature subset with modalities more evenly represented. The proposed method achieved an accuracy rate of 97.50%.

In (43), PSO is used to reduce the size of the fused feature vector of face and palmprint and to optimize the system in regards to both user authentication and identification - a task that requires two fitness functions. The proposed method is compared with an Adaptive Boosting (AdaBoost) approach to feature selection, which allows customization of how many features will be in the selected subset, while PSO finds the adequate number of features through an iterative process. Both methods allow maintaining the accuracy rates of the multimodal system with fewer features, thus decreasing the computational cost. However, the PSO scheme selected a smaller subset of features for both identification and authentication - as few as 3371 and 3520 features from the original 6400, respectively, a reduction rate of roughly 45% - while AdaBoost selected 4090 features (reducing the fused feature vector by 36%).

3.1 Chapter Summary

This chapter discussed related work in selection of biometric features to improve the performance of user recognition systems. Feature selection methods for different biometric modalities are included in the literature review: keystroke and fingerprint dynamics, gait, face, iris, signature, palmprint and biomedical signals are all used as biometric traits in the related works. The chapter assessed the state of the art in feature selectors for biometric authentication and the relevance of feature subset selection for hand-based, image-based, physiological, behavioral, unimodal, multimodal, well-established and novel biometric systems.

The next chapter describes the methodology of the experiment, detailing the database, classifiers and feature selectors chosen for this analysis; the results are presented and analysed in an attempt to study the behavior of unimodal and multimodal keystroke dynamics data under different system configurations and how keystroke-based authentication can be enhanced by well-chosen combinations of classification algorithms and feature selection methods.



4 Methodology and Results

From what we have seen in the previous chapters, there is no work focusing on analysing how the same features can affect modalities which are essentially the same but are collected with very different hardware, and how feature selection may impact the conception of a reliable biometric system that combines them. To that end, we will use a very specific database, a set of well-known classifiers, and a few methods of feature selection.

This chapter explains the investigative procedure conducted to either prove or disprove the hypothesis that motivated this report. Section 4.1 gives an overview of the database used in the experiment, Section 4.2 briefly explains the classification algorithms applied to it, and Section 4.3 presents the feature selection methods under analysis.

4.1 Database

The experiment was conducted using a Brazilian hand-based behavioral biometrics database (17) that collected both keyboard keystroke dynamics and touchscreen keystroke dynamics data from the same set of subjects. The data used in this experiment was collected from 76 individuals in a controlled environment; they were asked to type the same carefully-chosen text - a mix of frequently-used words in Brazilian Portuguese that includes English cognates and, when possible, important digraphs from both languages.

In order to obtain three samples from each user, the database gathered three occurrences of the same digraph in different words, or considered two or three digraphs as one due to the proximity of their keys on the keyboard. In the end, a total of 14 digraphs were included in the database: ME, ER, RI, IC, CA, IM, IR, SE, MO, OO, DE, EL, RM and UE. The features extracted from each digraph were the RP-latency and hold time of both keys on the physical keyboard, and the RP-latency and hold time of the digraph’s second key on the virtual keyboard.

Additional features were the total typing time and standard deviation.
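For illustration, the two timing features can be computed from raw press/release timestamps roughly as below; this is a sketch under the assumption that RP-latency denotes the release-to-press interval of a digraph, and the millisecond values are invented:

```python
def hold_time(press_ms, release_ms):
    """Hold time: how long a single key stays pressed."""
    return release_ms - press_ms

def rp_latency(first_release_ms, second_press_ms):
    """Release-to-press latency of a digraph: the gap between
    releasing its first key and pressing its second; it can be
    negative when the two key presses overlap."""
    return second_press_ms - first_release_ms

# Hypothetical timestamps (ms) for the digraph "ME":
# M pressed at 0 and released at 95; E pressed at 140.
print(hold_time(0, 95))       # 95
print(rp_latency(95, 140))    # 45
```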

The work of Da Silva, Silva and Da Costa-Abreu (17) includes a preliminary analysis of the databases’ predictive ability and has reached promising results, achieving accuracy rates of as much as 100% for the multimodal and keyboard keystroke databases, and 98% for the touch keystroke database.

The database also collected each subject’s online handwritten signature, hand size and demographic information including age, gender, handedness and level of familiarity with a keyboard. That data was not used in this work, but it is noteworthy that the majority of the subjects were males under the age of 25, an overrepresented demographic in similar databases.

Paper of Origin        Subjects   Type of Entry   Attributes (Touch + Keyboard)
Da Silva et al. (17)   76         Fixed Text      66 (23 + 43)

Table 3 – Multimodal Keystroke Dynamics Database

Table 3 summarizes the characteristics of the database. Aside from information about the number of subjects and type of entry, the table also includes the quantity of attributes available in the database.

4.2 Classifiers

To classify the data in the database described above, three different classification algorithms were used - the same three algorithms used in (17), all of which can be found in the Weka toolkit.

The K-nearest neighbours classifier, abbreviated to KNN, is an algorithm that determines the distance between two templates by representing them as vectors with n attributes (n being the number of features in a template) and placing them in an n-dimensional space. KNN predicts that a template will belong to the same user as its closest neighbour(s) in that space; our experiments configured the classifier to consider only a single neighbour for classification purposes.
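A minimal sketch of this 1-NN rule over toy user templates (the user names and feature values are invented for illustration):

```python
import math

def predict_1nn(train, query):
    """Label the query template with the user of its nearest
    training template (Euclidean distance in feature space)."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    label, _ = min(((user, dist(vec, query)) for user, vec in train),
                   key=lambda pair: pair[1])
    return label

# Toy templates: (user, [feature values]).
train = [("alice", [100.0, 45.0]), ("bob", [160.0, 80.0])]
print(predict_1nn(train, [110.0, 50.0]))  # "alice"
```

The experiments described here apply the same rule, only over the full keystroke feature vectors and with real user templates.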

The Support Vector Machine (or SVM) classifier builds a hyperplane in an n-dimensional space to separate training data points by class; the hyperplane is deliberately placed where it lies furthest from the nearest training data points - or, in our case, the nearest user templates - to diminish the risk of misclassification. Multiclass problems such as the one presented in this report are solved by Weka’s SVM algorithm using pairwise classification, that is, by dividing the problem into binary classification problems and labelling the user template with the class it was most often assigned in the classification sub-problems.
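The pairwise (one-vs-one) voting step can be sketched as follows; the "binary classifiers" here are stand-in threshold functions invented for illustration, not actual SVMs:

```python
from collections import Counter

def pairwise_classify(binary_clfs, sample):
    """One-vs-one voting: each binary classifier labels the sample
    with one of its two classes; the class with most votes wins."""
    votes = Counter(clf(sample) for clf in binary_clfs)
    return votes.most_common(1)[0][0]

# Stand-in pairwise classifiers over a single feature value.
clf_ab = lambda x: "a" if x < 5 else "b"    # class "a" vs class "b"
clf_ac = lambda x: "a" if x < 8 else "c"    # class "a" vs class "c"
clf_bc = lambda x: "b" if x < 10 else "c"   # class "b" vs class "c"

print(pairwise_classify([clf_ab, clf_ac, clf_bc], 6))  # votes b, a, b -> "b"
```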

The third classifier was the Multilayer Perceptron, also known as MLP. The algorithm is made up of multiple layers of processing nodes that map the inputs (user templates) to a set of outputs (users) through a neural network model. The network is trained with backpropagation: once an input vector has been propagated all the way to the output layer, the algorithm compares the user assigned to the template with the correct user, calculates the error value and goes backwards through the layers, adjusting the weight of each connection in the network according to its contribution to that error.
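As a rough illustration of this training loop, the sketch below implements a one-hidden-layer perceptron with sigmoid units and plain backpropagation on an invented toy problem (learning the logical OR of two inputs); the layer sizes, learning rate and epoch count are arbitrary choices, not taken from the Weka implementation:

```python
import math
import random

def train_mlp(data, n_hidden=4, lr=0.5, epochs=2000, seed=0):
    """One-hidden-layer MLP with sigmoid units, trained by
    backpropagation on (input_vector, target) pairs."""
    random.seed(seed)
    n_in = len(data[0][0])
    sig = lambda z: 1.0 / (1.0 + math.exp(-z))
    # One weight row per hidden node (+1 slot for the bias input).
    w1 = [[random.uniform(-1, 1) for _ in range(n_in + 1)]
          for _ in range(n_hidden)]
    w2 = [random.uniform(-1, 1) for _ in range(n_hidden + 1)]

    def forward(x):
        h = [sig(sum(w * v for w, v in zip(ws, x + [1.0]))) for ws in w1]
        y = sig(sum(w * v for w, v in zip(w2, h + [1.0])))
        return h, y

    for _ in range(epochs):
        for x, t in data:
            h, y = forward(x)
            # Output error, pushed backwards through the layers.
            d_out = (y - t) * y * (1 - y)
            for j in range(n_hidden):
                d_h = d_out * w2[j] * h[j] * (1 - h[j])
                for i in range(n_in):
                    w1[j][i] -= lr * d_h * x[i]
                w1[j][n_in] -= lr * d_h          # hidden bias
            for j in range(n_hidden):
                w2[j] -= lr * d_out * h[j]
            w2[n_hidden] -= lr * d_out           # output bias
    return lambda x: forward(x)[1]

# Toy separable problem: learn the logical OR of two inputs.
data = [([0.0, 0.0], 0), ([0.0, 1.0], 1), ([1.0, 0.0], 1), ([1.0, 1.0], 1)]
predict = train_mlp(data)
```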


Neural networks are relevant for the processing of biometric data for being able to, when given a sample of data, test the compatibility of multiple users in parallel (7).

4.3 Feature Selectors

With the intention of analysing the impacts of feature selection on the performance of each classifier when applied to the multimodal database, four different methods of feature selection were experimented with.

The first method was a simple selection of the features that, when removed from the database, cause a decrease in the classification accuracy rates. The method, referred to from this point onwards as Manual Feature Selection, consists of removing each feature from the multimodal database one at a time and putting the remaining features through the three classifiers. The features whose removal either had no impact on the accuracy rate of a given classifier or increased it were discarded from the selected subset of features.

After following the procedure described above, the selected subset was assumed to be the sum of the features that would have been selected had the same method been applied to each of the unimodal databases individually.

This method imitates the behavior of a wrapper by taking the classifier’s performance into consideration to determine the relevance of a feature for the classification process. The difference is that wrappers evaluate the performance of the classifiers given a subset of features, searching for an optimal subset that may have any number of features. The Manual Feature Selection method instead feeds the classifiers subsets of the same size - all features but one - in order to analyse not the merit of the subset, but the merit of the missing feature. Despite being faster, this approach does not take into consideration the correlation between features, which might make a subset more or less relevant for classification than the sum of its features’ individual relevances would suggest.
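The leave-one-feature-out procedure can be sketched as follows; `evaluate` stands for any routine that trains a classifier on the listed feature indices and returns its accuracy (the toy evaluator below is invented for illustration):

```python
def manual_feature_selection(n_features, evaluate):
    """Keep a feature only if removing it lowers accuracy,
    i.e. the classifier performs worse without it."""
    all_feats = list(range(n_features))
    baseline = evaluate(all_feats)
    selected = []
    for f in all_feats:
        without_f = [g for g in all_feats if g != f]
        if evaluate(without_f) < baseline:   # accuracy dropped: f matters
            selected.append(f)
    return selected

# Toy evaluator: accuracy depends only on how many "useful"
# features remain in the subset.
useful = {0, 2}
evaluate = lambda feats: len(useful & set(feats)) / len(useful)
print(manual_feature_selection(4, evaluate))  # [0, 2]
```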

The second method of feature selection experimented with was the Correlation-based Feature Selector (44), or CFS, which selects a subset of features with high individual predictive ability but low correlation with the other features in the subset, eliminating redundant features that increase processing time without adding to the classification accuracy.

CFS accepts into the selected subset features that predict a sample’s user in areas of the sample space where other features cannot make a prediction. The subset evaluation function used by the selector to attribute a merit M_S to a subset S with k features is the following:

M_S = (k · c_uf) / sqrt(k + k(k−1) · c_ff)

where c_uf is the mean correlation between the features in S and the user class, and c_ff is the mean inter-correlation among the features in S.
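A small sketch of the standard CFS merit computation from (44); the correlation values below are invented for illustration:

```python
import math

def cfs_merit(k, c_uf, c_ff):
    """Merit of a k-feature subset: M_S = k*c_uf / sqrt(k + k(k-1)*c_ff),
    where c_uf is the mean feature-user correlation and c_ff the mean
    feature-feature inter-correlation within the subset."""
    return (k * c_uf) / math.sqrt(k + k * (k - 1) * c_ff)

# A small subset of predictive, weakly correlated features...
print(cfs_merit(5, 0.6, 0.1))    # ~1.13
# ...outscores a larger, highly redundant one.
print(cfs_merit(10, 0.6, 0.8))   # ~0.66
```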
