JISTEM J. Inf. Syst. Technol. Manag., vol. 14, no. 3


DOI: 10.4301/S1807-17752017000300005

Manuscript first received: 2017/Sep/05. Manuscript accepted: 2017/Dec/09

Address for correspondence: Jim Samuel, Ph.D., William Paterson University, 1600 Valley Rd, Room 3064, Wayne, New Jersey 07470, United States.

E-mail: [email protected]; [email protected]. Phone: 646.493.8777. Fax: 973-720-2038

INFORMATION TOKEN DRIVEN MACHINE LEARNING FOR ELECTRONIC MARKETS: PERFORMANCE EFFECTS IN BEHAVIORAL FINANCIAL BIG DATA ANALYTICS

Jim Samuel

Cotsakos College of Business, William Paterson University, Wayne, New Jersey, United States.

ABSTRACT

In conjunction with the universal acceleration in information growth, financial services have been immersed in an evolution of information dynamics. It is not just the dramatic increase in volumes of data, but the speed, the complexity and the unpredictability of ‘big-data’ phenomena that have compounded the challenges faced by researchers and practitioners in financial services. Math, statistics and technology have been leveraged creatively to create analytical solutions. Given the many unique characteristics of financial big data (FBD), it is necessary to gain insights into strategies and models that can be used to create FBD-specific solutions. Behavioral finance data, a subset of FBD, is seeing exponential growth, and this presents an unprecedented opportunity to study behavioral finance employing big data analytics methodologies. The present study maps machine learning (ML) techniques and behavioral finance categories to explore the potential for using ML techniques to address behavioral aspects in FBD. The ontological feasibility of such an approach is presented and the primary purpose of this study is propositioned: ML-based behavioral models can effectively estimate performance in FBD. A simple machine learning algorithm is successfully employed to study behavioral performance in an artificial stock market to validate the propositions.


INTRODUCTION

Exploring big data about big data, in their editorial for the journal Information Systems Research, Agarwal and Dhar (2014) reported their August 2014 Google search findings for the phrases “Big data,” “Analytics,” and “Data science”, which generated 822 million, 154 million, and 461 million results, respectively. A similar search in February 2017 for the present study yielded 832 million results for the term “Analytics” and 93.4 million results for the phrase “Financial Analytics”. Variance in the number of search results and search trends has been used to successfully predict election results, stock performance, health care trends and customer sentiment. These metrics, though vague explorative indicators, nevertheless indicate the growing importance and prominence of analytics, and specifically of ‘financial analytics’, which accounts for over 11% of the search results for ‘analytics’ (93.4 million of 832 million). Applying a similar search for “artificial intelligence” (AI) and “machine learning” (ML) in February 2017 yielded 89.6 million and 31.5 million results respectively. More enlightening is the “search trend”, which charts the number of searches for a particular term or phrase. A five-year search trend analysis for the terms “artificial intelligence” and “machine learning” showed a limited increase in searches for AI but an almost exponential upward growth curve for searches for ML. Business and societal challenges, including ‘financial markets stability’ and ‘complex challenges’ (Ketter et al., 2016), need to be tackled by going beyond traditional methods and tools into big data analytics methodologies and resources. Given the domain impact implications and the rising global interest indicated above, ‘financial analytics’, ‘big data’ and ‘machine learning’ are critically relevant phenomena in need of significant research attention so that we can gain insights into optimal management and value creation.

The paper is organized into four sections after the introduction: the growing relevance of behavioral financial big data, the development of propositions, an ML application to identify financial behavior, and a closing discussion and conclusion. This is followed by discussions of behavioral finance, machine learning and the intersection of behavioral finance and machine learning, which is the focus of the present study. Using an ontological framework, the primary proposition is then developed and presented. An explorative study is used to demonstrate the viability of the primary proposition posited in the study, followed by concluding notes.

FINANCIAL BIG DATA


Of the many dimensions of FBD, some of which have mature big data analytics models, the behavioral finance dimension has not been addressed with sufficient clarity by researchers or by practitioners in the context of big data phenomena. Behavioral finance gained attention in the 1990s (Barberis & Thaler, 2003) and is now treated as an established research domain. Behavioral finance is defined at the intersection of finance and psychology as the science of the “… application of psychology to finance, with a focus on individual-level cognitive biases” (Hirshleifer, 2015). Taking an information systems perspective, the Editor-in-Chief of MISQ emphasized the critical role of behavioral analytics across domains, stating “Precise capture of individual behavior and surrounding events also allows for spotting population trends and the impact of events..” (Goes, 2014). Behavioral effects in financial markets are driven by cognitive biases (Barberis & Thaler, 2003), emotions and feelings (Fenton-O’Creevy, et al., 2011; Cipriani & Guarino, 2005) and personalized purposes or subjective expectations (Schwartz, et al., 2010; Sahi, et al., 2013). With the advent of the internet-of-things (IoT) and its promise of rich, voluminous, continuous, real-time, high-quality consumer behavior data, FBD is set to explode in quantity, acceleration of data acquisition, types of data and the essential distinctiveness of characteristics of data, aligned with the four V’s of big data: Volume, Velocity, Variety and Veracity (Chen, et al., 2012). It becomes clear that behavioral finance, across its spectrum, will be driven and shaped by big data phenomena: root-level investor and trader behavior will be impacted by big data, corporations will be overwhelmed with the expanded FBD that becomes available, governments will seek to leverage behavioral FBD for regulatory purposes and researchers will have tremendous opportunities to provide thought leadership using innovative FBD analytics based strategies. This perspective is supported by Hirshleifer (2015), who emphasizes the need for a shift in focus for future studies in behavioral finance, a latent acknowledgement of the influence of big data in society, by calling for behavioral finance to expand into “… social finance, which studies the structure of social interactions, how financial ideas spread and evolve, and how social processes affect financial outcomes”. Thus research, big data phenomena and industry trends for financial services organizations indicate that behavioral FBD (BFBD) analytics will be a critical source of competitive advantage.

Statistical learning is not a newly developed research or application domain: it started in the 1960s, remained mostly theoretical until the 1990s and then gained momentum with the development of new algorithms which made practical application viable (Vapnik, 1999). However, the full potential of statistical learning applications only began to be realized with the advent of three forces: cross-domain development of algorithmic models, significant scaling up of processing capabilities, and big data phenomena. The unique confluence of these three powerful forces over the past decade, at a ‘kairos’ moment in human-technological progress, has led to an extraordinary acceleration in the science and application of statistical learning, which has morphed with vast reach under the broad title of “Machine Learning” (ML). Machine learning is itself considered a subset of artificial intelligence (AI), but at the time of writing of this study, ML is the science of relatively popular interest for big data analytics and organizational application. Specialized sub-domains such as “deep learning” have also been gaining traction. Machine learning has been defined in various ways, reflecting various application perspectives. Murphy (2012) defines ML by its purpose, stating “The goal of machine learning is to develop methods that can automatically detect patterns in data, and then to use the uncovered patterns to predict future data or other outcomes of interest”. Alpaydin (2014) defines ML as “… programming computers to optimize a performance criterion using example data or past


to improve based on sequence outcomes. Other categories such as active learning, hybrid learning, deep learning and semi-supervised learning are also significant. While domain-specific narratives and definitions abound, ML can be summarized as the science of using mathematical and statistical methods, technology tools, domain knowledge and information management techniques to identify patterns, positions, relationships and designs in information artifacts ranging along the continuum from simple tabular data to complex unstructured data.

PROPOSITION DEVELOPMENT

The present study posits that BFBD analytics can leverage ML techniques to create insights on user behaviors and thus create meaningful avenues for high-quality customer intelligence and trading performance. It must be noted that the traditional monolithic definition of “information”, which embodied the assumption of uniform objective interpretation and rational utility maximization, led many behavioral finance researchers to discount information as being of little use in understanding behavioral phenomena (Barberis and Thaler, 2003). However, subsequent research has shown that information influences financial decisions and behavior (Garcia, 2013), albeit the notion of “information” must be expanded (Grover, et al., 2006) beyond uniform objectivity and rationality for all financial decisions. Psychologists have distinguished between information, knowledge, belief and behavior, but have taken conflicting positions on the role of knowledge and information in human decision making (Ajzen, et al., 2011). Recent advances in information theory and domain-specific perspectives of information present information as an evolutionary multi-dimensional construct with significant behavioral implications (Floridi, 2011). The complexity of the “information” construct has been discussed in information systems research and an integrative taxonomy, explorative in nature, has been proposed (McKinney, et al., 2010). While epistemological development has progressed, we do not have any objective ontological mapping of “information” as a construct, and this makes “information” a very fuzzy construct to deal with theoretically. The present study does not attempt to define or provide any ontological analysis for “information”; it simply aims to highlight the associated complexity and move on to define two terms introduced in the present study, “Information Virtue” and “Information Token”, which are relatively tangible constructs with significant relevance for FBD analytics.

“Information Token” is used akin to the “token” concept introduced by McKinney, et al. (2010), who define information from a “token” perspective as being “… an undifferentiated commodity of data bits that are processed, not a particular relation among the bits”.


that such a variance in α would lead to changes in Informational Performance (ρ) and it would contain no implication for behavioral finance. However, if Information Virtue (α) is fixed and Information Token (τ) is varied, then any corresponding change in Informational Performance (ρ) would have significant implications for behavioral finance research and for financial service practitioners. This conceptual position is tabulated below and then discussed.

Table 1. Information Token-Performance Table.

| Virtue | Token | Performance |
|---|---|---|
| αi | τ1 | ρ1 |
| αi | τ2 | ρ2 |
| αi | τ3 | ρ3 |
| αi … | τ… | ρ… |
| αi | τn | ρn |

Traditional perspectives posited that for any specific Information Virtue (αi), based on uniform rational expectations, there will be no variance in Informational Performance (ρn) irrespective of variance in Information Token (τn). This implies that, for a given (αi), a change in Information Token “Δ (τn)” does not affect (ρn). In contrast, the present study posits that for a given (αi), (ρn) will vary with Δ (τn), assuming Δ (τn) represents sufficient variance in expression of the same (αi). Thus (ρn) is expected to be a function of (τn) for a specific (αi), given by the equation:

$$ f(g(\tau_n)) \equiv E(\rho_n \mid \alpha_i) \tag{E1} $$

Where E is the expectation of ρn for a specific αi, given by the function of g(τn), where g(τn) represents the sufficient variance function of τn. This prepares the context for addressing the classification of behavior given by the functions of (τn) based on Informational Performance (ρn). It has been amply demonstrated by research in psychology (Ajzen, et al., 2011), information systems (Delone and McLean, 2003), management (Bentzen, et al., 2011) and finance (De Bondt, et al., 2013) that information affects behavior, positioning Information Token (τn) as a direct antecedent to behavior, which is often a directly unobservable latent construct. The decision variable is more tangible and is a direct outcome of behavior, and thus Informational Performance (ρ) serves as a proxy for understanding Information Token (τn) driven behavior. Thus the above theoretical discussion leads us to the primary proposition of our paper:

Primary Proposition (Pa): “For a specific Information Virtue (αi), sufficient variance in Information Token (τn) will lead to differentiated behavior, given by levels of Informational Performance (ρn)”
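One way to make the contrast behind Pa explicit, offered here as an illustrative sketch rather than as the study's own formulation, is to write the expectation conditional on both the virtue and an individual token. The token indices j and k and the two-argument conditioning are assumptions introduced for this illustration; they do not appear in the original equations.

```latex
% Traditional view: with the virtue fixed, the token does not change expected performance
E(\rho \mid \alpha_i, \tau_j) = E(\rho \mid \alpha_i, \tau_k) \quad \text{for all } j, k

% Proposition Pa: some pair of sufficiently different tokens yields different expected performance
\exists\, j \neq k : \quad E(\rho \mid \alpha_i, \tau_j) \neq E(\rho \mid \alpha_i, \tau_k)
```

Read this way, the experiment reported later holds αi constant and looks for differences in ρ across the token conditions.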


techniques can be used to categorize individual financial performance reflecting behavioral effects in response to sufficient variance in Information Tokens (τn). This leads to the secondary proposition of this study:

Secondary Proposition (Pb): “For a specific Information Virtue (αi), behavior driven by sufficient variance in Information Tokens (τn) can be classified by machine learning classification algorithms applied to Informational Performance (ρn)”.

The secondary proposition (Pb) is explored in the present study by using performance data from an artificial stock market. However, the concluding proposition is developed and stated on an a priori basis, using inductive logic from the above discussion, combined with the logic of non-parametric analytical possibilities that are now available through machine learning and other advanced mathematical techniques, which provide useful results even with unknown probabilistic distributions of data. Machine learning techniques have been leveraged to detect patterns (Fischer and Igel, 2014) with latent variables across multiple levels using Deep Learning methods such as Deep Belief Networks (DBN) using restricted Boltzmann machines (RBMs). Machine learning and data mining techniques using Bayesian Belief Networks (BBN), fuzzy logic and genetic algorithms have been used extensively, even in the absence of complete information or sufficient prior probabilities, to solve a variety of financial services challenges, especially the detection of fraud-related patterns and behavior (Sharma and Panigrahi, 2012). Neural networks, decision trees, simulation and optimization techniques have been used extensively to develop more efficient solutions to financial markets challenges. The third and concluding proposition is built on the knowledge of these capabilities, without which it would be suspect as conjecture: ML techniques can be used to model behavior affecting financial performance just as they have already been used to classify behavior associated with financial fraud. The obvious implication, developing upon Pa and Pb, is that we can use advanced algorithmic, intelligent network and fuzzy logic methods to classify, interpret and predict expected financial performance using BFBD. This brings us to the final proposition of this study:

Concluding Proposition (Pf): “Machine learning behavioral models, based on the function of sufficient variance in Information Tokens (f(g(τn))), can effectively estimate expected performance (Ε(ρn | αi)) using behavioral financial big data”.

The mathematical concept underlying the final proposition can be expressed in the context of a collection of matrices {[F...]}, not necessarily a sum, of the application of relevant machine learning techniques (F) to a sufficiently varied Information Token (τn), which is given by the equation:

$$ \{[F * (f(g(\tau_n)))]\} \equiv E(\rho_n \mid \alpha_i) \tag{E2} $$

In a simple scenario such as a controlled artificial electronic market, where a single instance of application of a machine learning technique (Fn) to a sufficiently varied Information Token (τn) satisfies the necessary analytics condition, equation E2 can be simplified thus:

$$ F_n * (f(g(\tau_n))) \equiv E(\rho_n \mid \alpha_i) \tag{E3} $$

Based on the above, the present study uses data from an artificial stock market to study how a model can be trained on individual-level Informational Performance (ρ) data to classify behavioral responses given by the function of Information Token (τ). This leads to the next section, where a parsimonious application of a simple ML method, the k-Nearest Neighbors (KNN) algorithm, classifies behavior based on distinct information artifacts representing Information Tokens (τn). The mathematical concept used for framing the analysis can be expressed as an application of equation E1, where a specific function (KNN) is used on individual trading Informational Performance (ρ(trading)) to classify a set of seven Information Token {(τn(1:7))} conditions, given by the equation:

$$ f_{knn}(\rho_n(\text{trading}) \mid \alpha_i) \equiv E(g(\tau_n)) \tag{E4} $$

Information Virtue (αi) is held constant (source perspective) by ensuring that the effective meaning is unchanged (such as “stock price for company ‘X’ will increase today”). Information Tokens (τn) are varied using seven representational artifacts.

MACHINE LEARNING APPLICATION TO CLASSIFY FINANCIAL BEHAVIOR

A simple ML method, the k-Nearest Neighbors (KNN) algorithm, is used to classify artificial stock market trading Informational Performance. Data is drawn from a series of equity trading sessions conducted with student subjects from a large Northeast US university. Subjects were undergraduate students with limited prior exposure to electronic stock trading. Equity trading was simulated using the TraderEx (2016) platform, which has been widely used for simulations and for teaching and training in electronic market dynamics by graduate and undergraduate faculty and equity market professionals. Economically motivated subjects were introduced to the Information Token conditions and tasked to trade with the intent of maximizing end-of-day trading profits. Though the present nonparametric KNN-based ML analytics do not mandate the rigor of a parametric approach, for the sake of internal validity the sufficient distinctiveness of six Information Tokens was verified through manipulation checks using scales adapted from Dennis and Kinney (1998). This satisfies the necessary condition to test the data for behavioral effects. The Information Tokens were mutually distinguished using high and low levels of cognitive information based on deterministic, probabilistic and quantity of information. Each subject was exposed to only one Information Token condition to avoid learning effects. The last Information Token (τ7) was a no-information-guidance condition (α0), used as a base to observe information effects with greater clarity.

Table 2. Data Organization For ML Analytics.

| Token | Performance ρ | Count | Train* | Test* |
|---|---|---|---|---|
| T1 | ρ1 | 30 | 23 | 7 |
| T2 | ρ2 | 35 | 26 | 9 |
| T3 | ρ3 | 31 | 24 | 7 |
| T4 | ρ4 | 30 | 20 | 10 |
| T5 | ρ5 | 30 | 21 | 9 |
| T6 | ρ6 | 34 | 26 | 8 |
| T7 | ρ7 | 33 | 19 | 14 |
| Total | | 223 | 159 | 64 |

* Train and Test items, and therefore counts, were randomly selected by R based on an approximately 7:3 overall ratio for train:test, and are tabulated here as output.
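The uneven per-token Train and Test counts in Table 2 are what one would expect if the split was drawn over all 223 items at once rather than within each token. A minimal R sketch of such a split follows. It is an illustration under stated assumptions, not the study's script: the seed, the variable names, and the choice to draw exactly 159 training items with `sample()` are all hypothetical, so the per-token counts it prints will generally differ from those in Table 2.

```r
set.seed(1)  # arbitrary seed; the seed used in the study is not reported

# Per-token subject counts, copied from Table 2.
counts <- c(T1 = 30, T2 = 35, T3 = 31, T4 = 30, T5 = 30, T6 = 34, T7 = 33)
token <- factor(rep(names(counts), times = counts), levels = names(counts))

# Draw 159 of the 223 items for training, ignoring token membership
# (an assumed, unstratified procedure).
n_train <- 159
train_rows <- sample(length(token), n_train)
is_train <- seq_along(token) %in% train_rows

# Tabulate how many items of each token landed in each set.
split_table <- table(token = token, set = ifelse(is_train, "Train", "Test"))
print(addmargins(split_table))
```

Because the draw is not stratified, the overall ratio is fixed near 7:3 while individual tokens can deviate from it, as T7 does in Table 2 (19 training items against 14 test items).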


For the given Information Virtue (αi), with the exception of the deliberate (α0) condition, which indicated a stock price increase between 2% and 5% in all the Information Token conditions:

$$ g(\tau_1) \neq g(\tau_2) \neq g(\tau_3) \neq g(\tau_4) \neq g(\tau_5) \neq g(\tau_6) \neq g(\tau_7) \tag{E5} $$

The analysis that follows tests whether, given a specific Information Virtue (αi), behavior driven by sufficient variance in Information Tokens (τn) can be classified by the KNN algorithm applied to Informational Performance (ρn). A total of 223 Informational Performance (ρ) measures were recorded from these Information Tokens, given by each subject's net profits for the trading session. The data are divided into a training dataset and a testing dataset with a training to testing ratio of approximately 0.7:0.3. R software is used to run the KNN algorithm using appropriate R packages and the raw output is available in the appendix. The financial trading behavioral effect, Informational Performance (ρn), is summarized by the equation:

$$ (\rho_1) \neq (\rho_2) \neq (\rho_3) \neq (\rho_4) \neq (\rho_5) \neq (\rho_6) \neq (\rho_7) \tag{E6} $$

The KNN algorithm is expected to ‘train’ the model using the training dataset and apply it to the test dataset by classifying Informational Performance (ρn) measures to corresponding Information Token (τn) conditions. This set of classifications is given by the equation:

$$ f_{knn}(\rho_{(1:7)} \mid \alpha_i) \equiv \{ C_1(g(\tau_1)), C_2(g(\tau_2)), C_3(g(\tau_3)), C_4(g(\tau_4)), C_5(g(\tau_5)), C_6(g(\tau_6)), C_7(g(\tau_7)) \} \tag{E7} $$

Where each Cn represents a unique classification of behavior given by g(τn). The usefulness and the accuracy of the KNN algorithm increase with the size of the training dataset, and the algorithm is therefore very suitable for BFBD analytics, subject to it being mathematically adapted for speedy large-scale implementation. Using 159 training items, applying the KNN algorithm to 64 test items demonstrated that the algorithm was able to correctly classify the behavior by associating the Information Tokens with the corresponding Informational Performance in 57 of the 64 cases, a success rate of 89%.
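The classification step described above can be sketched in R as follows. This is a hedged illustration on synthetic data, not a reproduction of the study: the profit values, their per-token means, the seed, and k = 3 are made up for the example, and the use of `knn()` from the `class` package is an assumption, since the source says only that appropriate R packages were used.

```r
library(class)  # provides knn(); an assumed package choice, not named in the source

set.seed(42)  # arbitrary seed so the illustration is repeatable

# Synthetic stand-in data: 7 hypothetical token conditions, 30 subjects each,
# with net profit drawn around a different (made-up) mean per condition.
tokens <- factor(rep(paste0("T", 1:7), each = 30))
profit <- rnorm(length(tokens), mean = 1000 * as.integer(tokens), sd = 150)

# Random split, roughly 70% training and 30% testing.
n <- length(profit)
train_idx <- sample(n, size = round(0.7 * n))

# knn() expects matrices with one row per case, even for a single feature.
train_x <- matrix(profit[train_idx], ncol = 1)
test_x  <- matrix(profit[-train_idx], ncol = 1)

# Each test item takes the majority token among its k nearest training items.
predicted <- knn(train = train_x, test = test_x,
                 cl = tokens[train_idx], k = 3)

# Share of test items assigned to their true token condition.
actual <- tokens[-train_idx]
accuracy <- mean(as.character(predicted) == as.character(actual))
print(table(actual = actual, classified = predicted))
print(accuracy)
```

With a single numeric feature, nearness is simply the absolute difference in net profit, so tokens are separable only to the extent that they shift the profit distribution, which is the behavioral effect the propositions describe.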

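Table 3. KNN Algorithm Classification Output Comparison.

KNN Algorithm Classification Output Comparison: Actual (rows) vs. Classification (columns, cl) *

| Actual | T1 cl | T2 cl | T3 cl | T4 cl | T5 cl | T6 cl | T7 cl | Row Total |
|---|---|---|---|---|---|---|---|---|
| T1 | 7 | 0 | 0 | 0 | 0 | 0 | 0 | 7 |
| T2 | 0 | 9 | 0 | 0 | 0 | 0 | 0 | 9 |
| T3 | 0 | 0 | 6 | 0 | 0 | 0 | 1 | 7 |
| T4 | 0 | 0 | 0 | 9 | 0 | 1 | 0 | 10 |
| T5 | 0 | 0 | 0 | 0 | 9 | 0 | 0 | 9 |
| T6 | 0 | 1 | 0 | 0 | 0 | 7 | 0 | 8 |
| T7 | 4 | 0 | 0 | 0 | 0 | 0 | 10 | 14 |
| Column Total | 11 | 10 | 6 | 9 | 9 | 8 | 11 | 64 |

* Raw R output is provided in the Appendix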


Also, four of the seven erroneous items were an artifact of Information Token 7, which was the no-information token condition (T7). Looking only at the test items from T1:T6, we see a stronger success rate of 94%, with 47 successful and 3 missed classifications.

Table 4. ML Analytics Summary.

Summary of KNN Classification Success

| Tokens | Success | Missed | Success% |
|---|---|---|---|
| T1:T7 | 57 | 7 | 89% |
| T1:T6 | 47 | 3 | 94% |
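The figures in Table 4 follow directly from the counts in Table 3, and the arithmetic can be checked with a short R sketch. The counts are copied from Table 3; the variable names are hypothetical, and reading "T1:T6" as the test items whose actual token is T1 to T6 is an interpretation that matches the reported 47 and 3.

```r
tokens <- paste0("T", 1:7)

# Counts copied from Table 3: rows are actual tokens, columns are KNN classifications.
confusion <- matrix(
  c(7, 0, 0, 0, 0, 0,  0,
    0, 9, 0, 0, 0, 0,  0,
    0, 0, 6, 0, 0, 0,  1,
    0, 0, 0, 9, 0, 1,  0,
    0, 0, 0, 0, 9, 0,  0,
    0, 1, 0, 0, 0, 7,  0,
    4, 0, 0, 0, 0, 0, 10),
  nrow = 7, byrow = TRUE,
  dimnames = list(actual = tokens, classified = tokens)
)

# Correct classifications sit on the diagonal.
success_all <- sum(diag(confusion))
missed_all  <- sum(confusion) - success_all

# Keep only test items whose actual token is T1..T6 (drop the T7 row).
informed <- confusion[tokens[1:6], ]
success_informed <- sum(diag(informed[, 1:6]))
missed_informed  <- sum(informed) - success_informed

summary_table <- data.frame(
  scope   = c("T1:T7", "T1:T6"),
  success = c(success_all, success_informed),
  missed  = c(missed_all, missed_informed)
)
summary_table$success_pct <- round(
  100 * summary_table$success / (summary_table$success + summary_table$missed)
)
print(summary_table)
```

Note that the T1:T6 row still counts the one T3 item classified as T7 as a miss, since only the actual T7 items are excluded.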

DISCUSSION AND CONCLUSION

The present study has attempted to highlight three key propositions at the intersection of FBD, behavioral finance, application of analytics and the use of machine learning to classify behavior. In doing so, it has combined perspectives from information systems, behavioral finance, electronic markets and big data analytics. A new and unique perspective for identifying behavioral classifications based on Information Virtue, Information Tokens and Informational Performance has been developed and validated using a ML KNN algorithm methodology. The propositions of the present study are mainly relevant to electronic financial markets, in studying behavior in the context of information stimuli. However, the propositions could be generalized and the underlying principles could be used for understanding big data analytics in other domains.

The study has certain limitations. The primary concern, as with most experimental studies and studies using simulated data, is external validity: the data obtained from the artificial stock market came from student subjects with limited prior understanding of electronic equity markets, and it can be claimed that such subjects are more vulnerable to Information Token manipulations than professional traders and investors. However, given the benefit of a controlled artificial electronic market setting, the effects of the Information Artifacts could be clearly delineated. Another limitation is the use of a single algorithm in a single instance. It has not been the purpose of the present study to provide an exhaustive and elaborate analysis of ML techniques, but to highlight the possibilities associated with using ML methodologies (Meyer, et al., 2014) in addressing financial behavior.

The machine learning approach and the Information Virtue – Information Token – Informational


management models will be more effective in tackling such challenges and the findings of the present study are expected to serve as a contribution in this direction.

In conclusion, the present study presents a novel approach to behavioral analytics using information stimuli (Virtue), information artifact (Token) driven behavior and corresponding performance measures. Three unique propositions were developed using inductive logic, demonstrating how ML behavioral models, based on the function of sufficient variance in Information Tokens, can effectively estimate expected performance. Simply stated: ML behavioral models can be trained to effectively estimate performance in FBD. In a parsimonious validation, an application of the KNN algorithm demonstrated a fair measure of success in classifying information tokens. To the best of our knowledge, there is no other study that has addressed the issues pertaining to behavioral analytics using the information framework presented in this paper. Additional research will be required to further explore the propositions presented in this study using empirical data, particularly in expanding the depth, scope and algorithmic application in the context of the Information Virtue – Information Token – Informational Performance framework.

REFERENCES

Ajzen, Icek, and Martin Fishbein. Understanding attitudes and predicting social behaviour. (1980).

Ajzen, Icek, et al. Knowledge and the prediction of behavior: The role of information accuracy in the theory of planned behavior. Basic and Applied Social Psychology 33.2 (2011): 101-117.

Alpaydin, Ethem. Introduction to machine learning. MIT press, 2014.

Barberis, Nicholas, and Richard Thaler. A survey of behavioral finance. Handbook of the Economics of Finance 1 (2003): 1053-1128.

Bentzen, Eric, John K. Christiansen, and Claus J. Varnes. What attracts decision makers’ attention? Managerial allocation of time at product development portfolio meetings. Management Decision 49.3 (2011): 330-349.

Chen, Hsinchun, Roger HL Chiang, and Veda C. Storey. Business intelligence and analytics: From big data to big impact. MIS quarterly 36.4 (2012): 1165-1188.

Cipriani, Marco, and Antonio Guarino. Herd behavior in a laboratory financial market. The American Economic Review 95.5 (2005): 1427-1443.

De Bondt, Werner, Rosa M. Mayoral, and Eleuterio Vallelado. Behavioral decision-making in finance: An overview and assessment of selected research. Spanish Journal of Finance and Accounting/Revista Española de Financiación y Contabilidad 42.157 (2013): 99-118.

Delone, William H., and Ephraim R. McLean. The DeLone and McLean model of information systems success: a ten-year update. Journal of management information systems 19.4 (2003): 9-30.

Dennis, Alan R., and Susan T. Kinney. Testing media richness theory in the new media: The effects of cues, feedback, and task equivocality. Information systems research 9.3 (1998): 256-274.

Fenton-O’Creevy, Mark, et al. Thinking, feeling and deciding: The influence of emotions on the decision making and performance of traders. Journal of Organizational Behavior 32.8 (2011): 1044-1061.

Fischer, Asja, and Christian Igel. Training restricted Boltzmann machines: An introduction. Pattern Recognition 47.1 (2014): 25-39.

Garcia, Maria Jose Roa. Financial education and behavioral finance: new insights into the role of information in financial decisions. Journal of Economic Surveys 27.2 (2013): 297-315.

Gopal, Ram D., Ram Ramesh, and Andrew B. Whinston. Microproducts in a digital economy: Trading small, gaining large. International Journal of Electronic Commerce 8.2 (2003): 9-30.

Greiner, Martina E., and Hui Wang. Building consumer-to-consumer trust in e-finance marketplaces: An empirical analysis. International Journal of Electronic Commerce 15.2 (2010): 105-136.

Grover, V., J. Lim, and R. Ayyagari. The dark side of information and market efficiency in e-markets. Decision Sciences 37.3 (2006): 297-324.

Hirshleifer, David. Behavioral finance. Annual Review of Financial Economics 7 (2015): 133-159.

Meyer, George, et al. A machine learning approach to improving dynamic decision making. Information Systems Research 25.2 (2014): 239-263.

Murphy, Kevin P. Machine learning: a probabilistic perspective. MIT press, 2012.

Nassirtoussi, Arman Khadjeh, et al. Text mining for market prediction: A systematic review. Expert Systems with Applications 41.16 (2014): 7653-7670.

Olbrich, Rainer, and Christian Holsing. Modeling consumer purchasing behavior in social shopping communities with clickstream data. International Journal of Electronic Commerce 16.2 (2011): 15-40.

Sahi, Shalini Kalra, Ashok Pratap Arora, and Nand Dhameja. An exploratory inquiry into the psychological biases in financial investment behavior. Journal of behavioral finance 14.2 (2013): 94-103.

Schwartz, Robert, Avner Wolf, and Jacob Paroush. The dynamic process of price discovery in an equity market. Managerial Finance 36.7 (2010): 554-565.

Sharma, Anuj, and Prabin Kumar Panigrahi. A Review of Financial Accounting Fraud Detection based on Data Mining Techniques. International Journal of Computer Applications 39.1 (2012): 37-47.

Vapnik, Vladimir N. An overview of statistical learning theory. IEEE transactions on neural networks 10.5 (1999): 988-999.

Wang, Xiao-Wei, Dan Nie, and Bao-Liang Lu. Emotional state classification from EEG data using machine learning approach. Neurocomputing 129 (2014): 94-106.

APPENDIX

R raw output of the KNN algorithm, compared using:

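library(gmodels)  # provides CrossTable()

# Cross-tabulate actual test labels (rows) against KNN classifications (columns)
CrossTable(x = gg1.testLabels, y = gg1p, prop.chisq=FALSE)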

Cell Contents

|-------------------------|
|                       N |
|           N / Row Total |
|           N / Col Total |
|         N / Table Total |
|-------------------------|

Total Observations in Table: 64

| gg1p

gg1.testLabels | T1 | T2 | T3 | T4 | T5 | T6 | T7 | Row Total |

---|---|---|---|---|---|---|---|---|

T1 | 7 | 0 | 0 | 0 | 0 | 0 | 0 | 7 |

| 1.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.109 |

| 0.636 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | |

| 0.109 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | |

---|---|---|---|---|---|---|---|---|

T2 | 0 | 9 | 0 | 0 | 0 | 0 | 0 | 9 |

| 0.000 | 1.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.141 |

| 0.000 | 0.900 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | |

| 0.000 | 0.141 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | |

---|---|---|---|---|---|---|---|---|

T3 | 0 | 0 | 6 | 0 | 0 | 0 | 1 | 7 |

| 0.000 | 0.000 | 0.857 | 0.000 | 0.000 | 0.000 | 0.143 | 0.109 |

| 0.000 | 0.000 | 1.000 | 0.000 | 0.000 | 0.000 | 0.091 | |

| 0.000 | 0.000 | 0.094 | 0.000 | 0.000 | 0.000 | 0.016 | |

---|---|---|---|---|---|---|---|---|

T4 | 0 | 0 | 0 | 9 | 0 | 1 | 0 | 10 |

| 0.000 | 0.000 | 0.000 | 0.900 | 0.000 | 0.100 | 0.000 | 0.156 |

| 0.000 | 0.000 | 0.000 | 1.000 | 0.000 | 0.125 | 0.000 | |

| 0.000 | 0.000 | 0.000 | 0.141 | 0.000 | 0.016 | 0.000 | |

---|---|---|---|---|---|---|---|---|

T5 | 0 | 0 | 0 | 0 | 9 | 0 | 0 | 9 |

| 0.000 | 0.000 | 0.000 | 0.000 | 1.000 | 0.000 | 0.000 | 0.141 |

| 0.000 | 0.000 | 0.000 | 0.000 | 1.000 | 0.000 | 0.000 | |

| 0.000 | 0.000 | 0.000 | 0.000 | 0.141 | 0.000 | 0.000 | |

---|---|---|---|---|---|---|---|---|

T6 | 0 | 1 | 0 | 0 | 0 | 7 | 0 | 8 |

| 0.000 | 0.125 | 0.000 | 0.000 | 0.000 | 0.875 | 0.000 | 0.125 |

| 0.000 | 0.100 | 0.000 | 0.000 | 0.000 | 0.875 | 0.000 | |

| 0.000 | 0.016 | 0.000 | 0.000 | 0.000 | 0.109 | 0.000 | |

---|---|---|---|---|---|---|---|---|

T7 | 4 | 0 | 0 | 0 | 0 | 0 | 10 | 14 |

| 0.286 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.714 | 0.219 |

| 0.364 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.909 | |

| 0.062 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.156 | |

---|---|---|---|---|---|---|---|---|

Column Total | 11 | 10 | 6 | 9 | 9 | 8 | 11 | 64 |

| 0.172 | 0.156 | 0.094 | 0.141 | 0.141 | 0.125 | 0.172 | |