This paper describes web-enabled tools for educational data mining. The proposed web-based tools, developed using the ASP.NET framework and PHP, can help universities or institutions that offer students elective courses, and can help improve academic activities based on feedback collected from students. The ASP.NET tool uses association rule mining with the Apriori algorithm, whereas the PHP-based Feedback Analytical Tool collects student feedback on faculty and institutional infrastructure and, based on that feedback, reports the performance of the faculty and the institution. Using that data, management can improve in-house training and learn about educational trends that faculty should follow to improve the effectiveness of courses and their teaching skills.
Since the introduction of the well-known association technique known as the Apriori algorithm [1, 2], there have been many attempts to integrate this technique to improve database design, consistency checking, and querying. Han et al. [10] improved the DBMiner system to work with relational databases and data warehouses. DBMiner can perform many data mining tasks, such as classification, prediction, and association. Sreenath, Bodagala, Alsabti, and Ranka [16] adapted the Apriori algorithm to work with a relational database system. They created the Fast UPdate algorithm to search for association data when the system receives new transactions. Tsechansky, Pliskin, Rabinowitz, and Porath [17] applied Apriori to find association data across many relations in the database. Berzal, Cubero, Marín, and Serrano [4] used Tree-Based Association Rule mining (TBAR) to find association data in relational databases. They kept large itemsets in a tree structure to reduce the time cost of the association process. Hipp, Güntzer, and Grimmer [12] implemented the Apriori algorithm in C++ to work on the DB2 database system. They used the program to find association data in a Daimler-Chrysler company database.
Traditional association rule mining algorithms recognize frequent events in the form of itemsets; a widely used example of association rule mining is Market Basket Analysis (Agrawal et al., 1993). Such work was among the first to address the problem of pattern classification, using the breast cancer dataset [14] from the database. The work on association rules was then extended from patterns [1, 2, 11]: the authors explored data cube-based [2] rule mining algorithms on multidimensional databases, where each tuple/transaction consisted of multidimensional data features. In the area of multidimensional data sets [11], the authors discussed a multidimensional data model in which the multidimensional data was viewed as a value in the multidimensional space. Based on this model, efficient data mining has been performed using data cubes [2], with aggregates of dimensions computed as in [9, 10]. Rule mining is another well-studied data mining problem, and over the years many techniques have been designed to construct decision trees for mining the patterns in the data [8]. However, effective decision making requires classification in addition to association rule mining. Therefore, this paper focuses on the integration of ARM with fuzzy rule mining for better decision making.
subset in a transaction , then = 1, otherwise = 0. An itemset with one item is called a 1-itemset, and an itemset with k items is called a k-itemset. An itemset is called frequent if its support value is greater than or equal to the user-defined threshold, denoted min_supp (minimum support), that is, support(X) ≥ min_supp. We denote the set of frequent itemsets by FI. If an itemset X is frequent and no superset of X is frequent, then X is a maximal frequent itemset, and we denote the set of all maximal frequent itemsets by MFI. If X and Y are sets of items, the confidence of X and Y is the number of transactions that contain both X and Y, relative to the total number of transactions in the database that contain X. An association rule is an expression of the form X→Y, where X and Y are itemsets and X ∩ Y = ∅. A rule is valid or strong if its support and confidence values are greater than the predefined thresholds set by the user. Most association rule mining approaches use the support-confidence framework to disclose interesting rules.
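The definitions above can be made concrete with a small sketch. The toy database, the function names, and the brute-force enumeration are all assumptions made for illustration; enumerating every candidate itemset is only workable for a handful of items and is not how a practical miner would proceed.

```python
from itertools import combinations

# Hypothetical toy database: each transaction is a set of items.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
]

def support(itemset, db):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in db) / len(db)

def confidence(x, y, db):
    """Support of X together with Y, divided by the support of X."""
    return support(set(x) | set(y), db) / support(x, db)

def frequent_itemsets(db, min_supp):
    """Brute force: test every non-empty combination of items (illustration only)."""
    items = sorted(set().union(*db))
    return [frozenset(c)
            for k in range(1, len(items) + 1)
            for c in combinations(items, k)
            if support(c, db) >= min_supp]

def maximal(fi):
    """Keep the frequent itemsets that have no frequent proper superset."""
    return [x for x in fi if not any(x < y for y in fi)]

FI = frequent_itemsets(transactions, min_supp=0.5)
MFI = maximal(FI)
```

With this data and min_supp = 0.5, every single item and every pair is frequent, the full three-item set is not, and the maximal frequent itemsets are therefore the three pairs.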
Among the set of items or objects in transactional databases, relational databases. This can be used by retailers and entrepreneurs to plan advertising and improve their business. Market basket analysis finds customers' purchasing habits. The analysis is performed on the customer basket to identify frequent combinations of products. Market Basket Analysis (MBA) is a technique that helps in understanding which items are likely to be purchased together according to the association rules, primarily with the aim of identifying cross-selling opportunities. A supermarket can use this technique to place products that are frequently sold together in the same area. Direct marketers can use MBA to decide which new products to offer their customers. The application of market basket analysis is generally facilitated by data mining tools. Using this analysis, marketers can identify products in demand and learn the "combined take rates" of the products. The combined take rate is defined as how often the items are bought together. In a database, this can be answered with a query. When there are 100 products, it will take thousands of queries to find the "most popular basket". Association rule mining proposed the support-confidence measurement framework and reduced association rule mining to the discovery of frequent itemsets.
Figure 1 illustrates the framework of the study analysis. We utilized association rule mining (ARM) to determine the associations among the lifestyle risk behaviors. To avoid redundant rules, we established a support threshold of 2%, and as there were fewer rules for women than men, we used confidence thresholds of 50% for women and 60% for men. Among women, there were no rules at the 60% confidence level; among men, there were 19 similar rules at the 50% confidence level. Thus, we set the confidence threshold differently. A support threshold of 2% meant that we accepted the rule only if there was an observed frequency of 2% or greater for the possible combination of behaviors. Our sample size was 5,908 men and 8,925 women, and we believed that 2% of support could improve statistical inference. Further, a confidence threshold of 50% meant that the conditional probability of the co-occurrence of two variables was 50% or greater. We set the threshold of lift as "over 1." A lift threshold over 1 meant that we accepted only positive association rules.
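As a rough illustration of how the three thresholds act together, the sketch below computes support, confidence, and lift for one candidate rule and applies the cut-offs described above. The behavior names and the ten records are invented for the example and are not the study data; lift is computed in the usual way as confidence divided by the relative frequency of the consequent.

```python
def rule_metrics(x, y, records):
    """Return (support, confidence, lift) for the rule x -> y over a list of sets."""
    n = len(records)
    n_x = sum(x <= r for r in records)          # records containing the antecedent
    n_y = sum(y <= r for r in records)          # records containing the consequent
    n_xy = sum((x | y) <= r for r in records)   # records containing both
    supp = n_xy / n
    conf = n_xy / n_x if n_x else 0.0
    lift = conf / (n_y / n) if n_y else 0.0
    return supp, conf, lift

def accept(x, y, records, min_supp=0.02, min_conf=0.5, min_lift=1.0):
    """Keep a rule only if support and confidence reach their thresholds and lift exceeds 1."""
    supp, conf, lift = rule_metrics(x, y, records)
    return supp >= min_supp and conf >= min_conf and lift > min_lift

# Hypothetical records: each set lists the risk behaviors of one respondent.
records = (
    [{"smoking", "drinking"}] * 4
    + [{"smoking"}, {"drinking"}]
    + [set()] * 4
)
x, y = frozenset({"smoking"}), frozenset({"drinking"})
supp, conf, lift = rule_metrics(x, y, records)
```

Here the rule has support 0.4, confidence 0.8, and lift 1.6, so it passes at a 50% or 60% confidence threshold but would be rejected at, say, 90%.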
Association rule mining is a data mining technique used to extract useful trends from huge databases. Data can be retrieved from many sources, such as flat files, relational databases, and other information sources. Because different data sources are used to extract information, the problem arises of how to integrate these heterogeneous data sources. Moreover, within a single business organization, data is spread over different geographic locations in varying formats. Thus there is a huge amount of data from which to mine association rules. The main aim of this research is therefore to perform association rule mining on multiple heterogeneous data sources using a multi-agent system and then to unify the results into a knowledge base. Data mining is the process of extracting useful information from unstructured data. Data mining is also called knowledge discovery in data or data dredging. The stages of data mining include preprocessing and data reduction, which eliminates variables of no interest. The next step is to select the data mining task and execute it. The results obtained should be presented to the user in an understandable format. An association rule expresses a relationship between two values in the form a → b, that is, how likely a customer is to purchase b when purchasing a. Association rule mining is used in market basket analysis. Two measures of interest are support and confidence. Only rules that satisfy the minimum support and minimum confidence are taken into consideration. The minimum support is set by the user. It should be chosen so that it is neither too low (too many rules are generated) nor too high (no or very few rules are selected).
are not so many works in the field of software fault prediction using association mining [19-21]. This work assesses the traditional way of frequent pattern mining using the Apriori algorithm and introduces the concept of an F-measure based on the notion of correlation; that is, an association rule is generated by considering three factors, support, confidence, and correlation: A=>B [support, confidence, correlation]. Correlation is calculated using the "lift" measure. The F-measure is the linear summation of the support, confidence, and correlation of each rule with the unknown coefficients α, , and . The values of the unknown coefficients are estimated using least squares regression. The best association rules are selected according to their F-measure values: the higher the F-measure, the better the association rule. This paper applies this approach to the problem of finding predictors of software faults, i.e. OO-metrics. The fault prediction is based on the idea of discovering the best association rules within a dataset. The best association rules help to find the OO-metrics on which other OO-metrics depend. The OO-metrics that appear in most of the rules, and that appear in the antecedent, are taken as predictors and used to predict software faults for an Eclipse version. The results obtained by evaluating the classification model built with this association rule mining approach for defect prediction are promising and indicate the potential of our proposal.
Let D be the source database, R be a set of significant association rules that can be mined from D, and let Rh be a set of rules in R that are to be hidden. Privacy-preserving association rule algorithms transform database D into a database D', the released database, so that all rules in R can still be mined from D', except for the rules in Rh. Based on the privacy protection technologies used, privacy-preserving association rule mining algorithms can be commonly divided into three categories. a) Heuristic-Based Techniques: Heuristic-based techniques determine how to select the appropriate data sets for data modification. Since optimal selective data modification, or sanitization, is an NP-hard problem, heuristics are used to address the complexity issues. The methods of heuristic-based modification include perturbation, which is accomplished by replacing an attribute value with a new value (i.e., changing a 1-value to a 0-value, or adding noise), and blocking, which is the replacement of an existing attribute value with a "?". Some of the approaches used are as follows.
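Before turning to specific approaches, the two modification styles named above can be illustrated on a binary transaction table. This is a deliberately naive greedy sketch, not any published sanitization algorithm: the function names, the choice of which item to alter (`victim`), and the stopping rule (`max_count`) are all assumptions made for the example.

```python
def supp_count(db, items):
    """Number of rows in which every item of `items` has the value 1."""
    return sum(all(row[i] == 1 for i in items) for row in db)

def hide_itemset(db, items, victim, max_count, mode="perturb"):
    """Return a sanitized copy of `db` in which `items` is supported by at most `max_count` rows.

    mode="perturb" changes a 1-value to 0; mode="block" replaces it with "?".
    Rows are altered in order, which is a naive heuristic choice.
    """
    out = [dict(row) for row in db]          # leave the source database untouched
    for row in out:
        if supp_count(out, items) <= max_count:
            break
        if all(row[i] == 1 for i in items):
            row[victim] = 0 if mode == "perturb" else "?"
    return out

# Hypothetical source database D: one dict per transaction, 1 = item present.
D = [
    {"a": 1, "b": 1, "c": 0},
    {"a": 1, "b": 1, "c": 1},
    {"a": 0, "b": 1, "c": 1},
    {"a": 1, "b": 1, "c": 1},
]
D_perturbed = hide_itemset(D, ["a", "b"], victim="b", max_count=1)
D_blocked = hide_itemset(D, ["a", "b"], victim="b", max_count=1, mode="block")
```

In both released tables the itemset {a, b} is supported by a single row instead of three, so a rule built on it would fall below a support threshold of two rows; the blocked version marks the altered cells with "?" rather than presenting false 0-values.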
algorithms, it has been shown how frequent pattern datasets are generated with respect to the minimum support and minimum confidence counts. It has been observed that when the frequency requirements are lowered, the set of association rules grows rapidly, and when the number of frequent itemsets increases, more association rules are presented to the user, many of which are found to be redundant. This problem occurs in both transactional and spatial databases, and eliminating redundant rules needs to be treated as a priority. In the case of dense datasets, however, mining all possible frequent itemsets becomes less feasible. In those databases an exponential number of frequent itemsets is produced, let alone the rules generated from them. Many strategies have been proposed to tackle the efficiency factor, but they are not always found to be successful. The effectiveness of performance is measured depending on the association rule mining algorithm used. The illustration of market basket data with the Apriori and FP-Growth algorithms clearly shows the basics of association rule mining and can help anyone understand the technique. The experimental results show how association rule mining can be properly utilized for large database applications, and compare the performance of the Apriori and FP-Growth algorithms. Our experimental results in this paper confirm the vital importance of minimum support values for the effective performance of different association rule mining algorithms and for their acceptability under different problem conditions.
A different approach to frequent itemset mining, known as Frequent-Pattern Growth (FP-Growth) [5], does not require candidate generation. This approach uses a memory-resident data structure, the FP-Tree, to limit the number of database scans to two. In the first database scan, the set of global frequent 1-itemsets is identified, along with their support counts, based on the given minimum support threshold. In the second database scan, infrequent items are removed from each transaction to form the corresponding sub-transaction, whose frequent items are ordered in descending order of their support count; if two items have the same support count, they are ordered alphabetically. These sub-transactions form the paths of the FP-Tree. Each path in the FP-Tree represents an itemset/pattern along with its frequency of occurrence. Sub-transactions that share the same prefix share the same portion of the path starting from the root. Each node of the FP-Tree has two fields: the item name and its support count. All nodes of the FP-Tree are connected by bi-directional parent-child links, so that the FP-Tree can be traversed both bottom-up and top-down. The FP-Tree has a header table, which holds the global support of each item and a header link to the first occurrence of the item in the FP-Tree; this link connects the nodes of the same item to facilitate item traversal during the mining process.
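The two-scan construction described above can be sketched as follows. This covers tree building only, not the subsequent mining step, and the class name, field names, and the use of an absolute `min_count` threshold are assumptions chosen for the example rather than details taken from [5].

```python
from collections import Counter

class Node:
    """One FP-Tree node: item name, support count, parent/child links, same-item link."""
    def __init__(self, item, parent):
        self.item, self.count, self.parent = item, 0, parent
        self.children = {}   # item -> child Node (top-down traversal)
        self.link = None     # next node in the tree that holds the same item

def build_fp_tree(db, min_count):
    # Scan 1: global support count of every item, then drop infrequent items.
    counts = Counter(i for t in db for i in set(t))
    frequent = {i: c for i, c in counts.items() if c >= min_count}

    root = Node(None, None)
    header = {i: [c, None] for i, c in frequent.items()}  # item -> [support, first node]
    last = {}                                             # item -> most recently created node

    # Scan 2: insert each sub-transaction, ordered by descending support, ties alphabetical.
    for t in db:
        items = sorted((i for i in set(t) if i in frequent),
                       key=lambda i: (-frequent[i], i))
        node = root
        for i in items:
            child = node.children.get(i)
            if child is None:
                child = Node(i, node)
                node.children[i] = child
                if header[i][1] is None:
                    header[i][1] = child       # header link to the first occurrence
                else:
                    last[i].link = child       # chain nodes of the same item
                last[i] = child
            child.count += 1                   # shared prefixes accumulate counts
            node = child
    return root, header

# Hypothetical transactions.
db = [["a", "b"], ["b", "c"], ["a", "b", "c"], ["b"]]
root, header = build_fp_tree(db, min_count=2)
```

For this data the item order is b, a, c, so all four transactions share the prefix b; item c ends up on two different paths, and following the header link for c visits both nodes.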
This research aims to reduce the number of redundant association rules and retain the remaining rules that are important for predicting future events. Kannan and Bhaskaran [4] proposed an algorithm for reducing redundant rules by clustering association rules into many groups and then cutting redundant rules using interestingness measures. Mutter et al. [5] used the CBA (Confidence-Based Association Rule Mining) algorithm to reduce the number of association rules. They ranked rules by confidence value and then output the rules for the top hundred association patterns. The work presented in this paper differs from others in that we use an associative classification technique to rank and reduce association rules.
Automated breast cancer detection has been studied for more than two decades. Mammography is one of the best methods for breast cancer detection, but in some cases radiologists face difficulty in detecting the tumors. We have described a comprehensive set of methods in a uniform terminology, to define the general properties and requirements of local techniques and to enable readers to select the efficient method that is optimal for a specific application in the detection of microcalcifications in mammogram images. In this paper, a new method for association rule mining is proposed. The main features of this method are that it only scans the transaction database once, it does not
The generated associations can be employed to perform video classification. For example, using video association mining we can determine the nature of a movie and classify movies into categories such as romance, tragedy, comedy, etc. An associative classification algorithm discovers association rules subject to a frequency count (minimum support) and a ranking threshold (minimum confidence) while restricting the rules to the concepts (class labels). Classification using association rule mining takes advantage of its high accuracy and ability to handle large databases. It is another major predictive analysis technique, which aims to discover a small set of rules in the database that forms an accurate classifier [6]. Three main research directions for associative classification have emerged. One is to improve the support and confidence measurements themselves. Another is to use other evaluation criteria such as lift, coverage, leverage, and conviction. The last is to use an integrated algorithm to generate association rules. Well-known classification algorithms used in this context are the decision tree classifier, the support vector machine classifier, and the traditional association rule classifier.
Association is a powerful data analysis technique that appears frequently in the data mining literature [36], [37]. Since its introduction by Agrawal et al., the task of association rule mining has received a great deal of attention [41]. Today the mining of such rules is still one of the most popular pattern-discovery methods in KDD. Association Rule Mining (ARM) is the process of discovering collections of data attributes that are statistically associated in the underlying data. Association rules "aim to extract interesting correlations, frequent patterns, associations or causal structures among sets of items in the transaction databases or other repositories". A generated association rule has the structure A → B, where A and B are disjoint conjunctions of attribute-value pairs. Association rule generation is a two-step process. First, minimum support is applied to find all frequent itemsets in a database. In the second step, the frequent itemsets and the minimum confidence constraint are used to form rules. The main advantages of association rules are simplicity, intuitiveness, and freedom from model-based assumptions. An important application of association rule mining is market basket analysis, a well-known tool among retail enterprises; for example, the rules inform the user about the items most likely to be purchased by a customer during a visit to the retail store. Association rules are also widely used in many other areas, such as telecommunication networks, market and risk management, inventory control, and more [39].
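The two-step process can be sketched in a few lines. The function names and the toy data are assumptions for illustration, and step one is done here by counting every subset of every transaction, which is only workable for very small baskets; the point of the sketch is the second step, where each frequent itemset is split into an antecedent A and a consequent B.

```python
from collections import Counter
from itertools import combinations

def frequent_itemsets(db, min_supp):
    """Step 1: relative support of every itemset that reaches min_supp (tiny data only)."""
    counts = Counter()
    for t in db:
        items = sorted(t)
        for k in range(1, len(items) + 1):
            counts.update(frozenset(c) for c in combinations(items, k))
    n = len(db)
    return {s: c / n for s, c in counts.items() if c / n >= min_supp}

def generate_rules(freq, min_conf):
    """Step 2: form rules A -> B from each frequent itemset and keep the confident ones."""
    rules = []
    for itemset, supp in freq.items():
        for k in range(1, len(itemset)):
            for a in combinations(sorted(itemset), k):
                a = frozenset(a)
                conf = supp / freq[a]   # every subset of a frequent itemset is frequent
                if conf >= min_conf:
                    rules.append((a, itemset - a, supp, conf))
    return rules

# Hypothetical market baskets.
db = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"milk", "eggs"},
    {"bread", "milk"},
]
freq = frequent_itemsets(db, min_supp=0.5)
rules = generate_rules(freq, min_conf=0.8)
```

With these baskets the frequent itemsets are the three single items plus {bread, milk} and {eggs, milk}; at a minimum confidence of 0.8 only bread → milk and eggs → milk survive, while milk → bread (0.75) and milk → eggs (0.5) are dropped.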
Abstract - Data mining is often considered a process of automatic discovery of new knowledge from large databases. However, the role of the human within the discovery process is essential. Domain knowledge consists of information about the data that is made available by domain experts. Such knowledge constrains the search space and enhances the performance of the mining process. We have developed an algorithm that makes use of domain knowledge for efficient mining of association rules from a university course enrollment database. The experimental results show that the developed algorithm mines association rules from the university elective course dataset faster than an association rule mining algorithm that does not make use of domain knowledge.
A recurrent pattern is a set of items, subsequences, substructures, etc. that occurs frequently in a data set. Finding such patterns is the most important problem in association mining. Data mining is the efficient discovery of interesting patterns from large collections of data. Association rule mining is a significant data mining technique for generating correlations and association rules. An association rule is of the form A => B, where A ⊂ I, B ⊂ I, and A ∩ B = ∅. The rule A => B holds in the transaction set D with support supp., where supp. is the percentage of transactions in D that contain A ∪ B (i.e., the union of sets A and B, or, say, both A and B). This support is taken to be the probability P(A ∪ B). The rule A => B has confidence c in the transaction set D, where c is the percentage of transactions in D containing A that also contain B.
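Written out as formulas, the two measures defined above take the following form. The notation supp(·) and conf(·) and the numbers in the worked case are assumptions introduced only for this illustration, and the block assumes the amsmath package for \operatorname.

```latex
\[
\operatorname{supp}(A \Rightarrow B) = P(A \cup B)
  = \frac{|\{T \in D : A \cup B \subseteq T\}|}{|D|},
\qquad
\operatorname{conf}(A \Rightarrow B) = P(B \mid A)
  = \frac{\operatorname{supp}(A \cup B)}{\operatorname{supp}(A)}.
\]
% Hypothetical numbers: |D| = 1000 transactions, 200 contain A, 150 contain both A and B.
\[
\operatorname{supp}(A \Rightarrow B) = \frac{150}{1000} = 15\%,
\qquad
\operatorname{conf}(A \Rightarrow B) = \frac{150/1000}{200/1000} = \frac{150}{200} = 75\%.
\]
```

Note that P(A ∪ B) here denotes the probability that a transaction contains every item of A and of B, not the probability that it contains A or B.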
Although the ARM technique does not involve model selection, it requires a cut-off support threshold to be predefined to separate frequent patterns from infrequent ones. Two itemsets are said to be associated if they co-occur frequently, and only the frequent ones are reported. There are major disadvantages to having a predefined threshold. Firstly, some rules are inevitably lost if the support threshold is set inaccurately. In addition, it is usually not possible to remove the support threshold in order to find infrequent items, because ARM relies on the downward closure property of support, which requires a threshold to search for frequent itemsets. That is, if an itemset passes a minimum support requirement, then all its subsets also pass this requirement. If this threshold is waived, there is no pruning opportunity, which results in an exponential search space. As a result, the search cannot be completed within a feasible time. In summary, traditional association rule mining needs a minimum support threshold, and this threshold should be determined accurately in order to produce useful rules for users.
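The role of downward closure in pruning can be seen in a minimal level-wise search in the style of Apriori. The function name, the absolute `min_count` threshold, and the toy data are assumptions for the example; the sketch favors clarity over efficiency and rescans the data for every candidate.

```python
from itertools import combinations

def apriori(db, min_count):
    """Level-wise search: a k-itemset is counted only if all its (k-1)-subsets are frequent."""
    db = [frozenset(t) for t in db]

    def count(candidate):
        return sum(candidate <= t for t in db)

    # Level 1: frequent single items.
    level = {frozenset([i]) for t in db for i in t}
    level = {c for c in level if count(c) >= min_count}

    frequent, k = {}, 1
    while level:
        frequent.update({c: count(c) for c in level})
        k += 1
        # Join frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune by downward closure: discard candidates with an infrequent subset.
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent for s in combinations(c, k - 1))}
        level = {c for c in candidates if count(c) >= min_count}
    return frequent

# Hypothetical transactions.
db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
with_threshold = apriori(db, min_count=3)
without_threshold = apriori(db, min_count=1)
```

With a threshold of three transactions the search stops after the pairs; with the threshold effectively waived (min_count = 1) nothing is ever pruned and every itemset that occurs in the data is returned, which is the exponential growth the paragraph above warns about.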
Thus, from the above example one can deduce how association rule mining functions. The resulting rules vary with diversity in population, region, individual likes and dislikes, and several other parameters. Therefore, although the algorithm is simple to implement, its outcomes always vary with the data set to which it is applied and with the environmental factors taken into account.