Time-Saving Approach for Optimal Mining of Association Rules

Big data is an important research topic that has attracted considerable attention. Huge numbers of data sets sit unused and redundant in the databases of companies, universities, etc. Discovering the unused and redundant information stored in these databases rests on an efficient KDD (Knowledge Discovery in Databases) process. KDD not only retrieves data or lets researchers find new information in data [1], but can also reveal the patterns and relationships among large amounts of data in one or several data sets. The KDD process draws on several techniques from statistics and artificial intelligence across a variety of activities, chiefly [2-11]: association rules, clustering, classification, regression and prediction. We are mainly interested in association rule mining, together with classification and clustering, two major data mining applications in which pattern mining is extensively used to transform raw data into a pattern-based description that classification and clustering algorithms can accept and process. In this context, the patterns occurring in the data are simply treated as features that characterize the data; patterns describing the data are also called explanatory variables. Association rule mining, meanwhile, is one of the most common algorithm-based data…
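As a small illustration of this pattern-based description, a sketch (with invented patterns and transactions) of how mined patterns become binary explanatory variables for a classifier or clusterer:

```python
# Sketch: turn raw transactions into a pattern-based binary feature matrix,
# where each mined pattern becomes one explanatory variable.
patterns = [frozenset({"bread", "milk"}), frozenset({"beer"})]
transactions = [{"bread", "milk", "eggs"}, {"beer", "chips"}, {"milk"}]

# A transaction gets feature value 1 when it contains the whole pattern.
features = [[int(p <= t) for p in patterns] for t in transactions]
print(features)   # [[1, 0], [0, 1], [0, 0]]
```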

Finding Association Rules through Efficient Knowledge Management Technique

As shown in Figure 1, our algorithm is implemented as the data integration module to efficiently update the association rules. The large 1-itemsets found by the TBAR algorithm are saved in the knowledge base. In our update algorithm we reuse those large 1-itemsets from the knowledge base, saving CPU time and one scan of the database. As described in [6], an association rule algorithm can be coupled with a relational database in a number of ways. In our case we opted for the loosely coupled approach, since our data mining application's process space lies outside the database process space.
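A minimal sketch of this caching idea, assuming a counter-based knowledge base (the class and method names are illustrative, not the paper's TBAR implementation):

```python
# Sketch: cache item counts so that updating the large 1-itemsets only
# requires scanning the newly added transactions, not the whole database.
from collections import Counter

class KnowledgeBase:
    def __init__(self):
        self.counts = Counter()       # persisted large 1-itemset counts
        self.n_transactions = 0

    def update(self, new_transactions, min_support=0.1):
        for t in new_transactions:    # one scan over the increment only
            self.counts.update(set(t))
        self.n_transactions += len(new_transactions)
        threshold = min_support * self.n_transactions
        return {i: c for i, c in self.counts.items() if c >= threshold}

kb = KnowledgeBase()
print(kb.update([{"a", "b"}, {"a", "c"}, {"b", "c"}, {"a"}]))
```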

A pragmatic approach on association rule mining and its effective utilization in large databases

This paper deals with the effective use of association rule mining algorithms in the large databases of business organizations, where the volume of transactions and items is crucial for decision making. Frequent itemset generation and the creation of strong association rules from the frequent itemset patterns are the two basic steps in association rule mining. We take a suitable market basket illustration for generating frequent itemset patterns and association rules with the Apriori algorithm, and use the same illustration for FP-Growth mining, constructing an FP-Growth tree for frequent itemset generation, from which strong association rules are created. Experiments have been performed to study the performance of the Apriori and FP-Tree algorithms. The transactions mimic the customer purchase behaviour seen in food outlet environments. Using a synthetic data generation process, the observations have been plotted as minimum support count against execution time. The graphs show that as the minimum support values decrease, the execution times of both algorithms increase exponentially, because lowering the minimum support threshold makes the number of itemsets in the output grow exponentially. The graphs establish that FP-Growth outperforms the Apriori algorithm for all problem sizes by roughly a factor of 2, from high minimum support values down to very low support magnitudes.
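A minimal Apriori sketch over toy basket data (items, baskets and the support threshold are illustrative, and the subset-pruning step is omitted for brevity):

```python
# Sketch of Apriori frequent-itemset generation: count candidates level by
# level and keep those whose support clears the minimum threshold.
from itertools import combinations

def apriori(transactions, min_support):
    n = len(transactions)
    frequent = {}
    candidates = list({frozenset([i]) for t in transactions for i in t})
    while candidates:
        counts = {c: sum(c <= t for t in transactions) for c in candidates}
        survivors = {c: s / n for c, s in counts.items() if s / n >= min_support}
        frequent.update(survivors)
        keys = list(survivors)
        # Join step: build (k+1)-item candidates from surviving k-itemsets.
        candidates = list({a | b for a, b in combinations(keys, 2)
                           if len(a | b) == len(a) + 1})
    return frequent

baskets = [{"bread", "milk"}, {"bread", "butter"},
           {"bread", "milk", "butter"}, {"milk"}]
for itemset, support in apriori(baskets, 0.5).items():
    print(sorted(itemset), support)
```

Lowering min_support lets exponentially more candidates survive each level, which matches the exponential growth in execution time reported above.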

A Hybrid Approach to Privacy Preserving in Association Rules Mining

Duraiswamy et al. proposed an algorithm called SRH, which reduces complexity, time and memory by calculating the number of transactions required for hiding sensitive rules [11]. Menon et al. proposed an exact approach based on integer programming, along with two strategies, Blanket and Intelligent; this algorithm has the best level of accuracy [12]. Verykios et al. proposed two algorithms, WSDA and BA: WSDA hides sensitive rules using distortion techniques, and BA does the same using a blocking technique [13]. Amiri proposed three algorithms, Aggregate, Disaggregate and Hybrid, which hide sensitive rules using a support-based approach [14].

CONDCLOSE-new algorithm of association rules extraction

Indeed, according to Table 1, RETAIL contains a high number of objects (88 162) while the average number of attributes associated with an object is low (10), which explains why it is sparse. The processing time of the first step of CONDCLOSE is reduced thanks to the use of pseudo contexts during the extraction of the minimal generators. We also notice that the condensed context and the small size of the positive border (GBd+) reduce the time of the second step with respect to PRINCE.

Investigation of source code mining using novel code mining parameter matrix: recent state of art

Software projects without a proper technique for detecting defects in developed modules may yield software full of problems that does not produce the output set out in the customer's requirement specification. The requirements are generated during mutual discussion between the client and the development company in the initial phases, and development companies are liable to deliver software with the agreed performance. Hence software companies put a lot of effort into detecting and resolving defects. However, detecting and fixing defects is time consuming, and ignoring them may lead to malfunctions costing anywhere from a small penny to millions of dollars. Software development companies deal with defects that are known and predictable, and sometimes with unknown and unpredictable ones. Known defects can be handled with pre-planned development strategies, are generally less time consuming, and do not disturb the cost and time estimates for the project. Unknown defects, however, are unpredictable, so their resolution cannot be pre-defined. Development organizations therefore expend huge effort deploying multiple prediction techniques that detect unknown defects in upcoming modules from the defect matrix generated during the development phases. Early documents demonstrated the use of the software defect matrix to characterize time complexity, memory requirements and development cost in terms of time to market; in-depth calculation steps must be executed to determine the number of defects in a software module or the complete program. Recent research, however, has used the software defect matrix to form guidelines for defect detection. Parallel research has also demonstrated good classification techniques for defect matrices that focus on one specific objective and ignore the remaining matrices for that objective. The rest of the matrices can also…

Criterion for selection the optimal physical and chemical properties of cobalt aluminate powder used in investment casting process

The first stage of the research work covered the investigation of the physical and chemical properties of cobalt aluminate manufactured by three different companies: Remet, Mason Color and Permedia Lublin. The grain size distribution of the cobalt aluminate powder, the average diameter and morphology of the powder particles, the phase composition, the sodium and cobalt content, the pH value of the water suspension and the bulk density were determined. In the next step, ceramic moulds were made with different kinds of cobalt aluminate (Mason Color, Remet, Permedia Lublin) and its concentration (0,5%) in the primary slurry. Samples of stepped shape were poured in the ceramic moulds prepared earlier, and the average grain size of the γ phase was determined on the stepped samples.

An Intelligent Association Rule Mining Model for Multidimensional Data Representation and Modeling

This paper presents a new algorithm called Fuzzy-TARM (FTA) to classify the breast cancer dataset. In this work, ARM is used to reduce the search space of the multidimensional breast cancer dataset and fuzzy logic is used for intelligent classification. The dimension of the input feature space is reduced to one third by using ARM. FTA was applied to the Wisconsin breast cancer dataset to evaluate the overall system performance. This research demonstrated that ARM can be used to reduce the dimension of the feature space and that the proposed model can be used to obtain fast automatic diagnostic systems for other cancer diseases.

Selective Marketing for Retailers to promote Stock using improved Ant Colony Algorithm

In 1993, Agrawal [15] introduced association rule mining as one of the most important techniques of data mining for point of sale (POS) systems in supermarkets. The main intention of association rule mining is to extract interesting patterns of data from huge data repositories [3]. A rule is defined as an implication of the form A => B, where A ∩ B = ∅. The left-hand side of the rule is called the antecedent and the right-hand side the consequent. For example [2,17], the rule {onions, potatoes} => {beef} found in the sales data of a supermarket would indicate that if a customer buys onions and potatoes together, then the customer is likely to buy beef as well. Such information is useful for decisions about marketing activities. Association rules are also used in many applications including web usage mining, intrusion detection and bioinformatics. Let I = {i1, i2, i3, …, im} be a collection of items and T a collection of transactions over those items, where every transaction has an identifier TID [4]. An association rule A => B is such that A ⊆ I and B ⊆ I; A is called the premise and B the conclusion. The support, S, is defined as the proportion of transactions in the data set which contain the itemset. The confidence is defined as a conditional probability…
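A worked example of these two measures for the rule {onions, potatoes} => {beef} on a toy transaction set (the data is invented):

```python
# Support and confidence of the rule {onions, potatoes} => {beef}.
transactions = [
    {"onions", "potatoes", "beef"},
    {"onions", "potatoes"},
    {"milk", "bread"},
    {"onions", "potatoes", "beef", "bread"},
]
antecedent, consequent = {"onions", "potatoes"}, {"beef"}

n = len(transactions)
both = sum(antecedent | consequent <= t for t in transactions)
ante = sum(antecedent <= t for t in transactions)

support = both / n            # P(A and B): 2/4 = 0.5
confidence = both / ante      # P(B | A):   2/3 ~ 0.67
print(f"support={support:.2f}, confidence={confidence:.2f}")
```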

An Efficient Approach to Prune Mined Association Rules in Large Databases

As the number of attributes and the number of transactions become large, thousands of rules are generated from a database. As the number of rules becomes huge, it is difficult for the data miner to analyze the mining results, and often impossible to use them. Thus, it is crucial to help the decision-maker with an efficient technique for reducing the number of rules. The interestingness of a rule strongly depends on interactivity with the user, and existing methods do not guarantee that interesting rules can be extracted. To select the interesting rules, the user's knowledge should be expressed in an accurate and understandable form. In data mining, a background knowledge ontology organizes domain knowledge and plays important roles at several levels of the knowledge discovery process. An ontology provides an explicit representation of the concepts in a domain, where each concept is a collection of items; an instance of a concept represents a ground-level item. The subsumption relation between concepts captures is-a superclass and is-a subclass relations, while the concept-instance relation represents the relation between concepts and their instances. Under the subsumption relation (≤) there are two types of concepts: leaf concepts and generalized concepts. Leaf concepts are connected to the database in the simplest way: each is associated with one item in the database. Generalized concepts are concepts that subsume other concepts in the ontology; a generalized concept is connected to the database through the concepts it subsumes.
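A minimal sketch of such an ontology as an is-a dictionary, with leaf concepts standing for database items and generalized concepts reaching the database through the leaves they subsume (the hierarchy and item names are invented):

```python
# Sketch: a tiny is-a hierarchy; leaf concepts map 1:1 to database items,
# generalized concepts subsume other concepts.
IS_A = {                      # child -> parent (subsumption)
    "cola": "soft_drink",
    "lemonade": "soft_drink",
    "soft_drink": "beverage",
    "coffee": "beverage",
}

def ancestors(concept):
    """All generalized concepts that subsume `concept`."""
    out = []
    while concept in IS_A:
        concept = IS_A[concept]
        out.append(concept)
    return out

def covers(generalized, item):
    """True if a generalized concept subsumes a leaf item."""
    return generalized in ancestors(item)

print(ancestors("cola"))              # ['soft_drink', 'beverage']
print(covers("beverage", "coffee"))   # True
```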

Mining rare associations between biological ontologies.

Two tasks that come close to ours are presented in [25] and [15]. The former applied the standard Apriori algorithm to connect 238 GO terms (i.e. only a small part of the data) of three GO branches, Molecular Function (GO-MF), Cellular Component (GO-CC) and Biological Process (GO-BP), by cross-ontology rules. The same task was previously addressed in [26] by three different approaches: the first based on similarity in the vector space, the second on the statistical analysis of co-occurrences of GO terms, while the third also dealt with AR mining in the standard setting. To the best of our knowledge, the most recent work in the area of cross-ontology rule mining was reported in [27,28]. Both approaches were developed explicitly for mining cross-ontology multi-level association rules between three GO branches. The first uses bottom-up generalization of rules level by level and a Monte Carlo simulation for its termination, applying the Apriori algorithm at each iteration. The second generalizes GO terms to all their ancestors and requires only a single pass through the Apriori algorithm. A standard Apriori implementation with support, confidence and Chi-square thresholds was used in both cases. Additionally, several pruning criteria, e.g. for cross-ontology or ancestor rules, were employed to remove closely related, irrelevant or known rules. More general rules were also pruned unless their confidence difference compared to a child rule was greater than 10%. No reason was given for the choice of this particular value, nor any recommendation for setting the parameter.
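A sketch of the confidence-difference pruning criterion described above; the exact comparison used in [27,28] is not stated here, so this assumes the general rule must beat its child's confidence by more than the threshold (rule names are invented):

```python
# Sketch: prune a more general (ancestor) rule when its confidence does not
# exceed that of its child rule by more than a threshold (10% in the text).
def prune_general_rules(confidences, parent_of, threshold=0.10):
    """confidences: {rule: conf}; parent_of: {child rule: general rule}."""
    kept = dict(confidences)
    for child, parent in parent_of.items():
        if parent in kept and confidences[parent] - confidences[child] <= threshold:
            del kept[parent]      # too close to the more specific rule
    return kept

rules = {"GO:a => GO:x": 0.80,    # child (more specific) rule
         "GO:A => GO:x": 0.85}    # its generalization
print(prune_general_rules(rules, {"GO:a => GO:x": "GO:A => GO:x"}))
# {'GO:a => GO:x': 0.8}  (the general rule gains only 0.05, so it is pruned)
```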

Delays prediction using data mining techniques for supply chain risk management company

A decision tree graph represents choices and their results in the form of a tree: the nodes represent events and the edges decision rules. A decision tree classifies instances by sorting them from the top (root node) down to a leaf node. It first selects an attribute to place in the root node, making one branch for each possible value (Larose & Larose, 2014). This process is repeated recursively for each branch, using only the instances that reach that branch, with the best attribute again selected for testing at that point of the tree. When all instances of a node have the same classification, growth is stopped for that part of the tree. The goal is to have leaf nodes that are as pure as possible (each leaf representing only records of the same class) while discriminating between the classes (Larose & Larose, 2014). A decision tree can also be represented as a set of if-then rules to improve understandability (Rokach & Maimon, 2010). Different algorithms exist for learning decision trees, such as CHAID (Chi-square automatic interaction detection), which uses the p-value from a significance test to measure the desirability of a split (Milanović & Stamenković, 2016), and Iterative Dichotomiser 3 (ID3), which uses entropy to measure the homogeneity of a sample: if the sample is completely homogeneous the entropy is zero, and if it is equally divided the entropy is equal to one (Khedr, Idrees, & El Seddawy, 2016). We ran our decision trees several times with different parameters (maximum number of branches and maximum depth) to obtain the optimal combination. The misclassification method was chosen as the assessment measure to select the best tree; according to SAS Enterprise Miner, it selects the tree with the smallest misclassification rate.
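A small sketch of the entropy computation ID3 relies on (the label values are illustrative):

```python
# Sketch of the entropy measure used by ID3: 0 for a completely homogeneous
# sample, 1 for a two-class sample that is equally divided.
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    # "+ 0.0" normalizes the signed zero produced by a pure sample.
    return sum(-c / n * log2(c / n) for c in Counter(labels).values()) + 0.0

print(entropy(["late"] * 4))                            # 0.0 (homogeneous)
print(entropy(["late", "late", "on_time", "on_time"]))  # 1.0 (equal split)
```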

Optimal rules for monetary policy in Brazil

It is noticeable that the reaction coefficient to current inflation, under the assumption of very low interest rate smoothing and equal weights for inflation and output stabilization…


Are one-sided S,s rules useful proxies for optimal pricing rules?

This article is motivated by the prominence of one-sided S,s rules in the literature and by the unrealistically strict conditions necessary for their optimality. It aims to assess whether one-sided pricing rules could be an adequate individual rule for macroeconomic models, despite their suboptimality, by answering two questions. First, since agents are not fully rational, is it plausible that they use such a non-optimal rule? Second, even if agents adopt optimal rules, is the economist committing a serious mistake by assuming that agents use one-sided S,s rules? Using parameters based on real economy data, we found that since the additional cost involved in adopting the simpler rule is relatively small, it is plausible that one-sided rules are used in practice. We also found that suboptimal one-sided rules and optimal two-sided rules are similar in practice, since one of the bounds is not reached very often. We concluded that the macroeconomic effects when one-sided rules are suboptimal are similar to the results obtained under optimal two-sided rules, when the two rules are close to each other. However, this is true only when one-sided rules are used in a context where they are not optimal.

Web Page Recommendation Using Web Mining

The aim of a recommender system is to determine which Web pages the user is most likely to access in the future. In this phase the active user's navigation history is compared with the discovered sequential association rules in order to recommend a new page or pages to the user in real time. Generally, not all the items in the active session path are taken into account when making a recommendation: a page visited much earlier is less likely to affect the next page, since users generally decide what to click based on the most recent pages. Therefore the concept of a window count is introduced. The window count parameter n defines the maximum number of previous page visits to be used when recommending a new page. Since the association rules are in the form of ontology individuals, the user's navigation history is converted into a sequence of ontology instances. The semantically rich association rules and the user navigation history are then joined in order to produce recommendations.
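A minimal sketch of window-based rule matching (the rules, page names and value of n are invented for illustration):

```python
# Sketch: recommend pages by matching only the last n visited pages
# (the window count) against sequential rules "antecedent -> next page".
rules = [
    (("home", "laptops"), "laptop_deals"),
    (("laptops", "reviews"), "compare"),
]

def recommend(history, n=2):
    window = tuple(history[-n:])        # keep only the n most recent visits
    return [nxt for ante, nxt in rules
            if window[-len(ante):] == ante]

print(recommend(["blog", "home", "laptops"]))   # ['laptop_deals']
```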

Dimensionality Reduction for Optimal Clustering In Data Mining

PCA is the simplest of the true eigenvector-based multivariate analyses. Often, its operation can be thought of as revealing the internal structure of the data in a way that best explains the variance in the data. If a multivariate dataset is visualized as a set of coordinates in a high-dimensional data space (one axis per variable), PCA supplies the user with a lower-dimensional picture, a "shadow" of this object when viewed from its (in some sense) most informative viewpoint. PCA is closely related to factor analysis; indeed, some statistical packages deliberately combine the two techniques. True factor analysis makes different assumptions about the underlying structure and solves for the eigenvectors of a slightly different matrix.
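A compact PCA sketch via the eigenvectors of the covariance matrix (in practice a library routine such as scikit-learn's PCA would be used; data and k are illustrative):

```python
# Minimal PCA sketch: project centered data onto the top-k eigenvectors
# of the covariance matrix.
import numpy as np

def pca(X, k):
    Xc = X - X.mean(axis=0)                 # center each variable
    cov = np.cov(Xc, rowvar=False)          # covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh: cov is symmetric
    order = np.argsort(eigvals)[::-1][:k]   # top-k by explained variance
    return Xc @ eigvecs[:, order]

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
print(pca(X, 2).shape)    # (100, 2): the lower-dimensional "shadow"
```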

The time-saving bias: Judgements, cognition and perception

Svenson (2008) and Peer (2010a,b) studied the time-saving bias in driving or production: the time saved by speed increases from a relatively high speed is overestimated relative to the time saved by speed increases from low original speeds. Most studies on driving speed have used paper-and-pen questionnaires and judgments made when the respondent was not driving, with the exception of a study by Peer and Solomon (2012), who asked professional taxi drivers and non-professional drivers about a journey they were currently making in slow but not congested city traffic. Both groups gave biased judgments of journey time, average speed and time savings, consistent with previous questionnaire studies (Peer, 2010a,b; Svenson, 1970; Svenson & Salo, 2010; Svenson et al., 2011). However, the overestimation of time savings following increased driving speed among the taxi drivers was smaller than that of the non-professionals.
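The arithmetic behind the bias, as a small sketch with illustrative numbers: the same +10 km/h saves far more time at a low starting speed than at a high one.

```python
# Time saved over a fixed distance when raising speed from v_from to v_to:
# t_saved = d / v_from - d / v_to (converted to minutes here).
def minutes_saved(distance_km, v_from, v_to):
    return 60 * distance_km * (1 / v_from - 1 / v_to)

trip_km = 10
print(round(minutes_saved(trip_km, 40, 50), 2))  # 3.0 min from a low speed
print(round(minutes_saved(trip_km, 80, 90), 2))  # 0.83 min from a high speed
```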

A Heuristic Approach to the Disease Diagnose System Using Machine Learning Algorithms (An Expert Advisory System for EMU Bird’s Diseases)

Abstract: The paper deals with the concepts of expert systems and data mining, which belong to the field of Artificial Intelligence. The main task of an expert system is reasoning, while that of a machine learning algorithm is to find a better, optimal solution. This paper focuses on diagnosing diseases affecting the Emu bird by means of the Particle Swarm Optimization (PSO) algorithm and the Artificial Bee Colony (ABC) algorithm. Decision rules are mined from the database and can be applied in the expert system. Thus, applying optimization techniques yields the best globally optimized solution.

A Recent Review on XML data mining and FFP

Examples are FP-tree, FP-growth, CAN-tree, etc. The eXtensible Markup Language (XML) has become a standard language for data representation and exchange. XML is a standard, flexible syntax for exchanging both regular, structured data (database content of all kinds: inventory, billing, orders, etc., with small typed values) and irregular, unstructured text (documents of all kinds: transcripts, books, legal briefs, etc.). With the continuous growth of XML data sources, the ability to manage collections of XML documents and discover knowledge from them for decision support becomes increasingly important. Mining XML documents differs significantly from structured data mining and text mining: XML allows the representation of semi-structured and hierarchical data containing not only the values of individual items but also the relationships between data items, and element tags and their nesting dictate the structure of an XML document.

Optimal-state-dependent rules, credibility, and inflationary inertia

The distribution of price deviations will converge in the long run to an ergodic distribution which is different from the one associated with a no-inflation steady state…
