Proceedings of the 2nd International Workshop on Multi-Relational

In Section 3, we give a detailed summary of the work done on unit cost active feature selection originally reported in [9] and [8]. At each iterative call t of the base learner, the base hypothesis h_t is learned on E based on the current distribution D_t.

The learning progress is monitored in terms of the development of the average margins of the training examples. Boosting is known to be particularly effective in increasing the margins of training examples [20, 7].
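As a rough illustration of how the average margin can be tracked, the sketch below assumes binary labels in {-1, +1}, base hypotheses that return -1 or +1, and non-negative hypothesis weights. The function name `average_margin` and its arguments are hypothetical and not taken from the original work.

```python
from typing import Callable, Sequence


def average_margin(
    examples: Sequence[object],
    labels: Sequence[int],
    hypotheses: Sequence[Callable[[object], int]],
    weights: Sequence[float],
) -> float:
    """Mean normalized margin of a weighted vote of +/-1 base hypotheses.

    The margin of one example is y * sum_t(w_t * h_t(x)) / sum_t(|w_t|),
    which lies in [-1, 1]; positive values mean a correct vote.
    """
    total_weight = sum(abs(w) for w in weights)
    margins = []
    for x, y in zip(examples, labels):
        vote = sum(w * h(x) for h, w in zip(hypotheses, weights))
        margins.append(y * vote / total_weight)
    return sum(margins) / len(margins)


# Two hypothetical threshold stumps with equal weight.
stumps = [lambda x: 1 if x > 0 else -1, lambda x: 1 if x > 2 else -1]
avg = average_margin([1, 3, -1], [1, 1, -1], stumps, [1.0, 1.0])
```

Calling this after each boosting round gives one point of the margin curve.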

Table 2. Strategies for feature ordering and selection based on the distributional information present in boosting

A Simple Relational Classifier

1 Motivation

While the reported results on the relational classifiers (PRMs, RPTs, and RBCs) have been compared with basic non-relational learners (e.g., the naive Bayes classifier or C4.5 [26]), a simple relational classifier is an equally important, and perhaps a more appropriate point of comparison. Here we analyze the Relational Neighbor (RN) [25] classifier as such a simple classifier that only uses class labels from known related instances and does not learn.

2 A Relational Neighbor Classifier

We then propose a probabilistic version of RN and show that it unexpectedly does not add value in the cases described in this paper, although it may do so in other domains. The iterative relational-neighbor (RN∗) classifier iteratively classifies entities using the RN classifier in its inner loop.
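The following is a minimal sketch of the idea, under stated assumptions: RN is taken to be a majority vote over the known labels of directly related instances, and RN∗ repeats that vote, treating labels assigned in earlier rounds as known. The function names, the tie-breaking rule, and the stopping rule are illustrative choices, not details from the paper.

```python
from collections import Counter


def rn_predict(node, neighbors, labels):
    """Majority class among the labeled neighbors of `node`, or None."""
    votes = Counter(labels[n] for n in neighbors.get(node, ()) if n in labels)
    if not votes:
        return None
    # Deterministic tie-break (an assumption): most votes, then smallest label.
    return min(votes, key=lambda c: (-votes[c], c))


def rn_star(neighbors, known_labels, max_rounds=10):
    """Repeatedly apply RN, feeding newly assigned labels into later rounds."""
    labels = dict(known_labels)
    for _ in range(max_rounds):
        newly_labeled = {}
        for node in neighbors:
            if node in labels:
                continue
            guess = rn_predict(node, neighbors, labels)
            if guess is not None:
                newly_labeled[node] = guess
        if not newly_labeled:
            break  # nothing left that can be labeled
        labels.update(newly_labeled)
    return labels


# Hypothetical chain of four linked entities with one known label.
chain = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
result = rn_star(chain, {"a": "theory"})
```

In this toy chain the single known label reaches every entity after three rounds, which is the behavior a plain one-pass RN cannot provide.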

3 Case Studies

CoRA

Using the same methodology reported in the PRM study, we varied the proportion of papers whose class is initially known from 10% to 60%. We will only report on the RN∗ classifier in the following two studies because of its clear superiority.

IMDb

Second, in all cases, although less so in the Cornell data set, RN* was competitive with RPT, even seeing only 5% of the data. In this case, looking at only 5% of the data gave very close performance in 3 out of 4 data sets.


4 The Probabilistic Relational Neighbor Classifier

Due to the fuzzy nature of the propagation, there is no guarantee of convergence, although in all our test cases, the probabilities appear to be converging. In virtually all tests, when only a small fraction (≤ 30%) of the data was initially labeled, RN∗ performed better than pRN∗, although they often performed similarly when we labeled > 75% of the data.
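The text does not spell out the propagation rule, so the sketch below is only one plausible reading: each unlabeled entity carries a probability of belonging to the positive class, and in every round that probability is replaced by the mean of its neighbors' current probabilities. The update rule, the prior of 0.5, the fixed number of rounds, and the name `prn_propagate` are all assumptions.

```python
def prn_propagate(neighbors, known, prior=0.5, rounds=20):
    """Propagate class probabilities over links for a fixed number of rounds.

    `known` maps labeled nodes to a probability (1.0 or 0.0) that stays fixed.
    """
    prob = {node: known.get(node, prior) for node in neighbors}
    for _ in range(rounds):
        updated = {}
        for node, nbrs in neighbors.items():
            if node in known or not nbrs:
                updated[node] = prob[node]  # clamp known or isolated nodes
            else:
                updated[node] = sum(prob[m] for m in nbrs) / len(nbrs)
        prob = updated  # synchronous update: all nodes change together
    return prob


# Hypothetical three-entity chain with one positively labeled entity.
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
scores = prn_propagate(graph, {"a": 1.0})
```

Because the loop simply stops after `rounds` iterations, it mirrors the remark above: nothing in the procedure itself guarantees that the probabilities have settled.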

5 pRN on Synthetic Data

From these results, it is clear that pRN∗ is able to use very little information (e.g., one negative and only one good guy, 0.1% good) to characterize the remaining negatives almost completely. Even when the bad guys are unknown, pRN∗ is still able to perform very well.

6 Final Remarks

Acknowledgments

Furthermore, 5% of the good guys need to be labeled before it can perform comparably to pRN. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, whether expressed or implied, of the Defense Advanced Research Projects Agency (DARPA), the Air Force Research Laboratory, or the U.S.

This work is sponsored by the Defense Advanced Research Projects Agency (DARPA) and the Air Force Research Laboratory, Air Force Command, USAF, under agreement number F. Technical Report 02-55, Department of Computer Science, University of Massachusetts Amherst, 2002.

Collective Classification with Relational Dependency Networks

1 Introduction

We present preliminary results showing that collective inference with RDNs improves performance over non-collective inference, which we call “individual inference”. We also show that collectively applied RDNs can perform close to the theoretical ceiling reached if all neighbor labels are known with perfect accuracy. These results are promising and indicate the potential utility of further exploration of collective inference with RDNs.

2 Classification Models of Relational Data

In this article, we present Relational Dependency Networks (RDNs), an undirected graph model for relational data. As a result, learned RPTs are a relatively compact and parsimonious representation of conditional probability distributions in relational data.

3 Relational Dependency Networks

Dependency Networks

The undirected edges of the graph connect each node x_i to each of its parents (the nodes in pa_i). Each node is conditionally independent of the other nodes in the graph given its parents.

Learning. As with any graphical model, there are two components to learning a DN

Relational Dependency Networks

Given a set of objects and the relationships between them, an RDN defines a complete joint probability distribution over the values of the objects' attributes. For each objective variable, the RPT model is used to return a probability distribution given the actual attribute values in the rest of the graph.
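A minimal sketch of collective inference by Gibbs sampling follows. It assumes a single class attribute per object and a conditional model that maps the current labels of an object's neighbors to a class distribution. The function `smoothed_vote` is a hypothetical stand-in for a learned RPT, and the sweep counts and names are illustrative.

```python
import random


def gibbs_marginals(neighbors, conditional, classes, sweeps=200, burn_in=50, seed=0):
    """Estimate per-node class marginals by resampling each label in turn."""
    rng = random.Random(seed)
    labels = {node: rng.choice(classes) for node in neighbors}
    counts = {node: {c: 0 for c in classes} for node in neighbors}
    for sweep in range(sweeps):
        for node, nbrs in neighbors.items():
            # Conditional distribution given the current labels of the neighbors.
            dist = conditional([labels[m] for m in nbrs])
            weights = [dist[c] for c in classes]
            labels[node] = rng.choices(classes, weights=weights)[0]
            if sweep >= burn_in:
                counts[node][labels[node]] += 1
    kept = sweeps - burn_in
    return {node: {c: counts[node][c] / kept for c in classes} for node in counts}


def smoothed_vote(neighbor_labels, classes=("A", "B"), alpha=1.0):
    """Hypothetical conditional model: Laplace-smoothed neighbor label frequencies."""
    total = len(neighbor_labels) + alpha * len(classes)
    return {c: (neighbor_labels.count(c) + alpha) / total for c in classes}


papers = {"p1": ["p2"], "p2": ["p1", "p3"], "p3": ["p2"]}
marginals = gibbs_marginals(papers, smoothed_vote, ("A", "B"))
```

In practice some labels would be observed and held fixed during sampling; that clamping is omitted here to keep the sketch short.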

4 Experiments

Tasks

Nine attributes were supplied to the models, including country of study, actor's year of birth, and the class label of related movies two links away. Twelve attributes were available for the models, including journal affiliation, paper location, and the subject of articles one link away (references) and two links away (via authors).

Results and Discussion

In addition, the performance of the RDN models is superior to both the RPT-indiv models (RPT learned without labels) and the RPT standard models (RPT learned with labels and tested with a standard labeling for the class values of related instances). If we can conclude from the learning curves that the Gibbs chain had converged, why did the RDN model not perform as well as the RPT ceiling model on Cora?

Fig. 5. Example RPT learned for the Cora dataset to predict paper topic given the topics of related papers.

5 Conclusions and Future Work

The Gibbs chain can mix slowly, making it unlikely that the process will jump to a distant part of the label space. The views and conclusions contained herein are those of the authors and should not be construed to necessarily represent the official policies or endorsements, expressed or implied, of DARPA, AFRL, or the US.

Structural Logistic Regression for Link Analysis

1 Introduction

This is both impractical and incorrect - the size of the resulting table is prohibitive, and the notion of an object corresponding to an observation is lost, being represented by multiple rows. We propose the use of Structural Logistic Regression to link prediction and argue that the properties of the method and the task are a good match.

2 Methodology

We use data from CiteSeer (a.k.a. ResearchIndex), an online digital library of computer science papers [22] (http://citeseer.org/).

[Figure labels: Search; Relational Feature Generation; Statistical Model Selection; Control Module; Learning Process; Relational Database Engine]

Feature Generation

Refinement graphs are directed acyclic graphs that specify the search space of the first-order logic queries. Using aggregate operators in feature generation makes trimming the search space more involved.

3 Tasks and Data

The latter is easier and should be the subject of future improvements. ... search down refinement graphs allows for a number of optimizations, e.g., (i) the results of queries (before applying the aggregations) at a parent node can be reused at the child nodes, although this should be weighed against the space required to store the views, and (ii) a node whose query results are empty for each observation should not be further refined, as its refinements will also be empty. Density is the percentage of existing citations out of the total number of possibilities, which depends on (# Docs).

[Table columns: Dataset; # Docs; # Links; Density]

4 Results

We vary the ratio of the number of negative examples to the number of positive examples used in testing. Precision-recall curves for the “artificial intelligence” dataset with different class priors; the varied parameter is the ratio of the number of negative examples to the number of positive examples used in testing.
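To see why the class prior matters for these curves, the sketch below computes precision at a fixed operating point while the negative-to-positive ratio grows. It assumes the classifier's true positive rate (recall) and false positive rate stay the same across test sets. The function name and the example rates are made up for illustration.

```python
def precision_at_ratio(tpr: float, fpr: float, neg_per_pos: float) -> float:
    """Precision when the test set holds `neg_per_pos` negatives per positive.

    Per positive example there are `tpr` true positives and
    `fpr * neg_per_pos` false positives, so recall is unaffected by the ratio
    while precision falls as negatives become more common.
    """
    true_pos = tpr
    false_pos = fpr * neg_per_pos
    if true_pos + false_pos == 0:
        return 0.0  # nothing predicted positive; a convention chosen here
    return true_pos / (true_pos + false_pos)


# Illustrative operating point: recall 0.8, false positive rate 0.1.
balanced = precision_at_ratio(0.8, 0.1, 1)
skewed = precision_at_ratio(0.8, 0.1, 10)
```

With these illustrative rates, precision drops from about 0.89 on a balanced test set to about 0.44 when negatives are ten times as frequent.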

5 Related Work and Discussion

While in the first case "upgrading" can also be seen as a form of "propositionalization", the second emphasizes that the model selection criteria of the original propositional algorithm do not participate in the construction of properties. Being a domain-wide joint probabilistic model, PRMs can provide answers to a large number of questions, including class labels, latent groups, and changing beliefs based on new observations.

6 Conclusions and Future Work


Efficient Multi-relational Classification by Tuple ID Propagation

1 Introduction

In a database for multi-relational classification, there is one target relation R_t, whose tuples are called target tuples. To date, there has been no accurate, efficient, and scalable approach for multi-relational classification.

2 Related Works

It uses a sequential covering algorithm that is similar to FOIL, which iteratively builds rules and removes the positive sets of targets covered by each rule. It then constructs a new dataset which contains all positive and negative target tuples satisfying r', along with the corresponding non-target tuples.


At each step, every possible predicate is evaluated and the best one is added to the current rule. After calculating the foil gain of each of these predicates, the best predicate is added to the current rule.
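The text does not give the foil gain formula, so the sketch below uses the standard definition from FOIL: the number of positives still covered after adding the predicate, times the reduction in the information needed to signal a positive. Whether the system described here counts tuples in exactly this way is an assumption, and the function names and counts are illustrative.

```python
import math


def info(pos: int, neg: int) -> float:
    """Bits needed to signal that a covered example is positive."""
    return -math.log2(pos / (pos + neg))


def foil_gain(pos_before: int, neg_before: int, pos_after: int, neg_after: int) -> float:
    """FOIL gain of a predicate, from coverage counts before and after adding it."""
    if pos_after == 0:
        return 0.0  # a predicate that keeps no positives is useless
    return pos_after * (info(pos_before, neg_before) - info(pos_after, neg_after))


# Hypothetical candidates: name -> (positives, negatives) covered after adding it.
candidates = {"frequency=monthly": (2, 0), "frequency=weekly": (2, 4)}
best = max(candidates, key=lambda name: foil_gain(4, 4, *candidates[name]))
```

Picking the maximum over all candidates corresponds to the "best predicate is added" step; the cost of obtaining the coverage counts for every candidate is what makes the search expensive.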


FOIL is inefficient because it has to evaluate too many predicates over the whole procedure, and evaluating each predicate is time-consuming because a new data set must be constructed. Therefore, FOIL does not scale with the number of relations or the number of attributes in the database.

3 Tuple ID Propagation


Basic Definitions

In the example above, a tuple in Loan can only be associated with one tuple in Account. In general, it can be associated with more than one tuple in other relations such as Order and Customer (see Figure 1).

Search for Predicates by Joins

For a predicate in the Account relation, such as "Account(A, ?, monthly, ?)", we need to define what is meant by "a target tuple satisfies a rule containing this predicate". We say that a tuple t in the Loan relation satisfies r if and only if a tuple in the Account relation that can be joined with t has the value "monthly" on the frequency attribute.

Loan ⋈ Account

Tuple ID Propagation

One way to solve this problem is to run the join once and calculate the foil gain of all predicates.


Please note that we cannot calculate the foil gain solely from class labels attached to each tuple in the Account relation (see Figure 3), because one Account tuple may join with several target tuples. For each tuple t in R_2, there is a set of IDs representing the tuples in the target relation that can be joined to t (using the join path specified in the current rule).
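A small sketch of the propagation idea, under stated assumptions: relations are lists of dictionaries, Loan is the target relation, and the column names (`id`, `account_id`, `frequency`, `label`) are hypothetical. The IDs of target tuples are attached to each join-key value once, and every predicate on Account is then evaluated by collecting the IDs of the Account tuples that satisfy it.

```python
from collections import defaultdict


def propagate_ids(target_rows, key):
    """Map each join-key value to the set of target tuple IDs that carry it."""
    ids_by_key = defaultdict(set)
    for row in target_rows:
        ids_by_key[row[key]].add(row["id"])
    return ids_by_key


def covered_targets(other_rows, key, attr, value, ids_by_key):
    """IDs of target tuples joinable with a tuple where `attr` equals `value`."""
    covered = set()
    for row in other_rows:
        if row[attr] == value:
            covered |= ids_by_key.get(row[key], set())
    return covered


loans = [
    {"id": 1, "account_id": "a1", "label": True},
    {"id": 2, "account_id": "a1", "label": False},
    {"id": 3, "account_id": "a2", "label": True},
]
accounts = [
    {"account_id": "a1", "frequency": "monthly"},
    {"account_id": "a2", "frequency": "weekly"},
]

ids_by_account = propagate_ids(loans, "account_id")
monthly = covered_targets(accounts, "account_id", "frequency", "monthly", ids_by_account)
label_of = {row["id"]: row["label"] for row in loans}
positives = sum(1 for i in monthly if label_of[i])
negatives = len(monthly) - positives
```

The positive and negative counts are taken over distinct target IDs, so an account shared by one positive and one negative loan contributes one of each. A single class label stored on the Account tuple could not express this, which is why the ID sets are needed.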

4 Implementation

Data Representation

For example, "Loan(L, A)" specifies that we should join the Account relation with the Loan relation (in our case, propagating the tuple IDs of the Loan relation into the Account relation). For example, "Account(A, ?, monthly, ?)" specifies that tuples must have the value "monthly" for the frequency attribute in the Account relation.

Learning Algorithm

In a predicate pair, the first predicate specifies how tuple IDs are propagated to a relation, and the second predicate specifies a constraint on that relation. For example, "Loan(L, A), Account(A, ?, monthly, ?)" specifies that we propagate tuple IDs from the Loan relation to the Account relation, and that in the Account relation the frequency attribute must have the value "monthly".

We define a predicate pair as the combination of a predicate of the first type p_1 and a predicate of the second type p_2.

5 Experimental Results

We modified the data set slightly by shrinking the Trans relation, which is extremely large, and by removing some positive target tuples from the Loan relation so that the numbers of positive and negative target tuples are more balanced.

6 Conclusions
