ONTOLOGY BASED WEB PAGE ANNOTATION FOR EFFECTIVE INFORMATION RETRIEVAL


S. Kalarani, M.E., (Ph.D), Asst. Prof., Dept. of IT, St. Joseph’s College of Engg.

Chennai 119, India.

Dr. G.V. Uma, M.E., Ph.D, Prof./Head of IST

Anna University, Chennai - 20

India.

Abstract

Today’s World Wide Web holds a large volume of data: billions of documents. Discovering useful knowledge in this data is therefore a time-consuming process. With today's keyword approach, the amount of time and effort required to find the right information is directly proportional to the amount of information on the web. The web has grown exponentially, and people are forced to spend more and more time searching for the information they need. Lack of personalization and the inability to easily separate commercial from non-commercial searches are among the other limitations of today's web search technologies. This paper proposes a prototype relation-based search engine, “OntoLook”, which has been designed in a virtual semantic web environment, and presents its architecture. The Semantic Web is well recognized as an effective infrastructure for enhancing the visibility of knowledge on the Web. The core of the Semantic Web is “ontology”, which is used to explicitly represent our conceptualizations. Ontology engineering in the Semantic Web is primarily supported by languages such as RDF, RDFS and OWL. This paper discusses the requirements of ontology in the context of the Web, compares the above three languages with existing knowledge representation formalisms, and surveys tools for managing and applying ontology. The advantages of using ontology in both knowledge-base-style and database-style applications are demonstrated using one real-world application.

Keywords: Semantic web; OntoLook; OWL, RDF, RDFS; Knowledge base.

1. Introduction

Problems with Today's Web Search

Today's web search offers no ability to manage the results by defining context or meaning. It is not easy to build an advanced search query, and there is a lack of hints for search. Further problems are:

1. No user-friendly visual management with mouse clicks.

2. Documents are ranked according to an algorithm specific to the search engine, not according to the user's interests.

3. Commercial search results are over-ranked and non-commercial search results are under-ranked.

2. Semantic web


2.1 PROBLEMS OF THE SEMANTIC WEB

1. Reduced anonymity on the Web

Unless you are already taking active measures to keep yourself non-indexed, you may find that in the Semantic Web information about your identity, interests, and habits is trivial to discover. One day we may see a shift in the importance of anonymity. Openness and transparency may become the "in thing."

2. Increased invasion of privacy

This problem stems from the issue of reduced anonymity on the Semantic Web. A Web that exposes vast amounts of information about everyone has its drawbacks.

3. Intelligent content scraping

The content scrapers of today are quite simple compared to what we will have to deal with in the Semantic Web. Essentially, a scraper accesses a website or feed and extracts and stores the desired content. The technology for fully semantic scraping does not yet exist; however, a bottom-up approach to semantic content scraping would be to scrape the metadata written in RDF / OWL. Such a "bottom-up" scraper would not be able to extract information from the content itself in the way that a top-down content scraper (using an NLP agent) could, but we expect to begin seeing this soon, if it has not already started.

4. Value paradigm shifts

When a user wants to search for information on the web, the request is abstracted into a combination of keywords and submitted to the search engine. The relationship between the keywords is obvious to the user, but not to the search engine. If a web page only includes the keywords, and there is no relationship between the keywords in the context of the page, the page does not provide what the user wants. In this case, we say the web page is a keyword-isolated page. However, there are many keyword-isolated pages in the result sets returned by traditional search engines. In fact, because of the constraints of the current web architecture, search engines cannot exclude these keyword-isolated pages from the result set. The semantic web therefore needs a paradigm shift: information must be represented in the form of a knowledge base, as ontologies.

5. Vocabulary incompatibilities

The vocabularies we use to classify information are the backbone of the new information frontier. When multiple vocabularies contain the same terms but apply different meanings to them, merging the information destroys the meaning the author intended. There will be a great need for an open, unified vocabulary in the Semantic Web.

3. PROPOSED SYSTEM - RELATION BASED SEARCH ENGINE

While search engines that index HTML pages find many answers to searches and cover a huge part of the Web, they return many inappropriate answers. There is no notion of "correctness" in such searches. By contrast, logical engines have typically been able to restrict their output to answers that are provably correct, but have suffered from the inability to rummage through the mass of intertwined data to construct valid answers. The combinatorial explosion of possibilities to be traced has been quite intractable. The relation-based search engine addresses these drawbacks.

“OntoLook”, constructed in the semantic web, can exclude keyword-isolated pages from the result set. Unlike traditional keyword-based search engines, “OntoLook” is a relation-based search engine. When “OntoLook” processes a query, it processes not only the keywords but also the relationships between the entities, which the architecture of the semantic web makes available. A page is returned to the user only when it includes the relationship between the keywords; pages that contain the keywords without the relationship are discarded.
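The difference between keyword matching and relation-based matching can be illustrated with a minimal Python sketch. It is not the OntoLook implementation: the in-memory stores `PAGE_KEYWORDS` and `ANNOTATIONS`, the example URLs, and the `locatedIn` property are all hypothetical, chosen only to show how a keyword-isolated page is discarded.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    prop: str
    obj: str


# Hypothetical keyword index: URL -> keywords found in the page text.
PAGE_KEYWORDS = {
    "http://example.org/hotel-a": {"hotel", "chennai"},
    "http://example.org/news": {"hotel", "chennai"},
}

# Hypothetical annotation store: URL -> triples annotated for that page.
ANNOTATIONS = {
    "http://example.org/hotel-a": [Triple("hotel", "locatedIn", "chennai")],
    "http://example.org/news": [],  # both words occur, but no relation links them
}


def keyword_search(keywords):
    """Traditional search: every page containing all the keywords."""
    wanted = set(keywords)
    return sorted(url for url, words in PAGE_KEYWORDS.items() if wanted <= words)


def relation_search(kw_a, kw_b):
    """Keep only pages whose annotations relate the two keywords."""
    hits = []
    for url in keyword_search([kw_a, kw_b]):
        triples = ANNOTATIONS.get(url, [])
        if any({t.subject, t.obj} == {kw_a, kw_b} for t in triples):
            hits.append(url)
    return hits
```

Under these assumptions, `keyword_search(["hotel", "chennai"])` returns both pages, while `relation_search("hotel", "chennai")` returns only the page whose annotation relates the two keywords.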


3. Ontology learner

Automatic ontology building is a vital issue in many fields where ontologies are currently built manually. This paper presents a user-centred methodology for ontology construction based on the use of Machine Learning and Natural Language Processing. In our approach, the user selects a corpus of texts and sketches a preliminary ontology (or selects an existing one) for a domain, with a preliminary vocabulary associated with the elements in the ontology (lexicalisations). Examples of sentences involving such lexicalisations (e.g. of the ISA relation) are automatically retrieved from the corpus by the system. The retrieved examples are validated by the user and used by an adaptive Information Extraction system to generate patterns that discover other lexicalisations of the same objects in the ontology, possibly identifying new concepts or relations. New instances are added to the existing ontology or used to tune it. This process is repeated until a satisfactory ontology is obtained. The methodology largely automates the ontology construction process, and the output is an ontology with an associated trained learner to be used for further ontology modifications.
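One iteration of this retrieve, validate and update loop might look like the Python sketch below. It is a simplification under stated assumptions: the adaptive Information Extraction system is replaced by a single hand-written regular expression for the ISA relation, and the names `ISA_PATTERN`, `extract_isa_candidates`, `update_ontology` and the `validate` callback (standing in for the user) are hypothetical.

```python
import re

# Hypothetical hand-written pattern for ISA sentences such as
# "A hotel is an accommodation". A real learner would induce such patterns.
ISA_PATTERN = re.compile(r"\b[Aa]n?\s+(\w+)\s+is\s+an?\s+(\w+)")


def extract_isa_candidates(corpus):
    """Retrieve (child, parent) candidates for the ISA relation from sentences."""
    candidates = []
    for sentence in corpus:
        for child, parent in ISA_PATTERN.findall(sentence):
            candidates.append((child.lower(), parent.lower()))
    return candidates


def update_ontology(ontology, corpus, validate):
    """Add the candidates accepted by the user (validate) to the ontology.

    The ontology is modelled as a dict mapping a parent concept to the set
    of its child concepts.
    """
    for child, parent in extract_isa_candidates(corpus):
        if validate(child, parent):
            ontology.setdefault(parent, set()).add(child)
    return ontology
```

For example, calling `update_ontology({}, corpus, validate)` with a corpus containing "A hotel is an accommodation." and a `validate` function that accepts every candidate adds `hotel` under `accommodation`. Repeating the call with new texts corresponds to the iteration described above.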

[Figure: OntoLook architecture, showing the Internet, the crawl program, the user interface, the OWL parser, the relation-based search engine OntoLook, and the ontology database.]

The ontology learner contains the following modules:

1. Web site annotator

All related web sites are annotated with their possible relationships using RDF and OWL. For example, in a hotel reservation system, details such as the destination site, the urban area, the purpose (e.g. accommodation), the accommodation rating (5 star / 3 star), and any activities provided by the hotel are stored as an ontology.

2. Domain concept annotator

In this annotation activity, the domain ID, domain name, description, and domain concepts are created and submitted to the intelligent search engine. Each ontology has its associated domain and range values, which hold the property values and their constraints.

3. Concept dictionary annotator

In the concept dictionary annotator, domain-specific values such as the web site ID, web site URL, description, and ontology domain (for example, travel) are given as input, and the details are then inserted.

After the ontology has been created and the ontology learner has been invoked successfully, we can view the created ontology, its RDF, and the ontology graph.

View ontology

In the hotel reservation system, the travel ontology is created automatically as an OWL file. This OWL file contains the RDF tags and the necessary properties and property values.

View RDF

In the RDF view we can see the RDF triples <subject, property, object> for all the web sites that were annotated previously.
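As an illustration of such <subject, property, object> triples, the Python sketch below turns the annotated properties of one hotel web site into triples and prints them in an N-Triples-like form. The namespace `http://example.org/travel#`, the site URL, and the property names are assumptions made for the example; the paper does not list the actual vocabulary.

```python
BASE = "http://example.org/travel#"  # hypothetical namespace for the travel ontology


def annotate_site(site_url, properties):
    """Return (subject, property, object) triples for one annotated web site."""
    return [(site_url, BASE + name, value) for name, value in properties.items()]


def to_ntriples(triples):
    """Serialize triples with literal objects as N-Triples-style lines."""
    lines = []
    for subject, prop, obj in triples:
        # Escape backslashes and double quotes inside the literal.
        literal = obj.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'<{subject}> <{prop}> "{literal}" .')
    return "\n".join(lines)


if __name__ == "__main__":
    triples = annotate_site(
        "http://example.org/hotel-a",
        {"destination": "Chennai", "purpose": "accommodation", "rating": "5 star"},
    )
    print(to_ntriples(triples))
```

Each printed line corresponds to one triple, with the web site as subject, the annotated property as predicate, and the property value as a literal object.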

View ontology graph

This view gives a final view of all the related web URLs in the form of tables.

Modules

1. Site Registration

2. RDF generation

3. Ontology generation

a. Property keyword set

4. User search

a. Concept relational graph

b. Cut arc graph

c. Property keyword set

Modules description:

Site Registration

In this phase, the admin collects all the details about the site: the site ID, the site size, and the possible relations and their property values. The ontology of the particular domain is then created or updated.

RDF generation

The admin collects all the details about the relations and the corresponding concept values. These are passed to the RDF API, and an RDF file is created.

Ontology generation

From the created ontology and its corresponding RDF file, we retrieve the concepts and their values. These inputs are then mapped into the OWL API to create the ontology file. Finally, the web site information is stored in the database. The database is now an ontology database, which is called the knowledge base.

Property keyword set

A property-keyword candidate set is the smallest unit sent to the database for retrieval. The property-keyword candidate set consists of four triples. First, the system sends the four triples to the database one at a time and obtains four result sets, each of which includes the web pages containing the keywords and the corresponding property. Then the four result sets are combined, and their intersection is the final result set.
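The retrieve-then-intersect step can be sketched in Python as follows. The dictionary `ONTOLOGY_DB` stands in for the ontology database, and its triples and URLs are hypothetical; only the logic of one lookup per triple followed by an intersection of the result sets is taken from the description above.

```python
# Hypothetical ontology database: (subject, property, object) -> annotated URLs.
ONTOLOGY_DB = {
    ("hotel", "locatedIn", "chennai"): {"site-a", "site-b", "site-c"},
    ("hotel", "hasRating", "5 star"): {"site-a", "site-b"},
    ("hotel", "hasPurpose", "accommodation"): {"site-a", "site-b", "site-d"},
    ("hotel", "offersActivity", "swimming"): {"site-a", "site-d"},
}


def retrieve(triple):
    """Result set for a single triple (empty if the triple is unknown)."""
    return set(ONTOLOGY_DB.get(triple, ()))


def search_candidate_set(candidate_set):
    """Query each triple separately, then intersect all the result sets."""
    result_sets = [retrieve(triple) for triple in candidate_set]
    if not result_sets:
        return set()
    return set.intersection(*result_sets)
```

With the four triples above as the candidate set, only `site-a` appears in every result set, so it is the only page returned. A page missing any one of the four relations is dropped by the intersection.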


USER search

In this module, the user searches the relation-based search engine OntoLook. The user enters keywords and their corresponding relations.

Concept relational graph

Our OntoLook system stores the annotated words, together with the ontology file and URL information, in the database. If the relational words match, the system picks up the URL information from the database.

Cut arc graph:

Along the concept relational graph, our system picks up the URL information of interest. Here we keep only the relations that are close to the keyword search, so some of the relational words are cut from the set. Only the required URL information is displayed.
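One possible reading of the concept relational graph and the arc-cutting step is sketched below in Python. The graph representation (a dictionary mapping a pair of concepts to the relations between them), the function names, and the example relations are assumptions; the paper does not specify these details.

```python
from itertools import product


def cut_arcs(graph):
    """Return the subgraphs obtained by removing each arc of the graph in turn.

    The graph maps an arc (concept_a, concept_b) to the list of relations
    that hold between the two concepts.
    """
    subgraphs = []
    for removed in graph:
        subgraphs.append({arc: rels for arc, rels in graph.items() if arc != removed})
    return subgraphs


def candidate_sets(subgraph):
    """Choose one relation per arc; each choice gives one set of triples."""
    arcs = sorted(subgraph)
    results = []
    for choice in product(*(subgraph[arc] for arc in arcs)):
        results.append([(a, rel, b) for (a, b), rel in zip(arcs, choice)])
    return results


if __name__ == "__main__":
    graph = {
        ("hotel", "city"): ["locatedIn"],
        ("hotel", "rating"): ["hasRating", "ratedBy"],
    }
    for subgraph in [graph] + cut_arcs(graph):
        print(candidate_sets(subgraph))
```

In this sketch, cutting an arc relaxes the query by dropping one pair of related concepts, and each subgraph yields the sets of triples that would then be sent to the database.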

Relation based search engine – final result

Using the relation-based search engine, the user selects the domain for information retrieval; in our example, the travel domain. After selecting the domain, the user specifies some of the related information that was already used during web site annotation. The relation-based search engine then lists only the relevant web sites, along with the page rating (its frequency of usage). The user clicks on a web site URL and accesses the appropriate information.
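The page rating by frequency of usage could be computed as in the short Python sketch below. The click log and the function name `rank_by_usage` are hypothetical; the paper states only that the rating reflects how often a site is used.

```python
from collections import Counter


def rank_by_usage(relevant_urls, click_log):
    """Order the relevant URLs by how often they appear in the click log.

    Returns (url, usage_count) pairs, most used first; ties are broken
    alphabetically so that the order is deterministic.
    """
    usage = Counter(click_log)
    rated = [(url, usage[url]) for url in relevant_urls]
    return sorted(rated, key=lambda pair: (-pair[1], pair[0]))
```

Only the URLs already selected by the relation-based search are rated, so a frequently used but irrelevant site in the click log does not appear in the list.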


Workflow inside the relation-based search engine “ONTOLOOK”

CONCLUSION

One of the most exciting uses of an ontology, in the context of the semantic web, is to support the development of agent-based systems for web searching. The idea of combining ontologies and a knowledge base has motivated our work. We have developed “OntoLook”, a prototype relation-based search engine. This paper has presented our experience in the field of ontology visualization and effective information retrieval. The final result is presented in the form of user-friendly direct URLs. The workflow diagram describes the complete work done by the OntoLook search engine.


[Figure: workflow steps. The user inputs the keywords through the user interface; the system analyzes the input keywords and retrieves the relations between their corresponding concepts; constructs the concept relation graph based on the retrieved relations; cuts the arcs from the graph in turn to form concept relation subgraphs; selects appropriate relations from the arcs of each subgraph to form the property-keyword candidate set; submits the URL set to the web page database to retrieve web page information; and the user receives the returned result set.]