Metadata Control Agent approach for Replication in Grid Environments

(1)

International Journal of Electronics and Computer Science Engineering 1156

Available Online at www.ijecse.org ISSN- 2277-1956

Metadata Control Agent approach for Replication in

Grid Environments

P. SunilGavaskar

1

, Dr.Ch D V Subba Rao

2

1 2 _{Department of Computer Science and Engineering}

1_{Sri Venkateswara University, Tirupathi, Andhra Pradesh, India}

2_{Sri Venkateswara University, Tirupathi, Andhra Pradesh, India}

1_{Email- Sunil079@Gmail.com}

Abstract- since grid environment is dynamic, network latency and user requests may change. In order to provide better communication, access time and fault tolerant in decentralized systems, the replication is a technique to reduce access time, storage space. The objective of the work is to propose an agent control approach for Heterogeneous environments using the Agents for storing objects as replicas in decentralized environments. Our idea minimizes the more replicas (i.e. causes overhead on response time and update cost), therefore maintaining suitable number of replicas is important. Fixed replicas provides file access structure to identify the esteem files and gives optimal replication location, which minimize replication issues like access time and update cost by assuming a given traffic pattern. In this context we present the Agents as replicas to maintain a suitable scalable architecture. The solution uses fewer replicas, which lead to fewer agents as a result of that frequent updating is possible. Our tests show that the proposed strategy outperforms previous solutions in terms of replication issues.

Keywords – Agents, Metadata, Lookup Service, Replicas, Data Grid

I.INTRODUCTION

Grid [1] is a basic infrastructure of national high performance computing and Information services, it achieves the integration and interconnection of many types of high performance Computers, Data servers, large-scale storage systems which are distributed heterogeneous in nature, and the important application research queries which are lack of effective research approach. Data grid [2] provides services for supporting the discovery of resources and enables computing in heterogeneous storage resource by storage of resource agent. Data grid [3] allows sharing data among different environments with appropriate permission from storage space of data.

Replica utilization in fault tolerance strategy is challenging research in data grid. By placing multiple replicas at different locations, replica management [4] can reduces network delay and bandwidth consuming of remote data access. Check pointing frequencies are increased as the fault frequencies increases [5], thereby diskless check pointing has been introduced as solution to avoid I/O bottleneck of disk- based check pointing. According to replica creation method, fewer replicas [6] would increase the response time of data access, while more replica would increase data update cost and waste storage space. Therefore, maintaining adequate number of replicas is important. In addition, due to the dynamic change of network bandwidth and storage resource, a fixed number of replica is neither feasible.

The rest of the paper is organized as follows. Related works are presented in section II. Proposed System model and strategies are explained in section III. Simulation and Analysis are presented in section IV. Concluding and future works are given in section V.

II. RELATED WORK

(2)

1157

Metadata Control Agent approach for Replication in Grid Environments

Strategy, Plain Caching Strategy, Caching and Cascading Strategy and Fast Spread Strategy. The Economic model based replica management strategy uses evaluation function to evaluate replica value, then decides whether to create local replica or not. This strategy uses antivickrey auction model to choose best replica. [2][3].The simple replica management strategy would make data frequently replicated, thus is not suitable to the dynamic grid environment. Second strategy hierarchical model is designed for Europe data grid and could not accommodate P2P and hybrid network structure. By considering third strategy an economic model with two main objectives: to maximize profits and minimize management costs [9]. Computation and Storage units purchase files from other storage units following auction protocol. The replicas deletion is also based on an economic model; a replica is deleted only if the deletion gives gain. In [9], the authors used economic model for database replica placement adapted to support database applications.

Issues with Replication are widely studied in Distributed systems [7]. Our Proposed models mainly aim at minimizing search time of replica with the help of agents as result of that reduces access time, communication cost and increase availability of data. In order to implement that the number of replicas become larger as the load of accessed node increases, to overcome this issue multiple object states are set. Each time creating replica for heavy load nodes and puts these replicas in the relative light load nodes, thus achieves load-balance. Root nodes always maintains metadata catalogue to perform mapping of objects and agent replicas.

III.SYSTEMMODEL

The proposed architecture is demonstrated in Figure.2: which contains the following components called child node or S Agents and superiors are called Magents. 1. Sagents: The roles of agents are storing local replicas of the currently registered distributed objects. 2. Magents: They also act as replicas and contain local copies and Meta-data catalogues. There by subset of Agents becomes under the control of Magents, the Agents are connected to form a complete graph. All the messages are routed between Magents there by Magents connect together several connected clusters of Sagents. Here clusters are formed on the basis of a Calendar Queue [10], which divides the queue into class types. In The proposed model we consider that we have numbers of Magents are equivalent of classes. Sagents are having length of class i.e. Number of objects supporting in it. Here we have several lookup discovery services, which stores information regarding the currently existing Magents.There by a new cluster is formed, its Magents dynamically registers to closest lookup service and then Sagents uses one lookup service to obtain information. The forming of agents into cluster is based on their network bandwidth and also on the basis of set of condition metrics.

A. Meta-Data Catalogue—

(3)

IJECSE,Volume2,Number 4

P Sunil Gavaskar et al.

1158

M Agents

SAgents

Figure 1. Integration of Agents with Monitoring Architecture.

B. Replicas Management. –

Here all the agents maintain copy of several replicated objects, these are java serializable objects. Once Clusters are grouped then it contains Master Privileges and information regarding the current objects are maintained with in the metadata catalogue, there by the metadata catalogue is updatable when Sagents are converted into Magents. The Replica mapping techniques is on the basis of hashing, which uses two values (Hash key, HashfieldValue).Hash Key of replica represents a Unique identification of object identifier with the particular mapping pattern Key (id0, id1, id2, id3…idn-1). HashfieldValue (id_agent1,id_agent2,..id_agentsN) identifies an Agent’s identifier, where N depends on class inheritance. In order to access the particular agent’s replica the request passes from Magents to other agents.

Previously replication is combined to job scheduling or at runtime [11].In our model the replicas are created and placed before job execution / job starts, therefore agents contain replicas which are independent of the files requested by job (i.e. Copies of agents replication is happens at once after the original copies are created and before any file request, but modification replica allows on basis of mapping objects and agentsID present in metadata catalogue).This model performs to add,remove,rejoin of agents so that replica is dynamic in nature based on its size. From figure 1.the following steps will demonstrates our strategy.

Step1: Sagents joins in the network by Identifying list of Magents based on the lookup services and then estimate best Magent.

Step2: Magent is chosen based on size of its class id (i.e. load on it).and then Number of Sagents connected to Magents.

Step3: Sagents finds Magent which having fewer Sagents and then Computes metric (Distance between Magents is evaluated by means of shortest path connecting two Magents using shortest path routing algorithm.).

Thereby we reduced total Sagents connected to Magents by converting Sagents into Magents.In order to avoid 1. Monitoring &discovery.

(4)

Metadata Control Agent approach

Step4: If the Number of Sagents is g

Step5: Among Sagents promote one metadata also changes.

Step6: Sagents connected are lesser Catalogue destroys and that catalog connecting it.

In the above said architecture such replication in critical data–intensive

OptorSim is a salable, configurable Java implementation of our strategy.

Table 0.00 1000.00 2000.00 3000.00 4000.00 5000.00 6000.00 Eco. mod 4800.4 3110.3 400 A v e ra g e jo b t im e (m s ) NR strategy Number of f Size of files Number of j Job Executi Job delay Background Number of s Number of S Capacity of Optimizer

Number of per site.

ach for Replication in Grid Environments

s greater when compare to minimum length of Magent c ne Sagent into Magent based on the basis of election pr ser than when Compare to Minimum Length of Magen ogue connected Sagents joins with the one from adjace

ch kind of hierarchical interconnection topology ensur ve applications.

IV.SIMULATIONS

le and programmable simulation tool for grid. We exten gy.

ble.1.Simulation Configuration

Figure 2. Average job time

odel LFU LRU

5500.7 5475.1 .3 3510.2 3460.1 4000.5 4995.1 4740.2

Initial replica proposed strategy Initial replicas at random

of files 97

es 1GB

of jobs 100

ution time 5

2500

nd bandwidth NO

of sites 20

of SE per site 1

of SE 50-100GB

Lru Optimizer Lfu Optimizer EcoModelOptimizer

of worker node 100

1159

t catalogue size.

procedure, as a result of that ent catalogue size .Then the cent Magent or shortest path sures scalability and optimal

tend OptorSim by adding the

(5)

In figure 2 .The simulation results s job time by 25%,here the replicas updating of replicas are faster.

From the figure3.It shows that nu maintaining unwanted replicas (agen

By considering figure.4. The percen strategy uses replicas (Sagnets) base at runtime by means of connection replica, Thereby it depends on the ob Through the analysis of simulation and improvement over average job fully, but we believe that our agents

T o ta l n u m b e r o f re p li c a ti o n s

IJECSE

P

s shows that gain on the average job time ,our proposed as are placed according to pattern mapping using met

Figure 3. Total Number of replications

numbers of replicas are generated, in our proposed s ents) and the accesses of all the replicas as local access.

Figure 4. SE Utilization

centage of Storage space changes slightly when compa ased on the number of object patterns present in it. Stor on and disconnection of agents (Number of objects ca objects states).This strategy performs little worse over p n results, it can reduce number of replicas as a result b time. Due to limitations in simulation tool kit the pe ts strategy achieve good performance.

0 50 100 150 200 250 300 350 400 450 500 550 600

Eco. Model LFU LRU

462 544 536 390 545 540 430 542 ₅₃₇

NR strategy Initial replica proposed strategy initial replicas at random

0.00 0.10 0.20 0.30 0.40 0.50

Eco.Model LFU LRU

S E U s a g e

NRstrategy Initial replica proposed strategy Initial replica at random

CSE,Volume2,Number 4

P Sunil Gavaskar et al.

1160

sed strategy reduces average etadata catalogue, therefore

d strategy model, it reduces ss.

pare to other strategies .Our torage space is compensating can accept request on same

(6)

1161

Metadata Control Agent approach for Replication in Grid Environments

In this paper, we proposed the concept of object oriented Replicas; we have introduced the strategy called agents activation of replica creation, selection, replacement and maintenance. The operations are optimized in terms of number of replicas used and messages exchanged between agents. With objects as replicas two processes can demand for the same resource at same time, due to objects state maintenance. Our work is useful to users where the decentralized environments facing challenges like load balance, storage space, access rate, Simulation results shows that, our concept of replicas optimization can significantly reduces total number of replicas. Thereby gain in the execution time is mainly due to enhancement in availability of replicas (objects) and its transfer time. In future we plan to further extend ideas with service level fault tolerance apart from resource level fault tolerance. The service level fault tolerance should focus on QoS issues and global behavior. It can benefit for Representation of grid global state in service oriented format.

REFERENCE

[1] Foster I, Kesselman C.The grid: blue print for a new computing infrastructure [M]. San Francisco, USA: Morgan Kaufman Publishers,

1999.

[2] Y. Wang, N. Xiao, R. Hao, et al. Research on key technology in data grid [J].Journal of Computer Research and Development, 2002,

39(8):943-947.

[3] Chervenak A, Foster I, Kesselman C. The data grid: towards an architecture for the distributed management and analysis of large scientific

datasets [J] journal of Network and Computer Applications, 2000, 23(3):187-200.

[4] H. Lamehamedi , B. Szymanski, Z. shentu, and E. Deelman , ”Data replication strategies in grid environments, ” in Proceedings of the

Fifth International Conference on Algorithms and Architectures for Parallel Processing , 2002, pp.378-383.

[5] J. S. Plank and L. Xu, Optimizing Cauchy Reed-Solomon codes for Fault-Tolerant network Storage Applications, NCA-06: 5th _IEEE

International Symposium on Network Computing Applications, Cambridge, MA, July, 2006.

[6] M. Shorfuzzaman, P. Graham and R. Eskicioglu, “Popularity-Driven Dynamic Replica Placement in Hierarchical Data Grids” in Proc. 9th

International Conference on Parallel and Distributed Computing, Applications and Technologies, 2008, pp. 524-531.

[7] S.Goel and R.Buyya, Data Replication Strategies in wide area Distributed Systems, Enterprise Service Computing: From Concept to

Deployment. Robin G.Qiu, 2006 pp.211-241.

[8] W. H. Bell, D. G. Cameron, R. Carvajal-Schiaffino, A.P.Millar,K. Stockinger and F. Zini,” Evaluation of an economy-based file

replication strategy for a data grid” in Proceedings of the 3rd_{IEEE/ACM International Symposium on Cluster Computing and the}

Grid(CCGrid2003),2003.

[9] C.Haddad and Y.Slimani,”Economic model for replicated database placement in grid” in 7th IEEE International Symposium on Cluster

Computing and the Grid (CCGrid’07), 2007, pp.283-292.

[10] R.Brown, “Calendar Queues: A fast O(1) Priority Queue implementation for the simulation event set problem”

.Comm.ACM,31(10),pp.1220-1227,1988.

[11] A.Chakrabarti, R.A.Dheepak, and S.Sengupta”Integration of scheduling and replication in data grids” in Proceedings of 11th_{International}