REVIEW OF CHECKPOINTING ALGORITHMS IN DISTRIBUTED SYSTEMS

The time that elapses between a process receiving the 'prepare checkpoint' message and the 'take checkpoint' message is referred to as the active interval of that process. The maximum transmission delay incurred by any message to reach its destination is assumed to be t. It is also assumed that T > 3t, since the checkpoint interval is obviously greater than the active interval, and the length of the active interval must be at least 3t to survive the transmission delay of the control messages and to enable logging of computational messages. If a process wants to send a message inside the active interval, the message is first logged and then execution continues; this enables the proposed protocol to handle lost messages. Every process maintains two counters, namely a message received count (MRC) and a message send count (MSC). These counters are initialized to zero at the start of the active interval, are incremented only within the active interval, and do not change outside it. At time K*T + 3t, the initiator sends the 'take checkpoint' signal to the other processes, then takes its own checkpoint and exits the active interval. In response, the remaining processes take their checkpoints and exit their respective active intervals.
These checkpoints form a consistent global state. After exiting the active interval, all processes resume normal operation; there is no checkpointing or logging of messages outside the active interval. In case of failure, every process rolls back to its latest checkpoint, and the necessary messages are replayed from stable storage to reconstruct the previous state of the whole system. If the failure occurs after all processes have exited their active intervals, the application rolls back to the latest consistent global state, namely g; if it occurs before some process has exited its active interval, the application rolls back to the previous global state, namely g-1.
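As a rough illustration, the counter and logging discipline of the active interval can be sketched as follows (hypothetical names; the message transport, timer, and stable-storage layers of the actual protocol are not shown):

```python
# Sketch of per-process active-interval bookkeeping: messages sent inside
# the interval are logged first, and MRC/MSC change only inside it.

class Process:
    def __init__(self, pid):
        self.pid = pid
        self.in_active_interval = False
        self.mrc = 0            # message received count
        self.msc = 0            # message send count
        self.log = []           # messages logged for replay on recovery

    def on_prepare_checkpoint(self):
        # 'prepare checkpoint' opens the active interval; counters reset
        self.in_active_interval = True
        self.mrc = 0
        self.msc = 0

    def send(self, msg):
        if self.in_active_interval:
            self.log.append(msg)   # log first so lost messages can be replayed
            self.msc += 1
        # ... transmit msg over the network ...

    def receive(self, msg):
        if self.in_active_interval:
            self.mrc += 1

    def on_take_checkpoint(self):
        # 'take checkpoint' records the state and closes the active interval
        checkpoint = {"pid": self.pid, "mrc": self.mrc, "msc": self.msc,
                      "log": list(self.log)}
        self.in_active_interval = False
        return checkpoint
```

On recovery, the logged messages in the latest such checkpoint would be replayed to rebuild the consistent global state.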

Statics and Dynamics of Selfish Interactions in Distributed Service Systems.

It was recently proved possible to investigate the whole landscape of Nash equilibria in multi-agent games, such as public goods games on networks [7–9], by mapping the equilibrium condition onto a constraint-satisfaction problem and then analyzing it using efficient message-passing algorithms based on the cavity method from statistical physics [10–12]. Here we adopt this approach to study the equilibrium properties of a simple model of distributed service provision. We use this information to understand how efficient (i.e., with large aggregate utility) Nash equilibria obtained from best response compare to those obtained using different dynamical rules, and how much the answer depends on the initial conditions. Moreover, the real self-organization processes of the agents are not exactly best-response processes and their details are normally not known; therefore a complete analysis cannot be restricted to a single dynamics. We bring evidence of the richness of the equilibrium landscape by describing the full set of Nash equilibria using a statistical mechanics analysis and comparing it with the typical fixed points of different dynamics. We also study the effects of correlating the users' utilities with the loads they bring to the system. Finally, we generalize the analysis to a stochastic case, in which the agents are present with a given probability. To do this, we introduce a new algorithmic approximation technique to perform the required average over the realizations of the stochastic parameters on single instances.

A Review on Distributed Control of Cooperating Mini UAVS

The Inertial Measurement Unit (IMU) is the traditional sensor for measuring attitude and heading, i.e., orientation. However, alternative methods also exist that exploit devices not originally meant for attitude measurement, such as converting motor power consumption to Euler angles [23]. Infra-red (IR) sensors have been investigated not only for height above the ground [24] but also for absolute attitude determination [25]. One idea of an infra-red attitude sensor is to measure the heat difference between two sensors on one axis to determine the angles of the UAV, because the earth emits more IR than the sky; however, accuracy depends directly on the baseline length. Attitude can also be determined using multiple GPS antennas [26] or Signal-to-Noise Ratio (SNR) measurements [27], though accuracy again depends on the distance between the sensors. Computer-vision-based approaches are also possible. Nevertheless, IMUs are still the most relevant orientation sensors for systems that do not rely on external reference systems such as optical tracking. A common approach is to use multiple sensors and then apply data fusion and a filtering technique such as a Kalman filter or particle filter to find the true value. Knowledge of the correct attitude and heading is the basis for accurate navigation, and this information is even more critical for cooperating units operating in close proximity.
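As an illustration of the fusion step mentioned above, a minimal scalar Kalman filter can blend a gyro-integrated attitude prediction with a noisy absolute attitude measurement (all noise values below are illustrative assumptions, not taken from the cited works):

```python
# One predict/update cycle of a scalar Kalman filter for a single attitude
# angle: the gyro rate drives the prediction, an absolute sensor (e.g. an
# IR attitude sensor) supplies the correction.

def kalman_step(angle, p, gyro_rate, dt, meas, q=0.01, r=0.5):
    # predict: integrate the gyro rate; process noise q grows the variance
    angle += gyro_rate * dt
    p += q
    # update: blend in the absolute measurement with noise variance r
    k = p / (p + r)             # Kalman gain
    angle += k * (meas - angle)
    p *= (1 - k)
    return angle, p
```

In practice the same structure is applied per axis, and the measurement may come from any of the absolute attitude sources discussed (IR, GPS antenna pairs, vision).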

Performance Improvement and Deadlock Prevention for a Distributed Fault Diagnosis Algorithm

Fault diagnosis is an important tool in the maintenance strategy of distributed computer systems. The theory of fault diagnosis in distributed systems has received considerable attention over the years, and a number of diagnosis algorithms have been proposed in the literature. The modified SELF3 algorithm is among them, and it has been taken as the starting point of this study. The algorithm is implemented on a simulated distributed system, where all actions specified in the algorithm have been introduced to the simulator in a unified message format. A time stamp is appended to each message, representing the local clock of the node from which the message is issued.
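A unified message format with an appended local-clock time stamp, as described, might look like this (a sketch with hypothetical field names; the simulator's actual format is not given in the excerpt):

```python
# Hypothetical unified message format: every message a node issues carries
# a time stamp drawn from that node's local clock.

from dataclasses import dataclass

@dataclass
class Message:
    kind: str        # e.g. "TEST" or "RESULT"
    sender: int      # id of the issuing node
    payload: object
    timestamp: int   # local clock of the issuing node at send time

class Node:
    def __init__(self, node_id):
        self.node_id = node_id
        self.clock = 0   # local clock, advanced on every issued message

    def issue(self, kind, payload):
        self.clock += 1
        return Message(kind, self.node_id, payload, self.clock)
```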

Integration of JAM and JADE Architecture in Distributed Data Mining System

Data mining systems aim to discover patterns and extract useful information from facts recorded in databases [2]. They apply various machine learning algorithms that compute descriptive representations as well as patterns from which knowledge can be acquired. However, these algorithms are computationally complex and require all data to be resident in main memory, which is clearly untenable for many realistic problems and databases [2]. Traditional data analysis methods that require humans to process large data sets are completely inadequate, and applying traditional data mining tools to discover knowledge from distributed data sources may not be possible [4]. Therefore, knowledge discovery from multi-databases has become an important research field and is considered a more complex and difficult task than knowledge discovery from mono-databases [8]. The relatively new field of Knowledge Discovery and Data Mining (KDD) has emerged to compensate for these deficiencies. Knowledge discovery in databases denotes the complex process of identifying valid, novel, potentially useful and ultimately understandable patterns in data [1]; data mining refers to a particular step in the KDD process, according to the most recent and broad definition.

Building Adaptive Services for Distributed Systems

Separation of computation from adaptation concerns is common in component-based architectures [12, 13] as well as in service composition frameworks [8, 7]. This approach has proved to improve flexibility and maintainability and has therefore been applied in several areas [14, 15]. In component-based architectures, adaptation is typically achieved through the addition, removal, and exchange of system components or of the interactions between those components, while in composition frameworks, mainly dedicated to the composition of network-level protocols, adaptation is typically achieved through the exchange of algorithms and the fine-tuning of protocol parameters. The work described in this paper is an attempt to combine the two types of approaches. The proposed approach targets services in general (i.e., network services but also application-specific services), offering not only fine-tuning of services but also support for the addition, removal and exchange of services at runtime. Support for adaptation has been addressed in the context of several frameworks, namely Ensemble [16] and Cactus [17], each offering a different approach. Ensemble is a protocol composition framework that relies on vertical protocol stacks to offer a service. Runtime reconfiguration is achieved by switching algorithms; the switch relies on coordinator-based orchestration and a stop-and-go local reconfiguration mode, thus using a single strategy for switching protocols. Cactus is a service composition framework whose dynamic reconfiguration relies on switching micro-protocols (whose composition results in a service); reconfiguration can also be achieved by parameter tuning. The framework offers monitoring and agreement features to support automatic dynamic reconfiguration. Since global reconfiguration of the system is expected, Cactus offers a single reconfiguration strategy based on inter-host global orchestration and non-stop local reconfiguration.

Hybrid Genetic Algorithms: A Review

The selection of an appropriate approximation model to replace the real function is an important step in ensuring that the optimization problem is solved efficiently. Neural network models [21, ch. 8] have been widely used for function approximation [60]. Willmes et al. [61] compared neural networks and the Kriging method for constructing fitness approximation models in evolutionary algorithms. Jin and Sendhoff [62] combined the k-nearest-neighbor clustering method and a neural network ensemble to estimate a solution's fitness. Burdsall and Giraud-Carrier [53] used an approximation of the network's execution to evaluate solutions' fitness instead of constructing a radial basis function (RBF) network to optimize the topology of a neural network; the approximation is based on an extension of the nearest-neighbor classification algorithm to fuzzy prototypes. Ankenbrandt et al. [63] implemented a system of fuzzy fitness functions to grade the quality of chromosomes representing a semantic net; the system is used to assist in recognizing oceanic features from partially processed satellite images. Pearce and Cowley [64] presented a study of the use of fuzzy systems to characterize engineering judgment and its use with genetic algorithms. They demonstrated an industrial design application where a system of problem-specific engineering heuristics and hard requirements is combined to form a fitness function.
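As a sketch of the nearest-neighbor idea behind several of the cited approaches, a candidate's fitness can be estimated from the k closest previously evaluated solutions (illustrative only; the cited works combine this with clustering and neural ensembles):

```python
# k-nearest-neighbor fitness approximation: instead of running the
# expensive real fitness function, average the true fitness of the k
# nearest solutions already evaluated.

import math

def knn_fitness(candidate, archive, k=3):
    """archive: list of (solution_vector, true_fitness) pairs."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    nearest = sorted(archive, key=lambda sf: dist(candidate, sf[0]))[:k]
    return sum(f for _, f in nearest) / len(nearest)
```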

Performance Enhancement of Scheduling Algorithm in Heterogeneous Distributed Computing Systems

Schedule length is the maximum finish time of the exit task in the scheduled DAG [26]. Since the main goal of task scheduling is minimizing application execution time, schedule length is the key metric for measuring the performance of a task scheduling algorithm. The NDCP algorithm uses the critical path to determine task priority, because the critical path contains the most important tasks. It computes the first critical path to deal with the critical tasks, then computes the next critical path (after updating the DAG) to deal with the next critical tasks, and so on; after computing a critical path, it treats the DAG as a new DAG with a new critical path. The NDCP algorithm also uses task duplication to reduce the DRT of the successors, which can reduce the overall application time; the algorithm duplicates only the MP of the VIT. Therefore, the NDCP algorithm is more efficient than the other algorithms, as shown in Fig. 6 to Fig. 10, which plot schedule length versus number of tasks with 8, 16, 32, 64 and 80 processors. The performance ratio in schedule length is 11%.
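The critical-path extraction at the heart of this priority scheme can be sketched as a longest-path computation over the DAG (a simplification: only task weights are used, and the communication costs and NDCP-specific DAG updates are omitted):

```python
# Longest (critical) path in a DAG by memoized depth-first search.
# succ maps each task to its successors; weight gives each task's cost.

import functools

def critical_path(tasks, succ, weight):
    """Return the maximum-weight path through the DAG as a list of tasks."""
    @functools.lru_cache(maxsize=None)
    def longest_from(t):
        # best continuation from t's successors: (path, total weight)
        best_path, best_w = [], 0
        for s in succ.get(t, []):
            path, w = longest_from(s)
            if w > best_w:
                best_path, best_w = path, w
        return [t] + best_path, weight[t] + best_w
    return max((longest_from(t) for t in tasks), key=lambda pw: pw[1])[0]
```

In the NDCP scheme, the tasks on this path would be scheduled first, the DAG updated, and the computation repeated for the next critical path.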

A Distributed approach for antenna subset selection in MIMO systems

schemes and interference mitigation have been studied in [9], where a game-theoretic framework is used for a 2-cell scenario in downlink communication and each base station (BS) aims at the maximization of its error probability making use of partial channel state information (CSI). In addition, [10] has generalized the antenna selection algorithms proposed in [11], [12] to interference-limited MIMO wireless environments. In that work, the antenna selection criterion is the maximization of the post-processing signal-to-interference-plus-noise ratio (SINR) at each BS through a non-iterative algorithm. However, [9] does not consider MIMO configurations, and [10] does not use an iterative algorithm to mitigate the inherent interference. The works [13], [14], [6] have applied game theory to multi-user MIMO systems. They contribute a general game-theoretic framework using the iterative waterfilling (IWF) algorithm [15] to find optimal precoding matrices for SM systems, and derive sufficient conditions (e.g., a convex precoder set) ensuring existence and uniqueness of the Nash equilibrium.

Fault-tolerant Stochastic Distributed Systems

1.2 Previous Work and Brief Literature Review

With the modernization of power grids towards creating smart grids (i.e., energy networks that can automatically monitor energy flows and adjust to changes in energy supply and demand accordingly), an important aspect in ensuring their continuous operation is the detection of malfunctioning components, outages in power sources, load buses that fail, communication errors between appliances, etc., that can perturb the overall power grid performance. According to the GE company website, "Power Interruptions cost European Union businesses €150 billion each year. Outages cost the U.S. economy an average of $1.5 billion each week - $80 billion, with a 'B', each year." One important problem in designing observers and decision-making mechanisms for these kinds of networks is that the observability of the whole system can be compromised by the presence of similar components, i.e., components with the same dynamics. When only relative measurements are available (i.e., the difference between each pair of states), observability is lost. This motivates the design of distributed tools for fault detection and isolation that can deal with the above problem without compromising the required accuracy. The solution should perform distributed fault detection and isolation with multiple detectors, thus potentially reducing the time to detect faults and the rate of missed detections.

A Review Of Fault Tolerant Scheduling In Multicore Systems

Abstract: In this paper we discuss various fault-tolerant task scheduling algorithms for multicore systems, based on hardware and on software. The hardware-based algorithm is a blend of Triple Modular Redundancy and Double Modular Redundancy, in which the Architectural Vulnerability Factor is considered when deciding the schedule, in addition to the EDF and LLF scheduling algorithms. In most real-time systems the dominant part is shared memory. A low-overhead software-based fault tolerance approach can be implemented at user-space level, so it does not require any changes at the application level. Here, redundant multi-threaded processes are used; with these processes, soft errors can be detected and recovered from. This method gives low overhead and fast error detection and recovery; the overhead incurred ranges from 0% to 18% for the selected benchmarks. The hybrid scheduling method is another scheduling approach for real-time systems: dynamic fault-tolerant scheduling gives a high feasibility rate, while task criticality is used to select the type of fault recovery method in order to tolerate the maximum number of faults.

A Tunable Checkpointing Algorithm for Distributed Mobile Applications

additional data used for tracking the execution state of the ongoing application. The blocking time of our algorithm is on average very short compared with the traditional algorithms. To see this, recall the steps of lines 5-7 in Fig. 3: blocking arises only when a globally consistent local checkpoint is not found within the R-distance, which in many cases does not happen. Even when a new global checkpoint must be created, our protocol chooses the local checkpoint whose checkpointing overhead is cheapest. This is done by the logging agent using the routine CreateGCS() of Fig. 5, with which the logging agent can choose, among the previous local checkpoints, one that demands the least number of local checkpoints.

Soft-Checkpointing Based Coordinated Checkpointing Protocol for Mobile Distributed Systems

Abstract: Minimum-process coordinated checkpointing is a suitable approach to introduce fault tolerance in mobile distributed systems transparently, but it may require blocking of processes, extra synchronization messages, or taking some useless checkpoints, while all-process checkpointing may lead to exceedingly high checkpointing overhead. To optimize both metrics, the checkpointing overhead and the loss of computation on recovery, we propose a hybrid checkpointing algorithm, wherein an all-process coordinated checkpoint is taken after the minimum-process coordinated checkpointing algorithm has executed a fixed number of times. In the minimum-process algorithm, an effort has been made to minimize the number of useless checkpoints and the blocking of processes, using a probabilistic approach and by computing an interacting set of processes at the beginning. We also try to reduce the loss of checkpointing effort when any process fails to take its checkpoint in coordination with the others, and we reduce the size of the checkpoint sequence number piggybacked on each computation message.

Agent-based distributed manufacturing control: A state-of-the-art survey

Current approaches design simple reconfiguration mechanisms, normally focused on the design phase, that are not flexible enough to replace manually reconfigurable systems. More complex and powerful reconfiguration methods are required, embodying learning and self-organization capabilities in distributed entities and also designing distributed control based on swarm intelligence theories. A step further is the application of evolution mechanisms that allow a system to evolve into new control structures, adapting its behavior to the environmental conditions. Interoperability is a crucial factor in the development of distributed and heterogeneous production control applications. Solving these problems requires standard platforms that support transparent communication between distributed control components or applications. Ontologies play a decisive role in supporting interoperability. However, the development of an ontology may take from a few hours up to months or even years, depending on the choice of language, the covered topics, and the level of formality and precision (Borgo and Leitão, 2006). The ontologies used in industrial applications are usually proprietary, very simple, and merely hierarchical structures of concepts; even the FIPA specifications do not support complete interoperability. Additionally, the definition of common ontologies for a specific domain is not an easy job: people with different backgrounds have different points of view on the same domain concepts.

OPTIMAL CONTROL ALGORITHMS FOR SECOND ORDER SYSTEMS

The quality of control in a system depends on settling time, rise time and overshoot values. The main problem is to optimally reduce these timing parameters while avoiding undesirable overshoot, long settling times and vibrations. To solve this problem, many authors have proposed different approaches. A first approach is the application of Proportional Integral Derivative (PID) controllers, which are extensively used in industrial process control. Vaishnav and Khan (2007) designed a Ziegler-Nichols PID controller for higher-order systems. A tuning method using a PID controller has been developed (Shamusuzzoha and Skogestad, 2010); it requires one closed-loop step setpoint response experiment similar to the classical Ziegler-Nichols experiment. However, in complex systems characterized by nonlinearity, large delay and time variance, PIDs are ineffective (Cao et al., 2008). The design of a PID controller is generally based on the assumption of exact knowledge of the system; because such knowledge is not available for the majority of systems, many advanced control methods have been introduced.
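The textbook discrete PID law referred to above can be sketched as follows (gains and sample time are illustrative, not produced by any of the tuning methods cited):

```python
# Minimal discrete PID controller: u = Kp*e + Ki*∫e dt + Kd*de/dt,
# with the integral and derivative approximated per sample step.

class PID:
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_err = 0.0

    def step(self, setpoint, measured):
        err = setpoint - measured
        self.integral += err * self.dt          # rectangular integration
        deriv = (err - self.prev_err) / self.dt  # backward difference
        self.prev_err = err
        return self.kp * err + self.ki * self.integral + self.kd * deriv
```

Driving a simple integrator plant with this controller shows the closed loop settling at the setpoint; tuning the three gains trades off rise time, overshoot and settling time as discussed above.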

Distributed Algorithms for Target Localization in Wireless Sensor Networks Using Hybrid Measurements

Recursive methods, such as Newton's method combined with the gradient descent method, are often used to obtain the ML solution [33]. However, since the objective function may have numerous local optima, local search methods may get trapped in them. This issue can be overcome by approaches such as grid search methods and linear or convex relaxation techniques, which can also provide good initial points for more accurate iterative algorithms [35, 36, 48]. Grid search methods solve the ML problem by forming a grid and passing each point of the grid through the ML objective function until the optimal point is found. This approach is suboptimal since it does not search for the solution in an efficient way; the result is a time-consuming method with computational complexity and memory requirements proportional to the grid size and the number of unknown parameters. Less complex are linear estimators such as linear least squares. Methods of this type are very efficient in terms of processing time and computational complexity; nonetheless, they are based on heavy approximations, so low accuracy is to be expected, especially in the presence of high noise levels [69]. Another way of tackling the issue is to employ convex relaxation techniques: the original non-linear, non-convex ML problem is transformed into a convex one, with the advantage that convergence to the globally optimal solution is guaranteed. Still, the obtained convex problem is a relaxed version of the original one, so its solution may not correspond to the original ML problem's solution [11].
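A grid search of the kind described simply evaluates every grid point against the objective and keeps the minimizer, which is why its cost grows with the grid size and the number of unknowns (the quadratic objective below is an illustrative stand-in for the actual ML cost):

```python
# Exhaustive 2-D grid search: evaluate the objective at every (x, y) grid
# point and return the best one. Complexity is O(|xs| * |ys|).

def grid_search(objective, xs, ys):
    best, best_val = None, float("inf")
    for x in xs:
        for y in ys:
            v = objective(x, y)
            if v < best_val:
                best, best_val = (x, y), v
    return best, best_val
```

The returned point can then serve as the initial guess for a more accurate iterative method, as the excerpt suggests.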

Securing networked embedded systems through distributed systems analysis

[Figure residue: stack layout during a buffer overflow — local variables (buffer), frame pointer, return address, function parameters; the overflow fills garbage and injects malicious arguments.]


Distributed Algorithms for Target Localization in Wireless Sensor Networks Using Hybrid Measurements

Abstract—This paper presents a performance analysis of two recently proposed distributed localization algorithms for cooperative 3-D wireless sensor networks (WSNs) in a more realistic scenario. The tested algorithms rely on distance and angle measurements obtained from received signal strength (RSS) and angle-of-arrival (AoA) information, respectively. The measurements are used to derive a convex estimator, based on second-order cone programming (SOCP) relaxation techniques, and a non-convex one that can be formulated as a generalized trust region sub-problem (GTRS). Both estimators have shown excellent performance in a static network scenario, giving accurate location estimates and converging in a few iterations. Here, we test their performance under different probabilities of communication failure between neighbour nodes in the broadcast phase. Our simulations show that performance holds even for high probabilities of communication failure and that convergence is still achieved in a reasonable number of iterations.

Designing Expert System for Detecting Faults in Cloud Environment

A contradiction fault occurs when, from one vertex/proposition (e.g., A), we can reach two exclusive vertices (e.g., C and ¬C). To check for this kind of fault, we first determine the set of exclusive vertices, and then we only need to check whether the exclusive vertices are in the same complementary set and whether neither of them is the root of the set. If they are in the same set and neither is a root, there is a contradiction anomaly; otherwise there is none. Unreachability faults occur if there is no path between two given vertices. To check for that, we first determine whether the two vertices are in the same complementary set; if so, we determine whether there is a path between them, in which case there is no unreachability anomaly. The benefit of our approach is its ability to detect faults as the dynamic rule base is being updated: if a rule r is added to the dynamic rule base, the new rule base can be verified against the various faults without rebuilding any structures.
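The contradiction check described above can be sketched directly (data-structure names are illustrative; the complementary-set construction itself is assumed to be given):

```python
# Contradiction-anomaly check: an exclusive pair (p, ¬p) is anomalous
# exactly when both vertices lie in the same complementary set and
# neither of them is that set's root.

def has_contradiction(exclusive_pairs, set_of, root_of):
    """set_of[v]: id of v's complementary set; root_of[s]: root of set s."""
    for u, v in exclusive_pairs:
        if set_of[u] == set_of[v]:
            s = set_of[u]
            if root_of[s] != u and root_of[s] != v:
                return True
    return False
```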

Reduction algorithms for solving large systems of logical equations

Large systems of logical equations are considered in this paper, each equation depending on a restricted number of variables. A method of reduction is suggested that reduces the number of roots of separate equations, which in turn saves the time spent finding the roots of the whole system. Three mechanisms of reduction are proposed, each looking for prohibited combinations of variable values in separate equations (combinations that do not satisfy the equations). The first procedure looks for constants (prohibited values of single variables, or 1-bans). The second looks in a similar way for prohibited combinations of values on pairs of variables (2-bans) and finds all their logical consequences, closing the set of discovered 2-bans. The third analyses the equations in pairs, finds the r common variables of each pair, and checks one by one all combinations of their values, looking for prohibited ones (r-bans). The found bans are used for deleting roots in other equations; after this, new bans may be found, so the reduction procedure has a chain nature. This greatly facilitates solving large systems of logical equations; sometimes it is enough to find the only root of a system or to prove its inconsistency.
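The first reduction mechanism, the search for 1-bans, can be sketched as a scan for variables that take the same value in every root of an equation (the root representation below is an illustrative assumption):

```python
# 1-ban detection: if a variable takes the same value in every root of an
# equation, the opposite value is prohibited (a 1-ban), and roots using it
# can be deleted from the other equations.

def find_1bans(roots):
    """roots: list of {var: value} dicts satisfying one equation.
    Returns {var: forced_value} for variables constant across all roots."""
    if not roots:
        return {}
    bans = dict(roots[0])
    for root in roots[1:]:
        for var in list(bans):
            if root.get(var) != bans[var]:
                del bans[var]
    return bans
```

Applying the discovered bans to prune the other equations' root lists may expose new constants, which is the chain behaviour the excerpt describes.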
