Although BCI applications utilize low-cost commercial devices for local implementation, to our knowledge there are few implementations of local BCI signal-interpretation processing units in embedded form; most developments rely on high-performance computers as the end device. Although some devices provide portability to the brain-wave-reading headset, in forms such as those presented in , the use of custom devices is still required. Moreover, most of the hardware addresses the brain-wave signal-processing part without offering a way to display different forms of neurofeedback (e.g., spellers or games). One of the end goals of this research is to provide a set of classifiers with very low computational cost, so that they can be implemented on high-end commercial embedded hardware with portability capabilities (e.g., processors used in tablets, with high-definition video-processing capabilities) and be useful with different available low-cost EEG-based commercial BCI tools. Therefore, we present a way to process and analyze a set of post-processed synchronous signals, consisting of attention and meditation data, from a single-channel electrode tool such as the well-known NeuroSky MindWave (http://neurosky.com), together with classification and analysis through unsupervised and supervised methodologies. In order to obtain near-real-time response after each trial, an optimization with parallel-processing software tools was implemented and compared with a high-performance embedded device with parallel-processing capabilities.
A network is an interconnected collection of autonomous computers; two computers are said to be interconnected if they are able to exchange information, and the connection can be made through links . A typical interconnection network (IN) consists of a number of switching elements (SEs) [1,9] and interconnection links that enable the processors to communicate among themselves or with memory units. As a compromise between the two extremes (time-shared and crossbar networks) and the various operating characteristics of an IN, the multistage interconnection network was introduced . A multistage interconnection network (MIN) is capable of connecting an arbitrary input terminal to an arbitrary output terminal . Generally, a MIN consists of more than one stage of small interconnection networks called switching elements (SEs). An irregular class of multistage interconnection network for parallel processing, named Improved Four Tree (IFT), has been proposed and analyzed in this study.
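The IFT network studied here is irregular, but the basic idea of steering a packet through successive stages of 2x2 switching elements can be illustrated with the classic regular Omega MIN, where each stage applies a perfect shuffle and then uses one bit of the destination address to set the switch. A minimal sketch (the function name and parameters are illustrative, not taken from the study):

```python
def route_omega(src, dst, n_bits):
    """Trace a packet through an n_bits-stage Omega network with
    2**n_bits inputs, using destination-tag routing."""
    mask = (1 << n_bits) - 1
    p = src
    path = [p]
    for stage in range(n_bits):
        # perfect shuffle between stages: rotate the address left by one bit
        p = ((p << 1) | (p >> (n_bits - 1))) & mask
        # switch setting: replace the low bit with the next destination bit
        bit = (dst >> (n_bits - 1 - stage)) & 1
        p = (p & ~1) | bit
        path.append(p)
    return path  # path[-1] == dst for any src/dst pair

# e.g. an 8x8 Omega network: input 2 reaches output 5 in 3 stages
path = route_omega(src=2, dst=5, n_bits=3)
```

After the last stage the packet always sits at the requested output, which is why a regular MIN of this kind can connect an arbitrary input terminal to an arbitrary output terminal.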
Typical real-time computer vision tasks require a huge amount of processing power and time. The nature of processing in a typical computer vision algorithm ranges from many small arithmetic operations (fine-grain parallelism) to symbolic operations (coarse-grain parallelism). The task becomes more complicated for image processing applications because of their large data sets and the processing they require. Existing processing systems respond efficiently under sequential operation, but the result is a slow system that is inefficient for high-speed image processing. Parallel processing is found to be the only solution for obtaining the processing speed required by high-speed image processing applications. Existing image processing systems usually support only one suite of operations at a time and fail to respond under multiple tasks. Systems executing single-instruction or multiple-instruction streams operate using low-level and high-level operations: generally, the SIMD architecture is suitable for low-level processing, while the MIMD architecture is suitable for high-level processing. This paper explores the modeling and simulation of a parallel image processing architecture for image processing applications using the Parallel Virtual Machine (PVM), the MATLAB external interface API, and the C language on the Linux operating system platform.
To accomplish these requisites, a solution based on parallel processing was developed. The architecture proposed by the ECU2010 is based on a parallel-processing system composed of one or more identical modules connected by a serial link in a ring topology. Each module also has a peripheral bus to which sensors and actuators can be connected. The modules are independent of each other, each having its own processor, storage memory, and logging capability. The serial link between the modules is used to transfer variables needed across multiple modules. With this architecture, if more processing power, storage space, or peripherals are needed, one or more modules simply have to be inserted into the serial ring (Figure 4). The ECU2010 project also included the development of an Integrated Development and Management System (IDMS), a single application that allows the development, deployment, and debugging of the functions on the hardware platform. Throughout the complete ECU software development cycle, from the laboratory to the racing track, only this tool would be needed.
Quantum and evolutionary computation are new forms of computing, each with a unique paradigm for designing algorithms. Shor's algorithm is based on quantum concepts such as qubits, superposition, and interference; it solves the factoring problem, which will have a great impact on cryptography once quantum computers become a reality. The genetic algorithm is a computational paradigm based on natural evolution, including survival of the fittest, reproduction, and mutation; it is used here to solve the NP-hard knapsack problem. These two algorithms are unique in achieving computational speedup through their adaptation of parallelism in processing.
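To make the evolutionary side concrete, a minimal genetic algorithm for the 0/1 knapsack problem can be sketched as follows. The population size, mutation rate, and penalty scheme below are illustrative choices, not those of the study:

```python
import random

def ga_knapsack(values, weights, capacity, pop_size=60, gens=120, seed=0):
    """Minimal GA for 0/1 knapsack: selection, crossover, mutation."""
    rng = random.Random(seed)
    n = len(values)

    def fitness(ind):
        w = sum(wi for wi, bit in zip(weights, ind) if bit)
        v = sum(vi for vi, bit in zip(values, ind) if bit)
        return v if w <= capacity else 0      # infeasible individuals score 0

    def select(pop):                          # binary tournament selection
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(gens):
        nxt = []
        while len(nxt) < pop_size:
            p1, p2 = select(pop), select(pop)
            cut = rng.randrange(1, n)         # one-point crossover
            child = p1[:cut] + p2[cut:]
            for i in range(n):                # bit-flip mutation, rate 1/n
                if rng.random() < 1.0 / n:
                    child[i] ^= 1
            nxt.append(child)
        pop = nxt
        best = max(pop + [best], key=fitness)  # keep the best-so-far
    return fitness(best), best

# Classic toy instance whose optimum is 220 (take the last two items).
value, picks = ga_knapsack([60, 100, 120], [10, 20, 30], capacity=50)
```

Fitness evaluation of the individuals in a generation is independent, which is exactly the population-level parallelism the abstract refers to.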
The number of astronomical images produced grows daily, in addition to the amount already stored. A great source of data is solar images, whose study can detect events capable of affecting telecommunications, electricity transmission, and other systems on Earth. For such events to be detected, these images must be treated in a coherent way, considering aspects of storage, processing, and visualization. Combining image processing algorithms with high-performance computing techniques makes it possible to handle the information accurately and in reduced time. The high-performance computing techniques used in this work were developed for hybrid systems, which employ a combination of shared- and distributed-memory systems. Parallel versions of some established techniques were produced for hybrid systems; moreover, new techniques have been proposed and tested for this system. To evaluate the improvement in performance, comparisons were made between serial and parallel versions. In addition to this analysis, the text also presents a system with the capacity to store, process, and visualize solar images. In one of the filament-detection techniques, the process was accelerated 120 times, and an auxiliary process for the detection of brighter areas was 155 times faster than the serial version.
In the multiple-tree implementation, on the other hand, the divide-and-conquer technique is first applied to the sequences to produce subsequences. One of these subsequence sets is kept by the main processor and the rest are sent to the other n-1 processors in parallel, using message passing. Each of the n processors (including the main processor) then builds a guide tree from its own subsequences and executes the MSA module of the program according to that guide tree, in parallel. After completing its subsequence alignment, each of the n-1 worker processors sends its alignment result back to the main processor, which merges the subsequence alignments to complete the final alignment.
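The scatter/gather structure described above can be sketched with Python's `multiprocessing` in place of explicit message passing. The per-chunk "alignment" below is only a placeholder that pads each subsequence with gap symbols; a real implementation would build a guide tree and run the MSA module on each chunk:

```python
from multiprocessing import Pool

def align_chunk(seqs):
    """Stand-in for the per-processor MSA step: pad every subsequence in
    the chunk to a common width with gap symbols instead of truly aligning."""
    width = max(len(s) for s in seqs)
    return [s.ljust(width, '-') for s in seqs]

def parallel_msa(sequences, n_workers=4):
    # Divide-and-conquer: split the input into up to n_workers chunks.
    chunks = [c for c in (sequences[i::n_workers]
                          for i in range(n_workers)) if c]
    with Pool(len(chunks)) as pool:           # scatter chunks to workers
        partial = pool.map(align_chunk, chunks)
    merged = [s for chunk in partial for s in chunk]   # gather the results
    width = max(len(s) for s in merged)       # main processor merges rows
    return [s.ljust(width, '-') for s in merged]

rows = parallel_msa(["ACGT", "AC", "ACGTTG", "A"])
```

The `Pool.map` call plays the role of the n-1 send/receive pairs: the main process distributes the subsequence sets, waits, and then combines the partial alignments.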
As laser scanning technology improves and costs come down, the amount of point cloud data being generated can be prohibitively difficult and expensive to process on a single machine. This data explosion is not limited to point cloud data: voluminous amounts of high-dimensional, quickly accumulating data, collectively known as Big Data, such as those generated by social media, Internet of Things devices, and commercial transactions, are becoming more prevalent as well. New computing paradigms and frameworks are being developed to handle the processing of Big Data efficiently, many of which utilize a compute cluster composed of several commodity-grade machines to process chunks of data in parallel.
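The chunk-parallel pattern can be illustrated on point cloud data itself: each worker computes a partial result for its chunk (here, a bounding box) and the partial results are reduced into a global answer. A sketch not tied to any particular Big Data framework (function names are illustrative):

```python
from concurrent.futures import ProcessPoolExecutor
from functools import reduce

def chunk_bounds(points):
    """Map step: bounding box (min corner, max corner) of one chunk."""
    xs, ys, zs = zip(*points)
    return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))

def merge_bounds(a, b):
    """Reduce step: combine the bounding boxes of two chunks."""
    (alo, ahi), (blo, bhi) = a, b
    return tuple(map(min, alo, blo)), tuple(map(max, ahi, bhi))

def cloud_bounds(points, n_chunks=4):
    chunks = [points[i::n_chunks] for i in range(n_chunks)]
    with ProcessPoolExecutor(max_workers=n_chunks) as ex:
        partial = list(ex.map(chunk_bounds, chunks))  # chunks in parallel
    return reduce(merge_bounds, partial)
```

Because the reduce step is associative, the same code scales from a process pool on one machine to a cluster where each node holds its own chunk.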
The amount of data collected and stored in databases is growing considerably for almost all areas of human activity. Processing this amount of data is very expensive, both humanly and computationally. This justifies the increased interest both in the automatic discovery of useful knowledge from databases and in using parallel processing for this task. Multi-Relational Data Mining (MRDM) techniques, such as Inductive Logic Programming (ILP), can learn rules from relational databases consisting of multiple tables. However, current ILP systems are designed to run in main memory and can have long running times. We propose a pipelined data-parallel algorithm for ILP. The algorithm was implemented and evaluated on a commodity PC cluster with 8 processors. The results show that our algorithm yields excellent speedups, while preserving the quality of learning.
A distributed database (DDB) is a collection of multiple, logically interconnected databases distributed over a computer network. A distributed database management system (distributed DBMS) is the software system that permits the management of the distributed database and makes the distribution transparent to the users. A parallel DBMS is a DBMS implemented on a multiprocessor computer. The parallel DBMS implements the concept of horizontal partitioning  by distributing parts of a large relational table across multiple nodes to be processed in parallel. This requires a partitioned execution of the SQL operators: some basic operations, like a simple SELECT, can be executed independently on all the nodes, while more complex operations are executed through a multiple-operator pipeline. Different multiprocessor parallel system architectures , such as shared-memory, shared-disk, or shared-nothing, define possible strategies for implementing a parallel DBMS, each with its own advantages and drawbacks. The shared-nothing approach distributes data across independent nodes and has been implemented by many commercial systems, as it provides extensibility and availability.
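Partitioned execution can be simulated on a small scale with one SQLite database per "node" (the `orders` table and the modulo partitioning below are illustrative): a simple SELECT runs independently on every partition and the coordinator concatenates the results, while an aggregate is computed as partial aggregates combined at the coordinator.

```python
import sqlite3

# Simulate a shared-nothing cluster: each in-memory SQLite database
# plays the role of one node holding a horizontal partition of `orders`.
nodes = [sqlite3.connect(":memory:") for _ in range(3)]
rows = [(i, i * 10.0) for i in range(9)]            # (order_id, amount)
for n, conn in enumerate(nodes):
    conn.execute("CREATE TABLE orders (order_id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)",
                     [r for r in rows if r[0] % 3 == n])  # hash partition

# A simple SELECT runs independently on all nodes; the coordinator
# merely concatenates the partial result sets.
big = [r for c in nodes
       for r in c.execute("SELECT order_id FROM orders WHERE amount > 50")]

# An aggregate runs as partial aggregates plus a final combine step.
total = sum(c.execute("SELECT SUM(amount) FROM orders").fetchone()[0]
            for c in nodes)
```

The same split into independent per-partition work plus a small combine step is what a real shared-nothing DBMS applies to full multiple-operator pipelines.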
In the context of a software platform that performs complex workflows to analyze SAF-T files (Standard Audit File for Tax Purposes) in batch mode, the need arises to impose complex restrictions on the sequencing and concurrency of each task. The purpose of this work is to identify relevant restrictions that may need to be imposed on workflows, as well as to distribute and monitor their execution among any number of "slave" machines, which perform the actual computational work of each task of the workflow. The final solution should improve both flexibility in workflow orchestration and performance when running multiple workflows in parallel. Besides analyzing the existing system and eliciting its requirements, a survey of existing solutions and technologies is made in order to architect the final solution. Although this work aims to improve the existing system from which it arose, it should be developed in an agnostic manner, so as to be integrated with any system that requires the handling of complex computational workflows.
Above we defined the concept of a parallel kernel. By composing multiple kernels we arrive at an ‘abstract algorithm’: a description of data dependencies between parallel processes, but without regard for architectural details. This corresponds to the much-studied dataflow model, where a task can start (‘fire’) as soon as all of its inputs are available, which is when the earlier tasks producing them have finished; see for instance .
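A minimal sequential scheduler for this firing rule can be sketched as follows; in a real dataflow runtime the ready tasks would execute concurrently, and the task names here are illustrative:

```python
from collections import deque

def dataflow_run(tasks, deps):
    """Fire each task as soon as all of its inputs are available.
    `tasks` maps name -> callable; `deps` maps name -> list of inputs."""
    pending = {t: set(deps.get(t, [])) for t in tasks}
    ready = deque(t for t, d in pending.items() if not d)
    order = []
    while ready:
        t = ready.popleft()
        tasks[t]()                        # run the kernel (could be parallel)
        order.append(t)
        for u, d in pending.items():      # completing t may enable successors
            if t in d:
                d.remove(t)
                if not d:                 # all inputs of u are now available
                    ready.append(u)
    return order
```

For instance, with kernels `a` and `b` feeding a kernel `c`, the scheduler may fire `a` and `b` in either order (or, in a parallel runtime, simultaneously), but `c` fires only after both.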
We used the same linked list implementation, configurations, and mixed workload of the previous experiments to show how reconfiguration can boost performance. Figure 4 presents the throughput for sequential (0P3S), parallel (3P0S) and reconfigurable (REC) executions. The hybrid systems (1P2S and 2P1S) presented behavior similar to the parallel execution, since their performance was determined by the parallel replicas (see the previous discussion of Figure 3). In the reconfigurable execution, the system started with only 1 active thread and used the policy of Algorithm 7 to activate/deactivate threads, ranging from 1 to 10 active threads. Figure 2 (mixed columns) shows the average throughput and latency perceived by clients. Reconfiguration improves system performance and, at the same time, saves resources, since threads are deactivated when they are unnecessary and their presence may negatively impact performance. For example, at time 60 seconds, when clients start to invoke only dependent requests, performance drops to approximately 0.8 Kops/sec and the system starts to deactivate threads. After approximately 10 seconds, it remains with only one active thread and the throughput becomes similar to that of a sequential execution (approximately 1.8 Kops/sec). Notice that the policy defines the time to react after a workload change.
(1000 items); however, the efficiency of our algorithm relative to Quicksort is about 64.56% for large input sizes and does not change when the input size increases from n to 10n (10,000-100,000 items). This accords with our goal: to propose a simple algorithm that works in parallel and has good efficiency for all input sizes.
The decision by the SCA is a timely one. It has brought certainty to a situation where there were widely diverging views on the parallel application of the DFA and the Ordinances. It somehow seems inconceivable that a purpose so clearly spelt out in the DFA itself, and commented upon by the courts and academic writers, could be overlooked so that non-land-reform "land development areas" could be established. In this context, development tribunals could be seen to be exercising their powers for an improper purpose.
Complex networks are used in a wide range of artificial and natural systems. The detection of small patterns in these networks leads to a better understanding of their structure and functionality. This operation is called subgraph search and has been applied to networks in many fields. However, it is a computationally hard problem, and because of that its application is limited by the size of the pattern being searched and the size of the network. To decrease those limitations, this work develops a parallel MapReduce strategy that speeds up subgraph census in complex networks. Moreover, a plugin for performing subgraph search in a friendly way was built inside the Cytoscape software. This chapter summarizes the main contributions made and concludes with directions for future work.
Indeed, in the traditional approach, because the test results depend on the human user, the possibility of an invalid conclusion is very high; in other words, the accuracy of a conclusion drawn from the obtained data depends on the skill and experience of the examiner. Thus, by automating the processing of NDT signals and using artificial intelligence techniques, it is possible to advance the optimization of nondestructive inspection methods, namely by improving overall system performance in terms of reliability and system implementation costs. In this regard, owing to the random, non-linear and non-stationary properties of NDT signals, AI methods and statistical signal processing techniques have been able to play an effective role in solving various NDT problems. However, in the NDE domain, less attention has been paid to the statistical processing of NDT signals than in other engineering areas. Therefore, given the properties of these signals, it is desirable to focus more on this branch in the future.
We argue that one critical aspect is the neighbourhood of a configuration, the set of possible moves, which defines the transitions between configurations; in other words, the neighbourhood graph of the problem. If a problem has a dense neighbourhood, each of these moves can be explored in parallel. Thus, when a promising configuration (with lower cost) is propagated and several moves are possible, they can be explored in parallel, and the probability that one of these moves will lead to a faster path towards an optimal solution increases. On the contrary, if only one move is possible, there is little benefit in using a configuration propagated by another search thread, and here the algorithm has to rely more on the stochastic behaviour of Adaptive Search to achieve diversification. Another important aspect is the number of local minima and resets, and how the two relate. A problem that finds a large number of local minima (a skewed landscape) before encountering an optimal solution benefits less from continuing with a configuration that seems promising: the configuration is heuristically promising, but in reality this information is less meaningful than it should be. Similarly, a problem with a high number of partial resets suffers from the same issue. To improve on such problems, the configuration should be used differently, as a guide from which other kinds of information can be computed. There are some aspects that we have not analysed but which are also important for benefiting from parallelism: the solutions of a problem (if known and available) ought to be analysed, since their number, density and distribution are important characteristics with a significant impact on the success of a walk.
A problem with solutions uniformly spread over the search space may only require an independent multiple-walk (as we have seen in the case of the CAP), whereas a problem in which solutions are clustered in some part of the space may benefit from more complex mechanisms of cooperation, which tend to intensify the search.
For constraint-based problems such as (large instances of) Magic Square or All-Interval, independent multiple-walk parallelization does not yield linear speedups, reaching for instance a speedup factor of "only" 50-70 for 256 cores. However, on the Costas Array Problem, the speedup can be linear, even up to 8000 cores . On a more theoretical level, it can be shown that the parallel behavior depends on the sequential runtime distribution of the problem: for problems admitting an exponential distribution, the speedup can be linear, while if the runtime distribution is shifted-exponential or (shifted) lognormal, then there is a bound on the speedup (which is the asymptotic limit as the number of cores goes to infinity); see  for a detailed analysis of these phenomena.
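The exponential case can be checked numerically: with an exponentially distributed sequential runtime, the runtime of an independent multiple-walk on n cores is the minimum of n independent draws, which is again exponential with the mean divided by n, hence a linear speedup. A quick Monte-Carlo sketch (the parameters are illustrative):

```python
import random
import statistics

def expected_speedup(n_cores, mean_runtime=100.0, trials=20000, seed=1):
    """Monte-Carlo estimate of independent multiple-walk speedup when the
    sequential runtime is exponential: the parallel runtime is the minimum
    over n_cores independent runs."""
    rng = random.Random(seed)
    par = [min(rng.expovariate(1.0 / mean_runtime) for _ in range(n_cores))
           for _ in range(trials)]
    return mean_runtime / statistics.mean(par)

# min of n i.i.d. Exp(1/mu) draws is Exp(n/mu), so the estimate is close
# to n: the speedup grows linearly with the number of cores.
```

For a shifted-exponential runtime the shift is incompressible, so the same estimator would level off instead of growing linearly, which is the bounded-speedup regime described above.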