Information and communication technology has the capability to improve the process by which governments involve citizens in formulating public policy and public projects. Even though many government regulations may now be in digital form (and are often available online), their complexity and diversity make identifying the ones relevant to a particular context a non-trivial task. Similarly, with the advent of numerous electronic online forums, social networking sites and blogs, the opportunity to gather citizens' petitions and stakeholders' views on government policy and proposals has increased greatly, but the volume and complexity of the unstructured data make its analysis difficult. On the other hand, text mining has come a long way from simple keyword search and has matured into a discipline capable of dealing with much more complex tasks. In this paper we discuss how text-mining techniques can help in retrieving information and relationships from textual data sources, thereby assisting policy makers in discovering associations between policies and citizens' opinions expressed in electronic public forums, blogs, etc. We also present an integrated text-mining-based architecture for e-governance decision support, along with a discussion of the Indian scenario.
I would also like to thank my colleagues at the company I work for, for their enormous support: Nicolau Romão and Joaquim Calhau, for making my studies financially possible and for giving me the time and corporate knowledge needed to better align the research with corporate needs; Rui Costa, for supporting the case study and for believing that doing things right means researching, studying and implementing a real solution aligned with best practices, frameworks and bodies of knowledge; Telmo Henriques and Nuno Perry, for their wisdom on research and for their enormous support when time, patience and other priorities almost made this research a second priority; Tito Torres, for his knowledge of IT-governance best practices and for believing that this is a truly important topic that deserved more research; Carlos Gouveia, for his belief in the project and for his support in making meetings happen with all parties involved in the project. I would also like to thank a good friend of mine, who shares the same passion for data-driven decision support systems, for his contribution in validating the model even when his time balance was already negative.
The problem is that a classic data table built from textual documents contains real-valued attributes, so some fuzzification of the classic crisp method is needed. One approach to one-sided fuzzification was presented in . A concept lattice created from real-valued (fuzzy) attributes is called a one-sided fuzzy concept lattice. The proposed algorithm for FCA discovery is computationally very expensive and produces a huge number of concepts (if we use the definition). One approach to solving this issue is based on the problem decomposition method (as described in ). This paper describes a simple approach to the creation of a simple hierarchy of concept lattices. The starting set of documents is decomposed into smaller sets of similar documents using a clustering algorithm. Then particular concept lattices are built upon every cluster using the FCA method, and these FCA-based models are combined into a simple hierarchy of concept lattices using an agglomerative clustering algorithm. For our experiments we used the GHSOM algorithm to find appropriate clusters; then the 'Upper Neighbors' FCA algorithm (as defined in ) was used to build the particular concept lattices. Finally, the particular FCA models were labelled with some characteristic terms, and a simple agglomerative algorithm was used to cluster the local models, with a metric based on these characteristic lattice terms. This approach is easy to implement in a distributed manner, where the computation of local models can be distributed between nodes and then combined, reducing the time needed compared with building the concept model on one computer (sequential run). Next, we will briefly describe the idea of the one-sided fuzzy concept lattice, the process of data pre-clustering, concept lattice creation and the algorithm for combining concept lattices (introduced in ).
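As a minimal illustration of the FCA step, the sketch below enumerates all formal concepts of a toy crisp context by brute force. The GHSOM pre-clustering and the 'Upper Neighbors' algorithm used in the paper are replaced here by naive enumeration, and the documents and terms are hypothetical:

```python
from itertools import combinations

# Toy binary context: documents x terms (crisp, i.e. after one-sided
# fuzzification of real-valued attributes with a fixed threshold).
context = {
    "d1": {"fuzzy", "lattice"},
    "d2": {"fuzzy", "cluster"},
    "d3": {"lattice", "cluster"},
}
terms = sorted(set().union(*context.values()))

def extent(attrs):
    """Documents that possess every attribute in attrs."""
    return {d for d, t in context.items() if attrs <= t}

def intent(docs):
    """Attributes shared by every document in docs."""
    return set(terms) if not docs else set.intersection(*(context[d] for d in docs))

# A formal concept is a pair (A, B) with extent(B) == A and intent(A) == B.
# Naively close every attribute combination; duplicates collapse in the set.
concepts = set()
for r in range(len(terms) + 1):
    for combo in combinations(terms, r):
        A = extent(set(combo))
        B = intent(A)
        concepts.add((frozenset(A), frozenset(B)))

for A, B in sorted(concepts, key=lambda c: -len(c[0])):
    print(sorted(A), "<->", sorted(B))
```

Building one such lattice per document cluster, and then clustering the local lattices by their characteristic terms, yields the hierarchy described above.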
In Figure 10 it is important to highlight points 1 and 2, in red. The first point illustrates the setting of the software process model file name, defined when the process modeler saves the model. In the second section, an interaction between the configUpdateComponentAction method (implemented in the action controller) and the integration façade of AnnotationBasedIntegration is highlighted. In particular, the second section of Figure 10 illustrates the setting of basic integration parameter values. It is worth noting the last line, where an array of String containing the #ENTITY_NAME# and #FOCUS_ENTITY_TYPE# parameters is passed to GenericIntegrationFacade. The first refers to the name of the entity (a phase in the example), while the second is the type of entity that is in focus. It is noteworthy that, at last, there is a reference to a concept of the domain ontology that was produced in the activity named evolution of the ontology base (not detailed in this paper). In the example, since it was envisioned that integration could be done with any entity modeled in Spider PM, it was considered valid to define a parameter that represents the type of entity modeled (thus allowing a single parameterized assistance to be defined). However, this decision creates stronger coupling between the Spider PM Integration implementation and the assistance engine, because the tool developer team needs to understand some details of the specific ontologies used by the assistance engine. Another way to implement the same behavior would have been to define many assistances (one for each entity type).
Classification is an important problem for the machine learning and data mining research communities. The basic idea of a classification algorithm is to construct a classifier from a given training set. Once the classifier is constructed, it can predict the class value(s) of unknown test data samples. Suppose we work for a web site that maintains a public listing of job openings from many different companies. A user of the web site might find new career opportunities by browsing all openings in a specific job category. However, these job postings are spread across the Web and do not come with any category label. Instead of reading each job post to manually determine the label, it would be helpful to have a system that automatically examines the text and makes the decision itself. This automatic process is called text classification. Text classification systems categorize documents into one (or several) of a set of pre-defined topics of interest.
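The idea can be sketched with a minimal multinomial Naive Bayes text classifier built from scratch; the job postings, category labels and words below are invented for illustration:

```python
from collections import Counter, defaultdict
import math

# Hypothetical training set of labelled job postings.
train = [
    ("python developer backend api", "engineering"),
    ("java engineer distributed systems", "engineering"),
    ("sales manager customer accounts", "sales"),
    ("account executive sales quota", "sales"),
]

# Count word frequencies per label and collect the vocabulary.
word_counts = defaultdict(Counter)
label_counts = Counter()
vocab = set()
for text, label in train:
    words = text.split()
    word_counts[label].update(words)
    label_counts[label] += 1
    vocab.update(words)

def classify(text):
    """Pick the label maximizing log P(label) + sum log P(word | label)."""
    scores = {}
    for label in label_counts:
        score = math.log(label_counts[label] / len(train))  # log prior
        total = sum(word_counts[label].values())
        for w in text.split():
            # Add-one (Laplace) smoothing handles unseen words.
            score += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

print(classify("senior python engineer"))
```

An unlabelled posting is thus assigned to the pre-defined topic whose training vocabulary it most resembles.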
These programs are very different in terms of their runtime environment, file format acceptance, thresholding procedures, pixel/voxel/object recognition, subject of analysis, volumetric quantification, co-localization analysis, determination of structural parameters, and automation. ImageJ (http://rsb.info.nih.gov/ij/) is a free, open-source, and extensive scientific image processing software package, developed by Wayne Rasband in 1987 and designed to handle various types of imaging data. Over the years it has undergone several changes while keeping its main ideas: (i) it is a biological image software that runs on any operating system, with the help of the Java Runtime Environment; (ii) it has a simple, user-friendly interface with a single toolbar (see Figure 6); and (iii) it is extensible via user-designed macros and plugins. Rasband chose a flexible approach to his software that allows users to add functionality on their own, but in a manner that allows sharing with others through macros and plug-ins. Macros are simple custom programming scripts that automate a task inside a larger piece of software. The user does not need any programming skills to create a macro: in fact, with the help of a "macro recorder", it is possible to record any manually performed action, and thereby create a workflow that one can use repeatedly and share with others. Over 325 macros are currently available at http://rsbweb.nih.gov/ij/macros/. There are also over 500 plug-ins developed by users and available at http://rsbweb.nih.gov/ij/plugins/index.html. Since these plug-ins were developed to solve specific problems, one can expect this database to grow continuously. The plug-ins are designed to, for instance, count particles or enable the input of more specific instrument file formats (the Bio-Formats plug-in).
The decision-tree algorithm is one of the most effective classification methods. The data are used to judge the efficiency and accuracy of the algorithm. We used 10-fold cross-validation to compute the confusion matrix of each model and then evaluated performance using precision, recall, F-measure and ROC space. As expected, bagging algorithms, especially CART, showed the best performance among the tested methods. The results shown here make clinical application more accessible, which may provide a great advance in treating CAD, hepatitis and diabetes. The survey examines the decision-tree algorithms ID3, C4.5 and CART with respect to their data-processing steps and runtime complexity. It can be concluded that, among the three algorithms, CART performs best in terms of the rules generated and accuracy. This shows that the CART algorithm is better at induction and rule generalization than the ID3 and C4.5 algorithms. Finally, the results are stored in the decision support repository. Since the knowledge base is currently focused on a narrow set of diseases, and the approach has been validated through the case study, it is possible to expand the scope of the modeled medical knowledge. Furthermore, in order to improve decision support, interactions between the different medications that the patient is on should be considered.
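As a reminder of how the reported metrics are derived, the sketch below computes precision, recall, F-measure and the ROC-space coordinates from a binary confusion matrix; the counts are made up for illustration, not taken from the study:

```python
# Confusion-matrix cells pooled over the 10 cross-validation folds
# (hypothetical numbers).
tp, fp, fn, tn = 40, 10, 5, 45

precision = tp / (tp + fp)   # 40 / 50 = 0.8
recall    = tp / (tp + fn)   # 40 / 45
f_measure = 2 * precision * recall / (precision + recall)

# Coordinates of the classifier in ROC space.
tpr = recall            # true-positive rate
fpr = fp / (fp + tn)    # false-positive rate = 10 / 55

print(round(precision, 3), round(recall, 3), round(f_measure, 3), round(fpr, 3))
```

A model plotted nearer the top-left corner of ROC space (high TPR, low FPR) dominates the others, which is how the bagged CART models stand out in such comparisons.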
The nature of SVP meant that numerous purchases could be made from one supplier, either within a single year or across different years. The next step is to investigate suppliers from whom purchases were made every year. These suppliers, the frequency of transactions with them, the amounts spent and the types of goods and services are summarized in Table 5. It is observed that most of the goods and services identified in Approach 2 were also present in Approach 1, except for two main ones, "Standby Technician Support for Video Conferencing System" and "Rental of Cherry Picker". This is probably because each appeared as a single transaction only in most of the years; hence their term occurrence in the text-mining analysis was low, i.e. they did not feature significantly in the generated Word List or Centroid Cluster Model. However, these observations were noteworthy as they were repeated purchases of relatively significant value in the context of SVP (averaging between S$2,500 and S$2,800) in most years.
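The term-occurrence effect described above can be sketched as a simple word-frequency list; the purchase descriptions below are hypothetical stand-ins for the SVP data:

```python
from collections import Counter

# Invented purchase descriptions: items that recur yearly but only once
# per year accumulate low term counts and fall out of the Word List.
descriptions = [
    "annual maintenance of video conferencing system",
    "standby technician support for video conferencing system",
    "annual maintenance of video conferencing system",
    "rental of cherry picker",
]
stopwords = {"of", "for", "the"}

# Word List: term frequencies over all descriptions, minus stopwords.
word_list = Counter(
    w for d in descriptions for w in d.split() if w not in stopwords
)
print(word_list.most_common(3))
```

Here "picker" occurs once against three occurrences of "video", so a frequency-driven analysis surfaces the conferencing-system purchases while the cherry-picker rental stays below the radar despite recurring.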
The company should first of all monitor customer reviews, particularly on independent servers (review sites, ranking sites) such as TripAdvisor, Yelp or CNET. These servers are primarily intended for sharing customer experiences and are an increasingly powerful source of information for future purchases (or for maintaining the loyalty of existing customers). Considering the classic 5-step marketing-oriented decision-making process describing buying behavior – (a) problem recognition, (b) information search, (c) evaluation of alternatives, (d) purchase decision and (e) post-purchase behavior – the huge role of ranking sites can be seen at all steps except the first. If a new customer encounters a negative rating in phase (b) or (c), the company loses them; if the customer shares a negative feeling about the product/service after the purchase –
Explanatory mining in medical data has been extensively studied in the past decade employing various learning techniques. Bojarczuk et al. applied a genetic programming method with constrained syntax to discover classification rules from medical data sets. Thongkam et al. studied breast cancer survivability using the AdaBoost algorithm. Ghazavi and Liao proposed the idea of fuzzy modeling on selected features of medical data. Huang et al. introduced a system that applies mining techniques to discover rules from health examination data; they then employed case-based reasoning to support chronic disease diagnosis and treatment. The recent work of Zhuang et al. also combined mining with case-based reasoning, but applied a different mining method: they performed data clustering based on self-organizing maps in order to facilitate decision support in solving new cases of the pathology test ordering problem. Biomedical discovery support systems have recently been proposed by a number of researchers [5, 6, 10, 29, 30]. Some work [20, 25] extended medical databases to the level of data warehouses.
Big Data volume (data magnitude) definitions are relative, depending on factors such as time and data type. What is considered Big Data today may not be in the future, as data storage capacities increase and allow the collection of larger datasets. Variety, another Big Data property, refers to the heterogeneous structure of a dataset. Technological advances allow companies to use various types of data, whether structured (e.g., relational tables), semi-structured (e.g., XML) or unstructured (e.g., text, image, audio, and video). Velocity refers to the rate at which data are generated and the speed at which they should be analyzed and put into practice. Recently, the number of V's has increased with the need to better frame Big Data. Accordingly, Seddon and Currie (2017) developed a model with four additional V's: variability, veracity, visualization, and value. The abovementioned characteristics leveraged existing approaches (e.g., predictive analytics) or gave rise to novel ones (e.g., NoSQL databases) for handling data. Thus, Big Data is an umbrella term covering all of them under a new philosophy devoted to dealing with this phenomenon (Chen et al., 2012).
Thus, SA, or Opinion Mining, is the computational study of opinions, sentiments and emotions expressed in text. Textual information can be classified into two main types: facts and opinions. Facts are objective expressions about entities, events and their properties. Opinions are generally expressions that describe people's feelings and evaluations regarding certain entities, events and their respective properties (Liu, 2010). Although much of the literature presents SA as the computational study of sentiments, it can be used in many other cases, as this project shows. It is important to note that SA is a classification problem and, as such, can be used to classify textual information according to its polarity, regardless of whether the sentence denotes any sentiment. The sentence "Taxa Euribor mantém forte queda" ("Euribor rate maintains its sharp fall") merely describes a fact, yet it could be classified as positive or negative for the economy.
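A minimal lexicon-based polarity sketch (with invented word lists) illustrates how even a purely factual sentence receives a polarity label:

```python
# Hypothetical polarity lexicons; a real system would use a curated
# sentiment lexicon or a trained classifier.
positive = {"growth", "rise", "strong", "gain"}
negative = {"fall", "drop", "loss", "crisis"}

def polarity(sentence):
    """Score a sentence by counting positive minus negative cue words."""
    words = sentence.lower().split()
    score = sum(w in positive for w in words) - sum(w in negative for w in words)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(polarity("Euribor rate maintains sharp fall"))
```

The sentence states a fact rather than a feeling, yet the classifier still assigns it a polarity, which is exactly the behavior the paragraph above describes.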
In spite of the fact that important insights have been gained, some questions are still left unsolved. Whether the results found will apply in different decision settings still requires further investigation: a more or less structured decision task, a larger number of decision reviewers, different convergence processes, decisions with discussions intertwined across distinct moments in time, or different backgrounds in using GSS. What would happen if some information elements were deleted (for instance, due to a court order)? What is the threshold for deleting information while still preserving the ability to reconstruct a decision? Which type of information, if deleted, is most likely to deteriorate the ability to reconstruct a decision? Is there any threshold for determining the overall accuracy of the decision reconstruction? These are just a few of the questions and decision scenarios that need to be dealt with.
The utility of an airport benchmarking process is widely recognised in a world where competition between airports is becoming a reality. There is therefore a need for broad consensus to establish and build reliable databases for measuring airport performance and, consequently, for developing and implementing ever more accurate performance management systems. A large number of studies that focus on airport benchmarking can be found in the literature, but they are mainly based on economic and productivity performance indicators. There is a lack of studies that address airport performance holistically, across different areas, for a truly global analysis. A Multi-Criteria Decision Analysis (MCDA) approach applied to the Safety key performance area of the PESA–AGB (Performance Efficiency Support Analysis – Airport Global Benchmarking) model, based on the MACBETH (Measuring Attractiveness by a Categorical Based Evaluation Technique) methodology, is used to evaluate its impact on the overall performance of three airports under two distinct processes, peer and self-benchmarking, over eleven years. The Safety area performance analysis is done by comparing scores among different airports (peer benchmarking) and by assessing the scores of each airport over several years (self-benchmarking). This proves to be a useful and flexible tool for stakeholders. The results evidence the importance of this type of evaluation for understanding how airports deal with Safety issues and how this key performance area may impact any benchmarking process, as well as the overall evaluation of such a complex transport infrastructure.
A search engine is an information retrieval system that helps find information contained in documents stored on a computer system. The results provided by this kind of system are usually in the form of a list. Search engines basically work on the concept of text mining. Text mining is a variation of a field called data mining and refers to the process of deriving high-quality information from unstructured text. In this paper we depict an intelligent agent-based search engine tool which takes input from the user in the form of a keyword and, based on that keyword, finds the matching documents and shows them to the user (in the form of links). This tool uses a new Ranking Algorithm to rank the documents.
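Since the paper's Ranking Algorithm is not detailed here, plain term frequency stands in for it in the sketch below; the document names and contents are hypothetical:

```python
# Toy document collection: file name -> text content.
documents = {
    "doc1.html": "data mining and text mining methods",
    "doc2.html": "mining of unstructured text text text",
    "doc3.html": "search engines and agents",
}

def rank(keyword):
    """Return matching documents as an ordered list of links, best first.

    Score = raw frequency of the keyword in the document; a real ranking
    algorithm would weigh terms more carefully (e.g. TF-IDF, link analysis).
    """
    scores = {name: text.split().count(keyword) for name, text in documents.items()}
    return [name for name, s in sorted(scores.items(), key=lambda kv: -kv[1]) if s > 0]

print(rank("text"))
```

The returned list mirrors the tool's behavior: documents that match the keyword are shown to the user as links, ordered by rank.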
The first official initiative of the European Commission for promoting public participation was the 5th Framework Programme – Raising Public Awareness of Science and Technology, in the late 1990s. In 2001, there was also the White Paper on European Governance with a large section on science and citizenship, highlighting the importance of participation in governance. The same year, following the Lisbon Agenda Declaration, the European Commission officially started the Science and Society Action Plan “[…] aimed at developing stronger relations between science and the broader society by supporting measures such as public consultation, public debates, and public involvement concerning the development and assessment of science and specific technologies” (Felt and Wynne, 2007, p. 56).
For instance, Koh and Low constructed a decision tree using a data sample of 165 organizations. In order to detect fraud, the following six financial variables were examined: quick assets to current liabilities, market value of equity to total assets, total liabilities to total assets, interest payments to earnings before interest and tax, net income to total assets, and retained earnings to total assets. Cecchini in 2005 examined quantitative variables along with textual information for the detection of fraud. These quantitative variables were mapped to a higher dimension that takes into account ratios and year-over-year changes.
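The six ratios listed above can be assembled into a feature vector as follows; the balance-sheet figures and field names are invented for illustration, not taken from Koh and Low's sample:

```python
# Hypothetical financial figures for one organization.
firm = {
    "quick_assets": 120.0, "current_liabilities": 100.0,
    "market_value_equity": 300.0, "total_assets": 500.0,
    "total_liabilities": 250.0, "interest_payments": 10.0,
    "ebit": 50.0, "net_income": 30.0, "retained_earnings": 80.0,
}

# The six financial variables used as decision-tree inputs.
features = [
    firm["quick_assets"] / firm["current_liabilities"],    # quick ratio
    firm["market_value_equity"] / firm["total_assets"],
    firm["total_liabilities"] / firm["total_assets"],
    firm["interest_payments"] / firm["ebit"],
    firm["net_income"] / firm["total_assets"],             # return on assets
    firm["retained_earnings"] / firm["total_assets"],
]
print([round(f, 2) for f in features])
```

Each organization in the training sample is reduced to such a six-element vector, which the decision-tree learner then splits on to separate fraudulent from non-fraudulent cases.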