• Nenhum resultado encontrado

CSTM: a conceptual spatiotemporal model for data warehouses

N/A
N/A
Protected

Academic year: 2021

Share "CSTM: a conceptual spatiotemporal model for data warehouses"

Copied!
130
0
0

Texto

(1)Pós-Graduação em Ciência da Computação. “CSTM: A CONCEPTUAL SPATIOTEMPORAL MODEL FOR DATA WAREHOUSES” Por. CARLA VERÓNICA RUIZ CASTRO Dissertação de Mestrado. Universidade Federal de Pernambuco [email protected] www.cin.ufpe.br/~posgraduacao. RECIFE, MARÇO/2010.

(2) Universidade Federal de Pernambuco CENTRO DE INFORMÁTICA PÓS-GRADUAÇÃO EM CIÊNCIA DA COMPUTAÇÃO. Carla Verónica Ruiz Castro. “CSTM: A Conceptual Spatiotemporal Model for Data Warehouses". Este trabalho foi apresentado à Pós-Graduação em Ciência da Computação do Centro de Informática da Universidade Federal de Pernambuco como requisito parcial para obtenção do grau de Doutor em Ciência da Computação.. ORIENTADOR(A): Profa. Valéria Cesário Times. RECIFE, MARÇO/2010.

(3) Castro, Carla Verónica Ruiz CSTM: a conceptual spatiotemporal model for data warehouses / Carla Verónica Ruiz Castro. - Recife: O Autor, 2010. vii, 117 folhas : il., fig., tab. Dissertação (mestrado) – Universidade Pernambuco. CIn. Ciência da computação, 2010.. Federal. de. Inclui bibliografia e apêndice. 1. Banco de dados. 2. Data Warehouse espaço temporal. 3. Modelagem conceitual. I. Título. 005.74. CDD (22. ed.). MEI2010 – 038.

(4)

(5) Abstract Extensive research has been conducted regarding temporal and spatial data warehouse technologies. The former allows handling time-varying data in dimensions while the latter permits storing and querying spatial data in both fact and dimension tables. A wide variety of applications need to capture not only spatial but also time-varying aspects of the entities they model. However, enough attention has not been given to studies that bring these two research areas together. It is in this context that the present master’s dissertation is defined. It proposes a conceptual multidimensional model for spatiotemporal data warehouses. This model allows users to define levels, hierarchies and dimensions with both spatial and temporal characteristics. As a consequence of this, it is possible to represent time-varying spatial attributes. Moreover, this work defines a set of spatiotemporal operators that could be useful to query spatiotemporal data warehouses. Differently from existing proposals, our operators integrate not only multidimensional and spatial operators but also spatial and temporal ones (i.e., spatiotemporal) in a unique syntax. A taxonomic scheme, which classifies the proposed operators, is also defined. The importance of the proposed taxonomy is that it helps to guide the development of spatiotemporal OLAP technology. In order to automate the modeling of spatiotemporal schemas, a CASE tool was developed. Besides allowing the definition of schemas in conformity with the proposed conceptual model, this tool also permits the automatic generation of the corresponding logical schema using an object relational approach. The proposed ideas are validated with a case study on the meteorological field. It presents a practical application of the spatiotemporal conceptual multidimensional model and spatiotemporal operators introduced in the present work.. Key words: Spatiotemporal Data Warehouse, Conceptual Modeling, Data Warehouse Design, Spatiotemporal Operators, Taxonomy..

(6) Resumo Estudos abrangentes relacionados a data warehouse temporais e espaciais tˆem sido conduzidos. Data warehouse temporais permitem lidar com dados vari´aveis no tempo tanto em tabelas de fatos quanto em tabelas de dimens˜oes. Uma ampla variedade de aplica¸co˜es precisa capturar n˜ao s´o caracter´ısticas espaciais, mas tamb´em temporais das entidades modeladas. Entretanto, estudos que unam essas duas ´areas de pesquisa n˜ao ´ neste contexto que o presente trabalho de tˆem sido suficientemente considerados. E disserta¸ca˜o est´a definido. Ele prop˜oe um modelo conceitual para data warehouses espa¸co temporais. Este modelo permite aos usu´arios definir n´ıveis, hierarquias e dimens˜oes tanto com caracter´ısticas espaciais como temporais. Como consequˆencia disso, ´e poss´ıvel representar atributos espaciais vari´aveis no tempo. Al´em disso, este trabalho define um conjunto de operadores espa¸co temporais que poderia ser u ´til na consulta de data warehouses espa¸co temporais. Diferentemente de propostas existentes, nossos operadores integram n˜ao s´o operadores multidimensionais e espaciais, mas tamb´em espaciais e temporais (i.e., espa¸co temporais) em uma u ´nica sintaxe. Um esquema taxonˆomico, o qual classifica os operadores propostos, tamb´em ´e definido. A importˆancia da taxonomia proposta ´e que ajuda no desenvolvimento de tecnologia OLAP espa¸co temporal. Com o objetivo de automatizar a modelagem de esquemas espa¸co temporais, uma ferramenta CASE foi desenvolvida. Al´em de permitir a defini¸ca˜o de esquemas conformes com o modelo conceitual proposto, esta ferramenta tamb´em permite a gera¸ca˜o autom´atica do esquema l´ogico correspondente usando uma abordagem objeto relacional. As id´eias propostas s˜ao validadas com um estudo de caso na ´area meteorol´ogica.. O estudo. apresenta uma aplica¸ca˜o pr´atica do modelo conceitual espa¸co temporal e dos operadores espa¸co temporais apresentados neste trabalho.. Palavras chave: Data Warehouse Espa¸co Temporal, Modelagem Conceitual, Data Warehouse Design, Operadores Espa¸co Temporais, Taxonomia..

(7) CONTENTS. Contents Chapter 1 - Introduction 1.1. 1. Contextualization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 1. 1.1.1. Data Warehouse. 1. 1.1.2. Temporal Data Warehouse. . . . . . . . . . . . . . . . . . . . .. 2. 1.1.3. Spatial Data Warehouse . . . . . . . . . . . . . . . . . . . . . .. 2. 1.2. Motivation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 3. 1.3. Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 4. 1.4. Dissertation Organization. 5. . . . . . . . . . . . . . . . . . . . . . . . . .. Chapter 2 - Related Work 2.1. Introduction. 2.2. Temporal Data Warehouses. 2.3. 7. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 7. . . . . . . . . . . . . . . . . . . . . . . . .. 7. 2.2.1. Temporal Concepts . . . . . . . . . . . . . . . . . . . . . . . . .. 8. 2.2.2. Slowly Changing Dimensions. . . . . . . . . . . . . . . . . . . .. 9. 2.2.3. Schema Evolution. . . . . . . . . . . . . . . . . . . . . . . . . .. 11. 2.2.4. Schema Versioning . . . . . . . . . . . . . . . . . . . . . . . . .. 12. 2.2.5. MultiDimER Temporal Extension. . . . . . . . . . . . . . . . .. 15. . . . . . . . . . . . . . . . . . . . . . . . . .. 18. Spatial Data Warehouses 2.3.1. Spatial DW and SOLAP Concepts. . . . . . . . . . . . . . . . .. 18. 2.3.2. Geographic Data Warehouse Metamodel . . . . . . . . . . . . .. 19. 2.3.3. MultiDimER Spatial Extension . . . . . . . . . . . . . . . . . .. 20. 2.3.4. Existing SOLAP Operations . . . . . . . . . . . . . . . . . . . .. 24. 2.4. A Comparative Analysis of TDW and SDW Proposals. . . . . . . . . .. 26. 2.5. Conclusion. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 29. i.

(8) CONTENTS. Chapter 3 - The CSTM Conceptual Model and CASE Tool. 30. 3.1. Introduction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 30. 3.2. The Proposed Conceptual Data Model . . . . . . . . . . . . . . . . . .. 30. 3.3. 3.4. 3.2.1. Spatiotemporal Concepts. . . . . . . . . . . . . . . . . . . . . .. 31. 3.2.2. The CSTM Metamodel. . . . . . . . . . . . . . . . . . . . . . .. 35. 3.2.3. A Comparative Analysis between Related Work and Our Proposal 36. Case Tool Implementation . . . . . . . . . . . . . . . . . . . . . . . . .. 38. 3.3.1. Graphical Editor Module. . . . . . . . . . . . . . . . . . . . . .. 39. 3.3.2. Schema Storage Module . . . . . . . . . . . . . . . . . . . . . .. 48. 3.3.3. Automatic Script Generator Module. . . . . . . . . . . . . . . .. 49. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 57. Conclusion. Chapter 4 - Taxonomy and Set of ST OLAP Operators. 58. 4.1. Introduction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 58. 4.2. A Taxonomy of Spatiotemporal OLAP Operators . . . . . . . . . . . .. 58. 4.3. 4.2.1. Aggregate Functions on Facts . . . . . . . . . . . . . . . . . . .. 61. 4.2.2. Operations on Facts and Members. . . . . . . . . . . . . . . . .. 62. 4.2.3. Navigation Operations on Members . . . . . . . . . . . . . . . .. 64. 4.2.4. Operations on Levels . . . . . . . . . . . . . . . . . . . . . . . .. 65. 4.2.5. Operations on Hierarchies and Dimensions . . . . . . . . . . . .. 65. A Set of Spatiotemporal OLAP Operators . . . . . . . . . . . . . . . .. 67. 4.3.1. Multidimensional Domain . . . . . . . . . . . . . . . . . . . . .. 68. 4.3.2. Spatial Domain . . . . . . . . . . . . . . . . . . . . . . . . . . .. 70. 4.3.3. Spatial-Multidimensional Domain . . . . . . . . . . . . . . . . .. 73. 4.3.4. Spatiotemporal Domain. . . . . . . . . . . . . . . . . . . . . . .. 82. 4.4. Classification of the Operators in the Taxonomy . . . . . . . . . . . . .. 86. 4.5. Conclusion. 89. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. ii.

(9) CONTENTS. Chapter 5 - Case Study. 91. 5.1. Introduction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 5.2. Meteorological STDW. . . . . . . . . . . . . . . . . . . . . . . . . . . .. 91. 5.3. Examples of Queries Using Spatiotemporal Operators . . . . . . . . . .. 94. 5.4. Conclusion. 99. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. Chapter 6 - Conclusions and Future Work. 91. 100. 6.1. Final Considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100. 6.2. Main Contributions. 6.3. Future Work. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102. References. 104. Appendix A - DDL Oracle Scripts. 113. A.1. DDL Oracle Scripts for Schema in Figure 5.1. . . . . . . . . . . . . . . 113. iii.

(10) LIST OF FIGURES. List of Figures 1.1. Dissertation organization. . . . . . . . . . . . . . . . . . . . . . . . . . .. 6. 2.1. Hybrid slowly changing dimensions.. . . . . . . . . . . . . . . . . . . . .. 11. 2.2. The COMET metamodel . . . . . . . . . . . . . . . . . . . . . . . . . . .. 14. 2.3. MultiDimER notations. . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 15. 2.4. MultiDimER time varying levels. . . . . . . . . . . . . . . . . . . . . . .. 16. 2.5. MultiDimER time varying hierarchies. . . . . . . . . . . . . . . . . . . .. 17. 2.6. MultiDimER lifespan ans snapshot cardinalities. . . . . . . . . . . . . . .. 17. 2.7. Types of spatial dimensions. . . . . . . . . . . . . . . . . . . . . . . . . .. 19. 2.8. The GeoDWM metamodel. . . . . . . . . . . . . . . . . . . . . . . . . . .. 21. 2.9. MultiDimER spatial data types. . . . . . . . . . . . . . . . . . . . . . . .. 22. 2.10 MultiDimER spatial dimensions, hierarchies and levels. . . . . . . . . . .. 23. 2.11 Aggregate functions for GeoMDQL.. . . . . . . . . . . . . . . . . . . . .. 27. 3.1. The CSTM Notations. . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 32. 3.2. Example of a spatiotemporal hierarchy. . . . . . . . . . . . . . . . . . . .. 33. 3.3. Example of levels with temporal and spatial properties. . . . . . . . . . .. 34. 3.4. The CSTM metamodel. . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 37. 3.5. CSTMTool’s interface. . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 40. 3.6. File menu options. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 41. 3.7. Icon toolbar options. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 41. 3.8. Palette elements. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 42. 3.9. Keeping more than one schema open. . . . . . . . . . . . . . . . . . . . .. 43. 3.10 Adding measures to a fact relationship. . . . . . . . . . . . . . . . . . . .. 44. 3.11 Adding non-temporal attributes to a level. . . . . . . . . . . . . . . . . .. 45. 3.12 Adding temporal attributes to a level. . . . . . . . . . . . . . . . . . . . .. 46 iv.

(11) LIST OF FIGURES. 3.13 Non-spatial temporal attribute properties. . . . . . . . . . . . . . . . . .. 47. 3.14 XML structure of a saved schema. . . . . . . . . . . . . . . . . . . . . . .. 50. 3.15 Option for the generation of an Oracle 10g logical schema. . . . . . . . .. 50. 3.16 An example of a spatiotemporal one-to-many hierarchy. . . . . . . . . . .. 51. 3.17 Country table’s structure. . . . . . . . . . . . . . . . . . . . . . . . . . .. 54. 3.18 CountryProvince’s table structure (one-to-many relationship). . . . . . .. 55. 3.19 An example of a spatiotemporal many-to-many hierarchy. . . . . . . . . .. 56. 3.20 CountryProvince’s table structure (many-to-many relationship). . . . . .. 57. 4.1. Resulting operation categories. . . . . . . . . . . . . . . . . . . . . . . . .. 59. 4.2. Levels of the taxonomy of spatiotemporal OLAP operators.. . . . . . . .. 60. 4.3. DistributiveTopological and DistributiveMetric examples. . . . . . . . . .. 76. 4.4. AlgebraicMetric examples. . . . . . . . . . . . . . . . . . . . . . . . . . .. 78. 4.5. HolisticMetric example. . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 80. 4.6. Spatial-multidimensional navigation operators examples. . . . . . . . . .. 82. 4.7. STMetric example. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 85. 4.8. STTopological example. . . . . . . . . . . . . . . . . . . . . . . . . . . .. 85. 4.9. STIntersection example. . . . . . . . . . . . . . . . . . . . . . . . . . . .. 86. 5.1. A meteorological spatiotemporal schema. . . . . . . . . . . . . . . . . . .. 93. v.

(12) LIST OF TABLES. List of Tables 2.1. Aggregate functions for the GIS domain.. . . . . . . . . . . . . . . . . .. 26. 2.2. Comparative analysis of existing TDW and SDW proposals. . . . . . . .. 27. 3.1. Comparative analysis of existing TDW proposals and CSTM. . . . . . . .. 38. 3.2. CSTM elements’ properties. . . . . . . . . . . . . . . . . . . . . . . . . .. 48. 3.3. Temporal support for the modeled elements. . . . . . . . . . . . . . . . .. 51. 4.1. The proposed taxonomy of spatiotemporal OLAP operators. . . . . . . .. 66. 4.2. Distributive, algebraic and holistic OLAP functions. . . . . . . . . . . . .. 68. 4.3. Possible topological operations. . . . . . . . . . . . . . . . . . . . . . . .. 69. 4.4. Possible metric properties. . . . . . . . . . . . . . . . . . . . . . . . . . .. 69. 4.5. MDX aggregate functions. . . . . . . . . . . . . . . . . . . . . . . . . . .. 69. 4.6. Topological operators. . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 70. 4.7. Metric operators. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 71. 4.8. Operators with spatial return value. . . . . . . . . . . . . . . . . . . . . .. 72. 4.9. DistributiveTopological operator. . . . . . . . . . . . . . . . . . . . . . .. 74. 4.10 DistributiveMetric operator. . . . . . . . . . . . . . . . . . . . . . . . . .. 74. 4.11 Overloaded DistributiveMetric operator. . . . . . . . . . . . . . . . . . .. 74. 4.12 AlgebraicMetric operator. . . . . . . . . . . . . . . . . . . . . . . . . . .. 75. 4.13 Overloaded AlgebraicMetric operator (Average and standard deviation distance). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 77. 4.14 Overloaded AlgebraicMetric operator (MaxN and MinN area, length and perimeter). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 77. 4.15 Overloaded AlgebraicMetric operator (MaxN and MinN distance). . . . .. 77. 4.16 HolisticMetric operator. . . . . . . . . . . . . . . . . . . . . . . . . . . .. 78. 4.17 Overloaded HolisticMetric operator (Median and mode distance). . . . .. 79. 4.18 Overloaded HolisticMetric operator (Rank area, length and perimeter). .. 79 vi.

(13) LIST OF TABLES. 4.19 Overloaded HolisticMetric operator (Rank distance). . . . . . . . . . . .. 79. 4.20 Spatial-multidimensional navigation operators. . . . . . . . . . . . . . . .. 81. 4.21 STTopological operator. . . . . . . . . . . . . . . . . . . . . . . . . . . .. 83. 4.22 STMetric operator.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 84. 4.23 Overloaded STMetric operator. . . . . . . . . . . . . . . . . . . . . . . .. 84. 4.24 Spatiotemporal operators with spatial return value. . . . . . . . . . . . .. 84. 4.25 Aggregate functions on facts.. 86. . . . . . . . . . . . . . . . . . . . . . . . .. 4.26 Operations on facts and members.. . . . . . . . . . . . . . . . . . . . . .. 87. . . . . . . . . . . . . . . . . . . . .. 88. 4.28 Comparative analysis between GeoMDQL and our operators. . . . . . . .. 90. 5.1. 94. 4.27 Navigation operations on members.. Taxonomy categories of the operators used in the queries. . . . . . . . . .. vii.

(14) CHAPTER 1. INTRODUCTION. Chapter. 1 Introduction 1.1. Contextualization. This section discusses topics that represent the background knowledge of the present work. The topics that will be discussed are: i) Data Warehouse; ii) Temporal Data Warehouse; and iii) Spatial Data Warehouse.. 1.1.1. Data Warehouse. One of the technologies being widely used in the decision support environments is the Data Warehouse (DW) [Chaudhuri and Dayal, 1997, Kimball and Ross, 2002, Theodoratos and Sellis, 1999].. A DW has been defined by Inmon [1996] as a. subject-oriented, integrated, time-variant and non-volatile collection of data in support of management’s decision making process. Its main characteristics are detailed as follows: i) it is subject oriented because data gives information about a particular subject instead of about a company’s ongoing operations; ii) it is integrated because data is gathered from a variety of sources and merged into a coherent whole in the data warehouse; iii) it is time-variant because all data in the data warehouse is identified with a particular time period; and finally, iv) it is non-volatile because data is added but it is rarely removed. One concept that is strongly related to DW is Online Analytical Processing (OLAP) [Thomsen, 2002, Chaudhuri and Dayal, 1997, Fidalgo, 2005, Pedersen and Jensen, 2001]. OLAP is a software category specifically oriented to answer multidimensional analytical queries about data stored in a DW. This query processing must have a high performance, consistency and interactivity and it must improve the decision making 1.

(15) CHAPTER 1. INTRODUCTION. process by interpreting data using a variety of multidimensional views. These views can be processed because of the DW’s multidimensional structure and can be seen as a set of axes and points. Each axis can be interpreted as a dimension or perspective, while the points can be seen as the measured values and they correspond to the intersection of the axes.. 1.1.2. Temporal Data Warehouse. Temporal Data Warehouse (TDW) [Eder and Koncilia, 2001, Eder et al., 2002, 2001, Blaschka et al., 1998, Vaisman, 2001, Body et al., 2003, Malinowski and Zim´anyi, 2006a, Golfarelli and Rizzi, 2009] originated from the fusion of Temporal Databases (TDB) and DW. TDBs have been successfully used for modeling time-varying information [Malinowski and Zim´anyi, 2006a]. The research area of TDB characterizes the semantics of temporal data and provides expressive and efficient ways to model, store, and query temporal data [Jensen, 2000]. In the case of a DW, a time dimension is normally included and it allows comparisons of different periods. However, the instances of dimensions could also change over time and these changes cannot be adequately represented using conventional data warehouse technology [Eder et al., 2001]. Temporal Data Warehouses provide the necessary solutions for managing time-varying data in dimensions.. 1.1.3. Spatial Data Warehouse. Spatial Data Warehouses (SDWs) and Spatial On-Line Analytical Processing (SOLAP) [Rivest et al., 2001, 2003, B´edard et al., 2001, Bimonte et al., 2005, Malinowski and Zim´anyi, 2004a] were originated from the combination of spatial data management and multidimensional data analysis techniques [Damiani and Spaccapietra, 2006]. Two kinds of systems deal with data management: Spatial Database Management Systems (DBMS) and Geographic Information Systems (GIS). A spatial DBMS extends the functionalities of conventional data management systems in order to support the storage, efficient retrieval and manipulation of spatial data [Rigaux et al., 2002]. A GIS [Chrisman, 1997, Cˆamara et al., 1996, DeMers, 1997] in turn, is a composite computer based information system that consists of an integrated set of programs and that may include. 2.

(16) CHAPTER 1. INTRODUCTION. or interact with a spatial DBMS. This system enables the capturing, modeling, analysis and visualization of spatial data [Longley et al., 2001]. Differently from a spatial DBMS, end users directly interact with a GIS. A Spatial Data Warehouse (SDW) is defined as a multidimensional database of spatial data. SOLAP tools form a new family of user interfaces and are meant to be client applications sitting on top of a multi-scale SDW [Rivest et al., 2003, 2005]. A SOLAP tool can be defined as a visual platform which is built to support rapid and easy spatiotemporal analysis and exploration of multidimensional data. This data exploration is performed following a multidimensional approach comprised of aggregation levels. These levels are available in cartographic, tabular and diagram displays. Spatial data warehousing has been recognized as a key technology in enabling the interactive analysis of spatial data sets for decision-making support [Rivest et al., 2001, Han et al., 2002].. 1.2. Motivation. After a literature review it could be observed that there are numerous proposals in the literature that focus on the modeling of temporal [Blaschka et al., 1998, Vaisman, 2001, Eder et al., 2002, Body et al., 2003, Malinowski and Zim´anyi, 2006a, Eder et al., 2001] as well as spatial data warehouses [Stefanovic et al., 2000, Rivest et al., 2001, Stefanovic et al., 2000, Marchand et al., 2004, Jensen et al., 2004, Malinowski and Zim´anyi, 2004a, Bimonte et al., 2005]. Nevertheless, enough attention has not been given to studies that bring these two research areas together. It has been estimated that about eighty percent of all data stored in corporate databases has a spatial component [Rivest et al., 2001]. However, spatial data could be subject of changes through time. In an SDW, the spatial data are either stored in dimensions or in fact tables (as spatial measures). When spatial data are stored as measures, it can be analyzed along the time axis and the reason for this is that data warehouses almost always consider a time dimension. On the other hand, when spatial data are stored in the dimensions of an SDW, their changes cannot be properly managed. This is because multidimensional models usually consider facts as the dynamic part of data warehouses and dimensions as static entities [Body et al., 2003].. 3.

(17) CHAPTER 1. INTRODUCTION. The present work considers that the merging of these two research areas originates the concept of Spatiotemporal Data Warehouses (STDWs). It is in this context that the present work is developed. A wide variety of applications need to capture spatial and time-varying characteristics of the entities they model. Let’s take as an example a meteorological application that collects data about precipitation, relative humidity, atmospheric pressure among other phenomena. A Data Collection Platform (DCP) network collects the meteorological data. However, the location of these DCPs could change over time. It would not be possible to conceptually model this scenario using existing temporal or spatial data warehouse models. An STDW model allows users to define and analyze data in a spatiotemporal context, improving the decision making process. In the context of STDW, another important issue is the definition of spatiotemporal OLAP operators. These operators should be able to manipulate multidimensional, spatial and spatiotemporal data. Moreover, they should be applied to an OLAP cube built using a spatiotemporal data warehouse. In order to use these operators, a spatiotemporal multidimensional query language should be defined. However, this definition is out of the scope of the present work. In order to guide the development of spatiotemporal OLAP technology, a taxonomy of spatiotemporal OLAP operators should also be proposed.. 1.3. Objectives. The objectives that were defined to guide the development of the present work are given as follows: 1. Define a conceptual model for spatiotemporal data warehouses. 2. Specify and develop a CASE tool that automates the generation of conceptual spatiotemporal models. 3. Define a taxonomy of spatiotemporal OLAP operators. 4. Define a set of spatiotemporal OLAP operators. 5. Classify the operators from the set using the proposed taxonomy.. 4.

(18) CHAPTER 1. INTRODUCTION. Concerning objectives number 3, 4 and 5 the following paper has been published: C. Ruiz and V. Times. A Taxonomy of Solap Operators. In Brazilian Symposium on Databases (SBBD), 2009.. 1.4. Dissertation Organization. This master’s dissertation is organized according to the schema shown in Figure 1.1. Chapter 2 - Related Work: this chapter presents a literature review concerning three topics relevant to the present work: temporal data warehouses, spatial data warehouses and SOLAP operations. Chapter 3 - The CSTM Conceptual Data Model and CASE Tool: this chapter presents our proposed conceptual model, which permits the definition of spatiotemporal data warehouses. Moreover, the implementation details of our CASE modeling tool are introduced. This tool permits the definition of conceptual schemas for spatiotemporal data warehouses. Chapter 4 - Taxonomy and Set of ST OLAP Operators: this chapter presents our taxonomy of spatiotemporal operators. A set of spatiotemporal operators and its classification using the proposed taxonomy are also presented. Chapter 5 - Case Study: this chapter presents a case study that shows a practical application of our conceptual model and a subset of our proposed spatiotemporal operators. Chapter 6 - Conclusions and Future Work: this chapter concludes the present work by presenting the final considerations and main contributions.. Some. suggestions of future work are also discussed.. 5.

(19) CHAPTER 1. INTRODUCTION. Figure 1.1: Dissertation organization.. 6.

(20) CHAPTER 2. RELATED WORK. Chapter. 2 Related Work 2.1. Introduction. This chapter discusses two topics relevant to this work: Temporal Data Warehouses (TDWs) and Spatial Data Warehouses (SDWs). The present work proposes a conceptual spatiotemporal data model. However, after a literature review, it was observed that there are not many works in the data warehouse field that deal with temporal and spatial aspects simultaneously. Therefore, it is necessary to first review the works that propose temporal data warehouses and then, the ones that discuss spatial data warehouses. Besides discussing SDW, this chapter also presents the SOLAP operations offered by existing spatial-multidimensional technologies. This chapter is organized as follows.. Section 2.2 discusses Temporal Data. Warehouses introducing various approaches for dealing with changes in data warehouse dimensions. Section 2.3 presents Spatial Data Warehouses discussing how they store spatial elements in dimensions and fact tables.. It also introduces existing SOLAP. operators and spatial aggregate functions. Section 2.4 presents a comparative analysis concerning the introduced spatial and temporal data warehouse proposals. Section 2.5 concludes the chapter.. 2.2. Temporal Data Warehouses. Data warehouses allow analyzing enterprise data and give the possibility to aggregate it and compare it along dimensions relevant to the application domain. Time is one of the dimensions found in data warehouses and it allows comparisons of different periods. However, multidimensional data warehouses and OLAP systems do not provide adequate. 7.

(21) CHAPTER 2. RELATED WORK. means for dealing with changes in dimension data and structure [Eder and Koncilia, 2002]. This is due to the fact that dimensions are orthogonal for data warehouses. Orthogonality with respect to the time dimension means that the other dimensions must be time-invariant [Eder et al., 2001]. In this section we first introduce the main concepts regarding time in databases. Then, we present a literature review of different approaches that deal with changes of the structure and instance data of dimensions.. 2.2.1. Temporal Concepts. The temporal concepts presented next will be used throughout this section. These definitions were taken from [Dyreson et al., 1994]. - Time-varying Attribute: it is an attribute whose value is not constrained to be constant over time. In other words, it may or may not change over time. - Valid Time: by definition, the valid time of a fact is the time when the fact is true in the modeled reality. Valid times are usually supplied by users. - Transaction Time: the transaction time of a database fact is the time when the fact is current in the database and it may be retrieved. Transaction times cannot be changed. Additionally, they cannot be after the current time. - Chronon: it is the shortest duration of time supported by a temporal Data Base Management System (DBMS). It can also be defined as a non decomposable unit of time. In the case of data warehouses, usually one dimension represents the valid time in the system. The chronon is determined by the finest dimension member within this time dimension. The time axis defined through the dimension time is a series of chronons [Eder and Koncilia, 2001]. - Time Interval: it is the time between two instants. In a system that supports a time line composed of chronons, an interval may be represented by a set of contiguous chronons. Both transaction and valid times are represented by intervals [Ts, Te] where Ts is the beginning of the valid/transaction time, Te is the end of the valid/transaction time and Te ≥ Ts. 8.

(22) CHAPTER 2. RELATED WORK. - Span: it is a directed duration of time. A duration is an amount of time with known length, but no specific starting or ending chronons. A span is either positive, denoting forward motion of time, or negative, denoting backward motion in time. - Timestamp: a timestamp is a time value associated with some object, like an attribute value or a tuple. - Lifespan: the lifespan of a database object is the time over which it is defined. The valid-time lifespan of a database object refers to the time when the corresponding object exists in the modeled reality. The transaction-time lifespan, on the other hand, refers to the time when the database object is current in the database. If the object has an associated timestamp, then that object’s lifespan is the value of the timestamp. In many cases, lifespan is used as alternative name for timestamp. - Schema Evolution: it permits the modification of the database schema without the loss of extant (i.e., currently existing) data in a scenario where no support for previous schemas is required. The same is true for a DW schema. - Schema Versioning: a database (or data warehouse) system supports schema versioning if it allows the querying of all data, both retrospectively and prospectively, through user-definable version interfaces. The difference between evolution and versioning is that the former does not require the maintenance of a schema history, while in the latter all past schema versions are retained [Golfarelli and Rizzi, 2009].. 2.2.2. Slowly Changing Dimensions. One of the earliest proposals concerning changes in dimension data was given by Kimball and Ross. They proposed three types of Slowly Changing Dimensions (SCD) which are ways of handling changes in multidimensional data structures [Kimball and Ross, 2002]. Type 1, which is the simplest approach, consists in overwriting the old attribute value in the dimension row, replacing it with the current value. This type is appropriate when the attribute change is a correction. We will not go into much detail about type 1 because it simply updates the data structure without keeping track of history. Type 2 allows representing prior history and it consists in inserting a new row, which has the 9.

(23) CHAPTER 2. RELATED WORK. new attribute value, into the dimension table. This row has a new surrogate primary key. Each row contains as attributes the natural key (or durable identifier) and the row’s effective and expiration dates (i.e., a timestamp). Alternatively, a most recent row indicator might be useful to allow users to quickly constrain their query to only the current profiles. A disadvantage of this type is that it does not permit associating the new attribute value with the old fact history or vice versa. Type 3 supports this kind of requirement. It consists in adding another attribute to the existing dimension row in order to support analysis based on either the new or prior attribute value. Type 3 is appropriate when it is necessary to support two views of the world simultaneously (what is sometimes called alternate reality). However, it is the least commonly needed technique. The difference between the type 2 and type 3 responses is that in type 3 both the current and prior descriptions can be regarded as true at the same time. Besides the three basic types, hybrid techniques are also presented by Kimball and Ross [2005]. Those techniques are: Mini Dimension with “Current” Overwrite, Type 2 with “Current” Overwrite, Type 2 with Natural Keys in the Fact Table, and Series of Type 3 Attributes. The first technique consists in the creation of a mini dimension which isolates volatile dimension attributes in a separate table. When an event spawns a fact row, the row has one foreign key for the primary dimension and another for the mini-dimension profile in effect at the time of the event (Figure 2.1a). The second technique is a hybrid of type 1 and type 2. It consists in capturing a type 2 attribute change by adding a row to the primary dimension table. In addition, there is a “current” attribute on each row that is overwritten (type 1) for the current and all previous type 2 rows (Figure 2.1b). This way, when a change occurs, the most current dimension row has the same value in the uniquely labeled current and historical columns. The third technique is proposed for a scenario of a million-row dimension table with many attributes requiring historical and current tracking. It consists in including the dimension natural key as a fact table foreign key, in addition to the surrogate key for type 2 tracking (Figure 2.1c). Finally, the last technique is useful when there is a dimension attribute that changes with a predictable rhythm and it is necessary to summarize facts based on any historical value of the attribute. In this case, type 3 is generalized by having the required number of versions of the changing attribute, one column for each version (Figure 2.1d).. 10.

(24) CHAPTER 2. RELATED WORK. Figure 2.1: Hybrid slowly changing dimensions. Adapted from [Kimball and Ross, 2005]. 2.2.3. Schema Evolution. This section presents two proposals that follow the schema evolution approach. As mentioned in the previous section, schema evolution permits the modification of the data warehouse schema without the loss of existing data but without maintaining the schema history. Blaschka et al. [1998] proposed a formal framework to describe evolutions on multidimensional schemas and their effect on the schema and on the instances. This framework is based on a formal conceptual description of a multidimensional schema and corresponding schema evolution algebra. A minimal set of operations is proposed including insert level, insert attribute, connect attribute to fact and delete dimension, among others. One of the characteristics of this framework is that existing instances are adapted to the new schema automatically. The authors suggest that this framework should be used as a basis for tool-supported warehouse schema changes. This work, however, does not discuss instance modifications which, in our case, are important because we are interested in modeling spatial members with changing geometries. Vaisman’s proposal [Vaisman, 2001] deals with evolution at both the schema and instance levels. The author presented a complete characterization of the possible dimension updates in a multidimensional model and a collection of operators which 11.

(25) CHAPTER 2. RELATED WORK. perform them. These operators are classified into two groups: structural update operators and instance update operators. The former group includes the Generalize (creates a new level to which a pre-existent one rolls up), Relate/Unrelate Levels and Delete Level operators. Instance update operators include Add Instance and Delete Instance operators and also a set of complex update operators like Reclassify, Split, Merge and Update. The author also presented a set of algorithms that maintain materialized views after executing the updates. A temporal multidimensional model that aims to keep track of the different states of a data warehouse throughout its lifespan is introduced. In the temporal model presented, dimension elements at the schema and instance levels are time stamped in order to keep track of the updates that occur during the dimension’s lifespan. The temporal version of the dimension (structural and instance) operators is also presented. Finally, the author introduced a temporal query language to support the data model. It lets users choose in their request the way they want data to be aggregated. They can therefore choose between a temporally consistent representation and the last version.. 2.2.4. Schema Versioning. This section presents two proposals that follow the schema versioning approach. Differently from schema evolution, schema versioning maintains the history of previous data warehouse schemas. The COMET Metamodel, proposed by Eder et al. [2002], allows representing not only changes of transaction data (called cells, fact data or measures) but also of schema and structure data. This model permits obtaining consistent query results spanning multiple periods (i.e., when having different versions of dimension data). The authors propose a TDW architecture that extends multidimensional data warehouses with: i) temporal extensions, which consist in time stamping data in order to represent their valid time; ii) structure versions, whose need arose due to the fact that data are time stamped; and iii) transformation functions, which allow transforming data from one structure or version into another. Figure 2.2 shows the COMET Metamodel in Unified Modeling Language (UML) [OMG, 2009] notation. All dimension members, hierarchical links between members, dimensions, levels and their hierarchical relations are time stamped with a time interval 12.

(26) CHAPTER 2. RELATED WORK. representing the valid time. A structure version is a view on a temporal data warehouse valid for a given time period. Within a structure version, dimension data on both the schema level and instance level areS stable. The authors take into consideration a large number of integrity constraints that have to be fulfilled due to the temporal extensions of the model. To validate the ideas, a prototype was built. It consists of three parts: the TDW, the Transformer (maps data from a structure version into another by using transformation functions) and the Administration Tool (allows defining the schema of the TDW, modifying it and importing cell data). Another interesting work that deals with schema versioning is presented by Body et al. [2003]. This approach not only permits tracking history of dimension changes but also comparing data, mapped into static structures. The dimension evolutions are divided into schema evolutions (e.g. creation and deletion of a dimension) and evolutions on instances of dimensions (e.g. transformation of a member). The proposed temporal and multidimensional data model redefines the well known concepts of facts and dimensions with valid times. The data model introduces the notion of temporal mode of presentation to let users choose what interpretation they want to give to their request. It also provides confidence factors that allow users to detect values that are mapped, and then altered somehow, from those that come from source data. Four basic structural evolutions operators are proposed: Insert (a new member version), Exclude (a specified member version), Associate (two versions of a member) and Reclassify (redefine the set of parents of a member version). The authors provide options to adapt the conceptual approach to a logical level with the objective of implementing the model on current commercial tools. All the proposals presented so far deal with changes in the schema of a data warehouse, either using evolution or versioning approaches, which is not the objective of the present work.. Our approach seeks to represent changes in dimension data,. specifically in member’s geometries. Furthermore, none of the introduced models offers a graphical representation based on a multidimensional view of temporal data. As shown in the following section, these issues are treated by Malinowski and Zim´anyi [2006a] who proposed a temporal extension for a conceptual multidimensional model called MultiDimER.. 13.

(27) CHAPTER 2. RELATED WORK. Figure 2.2: The COMET metamodel [Eder et al., 2002].. 14.

(28) CHAPTER 2. RELATED WORK. 2.2.5. MultiDimER Temporal Extension. MultiDimER is a conceptual model based on Entity Relationship (ER) constructs. The model’s graphical notations are displayed in Figure 2.3. Levels are represented as entity types, as can be seen in Figure 2.3a. An instance of a level is called a member. A hierarchy contains several related levels (Figure 2.3c). Given two consecutive levels of a hierarchy, the higher level is called parent and the lower level is called child. A level that does not have a child level is called leaf. A relationship joining child and parent levels is characterized by a criterion name (Figure 2.3b) which can express different structures used for analysis. Cardinalities (Figure 2.3d) indicate the minimum and the maximum number of members in one level that can be related to a member in another level. Levels contain key attributes and descriptive attributes. A key attribute of a parent level shows how child members are grouped for applying aggregate functions to measures. A key attribute in a leaf level, or in a level forming a dimension without hierarchy, indicates the aggregation level of a measure in the associated fact relationship. A descriptive attribute contains additional features of a level (e.g. store number or manager’s name in a Store level). A fact relationship (Figure 2.3e) represents an n-ary relationship between leaf levels from all dimensions participating in the relationship. It may contain attributes commonly called measures.. Figure 2.3: MultiDimER notations. [Malinowski and Zim´ anyi, 2006a]. 15.

(29) CHAPTER 2. RELATED WORK. The model’s temporal extension considers three kinds of temporal types: Valid Time (VT), Transaction Time (TT) and Data Warehouse Loading Time (DWLT). The latter represents the time when data was loaded into a TDW. The temporal types may represent events (instants) or states (intervals). Sets of instants or intervals may also be used. The model permits the conceptual representation of time-varying levels, hierarchies and dimensions. In the case of time-varying levels, changes are supported for attributes (Figure 2.4a) and for members as a whole (Figure 2.4b). For the latter, the member’s lifespan (LS) represents its existence in the modeled reality.. Figure 2.4: MultiDimER time varying levels. [Malinowski and Zim´ anyi, 2006a] Relationships between levels could also be temporal (Figure 2.5). The authors identify three types of situations: i) temporal levels and non-temporal relationships between them, ii) temporal levels and temporal relationships between them, and iii) non temporal levels and temporal relationships between them. Integrity constraints concerning temporal support in hierarchies are also presented. One of the constrains, for example, states that no modifications to level members are allowed in the case of non-temporal levels and temporal relationships between them. Another important constraint is that temporal levels that are connected with a non-temporal relationship must not keep their LS and must not include VT for their key attributes. Two types of cardinalities can be represented: the snapshot cardinality and the lifespan cardinality. The former is considered for every time instant while the latter, over the relationship’s lifespan. The snapshot cardinality is by default equal to the lifespan 16.

(30) CHAPTER 2. RELATED WORK. Figure 2.5: MultiDimER time varying hierarchies. [Malinowski and Zim´ anyi, 2006a]. cardinality. However, if these cardinalities are different, a dotted line with the LC symbol is included and it indicates the lifespan cardinality. As Figure 2.6 shows, an employee can work in more than one section at the same time instant and over his lifespan. On the other hand, in every time instant an employee can be affiliated only to one section, but over his lifespan he can be affiliated to many sections.. Figure 2.6: MultiDimER lifespan ans snapshot cardinalities. [Malinowski and Zim´ anyi, 2006a] The authors also propose an Object Relational (OR) representation of the conceptual model for TDW design [Malinowski and Zim´anyi, 2006, Malinowski and 17.

(31) CHAPTER 2. RELATED WORK. Zim´anyi, 2008]. They present a mapping which transforms the conceptual model into an object-relational model using the SQL:2003 [Eisenberg et al., 2004] standard. As an example of a commercial object-relational DBMS, they use Oracle 10g [Oracle, 2009a]. An OR representation is chosen over a purely relational one due to the former’s ability to model complex objects. OR models permit a better representation of the real world because they group related facts into a single row. The authors also define the logical representation of different kinds of hierarchies in a multidimensional model in an additional work [Malinowski and Zim´anyi, 2006b].. 2.3. Spatial Data Warehouses. First, this section presents concepts related to SDW and SOLAP, such us spatial dimensions and measures. It then describes two conceptual spatial data models. There are many data models for SDW in the literature. However, the choice of what models to discuss here was made based on their ability to graphically represent multidimensional data. This section also discusses some of the existing SOLAP operations with the objective of incorporating them into our proposed set of operators.. 2.3.1. Spatial DW and SOLAP Concepts. Spatial data warehouses consist of both spatial and non-spatial dimensions and measures. Dimensions on a spatial data cube are classified into three types: non-spatial, spatial to non-spatial, and fully spatial. A non-spatial dimension is a dimension containing only non-spatial data whose generalizations are non-spatial as well. A spatial to non-spatial dimension in turn, is a dimension whose primitive level data is spatial but whose generalization, starting at a certain high level, becomes non-spatial. Finally, a spatial to spatial (i.e., fully spatial) dimension is a dimension whose primitive level and all of its high-level generalized data are spatial [Stefanovic et al., 2000]. Other authors, like Rivest et al. [2003], B´edard et al. [2001] and Marchand et al. [2004], consider an additional type of spatial dimension: the non-geometric spatial dimension. In the non-geometric spatial dimensions, the spatial reference uses only names (e.g. names of places) due to the fact that no geometric representation is associated with the dimension members. The other type of spatial dimensions (geometric and 18.

(32) CHAPTER 2. RELATED WORK. mixed) include geometric shapes which may be spatially referenced on a map to allow the graphical visualization and querying of its dimension members [Rivest et al., 2001]. Examples of these dimensions can be seen in Figure 2.7.. Figure 2.7: Types of spatial dimensions. [Rivest et al., 2001] Measures in a spatial multidimensional context could be spatial or non-spatial (i.e., measures containing only numerical data). The former are specialized in three types. The first type groups measures represented by a geometric shape or set of shapes obtained by the combination of geometric spatial dimensions. They require a geometric operation (e.g. union or intersection) which generates a new shape (or set of shapes). The second type of spatial measures results from the computation of spatial metric or topological operators (e.g. surface or distance). Finally, the last kind is the set of pointers (stored within the data cube cells) to the geometric shapes stored in another software [Rivest et al., 2003]. Another spatial measure classification is proposed by Marchand et al. [2004]. The authors define spatial measures as measures which are mapped to at least one spatial dimension whose members hold a spatial representation. Two types of spatial measures are distinguished: geometric to non-geometric spatial measure (which is mapped by at least one geometric to non-geometric spatial dimension) and fully geometric spatial measure (which is only mapped by fully geometric spatial dimensions).. 2.3.2. Geographic Data Warehouse Metamodel. The first spatial data model to be discussed is the Geographic Data Warehouse Metamodel (GeoDWM) [Fonseca, 2007, Fonseca et al., 2007]. It is a SDW metamodel that is based on the star schema and considers geographic dimensions and measures. It is specified 19.

(33) CHAPTER 2. RELATED WORK. using Object Constraint Language (OCL) restrictions [Object Management Group, 2006] and a Unified Modeling Language (UML) class diagram [Object Management Group, 2009] which makes this model non ambiguous and easy to understand. The metamodel can be seen in Figure 2.8. To facilitate its use and extension by other proposals, it is based on the Relational package of the Common Warehouse Metamodel (CWM) specification [Object Management Group, 2003] and the Simple Features Specification for SQL (SFS-SQL) [Open GIS Consortium, 1999]. This metamodel defines how concepts (e.g. conventional or geographic measures or dimensions) of a geographic dimensional model can be organized in order to describe a GDW and provides a set of stereotypes with pictograms that facilitate the GDW modeling. GeoDWM serves as a base metamodel for CASE tools which deal with conceptual modeling and automatic generation of GDW logical schemas. Furthermore, it allows consistency verification of the schemas developed through its OCL restrictions. GeoDWM normalizes geometric and descriptive data of spatial objects in order to minimize geometric data redundancy [Siqueira et al., 2008, 2009]. However, it is unclear how dimension hierarchies and levels are modeled. Geographic DW CASE (GeoDWCASE) modeling tool was built to allow the definition of SDW schemas. Its characteristics are: i) it offers a friendly interface; ii) it is implemented using Eclipse platform; iii) it is based on the XML Metadata Interchange (XMI) standard [Object Management Group, 2005] for storage, manipulation, recovering and interchange of metadata; iv) it gives support to UML class modeling using GeoDWM stereotypes and pictograms; v) it allows the validation of the designed GDW schema using GeoDWM’s OCL restrictions; and vi) it permits an automatic transformation between a conceptual schema and a logical schema compatible with the selected DBMS.. 2.3.3. MultiDimER Spatial Extension. Malinowski and Zim´anyi [2004a] proposed an extension of MultiDimER, a conceptual multidimensional model, with spatial dimensions, spatial hierarchies, spatial fact relationships and spatial measures. In this model, the spatiality of a level depends on whether it has at least one spatial attribute (i.e., an attribute represented by a geometry). A spatial attribute’s geometry is represented using spatial data types such as point, line, 20.

(34) CHAPTER 2. RELATED WORK. Figure 2.8: The GeoDWM metamodel. [Fonseca, 2007]. 21.

(35) CHAPTER 2. RELATED WORK. area or collections of these types (Figure 2.9). A geometry is also expressed in a Spatial Reference System (SRS).. Figure 2.9: MultiDimER spatial data types. [Malinowski and Zim´ anyi, 2004a] A spatial hierarchy is defined as a hierarchy that includes at least one spatial level. Likewise, a spatial dimension is a dimension including at least one spatial hierarchy. Furthermore, two consecutive spatial levels that form a hierarchy indicate the topological relationship that exists between their spatial components. Examples of these topological relationships are: contains, equals, intersects and overlaps. The authors assume that the default relationship is the contains topological relationship. On the other hand, the disjoint topological relationship is not allowed. Another characteristic of this conceptual model is the possibility of sharing spatial dimensions (Figure 2.10). A spatial fact relationship is a fact relationship that requires a spatial join between two or more spatial dimensions. In this case, it is necessary to indicate a topological relationship that links the different spatial levels. Spatial measures can be associated to a fact relationship, independently of whether this relationship is spatial or not. Measures can be non-spatial or spatial. The former are numeric and express analysis needs in a quantified form whereas the latter are classified into two groups: i) measures represented by a geometry and ii) measures represented by a numerical value that is calculated using spatial or topological operators. Aggregate functions for spatial geometric measures must be specified in a similar manner as aggregate functions are specified for numeric measures. The transformation of a conceptual schema based on MultiDimER’s spatial extension to an object-relational schema is also proposed [Malinowski and Zim´anyi, 2007]. The authors base their mapping on the SQL:2003 [Eisenberg et al., 2004] and SQL/MM standards [Stolze, 2003, Melton and Eisenberg, 2001]. SQL:2003 specifies features of an object-relational representation that allow managing complex data. SQL/MM in turn, includes the specification for spatial data and functions. Examples of a commercial 22.

(36) CHAPTER 2. RELATED WORK. Figure 2.10: MultiDimER spatial dimensions, hierarchies and levels. [Malinowski and Zim´ anyi, 2004a]. 23.

(37) CHAPTER 2. RELATED WORK. implementation using Oracle 10g with its spatial extension [Oracle, 2009b] are given. The use of spatial extensions allows having an integrated architecture which permits representing spatial and non-spatial data in the same system. Spatial integrity constraints are used to ensure the semantic equivalence of the conceptual and logical schemas, since in the translation process some semantics may be lost.. 2.3.4. Existing SOLAP Operations. This section presents some proposals that integrate analytic and geographic environments and focuses on the SOLAP operations that each one offers. Furthermore, it presents aggregate functions for SOLAP. One of the proposals that integrates analytic and geographic environments is JMap Spatial OLAP [Kheops Technologies, 2003-2009]. It is a web-enabled technology that integrates geospatial dimensions within a Business Intelligence (BI) decision support environment. In what respects to the SOLAP operations that JMap offers, the focus is mainly on navigation operations like spatial drill-down, spatial rollup and spatial drill-across. Two other operations are defined: open and close. They are a variation of the drill-down and roll-up operations respectively. The difference is that they keep the context of the other dimension members [Rivest et al., 2005]. Consequently, they are equivalent to the Multidimensional Expressions (MDX) [Whitehorn et al., 2004, Microsoft Corporation, 2009] functions DrilldownMember and DrillupMember. Another proposal that integrates multidimensional and geographic environments is MapWarehouse [Sampaio et al., 2006]. Operations like drill-down, rollup, slicing, dicing and pivoting are implemented using Oracle Spatial aggregate functions and spatial operators [Oracle, 2009b]. The Spatial OLAP Visualization and Analysis Tool (SOVAT) [Scotch and Parmanto, 2005, Scotch, 2006] is a research application that combines On-Line Analytical Processing (OLAP) with Geographic Information Systems (GIS) capabilities to enhance the data retrieval and knowledge discovery process for the user. The SOVAT interface allows the user to perform spatial drill-down and drill-up operations. In addition, SOVAT contains other unique operations not included in standard OLAP. One of them is the drill-out operation [Scotch et al., 2008], which allows users to click on a map object such 24.

(38) CHAPTER 2. RELATED WORK. as a county and submit queries that contain both numerical and spatial aggregation. To perform drill out on, for example, a region A, SOVAT would first identify the regions that border the region A and then would aggregate the numerical measures for each bordering region. The drawback of these three proposals is that they do not allow performing queries that integrate spatial operators (e.g. topological or metric operators) and analytic ones (e.g. average or rank). The next proposal offers this integration. The Geographic Multidimensional Query Language (GeoMDQL) [Da Silva, 2008] was defined in the context of the GOLAPA architecture [Da Silva et al., 2004, 2005, 2006, Fidalgo, 2005, Fidalgo et al., 2004a,b] and it is based on well-established standards: MDX [Whitehorn et al., 2004, Microsoft Corporation, 2009] and OGC’s Simple Feature Specifications for SQL [Open GIS Consortium, 1999]. This query language has the capability to integrate both multidimensional and spatial operators in a unique syntax. This integrated syntax provides a means of defining queries that are shorter, neater and easier to understand and write. GeoMDQL defines a set of operators which includes:. the multidimensional. operators from MDX; the MDX operators that needed to be re-implemented to allow the manipulation of geographic data; geographic operators; and new operators defined specifically for GeoMDQL. The last group of operators was specified using two criteria: i) combining operators from the multidimensional and geographic areas; and ii) applying a multidimensional operator to spatial data and vice-versa [Da Silva et al., 2009]. GeoMDQL’s operators can be divided into three groups:. i) geographic operators. (i.e., topological, positional and metric operators along with operators that generate new geometries); ii) multidimensional operators; and iii) geographic multidimensional operators.. SOLAP Aggregate Functions A standard set of operations for spatial aggregation has not been defined yet. However, some proposals were found when reviewing the literature. Authors like Chen et al. [2002] and Pestana and da Silva [2005] classify the spatial aggregate functions into three categories: distributive, algebraic and holistic.. Examples of distributive operations 25.

(39) CHAPTER 2. RELATED WORK. include convex hull, geometric union, and geometric intersection.. Some algebraic. operations are center of n geometric points and center of gravity, while examples of spatial holistic operations include equi-partition and nearest-neighbor index. Shekhar et al. [2002] propose the same classification and present similar operations for each category. Figure 2.1 shows some examples of each category for the GIS domain. Table 2.1: Aggregate functions for the GIS domain. Adapted from [Shekhar et al., 2002]. Da Silva et al. [2008] introduce a new proposal for the definition of spatial aggregate functions, as it was mentioned earlier. The idea behind this approach is to apply the Cartesian product of two groups of functions. The first group is composed of well known OLAP operations (distributive, algebraic and holistic) whereas the second group comprises spatial operations classified according to their number of parameters and their return type based on the work of Rigaux et al. [2002]. This combination results in 21 categories of spatial aggregate functions as can be seen in Figure 2.11. After regrouping the operations considering only the return type, the number of categories is reduced to six. Besides the operations that result from this combination, other operations are also considered, they are: NNearestNeighbor, NDistantNeighbor, NGeoGrouping and Voronoi. The drawback of this approach is that each category has a great number of possible operations resulting in a very large group of spatial aggregate functions. According to Clementini et al. [1993], it is better to have an overloaded set of just a few basic relationships which the user understands well. When a language has too many operators, it is hard for users to remember (and consequently use) them all.. 2.4. A Comparative Analysis of TDW and SDW Proposals. After having reviewed different proposals of temporal and spatial data warehouses, we will now present a comparative analysis considering them. This comparative analysis 26.

(40) CHAPTER 2. RELATED WORK. Figure 2.11: Aggregate functions for GeoMDQL. Adapted from [Da Silva, 2008]. takes into consideration the following criteria: i) whether or not the data model supports schema evolution/versioning (e.g. adding levels or deleting dimensions); ii) whether or not it supports instance evolution (e.g. changing the value of the product’s name in a Product level); iii) if it considers spatial aspects (i.e., spatial measures and/or spatial dimensions); iv) if it offers a graphical representation based on a multidimensional view of data; and finally, v) if a CASE modeling tool is available for the model. The comparative analysis is shown in Table 2.2.. Table 2.2: Comparative analysis of existing TDW and SDW proposals.. 27.

(41) CHAPTER 2. RELATED WORK. As can be seen, the proposals from Vaisman [2001], Body et al. [2003] and Eder et al. [2002] support both schema and instance evolution. As a matter of fact, the last two not only support evolution but also versioning, which means that they are able to track history and also compare data along different periods. Schema modifications are out of the scope of the present work and can be seen as a suggestion for future work. The proposal from Blaschka et al. [1998] focuses on schema changes but no reference to instance modifications is mentioned in their work. As stated before, one of the objectives of this work is to propose a model that is able to represent changes in dimension’s geometric attributes (i.e., dimension instances). Among the proposals that deal with instance evolution, only Malinowski and Zim´anyi [2006a]’s temporal extension of the MultiDimER model constitutes a conceptual model that permits a graphical representation of temporal multidimensional data. This conceptual data model allows users to represent fact tables, levels, dimensions and different kinds of hierarchies. The drawback of this temporal extension is that it does not consider spatial elements. Enhancing this proposal with spatial aspects would provide a spatiotemporal conceptual data model able to represent a wider range of scenarios, enriching the decision making process. Between the two proposals that deal with spatial aspects, only GeoDWM offers a CASE modeling tool that automates the definition of SDW schemas. However, in GeoDWM it is unclear how dimension hierarchies and levels are modeled.. On the. other hand, MultiDimER’s spatial extension presents similar advantages to its temporal extension: it permits the representation, in a graphical way, of fact tables, levels, dimensions and hierarchies that have spatial characteristics. However, temporal support is not considered in this spatial extension. As a consequence of this comparative analysis it was concluded that in order to define a spatiotemporal conceptual model that permits a graphical representation of multidimensional data, a good approach is to base it on MultiDimER’s temporal and spatial extensions. Besides, this proposal could be enhanced with the implementation of a CASE modeling tool that automates the schema definition.. 28.

(42) CHAPTER 2. RELATED WORK. 2.5. Conclusion. This chapter presented a detailed literature review concerning temporal and spatial data warehouses. As could be seen, the existing proposals do not allow the definition and graphical representation of conceptual models that have temporal as well as spatial characteristics simultaneously. This chapter also presented a review of existing SOLAP operations and SOLAP aggregate functions. A comparative analysis of the most relevant temporal and spatial data warehouse proposals was also presented. After performing this analysis it was concluded that a conceptual spatiotemporal model could be proposed based on the temporal and spatial extensions of the MultiDimER model. A CASE modeling tool could also be developed for the model. Chapter 3 will introduce our proposed conceptual spatiotemporal data model. A model of this type permits representing evolutions over time of spatial attributes in dimensions. Moreover, the implementation details of our CASE modeling tool will be presented. This tools automates the definition of conceptual schemas in conformity with CSTM.. 29.

(43) CHAPTER 3. THE CSTM CONCEPTUAL MODEL AND CASE TOOL. Chapter. 3 The CSTM Conceptual Model and CASE Tool 3.1. Introduction. This section presents our proposed conceptual data model for spatiotemporal data warehouses. As seen in the related work section, there is a need to propose a conceptual model that allows modeling scenarios with both temporal and spatial characteristics. Our model handles this problem by introducing the concepts of spatiotemporal levels, hierarchies, dimensions and relationships between levels. A metamodel, which defines the construction rules of the conceptual spatiotemporal schemas, is introduced. Moreover, a comparative analysis considering our proposed model and the proposals reviewed in the previous chapter is performed. This chapter also presents the characteristics of the Conceptual SpatioTemporal Modeling Tool (CSTMTool). This tool allows the definition of conceptual schemas that are in conformity with CSTM. Another characteristic of this tool is that it permits the automatic generation of the conceptual schema’s corresponding logical schema using an object relational approach. This chapter is organized as follows. Section 3.2 presents the proposed conceptual data model. A metamodel and a comparative analysis are also discussed. Section 3.3 presents our CASE modeling tool. Finally, section 3.4 concludes the chapter.. 3.2. The Proposed Conceptual Data Model. The proposed conceptual data model is called Conceptual SpatioTemporal Model (CSTM) and it is based on the temporal [Malinowski and Zim´anyi, 2006a] and spatial [Malinowski 30.

(44) CHAPTER 3. THE CSTM CONCEPTUAL MODEL AND CASE TOOL. and Zim´anyi, 2004a] extensions of the MultiDimER model [Malinowski and Zim´anyi, 2004b]. The innovation of this proposal consists in the ability to model scenarios that have spatial and temporal characteristics at the same time. To accomplish this simultaneous representation of time and space, levels that have spatial, temporal and spatiotemporal attributes are considered. We are particularly interested in changing geometries stored in dimensions which leads to the definition of spatiotemporal levels, hierarchies, dimensions and relationships between levels.. 3.2.1. Spatiotemporal Concepts. This section will introduce the concepts of spatiotemporal attributes, levels, hierarchies, dimensions and relationships between levels. In CSTM, levels contain attributes, which could be Key or Descriptive. As seen in the related work chapter while describing the MultiDimER model, a key attribute of a parent level shows how child members are grouped. A key attribute in a leaf level indicates the aggregation level of a measure in the associated fact relationship. A descriptive attribute in turn, contains additional features of a level (e.g. store number in a Store level or the geometry of a city in a City level). Both of these types (key and descriptive) are classified into Temporal and Non-Temporal attributes. In our model, besides being temporal or non-temporal, a descriptive attribute could also be Spatial (represented by a geometry) or Non-Spatial (alphanumeric data types). As a consequence, the proposed classification allows the representation of spatiotemporal attributes by having Descriptive-Temporal -Spatial attributes (i.e., spatial attributes that change along time). A level is spatiotemporal if it has a spatial attribute and, if and only if, this attribute is time-varying.. A hierarchy is spatiotemporal if it has at least one. spatiotemporal level. Similarly, a dimension is spatiotemporal if it has at least one spatiotemporal hierarchy or if the level that is comprised of is spatiotemporal, in the case of dimensions without hierarchies. A relationship between levels could consider temporality and, if the levels are spatial, a topological relationship (which exists between the level’s spatial components). Having this in mind, a relationship between levels is spatiotemporal if it is a temporal relationship which connects two spatial levels (i.e., the relationship has both 31.

(45) CHAPTER 3. THE CSTM CONCEPTUAL MODEL AND CASE TOOL. Figure 3.1: The CSTM Notations. the temporal and spatial components). In our model, the relationship between two levels is also characterized by cardinalities. As in the original model, two types of cardinalities between levels are considered: the Snapshot cardinality and the Lifespan cardinality. The former is considered for every time instant while the latter, over the relationship’s lifespan. Figure 3.1 shows the CSTM model’s notations, which are base on the ER model. They are divided into two groups: i) the notations that were inherited from the MultiDimER model, and ii) the new spatiotemporal notations proposed.. Besides. representing levels (Figure 3.1a), temporal levels (Figure 3.1b), spatial levels (Figure 3.1c), fact relationships (Figure 3.1d) and temporal and non temporal cardinalities (Figure 3.1e), CSTM is able to represent levels with temporal and spatial attributes (Figure 3.1f), spatiotemporal levels (Figure 3.1g) and spatiotemporal relationships (Figure 3.1h). In the case of spatiotemporal attributes, a temporal icon (symbolizing a clock) is placed next to the spatial icon taking into consideration the MADS proposal [Parent et al., 1999] Figure 3.2 shows an example of a spatiotemporal hierarchy. The spatial data types associated to the levels are the following: point for Client, simple area for City and area set for State. The spatial attribute Point, which represents the client’s location, happens to also be a time-varying attribute (i.e., it is a spatiotemporal attribute). This 32.

(46) CHAPTER 3. THE CSTM CONCEPTUAL MODEL AND CASE TOOL. means that clients could change their location. The relationship between City and Client is spatiotemporal because it is a temporal relationship (represented by the VT symbol) which connects two spatial levels. This relationship has a many-to-one snapshot cardinality and a many-to-many lifespan cardinality (i.e., at each instant a client lives in exactly one city, but they may live in several cities over their entire lifespan). There is a containment topological relationship between these two levels (i.e., a city contains many clients). As in the original model, the Contains relationships is the default relationship.. Figure 3.2: Example of a spatiotemporal hierarchy. In the same dimension, spatial and temporal levels can be combined. This means that the same dimension could have a temporal level (i.e., a level that considers lifespan or has a timestamped attribute) and also a spatial level (i.e., a level that has an attribute of geometric type). Furthermore, in the same level, spatial and temporal attributes could coexist. In this case, the spatial attribute is not necessarily a temporal attribute (like in the case of spatiotemporal levels). An example of this is seen in Figure 3.3. Country and Province are both spatial levels. The former has an area set associated to it and the latter, a simple area. Countries and provinces could cease to exist, merge or split 33.

(47) CHAPTER 3. THE CSTM CONCEPTUAL MODEL AND CASE TOOL. (e.g., the Federal Republic of Germany and the German Democratic Republic) which is represented by the lifespan (LS) symbol. The relationship between a country and a province is temporal and it has a many-to-many lifespan cardinality, which permits dealing with the reassignment of provinces to new countries. Furthermore, the attribute Country name in Country level (as well as Province name in Province level) could also change (e.g., Republic of Zaire, former name of the Democratic Republic of the Congo) which is represented by the VT symbol.. Figure 3.3: Example of levels with temporal and spatial properties. In the case of measures, their valid time is given by the time dimension. Even though some authors consider that spatial measures may be represented by a numerical value that is calculated using spatial or topological operators [Rivest et al., 2001, Malinowski and Zim´anyi, 2004a], our proposal considers that spatial measures are only those measures represented by a geometry.. Spatial measures that result when applying a spatial. computation are treated like conventional numeric measures. Our proposed taxonomy of spatiotemporal OLAP operators, which will be presented in Section 3.3, considers that spatial and spatial-multidimensional operations are applied to spatial measures. As a consequence, spatial measures must necessarily be represented by a geometry. It is important to mention that, differently from the original model, the present works considers valid time exclusively. As a following section will show, the spatiotemporal operators proposed are independent from the type of temporal support considered.. For example, the operator STMetric (spatiotemporal metric property) 34.

Referências

Documentos relacionados

The probability of attending school four our group of interest in this region increased by 6.5 percentage points after the expansion of the Bolsa Família program in 2007 and

Na hepatite B, as enzimas hepáticas têm valores menores tanto para quem toma quanto para os que não tomam café comparados ao vírus C, porém os dados foram estatisticamente

Cette liste des formes les moins employées des chansons (par rapport aux autres émetteurs, bien entendu) suffit à faire apparaitre l’énorme décalage qui les sépare des autres

Extinction with social support is blocked by the protein synthesis inhibitors anisomycin and rapamycin and by the inhibitor of gene expression 5,6-dichloro-1- β-

Como existe uma grande diferença no comportamento da gasolina com etanol e da gasolina pura quando esta entra em contato com a água, sugere-se que, nos trabalhos futuros sejam

Tidal analysis of 29-days time series of elevations and currents for each grid point generated corange and cophase lines as well as the correspondent axes of the current ellipses

Quer isto dizer que também recentemente, apesar da relação terapêutica (terapeuta-paciente) ser considerada um bom preditor de sucesso e eficácia para o resultado das

5.13 A vague region stored as a multiple geometry and an array of membership values.183 5.14 The logical design for the vague spatial attribute with monovalued certitude and