• Nenhum resultado encontrado

Migration of relational databases to NoSQL - methods of analysis

N/A
N/A
Protected

Academic year: 2021

Share "Migration of relational databases to NoSQL - methods of analysis"

Copied!
9
0
0

Texto

(1)

Deposited in Repositório ISCTE-IUL:

2018-07-16

Deposited version:

Post-print

Peer-review status of attached file:

Peer-reviewed

Citation for published item:

Oliveira, F., Oliveira, A. & Alturas, B. (2017). Migration of relational databases to NoSQL - methods of

analysis. In G. Senatore, M. Cilento & M. Cara (Ed.), 7th International Conference on Human

and Social Sciences. (pp. 121-128). Barcelona: Richtmann Publishing.

Further information on publisher's website:

--Publisher's copyright statement:

This is the peer reviewed version of the following article: Oliveira, F., Oliveira, A. & Alturas, B.

(2017). Migration of relational databases to NoSQL - methods of analysis. In G. Senatore, M. Cilento

& M. Cara (Ed.), 7th International Conference on Human and Social Sciences. (pp. 121-128).

Barcelona: Richtmann Publishing.. This article may be used for non-commercial purposes in

accordance with the Publisher's Terms and Conditions for self-archiving.

Use policy

Creative Commons CC BY 4.0

The full-text may be used and/or reproduced, and given to third parties in any format or medium, without prior permission or charge, for personal research or study, educational, or not-for-profit purposes provided that:

• a full bibliographic reference is made to the original source • a link is made to the metadata record in the Repository • the full-text is not changed in any way

The full-text must not be sold in any format or medium without the formal permission of the copyright holders. Serviços de Informação e Documentação, Instituto Universitário de Lisboa (ISCTE-IUL)

(2)

1

Migration of Relational Databases to NoSQL - Methods of Analysis

2

3

Fábio Oliveira

4

5

Instituto Universitário de Lisboa (ISCTE-IUL),

6

Lisboa, Portugal

7

8

Abílio Oliveira

9

10

Instituto Universitário de Lisboa (ISCTE-IUL), and

11

Information Sciences,Technologies and Archictecture

12

Research Centre (ISTAR-IUL), Lisboa, Portugal

13

14

Bráulio Alturas

15

16

Instituto Universitário de Lisboa (ISCTE-IUL), and

17

Information Sciences,Technologies and Archictecture

18

Research Centre (ISTAR-IUL), Lisboa, Portugal

19

20

21

Abstract

22

23

The amount of data to store, organize and manage in any organization, is very high and increases every

24

day, fact well-known by companies as Facebook, Google or SAS. With this current growth rate,

25

technologies must adapt to the amount of disposable data, and a new approach to information

26

processing is required. Big Data technologies are more focused, and this is a reason for a greater

27

spread of NoSQL database models. The purpose of this article is to validate the existing (and already

28

used) migration methods and to adapt them, to understand the most efficient method to migrate a

29

relational database to a NoSQL database. We will show the methodology used and what were the steps

30

followed for the implementation, as well as the configuration of the environment used during the tests.

31

Results show that in this migration process, the most efficient method is what is referred to as automatic

32

offline migration. However, it requires a window of unavailability greater than the method of online

33

migration, which in turn requires more resources from the operating system to migrate. Therefore, the

34

most efficient method to migrate a database will depend on the application availability, and the

35

computational resources available for it. We hope to make an important contribution in helping to choose

36

a migration method to use, and the metrics that can be collected to better evaluate the performance of a

37

migration.

38

39

Keywords: Migration, Methods, Metrics, Relational databases, NoSQL.

40

41

42

1. Introduction

43

44

Nowadays there seems to be data everywhere. Data that is organized and structured in

45

information. How do I save, manage, transfer, and use them in a timely manner in a variety of

46

contexts, particularly in large organizations or enterprises? It is estimated that the volume of data is

47

growing 40% per year, and is expected to grow 44 times between 2009 and 2020 (Manyika et al.,

48

2011).

49

Given the number of databases built based on the relational model, still predominant, it

50

exemplifies and contextualizes the possibility of migration to the NoSQL (Not Only Structured Query

51

Language) model, which are gradually adopted in medium and large companies.

52

(3)

1.1 Relational database:

58

59

In 1985, Edgar Frank Codd published an article where he defined thirteen rules for a DBMS to be

60

considered relational: (Codd, 1982, 1990)

61

1. Fundamental Rule

62

2. Rule of information

63

3. Access guarantee rule

64

4. Systematic treatment of null values

65

5. Online dynamic catalogue based on relational model

66

6. Comprehensive sub-language rule

67

7. View Update Rule

68

8. Inserting, updating, and deleting high level

69

9. Independence of physical data

70

10. Logical independence of data

71

11. Independence of integrity

72

12. Independence of distribution

73

13. Rule of non-subversion.

74

Also, every relational database must guarantee four characteristics in a transactional, that is

75

known as ACID:

76

Atomicity: All the tasks of a transaction are executed or none of them are executed.

77

Consistency: The operation takes the database from one consistent state to another equally

78

consistent.

79

Insulation: The effect of a transaction is not visible to other transactions until it is committed.

80

Durability: Changes made by committed transactions are permanent.

81

All the above features had a cost, this cost ends up generating a cost so that they are

82

guaranteed, and this is necessary for the correct maintenance of the above materials.

83

84

1.2 NoSQL

85

86

The NoSQL are no relational databases and appeared as a solution of scalability to the relational

87

model, because these do not follow the consistency ACID, but the CAP Theorem (Moniruzzaman &

88

Hossain, 2013).

89

The CAP Theorem:

90

C – Consistency

91

A – Availability

92

P – Network Partition

93

BASE Property:

94

BA – Basically Available

95

S – Soft-State

96

E – Eventually Consistent

97

98

1.3 Migration of relational databases to NoSQL

99

100

Figure 1 lists some of the projects that have undergone migration from the relational model to the

101

NoSQL model.

102

103

104

(4)

105

106

Figure 1: Migrated databases from relational to NoSQL.

107

Source: (MongoDB, 2015)

108

109

In that article, we can see that there is no information about the migration methods, about the

110

scenario configuration, the machine configuration or same the metrics collected during the

111

execution, so we cannot have an investigation under this situation. We propose here a valid method

112

that we can measure the migration a have same conclusion as performance, between other.

113

114

1.4 Migrated scenarios

115

116

Figure 2 lists the scenarios that we have used during our labs tests.

117

118

119

120

Figure 2: Migrated base Infrastructure:

121

122

123

 Scenario OLTP - On-line Transaction Processing - TPC-C.

124

This scenario is compose basically by sort transaction and a complete reference can be

125

obtain in the official documentation in tpc.org (TPC, 1992a).

126

 Scenario OLAP - On-line Analytical Processing - TPC-H.

127

Here we have a data warehouse scenario, where there is aggregation and load process

128

we also can check this one in the tpc.org (TPC, 1992b).

129

 Scenario HTAP - Hybrid Transactional Analytical Processing - TPC-C + TPC-H.

130

This method is the previous two scenarios working together, a complete description is

131

(5)

138

139

Figure 3: Migrated method by mongoDB:

140

Source: (MongoDB, 2015)

141

142

Our migration methods are based on the original documentation, we also include a new method,

143

this one using online our skill to create a single migration code.

144

 Online migration with continuous synchronization.

145

In this method, the application is always working and connected to the database, using the

146

database resource in parallel with the migration.

147

 Offline migration through tool.

148

Here we have the application completely down e all the computation power is used by the

149

migration database and software.

150

 Manual offline migration through scripting.

151

Also, we have the application completely down e all the computation power is used by the

152

migration database and script executed by us.

153

154

1.6 Migration metrics

155

156

Table 1 lists the usage metrics in this work. We have collected a series of metrics that can provide

157

us to answers a set of questions about the performance and the migration times.

158

159

Table 1: Migration metrics

160

161

Metric Main References

Baseline (Yaqub, 2012)

CPU (Rodrigues, 2009; Yaqub, 2012)

Disk(I/O) (Rodrigues, 2009; Yaqub, 2012) Network (Rodrigues, 2009; Yaqub, 2012)

Memory added by the researcher

Latency added by the researcher

(6)

2. Main Objective

162

163

Emulate and analyze a database migration from relational database to NoSQL database, and with

164

that result, we can answer what is the most efficient method for migrating a relational database to a

165

NoSQL database?

166

167

3. Method

168

169

We have a case study of a qualitative nature, in which researchers define the starting point

170

according to their own experience, or situations related to their practical life. Although in a case

171

study different data collection techniques can be used, in this research we privilege the technique of

172

observation, since we submit the data to tests and observe the results from it.

173

We carried out an extensive qualitative study with a sample of 9 cases of migration, where we

174

started with three application scenarios to migrate using three different migration methods. In all 9

175

cases metrics were collected and the windows and migration were defined and which processes

176

would be running in each window, so that with this division we could measure and analyze the

177

proposed metrics.

178

179

4. Results

180

181

In summary, we have verified the performance of three possible migration methods between

182

relational and NoSQL databases, using a set of metrics that provide us with a detailed view of

183

resource consumption as well as details about the migration process.

184

From the theoretical point of view, the main contribution of this study is to show how methods

185

of migrating relational databases to NoSQL and their associated metrics can be used, since these

186

methods can be applied in several DBMS systems, not being linked only to this work, or to the

187

software used here.

188

189

4.1 Identify the requirements and the various phases of a migration:

190

191

After the state of the art survey, several points were collected during a migration from relational

192

database to NoSQL, based on other studies, as well as on processes that suppliers or software

193

maintainers indicate as the way forward - being able to consult the reference in the listing below.

194

The following points were gathered to be validated as requirements for a migration, coming from

195

some research done, and others added by the author:

196

 Planning (Antaño, Castro, & Valencia, 2014; C. S. de Oliveira & Marcelino, 2012)

197

 Number of records / Initial situation (Antaño et al., 2014)

198

 Mapping the data types (Davenport & Dyche, 2013; Gomes, 2011; Pereira, 2014)

199

Restrictions and triggering (Antaño et al., 2014)

200

 Character encoding (Antaño et al., 2014; Neto, Neto, Junior, & Oliveira, 2013)

201

 Tests (added by the researcher)

202

 Implementation (added by the researcher)

203

 Partial and Final Monitoring / Validation (added by the researcher)

204

 Staging area / No staging area (added by the researcher)

205

 Failures during migration (added by the researcher)

206

 Data modelling (Gomes, 2011)

207

We can divide the migration method in phases, as there is four phases for the online method

208

and two for the offline methods.

209

(7)

212

213

Figure 4: Phases of the online migration with continuous synchronization method:

214

215

216

217

Figure 5: Phases of the offline migration through tool method:

218

219

220

221

Figure 6: Phases of the manual offline migration through scripting method:

222

223

4.2 Check the possible methods of migrating from a relational database to a NoSQL.

224

225

To carry out a literature review, to research in theses of the area, to validate the documentation of

226

the main software developers of the thematic area and to filter those that best assist in the framing

227

of this work. With that done, we worked with three migration methods as describe in the chapter

228

“1.5 Migration methods”.

229

(8)

4.3 Evaluate each of these methods according to the steps followed in each phase of the

231

migration - using specific and appropriate metrics in each case.

232

233

In order to respond to this objective, we analyzed the metrics collected in each of the phases, a

234

small sample of these metrics will be visualized here, and however, all the metrics can be analyzed

235

in the work done in (F. V. de Oliveira, 2017).

236

In figure 7 we can see the CPU consumption during the first phase of the OLTP scenario

237

migration through the online migration method.

238

239

240

241

In this metric, we can clearly see the amount of CPU consumed by the application, which is

242

identified by the "CPU% user" event, and we can also see the amount of CPU consumed by the

243

operating system, the amount of CPU responsible for the I / O used.

244

In figure 8 we have the disk activity, that is, how much writing activity and how much reading

245

activity we have in this phase of the migration, and here we can notice the reading behavior.

246

247

248

249

4.4 Compare the various methods according to the established metrics.

250

251

We can compare each phase against any other method in the same phase, with that, we can

252

evaluate which phase consume more resource in any phase, of course, comparing the same phase

253

(9)

method. However, this requires a large window of unavailability, and if it is not possible to stop the

260

application or even the database during the migration process, the online method will be the one

261

indicated. To answer this question, we must first ask the following question:

262

263

"What window of unavailability is available for the migration?"

264

265

5. Conclusions

266

267

We have verified the performance of three possible migration methods between relational and

268

NoSQL databases, using a set of metrics that provide us with a detailed view of resource

269

consumption as well as details about the migration process.

270

From the theoretical point of view, the major contribution of this study is to show how methods

271

of migrating relational databases to NoSQL, and their associated metrics can be used, since these

272

methods can be applied in several systems.

273

We can thus respond, after applying and analysing metrics in migration cases and migration

274

methods proposed here, that the most efficient migration method is the automatic offline migration

275

method. However, this requires a large window of unavailability, and if it is not possible to stop the

276

application or even the database during the migration process, the online method will be the one

277

indicated. With this conclusion, we understand that it is always necessary to validate resources and

278

unavailability as requirements of the project and not the migration itself, because it can happen

279

without or with unavailability, however, with direct reflection in migration times.

280

281

References

282

283

Antaño, A. C. M., Castro, J. M. M., & Valencia, R. E. C. (2014). Migracion de Bases de Datos SQL a NoSQL.

284

Revista Tlamati, Especial 3, 144–148.

285

Codd, E. F. (1982). Relational database: a practical foundation for productivity. Communications of the ACM, 25(2),

286

109–117.

287

Codd, E. F. (1990). The relational model for database management: version 2. United States of America:

Addison-288

Wesley.

289

Davenport, T. H., & Dyche, J. (2013). Big Data in Big Companies. International Institute for Analytics.

290

Gomes, P. F. L. (2011). Migração de aplicações legadas para bases de dados NoSQL. Universidade do Minho.

291

Manyika, J., Chui, M., Brown, B., Bughin, J., Dobbs, R., Roxburgh, C., & Byers, A. H. (2011). Big data: The next

292

frontier for innovation, competition, and productivity. Retrieved January 9, 2016, from

293

https://www.mckinsey.com/business-functions/digital-mckinsey/our-insights/big-data-the-next-frontier-for-294

innovation

295

MongoDB. (2015). RDBMS to MongoDB Migration Guide. A MongoDB White Paper, 16.

296

Moniruzzaman, A. B. M., & Hossain, S. A. (2013). NoSQL Database : New Era of Databases for Big data Analytics-

297

Classification , Characteristics and Comparison. International Journal of Database Theory and Application,

298

6(4), 14.

299

Neto, P. de A. dos santos, Neto, J. R., Junior, F. das C. R., & Oliveira, P. A. (2013). Requisitos para ferramentas de

300

Migração de Dados. In IX Simpósio Brasileiro de Sistemas de Informação (pp. 887–898). Teresina.

301

Oliveira, C. S. de, & Marcelino, M. A. (2012). Metodologias e Extratégias de Migração de Dados. Sinergia

302

(CEFETSP), 13(3), 183–191. Retrieved from www2.ifsp.edu.br/edu/prp/sinergia

303

Oliveira, F. V. de. (2017). Migração de bases de dados relacionais para NoSQL - Métodos de Análise. ISCTE-IUL

304

Instituto Universitário de Lisboa.

305

Pereira, D. J. P. (2014). Armazens de dados em bases de dados NoSQL. Instituto Superior de Engenharia do

306

Porto.

307

Rodrigues, R. A. B. (2009). Métricas e Ferramentas Livres para Análise de Capacidade em Servidores Linux.

308

Universidade Federal de Lavras.

309

TPC. (1992a). TPC-C is an On-Line Transaction Processing Benchmark. Retrieved May 29, 2017, from

310

http://www.tpc.org/tpcc/

311

TPC. (1992b). TPC-H is a Decision Support Benchmark. Retrieved January 11, 2016, from http://www.tpc.org/tpch/

312

Wikipedia. (2017). Hybrid Transactional/Analytical Processing (HTAP). Retrieved June 22, 2017, from

313

https://en.wikipedia.org/wiki/Hybrid_Transactional/Analytical_Processing_(HTAP)

314

Yaqub, N. (2012). Comparison of Virtualization Performance: VMWare and KVM. Signal Processing. Univesity of

315

Oslo.

316

Imagem

Figure 2: Migrated base Infrastructure:
Table 1 lists the usage metrics in this work. We have collected a series of metrics that can provide
Figure 4: Phases of the online migration with continuous synchronization method:

Referências

Documentos relacionados

Repare o que sucede na maioria das vezes e em Portugal cada vez mais é que os edifícios são postos á praça ou são entregues para ser construídos sem se ter a ideia de quanto é

Environmental factors recorded by RHS at reach scale such as the presence and complexity of riparian galleries, habitat and flow heterogeneity, are particularly important drivers

Segundo a Assembléia Geral da Organização das Nações Unidas - ONU, no documento “Normas sobre a Equiparação de Oportunidades para Pessoas com Deficiência”

Neste artigo, sem nenhuma pretensão biográfica, faço uma análise de alguns elementos que influenciaram o pensamento e as ações de Omar Catunda para desenvolver um sistema edu-

Uma maneira de evidenciar sua responsabilidade e preocupação relacionada à questão social e ambiental está no Relatório de Administração (RA), instrumento de

(2017) examined the associations of smoking-related DNA methylation biomarkers and frailty in a population of older adults classified according to FI, and observed that

Dois dos três ovínos aparentemente sadios, da mesma região, têm valores subdeficientes de cobre (n.° 594 e 603). Os valores de cobalto obtidos em tôdas as amostras de

39 ) Liderança e Chefia destacou-se sobremaneira como tema preferido, demonstrando mais uma vez a busca de auto-íirna- ção do homem através de suas realizações,