
Performing a Space-Time Analysis to understand drug consumption trough hospital releases and national budget expenses at municipalities level in Mexico


Academic year: 2023


Master Degree Program in Data Science and Advanced Analytics

Performing a Space-Time Analysis to understand drug

consumption trough hospital releases and national budget expenses at municipalities level in Mexico

Edgardo Arellano Juárez

Project Work

presented as partial requirement for obtaining the Master Degree Program in Data Science and Advanced Analytics

NOVA Information Management School

Instituto Superior de Estatística e Gestão de Informação

Universidade Nova de Lisboa

MDSAA

NOVA Information Management School

Instituto Superior de Estatística e Gestão de Informação

Universidade Nova de Lisboa

PERFORMING A SPACE-TIME ANALYSIS TO UNDERSTAND DRUG CONSUMPTION TROUGH HOSPITAL RELEASES AND NATIONAL

BUDGET EXPENSES AT MUNICIPALITIES LEVEL IN MEXICO

by

Edgardo Arellano Juárez (M20200749)

Project Work as partial requirement for obtaining the Master’s degree in Advanced Analytics, with a Specialization in Data Science

Supervisor: Pedro Cabral


November 2022

STATEMENT OF INTEGRITY

I hereby declare having conducted this academic work with integrity. I confirm that I have not used plagiarism or any form of undue use of information or falsification of results along the process leading to its elaboration. I further declare that I have fully acknowledged the Rules of Conduct and Code of Honor from the NOVA Information Management School.

Edgardo Arellano Juárez Saltillo, Coahuila, México

November 29, 2022


DEDICATION

I dedicate this work to my parents, Martin Edgardo Arellano Borrego and Esperanza Juarez Cruz, who gave me their support to fulfill my studies abroad and made all this possible. Also to my brother and sister, Eduardo and Claudia Arellano Juarez, who supported me so that I could focus on my studies, and to all the friends I made through this process in Portugal, Mexico and all around the world. Finally, a last dedication to all the victims of drug trafficking and military force who never had the opportunities that I enjoy now. For all of them, this work and the future ones.


ACKNOWLEDGEMENTS

First of all, I would like to express my gratitude to my supervisor, Pedro Cabral, for his support and patience during these last couple of months. I also thank my core universities, NOVA IMS and ITESM, for facilitating access to different trustworthy sources of academic information.


ABSTRACT

Mexico is internationally recognized for its delicious food, vibrant culture, and its historical problem with drug trafficking, something also well known by Mexican politicians and other decision makers.

Understanding how the drug market is distributed over the country is therefore key to defining how intelligence, military, and public resources should be reallocated when necessary. This project proposes an approach that helps develop that understanding through a spatial analysis of public health and its relation to the economy. The objective of this paper is to confirm a significant spatial autocorrelation of hospital releases and deaths related to drug consumption, and of Federal Budget expenses, across the whole Mexican territory at municipality scale from 2010 to 2020, and to validate whether these variables are linearly correlated. To that end, Geographic Information System tools are used to confirm global and local spatial autocorrelation, using the Local Moran's Index to identify relevant clusters in space and time and Spearman's rank to measure the correlation between variables. The research resulted in the detection and classification of significant clusters and priority states in space and time, which work as a guide for further targeted actions according to each cluster's characteristics regarding the variables of interest. In addition, Spearman's rank showed no linear correlation between the studied variables at different levels, which points to the current disregard of drug abuse records when using and distributing the Federal Budget, an issue that, if not addressed, can perpetuate the presence of drugs in these localities.

KEYWORDS

Drugs; Federal Budget; Spatial Autocorrelation; Mexico; Spearman Correlation; Space-Time Cube


INDEX

1. Introduction

2. Methodology

2.1 Data

2.2 Methods

2.2.1 Space Analysis

2.2.1.1 Spatial Autocorrelation

2.2.1.2 Spatial Clustering Algorithms

2.2.1.3 Space-time Cube

2.2.2 Time Analysis

2.2.2.1 Time Global and Local Entities Trend

2.2.2.2 Time Series Clustering

2.2.3 Space-Time Analysis

2.2.4 Correlation

3. Results and discussion

3.1 Space Analysis

3.2 Time Analysis

3.3 Space-Time Analysis

3.4 Spatial and Time Analysis Results

3.5 Correlation

4. Conclusions

5. Limitations and recommendations for future works

References

Appendix (optional)

Annexes (optional)


LIST OF FIGURES

Figure 1: Percentage of the polled urban students that consumed a specific drug at least once.

Figure 2: Percentage of the polled rural students that consumed a specific drug at least once.

Figure 3: Spatial Relation Conceptualization Comparison.

Figure 4: Space-Time Cube example, signaling Latitude, Longitude and Time dimensions.

Figure 5: Spatial Autocorrelation Report for IDW for Drug Abuse variables.

Figure 6: Clusters and Outliers of the Drug Abuse cases per million people in Mexico from 2010 to 2020.

Figure 7: Moran Scatterplot for Drug Abuse cases per million people in Mexico from 2010 to 2020 Cluster and Outlier Analysis.

Figure 8: Clusters and Outliers of the Overall Budget Expenses per capita in Mexico from 2010 to 2020.

Figure 9: Moran Scatterplot for Overall Budget Expenses per capita in Mexico from 2010 to 2020 Cluster and Outlier Analysis.

Figure 10: Clusters and Outliers of the Security Budget Expenses per capita in Mexico from 2010 to 2020.

Figure 11: Moran Scatterplot for Security Budget Expenses per capita in Mexico from 2010 to 2020 Cluster and Outlier Analysis.

Figure 12: Global Trend for the Drug Abuse data aggregated by year.

Figure 13: Global Trend for the Overall Budget Expense data aggregated by year.

Figure 14: Global Trend for the Security Budget Expenses data aggregated by year.

Figure 15: Time-Series Pseudo-F statistic by Number of Clusters based on Correlation as TS distance for Drug Abuse data.

Figure 16: Average Time Series representing each cluster.

Figure 17: Clusters of the Drug Abuse per million persons in Mexico from 2010 to 2020.

Figure 18: Time-Series Pseudo-F statistic by Number of Clusters based on Correlation as TS distance for Overall Budget Expenses data.

Figure 19: Average Time Series representing each cluster.

Figure 20: Clusters of the Overall Budget Expenses per capita in Mexico from 2010 to 2020.

Figure 21: Time-Series Pseudo-F statistic by Number of Clusters based on Correlation as TS distance for Security Budget Expenses data.

Figure 22: Average Time Series representing each cluster.

Figure 23: Clusters of the Security Budget Expenses per capita in Mexico from 2010 to 2020.

Figure 24: Up/Down Trend of each municipality by Drug Abuse cases in Mexico from 2010 to 2020.

Figure 25: Up/Down Trend of each municipality by Overall Budget Expenses cases in Mexico from 2010 to 2020.

Figure 26: Up/Down Trend of each municipality by Overall Budget Expenses cases in Mexico from 2010 to 2020.

Figure 27: Space-Time Clusters and Outliers of the Drug Abuse per million people in Mexico clustered yearly from 2010 to 2020.

Figure 28: Drug Abuse Multiple Types summarized by Ellipses.

Figure 29: Space-Time Clusters and Outliers of the Overall Budget Expenses per person in Mexico clustered yearly from 2010 to 2020. Generated by ArcGIS Pro.

Figure 30: Overall Budget Expenses Multiple Types summarized by Ellipses.

Figure 31: Space-Time Clusters and Outliers of the Security Budget Expenses per person in Mexico clustered yearly from 2010 to 2020.

Figure 32: Security Budget Expenses Multiple Types summarized by Ellipses.


LIST OF TABLES

Table 1: Cluster and Outlier Analysis parameters comparison.

Table 2: Drug Abuse Cases Information Table.

Table 3: Overall Budget Expenses Information Table.

Table 4: Security Budget Expenses Information Table.

Table 5: Stationarity and White Noise Information Table.

Table 6: States with a presence of a single type of DA time cluster greater than 33%.

Table 7: States with a presence of a single type of cluster, other than 3, greater than 33%.

Table 8: States with a presence of a single type of SBE time cluster greater than 33%.

Table 9: Spearman Correlation Results Table, for Drug Abuse against Overall Budget Expenses and Security Budget Expenses.


LIST OF ABBREVIATIONS AND ACRONYMS

AGCT The Mexican state of Aguascalientes.

BC The Mexican state of Baja California.

BCS The Mexican state of Baja California Sur.

C- If followed by a number means Cluster.

CDMX The Mexican state of Mexico City.

CHHA The Mexican state of Chihuahua.

CHPS The Mexican state of Chiapas.

CLMA The Mexican state of Colima.

COAH The Mexican state of Coahuila de Zaragoza.

DA Drug Abuse.

DNGO The Mexican state of Durango.

GDL The Mexican city of Guadalajara.

GRRO The Mexican state of Guerrero.

GTO The Mexican state of Guanajuato.

-H- Makes reference to a High value or a High neighborhood.

HLGO The Mexican state of Hidalgo.

ISAC Incremental Spatial Autocorrelation.

JAL The Mexican state of Jalisco.

-L- Refers to a Low value or a Low neighborhood.

MEX The Mexican state of Mexico State.

MICH The Mexican state of Michoacan de Ocampo.

MRLS The Mexican state of Morelos.

NL The Mexican state of Nuevo Leon.

NYRT The Mexican state of Nayarit.

OABE Overall Federal Budget Expenses.

OAX The Mexican state of Oaxaca.

PBLA The Mexican state of Puebla.

SBE Public Security Federal Budget Expenses.

SL The Mexican state of San Luis Potosi.

SNLO The Mexican state of Sinaloa.

SNRA The Mexican state of Sonora.

TBSC The Mexican state of Tabasco.

TLAX The Mexican state of Tlaxcala.

TMPS The Mexican state of Tamaulipas.

VERA The Mexican state of Veracruz.

ZCS The Mexican state of Zacatecas.


1. INTRODUCTION

Mexico is internationally recognized for its delicious food, vibrant culture, and its historical problem with drug trafficking. As the closest Latin American neighbor of the world's leading economic power, the United States of America, it represents a great opportunity for organized crime to transport illegal merchandise to the USA, not only drugs and not only from South America, but a variety of products from countries all around the world, taking advantage of the lack of control that persists in both countries (Voz de America - Editorial Staff, 2022). Even though most of the drugs that pass through Mexico end up in the pockets of US citizens, Mexico has become a large market for illegal drug consumption in its own right, with a market whose most conservative income estimates are between 10,000 and 15,000 million dollars per year (El País, 2010). According to the Mexican Health Department, by 2017 almost 10% of all Mexicans, around 12.5 million people, had consumed illegal drugs at least once in their lives (Encuesta Nacional de Consumo de Drogas, Alcohol y Tabaco 2016-2017. Consumo de Drogas: Prevalencias Globales, Tendencias, y Variaciones Estatales, 2017). According to the spokesman of the current Mexican government, Jesus Ramirez Cuevas, 2.2 million are regular consumers, of whom 230,000 are minors, a group that has not only grown by 300% since 2002 but whose average initiation age has also dropped to 10 years (Forbes, 2020). By comparison, the worldwide share of people who used drugs at least once in the previous year is 5.5%, and an estimated 13% of those suffer from drug use disorders, according to the United Nations Office on Drugs and Crime, UNODC (United Nations, 2021).

The drug market in Mexico is one of the main ones in the world. Mexico is the country with the third largest area dedicated to illegal opium poppy cultivation, approximately 21,500 hectares in 2019, behind only Afghanistan and Myanmar. It is also the third largest producer of cannabis herb in the world, with more than 5,000 tons in 2017 (United Nations publication, 2021). Cannabis and opioids are the most popular drugs, and opioids are also considered the most harmful; both facts are presented by the UNODC in its “Global overview about Drug Demand and Supply” report.

To better understand the kinds of drugs that are used in Mexico, Figure 1 and Figure 2 present two charts from a survey performed in 2016 on drug consumption among students (Villatoro Velazquez, 2016).

Drug consumption has become a more tolerated activity over the years, and even though its regulated use may bring benefits to the consumer, drug abuse brings with it several health problems, including lung, heart, kidney, and mental diseases, usually with long-term consequences. It is also common for addicts to have unsafe social contact, including unprotected sex and sharing syringes, activities that increase the chances of acquiring HIV, hepatitis C or other blood-borne or sexually transmitted infections, according to the USA National Institute on Drug Abuse (NIDA, 2022), which also mentions the risk of developing or worsening mental disorders such as anxiety, depression or schizophrenia. Besides the self-damage caused to users, the rivalry among drug mafias over market space and routes causes violence in Mexico to rise in amount and severity. A notable example is 2021, a year that registered 28 homicides for every 100,000 citizens in the country (in comparison, the global rate is only 6). “Semaforo Delictivo”, a non-profit organization, declares that 80% of the homicides are linked to drug trafficking (DW, 2021). It is important to remark on the snowball effect of an unregulated drug market, as this violence has negative effects on the local communities that live with it, transforming a developed market economy into an informal one and reducing household income and the collective capacity of the nation, while increasing unemployment and inequality (Alberto Javier Iñiguez Montiel, 2019).

Figure 1: Percentage of the polled urban students that consumed a specific drug at least once. Data obtained from Villatoro Velazquez, Jorge Ameth, et al. El consumo de drogas en estudiantes de México: tendencias y magnitud del problema, Salud Mental 193-203. 2019 http://www.scielo.org.mx/scielo.php?script=sci_arttext&pid=S0185-33252016000400193&lng=es&tlng=es.

Figure 2: Percentage of the polled rural students that consumed a specific drug at least once. Data obtained from Villatoro Velazquez, Jorge Ameth, et al. El consumo de drogas en estudiantes de México: tendencias y magnitud del problema, Salud Mental 193-203. 2019 http://www.scielo.org.mx/scielo.php?script=sci_arttext&pid=S0185-33252016000400193&lng=es&tlng=es.

The scarce opportunities for economic development, the high levels of drug supply for internal and external markets, the complicity between cartels, dealers, and end consumers (brought about by policies incapable of differentiating between them) (Angles, 2013), and the complicity with some parts of the government have let the market consolidate and expand rapidly. This is why drug trafficking has nowadays become a top priority in Mexico's public security and national security policies.

Before moving on, it is necessary to understand the difference between Public Security and National Security and why drug trafficking is a matter of both. According to Mexico's 2020 National Public Security Strategy and Plan for National Development, national security is defined as the immediate actions focused directly on keeping the integrity and stability of the Mexican state, under the control and supervision of the President, while public security, which is the responsibility of the three powers of the Mexican government, has the goal of preserving people's rights, peace and order by investigating, punishing and preventing crimes and by reintegrating offenders (Centro Nacional de Inteligencia, 2020).

The first definition of the current concept of National Security came after the failed drug war that began in 2007 and the power vacuum it caused. The previous government defined National Security with a plan centered on developing the research, data management and intelligence capabilities of the security authorities to keep the state stable, but as years passed it became obvious that the main priority for resources had become financing the Army to focus on internal security missions, leaving behind the police departments, which made any efficient collaboration between both groups difficult.

The plan of the new government started by slowly returning the increasing number of military bodies deployed around the country to the barracks and building a civilian force, the “National Guard”, to take care of internal security, as well as adapting past intelligence organizations to create the National Intelligence Center. Both institutions are supposed to become the new way to fight a solidly established drug market (Manaut, 2018), (ESPINOZA & Claudia JUAREZ JAIMES, 2019).

Nowadays, the significant advance in the political and economic empowerment of the Army is no secret. For almost two decades, through active combat missions in the streets and intervention in public security, the Army has not only failed to reduce the violence of criminal groups and cartels (Zavala, 2021), but has also committed numerous crimes against the civilian population, from sexual abuse to executions, including hundreds of forced disappearances (Human Rights Watch, 2022). With the recent initiative of President Andres Manuel Lopez Obrador to give control of the National Guard to the institution responsible for the Army, and with 80 times more resources spent on campaigns for tracking and interdicting drugs and criminals (México Unido contra la Delincuencia, 2019) than on addiction prevention or other approaches to reduce internal demand (CIEP, ONC, 2019), a question comes up: is investment in social development and public security being taken into consideration in the fight against drugs? Beyond what is sent to the USA, does it have an impact on the actual consumption of toxic substances by Mexican citizens?

Even though the Army is controlled by SEDENA (the National Defense Department) and its presence persists all over the country, public security and social programs are heavily influenced by the decision makers of every state, city and municipality (depending, of course, on the resources they have). It can therefore be assumed that the drug consumption phenomenon will not be the same in every place. An analysis needs to consider space in order to differentiate one municipality from another, and to consider that the expansion of a local drug market still happens mainly around physical locations and persons: dealers gain new customers by extending their relations with other substance users and by turning the most loyal of these users, or family members, into new dealers that will grow the business (Angles, 2013), a word-of-mouth expansion among close neighbors. Under this logic, Geographic Information Systems (GIS) are significant for the comprehension of this issue and its spatial behavior and, combined with time patterns, may guide an interested party toward data-driven decision making.

GIS tools have been used in many cases to solve social problems, and fighting crime is one of the most popular applications, from tracking graffiti patterns in a neighborhood to discovering delinquency patterns all around a city. A great example is the one from Tempe, Arizona, where the amount of opioid medication prescribed by county was mapped to discover the hotspots of the city and choose where to monitor supply, after an analysis of opioid-related deaths revealed that most of them were caused by prescribed opioids (Tempe Government, n.d.). There are even cases where GIS has helped to explore illegal drug consumption and related crimes, as in Baltimore, USA, where space-time clustering was used to describe drug activity in the city (Linton, 2014), or in China, where drug-related crimes were spatially analyzed to identify clusters in a city and a geographically weighted regression model was built to measure the relation between drug crimes and several urban variables (Jianguo, 2020).

Just as in the rest of the world, in Mexico GIS tools have helped researchers answer questions regarding the spatial presence of drugs and its relation to a given variable. In 2009 Carlos Vilalta began the spatial analysis of drug crimes and consumption with his paper “Local Geography of Drug Dealing: Patterns, Processes & Urban policies recommendation”, where he detected a group of hotspots within the Cuauhtemoc municipality in Mexico City using the Anselin Local Moran's Index (Vilalta Perdomo, 2009). After him, Raul Iglesias, in his research “Further than Drug Dealing: Violence, Forced Displacement and Capitalist dispossession in the Mexican northeast”, explained the relation between drug crimes in the northeast of Mexico, forced displacement and hydrocarbon extraction businesses, also supported by Moran's I and a linear regression model (Iglesias Nieto & Gaussens, 2022). Even before them, a simpler work on Public Security and its relation with drug consumption, based on time series trends and central tendency measures, was produced in 2007 by Mexico's Health Department (Secretaría de Salud, 2014).

Although all of these works have returned valuable information and useful proposals for the government to act on, none of them analyzes the variables jointly in space and time, and they measure any kind of drug crime (whether possession, trafficking, violent crime, etc.) to analyze the supply in specific regions and its effects on society. This project instead seeks to understand not only the impact of public resources in the fight against drugs but also the independent patterns of abusive consumption and of investment through space and time. The objective of this project is to illustrate the usefulness of GIS analysis for decision making by using it to prove and interpret the existence of spatial correlation of the variables of interest across the Mexican territory over a time period, to evaluate the effectiveness of the investment by measuring the correlation between the distribution of drugs and the budget variables, and to help propose spatially targeted actions to fight drug trafficking in the region in further investigations.

This objective will be accomplished by using GIS tools to compare the grouping results of the Anselin Local Moran's Index and the Getis-Ord statistics. The time patterns alone will also be analyzed globally and locally, guided by Mann-Kendall statistics, and clustered by their time series patterns and trends. After this, the ArcGIS Local Outlier Analysis will be employed to understand the space-time patterns of every cluster and entity generated, and a 3D Space-Time Cube will also be created and explored to enhance the comprehension of these patterns. Finally, Spearman's rank correlation will provide information about the relation between the principal study variables: people with drug-caused diseases and Federal Budget expenses. The whole project will be done using municipality data of Mexico from 2010 to 2020, to evaluate whether drug consumption and budget expenses have relevant spatial patterns in the defined space, whether clustered or hot spotted, over this period, and whether they are somehow correlated.


2. METHODOLOGY

2.1. DATA

The United Mexican States has a surface of 1,972,550 km², with mountains, volcanoes, deserts, lakes, rivers and 9,000 km of coast. It is populated by approximately 125,200,000 individuals, who live in one of the 32 Federal Entities (31 states and Mexico City), each under the administration of a Governor. These 32 entities are divided into a total of 2,471 municipalities and more than 180,000 localities (Centro de Estudios Internacionales Gilberto Bosques, 2018), each with different features.

To understand drug abuse, the investment of resources in fighting it, and the relation between these variables, two main features were selected to represent these concepts: patients released from hospitals and the Budget of Expenses of the Federation (PEF), for which data was obtained, analyzed and transformed. The Hospital Releases data is a free-use dataset created by SALUD, the Mexican Health Department, a part of the executive power in charge of illness prevention and the promotion of healthy habits. The data was available on the official government data portal, datos.gob.mx (Secretaría de Salud, 2010-2020). The PEF data was obtained from the National Institute of Statistics and Geography (INEGI), an autonomous body responsible for collecting and disseminating information about Mexico, through its official website inegi.org.mx (INEGI, 2010-2020). All the data was obtained as CSV files grouped by year. The data was extracted for 2010 to 2020 because, even though the PEF information covered more than the selected period, the Hospital Releases did not have records going back that far.

The data contained several features, some of which were not useful for the project's purpose, so to keep the data as organized and clean as possible and to reduce processing time, these features were excluded from the analysis. The relevant features from the Hospital Releases dataset were:

- EGRESO: The date of release from the hospital, in day/month/year format.

- DIAG_INI: A code for the kind of diagnosis the patient received after entering the hospital. Only drug consumption related diagnoses were selected for this dataset.

- CAUSAEXT: A code for the kind of cause that generated the condition of the patient. Only drug consumption related causes were selected for this dataset.

- AFECPRIN: A code for the kind of symptoms that were detected in the patient during the hospital stay. Only drug consumption related symptoms were selected for this dataset.

- MOTEGRE: A number assigned according to the reason for the patient's release. This could be: “Healing”, “Improvement”, “Voluntary”, “Transfer”, “Death”, “Runaway”, “Other”. The “Death” category was excluded from the analysis.

- ENTIDAD: Unique id number for the federal entity. Integer formatted.

- MUNIC: Unique id number for the municipality. Integer formatted.

Each row in this dataset represents a patient who was admitted to a hospital for drug-related causes, so there is no value column to analyze; instead, the value is the count of cases for each municipality and year. This count was divided by the population of each municipality and multiplied by 1,000,000 to represent the number of cases per million inhabitants. This is the value required for the later analyses and was named “VXP_M”.
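As an illustration of this aggregation, the following minimal sketch counts releases per municipality and year and scales the count per million inhabitants. It uses only the Python standard library; the function name, the sample records and the `population` lookup keyed by (entity, municipality) are hypothetical and are not the project's actual code or data.

```python
from collections import Counter
from datetime import datetime


def cases_per_million(releases, population):
    """Return {(entity, municipality, year): cases per million inhabitants}."""
    counts = Counter()
    for row in releases:
        # EGRESO is stored as day/month/year; only the year is used for grouping.
        year = datetime.strptime(row["EGRESO"], "%d/%m/%Y").year
        counts[(row["ENTIDAD"], row["MUNIC"], year)] += 1
    # Divide each count by the municipality population and scale to one million.
    return {
        key: cases / population[key[:2]] * 1_000_000
        for key, cases in counts.items()
    }


# Hypothetical records: three releases in one municipality of 500,000 inhabitants.
releases = [
    {"EGRESO": "05/03/2015", "ENTIDAD": 5, "MUNIC": 30},
    {"EGRESO": "17/08/2015", "ENTIDAD": 5, "MUNIC": 30},
    {"EGRESO": "02/01/2016", "ENTIDAD": 5, "MUNIC": 30},
]
population = {(5, 30): 500_000}  # assumed lookup, not part of the source data layout
vxp_m = cases_per_million(releases, population)
```

Because a single population figure per municipality is used for every year in this sketch, the rate changes over time only through the number of cases.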

To enrich the ailment dataset and represent overall abusive consumption, a second dataset with information about all the deaths in Mexico for the previously established period was obtained from the official government data portal, datos.gob.mx (SALUD, 2010-2020). It was also filtered by diagnoses, symptoms and causes related to illegal drug consumption, and the resulting records were added to the main Hospital Releases dataset, matching the release date and municipality id with the reported date and municipality of death.

For the PEF dataset, the relevant features were:

- DESCRIPCION_CATEGORIA: Cash flow category that references the intended use of the money.

- ANIO: Year when the money was assigned, saved in integer format.

- ID_ENTIDAD: Unique id number for the federal entity. Integer formatted.

- ID_MUNICIPIO: Unique id number for the municipality. Integer formatted.

- VALOR: Amount of budget spent for a certain municipality for a specific year, in Mexican pesos.

In this case there is in fact a value assigned to each municipality for each year, so the only missing step was to calculate a new field that relates the value to the population of each municipality; this new value was named “invxper”. For this project two datasets were created: one that includes all the money invested by municipality (identified by the feature “DESCRIPCION_CATEGORIA” having the value “Total de ingresos”) and another that only considers the expenses on Public Security (identified by the feature “DESCRIPCION_CATEGORIA” having values related to Public Security such as Equipment, Maintenance, Programs, Institutions, Materials, or simply classified as “Seguridad publica”).
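The per-capita step can be sketched as follows, assuming the rows have already been read into dictionaries with the PEF field names. The function name, the sample amounts and the `population` lookup are hypothetical, and the set of Public Security categories is reduced here to the single label “Seguridad publica” because the full list of category labels is not given.

```python
from collections import defaultdict


def budget_per_capita(rows, population, categories):
    """Sum VALOR for the chosen categories and divide by municipality population."""
    totals = defaultdict(float)
    for row in rows:
        if row["DESCRIPCION_CATEGORIA"] in categories:
            key = (row["ID_ENTIDAD"], row["ID_MUNICIPIO"], row["ANIO"])
            totals[key] += row["VALOR"]
    # The result plays the role of "invxper": Mexican pesos per person.
    return {key: total / population[key[:2]] for key, total in totals.items()}


# Hypothetical rows for one municipality of 1,000 inhabitants in 2015.
rows = [
    {"DESCRIPCION_CATEGORIA": "Total de ingresos", "ANIO": 2015,
     "ID_ENTIDAD": 5, "ID_MUNICIPIO": 30, "VALOR": 2_000_000.0},
    {"DESCRIPCION_CATEGORIA": "Seguridad publica", "ANIO": 2015,
     "ID_ENTIDAD": 5, "ID_MUNICIPIO": 30, "VALOR": 150_000.0},
    {"DESCRIPCION_CATEGORIA": "Seguridad publica", "ANIO": 2015,
     "ID_ENTIDAD": 5, "ID_MUNICIPIO": 30, "VALOR": 50_000.0},
]
population = {(5, 30): 1_000}  # assumed lookup keyed by (entity, municipality)

overall = budget_per_capita(rows, population, {"Total de ingresos"})
security = budget_per_capita(rows, population, {"Seguridad publica"})
```

Passing the category set as a parameter lets the same function produce both the overall and the Public Security datasets.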

To create these rate values, the population of each municipality was needed. It was found as part of the geographic coordinates dataset used to match each municipality to a geographic space in ArcGIS. This data was also obtained in CSV format, from developarts.com, an external source that assigned the corresponding latitude and longitude values to each of the localities declared by INEGI (DevelopArts, 2020), which supported the dataset with official information about each locality. The relevant features of this dataset are:

- Mapa: Unique id number for the federal entity, municipality, and locality. String formatted.

- Lat: Latitude value in geographic coordinates.

- Lng: Longitude value in geographic coordinates.

- Poblacion: Municipality total population during 2020. Integer format.

In summary, “VXP_M” consists of 27,159 values, one for each municipality and year of the period, ranging from 0 to 2,754. The budget variable “invxper”, also with 27,159 values, ranges from $0 to $143,238 per person. Given the meaning of the datasets' values, any empty or non-existing value is treated as 0 drug abuse patients, or 0 Mexican pesos assigned to an entity during a particular year.
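One way to apply this zero-filling rule is to build the complete municipality-by-year grid and default every missing combination to 0, as in the minimal sketch below. The function name, the municipality identifiers and the observed values are hypothetical.

```python
from itertools import product


def fill_missing_with_zero(values, municipalities, years):
    """Return a value for every (municipality, year) pair, using 0.0 where none exists."""
    return {
        (mun, year): values.get((mun, year), 0.0)
        for mun, year in product(municipalities, years)
    }


years = range(2010, 2021)            # the 11 yearly periods from 2010 to 2020
municipalities = [(5, 30), (5, 35)]  # hypothetical (entity, municipality) ids
observed = {((5, 30), 2015): 4.0, ((5, 35), 2020): 1.5}

panel = fill_missing_with_zero(observed, municipalities, years)
```

With two municipalities and eleven years the resulting panel has 22 entries, of which only the two observed ones are non-zero.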

Mexico is mainly divided into 32 federal entities, but according to INEGI they vary widely in population and surface: the largest covers 12.6% of the territory while the smallest covers 0.09% of it (INEGI). This division does not reflect the influence that the neighbors of any given entity have on it; therefore, even though 32 states make a good enough sample size, it is better to identify distinctive areas by performing the analysis at municipality scale. Municipal geographic coordinates were not included directly in the Developarts dataset, so the coordinates of the localities belonging to each municipality were grouped by municipality and aggregated by their mean. A yearly approach was taken because trustworthy budget data for each municipality could not be obtained at a finer time scale and because a finer scale was impractical for the hospital releases data: even though the exact date is present for all cases, there would be many zeros from day to day, and even from month to month. With the unit of analysis defined, it is important to consider how the spatial analysis can be done optimally, so each point is represented by its actual municipality surface, thanks to a shapefile of Mexico's municipalities that was loaded into ArcGIS and matched with the corresponding IDs. This map was obtained from the web page of the National Commission for the Knowledge and Use of Biodiversity (CONABIO), conabio.gob.mx (Comisión Nacional para el Conocimiento y Uso de la Biodiversidad, 2012).
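The averaging of locality coordinates into one point per municipality can be sketched as below. The layout of the “Mapa” identifier is an assumption made only for this example (two entity digits, three municipality digits, then the locality digits, so the first five characters identify the municipality); the function name and the sample coordinates are hypothetical as well.

```python
from collections import defaultdict
from statistics import fmean


def municipality_centroids(localities, prefix_len=5):
    """Average the Lat/Lng of all localities that share a municipality prefix."""
    coords = defaultdict(list)
    for loc in localities:
        # ASSUMPTION: the first `prefix_len` characters of Mapa identify the municipality.
        coords[loc["Mapa"][:prefix_len]].append((loc["Lat"], loc["Lng"]))
    return {
        mun: (fmean(lat for lat, _ in pts), fmean(lng for _, lng in pts))
        for mun, pts in coords.items()
    }


# Hypothetical localities: two in one municipality, one in another.
localities = [
    {"Mapa": "050300001", "Lat": 25.0, "Lng": -101.0},
    {"Mapa": "050300002", "Lat": 26.0, "Lng": -100.0},
    {"Mapa": "050350001", "Lat": 25.5, "Lng": -103.5},
]
centroids = municipality_centroids(localities)
```

An unweighted mean treats every locality equally regardless of its population, which matches the aggregation by mean described in the text.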

2.2. METHODS

Cluster analysis is defined by Leonard Kaufman as “the art of finding groups in data”, and these groups are formed in such a way that objects in the same group are similar to each other, while objects in different groups are dissimilar (Kaufman, 1990). This is the main idea of the project: to be able to differentiate the objects that are similar to one group and dissimilar to another in terms of drug abuse and budget spent, using space and time features. To understand the patterns of each variable, different kinds of clustering and relational tools were used in ArcGIS Pro 2.9 and several libraries available for Python 3.9.12. For the sake of organization, the methodology and tools employed are classified into 4 categories: Spatial Analysis, Time Analysis, Space-Time Analysis and Features Correlation.

2.2.1. Space Analysis

2.2.1.1. Spatial Autocorrelation

Before moving on with any grouping technique, it is necessary to evaluate how the values are arranged in space, to quickly check whether clusters can be formed or the data is randomly distributed. With this in mind, the Global Moran’s Index for Spatial Autocorrelation (SAC) is calculated.

“Spatial autocorrelation statistics are the basic statistics for all data capable of being mapped” (Odland, 2020), and one of the most popular algorithms for calculating SAC is the Global Moran’s I, which can be easily calculated using the Spatial Autocorrelation tool in ArcGIS Pro 2.9. Developed by Patrick Alfred Moran, the Global Moran’s Index examines how similar all spatial entities, and their values, are to each other through a cross product, in order to evaluate how dispersed or clustered the set is (Moran, 1950). The ArcGIS tool returns this evaluation as a report with important information such as:

• Moran Index

• Expected Index

• Z score

• P value


• Variance

The expected index value is compared against the observed Moran’s index (which ranges from -1 to 1) to evaluate the null hypothesis “The attribute is randomly dispersed among the space”, generating a Z score and P-value for the whole map. The P-value is then evaluated: if it is lower than 0.05, the null hypothesis is rejected at the 95% confidence level. In that case, a positive Z score classifies the spatial distribution as clustered, while a negative one classifies it as dispersed (Environmental Systems Research Institute, 2018).

Equation 1 is the formula used to calculate the Global Moran’s I:

Equation 1: Global Moran's Index Equation

Where n is the number of regions, w is the spatial weight between two points i and j, 𝑥𝑖 and 𝑥𝑗 are the data values at points i and j, and 𝑥̅ represents the mean of the whole series. In short, the formula can be loosely read as the weighted sum of the cross products of deviations over all pairs of regions, divided by the variance of the data series (Moran, 1950).
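The cross-product reading of the index can be illustrated with a few lines of Python. This is a minimal sketch and not the ArcGIS implementation: it assumes the commonly published form I = (n / S0) · Σi Σj wij (xi − x̄)(xj − x̄) / Σi (xi − x̄)², where S0 is the sum of all weights. The function names and the toy adjacency weights are hypothetical and serve only the example.

```python
def global_morans_i(values, weights):
    """Global Moran's I for a list of values and an n x n weight matrix.

    Assumes the commonly published form of the index; weights[i][j] is the
    spatial weight between entities i and j (the diagonal is ignored).
    """
    n = len(values)
    mean = sum(values) / n
    deviations = [v - mean for v in values]

    s0 = 0.0             # sum of all off-diagonal weights
    cross_product = 0.0  # weighted sum of deviation cross products
    for i in range(n):
        for j in range(n):
            if i != j:
                s0 += weights[i][j]
                cross_product += weights[i][j] * deviations[i] * deviations[j]

    sum_squares = sum(d * d for d in deviations)
    return (n / s0) * (cross_product / sum_squares)


def chain_weights(n):
    """Hypothetical toy weights: entities on a line, adjacent ones are neighbors."""
    return [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]


clustered = global_morans_i([1, 1, 1, 5, 5, 5], chain_weights(6))
dispersed = global_morans_i([1, 5, 1, 5, 1, 5], chain_weights(6))
expected = -1.0 / (6 - 1)  # expected index under the null hypothesis
```

In this toy case the grouped values give an index above the expected value and the alternating values give one below it, which matches the clustered versus dispersed interpretation described above.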

Spatial autocorrelation can be calculated in many ways depending on the spatial relation between the entities. For this project, considering the extent of the country’s surface, the municipalities are not given equal weight because, in the drug market, closer entities have more influence than farther ones. Following this idea, two conceptualizations of spatial relations were chosen: Inverse Distance and Indifference Zone. The first one is an intuitive model of drug market influence, since not only distance but also partially independent government and social characteristics diminish the influence of farther entities. The second one includes a fixed band zone where influence is equal for all the points in it; beyond this band it models the out-of-range entities with an inverse distance relation (ESRI, 2022-a). The band for the Indifference Zone conceptualization is the average distance in meters from each entity to its closest neighbor.

Figure 3 explains how the Inverse Distance and Indifference Zone spatial relations behave.

Figure 3: Spatial Relation Conceptualization Comparison. On the left the Inverse Distance and on the right the Zone of Indifference types of relations.
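The two conceptualizations can be compared with a small Python sketch. It is only an illustration under stated assumptions: the exact decay that ArcGIS applies beyond the band of the Indifference Zone is not given in the text, so the form (band / distance) ** power used here is an assumption chosen to keep the weight continuous at the band edge. All function names and the three sample points are hypothetical.

```python
from math import hypot


def pairwise_distances(points):
    """Euclidean distance matrix for a list of (x, y) coordinates."""
    return [[hypot(px - qx, py - qy) for qx, qy in points] for px, py in points]


def mean_nearest_neighbor_distance(distances):
    """Average distance from each entity to its closest neighbor (the band)."""
    nearest = [
        min(d for j, d in enumerate(row) if j != i)
        for i, row in enumerate(distances)
    ]
    return sum(nearest) / len(nearest)


def inverse_distance_weight(distance, power=1.0):
    """Influence decays with distance; an entity is not its own neighbor."""
    return 0.0 if distance == 0 else 1.0 / distance ** power


def zone_of_indifference_weight(distance, band, power=1.0):
    """Full weight inside the band, decaying weight outside it."""
    if distance == 0:
        return 0.0
    if distance <= band:
        return 1.0
    return (band / distance) ** power  # assumed decay, not the ArcGIS formula


points = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
distances = pairwise_distances(points)
band = mean_nearest_neighbor_distance(distances)
zoi = [[zone_of_indifference_weight(d, band) for d in row] for row in distances]
```

With these points the band is 4/3, so the first two entities influence each other with full weight while the third one, lying outside the band, receives a reduced weight.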

2.2.1.2. Spatial Clustering Algorithms

After a global clustered distribution is confirmed, the next step of the project is to identify where the local clusters are at the municipality level. Two main techniques are explored: Cluster and Outlier Analysis through the Anselin Local Moran’s index, and Hotspot Analysis through the Getis-Ord statistic, both spatial analysis tools in ArcGIS Pro.

The Cluster and Outlier Analysis tool identifies clusters (hot and cold) and outliers using the Anselin Local Moran’s Index, introduced by Luc Anselin as a decomposition of the Global Moran’s Index. The classification is guided by the P-value (95% confidence level) resulting from the Local Moran calculation: an entity with a low enough P-value is considered significantly different from the other entities, and together with the sign of the Z-score it can be determined whether it belongs to a hot or cold spot (positive Z-score, with high or low values respectively) or is an outlier (negative Z-score). Significance is assessed through several permutations in which the original values are randomly rearranged and compared against the observed arrangement, to verify that the original data is clustered. The more permutations, the more accurate the result (ESRI, 2022-b). The Local Anselin Moran’s I formula is referenced as Equation 2:

Equation 2: Local Anselin Moran's Index

Having the same variables as the Global Moran’s I formula, this one can be read as a direct comparison between an evaluated entity and its neighbors, where a high absolute value (whether positive or negative) represents a significant location. Moreover, when 𝑥𝑖 is over the average and the values 𝑥𝑗 of its neighbors are under it, or vice versa (meaning that the neighbors’ values are opposed), 𝐼𝑖 will be negative; otherwise it will be positive (because the values support each other) (Anselin, 1995).
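The sign behavior of the local index and the permutation idea can be sketched in Python. This is a hedged illustration rather than the ArcGIS procedure: it assumes the form I_i = ((x_i − x̄) / S_i²) · Σ_j w_ij (x_j − x̄), with S_i² computed from all entities except i, and a two-sided pseudo P-value from conditional permutations. The function names, the seed and the toy chain weights are hypothetical.

```python
import random


def local_morans_i(values, weights, i):
    """Return (I_i, deviation of entity i, spatial lag of its neighbors)."""
    n = len(values)
    mean = sum(values) / n
    deviations = [v - mean for v in values]
    s2 = sum(deviations[j] ** 2 for j in range(n) if j != i) / (n - 1)
    lag = sum(weights[i][j] * deviations[j] for j in range(n) if j != i)
    return deviations[i] / s2 * lag, deviations[i], lag


def classify(deviation, lag):
    """Cluster or outlier type from the signs of the deviation and the lag."""
    if deviation > 0 and lag > 0:
        return "HH"
    if deviation < 0 and lag < 0:
        return "LL"
    if deviation > 0 and lag < 0:
        return "HL"
    if deviation < 0 and lag > 0:
        return "LH"
    return "NS"


def pseudo_p_value(values, weights, i, permutations=199, seed=0):
    """Share of random rearrangements at least as extreme as the observed I_i."""
    rng = random.Random(seed)
    observed = local_morans_i(values, weights, i)[0]
    others = [v for j, v in enumerate(values) if j != i]
    extreme = 0
    for _ in range(permutations):
        rng.shuffle(others)
        shuffled = others[:i] + [values[i]] + others[i:]  # entity i stays fixed
        simulated = local_morans_i(shuffled, weights, i)[0]
        if abs(simulated) >= abs(observed):
            extreme += 1
    return (extreme + 1) / (permutations + 1)


def chain_weights(n):
    """Hypothetical toy weights: entities on a line, adjacent ones are neighbors."""
    return [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]


hot_values = [10, 10, 10, 1, 1, 1, 1, 1]
hot_i, hot_dev, hot_lag = local_morans_i(hot_values, chain_weights(8), 1)
hot_p = pseudo_p_value(hot_values, chain_weights(8), 1)

outlier_values = [1, 1, 10, 1, 1, 1]
out_i, out_dev, out_lag = local_morans_i(outlier_values, chain_weights(6), 2)
```

The entity surrounded by similar high values obtains a positive index and is labeled HH, while the isolated high value among low neighbors obtains a negative index and is labeled HL.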

As with the Global Moran’s I, the conceptualization of the spatial relations must be defined, so in search of the best distribution 3 spatial relations are considered and compared: Inverse Distance, Indifference Zone and Fixed Distance (the last one considers the entity’s influence only within a specified distance). To determine the Distance Band parameter for these spatial relations, several spatial autocorrelation indexes were calculated for different distances using the ArcGIS tool “Incremental Spatial Autocorrelation”, where a starting distance and an increment step are defined in order to search for the optimal neighborhood size. The tool returns a line chart with Z-scores that reveals the peaks where the clustering behavior is most pronounced (ESRI, 2022-c). To reduce the number of false positives, the False Discovery Rate (FDR) correction method is used; it adjusts the critical P-value for a given confidence level, excluding the P-values that are not significant enough (ESRI, 2022-d).
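The effect of an FDR correction can be illustrated with the Benjamini-Hochberg procedure. This is a sketch under the assumption that a Benjamini-Hochberg style ranking is a reasonable stand-in; the internal procedure of the ArcGIS tool may differ, and the function name and sample P-values are hypothetical.

```python
def fdr_significant(p_values, alpha=0.05):
    """Flag the P-values that remain significant after a Benjamini-Hochberg
    style correction (an assumed stand-in for the FDR option in ArcGIS)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda index: p_values[index])

    # Largest rank whose P-value is below its rank-dependent threshold
    cutoff_rank = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= alpha * rank / m:
            cutoff_rank = rank

    significant = [False] * m
    for rank, index in enumerate(order, start=1):
        if rank <= cutoff_rank:
            significant[index] = True
    return significant


p_values = [0.001, 0.04, 0.03, 0.2]
uncorrected = [p < 0.05 for p in p_values]
corrected = fdr_significant(p_values)
```

In this example three entities would be significant at the uncorrected 0.05 level, but only the strongest one survives the correction.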

The other grouping tool used was the Optimized Hotspot Analysis, which identifies possible hot or cold spots and reports whether each one is classified with 90%, 95% or 99% confidence, based on the resulting P-value for the null hypothesis of spatial independence (ESRI, 2022-e). This tool is based on the Getis-Ord Gi* statistic which, developed by Arthur Getis and J.K. Ord, measures the association of an entity with all other entities within a specific radius. The formula used to calculate the statistic is the following:


Equation 3: Getis-Ord Gi* Equation

Where, as in the Moran’s I calculation, 𝑤𝑖,𝑗 represents the spatial weight between i and j, n is the number of entities considered, 𝑥𝑗 is the value of entity j, and S and 𝑋̅ are the standard deviation and mean of all measurements (Getis & Ord, 1992). Their formulas are presented in Equation 4 and Equation 5:

Equation 4: Formula used to calculate the mean of the entities

Equation 5: Formula used to calculate the standard deviation of the entities

The Getis-Ord Gi* statistic can be negative or positive, which translates into cold or hot spots respectively. The statistic is determined by the values of the surrounding neighbors, all the more so when these are closer in space. Its magnitude is reduced when there is a large deviation among the values within the neighborhood.
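A compact Python sketch can make the sign of the statistic concrete. It assumes the commonly documented form Gi* = (Σ_j w_ij x_j − X̄ Σ_j w_ij) / (S · sqrt((n Σ_j w_ij² − (Σ_j w_ij)²) / (n − 1))) with X̄ = Σ x_j / n and S = sqrt(Σ x_j² / n − X̄²), and binary fixed-band weights in which each entity counts itself. The function names and toy data are hypothetical, and this is not the ArcGIS implementation.

```python
from math import sqrt


def getis_ord_gi_star(values, weights, i):
    """Gi* z-score of entity i (assumed standard form, entity i included)."""
    n = len(values)
    mean = sum(values) / n
    std = sqrt(sum(v * v for v in values) / n - mean ** 2)
    row = weights[i]
    sum_w = sum(row)
    sum_w2 = sum(w * w for w in row)
    numerator = sum(w * x for w, x in zip(row, values)) - mean * sum_w
    denominator = std * sqrt((n * sum_w2 - sum_w ** 2) / (n - 1))
    return numerator / denominator


def fixed_band_weights(positions, band):
    """Binary weights: 1 when two entities are within the band, else 0."""
    return [[1.0 if abs(a - b) <= band else 0.0 for b in positions] for a in positions]


values = [10, 10, 10, 1, 1, 1, 1, 1]
weights = fixed_band_weights(list(range(8)), band=1)
hot = getis_ord_gi_star(values, weights, 1)   # surrounded by high values
cold = getis_ord_gi_star(values, weights, 5)  # surrounded by low values
```

The entity inside the group of high values receives a positive score and the one inside the group of low values a negative score, in line with the hot and cold spot reading above.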

The ArcGIS Hotspot Analysis tool is more than the Getis-Ord statistic alone; it has a set of components capable of identifying the configuration that results in the optimal grouping. First, the amount (more than 30 unique entities) and the variation of the data are evaluated as required characteristics for the analysis, excluding any null values in the analyzed field. Once the data is clean, an Incremental Spatial Autocorrelation process begins (just as performed for the Moran’s I case) to determine the distance band, which works as the threshold that limits the space considered when computing the resulting Z-score; in the optimal case, the fixed distance is one at which every entity has at least 1 neighbor. With these parameters set, the Hotspot Analysis begins, also corrected by FDR (ESRI, 2022-e).

The two approaches may present discrepancies. To evaluate whether these differences are relevant to the analysis, Pearson’s correlation is measured between the absolute values of their respective Z-scores; if the result is lower than 0.8, two additional measures are taken to select the best option for the spatial grouping of the project’s data. The first measure is to count the Z-scores greater than 2.58, which is the score for a two-tailed 99% confidence level; since the confidence level used is 95%, this contrasts how certain each grouping method is when classifying the given entities (Sánchez-Martín, Rengifo, & Blas-Morato, 2019). Second, a new field is created from a t-test of whether every entity’s value (either drug consumption or budget investment) belongs to the main dataset, so outliers are given a lower P-value. This field is then compared via Pearson’s correlation with the resulting P-values of both grouping techniques, showing the influence of the actual values on the clustering allocation. These statistics and charts are produced using the statsmodels library in Python 3.9.12.

2.2.1.3. Space-time Cube

For further analysis it was first necessary to create a Space-Time Cube by defined locations (the municipalities) in ArcGIS Pro. This is simply a 3D netCDF structure in which the values of each entity in a specific year are represented in a 2D map; these maps are stacked one over another, forming a third dimension that represents time and producing a structure like Figure 4 (with the shape of Mexico’s territory in this case).

Figure 4: Space-Time Cube example, signaling Latitude, Longitude and Time dimensions.

Parameters such as the feature of interest, time step, location ID and aggregation method have to be specified. For this case the choice is straightforward: respectively, the value of interest, a yearly time step, the unique identifier of each municipality and a sum aggregation method (which has no effect, as the data already has only one value for each municipality and year combination). Some limitations considered when dealing with the dataset were the maximum of 2,000,000 bins allowed by the tool and the minimum of 10 time steps; neither limitation affected the project. Finally, after the Space-Time Cube is created, a block of information about the characteristics of the cube and the overall behavior of its data is displayed, which makes it possible to validate its usefulness (ESRI, 2022-f).
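The logic of the cube (one bin per location and time step, sum aggregation, missing combinations treated as 0) can be mimicked with a plain Python structure. This is only a conceptual sketch: it does not produce a netCDF file and is not the ArcGIS tool, and the function name, identifiers and values are hypothetical.

```python
def build_space_time_cube(records, location_ids, years):
    """Return {location_id: [value per year]} using sum aggregation.

    records is an iterable of (location_id, year, value); combinations that
    never appear stay at 0, as assumed for empty values in the dataset.
    """
    year_index = {year: position for position, year in enumerate(years)}
    cube = {location_id: [0.0] * len(years) for location_id in location_ids}
    for location_id, year, value in records:
        cube[location_id][year_index[year]] += value
    return cube


years = [2010, 2011, 2012]
location_ids = ["01001", "01002", "01003"]  # hypothetical municipality IDs
records = [("01001", 2010, 5.0), ("01001", 2012, 2.0), ("01002", 2011, 7.0)]

cube = build_space_time_cube(records, location_ids, years)
number_of_bins = len(cube) * len(years)
```

Each list is the time series of one location, and the number of bins is the number of locations multiplied by the number of time steps, which is the quantity the bin limit mentioned above refers to.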

2.2.2. Time Analysis

The time analysis follows three main approaches, each one more specific than the previous: Global Trend Analysis, Entities Trend Analysis and Time Series Clustering.

2.2.2.1. Time Global and Local Entities Trend

In order to have a general overview of the behavior of the data of interest through time, a line chart is created for the sum of all municipality values by year. From it, information about the history of the studied issue can be obtained, such as peaks, troughs, patterns, and trends. This general perspective helps during the breakdown of the data at the municipality level. The previously discussed Space-Time Cube can be studied solely by its temporal characteristics with the Visualize Space-Time Cube in 2D tool. Its Trend Visualization option shows whether a given value has increased or decreased through time using the Mann-Kendall statistic, which compares the sign of the difference between every data point and the earlier ones in order to reject the null hypothesis that there is no monotonic trend within a given time series (ESRI, 2022-g). The Mann-Kendall formula for the sample Z-score is shown in Equation 6:

Equation 6: Mann-Kendall’s score and Variance formulas

Where sgn is a discrete value assigned according to the sign of 𝑥𝑗− 𝑥𝑘, which can take the values 1, 0 or -1; 𝑥𝑗 is the value of the series at time step j, n is the number of data points considered and 𝑡𝑝 represents the number of data values in the p-th tied group (Pacific Northwest National Laboratory, n.d.).
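The score, its variance and the resulting Z-score can be sketched in Python. The sketch assumes the standard formulation S = Σ_{k<j} sgn(x_j − x_k), Var(S) = [n(n − 1)(2n + 5) − Σ_p t_p(t_p − 1)(2t_p + 5)] / 18 and the usual continuity correction for Z; the function name and the sample series are hypothetical.

```python
from collections import Counter
from math import erfc, sqrt


def mann_kendall(series):
    """Return (S, Var(S), Z, two-sided P-value) for a time series."""
    n = len(series)
    s = 0
    for k in range(n - 1):
        for j in range(k + 1, n):
            difference = series[j] - series[k]
            s += (difference > 0) - (difference < 0)  # sgn of the difference

    # Tied groups reduce the variance; groups of size 1 contribute nothing
    tie_sizes = Counter(series).values()
    tie_term = sum(t * (t - 1) * (2 * t + 5) for t in tie_sizes)
    variance = (n * (n - 1) * (2 * n + 5) - tie_term) / 18

    if s > 0:
        z = (s - 1) / sqrt(variance)
    elif s < 0:
        z = (s + 1) / sqrt(variance)
    else:
        z = 0.0  # also covers constant series, where the variance is 0

    p_value = erfc(abs(z) / sqrt(2))  # two-sided, standard normal
    return s, variance, z, p_value


increasing = mann_kendall([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
constant = mann_kendall([0] * 11)
decreasing = mann_kendall([9, 7, 7, 4, 3, 3, 2, 1, 1, 0, 0])
```

A steadily increasing 11-year series yields a large positive Z and a small P-value, whereas a series of zeros, which is frequent at the municipality level, yields Z = 0 and no detectable trend.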

2.2.2.2. Time Series Clustering

Time series can be grouped not only by their increasing or decreasing monotonic trends but also by other types of patterns. Before doing so, the time series have to be examined to understand their chances of being properly clustered and to know what kind of behavior to expect, so the time series are tested to check how many of the 2,471 show stationary or “white noise” behavior.

Python has a library focused on ARIMA forecasting named pmdarima, and one of its functions implements the Augmented Dickey-Fuller (ADF) test, whose null hypothesis states that the evaluated time series has a unit root; the alternative hypothesis is that the time series is stationary, meaning that it has a constant mean and standard deviation over time and is not seasonal (Prabhakaran, 2019). ADF is an augmented version of the Dickey-Fuller test, whose purpose is to identify how much of a time series value is explained by a previous value, as expressed in Equation 7:


Equation 7: Augmented Dickey-Fuller Test Formula

It can be noticed that 𝑐, 𝛽𝑡 and 𝑒𝑡 are terms that alter 𝑦𝑡, so the test is interested in evaluating whether 𝛼=1, which supports the null hypothesis; in that case the ADF function returns a Boolean value of “True”, indicating that the time series has a unit root. Within the same dataset some of the entities’ time series will be stationary and others will not. If a significant number of them are stationary, another verification is done to check how many of the time series are “white noise”.

White noise is a special type of time series formed by random numbers, which makes it unpredictable and makes it hard to find any type of pattern in it. Some conditions must be satisfied in order to call a series “white noise”: having a mean equal to zero, having a standard deviation that does not change over time, and having no autocorrelation between the series and its lagged versions (Fuller, 1976). To check the last condition, the statsmodels library in Python is used. The Ljung-Box test, developed by Greta M. Ljung and George E. P. Box, is a statistic that expresses the autocorrelation in a time series as Equation 8:

Equation 8: Ljung-Box Autocorrelation Test Equation

Where n is the sample size, 𝑟𝑘² is the squared sample autocorrelation at lag k and p is the number of analyzed lags. Q is compared with a chi-squared distribution with h degrees of freedom to evaluate the null hypothesis that the residuals of the time series are not autocorrelated and are therefore independently distributed. A resulting P-value higher than 0.05, together with a mean equal to 0 among the values of the series, identifies the evaluated time series as white noise (C., 2008). The ADF and Ljung-Box (LB) tests are applied to every municipality time series.
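The Q statistic itself is simple enough to be sketched without any library. The code below assumes the standard form Q = n(n + 2) Σ_k r_k² / (n − k), summed over the analyzed lags; it does not compute the P-value, so the comparison uses the chi-squared critical value for 1 degree of freedom at the 0.05 level (3.841). The function names and the sample series are hypothetical; in practice a library routine such as the Ljung-Box test in statsmodels would also return the P-value.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of a series at a given lag."""
    n = len(series)
    mean = sum(series) / n
    denominator = sum((v - mean) ** 2 for v in series)
    if denominator == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    numerator = sum(
        (series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n)
    )
    return numerator / denominator


def ljung_box_q(series, lags):
    """Ljung-Box Q statistic over lags 1..lags (assumed standard form)."""
    n = len(series)
    return n * (n + 2) * sum(
        autocorrelation(series, k) ** 2 / (n - k) for k in range(1, lags + 1)
    )


CHI2_CRITICAL_1_DF = 3.841  # 0.05 level, 1 degree of freedom

trend = list(range(1, 12))  # 11 yearly values with a clear trend
q_trend = ljung_box_q(trend, lags=1)
is_autocorrelated = q_trend > CHI2_CRITICAL_1_DF
```

A trending series exceeds the critical value, so the hypothesis of no autocorrelation is rejected and the series would not be labeled as white noise.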

If the dataset contains enough non-white-noise time series, it is reasonable to cluster the time series by their correlation, that is, their tendency to increase or decrease in value at similar moments, not necessarily by the same amount, which is useful to detect a known waveform in random noise (Ali Alqahtani, 2021). For this, the Time Series Clustering space-time analysis tool in ArcGIS is used; starting from the previously created Space-Time Cube, it groups the time series by similar characteristics in time and returns a 2D map for visual inspection (ESRI, 2022-h).

Time Series Clustering with correlation as the characteristic of interest calculates the difference between each pair of spatial locations by computing their statistical correlation and subtracting it from 1, then summarizes the results in a dissimilarity matrix that guides the K-medoids algorithm, or PAM (Partitioning Around Medoids). The following is a step-by-step description of how the K-medoids algorithm works (Mirkes, 2021):

1) Initialize by randomly selecting k of the locations as the initial medoids.

2) Associate each time series with the closest medoid (using the dissimilarity matrix).

3) For each medoid m, select an associated non-medoid o and compute the cost of o being the new medoid (by adding the dissimilarities of every other point to be associated with it). If the cost is lower, substitute o as the new m.

4) Repeat steps 2 and 3 until convergence.
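The steps above can be sketched in Python on a correlation-based dissimilarity matrix. This is a simplified swap-based variant written for illustration, not the ArcGIS implementation: it tries every non-medoid as a replacement and keeps a swap whenever the total cost decreases. Treating the correlation of a constant series as 0 is an assumption, and all names and sample series are hypothetical.

```python
import random
from math import sqrt


def pearson(a, b):
    """Pearson correlation; returns 0.0 for constant series (an assumption)."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    covariance = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return covariance / sqrt(var_a * var_b)


def correlation_dissimilarity(series_list):
    """Dissimilarity matrix: 1 minus the correlation of each pair of series."""
    return [[1.0 - pearson(a, b) for b in series_list] for a in series_list]


def k_medoids(diss, k, seed=0, max_iter=100):
    """Return (medoid indices, label per series) for a dissimilarity matrix."""
    rng = random.Random(seed)
    n = len(diss)
    medoids = rng.sample(range(n), k)  # step 1: random initial medoids

    def total_cost(candidates):
        # steps 2 and 3: every series is charged its closest medoid
        return sum(min(diss[i][m] for m in candidates) for i in range(n))

    cost = total_cost(medoids)
    for _ in range(max_iter):  # step 4: repeat until no swap helps
        improved = False
        for position in range(k):
            for candidate in range(n):
                if candidate in medoids:
                    continue
                trial = medoids[:position] + [candidate] + medoids[position + 1:]
                trial_cost = total_cost(trial)
                if trial_cost < cost:
                    medoids, cost, improved = trial, trial_cost, True
        if not improved:
            break

    labels = [min(medoids, key=lambda m: diss[i][m]) for i in range(n)]
    return medoids, labels


series = [[1, 2, 3, 4], [2, 4, 6, 8.5], [4, 3, 2, 1], [9, 7, 4, 1]]
medoids, labels = k_medoids(correlation_dissimilarity(series), k=2)
```

The two increasing series end up sharing one medoid and the two decreasing series the other, even though their magnitudes differ, which is the behavior expected from a correlation-based grouping.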

Another clustering option included in ArcGIS is grouping by recurring patterns detected with the Fourier transform, which will be used if most of the time series of the dataset are not stationary. The Fourier transform detects the predominant signals in a time series, which can be described as a combination of sine and cosine functions, each with a weight relative to the others. These weights are the characteristic of interest for the clustering, as time series formed by similar dominant functions are considered similar to each other (ESRI, 2022-i).

This method uses K-means instead of K-medoids as its clustering algorithm; the two are similar but have important variations (ESRI, 2022-i):

1) Initialize by randomly selecting k of the locations as the initial centroids.

2) Associate each time series with the closest centroid. The difference between time series is calculated by summing the squared differences of the weights of their functions.

3) For each centroid, recalculate it as the mean of all time series associated with it, that is, the average of the weights of the basic functions that form them.

4) Repeat steps 2 and 3 until convergence.

For both processes to work, an optimal number of clusters must be selected. As part of the ArcGIS Time Series Clustering tool, the optimal number of clusters is evaluated using the Variance Ratio Criterion (VRC), which, introduced by T. Caliński and J. Harabasz, describes the ratio of between-cluster variance to within-cluster variance (T. Caliński, 2011). Equation 9 is the formula for this calculation.

Equation 9: Variance Ratio Criterion

Knowing that i is the cluster of interest, n the number of observations in it, m the centroid, x the average of the weighted base functions and k the number of clusters for the current trial, it is easy to understand that the VRC indicates how similar time series are to the ones inside their cluster and how different they are from the ones outside it; therefore, the optimal number of clusters is the one that maximizes the VRC. N represents the total number of observations (T. Caliński, 2011). The output of the VRC is not only the optimal number of clusters for the dataset; it also provides the F-statistic for all the other options up to 10 clusters, running each one 10 times to reduce the effects of random initialization.
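How the criterion rewards a good partition can be shown with a short Python sketch. It assumes the usual Caliński-Harabasz form VRC = [SSB / (k − 1)] / [SSW / (N − k)], where SSB is the between-cluster sum of squares and SSW the within-cluster sum of squares. The function name, the points and the two labelings are hypothetical.

```python
def variance_ratio_criterion(points, labels):
    """Pseudo F-statistic of a partition (assumed Calinski-Harabasz form).

    Requires at least 2 clusters and a non-zero within-cluster variation.
    """
    total = len(points)
    dimensions = len(points[0])

    def centroid(rows):
        return [sum(row[d] for row in rows) / len(rows) for d in range(dimensions)]

    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    k = len(clusters)

    overall = centroid(points)
    between = 0.0  # SSB: cluster size times distance of its centroid to the overall one
    within = 0.0   # SSW: distance of every point to its own centroid
    for rows in clusters.values():
        center = centroid(rows)
        between += len(rows) * squared_distance(center, overall)
        within += sum(squared_distance(row, center) for row in rows)

    return (between / (k - 1)) / (within / (total - k))


points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good_partition = variance_ratio_criterion(points, [0, 0, 1, 1])
poor_partition = variance_ratio_criterion(points, [0, 1, 0, 1])
```

The partition that keeps nearby points together scores far higher than the one that mixes them, so choosing the number of clusters with the highest value favors compact and well separated groups.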

The outputs of this tool are very useful. The clustering is easily appreciated in a 2D map and, thanks to the time series chart that pops up after selecting any entity, the similarities between the entity and the average of its cluster through time can also be seen visually. In addition, charts of the average time series per cluster help to identify the different characteristics of the clusters (ESRI, 2022-i).

2.2.3. Space-Time Analysis

Looking at the data from the time or the space perspective separately makes it possible to notice individual patterns within the data, but it cannot be assumed that these two dimensions are unrelated to each other; hence, in order to have a full analysis of the features of interest, the data must also be seen in the joint context of space and time. ArcGIS has tools specially designed for space-time analysis; the one employed in this project is named Local Outlier Analysis.

Local Outlier Analysis identifies locally significant clusters and outliers considering the values of the bins in a given Space-Time Cube, using the Anselin Local Moran’s Index at a 95% confidence level with FDR correction. It is similar to the analysis explained before, applied to each time step, but now the neighborhood also includes the time dimension, and therefore new bins, forming a 3D neighborhood. The neighborhood is defined by two parameters required by the tool, Neighborhood Distance and Neighborhood Time Step (ESRI, 2022-j), in a structure similar to the previously shown Figure 4.

To define the project’s neighborhood time step, only 1 year before and after the evaluated year is used. For the spatial distance, the neighborhood distance obtained from the Incremental Spatial Autocorrelation tool is compared with the distance provided by an adaptation of Silverman’s kernel density formula (Equation 10):

Equation 10: SearchRadius equation (an adaptation of Silverman's Kernel Density Formula)

This altered form of the formula compares the Standard Distance with the scaled median distance 𝐷𝑚 and chooses the lower one. For the Standard Distance calculation, 𝑥𝑖 and 𝑦𝑖 are the coordinates of entity i, while 𝑋̅ and 𝑌̅ represent the center of the entities, and n is the total number of entities (ESRI, 2021). The tool returns a 2D map marking the type of cluster each entity is identified as, which can be one of 6 different classifications: H-H (High bin surrounded by high neighbors), L-L (Low bin surrounded by low neighbors), H-L (High bin surrounded by low neighbors), L-H (Low bin surrounded by high neighbors), Multiple Types (more than one of the previous 4 types during the time period) and Never Significant, all explained in Appendix A (ESRI, 2022-j). The Local Moran's Index is not calculated for the bins in the first time slice. These results are added as a new variable to the Space-Time Cube and can be analyzed one by one when visualizing the 3D representation of the cube, which lets the user, among many other things, identify how the Multiple Types clusters truly behave.

As explained, every entity is classified as a specific type of cluster at every time step, and in the 3D map the patterns of the bins of each municipality can be noticed; these patterns are additionally analyzed and classified following the descriptions in Appendix B.

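$$\text{SearchRadius} = 0.9 \cdot \min\left(\sqrt{\frac{\sum_{i=1}^{n}(x_i-\bar{X})^2}{n}+\frac{\sum_{i=1}^{n}(y_i-\bar{Y})^2}{n}},\ \sqrt{\frac{1}{\ln(2)}}\cdot D_m\right)\cdot n^{-0.2}$$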
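The search radius can be computed directly from the coordinates of the entities, as the following Python sketch shows. It assumes that the first term of the minimum is the Standard Distance, that the second term is sqrt(1 / ln 2) times the median distance 𝐷𝑚, and that 𝐷𝑚 is measured from the mean center of the entities. The function name and the four sample points are hypothetical.

```python
from math import hypot, log, sqrt
from statistics import median


def search_radius(points):
    """Search radius from a list of (x, y) coordinates (assumed formulation)."""
    n = len(points)
    center_x = sum(x for x, _ in points) / n
    center_y = sum(y for _, y in points) / n

    # Standard Distance: spread of the entities around their mean center
    standard_distance = sqrt(
        sum((x - center_x) ** 2 for x, _ in points) / n
        + sum((y - center_y) ** 2 for _, y in points) / n
    )

    # Median distance to the mean center (assumed definition of D_m)
    median_distance = median(hypot(x - center_x, y - center_y) for x, y in points)

    lower = min(standard_distance, sqrt(1 / log(2)) * median_distance)
    return 0.9 * lower * n ** -0.2


corners = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
radius = search_radius(corners)
```

For the four corners of a square both distances equal the half diagonal, so the Standard Distance is the lower term and the radius shrinks only through the n to the power of -0.2 factor.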

2.2.4. Correlation

After applying all these tools, a lead about the relation between the variables may or may not have appeared, but to truly conclude whether an association exists between the two variables, hospital releases and budget expenses (both the entire budget and the part labeled for security), correlation must be calculated.

Pearson’s r and Spearman’s ρ are the most widely used methods to calculate bivariate correlation between quantitative variables, but several assumptions must be made or verified before using them. In Pearson’s case, the data should have an approximately normal distribution, a sample size greater than 30 and a linear relationship (Zaid, 2015). The formula for Pearson’s correlation is Equation 11:

Equation 11: Pearson r correlation index Formula

Where x̅ and y̅ are the means of each analyzed variable and x𝑖 and y𝑖 are the values of the variables for observation i. The resulting Pearson correlation ranges from -1 to 1, where -1 is a perfect inverse association, 1 a perfect direct association and 0 no linear association. To evaluate normality, a Kolmogorov-Smirnov test is run in Python with the scipy library, which compares two distributions and evaluates their similarity.

On the other hand, Spearman’s rank correlation ρ does not require any assumptions about the distribution of the data or its linearity, so it is a good alternative when Pearson’s requirements cannot be fulfilled, or as a validation score (Zaid, 2015). The formula is presented as Equation 12:

Equation 12: Spearman's ρ correlation index Formula

ρ is also evaluated from -1 to 1; d𝑖 is the difference between the ordinal ranks of the two variables for observation i, and n is the size of the sample.
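Both coefficients can be written in a few lines of Python, which also shows why they may disagree. This is a minimal sketch with hypothetical function names and data: the Spearman version uses the rank-difference formula, which is exact only when there are no tied values, and average ranks are assigned to ties as a common convention.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    numerator = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    denominator = sqrt(
        sum((a - mean_x) ** 2 for a in x) * sum((b - mean_y) ** 2 for b in y)
    )
    return numerator / denominator


def ranks(values):
    """Ordinal ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda index: values[index])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            result[order[position]] = average_rank
        start = end + 1
    return result


def spearman_rho(x, y):
    """Spearman correlation from rank differences (exact without ties)."""
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]  # monotonic but not linear
r = pearson_r(x, y)
rho = spearman_rho(x, y)
```

For a monotonic but non-linear relation the rank-based coefficient reaches 1 while Pearson stays slightly below it, which illustrates its use as an alternative or validation score.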

The patterns found within and among the variables are presented in the next sections in order to draw conclusions about the behavior of the studied variables in Mexico.

$$r=\frac{\sum_{i}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i}(x_i-\bar{x})^2\,\sum_{i}(y_i-\bar{y})^2}}$$

$$\rho=1-\frac{6\sum_{i}d_i^2}{n(n^2-1)}$$


3. RESULTS AND DISCUSSION

3.1. SPACE ANALYSIS

The Global Spatial Autocorrelation report detected globally clustered behavior among the entities for both the Inverse Distance and Zone of Indifference types of spatial relations, as Figure 5 shows. The test returns similar results for both the hospital releases and deaths variables and the federal budget expenses variables.

Since a non-random spatial distribution was confirmed, the clustering tools are analyzed and compared in order to begin the spatial analysis. For both grouping algorithms, different parameters were also compared, as shown in Table 1.

Table 1: Cluster and Outlier Analysis parameters comparison.

| Spatial Relation Conceptualization Parameters | Consumption Variables | OA Designated Federal Budget | Security class DFB |
|---|---|---|---|
| IDW (Spatial Lag corr.) | 0.419 | 0.532 | 0.637 |
| ZOI (Spatial Lag corr.) | 0.378 | 0.414 | 0.512 |
| Fixed Dist. (Spatial Lag corr.) | 0.378 | 0.414 | 0.512 |
| Optimal Neigh. Distance | 230,000 | 195,000 | 215,000 |
| Tools Correlation | 81.80% | 84.70% | 84.5% |
| Used tool | C&O with LM | C&O with LM | C&O with LM |

Spatial Relation Conceptualizations were compared not only visually but also by evaluating the correlation of each entity’s value with the Spatial Lag of its neighbors, with the purpose of revealing the influence of the neighbors on an evaluated entity against the influence of the entity’s own value. For the Local Moran’s Index algorithm the results of each run were similar, with the Inverse Distance conceptualization standing out slightly over the Fixed Distance and Zone of Indifference ones. For the G* Index it was not possible to compare the Spatial Lag, but the IDW conceptualization did not return a useful clustering, so the Zone of Indifference was selected for all cases.

Figure 5: Spatial Autocorrelation Report for IDW for Drug Abuse variables. The Report indicates significantly clustered neighbors (right-hand Z-score, low P-value).

As the influence of each variable may differ from one to another, in all cases the Neighborhood Distance parameter was selected using an Incremental Spatial Autocorrelation (ISAC) evaluation, which placed the optimal distance between 275 and 300 km for the Drug Abuse variables, between 195 and 230 km for the Overall Budget Expenses (OABE) and between 215 and 275 km for the adjusted Security Budget Expenses (SBE), according to the Z-score of the test. The lower limit was selected for further calculations. Even though higher Z-scores were returned for larger distances, these were not as optimal relative to the distance increase, or covered an unrealistic surface of the Mexican territory, so the most conservative approach was chosen.

When comparing the results of the LM and G* tools, a Pearson’s correlation over 0.81 was found between them, a high number considering that the Spatial Conceptualizations used are different (IDW for the LM index and Fixed Distance for G*; when the same one is used, the correlation increases substantially), so selecting one over the other does not make a significant difference. Nevertheless, the influence of each municipality’s variables of interest was measured by comparing the P-values of each tool against the probability of the value not belonging to that variable’s sample (a t-test was used for this purpose). The results reinforced the characteristic of G* of not being influenced by the value of interest of the evaluated entity itself, as expected from the index formula. Therefore, for the grouping purposes of this project the ArcGIS tool that relies on the LM Index is used: Cluster and Outlier Analysis. In addition, this tool classifies the clusters and outliers into 4 different significant categories, which provides a better understanding of each one of them.

For the resulting spatial clusters, the variables of interest by municipality were aggregated over the 11 years evaluated. It is important to remark that a different number of clusters was identified for each analyzed variable. For the hospital releases and deaths, 11 relevant clusters were found, which can be observed in Figure 6 and explored in Table 2. To support the interpretation of the clusters’ spatial location, a map of the Mexican states mentioned below, labeled by name, is included as Annex 1.

The clusters are labeled by significance, based on the average P-value obtained for the municipalities forming them. The most significant is the one in the south of Baja California Sur, specifically the city “Los Cabos”, with the lowest P-value, 0.003. The second cluster is formed by municipalities in Baja California, Sonora and Chihuahua. Most of its High-value entities are in the center and west of the region; some others exist on the east side, which is particularly marked by Low-High entities, representing an important drop in hospital releases and deaths compared with the values of their neighbors. Cluster number 3 is the biggest cluster, formed by 1389 Low and High-Low entities; these High-Low entities are not clustered in a relevant manner but are worth attention.

Cluster number 4 is located in the north of Baja California Sur, formed by only 3 High entities, with a P-value of 0.0056. The fifth cluster is in the center of Mexico, formed by Aguascalientes and its surroundings, such as the south of Zacatecas, the northwest of Jalisco, the northeast of Guadalajara, etc. It is formed by 96 High and Low-High entities; most of the High ones are concentrated in the middle of the cluster, which is surrounded on the west and partially on the east by Low-High municipalities. Cluster 8 is located in the north of Yucatan, in the far east of the country; it is formed by a dispersed group of High and Low-High entities and has a P-value of 0.0204. The ninth cluster is formed in the northwest of Coahuila by 9 entities, all of them Low. Clusters 6, 7, 10 and 11 are very small, isolated High clusters, with no more than 4 entities each.

Table 2: Drug Abuse Cases Information Table

| Label | Location | Description | Clusters value | Full value index avg | P-value | N Neighbors |
|---|---|---|---|---|---|---|
| 1 | Los Cabos | Single HH | 1184.21 | 1184.21 | 0.003 | 1 |
| 2 | BC, Sonora and Chihuahua-UP | Big HH with LH and limited by Chihuahua-NW | 1746.11 | 1246.67 | 0.004 | 91 |
| 3 | Center and Tehuantepec Isthmus | Biggest LL with HL | 79.02 | 162.93 | 0.005 | 1389 |
| 4 | BCS-North | Small HH | 720.67 | 720.67 | 0.006 | 3 |
| 5 | Aguascalientes, GDL-N, ZCS-S, JAL-W & SL-UP | Big HH with LH limited by surrounding areas | 872.08 | 546.86 | 0.007 | 96 |
| 6 | CDMX + Neza | Small HH in the middle of C9 | 1967.98 | 1967.98 | 0.011 | 10 |
| 7 | Los Herrera & Melchor Ocampo | Small HH | 826.35 | 826.35 | 0.018 | 2 |
| 8 | Yucatan-N | HH with LH | 861.47 | 561.86 | 0.02 | 36 |
| 9 | Coahuila-NW | LL | 157.538 | 172.05 | 0.021 | 10 |
| 10 | Colima, Villa de Alvarez & Coahuayana | Small HH | 681.38 | 681.38 | 0.023 | 4 |
| 11 | Oaxaca | Single HH | 719.94 | 719.94 | 0.033 | 1 |

Figure 6: Clusters and Outliers of the Drug Abuse cases per million people in Mexico from 2010 to 2020. Generated by ArcGIS Pro, IDW as spatial relations conceptualization. Color symbology according to Appendix A.

All clusters are worth studying, but the top five, which have an average P-value smaller than 0.01, are certainly more significant. It is worth pointing out the cluster values for C-1, C-2, C-6 and C-3 because of their extreme indexes of hospital releases per million people, and also C-2, C-3 and C-5 because of the importance of the outliers that form part of each cluster, which reinforces the idea that those clusters are somehow contained.

The Moran's scatterplot (Figure 7) shows the distribution and correlation of Z-scores and spatial lag, which is positive. The Z-scores range from -0.5 to 18.8, with wide variation in the High-High quadrant. Moran's I is represented by the slope, which is 0.18.
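As an illustration of why the slope of the scatterplot equals Moran's I, the following is a minimal sketch, not the ArcGIS Pro implementation. It assumes row-standardized inverse-distance weights with a hypothetical distance exponent `power=1.0`, and the coordinates and values are invented toy data. The function names `idw_weights`, `moran_scatter` and `quadrant` are illustrative only.

```python
import math


def idw_weights(points, power=1.0):
    """Row-standardized inverse-distance weights; a location never weights itself."""
    weights = []
    for i, p in enumerate(points):
        row = [0.0 if i == j else 1.0 / math.dist(p, q) ** power
               for j, q in enumerate(points)]
        total = sum(row)
        weights.append([w / total for w in row])
    return weights


def moran_scatter(values, weights):
    """Return z-scores, their spatial lags, and the slope of lag on z."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    z = [(v - mean) / std for v in values]
    # Spatial lag: weighted average of the neighbours' z-scores.
    lag = [sum(w * zj for w, zj in zip(row, z)) for row in weights]
    # With row-standardized weights this least-squares slope is global Moran's I.
    slope = sum(zi * li for zi, li in zip(z, lag)) / sum(zi * zi for zi in z)
    return z, lag, slope


def quadrant(z_i, lag_i):
    """Scatterplot quadrant of one location (own value first, neighbours second)."""
    if z_i >= 0:
        return "High-High" if lag_i >= 0 else "High-Low"
    return "Low-High" if lag_i >= 0 else "Low-Low"


# Toy data (assumed): two groups of locations, one with high and one with low values.
points = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (12, 0)]
values = [10, 12, 11, 1, 2, 1]

z, lag, slope = moran_scatter(values, idw_weights(points))
print("Moran's I (slope):", round(slope, 3))
for zi, li in zip(z, lag):
    print(round(zi, 2), round(li, 2), quadrant(zi, li))
```

A positive slope, as in the toy data, means that high values tend to sit next to high values and low values next to low values, which is the pattern that a slope of 0.18 indicates for the drug abuse index.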

For the Overall Budget Expenses data, the Cluster and Outlier Analysis tool also divided the map into 11 clusters, shown in Figure 8 and detailed in Table 3.

The clusters are labeled by their significance, based on the average P-value obtained for every municipality forming them. The first and most significant, with a P-value of 0.0024, is also the largest Low cluster; it is formed by 971 municipalities from 11 states (Guerrero, Mexico, Hidalgo, Veracruz, Tabasco, Chiapas, Puebla, Morelos, Tlaxcala, Mexico City, and the North of Oaxaca), with both Low and some High-Low entities. Even though it is not as big, these low values of OABE partially coincide with those in cluster 3 of the Drug Consumption variables map. The second cluster, located in the Northeast of the country, is composed of Nuevo Leon, Tamaulipas and Coahuila; it is a high-value cluster surrounded by Low-High entities with its center in Nuevo Leon, and it has an average P-value of 0.0026. It may be important to note that the biggest L-H outliers, located in the Northwest of Coahuila, match C-9 of the Drug abuse variables map. Cluster number 3 is located in the Northwest of the country and extends over 3 states. It is a high cluster with a clear division between its High and Low-High entities.
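The labeling rule described above (rank clusters by the average P-value of their municipalities) can be sketched as follows. This is only an illustration under stated assumptions: the input layout of `(cluster_id, p_value)` pairs, the function name `rank_clusters` and the sample values are hypothetical and are not taken from the project's data.

```python
from collections import defaultdict


def rank_clusters(records):
    """Rank clusters by the mean p-value of their member municipalities.

    records: iterable of (cluster_id, p_value), one entry per municipality.
    Returns a list of (label, cluster_id, mean_p_value, n_municipalities),
    where label 1 is the most significant cluster (smallest mean p-value).
    """
    groups = defaultdict(list)
    for cluster_id, p_value in records:
        groups[cluster_id].append(p_value)

    summary = [(cid, sum(ps) / len(ps), len(ps)) for cid, ps in groups.items()]
    summary.sort(key=lambda item: item[1])  # smallest mean p-value first

    return [(label, cid, mean_p, n)
            for label, (cid, mean_p, n) in enumerate(summary, start=1)]


# Hypothetical sample: three municipalities in one cluster, two in another.
sample = [
    ("south", 0.001), ("south", 0.003), ("south", 0.002),
    ("northeast", 0.02), ("northeast", 0.04),
]
for label, cid, mean_p, n in rank_clusters(sample):
    print(label, cid, round(mean_p, 4), n)
```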

Figure 7: Moran Scatterplot for Drug Abuse cases per million people in Mexico from 2010 to 2020, Cluster and Outlier Analysis. Shows a comparison of positive and negative Z-scores and spatial lags. (Quadrant labels: Low-Low, Low-High, High-Low, High-High.)
