Origin, Diversity and Selection Patterns of South American Chickens

Agusto R. Luzuriaga Neira

Doctoral Programme in Biodiversity, Genetics and Evolution

Department of Biology

2018

Supervisor

Albano Beja-Pereira, Principal Investigator

Preliminary note

In preparing this thesis, and under the terms of paragraph 2 of Article 4 of the General Regulation of Third Cycle Studies of the Universidade do Porto and of Article 31 of Decree-Law 74/2006 of 24 March, as amended by Decree-Law 230/2009 of 14 September, full use was made of a coherent set of research works already published or submitted for publication in indexed, peer-reviewed international journals, which make up some of the chapters of this thesis. Since these works were carried out in collaboration with other authors, the candidate clarifies that, in all of them, he participated actively in their conception, in data collection and analysis, and in the discussion of results, as well as in the preparation of their published form.

Acknowledgements

The completion of this research would not have been possible without the expertise of the following researchers:

Albano Beja-Pereira and Lucía Pérez-Pardal from CIBIO-InBIO, Universidade do Porto. Michael Miller, Sean O'Rourke and İsmail Sağlam from the Department of Animal Science of the University of California, Davis.

Alejandro Sánchez-Gracia from the Department of Genetics of the Universitat de Barcelona.

I would like to express my special thanks to the academic staff of the Population Genetics School at the University of Veterinary Medicine Vienna for sharing with me their considerable knowledge of population genetics.

To my former co-workers Gustavo Villacís Rivas, Galo Escudero Sánchez and Freddy Cueva Castillo from the Universidad Nacional de Loja.

To my former lab colleagues Rita, Vânia, Nasser and Ricardo. In addition, thank you to all the farmers who facilitated the sampling of their animals, especially AMIBA Portugal and ASOGAL-Loja, which provided the samples from local Portuguese breeds and gamefowl.

Resumo

The domestic chicken is probably the most important source of protein among all livestock species. After being domesticated in South and Southeast Asia, this species spread throughout the world and is now present across the entire planet. Despite its importance for the household economy of rural South American families, little is known about local chicken populations. In a world where the conservation of agrobiodiversity is considered one of the main challenges, nothing is yet known about the genetic diversity and origins of South American chickens.

The origin of the first domestic chickens in South America (SA) is considered by many to be a difficult riddle and has been the subject of intense debate. Two hypotheses have been proposed regarding the origin of the domestic chicken in South America. The first, based on archaeological and human genetic findings, suggests that chickens were brought to South America by Polynesians before the arrival of Columbus; the second suggests that chickens were first introduced into the region by settlers from the Iberian Peninsula, who entered the region after the arrival of Columbus.

This work aimed to contribute to the knowledge of the origins of SA chickens by reconstructing their recent evolutionary history. To that end, mitochondrial and nuclear genome sequencing data were produced and analyzed for six populations representative of the local chickens of South America. For comparison, genomic data from other populations were also used, namely populations representing the centres of chicken domestication (South and Southeast Asia) and the populations from which the first SA chickens are thought to descend (Iberian Peninsula, South Pacific islands).

The large number of studies on chicken mitochondrial genome diversity carried out in the last decade has shown that some mitochondrial haplotypes are associated with particular geographic regions. For this reason, the present study began with the sequencing of an 837-bp fragment of the mitochondrial control region from the SA populations and the two source populations (Iberian Peninsula and Easter Island). Sequence analysis revealed the presence of the most cosmopolitan haplogroups (A, B and E) in South America and the Iberian Peninsula, whereas haplogroup D, characteristic of the South Pacific region, was observed only in the Easter Island population. A deeper analysis of the internal diversity of the haplogroups revealed a total of nine sub-haplogroups in the populations analyzed. Of these, the presence of sub-haplogroup Ea1(b) in the South American populations, previously observed only in East Asian populations, suggests that the extant SA chicken populations may derive from sources other than the Iberian Peninsula.

Subsequently, nuclear genome data were analyzed to characterize the local SA chicken populations genetically. Restriction site-associated DNA sequencing (RADseq) yielded a dataset of 122,801 polymorphic sites (SNPs) distributed across the genome. In this study, data from other populations were also added: commercial egg-layer and broiler breeds, as well as gamefowl. Analysis of these SNPs revealed high levels of genetic diversity in the SA populations and indicated that this diversity results from multiple admixture events between local and commercial populations that occurred over time, more specifically in the last 80 years. However, the admixture signal detected between individuals from local SA populations and the gamefowl population was quite weak. This, together with the large genetic differentiation between the two populations, indicates that gene flow has been sporadic and almost always unidirectional, from gamefowl to local populations. The admixture analyses also revealed admixture with an unknown population (not included in our data) that contributed to the current genetic composition of SA chicken populations. The contribution of this unknown population was also found in the Easter Island population analyzed.

The genetic relationships among the analyzed populations were explored by incorporating individuals representing the centres of origin (China and India), genomic data from the wild ancestor of the chicken (junglefowl), and the same type of data from populations of the South Pacific and Indian Ocean regions. Among the main results is the division of all these populations into six groups. The most basal group in the tree of genetic relationships comprises the wild ancestor from the Chinese and Indonesian populations, as well as gamefowl from China. The inclusion in this group of domestic chickens from regions where the wild species is still present (Indonesia, Bangladesh and China) suggests that they are potential hybrids and demonstrates gene exchange between wild and domestic individuals in these regions. Another group, very close to the previous one, comprises individuals from several regions of Papua New Guinea, Madagascar, Bolivia and Brazil. The cosmopolitan distribution of the individuals in this group, together with the tradition of cockfighting in these places, may indicate that this group represents a gamefowl lineage originating from a wild ancestor in China. A third group is composed mostly of chickens from China and lies at the base of another group composed of some individuals from China, the Iberian Peninsula and commercial broiler breeds. This shows a clear association between commercial broiler breeds and local Chinese chicken breeds, supporting a Chinese origin for these broiler breeds. This group is followed by another comprising commercial egg-layer breeds and several individuals from South America. These two groups, which occupy an intermediate position between the groups that include most of the birds from China and from India (sixth group) and which harbour individuals of the two types of commercial breeds (meat and eggs), represent the influence that commercial chickens have had on local populations in different regions. Finally, the sixth group is the most genetically distant from the wild-ancestor group and contains individuals from India, the South Pacific and SA (including gamefowl). The high differentiation from chickens of Chinese origin and the inclusion of Indian chickens point to this group as representing the domestication thought to have occurred in the Indus Valley. Moreover, the inclusion of SA gamefowl in this group suggests that, despite serving the same purpose (fighting), this population and the Chinese gamefowl population do not share the same origin. Lastly, the position of the South Pacific population in this clade may be associated with religion, namely the expansion of Buddhism and Hinduism through the Pacific islands, which took place in the first half of the first millennium of our era.

Since the populations analyzed are representative not only of great environmental diversity but also of different selection objectives (behaviour, eggs, meat), the genomic data were also used to detect genetic signatures of selection. Using a Bayesian inference method, a total of 892 loci that deviated significantly from neutrality (i.e., outliers) were identified. Given that chickens kept for food (eggs and meat) have generally been selected against aggressive behaviour, unlike those bred for fighting, we focused on the identified loci located in genomic regions previously associated with this type of trait. Two SNPs located in the intergenic region between Dopamine Receptor 2 (DRD2) and Ankyrin Repeat and Kinase Domain Containing 1 (ANKK1) were then chosen for further analysis. Using the population branch statistic (PBS) method, it was confirmed that these two SNPs were under strong positive selective pressure in gamefowl. Further studies of this region may therefore reveal information relevant to understanding the genomic architecture of phenotypes related to aggressive behaviour in chickens. These results also demonstrate the usefulness of RADseq for detecting loci under selection.

In summary, this study showed that (i) the extant South American chicken populations result from different admixture events between individuals of different origins, mainly commercial breeds; (ii) the Indo-European origin of South American populations was confirmed by their genetic proximity to Indian chickens; (iii) commercial broiler breeds originated in China; (iv) their extensive admixture with extant Iberian Peninsula chickens prevents the latter from being classified as representatives of the chickens introduced into SA by Iberian settlers; and (v) the contribution of the South Pacific to the genetic composition of South American chickens cannot be ruled out but requires further study.

Keywords: Gallus gallus, gamefowl, junglefowl, RADseq, genetic diversity, local chickens, selection signatures, admixture, demographic modelling

Summary

Chicken is probably the most important source of protein among all livestock species. After being domesticated in South and Southeast Asia, this species spread across the world and is now present in a wide range of environments. Despite its importance for the household economy of rural South American families, very little is known about local chicken populations. In a world where the conservation of agrobiodiversity is considered one of the main challenges, nothing is known about the genetic diversity and origins of local chicken populations from South America. The origin of the first domestic chickens in South America (SA) is considered by many a difficult riddle and remains the subject of heated debate. Two hypotheses have been proposed: one, based on archaeological and human genetic findings, suggests that chickens were brought to South America by Polynesians before Columbus's arrival; the other suggests that chickens were first introduced into the region by the Iberian colonizers who entered the American continent after Columbus.

In this work, mitochondrial and genome-wide sequencing data were used to characterize six local chicken populations representing South America and to reconstruct their evolutionary history in order to infer their origin. In addition to these local South American populations, populations representing their potential sources and the centres of origin of the chicken were also assessed.

Numerous studies of mitochondrial genome diversity carried out in the last decade have demonstrated that some haplotypes are associated with certain geographic regions. For this reason, we began by sequencing an 837-bp fragment of the control region of the chicken mitochondrial genome from five local SA chicken populations and two potential source populations (the Iberian Peninsula and Easter Island). The sequence analysis revealed the presence of the most cosmopolitan haplogroups (A, B and E) in the SA and Iberian Peninsula populations. Haplogroup D, characteristic of the South Pacific region, was observed only in the Easter Island population. Analysis of the internal diversity of the haplogroups identified nine sub-haplogroups across the analyzed populations. Of these, the presence of sub-haplogroup Ea1(b) in South American populations, previously observed only in East Asian populations, opens the possibility that sources other than the Iberian Peninsula contributed to the origin of the extant SA populations.

The genetic characterization of local South American populations was extended using genome-wide sequencing information. Restriction site-associated DNA sequencing (RADseq) was used to genotype a set of 122,801 SNPs distributed across the genome. Other populations were added for comparison purposes: commercial egg-layer and broiler populations, and gamefowl. The analysis of this set of SNPs revealed high levels of genetic diversity and pointed to multiple admixture events between the local populations and the commercial egg-layer population as the origin of this diversity. Although a relatively weak signal of admixture between local chickens and gamefowl was detected, the large genetic differentiation of the latter population from all the others indicates that gene-flow events were sporadic and mostly unidirectional, from gamefowl to local breeds. Furthermore, the admixture analysis revealed an unknown admixture source contributing to the current SA chicken gene pool that is also associated with the Easter Island population.

The analysis of genetic relationships was then extended by including data from individuals representing the domestication centers (China and India), the wild relative of the domestic chicken (junglefowl), and populations from across the Indian Ocean and South Pacific regions. Among the main findings was the division of all these populations into six major clades. A basal clade comprised the Chinese junglefowl together with individuals from places where the junglefowl still exists (Indonesia and Bangladesh) and the Chinese gamefowl population. Although the latter individuals were identified as domesticated chickens, their presence in the junglefowl clade points to a hybrid origin and demonstrates ongoing gene flow between wild and domestic chickens in those regions.

The clade closest to the basal one comprises individuals from several regions of Papua New Guinea, Madagascar, Bolivia and Brazil. The cosmopolitan distribution of these individuals and the tradition of cockfighting in these places suggest that they may represent a gamefowl clade whose origins are associated with the Chinese junglefowl. A third clade is mostly composed of Chinese chickens and lies at the base of another clade composed of Chinese, broiler and Iberian Peninsula chickens. The clear association of the broilers with the local Chinese chickens supports a Chinese origin of the commercial broiler breeds. Of the remaining two clades, one harbors the commercial egg layers and several SA individuals, and the other, the most distant from the basal clade of the Chinese junglefowl, comprises Indian, South Pacific island and SA chickens, notably the SA gamefowl. While the first of these two clades reflects the influence of modern commercial egg layers on local populations from different regions, the second indicates that the Indian chickens represent an independent origin that might be associated with the Indus Valley as one of the potential centers of chicken domestication. The position of the SA gamefowl, relatively close to the Indian chickens and far from the Chinese gamefowl, indicates that these two types of gamefowl do not share the same origin. Finally, the position of the South Pacific chickens in this clade may be associated with the expansion of Buddhism and Hinduism through the Pacific islands during the first half of the first millennium C.E.

Our genome-wide dataset was also used to detect selection signatures, as our samples represent a wide range of environments and breeding objectives (behavior, eggs, meat). A total of 892 outlier loci (SNPs putatively in regions under selection) were detected using a Bayesian approach. Because our dataset includes a population well known for being selected for aggressive behavior (gamefowl), a trait that is selected against under the other breeding objectives (eggs and meat), we focused on the outliers located in genomic regions previously associated with this type of trait. Two SNPs located in the intergenic region between the Dopamine Receptor 2 (DRD2) and the Ankyrin Repeat and Kinase Domain Containing 1 (ANKK1) genes were chosen for further testing. Using the population branch statistic (PBS), we confirmed that these two SNPs have been under strong positive selection in the gamefowl; further studies of this region may help to understand the genetic architecture of aggressive behavior in chickens. This finding also provides a good example of the usefulness of RADseq data for detecting loci under selection.
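As an illustration of the population branch statistic mentioned above, the following minimal sketch computes PBS for one SNP from three pairwise F_ST values, using the commonly cited definition T = -log(1 - F_ST) and PBS = (T_AB + T_AC - T_BC) / 2. The function names, the clamping of F_ST and the example values are assumptions made for illustration only; the summary does not state which F_ST estimator or which reference populations were used in the thesis.

```python
import math


def branch_length(fst: float) -> float:
    """Convert a pairwise F_ST into the divergence measure T = -log(1 - F_ST)."""
    # Assumption: negative estimates are set to 0 and values near 1 are capped
    # so that the logarithm stays finite.
    fst = min(max(fst, 0.0), 0.999999)
    return -math.log(1.0 - fst)


def pbs(fst_target_ref1: float, fst_target_ref2: float, fst_ref1_ref2: float) -> float:
    """Population branch statistic for the target population at one locus.

    The target could be, for example, gamefowl, with two other populations
    as references (a hypothetical choice, not taken from the thesis).
    """
    t_ab = branch_length(fst_target_ref1)
    t_ac = branch_length(fst_target_ref2)
    t_bc = branch_length(fst_ref1_ref2)
    return (t_ab + t_ac - t_bc) / 2.0


if __name__ == "__main__":
    # Hypothetical F_ST values for a single SNP.
    print(pbs(0.5, 0.5, 0.0))
```

A large PBS value means that the allele frequency change is concentrated on the branch leading to the target population, which is the pattern expected for a locus under positive selection in that population only.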

In summary, our genome-wide dataset allowed us to verify that (i) the extant SA chicken populations resulted from different admixture events with individuals from different origins; (ii) the Indo-European origin of the SA chicken is confirmed by its close genetic proximity to the Indian chicken; (iii) the modern commercial broiler breeds originated in China, and their extensive admixture with the extant Iberian Peninsula population means that modern Iberian chickens no longer qualify as the source of the SA chicken; and finally, (iv) the contribution of the South Pacific population to the SA chicken gene pool cannot be discarded and requires further investigation.

Keywords: Gallus gallus, gamefowl, junglefowl, RADseq, genetic diversity, local chicken, selection signatures, admixture, demographic modelling

Contents

Acknowledgements

Resumo

Summary

Chapter 1: General Introduction

1.1. Origin, Domestication, and Dispersal of the Domestic Chicken

1.2. Importance of Local Chicken Breeds

1.3. Diversity and phylogeographic studies

1.3.1. Mitochondrial DNA

1.3.2. The nuclear genome

1.3.3. Restriction Site-associated DNA Sequencing (RADseq)

1.4. Population Genetics inference using NGS data

1.4.1. Population structure

1.4.2. Demography

1.4.3. Selection

1.5. References

Chapter 2: Objectives and Thesis Structure

2.1. Rationale

2.2. Hypothesis Testing

2.3. Detailed objectives and thesis structure

Chapter 3: Genetic Diversity

3.1. Article I. On the origins and genetic diversity of South American chickens: One step closer

Summary

3.2. Article II. South American chicken populations are a melting pot of genomic diversity

Abstract

Introduction

Materials and Methods

Results

Discussion

References

Chapter 4: Origin and Demographic History

4.1. Article III. The many origins of South American chickens

Abstract

Introduction

Materials and Methods

References

Chapter 5: Detection of Selection Signatures

5.1. Article IV. Genome scan for selection in South America chicken reveals region under selection associated with aggressiveness

Abstract

Introduction

Materials and Methods

Results and Discussion

References

Chapter 6: General Discussion

6.1. Overview

6.2. Genetic Diversity of South American chickens

6.3. Demographic history of the South American chicken population

6.4. Selection Signatures

6.5. References

Chapter 7: Appendices

Appendix A – Supplementary material of chapter 3.1

Appendix B – Supplementary material of chapter 3.2

Appendix C – Supplementary material of chapter 4

Appendix D – Supplementary material of chapter 5

Appendix E – Other Publications

E1. Genetic origin of goat populations in Oman revealed by mitochondrial DNA analysis

List of figures

Figure 1.1.1. Most up-to-date range occurrence points for all four junglefowl species according to the IUCN range

Figure 1.2.1. Poultry meat consumption (kilograms per capita)

Figure 1.3.1. Hierarchical phylogenetic relationships among worldwide distributed haplotypes

Figure 1.3.2. Distribution of ancient mtDNA haplogroups

Figure 1.3.3. Standard protocol used to generate and sequence RADseq markers

Figure 1.4.1. Effect of the demographic perturbations on the SFS

Figure 3.1.1. Haplotype sequence variants from South American and Iberian chickens

Figure 3.1.2. Median-joining network of the d-loop chicken haplotypes found in South American and Iberian chickens

Figure 3.2.1. Nucleotide diversity of South American chicken populations

Figure 3.2.2. Principal component analysis of the local South American populations and putative genetic material sources

Figure 3.2.3. Admixture patterns among South American chicken populations

Figure 4.1.1. Principal component analysis of all the studied populations

Figure 4.1.2. Unrooted NJ tree based on corrected p-distances estimated from 87,452 neutral and unlinked SNPs

Figure 4.1.3. Co-ancestry matrix among worldwide chicken populations

Figure 4.1.4. Out-group f3-statistics analyses (TURKEY; JUNGLEFOWL, NLOCAL population)

Figure 4.1.5. The two best-fitting models for the extant SA chicken populations

Figure 4.1.6. Scheme of the demographic models

Figure 5.1.1. Population structure analysis based on outlier loci

Figure 5.1.2. BayeScan analysis of 122,801 SNP loci from RADseq data obtained for the 10 studied chicken populations

Figure 5.1.3. Population branch statistic (PBS) analysis

Figure 6.2. Overlap between principal component analysis dispersion and D mtDNA haplogroup

Figure 6.4. Single nucleotide polymorphism distribution across the chicken genome

List of tables

Table 3.1.1. Genetic variability found in the studied chicken populations

Table 3.1.2. Measures of genetic differentiation for the studied chicken populations

Table 5.1.1. Candidate genes under selection associated with aggressiveness in gamefowl

List of Abbreviations

aDNA Ancient DNA

A.D. Anno Domini

B.C. Before Christ or Before Common Era

B.P. Before Present (A.D. 1950)

Ca. Circa, approximately

DNA Deoxyribonucleic Acid

FAO Food and Agriculture Organization of the United Nations

Gb Giga base, one billion bases (nucleotides)

GBIF Global Biodiversity Information Facility

GWAS Genome Wide Association Studies

IBP Iberian Peninsula

IUCN International Union for Conservation of Nature

Mb Mega base, one million bases (nucleotides)

mtDNA Mitochondrial DNA

NGS Next Generation Sequencing

OECD Organisation for Economic Co-operation and Development

PCA Principal Component Analysis

PCR Polymerase Chain Reaction

RADseq Restriction Site-Associated DNA Sequencing

RRLs Reduced Representation Libraries

SA South America

SFS Site Frequency Spectrum

SNP Single Nucleotide Polymorphism

1.1. Origin, Domestication, and Dispersal of the Domestic Chicken

The domestic chicken (Gallus gallus) is considered the most numerous domestic species worldwide. It is thought to have originated from the wild red junglefowl (G. gallus; Linnaeus 1758), which inhabits the forests of South and Southeast Asia (Figure 1.1.1). Nonetheless, it is accepted that hybridization can occur where the red junglefowl co-occurs with other junglefowl species, such as the grey junglefowl (G. sonneratii; Temminck 1813), and the latter is therefore expected to have contributed to the gene pool of the extant domestic chicken (Eriksson et al. 2008; Miao et al. 2013). As for the other two Asian junglefowl species, the green junglefowl (G. varius; Shaw 1789) and the Ceylon junglefowl (G. lafayettii; Lesson 1831), which are found in India, Sri Lanka and Southeast Asia, there is not enough evidence to support their contribution to the extant domestic chicken (Ulfah et al. 2016).

Figure 1.1.1. Most up-to-date range occurrence points for all four junglefowl species according to the IUCN range. Occurrence points from GBIF.org for all four junglefowl species according to the IUCN Red List (from Pitt et al. 2016).

Charles Darwin was one of the first to suggest a time and place for the domestication of the chicken. According to Darwin, domestication took place ca. 2,500 B.C. in the Indus Valley (Darwin 1868). At present, based mostly on archaeological data, there is broad consensus that the chicken was domesticated in the Indus Valley (Harappan culture), a view supported by a large body of archaeological evidence (bones, seals depicting cocks, clay statues of chickens) dated to around 2,500 B.C. (Zeuner 1963; Fuller 2006). Archaeological findings have also suggested northern China as another center of origin, with a much earlier domestication (ca. 8,000 B.C.) in this region (West & Zhou 1988; Xiang et al. 2014). However, a reappraisal of the archaeological material and its site context argues that there is not sufficient morphological evidence that the bones came from domestic chickens or their wild ancestors (Deng et al. 2014), and that the number of archaeological sites (two) with "candidate" chicken bones is too small to support the idea that chickens were widely kept and distributed in central and northern China during the early and middle Holocene (Eda et al. 2016).

The motives that compelled Neolithic civilizations to domesticate the junglefowl also remain unclear. Some scholars, relying mostly on the abundance of seals depicting cocks in a fighting position found at the center of the Harappan culture (Mohenjo-Daro, Pakistan), argue that cockfighting was probably the first motivation for chicken domestication (Tixier-Boichard et al. 2011).

The dispersal of chickens from their centers of origin throughout the Old World is thought to have followed the main human migration and trade routes. Two main routes have been proposed for the introduction of chickens into Europe: a northern route, along which chickens spread westward from China via Russia (West & Zhou 1988), and a southwestern route, along which chickens were carried by Phoenician maritime trade (Perry-Gal et al. 2015), arriving in the Iberian Peninsula during the first millennium B.C. (Hernandez-Carrasquilla 1992; Garcia Petit 2005; Davis 2006). It is generally accepted that chickens became common in Europe from the Iron Age, ca. 500 B.C., onwards (Strid 2015). In Africa, the earliest evidence of chickens was found in Egypt and dates to as early as the second millennium B.C. (Redding 2015). However, information for sub-Saharan Africa is scarce, and the few archaeological data available indicate the presence of chicken remains ca. 500–800 A.D. in Mali (Williamson 2000).

The presence of chickens in Oceania is another issue that has been the subject of intense debate (Storey et al. 2008; Fitzpatrick & Callaghan 2009), as this species is thought to have been transported by the Polynesians during their eastward dispersal through the South Pacific archipelagos ca. 1100–1290 A.D. (Storey et al. 2008; Wilmshurst et al. 2011). Although Southeast Asia, owing to its geographical proximity, has been considered the most probable source of the chickens of the South Pacific islands, the archaeological remains of chickens are very scattered and open to multiple interpretations (Tixier-Boichard et al. 2011; Wood et al. 2016).

The arrival of chickens in South America is yet another unsolved riddle. The most straightforward explanation is that settlers from the Iberian Peninsula first introduced chickens into the Americas after Columbus's first contact (1492). In fact, the first written record of the introduction of chickens into the Americas dates from 1495 A.D. and reports chickens taken from the Canary Islands to the island of Hispaniola (Dominican Republic) during Columbus's second voyage (Columbus 2010). Another written source is the report of chickens carried by Portuguese explorers entering Brazil around 1500 A.D. (Teixeira & Papavero 2006). Although a European origin of the extant local South American chicken populations is the best-documented hypothesis, an alternative hypothesis proposes that chickens could have arrived in South America before Columbus. According to this hypothesis, Polynesians reached South America around the fourteenth century A.D. and brought domestic chickens with them (Langdon 1989; Storey et al. 2011). The two strongest pieces of evidence supporting this theory are the radiocarbon dating of chicken bones from the El Arenal archaeological site in Chile to before the Columbian arrival in South America (Storey et al. 2007) and the fact that a famous South American chicken breed, the Araucana, carries the blue eggshell mutation, which is also found in Chinese chickens but not in European chickens (Langdon 1989).

Finally, it is important to mention that chicken production and chicken populations have been extensively and intensively changed by the industrialization that started 60 years ago. Modern poultry production relies on just two types of highly selected and specialized chicken lines (egg and meat). The meat-producing lines, broilers, are all thought to descend from the Plymouth Rock breed, and the egg-laying lines are supposed to descend from the Rhode Island Red breed. The growing ease of transporting eggs and the globalization of intensive farming based on highly productive lines have been contributing to the fast admixture of these lines into local chicken populations (Muir et


1.2. Importance of Local Chicken Breeds

The domestic chicken is considered the most widespread livestock species, and it would not be wrong to think of it as the most abundant bird worldwide. Every year, billions of chickens are raised to feed the relentless demand for animal protein (see Fig. 1.2.1; OECD/FAO 2017). Poultry is the only meat for which consumption is expected to increase in the European Union (by 0.2% annually), where per capita consumption is expected to be around 25 kg by 2030, and it is expected to represent 50% of the increase in global meat consumption (European Commission 2018).

Figure 1.2.1. Poultry meat consumption in kilograms per capita. Source: OECD (2018)

Despite the importance of commercial poultry, backyard poultry plays a very important role in the household economy and nutrition of rural areas, particularly in developing and underdeveloped countries (Wong et al. 2017). Chickens raised in backyard production systems are well adapted and feed mostly by scavenging and foraging for insects and seeds, at no cost to their owners other than a small coop for shelter and nesting. The wide range of phenotypes displayed by local chickens, as well as their adaptation to the environment, suggests that they harbor large genetic diversity. Most of these populations have never been genetically assessed or subjected to any selection program other than the natural selection imposed by the surrounding environment, and thus they might represent a treasure trove of useful information on important traits that may be promising for future production under a changing climate (Leroy et al. 2016). Presently, a large part of these local populations is threatened by uncontrolled crosses with commercial breeds (Davila et al. 2009) that lack the adaptability to survive in extreme environments. This is of concern because local breeds, under a sustainable, environmentally friendly agricultural system, provide economic as well as social and environmental value, especially for people in low-income countries (Leroy et al. 2018).

Local chicken populations from South America are, like most livestock species, the result of successive waves of colonizers that have entered this subcontinent in the last five hundred years (Gautier & Naves 2011). Unlike modern commercial breeds, the local populations have been subjected to low levels of artificial selection but strongly exposed to the natural selection imposed by environmental conditions, which, coupled with different crosses, has resulted in the large phenotypic variability displayed by chickens from this region. However, South America also has a long tradition of cockfighting (Finsterbusch 1990) that can be found from the Caribbean islands all the way down to Argentina. Cockpits are often places of socialization, and gamefowl breeders and owners often hold a respected social status in their communities. For this reason, there are many gamefowl populations that have been selected for their behavior rather than for meat or egg production traits. Therefore, these populations have been kept apart from the successive introductions of chickens into South America and represent a very interesting gene pool to study.

1.3. Diversity and phylogeographic studies

The Polymerase Chain Reaction (PCR), together with automated DNA sequencing technologies, made possible the massive use of molecular genetics in population genetic studies of livestock species in the 1990s and opened new avenues into the evolutionary history of the chicken. The very first genetic studies used sequence analysis of the control region of the mitochondrial DNA to ascertain the origins of the domestic chicken (Fumihito et al. 1994; Fumihito et al. 1996).

In parallel, the poultry breeding research field and the industry took up the then recent developments in molecular genetics, particularly in DNA sequencing, and launched the gene mapping of the chicken (Bitgood & Somes Jr 1993; Burt et al. 1995). Consequently, the sequencing of extensive regions of the genome in search of candidate genes underlying major phenotypic traits started to unveil regions with large variation. Among the highly variable regions of the genome, the abundance of short but highly repetitive DNA loci (i.e., microsatellites) spread throughout the genome attracted the attention of geneticists. Soon, these loci became known for their ease of genotyping and their nearly neutral evolutionary behavior (Goldstein & Schlotterer 1999), and started to be used as genetic markers in population genetics studies of many species, including livestock (Cheng & Crittenden 1994). The popularity of these markers continued to grow in the 1990s, and the scientific community, together with the Food and Agriculture Organization of the United Nations, started to encourage their use to characterize local livestock populations (FAO 1997). Interestingly, the genetic characterization of local chicken populations using microsatellites did not reach the extent achieved for the other major livestock species (e.g., sheep, cattle, goat, pig).

Over the last 20 years, most of the studies using microsatellites to characterize local chicken populations were based on a relatively small and very regional sample of chickens. Most of these works focused on population diversity and structure, with little contribution to the understanding of the phylogeography and recent evolutionary history of the domestic chicken at the worldwide level (Hillel et al. 2003), and the difficulty of merging independently created datasets has prevented the assessment and interpretation of diversity and genetic relationship patterns at a worldwide geographic scale.

1.3.1. Mitochondrial DNA

As the mitochondrial genome is highly abundant in biological samples, relatively straightforward to sequence, and very polymorphic, particularly in its control region, it became the most widely used genome for assessing the evolutionary history of the domestic chicken. In particular, the control region of the mtDNA genome has often been used (i) to identify the ancestor of the domestic chicken (Fumihito et al. 1996), (ii) to identify its centers of origin, and (iii) to reveal the worldwide distribution of the main mitochondrial lineages (Liu et al. 2006). In this respect, Liu and colleagues (2006) proposed the first classification of all the different mtDNA sequences (haplotypes) found at the control region. By then, all the published chicken mtDNA sequences were distilled into nine haplogroups (a haplogroup being a group of similar haplotypes that share a common ancestor), which were named A to I. With the exception of group C, all haplogroups were present in both wild and domestic populations. More recently, the use of next generation sequencing (NGS) made it possible to extend the sequence analysis from the mtDNA control region to the whole mitochondrial genome (Miao et al. 2013) and revealed 14 mitochondrial haplogroups or lineages dispersed across the world (Figure 1.3.1). There, the authors observed that sub-haplogroup E1 is the most widespread worldwide, whereas its sister sub-haplogroups E2 and E3 are restricted to South Asia. Haplogroups A and B also have a wide distribution, and haplogroup D is present in East Africa and in South, Southeast and East Asia, and is the dominant haplogroup in the South Pacific archipelagos (Gongora et al. 2008; Thomson et al. 2014). Haplogroups F and G are restricted to Southwest China, and haplogroup H has been recovered only in Southwest China and Japan. Finally, haplogroups A, B and E are the most frequent in the broiler and egg-layer commercial lines.

Figure 1.3.1. Hierarchical phylogenetic relationships among worldwide distributed haplotypes. Values in parentheses correspond to the sample sizes (from Miao et al. 2013).

mtDNA sequence information was also used to study archaeological samples in an attempt to understand the origins and spread of the domestic chicken in ancient times. The vast majority of those “ancient DNA” works focused on the last couple of millennia and have a narrow geographic representation (Storey et al. 2008; Storey et al. 2011; Storey et al. 2012; Girdland Flink et al. 2014; Thomson et al. 2014), and have shown that the most frequent and cosmopolitan haplogroup (E) was already present in Europe one thousand years ago (Storey et al. 2012). More recently, and controversially (see sub-section 1.1 above), the analysis of archaeological bones radiocarbon dated to around 10,000 years B.P. (Xiang et al. 2014) suggested northern China as one of several regions where the chicken was domesticated.

Ancient DNA was also used to assess the origin of South American chickens. Indeed, the region for which the most aDNA studies have been published is the South Pacific, and all of them include South American chicken data (Storey et al. 2007; Storey et al. 2008; Storey et al. 2011; Storey et al. 2012; Thomson et al. 2014). However, the controversy around the validity of the radiocarbon dating precludes any strong interpretation of the presence of haplogroup D, currently dominant in the South Pacific islands, on the coast of Peru (Figure 1.3.2). In this regard, studies on contemporary samples representing South American chickens have identified haplogroup E as the most frequent in this region (Gongora et al. 2008; Miao et al. 2013; Thomson et al. 2014; Luzuriaga-Neira et al. 2017) and have shown that haplogroup D is indeed the most frequent in the archipelagos of the South Pacific (Gongora et al. 2008; Thomson et al. 2014; Luzuriaga-Neira et al. 2017), where it seems to have been present since historical times (Thomson et al. 2014).

Figure 1.3.2. Distribution of ancient mtDNA haplogroups. Each pie represents the proportion of each haplogroup and the colors represent the haplotype (Storey et al. 2012).

1.3.2. The nuclear genome

The first sequence of the whole chicken genome was published fourteen years ago (International Chicken Genome Sequencing 2004) and brought chicken studies into the “omics” era. The chicken genome is organized in 40 chromosomes, of which 38 are autosomes and two are sex chromosomes (Z and W); contrary to mammals, avian females are the heterogametic sex (ZW) and males the homogametic sex (ZZ). A third version of the chicken genome (Gallus_gallus-4.0; GCA_000002315.2) was released in 2010 and combined data generated with Sanger and NGS sequencing. The first genome study on chicken reported more than 7 million SNPs, found remarkable evidence that the locus for the thyroid stimulating hormone receptor (TSHR) might have been one of the major selection targets during domestication, and identified other regions under selection associated with growth, appetite and metabolic regulation (Rubin et al. 2010). Currently, the most up-to-date build of the chicken genome (Gallus_gallus-5.0; GCA_000002315.3) dates from last year, results from deep-coverage sequencing, and has a total length of 1.2 Gb (Warren et al. 2017).

The growth of genomic resources and the relatively short time needed for data production make the chicken, among other domestic species, an interesting model to study. In addition, its natural history and adaptation to many environments make the chicken a research target. Finally, its economic importance has oriented most genome analyses towards the detection of selection signatures associated with production phenotypes in highly productive chicken breeds (egg-layers and broilers). The initial effort on whole genome sequencing of the chicken resulted in commercial SNP arrays containing hundreds of thousands of SNPs, which became the major tool for gathering genome data for fine-scale genome-wide association studies (Elferink et al. 2010; Gholami et al. 2015), which will mostly help to guide chicken breeding programs.

Although the majority of the genome-wide studies on chicken were GWAS oriented, a few studies have been published on the molecular bases of environmental adaptation, the candidate genes involved in the domestication process, and the loss of diversity in commercial chicken breeds caused by highly intensive selection. Among the most representative examples of the potential of genomic information for understanding the recent evolutionary history of the domestic chicken are a study using whole genome sequencing data from Tibetan chicken populations, which detected several candidate genes involved in adaptation to high altitude (Wang et al. 2015); the finding that some regions showing sharp changes in allele frequencies through time might have resulted from the globalization of a poultry industry based on few breeds (Eriksson et al. 2008); and the substantial decrease in the genetic diversity of the commercial breeds on which the industry fully relies (Muir et al. 2008).


1.3.3. Restriction Site-associated DNA Sequencing (RADseq)

Advances in high-throughput sequencing methods have allowed the genomic characterization of any species and, subsequently, the detection of its patterns of genetic variation, making it possible to explore its evolutionary history. Although whole genome sequencing and re-sequencing are presently used extensively to discover and genotype DNA variants across genomes, they are relatively costly and remain prohibitively expensive for many studies, especially for sets with an appreciable number of samples (Van Tassell et al. 2008; Sboner et al. 2011; Shendure & Lieberman Aiden 2012).

With the aim of reducing sequencing costs and increasing the number of samples, particularly in non-model species, several methods and protocols based on reduced-representation libraries (RRLs) have been developed and used (Davey et al. 2011; Andrews et al. 2016). Studies on population demographic inference and phylogenetics require information from a relatively small number of loci (from tens to hundreds), whereas others, such as those using association mapping to identify loci that influence phenotypic variation or genome scans to identify candidate regions under selection associated with the differential adaptation of populations, typically require information from thousands to millions of loci (Allendorf et al. 2010; Davey et al. 2011; Narum 2013). Thus, the sequencing of RRLs represents the optimal genetic analysis strategy, as it provides flexibility in the number of loci and individuals studied.

The initial restriction site-associated DNA sequencing (RADseq) method consists of the digestion of genomic DNA with one restriction enzyme followed by mechanical shearing to reduce the fragments to a desired range of lengths (Miller et al. 2007; Baird et al. 2008). Since the publication of this initial protocol, several variants of the method have been developed. Among these, the most popular ones (Andrews et al. 2016) are genotyping by sequencing (GBS), which combines enzyme digestion with PCR amplification of the short fragments resulting from the enzyme cutting (Elshire et al. 2011), and double-digest RAD (ddRAD), which uses two restriction enzymes, each with its specific adapter, and a fragment size selection step (Peterson et al. 2012). More recently, several studies have started to develop new variants of the RADseq method for use with low-copy-number DNA samples (e.g., noninvasive samples, museum collection samples, degraded samples), among which the introduction of a new step consisting of the in-solution capture of chosen RAD tags, to target sequencing reads to desired loci, proved to be very efficient (Ali et al. 2016).


Despite all the differences between the RADseq methods, three steps are common to all of them (Fig. 1.3.3). The first is the use of the specific sequencing adapters required by next-generation sequencing platforms. These adapters contain specific strings of nucleotides that can be used as individual sample identifiers (i.e., barcodes), which permit the pooling of many individuals in a single library (multiplexing). Another common step is the use of restriction enzymes to fragment the genomic DNA. The number of loci produced varies depending on whether a common-cutter (6-cutter) or a rare-cutter (8-cutter) enzyme is used. As enzyme digestion reduces the genomic DNA to fragments of variable length, a size selection step is required to select the fragments with the ideal length for sequencing. Consistent size selection permits the production of data on the same set of loci across samples, avoiding high rates of missing genotypes.
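As a rough illustration of why the choice of enzyme changes the number of loci, the sketch below estimates the expected number of recognition sites for a 6-cutter and an 8-cutter. It assumes, unrealistically, a random genome with equal base frequencies; the function name is hypothetical, and the 1.2 Gb genome length is the approximate value reported for the chicken.

```python
def expected_cut_sites(genome_size_bp: int, site_length: int) -> float:
    """Expected number of recognition sites in a random genome.

    Assumes independent positions and equal base frequencies, so a given
    site of length k occurs with probability (1/4)**k at each position.
    """
    positions = genome_size_bp - site_length + 1
    return positions * 0.25 ** site_length


GENOME_SIZE = 1_200_000_000  # approximate chicken genome length (1.2 Gb)

for k in (6, 8):
    sites = expected_cut_sites(GENOME_SIZE, k)
    # Each cut site is flanked by two RAD tags (one on each side).
    print(f"{k}-cutter: ~{sites:,.0f} sites, ~{2 * sites:,.0f} RAD tags")
```

Under these assumptions a 6-cutter yields about 16 times more sites than an 8-cutter; real counts depend on base composition and on the specific recognition sequence.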

Another aspect of RADseq is the relative ease of the post-sequencing analysis. As with most NGS data, it starts with de-multiplexing and trimming of the barcodes, followed by the application of several filters for read selection and sequence quality. Then, if a reference genome is available, the loci can be identified by aligning the reads to this reference genome; otherwise, the loci can be assembled de novo. Finally, variants and genotypes can be called using Bayesian approaches (Nielsen et al. 2012) and then annotated to explore their biological relevance, optionally followed by a final validation step (Pabinger et al. 2014).
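The first steps of this workflow (de-multiplexing, barcode trimming and a simple quality filter) can be sketched as follows. This is only a minimal illustration, not the pipeline used in any particular study: the function names, the barcodes and the overhang sequence are hypothetical, barcodes are matched exactly (no mismatch tolerance), and quality strings are assumed to use Phred+33 encoding.

```python
from collections import defaultdict


def demultiplex(reads, barcodes, cut_site_overhang=""):
    """Assign reads to samples by their leading barcode and trim it.

    reads: iterable of (sequence, quality) tuples
    barcodes: dict mapping barcode sequence -> sample name
    Reads without an exact barcode match (or without the expected
    restriction-site overhang right after it) are collected under None.
    """
    by_sample = defaultdict(list)
    for seq, qual in reads:
        sample = None
        for barcode, name in barcodes.items():
            if seq.startswith(barcode + cut_site_overhang):
                sample = name
                # Trim the barcode from both sequence and quality string.
                seq, qual = seq[len(barcode):], qual[len(barcode):]
                break
        by_sample[sample].append((seq, qual))
    return dict(by_sample)


def passes_quality(qual, min_mean_phred=20, offset=33):
    """Keep reads whose mean Phred score is at least min_mean_phred."""
    scores = [ord(c) - offset for c in qual]
    return bool(scores) and sum(scores) / len(scores) >= min_mean_phred


# Hypothetical example: two barcoded samples and one unassignable read.
barcodes = {"ACGT": "sample_1", "TGCA": "sample_2"}
reads = [
    ("ACGTTGCAGGAATT", "IIIIIIIIIIIIII"),
    ("TGCATGCAGGCCAA", "IIII##########"),
    ("GGGGTGCAGGAATT", "IIIIIIIIIIIIII"),
]
assigned = demultiplex(reads, barcodes, cut_site_overhang="TGCAGG")
kept = {name: [r for r in rs if passes_quality(r[1])]
        for name, rs in assigned.items() if name is not None}
```

Alignment or de novo assembly and genotype calling would follow with dedicated tools; they are outside the scope of this sketch.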

A final note concerns the fact that the vast majority of population genomic studies on agricultural species use commercial SNP arrays (a.k.a. SNP chips), as this is a relatively cheap technique and the data processing is relatively straightforward compared with NGS data. Nonetheless, in recent years some studies have started using RRL methods, among which GBS and ddRAD are the two most used in population genetics studies (Andrews et al. 2016; Suravajhala et al. 2016). As for the classical RADseq method, so far only one paper reports its use to study the genetic diversity and relationships among local Chinese chicken breeds (Zhai et al. 2015).


Figure 1.3.3. Standard protocol used to generate and sequence RADseq markers. (A) Genomic DNA is digested with a restriction enzyme of choice. (B) P1 adapter is ligated to the fragments. The P1 adapter is adapted from the sequencing adapter with a molecular identifier (MID) and a cut site overhang at the end. (C) Samples from multiple individuals are pooled together and all fragments are randomly sheared. Only a subset of the resulting fragments contains restriction sites and P1 adapters. (D) P2 adapter is ligated to all fragments. The P2 adapter has a divergent end. (E) PCR amplification with P1 and P2 primers. The P2 adapter will be completed only in the fragments ligated with P1 adapter, and so only these fragments will be fully amplified. (F) Pooled samples with different MIDs are bioinformatically separated and SNPs called (C/G SNP underlined). (G) As fragments are sheared randomly, paired end sequences from each sequenced fragment will cover a 300-400 bp region downstream of the restriction site (adapted from Davey and Blaxter 2010).

1.4. Population Genetics inference using NGS data

The evolutionary history of populations is embedded in their DNA, and population genetics has become a paramount tool for describing and delimiting the history of populations and species. However, reading the histories of populations recorded in DNA is a challenge that requires both comprehensive and accurate data. The relentless increase in data production has been propelling the development of more efficient screening tools able to deal with big datasets. The contribution of demography and selection, besides mutation and recombination, as forces that drive the short-term evolution of species is one of the main axes of research that have greatly benefited from the development of more powerful statistical methods for population genetic data analysis (Schraiber & Akey 2015).


1.4.1. Population structure

One of the key elements in population genetics studies is the nature of the genetic data. Currently, SNPs are the most widely used molecular markers for assessing the evolutionary history of populations, but these markers have some particularities that can bias the estimation of many population genetics parameters. The influence of SNPs under selection on the estimation of demographic parameters and the non-random association between SNPs are two of the largest constraints of large genome-wide datasets, which have to be filtered before some analyses are conducted (Luikart et al. 2003). Another factor of paramount interest in the study of populations is their structure. Patterns of genetic structure can be extremely useful, providing insights into the dating of population splits, migration rates and patterns of mating among individuals, but population structure may also negatively affect some statistical methods for inferring demographic history, as most of these methods assume that the data come from a randomly mating population (Schraiber & Akey 2015).

Principal component analysis (PCA) is one of the most widely used methods for analyzing large SNP datasets to visualize population structure and the geographic distribution of genetic variation and, ultimately, to evaluate the genealogical relationships between the studied individuals (McVean 2009). PCA does not require a priori knowledge of the population structure, because it projects each individual's multilocus genotype onto a small number of dimensions (usually two) that maximally separate the data. Other methods for detecting population structure are based on individual clustering (Alexander et al. 2009), in which individuals are modeled as the result of the contribution of two or more theoretical ancestral populations (the ADMIXTURE method). The goal of this method is to determine, from allele frequencies, the contribution of each of those populations to each studied individual. However, besides being very sensitive to the amount and quality of the data, PCA and ADMIXTURE do not elucidate the population history behind the observed genetic patterns. Yet another widely used method (fineSTRUCTURE) combines coalescence and recombination, considering each individual in a population as a mixture of two or more genomes. Using ancestral chromosome segments, the method allows population structure to be estimated at a fine scale (Lawson et al. 2012).
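To make the projection step of PCA concrete, the sketch below computes the coordinates of each individual on the first principal component of a small genotype matrix (allele counts 0, 1, 2). It is a deliberately simplified illustration using only the standard library: SNPs are mean-centered but not scaled by their expected variance, as dedicated software usually does, and the leading eigenvector is found by power iteration. The function name and the toy data are hypothetical.

```python
import math


def first_principal_component(genotypes, n_iter=200):
    """Project individuals onto the first principal component.

    genotypes: list of individuals, each a list of allele counts (0, 1, 2)
    Returns one coordinate per individual.
    """
    n_ind, n_snp = len(genotypes), len(genotypes[0])
    # Center each SNP (column) on its mean allele count.
    means = [sum(ind[j] for ind in genotypes) / n_ind for j in range(n_snp)]
    centered = [[ind[j] - means[j] for j in range(n_snp)] for ind in genotypes]
    # Individual-by-individual covariance matrix.
    cov = [[sum(a * b for a, b in zip(x, y)) / n_snp for y in centered]
           for x in centered]
    # Power iteration converges to the leading eigenvector of cov.
    vec = [float(i + 1) for i in range(n_ind)]
    for _ in range(n_iter):
        new = [sum(c * v for c, v in zip(row, vec)) for row in cov]
        norm = math.sqrt(sum(v * v for v in new))
        if norm == 0:
            return [0.0] * n_ind
        vec = [v / norm for v in new]
    eigenvalue = sum(v * sum(c * w for c, w in zip(row, vec))
                     for v, row in zip(vec, cov))
    # Scale the unit eigenvector by the square root of its eigenvalue.
    return [v * math.sqrt(max(eigenvalue, 0.0)) for v in vec]


# Toy data: two individuals from each of two differentiated populations.
toy = [[0, 0, 0, 0], [0, 0, 1, 0], [2, 2, 2, 2], [2, 2, 1, 2]]
pc1 = first_principal_component(toy)
```

In this toy example, the two individuals of each population fall on the same side of the first axis, which is the kind of separation that PCA plots are used to visualize.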


of population structure, new tests based on these parameters (Reich et al. 2009; Patterson et al. 2012) have been developed to identify admixture in populations. The assumption behind the three-population (f3) test is that when the product of the allele frequency differences between the population pairs (X, A) and (X, B) is negative, X is the result of admixture between A and B. A positive product does not by itself rule out admixture, because strong post-admixture drift in the allele frequencies of X can mask the signal. In addition, if the population X is replaced by an outgroup, the statistic can be used to measure the genetic drift shared by A and B since their split from the outgroup, and in this form it is known as the outgroup f3 statistic. Generally, this test is used to test and quantify the relationships between groups observed in PCA (Haak et al. 2015).

The same research group that developed the f3 test developed another method to model the phylogenetic relationships between populations. This method makes it possible to propose a tree model of the population admixture history and to test its fit (Haak et al. 2015). The method has been used particularly in paleogenomic studies (e.g., Reich et al. 2009; Haak et al. 2015), and its major challenge seems to be the choice of the outgroup populations. There are three important requirements for this analysis: (1) the reference populations should not be the same as the outgroups; (2) the references must represent clades derived from the outgroup; and (3) the outgroup must be related to the references, but not directly. The mixing proportions estimated with the f-statistic workflow can be used to build and fit admixture graph models.

Another group of methods for testing admixture is based on trees and relies on the assumption that the relationships between two populations are reflected in the branch lengths of a tree (Cavalli-Sforza & Edwards 1967; Nielsen 1998). However, the original tree-based methods could not accommodate admixture (migration and gene flow). To solve this, Pickrell and Pritchard (2012) proposed a new approach, TreeMix, which first fits a maximum likelihood tree of the populations without admixture. The fit of the estimated tree is assessed using the residuals between the covariances of allele frequencies in the empirical data and those expected under the estimated tree. Migration events between populations are then added to improve the model fit. The results of this approach should be interpreted carefully, because the best-fitting model might change depending on the chromosome and the linkage disequilibrium between markers.

1.4.2. Demography


evolutionary forces demography and selection. The distribution of allele frequencies in a sample of individuals from a population is usually designated the site (or allele) frequency spectrum (SFS). As the shape of the allele frequency spectrum changes with demographic dynamics (Nielsen 2000), the comparison between the observed spectrum and the one expected under a given demographic model provides a measure of the goodness of fit of that model to the data. Although the SFS can also be used to estimate selection parameters, it has been widely used to estimate demographic parameters (Nielsen 2000; Gravel et al. 2011; Excoffier et al. 2013). In brief, for a sample of size n, the SFS is an enumeration of the number of sites at which the derived allele occurs at frequency 1/n, 2/n, …, (n-1)/n. When it is not possible to determine which allele is derived, the folded SFS can be used instead (Korneliussen et al. 2014; Schraiber & Akey 2015).
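The definition of the unfolded and folded spectra can be sketched in a few lines. The example assumes that the count of derived alleles at each site is already known; the function names and the input values are hypothetical.

```python
def site_frequency_spectrum(derived_counts, n):
    """Unfolded SFS: sfs[i] = number of sites with i derived alleles."""
    sfs = [0] * (n + 1)
    for count in derived_counts:
        if not 0 <= count <= n:
            raise ValueError("allele count outside 0..n")
        sfs[count] += 1
    # Entries 0 and n are monomorphic in the sample and are usually ignored.
    return sfs


def fold(sfs):
    """Folded SFS: bins are minor-allele counts 0..n//2."""
    n = len(sfs) - 1
    folded = [0] * (n // 2 + 1)
    for i, sites in enumerate(sfs):
        folded[min(i, n - i)] += sites
    return folded


# Hypothetical derived-allele counts at six sites in a sample of size 6.
counts = [1, 1, 2, 5, 3, 1]
unfolded = site_frequency_spectrum(counts, n=6)
folded = fold(unfolded)
```

Folding merges the bins i and n - i, which is why the site with five derived copies ends up counted together with the singletons.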

Figure 1.4.1. Effect of the demographic perturbations on the SFS. Four simple population demographic models are described: constant, bottleneck, expanding population and structured population models with their respective genealogy and site frequency spectrum. Green lines in the SFS plot represent the trace expected under the neutral model. The arrows in the upper side of the figure represent migration (Schraiber & Akey 2015).

Population dynamics (Fig. 1.4.1) shape the SFS: a contraction leads to the loss, and hence a dearth, of rare variants, whereas a rapid population expansion can lead to an excess of rare or low-frequency alleles (e.g., see Beichman et al. 2017), which results from the accumulation of de novo mutations in newborn individuals. Although SFS-based inference requires data from a significant number of individuals (Excoffier et al. 2013), it requires less sequence data per individual than other methods. For this reason, SFS-based demographic inference has been used in population genomic studies based on loci distributed throughout the genome, such as the data produced by RADseq (Excoffier et al. 2013).

1.4.3. Selection

One of the main quests of modern geneticists is to identify genome regions that depart from the expected evolutionary neutrality due to natural selection (Nielsen 2005). With the recent development of next-generation sequencing methods, the ability to detect selection signatures at the genome level has seen a major breakthrough, and several statistical tests have subsequently been refined to detect selection signatures under different demographic or selection models (Vitti et al. 2013). Theoretically, a beneficial variant under selection will generate distinct DNA signatures in its genome region, which can translate into a shift in the allele frequency spectrum towards high or low frequencies, an excess of homozygous genotypes, the presence of long haplotypes at high frequencies, or an extreme differentiation of a local population relative to its neighbors. In practice, most of the described selection tests pick one or a combination of these characteristics to detect where selection is acting (Voight et al. 2006; Vitti et al. 2013; Booker et al. 2017).

Based on the nature of the data, the methods used for detecting selection can be divided into three groups. The first comprises methods based on the allele frequency spectrum, which use Tajima's D, a statistic obtained by comparing the mean number of pairwise differences with the number of segregating sites (Tajima 1989). Additionally, some of these allele frequency spectrum-based methods incorporate recombination rates to distinguish between changes in the frequency spectrum caused by demography and the true effects of selection (Nielsen 2005). The second group comprises methods based on haplotype reconstruction, which basically compare the length of homozygosity in a haplotype block in a population under possible selection with that displayed by a reference population (Sabeti et al. 2007; Pickrell et al. 2009; Wang et al. 2016).
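As an illustration of the first group of methods, Tajima's D can be computed directly from a site frequency spectrum using the constants defined by Tajima (1989). The sketch below is a minimal implementation under the assumption that the input is a list of site counts indexed by allele count; the function name and the example spectra are hypothetical.

```python
import math


def tajimas_d(sfs):
    """Tajima's D from a site frequency spectrum.

    sfs[i] = number of segregating sites where one allele is seen i times
    among n = len(sfs) - 1 sampled sequences (entries 0 and n are ignored).
    Returns None when D is undefined (n < 4 or no segregating sites).
    """
    n = len(sfs) - 1
    seg_sites = sum(sfs[1:n])
    if n < 4 or seg_sites == 0:
        return None
    # Mean number of pairwise differences (pi) from the spectrum.
    pairs = n * (n - 1) / 2
    pi = sum(s * i * (n - i) for i, s in enumerate(sfs) if 0 < i < n) / pairs
    # Constants from Tajima (1989).
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    # Difference between the two diversity estimators, standardized.
    return (pi - seg_sites / a1) / math.sqrt(variance)


neutral_like = tajimas_d([0, 12, 6, 4, 3, 0])    # counts proportional to 1/i
rare_excess = tajimas_d([0, 10, 0, 0, 0, 0])     # only singletons
intermediate = tajimas_d([0, 0, 5, 5, 0, 0])     # only mid-frequency alleles
```

A spectrum proportional to 1/i gives D close to zero, an excess of rare alleles gives a negative D, and an excess of intermediate-frequency alleles gives a positive D.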

Last but not least, there are the methods based on population differentiation, which rely on Wright's fixation index (FST) (Wright 1965). This approach was initially proposed by Lewontin and Krakauer (1973), and since then several FST-based methods have been proposed (e.g., see reviews by Nielsen 2005; Vitti et al. 2013). The biggest challenges of this approach are the multiple testing of many genomic locations and the many effects of population structure and other demographic factors on FST (Beaumont & Balding 2004). For this reason, the most recent tests compare the empirical FST estimates with a null distribution obtained by simulating populations under migration-mutation equilibrium (Akey et al. 2002; Beaumont & Balding 2004; Foll & Gaggiotti 2008). The underlying idea is that loci under selection display significantly higher genetic differentiation between populations (FST outliers). Yet another FST-based method for detecting loci under selection, the population branch statistic (PBS), is based on the comparison of FST values among three populations (Yi et al. 2010). Basically, the PBS value represents the amount of allele frequency change at a given locus in the history of the candidate population under selection since its divergence from the other two populations.
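The relationship between pairwise FST and the PBS can be sketched for a single locus as follows. This is a minimal illustration under stated assumptions: FST is computed from population allele frequencies with a simple ratio estimator that ignores sample size corrections, FST is transformed as T = -log(1 - FST) as in Yi et al. (2010), and the function names and frequencies are hypothetical.

```python
import math


def fst(p1, p2):
    """Single-locus FST between two populations from allele frequencies.

    Uses FST = (p1 - p2)^2 / (p1 (1 - p2) + p2 (1 - p1)), without any
    correction for finite sample sizes.
    """
    denominator = p1 * (1 - p2) + p2 * (1 - p1)
    return 0.0 if denominator == 0 else (p1 - p2) ** 2 / denominator


def branch_length(fst_value, max_fst=0.999999):
    """Transform FST into a divergence-like scale, T = -log(1 - FST)."""
    # FST is capped below 1 so that the logarithm stays finite.
    return -math.log(1 - min(fst_value, max_fst))


def pbs(p_target, p_ref1, p_ref2):
    """Population branch statistic for the target population at one locus."""
    t_12 = branch_length(fst(p_target, p_ref1))
    t_13 = branch_length(fst(p_target, p_ref2))
    t_23 = branch_length(fst(p_ref1, p_ref2))
    return (t_12 + t_13 - t_23) / 2


# Hypothetical locus: the allele is frequent only in the first population.
pbs_changed = pbs(0.9, 0.1, 0.1)    # long branch for the diverged population
pbs_unchanged = pbs(0.1, 0.9, 0.1)  # no change specific to this population
```

In a genome scan, loci in the upper tail of the PBS distribution of the candidate population would be the ones flagged for further inspection.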


1.5. References

Akey J.M., Zhang G., Zhang K., Jin L. & Shriver M.D. (2002) Interrogating a high‐density SNP map for signatures of natural selection. Genome Research 12, 1805–14.

Alexander D.H., Novembre J. & Lange K. (2009) Fast model-based estimation of ancestry in unrelated individuals. Genome Research 19, 1655-64.

Ali O.A., O'Rourke S.M., Amish S.J., Meek M.H., Luikart G., Jeffres C. & Miller M.R. (2016) RAD Capture (Rapture): Flexible and Efficient Sequence-Based Genotyping. Genetics 202, 389-400.

Allendorf F.W., Hohenlohe P.A. & Luikart G. (2010) Genomics and the future of conservation genetics. Nature Reviews Genetics 11, 697-709.

Andrews K.R., Good J.M., Miller M.R., Luikart G. & Hohenlohe P.A. (2016) Harnessing the power of RADseq for ecological and evolutionary genomics. Nature Reviews Genetics 17, 81-92.

Baird N.A., Etter P.D., Atwood T.S., Currey M.C., Shiver A.L., Lewis Z.A., Selker E.U., Cresko W.A. & Johnson E.A. (2008) Rapid SNP discovery and genetic mapping using sequenced RAD markers. PLOS ONE 3, e3376.

Beaumont M.A. & Balding D.J. (2004) Identifying adaptive genetic divergence among populations from genome scans. Molecular Ecology 13, 969-80.

Beichman A.C., Phung T.N. & Lohmueller K.E. (2017) Comparison of single genome and allele frequency data reveals discordant demographic histories. G3 (Bethesda) 7, 3605-20.

Bitgood J.J. & Somes Jr R.G. (1993) Gene map of the chicken (Gallus gallus or G. domesticus). In: Genetic Maps (ed. by O'Brien SJ), pp. 4332-42. Cold Spring Harbor Laboratory Press, Cold Spring Harbor.

Booker T.R., Jackson B.C. & Keightley P.D. (2017) Detecting positive selection in the genome. BMC Biology 15, 98.

Burt D.W., Bumstead N., Bitgood J.J., Ponce de Leon F.A. & Crittenden L.B. (1995) Chicken genome mapping: a new era in avian genetics. Trends in Genetics 11, 190-4.


Cavalli-Sforza L.L. & Edwards A.W. (1967) Phylogenetic analysis. Models and estimation procedures. American Journal of Human Genetics 19, 233-57.

Cheng H.H. & Crittenden L.B. (1994) Microsatellite markers for genetic mapping in the chicken. Poultry Science 73, 539-46.

Columbus C. (2010) Journal of the first voyage of Columbus. In: Journal of Christopher Columbus (During his First Voyage, 1492–93): And Documents Relating the Voyages of John Cabot and Gaspar Corte Real (ed. by Markham C), pp. 13-194. Cambridge University Press, Cambridge.

Darwin C. (1868) The Variation of Animals and Plants under Domestication. John Murray, London.

Davey J.W., Hohenlohe P.A., Etter P.D., Boone J.Q., Catchen J.M. & Blaxter M.L. (2011) Genome-wide genetic marker discovery and genotyping using next-generation sequencing. Nature Reviews Genetics, 499–510.

Davila S.G., Gil M.G., Resino-Talavan P. & Campo J.L. (2009) Evaluation of diversity between different Spanish chicken breeds, a tester line, and a White Leghorn population based on microsatellite markers. Poultry Science 88, 2518-25.

Davis S.J.M. (2006) Faunal Remains from Alcáçova de Santarém, Portugal (Trabalhos de Arqueologia 43). Instituto Português de Arqueologia, Lisbon.

Eda M., Lu P., Kikuchi H., Li Z.P., Li F. & Yuan J. (2016) Reevaluation of early Holocene chicken domestication in northern China. Journal of Archaeological Science 67, 25-31.

Elferink M.G., van As P., Veenendaal T., Crooijmans R.P. & Groenen M.A. (2010) Regional differences in recombination hotspots between two chicken populations. BMC Genetics 11, 11.

Elshire R.J., Glaubitz J.C., Sun Q., Poland J.A., Kawamoto K., Buckler E.S. & Mitchell S.E. (2011) A robust, simple genotyping-by-sequencing (GBS) approach for high diversity species. PLOS ONE 6, e19379.

Eriksson J., Larson G., Gunnarsson U., Bed'hom B., Tixier-Boichard M., Stromstedt L., Wright D., Jungerius A., Vereijken A., Randi E., Jensen P. & Andersson L. (2008)
