The genome-wide search was performed with amplification by polymerase chain reaction (PCR) of 400 markers in microsatellite regions with ∼10 cM resolution. Each marker site was amplified by AmpliTaq Gold polymerase (PE Applied Biosystems, Foster City, CA, USA) from 60 ng of genomic DNA with fluorescent dye-labeled primers (ABI PRISM Linkage Mapping Set Version 2, PE Applied Biosystems) at the manufacturer's recommended conditions for the GeneAmp PCR System 2400. DNA fragments were then mixed with size standards (GenScan 400HD [ROX]) in formamide, applied to an ABI PRISM 310 Genetic Analyzer, and analyzed with GeneScan Analysis Software. For each locus, non-parametric affected sib-pair analysis and non-parametric linkage analysis for multiple pedigrees in Genehunter software version 2.0 beta (http://linkage.rockefeller.edu/soft/) were used to calculate non-parametric multipoint lod scores and non-parametric linkage (NPL) scores, respectively.
The feasibility of carrying out genome-wide association studies (GWAS) has led to the rapid progression of the field of complex-disease genetics over the past few years. Although the GWAS approach has been successful in identifying novel candidate genes, leading to the discovery of pathways that are involved in the pathophysiology of diseases, the genetic variants identified so far explain only a small proportion of the heritability of complex traits. Because of the modest genetic effect sizes and inadequate power to overcome the heterogeneity of genetic effects in meta-analysis, true association signals may not be revealed by a stringent genome-wide significance threshold alone. In addition, the majority of GWAS have provided little information beyond statistical signals for understanding the genetic architecture of the identified genes, which are usually novel and have not been studied for the particular trait or disease before. Thus, the necessity of incorporating additional information when interpreting GWAS has become apparent. Expression profiling with gene signatures of cellular models has been used to characterize genes' involvement in bone metabolism and disease processes. Two such models are parathyroid hormone (PTH)-stimulated osteoclastogenesis and osteoblast maturation for osteoblastogenesis. PTH indirectly stimulates osteoclastogenesis via its receptors on osteoblasts, which then signal to osteoclast precursors. Impaired osteoblastic differentiation reduces bone formation and causes severe osteoporosis in animals. The TNFRSF11B/OPG gene, a well-known candidate gene for osteoporosis, is involved in osteoclastogenesis through the regulation of PTH. Compared with GWAS-identified candidate genes that do not show differential expression in these cellular models, genes like TNFRSF11B/OPG with differential expression are more likely to be involved in skeletal metabolism and thus more likely to be truly associated with osteoporosis.
Given that the majority of the reported genome-wide significant SNPs are in intergenic or noncoding regions, it is not clear which SNP or gene might be causal. Since intergenic or noncoding SNPs do not appear to affect protein sequence, it is likely that these SNPs are either in linkage disequilibrium with the causal variants or located within the transcription regulation elements of nearby genes. The relative quantification of gene transcripts may act as an intermediate phenotype between genetic loci and the clinical phenotypes. Expression quantitative trait loci (eQTL) analysis in specific tissues is a valuable tool to identify potentially causal SNPs [7–10]. By integration of genetic variants, transcriptome, and phenotypic data, investigators have the potential to provide much-needed
Quanto version 1.2.4 was employed for sample size calculation using minor allele frequency data from the dbSNP database (http://www.ncbi.nlm.nih.gov/projects/SNP/). Association of SNPs (categorical data) with T2D was tested using Pearson's chi-squared test to detect differences in allele frequencies between cases and controls. The Hardy–Weinberg equilibrium test was performed using a χ2 goodness-of-fit test on genotype frequencies. Association was further confirmed by logistic regression with age, sex and BMI as covariates, and the association of each SNP with T2D was expressed as an odds ratio (OR) with the corresponding 95% confidence interval (CI). In our analysis the most frequent homozygous genotype in the control population was taken as the reference category. We further evaluated differences in continuous clinical variables between cases and controls using Student's t-test; these data are presented as mean ± SD. Bonferroni correction was used to reduce the chance of obtaining false-positive results (type I errors), and a P value of less than 0.005 was taken as the threshold of significance after Bonferroni correction. All statistical analyses were performed using SPSS version 16 for Windows (SPSS, Chicago, Illinois, USA) and SNPstat software. Population attributable risk (PAR%), which identifies the percentage of total risk for T2D that is due to the genetic effect of the variant, was estimated for those SNPs that showed positive association with T2D susceptibility. For allele dosage analysis, to combine information from multiple SNPs, we used an allele count model in which we summed the number of risk alleles carried by each individual in the case and control groups.
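The single-SNP tests described above can be illustrated with a minimal sketch. It is not the SPSS or SNPstat procedure used in the study; the function names and the genotype and allele counts in the usage note are hypothetical, the CI is a simple Wald interval, and no covariate adjustment is attempted.

```python
import math


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-squared variate with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


def hwe_test(n_aa, n_ab, n_bb):
    """Chi-squared goodness-of-fit test for Hardy-Weinberg equilibrium (1 df)."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, chi2_sf_1df(stat)


def allelic_test(case_risk, case_other, ctrl_risk, ctrl_other):
    """Pearson chi-squared on the 2x2 allele-count table, plus OR and Wald 95% CI.

    All four counts must be non-zero, otherwise the OR is undefined.
    """
    a, b, c, d = case_risk, case_other, ctrl_risk, ctrl_other
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    ci = (
        math.exp(math.log(odds_ratio) - 1.96 * se_log_or),
        math.exp(math.log(odds_ratio) + 1.96 * se_log_or),
    )
    return stat, chi2_sf_1df(stat), odds_ratio, ci
```

For example, `allelic_test(60, 40, 40, 60)` compares hypothetical allele counts in cases and controls, and the resulting P value can then be compared with whatever Bonferroni-corrected threshold applies to the number of SNPs tested.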
meta-analyses in European-derived populations, power calculations (Tables S8 and S9) show that this study has greater than 80% statistical power to detect effects for common variants (MAF = 0.20) consistent with published effect sizes (OR = 1.28) for T2DM (e.g. transcription factor 7-like 2 (TCF7L2) and potassium voltage-gated channel, KQT-like subfamily, member 1 (KCNQ1) with ORs 1.3–1.4; reviewed by ) and more modest power (~70%) to detect effects for less common variants (MAF = 0.10). The power to detect and replicate moderate contributions to T2DM susceptibility should increase with meta-analysis of these GWAS data and other GWAS currently being conducted in African-American populations. In addition, this study reports results from directly genotyped SNPs only. Effective imputation of additional SNPs would undoubtedly improve coverage of the African-American genome. While recent developments in imputation methods show encouraging progress, rigorous empirical testing continues. A potential bias of the current study design is that the GWAS was conducted in an African-American population of individuals with type 2 diabetes and nephropathy; however, there is no specific reason why this population should differ substantially from African Americans with T2DM without ESRD. For example, TCF7L2 is strongly associated in our studies of African-American T2DM-ESRD subjects [28,40]. It should also be noted that although every precaution was taken to account for population structure, as with any GWAS or candidate gene study, there may be residual population substructure. The major strength of this study is the genotyping and replication in four additional populations, which provides support for the evidence of association observed. In addition, the study design, which includes individuals with T2DM and ESRD, allows for the identification of ESRD loci that are distinct from those presented herein (Table S10).
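To make the relationship between allele frequency, effect size and power concrete, the following is a rough sketch based on a two-sided normal approximation to the allelic test. It does not reproduce the calculations in Tables S8 and S9; the function names are illustrative, the sample sizes and significance level are placeholders that the caller must supply, and the approximation ignores the negligible opposite tail.

```python
from statistics import NormalDist


def case_allele_freq(control_maf, odds_ratio):
    """Risk-allele frequency in cases implied by the control MAF and an allelic OR."""
    odds = odds_ratio * control_maf / (1 - control_maf)
    return odds / (1 + odds)


def allelic_power(control_maf, odds_ratio, n_cases, n_controls, alpha=0.05):
    """Approximate power of a two-sided test comparing allele frequencies.

    Each individual contributes two alleles, so the variance terms use 2 * n.
    """
    p0 = control_maf
    p1 = case_allele_freq(p0, odds_ratio)
    se = (p1 * (1 - p1) / (2 * n_cases) + p0 * (1 - p0) / (2 * n_controls)) ** 0.5
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(abs(p1 - p0) / se - z_crit)
```

Under these assumptions, lowering the MAF from 0.20 to 0.10 at a fixed OR of 1.28 and fixed sample size reduces the computed power, which is the qualitative pattern the paragraph describes.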
In summary, we have identified the locus including the bromodomain-containing gene BAZ2B as a new SCD susceptibility locus. The bromodomain is an exclusive protein domain known to recognize acetyl-lysine residues on proteins and might play an important role in chromatin remodeling and gene transcription regulation. The risk allele, while of low frequency in Caucasian populations (MAF = 0.014), has a relatively large effect, increasing risk for SCD by >1.9-fold per allele (95% CI 1.57 to 2.34). While rs4665058 may not be clinically relevant in the general population, in whom the increment in absolute risk attributable to the variant is modest, exploring its role in high-risk populations (e.g. heart failure, SQTS/LQTS) may help to identify those who could benefit from intervention. Beyond the BAZ2B locus, our study also highlights the role of QRS/QT interval-associated variants in the risk of SCD, and suggests that larger GWAS of these and other intermediate risk factors may yield additional SCD loci.
HCC cases can be attributable to chronic infection with HBV in hyper-endemic regions, suggesting that CHB is a major risk factor for the development of HCC. The enormous variation in the clinical outcome of HBV infection highlights the importance of identifying the mechanisms underlying the progression from HBV exposure to CHB in order to prevent HBV-induced fatal liver disease. Although environmental factors such as alcohol abuse, age at infection, and co-infection with other hepatitis viruses have been identified as risk factors for HBV-induced liver disease, genetic factors may also influence clinical progression after HBV exposure, as indicated by familial studies. In fact, multiple candidate genes, such as IFNG, TNF, VDR, and HLA loci, have been extensively investigated in the progression to CHB, but results were inconclusive [10–13]. A recent genome-wide association study (GWAS) by Kamatani et al. in a Japanese population has suggested two SNPs, rs3077 and rs9277535, in
(rs8034191 and rs1051730, p = 0.029 and 0.023, respectively (http://www.b58cgene.sgul.ac.uk/, accessed [3/7/2008])). Historically, nicotinic receptors are classified as neuronal or muscle-type, based on their initial site of identification and composite subunits. Cholinergic activity in the airways primarily induces tracheobronchial smooth muscle contraction and mucous secretion. However, there is an increasing body of literature showing the importance of extra-neuronal cholinergic signaling in the lung. The association of the SNPs at the chromosome 4 HHIP (Hedgehog-Interacting Protein) locus is also interesting, though it did not reach the stringent genome-wide significance levels in the populations studied in this manuscript. These SNPs were also associated with FEV1 in the BEOCOPD study (rs1828591 and
Type 2 diabetes (T2D)-associated end-stage kidney disease (ESKD) is a complex disorder resulting from the combined influence of genetic and environmental factors. This study contains a comprehensive genetic analysis of putative nephropathy loci in 965 African American (AA) cases with T2D-ESKD and 1029 AA population-based controls, extending prior findings. Analysis was based on 4,341 directly genotyped and imputed single nucleotide polymorphisms (SNPs) in 22 nephropathy candidate genes. After admixture adjustment and correction for multiple comparisons, 37 SNPs across eight loci were significantly associated (1.6E-05 < P_emp < 0.049). Among these, variants in MYH9 were the most significant (1.6E-
GWAS in which several hundred thousand or even a million SNPs are typed in thousands of individuals provide unprecedented opportunities for systematic exploration of the universe of variants and interactions in the entire genome, and also raise several serious challenges for genome-wide interaction analysis. The first challenge comes from the problems imposed by multiple testing. Even for pair-wise interaction, the total number of tests for interaction between all possible pairs of SNPs across the genome is extremely large, and the Bonferroni-corrected P-values needed to ensure a genome-wide significance level of 0.05 will be too small to reach. The second challenge is the need for computationally simple statistics for testing interactions. The simplest way to search for interactions between two loci is to test all possible two-locus interactions. This exhaustive search demands a large amount of computation, so the computational time of each two-locus interaction test should be short. The third challenge is the power of the statistics for testing interaction. To reach genome-wide significance, the statistics should have high power to detect interaction. Developing simple and efficient analytic methods for the evaluation of gene-gene interactions is critical to the success of genome-wide gene-gene interaction analysis. Finally, the fourth challenge is replication of the finding of such interactions in independent studies.
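The scale of the multiple-testing problem described above follows from simple arithmetic, sketched here. The SNP count of 500,000 in the usage note is an illustrative value within the range the text mentions, and the function names are made up for this example.

```python
def n_pairwise_tests(n_snps):
    """Number of unordered SNP pairs, i.e. two-locus interaction tests."""
    return n_snps * (n_snps - 1) // 2


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


if __name__ == "__main__":
    pairs = n_pairwise_tests(500_000)
    print(pairs)                              # 124999750000 pair-wise tests
    print(bonferroni_threshold(0.05, pairs))  # about 4e-13 per test
```

With about 1.25 × 10^11 pairs, each test would need a P-value near 4 × 10^-13 to remain significant after Bonferroni correction, which is why both fast test statistics and high power are emphasized.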
The "thrifty genotype" hypothesis proposes that the high prevalence of type 2 diabetes (T2D) in Native Americans and admixed Latin Americans has a genetic basis and reflects an evolutionary adaptation to a past low calorie/high exercise lifestyle. However, identification of the gene variants underpinning this hypothesis remains elusive. Here we assessed the role of Native American ancestry, socioeconomic status (SES) and 21 candidate gene loci in susceptibility to T2D in a sample of 876 T2D cases and 399 controls from Antioquia (Colombia). Although mean Native American ancestry is significantly higher in T2D cases than in controls (32% vs. 29%), this difference is confounded by the correlation of ancestry with SES, which is a stronger predictor of disease status. Nominally significant association (P < 0.05) was observed for markers in TCF7L2, RBMS1, CDKAL1, ZNF239, KCNQ1 and TCF1, and a significant bias (P < 0.05) towards OR > 1 was observed for markers selected from previous T2D genome-wide association studies, consistent with a role for Old World variants in susceptibility to T2D in Latin Americans. No association was found with the only known Native American-specific gene variant previously associated with T2D in a Mexican sample (rs9282541 in ABCA1). An admixture mapping scan with 1,536 ancestry informative markers (AIMs) did not identify genome regions with significant deviation of ancestry in Antioquia. Exclusion analysis indicates that this scan rules out ~95% of the genome as harboring loci with ancestry risk ratios > 1.22 (at P < 0.05).
According to publicly available expression data (http://genome.ucsc.edu), in humans, BC034767 is expressed in the testes only, while TOX3 expression has been shown in the salivary glands, the trachea, and the CNS. Detailed in-depth real-time PCR profiling of TOX3 showed high expression levels in the frontal and occipital cortex, the cerebellum, and the retina. To assess a putative eQTL function of rs6747972 or rs3104767, we studied the SNP-genotype-dependent expression of TOX3 and BC034767, as well as of genes known to directly interact with TOX3 (CREB-1/CREBBP/CITED1) and potential target genes of long-range regulatory elements at the locus on chromosome 2 (MEIS1/ETAA1), in RNA expression microarray data from peripheral blood in 323 general population controls. No differential genotype-dependent expression variation was found.
In summary, we have for the first time used a rapid, web-based enrollment method to assemble a large population for a genome-wide association study of PD. We have replicated results from numerous previous studies, providing support for the utility of our study design. We have also identified two new associations, both in genes related to pathways that have been previously implicated in the pathogenesis of PD. Using cross-validation, we have provided evidence that many suggestive associations in our data may also play an important role. Using recently developed analytic approaches for GWAS that take into account the ascertainment bias inherent in a case-control population, we have estimated the genetic contribution to PD in this sample. These findings confirm the hypothesis that PD is a complex disorder, with both genetic and environmental determinants. Future investigations, expanded to include environmental as well as genetic factors, will likely further refine our understanding of the pathogenesis of PD and, ultimately, lead to new approaches to treatment.
In general, when performing GWAS with quantitative measurements, the potential for type I errors can be high. However, Bonferroni is a very conservative correction, and the effect of a given genetic variant on cytokine level might be too small to reach genome-wide significance when correcting for 500,000 tests. As for any empirical research, the statistical power of a GWAS is of critical importance. As an example, we had 80% power to detect an effect size of 0.876SD between groups with a minor allele frequency of 20%. Given our sample size, minor or modest effects of genetic variants on the produced cytokine levels might have been missed in the current study. In the discovery stage we found significant SNPs, of which one also passed the Bonferroni correction. This SNP did not reach significance after permutation-based correction for multiple testing; this might be because the phenotype is not completely normally distributed and the statistics might be inflated, giving artificially low P-values. This distorts the type I error for the asymptotic test and makes the permutation procedure less powerful. However, none of these SNPs could be replicated in the further stages. In the meta-analysis illustrated in the forest plot we see wide confidence intervals for all of the studies. This indicates that we have low power in the three stages. As the mean values are logarithmic,
In the ABCA minigene, the presence of a cryptic acceptor splice site upstream of the L1 5′ UTR determined the efficiency of L1-induced TI, because its deletion increased intron retention about 5-fold. Therefore, the occurrence of cryptic acceptor splice sites is somehow beneficial for SV40 Pol II to guarantee processive transcription coupled with splicing across exons 23, ExSP and 3 (Figure 4A), whereas their absence may force Pol II to slow down or even dissociate from the template, giving rise to premature intron-containing transcripts (Figure 8A). Alternatively, elongating Pol II could search for additional, less favourable splice sites, causing exonization in the upstream region. All these data are consistent with the kinetic model of transcription originally proposed by Eperon et al. and modified by Kornblihtt. According to this model, the use of alternative splice sites depends on the elongation rate of Pol II. In our minigenes, the L1 SP Pol II complex could act as a roadblock (or sitting duck) by forcing elongating SV40 Pol II to pause or dissociate from the template. Since the L1 ASP Pol II complex is moving in the opposite direction, it could collide with the SV40 Pol II complex and decrease its efficiency of transcription. The fact that L1 SP and ASP activities are significantly lower than SV40 promoter activity does not necessarily mean that their contribution to TI is minimal. It is possible that the binding affinities of L1-specific TFs and the cooperativity between them determine the net effect on TI. The TI effect may also depend on promoter strength, which, according to a recent study, can be determined by the balance between productive and abortive initiation of transcription. It is reasonable to assume that both productive and abortive transcription (resulting in ~15 nt transcripts) could contribute to TI. Moreover, binding of TFs alone, without initiation of transcription, could affect the outcome of TI.
It may be argued that an interplay between L1 SP and ASP could influence SV40 promoter-driven transcription and splicing in some complicated or combinatorial way that is not easy to trace from simple deletion analysis. However, it is clear that deletion of the L1 5′ UTR, or part of it, in the ABCA minigene strongly affects SV40 transcription. Nevertheless, we cannot completely rule out the possibility that other factors (sequence composition or nucleotide bias) could also contribute to the net TI effect, similar to that described earlier [59,60].
After this manuscript was submitted, two other studies characterizing the population history of the Roma were published. First, a study based on Y-chromosome haplogroups showed that on the paternal lineage, Roma haplotypes cluster predominantly with Northwestern Indian haplotypes, consistent with our findings based on autosomal IBD sharing. The second study, like ours, was based on whole-genome SNP genotype data. Our findings are broadly consistent with the results from that paper, although with some notable differences. For inferring the date of the founder event, the other study uses a two-pulse model (an out-of-India founder event, followed by a second founder event that affects only the western Roma groups). We instead estimate the date of a single shared founder event; with our limited sample size (we have only 2 samples from western Roma groups), we cannot recover the entire distribution of founder events, and so the date of the founder event in our study should be interpreted as an average date of multiple founder events. Similarly, the other study, using a continuous admixture model, estimates that the admixture in Roma occurred over a period of 38 generations. Assuming a single admixture model, we estimate that the average date of admixture is 29 ± 2 generations. However, when we consider a two-pulse model of admixture, we infer dates of 37 and 4 generations, consistent with the results of the other study.
Here we used the Atherosclerosis Risk in Communities study cohort (ARIC) and the Northern Finland Birth Cohort (NFBC1966) to explore the potential value of high-throughput analyses of epistasis in the eight metabolic traits above. ARIC is one of the largest GWAS populations available, and both its sample size and the density of SNPs genotyped are nearly double those of NFBC1966. After data scrutiny and quality control checks (Table S1), we performed full pair-wise genome scans using BiForce and conventional GWAS for all eight metabolic traits in both cohorts, and identified genome-wide significant epistatic signals and tested their replication. It has been shown that a combined search algorithm implemented in BiForce can increase the power of detection of epistasis by applying appropriate thresholds to test interactions involving SNPs with genome-wide significant marginal effects (marginal SNPs) while keeping false-positive rates under control. We then assessed the impact of sample size and SNP density on the power of detection by comparing the computed interaction profiles for each trait between the two cohorts. Further, we characterised local interactions between SNPs located within 1 Mb and with an interaction P value (P_int) less than a threshold of
each family member using Research Diagnostic Criteria (RDC) and DSM-III/IV criteria. We obtained: a) genotypes from a panel of 1991 multi-allelic microsatellite markers (deCODE panel) for the entire pedigree (497 individuals) and b) high-density SNP genotypes (Illumina Omni 2.5 M SNP arrays) for a subset of 388 individuals. Copy number variants (CNVs) were called using the PennCNV software. To establish an initial catalogue of potential sequence variants underlying bipolar disorder, we also obtained whole genome sequences (WGS) for 50 individuals comprising 18 parent-child trios and 8 additional parent-child pairs from the extended pedigree (Figure S1). The individuals selected for whole genome sequencing included a) the most distantly related subfamilies; b) affected individuals (n = 23) diagnosed with BPI or BPII; and c) a small subset of healthy siblings (ages between 48 and 78, well past the age of disease onset, which is usually prior to age 35). Whole genome sequencing was performed by Complete Genomics Inc. (CGI; Mountain View, CA) using a sequence-by-ligation method. Median sequencing depth was ~50× across all 50 samples (Figure S2). By combining genotype data for the entire extended pedigree with whole genome sequences for a selected set of parent-child trios across different sub-pedigrees, we were able to infer the genetic background for the entire pedigree and address the impact of both common and rare variants within this large multigenerational family.
Growth period traits are important traits that affect soybean yield. Insights into the genetic basis of growth period traits can provide a theoretical basis for cultivated area division, rational distribution, and molecular breeding of soybean varieties. In this study, genome-wide association analysis (GWAS) was used to detect the quantitative trait loci (QTL) for number of days to flowering (ETF), number of days from flowering to maturity (FTM), and number of days to maturity (ETM) using 4032 single nucleotide polymorphism (SNP) markers with 146 cultivars mainly from Northeast China. Results showed that abundant phenotypic variation was present in the population, and the variation explained by genotype, environment, and genotype by environment interaction was significant for each trait. The accessions could be clearly clustered into two subpopulations based on their genetic relatedness, and accessions in the same group were almost all from the same province. GWAS based on the unified mixed model identified 19 significant SNPs distributed on 11 soybean chromosomes, 12 of which were consistently detected in both planting densities and 5 of which were pleiotropic QTL. Of the 19 SNPs, 7 were located in or close to previously reported QTL or genes controlling growth period traits. The QTL identified with high resolution in this study will enrich our genomic understanding of growth period traits and could then be explored as genetic markers for genomic applications in soybean breeding.
Amyotrophic lateral sclerosis (ALS) is a terminal disease involving the progressive degeneration of motor neurons within the motor cortex, brainstem and spinal cord. Most cases are sporadic (sALS) with unknown causes, suggesting that the etiology of sALS may not be limited to the genotype of patients but may be influenced by exposure to environmental factors. Alterations in epigenetic modifications are likely to play a role in disease onset and progression in ALS, as aberrant epigenetic patterns may be acquired throughout life. The aim of this study was to identify epigenetic marks associated with sALS. We hypothesize that epigenetic modifications may alter the expression of pathogenesis-related genes, leading to the onset and progression of sALS. Using ELISA assays, we observed alterations in global methylation (5mC) and hydroxymethylation (5HmC) in postmortem sALS spinal cord but not in whole blood. Locus-specific differentially methylated and expressed genes in sALS spinal cord were identified by genome-wide 5mC and expression profiling using high-throughput microarrays. Concordant direction, hyper- or hypo-5mC with parallel changes in gene expression (under- or over-expression), was observed in 112 genes highly associated with biological functions related to immune and inflammation response. Furthermore, literature-based analysis identified potential associations among the epigenes. Integration of methylomics and transcriptomics data successfully revealed methylation changes in sALS spinal cord. This study represents an initial identification of epigenetic regulatory mechanisms in sALS, which may improve our understanding of sALS pathogenesis for the identification of biomarkers and new therapeutic targets.
Based on the findings presented, the combination of multiple methods and the functional annotation strategies adopted seemed to be highly informative. Notwithstanding, some challenges still need to be overcome when scanning genome-wide data for selection sweeps. First, as similar genomic patterns can be produced by other phenomena, such as genetic drift, separating false positives from real selection signals may not be trivial. Second, identified candidate regions often lacked spatial resolution, spanning from hundreds of kilobases to a few megabases and comprising many genes. Third, distinguishing causal variants from nearby neutral loci may be the most difficult issue, as those variants were probably seldom typed in SNP arrays, and even with whole genome sequence data, variants in LD with the actual selected locus could have produced similar signals due to genetic hitch-hiking. Integrating different methodologies may help mitigate these problems, and should provide a valuable tool for seeking loci that are likely to have undergone recent artificial selection.