1 Department of Genetics, Harvard Medical School, Boston, Massachusetts, United States of America, 2 Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, Massachusetts, United States of America, 3 Program in Genomics and Divisions of Genetics and Endocrinology, Children's Hospital, Boston, Massachusetts, United States of America, 4 Department of Zoology, University of Oxford, Oxford, United Kingdom, 5 Forensic Genetics Laboratory, Istituto di Medicina Legale, Università Cattolica del Sacro Cuore, Rome, Italy, 6 Department of Biology, Galton Laboratory, University College London, United Kingdom, 7 Department of Clinical Sciences, Diabetes and Endocrinology, Lund University, University Hospital Malmö, Malmö, Sweden, 8 Department of Pathology, Medical School, National and Kapodistrian University of Athens, Athens, Greece, 9 Amalia Biron Research Institute of Thrombosis and Hemostasis, Sheba Medical Center, Tel Hashomer, Israel, 10 Institute for Genome Sciences and Policy, Center for Population Genomics and Pharmacogenetics, Duke University, Durham, North Carolina, United States of America
Abstract
Although the literature concerning statistical testing for genotype-phenotype association in family-based and population-based studies is very extensive,
until recently the sex chromosomes have received little attention. Here it is shown that the X chromosome, in particular, presents special problems, both for the efficient analysis of mixed-sex population studies and as a consequence of X inactivation. This paper reviews recent developments in addressing these problems.
Introduction
The statistical problem of testing for association between a phenotype and genetic markers on the sex chromosomes has received less attention than that of testing autosomal markers. The advent of genome-wide association studies has greatly increased the number of studies of association with sex-chromosome loci and, in this context, it has recently been recognized that the X chromosome, in particular, poses special problems [1]. First, in population-based case-control studies involving both male and female subjects, associations can be confounded by differences in sex ratio between cases and controls even when, as is usually the case, allele frequencies do not differ between the sexes. Conventional epidemiological approaches to this confounding, such as stratifying the analysis by sex, can be very inefficient. Second, X inactivation, which affects most loci on the X chromosome in females, means that the risk attributable to a single allele would generally be expected to be smaller in females than in males, because in a heterozygous female the allele is expressed in only a proportion of her cells. An efficient statistical test should allow for this.
This review describes approaches to statistical testing for association with loci on the sex chromosomes, largely in the context of case-control studies of binary phenotypes. The X chromosome is the focus of most of the review. Later sections briefly discuss family-based association studies, quantitative phenotypes and methods for the Y chromosome.
Case-control studies
Before turning to the special problems presented by the X chromosome, we review simple methods of analysis for autosomal loci in case-control studies.
Autosomal loci
Counting chromosomes
Many early analyses of association between a binary phenotype and a genetic marker used simple tests for association in contingency tables whose cell entries were counts of chromosomes rather than of people. Thus, for an autosomal locus, the total cell count is twice the number of subjects studied, and association was tested simply by comparing allele frequencies between cases and controls. For a diallelic locus, this reduces to the analysis of a 2 × 2 table (Table 1). The most commonly used test was the familiar chi-squared test for association, which here has one degree of freedom (df). The calculation of the chi-squared test statistic, T say, can be broken down in a manner that aids later discussion as follows, where N is the total sample size and A and a are the two alleles at the locus: