Methods in Psychiatric Genetics
A scientific revolution has occurred in the field of genetics, with the advent of molecular biological techniques. Using these techniques, researchers have located genes, in specific regions of chromosomes, for many neuropsychiatric diseases: Huntington’s disease (chromosome 4), Friedreich’s ataxia (chromosome 9), neurofibromatosis (chromosome 17), and familial Alzheimer’s disease (chromosomes 1, 14, 19, and 21). After strong evidence for inheritance of a disorder has been found through family, twin, and adoption studies, the tools of molecular biology can be used to locate the relevant gene(s) and designate the precise abnormality.
*Some of this material also appears in Nurnberger JI Jr., Berrettini WH: Psychiatric Genetics. Chapman & Hall, 1998. Reprinted, with permission.
Population Genetics of Psychiatric Disorders
Three types of population genetic studies (family, twin, and adoption studies) are conducted to ascertain whether a particular human phenomenon is genetically influenced.
Family Studies
Family studies can answer three critical questions concerning the inheritance of a human phenomenon: First, is the phenomenon found more frequently among the blood relatives of an affected individual compared to relatives of control subjects? That is, are relatives of an affected subject at increased risk for the disorder compared to relatives of control subjects? Second, what other phenomena (possibly genetically related) are also found more frequently among relatives of an affected individual? That is, what other disorders may share a common genetic vulnerability with the phenomenon in question? Third, can a specific mode of inheritance be discerned?
A family study should be differentiated from a family history study, in which the relatives are not examined directly but information from the proband (ie, the initially ascertained patient) or other persons is used to establish the presence or absence of the illness. The reliability of the family history method is not as high as that of the family study method. The size of the discrepancy depends on the phenomenon under study, but for psychiatric diseases it is usually great enough to render the family history method undesirable. A family study typically begins with a proband whose relatives are then studied. Prevalence rates in relatives are generally corrected for age (the resulting rate is referred to as the morbid risk or lifetime risk).
Twin Studies
Twin studies are based on the fact that monozygotic (MZ), or identical, twins represent a natural experiment in which two individuals have exactly the same genes. This is in contrast to dizygotic (DZ), or fraternal, twins, who share on average 50% of their genes and are no more genetically similar than any pair of siblings. A phenomenon that is under genetic control should be more concordant (ie, similar) in MZ twins than in DZ twins. By comparing the concordance rate (how often the second member of a twin pair demonstrates the phenomenon in question when the first member has it) for MZ and DZ twin pairs, investigators can obtain evidence for the genetic determination of a phenomenon. Concordance may be reported as pairwise (each pair of twins is counted once) or probandwise (each affected subject is considered together with his or her co-twin, so a pair in which both twins were ascertained as probands is counted twice). If twin pairs are identified through affected subjects, the probandwise method may be more correct.
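The difference between the two ways of reporting concordance can be illustrated with a short calculation. The following Python sketch is only an illustration: the function names and the twin-pair counts are hypothetical, and the probandwise formula assumes complete ascertainment, so that both members of every concordant pair are counted as probands.

```python
def pairwise_concordance(concordant_pairs, discordant_pairs):
    """Proportion of twin pairs in which both members are affected."""
    return concordant_pairs / (concordant_pairs + discordant_pairs)


def probandwise_concordance(concordant_pairs, discordant_pairs):
    """Proportion of probands whose co-twin is also affected.

    Assumption: every affected twin was ascertained as a proband, so each
    concordant pair contributes two probands and each discordant pair one.
    """
    probands_with_affected_cotwin = 2 * concordant_pairs
    all_probands = 2 * concordant_pairs + discordant_pairs
    return probands_with_affected_cotwin / all_probands


# Hypothetical counts: (concordant pairs, discordant pairs)
mz_counts = (20, 20)
dz_counts = (5, 35)

mz_pairwise = pairwise_concordance(*mz_counts)
mz_probandwise = probandwise_concordance(*mz_counts)
dz_pairwise = pairwise_concordance(*dz_counts)
dz_probandwise = probandwise_concordance(*dz_counts)

print(f"MZ: pairwise {mz_pairwise:.3f}, probandwise {mz_probandwise:.3f}")
print(f"DZ: pairwise {dz_pairwise:.3f}, probandwise {dz_probandwise:.3f}")
```

With these invented counts, both measures are higher in MZ than in DZ pairs, which is the pattern expected for a genetically influenced phenomenon. The probandwise value is the larger of the two in each group because concordant pairs are counted twice.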
Adoption Studies
Adoption studies represent the strongest test in population genetics. In the most straightforward type of adoption study, a group of affected subjects who have been adopted is identified. Similarly, a control group of unaffected, adopted subjects is identified. The risk for the disorder is then evaluated in four groups of relatives: the adoptive and biological relatives of affected adoptees and the adoptive and biological relatives of control adoptees. If the disorder is heritable, one should find an increased risk among the biological relatives of affected subjects, compared to the other three groups of relatives. One can also compare risk for illness in adopted-away children of ill parents to risk for illness in adopted-away children of well parents.
High-Risk Studies
Biochemical studies of individuals with psychiatric diseases are always confounded by disease effects: are biochemical differences between affected individuals and control subjects related to the cause of the disorder, or are they related to the effects of the disorder (or its treatment)? When investigating possible biochemical differences for a genetic disease, researchers can address this difficult issue by studying a group of individuals (usually adolescents or young adults) who are at high risk of developing the disorder under study (usually because they have parents or other relatives with the disorder). The high-risk group may then be followed over time to assess whether the biochemical abnormalities observed are predictive of the disease.
Genetic Analysis Methods
Specific methods of formal analysis have been developed to assess the way in which a disorder is inherited and to determine the location of the gene(s) involved.
Segregation Analysis
Segregation analysis is used to determine whether the pattern of illness in families is consistent with a specific mode of transmission. In 1971, Elston and Stewart developed a general model for single-gene inheritance that allowed researchers to estimate the likelihood that a particular mode of transmission could explain a given set of pedigrees (ie, families in which individuals are defined as ill, not ill, or illness status unknown). Computer-generated segregation analysis can now be used to detect not only single major locus inheritance but also polygenic inheritance and multifactorial inheritance (ie, in which both environmental and inherited factors are important in pathogenesis). However, four confounding variables, characteristic of many psychiatric illnesses, reduce the power of segregation analysis to confirm or exclude a particular mode of inheritance: (1) variable penetrance (some individuals with the genetic predisposition will not manifest the disease), (2) phenocopies (some individuals without the genetic predisposition will manifest symptoms of the disease), (3) genetic heterogeneity (more than one type of genetic cause can produce the same syndrome), and (4) uncertainty regarding the diagnostic boundaries of a syndrome.
Although segregation analysis is a powerful tool for delineating data from family studies for many disorders, it has been less useful in psychiatry thus far.
Linkage Analysis
At any given genetic locus, each individual carries two copies (alleles) of the deoxyribonucleic acid (DNA) sequence that defines that locus. One of these alleles is inherited from the mother, and the other is inherited from the father. These alleles will be transmitted with equal probability (ie, 1/2), one of the two alleles to each offspring. If two genetic loci are close to each other on a chromosome, their alleles tend to be inherited together (not independently) and are known as linked loci. During meiosis, crossing-over (also known as recombination) can occur between homologous chromosomes, thus accounting for the observation that alleles of linked loci are not always inherited together.
The rate at which crossing-over occurs between two loci is approximately proportional to the distance between them on the chromosome. The genetic distance between two linked loci is defined in terms of the percentage of recombination between the two loci (this value is known as theta, Θ). Alleles at loci that are far apart on a chromosome have a 50% chance of being inherited together, and such loci are not linked. Thus the maximum value for Θ is 0.5, and the minimum value is 0. Linkage analysis is a method for estimating Θ for two or more loci.
The probability that two loci are linked is the probability that Θ < 0.5. The probability that the two loci are not linked is the probability that Θ = 0.5. Thus a logarithm-of-the-odds ratio (LOD) score is defined:
\[ \mathrm{LOD} = \log_{10} \frac{P(\text{data} \mid \Theta < 0.5)}{P(\text{data} \mid \Theta = 0.5)} \]
In practice, probabilities are calculated for values of Θ varying between 0 and 0.5, and the value of Θ that gives the highest probability is assumed to be correct for the numerator. Given the distribution of the marker alleles and the disease phenotype in question among members of the sampled pedigrees, one can calculate the probabilities in the preceding equation and arrive at a LOD score. Although it is possible to perform such calculations by hand, LOD scores are usually calculated using computer programs. Because a LOD score is a log value, scores from different families can be summed. A LOD score of 1.0 indicates that linkage is 10 times more likely than is nonlinkage. For simple genetic conditions, a LOD score of 3 or greater is evidence for linkage, and a score of –2 or less is sufficient to exclude linkage for the sample studied. For disorders with more complex forms of inheritance (including most psychiatric disorders), a higher positive LOD score may be required to be confident of linkage.
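The calculation can be sketched for the simplest possible case. The Python example below assumes fully informative, phase-known meioses, so that each family can be summarized by its number of recombinant and nonrecombinant meioses. Real pedigree analysis, with unknown phase, incomplete penetrance, and missing data, requires dedicated linkage software. The function names and family counts are hypothetical.

```python
import math


def lod_score(recombinants, meioses, theta):
    """LOD score for one family at a given recombination fraction theta.

    Assumption: phase-known, fully informative meioses, so the likelihood is
    theta**recombinants * (1 - theta)**nonrecombinants.
    """
    nonrecombinants = meioses - recombinants
    if theta == 0.0:
        # Any observed recombinant is impossible if theta is exactly 0.
        if recombinants > 0:
            return float("-inf")
        log_likelihood_linked = 0.0
    else:
        log_likelihood_linked = (recombinants * math.log10(theta)
                                 + nonrecombinants * math.log10(1.0 - theta))
    log_likelihood_unlinked = meioses * math.log10(0.5)
    return log_likelihood_linked - log_likelihood_unlinked


def total_lod(families, theta):
    """Sum LOD scores across families (valid because LOD is a log value)."""
    return sum(lod_score(r, n, theta) for r, n in families)


def maximize_lod(families, grid):
    """Return the theta on the grid with the highest summed LOD score."""
    best_theta = max(grid, key=lambda theta: total_lod(families, theta))
    return best_theta, total_lod(families, best_theta)


# Hypothetical families: (recombinant meioses, total informative meioses)
families = [(1, 10), (0, 6), (2, 12)]
theta_grid = [i / 100 for i in range(51)]  # 0.00, 0.01, ..., 0.50

best_theta, best_lod = maximize_lod(families, theta_grid)
print(f"maximum LOD {best_lod:.2f} at theta = {best_theta:.2f}")
```

In this invented data set, 3 of 28 meioses are recombinant, so the summed LOD score peaks near a recombination fraction of 3/28 and exceeds the conventional threshold of 3.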
LOD score analysis requires the estimation of genetic parameters such as gene frequency in affected and unaffected subjects. Such estimates are difficult to determine in a complex disease. Alternative statistical methods are available that assess allele sharing among affected persons without requiring parameter estimation. The best known of these methods is the affected sib pair method.
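One way to see how an allele-sharing method avoids parameter estimation is to look at a simple test statistic. The Python sketch below uses the "mean test," which is one common form of affected sib pair analysis and is not necessarily the form the text has in mind. It assumes that the number of marker alleles shared identical by descent (0, 1, or 2) is known for every pair. In the absence of linkage, sibs share 0, 1, or 2 alleles with probabilities 1/4, 1/2, and 1/4, so the expected proportion of alleles shared is 0.5 with variance 1/8 per pair. The function name and counts are hypothetical.

```python
import math


def sib_pair_mean_test(pairs_sharing_0, pairs_sharing_1, pairs_sharing_2):
    """One-sided test for excess allele sharing among affected sib pairs.

    Returns the mean proportion of alleles shared identical by descent,
    a z statistic against the null value of 0.5, and a one-sided p value
    from the normal approximation.
    """
    n_pairs = pairs_sharing_0 + pairs_sharing_1 + pairs_sharing_2
    mean_sharing = (0.5 * pairs_sharing_1 + 1.0 * pairs_sharing_2) / n_pairs
    # Under no linkage the per-pair variance of the proportion shared is 1/8.
    standard_error = math.sqrt(1.0 / (8.0 * n_pairs))
    z = (mean_sharing - 0.5) / standard_error
    p_one_sided = 0.5 * math.erfc(z / math.sqrt(2.0))
    return mean_sharing, z, p_one_sided


# Hypothetical counts of affected sib pairs sharing 0, 1, and 2 alleles
mean_sharing, z, p_value = sib_pair_mean_test(15, 50, 35)
print(f"mean sharing {mean_sharing:.2f}, z = {z:.2f}, p = {p_value:.4f}")
```

No gene frequency or penetrance value enters the calculation. The only reference point is the sharing expected by chance between siblings.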
Association Studies
Not all clinical genetic investigations of disease utilize families. In association studies the investigator compares allele frequencies for a given locus in two populations, one of which is composed of unrelated individuals who have a disease and the other of which (the control population) is usually composed of ethnically similar, unrelated persons who do not have the disease. If a particular allele commonly predisposes individuals to the disease in question, then that allele should occur more frequently in the disease population than in the control population.
There are pitfalls to an association study, however. The locus chosen for study must predispose individuals to illness, or be so close to a predisposing locus that alleles at the two loci tend to be inherited together across the population (this is known as linkage disequilibrium). Thus loci chosen for association studies are often known as candidate genes. False-positive results can easily occur if the two populations are not matched carefully for ethnic background. A preferable strategy is to also sample DNA from the parents of affected individuals, in which case the nontransmitted alleles are used as a control (known as the haplotype relative risk method).
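The basic case-control comparison of allele frequencies can be sketched as a chi-square test on a 2 x 2 table of allele counts. The Python example below is a minimal illustration with hypothetical counts and a hypothetical function name. It uses the standard Pearson chi-square statistic with 1 degree of freedom and makes no correction for population stratification or multiple testing.

```python
import math


def allele_association(case_counts, control_counts):
    """Chi-square test comparing allele counts in cases and controls.

    Each argument is (count of candidate allele, count of all other alleles).
    Returns the chi-square statistic (1 df), its p value, and the odds ratio.
    """
    a, b = case_counts
    c, d = control_counts
    total = a + b + c + d
    chi_square = (total * (a * d - b * c) ** 2
                  / ((a + b) * (c + d) * (a + c) * (b + d)))
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2.0))
    odds_ratio = (a * d) / (b * c)
    return chi_square, p_value, odds_ratio


# Hypothetical data: 200 cases and 200 controls, two alleles per person
cases = (120, 280)     # candidate allele, other alleles
controls = (80, 320)

chi_square, p_value, odds_ratio = allele_association(cases, controls)
print(f"chi-square {chi_square:.2f}, p = {p_value:.4f}, OR = {odds_ratio:.2f}")
```

A small p value in such a comparison shows only that allele frequencies differ between the two samples. As noted above, poorly matched populations can produce the same result in the absence of any true predisposing effect.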
Molecular Genetic Methods
Advances in our understanding of the human genome have allowed the development of new and powerful techniques for gene localization. The first commonly employed technique using DNA markers involved detection of restriction fragment length polymorphisms (RFLPs). Restriction enzymes are proteins of bacterial origin that have the enzymatic capacity to cleave double-stranded DNA wherever a particular linear sequence of base pairs is found in the DNA exposed to the enzyme. The sequence of base pairs may be four, six, or even eight pairs in length, but each restriction enzyme (over 100 different restriction enzymes have been isolated) is highly specific for a particular sequence of base pairs and, given the proper incubation conditions, will cut DNA only at the appropriate sequence of base pairs. This sequence is termed the recognition site for the enzyme. If human genomic DNA is cut by one of these restriction enzymes, more than one million fragments of various sizes may result. A restriction enzyme can be used to reveal a polymorphism (DNA variant) if digestion of an individual’s DNA results in fragments that differ in size from those usually obtained in a specific region of DNA. Table 6-1 lists the steps involved in this technique.
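How a sequence change at a recognition site produces fragments of different length can be illustrated with a toy digestion. The Python sketch below is an illustration only: the sequences are invented and far shorter than real restriction fragments, and the function name is hypothetical. It uses the BamHI recognition site GGATCC, which the enzyme cuts after the first G.

```python
def digest(sequence, site, cut_offset):
    """Return fragment lengths after cutting a linear DNA sequence.

    The sequence is cut at every occurrence of the recognition site;
    cut_offset is the number of bases into the site at which the cut falls.
    """
    cut_positions = []
    start = sequence.find(site)
    while start != -1:
        cut_positions.append(start + cut_offset)
        start = sequence.find(site, start + 1)
    boundaries = [0] + cut_positions + [len(sequence)]
    return [end - begin for begin, end in zip(boundaries, boundaries[1:])]


BAMHI_SITE = "GGATCC"   # BamHI cuts between the first and second G
BAMHI_OFFSET = 1

# Invented alleles: allele B carries a single base change (A to T) that
# destroys the second recognition site.
allele_a = "A" * 30 + "GGATCC" + "C" * 40 + "GGATCC" + "T" * 20
allele_b = "A" * 30 + "GGATCC" + "C" * 40 + "GGTTCC" + "T" * 20

fragments_a = digest(allele_a, BAMHI_SITE, BAMHI_OFFSET)
fragments_b = digest(allele_b, BAMHI_SITE, BAMHI_OFFSET)
print("allele A fragments:", fragments_a)
print("allele B fragments:", fragments_b)
```

The two alleles differ by one base, yet they yield different sets of fragment lengths. That difference in fragment size is what the RFLP technique detects.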
A variant of the RFLP is known as the variable number of tandem repeat (VNTR) polymorphism. In these VNTR polymorphisms (also known as minisatellites), the variation in length of the DNA fragment comes from differences in the number of copies of a tandemly repeated specific DNA sequence that is found between two adjacent restriction sites. In the example diagrammed at the end of this section, consider that each box, |--|, represents a 200-base-pair repeated DNA sequence that has no recognition site for the restriction enzyme BamHI within the repeated sequence. Thus when the diagrammed sequence is cut with BamHI, one fragment will contain 800 base pairs and the other will contain 1000 base pairs, a difference easily detectable with the RFLP techniques described in Table 6-1.
Unfortunately, VNTRs have a tendency in the human genome to cluster at the ends of chromosomes, and sets of VNTR markers have been difficult to find in the middle of chromosomes. This difficulty in finding evenly spaced VNTR markers throughout the human genome has spurred the development of simple sequence repeat (SSR) markers, also known as microsatellites.
SSR markers represent a group of polymorphisms that resemble the VNTR markers in that the polymorphism is based on a variable number of a tandemly repeated sequence. However, in the case of SSR markers, the tandemly repeated sequence is not unique and usually consists of two to five nucleotides. The repeated sequence is often --(CA)n--, --(AG)n--, or --(AAAT)n--, although other SSR sequences have been described. The region containing the SSR is amplified by using a thermostable DNA polymerase in a polymerase chain reaction (PCR).
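The way in which SSR alleles differ can be illustrated by counting repeat units in two amplified sequences. The Python sketch below is an illustration only: the flanking sequences, repeat numbers, and function name are invented, and in the laboratory alleles are distinguished by the size of the amplified product rather than by inspecting the sequence.

```python
import re


def longest_repeat_run(sequence, unit):
    """Return the largest number of consecutive copies of unit in sequence."""
    pattern = re.compile(f"(?:{re.escape(unit)})+")
    runs = [match.group(0) for match in pattern.finditer(sequence)]
    if not runs:
        return 0
    return max(len(run) for run in runs) // len(unit)


# Invented amplified regions with identical flanks but different (CA)n runs
left_flank, right_flank = "GGTT", "TTGG"
allele_1 = left_flank + "CA" * 12 + right_flank
allele_2 = left_flank + "CA" * 15 + right_flank

repeats_1 = longest_repeat_run(allele_1, "CA")
repeats_2 = longest_repeat_run(allele_2, "CA")
size_difference = len(allele_2) - len(allele_1)
print(f"allele 1: (CA){repeats_1}, allele 2: (CA){repeats_2}, "
      f"products differ by {size_difference} bp")
```

Because each repeat unit is only a few nucleotides long, alleles may differ in length by as little as two base pairs.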
