On simple repetitive DNA sequences and complex diseases


Electrophoresis 1997, 18, 1577-1585

Cornelia Epplen, Eduardo J. M. Santos, Winfried Maueler, Paul van Helden, Jörg T. Epplen

Simple repeats and disease


Molecular Human Genetics, Ruhr-University Bochum, Germany; MRC Centre, Medical Biochemistry, Stellenbosch University, Tygerberg, South Africa

Simple repetitive DNA sequences are abundantly interspersed in eukaryote genomes and are therefore useful in genome research and genetic fingerprinting in plants, fungi and animals, including man. Recently, simple repeats were also identified in some prokaryotic genomes. Hence the same probes can be applied for multilocus DNA fingerprinting in medically relevant bacteria. Simple repeats, including composite dinucleotide microsatellites, are differentially represented in different compartments of eukaryote genomes. Expanded triplet blocks in and around certain genes may, for example, cause so-called trinucleotide diseases in man. As a consequence, simple repetitive sequences should also be characterized with respect to their influences on DNA structure, gene expression, genomic (in)stability and their development on an evolutionary time scale. Here three examples of microsatellites in the human major histocompatibility complex (HLA) are investigated: a (GT)n microsatellite situated 2 kb 5' of the lymphotoxin α (LTA) gene, a (GAA)n block in the 5' part of the HLA-F gene and a composite (GT)n(GA)m stretch in the second intron of HLA-DRB1 genes. Grossly differing mutation rates are evident in these elements, as well as varying linkage disequilibria. The unfolding of these simple repeats in distant human populations is covered, including Caucasians, Bushmen and South American Indians. Furthermore, implications of simple repeats neighboring genes are discussed for the multifactorial diseases multiple sclerosis (MS), rheumatoid arthritis (RA) and early onset pauciarticular arthritis (EOPA). Polymorphisms of HLA-DRB1 and T cell receptor β variable (TCRBV) genes confer susceptibility to these autoimmune diseases, as demonstrable by intronic simple repeat variability.
Microsatellite polymorphisms within the TNF region reveal linkage disequilibria with HLA-DRB1 and with different promoter alleles of the TNFA gene. Disease associations with TNFa microsatellite alleles are either secondary to associations with HLA-DRB1 genes (in MS) or represent additional risk factors (in RA and EOPA). Evolutionary persistence, various structural conformations and the specific binding of nuclear proteins to several simple repeat sequences refute the preconception that all of these ubiquitously interspersed elements are biologically insignificant.

1 Introduction

By definition, simple repetitive DNA [1] is composed of short sequence motifs, 1-6 bases in length, reiterated tandemly 5 to 100 times or more [2]. Such conspicuous blocks are known to be widely interspersed throughout eukaryotic genomes and even in a number of prokaryotic chromosomes (e.g., Mycobacterium tuberculosis [3] and Helicobacter pylori, unpublished data). Longer simple repeat blocks tend to vary in the numbers of tandem repeat units [4-6]. The variability at microsatellite loci is based on variable numbers of the perfectly organized tandem units. The mechanism predominantly responsible for this so-called repeat slippage [7] has yet to be characterized more thoroughly. Polymorphic simple repeats, i.e. microsatellites, are easily and rapidly typed by polymerase chain reaction (PCR, op. cit.). Semi-automatable typing of microsatellite alleles has revolutionized the mapping of genomes and allowed the construction of extremely high density genetic maps of mouse and man [8, 9]. One consequence of these maps is that practically all monofactorially inherited diseases with known chromosomal localization can be diagnosed indirectly via linkage analysis in informative family situations [10]. To date, these efficient tools for genome research have remained essentially unfathomable in many aspects of their biological functions and their meaning for the development of diseases. Why do some of the perfect tandem repeats degenerate into crypticity whereas others appear to remain accurately reiterated for hundreds of thousands of generations and even across species barriers? Here we illustrate some characteristics of selected simple repeats and their usefulness for typing genetic predispositions to complex inherited human diseases.

Correspondence: Prof. Dr. Jörg T. Epplen, Molecular Human Genetics, Ruhr-University, D-44780 Bochum, Germany (Fax: +49-234-7094196; E-mail: [email protected])

Nonstandard abbreviations: EOPA, early onset pauciarticular arthritis; HLA, human leucocyte antigen; LTA, lymphotoxin α gene; LTα, lymphotoxin α protein; MHC, major histocompatibility complex; MS, multiple sclerosis; RA, rheumatoid arthritis; RR, relative risk(s); TCRBV, T lymphocyte receptor β variable region element; TNFα, tumor necrosis factor α protein.

Keywords: DNA binding / Evolution / Immunoprinting / Multifactorial diseases / Simple repeats

© VCH Verlagsgesellschaft mbH, 69451 Weinheim, 1997. 0173-0835/97/0909-1577 $17.50+.50/0


C. Epplen et al.


2 Simple repeats in the genomic junkyards

Simple repetitive DNA was regarded as a mere offshoot of eukaryotic genome research by most molecular geneticists until these elements were fully appreciated as efficient marker systems for multilocus DNA fingerprinting [11] and especially for DNA profiling [4-6, 9]. Data bank searches and experimentation reveal that only a minute subset of the highly abundant simple repeat sequences appears in mature mRNA, and then predominantly in the 3’ and 5’ untranslated regions [12, 13]. Inconspicuous expression as well as the futile search for immediate biological functions has inspired interpretations that these elements can be subsumed in the voluminous category of ‘junk’ DNA [14]. Interestingly, Ohno’s elaborated concept of ‘junk’ DNA includes a utility scale ranging from completely superfluous (‘garbage’) to virtually vital as defined by evolutionary conservation potential (‘antiques’). Over the years several ‘philosophical’ opinions were publicized on simple repeats, but little was formally proven despite scholarly executed experimentation. Recent revelations have sparked true surprises concerning this category of repeats in man: more than a dozen so-called trinucleotide diseases have been defined to date, involving different triplet repeat motifs in the causal pathogenesis and transmission of diseases [15, 16]. Therefore simple repeats may also have functional, if not pathological, significance when encountered by chance among the more than 7 X 10’ nucleotide sequences (harboring >7.5 X 10’ nucleotides) listed in the public databases.

The EMBL databank was searched in summer 1996 using the FASTA algorithm for all blocks of simple (mono- to tetranucleotide) repeats extending for 40 bases or more. The representation of perfect simple repeats (A) and composites thereof (B) is depicted in Fig. 1. At first glance, 4^4 different motifs should have been considered for the simple tetranucleotides, but only 26 basic units are included. All other basic motifs up to a length of four bases are not contained because they are represented within the shorter motifs (mono- to trinucleotides), or because they are obtained by shifting the phase by one or several positions or by reading the complementary strand of the depicted entities. The incidence of the possible permutations of simple mono- to tetranucleotides is far from equal in accessible sequences. Certainly, some reporting biases do not yet allow extrapolation to the true natural frequencies of simple repeats. Among these skewing biases in databanks are overrepresentation (i) of human sequences, (ii) of model organisms, (iii) of genes vs. underrepresentation of introns and spacers, and (iv) of certain microsatellite motifs because of their increased presence on short restriction fragments used for establishing the respective minilibraries. In contrast, underrepresentation of very large simple repeat blocks is due to (i) reduced cloning efficiency, (ii) general instability of simple repeat tracts (reduction in size, rearrangements) in the prokaryotic hosts employed for molecular cloning and (iii) underreporting due to the limited interest of gene hunters in these elements.

Figure 1. Representation of simple-repeat DNA sequences consisting of mono- to tetranucleotide motifs in the EMBL data bank as of May 1996. (A) Perfect simple tandemly reiterated sequences and (B) composite repeat blocks, each at least 40 base pairs in length. The more frequent (composite) simple repeats are denoted explicitly on the left-hand side. Note the remarkably high frequencies of simple repeats exhibiting perfect (YN) dinucleotide periodicity. For discussion of ascertainment biases see text. (s = ‘strong bondage’: c or g; w = ‘weak bondage’: a or t).

Upon inspection of Fig. 1 it is obvious that motifs containing CpG dinucleotides are virtually never observed in longer simple repeat blocks because they are biologically erased from modern genomes of the eu- and prokaryotes investigated. The general lack of CpG dinucleotides results because these dinucleotides are prone to mutate (after C residue methylation and oxidative deamination). Remarkably, base pairs forming strong bondage (S; C or G) are less abundant in simple repeats than those held together by only two hydrogen bridges (W; A or T). This fact agrees with the general (A+T) surplus vs. the lowered (G+C) frequencies in the introns and spacers of eukaryotes. Clearly, (gt)≥20/(ac)≥20 microsatellites outnumber all other motifs by far. Notably, (gata)n stretches are most abundant among the longer basic motifs. Thus, in general terms, perfect (YN) dinucleotide periodicity appears particularly successful for generating/maintaining a simple repeat block in the genome. This selective effect on the relative frequencies is even more pronounced when the composite simple repeat blocks are inspected more carefully (see Fig. 1B): whereas (gt)n(ga)m repeats are frequently encountered, all possible shifts in the (YN) periodicity at the border of the two perfect tandem blocks are practically never observed. This remarkable phenomenon - that the (YN) periodicity has to be maintained in composite simple repeats - can also be observed for other di- and tetranucleotide combinations (data not shown). Implications of the perfect dinucleotide periodicity are discussed below.
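The kind of search described in this section, for perfect mono- to tetranucleotide blocks of at least 40 bases, can be illustrated with a minimal Python sketch. It is not the FASTA procedure applied to the EMBL databank; it is a regular-expression scan over a single sequence string, and the function names (`find_simple_repeats`, `has_cpg`, `weak_fraction`) and the example sequence are hypothetical. Blocks are counted in whole repeat units, which is an assumption of this sketch.

```python
import re


def is_primitive(motif):
    """True if the motif is not itself a repetition of a shorter motif."""
    k = len(motif)
    return not any(k % d == 0 and motif == motif[:d] * (k // d)
                   for d in range(1, k))


def find_simple_repeats(seq, min_len=40, max_motif=4):
    """Return sorted (start, end, motif, units) for perfect tandem blocks.

    Assumption: a block qualifies when its whole repeat units span at
    least min_len bases. The reported motif is in the phase found.
    """
    seq = seq.lower()
    hits = []
    for k in range(1, max_motif + 1):
        min_units = -(-min_len // k)  # ceiling division
        pattern = re.compile(r"([acgt]{%d})\1{%d,}" % (k, min_units - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # e.g. 'gtgt' is skipped here; the block is reported for 'gt'
            if is_primitive(motif):
                units = (m.end() - m.start()) // k
                hits.append((m.start(), m.end(), motif, units))
    return sorted(hits)


def has_cpg(motif):
    """True if the tandemly repeated motif contains a CpG dinucleotide."""
    return "cg" in (motif.lower() * 2)


def weak_fraction(motif):
    """Fraction of W bases (a or t) in the motif."""
    motif = motif.lower()
    return sum(base in "at" for base in motif) / len(motif)


if __name__ == "__main__":
    # Made-up test sequence, not taken from any databank entry.
    example = "ccc" + "gt" * 22 + "ccc" + "gata" * 10 + "ccc"
    for start, end, motif, units in find_simple_repeats(example):
        print(start, end, motif, units, has_cpg(motif), weak_fraction(motif))
```

Scanning each motif length separately and discarding non-primitive motifs keeps a (gt)n block from being reported a second time as (gtgt)n. Phase-shifted and complementary-strand motifs are not merged into one class here.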


It may be possible to propose some rough sequence-based rules for the generation and/or maintenance of simple repeats in present-day genomes (databanks). These rules have to be scrutinized thoroughly when more unselected random sequence information is available. The mere representation of simple tandem repeats in the databanks argues for mechanisms that allow several motifs to expand considerably more efficiently than others, both with respect to primary establishment (incidence) and secondary elongation (increased length). Subsequently, the stability of simple repeats also appears to be dictated by their sequence contents and especially their genomic environments (see below).

Table 1. Frequent simple repeat blocks contain
No CpG dinucleotides
W (A/T) > S (G/C) bases
Perfect YN dinucleotide periodicity

3 Intragenic simple repeats vs. those in the genomic ‘desert’

So far, the vast majority of microsatellites in databanks and novel genome maps are more or less anonymous in relation to neighboring coding units. Positional relationships between these genomic constituents (i.e. fully integrated maps) will be clarified on a large scale after the completion of the human genome project. Interestingly, the number of genes in humans [17] and that of longer simple repeat blocks are of the same order of magnitude [18]. Yet extremely long perfect simple sequences, i.e. classical satellite DNAs [1], do not allow complete sequence analysis sensu stricto due to the limitations of the currently available methodology. In the accepted scenario where the overall distribution of simple repeats is more or less random, most of these repeat blocks would be situated in the vast intergenic spacers and in the large introns of the typically sized vertebrate genomes. But simple genetic redundancies are also recognized to an increasing extent in the extremely small eukaryotic and prokaryotic genomes [19]. Generalized and direct effects of the positioning of simple repeats with respect to coding sequences have not been recognized so far. Does the short- and long-term development of simple repeats in evolution reflect their localization in relation to genes? Several findings suggest that certain simple repeats may influence the regulation of gene expression.

A few examples of perfect simple repeats in the vicinity of coding sequences may suffice to illustrate possible relationships between coding genes and tandemly reiterated repeat blocks. A certain length repeat unit may always associate with a certain allele of a gene. Thus the principle of linkage can be exploited, e.g. for indirect gene diagnostics. For example, intronic (GT)n microsatellites in T cell receptor variable region elements (e.g., TCRBV6 genes) have been shown to be moderately polymorphic [20]. Interestingly, definable groups of microsatellite length alleles appear tightly linked to one or the other form of a biallelic exonic polymorphism. Thus these circumstances enable the indirect analysis of the protein coding alleles [20]. This principle of indirect gene diagnoses via microsatellites has been expanded and already exploited successfully in many different genes that are relevant for the controlled functioning of the human immune system. Different alleles of these genes may have delicately altered functions, and typing of microsatellite markers specific for these alleles therefore allows the delineation of functional differences. Collectively, typing of immune genes via intronic or gene-adjacent microsatellites has been termed ‘immunoprinting’ [21, 22]. By combining several reactions into multiplex PCR systems, many hundreds of patients and equal numbers of matched controls can be screened efficiently for genetic predisposition factors to frequent multifactorial diseases like rheumatoid arthritis (RA) of adults [22], early onset pauciarticular arthritis (EOPA) of children [23] and multiple sclerosis (MS) [24]. As mentioned above, indirect typing of the exonic alleles via intronic microsatellites may be applied not only to the locus immediately concerned but also extended across varying genomic distances, due to variable degrees of linkage in different parts of the human TCRBV gene complex containing some 68 BV elements.

Quite similar conclusions can also be drawn on an even broader experimental data basis for the major histocompatibility complex (MHC), i.e. the human leucocyte antigen (HLA) region in man, which includes, in addition to transplantation antigen and other genes, the tumor necrosis factor α gene (TNFA) locus. Here, a (GT)n microsatellite, termed TNFa, is located approximately 2 kbp upstream of the human lymphotoxin α (LTA) promoter region [24]. The TNFa microsatellite shows 15 different length alleles in >700 Caucasians, whereas 72 Bushmen and 139 South American Indians from seven tribes harbor only 10 and 8 alleles, respectively (Fig. 2A). The distributions of the allele frequencies vary considerably between Caucasians, Bushmen and South American Indians. Alleles TNFa10 and a11 are represented significantly more often in the German population than in Bushmen and South American Indians. Allele a11 is in linkage disequilibrium with HLA-DRB1*15, a major risk factor for MS. Furthermore, in Caucasians practically all TNFa microsatellite alleles are in linkage disequilibrium with certain HLA-DRB1 sequences [24], a consequence of a moderate-to-low mutation rate in this simple repeat block. Microsatellite mutation frequencies differ among the several loci scored so far [25]. Extremely unstable repeat blocks involving more complex mutation patterns, like minisatellites [26] or cryptic simple repeats (see below), have not been recognized close to coding genes so far. These observations could be interpreted to mean that deleterious phenotypic effects may result from mutations at intragenic repeats, which are to be prevented in evolutionarily successful species.

4 GAArbage, GAArnish or GAArotte?

Simple trinucleotide repeats are the least abundant in the databases among the blocks containing mono- to tetranucleotide motifs (see Fig. 1A). This fact is certainly not due to ascertainment biases but a reflection of their frequencies in the genomes. The true abundance of trinucleotide blocks may even be exaggerated: during the past 3-5 years, simple triplet repeats have been traced and pursued with utmost intensity because of the revelations in the context of trinucleotide diseases [15, 16]. In the causal pathogenesis of trinucleotide diseases, deleterious effects of ‘dynamic mutations’ [27] have been evidenced in carriers of certain intragenic triplet repeat blocks whenever these simple repeats are expanded over precisely definable thresholds [16]. Also with respect to the discussion points mentioned in the previous section, the following statement appears appropriate: the consequences of mutations may vary on a scale from selectively neutral (slippage in simple repeats of the ‘genomic desert’) to most harmful whenever trinucleotide repeats in mRNA-encoding transcription units exceed defined limits.

In family studies from controlled breedings of chicken, DNA fingerprinting with several different multilocus probes has revealed the expected orderly inheritance of fingerprint bands from the parents to their offspring [28]. Yet using probe (GAA)n or the longer complement (TCT)n, irregular transmission of bands was observed for one specific locus, whereas several other multilocus bands followed the Mendelian rules [28]. Such gross changes reflect deletions/insertions of up to several kilobases in length. Cloning and sequencing of this chicken locus revealed, in addition to a (GAA)n block, interesting sequence features including flanking direct and inverted repeats (which potentially form hairpins). Furthermore, in the rabbit, a similarly irregularly transmitted (GAA)n locus has been followed in extensive multilocus DNA fingerprinting exercises aimed primarily at relationship analyses (E. T. Epplen et al., unpublished data). Such loci are not yet known for the human species. In general, (GAA)n elements are comparatively rare in the human genome [29], especially in the vicinity of genes, but an imperfectly reiterated (GAA)n repeat has been identified in the 5’ untranslated region of the HLA-F gene [30, 31]. This repeat exhibits comparatively high mutability: approximately 1.5% (9/600) of the meioses show mutations (Santos, E. J. M., Epplen, J. T., Guerreiro, J. F., Epplen, C., submitted) due to small-scale slippage, resulting in loss or gain of one or two (GAA) units. Interestingly, most mutations occur in male meioses. So far no mutation was detected in the short allele range. Yet 90% of the alleles scored for (ir)regular transmission represented the longer (GAA)n blocks. As a consequence of this high mutability, linkage disequilibria cannot be demonstrated in the HLA complex using this highly informative microsatellite, which, in addition, exhibits some simple sequence crypticity. Accordingly, a rapid turnover at this locus could be expected on the population level. This effect can be extrapolated from the allele distribution patterns observed in three assorted populations: Caucasians, Bushmen and South American Indians (Fig. 2B). In contrast to Caucasians, in which 33 different length alleles were demonstrated in a cohort of 198 individuals, the smaller panels of Bushmen (n = 148) and South American Indians (n = 234) exhibit 24 and 16 alleles, respectively.

Theoretically, founder effects in individual South American Indian tribes could yield ‘sawtoothed’ distribution patterns for a limited time during population development. Indeed the distribution of the allele lengths in South American Indians is somewhat less smooth than that observed in the Caucasians. Due to the comparatively high mutation rate, the effect of strikingly predominating allele lengths vanishes rapidly within a few generations. Currently we are studying whether these frequencies in different populations allow differentiation between two basically contrasting modes of evolution in simple repeat loci: ‘stepwise’ mutations (slippage by one unit per event) vs. ‘saltatory’ changes comprising several simple repeat units or the whole locus at a time. These small-scale effects appear selectively neutral in evolutionary terms. Still it has to be investigated whether the smallest and the largest alleles have differential effects on gene expression, as do, e.g., the different minisatellite alleles in the insulin gene [32, 33].
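The small-scale slippage reported in this section for the HLA-F (GAA)n block can be illustrated with a toy simulation. The sketch below is not the authors' analysis. It takes the observed rate of 9/600 mutations per meiosis from the text and assumes, purely for illustration, that gains and losses are equally likely and that steps of one or two units are equally frequent. The function names are hypothetical.

```python
import random

# Rate quoted in the text: 9 mutations observed in 600 meioses.
OBSERVED_RATE = 9 / 600


def transmit(units, rng, rate=OBSERVED_RATE, max_step=2):
    """Pass an allele through one meiosis.

    Assumptions of this sketch: with probability `rate` the allele
    gains or loses 1..max_step units, with gain and loss equally likely.
    """
    if rng.random() >= rate:
        return units  # faithful transmission
    step = rng.randint(1, max_step)
    if rng.random() < 0.5:
        step = -step
    return max(1, units + step)  # keep at least one repeat unit


def simulate_lineage(units, generations, seed=0, rate=OBSERVED_RATE):
    """Follow a single allele for a number of generations."""
    rng = random.Random(seed)
    history = [units]
    for _ in range(generations):
        units = transmit(units, rng, rate)
        history.append(units)
    return history


if __name__ == "__main__":
    lineage = simulate_lineage(units=30, generations=1000, seed=1)
    print("start:", lineage[0], "end:", lineage[-1],
          "changes:", sum(a != b for a, b in zip(lineage, lineage[1:])))
```

A ‘saltatory’ mode could be explored in the same framework by allowing a much larger `max_step`. Which mode fits the population data is the open question raised in the text.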

Figure 2. Allele frequencies of di- and trinucleotide microsatellites in the HLA complex of Caucasians from Germany, South American (S.Am.) Indians from the Amazonas region in Brazil, and Bushmen from South Africa. (A) 15 different alleles of the TNFa (GT)n block situated 5’ of the lymphotoxin α (LTA) gene promoter and (B) 36 different alleles of the (GAA)n stretch located in the 5’ untranslated region of the HLA-F gene. Note the similarities in the overall frequency distributions in ethnic groups without direct recent migration possibilities (admixture).

Certain substantially elongated (GAA)n repeats may also herald very severe consequences for their carriers. Recently the gene responsible for Friedreich ataxia, an autosomal recessively inherited neurodegenerative disease usually manifesting in later youth, has been described and termed frataxin [34]. Subsequently it was supposed that the actual Friedreich ataxia gene (STM7) encodes a novel phosphatidylinositol-4-phosphate-5-kinase [35]. The ‘frataxin gene’ would then represent part of the STM7 gene. In addition to some rare point mutations in the protein coding region, in more than 98% of the cases the (GAA)n repeat in intron 18 of the STM7 gene (intron 1 of the frataxin gene) is elongated excessively to 200 to >900 trinucleotide units. Physiologically this simple repeat is only a polymorphism when harboring from 7 to 27 (GAA) motifs in Europeans (heterozygosity rate 0.7) [36]. The exact pathophysiological consequences of the deleterious expansions have not yet been elucidated, but major influences on the genomic architecture of the locus and blocking of mRNA processing can be invoked a priori. In this context, knowledge of the structural and protein binding properties of perfect (GAA)n blocks is mandatory for rational explanations. In our own experience, chemical modifications reveal characteristic secondary structures of a (GAA)n tract, as can be inferred from in vitro studies of genome-derived DNA fragments with and without supercoil stress in plasmid vectors and under various experimental conditions (Maueler, W., Kyas, A., Keyl, H. G., Epplen, J. T., submitted). Among these different conformations are, e.g., triple helices of the H-y5 type. Not surprisingly, these spectacular structures bind several proteins at the (GAA)n target sequence with various specificities (op. cit.). Again, excessively elongated (GAA)n blocks may succumb to additional unforeseeable phenomena with respect to their structure and functionality. Methodologically interesting is the fact that secondary structures of long (GAA)n tracts prevent hybridization in gels in situ (C. Epplen et al., in preparation). This unusual property of (GAA)n tracts depends not only on the length of the repeat but also on the adjacent sequences.
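The allele ranges quoted above for the Friedreich ataxia (GAA)n repeat, and the heterozygosity figure, can be illustrated with a minimal sketch. The thresholds are taken from the text (7 to 27 units as physiological polymorphism, 200 or more units in expansions). The function names, the "unclassified" label for the interval in between, and the example allele list are assumptions of this sketch. The heterozygosity shown is the expected value 1 - Σ p_i², which may differ from how the quoted rate of 0.7 was obtained.

```python
from collections import Counter

NORMAL_RANGE = (7, 27)   # (GAA) units described as a polymorphism in the text
EXPANDED_MIN = 200       # lower bound of the expansions described in the text


def classify_gaa(units):
    """Assign a repeat count to the ranges quoted in the text."""
    if NORMAL_RANGE[0] <= units <= NORMAL_RANGE[1]:
        return "normal"
    if units >= EXPANDED_MIN:
        return "expanded"
    return "unclassified"  # range not described in the text


def expected_heterozygosity(alleles):
    """Expected heterozygosity, 1 - sum of squared allele frequencies."""
    counts = Counter(alleles)
    total = sum(counts.values())
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


if __name__ == "__main__":
    # Made-up allele sample (repeat units per chromosome), for illustration.
    sample = [8, 9, 9, 9, 10, 12, 12, 18, 22, 650]
    print([classify_gaa(u) for u in sample])
    print(round(expected_heterozygosity(sample), 3))
```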

5 Polymorphism vs. hypervariability: HLA-DRB exon/intron relationships

A selected subset of dinucleotide repeats appears comparatively stable over long evolutionary periods in introns or in the vicinity of vertebrate genes. In comprehensive surveys, (GT)n and (GA)n elements have so far been considered separately in the evolution of mammals [37, 38]. Both of these simple repeat motifs are known to potentially exert substantial effects on the regulation of the expression of flanking genes, at least in in vitro test systems [39-41]. As mentioned above, composite (GT)n(GA)m blocks are by far the most abundant perfect simple repeats immediately adjacent to one another. Over the years, we have extensively studied the composite (GT)n(GA)m repeats situated in the second intron of the MHC class II DRB genes [42, 43]. These are perhaps the most perpetual extremes of the evolutionarily preserved composite repeats. Therefore their ‘flanking’ sequences, the protein encoding exons, should also be considered in relation to the simple repeats. Several of the genes located in the MHC encode cell surface proteins that present a plethora of different antigens to immunocompetent T lymphocytes [44]. Two major classes of MHC-encoded cell surface proteins have been characterized in detail: class I molecules are expressed ubiquitously, while the expression of class II molecules is restricted to those cells actively participating in specialized functions of the immune system. The HLA complex has been studied intensely and the genomic map of this 4 Mbp region is most advanced [45]. In humans there are five different DR haplotype groups (DR1, DR51, DR52, DR8 and DR53; see [45]). Each DR haplotype contains one DRA gene and a varying number of DRB genes and pseudogenes [45]. All DR haplotypes harbor an allele of the highly polymorphic DRB1 locus. Within haplotypic groups, subgroups have been delineated. MHC molecules are apparently selected for their ability to present many different antigens effectively to immuno-surveying T lymphocytes [44]. Consequently, high degrees of coding polymorphism could be expected and have been verified. The peak of the protein polymorphism is found in the second exons of the vertebrate MHC-DRB genes [45]. The respective portion of the MHC proteins makes all the steric contacts to the nominal antigen molecules in the form of the ‘antigen presenting groove’. This protein domain was therefore already expected to be especially variable in order to accommodate as many different antigens as possible with sufficient affinity to trigger immune responses.

On the other hand, noncoding polymorphisms have been characterized only to a limited extent in the HLA complex. The exceptions comprise mainly indirect (microsatellite) diagnoses of gene polymorphisms which are important for functions of the immune system, e.g. cytokines, HLA-F, etc. [46]. With regard to HLA-DRB1 genes, an earlier study has revealed that the exceptional polymorphism of exon 2 relates to the hypervariability of a neighboring composite microsatellite locus [42]. This microsatellite in the MHC-DRB1 genes is all the more interesting since not only the exon/intron architecture is exactly preserved among all vertebrates investigated but also the basic structure of the (GT)n(GA)m simple repeat [43]. Among >520 HLA-DRB1 alleles analyzed, >100 different types of microsatellites were observed [46].
The perfect (GT)n and (GA)m blocks vary in length and may be ‘degenerated’ in part, mostly in a subgroup-specific manner. Interestingly, the extent of microsatellite diversity varies in given DRB1 alleles. While the microsatellites of the DRZ DR9 alleles and in the DR1 group are virtually invariant, in DR4 and DR13 in particular, simple repeats appear hypervariable with at least 15 or 17 different length alleles, respectively. Comparing Caucasians, Bushmen and South American Indians, the microsatellite variation in identical DRB1 alleles (e.g. DRB1*0102, 03011, 1302) is smaller than within any of the DR groups in Caucasians. Taken together, extremely polymorphic DRB1 exons evolve in concert with certain variants of an exceptionally well-preserved microsatellite. Nuclear proteins bind differentially to this composite repeat [47, 48]. In addition to being useful for some special DRB typing purposes, the ontology and evolutionary development of this composite repeat block argue for its own biological meaning, i.e. with respect to directing/regulating genomic recombination. These conclusions are supported by the observation that in HLA-DRBψ genes (pseudogenes) the perfect (GT)n(GA)m organization is lost concomitantly when the gene function has subsided (see EMBL databank entries). Obviously these simple repeats then develop into crypticity, i.e., the repetitive nature of the sequence can still be recognized, albeit length and perfection are gradually lost. It is not yet clear whether the rules established for one such locus are valid for the repeat developments (higher-order organization) in other locations containing different repeats and different flanking sequences.

6 Immunoprinting predisposition genes for MS, a multifactorial disease

The future challenge in DNA diagnostics concerns a number of disorders that are frequent and complex-inherited, i.e. multifactorial diseases. Like RA (prevalence 1%), MS represents such an enigmatic, not infrequent autoimmune disorder in peoples of the Northern hemisphere (prevalence 0.1%). The genetic contribution to such diseases is variable and the nongenetic influences are practically unknown as yet. Genetic components in RA and MS are evident from increased relative risks in siblings (λs 6-10 and 20-40, respectively) compared to the general population. Theoretically, a single or quite a few major gene effects or, alternatively, many minor participating genes may contribute to the manifestation of the disease. The nature of the ‘genetic defect’ may range from an apparently physiological polymorphism, with no negative influence by itself, to one that is only manifested in combination with one or more other polymorphisms that obviously affect the function(s) of the gene in question. The causal pathogenesis of MS itself, i.e., the ill-directed immune response to potentially different self antigens, is not yet well understood. On the other hand, many different components of the human immune system appear to be involved in the early phases of the disease as well as its maintenance, including the variably chronic course and its possible acute exacerbations. A simplified view of these functional relationships and the molecules involved is presented in Fig. 3. The genes encoding relevant cell surface molecules, interleukins, receptors, etc., are dealt with below, including their polymorphisms. In order to cover as many genetic polymorphisms of these genes as possible, we have developed an efficient, indirect microsatellite typing approach which has been termed ‘immunoprinting’ [21].

Genome-wide screenings using microsatellites for chromosomal regions that predispose to MS have been performed recently, but these whole-scale searches have met with a number of theoretical and practical difficulties in the three separate studies reported to date [49-51]. A parallel candidate gene approach was targeted at the genomic regions syntenic to those of a rodent animal model [52]. For a number of different reasons, MS models in inbred mice and rats were demonstrated to lack informativity for the actual situation in man, as had been anticipated. Because of the considerable experimental risks involved a priori in the aforementioned strategies, we embarked on a different strategy and employed microsatellites in and around candidate genes to reveal genetic predisposition factors to MS. Thus ‘immunoprinting’ [21] was applied to a panel of more than 600 MS patients and the respective number of controls. Increasingly larger panels have to be screened for many different genetic markers in order to arrive at conclusions that stand the necessary statistical tests.
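The sibling relative risks quoted in this section can be related to absolute recurrence risks with a short worked example. The sketch assumes the conventional definition λs = (risk to siblings of an affected individual) / (population prevalence). The text does not spell this definition out, so it is an assumption here, as is the function name. The prevalences and λs ranges are the ones given in the text.

```python
def sibling_risk(prevalence, lambda_s):
    """Absolute sibling recurrence risk under the assumed definition
    lambda_s = sibling risk / population prevalence."""
    return prevalence * lambda_s


# Values quoted in the text: (prevalence, lower lambda_s, upper lambda_s).
DISEASES = {
    "RA": (0.01, 6, 10),
    "MS": (0.001, 20, 40),
}

if __name__ == "__main__":
    for name, (prevalence, low, high) in DISEASES.items():
        print("%s: sibling risk %.1f%% to %.1f%%" % (
            name,
            100 * sibling_risk(prevalence, low),
            100 * sibling_risk(prevalence, high),
        ))
```

Under these assumptions the higher λs of MS still corresponds to a lower absolute sibling risk than in RA, because MS is about tenfold rarer in the general population.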

The components depicted in the sketch of Fig. 3 have been included in the ‘immunoprinting’ scheme employed here. Clearly, the HLA-DRB1 genes either contribute directly to the risk of developing MS or are tightly linked to the predisposing factors. While DRB1*15 individuals run a 3.7-fold risk of MS (pc < 10-’) compared to the average, the increased risk of DRB1*03 individuals is hardly recognizable (1.4-fold) and is not significant after correction for multiple comparisons (pc < 0.8) [24]. For DRB1*03 individuals, the relative risk

Figure 3. ‘Causal pathogenesis’ of MS: a simplified view. Following unknown signals, lymphocytes cross the barrier into the brain tissue. The myelin sheaths of the axons are attacked and destroyed by macrophages after activation by T lymphocytes which have specifically recognized their nominal autoantigens (MBP/PLP/MOG: myelin basic protein/proteolipid protein/myelin-associated oligodendrocyte glycoprotein). Various soluble factors (FGF, IFNB, IFNG, IL1, IL2, TNF, etc.) are secreted to activate mechanisms that result in chronic demyelination without sufficient remyelination.


(RR) to develop MS increases >22-fold when they carry a certain allele of the TCRBV6S3 gene [24]. On the other hand, DRB1*15 patients are characterized by a linkage disequilibrium pattern of TCRBV6S1 to TCRBV6S3 that differs substantially from the one observed in healthy controls [24]. In addition to these molecules immediately involved in antigen recognition, polymorphisms of lymphokines may predispose to MS: a promoter polymorphism in the TNFA gene leads to higher constitutive expression. It is overrepresented in HLA-DRB1*03 individuals, both in patients and controls [24]. Furthermore, a novel DRB1/LTA haplotype correlates with disease progression (Epplen, C., submitted). Similarly, a certain interferon β polymorphism may confer protection from MS (Epplen, C., submitted). Consequently, it is already obvious that the genetic contributions to MS manifestation are particularly complex, as expected from the beginning of the study. In addition, many more genetic intricacies may be revealed when the course of the disease is also taken into consideration. The massive workload involved in these predisposition studies of multifactorial diseases necessitates a high degree of experimental automation. The required efficiency can only be reached by employing whole batteries of closely linked microsatellites in all the interesting candidate loci, so as to simplify the technical procedure as far as possible.
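The relative risks and corrected p values quoted above can be illustrated with a minimal sketch. It assumes that RR is estimated as the cross-product (odds) ratio of a 2x2 case-control table with a Woolf-type confidence interval, and that the correction for multiple comparisons is a Bonferroni-style multiplication by the number of alleles tested; the text does not state which estimator or correction was actually used. All function names and counts below are hypothetical and are not taken from the study.

```python
import math

def odds_ratio(cases_with, cases_without, controls_with, controls_without):
    """Cross-product ratio of a 2x2 table, used here as the RR estimate (assumption)."""
    return (cases_with * controls_without) / (cases_without * controls_with)

def woolf_ci(cases_with, cases_without, controls_with, controls_without, z=1.96):
    """Approximate 95% confidence interval computed on the log scale (Woolf's method)."""
    log_or = math.log(odds_ratio(cases_with, cases_without,
                                 controls_with, controls_without))
    se = math.sqrt(1 / cases_with + 1 / cases_without
                   + 1 / controls_with + 1 / controls_without)
    return math.exp(log_or - z * se), math.exp(log_or + z * se)

def corrected_p(p, n_comparisons):
    """Bonferroni-style correction: multiply p by the number of comparisons, cap at 1."""
    return min(1.0, p * n_comparisons)

if __name__ == "__main__":
    # Hypothetical counts for one allele: carriers / non-carriers among
    # patients and controls. These are illustrative numbers only.
    table = (120, 180, 50, 250)
    print("RR estimate:", round(odds_ratio(*table), 2))
    print("95% CI:", tuple(round(x, 2) for x in woolf_ci(*table)))
    # Hypothetical: an uncorrected p of 0.04 after testing 20 alleles.
    print("corrected p:", corrected_p(0.04, 20))
```

With many alleles tested per locus, a nominally significant p value can easily lose significance after such a correction, which is the situation described for DRB1*03 in the text.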

7 HLA and disease association: functional aspects vs. linkage disequilibrium

HLA/disease associations have been described for a variety of autoimmune disorders like RA, EOPA, and MS. The MHC molecule presents self or foreign peptides to immunocompetent T cells, which not only recognize the antigen itself but also require specific contact with the host’s own MHC molecule (MHC restriction). Hence certain autoantigens may be presented in an immunogenic form by only a single HLA allele or by several HLA alleles which share regions critical for peptide binding or TCR/MHC contact residues. Such critical regions (amino acid residues 67-86) have been defined for disease-predisposing HLA-DRB1 alleles in RA (DRB1*0401, 0404, 0408, 0101, 1302), a fact which led to the ‘shared epitope’ hypothesis and which suggested presentation of a common autoantigen. In contrast, disease association with the DRB1*08 and DRB1*11 alleles in EOPA as well as DRB1*15 and DRB1*03 in MS cannot be explained by shared epitopes. Distinct disease entities or associations due to linked genes are more likely in these autoimmune diseases. Components of the myelin sheath such as myelin basic protein or proteolipid protein may be presented by distinct HLA alleles, and they may be immunogenic only for certain TCR alleles. Thus the pathological condition probably depends on certain combinations of HLA and TCR alleles, in different autoimmune diseases but also within the same one. Increased RR for HLA and TCR allelic combinations in RA, EOPA, and MS are depicted in Fig. 4.

Candidate genes in linkage disequilibrium with certain DRB1 alleles are proinflammatory cytokines such as TNFα and LTα. Alleles of a microsatellite (TNFa) located 2 kbp 5’ of the LTα gene were found in strong linkage disequilibrium with DRB1 alleles and a TNFα promoter

Figure 4. HLA and disease association. Relative risks (RR) in RA, MS and EOPA for HLA-DRB1 alleles in combination with TCR alleles and with TNFa microsatellite alleles.


allele linked to the DRB1*0301 haplotype [24]. Thus the question arose whether an increased RR attributed to certain TNFa microsatellite alleles is primary, secondary or additional to the HLA association. In MS, disease association with the TNFa11 allele is secondary to the DRB1*15 association, and no increased RR was observed for DRB1*03 patients carrying the TNFa2 allele. Nevertheless, this haplotype also bears a promoter polymorphism of the TNFA gene, leading to higher constitutive expression of TNFα, which might influence disease outcome [24]. In RA the same microsatellite allele, which is not linked to the DRB1*04 allele, increases the RR of acquiring RA. In contrast, in EOPA patients the TNFa5-7 microsatellite alleles provide an additional risk factor in DRB1*11/12 individuals (Fig. 4).
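Whether a marker association is primary or merely secondary to a linked HLA allele can be examined by stratification, as the following minimal sketch suggests. It assumes that RR is estimated as the odds ratio of a 2x2 table and that patients and controls are split by carrier status of the HLA allele; this is one common way to approach the question, not necessarily the procedure used in [24]. The counts are hypothetical and were chosen so that the marker is enriched among allele carriers but has no effect within either stratum.

```python
from typing import Dict, Tuple

# (cases with marker, cases without, controls with marker, controls without)
Counts = Tuple[int, int, int, int]

def odds_ratio(counts: Counts) -> float:
    """Cross-product ratio of a 2x2 table, used here as the RR estimate (assumption)."""
    a, b, c, d = counts
    return (a * d) / (b * c)

def pooled(strata: Dict[str, Counts]) -> Counts:
    """Sum the 2x2 tables over all strata to obtain the unstratified (crude) table."""
    return tuple(sum(column) for column in zip(*strata.values()))

# Hypothetical counts, NOT taken from the study: a microsatellite allele that is
# in linkage disequilibrium with an HLA allele but has no effect of its own.
strata = {
    "HLA allele present": (80, 20, 40, 10),
    "HLA allele absent": (10, 90, 20, 180),
}

crude_rr = odds_ratio(pooled(strata))
stratum_rr = {name: odds_ratio(counts) for name, counts in strata.items()}

if __name__ == "__main__":
    print("crude RR estimate:", round(crude_rr, 2))
    for name, value in stratum_rr.items():
        print(name, "->", round(value, 2))
```

A crude estimate clearly above 1 combined with stratum-specific estimates near 1 points to a secondary association; an estimate that stays elevated within the strata would instead suggest an additional risk factor. Real data would also need a guard against empty cells, which this sketch omits.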

8 Concluding remarks

The mutation rates in simple repeat blocks vary by more than five orders of magnitude (10⁰/10⁻⁶). Immediate implications concern the varying information content of different microsatellite loci for the surrounding genomic territories. Massive indirect, albeit imperishable, information on genetic conditions can be retrieved via linkage/association of genes to these still partly anonymous markers. Besides these application-oriented aspects of microsatellite markers, phenotypic effects of different allelic forms of selected simple repeats are already evident today, both in model organisms and in humans. This fact may have considerable implications, e.g., for the development and course of the frequent multifactorial diseases. Therefore, before the full potential of these tools is exploited, some consensus should be reached to avoid generating unnecessary knowledge about the genetic status of individual human beings without their informed consent.

We thank our departmental members for discussions of the manuscript. This work was supported by stipends (to EJMS and CE) from the DAAD and from the state of North Rhine-Westphalia, respectively, as well as a Humboldt/FRD research award (South Africa, to JTE).

9 References

[1] Skinner, D., Bioscience 1977, 27, 790-796.
[2] Tautz, D., in: Pena, S. D. J., Chakraborty, R., Epplen, J. T., Jeffreys, A. J. (Eds.), DNA Fingerprinting: State of the Science, Birkhäuser, Basel, pp. 21-28.
[3] Warren, R., Hauman, J., Beyers, N., Richardson, M., Schaaf, H. S., Donald, P., van Helden, P., S.A. Med. J. 1996, 86, 45-49.
[4] Litt, M., Luty, J. A., Am. J. Hum. Genet. 1989, 44, 397-401.
[5] Tautz, D., Nucleic Acids Res. 1989, 17, 6463-6471.
[6] Weber, J. L., May, P. E., Am. J. Hum. Genet. 1989, 44, 388-396.
[7] Levinson, G., Gutman, G. A., Mol. Biol. Evol. 1987, 4, 203-221.
[8] Dietrich, W. F., Miller, J., Steen, R., Merchant, M. A., Damron-Boles, D., Husain, Z., Dredge, R., Daly, M. J., Ingalls, K. A., O'Connor, T. J., Evans, C. A., DeAngelis, M. M., Levinson, D. M., Kruglyak, L., Goodman, N., Copeland, N. G., Jenkins, N. A., Hawkins, T. L., Stein, L., Page, D. C., Lander, E. S., Nature 1996, 380, 149-152.
[9] Dib, C., Fauré, S., Fizames, C., Samson, D., Drouot, N., Vignal, A., Millasseau, P., Marc, S., Hazan, J., Seboun, E., Lathrop, M., Gyapay, G., Morissette, J., Weissenbach, J., Nature 1996, 380, 152-154.
[10] Epplen, J. T., Buitkamp, J., Bocker, T., Epplen, C., Gene 1995, 159, 49-55.
[11] Ali, S., Muller, C. R., Epplen, J. T., Hum. Genet. 1986, 74, 239-243.
[12] Epplen, C., Melmer, G., Siedlaczck, I., Schwaiger, F.-W., Maueler, W., Epplen, J. T., in: Pena, S. D. J., Chakraborty, R., Epplen, J. T., Jeffreys, A. J. (Eds.), DNA Fingerprinting: State of the Science, Birkhäuser, Basel 1993, pp. 29-45.
[13] Epplen, C., Epplen, J. T., Hum. Genet. 1994, 93, 35-41.
[14] Ohno, S., in: Pfeiffer, R. A. (Ed.), Modern Aspects of Cytogenetics: Constitutive Heterochromatin in Man, Schattauer, Stuttgart 1972, pp. 169-180.
[15] Ashley, C. T., Warren, S. T., Annu. Rev. Genet. 1995, 29, 703-728.
[16] Warren, S. T., Science 1996, 271, 1375-1375.
[17] Bird, A. P., Trends Genet. 1995, 11, 94-100.
[18] Epplen, J. T., J. Hered. 1988, 79, 409-417.
[19] Hancock, J. M., Nature Genet. 1996, 14, 14-15.
[20] Gomolka, M., Epplen, C., Buitkamp, J., Epplen, J. T., Immunogenet. 1993, 37, 257-265.
[21] Epplen, J. T., Hum. Genet. 1992, 331-341.
[22] Gomolka, M., Menninger, H., Saal, J. E., Lemmel, E.-M., Epplen, J. T., Epplen, C., J. Mol. Med. 1995, 73, 19-29.
[23] Epplen, C., Rumpf, H., Albert, E., Haas, P., Truckenbrodt, H., Epplen, J. T., Eur. J. Immunogenet. 1995, 22, 311-322.
[24] Epplen, C., Jackel, S., Santos, E. J. M., D'Souza, M., Poehlau, D., Dotzauer, B., Sindern, E., Haupts, M., Rude, K.-P., Weber, F., Stover, J., Poser, S., Gehlen, W., Malin, J.-P., Przuntek, H., Epplen, J. T., Ann. Neurol. 1997, 41, 341-352.
[25] Weber, J. L., Wong, C., Hum. Mol. Genet. 1993, 8, 1123-1128.
[26] Dubrova, Y. E., Nesterov, V. N., Krouchinsky, N. G., Ostapenko, V. A., Neumann, R., Neil, D. L., Jeffreys, A. J., Nature 1996, 380, 683-686.
[27] Sutherland, G. R., Richards, R. I., Proc. Natl. Acad. Sci. USA 1995, 92, 3636-3641.
[28] Epplen, J. T., Ammer, H., Kammerbauer, C., Schwaiger, W., Schmid, M., Nanda, I., Adv. Mol. Genet. 1991, 4, 301-310.
[29] Siedlaczck, I., Epplen, C., Rieß, O., Epplen, J. T., Electrophoresis 1993, 14, 973-977.
[30] Geraghty, D. E., Wei, X., Orr, H. T., Koller, B. H., J. Exp. Med. 1990, 171, 1-18.
[31] Raha-Chowdhury, R., Bown, D. J., Worwood, M., Hum. Genet. 1996, 97, 228-231.
[32] Vafiadis, P., Bennett, S. T., Todd, J. A., Nadeau, J., Grabs, R., Goodyer, C. G., Wickramasinghe, S., Colle, E., Polychronakos, C., Nature Genet. 1997, 15, 289-292.
[33] Pugliese, A., Zeller, M., Fernandez, A., Jr., Zalcberg, L. J., Bartlett, R. J., Ricordi, C., Pietropaolo, M., Eisenbarth, G. S., Bennett, S. T., Patel, D. D., Nature Genet. 1997, 15, 293-297.
[34] Campuzano, V., Montermini, L., Molto, M. D., Pianese, L., Cossee, M., Cavalcanti, F., Monros, E., Rodius, F., Duclos, F., Monticelli, A., Zara, F., Canizares, J., Koutnikowa, H., Bidichandani, S. I., Gellera, C., Brice, A., Trouillas, P., De Michele, G., Filla, A., De Frutos, R., Palau, F., Patel, P. I., Di Donato, S., Mandel, J.-L., Cocozza, S., Koenig, M., Pandolfo, M., Science 1996, 271, 1423-1427.
[35] Carvajal, J. J., Pook, M. A., dos Santos, M., Doudney, K., Hillermann, R., Minogue, S., Williamson, R., Hsuan, J. J., Chamberlain, S., Nature Genet. 1996, 14, 157-162.
[36] Epplen, C., Epplen, J. T., Frank, G., Miterski, B., Santos, E. J. M., Schols, L., Hum. Genet. 1997, 99, 834-836.
[37] Stallings, R. L., Ford, A. F., Nelson, D., Torney, D. C., Hildebrand, C. E., Moyzis, R. K., Genomics 1991, 10, 807-815.
[38] Stallings, R. L., Genomics 1995, 25, 107-113.
[39] Hamada, H., Seidman, M., Howard, B. H., Gorman, C. M., Mol. Cell. Biol. 1984, 4, 2622-2630.
[40] Lu, Q., Wallrath, L. L., Granok, H., Elgin, S. C., Mol. Cell. Biol. 1993, 13, 2802-2814.
[41] Tsukiyama, T., Becker, P. B., Wu, C., Nature 1994, 367, 525-532.
[42] Rieß, O., Kammerbauer, C., Roewer, L., Steimle, V., Andreas, A., Albert, E., Nagai, T., Epplen, J. T., Immunogenet. 1990, 32, 110-116.
[43] Schwaiger, F.-W., Epplen, J. T., Immunol. Rev. 1995, 143, 199-224.
[44] Klein, J., Natural History of the Major Histocompatibility Complex, Wiley, New York.
[45] Svensson, A.-C., Setterblad, N., Pihlgren, U., Rask, L., Andersson, G., Immunogenet. 1996, 43, 304-314.
[46] Epplen, C., Santos, E. J. M., Guerreiro, J. F., Helden, P. V., Epplen, J. T., Hum. Genet. 1997, 99, 399-406.
[47] Maueler, W., Frank, G., Muller, M., Epplen, J. T., J. Cell. Biochem. 1994, 56, 74-85.
[48] Epplen, J. T., Kyas, A., Maueler, W., FEBS Lett. 1996, 389, 92-95.
[49] Sawcer, S., Jones, H. B., Feakes, R., Gray, J., Smaldon, N., Chataway, J., Robertson, N., Clayton, D., Goodfellow, P. N., Compston, A., Nature Genet. 1996, 13, 464-468.
[50] The Multiple Sclerosis Genetics Group, Nature Genet. 1996, 13, 469-471.
[51] Ebers, G. C., Kukay, K., Bulman, D. E., Sadovnick, A. D., Rice, G., Anderson, C., Armstrong, H., Cousin, K., Bell, R. B., Hader, W., Paty, D. W., Hashimoto, S., Oger, J., Duquette, P., Warren, S., Gray, T., O’Connor, P., Nath, A., Auty, A., Metz, L., Francis, G., Paulseth, J. E., Murray, T. J., Prys-Philipps, W., Nelson, R., Freedman, M., Brunet, D., Bouchard, J.-P., Hinds, D., Risch, N., Nature Genet. 1996, 13, 472-476.
[52] Kuokkanen, S., Sundvall, M., Terwilliger, J. D., Tienari, P. J., Wikstrom, J., Holmdahl, R., Petterson, U., Peltonen, L., Nature Genet. 1996, 13, 477-480.
