DNA sequence comparison of micropia transposable elements from Drosophila hydei and Drosophila melanogaster
Chromosoma (Berl) (1990) 99:111-117
CHROMOSOMA © Springer-Verlag 1990
Dirk-Henner Lankenau, Peter Huijser*, Erik Jansen, Koos Miedema, and Wolfgang Hennig
Department of Molecular and Developmental Genetics, Catholic University, Toernooiveld, NL-6525 ED Nijmegen, The Netherlands
Received October 1, 1989 / in revised form January 19, 1990
Accepted January 19, 1990 by H. Jäckle
Abstract. Members of the retrotransposon family micropia were discovered as constituents of wild-type Y chromosomal fertility genes from Drosophila hydei. Several members of the micropia family have subsequently been recovered from Drosophila melanogaster, and four micropia elements, micropia-DhMiF2, -DhMiF8, -Dm11 and -Dm2, two each from D. hydei and D. melanogaster, have been completely sequenced (17 kb of micropia sequences and 6.8 kb from insertions)¹. Comparative analysis of micropia sequences revealed a complex pattern of divergence within a single Drosophila genome. The divergence includes deletions, possibly by a slipped mispairing mechanism; insertions of a retroposon and of another retrotransposon (copia); and "positional nucleotide shuffling" within the tandem repeats of the 3' non-protein-coding region of micropia elements. A 10 bp long sequence in each repeat unit of the 3' tandem repeats of micropia elements is highly conserved and is therefore a candidate for functional importance, either in transposition events or in regulatory activity on flanking DNA sequences.
Introduction
The putative biological functions of transposable elements in the regulation of cellular genes were already considered by McClintock (1956). Whether transposons became regulatory modules and pacemakers of wild-type genes in the course of evolution has not yet been confirmed. It has been pointed out that none of the genuine transposable elements analysed so far is an integral part of wild-type genes (Schwarz-Sommer and Saedler 1987). Recently, however, retrotransposons of the micropia family have been identified as natural constituents of fertility genes of Drosophila hydei (Huijser et al. 1988). Micropia element transcripts are part of the giant Y chromosomal transcription units of the lampbrush loops "Threads" and "Pseudonucleolus" in primary spermatocytes. To approach an understanding of the function of micropia elements within these wild-type fertility genes, we compared the evolutionary changes of two micropia elements microdissected from the lampbrush loops Threads of D. hydei (Hennig et al. 1983; Huijser et al. 1988) and of two randomly chosen micropia elements from Drosophila melanogaster (Lankenau et al. 1988; D.-H. Lankenau, unpublished results) by sequence comparison. Under certain conditions, as outlined in this paper, the evolutionary differences and similarities can serve as experiments of nature to identify putative functional sequences, because functionally important sequences are expected to be conserved. Of particular interest is the 3' tandem repeat region. We identified sequences of 10 bp within each repeat unit of all micropia elements which are highly conserved. They are therefore candidates for functional importance, either in transposition events or in regulatory activity on flanking DNA sequences. Additionally, molecular drive events which have taken place in members of the micropia family are described.
* Present address: Max-Planck-Institut für Züchtungsforschung, Egelspfad 3, D-5000 Köln 30, Federal Republic of Germany
Abbreviations: LTR long terminal repeat; PBS primer binding site; PolII RNA polymerase II; bp base pairs; kb kilobases (pairs); LINE long interspersed sequence; MHC major histocompatibility complex
¹ The DNA sequence of micropia-Dm2 has not been published (EMBL sequence data library accession no. X14173). The other sequences have been published by Huijser et al. (1988) and Lankenau et al. (1988) (accession no. X14037).
This paper is dedicated to the 90th birthday of Prof. Dr. Bernhard Rensch.
Offprint requests to: W. Hennig
Materials and methods
Molecular techniques. Isolation of nucleic acids was carried out according to standard protocols (Maniatis et al. 1982). DNA blotting, labelling by nick translation and hybridization are described by Hennig et al. (1982) and Huijser and Hennig (1987). DNA sequencing was performed by the dideoxy chain termination method of Sanger et al. (1977) as described (Lankenau et al. 1988).
Computer analysis. The analysis of DNA sequences was performed with the aid of computer programs from Pustell and Kafatos (1984) and a Turbo-Pascal program package from C.R. Lankenau. Dot matrices were computed as described (Pustell and Kafatos 1982). Codon bias and coding prediction analysis was done with the Cstatistics program of Pustell and Kafatos (1986). DNA sequence data were taken from GenBank release 52.0. RNY-rhythm analysis was done according to Shepherd (1981). CAP site-TATA box correlation analysis was used to produce constraint profiles of RNA polymerase II (PolII) promoters as described (Lankenau et al. 1988, 1989). The retroposon inserted into micropia-Dm2 was screened against the EMBL sequence data library with the aid of the search program FASTN (Lipman and Pearson 1985).
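The dot-matrix comparison named in the computer analysis can be illustrated with a minimal sketch. This is not the Pustell and Kafatos program: the function names `dot_matrix` and `render`, the window length and the match threshold are assumptions chosen for illustration only. A dot is recorded wherever two windows of equal length agree at a minimum number of positions.

```python
def dot_matrix(seq_a, seq_b, window=5, min_matches=4):
    """Return (i, j) window start positions where the window of seq_a at i
    and the window of seq_b at j agree at >= min_matches positions.
    window and min_matches are illustrative defaults, not published values."""
    dots = []
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            matches = sum(
                a == b
                for a, b in zip(seq_a[i:i + window], seq_b[j:j + window])
            )
            if matches >= min_matches:
                dots.append((i, j))
    return dots


def render(dots, len_a, len_b, window):
    """Draw the matrix as text: one row per seq_a window, one column per
    seq_b window, '*' for a dot and '.' otherwise."""
    hits = set(dots)
    rows = []
    for i in range(len_a - window + 1):
        rows.append(
            "".join("*" if (i, j) in hits else "."
                    for j in range(len_b - window + 1))
        )
    return "\n".join(rows)


if __name__ == "__main__":
    # Hypothetical toy sequence containing a direct repeat of ACGT.
    toy = "ACGTACGT"
    found = dot_matrix(toy, toy, window=4, min_matches=4)
    print(render(found, len(toy), len(toy), 4))
```

In a self-comparison the main diagonal is always filled; off-diagonal dots, as in the toy sequence here, indicate internal repeats such as tandem units.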
Results
An overview of rearrangements in micropia elements
Minor modifications. Among the four sequenced micropia elements (Fig. 1), micropia-Dm11 shares most of the properties of an intact retrotransposon (Lankenau et al. 1988). However, this element also has two defects which might destroy its ability to transpose autonomously. The first defect is a deletion of 30 nucleotides in the 5' LTR, destroying the CCAAT box. This deletion, like most others in different micropia elements, is flanked by short direct repeats (Fig. 2). The second defect is a 4 bp deletion within the integrase coding region, causing a reading frame shift that does not exist in the other micropia elements (Fig. 1) (Lankenau et al. 1988). Many other reading frame defects are based on single nucleotide mutations in micropia-DhMiF2, -DhMiF8 and -Dm2 (Huijser et al. 1988). In the 3' non-protein-coding region and also in the coding sequences of the elements we find deletions which may be based on a "slipped mispairing" mechanism (Figs. 2 and 4), as has been suggested to account for some deletions in human globin genes (Efstratiadis et al. 1980). Statistically it is unlikely that short direct repeats (4-8 bp) always occur in the flanking sequences close to the deleted fragments just by chance. Therefore one might suggest that the direct repeats promoted deletions, perhaps by slipped mispairing during DNA replication according to the model of Efstratiadis et al. (1980) (Fig. 2).
[Figure 1: schematic maps of micropia-Dm11 and micropia-Dm2 (Drosophila melanogaster) and of micropia-DhMiF2 and micropia-DhMiF8 (Drosophila hydei), showing the LTRs, primer binding sites, PROT, RT, RNase, INT, the MHC-like region, the 3' tandem repeats, ORFs, EcoRI (E) and HindIII (H) sites, and the copia and retroposon insertions; scale bars 1 kb and 100 bp.]
Fig. 1. Macro-alignment of four micropia elements from Drosophila hydei and Drosophila melanogaster. All positions are exactly defined at the DNA sequence level. A detailed structural analysis of one element, micropia-Dm11, has been given by Lankenau et al. (1988). The most conserved part of all elements includes the protease (PROT) and parts of the reverse transcriptase (RT). The 3' non-protein-coding region, including the 3' tandem repeats (tandem; cf. Fig. 5), is highly diverged (compare Fig. 4). The long terminal repeats (LTRs) of the D. melanogaster micropia-Dm11 and -Dm2 elements are homologous and the LTRs of the D. hydei micropia-DhMiF2 and -DhMiF8 elements are at least partially homologous. In contrast, the LTRs of D. hydei and D. melanogaster share no sequence similarity. LTRs of micropia-Dm11 and -DhMiF8 both possess internal tandemly duplicated sequences (arrowheads) which are not homologous between the two species. The most obvious rearrangements of these elements are (1) a short deletion within the 5' LTR of micropia-Dm11, (2) the insertion of a complete copia element and a retroposon into micropia-Dm2, (3) an abrupt ending at the 3' end of micropia-Dm2, perhaps caused by another insertion of an unidentified transposable element, (4) a large unidentified insertion imprecisely flanked by 19 bp duplications in micropia-DhMiF8 (Huijser et al. 1988), and (5) a shorter 3' end of micropia-DhMiF8 followed by a poly(T/A) tail which might belong to another unidentified retroposon or to DhMiF8 itself. The reading frame shifts shown for D. melanogaster are explained in the text. An alignment gap was introduced because of a shorter 3' non-protein-coding sequence region in micropia-DhMiF8 (no deletion can be observed). Target site duplications are indicated for the copia element and micropia-Dm11. The MHC region is only similar to the major histocompatibility complex genes of mammals, and is most likely not homologous (Lankenau et al. 1988). put putative; ORF open reading frames of micropia-Dm11 and -Dm2; pbs primer binding site; prr purine-rich region; f CCHC finger motif of retroelements; t tether; RNase homology to bacterial RNase H; INT integrase; E EcoRI; H HindIII; wavy lines show the sequenced parts of copia inserted into micropia-Dm2; E and H in copia are mapped restriction sites which are consistent with the published sequence of copia (Emori et al. 1985)
[Figure 2: sequence alignments around the major deletions, with panels comparing the Dm11 5' LTR with the Dm11 3' LTR, Dm11 with Mi2, Dm2 with Mi2, and Mi8 with Mi2 and Dm11; slippages 1, 3 and 4 of Fig. 4 are marked.]
Fig. 2. The major deletions in micropia elements. Ten major deletions of 14 to 300 bp were detected. Eight of these are flanked by short direct repeats of between 4 and 8 bp (two of them are flanked only by 2 bp direct repeats, and are therefore not significant). The distance of the repeats from the site of deletion does not exceed 12 nucleotides, while the chance of such a duplication is only 4^-n (where n is the length of one repeat unit). Since eight of the ten deletions possess direct repeats with a length of 4 bp (five times), 5 bp (twice) and 6 bp (once), which statistically would be expected once in 256, 1024 and 4096 bp, a significant correlation seems to exist between deletions and short direct repeats. This holds all the more when it is taken into account that the accuracy of the alignments is reduced by evolutionary divergence. Seven of the ten deletions remove one of the repeats entirely and either none or part of the other repeat. This pattern is very similar to that observed by Farabaugh and Miller (1978) and Efstratiadis et al. (1980). These authors point out that the presence of direct repeats could promote deletions by slipped mispairing during DNA replication according to a model proposed by Streisinger et al. (1966). Compare also with the slippages shown in Figure 4. The deletion in the Dm11 5' LTR has been confirmed by sequencing several different clones originating from different transformations. Mi2 and Mi8 micropia-DhMiF2 and -DhMiF8, respectively; Dm11 and Dm2 micropia-Dm11 and -Dm2; numbers within dotted lines indicate the number of nucleotides not shown. T2 tandem repeat cluster 2 of micropia-DhMiF2. The numbers in front of each sequence represent the published sequence numbers. For micropia-Dm2 the numbers refer to the element without the copia insertion
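The arithmetic behind the Fig. 2 legend can be reproduced with a short sketch. It assumes, as the legend does, that the four nucleotides are equally likely and independent, so that a specific repeat of length n recurs by chance with probability 4^-n, that is once in 4^n positions. The helper `shared_kmers` and its toy flanks are hypothetical and only illustrate how candidate direct repeats on either side of a deletion might be listed.

```python
def chance_per_position(n):
    """Probability that a given n-nucleotide word recurs at one position,
    assuming equal and independent base frequencies (4^-n)."""
    return 0.25 ** n


def expected_spacing(n):
    """Number of positions in which one chance recurrence is expected (4^n)."""
    return 4 ** n


def shared_kmers(left_flank, right_flank, k):
    """List the k-mers present in both flanks of a deletion.
    Hypothetical helper; flank sequences are supplied by the caller."""
    left = {left_flank[i:i + k] for i in range(len(left_flank) - k + 1)}
    right = {right_flank[j:j + k] for j in range(len(right_flank) - k + 1)}
    return sorted(left & right)


if __name__ == "__main__":
    for n in (4, 5, 6):
        print(n, expected_spacing(n), chance_per_position(n))
    # Toy flanks, not taken from the micropia sequences.
    print(shared_kmers("ACGGTCAA", "TTGGTCAT", 4))
```

For n = 4, 5 and 6 the expected spacings are 256, 1024 and 4096, the values quoted in the legend.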
Large insertions: retroposon, copia, and "unidentified" insertion. The frame shift within the protease region of micropia-Dm2 is caused by an insertion 90 nucleotides in length (Fig. 3) flanked by an 8 or 6 nucleotide target site duplication. The inserted sequence possesses two open reading frames extending to the end of the insertion and two polyadenylation signals. Since the insertion carries a poly(A) tail sequence at the 3' end, as typically added to RNA transcripts in a polyadenylation reaction, all characteristics fit different insertion models for retroposons, like the "in situ cDNA synthesis" model (Rogers 1985) or the mechanism proposed for Alu sequences by Jagadeeswaran et al. (1981).
Another large insertion into the RNase/integrase region of micropia-DhMiF8 has been described by Huijser et al. (1988) (Fig. 1). The exact ends of this insertion cannot be identified even though it is characterized by a duplication of 10 bp. It might carry a DNA sequence derived from a prior site of insertion, as does the jockey element near the yellow gene (Geyer et al. 1988). One of the 10 bp duplications created as a consequence of the insertion of jockey is not immediately adjacent to its insertion site but at a distance of 25 bp. The DNA between the duplication and the jockey element seems to originate from the chromosomal location where jockey was located before the transposition event.
The largest insertion found in micropia occurred within the region similar to MHC of micropia-Dm2. Here, a complete copia element is integrated with the target site duplication 5'-AGCAA-copia-AGCAA-3'. We sequenced parts (1.4 kb) of the copia element and found no differences at positions 1-70, 2768-3220, 4146-4650 and 4768-5143 from the published sequence (Emori et al. 1985). In the regions that had not been sequenced we mapped the EcoRI, HindIII and XmnI sites at the expected positions. Thus it seems that this copia element is not modified at all and may have transposed into micropia-Dm2 more recently (Fig. 1).
[Figure 3: sequence of the retroposon inserted into micropia-Dm2, shown against the corresponding micropia-Dm11 sequence, with the target site duplication, the ORFs, the polyadenylation signals and the poly(A) tail marked. Legible sequence rows of the insertion: TTTTATATGTTAATTGCGCTGTTATGTTACTGTTACTGCATTGTATTGATTCATCGC and TTCTAAATAAATAAATATATAAAAAAAAAAAA.]
Fig. 3. Retroposon insertion into micropia-Dm2. Two putative polyadenylation signals are located unusually close to the poly(A) tail. No significant homologies were detected in the EMBL DNA sequence library, searching 22 × 10^6 nucleotides. poly A polyadenylation signal; ORF open reading frame
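The retroposon features described for Fig. 3 (polyadenylation signals close to a poly(A) tail) can be checked with a small scan. The sketch below makes two assumptions that are not stated in the paper: that the signals are the canonical hexamer AATAAA, and that the 3' sequence row transcribed from the extracted figure is free of extraction errors. The function names are illustrative.

```python
import re

# 3' part of the inserted sequence as it appears in the extracted Fig. 3;
# treat this transcription as provisional.
INSERT_3PRIME = "TTCTAAATAAATAAATATATAAAAAAAAAAAA"


def find_polya_signals(seq, signal="AATAAA"):
    """Return 0-based start positions of every (possibly overlapping)
    occurrence of the signal hexamer, using a zero-width lookahead."""
    pattern = "(?=" + re.escape(signal) + ")"
    return [m.start() for m in re.finditer(pattern, seq)]


def polya_tail_length(seq):
    """Length of the uninterrupted run of A residues at the 3' end."""
    return len(seq) - len(seq.rstrip("A"))


if __name__ == "__main__":
    print("signal starts:", find_polya_signals(INSERT_3PRIME))
    print("poly(A) tail length:", polya_tail_length(INSERT_3PRIME))
```

On this row the scan finds two overlapping-frame candidates a few nucleotides upstream of the terminal A run, which is consistent with the legend's remark that the two putative signals lie unusually close to the poly(A) tail.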
Identification of conserved sequences in the 3' non-protein-coding region including the 3' tandem repeats of micropia elements
The pattern of divergence in the micropia 3' non-protein-coding region in D. hydei and D. melanogaster. Micropia elements in both D. melanogaster and D. hydei possess a non-protein-coding region between the 3' end of the large open reading frame and the 3' LTR. In micropia-Dm11, -Dm2 and -DhMiF2 this region is about 550 bp long, while in micropia-DhMiF8 it spans only 180 nucleotides. While it has been possible to assign well-known functions to all other regions of micropia elements (Lankenau et al. 1988), the highly conserved 3' tandem repeats within this region represent a new feature of retroelements. To assess possible constraints against mutations we carried out a comparative sequence analysis (Fig. 4). Such an analysis is dependent on a proper alignment of the sequences. When sequences are compared that have been constrained by clear functional pressure during evolution (e.g. RNA- or protein-coding sequences), a sufficient number of homologies and invariances distributed along the entire sequence almost always allows an assignment of positions (Eigen et al. 1985). A priori this does not hold true for non-coding regions because the sequences might have diverged to complete randomization. Other problems are caused by molecular drive events, which modify the sequences to such an extent that alignments are difficult or impossible. Figure 4 shows an alignment of the non-protein-coding region of the three micropia elements -DhMiF8, -DhMiF2 and -Dm11. Some sequence blocks within the alignment (Fig. 4) are better conserved than others. The overall similarity in different stretches of the compared sequences ranges between the lowest K_min value (54.3%) and the highest K_max value (62.9%) (Miyata 1982; see Fig. 4 legend). Two long sequence blocks are nearly identical in D. melanogaster and D. hydei: 5'-TXGAACTGAATAT-3' (DhMiF2 position 4524; micropia-Dm11 position 4742) and the (+) strand primer binding site region 5'-TTACAXGAGGACGTGXXAAXGTCAGXATGGCCG-3' (DhMiF2 position 4690 and micropia-Dm11 position 4925); these might represent functional islands (Fig. 4). However, we cannot exclude that these blocks appear conserved only because their divergence has, just by chance, not yet reached total randomization.
[Figure 4: alignment of the 3' non-protein-coding regions of micropia-DhMiF8 (Mi8), -DhMiF2 (Mi2) and -Dm11, from the first tandem repeat unit (tandem-1) to the 3' LTR, with tandem-2, slippages 1-4, the purine-rich region (prr) and the primer binding site (pbs) marked.]
Fig. 4. Alignment of non-protein-coding and 3' tandem region. This alignment is based on alignments of the 3' end of long ORFs from the integrase-coding region. These alignments (compare Lankenau et al. 1988) define the places of the first tandem repeat unit within the non-coding region (Fig. 1). From here, the alignment can be extended towards the 3' LTRs. Good positional assignment of sequences can be achieved if we take into account that deletions of DNA sequences might often have occurred by slippage replication (Efstratiadis et al. 1980; Fig. 2 this paper). Therefore we should find shorter duplications close to many larger gaps within the alignment. Such duplications in turn argue for a correct alignment. Even though we have to account for modifications within the ancestral duplications that "catalysed" the deletion, we can indeed identify such duplications in every large gap of this alignment (slippages 1-4 in Figs. 2 and 4; duplications flanking the deletions are underlined). The primer binding site is also marked
Micropia 3' tandem repeats are conserved and therefore seem to be functional. Significant evidence for functional sequences has only been obtained for the 3' tandem repeats (Fig. 5). While the micropia alignments in Figure 4 are only based on 4 sets of sequences, in the case of the 3' tandem repeats we can work with an extended set of data (25 repeat units, including unpublished cDNA sequences). An alignment of a representative number of tandem repeats is given (Fig. 5). From this we can appraise under what constraints evolution has "worked" on these sequences. Comparisons of the repeat segments are possible at four distinct levels, which we shall consider in the following paragraphs: (level 1) within one micropia element, (level 2) between elements of one species, and (level 3) between the species D. melanogaster and D. hydei. The fourth level is a result of alignment (Fig. 5) and describes, for example, the distribution of variability within one segment. The major result of the sequence alignments is the identification of two highly conserved sequence blocks of 10 bp and 4 bp within each tandem unit. The 10 bp long sequence especially is 100% conserved within every single micropia element (level 1), in elements of the same species (level 2) and even between the two species D. melanogaster and D. hydei (level 3). In addition this sequence is found in transcripts of D. melanogaster (D.-H. Lankenau, unpublished results) (levels 1 and 2). The significance of the conservation pattern becomes clear if one compares this high degree of conservation with the high degree of variability in the flanking sequences within the tandem unit itself (Fig. 5, level 4) or with the other 3' non-coding sequences outside of the tandem repeat clusters (Fig. 4). The high degree of conservation of the tandem repeat makes its functional importance very likely. This assumption is supported by the fact that the tandem repeats are conserved between the two Drosophila species (level 3) while not even the LTRs of micropia elements, with well-established functions, possess any interspecific similarity (level 3, Lankenau et al. 1989). Also, less conserved sequences within the tandem units possess interesting patterns of divergence which we call "positional nucleotide shuffling". The higher degree of conservation of the variable region-2 between the D. melanogaster micropia elements (level 2) could be the result of molecular drive mechanisms such as unequal crossing over, which is a typical homogenization mechanism within tandem arrays (Dover et al. 1982). A strong argument against unequal crossing over is, however, the high differential divergence of the two elements derived from D. hydei (micropia-DhMiF2, level 1; micropia-DhMiF2 and -DhMiF8, level 2). The patterns of divergence are shown in Figure 5 as a co-ordinated alignment, taking into account the four levels of comparison. The vertical axis represents tandem segments (S-1 through S-n) belonging to tandem clusters Tn of micropia elements -Dm2, -Dm11, -DhMiF2, and -DhMiF8. The horizontal axis defines four regions: highly conserved-1, conserved-2, variable-1, and highly variable-2. There is more freedom for mutations in variable positions of the tandem repeats on level 3 as well as on level 4 compared with coding sequences.
[Figure 5: positional alignment of 3' tandem repeat segments (S-1 to S-n) from tandem clusters of micropia-Dm2, -Dm11, -DhMiF2 (Mi2) and -DhMiF8 (Mi8), arranged in the four regions named in the legend, with a LINE1 repeat unit (L1) and the satellite-adjacent plasmid sequence 1.672-453 aligned at the bottom.]
Fig. 5. Positional alignment of micropia 3' tandem repeats. Tandem repeats are shown as the reverse complement of the published micropia sequences. The analysis shows that there is a highly conserved region-1, a variable region-1, a conserved region-2 and a highly variable region-2. These different sequence blocks might have evolved in different fashions; it is most likely that the conserved regions reflect functional pressures acting on them. Short sequence blocks (here represented by differently marked blocks of nucleotides) may be well "conserved" within the variable regions. But the pattern of occurrence of these sequence blocks within one micropia element, within one species or between Drosophila hydei and Drosophila melanogaster resembles a random shuffling of playing cards. For further explanations see text. S segment; T1 tandem 1 in a cluster; Mi2, Mi8, Dm2, Dm11 tandem repeats from four micropia elements; L1 LINE1 (Wincker et al. 1987); 1.672-453 is a moderately repeated DNA adjacent to simple satellite DNA, namely (AACAATA)68... of D. melanogaster (Lohe and Brutlag 1987). A insertion within 1.672-453. * positions identical with at least one micropia element. Filled and outlined stars represent species-specific nucleotides
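How fully conserved blocks might be read off a positional alignment of repeat units can be sketched as follows. The three aligned rows are hypothetical toy data, loosely modelled on the layout of Fig. 5 and not taken from the published sequences; the function names, the threshold of 100% agreement and the minimum block length are likewise assumptions for illustration.

```python
from collections import Counter


def column_conservation(rows):
    """For each alignment column, return the fraction of rows that carry
    the most frequent character in that column."""
    width = len(rows[0])
    if any(len(r) != width for r in rows):
        raise ValueError("aligned rows must have equal length")
    scores = []
    for column in zip(*rows):
        top_count = Counter(column).most_common(1)[0][1]
        scores.append(top_count / len(rows))
    return scores


def conserved_blocks(scores, threshold=1.0, min_len=4):
    """Return half-open (start, end) column ranges in which every score is
    >= threshold and the run is at least min_len columns long."""
    blocks = []
    start = None
    for i, score in enumerate(scores + [-1.0]):  # sentinel closes a final run
        if score >= threshold:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                blocks.append((start, i))
            start = None
    return blocks


if __name__ == "__main__":
    # Hypothetical aligned repeat units (toy data).
    units = [
        "TCATCGTCTCACCTAGACGGATA",
        "TCATCGTCTCACTTGGACGGTTC",
        "TCATCGTCTCTCTTAAACGGATT",
    ]
    print(conserved_blocks(column_conservation(units)))
```

With these toy rows the scan reports one block of 10 columns and one of 4 columns, mirroring the kind of pattern described in the text; isolated invariant columns inside the variable stretch are ignored because of the minimum length.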
But certain nucleotide constellations are preferred and relatively stable (wavy underlining and boxes in Fig. 5). Sometimes sequence motifs may disappear, like 5'-CART-3' in Mi2T2 (wavy underlining in other clusters, Fig. 5), and consequently the tandem clusters become shorter. Another sequence motif may "arise" which can be aligned to another position and may represent a very small functional constraint (Fig. 5, boxed 5'-CTCA-3' in Mi8T1). The length of tandem units may indeed also play a functional role, since the number of nucleotide residues within one tandem repeat segment is always of the same order of magnitude as in the other segment members of the cluster (level 1). Additionally, only minor length variations are found between clusters of different species as well as within a species (levels 3 and 2, respectively). The only species-specific nucleotide (level 2/3) is located within the variable-1 region (marked by filled and outlined stars in Fig. 5). Another indication of functional importance is the similarity of the micropia tandem repeats to the 66 nucleotide tandem repeats of the 3' part of LINE1 elements (Wincker et al. 1987). Twenty nucleotides from a LINE1 tandem repeat unit can be aligned to the micropia tandem repeats (Fig. 5, bottom) (Lankenau et al. 1988). Another, much less obvious similarity exists with the moderately repeated DNA 1.672-453 which was found adjacent to simple satellite DNA of D. melanogaster (Fig. 5, bottom) (Lohe and Brutlag 1987). An explanation of the observed conservation patterns is that specific protein-DNA interactions with the micropia non-coding region have played a role in creating selective constraints. One can speculate whether the conserved sequences have some function either in the regulation of the transposition activities of micropia or, alternatively, in protein-DNA interactions influencing chromatin regions outside the micropia element, as described for other retroelements (cf. Parkhurst et al. 1988).
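Statements such as "can be aligned" or "nearly identical" become concrete when identical columns of a gapped pairwise alignment are counted. The sketch below uses the two rows of the (+) strand primer binding site block as they appear in the extracted Fig. 4 (Mi2 and Dm11); the transcription should be treated as provisional. Writing X for every column that differs or contains a gap is an assumption about the notation, made because it reproduces the consensus quoted in the text for this block. The function names are illustrative.

```python
def consensus(row_a, row_b, unknown="X", gap="-"):
    """Column-wise consensus of two aligned rows: the shared character
    where both rows agree, `unknown` where they differ or one has a gap."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    return "".join(
        a if a == b and a != gap else unknown
        for a, b in zip(row_a, row_b)
    )


def identity(row_a, row_b, gap="-"):
    """Return (identical columns, compared columns), skipping any column
    in which either row has a gap."""
    pairs = [(a, b) for a, b in zip(row_a, row_b) if a != gap and b != gap]
    matches = sum(a == b for a, b in pairs)
    return matches, len(pairs)


if __name__ == "__main__":
    # Rows transcribed from the extracted Fig. 4 alignment.
    mi2 = "GAGGACGTGTCAAGGTCAGGATGGCCG"
    dm11 = "GAGGACGTG-AAATGTCAGAATGGCCG"
    print(consensus(mi2, dm11))
    same, compared = identity(mi2, dm11)
    print(f"{same}/{compared} identical ({100 * same / compared:.1f}%)")
```

The resulting consensus, GAGGACGTGXXAAXGTCAGXATGGCCG, matches the corresponding part of the block quoted in the text, and 23 of the 26 compared columns are identical.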
Discussion
Micropia elements were discovered as constituents of the D. hydei Y chromosomal wild-type fertility genes Threads and Pseudonucleolus. From ultrastructural data it is believed that these lampbrush loops represent large transcription units with transcript lengths of 500 to 1000 kb, and larger than 1000 kb, respectively (deLoos et al. 1984; Grond et al. 1983, 1984; Grond 1984). Micropia element sequences are found to be transcribed on the Threads and Pseudonucleolus (Huijser et al. 1988). There are two ways to interpret the hybridization reactions with transcripts in these lampbrush loops: (1) The initiation site of the transcription unit of the lampbrush loops is far away from the regulatory sites of any micropia element. The transcription process ignores the micropia regulatory sites and just reads through to a specific RNA termination signal at the end of the loops. A similar mechanism has been described for the transcription pattern of HBV (hepatitis B virus). During circular HBV proliferation the signals for cleavage and polyadenylation are ignored during the first transit past these sites but honoured on the second passage (Ganem and Varmus 1987). (2) The regulatory sites of transcription of micropia are functional and the radioactive signals observed by in situ hybridization are autonomous transcripts of micropia. This interpretation is favoured by the finding of small transcripts in Miller spreading experiments between the large transcripts of Pseudonucleolus or the two types (bush-like and fibrillar) of transcripts on the Threads. Both results may be indicative of secondary initiation sites of transcription within the loops (deLoos et al. 1984; Grond 1984). Both possibilities are likely to represent two simplified alternatives of a much more complex natural situation. It is shown in this paper, and by Huijser et al. (1988) and Lankenau et al. (1988, 1989), that micropia possesses the functional sequences of typical retroelements.
Even though it has been argued that transposable elements may not directly play a major role in cell differentiation processes (Finnegan et al. 1982; Potter et al. 1979), micropia might represent an example of the competition between regulatory and transcription factors of the retroelement itself and of the Threads and Pseudonucleolus transcription units.
A well-studied example is the retrotransposon gypsy. In the y2 mutant, this transposable element inserted 700 bp upstream from the transcription start of the yellow gene, giving rise to the temporal and tissue-specific y2 phenotype. The insertion does not affect the early transcription of this gene but alters expression in the pupa such that adult y2 flies have normal-coloured bristles, whereas the wings and the body cuticle are yellow. This altered differential expression of the yellow gene is not simply the result of insertion of gypsy into sequences necessary for tissue-specific expression or a distancing of these sequences from the yellow promoter but rather is caused by specific sequences located in the 5' untranslated region of gypsy. Revertants with one remaining solo LTR or those where the sequences between both gypsy LTRs have been replaced by another transposable element no longer show the y2 phenotype. The y2 phenotype can be altered (reverted) by the gene product of suppressor of Hairy-wing [su(Hw)], which is a protein with 12 repeats of the Zn finger domain. This su(Hw) protein interacts with 12 copies of a sequence motif of the gypsy elements (Parkhurst and Corces 1986; Kubli 1986; Geyer et al. 1988; Spana et al. 1988; Parkhurst et al. 1988). One might assume that the conserved micropia tandem repeats possess an analogous function. Recently DNA sequences homologous to the protease and reverse transcriptase of gypsy have been identified on the lampbrush loops "Nooses", another wild-type Y chromosomal fertility gene of D. hydei (R. de Graaf, D.-H. Lankenau, P. Vogt and W. Hennig unpublished results). Further research will show if a Zn finger binding site is conserved within this new retrotransposon. It is still unknown whether the transcripts from the Threads and Pseudonucleolus possess exons and if micropia elements are parts of Y chromosomal introns. 
If this holds true, their influence on the transcription chemistry of the flanking exons can be compared to that of the wa mutation, where copia is inserted into the second white intron (Gehring and Paro 1980), or to the f~ mutation, where gypsy is inserted into the RNA coding region of forked. On the other hand, it is also known that gypsy does not affect the expression of other genes located in the vicinity of forked (Parkhurst and Corces 1985). Comparable to the regulative capacity of these insertions, micropia elements (and the recently found gypsy-related retrotransposon in the Nooses) may play a regulatory role within the Drosophila fertility genes. The testis-specific transcription of Y chromosomal micropia elements (Huijser et al. 1988) might compete with the transcription of other Y chromosomal lampbrush loop DNA sequences. Possible candidates for regulatory influences are sequences in the LTRs and the 3' tandem repeats of micropia, especially those regions highly conserved between all known elements. The putative function of these sequences is not necessarily disrupted by the large number of rearrangements described in this paper, as long as the sequences themselves are unaffected. Therefore "defective" micropia elements like DhMiF8 may also contribute to wild-type lampbrush loop function.
Acknowledgements. We are grateful to S. Lankenau and Dr. D. Ribbert for critically reading the manuscript. We thank C.R. Lankenau for writing a package of DNA sequence analysis programs in Turbo-Pascal, and Dr. R. Brand, Dr. J. Hackstein, R. Hochstenbach, H. Kremer, and F. Wang for discussion. Excellent technical support was given by R. Dijkhof, R. de Graaf, D. ten Hacken and W. Janssen. One of us (D.-H.L.) was supported by a Ph.D. fellowship of the Studienstiftung des deutschen Volkes.

References
Dover GA, Brown S, Coen E, Dallas J, Strachan T, Trick M (1982) The dynamics of genome evolution and species differentiation. In: Dover GA, Flavell RB (eds) Genome evolution. Academic Press, London, pp 343-372
Efstratiadis A, Posakony JW, Maniatis T, Lawn RM, O'Connell C, Spritz RA, DeRiel JK, Forget BG, Weissman SM, Slightom JL, Blechl AE, Smithies O, Baralle FE, Shoulders CC, Proudfoot NJ (1980) The structure and evolution of the human beta-globin gene family. Cell 21:653-668
Eigen M, Lindemann B, Winkler-Oswatitsch R, Clarke CH (1985) Pattern analysis of 5S rRNA. Proc Natl Acad Sci USA 82:2432-2441
Emori Y, Shiba T, Kanaya S, Inouye S, Yuki S, Saigo K (1985) The nucleotide sequences of copia and copia-related RNA in Drosophila virus-like particles. Nature 315:773-776
Farabaugh PJ, Miller JH (1978) Genetic studies of the lac repressor VII. On the molecular nature of spontaneous hotspots in the lac I gene of Escherichia coli. J Mol Biol 126:847-863
Finnegan DJ, Will BH, Bayev AA, Bowcock AM, Brown L (1982) Transposable DNA sequences in eucaryotes. In: Dover GA, Flavell RB (eds) Genome evolution. Academic Press, London, pp 29-40
Ganem D, Varmus HE (1987) The molecular biology of the Hepatitis B virus. Annu Rev Biochem 56:651-693
Gehring WJ, Paro R (1980) Isolation of a hybrid plasmid with homologous sequences to a transposing element of D. melanogaster. Cell 19:892-904
Geyer PK, Green MM, Corces VG (1988) Reversion of a gypsy-induced mutation at the yellow (y) locus of Drosophila melanogaster is associated with the insertion of a newly defined transposable element. Proc Natl Acad Sci USA 85:3938-3942
Grond CJ (1984) Spermatogenesis in D. hydei. Ph.D. thesis, University of Nijmegen
Grond CJ, Siegmund J, Hennig W (1983) Visualization of a lampbrush loop-forming fertility gene in Drosophila hydei. Chromosoma 88:50-56
Grond CJ, Rutten RGJ, Hennig W (1984) Ultrastructure of the Y chromosomal lampbrush loops in primary spermatocytes of Drosophila hydei. Chromosoma 89:85-95
Hennig W, Vogt P, Jacob G, Siegmund I (1982) Nucleolus organizer regions in Drosophila species of the repleta group. Chromosoma 87:279-292
Hennig W, Huijser P, Vogt P, Jäckle H, Edström J-E (1983) Molecular cloning of microdissected lampbrush loop DNA sequences of Drosophila hydei. EMBO J 2:1741-1746
Huijser P, Hennig W (1987) Ribosomal DNA-related sequences in a Y chromosomal lampbrush loop of Drosophila hydei. Mol Gen Genet 206:441-451
Huijser P, Kirchhoff C, Lankenau D-H, Hennig W (1988) Retrotransposon-like sequences are expressed in the Y chromosomal lampbrush loops of Drosophila hydei. J Mol Biol 203:689-697
Jagadeeswaran P, Forget BG, Weissman SM (1981) Short interspersed repetitive DNA elements in eucaryotes: transposable DNA elements generated by reverse transcription of RNA PolIII transcripts? Cell 26:141-142
Kubli E (1986) Molecular mechanisms of suppression in Drosophila. Trends Genet 2:204-209
Lankenau D-H, Huijser P, Jansen E, Miedema K, Hennig W (1988) Micropia: a retrotransposon of Drosophila combining structural features of DNA viruses, retroviruses and non-viral transposable elements. J Mol Biol 204:233-246
Lankenau D-H, Huijser P, Hennig W (1989) Characterization of the long terminal repeats of micropia elements microdissected from Y-chromosomal lampbrush loops "Threads" of D. hydei. J Mol Biol 209:493-497
Lipman DJ, Pearson WR (1985) Rapid and sensitive protein similarity searches. Science 227:1435-1441
Lohe AR, Brutlag DL (1987) Adjacent satellite DNA segments in Drosophila: structure of junctions. J Mol Biol 194:171-179
de Loos F, Dijkhof R, Grond CJ, Hennig W (1984) Lampbrush chromosome loop-specificity of transcript morphology in spermatocyte nuclei of D. hydei. EMBO J 3:2845-2849
Maniatis T, Fritsch EF, Sambrook J (1982) Molecular cloning: a laboratory manual. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY
McClintock B (1956) Controlling elements and the gene. Cold Spring Harbor Yearbook 21:197-216
Miyata T (1982) Evolutionary changes and functional constraints in DNA sequences. In: Kimura M (ed) Molecular evolution, protein polymorphism and the neutral theory. Japan Scientific Societies Press, Tokyo/Springer, Berlin Heidelberg New York, pp 233~60
Parkhurst SM, Corces VG (1985) Forked, gypsys, and suppressors in Drosophila. Cell 41:429-437
Parkhurst SM, Corces VG (1986) Interactions among the gypsy transposable element and the yellow and suppressor of Hairy-wing loci in D. melanogaster. Mol Cell Biol 6:47-53
Parkhurst SM, Harrison DA, Remington MP, Spana C, Kelley RL, Coyne RS, Corces VG (1988) The Drosophila su(Hw) gene, which controls the phenotypic effect of the gypsy transposable element, encodes a putative DNA-binding protein. Genes Dev 2:1205-1215
Potter SS, Brorein WJ, Dunsmuir P, Rubin GM (1979) Transposition of elements of the 412, copia and 297 dispersed repeated gene families in Drosophila. Cell 17:415-427
Pustell J, Kafatos FC (1982) A high speed, high capacity homology matrix: zooming through SV40 and polyoma. Nucleic Acids Res 10:4765-4782
Pustell J, Kafatos FC (1984) A convenient and adaptable package of computer programs for DNA and protein sequence management, analysis and homology determination. Nucleic Acids Res 12:643-655
Pustell J, Kafatos FC (1986) A convenient and adaptable microcomputer environment for DNA and protein sequence manipulation and analysis. Nucleic Acids Res 14:479-488
Rogers JH (1985) The origin and evolution of retroposons. Int Rev Cytol 93:187-279
Sanger F, Nicklen S, Coulson AR (1977) DNA sequencing with chain-termination inhibitors. Proc Natl Acad Sci USA 74:5463-5467
Schwarz-Sommer Z, Saedler H (1987) Can plant transposable elements generate novel regulatory systems? Mol Gen Genet 209:207-209
Shepherd JCW (1981) Method to determine the reading frame of a protein from the purine/pyrimidine genome sequence and its possible evolutionary justification. Proc Natl Acad Sci USA 78:1596-1600
Spana C, Harrison DA, Corces VG (1988) The D. melanogaster suppressor of Hairy-wing protein binds to specific sequences of the gypsy retrotransposon. Genes Dev 2:1414-1423
Streisinger G, Okada Y, Emrich J, Newton J, Tsugita A, Terzaghi E, Inouye M (1966) Frameshift mutations and the genetic code. Cold Spring Harbor Symp Quant Biol 31:77-84
Wincker P, Jubier-Maurin V, Roizes G (1987) Unrelated sequences at the 5' end of mouse LINE-1 repeated elements define two distinct subfamilies. Nucleic Acids Res 15:8593-8606