Novel Drosophila melanogaster genes encoding RRM-type RNA-binding proteins identified by a degenerate PCR strategy


Gene, 154 (1995) 187-192 ©1995 Elsevier Science B.V. All rights reserved, 0378-1119/95/$09.50


GENE 08629

Novel Drosophila melanogaster genes encoding RRM-type RNA-binding proteins identified by a degenerate PCR strategy

(DNA cloning; protein comparison; ribonucleoprotein identifier; post-transcriptional regulation; pre-mRNA splicing factors; molecular evolution; Arabidopsis)

Stéphanie F. Brand, Sébastien Pichoff, Stéphane Noselli and Henri-Marc Bourbon

Centre de Biologie du Développement, UMR 9925 CNRS/UPS, Université Paul Sabatier, 31062 Toulouse Cédex, France

Received by G. Bernardi: 12 August 1994; Revised/Accepted: 22 September/3 October 1994; Received at publishers: 18 November 1994

SUMMARY

We are interested in identifying Drosophila melanogaster RNA-binding proteins involved in important developmental decisions made at the level of mRNA processing, stability, localization or translational control. A large subset of the proteins known to interact with specific RNA sequences shares an evolutionarily conserved 80-90-amino-acid (aa) domain referred to as an RNA-recognition motif (RRM), including two ribonucleoprotein identifier sequences known as RNP-1 and RNP-2. We have therefore applied degenerate polymerase chain reaction (PCR) methodology to clone three additional members (termed rox2, rox8 and rox21) of the D. melanogaster RRM-protein gene superfamily encoding putative trans-acting regulatory factors. Representative cDNA clones were isolated, the conceptual aa sequences of the candidate Rox proteins were inferred from their nucleotide sequences, and database searches were conducted. Rox2 displays extensive aa sequence similarities to putative RNA-binding proteins encoded by the genomes of the plants Oryza sativa and Arabidopsis thaliana; Rox21 resembles essential metazoan pre-mRNA splicing factors; as described elsewhere, Rox8 is likely a fly homolog of the two human TIA-1-type nucleolysins [Brand and Bourbon, Nucleic Acids Res. 21 (1993) 3699-3704].

INTRODUCTION

Many regulatory proteins participate in the control of gene expression at the post-transcriptional level via specific interaction with RNA sequences (Frankel et al., 1991). A great number of these trans-acting factors contain single or multiple copies of a loosely conserved 80-90-aa RNA-recognition motif termed RRM (also known as RBD or RNP-CS motif; Kenan et al., 1991). This module has evolved to bind different RNA structural elements and, through gene duplication and the addition of auxiliary domains, has given rise to a great number of proteins with functions in various aspects of cellular RNA metabolism (Birney et al., 1993). The superfamily of RRM-bearing protein genes (hereafter referred to as the rox gene family, for genes with RRM-encoding boxes) includes members coding for proteins that participate in constitutive pathways common to all eukaryotic cells, such as the mRNA poly(A)-binding protein (Adam et al., 1986), various hnRNP proteins (Dreyfuss et al., 1993), a number of snRNP proteins (Kenan et al., 1991) and the nucleolar protein nucleolin (Bugler et al., 1987), while other members encode proteins required for selective cell fate determination (Bandziulis et al., 1989; Lantz et al., 1994).

In Drosophila, many developmental processes are known to be regulated at the level of pre-mRNA splicing (Rio, 1993), mRNA localization (Macdonald, 1992) or translational control (Wharton and Struhl, 1991). However, few of the factors interacting with specific cis-regulatory RNA sequences have so far been identified from genetic screens. Thus, our aim was to perform an oligo-directed search for novel RRM-type RNA-binding protein genes that may control important aspects of fly development.

Correspondence to: Dr. H.-M. Bourbon, Centre de Biologie du Développement, UMR 9925 CNRS/UPS, Bât. IVR3, Université Paul Sabatier, 118 Route de Narbonne, 31062 Toulouse Cédex, France. Tel. (33-61) 558-288; Fax (33-61) 556-507; e-mail: [email protected]

Abbreviations: A., Arabidopsis; aa, amino acid(s); bp, base pair(s); cDNA, DNA complementary to RNA; D., Drosophila; est, expressed sequence tag; hnRNP, heterogeneous nuclear RNP; mRNA, messenger RNA; nt, nucleotide(s); O., Oryza; oligo, oligodeoxyribonucleotide; ORF, open reading frame; PABP, poly(A)-binding protein; pabp, gene encoding PABP; PCR, polymerase chain reaction; pI, isoelectric point; pre-mRNA, precursor of mRNA; RBD, RNA-binding domain; RNP, ribonucleoprotein particle; RNP-CS, ribonucleoprotein consensus sequence(s); RNP-IDs, RNP-1 and RNP-2 ribonucleoprotein identifier sequence(s); Rox, RRM-bearing protein; rox, Rox-encoding gene; ROX, PCR-amplified rox sequence; RRM, RNA-recognition motif(s); snRNP, small nuclear RNP; SR, Ser/Arg-rich domain; TIA-1, granule-associated nucleolytic protein from human cytolytic T-lymphocytes.

SSDI 0378-1119(94)00840-X

EXPERIMENTAL AND DISCUSSION

(a) Amplification of D. melanogaster sequences encoding RRM-type RNA-binding domains

The framework of an RNA-interacting surface is provided, in the RRM examined so far, by a four-stranded antiparallel β-sheet platform apposed to two α-helices (reviewed by Kenan et al., 1991). The core of this β-stranded open platform includes two spatially juxtaposed segments of 8 and 6 aa, respectively, known as the RNP-1 and RNP-2 ribonucleoprotein identifier sequences (hereafter referred to as the RNP-IDs), which are highly conserved among most of the RRM proteins identified so far (Fig. 1a). To search for additional members of the D. melanogaster rox gene superfamily, we devised an experimental design based on PCR. Degenerate RNP-1- and RNP-2-based oligos were designed to match possible sequences coding for the entire corresponding β-sheets (see Fig. 1). Since almost every position of both RNP-IDs may accommodate conservative aa exchanges, we predicted that a single set of highly degenerate 21-mer oligos (degeneracy of 262 144 and 786 432, respectively), coding for all possible aa sequences, would be difficult to use. Thus, we derived several sets of moderately degenerate PCR primers (between 1024 and 65 536 different sequences) encoding non-overlapping subsets of conceptual RNP-IDs (Fig. 1b; and data not shown).

Several criteria were applied to retain PCR amplification products as DNA fragments derived from rox genes:

(i) In most of the RRM proteins identified so far the RNP-IDs are separated by approx. 30 aa (see Fig. 1a), and in those with multiple RRM the spacing between two consecutive copies is relatively invariant. Thus, each oligo combination was tested for its ability to amplify discrete products of expected sizes, i.e., about 140-160 and 430-460 bp (A- and B-class products, respectively; Fig. 1c) according to the presence of one or two adjacent RRM-encoding sequences. Among the combinations of degenerate oligos tested, only two (hereafter referred to as the P1 and P2 primer sets; see Fig. 1b for sequences) gave rise to significant amplification of appropriately sized products using as a PCR template either cDNA (prepared from a 4-8-h embryo-stage cDNA library) or genomic DNA.

(ii) Cloned A- and B-type amplification products should contain an appropriately phased, uninterrupted ORF.

(iii) The inferred conceptual polypeptides were inspected for the presence of diagnostic aa at key positions in the RNP-2/RNP-1 spacer region (see Fig. 1a).

As shown in Table I, the cloned amplification products that were finally retained (hereafter referred to as the ROX products) were distributed among six different types; three of these (denoted ROX2, ROX8 and ROX21) were identified as novel RRM-encoding sequences. Strikingly, the three other ROX-type products corresponded exclusively to specific amplifications of sequences encoding three out of the four adjacent evolutionarily conserved RRM of the previously characterized mRNA poly(A)-binding protein (PABP) (Lefrère et al., 1990). In contrast, none of the other known D. melanogaster RRM-encoding sequences was recovered.

Analysis of the range of the ROX sequences and their relative distribution among cloned PCR products further reveals the limits of an oligo-directed approach. First, the use of the P2 oligo set resulted exclusively in amplification of pabp-derived sequences. Second, it is noteworthy that the ROX2-type fragments were highly represented among products obtained with the P1 oligo set from each DNA template. Third, ROX21-type sequences were only identified among clones resulting from amplification of genomic DNA, despite the fact that the corresponding cDNA template was not under-represented in the starting material (see section c). Altogether, these data strongly suggest that the differential distribution of the ROX sequences among the individually isolated products was not strictly related to the abundance of the corresponding template DNAs, but instead was probably due to thermodynamic and/or kinetic parameters of the PCR amplifications that depend upon the sequences of the oligos and target DNAs. Also relevant to this, the range of RRM sequences identified via a similar PCR-based approach (albeit with different oligos and DNA template) does not overlap with those characterized here, except for ROX8, which appears to be nearly identical to RRM12 (Kim and Baker, 1993).
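The size criterion (i) lends itself to a quick arithmetic check. The following is a minimal sketch, not the authors' procedure: it assumes that the amplicon spans the RNP-2 segment (6 aa), the spacer and the RNP-1 segment (8 aa), and that each primer carries a short non-coding tail bearing the restriction site. The function names and the 8-nt tail length are illustrative assumptions.

```python
def coding_span_bp(spacer_aa, rnp2_aa=6, rnp1_aa=8):
    """Length in bp of the coding region from the start of RNP-2 to the end of RNP-1."""
    return 3 * (rnp2_aa + spacer_aa + rnp1_aa)


def amplicon_bp(spacer_aa, tail5_nt=8, tail3_nt=8):
    """Coding span plus the non-coding primer tails (tail lengths are assumed values)."""
    return coding_span_bp(spacer_aa) + tail5_nt + tail3_nt


def product_class(size_bp):
    """Classify a product by the size windows quoted in the text, or return None."""
    if 140 <= size_bp <= 160:
        return "A"  # single RRM-encoding sequence
    if 430 <= size_bp <= 460:
        return "B"  # two adjacent RRM-encoding sequences
    return None


if __name__ == "__main__":
    size = amplicon_bp(30)  # approx. 30-aa spacer between RNP-2 and RNP-1
    print(size, product_class(size))
```

With a 30-aa spacer the coding span is 132 bp, and the assumed tails bring the product to 148 bp, which falls within the 140-160 bp window quoted for A-class products.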

[Fig. 1 (graphic not reproduced): (a) RRM consensus sequence with the positions of β1-β4, α1-α2 and the RNP-2 and RNP-1 segments; (b) sequences of the P1 and P2 degenerate primer sets; (c) A- and B-class PCR products obtained from the DNA template.]

Fig. 1. Experimental design for cloning RRM-encoding sequences. (a) A consensus sequence derived from alignment of 71 RRM available to us (not shown) is shown at the top. The indicated aa were recovered in at least 23 of these RRM. The diagnostic aa of Bandziulis et al. (1989) are boxed in reverse type. Loosely conserved aa are indicated by a dash (-). A variable number of aa that may be inserted is symbolized by (=). The positions of the segments liable to fold into β-sheets (β1 through β4) or α-helices (α1 and α2) are indicated above the consensus sequence. (b) The sequences of the RNP-1- and RNP-2-based degenerate oligos (P1 and P2 primer sets) used for PCR amplifications are shown with alternative nt beneath. R, A or G; Y, C or T; and N represents all four possible nt. Corresponding encoded RNP-1- or RNP-2-based aa sequences are shown in italics above the oligos. The RNP-1-derived nt sequences (antisense primers) are the complements of the coding sequences that specify the indicated aa and are presented in the reversed orientation. Restriction sites for HindIII or BamHI have been incorporated at the 5' end of the RNP-1- or RNP-2-based oligos, respectively, for cloning purposes. (c) Products from PCR amplifications of single or adjacent RRM-encoding sequences are indicated (referred to as the A- and B-class PCR products, respectively). PCR primers are represented by arrows. The positions of the sequences encoding the RNP-2- and RNP-1-based segments are shown for two adjacent RRM using filled and dotted boxes, respectively.

Methods: Genomic DNA obtained from a D. melanogaster wild-type stock (Canton-S), or purified double-stranded cDNA prepared from a 4-8-h embryo-stage library made in the pNB40 vector (Brown and Kafatos, 1988), was used as PCR template. Two successive runs of 35 cycles, each of 30 s at 94°C, 30 s at 55°C and 1 min at 72°C, followed by a final extension of 5 min at 72°C, were performed on a Techne SCS2 thermocycler. The first PCR amplification reactions (20 µl) contained about 1 ng of cDNA or 200 ng of genomic DNA/0.2 mM dNTPs/1 µg of each oligo, and were performed with 0.5 unit of Taq DNA polymerase (Boehringer-Mannheim) using the manufacturer's buffer. A one-tenth dilution of the first reaction (1 µl) was used as the substrate for the second run. PCR-amplified cDNA or genomic DNA was restricted, appropriately sized fragments were recovered from 2% agarose gels and finally subcloned into the vector pBluescript II SK(-) (Stratagene, La Jolla, CA, USA), giving rise to libraries of transformed bacteria containing independent recombinant plasmids (pBS series). Plasmid DNA was prepared from a number of randomly picked clones using standard alkaline lysis. Sequences of the corresponding amplified DNA inserts were obtained by the dideoxy method with T3 and T7 primers using a Sequenase kit (US Biochemical, Cleveland, OH, USA). Clones derived from amplifications of genomic DNA were sorted by double-lane (A and T lanes) sequencing.
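The degeneracy figures quoted in section (a) follow from the ambiguity codes defined in the legend (R, Y and N): the number of distinct sequences in a degenerate oligo is the product of the number of bases allowed at each position. The sketch below illustrates that calculation under stated assumptions. The function name is hypothetical, the extra IUPAC codes beyond R, Y and N are standard but are not used in the legend, and the example oligo is an arbitrary three-codon fragment, not a transcription of the P1 or P2 primers.

```python
# Number of bases represented by each IUPAC nucleotide code.
IUPAC_SIZES = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
}


def degeneracy(oligo):
    """Return the number of distinct sequences encoded by a degenerate oligo."""
    total = 1
    for symbol in oligo.upper():
        if symbol.isspace():
            continue  # allow codon-spaced input such as "ACN RTN TAY"
        total *= IUPAC_SIZES[symbol]
    return total


if __name__ == "__main__":
    # Hypothetical fragment for illustration: 4 * 2 * 4 * 2 = 64 sequences.
    print(degeneracy("ACN RTN TAY"))
```

For orientation, nine fourfold-degenerate positions alone give 4^9 = 262 144 sequences, and one additional threefold position gives 786 432; five and eight fourfold positions give the 1024 and 65 536 bounds of the moderately degenerate sets. Which positions are actually degenerate in the authors' oligos cannot be read from this calculation.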

(b) Nucleotide sequence of rox2 cDNA and the inferred Rox2 aa sequence: comparison to putative RNA-binding proteins encoded by the rice and Arabidopsis genomes

To further establish that the ROX2, ROX8 and ROX21 PCR products corresponded to novel RRM protein genes, they were in turn used as hybridization probes to isolate several overlapping cDNA clones from the previously used embryo-stage cDNA library. As shown elsewhere, the ROX8-encoded aa sequence encompasses two of three RRM of a 50-kDa protein with extensive structural similarities to the two described human TIA-1-type apoptotic cell death factors (Brand and Bourbon, 1993).

Restriction mapping and DNA sequencing analysis of four different rox2 cDNA clones established that they are predicted to encode a common protein (not shown). The nt sequence determined for the largest cDNA insert (1757 bp) and the inferred conceptual aa sequence are shown in Fig. 2a. The ROX2 sequence is found at nt 708 to 851. The proposed start codon is at nt 420, which is in a relatively good context for D. melanogaster translation initiation (AGCAATG vs. MAAMATG, where M = A or C; Cavener, 1987). This initiator ATG predicts a 25-kDa protein composed of 224 aa with a calculated pI of 5.05. This conceptual protein (hereafter referred to as
Rox2) contains a single RRM (aa 97 to 170) juxtaposed toward the N- and C-terminus, respectively, to acidic (29 out of 96 aa, i.e., 30%, are Asp or Glu) and basic (13 out of 54 aa are Arg) auxiliary domains. A protein sequence homology search was conducted to detect putative structural homologs of Rox2 in other species. Strikingly, this analysis revealed extensive aa sequence similarities with three conceptual protein fragments encoded by partial ORFs found in expressed sequence tags (est) isolated from the plants Oryza sativa and Arabidopsis thaliana. As shown in Fig. 2c, the rice est-encoded product corresponds to the N-terminal moiety of Rox2, and both proteins display an overall aa sequence identity of 38% (68% similarity if conservative aa exchanges are considered). The structural homology rises to 48/68% (aa identity vs. similarity) over the last 74 aa (out of a total of 121 aa encoded by the available est sequence) of the rice protein fragment, which can be fully aligned with the homologous portion of Rox2 (aa 43 to 116). Finally, it is noteworthy that the homology is even higher (63/100% aa identity vs. similarity) over the RRM portion. Conversely, the two A. thaliana est sequences encode highly related protein fragments (63% overall aa identity in a pairwise comparison) displaying striking structural relationships to Rox2 over the RRM portion (about 58/80% aa identity vs. similarity for both Arabidopsis proteins). Hence, in light of the strong aa sequence similarities and common domain organizations, one can assume that the rox2-related O. sativa and A. thaliana est sequences correspond to genes encoding plant homologs of the insect protein, whose functional significance in fruit fly development remains to be established.

[Fig. 2 (sequence panels not reproduced): (a) nt and inferred aa sequence of the rox2 cDNA (1757 bp); (b) nt and inferred aa sequence of the rox21 cDNA (1173 bp); (c) alignment of Rox2 (R2) with the plant fragments O1, A1 and A2; (d) alignment of the Rox21 RRM (R21) with the RRM of SR1 through SR5, with pairwise aa identities to R21 of 58, 57, 54, 43 and 41%.]

Fig. 2. Nucleotide sequences of rox2 and rox21 cDNAs and aa sequence comparisons of the inferred Rox2 and Rox21 proteins to related gene products from plants and metazoans, respectively. The nt and encoded aa sequences of rox2 (a) and rox21 (b) cDNAs. The single RRM of both proteins is boxed in reverse type. The two RGG motifs found in Rox21 are boxed in grey. The aa positions (+1 for the predicted start codon) are numbered in italics. 5' and 3' non-coding nt sequences are derived from the largest identified cDNAs. The first and last nt of additional cDNAs having different shorter 5' or 3' non-coding regions are underlined. Polyadenylation signals near the 3' ends of the cDNAs are italicized, as well as an upstream in-frame Met codon found in rox21 cDNA. These sequences have been deposited in GenBank (accession Nos. L34934 and L34935). (c) Comparison of the D. melanogaster Rox2 aa sequence with its plant homologs. Three protein fragments highly related to Rox2 (R2) were identified as parts of conceptual gene products from O. sativa (O1; GenBank accession No. D15970) and A. thaliana (A1 and A2; GenBank accession Nos. T04366 and Z17609, respectively) and their aa sequences were aligned with their insect homolog. The numbers on the right refer to positions in the respective aa sequences. Gaps are introduced in the aa sequences for obtaining an optimal alignment. Positions of aa of Rox2 which are recovered in at least one plant polypeptide are indicated by underlying asterisks; those corresponding to conserved substitutions (according to the Dayhoff matrix) are marked by underlying apostrophes. Residues of the RRM portion are overlined. (d) Rox21 and a selected range of SR proteins share highly related RRM. The RRM of Rox21 (R21) is compared to homologous domains of the D. melanogaster SRp20 (SR1; Genpept accession No. 104929), human 9G8 (SR2; Cavaloc et al., 1994), mouse SRp20 (SR3; Genpept accession No. 110838), D. melanogaster SRp55 (SR4; Genpept accession No. X62446) and human SRp30a (SR5; Genpept accession No. M69040) non-snRNP spliceosomal proteins. The numbers on the right refer to aa positions in each protein. Gaps are introduced in the aa sequences for obtaining an optimal alignment. Pairwise percent aa identities to R21 are indicated between brackets in the right-hand column, and conserved positions (including highly similar aa, as defined by the Dayhoff matrix) in the six sequences are marked by underlying asterisks.

Methods: Cloned ROX2 and ROX21 fragments were amplified by PCR from pBS2 and pBS21, respectively, 32P-labeled and used as hybridization probes to screen a 4-8-h embryo-stage cDNA library as previously described (Brand and Bourbon, 1993). Four and 50 positives out of 10⁵ plasmid recombinants were obtained, respectively, which in both cases were distributed among four different classes. A combination of deletion cloning and internal oligo strategies was used to obtain the nt sequences of the largest cDNA inserts on both strands. Database searches were performed using the BLAST service at the National Center for Biotechnology Information. The aa sequence alignments were performed as previously described (Brand and Bourbon, 1993).

TABLE I
Distribution of RRM-encoding sequences among individually-isolated PCR products

PCR-amplified RRM-encoding   Oligo set used   Number of independent clones sequenced (d)
sequences (a)                in PCR (c)       cDNA template     genomic DNA template

Class A
  ROX2                       P1               26/28             64/100
                             P2               0/24              N.D.
  ROX21                      P1               0/28              12/100
                             P2               0/24              N.D.
  pabp(RRMII) (b)            P1               1/28              1/100
                             P2               0/24              N.D.
  pabp(RRMIV) (b)            P1               0/28              2/100
                             P2               22/24             N.D.

Class B
  ROX8                       P1               6/12              N.D.
                             P2               0/12              N.D.
  pabp(RRMII + III) (b)      P1               2/12              N.D.
                             P2               12/12             N.D.

(a) Refers to PCR amplifications of single (class A) or adjacent (class B) RRM-encoding sequences.
(b) Refers to PCR products derived from the pabp gene, which encodes a protein with four adjacent RRM (designated as RRMI through RRMIV).
(c) See Fig. 1b for sequences of oligos of the P1 and P2 sets.
(d) Number of corresponding ROX-type PCR products represented among a definite number of randomly picked clones. N.D., not done.

(c) Nucleotide sequence of rox21 cDNA and the inferred Rox21 aa sequence: comparison to pre-mRNA splicing factors

Likewise, restriction mapping and DNA sequencing analysis of four different rox21 cDNA classes revealed that they are predicted to share a single ORF (not shown). The nt sequence of the largest cDNA insert (1173 bp) and the inferred conceptual aa sequence are shown in Fig. 2b. The ROX21 sequence encompasses nt 197 to 325. Two potential in-frame initiator codons could be detected (at nt 167 and 176), the latter being in a much better context (CAGCATG at nt 176 vs. CCAGATG at nt 167). Thus the ATG at nt 176 is likely to be the bona fide start codon and predicts a 21-kDa protein (hereafter referred to as Rox21) of 197 aa including a single RRM located at its N terminus (aa 8 to 78). Rox21 appears to be very basic (calculated pI of 11) due to a high content of Arg (23 out of 119 aa, i.e., 20%) in its C-terminal part. It is noteworthy that this region is also enriched in Gly (28%) and Ser (20%), and includes two RGG boxes. The RGG box is an evolutionarily conserved motif which has been shown to confer non-specific RNA-binding activity (Dreyfuss et al., 1993). Database searches detected striking aa sequence similarities between Rox21 and several members of a family of non-snRNP pre-mRNA splicing factors known as the SR proteins (reviewed by Birney et al., 1993). As shown in Fig. 2d, the highest homology scores (41 to 58% aa identity) were detected over the RRM of SRp20 from mouse and fruit fly, of 9G8 from human, of SRp55 from fruit fly and of SRp30a from human. Notably, although Rox21 and related SR proteins contain a high proportion of Ser and Arg in their respective auxiliary domains, no extensive aa sequence similarities could be detected in this portion (not shown). Based on the fact that Rox21 and a range of SR proteins share closely related RRM at their N-terminal ends associated with prominent Arg+Ser-rich auxiliary domains containing numerous RS and SR dipeptides, we propose that Rox21 may represent a novel pre-mRNA splicing factor.
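The start-codon comparisons in sections (b) and (c) can be illustrated with a crude position count against the MAAMATG consensus (M = A or C; Cavener, 1987). This is a sketch only: it gives every position equal weight, which is an assumption of this example and not necessarily how the contexts were judged in the text, and the function name is hypothetical.

```python
# Bases allowed by each IUPAC code used in the consensus.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "M": "AC"}


def context_matches(site, consensus="MAAMATG"):
    """Count positions of a 7-nt start site that agree with the consensus."""
    site = site.upper()
    if len(site) != len(consensus):
        raise ValueError("site and consensus must have the same length")
    return sum(base in IUPAC[code] for base, code in zip(site, consensus))


if __name__ == "__main__":
    for label, site in [
        ("rox2, nt 420", "AGCAATG"),
        ("rox21, nt 176", "CAGCATG"),
        ("rox21, nt 167", "CCAGATG"),
    ]:
        print(label, site, context_matches(site), "of 7")
```

By this unweighted count the rox21 site at nt 176 agrees with the consensus at one more position than the site at nt 167, in line with the choice of the downstream ATG. The count says nothing about which flanking positions matter most for initiation.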

(d) Conclusions

In summary, we have been able to identify by PCR three additional members of the RRM-protein gene family in flies. The Drosophila system now offers a molecular genetic approach for a functional evaluation of the encoded putative regulatory RNA-binding proteins in the development of an insect. As a first step, experiments are in progress to determine the cytological location of the rox2 and rox21 genes and the localization and RNA-binding specificities of their respective products.

NOTE ADDED IN PROOF

Recently, a putative homolog of Rox2 has been identified in Saccharomyces cerevisiae (Genpept accession No. Z38062).

ACKNOWLEDGEMENTS

We thank Drs F. Amalric, A. Vincent and J. Smith for hospitality during the initial part of this work. We also thank Yvette De Preval for synthesizing the degenerate oligos. Finally, we acknowledge J. Smith and D. Cribbs for critical reading of the manuscript. This work was supported by an ATIPE grant (No. 3) from the Centre National de la Recherche Scientifique and by a student fellowship (to S.F.B.) from the Ministère français de l'Enseignement Supérieur et de la Recherche.

REFERENCES

Adam, S., Nakagawa, T., Swanson, M.S., Woodruff, T. and Dreyfuss, G.: mRNA polyadenylate-binding protein: gene isolation and sequencing and identification of a ribonucleoprotein consensus sequence. Mol. Cell. Biol. 6 (1986) 2932-2943.
Bandziulis, R., Swanson, M.S. and Dreyfuss, G.: RNA-binding proteins as developmental regulators. Genes Dev. 3 (1989) 431-437.
Birney, E., Kumar, S. and Krainer, A.: Analysis of the RNA-recognition motif and RS and RGG domains: conservation in metazoan pre-mRNA splicing factors. Nucleic Acids Res. 21 (1993) 5803-5816.
Brand, S. and Bourbon, H.-M.: The developmentally-regulated Drosophila gene rox8 encodes an RRM-type RNA binding protein structurally related to human TIA-1-type nucleolysins. Nucleic Acids Res. 21 (1993) 3699-3704.
Brown, N. and Kafatos, F.: Functional cDNA libraries from Drosophila embryos. J. Mol. Biol. 203 (1988) 425-437.
Bugler, B., Bourbon, H.-M., Lapeyre, B., Wallace, M.O., Chang, J.-H., Amalric, F. and Olson, M.O.J.: RNA binding fragments from nucleolin contain the ribonucleoprotein consensus sequence. J. Biol. Chem. 262 (1987) 10922-10925.
Cavaloc, Y., Popielarz, M., Fuchs, J.-P., Gattoni, R. and Stévenin, J.: Characterization and cloning of the human splicing factor 9G8: a novel 35 kDa factor of the serine/arginine protein family. EMBO J. 13 (1994) 2639-2649.
Cavener, D.: Comparison of the consensus sequence flanking translational start sites in Drosophila and vertebrates. Nucleic Acids Res. 14 (1987) 1353-1361.
Dreyfuss, G., Matunis, M., Pinol-Roma, S. and Burd, C.: hnRNP proteins and the biogenesis of mRNA. Annu. Rev. Biochem. 62 (1993) 289-321.
Kenan, D., Query, C. and Keene, J.: RNA recognition: towards identifying determinants of specificity. Trends Biochem. Sci. 16 (1991) 214-220.
Kim, Y.-J. and Baker, B.: Isolation of RRM-type RNA-binding protein genes and the analysis of their relatedness by using a numerical approach. Mol. Cell. Biol. 13 (1993) 174-183.
Lantz, V., Chang, J., Horabin, J., Bopp, D. and Schedl, P.: The Drosophila orb RNA-binding protein is required for the formation of the egg chamber and establishment of polarity. Genes Dev. 8 (1994) 598-613.
Lefrère, V., Vincent, A. and Amalric, F.: Drosophila melanogaster poly(A)-binding protein: cDNA cloning reveals an unusually long 3'-untranslated region of the mRNA, also present in other eukaryotic species. Gene 96 (1990) 219-225.
Macdonald, P.: The means to the ends: localization of maternal messenger RNAs. Semin. Dev. Biol. 3 (1992) 413-424.
Rio, D.: Splicing of pre-mRNA: mechanism, regulation and role in development. Curr. Opin. Genet. Dev. 3 (1993) 574-584.
Wharton, R. and Struhl, G.: RNA regulatory elements mediate control of Drosophila body pattern by the posterior morphogen nanos. Cell 67 (1991) 955-967.
