Gorilla and orangutan c-myc nucleotide sequences: Inference on hominoid phylogeny

June 13, 2017 | Author: F. Galibert | Categories: Evolutionary Biology, Genetics, Molecular Evolution, Phylogeny, Gorilla, Humans, Sequence alignment, Animals, Introns, Hominidae, Codon, Amino Acid Sequence, Base Sequence, Open Reading Frame, Nucleotides, Biochemistry and cell biology, DNA sequence


J Mol Evol (1995) 41:262-276

Journal of Molecular Evolution © Springer-Verlag New York Inc. 1995

Khosro Mohammad-Ali, Marthe-Elisabeth Eladari,* Francis Galibert

Laboratoire "Recombinaisons Génétiques", UPR 41 CNRS, Faculté de Médecine, 2 av du Pr Léon Bernard, 35043 Rennes Cedex, France

Received: 11 August 1994 / Accepted: 2 January 1995

Abstract.

The nucleotide sequences of the gorilla and orangutan myc loci have been determined by the dideoxy nucleotide method. As previously observed in the human and chimpanzee sequences, an open reading frame (ORF) of 188 codons overlapping exon 1 could be deduced from the gorilla sequence. However, no such ORF appeared in the orangutan sequence. The two sequences were aligned with those of human and chimpanzee as hominoids and of gibbon and marmoset as outgroups of hominoids. The branching order in the evolution of primates was inferred from these data by two methods: maximum parsimony and neighbor-joining. Our results support the view that the gorilla lineage branched off before human and chimpanzee diverged, and strengthen the hypothesis that chimpanzee and gorilla are more closely related to human than is orangutan.

Key words:

DNA sequencing - c-myc - Primates - Phylogeny - Gorilla - Orangutan

Introduction

At present, no definitive agreement has been reached on either the correct branching order or differential evolution rates among the higher primates. The morphological picture of primate phylogeny has not unambiguously identified the nearest outgroup of anthropoids and, above all, has not resolved the branching pattern within hominoids. The molecular picture, on the other hand, could provide more resolution and clarify the systematics of hominoids.

Until now, four types of genes have been studied to determine the phylogeny of hominoids: the β-type globin genes, the immunoglobulin genes, the RNA genes, and the mitochondrial DNA sequences. Among the studies on mitochondrial DNA, those of Hayasaka et al. (1988), Hasegawa et al. (1990), Ruvolo et al. (1991), and Horai et al. (1992) favor the grouping of chimpanzee with human, whereas Brown et al. (1982) grouped chimpanzee with gorilla. Gonzales et al. (1990), using 28S rRNA sequences, conclude that human and chimpanzee are the most closely related pair. The same conclusion is drawn from studies of the Cε and Cα immunoglobulin primate genes (Ueda et al. 1988, 1989). Analyses of the ψη-δ, β-type globin pseudogene favor the grouping of chimpanzee with human first (Miyamoto et al. 1987; Holmquist et al. 1988; Maeda et al. 1988; Goodman et al. 1990). More recently, these results have been reinforced by analysis of the δ-β-globin intergenic region (Perrin-Pecontal et al. 1992) and the flanking sequences of the γ-globin genes (Bailey et al. 1992). However, the results of Oetting et al. (1993), analyzing the evolution of the tyrosinase-related gene in primates, suggest that gorilla would be closer to human than chimpanzee.

In order to help resolve the separation of human, chimpanzee, and gorilla lineages, we have chosen to study the c-myc oncogene. The myc-oncogene family contains coding sequences that have been preserved in different species for over 400 million years, which allows good alignment between related species such as those belonging to higher primates. Moreover, the length of this sequence (around 6,600 nucleotides) is sufficient to provide enough informative sites to permit estimation of the relationship between human and great apes. In this study, the complete sequences of the c-myc gene have been determined for gorilla and orangutan. These sequences were compared with one another and with a human sequence described by Gazin et al. (1984), a chimpanzee sequence (Argant et al. 1991), and gibbon and marmoset sequences (Eladari et al. 1992), the latter two as outgroups of the hominoids. We also compared the amino acid sequences of the myc protein.

* Present address: Laboratoire "Rétrovirus et Rétrotransposons des Vertébrés", UPR 43 CNRS, Université Paris 7, Hôpital Saint-Louis, 16 rue de la Grange aux Belles, 75475 Paris Cedex 10, France
Correspondence to: F. Galibert

Materials and Methods

Source. Gorilla (Gorilla gorilla) and orangutan (Pongo pygmaeus) libraries were obtained from genomic banks of the American Type Culture Collection (ATCC). From these libraries, several lambda clones were selected by hybridization with a c-myc probe, and the myc insert was sequenced with the dideoxy method (Sanger et al. 1977) after sonication, repair, and cloning in M13. When we aligned the orangutan DNA sequence with that of other primates, we found that this DNA, stored as orangutan in the ATCC, was in fact derived from a gibbon, and we decided to find another source. Orangutan DNA was then extracted from blood samples obtained from the "Zoo de Mâcon" (France). The genomic region corresponding to the c-myc gene was amplified by PCR, using several sets of oligonucleotides specific to the human gene. The amplification products were cloned in phage M13mp89 (Huang et al. 1989), and several clones were sequenced in order to detect possible misincorporation by PCR and to discriminate polymorphic positions.

Sequence Alignment. Multiple alignments were then performed using the Clustal V program. The approach used in Clustal V is a modified version of the method of Feng and Doolittle (1987). The pairwise aligned sequences were compared with each other in order to determine an initial guide tree (dendrogram); then the sequences were aligned in larger and larger groups according to the branching order in this dendrogram. This approach allows a very useful combination of computational tractability and sensitivity. The positions of the gaps generated in early alignments remain throughout later stages. At each alignment stage, we aligned two groups of already-aligned sequences, using the algorithm of Myers and Miller (1988) for optimization.

Phylogenetic Analysis. We used two methods to infer phylogenies:

• The first is a distance-based method. The number of sequence differences in pairwise comparisons of aligned sequences was computed. Differences were calculated in terms of the number of transitions and transversions. Percent divergence figures were then calculated by Kimura's two-parameter method (Kimura 1980), and these distances were used with the neighbor-joining method (Saitou and Nei 1987) to construct unrooted phylogenetic trees. The neighbor-joining method allows for unequal rates of evolution in different lineages.
• The second is a maximum parsimony method using the DNAPARS program (Fitch 1971).
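The distance step of the first method can be sketched in a few lines. The example below is only an illustration, not the PHYLIP code the authors ran: it counts transitions and transversions between two aligned sequences and applies Kimura's two-parameter formula, K = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitional and transversional differences. The function names and the choice to skip gapped or ambiguous columns are assumptions made for this sketch.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_ts_tv(seq_a, seq_b):
    """Return (transitions, transversions, compared_sites) for two aligned sequences.

    Columns where either sequence has a gap or a non-ACGT symbol are skipped
    (an assumption of this sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    ts = tv = sites = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # gap or ambiguous base: not a comparable site
        sites += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            ts += 1  # purine<->purine or pyrimidine<->pyrimidine
        else:
            tv += 1  # purine<->pyrimidine
    return ts, tv, sites


def kimura_2p(seq_a, seq_b):
    """Kimura (1980) two-parameter distance, in substitutions per site."""
    ts, tv, sites = count_ts_tv(seq_a, seq_b)
    if sites == 0:
        raise ValueError("no comparable sites")
    p = ts / sites  # proportion of transitional differences
    q = tv / sites  # proportion of transversional differences
    # math.log raises ValueError if the sequences are too divergent
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))
```

Multiplying the returned value by 100 gives a percent divergence comparable in scale to the figures quoted in the Results. A matrix of such pairwise values is what a neighbor-joining program would take as input.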

We performed bootstrap resampling of the nucleotide sequences with the Seqboot program in order to assess the robustness of these phylogenies. All the programs used are from the PHYLIP package (Felsenstein 1988).
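As a rough illustration of what bootstrap resampling of an alignment involves, the sketch below draws alignment columns with replacement to build pseudo-replicate data sets of the original length. It is not the Seqboot implementation; the function names, the dictionary representation of the alignment, and the fixed seed are assumptions chosen for the example.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement.

    `alignment` maps a sequence name to its aligned sequence; all sequences
    must have the same length. The same sampled columns are used for every
    sequence, so each column of the replicate is a column of the original.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    n_sites = lengths.pop()
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {
        name: "".join(seq[i] for i in columns)
        for name, seq in alignment.items()
    }


def bootstrap_replicates(alignment, n_replicates, seed=0):
    """Return a list of `n_replicates` resampled alignments."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n_replicates)]
```

Each replicate would then be analyzed with the same tree-building method as the original data, and the fraction of replicates that recover a given grouping serves as its bootstrap support.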

Results

From the analysis of the sequences, we observed that, potentially (since we did not analyze the mRNA encoded by these sequences), the gorilla and orangutan c-myc genes had the same characteristic structure (three exons and two introns) as the human gene, with the major polypeptide open reading frame (ORF) residing in the second and third exons (Fig. 1). Over the complete gene, there was 98.42% similarity between the human and gorilla genes and 97.34% between the human and orangutan genes; as expected, exons were better conserved than introns (Table 1).

The sequence of the first exon was 99.14% conserved between gorilla and human. As in the case of the human c-myc gene, this region contained an ORF (with an ATG at nucleotide 2304) which extended for 188 codons down to a stop codon (nucleotide 2868). However, we failed to observe an ORF in the orangutan sequence because of the insertion of an additional G at position 2537 in the orangutan, gibbon, and marmoset sequences, which shifts the downstream sequence into a reading frame blocked by several stop codons. In the first exon, the nucleotide sequence similarity between human and orangutan was reduced to 97.94% (Table 1).

When we considered the different regions of the gene, we observed that gorilla was closer to human than orangutan was, and that the percent divergence between gorilla and orangutan c-myc sequences was larger than the percent divergence between human and gorilla (Table 2).

We aligned the gorilla and orangutan c-myc sequences with the already-published sequences for human, chimpanzee, gibbon, and marmoset, as indicated in Materials and Methods, in order to construct phylogenetic trees, and used two methods to estimate the percent divergence between the different sequences (Table 3). The levels of divergence calculated according to Kimura are always smaller than those calculated according to the first method. This difference can be explained by the fact that insertion/deletion events are not taken into account in the Kimura formula.
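The reading-frame argument for exon 1 can be illustrated with a toy example. The positions quoted above are consistent with the stated length, since (2868 - 2304) / 3 = 188 codons between the ATG and the stop codon. The sketch below uses a short invented sequence, not the c-myc sequence, to show how a single inserted base shifts the frame and exposes a premature stop codon; the function names and the toy sequence are assumptions made for the example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def orf_length_in_codons(dna, start):
    """Number of codons from the ATG at `start` up to the first in-frame stop.

    The stop codon itself is not counted. Returns None if there is no ATG at
    `start` or no in-frame stop codon downstream.
    """
    dna = dna.upper()
    if dna[start:start + 3] != "ATG":
        return None
    for i in range(start, len(dna) - 2, 3):
        if dna[i:i + 3] in STOP_CODONS:
            return (i - start) // 3
    return None


def insert_base(dna, position, base):
    """Return `dna` with `base` inserted before index `position`."""
    return dna[:position] + base + dna[position:]


# Toy sequence (invented): ATG GCT GAA CCC TAA
toy = "ATGGCTGAACCCTAA"
# Inserting one G shifts the frame: ATG GCG TGA ...
shifted = insert_base(toy, 5, "G")
```

Here `orf_length_in_codons(toy, 0)` counts four codons, whereas the shifted sequence reaches an in-frame TGA after only two, which mirrors on a small scale the effect described for the extra G in the orangutan, gibbon, and marmoset sequences.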
With both methods, we observed that orangutan was the most divergent member of the human-great ape group, differing from the other hominoids by a mean divergence (%D) of 2.64 (from 2.51 to 2.74), and that human and chimpanzee were the closest pair of species. The mean divergence between the human-great ape clade and the outgroups is 3.10% for gibbon (from 2.96 to 3.21) and 9.97% for marmoset (from 9.18 to 10.41). Levels of divergence calculated by Kimura's method were used to estimate phylogenetic distances using the neighbor-joining method (Fig. 2A). We also used the

Fig. 1. Sequence alignment. Alignment of the Gorilla gorilla (GG) and Pongo pygmaeus (OO) c-myc sequences with already published sequences: (HU) Homo sapiens (Gazin et al. 1984); (CH) Pan troglodytes (Argant et al. 1991); (GIB) Hylobates lar; and (MAR) Callithrix jacchus (Eladari et al. 1992). The numbering of the human sequence is from Gazin et al. (1984). First digits of numerals are aligned with the corresponding nucleotides. Only the positions where one or more nucleotides differ from the human sequence are indicated. Void positions correspond to sequence identical to the human one. Missing nucleotides are indicated by a dash. The limits of the various regions of the sequence are indicated (exon 1-intron 1-exon 2-intron 2-exon 3).

[Fig. 1 alignment panels, human nucleotides 1504 to 7941. Region boundaries marked in the figure: Upstream/Exon 1, Exon 1/Intron 1, Intron 1/Exon 2, Exon 2/Intron 2, Intron 2/Exon 3 coding, Exon 3 coding/Exon 3 not coding, Exon 3 not coding/Downstream. The column layout of the alignment is not recoverable from the extracted text.]
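The display convention described in the Fig. 1 legend (only positions that differ from the human sequence are printed, identical positions are left blank, and missing nucleotides appear as dashes) can be sketched as follows. The function name is hypothetical, and the second sequence in the usage note is invented for illustration; only the first ten human nucleotides are taken from the figure.

```python
def difference_only(reference, other):
    """Render `other` against `reference` in the style of the Fig. 1 legend.

    Positions identical to the reference are shown as spaces; differing
    nucleotides are printed; a missing nucleotide ('-') is printed as a dash.
    Both inputs must already be aligned to the same length.
    """
    if len(reference) != len(other):
        raise ValueError("sequences must be aligned to the same length")
    return "".join(" " if o == r else o for r, o in zip(reference, other))
```

For example, with the human reference `CACATCTCAG` (positions 1504 onward) and an invented aligned sequence `CACGTCTC-G`, the function keeps only the `G` at the fourth position and the dash at the ninth, leaving every other position blank.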

Table 1. Similarity (%) between human and other ape c-myc genes (a)

Comparison          Complete gene   Exon 1         Exon 2         Exon 3         Intron 1       Intron 2
                                    nt 2308-2881   nt 4509-5277   nt 6654-7670   nt 2882-4508   nt 5278-6653
HU/CH    nt (%)     99.00           99.50          99.70          99.60          97.50          98.00
         aa (%)     99.50 (2)
HU/GG    nt (%)     98.42           99.14          99.09          99.02          97.67          97.67
         aa (%)     99.00 (4)
HU/OO    nt (%)     97.34           97.94          98.83          98.92          96.39          95.55
         aa (%)     97.95 (9)
HU/GIB   nt (%)     96.78           97.43          98.70          98.43          95.95          95.62
         aa (%)     97.95 (9)
HU/MAR   nt (%)     91.54           92.47          95.32          94.00          89.81          88.56
         aa (%)     96.35 (17)

(a) HU/CH means comparison between human and chimpanzee c-myc genes. HU/GG means comparison between human and gorilla c-myc genes. HU/OO means comparison between human and orangutan c-myc genes. HU/GIB means comparison between human and gibbon c-myc genes. HU/MAR means comparison between human and marmoset c-myc genes. nt (%), % of nucleotide similarity between the different regions of the c-myc genes determined from alignment of the two sequences (Fig. 1). aa (%), % of amino acid similarity (number of differences in the coding exons is shown in parentheses).
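The note to Table 2 defines percent divergence as (substitutions + gaps) divided by (shared positions + gaps), with a gap being one or several contiguous missing nucleotides. The sketch below applies that definition to a pair of aligned sequences and reports similarity as 100 minus divergence. That last step is an assumption of this sketch rather than something the paper states, as are the function names and the treatment of columns gapped in both sequences.

```python
def pairwise_divergence(seq_a, seq_b, gap="-"):
    """Percent divergence counting each run of contiguous gap columns once.

    divergence = 100 * (substitutions + gaps) / (shared positions + gaps)
    Columns gapped in both sequences (possible when the pair is taken from a
    multiple alignment) are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    substitutions = shared = gaps = 0
    open_gap_in = None  # which sequence the current gap run belongs to
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap and b == gap:
            continue  # carries no information for this pair
        if a == gap or b == gap:
            side = "a" if a == gap else "b"
            if side != open_gap_in:
                gaps += 1  # a new gap event starts here
            open_gap_in = side
            continue
        open_gap_in = None
        shared += 1
        if a != b:
            substitutions += 1
    denominator = shared + gaps
    if denominator == 0:
        raise ValueError("no comparable positions")
    return 100.0 * (substitutions + gaps) / denominator


def pairwise_similarity(seq_a, seq_b, gap="-"):
    """Percent similarity, taken here as 100 minus percent divergence."""
    return 100.0 - pairwise_divergence(seq_a, seq_b, gap)
```

Counting a multi-nucleotide deletion as a single event keeps a long indel from outweighing point substitutions, which is one plausible reading of the gap definition in the table note.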

Table 2. Comparison of human, gorilla, and orangutan myc sequences (a)

Regions (rows): Upstream, nt 1504-2307; Exon 1, nt 2308-2881; Intron 1, nt 2882-4508; Exon 2, nt 4509-5277; Intron 2, nt 5278-6653; Exon 3, nt 6654-7670; Exon 3 coding part, nt 6654-7216; Exon 3 noncoding part, nt 7217-7670; Downstream, nt 7671-7941.

Comparisons (column groups): Human/gorilla, Human/orangutan, Gorilla/orangutan. For each comparison the columns are ts, tv, gap, and % div.

9

+1

2.50

11

11

+1

2.92

7

4

-1

2.06

9

5

-1

2.57

-6 +2 0

3.61

34

16

3.36

1.17

9

3

-3 +2 0

-2 +5 -2 +1 0

4.45

31

20

4.17

1.08

9

1

0.89

5

1

-2 +4 -2 +1 0

-2 +1 0

1.31

4

0

1.53

0.37

0

1

-2 +1 0

gap

div 0.96

8

0.86

3

4

4

1

-1 0 0

21

13

-4

2.33

32

19

6

1

0

0.81

7

2

16

12

2.33

25

18

7

3

-2 +2 0

0.98

4

4

2

1

0

0.53

3

2

5

2

0

1.53

1

2

3

1

0

0.97

1

0

1.56

1.27 1.07

0.37

(a) This table summarizes the localization, nature, and number of differences observed when comparing the human and gorilla, the human and orangutan, and the gorilla and orangutan myc sequences. ts: total number of transitions observed when two c-myc sequences are compared; tv: total number of transversions observed when two c-myc sequences are compared. A gap in one sequence corresponds to the absence of one or several contiguous nt. In the human/gorilla comparison, "-" corresponds to a gap in the human sequence and "+" to a gap in the gorilla sequence. The same is true for the human/orangutan and gorilla/orangutan comparisons. The % of sequence divergence is calculated as follows:

\[ \%\,\mathrm{div} = \frac{\text{number of nt substitutions} + \text{number of gaps}}{\text{actual shared positions} + \text{number of gaps}} \times 100 \]

maximum parsimony method (Fig. 2B). In both cases, we obtained the same branching of species, with chimpanzee closer to human than gorilla.

C-myc amino acid sequences for the different primates were deduced from their nucleotide sequences. We assumed, from the extensive DNA similarity, that the same regulatory elements (promoter, protein binding sites, acceptor and donor splice sites, polyadenylation

Table 3. Comparison of c-myc sequences: % divergence between species (a)

            Complete gene    Exon 1          Exon 2          Exon 3          Intron 1        Intron 2
Comparison  %D     %K2P      %D     %K2P     %D     %K2P     %D     %K2P     %D     %K2P     %D     %K2P
HU/CH        1.05   0.89     0.51   0.52     0.26   0.26     0.39   0.39      1.47   1.36     1.97   1.61
HU/GG        1.58   1.35     0.86   0.86     0.91   0.92     0.98   0.99      2.33   2.11     2.25   2.06
HU/OO        2.66   2.13     2.06   1.91     1.17   1.18     1.08   0.79      3.63   3.18     4.45   3.81
HU/GIB       3.13   2.60     2.57   2.27     1.30   1.31     1.57   1.39      4.54   4.23     4.38   3.86
HU/MAR      10.13   8.60     7.52   7.16     4.74   4.84     6.21   6.25     10.02   9.32    19.67  15.01
CH/GG        1.41   1.17     1.03   1.04     0.91   0.92     0.78   0.79      1.72   1.61     2.41   2.29
CH/OO        2.51   2.15     2.58   2.45     1.04   1.05     1.08   0.79      3.31   2.99     3.74   3.19
CH/GIB       2.96   2.58     3.09   2.80     1.30   1.31     1.57   1.39      3.99   3.69     4.10   3.80
CH/MAR      10.12   8.65     7.74   7.35     4.68   4.84     6.21   6.25      9.92   9.17    19.46  15.08
GG/OO        2.74   2.33     2.57   2.44     1.56   1.58     1.27   0.99      3.37   3.11     4.17   3.81
GG/GIB       3.21   2.71     3.09   2.80     1.69   1.71     1.77   1.59      4.05   3.82     4.63   4.26
GG/MAR      10.41   8.80     8.09   7.73     5.07   5.26     6.50   6.58      9.91   9.23    20.13  15.48
OO/GIB       3.13   2.75     3.60   3.52     1.43   1.45     1.38   1.19      4.17   3.87     4.65   3.95
OO/MAR       9.18   8.76     8.25   8.11     4.81   4.99     6.21   6.13     10.21   9.55    20.72  16.52
GIB/MAR     10.05   8.67     8.25   7.91     4.68   4.84     6.02   6.04     10.34   9.76    18.88  14.22

(a) The % of sequence divergence values (%D) take into account all nucleotide substitutions and the number of insertion/deletion events:

\[ \%D = \frac{TS + TV + ID}{N + ID} \times 100 \]

where TS: number of transitions; TV: number of transversions; ID: number of insertions/deletions; N: number of sites shared.

The %K2P values estimate the divergence according to Kimura (1980) and consider only the substitutions:

\[ \%K2P = -\frac{1}{2} \ln\left[ (1 - 2P - Q)\sqrt{1 - 2Q} \right] \]

where P: transition frequency; Q: transversion frequency.
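The two divergence measures defined in the footnote to Table 3 can be sketched in Python as follows. This is a minimal illustration, not the authors' program: the function names and the example counts are hypothetical, and the factor of 100 applied to the Kimura distance is an assumption made so that the result is a percentage like the tabulated %K2P values.

```python
import math


def percent_divergence(ts, tv, indels, shared_sites):
    """%D = (TS + TV + ID) / (N + ID) x 100, counting indel events as differences."""
    return 100.0 * (ts + tv + indels) / (shared_sites + indels)


def percent_k2p(ts, tv, shared_sites):
    """Kimura (1980) two-parameter distance from substitutions only.

    P and Q are the transition and transversion frequencies over the shared
    sites. The multiplication by 100 is an assumption (percentage scale).
    """
    p = ts / shared_sites
    q = tv / shared_sites
    inner = 1.0 - 2.0 * p - q
    root_arg = 1.0 - 2.0 * q
    if inner <= 0.0 or root_arg <= 0.0:
        raise ValueError("sequences too divergent for the K2P correction")
    return 100.0 * (-0.5) * math.log(inner * math.sqrt(root_arg))


if __name__ == "__main__":
    # Hypothetical counts, chosen only to exercise the two formulas.
    print(round(percent_divergence(ts=9, tv=5, indels=1, shared_sites=573), 2))
    print(round(percent_k2p(ts=9, tv=5, shared_sites=573), 2))
```

Because %K2P ignores insertion/deletion events while %D counts them, the two values are not expected to agree, and %K2P can be the smaller of the two for regions with many gaps.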

Fig. 2. Phylogenetic trees derived from the sequence alignment presented in Fig. 1. A Tree constructed by the neighbor-joining method of Saitou and Nei (1987). Numbers express relative distance. B Tree constructed by the maximum parsimony method. Numbers represent mutational events. Since these trees are unrooted, parentheses mean that the numbers (0.07356) and (435) cannot be distributed between the branch of the marmoset (MAR) and the common branch of the other species. Numbers in brackets in A and B correspond to the bootstrap values obtained after 1,000 resamplings.

sites) gave rise to similar myc proteins. Figure 3 shows that there was a very good conservation of the sequence among primates. In all species, the protein was 439 amino acids in length, except for marmoset, which had a one-amino-acid-shorter sequence (438 amino acids) due to the deletion of a glutamine at position 261. (Numbers refer to the human sequence.) Amino acid replacements at two positions (211 and 417) could be interpreted as synapomorphies for the human/chimpanzee/gorilla clade and at position 82 for the human/chimpanzee clade. There was no position in favor of the human/gorilla or chimpanzee/gorilla clade. Human diverged from all the other primates by only one substitution: at position 354, an aspartate residue was replaced by a valine residue in human.
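The search for amino acid positions that support a given clade can be sketched as a column scan over an alignment. The Python below is a hedged illustration only: the function name is hypothetical, the toy sequences are invented and are not the myc sequences, and the test it applies (one residue shared by every clade member and absent from every other species) is a simplified stand-in for a proper synapomorphy analysis, which would also need character polarity from an outgroup.

```python
def clade_supporting_positions(alignment, clade):
    """Return 1-based alignment columns where all clade members share one
    residue that no species outside the clade carries."""
    clade = set(clade)
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must all have the same length")
    outside = [name for name in alignment if name not in clade]
    hits = []
    for col in range(lengths.pop()):
        inside_residues = {alignment[name][col] for name in clade}
        outside_residues = {alignment[name][col] for name in outside}
        shared = len(inside_residues) == 1 and "-" not in inside_residues
        if shared and not (inside_residues & outside_residues):
            hits.append(col + 1)
    return hits


# Hypothetical toy alignment, NOT the sequences compared in this study.
toy = {
    "HU":  "MPLNVSA",
    "CH":  "MPLNVSA",
    "GG":  "MPLNVTA",
    "OO":  "MPLSVTA",
    "GIB": "MPLSVTG",
}

print(clade_supporting_positions(toy, {"HU", "CH"}))        # [6]
print(clade_supporting_positions(toy, {"HU", "CH", "GG"}))  # [4]
print(clade_supporting_positions(toy, {"HU", "GG"}))        # []
```

In the toy data, one column supports the human/chimpanzee grouping, one supports the human/chimpanzee/gorilla grouping, and none supports human/gorilla, which mirrors the kind of pattern described above.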

Fig. 3. Alignment of myc protein sequences of different species. Comparison of c-myc amino acid sequences deduced from nucleotide sequences of Homo sapiens (HU), Pan troglodytes (CH), Gorilla gorilla (GG) (this work), Pongo pygmaeus (OO) (this work), Hylobates lar (GIB), Callithrix jacchus (MAR), Felis silvestris (CAT) (Stewart et al. 1986), Rattus norvegicus (RAT) (Hayashi et al. 1987), Mus musculus (MOU) (Bernard et al. 1983).

Discussion

The great conservation of the c-myc gene organization and sequence among primates reflects the very high selection pressure exerted upon this gene and underscores its biological importance. The existence of an ORF overlapping the gorilla c-myc exon 1, as in the human (Gazin et al. 1984) and chimpanzee (Argaut et al. 1991) genes, might be taken as an additional structural argument in favor of the coding capacity of this region. However, even using specific antibodies against the putative myc exon 1 protein, which recognize a fusion protein encoded by a maltose-binding protein/Myc exon 1/β-galactosidase DNA construct, we failed to characterize such a protein (data not shown), contrary to what Gazin et al. (1986) observed with peptide-directed antibodies.

We analyzed the entire c-myc gene sequence for different primates, using either the maximum parsimony method, an exhaustive search method which chooses the best tree among a large number of possible trees, or the neighbor-joining method, a stepwise clustering method which examines local topological relationships of the tree. Both methods allowed us to separate the human/chimpanzee clade from gorilla, in good agreement with Perrin-Pecontal et al. (1992), who analyzed the β-globin gene region, and Bailey et al. (1992), who analyzed the ψη-globin region and the flanking noncoding sequences of the γ-globin gene region. It is noteworthy that in our study, as in that of Perrin-Pecontal et al., the lengths of the sequences were of the same order of magnitude (around 6 kb) and the same tree-making methods were used.

When analyzing sequences, the choice of the method of phylogenetic tree construction is of great importance. For instance, Brown et al. (1982), analyzing a segment of mitochondrial DNA 896 bp in length by means of the maximum parsimony method, found that chimpanzee and gorilla were closer to one another than to human. However, Hasegawa et al. (1990), analyzing the same segment by the maximum likelihood method (Felsenstein 1981), concluded that human/chimpanzee clustering was preferred to the alternative branching orders among human, chimpanzee, and gorilla. The fact that we found a relatively long internal branch separating the human/chimpanzee clade from the last common ancestor with gorilla, ranging from 22 to 50% of the average human/chimpanzee terminal branch length, favors the human/chimpanzee grouping. These values are in good agreement with those obtained by Ruvolo et al. (1991) and Horai et al. (1992) studying mitochondrial DNA.

In contrast to the present results, Oetting et al. (1993), studying the evolution of tyrosinase-related genes in primates, found that gorilla was more closely related to human than chimpanzee was. This discrepancy could be accounted for by the fact that these authors analyzed pseudogenes (320 nucleotides in length), which are known to evolve fast and are therefore difficult to align well (Miyamoto et al. 1987, 1988; Ueda et al. 1989), rather than a functional coding gene, as in this study. Another possible explanation might be the use of a different computational analytical program; both methods, however, rely on parsimony.

In good agreement with the branching order deduced from the analysis of the nucleotide sequences, comparisons of the deduced amino acid sequences allowed the separation of a human/chimpanzee/gorilla clade, and among these three species we could identify a human/chimpanzee group.

Our study of the c-myc gene and protein among primates suggests that the question of the branching order between human, gorilla, and chimpanzee could be resolved in favor of a human/chimpanzee clade. Moreover, it underlines the fact that numerous factors have to be taken into account when comparing DNA sequences in order to construct reliable phylogenetic trees. The sequence must be well conserved among the species studied to allow good alignment. It must be long enough to provide sufficient informative sites, above all when comparing closely related species. Finally, the use of different computational programs and tree-making methods is necessary to avoid potential biases in the analysis.

Acknowledgments. We wish to thank Dr. Véronique Marie and Annick Thebault (SIS Pasteur) for helpful discussion and Dr. Vre Chaduc of Touroparc for the orangutan blood sample.

References

Argaut C, Rigolet M, Eladari ME, Galibert F (1991) Nucleotide sequence of chimpanzee c-myc oncogene. Gene 97:231-237
Bailey WJ, Hayasaka K, Skinner CG, Kehoe S, Sieu LC, Slightom JL, Goodman M (1992) Reexamination of the African hominoid trichotomy with additional sequences from the primate β-globin gene cluster. Mol Phyl Evol 1:97-135
Bernard O, Cory S, Gerondakis S, Webb E, Adams JM (1983) Sequence of the murine and human cellular myc oncogenes and two modes of myc transcription resulting from chromosome translocation in B-lymphoid tumors. EMBO J 2:2375-2383
Brown WM, Prager EM, Wang A, Wilson AC (1982) Mitochondrial DNA sequences of primates: tempo and mode of evolution. J Mol Evol 18:225-239
Eladari ME, Mohammad-Ali K, Argaut C, Galibert F (1992) Gibbon and marmoset c-myc nucleotide sequences. Gene 116:231-243
Felsenstein J (1981) Evolutionary trees from DNA sequences: a maximum likelihood approach. J Mol Evol 17:368-376
Felsenstein J (1988) PHYLIP version 3.3 manual. University of Washington, Seattle
Feng DF, Doolittle RF (1987) Progressive sequence alignment as a prerequisite to correct phylogenetic trees. J Mol Evol 25:351-360
Fitch W (1971) Toward defining the course of evolution: minimum change for a specific tree topology. Systems Zool 20:406-416
Gazin C, Dupont de Dinechin S, Hampe A, Masson JM, Martin P, Stehelin D, Galibert F (1984) Nucleotide sequence of the human c-myc locus: provocative open reading frame within the first exon. EMBO J 3:383-387
Gazin C, Rigolet M, Briand JP, Van Regenmortel MHV, Galibert F (1986) Immunochemical detection of proteins related to the human c-myc exon 1. EMBO J 5:2241-2250
Gonzales IL, Sylvester JE, Smith TF, Stambolian D, Schmickel RD (1990) Ribosomal RNA gene sequences and hominoid phylogeny. Mol Biol Evol 7:203-219
Goodman M, Tagle DA, Fitch DHA, Bailey W, Czelusniak J, Koop BF, Benson P, Slightom JL (1990) Primate evolution at the DNA level and a classification of hominoids. J Mol Evol 30:260-266
Hasegawa M, Kishino H, Hayasaka K, Horai S (1990) Mitochondrial DNA evolution in primates: transition rate has been extremely low in the lemur. J Mol Evol 31:113-121
Hayasaka K, Gojobori T, Horai S (1988) Molecular phylogeny and evolution of primate mitochondrial DNA. Mol Biol Evol 5:626-644
Hayashi K, Makino R, Kawamura H, Arisawa A, Yoneda K (1987) Characterization of rat c-myc and adjacent regions. Nucleic Acids Res 15:6419-6436
Holmquist R, Miyamoto MM, Goodman M (1988) Analysis of higher-primate phylogeny from transversion differences in nuclear and mitochondrial DNA by Lake's methods of evolutionary parsimony and operator metrics. Mol Biol Evol 5:217-236
Horai S, Satta Y, Hayasaka K, Kondo R, Inoue T, Ishida T, Hayashi S, Takahata N (1992) Man's place in Hominoidea revealed by mitochondrial DNA genealogy. J Mol Evol 35:32-43
Huang ME, Gobet M, Galibert F (1989) A modified M13 vector specifically designed for ExoIII-nested deletion sequencing. Methods Mol Cell Biol 1:161-164
Kimura M (1980) A simple method for estimating evolutionary rate of base substitution through comparative studies of nucleotide sequences. J Mol Evol 16:111-120
Maeda N, Wu CI, Bliska J, Reneke J (1988) Molecular evolution of intergenic DNA in higher primates: pattern of DNA changes, molecular clock, and evolution of repetitive sequences. Mol Biol Evol 5:1-20
Miyamoto MM, Slightom JL, Goodman M (1987) Phylogenetic relations of humans and African apes from DNA sequences in the ψη-globin region. Science 238:369-373
Miyamoto MM, Koop BF, Slightom JL, Goodman M, Tennant MR (1988) Molecular systematics of higher primates: genealogical relations and classification. Proc Natl Acad Sci USA 85:7627-7631
Myers EW, Miller W (1988) Optimal alignments in linear space. Comput Appl Biosci 4:11-17
Oetting WS, Stine OC, Townsend D, King RA (1993) Evolution of the tyrosinase related gene (TYRL) in primates. Pigment Cell Res 6:171-177
Perrin-Pecontal P, Gouy M, Nigon VM, Trabuchet G (1992) Evolution of the primate β-globin gene region: nucleotide sequence of the δ-β-globin intergenic region of gorilla and phylogenetic relationship between African apes and man. J Mol Evol 34:17-30
Ruvolo MT, Disotell T, Allard MW, Brown WM, Honeycutt RL (1991) Resolution of the African hominoid trichotomy using a mitochondrial gene sequence. Proc Natl Acad Sci USA 88:1570-1574
Saitou N, Nei M (1987) The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol 4:406-425
Sanger F, Nicklen S, Coulson AR (1977) DNA sequencing with chain-terminating inhibitors. Proc Natl Acad Sci USA 74:5463-5467
Ueda S, Matsuda F, Honjo T (1988) Multiple recombinational events in primate immunoglobulin-epsilon and alpha genes suggest closer relationship of humans to chimpanzees than to gorillas. J Mol Evol 27:77-83
Ueda S, Watanabe Y, Saitou N, Omoto K, Hayashida H, Miyata T, Hisajima H, Honjo T (1989) Nucleotide sequences of immunoglobulin-epsilon pseudogenes in man and apes and their phylogenetic relationships. J Mol Biol 205:85-90
