Gorilla and orangutan c-myc nucleotide sequences: Inference on hominoid phylogeny
J Mol Evol (1995) 41:262-276
Journal of Molecular Evolution © Springer-Verlag New York Inc. 1995
Khosro Mohammad-Ali, Marthe-Elisabeth Eladari,* Francis Galibert
Laboratoire "Recombinaisons Génétiques", UPR 41 CNRS, Faculté de Médecine, 2 av du Pr Léon Bernard, 35043 Rennes Cedex, France
Received: 11 August 1994 / Accepted: 2 January 1995
Abstract.
The nucleotide sequences of the gorilla and orangutan myc loci have been determined by the dideoxy nucleotide method. As previously observed in the human and chimpanzee sequences, an open reading frame (ORF) of 188 codons overlapping exon 1 could be deduced from the gorilla sequence. However, no such ORF appeared in the orangutan sequence. The two sequences were aligned with those of human and chimpanzee as hominoids and of gibbon and marmoset as outgroups of hominoids. The branching order in the evolution of primates was inferred from these data by two different methods: maximum parsimony and neighbor-joining. Our results support the view that the gorilla lineage branched off before the human and chimpanzee lineages diverged, and they strengthen the hypothesis that chimpanzee and gorilla are more closely related to human than is orangutan.
Key words:
DNA sequencing - c-myc - Primates - Phylogeny - Gorilla - Orangutan
* Present address: Laboratoire "Rétrovirus et Rétrotransposons des Vertébrés", UPR 43 CNRS, Université Paris 7, Hôpital Saint-Louis, 16 rue de la Grange aux Belles, 75475 Paris Cedex 10, France
Correspondence to: F. Galibert

Introduction

At present, no definitive agreement has been reached on either the correct branching order or the differential evolution rates among the higher primates. The morphological picture of primate phylogeny has not unambiguously identified the nearest outgroup of anthropoids and, above all, has not resolved the branching pattern within hominoids. The molecular picture, on the other hand, could provide more resolution and clarify the systematics of hominoids.

Until now, four types of genes have been studied to determine the phylogeny of hominoids: the β-type globin genes, the immunoglobulin genes, the RNA genes, and the mitochondrial DNA sequences. Among the studies on mitochondrial DNA, those of Hayasaka et al. (1988), Hasegawa et al. (1990), Ruvolo et al. (1991), and Horai et al. (1992) favor the grouping of chimpanzee with human, whereas for Brown et al. (1982) chimpanzee and gorilla are grouped. Gonzales et al. (1990), using 28S rRNA sequences, conclude that human and chimpanzee are the most closely related pair. The same conclusion is drawn from studies of the Cε and Cα immunoglobulin primate genes (Ueda et al. 1988, 1989). Analyses of the ψη-δ, β-type globin pseudogene favor the grouping of chimpanzee with human first (Miyamoto et al. 1987; Holmquist et al. 1988; Maeda et al. 1988; Goodman et al. 1990). More recently, these results have been reinforced by analysis of the δ-β-globin intergenic region (Perrin-Pecontal et al. 1992) and the flanking sequences of the γ-globin genes (Bailey et al. 1992). However, the results of Oetting et al. (1993), who analyzed the evolution of the tyrosinase-related gene in primates, suggest that gorilla would be closer to human than chimpanzee is.

In order to help resolve the separation of the human, chimpanzee, and gorilla lineages, we have chosen to study the c-myc oncogene. The myc-oncogene family contains coding sequences that have been preserved in different species for over 400 million years, which allows good alignment between related species such as those belonging to the higher primates. Moreover, the length of this sequence (around 6,600 nucleotides) is sufficient to provide enough informative sites to permit estimation of the relationship between human and great apes. In this study, the complete sequences of the c-myc gene have been determined for gorilla and orangutan. These sequences were compared with each other and with a human sequence described by Gazin et al. (1984), a chimpanzee sequence (Argant et al. 1991), and gibbon and marmoset sequences (Eladari et al. 1992), the latter two as outgroups of the hominoids. We also compared the amino acid sequences of the myc protein.
Materials and Methods

Source. Gorilla (Gorilla gorilla) and orangutan (Pongo pygmaeus) libraries were obtained from genomic banks of the American Type Culture Collection (ATCC). From these libraries, several lambda clones were sorted out by hybridization with a c-myc probe, and the myc insert was sequenced with the dideoxy method (Sanger et al. 1977) after sonication, repair, and cloning in M13. When we aligned the orangutan DNA sequence with that of other primates, we found that this DNA, stored as orangutan in the ATCC, was in fact derived from a gibbon, and we decided to find another source. Orangutan DNA was then extracted from blood samples obtained from the "Zoo de Mâcon" (France). The genomic region corresponding to the c-myc gene was amplified by PCR, using several sets of oligonucleotides specific to the human gene. The amplification products were cloned in phage M13mp89 (Huang et al. 1989), and several clones were sequenced in order to detect possible misincorporation by PCR and to discriminate polymorphic positions.
Sequence Alignment. Multiple alignments were then performed using the Clustal V program. The approach used in Clustal V is a modified version of the method of Feng and Doolittle (1987). The pairwise-aligned sequences were compared with each other in order to determine an initial guide tree (dendrogram); then the sequences were aligned in larger and larger groups according to the branching order in this dendrogram. This approach offers a very useful combination of computational tractability and sensitivity. The positions of the gaps generated in early alignments remain throughout later stages. At each alignment stage, we aligned two groups of already-aligned sequences, using the algorithm of Myers and Miller (1988) for optimization.

Phylogenetic Analysis. We used two methods to infer phylogenies:
• The first is a distance-based method. The number of sequence differences in pairwise comparisons of aligned sequences was computed. Differences were calculated in terms of the number of transitions and transversions. Percent divergence figures were then calculated by Kimura's two-parameter method (Kimura 1980), and these distances were used with the neighbor-joining method (Saitou and Nei 1987) to construct unrooted phylogenetic trees. The neighbor-joining method allows for unequal rates of evolution in different lineages.
• The second is a maximum parsimony method using the DNAPARS program (Fitch 1971).
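The distance step described in the first method can be illustrated with a short Python sketch. It counts transitions and transversions between two aligned sequences and applies the standard Kimura two-parameter formula, d = -(1/2) ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitions and transversions. The function name and the choice to skip every column containing a gap or an ambiguous base are assumptions made for illustration; this is not the PHYLIP implementation used in the study.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned sequences.

    Hypothetical helper: columns where either sequence has a gap or a
    non-ACGT symbol are skipped (an assumption of this sketch).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    sites = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # gap or ambiguous base: not a comparable site
        sites += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1    # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    # math.log raises ValueError if the pair is too divergent (argument <= 0).
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))
```

Multiplying the returned value by 100 gives a figure on the same percent scale as the %K2P columns; a full distance matrix would apply the function to every pair of aligned sequences before tree construction.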
We performed bootstrap resampling of the nucleotide sequences with the Seqboot program in order to assess the robustness of these phylogenies. All the programs used are from the PHYLIP package (Felsenstein 1988).
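The idea behind bootstrap resampling of an alignment can be sketched as follows: alignment columns are drawn with replacement until a pseudo-replicate of the original length is obtained, and the tree-building method is then rerun on each replicate. The function name and the dictionary layout are hypothetical, and the sketch illustrates the general principle only, not Seqboot's own code or options.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Return one bootstrap pseudo-replicate of an alignment.

    alignment: dict mapping a sequence name to its aligned sequence
               (all sequences must have the same length).
    rng:       a random.Random instance, passed in for reproducibility.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    n = lengths.pop()
    # Draw n column indices with replacement; the same indices are used
    # for every sequence so that each sampled column stays intact.
    columns = [rng.randrange(n) for _ in range(n)]
    return {name: "".join(seq[i] for i in columns)
            for name, seq in alignment.items()}
```

Calling the function repeatedly with the same `rng` yields a series of replicates; the fraction of replicate trees that recover a given grouping is the bootstrap support for that grouping.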
Results

From the analysis of the sequences, we observed that, potentially (since we did not analyze the mRNA encoded by these sequences), the gorilla and orangutan c-myc genes had the same characteristic structure (three exons and two introns) as the human gene, with the major polypeptide open reading frame (ORF) residing in the second and third exons (Fig. 1). There was 98.42% and 97.34% similarity between the human and gorilla genes and between the human and orangutan genes, respectively, and, as expected, exons were better conserved than introns (Table 1).

The sequence of the first exon was 99.14% conserved between gorilla and human. As in the case of the human c-myc gene, this region contained an ORF (with an ATG at nucleotide 2304) which extended for 188 codons down to a stop codon (nucleotide 2868). However, we failed to observe an ORF in the orangutan sequence because of the insertion of an additional G at position 2537 in the orangutan, gibbon, and marmoset sequences, which shifts the downstream sequence into a reading frame blocked by several stop codons. In exon 1, the nucleotide sequence similarity between human and orangutan was reduced to 97.94% (Table 1). When we considered the different regions of the gene, we observed that gorilla was closer to human than orangutan was and that the percent divergence between the gorilla and orangutan c-myc sequences was larger than the percent divergence between human and gorilla (Table 2).

We aligned the gorilla and orangutan c-myc sequences, as indicated in Materials and Methods, with the already-published sequences for human, chimpanzee, gibbon, and marmoset in order to construct phylogenetic trees, and we used two methods to estimate the percent divergence between the different sequences (Table 3). The levels of divergence calculated according to Kimura are always smaller than those calculated according to the first method. This difference can be explained by the fact that insertion/deletion events are not taken into account in the Kimura formula.
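The first of the two measures, %D, counts gaps as well as substitutions. A minimal Python sketch, following the formula given in the footnote to Table 2 ((substitutions + gaps) / (shared positions + gaps) x 100, with a run of contiguous missing nucleotides counted as a single gap), is shown here. The function name, the use of "-" as the gap symbol, and the decision to ignore columns that are gapped in both sequences (which can arise when a pair is extracted from a multiple alignment) are assumptions of this sketch.

```python
def percent_divergence(seq1, seq2, gap="-"):
    """Percent divergence (%D) between two aligned sequences.

    Each run of contiguous gap characters in one sequence counts as one gap;
    shared positions are columns where both sequences carry a nucleotide.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    substitutions = shared = gaps = 0
    gap_side = None  # which sequence the current gap run belongs to
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a == gap and b == gap:
            continue  # gapped in both: carries no pairwise information
        if a == gap or b == gap:
            side = 1 if a == gap else 2
            if side != gap_side:
                gaps += 1  # a new gap run starts here
                gap_side = side
            continue
        gap_side = None
        shared += 1
        if a != b:
            substitutions += 1
    if shared + gaps == 0:
        raise ValueError("no comparable positions")
    return 100.0 * (substitutions + gaps) / (shared + gaps)
```

For example, one substitution and one two-nucleotide gap over eight shared positions give (1 + 1) / (8 + 1) x 100, about 22.2%. Because the gap term is absent from the Kimura formula, a pair that differs mainly by insertions or deletions will tend to show a larger %D than %K2P.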
With both methods, we observed that orangutan was the most divergent of the human-great ape group, as it differs from other hominoid with a mean percentage of %D 2.64 (from 2.51 to 2.74), and that human and chimpanzee represent the closest species. The mean divergence between the human-great ape clade and outgroups is 3.10% for gibbon (from 2.96 to 3.21) and 9.97% for marmoset (from 9.18 to 10.41). Levels of divergence calculated by Kimura's method were used to estimate phylogenetic distances using the neighbor-joining method (Fig. 2A). We also used the

Fig. 1. Sequence alignment. Alignment of the Gorilla gorilla (GG) and Pongo pygmaeus (OO) c-myc sequences with already published sequences: (HU) Homo sapiens (Gazin et al. 1984); (CH) Pan troglodytes (Argant et al. 1991); (GIB) Hylobates lar; and (MAR) Callithrix jacchus (Eladari et al. 1992). The numbering of the human sequence is from Gazin et al. (1984). First digits of numerals are aligned with the corresponding nucleotides. Only the positions where one or more nucleotides differ from the human sequence are indicated. Void positions correspond to sequence identical to the human one. Missing nucleotides are indicated by a dash. The limits of the various regions of the sequence are indicated (exon 1-intron 1-exon 2-intron 2-exon 3).
[Alignment panels, human nt 1504-7941, not reproduced. Region boundaries marked in the figure: Upstream/Exon 1; Exon 1/Intron 1; Intron 1/Exon 2; Exon 2/Intron 2; Intron 2/Exon 3 coding; Exon 3 coding/Exon 3 not coding; Exon 3 not coding/Downstream.]

Table 1. Similarity (%) between human and other ape c-myc genes^a

             Complete gene        Exon 1         Exon 2         Exon 3         Intron 1       Intron 2
                                  nt 2308-2881   nt 4509-5277   nt 6654-7670   nt 2882-4508   nt 5278-6653
Comparison   nt (%)   aa (%)      nt (%)         nt (%)         nt (%)         nt (%)         nt (%)
HU/CH        99.00    99.50 (2)   99.50          99.70          99.60          97.50          98.00
HU/GG        98.42    99.00 (4)   99.14          99.09          99.02          97.67          97.67
HU/OO        97.34    97.95 (9)   97.94          98.83          98.92          96.39          95.55
HU/GIB       96.78    97.95 (9)   97.43          98.70          98.43          95.95          95.62
HU/MAR       91.54    96.35 (17)  92.47          95.32          94.00          89.81          88.56

a HU/CH means comparison between human and chimpanzee c-myc genes. HU/GG means comparison between human and gorilla c-myc genes. HU/OO means comparison between human and orangutan c-myc genes. HU/GIB means comparison between human and gibbon c-myc genes. HU/MAR means comparison between human and marmoset c-myc genes. nt (%), % of nucleotide similarity between the different regions of the c-myc genes, determined from alignment of the two sequences (Fig. 1). aa (%), % of amino acid similarity (the number of differences in the coding exons is shown in parentheses).

Table 2. Comparison of human, gorilla, and orangutan myc sequences^a

                                    Human/gorilla              Human/orangutan            Gorilla/orangutan
Region                              ts   tv   gap     % div    ts   tv   gap     % div    ts   tv   gap     % div
Upstream (nt 1504-2307)              3    4   -1      0.96      8    9   +1      2.50     11   11   +1      2.92
Exon 1 (nt 2308-2881)                4    1   0       0.86      7    4   -1      2.06      9    5   -1      2.57
Intron 1 (nt 2882-4508)             21   13   -4      2.33     32   19   -6 +2   3.61     34   16   -3 +2   3.36
Exon 2 (nt 4509-5277)                6    1   0       0.81      7    2   0       1.17      9    3   0       1.56
Intron 2 (nt 5278-6653)             16   12   -2 +2   2.33     25   18   -2 +5   4.45     31   20   -2 +4   4.17
Exon 3 (nt 6654-7670)                7    3   0       0.98      4    4   -2 +1   1.08      9    1   -2 +1   1.27
Exon 3 coding part (nt 6654-7216)    2    1   0       0.53      3    2   0       0.89      5    1   0       1.07
Exon 3 non cod. part (nt 7217-7670)  5    2   0       1.53      1    2   -2 +1   1.31      4    0   -2 +1   1.53
Downstream (nt 7671-7941)            3    1   0       0.97      1    0   0       0.37      0    1   0       0.37

a This table summarizes the localization, nature, and number of differences observed when comparing the human and gorilla, the human and orangutan, and also the gorilla and orangutan myc sequences. ts: total number of transitions observed when two c-myc sequences are compared. tv: total number of transversions observed when two c-myc sequences are compared. A gap in one sequence corresponds to the absence of one or several contiguous nt. In the human/gorilla comparison, "-" corresponds to a gap in the human sequence and "+" to a gap in the gorilla sequence. The same is true for the human/orangutan and gorilla/orangutan comparisons. The % of sequence divergence is calculated as follows:

$$\%\,\text{div} = \frac{\text{number of nt substitutions} + \text{number of gaps}}{\text{actual shared positions} + \text{number of gaps}} \times 100$$

maximum parsimony method (Fig. 2B). In both cases, we obtained the same branching of species, with chimpanzee closer to human than gorilla.

C-myc amino acid sequences for the different primates were deduced from their nucleotide sequences. We assumed, from the extensive DNA similarity, that the same regulatory elements (promoter, protein binding sites, acceptor and donor splice sites, polyadenylation
Table 3. Comparison of c-myc sequences: % divergence between species^a

          Complete gene        Exon 1        Exon 2        Exon 3      Intron 1      Intron 2
              %D   %K2P     %D   %K2P     %D   %K2P     %D   %K2P     %D   %K2P     %D   %K2P
HU/CH       1.05   0.89   0.51   0.52   0.26   0.26   0.39   0.39   1.47   1.36   1.97   1.61
HU/GG       1.58   1.35   0.86   0.86   0.91   0.92   0.98   0.99   2.33   2.11   2.25   2.06
HU/OO       2.66   2.13   2.06   1.91   1.17   1.18   1.08   0.79   3.63   3.18   4.45   3.81
HU/GIB      3.13   2.60   2.57   2.27   1.30   1.31   1.57   1.39   4.54   4.23   4.38   3.86
HU/MAR     10.13   8.60   7.52   7.16   4.74   4.84   6.21   6.25  10.02   9.32  19.67  15.01
CH/GG       1.41   1.17   1.03   1.04   0.91   0.92   0.78   0.79   1.72   1.61   2.41   2.29
CH/OO       2.51   2.15   2.58   2.45   1.04   1.05   1.08   0.79   3.31   2.99   3.74   3.19
CH/GIB      2.96   2.58   3.09   2.80   1.30   1.31   1.57   1.39   3.99   3.69   4.10   3.80
CH/MAR     10.12   8.65   7.74   7.35   4.68   4.84   6.21   6.25   9.92   9.17  19.46  15.08
GG/OO       2.74   2.33   2.57   2.44   1.56   1.58   1.27   0.99   3.37   3.11   4.17   3.81
GG/GIB      3.21   2.71   3.09   2.80   1.69   1.71   1.77   1.59   4.05   3.82   4.63   4.26
GG/MAR     10.41   8.80   8.09   7.73   5.07   5.26   6.50   6.58   9.91   9.23  20.13  15.48
OO/GIB      3.13   2.75   3.60   3.52   1.43   1.45   1.38   1.19   4.17   3.87   4.65   3.95
OO/MAR      9.18   8.76   8.25   8.11   4.81   4.99   6.21   6.13  10.21   9.55  20.72  16.52
GIB/MAR    10.05   8.67   8.25   7.91   4.68   4.84   6.02   6.04  10.34   9.76  18.88  14.22
^a The % of sequence divergence values (%D) take into account all nucleotide substitutions and the number of insertion/deletion events:

\[ \%D = \frac{TS + TV + ID}{N + ID} \times 100 \]

where TS: number of transitions; TV: number of transversions; ID: number of insertions/deletions; N: number of sites shared.

The %K2P values estimate the divergence according to Kimura (1980) and consider only the substitutions:

\[ \%K2P = -\frac{1}{2} \ln\left[(1 - 2P - Q)\sqrt{1 - 2Q}\right] \]

where P: transition frequency; Q: transversion frequency.

Fig. 2. Phylogenetic trees derived from the sequence alignment presented in Fig. 1. A Tree constructed by the neighbor-joining method of Saitou and Nei (1987). Numbers express relative distance. B Tree constructed by the maximum parsimony method. Numbers represent mutational events. Since these trees are unrooted, parentheses mean that the numbers (0.07356) and (435) cannot be distributed between the branch of the marmoset (MAR) and the common branch of the other species. Numbers in brackets in A and B correspond to the bootstrap values obtained after 1,000 resamplings.
sites) gave rise to similar myc proteins. Figure 3 shows that there was a very good conservation of the sequence among primates. In all species, the protein was 439 amino acids in length, except for marmoset, which had a one-amino-acid-shorter sequence (438 amino acids) due to the deletion of a glutamine at position 261. (Numbers refer to the human sequence.) Amino acid replacements at two positions (211 and 417) could be interpreted as synapomorphies for the human/chimpanzee/gorilla clade and at position 82 for the human/chimpanzee clade. There was no position in favor of the human/gorilla or chimpanzee/gorilla clade. Human diverged from all the other primates by only one substitution: at position 354, an aspartate residue was replaced by a valine residue in human.
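The two divergence measures defined in the footnote of Table 3 can be illustrated with a minimal Python sketch. It is not the program used in this study. The function names, the toy sequences, and the choice to count a run of contiguous gap columns as a single insertion/deletion event (following the gap definition given under Table 2) are assumptions made for illustration, as is the multiplication of the Kimura distance by 100 to express it as a percentage.

```python
import math

PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def count_differences(seq_a, seq_b, gap="-"):
    """Return (ts, tv, indel_events, shared_sites) for two aligned sequences.

    Assumes the sequences are already aligned, equal in length, and contain
    only A, C, G, T and the gap character.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    ts = tv = indel_events = shared_sites = 0
    open_gap = None  # which sequence carries the gap run currently open
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x == gap and y == gap:
            continue  # a column gapped in both sequences is ignored
        if x == gap or y == gap:
            side = "a" if x == gap else "b"
            if side != open_gap:  # contiguous gaps count as one event
                indel_events += 1
            open_gap = side
            continue
        open_gap = None
        shared_sites += 1
        if x == y:
            continue
        pair = {x, y}
        if pair <= PURINES or pair <= PYRIMIDINES:
            ts += 1  # transition: purine/purine or pyrimidine/pyrimidine
        else:
            tv += 1  # transversion: purine/pyrimidine
    return ts, tv, indel_events, shared_sites


def percent_d(ts, tv, indel_events, shared_sites):
    """%D = (TS + TV + ID) / (N + ID) x 100."""
    return 100.0 * (ts + tv + indel_events) / (shared_sites + indel_events)


def percent_k2p(ts, tv, shared_sites):
    """Kimura (1980) two-parameter distance from substitutions only.

    The factor of 100 (percentage scale) is an assumption of this sketch.
    """
    p = ts / shared_sites  # transition frequency P
    q = tv / shared_sites  # transversion frequency Q
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q)) * 100.0


if __name__ == "__main__":
    # Hypothetical toy alignment, not taken from the c-myc data.
    a = "AAGTCCGTAA--TT"
    b = "AGGTCAGT--CCTT"
    ts, tv, indels, n = count_differences(a, b)
    print(ts, tv, indels, n)
    print(round(percent_d(ts, tv, indels, n), 2))
    print(round(percent_k2p(ts, tv, n), 2))
```

Because %D adds insertion/deletion events to both numerator and denominator while %K2P ignores them, the two values diverge most for gap-rich regions, which is consistent with the larger gap between %D and %K2P for the introns than for exon 2 in Table 3.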
Discussion
The great conservation of the c-myc gene organization and sequence among primates reflects the very high selection pressure exerted upon this gene and underscores its biological importance. The existence of an ORF overlapping the gorilla c-myc exon 1, as in the human (Gazin et al. 1984) and chimpanzee (Argaut et al. 1991) genes, might be taken as an additional structural argument in favor of the coding capacity of this region. However, even using specific antibodies against the putative myc exon 1 protein, which recognize a fusion protein encoded by a maltose-binding protein/Myc exon 1/β-galactosidase DNA construct, we failed to characterize such a protein (data not shown), contrary to what
Rows in the alignment: HU, CH, GG, OO, GIB, MAR, MOU, RAT, CAT. Motif labels in the figure: CKII, Loc Nuc, b-HLH, bZ. Human (HU) reference sequence:

  1 MPLNVSFTNR NYDLDYDSVQ PYFYCDEEEN FYQQQQQSEL QPPAPSEDIW KKFELLPTPP LSPSRRSGLC  70
 71 SPSYVAV-TP FSLRGDNDGG GGSFSTADQL EMVTELLGGD MVNQSFICDP DDETFIKNII IQDCMWSGFS 139
140 AAAKLVSEKL ASYQAARKDS GSPNPARGHS VCSTSSLYLQ DLSAAASECI DPSVVFPYPL NDSSSPKSCA 209
210 SQDSSAFSPS SDSLLSSTES SPQGSPEPLV LHEETPPTTS SDSEEEQEDE EEIDVVSVEK RQAPGKRSES 279
280 GSPSAGGHSK PPHSPLVLKR CHVSTHQHNY AAPPSTRKDY PAAKRVKLDS VRVLRQISNN RKCTSPRSSD 349
350 TEENVKRRTH NVLERQRRNE LKRSFFALRD QIPELENNEK APKVVILKKA TAYILSVQAE EQKLISEEDL 419
420 LRKRREQLKH KLEQLRNSCA 439

Fig. 3. Alignment of myc protein sequences of different species. Comparison of c-myc amino acid sequences deduced from nucleotide sequences of Homo sapiens (HU), Pan troglodytes (CH), Gorilla gorilla (GG) (this work), Pongo pygmaeus (OO) (this work), Hylobates lar (GIB), Callithrix jacchus (MAR), Felis silvestris (CAT) (Stewart et al. 1986), Rattus norvegicus (RAT) (Hayashi et al. 1987), Mus musculus (MOU) (Bernard et al. 1983).
Gazin et al. (1986) observed with peptide-directed antibodies.

We analyzed the entire c-myc gene sequence for different primates, using either the maximum parsimony method, an exhaustive search method which chooses the best tree among a large number of possible trees, or the neighbor-joining method, a stepwise clustering method which examines local topological relationships of the tree. Both methods allowed us to separate the human/chimpanzee clade from gorilla, in good agreement with Perrin-Pecontal et al. (1992), who analyzed the β-globin gene region, and Bailey et al. (1992), who analyzed the ψη-globin region and the flanking noncoding sequences of the γ-globin gene region. It is noteworthy that in our study, as in that of Perrin-Pecontal et al., the lengths of the sequences were of the same order of magnitude (around 6,000 bp) and the same tree-making methods were used.

When analyzing sequences, the choice of the method of phylogenetic tree construction is of great importance. For instance, Brown et al. (1982), analyzing a segment of mitochondrial DNA 896 bp in length by means of the maximum parsimony method, found that chimpanzee and gorilla were closer to one another than to human. However, Hasegawa et al. (1990), analyzing the same segment by the maximum likelihood method (Felsenstein 1981), concluded that human/chimpanzee clustering was preferred to the alternative branching orders among human, chimpanzee, and gorilla. The fact that we found a relatively long internal branch separating the human/chimpanzee clade from the last common ancestor with gorilla, ranging from 22 to 50% of the average human/chimpanzee terminal branch length, favors the human/chimpanzee grouping. These values are in good agreement with those obtained by Ruvolo et al. (1991) and Horai et al. (1992) studying mitochondrial DNA.

In contrast to the present results, Oetting et al. (1993), studying the evolution of tyrosinase-related genes in primates, found that gorilla was more closely related to human than chimpanzee. This discrepancy could be accounted for by the fact that these authors analyzed pseudogenes (320 nucleotides in length) known to have evolved fast, which makes a good alignment difficult (Miyamoto et al. 1987, 1988; Ueda et al. 1989), rather than a functional coding gene, as in this study. Another possible explanation might be the use of a different computational analytical program; both methods, however, rely on parsimony.

In good agreement with the branching order deduced from the analysis of the nucleotide sequences, comparisons of the deduced amino acid sequences allowed the separation of a human/chimpanzee/gorilla clade, and among these three species we could identify a human/chimpanzee group.

Our study of the c-myc gene and protein among primates suggests that the question of the branching order between human, gorilla, and chimpanzee could be resolved in favor of a human/chimpanzee clade. Moreover, it underlines the fact that numerous factors have to be taken into account when comparing DNA sequences in order to construct reliable phylogenetic trees. The sequence must be well conserved among the species studied to allow good alignment. It must be sufficient in length to provide enough informative sites, above all when comparing closely related species. Finally, the use of different computational programs and tree-making methods is necessary to avoid potential biases in the analysis.

Acknowledgments. We wish to thank Dr. Véronique Marie and Annick Thebault (SIS Pasteur) for helpful discussion and Dr. Vre Chaduc of Touroparc for the orangutan blood sample.
References
Argaut C, Rigolet M, Eladari ME, Galibert F (1991) Nucleotide sequence of chimpanzee c-myc oncogene. Gene 97:231-237
Bailey WJ, Hayasaka K, Skinner CG, Kehoe S, Sieu LC, Slightom JL, Goodman M (1992) Reexamination of the African hominoid trichotomy with additional sequences from the primate β-globin gene cluster. Mol Phylogenet Evol 1:97-135
Bernard O, Cory S, Gerondakis S, Webb E, Adams JM (1983) Sequence of the murine and human cellular myc oncogenes and two modes of myc transcription resulting from chromosome translocation in B lymphoid tumors. EMBO J 2:2375-2383
Brown WM, Prager EM, Wang A, Wilson AC (1982) Mitochondrial DNA sequences of primates: tempo and mode of evolution. J Mol Evol 18:225-239
Eladari ME, Mohammad-Ali K, Argaut C, Galibert F (1992) Gibbon and marmoset c-myc nucleotide sequences. Gene 116:231-243
Felsenstein J (1981) Evolutionary trees from DNA sequences: a maximum likelihood approach. J Mol Evol 17:368-376
Felsenstein J (1988) PHYLIP version 3-3 manual. University of Washington, Seattle
Feng DF, Doolittle RF (1987) Progressive sequence alignment as a prerequisite to correct phylogenetic trees. J Mol Evol 25:351-360
Fitch W (1971) Toward defining the course of evolution: minimum change for a specific tree topology. Syst Zool 20:406-416
Gazin C, Dupont de Dinechin S, Hampe A, Masson JM, Martin P, Stehelin D, Galibert F (1984) Nucleotide sequence of the human c-myc locus: provocative open reading frame within the first exon. EMBO J 3:383-387
Gazin C, Rigolet M, Briand JP, Van Regenmortel MHV, Galibert F (1986) Immunochemical detection of proteins related to the human c-myc exon 1. EMBO J 5:2241-2250
Gonzales IL, Sylvester JE, Smith TF, Stambolian D, Schmickel RD (1990) Ribosomal RNA gene sequences and hominoid phylogeny. Mol Biol Evol 7:203-219
Goodman M, Tagle DA, Fitch DHA, Bailey W, Czelusniak J, Koop BF, Benson P, Slightom JL (1990) Primate evolution at the DNA level and a classification of hominoids. J Mol Evol 30:260-266
Hasegawa M, Kishino H, Hayasaka K, Horai S (1990) Mitochondrial DNA evolution in primates: transition rate has been extremely low in the lemur. J Mol Evol 31:113-121
Hayasaka K, Gojobori T, Horai S (1988) Molecular phylogeny and evolution of primate mitochondrial DNA. Mol Biol Evol 5:626-644
Hayashi K, Makino R, Kawamura H, Arisawa A, Yoneda K (1987) Characterization of rat c-myc and adjacent regions. Nucleic Acids Res 15:6419-6436
Holmquist R, Miyamoto MM, Goodman M (1988) Analysis of higher-primate phylogeny from transversion differences in nuclear and mitochondrial DNA by Lake's methods of evolutionary parsimony and operator metrics. Mol Biol Evol 5:217-236
Horai S, Satta Y, Hayasaka K, Kondo R, Inoue T, Ishida T, Hayashi S, Takahata N (1992) Man's place in Hominoidea revealed by mitochondrial DNA genealogy. J Mol Evol 35:32-43
Huang ME, Gobet M, Galibert F (1989) A modified M13 vector specifically designed for ExoIII-nested deletion sequencing. Methods Mol Cell Biol 1:161-164
Kimura M (1980) A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. J Mol Evol 16:111-120
Maeda N, Wu CI, Bliska J, Reneke J (1988) Molecular evolution of intergenic DNA in higher primates: pattern of DNA changes, molecular clock, and evolution of repetitive sequences. Mol Biol Evol 5:1-20
Miyamoto MM, Slightom JL, Goodman M (1987) Phylogenetic relations of humans and African apes from DNA sequences in the ψη-globin region. Science 238:369-373
Miyamoto MM, Koop BF, Slightom JL, Goodman M, Tennant MR (1988) Molecular systematics of higher primates: genealogical relations and classification. Proc Natl Acad Sci USA 85:7627-7631
Myers EW, Miller W (1988) Optimal alignments in linear space. Comput Appl Biosci 4:11-17
Oetting WS, Stine OC, Townsend D, King RA (1993) Evolution of the tyrosinase related gene (TYRL) in primates. Pigment Cell Res 6:171-177
Perrin-Pecontal P, Gouy M, Nigon VM, Trabuchet G (1992) Evolution of the primate β-globin gene region: nucleotide sequence of the δ-β-globin intergenic region of gorilla and phylogenetic relationship between African apes and man. J Mol Evol 34:17-30
Ruvolo MT, Disotell T, Allard MW, Brown WM, Honeycutt RL (1991) Resolution of the African hominoid trichotomy using a mitochondrial gene sequence. Proc Natl Acad Sci USA 88:1570-1574
Saitou N, Nei M (1987) The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol 4:406-425
Sanger F, Nicklen S, Coulson AR (1977) DNA sequencing with chain-terminating inhibitors. Proc Natl Acad Sci USA 74:5463-5467
Ueda S, Matsuda F, Honjo T (1988) Multiple recombinational events in primate immunoglobulin-epsilon and alpha genes suggest closer relationship of humans to chimpanzees than to gorillas. J Mol Evol 27:77-83
Ueda S, Watanabe Y, Saitou N, Omoto K, Hayashida H, Miyata T, Hisajima H, Honjo T (1989) Nucleotide sequences of immunoglobulin-epsilon pseudogenes in man and apes and their phylogenetic relationships. J Mol Biol 205:85-90