Species-specific Exon Loss in Human Transcriptomes


MBE Advance Access published December 23, 2014

Jinkai Wang,†,1 Zhi-xiang Lu,†,1 Collin J. Tokheim,1 Sara E. Miller,2 and Yi Xing*,1

1 Department of Microbiology, Immunology, and Molecular Genetics, University of California, Los Angeles
2 Department of Internal Medicine, University of Iowa
†These authors contributed equally to this work.
*Corresponding author: E-mail: [email protected].
Associate editor: Joshua Akey

Mol. Biol. Evol. doi:10.1093/molbev/msu317 Advance Access publication November 14, 2014
© The Author 2014. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: [email protected]

Abstract

Changes in exon–intron structures and splicing patterns represent an important mechanism for the evolution of gene functions and species-specific regulatory networks. Although exon creation is widespread during primate and human evolution and has been studied extensively, much less is known about the scope and potential impact of human-specific exon loss events. Historically, transcriptome data and exon annotations have been significantly biased toward humans over nonhuman primates. This ascertainment bias makes it challenging to discover human-specific exon loss events. We carried out a transcriptome-wide search for human-specific exon loss events, taking advantage of RNA sequencing (RNA-seq) as a powerful and unbiased tool for exon discovery and annotation. Using RNA-seq data of humans, chimpanzees, and other primates, we reconstructed and compared transcript structures across the primate phylogeny. We discovered 33 candidate human-specific exon loss events, among which six exons passed stringent experimental filters for the complete loss of splicing activities in diverse human tissues. These events may result from human-specific deletion of genomic DNA or from small-scale sequence changes that inactivated splicing signals. The impact of human-specific exon loss events is predominantly regulatory. Three of the six events occurred in the 5′ untranslated region (5′-UTR) and affected cis-regulatory elements of mRNA translation. In SLC7A6, a gene encoding an amino acid transporter, luciferase reporter assays suggested that both a human-specific exon loss event and an independent human-specific single nucleotide substitution in the 5′-UTR increased mRNA translational efficiency. Our study provides novel insights into the molecular mechanisms and evolutionary consequences of exon loss during human evolution.

Key words: exon loss, evolution, RNA-seq, splicing, primate.

Introduction

A major interest in evolutionary biology is to elucidate the origin of evolutionary novelties. Evolution can create novel functions and regulatory programs through a variety of mechanisms, such as creation of new genes (Knowles and McLysaght 2009; Wu et al. 2011), exaptation of nonfunctional sequences including transposable elements (Muotri et al. 2007; Feschotte 2008), and small-scale sequence changes that alter gene regulation or protein function (Enard et al. 2002; Haygood et al. 2007; Wang et al. 2007; Maricic et al. 2013). Additionally, lineage-specific loss of functional genetic material can play a role in adaptive evolution (Wang et al. 2006; McLean et al. 2011). For example, a number of genes were inactivated in humans after the divergence from chimpanzees, and these gene loss events were thought to play an active role in the evolution of human-specific traits (Wang et al. 2006). Moreover, human-specific loss of regulatory DNA occurred in hundreds of genomic loci and may contribute to unique human traits such as the expansion of brain size and the loss of penile spines (McLean et al. 2011).

Recently, the evolution of exon–intron structures and gene splicing patterns has emerged as an important mechanism for the evolution of gene functions and species-specific regulatory networks (Xing and Lee 2006; Barbosa-Morais et al. 2012; Merkin et al. 2012). The vast majority of genes in multicellular eukaryotes have multiple exons, and alternative splicing of multiexon genes can produce distinct mRNA and protein isoforms from a single gene locus (Nilsen and Graveley 2010). These alternative isoforms can be viewed as “internal paralogs” of a gene that allow for an accelerated rate of gene evolution (Boue et al. 2003; Modrek and Lee 2003). The most well-studied type of splicing evolution is the creation of new exons (Sorek 2007). During evolution, new exons can be added to existing functional genes through exonization of nonexonic sequences or duplication of existing exons (Kondrashov and Koonin 2001; Letunic et al. 2002; Wang et al. 2005; Zhang and Chasin 2006). A series of studies has systematically analyzed exon creation in higher eukaryotes using transcriptome data generated by cDNA/expressed sequence tag (EST) sequencing (Modrek and Lee 2003; Wang et al. 2005; Zhang and Chasin 2006; Alekseyenko et al. 2007; Corvelo and Eyras 2008), exon microarrays (Lin et al. 2008, 2009), and RNA sequencing (RNA-seq; Shen et al. 2011). Collectively, these studies demonstrate that exon creation is widespread during recent primate and human evolution and provides an important means for recruiting nonfunctional sequences such as transposable elements into functional genes. For example, Zhang and Chasin (2006) used a multispecies comparison to identify over 2,000 exon creation events during primate and human evolution. Sela et al. (2007) identified over 1,500 human exons derived from primate-specific Alu retrotransposons, and these exons are by definition primate-specific. Through subsequent evolution, some of these new exons can acquire functional roles. For example, a primate-specific new exon of the gene encoding the RNA editing enzyme ADARB1 (also known as ADAR2), created through the exonization of an intronic Alu element, inserts a peptide segment into the catalytic domain of ADARB1 and alters the catalytic activity of the protein product (Gerber et al. 1997). Another study shows that primate-specific exons in the 5′ untranslated region (5′-UTR) of human genes often contain cis elements of translational regulation and can alter the translational efficiency of the mRNA (Shen et al. 2011).

The opposite scenario to exon creation is species-specific exon loss in functional genes. A classic example is the human-specific exon loss in the CMAH gene, which encodes a hydroxylase responsible for the biosynthesis of N-glycolylneuraminic acid (Neu5Gc) from N-acetylneuraminic acid (Neu5Ac) (Chou et al. 1998, 2002; Irie et al. 1998; Hayakawa et al. 2001). Neu5Ac and Neu5Gc are sialic acids that play important roles in cell–cell and cell–pathogen interactions (Schauer 2009). A 92-bp exon of CMAH was deleted from the human genome via Alu-mediated replacement of the genomic DNA (Hayakawa et al. 2001). This causes a frameshift in the downstream coding sequence, turning a functional CMAH gene encoding an active enzyme into a pseudogene. Consequently, human cells lack the ability to synthesize Neu5Gc, and this results in profound biochemical and physiological differences between humans and nonhuman primates (Varki 2001, 2010). For example, the loss of Neu5Gc and the resulting increase in Neu5Ac levels in human cells have affected humans’ interactions with Neu5Gc- or Neu5Ac-binding pathogens. As a result, humans are more resistant to Neu5Gc-binding pathogens but more susceptible to Neu5Ac-binding pathogens, and this has important implications for human evolution (Varki 2001, 2010).

Despite this fascinating example, our knowledge of exon loss during recent human evolution is extremely limited. Unlike exon creation, which has been studied extensively in the past decade (Modrek and Lee 2003; Wang et al. 2005; Zhang and Chasin 2006; Alekseyenko et al. 2007; Corvelo and Eyras 2008; Lin et al. 2008, 2009; Shen et al. 2011), there are few genome-wide studies of exon loss (Alekseyenko et al. 2007), especially of recent exon loss events in the human lineage. This is understandable, considering the difficulty of identifying human-specific exon loss events using genome and transcriptome data. It is well known that the presence of splice site sequences in the genome is not a reliable indicator of exon splicing (Alekseyenko et al. 2007). To confidently discover exon loss events in the human lineage, we need to identify exons not seen in human transcripts but present in the transcriptomes of chimpanzees and other nonhuman primates. Historically, transcriptome data and exon annotations have been significantly biased toward humans over nonhuman primates. In fact, comparative annotation using human mRNA sequences has long been an important strategy for annotating nonhuman primate genomes (Chimpanzee Sequencing and Analysis Consortium 2005; Florea et al. 2005). This ascertainment bias makes it challenging to discover human-specific exon loss events. Two early studies used the human-chimpanzee genome alignment to identify chimpanzee genomic regions that match exons in chimpanzee gene models or nonprimate vertebrate mRNAs but are deleted in the human genome (Sen et al. 2006; Hahn et al. 2007). This strategy identified several putative human-specific exon loss events. However, there was no direct evidence to support the splicing of these exons in nonhuman primates, and almost all reported events were in hypothetical genes later removed from gene annotation databases. It must be noted that any detection strategy based on human-specific deletion of genomic DNA is inherently limited, because it will miss exon loss events caused by small-scale sequence changes that inactivated splicing signals in humans.

In this study, we carried out a genome-wide search for human-specific exon loss events, taking advantage of RNA-seq as a powerful technology for exon discovery and annotation. RNA-seq enables accurate characterization and comparison of transcript structures across closely related species (Blekhman et al. 2010; Perry et al. 2012). Using RNA-seq data, one can identify exons and predict transcript structures de novo even in the absence of prior transcriptome annotations (Guttman et al. 2010; Trapnell et al. 2010). Here, we used deep RNA-seq data of human and nonhuman primate tissues to reconstruct and compare transcript structures across the primate phylogeny. By identifying exons absent in human RNAs but present in the RNAs of chimpanzees and outgroup species, we discovered 33 candidate human-specific exon loss events, among which six exons passed stringent experimental filters for the complete loss of splicing activities in diverse human tissues. These data allowed us to investigate the mechanisms and potential consequences of exon loss during human evolution.

Results

Computational Discovery of Candidate Human-Specific Exon Loss Events from RNA-seq Data

We developed a computational pipeline to discover candidate human-specific exon loss events (fig. 1A). Briefly, we used a published deep RNA-seq data set of six tissues, including brain (cerebral cortex or whole brain without cerebellum), cerebellum, heart, kidney, liver, and testis, of human, chimpanzee, orangutan, and rhesus macaque (Brawand et al. 2011), as well as known transcript annotations of human and mouse genes (see details in Materials and Methods). For each tissue and species, we consolidated RNA-seq reads from all biological replicates and mapped reads to the respective genomes using TopHat (Trapnell et al. 2009). We then used Cufflinks (Trapnell et al. 2010) to perform de novo reconstruction of transcript structures. To compare transcript structures across species, we matched genomic regions between chimpanzee and other species using the UCSC pairwise genome alignments (Materials and Methods). From all internal (spliced) exons in chimpanzee transcripts, we identified
those that were missing from human transcripts (known transcript annotations + Cufflinks transcript reconstructions) but present in the transcripts of at least one outgroup species (orangutan, rhesus macaque, or mouse). To distinguish exon loss events from gene loss events, we required that the flanking exons in the chimpanzee transcripts have matching exons in the orthologous human genes. To further ensure that we comprehensively captured human exons and that the chimpanzee exons were indeed absent from human transcriptomes, we analyzed independent human RNA-seq data covering 19 tissues, including 16 tissues in the Human Body Map 2.0 data set and 3 tissues from different anatomical compartments of the human placenta (Kim et al. 2012; see Materials and Methods). We used Cufflinks to reconstruct transcript structures for this expanded set of human tissues and searched chimpanzee exons against the Cufflinks-reconstructed human exons. In total, 264 potential human-specific exon loss events were identified using this computational procedure (fig. 1A).

We performed additional computational analyses to remove false positive findings of exon loss. Our computational pipeline could produce false positives for a variety of reasons, such as missing segments in the current human genome assembly, errors in the UCSC genome alignments, or the failure of Cufflinks to detect spliced exons from human RNA-seq data. To address these three potential sources of false positives, we designed three corresponding computational filters (see details in Materials and Methods). We applied these three filters sequentially, and they removed 137, 77, and 17 exons, respectively. After these filters, we obtained a final list of 33 candidate human-specific exon loss events for further experimental analyses.

For proof of concept, we checked whether our computational pipeline could identify the well-known human-specific exon loss event in the CMAH gene (Chou et al. 1998, 2002; Hayakawa et al. 2001). Indeed, our analysis of human and nonhuman primate RNA-seq data successfully captured this exon loss event. As seen from the RNA-seq read density plot and the Cufflinks transcript reconstructions (fig. 1B), this CMAH exon and its flanking exons had robust RNA-seq signals in chimpanzee, orangutan, and rhesus tissues. In contrast, in human tissues there was no signal for the exon of interest, whereas the flanking exons still had robust RNA-seq signals, indicating human-specific exon loss.

[Figure 1 image. (A) Flowchart: 153,560 chimpanzee internal exons (Cufflinks) are searched against human exons (Ensembl + Vega + UCSC + RefSeq + Cufflinks); 576 are not found in human, and 264 of these are found in at least one outgroup species (orangutan and rhesus exons from Cufflinks; mouse exons from Ensembl + Vega + UCSC + RefSeq). Additional filters: (1) the exon cannot be found in human RefSeq RNA sequences; (2) the UCSC genome alignment must be confirmed by BLAT; (3) no exon inclusion junction read in human RNA-seq data. The 33 remaining events proceed to PCR validation. (B) "Human-specific exon loss of CMAH": read density tracks for human kidney, liver, and brain and for chimpanzee, orangutan, and rhesus liver.]
FIG. 1. Identification of human-specific exon loss events. (A) The computational pipeline to identify human-specific exon loss events using RNA-seq data and transcript annotations. (B) RNA-seq data of a known human-specific exon loss event in CMAH for human and nonhuman primate tissues. The y axis represents RNA-seq read density. The Cufflinks-reconstructed transcripts are shown below the read density tracks. The arrows indicate the exon lost in humans. Flanking exons are also indicated in the plot.

Experimental Validation of Candidate Human-Specific Exon Loss Events in Diverse Human and Nonhuman Primate Tissues

Next, we performed experimental validation of the 33 candidate human-specific exon loss events. Of these 33 events, four appeared to be due to human-specific deletion of the genomic DNA that contained the exon. For these four events, we performed polymerase chain reaction (PCR) analyses of genomic DNA from human and nonhuman primates to confirm the human-specific genomic deletion. For the remaining events, the chimpanzee exon had an aligned orthologous region in the human genome, but there was no evidence for exon splicing based on human RNA-seq data and known transcript annotations. For these events, we performed reverse transcription polymerase chain reaction (RT-PCR) analyses of a large panel of human, chimpanzee, and rhesus tissues (supplementary table S1, Supplementary Material online) to verify the presence or absence of the exons in mRNA transcripts. Specifically, for each exon in each species, we designed a maximum of two primer pairs, with one pair amplifying the mRNA sequence between the exon and its upstream exon and the other pair amplifying the mRNA sequence between the exon and its downstream exon. We declared an exon to be absent from a given RNA sample if both primer pairs failed to amplify the expected RT-PCR products.

We should point out that most of the human tissues represented in the original RNA-seq data set were from a limited number of individuals. For example, the 16 tissues in the Human Body Map 2.0 data set were all from single donors. Thus, to ensure that the identified exon loss events reflected true species differences in exon usage, we carried out our validation experiments on human RNA samples pooled from a large number of individuals (supplementary table S1, Supplementary Material online).

Our experimental analysis confirmed the complete loss of exon splicing of six exons in all tested human tissues, whereas their orthologous exons were detected in RNAs from chimpanzees and other nonhuman primates (table 1 and supplementary table S2, Supplementary Material online). Of these six exon loss events, three (in RIIAD1, SLC7A6, and CMAH) were due to human-specific genomic deletion. For example, in SLC7A6 the alignment of the human and nonhuman primate genomes indicated human-specific deletion of an approximately 2-kb genomic region, which removed the exon from the human gene (fig. 2A). Our PCR analysis of human genomic DNA indicated that this genomic deletion was a fixed change during human evolution, because we observed homozygous deletion of this region in all eight tested human individuals from major human populations (fig. 2B and supplementary table S1, Supplementary Material online). In contrast, RT-PCR analyses showed that the exon was present in all tested chimpanzee and rhesus tissues (fig. 2C and D). In fact, RNA-seq data suggested that this exon was always 100% included in the transcripts in chimpanzee and rhesus tissues (supplementary table S3, Supplementary Material online). Together, these data suggest that a constitutive SLC7A6 exon in the most recent common ancestor of humans and chimpanzees was lost during human evolution. We confirmed the same pattern of human-specific genomic deletion for the exons in RIIAD1 and CMAH (supplementary figs. S1 and S2, Supplementary Material online). Additionally, we checked the Database of Genomic Variants (DGV, http://dgv.tcag.ca/, last accessed November 28, 2014) (MacDonald et al. 2014), which contains structural variants of the human genome reported by the 1000 Genomes Project. None of the three events is polymorphic in DGV, further supporting the notion that these three genomic deletion events are fixed in human populations.

Table 1. Summary of Human-Specific Exon Loss Events.

Gene Symbol | Chimpanzee Exon Region(a) | Exon Length (bp) | Location | Mechanism | Consequence | Exon Splicing Levels in Nonhuman Primates(b) | Gene Function
KANK1 | chr9:581223–581348 | 126 | 5′-UTR | Splice site disruption | Exon loss removes 36, 33, and 12 bp uORFs | Constitutive | Involved in the control of cytoskeleton formation by regulating actin polymerization
ZMYND11 | chr10:244360–244475 | 116 | CDS | Splice site disruption | Exon loss removes an alternative isoform subject to NMD | Alternative (0–16%) | Corepressor of transcription (Probable)
RIIAD1 | chr1:130786076–130786187 | 112 | 5′-UTR | Genomic deletion | Exon loss removes 48 and 117 bp uORFs | Constitutive | Protein phosphorylation (Predicted)
SLC7A6 | chr16:68044023–68044073 | 51 | 5′-UTR | Genomic deletion | Exon loss shortens a 264 bp uORF to 213 bp | Constitutive | Uptake of amino acids
CMAH | chr6:25603207–25603298 | 92 | CDS | Genomic deletion (Alu mediated) | Exon loss inactivates a coding gene to a pseudogene | Constitutive | Cytidine monophospho-N-acetylneuraminic acid hydroxylase
LOC150568 | chr2a:105599278–105599442 | 165 | In noncoding RNA | Unknown | Unknown | Alternative (25–100%) | Unknown

Supported Tissues(c): two events were expressed and included only in subsets of tissues (brain, heart, and testis; brain, testis, kidney, and heart); the other four were widely included(d).

a Coordinates are based on the chimpanzee panTro2 assembly.
b The exon splicing levels are based on the RNA-seq data of chimpanzee, orangutan, and rhesus macaque (supplementary table S3, Supplementary Material online).
c Based on RNA-seq data and RT-PCR results from this study.
d Expressed and spliced in all the chimpanzee and rhesus tissues used for RT-PCR analyses.

[Figure 2 image: (B) gel of genomic DNA from human individuals alongside chimpanzee and rhesus genomic DNA, with size markers from 3 kb to 400 bp; (C, D) RT-PCR gels of chimpanzee and rhesus cDNA from multiple tissues, with 150- and 100-bp markers.]
FIG. 2. Human-specific exon loss in SLC7A6 due to genomic deletion. (A) Schematic diagram showing the human-specific exon loss event in SLC7A6. (B) PCR validation of genomic deletion in human genomic DNAs. Primer positions are illustrated as arrows in the schematic diagram. (C and D) RT-PCR validation of exon splicing in chimpanzee (C) and rhesus macaque (D) tissues. Primer positions are illustrated as arrows in the schematic diagram. The expected RT-PCR bands are illustrated on the right.

The other three events (in KANK1, ZMYND11, and LOC150568) were in conserved regions between the human and chimpanzee genomes. Of these, two exon loss events (in KANK1 and ZMYND11) were apparently due to single nucleotide changes that disrupted the splice sites in humans. In KANK1, we confirmed the absence of exon splicing in all tested human tissues (fig. 3A), whereas the exon was present in almost all chimpanzee and rhesus tissues (fig. 3B and C). The alignment of the human and nonhuman primate genomes revealed a G to A change in the first intronic nucleotide downstream of the exon, which disrupted the highly conserved GT dinucleotide of the 5′-splice site (fig. 3D). Likewise, we observed a human-specific G to C nucleotide change that disrupted the 3′-splice site of the ZMYND11 exon (supplementary fig. S3, Supplementary Material online). We also validated the exon loss event in the noncoding RNA gene LOC150568 (supplementary fig. S4, Supplementary Material online). The 5′- and 3′-splice sites of this exon remained intact in the human genome. Therefore, this exon loss event may be due to other human-specific sequence changes that abolished exon splicing in the human lineage. However, we should caution that because its splice sites are intact in the human genome, we cannot completely rule out the possibility that this exon may be spliced in other cell types or developmental stages not tested in this study.

Twenty-seven of the 33 candidate events had RT-PCR signals in human tissues. We should note that our RT-PCR detection of exon splicing by placing one primer on the candidate exon is highly sensitive, because we will detect
the exon signal as long as the exon is spliced (even at low levels) in some of the individuals that comprised the pooled tissue sample. Thus, we performed additional analyses to investigate the splicing levels of these exons in human and nonhuman primate tissues. Specifically, we randomly selected 16 of these 27 exons and measured their exon inclusion levels in human, chimpanzee, and rhesus tissues by semiquantitative RT-PCR using primers designed on flanking exons. We estimated exon inclusion levels based on the band intensities of RT-PCR products corresponding to exon inclusion and skipping isoforms. We found that most of these exons had zero or very low exon inclusion levels in all or the vast majority of human tissues, whereas their orthologous exons had much higher inclusion levels in chimpanzee and rhesus tissues (supplementary table S4, Supplementary Material online; also see supplementary fig. S5A–C, Supplementary Material online, for a few representative examples). These lowly included exons were not found in existing human transcript annotations or in the RNA-seq data of diverse human tissues. Together, our results suggest that most of these 27 exons underwent a significant reduction in splicing levels, or a loss of splicing in most human tissues, during human evolution, which may have interesting evolutionary implications. Nonetheless, for our further studies we focused on the six exons with complete loss of exon splicing in all human tissues.
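The semiquantitative estimate of exon inclusion from RT-PCR band intensities described above can be sketched as a simple percent-spliced-in calculation. This is a minimal illustration, not the authors' quantification code: the function name `exon_inclusion_level` is hypothetical, the optional length normalization (dividing each band by its product length, on the assumption that stain signal scales roughly with DNA length) is an assumption not stated in the text, and the intensities in the usage example are made up.

```python
from typing import Optional


def exon_inclusion_level(
    incl_intensity: float,
    skip_intensity: float,
    incl_len_bp: Optional[int] = None,
    skip_len_bp: Optional[int] = None,
) -> Optional[float]:
    """Percent exon inclusion from inclusion/skipping band intensities.

    If both product lengths are given, each intensity is first divided by
    its length (assumption: intercalating-stain signal scales with length).
    Returns None when neither band is detected.
    """
    if incl_intensity < 0 or skip_intensity < 0:
        raise ValueError("band intensities must be non-negative")
    incl, skip = float(incl_intensity), float(skip_intensity)
    if incl_len_bp and skip_len_bp:
        incl /= incl_len_bp
        skip /= skip_len_bp
    total = incl + skip
    if total == 0:
        return None  # no RT-PCR product: inclusion level is undefined
    return 100.0 * incl / total


# Hypothetical band intensities (arbitrary units), not data from this study.
bands = {
    "human_liver": (0.0, 850.0),
    "chimp_liver": (420.0, 380.0),
}
for sample, (incl, skip) in bands.items():
    level = exon_inclusion_level(incl, skip, incl_len_bp=260, skip_len_bp=140)
    print(sample, level)
```

Returning None rather than 0 when no product is amplified keeps "exon skipped" (inclusion level 0) distinct from "gene not detected", which matters when comparing tissues where the transcript itself may be absent.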

[Figure 3 image: (A–C) RT-PCR gels of human, chimpanzee, and rhesus cDNA from multiple tissues, with 150- and 100-bp markers; (D) alignment of human, chimpanzee, gorilla, orangutan, rhesus, and Callithrix sequences around the exon.]
FIG. 3. Human-specific exon loss in KANK1 due to a human-specific splice site sequence change. (A) RT-PCR validation of the human-specific exon loss in multiple human tissues. Primer positions are illustrated as arrows in the schematic diagram. (B and C) RT-PCR validation of exon splicing in chimpanzee (B) and rhesus macaque (C) tissues. Primer positions are illustrated as arrows in the schematic diagram. The expected RT-PCR bands are illustrated on the right. (D) Alignment of genomic sequences surrounding the KANK1 exon in human and nonhuman primate genomes. The exon regions are boxed and highlighted. The human-specific sequence change that disrupted the 5′-splice site is indicated by the arrow.

Characteristics of Human-Specific Exon Loss Events

Of the six events with complete loss of exon splicing in all human tissues, five were from protein-coding genes with known functions, whereas one was from a noncoding RNA gene (LOC150568). The five protein-coding genes are involved in a variety of cellular functions, such as the control of cytoskeleton formation (KANK1), transcriptional regulation (ZMYND11, also known as BS69), protein phosphorylation (RIIAD1), amino acid transport (SLC7A6), and cell metabolism (CMAH). We examined the splicing levels of these exons in nonhuman primate tissues using RNA-seq data (supplementary table S3, Supplementary Material online). In four events (KANK1, RIIAD1, SLC7A6, and CMAH), the exon was constitutively spliced (i.e., 100% exon inclusion) in nonhuman primate tissues. The exon in ZMYND11 was spliced at low levels in nonhuman primates. The exon in LOC150568 was also an alternative exon with low exon inclusion levels in nonhuman primates, whereas the expression of this noncoding RNA appeared to be tissue-restricted. The observation that four of the six exons were constitutive exons in nonhuman primates indicates that human-specific exon loss could remove exons with high splicing activities, resulting in the use of distinct major mRNA isoforms in human and chimpanzee tissues.

We investigated the potential impacts of these exon loss events on gene expression and function. Of the five events in protein-coding genes, two were in coding regions (CMAH and ZMYND11), whereas the other three were in the 5′-UTR. The exon loss event in CMAH inactivated the gene and played an important role in the adaptive evolution of humans (Varki 2001, 2010). The exon in ZMYND11 was located in the coding region, and the inclusion of this exon produced a minor mRNA isoform with a premature termination codon subject to the nonsense-mediated decay (NMD) pathway. Therefore, this exon likely represented splicing noise in nonhuman primates, which was removed from the human gene. The other three events (KANK1, RIIAD1, and SLC7A6) were constitutive internal exons in the 5′-UTR. All three events were predicted to remove or shorten upstream open reading frames (uORFs) within the 5′-UTR of human genes (table 1). uORFs have a widespread function in repressing mRNA translation, and the strength of such translational repression is positively correlated with uORF length (Calvo et al. 2009; Chatterjee and Pal 2009; Barbosa et al. 2013). Therefore, the three human-specific exon loss events in the 5′-UTR may increase the translational efficiency of these genes in the human lineage. Of note, our prior study of exon creation events showed that newly created primate-specific exons are preferentially located in the 5′-UTR and regulate translational efficiency through uORFs or other types of cis-regulatory elements of mRNA translation (Shen et al. 2011). Here, our new data indicate that both exon creation and loss events during human evolution have a strong preference toward the 5′-UTR and may impact gene regulation by modulating the rate of mRNA translation.
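The uORF bookkeeping summarized in table 1 can be illustrated with a simple scan for ATG-initiated open reading frames that begin upstream of the main start codon. For SLC7A6, the 51-bp exon is exactly 17 codons, so removing it in frame shortens a 264-bp (88-codon) uORF to 213 bp (71 codons). This is a sketch under stated assumptions, not the authors' annotation method: `find_uorfs` is a hypothetical helper, only ATG starts are considered, codon counts include both the start and the stop codon, and the toy sequence is synthetic rather than the real SLC7A6 5′-UTR.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_uorfs(mrna: str, cds_start: int) -> List[Tuple[int, int, int]]:
    """Find ATG-initiated ORFs whose start lies upstream of the main ORF.

    Returns (start, end, n_codons): 0-based start, exclusive end (stop codon
    included), and a codon count that includes start and stop codons.
    A uORF may end past cds_start, i.e. overlap the main ORF. ATGs with no
    in-frame stop are skipped; in-frame ATGs that run into the main ORF
    (N-terminal extensions) are not filtered out in this sketch.
    """
    seq = mrna.upper()
    uorfs = []
    for start in range(cds_start):
        if seq[start:start + 3] != "ATG":
            continue
        for pos in range(start, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                end = pos + 3
                uorfs.append((start, end, (end - start) // 3))
                break
    return uorfs


# Synthetic transcript: ATG + 86 sense codons + stop = an 88-codon uORF,
# followed by a short main ORF starting at cds_start.
uorf = "ATG" + "GCA" * 86 + "TAA"
mrna = "GCC" + uorf + "CC" + "ATGAAATAG"
cds_start = len("GCC" + uorf + "CC")
print(find_uorfs(mrna, cds_start))  # → [(3, 267, 88)]

# Delete 51 nt (17 codons) inside the uORF, in frame, mimicking loss of a
# 51-bp exon: the uORF shrinks to 71 codons and the main ORF moves up 51 nt.
deleted = mrna[:6] + mrna[6 + 51:]
print(find_uorfs(deleted, cds_start - 51))  # → [(3, 216, 71)]
```

Because the deletion length is a multiple of three, the downstream reading frame of the uORF is preserved and only its length changes; a deletion of any other length would instead shift the uORF into a different frame and move its stop codon.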

Downloaded from http://mbe.oxfordjournals.org/ at University of California, Los Angeles on December 26, 2014


MBE

Wang et al. . doi:10.1093/molbev/msu317

[Figure 4 panels. (A) SLC7A6 uORF of 71 codons in human and 88 codons in the nonhuman primate sequences. (B, C) pIRES-Luc2 dual-luciferase constructs (SV40, RLuc, IRES, FLuc) carrying the Chimp inclusion, Chimp skipping, Human inclusion, and Human skipping 5′-UTRs, with (B) or without (C) the introduced uORF stop codon; x-axis: relative translational efficiency (HeLa).]

FIG. 4. Human-specific exon loss in SLC7A6 increases mRNA translational efficiency by shortening the uORF. (A) Schematic diagram of the gene structures of human, chimpanzee, and orangutan SLC7A6. The coding regions are shown in bold. The uORFs are shown above the gene structure diagrams. The genomic region deleted in humans is indicated by a dashed box. (B) Dual-luciferase reporter assays measuring the translational efficiency of different SLC7A6 5′-UTR constructs in the HeLa cell line. Because the stop codon of the uORF in the endogenous SLC7A6 gene is located downstream of the canonical translation start site, we artificially introduced a stop codon in the 5′-UTR upstream of the Renilla luciferase translation start site, as indicated by the triangle in the schematic diagram, so that the effect of the uORF on 5′-UTR translational efficiency could be tested with luciferase assays. *P < 0.05; **P < 0.01; ***P < 0.001; NS, not significant; paired t-test. Error bars: standard error of the mean (SEM). (C) Results of the dual-luciferase reporter assays when the 5′-UTR contains no stop codon for the uORF.

Human-Specific Exon Loss in SLC7A6 Increases mRNA Translational Efficiency

To test whether human-specific exon loss in the 5′-UTR could affect mRNA translational efficiency, we selected the exon loss event in SLC7A6 for detailed analyses. This event is interesting for several reasons. SLC7A6 is broadly expressed in human and nonhuman primate tissues and encodes an amino acid transporter involved in the cellular uptake of dibasic amino acids and certain neutral amino acids (Broer et al. 2000). The exon is constitutively spliced in all nonhuman primate tissues but completely lost in humans due to genomic deletion. In the SLC7A6 mRNA from nonhuman primates, there was a uORF of 88 codons whose stop codon was located 50 bp downstream of the start codon of the canonical ORF (fig. 4A). The human-specific exon loss event removed a 51-bp exon, causing an in-frame deletion within the uORF and shortening it from 88 to 71 codons. We hypothesized that this human-specific exon loss event increased the translational efficiency of the SLC7A6 mRNA by shortening the uORF. To test this hypothesis, we made different versions of the SLC7A6 5′-UTR and analyzed their translational efficiency using the dual-luciferase reporter construct pIRES-Luc2 as described previously (fig. 4B) (Shen et al. 2011). For each 5′-UTR sequence, the resulting reporter construct expressed both Firefly luciferase and Renilla luciferase bicistronically. Renilla luciferase was fused downstream of the cloned 5′-UTR sequence, whereas Firefly luciferase translation was driven independently by an internal ribosome entry site (IRES) and was not regulated by the cloned 5′-UTR. Because the stop codon of the uORF in the endogenous gene is located downstream of the canonical translation start site, we artificially introduced a stop codon in the 5′-UTR upstream of the Renilla luciferase translation start site so that the effect of the uORF on translational efficiency could be tested with luciferase assays (fig. 4B). The translational efficiency of each 5′-UTR construct was calculated as the ratio of Renilla to Firefly luciferase activity. As there was an independent human-specific nucleotide substitution in the uORF, we also made additional 5′-UTR


Human-Specific Exon Loss in SLC7A6 Resulted from Alu–Alu-Recombination Mediated Genomic Deletion

Finally, we investigated the molecular mechanism for the human-specific exon loss in SLC7A6. This exon loss event was due to the deletion of an approximately 2-kb genomic region during human evolution (fig. 2). To elucidate the molecular mechanism for this genomic deletion, we aligned the chimpanzee genomic region around this exon to the orthologous human genomic sequence. In the chimpanzee genome, the human-deleted region was flanked by two Alu elements (AluSq and AluSc) (fig. 5A), whereas in the human genome a single Alu element (AluSx) spanned the breakpoint. We then aligned the human Alu element to the two chimpanzee Alu elements located at the two ends of the deleted region. The human Alu was a chimera of the two chimpanzee Alus: most of the human Alu (at its 5′-end) matched the upstream chimpanzee Alu element, and a small 3′-end segment matched the downstream chimpanzee Alu element (fig. 5B). This sequence pattern is consistent with genomic deletion mediated by Alu–Alu recombination, a process known to cause genomic deletions in evolution and disease (Sen et al. 2006). Our data suggest that the two Alus in the ancestral sequence recombined during human evolution, which deleted the approximately 2-kb genomic region and removed the exon from the human gene (fig. 5C).
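The chimeric structure described above can be made concrete. If the human Alu arose by recombination between the upstream and downstream ancestral Alus, some breakpoint k should make the human sequence match the upstream Alu before k and the downstream Alu after k. The Python sketch below scores every candidate breakpoint on a toy example. It assumes the three sequences are already aligned without gaps. The sequences and function names are hypothetical, and the code illustrates the idea rather than reproducing the authors' procedure for figure 5B.

```python
def crossover_scores(upstream: str, human: str, downstream: str) -> list[int]:
    """score[k] = matches of human[:k] to upstream[:k] + matches of human[k:] to downstream[k:].

    Assumes the three sequences are already aligned without gaps (same length).
    """
    if not (len(upstream) == len(human) == len(downstream)):
        raise ValueError("sequences must be pre-aligned to equal length")
    n = len(human)
    prefix = [0] * (n + 1)  # prefix[k]: matches to the upstream Alu in columns < k
    for i in range(n):
        prefix[i + 1] = prefix[i] + (human[i] == upstream[i])
    suffix = [0] * (n + 1)  # suffix[k]: matches to the downstream Alu in columns >= k
    for i in range(n - 1, -1, -1):
        suffix[i] = suffix[i + 1] + (human[i] == downstream[i])
    return [prefix[k] + suffix[k] for k in range(n + 1)]


def best_crossover(upstream: str, human: str, downstream: str) -> tuple[int, int, int]:
    """Return (first_best_k, last_best_k, best_score).

    Columns where the two ancestral Alus are identical cannot discriminate,
    so the crossover is usually localized only to an interval of breakpoints.
    """
    scores = crossover_scores(upstream, human, downstream)
    best = max(scores)
    best_ks = [k for k, s in enumerate(scores) if s == best]
    return best_ks[0], best_ks[-1], best


# Hypothetical toy alignment (not the real SLC7A6 Alu sequences).
up = "AAGGTTCCAA"
human = "AAGGTACCTT"
down = "ATGGTACCTT"
print(best_crossover(up, human, down))  # (2, 5, 10)
```

With these toy sequences, every breakpoint from k = 2 to k = 5 fits equally well. The switch from the upstream to the downstream Alu is therefore placed between alignment columns 1 and 5. This is the same kind of interval that the ellipse in figure 5B marks as the putative crossover site.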

Discussion

We carried out a transcriptome-wide search for exon loss events during human evolution by comparing RNA-seq-based de novo reconstructions of transcript structures across the primate phylogeny. We sought to identify exons spliced into the transcriptomes of chimpanzees and other nonhuman primates but absent from human RNAs. Using stringent computational and experimental filters, we identified six exons with complete loss of splicing activity in diverse human tissues, as well as additional exons with significant reductions in splicing levels during recent human evolution. Of these six experimentally confirmed exon loss events, three were due to human-specific deletion of genomic sequences, whereas the other three resulted from small-scale sequence changes that inactivated splicing signals. We identified a well-known human-specific exon loss event in the CMAH gene (Chou et al. 1998, 2002; Hayakawa et al. 2001), indicating that our computational strategy can effectively discover lineage-specific exon loss events from RNA-seq data. Compared with exon creation, which is widespread during human evolution (Zhang and Chasin 2006), human-specific exon loss events are markedly rare. Only one (in the CMAH gene) was well documented in the literature (Chou et al. 1998, 2002; Hayakawa et al. 2001), and five additional events were identified in this study using extensive RNA-seq data. Of note, four of the six human-specific exon loss events involve exons constitutively (100%) spliced in nonhuman primate tissues. This observation is consistent with a previous study of exon creation and loss events over much longer timescales of vertebrate evolution using ESTs, which reported that exon loss events were not biased toward lowly included exons (Alekseyenko et al. 2007). Thus, the difference in the frequencies of exon loss versus creation events may be attributed to the different evolutionary barriers to these two types of splicing changes.
It is well known that the vast majority of newly created exons in human genes are alternatively spliced at low levels (Modrek and Lee 2003). Because the ancestral transcript isoform is still produced as the major isoform, a newly created exon may not be subject to strong purifying selection and could provide evolutionary intermediates for subsequent adaptive changes (Boue et al. 2003; Xing and Lee 2006). In contrast, a human-specific exon loss event by definition involves an exon that is detected (and conserved) in chimpanzee and other nonhuman primate species but absent only from human RNAs. Such events are more likely to involve exons with high splicing activities and functional importance, which consequently undergo strong purifying selection. The deleterious effects of exon loss are evident in human genetics studies, in which mutations that disrupt exon splicing frequently cause human diseases (Sterne-Weiler et al. 2011). Taken together, these data suggest that during human evolution, the vast majority of mutations that inactivated ancient functional exons were removed by purifying selection, while a small number survived and became fixed


constructs to test the effect of this nucleotide substitution on mRNA translation. As shown in figure 4B, when transfected into HeLa cells, the human version of the 5′-UTR had significantly higher translational efficiency than the chimpanzee version (“Human skipping” vs. “Chimp inclusion”). Interestingly, the exon loss event and the single nucleotide change in the human 5′-UTR appeared to have synergistic effects, each increasing mRNA translational efficiency (fig. 4B). For example, when we inserted the exon back into the human version of the 5′-UTR (“Human inclusion”), mRNA translational efficiency was greatly reduced compared with the human 5′-UTR without the exon (“Human skipping”). Similarly, the human-specific single nucleotide substitution within the uORF increased mRNA translational efficiency (“Human skipping” vs. “Chimp skipping”). This nucleotide substitution did not change the encoded amino acid but replaced a rare serine codon (AGU) in the chimpanzee sequence with a more common (optimal) serine codon (AGC) in the human sequence. Previous studies show that replacing rare codons with common codons in a uORF can increase translation of the downstream protein-coding ORF (Meijer and Thomas 2003; Qiao et al. 2011). We obtained similar results when we repeated the luciferase reporter assays in the JEG-3 cell line (supplementary fig. S6, Supplementary Material online). To confirm that the observed translational effect was mediated by the uORF, we mutated the uORF stop codon in all tested 5′-UTR constructs. After the stop codon was removed, there was no significant difference in translational efficiency among the SLC7A6 5′-UTR constructs (fig. 4C).
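The readout in these assays is a simple ratio. Renilla luciferase (translated through the cloned 5′-UTR) is divided by Firefly luciferase (driven by the IRES and insensitive to the 5′-UTR), which serves as an internal control. The Python sketch below performs that calculation. Normalizing to a chosen reference construct is an assumption here, suggested by the "relative translational efficiency" axis of figure 4. The replicate readings are invented numbers used only to exercise the code.

```python
from statistics import mean


def translational_efficiency(rluc: float, fluc: float) -> float:
    """RLuc/FLuc ratio for one well; FLuc serves as the internal control."""
    if fluc <= 0:
        raise ValueError("Firefly luciferase reading must be positive")
    return rluc / fluc


def relative_efficiency(readings: dict[str, list[tuple[float, float]]],
                        reference: str) -> dict[str, float]:
    """Mean RLuc/FLuc per construct, divided by the mean of the reference construct.

    readings maps a construct name to its (RLuc, FLuc) replicate readings.
    """
    means = {name: mean(translational_efficiency(r, f) for r, f in reps)
             for name, reps in readings.items()}
    ref = means[reference]
    return {name: value / ref for name, value in means.items()}


# Hypothetical readings (arbitrary light units), for illustration only.
readings = {
    "Chimp inclusion": [(200.0, 100.0), (220.0, 110.0)],
    "Human skipping": [(300.0, 100.0), (330.0, 110.0)],
}
print(relative_efficiency(readings, reference="Chimp inclusion"))
# {'Chimp inclusion': 1.0, 'Human skipping': 1.5}
```

Dividing by FLuc in each well before averaging removes well-to-well differences in transfection. The choice of reference construct only rescales the values and does not change their ratios.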

[Figure 5 panels. (A) Alignment matrix of the human genome (chr16:68,300,092-68,310,003) against the chimpanzee genome (chr16:68,040,774-68,052,371), with the chimpanzee SINE track showing AluSq and AluSc flanking the exon lost in human. (B) Alignment of the chimpanzee upstream Alu, the human Alu, and the chimpanzee downstream Alu (consensus positions 150–290), marking the putative crossover site. (C) Schematic of the ancestral upstream AluSq and downstream AluSc recombining (intrachromosomal or interchromosomal) to produce the human Alu and delete the exon.]

FIG. 5. Human-specific exon loss in SLC7A6 resulted from Alu–Alu-recombination mediated genomic deletion. (A) Alignment matrix of human and chimpanzee genomic sequences surrounding the exon loss event. The alignment of one human Alu and two chimpanzee Alus near the crossover site is highlighted in orange boxes. (B) Detailed sequence alignment of the human Alu and the two chimpanzee Alus highlighted in (A). Positions in the consensus Alu sequence are numbered, and only the second half of the Alu is shown. Only nucleotides that differ from the other two sequences are labeled. Blocks of sequence with high similarity in the alignment are shown in the same color. The ellipse indicates the putative crossover site where the Alu–Alu recombination event likely occurred during human evolution. (C) Schematic diagram illustrating the proposed molecular mechanism for human-specific exon loss in SLC7A6. The directions of the arrows in the genome sequence indicate the orientations of the Alu elements. Note that the position of the crossover site in this diagram is not drawn to scale relative to SLC7A6.

in humans, likely by acquiring adaptive benefits as in the case of CMAH (Varki 2001, 2010). Although complete human-specific loss of exon splicing is rare, exon loss need not be viewed as a binary event, and a significant but incomplete human-specific reduction in exon splicing levels could still have important functional and regulatory consequences. In this study, of the 33 candidate exon loss events identified by the RNA-seq analysis, 27 had detectable RT-PCR signals in at least some of the tested human tissues. A subset of these exons was analyzed by semiquantitative RT-PCR in human and nonhuman primate tissues. Most of the analyzed exons had significantly reduced splicing levels in human tissues compared with nonhuman primate tissues (supplementary table S4, Supplementary Material online). Moreover, for these 27 events, we reanalyzed the corresponding gene expression levels of 17 events that had normalized RNA-seq RPKM (Reads Per Kilobase per Million mapped


2011). Together, these data suggest that the mRNA 5′-UTR is a hotspot of evolutionary changes in gene splicing patterns, and such changes may have played a regulatory role during human evolution by fine-tuning mRNA translational efficiency and protein production in a lineage-specific manner. We performed detailed analyses of a human-specific exon loss event in SLC7A6. SLC7A6 encodes an amino acid transporter involved in the uptake of arginine, leucine, and glutamine and the release of arginine in exchange for glutamine (Broer et al. 2000). It plays a role in the trafficking of amino acids between brain cells (Broer et al. 2000). It also regulates nitric oxide (NO) synthesis through the transport of arginine, a substrate for NO production (Arancibia-Garavilla et al. 2003). The exon of interest is present in the 5′-UTR of SLC7A6 in nonhuman primates but absent from the human mRNA due to Alu-mediated deletion of the corresponding genomic region. This exon loss event shortens the uORF in the 5′-UTR. Because no suitable antibody was available, we were unable to directly compare SLC7A6 protein abundance in human and nonhuman primate tissues. Nonetheless, luciferase reporter assays in two cell lines (HeLa and JEG-3) consistently suggest that this human-specific exon loss event increases the translational efficiency of the human mRNA through its effect on the uORF. Intriguingly, besides this exon loss event, there is also a human-specific single nucleotide substitution in the 5′-UTR of SLC7A6. According to luciferase assays, this U-to-C change in the human mRNA also increases translational efficiency. It does so by replacing a rare serine codon (AGU) with an optimal serine codon (AGC), which may enhance translation of the downstream protein-coding ORF (Meijer and Thomas 2003; Qiao et al. 2011).
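Whether a substitution such as AGU → AGC changes the uORF peptide can be checked mechanically against the standard genetic code. The Python sketch below builds the standard codon table and confirms that both codons encode serine. The helper names are hypothetical. The sketch does not assess codon optimality, which would require a codon-usage table that is not part of this example.

```python
# Standard genetic code, with bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate_codon(codon: str) -> str:
    """One-letter amino acid ('*' = stop); accepts RNA (U) or DNA (T) letters."""
    return CODON_TABLE[codon.upper().replace("U", "T")]


def is_synonymous(codon_a: str, codon_b: str) -> bool:
    """True if both codons encode the same amino acid (or are both stops)."""
    return translate_codon(codon_a) == translate_codon(codon_b)


print(translate_codon("AGU"), translate_codon("AGC"))  # S S
print(is_synonymous("AGU", "AGC"))  # True
print(is_synonymous("AGU", "AGA"))  # False: AGA encodes arginine
```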
The observation that two independent human-specific sequence changes in the SLC7A6 5′-UTR have synergistic effects on mRNA translation suggests that increased translational efficiency of SLC7A6 may confer adaptive benefits and may have been selected for during recent human evolution. Future studies should investigate whether these evolutionary changes to the SLC7A6 5′-UTR functionally contributed to differences in the metabolic activities of human and chimpanzee cells.

Conclusions

Our study represents a systematic, unbiased search for species-specific exon loss events in human transcriptomes. Using stringent computational and experimental filters, we identified six exons with complete loss of splicing activity in human transcriptomes after the divergence of humans and chimpanzees. Four of these six exons are constitutively spliced in nonhuman primate tissues, and three of the six are located in the mRNA 5′-UTR and may affect translational efficiency. The overall low incidence of human-specific exon loss events may reflect the evolutionary barriers to removing ancient exons with high splicing activities from functional genes. Nonetheless, a small number of exon loss events may have conferred adaptive benefits and become fixed in humans. By affecting mRNA translation or degradation, these events may have contributed to the regulatory evolution of the human transcriptome and proteome.


reads) values provided in the original publication (Brawand et al. 2011). Most of these 17 genes were broadly and highly expressed. In human, chimpanzee, and rhesus macaque tissues, all 17 genes were expressed (RPKM > 1) in at least one tissue, and more than half of the 17 genes (ten in human, nine in chimpanzee and rhesus macaque) were expressed in all six tissues from the original study. Furthermore, most of them (13 in human and chimpanzee, 12 in rhesus macaque) were highly expressed (RPKM > 10) in at least one tissue. These results suggest that significant but incomplete reductions in exon splicing levels may be more widespread than complete exon loss and may target abundantly expressed and functionally important genes during human evolution. Some of these events may reflect an intermediate step toward complete exon loss, analogous to newly created exons being spliced at low levels. Finally, we investigated potential molecular mechanisms for the generally reduced splicing levels of these exons in human tissues. For 7 of these 27 cases, the splice site scores of the human exons were significantly lower than those of the orthologous exons in nonhuman primates, as reflected by at least a 4-fold reduction in the likelihood odds ratio of matching the consensus MaxEnt splice site model (Yeo and Burge 2004). This suggests that a minor subset of these 27 cases may be attributed to human-specific nucleotide changes that weakened the splice sites of the human exons. For the other cases, the molecular mechanisms for the generally reduced splicing levels in human tissues are unclear, which is understandable given the complexity of splicing regulation in mammalian cells. Our study raises the question of the functional and regulatory consequences of human-specific exon loss events.
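The two quantitative thresholds in the preceding paragraph can be written out explicitly. RPKM divides a gene's read count by its length in kilobases and by the total number of mapped reads in millions. The text uses RPKM > 1 for "expressed" and RPKM > 10 for "highly expressed". For splice sites, assume the scores are log2 likelihood odds ratios (an assumption about the score scale; natural-log scores would need ln 4 instead). A 4-fold reduction in odds then corresponds to a score drop of at least log2 4 = 2. The Python sketch below encodes both rules. The function names and example numbers are hypothetical.

```python
import math


def rpkm(reads: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Reads Per Kilobase of gene length per Million mapped reads."""
    return reads * 1e9 / (gene_length_bp * total_mapped_reads)


def expression_class(value: float) -> str:
    """Thresholds taken from the text: RPKM > 1 expressed, RPKM > 10 highly expressed."""
    if value > 10:
        return "highly expressed"
    if value > 1:
        return "expressed"
    return "not expressed"


def splice_site_weakened(nonhuman_score: float, human_score: float,
                         min_fold: float = 4.0) -> bool:
    """Assumes scores are log2 odds ratios, so odds fold-change = 2 ** (score difference)."""
    return nonhuman_score - human_score >= math.log2(min_fold)


# Hypothetical numbers: 500 reads on a 2-kb gene in a 20-million-read library.
value = rpkm(500, 2_000, 20_000_000)
print(value, expression_class(value))  # 12.5 highly expressed
print(splice_site_weakened(9.0, 6.5))  # True: a 2.5-unit drop is about 5.7-fold
print(splice_site_weakened(9.0, 7.5))  # False: a 1.5-unit drop is about 2.8-fold
```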
Changes in RNA splicing patterns can affect gene products and functions at both the protein and RNA levels. They can produce protein isoforms with divergent structural and functional characteristics (Black 2000) or generate mRNA isoforms with distinct regulatory properties, such as mRNA decay or translational efficiency (Lewis et al. 2003; Sterne-Weiler et al. 2013). The human-specific exon loss events validated in this study appear to act at both levels. For example, in CMAH, the exon loss event disrupted the ORF of the mRNA, turning a functional gene into a pseudogene. In ZMYND11, the human-specific exon loss event removed a minor mRNA isoform subject to nonsense-mediated mRNA decay. Of note, in three of the six events (KANK1, RIIAD1, and SLC7A6), the exon lost during human evolution was located in the mRNA 5′-UTR, and the exon loss event removed or shortened uORFs in the human mRNA (table 1). These events are expected to increase the translational efficiency of the human mRNA. In SLC7A6, we experimentally confirmed the effect of the exon loss event on mRNA translational efficiency using luciferase reporter assays (fig. 4). Although the total number of exon loss events is small, the high frequency of events in the 5′-UTR (three of six) is reminiscent of observations on species-specific exon creation events, which are also enriched in the mRNA 5′-UTR (Zhang and Chasin 2006; Lin et al. 2008; Shen et al. 2011) and may modulate translational efficiency (Shen et al.


Identification of Candidate Human-Specific Exon Loss Events from RNA-seq Data

Human and Primate Tissue Samples

Data Source

We analyzed published RNA-seq data of six tissues (brain, cerebellum, heart, kidney, liver, testis) of human, chimpanzee, orangutan, and rhesus macaque (Brawand et al. 2011), with a total of approximately 1.6 billion 76-bp single-end or 101-bp paired-end Illumina RNA-seq reads (supplementary table S5, Supplementary Material online). We also analyzed additional human RNA-seq data covering 19 tissues, including 16 tissues in the Human Body Map 2.0 data set (http://www.ensembl.info/blog/2011/05/24/human-bodymap-2-0-data-from-illumina/, last accessed November 28, 2014) and 3 tissues from different anatomical compartments of the human placenta (Kim et al. 2012; supplementary table S5, Supplementary Material online).

RNA-seq Read Mapping and De Novo Transcript Reconstruction

The basic idea of discovering human-specific exon loss events is to identify chimpanzee exons that are absent from human transcriptomes but present in the transcriptomes of outgroup species. To test the presence or absence of a particular chimpanzee exon in other species, we matched the orthologous exon regions via the UCSC pairwise alignments among these genomes (hg19, panTro2, ponAbe2, and rheMac2) using the Python package Pygr (http://code.google.com/p/pygr/, last accessed November 28, 2014). To identify candidate human-specific exon loss events, we began with each internal (spliced) exon in chimpanzee transcripts identified by Cufflinks. We first required that the exon not overlap any human exon, either in known human transcripts (UCSC [downloaded in September 2011], Ensembl [release 61], RefSeq [release 48], and Vega [release 44]) or in human transcripts reconstructed de novo by Cufflinks from RNA-seq data (supplementary table S5, Supplementary Material online). To distinguish exon loss events from gene loss events, we required that the flanking exons in chimpanzee transcripts have matching exons in the orthologous human genes. Here, we defined a pair of chimpanzee and human exons as matching if they had identical exon start or end positions according to the UCSC pairwise alignment and consistent strands. To distinguish exon loss events in humans from exon creation events in chimpanzees, we analyzed transcript annotations of outgroup species. Specifically, we required that for

RNA samples of 18 human (Homo sapiens) tissues were purchased from Clontech (Mountain View, CA). Fifteen postmortem tissue samples from adult chimpanzees (Pan troglodytes) and 25 postmortem tissue samples from adult rhesus macaques (Macaca mulatta) were generously provided by the Southwest National Primate Research Center (San Antonio, TX). Chimpanzee and rhesus monkey fibroblast cell line samples were acquired from the Coriell Institute for Medical Research (Camden, NJ). The genomic DNA samples of human, chimpanzee, and rhesus macaque were from HapMap lymphoblastoid cell lines or primate fibroblast cell lines purchased from the Coriell Institute for Medical Research (Camden, NJ). Details of these RNA and DNA samples are provided in supplementary table S1, Supplementary Material online.

RT-PCR and Sequencing

Total RNA was extracted using TRIzol (Invitrogen, Carlsbad, CA). The High-Capacity cDNA Reverse Transcription Kit (Applied Biosystems, Foster City, CA) was used for cDNA synthesis. For each exon or genomic region analyzed, regular PCR was carried out for 29–40 cycles. Final PCR products were resolved on 5% Tris-Borate-EDTA (TBE) polyacrylamide gels. For DNA fragments with ambiguous sizes, we conducted Sanger sequencing (University of Iowa DNA facility, Iowa City, IA). PCR primer sequences are shown in supplementary table S6, Supplementary Material online.


We mapped RNA-seq reads of human, chimpanzee, orangutan, and rhesus macaque tissues to corresponding genomes (hg19, panTro2, ponAbe2, and rheMac2) using TopHat (version 1.4.1) (Trapnell et al. 2009) with default parameters. The uniquely mapped reads were then used for de novo transcript reconstruction using Cufflinks (version 1.2.0; Trapnell et al. 2010) with default parameters.

any candidate human-specific exon loss event, the chimpanzee exon must be present in at least one Cufflinks transcript of orangutan or rhesus macaque, or in known mouse transcripts from UCSC (downloaded in December 2011), Ensembl (release 61), RefSeq (release 50), and Vega (release 45). We applied three additional filtering criteria to remove potential false-positive human-specific exon loss events. First, we examined the alignment of human RefSeq RNA sequences to the chimpanzee genome, available at the UCSC Genome Browser database via the “Non-Chimp RefSeq Genes” track (downloaded in December 2011). All chimpanzee exons aligned to human RefSeq RNA sequences were removed from the candidate human-specific exon loss events. Second, to confirm the validity of the UCSC genome alignment, we used another alignment tool, BLAT (Kent 2002), to map chimpanzee transcripts to the genomes of other species; after manual inspection, we removed candidate exon loss events for which the UCSC genome alignment was inconsistent with the BLAT alignment. Third, to address potential failures of Cufflinks in exon detection, we examined the TopHat mapping of human RNA-seq reads and removed candidate events if TopHat suggested the existence of splice sites in the human genomic region orthologous to the chimpanzee exon.
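The core filters described above reduce to interval logic once orthologous coordinates are available: no overlap with any human exon, matched flanking exons, and support from an outgroup. The Python sketch below assumes that chimpanzee, human, and outgroup exons have already been projected onto one shared coordinate system. In the study this projection came from UCSC pairwise alignments queried with Pygr; here it is simply given. The sketch models outgroup support as a same-strand overlap, which is a simplification. The `Exon` type and the function names are hypothetical. The three additional filters (RefSeq-to-chimpanzee alignments, BLAT consistency, and TopHat junctions) are not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Exon:
    start: int   # 0-based, half-open, in an assumed shared (lifted-over) coordinate frame
    end: int
    strand: str  # "+" or "-"


def overlaps(a: Exon, b: Exon) -> bool:
    """Any positional overlap, regardless of strand (the conservative choice)."""
    return a.start < b.end and b.start < a.end


def matches(a: Exon, b: Exon) -> bool:
    """'Matching' as defined in the text: same strand and identical start or end."""
    return a.strand == b.strand and (a.start == b.start or a.end == b.end)


def is_candidate_exon_loss(exon: Exon, upstream: Exon, downstream: Exon,
                           human_exons: list[Exon], outgroup_exons: list[Exon]) -> bool:
    # 1. The chimpanzee exon must not overlap any known or reconstructed human exon.
    if any(overlaps(exon, h) for h in human_exons):
        return False
    # 2. Both flanking chimpanzee exons must match human exons (exon loss, not gene loss).
    if not any(matches(upstream, h) for h in human_exons):
        return False
    if not any(matches(downstream, h) for h in human_exons):
        return False
    # 3. An outgroup must contain the exon (loss in human, not creation in chimpanzee).
    return any(overlaps(exon, o) and exon.strand == o.strand for o in outgroup_exons)


# Hypothetical toy coordinates.
up, mid, down = Exon(100, 200, "+"), Exon(300, 351, "+"), Exon(500, 600, "+")
human = [Exon(100, 200, "+"), Exon(500, 600, "+")]
print(is_candidate_exon_loss(mid, up, down, human, [Exon(300, 351, "+")]))  # True
print(is_candidate_exon_loss(mid, up, down, human, []))                     # False
```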

Materials and Methods


SLC7A6 50 -UTR Constructs and Luciferase Reporter Assay

Supplementary Material

Supplementary figures S1–S6 and tables S1–S6 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).

Acknowledgments

The authors thank Lan Lin for discussions of experimental design. This work was supported by the National Institutes of Health (R01GM088342 to Y.X.). This work used biological materials obtained from the Southwest National Primate Research Center, which was supported by the National Institutes of Health grant P51RR013986.

References

Alekseyenko AV, Kim N, Lee CJ. 2007. Global analysis of exon creation versus loss and the role of alternative splicing in 17 vertebrate genomes. RNA 13:661–670.
Arancibia-Garavilla Y, Toledo F, Casanello P, Sobrevia L. 2003. Nitric oxide synthesis requires activity of the cationic and neutral amino acid transport system y+L in human umbilical vein endothelium. Exp Physiol. 88:699–710.
Barbosa C, Peixeiro I, Romao L. 2013. Gene expression regulation by upstream open reading frames and human disease. PLoS Genet. 9:e1003529.
Barbosa-Morais NL, Irimia M, Pan Q, Xiong HY, Gueroussov S, Lee LJ, Slobodeniuc V, Kutter C, Watt S, Colak R, et al. 2012. The evolutionary landscape of alternative splicing in vertebrate species. Science 338:1587–1593.
Black DL. 2000. Protein diversity from alternative splicing: a challenge for bioinformatics and post-genome biology. Cell 103:367–370.
Blekhman R, Marioni JC, Zumbo P, Stephens M, Gilad Y. 2010. Sex-specific and lineage-specific alternative splicing in primates. Genome Res. 20:180–189.
Boue S, Letunic I, Bork P. 2003. Alternative splicing and evolution. Bioessays 25:1031–1034.
Brawand D, Soumillon M, Necsulea A, Julien P, Csardi G, Harrigan P, Weier M, Liechti A, Aximu-Petri A, Kircher M, et al. 2011. The evolution of gene expression levels in mammalian organs. Nature 478:343–348.

Broer A, Wagner CA, Lang F, Broer S. 2000. The heterodimeric amino acid transporter 4F2hc/y+LAT2 mediates arginine efflux in exchange with glutamine. Biochem J. 349(Pt 3):787–795.
Calvo SE, Pagliarini DJ, Mootha VK. 2009. Upstream open reading frames cause widespread reduction of protein expression and are polymorphic among humans. Proc Natl Acad Sci U S A. 106:7507–7512.
Chatterjee S, Pal JK. 2009. Role of 5’- and 3’-untranslated regions of mRNAs in human diseases. Biol Cell. 101:251–262.
Chimpanzee Sequencing and Analysis Consortium. 2005. Initial sequence of the chimpanzee genome and comparison with the human genome. Nature 437:69–87.
Chou HH, Hayakawa T, Diaz S, Krings M, Indriati E, Leakey M, Paabo S, Satta Y, Takahata N, Varki A. 2002. Inactivation of CMP-N-acetylneuraminic acid hydroxylase occurred prior to brain expansion during human evolution. Proc Natl Acad Sci U S A. 99:11736–11741.
Chou HH, Takematsu H, Diaz S, Iber J, Nickerson E, Wright KL, Muchmore EA, Nelson DL, Warren ST, Varki A. 1998. A mutation in human CMP-sialic acid hydroxylase occurred after the Homo-Pan divergence. Proc Natl Acad Sci U S A. 95:11751–11756.
Corvelo A, Eyras E. 2008. Exon creation and establishment in human genes. Genome Biol. 9:R141.
Enard W, Przeworski M, Fisher SE, Lai CS, Wiebe V, Kitano T, Monaco AP, Paabo S. 2002. Molecular evolution of FOXP2, a gene involved in speech and language. Nature 418:869–872.
Feschotte C. 2008. Transposable elements and the evolution of regulatory networks. Nat Rev Genet. 9:397–405.
Florea L, Di Francesco V, Miller J, Turner R, Yao A, Harris M, Walenz B, Mobarry C, Merkulov GV, Charlab R, et al. 2005. Gene and alternative splicing annotation with AIR. Genome Res. 15:54–66.
Gerber A, O’Connell MA, Keller W. 1997. Two forms of human double-stranded RNA-specific editase 1 (hRED1) generated by the insertion of an Alu cassette. RNA 3:453–463.
Guttman M, Garber M, Levin JZ, Donaghey J, Robinson J, Adiconis X, Fan L, Koziol MJ, Gnirke A, Nusbaum C, et al. 2010. Ab initio reconstruction of cell type-specific transcriptomes in mouse reveals the conserved multi-exonic structure of lincRNAs. Nat Biotechnol. 28:503–510.
Hahn Y, Jeong S, Lee B. 2007. Inactivation of MOXD2 and S100A15A by exon deletion during human evolution. Mol Biol Evol. 24:2203–2212.
Hayakawa T, Satta Y, Gagneux P, Varki A, Takahata N. 2001. Alu-mediated inactivation of the human CMP-N-acetylneuraminic acid hydroxylase gene. Proc Natl Acad Sci U S A. 98:11399–11404.
Haygood R, Fedrigo O, Hanson B, Yokoyama KD, Wray GA. 2007. Promoter regions of many neural- and nutrition-related genes have experienced positive selection during human evolution. Nat Genet. 39:1140–1144.
Irie A, Koyama S, Kozutsumi Y, Kawasaki T, Suzuki A. 1998. The molecular basis for the absence of N-glycolylneuraminic acid in humans. J Biol Chem. 273:15866–15871.
Kent WJ. 2002. BLAT: the BLAST-like alignment tool. Genome Res. 12:656–664.
Kim J, Zhao K, Jiang P, Lu ZX, Wang J, Murray JC, Xing Y. 2012. Transcriptome landscape of the human placenta. BMC Genomics 13:115.
Knowles DG, McLysaght A. 2009. Recent de novo origin of human protein-coding genes. Genome Res. 19:1752–1759.
Kondrashov FA, Koonin EV. 2001. Origin of alternative splicing by tandem exon duplication. Hum Mol Genet. 10:2661–2669.
Letunic I, Copley RR, Bork P. 2002. Common exon duplication in animals and its role in alternative splicing. Hum Mol Genet. 11:1561–1567.
Lewis BP, Green RE, Brenner SE. 2003. Evidence for the widespread coupling of alternative splicing and nonsense-mediated mRNA decay in humans. Proc Natl Acad Sci U S A. 100:189–192.
Lin L, Jiang P, Shen S, Sato S, Davidson BL, Xing Y. 2009. Large-scale analysis of exonized mammalian-wide interspersed repeats in primate genomes. Hum Mol Genet. 18:2204–2214.
Lin L, Shen S, Tye A, Cai JJ, Jiang P, Davidson BL, Xing Y. 2008. Diverse splicing patterns of exonized Alu elements in human tissues. PLoS Genet. 4:e1000225.

The intact chimpanzee 5′-UTR (i.e., the 5′-UTR including the exon of interest) of SLC7A6 was obtained from the IDT Gene Synthesis Service (Integrated DNA Technologies, Inc., Coralville, IA). The pIRES-Luc2 vector described in our previous study (Shen et al. 2011) was used to measure the translational efficiency of the SLC7A6 5′-UTR. Site-directed mutagenesis was carried out to generate different versions of the SLC7A6 5′-UTR, and all mutant constructs were confirmed by sequencing (University of Iowa DNA facility, Iowa City, IA). HeLa and JEG-3 cells were grown in either Gibco Dulbecco’s Modified Eagle’s Medium (DMEM) (Invitrogen, Carlsbad, CA) or Minimum Essential Media (MEM) supplemented with 10% fetal bovine serum (FBS). Lipofectamine 2000 (Invitrogen) was used for transient transfection. Dual-color luciferase assays were carried out approximately 24 h after transfection following the manufacturer’s protocol (Promega, Madison, WI). Final results were confirmed in at least three independent experiments.
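A dual-color readout is usually reduced to one number per construct before UTR variants are compared. The sketch below illustrates that arithmetic under several assumptions. It assumes each well yields a test-reporter signal, driven through the cloned 5′-UTR, and a control-reporter signal used for normalization. It also assumes each construct is expressed relative to a reference construct measured in the same experiment. The text does not say how normalization was done. The function names, the "intact"/"mutant" labels, and the mean/standard-deviation summary are therefore hypothetical illustrations, not the authors' pipeline.

```python
from statistics import mean, stdev


def normalized_ratio(test_signal: float, control_signal: float) -> float:
    """Test-reporter signal divided by control-reporter signal for one well (assumed readout)."""
    if control_signal <= 0:
        raise ValueError("control signal must be positive")
    return test_signal / control_signal


def relative_efficiency(construct_wells, reference_wells) -> float:
    """Mean normalized ratio of a construct divided by that of the reference construct.

    Each argument is a list of (test_signal, control_signal) pairs from one experiment.
    """
    construct = mean(normalized_ratio(t, c) for t, c in construct_wells)
    reference = mean(normalized_ratio(t, c) for t, c in reference_wells)
    return construct / reference


def summarize(experiments, construct, reference="intact", min_experiments=3):
    """Relative efficiency per independent experiment, then mean and SD across experiments.

    `experiments` is a list of dicts mapping hypothetical construct labels
    (e.g. "intact", "mutant") to lists of (test_signal, control_signal) wells.
    """
    values = [relative_efficiency(exp[construct], exp[reference]) for exp in experiments]
    if len(values) < min_experiments:
        raise ValueError(f"need at least {min_experiments} independent experiments")
    return mean(values), stdev(values)


# Made-up readings for illustration only; these are not data from this study.
experiments = [
    {"intact": [(100.0, 50.0), (120.0, 60.0)], "mutant": [(200.0, 50.0), (240.0, 60.0)]},
    {"intact": [(90.0, 45.0)], "mutant": [(270.0, 45.0)]},
    {"intact": [(80.0, 40.0)], "mutant": [(80.0, 40.0)]},
]
print(summarize(experiments, "mutant"))  # (2.0, 1.0)
```

Normalizing within each well before averaging is meant to cancel well-to-well differences in transfection efficiency. Dividing by a reference construct from the same experiment keeps day-to-day shifts in overall signal from dominating the comparison across the independent repeats.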

MacDonald JR, Ziman R, Yuen RK, Feuk L, Scherer SW. 2014. The Database of Genomic Variants: a curated collection of structural variation in the human genome. Nucleic Acids Res. 42:D986–D992.
Maricic T, Gunther V, Georgiev O, Gehre S, Curlin M, Schreiweis C, Naumann R, Burbano HA, Meyer M, Lalueza-Fox C, et al. 2013. A recent evolutionary change affects a regulatory element in the human FOXP2 gene. Mol Biol Evol. 30:844–852.
McLean CY, Reno PL, Pollen AA, Bassan AI, Capellini TD, Guenther C, Indjeian VB, Lim X, Menke DB, Schaar BT, et al. 2011. Human-specific loss of regulatory DNA and the evolution of human-specific traits. Nature 471:216–219.
Meijer HA, Thomas AA. 2003. Ribosomes stalling on uORF1 in the Xenopus Cx41 5′ UTR inhibit downstream translation initiation. Nucleic Acids Res. 31:3174–3184.
Merkin J, Russell C, Chen P, Burge CB. 2012. Evolutionary dynamics of gene and isoform regulation in mammalian tissues. Science 338:1593–1599.
Modrek B, Lee CJ. 2003. Alternative splicing in the human, mouse and rat genomes is associated with an increased frequency of exon creation and/or loss. Nat Genet. 34:177–180.
Muotri AR, Marchetto MC, Coufal NG, Gage FH. 2007. The necessary junk: new functions for transposable elements. Hum Mol Genet. 16(Spec No. 2):R159–R167.
Nilsen TW, Graveley BR. 2010. Expansion of the eukaryotic proteome by alternative splicing. Nature 463:457–463.
Perry GH, Melsted P, Marioni JC, Wang Y, Bainer R, Pickrell JK, Michelini K, Zehr S, Yoder AD, Stephens M, et al. 2012. Comparative RNA sequencing reveals substantial genetic variation in endangered primates. Genome Res. 22:602–610.
Qiao H, Lu N, Du E, Yao L, Xiao H, Lu S, Qi Y. 2011. Rare codons in uORFs of baculovirus p13 gene modulates downstream gene expression. Virus Res. 155:249–253.
Schauer R. 2009. Sialic acids as regulators of molecular and cellular interactions. Curr Opin Struct Biol. 19:507–514.
Sela N, Mersch B, Gal-Mark N, Lev-Maor G, Hotz-Wagenblatt A, Ast G. 2007. Comparative analysis of transposed element insertion within human and mouse genomes reveals Alu’s unique role in shaping the human transcriptome. Genome Biol. 8:R127.
Sen SK, Han K, Wang J, Lee J, Wang H, Callinan PA, Dyer M, Cordaux R, Liang P, Batzer MA. 2006. Human genomic deletions mediated by recombination between Alu elements. Am J Hum Genet. 79:41–53.
Shen S, Lin L, Cai JJ, Jiang P, Kenkel EJ, Stroik MR, Sato S, Davidson BL, Xing Y. 2011. Widespread establishment and regulatory impact of Alu exons in human genes. Proc Natl Acad Sci U S A. 108:2837–2842.
Sorek R. 2007. The birth of new exons: mechanisms and evolutionary consequences. RNA 13:1603–1608.
Sterne-Weiler T, Howard J, Mort M, Cooper DN, Sanford JR. 2011. Loss of exon identity is a common mechanism of human inherited disease. Genome Res. 21:1563–1571.
Sterne-Weiler T, Martinez-Nunez RT, Howard JM, Cvitovik I, Katzman S, Tariq MA, Pourmand N, Sanford JR. 2013. Frac-seq reveals isoform-specific recruitment to polyribosomes. Genome Res. 23:1615–1623.
Trapnell C, Pachter L, Salzberg SL. 2009. TopHat: discovering splice junctions with RNA-Seq. Bioinformatics 25:1105–1111.
Trapnell C, Williams BA, Pertea G, Mortazavi A, Kwan G, van Baren MJ, Salzberg SL, Wold BJ, Pachter L. 2010. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat Biotechnol. 28:511–515.
Varki A. 2001. Loss of N-glycolylneuraminic acid in humans: mechanisms, consequences, and implications for hominid evolution. Am J Phys Anthropol. 116(Suppl 33):54–69.
Varki A. 2010. Colloquium paper: uniquely human evolution of sialic acid genetics and biology. Proc Natl Acad Sci U S A. 107(Suppl 2):8939–8946.
Wang HY, Chien HC, Osada N, Hashimoto K, Sugano S, Gojobori T, Chou CK, Tsai SF, Wu CI, Shen CK. 2007. Rate of evolution in brain-expressed genes in humans and other primates. PLoS Biol. 5:e13.
Wang W, Zheng H, Yang S, Yu H, Li J, Jiang H, Su J, Yang L, Zhang J, McDermott J, et al. 2005. Origin and evolution of new exons in rodents. Genome Res. 15:1258–1264.
Wang X, Grus WE, Zhang J. 2006. Gene losses during human origins. PLoS Biol. 4:e52.
Wu DD, Irwin DM, Zhang YP. 2011. De novo origin of human protein-coding genes. PLoS Genet. 7:e1002379.
Xing Y, Lee C. 2006. Alternative splicing and RNA selection pressure—evolutionary consequences for eukaryotic genomes. Nat Rev Genet. 7:499–509.
Yeo G, Burge CB. 2004. Maximum entropy modeling of short sequence motifs with applications to RNA splicing signals. J Comput Biol. 11:377–394.
Zhang XH, Chasin LA. 2006. Comparison of multiple vertebrate genomes reveals the birth and evolution of human exons. Proc Natl Acad Sci U S A. 103:13427–13432.
