Translating Sanger-based routine DNA diagnostics into generic massive parallel ion semiconductor sequencing


Papers in Press. Published October 1, 2014 as doi:10.1373/clinchem.2014.225250 The latest version is at http://hwmaint.clinchem.org/cgi/doi/10.1373/clinchem.2014.225250 Clinical Chemistry 61:1 000 – 000 (2015)

Molecular Diagnostics and Genetics

Translating Sanger-Based Routine DNA Diagnostics into Generic Massive Parallel Ion Semiconductor Sequencing

Adinda Diekstra,¹ Ermanno Bosgoed,¹ Alwin Rikken,¹ Bart van Lier,¹ Erik-Jan Kamsteeg,¹ Marloes Tychon,¹ Ronny C. Derks,¹ Ronald A. van Soest,¹ Arjen R. Mensenkamp,¹ Hans Scheffer,¹,² Kornelia Neveling,¹† and Marcel R. Nelen¹†*

BACKGROUND: Dideoxy-based chain termination sequencing developed by Sanger is the gold standard sequencing approach and allows clinical diagnostics of disorders with relatively low genetic heterogeneity. Recently, new next generation sequencing (NGS) technologies have found their way into diagnostic laboratories, enabling the sequencing of large targeted gene panels or exomes. The development of benchtop NGS instruments now allows the analysis of single genes or small gene panels, making these platforms increasingly competitive with Sanger sequencing.

METHODS: We developed a generic automated ion semiconductor sequencing work flow that can be used in a clinical setting and can serve as a substitute for Sanger sequencing. Standard amplicon-based enrichment remained identical to PCR for Sanger sequencing. A novel postenrichment pooling strategy was developed, limiting the number of library preparations and reducing sequencing costs up to 70% compared to Sanger sequencing.

RESULTS: A total of 1224 known pathogenic variants were analyzed, yielding an analytical sensitivity of 99.92% and specificity of 99.99%. In a second experiment, a total of 100 patient-derived DNA samples were analyzed using a blind analysis. The results showed an analytical sensitivity of 99.60% and specificity of 99.98%, comparable to Sanger sequencing.

CONCLUSIONS: Ion semiconductor sequencing can be a first choice mutation scanning technique, independent of the genes analyzed.

© 2014 American Association for Clinical Chemistry

¹ Department of Human Genetics and ² Donders Center for Neurosciences, Radboud university medical center, Nijmegen, the Netherlands.
† Kornelia Neveling and Marcel R. Nelen contributed equally to the work, and both should be considered as last authors.
* Address correspondence to this author at: Department of Human Genetics, Radboud university medical center, P.O. Box 9101, Nijmegen 6500 HB, the Netherlands. Fax +31-24-36-16658; e-mail: [email protected].

For more than 3 decades, Sanger sequencing has been the method of choice to determine the nucleotide composition of a given DNA molecule (1, 2). The limited capacity (up to 96 DNA sequences of typically 400–500 nucleotides in a single run) and relatively high costs (consumables and labor) of this gold standard method stimulated scientists to develop massive parallel sequencing technologies (3–6). For nearly 10 years, next generation sequencing (NGS)³ has allowed the sequencing of multiple genes, exomes, or genomes in a single experiment (7, 8). This capability has provided new possibilities in genetic research, including the fast and easy identification of new disease-associated genes (9–11). NGS has also found its way into diagnostic laboratories (12). Exome sequencing and fixed gene panel assays are used for the analysis of highly heterogeneous diseases such as intellectual disability, blindness, or movement disorders (13–16). Exome or genome sequencing may become the generic approach for single nucleotide variant (SNV) identification even if analysis of just a few genes is sufficient to solve the clinical question. However, the capture efficiency and the depth and evenness of coverage across all regions of interest (ROI) are currently insufficient to match the analytical sensitivity and specificity of a Sanger sequencing reaction, which are needed to confirm or exclude a clinical diagnosis. Current practice is to first exclude the most likely genes by Sanger sequencing before proceeding to panel sequencing or exome sequencing (16, 17). These approaches offer a superior diagnostic yield but cannot be used to exclude a clinical diagnosis.

The demand for faster and more affordable NGS-based platforms in diagnostic settings has driven the development of benchtop NGS instruments, such as the GS Junior (Roche 454), the MiSeq (Illumina), and the Personal Genome Machine (PGM) (Life Technologies) (18–20). These devices can be applied for the analysis of single genes or small gene panels, making such platforms increasingly competitive with Sanger sequencing. In our department, we offer routine genetic testing for over 700 different genes, leading to 600,000 Sanger sequencing reactions each year. To process these large numbers of tests, DNA extraction, amplification, and sequencing have been automated. To further improve efficiency and reduce costs, we have developed an alternative NGS-based generic sequencing strategy based on ion semiconductor sequencing (ISS). Novel pooling strategies allow a minimal usage of molecular barcodes, leading to a cost reduction of 70% per sequenced amplicon. Here we show the feasibility of using randomly selected Sanger sequencing–optimized PCR amplicons in a generic and automated manner in ISS.

Received April 11, 2014; accepted August 11, 2014. Previously published online at DOI: 10.1373/clinchem.2014.225250

³ Nonstandard abbreviations: NGS, next generation sequencing; SNV, single nucleotide variant; ROI, regions of interest; PGM, Personal Genome Machine; ISS, ion semiconductor sequencing; ISP, Ion sphere particle; SQL-LIMS, structured query language–laboratory information management system.

Methods

INVESTIGATED DISEASES

The Radboud university medical center (Radboudumc) Department of Human Genetics currently offers Sanger sequencing of approximately 700 genes, corresponding to approximately 300 different diseases. For the current study, we investigated a random selection of amplicons that were requested in our routine diagnostic work flow. In total, approximately 25% of all our genes were tested in this study. A complete list of analyzed genes is provided in Table 1 in the Data Supplement that accompanies the online version of this article at http://www.clinchem.org/content/vol61/issue1.

AUTOMATED DNA ISOLATION

Genomic DNA isolation was performed automatically using a Hamilton Microlab STAR autoload system with an integrated Chemagen MSM I separation module (Hamilton Robotics GmbH, Martinsried, Germany). DNA isolation was performed using the Chemagic DNA blood kit special (PerkinElmer) according to the manufacturer's instructions. An integrated barcode system was used to track all samples and isolated DNA fractions throughout the process. Following DNA isolation, concentrations of DNA fractions were determined (Hamilton Microlab Starlet robot with integrated Tecan Infinite 200 Pro reader) and normalized to a working concentration of 15 ng/µL.

AUTOMATED GENERATION OF PCR AMPLICONS

PCRs were performed using conventional Sanger sequencing primers in a fully automated robotic work flow and carried out in 96-well microtiter plates. The PCR reaction mixtures (15 µL final volume) consisted of 3.5 µL (3.0 pmol/µL) M13-containing forward and reverse primer (Biolegio BV), 7.5 µL Amplitaq Gold 360 master mix (Life Technologies), and 4 µL genomic DNA (15 ng/µL). Pipetting was performed with a Microlab STAR Plus robot (Hamilton) using barcoded 96-well microtiter plates. PCR conditions were the following: 1 min at 95 °C and then 30 s at 95 °C, 30 s at 60 °C, and 1 min at 72 °C (35×), followed by 7 min at 72 °C (Veriti 96-well Thermal Cycler; Life Technologies).

PURIFICATION OF PCR AMPLICONS

Purifications of PCR products for Sanger sequencing were performed on Hamilton STARlet "replicators" using Agencourt Ampure magnetic beads, according to manufacturer instructions. For the Ion Torrent work flow, no additional purification, besides the purifications within the library preparation, was necessary following PCR (see online Supplemental Table 2). Therefore, PCR plates could be transferred immediately onto the ML Starlet post-PCR robot (Hamilton), where the PCR products were automatically pooled.

POOLING OF PCR AMPLICONS

During library preparation, molecular barcodes were attached to samples. Barcoding was performed either per gene, per patient, per PCR plate (pool of 96 amplicons), or per PCR robot (combining several PCR plates coming from 1 robot). To determine the best barcoding strategy, we compared sequencing costs for the different pooling possibilities. Calculations were performed on the basis of 1424 amplicons (80 different genes, 63 different samples) amplified on 16 different PCR (96-well) plates (see online Supplemental Table 3). The most efficient option was barcoding per PCR robot. In our department, recurrent amplicons were divided over 3 different pre-PCR robots. On each robot, we implemented a pooling strategy in which 2 µL of each (unique) PCR product from 1 plate was pooled in well H12 (see online Supplemental Fig. 1). Subsequently, the pooled PCR products from well H12 were normalized to 500 ng. All H12 wells from 1 robot were then combined into 1 large pool.
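The plate layout rule behind this pooling strategy (every amplicon on a plate is unique, and recurrent amplicons move to another plate) can be sketched as a simple first-fit assignment. This is a minimal illustration and not the authors' robot script: the function name, the request format, and the amplicon identifiers are hypothetical, and the capacity of 95 wells assumes that well H12 is kept free for the pool.

```python
def assign_to_plates(requests, wells_per_plate=95):
    """Distribute (sample_id, amplicon_id) requests over PCR plates.

    Each plate may hold a given amplicon only once, so that reads from a
    pooled plate can be traced back to exactly one sample per amplicon.
    wells_per_plate=95 is an assumption: 96 wells minus H12 for the pool.
    """
    plates = []  # each plate: {"amplicons": set of ids, "wells": list of requests}
    for sample_id, amplicon_id in requests:
        # First-fit: take the first plate that has room and lacks this amplicon.
        for plate in plates:
            if (amplicon_id not in plate["amplicons"]
                    and len(plate["wells"]) < wells_per_plate):
                break
        else:
            # No suitable plate exists yet, so open a new one.
            plate = {"amplicons": set(), "wells": []}
            plates.append(plate)
        plate["amplicons"].add(amplicon_id)
        plate["wells"].append((sample_id, amplicon_id))
    return [plate["wells"] for plate in plates]


# Hypothetical requests: one amplicon is requested for three samples.
requests = [
    ("S1", "GENE_A_ex2"),
    ("S2", "GENE_A_ex2"),
    ("S1", "GENE_B_ex4"),
    ("S3", "GENE_A_ex2"),
]
plates = assign_to_plates(requests)
```

With these requests the recurrent amplicon ends up on three different plates, which is why the number of recurrent requests, rather than the number of samples, drives the number of pools.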

NORMALIZATION OF DNA

DNA concentrations were normalized robotically on a Hamilton Star system. The concentration of isolated DNA stock solution was measured with a Tecan Genios reader, the dilution necessary was calculated, and the DNA was diluted with TE (70 mmol/L Tris-HCl, 1 mmol/L EDTA) to 15 ng/µL.
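The dilution calculation itself follows from C1·V1 = C2·V2. The sketch below is only an illustration of that arithmetic; the function name and the final volume of 100 µL are assumptions, since the text gives the 15 ng/µL target but not the working volume.

```python
def dilution_volumes(stock_ng_per_ul, target_ng_per_ul=15.0, final_volume_ul=100.0):
    """Return (stock_ul, buffer_ul) needed to reach the target concentration.

    Uses C1 * V1 = C2 * V2. The default final volume is a hypothetical
    value chosen for illustration only.
    """
    if stock_ng_per_ul < target_ng_per_ul:
        raise ValueError("stock is already below the target concentration")
    stock_ul = target_ng_per_ul * final_volume_ul / stock_ng_per_ul
    return stock_ul, final_volume_ul - stock_ul


# A 60 ng/µL stock needs 25 µL DNA plus 75 µL TE for 100 µL at 15 ng/µL.
stock_ul, buffer_ul = dilution_volumes(60.0)
```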


ROBOTICS USED FOR POOLING

The pre-PCR robots work with template files that contain the information on how to compile PCR plates. These template files are generated using our in-house patient information database and imported into the laboratory information system [structured query language–laboratory information management system (SQL-LIMS), LABVANTAGE]. A pre-PCR robot processes these working lists, thus generating the amplicons for each test requested. A customized script ensures that there are only unique amplicons on a single PCR plate and distributes recurrent amplicons to different plates. Each pool of unique amplicons receives a barcode in the downstream library preparation.

SHEARING OF PCR AMPLICONS

The PCR products used in this approach were optimized for Sanger sequencing and varied in size between 200 and 900 bp. For sequencing on the PGM, PCR product lengths had to be reduced to a mean size of 200–300 bp. Therefore, pooled PCR fragments were sheared using a Covaris E210 device (Covaris Inc.) with the following settings: duty cycle, 10%; intensity, 5; cycles/burst, 200; treatment time, 220 s.

AUTOMATED LIBRARY PREPARATION

Following shearing, an automated library preparation was performed on a MicroLab Starlet Replicator Robot (Hamilton) by using the Ion Plus fragment library kit in combination with the Ion Xpress™ barcode adapters 1–96 kit (both Life Technologies), according to the protocol "Ion Xpress™ Plus gDNA Fragment Library Preparation" (21). Our procedure differed from that described in the protocol in that no size selection was performed. Each pool of unique amplicons was included in a separate library preparation, using 1 molecular barcode per pool.
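Because each pool receives exactly 1 barcode, the number of library preparations equals the number of pools that a barcoding strategy produces. A rough way to compare the strategies named under POOLING OF PCR AMPLICONS is to count the distinct groups each one creates. The sketch below assumes a hypothetical request record with the keys "gene", "patient", "plate", and "robot"; it does not reproduce the cost calculation of online Supplemental Table 3, and it presumes that plates were already laid out so that each pool contains only unique amplicons.

```python
STRATEGIES = ("gene", "patient", "plate", "robot")


def libraries_needed(requests, strategy):
    """Count barcoded library preparations for one pooling strategy.

    requests: iterable of dicts with the hypothetical keys listed in
              STRATEGIES, one dict per requested amplicon.
    strategy: the key whose distinct values define the pools.
    """
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return len({request[strategy] for request in requests})


# Hypothetical day of requests: 4 amplicons, 3 genes, 3 patients, 2 plates, 1 robot.
requests = [
    {"gene": "GENE_A", "patient": "P1", "plate": "plate1", "robot": "robot1"},
    {"gene": "GENE_A", "patient": "P2", "plate": "plate2", "robot": "robot1"},
    {"gene": "GENE_B", "patient": "P1", "plate": "plate1", "robot": "robot1"},
    {"gene": "GENE_C", "patient": "P3", "plate": "plate1", "robot": "robot1"},
]
counts = {s: libraries_needed(requests, s) for s in STRATEGIES}
```

In this toy example the coarser the pooling unit, the fewer libraries are required, which is the effect the per-robot strategy exploits.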

EMULSION PCR, CHIP LOADING, AND SEQUENCING

Emulsion PCRs were performed on an Ion OneTouch system (Ion OT2 instrument, Life Technologies) using the Ion PGM Template OT2 200 kit. Enrichment of template-positive Ion sphere particles (ISPs) was performed on a OneTouch ES system (Life Technologies). The percentage of template-positive ISPs was measured with the use of the Ion Sphere Quality Control kit (Life Technologies) and a Qubit 2.0 Fluorometer (Invitrogen). Subsequently, ISPs coated with template were loaded on Ion 316™ sequencing chips (Life Technologies). The chips were sequenced on the PGM, with the use of the Ion PGM sequencing 200 kit version 2. All steps were done according to manufacturer's instructions.

DATA ANALYSIS

Following sequencing, all data generated by the PGM were automatically transferred to the Ion Torrent server (an integral part of the PGM system), which performed the first base calling and alignment to hg19 using the Torrent Suite software (version 3.4.2; Life Technologies). For subsequent sequence analysis, 3 third-party sequence analysis tools [Ion Reporter (Thermofisher Scientific), NextGene (SoftGenetics), and SeqNext (JSI)] were compared during an initial pilot phase. In our experience, SeqNext performed best in terms of sensitivity and specificity, user friendliness, and cost efficiency, and was already used for ion semiconductor sequencing analysis of the breast cancer genes BRCA1⁴ (breast cancer 1, early onset) and BRCA2 (breast cancer 2, early onset). The analyses (mapping, alignment, visualization, variant detection, and interpretation) were therefore performed using the SeqNext module of the SEQUENCE PILOT software from JSI Medical Systems. Sequencing data in fastq format were automatically sent to the analysis software SeqNext. Within SeqNext, the sequencing reads were mapped to defined ROIs, and variant calling was performed using defined user settings (see online Supplemental Table 4). Analysis parameters in combination with selective procedures were used to ensure high coverage and high sensitivity, thus taking specific sequencing technology–based limitations into account (e.g., bases with low base call quality and homopolymer stretches). The software can handle only relatively small data sets. Large data sets like exome data are difficult to analyze. The software produces in silico electropherograms mimicking Sanger sequence traces.

VALIDATION OF IDENTIFIED VARIANTS

All amplicons investigated in this study were also analyzed via conventional Sanger sequencing. All identified variants were therefore compared to the existing Sanger sequencing data.
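Comparing ISS calls with the existing Sanger data comes down to counting true and false calls and deriving rates from those counts. The sketch below applies the formulas given in the footnote of Table 2 to the pool 3 counts reported there; the function name and the returned dictionary layout are illustrative assumptions, not part of the authors' pipeline.

```python
def performance_metrics(tp, fp, fn, tn):
    """Derive the rates defined in the Table 2 footnote from raw counts."""
    return {
        "sensitivity": tp / (tp + fn),            # TP rate
        "specificity": tn / (fp + tn),            # TN rate
        "fp_rate": fp / (fp + tn),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": tp / (tp + fp),
    }


# Counts reported for the positive index cases of 2012 (pool 3).
metrics = performance_metrics(tp=1231, fp=43, fn=1, tn=286718)
percentages = {name: round(100 * value, 2) for name, value in metrics.items()}
```

Note how strongly the large number of true-negative positions dominates specificity and accuracy, whereas precision is sensitive to even a few dozen false-positive calls.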

Results

ION SEMICONDUCTOR SEQUENCING USING SANGER-OPTIMIZED AMPLICONS

To test whether it was possible to use Sanger-optimized amplicons for ISS without the necessity to redesign the PCR primers, we validated the performance of ISS to identify SNVs previously detected by Sanger sequencing. We selected 232 unique amplicons (representing routine diagnostic requests of a random single day) and demonstrated that purification of PCR amplicons was not needed to obtain good shearing results (see Methods and online Supplemental Table 1). Next, the 232 amplicons were combined in a single pool (pool 1). In addition, we generated another pool (pool 2) of 1484 unique randomly selected amplicons representing all requests for a random routine diagnostic week. Library preparations were performed for both pools using a single barcode per pool. A mean of 2.87 million reads was generated per run, showing a mean read length of 176 bp and a median coverage of 1490× for pool 1 and 716× for pool 2 (Table 1, pool 1 and 2). Insufficient coverage (below 40×) was observed for 2.7% (pool 1) and 1.2% (pool 2) of the PCR products (Table 1, pool 1 and 2) (17). Data analysis using default settings resulted in the detection of all known variants for both runs.

Table 1. Statistical measures of ion semiconductor sequencing tests using Sanger-optimized amplicons.

Statistical measureᵃ                 Pool 1         Pool 2         Pool 3
Number of amplicons                  232            1484           1224
Number of unique genes               19             77             232
Number of samples                    11             65             981
Number of used barcodes              1              1              13
Reads per run                        2,923,050      2,831,989      3,371,365
Target region, Mb                    ~0.1           ~0.3           ~0.3
Median coverage per amplicon (SD)    1490× (1013)   716× (726)     808× (775)
Maximum coverage per amplicon        8883×          5371×          8236×
Amplicons covered <40×               2.7%           1.2%           3.1%

ᵃ Number of amplicons, the total number of sequenced mutation-positive amplicons. Number of unique genes, the number of unique genes sequenced. Number of samples, the number of DNA samples sequenced. Number of used barcodes, the number of barcodes used to separate recurrent amplicons. Reads per run, the total number of generated reads. Target region, the size of the regions sequenced, given in Mb. Median coverage per amplicon, the median coverage of the sequenced targets (with SD). Maximum coverage per amplicon, the maximal median coverage of the best-covered amplicon. Amplicons covered <40×, the percentage of amplicons that have a median coverage below 40×.

⁴ Human genes: BRCA1, breast cancer 1, early onset; BRCA2, breast cancer 2, early onset; COL11A2, collagen, type XI, alpha 2; ATP6V1B1, ATPase, H+ transporting, lysosomal 56/58kDa, V1 subunit B1; SLC26A4, solute carrier family 26 (anion exchanger), member 4; CNGB3, cyclic nucleotide gated channel beta 3.

AN INTEGRATED GENERIC AUTOMATED SANGER AND SEMICONDUCTOR SEQUENCING WORK FLOW

We further aimed at integration of the ISS into the already available fully automated diagnostic Sanger sequencing work flow (see online Supplemental Fig. 2). In addition, in the automated ISS process (Fig. 1), we aimed to use as few barcodes as possible to reduce the number of library preparations. Therefore, one barcode per pool of unique amplicons was used instead of one barcode per sample. To avoid sample swaps, all amplicons in a particular PCR plate must be unique. In this novel pooling strategy, unique amplicons were combined in one plate and identical amplicons (from different patients) were distributed over different plates (see online Supplemental Fig. 1). No major adaptations in the prerobot scripts were required. However, the files guiding the downstream sequencing work flow and containing the combination of DNA sample number and requested amplicons as unique identifiers, the so-called templates, had to be changed to distribute identical amplicons over different PCR plates. After PCR, the plates were transferred to the post-PCR robot (Fig. 1), which scanned the barcode of the PCR plate, thereby verifying whether the plate was planned to proceed toward the Sanger or ISS work flow (Fig. 2). For ISS, amplicon purification was omitted. Instead, all amplicons of a single PCR plate were combined to create a pool. In the case of multiple plates all containing just unique amplicons, pools could be combined in one large pool for subsequent shearing, automated library preparation, and ISS (see Methods and online Supplemental Fig. 1). Thus, an integrated generic automated sequencing work flow was developed for either Sanger or ISS (Fig. 2).

VALIDATION OF THE AUTOMATED ISS WORK FLOW

To validate the automated ISS work flow, we selected amplicons from all positive index cases harboring a disease-causing mutation identified in 2012. In total, 1224 fragments amplified with Sanger-optimized PCR primers (Table 1, pool 3) were distributed over 13 PCR plates. After PCR, all amplicons from each plate were pooled. Subsequently, these 13 libraries were sequenced together on one 316 chip. In total, 3.4 million reads were generated, with a median coverage per amplicon of 808× (Table 1, pool 3). Insufficient coverage (below 40×) was obtained for 3.1% of the amplicons (17). Adjustment of default settings for variant calling was needed to optimize analytical sensitivity and specificity. Optimization included a minimal coverage of 40× for an ROI to be analyzed and a minimum of 15% variant reads in each direction (see online Supplemental Table 4). This resulted in 1 false-negative and 43 false-positive calls (Table 2). The false-positive variants occurred only in homopolymer stretches and presented exclusively as duplications or deletions (see online Supplemental Table 5). The false-negative variant concerned a heterozygous base substitution in the collagen, type XI, alpha 2 (COL11A2) gene (c.3364C>T) seen in 50% of the forward reads but absent in the reverse reads. Due to the adapted settings, requiring a minimum of 15% variant reads per direction, this variant was not reported (Table 3).

DETERMINATION OF ANALYTICAL SENSITIVITY AND SPECIFICITY

The automated ISS approach had an analytical sensitivity of 99.92% and specificity of 99.99% compared to Sanger sequencing (Table 2), as calculated in accordance with the guidelines of the American College of Medical Genetics and Genomics (22). Two additional variants in the ISS data were identified (Table 3).

Fig. 1. Schematic presentation of the automated ion semiconductor sequencing work flow. All requested sequencing tests are directed via our LIMS system. For the ISS work flow, unique amplicons are selected and distributed on 96-well plates, keeping the H12 well empty. Templates containing the information of which amplicon needs to be sequenced for a particular DNA sample are automatically sent to the pre-PCR robot (prerobot). Recurrent amplicons are distributed again in the next round. The PCR setup occurs in the prerobot, and the PCR itself in a 96-well Thermal Cycler. The finished PCR plate is stored in the Cytomat, from which the post-PCR robot (postrobot) can automatically retrieve it. The postrobot combines all amplicons from 1 plate in well H12, performs a quantification, and normalizes the pool to 500 ng/120 µL. The pool is then transferred to Covaris tubes for shearing, and subsequent library preparation is performed on a replicator robot according to the manufacturers' instruction.

BLIND SEQUENCING TO MIMIC A REALISTIC ROUTINE DIAGNOSTIC SCENARIO

To mimic a realistic routine diagnostic scenario, a blind experiment was performed. One hundred samples (representing requests of 3 different routine diagnostic days) were sequenced in parallel using automated Sanger and ISS, and results were compared. For ISS, amplicons were automatically divided into 2 pools per day, to set apart recurrent amplicons, and sequencing was performed as described above. A mean of 2.89 million reads was generated per run, with a median coverage of 2034× per amplicon (Table 4; also see online Supplemental Fig. 3). A mean of 3.6% of the amplicons was insufficiently covered (below 40×). Data analysis revealed 58 false-positive variants and 1 false-negative variant compared to Sanger sequencing (Table 4). False-positive calling occurred again in homopolymer stretches. The false-negative variant c.1155dupC in ATPase, H+ transporting, lysosomal 56/58kDa, V1 subunit B1 (ATP6V1B1) was seen in 77% of reverse reads and in only 5.5% of forward reads (Table 3). Because of the setting of a required minimum of 15% variant reads per direction, this variant was not reported. The mean analytical sensitivity was 99.60% with a specificity of 99.98% (Table 4).

Fig. 2. Graphical work flow describing automated semiconductor sequencing. All PCRs are performed as routinely done in DNA diagnostics. Generated amplicons are sent either to semiconductor sequencing or to Sanger sequencing. For semiconductor sequencing, unique amplicons are pooled, sheared, and processed in a single library preparation using a single barcode. Recurrent amplicons (the same amplicon for different patients) are divided over different library preparations and barcodes [barcode A (BCA), barcode B (BCB), and barcode C (BCC)]. By this, all amplicons of a given day are combined and sequenced on one 316 chip. For Sanger sequencing, the generated amplicons are subsequently purified and normalized. Single sequencing reactions are performed on ABI 3730XL sequencers.

Table 2. Statistical measures on the positive index cases of 2012 (pool 3).

Statistical measureᵃ                Pool 3
Number of amplicons                 1224
Number of sequenced nucleotides     287,950
TPs                                 1231
FNs                                 1
FPs                                 43
TNs                                 286,718
TP rate (sensitivity)               99.92%
TN rate (specificity)               99.99%
FP rate                             0.015%
Accuracy                            99.98%
Precision                           96.62%

ᵃ Number of amplicons, the number of sequenced mutation-positive amplicons. Number of sequenced nucleotides, the total number of sequenced nucleotides including the total number of true positive (TP), false positive (FP), false negative (FN), and true negative (TN) variants (compared to Sanger sequencing). Sensitivity and specificity were calculated in accordance with the guidelines of the American College of Medical Genetics and Genomics [Rehm et al. (22)]. TP rate = TP/(TP + FN) = 1231/(1231 + 1), TN rate = TN/(FP + TN) = 286,718/(43 + 286,718), FP rate = FP/(FP + TN) = 43/(43 + 286,718), accuracy = (TP + TN)/(TP + TN + FP + FN) = (1231 + 286,718)/(1231 + 286,718 + 43 + 1), precision = TP/(TP + FP) = 1231/(1231 + 43).

Table 3. False-negative variants.ᵃ

Approach   Gene        Exon      Variant       Reason for being not called
Positive index cases of 2012 (pool 3)
PGM        COL11A2     Exon 47   c.3364C>T     Unbalanced ratio (50% forward, 0% reverse)
Sanger     SLC26A4ᵇ    Exon 10   c.1229C>T     Located under frame shift
Sanger     CNGB3       Exon 11   c.892A>C      Located under frame shift
Three blind tests
PGM        ATP6V1B1    Exon 12   c.1155dupC    Unbalanced ratio (5.5% forward, 77% reverse)

ᵃ False-negative variants observed in PGM data and Sanger sequencing data of all positive index cases of 2012. Approach, sequencing approach in which a variant was missed. Gene, name of the gene in which a variant was missed. Exon, number of the exon in which a variant was missed. Variant, annotation of the variant that was missed. Reason for being not called, explanation why a variant was not detected.
ᵇ SLC26A4, solute carrier family 26 (anion exchanger), member 4; CNGB3, cyclic nucleotide gated channel beta 3.

Discussion

We have developed an automated generic sequencing work flow that allows both Sanger and ISS in routine DNA diagnostics. ISS can be applied using regular Sanger-optimized PCR primers, independent of the respective amplicon size. A novel pooling strategy automatically distinguishes recurrent amplicons and redirects those to different PCR plates. This allows the combining of all unique amplicons from one or more plates into a single pool that can be sequenced in a single ISS run.

PERFORMANCE OF ISS

In the described setup, ISS in combination with the appropriate data analysis exhibited a mean analytical sensitivity of 99.61% and specificity of 99.98%. Two disease-causing variants were missed [c.3364C>T (COL11A2) and c.1155dupC (ATP6V1B1)]. Both mutations were clearly present in the raw data (see online Supplemental Fig. 4), demonstrating that ISS performance is equal to that of capillary Sanger sequencing. The challenge is rather to improve the software algorithms so that these will call all true variants. In both cases the missed variants were convincingly visible in one sequencing direction. A solution is given by a new software setting present in SeqNext version 4.2.0 that can be applied per individual ROI. This setting is able to force a combined analysis of all reads if a variant is present in >45% of reads derived from only one direction. To avoid additional false-positive variants, this setting will be applied only for those amplicons in which a causative mutation is known to have been missed by ISS. A more general solution would be to avoid discrepancies in variant read counts, rather than adapting algorithms to be able to deal with them, e.g., by using a sequencing enzyme with better proofreading characteristics. This seems feasible, because both missed variants are located at either the first or last position of a C-stretch, depending on the sequencing orientation (see online Supplemental Fig. 4). The InDel variations tested in this diagnostic cohort indicated no sign of reduced sensitivity even for more difficult insertions, deletions, or duplications. Even larger duplications of 30 nucleotides were accurately detected by the analysis software (see online Supplemental Fig. 5).
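The effect of the per-direction threshold, and of a single-direction rescue rule like the one described for SeqNext 4.2.0, can be illustrated with a small decision function. This is only a sketch of the logic as stated in the text and is not SeqNext's implementation: the function name and parameters are hypothetical, the read counts are invented to match the fractions in Table 3, and the assumption that the "combined analysis" applies the same 15% threshold to the pooled reads is ours.

```python
def call_variant(fwd_alt, fwd_total, rev_alt, rev_total,
                 min_coverage=40, min_fraction=0.15, rescue_fraction=None):
    """Decide whether a candidate variant is reported.

    Default rule: total coverage >= min_coverage and the variant fraction
    is >= min_fraction in BOTH read directions.
    Optional rescue (assumption): if one direction exceeds rescue_fraction,
    evaluate forward and reverse reads together instead.
    """
    total = fwd_total + rev_total
    if total < min_coverage:
        return False
    fwd_frac = fwd_alt / fwd_total if fwd_total else 0.0
    rev_frac = rev_alt / rev_total if rev_total else 0.0
    if fwd_frac >= min_fraction and rev_frac >= min_fraction:
        return True
    if rescue_fraction is not None and max(fwd_frac, rev_frac) > rescue_fraction:
        return (fwd_alt + rev_alt) / total >= min_fraction
    return False


# Hypothetical read counts reproducing the strand fractions in Table 3.
col11a2 = dict(fwd_alt=100, fwd_total=200, rev_alt=0, rev_total=200)     # 50% / 0%
atp6v1b1 = dict(fwd_alt=11, fwd_total=200, rev_alt=154, rev_total=200)   # 5.5% / 77%

missed_by_default = [call_variant(**v) for v in (col11a2, atp6v1b1)]
rescued = [call_variant(**v, rescue_fraction=0.45) for v in (col11a2, atp6v1b1)]
```

Under these assumptions both variants are rejected by the default rule and reported once the rescue threshold of 45% is enabled, which also makes clear why enabling it everywhere would admit more strand-biased false positives.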

Two nonpathogenic variants identified in the ISS data were missed in the Sanger sequencing data (Table 3). These variants remained undetected due to their location opposite to a deletion causing a frame shift in the other allele. These findings underline the fact that false-negative variants are not unique to NGS, but they can also occur in the analysis of Sanger sequencing data for various reasons. We suggest using Sanger sequencing both to confirm any suspicious variant detected by ISS and to repeat the analysis of any amplicon showing a coverage below 40×. Because all false-negative variants were present in the raw data but could not be called except with settings that also generated many additional false-positive calls (23), we conclude that the false-negative detection rate is largely due to suboptimal data analysis, a problem that might be solved by testing or developing alternative data analysis software tools. Notwithstanding, it might be worthwhile to test alternative NGS-based sequencing machines such as the Illumina MiSeq (19, 20, 24) to further improve performance, because the approach we used here is not restricted to ISS. In particular, the use of alternative platforms could reduce the number of false-positive variants that occur mainly in homopolymer stretches, which is an inherent weakness related to the principle of sequencing by synthesis (25).

Table 4. Statistical measures of the 3 blind tests.

Statistical measuresᵃ                    Blind 1         Blind 2         Blind 3
Number of amplicons                      393             583             387
Number of unique genes                   48              39              32
Number of samples                        35              39              26
Number of used barcodes                  2               2               2
Reads per run                            2,185,451       3,186,680       3,297,930
Target region, Mb                        ~0.1            ~0.1            ~0.1
Median coverage per amplicon (SD)        1864× (2026)    1800× (1709)    2438× (1874)
Maximum coverage per amplicon            14,649×         17,060×         12,174×
% amplicons covered <40×                 3.7%            3.4%            3.6%
Number of known variants                 51              85              93
% Substitutions                          94.9%           94.1%           91.4%
% Duplications/insertionsᵇ               1.7%            5.9%            5.4%
% Deletionsᵇ                             3.4%            0%              3.2%
Total number of sequenced nucleotides    83,923          105,561         78,383
TPs                                      51              84              93
FPs                                      22              23              13
FNs                                      0               1               0
TNs                                      83,850          105,454         78,277
TP rate (sensitivity)                    100%            98.82%          100%
TN rate (specificity)                    99.97%          99.98%          99.98%
FP rate                                  0.026%          0.022%          0.017%
Accuracy                                 99.97%          99.98%          99.98%
Precision                                69.86%          78.51%          87.74%

ᵃ See the footnotes for Tables 1 and 2 for explanations of statistical measures, abbreviation definitions, and sensitivity and specificity calculations.
ᵇ Insertions and deletions in these sequencing runs were of limited complexity and did not exceed 5 bases.

POTENTIAL IMPROVEMENTS

Currently, our Sanger PCR amplicons have a mean length of 500–600 bp but are sheared to achieve a suitable ISS read length. Because fragments need to be sheared, in principle amplicons that are initially even longer could be applied. This would circumvent the need for overlapping fragments of long coding regions and generally reduce the number of PCR reactions and thus costs. The possibility of generating larger amplicons thereby creates more flexibility in primer design and circumvents the necessity of primers to be located in a coding region. This reduces the risk of missing mutations due to overlapping primer sequences and the risk of allelic dropouts. Further improvement might be achieved through the use of an efficient alternative for enrichment. One option would be to replace the current singleplex amplification by a multiplex enrichment strategy (26–28). This would further enhance efficiency, flexibility, and cost-effectiveness of the automated sequencing work flow.

COST EFFICIENCY

With the use of the newly developed sequencing work flow, sequencing costs were markedly reduced compared to regular Sanger sequencing. On average, we receive sequencing requests for around 25 rare genes per day, all of which can now easily be processed in a single sequencing run. On a daily basis, only 2 or 3 genes are requested more than once. These requests need to be allocated to different pools. The number of recurrent requests per day in the end will determine the number of library preparations and thus the exact price per amplicon. Sequencing costs are approximately 1.45 Euros (~$1.96 US) per amplicon compared to 4.44 Euros (~$6.00 US) per amplicon for Sanger sequencing for exactly the same amplicons, resulting in a cost reduction of 60%–70% for amplicons with an altered sequencing regime (see online Supplemental Table 6). The advantage of our described ISS work flow is highly dependent on the variety of requested tests. Different tests can be combined in one sequencing run, allowing efficient use of molecular barcodes, which determines the cost-effectiveness of this approach. Therefore, this approach is suitable only for different analyses, mostly involving SNV searching. Targeted mutation analyses for carrier testing or (pre)symptomatic testing of a familial mutation are currently not included, because these tests require the analysis of identical amplicons in the same experiment. A different pooling or enrichment strategy needs to be developed to sequence many identical amplicons in a similarly cost-effective manner.

TURNAROUND TIME

Clinical turnaround times (from sample reception to final reporting) for the newly developed work flow are more difficult to predict. For Sanger sequencing, analysis of all investigations executed in 2012 showed a mean laboratory processing time (from DNA isolation to sequencing) of 18 working days, with 95% of the investigations being finished within 35 days (29). The most time-consuming factor is rework that must be performed when one of the subtests fails. Currently, 40% of all genes tested harbor at least 1 amplicon that has to reenter the cycle more than once, leading to approximately 10% rework for Sanger sequencing (29). The newly developed semiconductor sequencing work flow uses the same amplicons. However, the dropout rate here depends largely on the required minimal coverage of 40×. On average, 3.5% of all amplicons tested fail to reach this minimal coverage (Table 4). The number of rework loops that must be executed determines the total turnaround time. For the breast cancer genes BRCA1 and BRCA2, the use of ISS has already reduced the average turnaround time from approximately 27 days to approximately 18 days (from sample reception to final reporting). This reduction in turnaround time is mainly due to reduction of rework. Therefore, the introduction of ISS in DNA diagnostics may not only reduce sequencing costs significantly but may also shorten turnaround times.

Conclusion

We have demonstrated the development and validation of a generic automated ISS work flow for routine genetic testing. The automated work flow we describe is based on the idea of using Sanger sequencing–optimized primers for ISS. This basic principle is completely independent of the applied automation and as such might be applicable in laboratories interested in using a benchtop NGS machine without having to invest in different enrichment designs. Capacity on 316 chips allows for 100 Mb of sequencing reads. Our mean daily sequence load of 0.1 Mb results in a far greater sequencing depth than necessary and permits great flexibility in handling the daily fluctuations characteristic of clinical diagnostic laboratories. The majority of Sanger sequencing reactions can be transferred to NGS-based sequencing without major adaptations to the sequencing work flow. The analytical sensitivity and specificity for ISS are comparable to those for the gold standard Sanger sequencing.
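The cost and capacity figures quoted in this discussion can be cross-checked with simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the depth estimate assumes that the 0.1 Mb daily load is the total target size and that reads spread evenly over it, which real runs do not do (a fraction of amplicons still falls below the 40× minimum). Under those assumptions, the per-amplicon prices give a saving of roughly two-thirds, and the chip capacity corresponds to a mean depth of about 1000×, or about 25 times the required minimum.

```python
"""Back-of-the-envelope checks for figures quoted in the text (illustrative only)."""


def cost_reduction(sanger_cost: float, iss_cost: float) -> float:
    """Fractional saving per amplicon when moving from Sanger to ISS."""
    return (sanger_cost - iss_cost) / sanger_cost


def mean_depth(chip_capacity_mb: float, daily_load_mb: float) -> float:
    """Mean fold coverage, ASSUMING reads are spread evenly over the target."""
    return chip_capacity_mb / daily_load_mb


def coverage_headroom(depth: float, min_coverage: float = 40.0) -> float:
    """How many times the mean depth exceeds the required minimal coverage."""
    return depth / min_coverage


if __name__ == "__main__":
    # Per-amplicon prices in Euros, as quoted in the cost efficiency paragraph.
    saving = cost_reduction(sanger_cost=4.44, iss_cost=1.45)
    # Chip capacity and mean daily load in Mb, as quoted in the conclusion.
    depth = mean_depth(chip_capacity_mb=100.0, daily_load_mb=0.1)
    print(f"cost reduction: {saving:.1%}")
    print(f"mean depth (even-coverage assumption): {depth:.0f}x")
    print(f"headroom over 40x minimum: {coverage_headroom(depth):.0f}-fold")
```

Because the saving depends on how many library preparations a day's recurrent requests force, the 1.45 Euro figure should be read as an average and not as a fixed price.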

Author Contributions: All authors confirmed they have contributed to the intellectual content of this paper and have met the following 3 requirements: (a) significant contributions to the conception and design, acquisition of data, or analysis and interpretation of data; (b) drafting or revising the article for intellectual content; and (c) final approval of the published article.

Authors’ Disclosures or Potential Conflicts of Interest: Upon manuscript submission, all authors completed the author disclosure form. Disclosures and/or potential conflicts of interest:

Employment or Leadership: None declared. Consultant or Advisory Role: None declared. Stock Ownership: None declared. Honoraria: None declared. Research Funding: None declared. Expert Testimony: A.R. Mensenkamp, Life Technologies, 2 meetings. Patents: None declared.

Role of Sponsor: No sponsor was declared.

Acknowledgments: We are grateful to Rowdy Meijer, Maartje Pennings, Jeroen Schoots, Hicham Ouchene, Michiel Oorsprong, and Gaby van de Ven-Schobers for the blind analysis of the sequencing data. We further thank Michael Kwint and the staff of the sequencing facility for excellent technical assistance and Jelmer Bokhorst and Diederik Passchier for ICT support.

References

1. Sanger F, Nicklen S, Coulson AR. DNA sequencing with chain-terminating inhibitors. Proc Natl Acad Sci U S A 1977;74:5463–7.
2. Sanger F, Coulson AR. A rapid method for determining sequences in DNA by primed synthesis with DNA polymerase. J Mol Biol 1975;94:441–8.
3. Shendure J, Mitra RD, Varma C, Church GM. Advanced sequencing technologies: methods and goals. Nat Rev Genet 2004;5:335–44.
4. Margulies M, Egholm M, Altman WE, Attiya S, Bader JS, Bemben LA, et al. Genome sequencing in microfabricated high-density picolitre reactors. Nature 2005;437:376–80.
5. Metzker ML. Emerging technologies in DNA sequencing. Genome Res 2005;15:1767–76.
6. Mardis ER. The impact of next-generation sequencing technology on genetics. Trends Genet 2008;24:133–41.
7. Schuster SC. Next-generation sequencing transforms today’s biology. Nat Methods 2008;5:16–8.
8. Wheeler DA, Srinivasan M, Egholm M, Shen Y, Chen L, McGuire A, et al. The complete genome of an individual by massively parallel DNA sequencing. Nature 2008;452:872–6.
9. Ng SB, Nickerson DA, Bamshad MJ, Shendure J. Massively parallel sequencing and rare disease. Hum Mol Genet 2010;19:R119–24.
10. Bamshad MJ, Ng SB, Bigham AW, Tabor HK, Emond MJ, Nickerson DA, Shendure J. Exome sequencing as a tool for mendelian disease gene discovery. Nat Rev Genet 2011;12:745–55.
11. Metzker ML. Sequencing technologies: the next generation. Nat Rev Genet 2010;11:31–46.
12. Voelkerding KV, Dames SA, Durtschi JD. Next-generation sequencing: from basic research to diagnostics. Clin Chem 2009;55:641–58.
13. de Ligt J, Willemsen MH, van Bon BW, Kleefstra T, Yntema HG, Kroes T, et al. Diagnostic exome sequencing in persons with severe intellectual disability. N Engl J Med 2012;367:1921–9.
14. Rauch A, Wieczorek D, Graf E, Wieland T, Endele S, Schwarzmayr T, et al. Range of genetic mutations associated with severe non-syndromic sporadic intellectual disability: an exome sequencing study. Lancet 2012;380:1674–82.
15. O’Roak BJ, Deriziotis P, Lee C, Vives L, Schwartz JJ, Girirajan S, et al. Exome sequencing in sporadic autism spectrum disorders identifies severe de novo mutations. Nat Genet 2011;43:585–9.
16. Neveling K, Feenstra I, Gilissen C, Hoefsloot LH, Kamsteeg EJ, Mensenkamp AR, et al. A post-hoc comparison of the utility of Sanger sequencing and exome sequencing for the diagnosis of heterogeneous diseases. Hum Mutat 2013.
17. Weiss MM, Van der Zwaag B, Jongbloed JD, Vogel MJ, Bruggenwirth HT, Lekanne Deprez RH, et al. Best practice guidelines for the use of next-generation sequencing applications in genome diagnostics: a national collaborative study of Dutch genome diagnostic laboratories. Hum Mutat 2013;34:1313–21.
18. Jiang Q, Turner T, Sosa MX, Rakha A, Arnold S, Chakravarti A. Rapid and efficient human mutation detection using a bench-top next-generation DNA sequencer. Hum Mutat 2012;33:281–9.
19. Loman NJ, Misra RV, Dallman TJ, Constantinidou C, Gharbia SE, Wain J, Pallen MJ. Performance comparison of benchtop high-throughput sequencing platforms. Nat Biotechnol 2012;30:434–9.
20. Li X, Buckton AJ, Wilkinson SL, John S, Walsh R, Novotny T, et al. Towards clinical molecular diagnosis of inherited cardiac conditions: a comparison of bench-top genome DNA sequencers. PLoS One 2013;8:e67744.
21. Ion Xpress Plus gDNA Fragment Library preparation. Document number 4471989, revision N. Carlsbad (CA): Life Technologies; 2013.
22. Rehm HL, Bale SJ, Bayrak-Toydemir P, Berg JS, Brown KK, Deignan JL, et al. ACMG clinical laboratory standards for next-generation sequencing. Genet Med 2013;15:733–47.
23. Yeo ZX, Chan M, Yap YS, Ang P, Rozen S, Lee AS. Improving indel detection specificity of the ion torrent PGM benchtop sequencer. PLoS One 2012;7:e45798.
24. Liu L, Li Y, Li S, Hu N, He Y, Pong R, et al. Comparison of next-generation sequencing systems. J Biomed Biotechnol 2012;2012:251364.
25. Bragg LM, Stone G, Butler MK, Hugenholtz P, Tyson GW. Shining a light on dark sequencing: characterising errors in Ion Torrent PGM data. PLoS Comput Biol 2013;9:e1003031.
26. O’Roak BJ, Vives L, Fu W, Egertson JD, Stanaway IB, Phelps IG, et al. Multiplex targeted sequencing identifies recurrently mutated genes in autism spectrum disorders. Science 2012;338:1619–22.
27. Hiatt JB, Pritchard CC, Salipante SJ, O’Roak BJ, Shendure J. Single molecule molecular inversion probes for targeted, high-accuracy detection of low-frequency variation. Genome Res 2013;23:843–54.
28. Berglund EC, Lindqvist CM, Hayat S, Overnas E, Henriksson N, Nordlund J, et al. Accurate detection of subclonal single nucleotide variants in whole genome amplified and pooled cancer samples using Haloplex target enrichment. BMC Genomics 2013;14:856.
29. van Heur JJP. Analysis of the flow time of hereditary disease investigations at the human genetics department of UMC St Radboud [Master’s thesis]. Eindhoven, The Netherlands: Eindhoven University of Technology. 2013:88pp.
