Transposon Rescues Short Sequencing Reads

The vast majority of DNA sequencers now generate short sequence reads. Although they are very efficient, many biological problems cannot be solved with short fragments alone. A few days ago, a research team at Illumina reported a simple solution: they used a transposase to temporarily hold short fragments together, preserving their order and proximity.

In genomic DNA, a single-base difference between individuals is called a single nucleotide polymorphism (SNP). Neighboring SNPs tend to be passed on to offspring together as a unit, and such a set of linked SNPs in a specific chromosomal region is called a haplotype.
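To illustrate why haplotypes cannot simply be read off from genotypes, here is a minimal Python sketch. It assumes a diploid individual and uses made-up genotypes; `haplotype_pairs` is a hypothetical helper, not part of any published tool. With n heterozygous SNPs, unphased genotypes are consistent with 2^(n-1) different haplotype pairs, and the sketch enumerates them.

```python
from itertools import product


def haplotype_pairs(genotypes):
    """Enumerate unordered haplotype pairs consistent with unphased genotypes.

    genotypes: list of (allele_a, allele_b) tuples, one per SNP site
    (hypothetical illustrative input, diploid organism assumed).
    """
    pairs = set()
    # At a heterozygous site either allele may sit on haplotype 1;
    # at a homozygous site there is only one choice.
    choices = [((a, b), (b, a)) if a != b else ((a, a),) for a, b in genotypes]
    for combo in product(*choices):
        hap1 = "".join(site[0] for site in combo)
        hap2 = "".join(site[1] for site in combo)
        # Sort so that (h1, h2) and (h2, h1) count as the same pair.
        pairs.add(tuple(sorted((hap1, hap2))))
    return pairs


# Three heterozygous sites and one homozygous site (made-up data).
genotypes = [("A", "G"), ("C", "T"), ("G", "G"), ("A", "T")]
for pair in sorted(haplotype_pairs(genotypes)):
    print(pair)
```

With three heterozygous sites the sketch finds 2^(3-1) = 4 candidate pairs; only information about which alleles sit on the same DNA molecule can pick the correct one.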

Haplotypes can help researchers find genetic variants that affect gene function, match transplant donors with recipients, and understand structural variation in cancer cell genomes, among other uses. Determining haplotypes depends on proximity: researchers need to know which variants lie near each other on the same chromosome.

When preparing DNA for sequencing, researchers often use Tn5 transposase to fragment the DNA and add adapter sequences in a single step. The robust Tn5 enzyme remains bound after cutting, holding the resulting fragments together until it is chemically removed. Building on this property, the team developed CPT-seq (contiguity-preserving transposition sequencing), which preserves proximity information.

In CPT-seq, the researchers first divided diluted genomic DNA into 96 aliquots and used the transposase to add a distinct index sequence to each. They then pooled the aliquots and redistributed them into 96 new compartments, where the transposase was removed and a second index sequence was introduced by amplification. The two indexes together give 96 × 96 = 9,216 combinations, that is, more than 9,200 virtual compartments. This dual-indexing protocol takes only three hours and works with small amounts of genomic DNA.
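The combinatorial indexing step can be sketched as a small simulation. This is a simplified model under stated assumptions: each DNA molecule (whose fragments stay together while Tn5 is bound) lands uniformly at random in one first-round well and, after pooling, in one second-round well. The names `N_FIRST`, `N_SECOND` and `combinatorial_index`, and the molecule count of 2,000, are hypothetical choices for illustration.

```python
import random
from collections import defaultdict

N_FIRST = 96   # first-round wells: index added by the transposase (assumed uniform loading)
N_SECOND = 96  # second-round wells: index added by amplification after transposase removal


def combinatorial_index(n_molecules, seed=0):
    """Assign each DNA molecule a (first, second) index pair.

    Returns a dict mapping (i, j) -> list of molecule ids that share
    that virtual compartment. Hypothetical model, not the published protocol.
    """
    rng = random.Random(seed)
    compartments = defaultdict(list)
    # Round 1: split molecules across the first set of wells.
    round1 = [(mol, rng.randrange(N_FIRST)) for mol in range(n_molecules)]
    # Pool and redistribute: each molecule, still held together by the
    # bound transposase, moves as one unit into a random second-round well.
    for mol, i in round1:
        j = rng.randrange(N_SECOND)
        compartments[(i, j)].append(mol)
    return compartments


print(N_FIRST * N_SECOND)  # 9216 virtual compartments
comps = combinatorial_index(2000)
mean_load = 2000 / (N_FIRST * N_SECOND)
print(f"{len(comps)} occupied compartments, mean load {mean_load:.3f} molecules")
```

The point of the second round is multiplicative: two rounds of 96 wells each give far more distinct index pairs than either round alone, so any single virtual compartment is likely to hold only a small number of molecules.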

The study showed that CPT-seq, together with its freely available analysis software, can phase more than 95% of heterozygous variants. The method's error rate is very low, with only one or two errors per 10 Mb. The researchers believe these advantages will make haplotyping a routine procedure.
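How compartment indexes turn into haplotypes can be sketched with a simple greedy rule. This is not the authors' analysis software; it assumes that a virtual compartment rarely holds both chromosome copies of the same region, so alleles seen together in one compartment most likely come from the same haplotype. The function `phase_blocks` and the observation data are hypothetical. Alleles are coded 0 (reference) and 1 (alternate), and adjacent heterozygous sites are linked by majority vote.

```python
from collections import Counter


def phase_blocks(observations, n_sites):
    """Greedily phase heterozygous sites from compartment co-occurrence.

    observations: dict mapping compartment id -> {site_index: allele (0 or 1)}.
    Returns a list of phase blocks; each block is a list of (site, allele)
    for one haplotype (the other haplotype is the complement).
    """
    blocks = [[(0, 0)]]  # arbitrarily place the reference allele of site 0 on haplotype 1
    for s in range(1, n_sites):
        votes = Counter()
        for alleles in observations.values():
            if s - 1 in alleles and s in alleles:
                # Same allele code at both sites => haplotype keeps the same code.
                votes["same" if alleles[s - 1] == alleles[s] else "flip"] += 1
        if not votes:
            # No compartment spans both sites: start a new, independently phased block.
            blocks.append([(s, 0)])
            continue
        prev = blocks[-1][-1][1]
        allele = prev if votes["same"] >= votes["flip"] else 1 - prev
        blocks[-1].append((s, allele))
    return blocks


# Made-up compartment observations for six heterozygous sites.
observations = {
    "c1": {0: 0, 1: 1},
    "c2": {0: 1, 1: 0, 2: 0},
    "c3": {1: 1, 2: 1},
    "c4": {2: 1, 3: 0},
    "c5": {4: 1, 5: 1},
}
print(phase_blocks(observations, 6))
# [[(0, 0), (1, 1), (2, 1), (3, 0)], [(4, 0), (5, 0)]]
```

Because no compartment links sites 3 and 4, the sketch reports two separate phase blocks rather than guessing how they connect.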

In a separate study, researchers at the University of Washington used CPT-seq to tackle de novo genome assembly. Applying CPT-seq together with new algorithms to the human, mouse and Drosophila genomes, they showed that it can anchor unplaced contigs to the reference genome and detect misassembled sequences.
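One way to anchor a contig with CPT-seq data is to look for shared compartment indexes, since reads from the same long molecule carry the same index pair. The sketch below is an illustrative barcode-overlap heuristic, not the algorithm used in the University of Washington study. The function `anchor_contig`, the `min_shared` threshold, the window names and all index sets are hypothetical.

```python
def anchor_contig(contig_barcodes, window_barcodes, min_shared=2):
    """Place an unplaced contig next to the reference window whose reads
    share the most compartment indexes with the contig's reads.

    contig_barcodes: set of (i, j) index pairs seen in reads from the contig.
    window_barcodes: dict mapping window name -> set of (i, j) index pairs.
    Returns the best window name, or None if support is below min_shared.
    """
    best, best_shared = None, 0
    for window, barcodes in window_barcodes.items():
        shared = len(contig_barcodes & barcodes)
        if shared > best_shared:
            best, best_shared = window, shared
    return best if best_shared >= min_shared else None


# Hypothetical reference windows and the index pairs observed in each.
windows = {
    "chr1:0-1Mb": {(1, 5), (2, 7), (3, 9)},
    "chr1:1-2Mb": {(4, 4), (5, 6), (6, 8), (7, 1)},
    "chr2:0-1Mb": {(8, 2), (9, 3)},
}
contig = {(5, 6), (6, 8), (7, 1), (9, 3)}
print(anchor_contig(contig, windows))  # chr1:1-2Mb
```

A real tool would also need to normalize for how many indexes each window carries and test whether the overlap exceeds chance; the same idea run in reverse (a sudden drop in shared indexes between adjacent parts of an assembly) is one way misassemblies can show up.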