
Accelerated Sequence Alignment

Comparing sequences with each other and finding the best-fitting alignment takes up a major share of time in a molecular biologist's common workflow. Because of their long execution times, high-quality alignment algorithms such as Smith-Waterman have almost been replaced by "average-quality yet fast" heuristic approaches such as BLAST.

With SciEngines computers and clusters, it is now possible to revive non-heuristic alignment algorithms and give biologists back high-quality alignment results within short time frames. Heuristic approaches also benefit from the RIVYERA, as new levels of analysis can be reached. The common constraint "it's going to take too long to calculate" simply no longer applies with an FPGA cluster that provides the performance of thousands of PC cores.

 

Smith-Waterman

SciEngines is excited to provide a highly efficient and scalable implementation of the Smith-Waterman algorithm for any computer of the RIVYERA line. With speeds of up to 6 TCUPS (tera cell updates per second) per computer, this exact, high-quality alignment method has become practical again, including in next-generation sequencing scenarios, and helps avoid both the time-consuming elimination of false positives and the loss of crucial information.

Main features of the accelerated Smith-Waterman:

  • User-definable scoring matrix. Use NUC22 or your own.

  • Affine gap penalties supported.

  • Calculation of reverse complement supported.

  • Multiple query files and multiple database files can be defined for comparison.

  • Plain-text output as well as SAM output supported.
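
To make the features above concrete, here is a minimal software sketch of Smith-Waterman local alignment scoring with affine gap penalties (the Gotoh recurrence), a pluggable scoring function, and an optional search of the reverse complement. It is not the RIVYERA implementation. All function names, the default match/mismatch scores, and the gap convention (a gap of length k costs gap_open + (k - 1) * gap_extend) are assumptions chosen for illustration.

```python
NEG_INF = float("-inf")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A<->T, C<->G, N stays N)."""
    table = str.maketrans("ACGTN", "TGCAN")
    return seq.upper().translate(table)[::-1]


def simple_score(a: str, b: str, match: int = 2, mismatch: int = -1) -> int:
    """Hypothetical stand-in for a user-defined scoring matrix lookup."""
    return match if a == b else mismatch


def smith_waterman_affine(query, target, score=simple_score,
                          gap_open=3, gap_extend=1):
    """Best local alignment score of query against target.

    Assumed convention: a gap of length k costs gap_open + (k - 1) * gap_extend.
    Only two rows are kept, so memory is linear in len(target).
    """
    n = len(target)
    h_prev = [0] * (n + 1)        # H: best score of an alignment ending at (i-1, j)
    f_prev = [NEG_INF] * (n + 1)  # F: best score ending in a vertical gap
    best = 0
    for i in range(1, len(query) + 1):
        h_cur = [0] * (n + 1)
        f_cur = [NEG_INF] * (n + 1)
        e = NEG_INF               # E: best score ending in a horizontal gap
        for j in range(1, n + 1):
            # Either open a new gap from H or extend the existing one.
            e = max(h_cur[j - 1] - gap_open, e - gap_extend)
            f_cur[j] = max(h_prev[j] - gap_open, f_prev[j] - gap_extend)
            diag = h_prev[j - 1] + score(query[i - 1], target[j - 1])
            # The 0 lets an alignment restart anywhere (local alignment).
            h_cur[j] = max(0, diag, e, f_cur[j])
            best = max(best, h_cur[j])
        h_prev, f_prev = h_cur, f_cur
    return best


def best_score_both_strands(query, target, **kwargs):
    """Score the query and its reverse complement; return the higher score."""
    forward = smith_waterman_affine(query, target, **kwargs)
    reverse = smith_waterman_affine(reverse_complement(query), target, **kwargs)
    return max(forward, reverse)
```

Every query/target pair fills a table of len(query) × len(target) cells, which is why throughput for this algorithm is quoted in cell updates per second.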

 

The currently available version of Smith-Waterman provides a command-line interface. It uses the same commands as the NCBI versions of the software, so integrating this accelerated solution with your existing infrastructure and analysis pipelines is hassle-free. For even higher usability, SciEngines is also working with CLC bio on integrating the different CLC solutions with the RIVYERA platform. Please feel free to contact us or your local CLC representative if you have questions about this offering.

Performance: With the above-mentioned features enabled and run on a real-life dataset, the enormous speed of FPGA computers compared to regular hardware, as well as their linear scaling, becomes obvious. Using this high-quality alignment algorithm, only half a rack of RIVYERA S3-5000 is sufficient to align, in one day, more than 3.5 billion base pairs and their complements against, for example, hg19 chromosome 1, or more than 18 billion base pairs and their complements against hg19 chromosome 21.
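
As a rough plausibility check, the daily workloads quoted above can be converted into a sustained cell-update rate. The sketch below assumes that one cell update corresponds to one query base compared against one reference base, that aligning the complements doubles the work, and that the hg19 chromosome lengths are the values given in the code. None of these assumptions are stated on this page, and the function name is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Assumed hg19 chromosome lengths in base pairs (not taken from this page).
HG19_CHR1_BP = 249_250_621
HG19_CHR21_BP = 48_129_895


def required_tcups(query_bp, reference_bp, strands=2, seconds=SECONDS_PER_DAY):
    """Sustained tera cell updates per second needed to finish within `seconds`.

    Assumes a full dynamic-programming table: every query base is compared
    against every reference base, once per strand.
    """
    cell_updates = query_bp * strands * reference_bp
    return cell_updates / seconds / 1e12


chr1_rate = required_tcups(3.5e9, HG19_CHR1_BP)
chr21_rate = required_tcups(18e9, HG19_CHR21_BP)
print(f"chr1:  {chr1_rate:.1f} TCUPS")
print(f"chr21: {chr21_rate:.1f} TCUPS")
```

Under these assumptions both workloads come out at roughly 20 TCUPS, so the two examples describe about the same sustained rate.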

S-W performance benchmark graph

Additional resources for download:

Whitepaper: Smith-Waterman

Press Release: CLC & SciEngines collaboration