Life sciences research continues to evolve rapidly. While structural biology, drug discovery and materials science are still important, we’re seeing an increasing focus on analytics and the more effective use of data. The race to understand patients and diseases at the molecular level to achieve precision medicine is fueling this shift.
Recently my colleague Ted Slater wrote about precision medicine and the role genomics is playing in improving cancer care. The development of next-generation sequencing (NGS) technologies has driven explosive growth in the use of genomics for research into human disease, agriculture and evolutionary science, with new developments aimed at greater accuracy, faster results and lower costs.
With the rapid evolution of technologies, life science organizations are struggling to keep their conventional compute infrastructure up to date. The industry challenge is summarized by Chris Dagdigian of The BioTeam: “. . . [T]oday’s Bio-IT professionals have to design, deploy, and support IT infrastructures with life cycles measured over several years, in the face of an innovation explosion where major laboratory and research enhancements arrive on the scene every few months.”
At Cray, we are meeting the increased use of analytics and big data within the life science research community by incorporating supercomputing technologies into analytics solutions. Additionally, we are seeking out a few strategic partners who leverage advances in information technology such as Apache Hadoop® and Apache Spark™. Historically, bioinformatics codes have not taken advantage of Hadoop or Spark, but a few companies, Lumenogix and BioDatomics among them, have developed wrappers around bioinformatics codes so they can run in these analytics environments. Wrapping the code eliminates the need to rewrite it in order to move it to Hadoop or Spark, and provides a convenient way to incorporate updates and changes, as sketched below. Another key benefit of working with these partners is the ability to capture relevant metadata associated with NGS analysis, enabling researchers and analysts to repeat experiments and giving them the information needed to compare results across different analytical runs.
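To give a rough flavor of the wrapping idea, here is a minimal PySpark sketch that uses Spark's pipe() transformation to stream sample files through an unmodified command-line tool. The file paths and the align_sample.sh wrapper script are hypothetical placeholders for illustration, not our partners' actual code.

```python
# Minimal PySpark sketch of wrapping an existing bioinformatics tool so it runs
# under Spark without being rewritten. The paths and align_sample.sh script are
# hypothetical placeholders, not Lumenogix or BioDatomics code.
from pyspark import SparkContext

sc = SparkContext(appName="ngs-wrapper-sketch")

# One record per sample: the path to its FASTQ file on the shared file system.
samples = sc.parallelize([
    "/lustre/ngs/sample_01.fastq",
    "/lustre/ngs/sample_02.fastq",
    "/lustre/ngs/sample_03.fastq",
])

# pipe() streams each record to the unmodified tool's stdin and collects its
# stdout, so samples are processed in parallel with no changes to the tool.
aligned = samples.pipe("./align_sample.sh")

# Print the per-sample output paths; in practice these, plus run metadata,
# would be recorded so analyses can be repeated and compared.
for line in aligned.collect():
    print(line)

sc.stop()
```

The point of the sketch is simply that the existing tool stays a black box: Spark handles the distribution across samples, and the metadata about each run can be captured alongside the output.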
Now the cool stuff. We're working with these partners and others, including Intel, to test NGS workflows on Cray's Urika-XA™ extreme analytics platform. The Urika-XA platform is optimized for Hadoop and Spark environments, with over 1,500 cores, fast SSD storage at the node level, a POSIX-compliant parallel file system and 6 TB of RAM, all in one cabinet. This architecture lets researchers run their NGS workflows, perform reanalysis, and then move on to any other analysis or annotation that runs on Hadoop or Spark, all while limiting data movement and maintaining a small footprint in the data center. Our initial results are exciting: we've increased the number of samples that can be processed in parallel and reduced the time it takes to process both exome and whole-genome samples.
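As a hedged illustration of that "any other analysis on the same platform" point, the sketch below filters variant calls that an NGS workflow has already written to the shared file system, so the data never has to leave the system. The file path, column layout and quality threshold are assumptions made for this example only.

```python
# Hypothetical downstream step on the same platform: filter variant calls that
# already sit on the shared parallel file system using Spark SQL, without
# moving the data elsewhere. Path, columns and threshold are illustrative only.
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext(appName="variant-filter-sketch")
sqlContext = SQLContext(sc)

# Load a tab-separated table of variant calls produced by the NGS workflow.
rows = sc.textFile("/lustre/ngs/results/variants.tsv") \
         .map(lambda line: line.split("\t")) \
         .map(lambda f: (f[0], int(f[1]), f[2], f[3], float(f[4])))

variants = sqlContext.createDataFrame(rows, ["chrom", "pos", "ref", "alt", "qual"])

# Keep only high-confidence calls; annotation joins or summaries could follow.
high_conf = variants.filter(variants.qual > 30.0)
high_conf.show()

sc.stop()
```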
Join us next week at the annual Bio-IT conference, where I'll discuss how high-performance computing technologies and analytics are applied to NGS workflows. I'll be presenting on April 22 at 5 p.m. EDT.