Researchers at Cold Spring Harbor Laboratory have developed a software program that they say corrects a crucial flaw in technology used to sequence genomes in plants and animals.
Biologists around the world are scrambling to decode the genetic material of flora and fauna to grow more bountiful crops, develop biofuels and discover new links between humans and animals. While the technology has vastly improved over the years, the results are often riddled with errors, leaving scientists with incomplete and blurry pictures of DNA, the molecular building block of heredity.
The method developed at Cold Spring Harbor, detailed Sunday in the journal Nature Biotechnology, uses mathematical algorithms to correct most of those errors. One of the researchers likened the program to a pair of eyeglasses that brings bleary DNA images into focus.
"It starts really fuzzy. Then we put on these glasses, and, aha! Everything is clear," said Michael Schatz, a quantitative biologist at Cold Spring Harbor Laboratory.
The technology could be used to partially decipher human genomes. But it will most likely be used by biologists working on simpler organisms who hope to replicate the recent success in decoding the tomato genome, Schatz said.
It also has commercial implications, particularly for Pacific Biosciences of California Inc., a company that makes the sequencing machine Cold Spring Harbor researchers used to develop the software.
"Clearly it makes our technology much more valuable," Pacific Biosciences chief executive Michael Hunkapiller said.
Researchers at Cold Spring Harbor began working with the machine in 2010, when the company gave Schatz and his team a model to test. The scientists have made their software, which improves the accuracy of the device's output, available for free online.
Pioneered in the 1970s, the earliest genome sequencing techniques were exceedingly slow. Researchers first decoded the genome of a microbe in 1995. Six years later, hundreds of scientists from around the world teamed up to complete the decade-long effort to sequence a human genome.
Today, there are two primary methods, and neither is perfect.
Think of a genome as a long string of code, with four letters repeating billions of times in patterns.
One decoding method reads short DNA strings of about 100 letters and is very accurate. But each string offers only a limited picture of an organism's genetic material. A second method reads longer strings of about 1,000 DNA letters. But it is highly error-prone, getting every fourth or fifth character in the chain wrong, an error rate of roughly 20 to 25 percent.
The software essentially combines the two methods, using algorithms that pick up patterns from the accurate short strings and use them to find and fix errors in the long ones. The result is a long string of DNA that is more than 99 percent accurate.
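For readers who want to see the idea in miniature, the following Python sketch illustrates the general principle of correcting a long, noisy string with short, accurate ones. It is not the Cold Spring Harbor software. It rests on simplifying assumptions: the errors are substitutions only (real reads also contain inserted and deleted letters, which require proper alignment), the short strings are error-free, and every function name here is hypothetical.

```python
import random
from collections import Counter

ALPHABET = "ACGT"


def add_substitution_errors(seq, error_rate, rng):
    """Replace each letter with a different one with probability error_rate."""
    out = []
    for base in seq:
        if rng.random() < error_rate:
            out.append(rng.choice([b for b in ALPHABET if b != base]))
        else:
            out.append(base)
    return "".join(out)


def best_offset(short_read, long_read):
    """Find where the short read fits the long read with the fewest mismatches."""
    best, best_mismatches = 0, len(short_read) + 1
    for offset in range(len(long_read) - len(short_read) + 1):
        window = long_read[offset:offset + len(short_read)]
        mismatches = sum(1 for a, b in zip(short_read, window) if a != b)
        if mismatches < best_mismatches:
            best, best_mismatches = offset, mismatches
    return best


def correct_long_read(long_read, short_reads):
    """Place each short read on the long read, then take a per-position vote."""
    votes = [Counter() for _ in long_read]
    for short_read in short_reads:
        offset = best_offset(short_read, long_read)
        for i, base in enumerate(short_read):
            votes[offset + i][base] += 1
    # Positions with no short-read coverage keep the original letter.
    return "".join(
        v.most_common(1)[0][0] if v else original
        for v, original in zip(votes, long_read)
    )


def identity(a, b):
    """Fraction of positions at which two equal-length strings agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def demo(seed=0):
    """Simulate one 1,000-letter noisy read and overlapping 100-letter reads."""
    rng = random.Random(seed)
    genome = "".join(rng.choice(ALPHABET) for _ in range(1000))
    long_read = add_substitution_errors(genome, 0.2, rng)  # about 1 error in 5
    short_reads = [genome[i:i + 100] for i in range(0, 901, 25)]
    corrected = correct_long_read(long_read, short_reads)
    return identity(long_read, genome), identity(corrected, genome)


if __name__ == "__main__":
    before, after = demo()
    print(f"accuracy before correction: {before:.3f}")
    print(f"accuracy after correction:  {after:.3f}")
```

In this toy setting the short strings can be placed reliably because a correct placement disagrees with the noisy string at about one letter in five, while a wrong placement disagrees at about three in four. Real data is harder, since short strings also contain occasional errors and genomes contain repeated stretches that can match in more than one place.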