by Eli Kintisch
giant leap for humility." "We're nothing special." "Scientists find
only half as many genes as expected." These are some of the headlines that
appeared after papers on the draft genome were published in February. Both
the public and private projects estimated we had just 30,000 to 40,000
genes, far fewer than most previous figures suggested -- and barely more than
But the low estimates have ignited a firestorm of controversy. William Haseltine, head of biotech company Human Genome Sciences (HGS) in Rockville, Maryland, has been the most outspoken critic, attacking both the quality of the draft sequences and the gene-finding efforts of those who compiled them. "They're reading smudged text through foggy glasses," he recently snarled. Haseltine claims to have found more than 90,000 genes, while companies such as Affymetrix sell gene chips based on more than 60,000 genes and DoubleTwist puts the number above 65,000.
But Craig Venter, head of Celera Genomics, the private rival to the public genome consortium, is standing by the lower estimate. He calls it a "truth serum" for his competitors. So are these companies wasting hundreds of millions of dollars on a wild goose chase? Or could the public consortium and Celera end up delaying the development of medical tests and treatments by denying the existence of large numbers of genes?
The accuracy of the draft genome is not the issue. The controversy is about how you find the fragmented parts of the genome that actually code for proteins. There are 26,000 genes that researchers more or less agree on. In the papers in Nature and Science, the public consortium and Celera estimated that there are about another 10,000, based on computer programs that search raw sequences for stretches that resemble known genes.
The programs tend to throw up lots of genes that don't really exist. To avoid counting these, Celera and the consortium demanded evidence that gene candidates really are transcribed to make the messenger RNAs that cells use to make proteins. "But we only have transcription evidence for half the genes in the body," admits geneticist Michael Zhang at Cold Spring Harbor Laboratory in New York.
That's where HGS and similar genome companies come in. Instead of looking at the raw sequence, they find genes by combing thousands of different cells for bits of mRNA. These are then turned into bits of DNA called expressed sequence tags, or ESTs. Haseltine claims his ESTs provide evidence for more than 10,000 genes that aren't in the consortium's database. "We have made functional proteins, some of which we are developing as drugs, that are not annotated as even existing in that text," he says. But Celera and the consortium claim their estimates include these proteins.
Another problem with gene-finding programs is that they can only look for code that resembles known genes. So they not only turn up candidate genes that don't really exist, they also miss lots of real genes. "Historically, gene-prediction programs have tended to miss over 50 percent of genes," says geneticist Michael Snyder of Yale University. A group at Ohio State University in Columbus has analysed the same data that the consortium looked at and estimates there are actually about 80,000 genes. In an as yet unpublished paper, it claims that the consortium's software has missed nearly 850,000 gene segments for which there is protein or RNA evidence.
While the debate should be settled eventually, the uncertainty could have far-ranging implications. Some fear that undiscovered genes -- and thus potential drug targets -- could fall through the cracks. While many labs continue to mine the genome for new genes, some are finding it difficult to get funding. The head of one biomedical research lab, who preferred not to be named, says his funders recently asked him why he was continuing to look for genes when the "genome was finished." "People should not give up the gene count," warns Haseltine.
Meanwhile, the rival parties are heading for a showdown. "There's a simple way to settle the question," says Eric Lander of the Whitehead Institute, one of the leaders of the consortium. "Let's randomly select 3 percent of the genome, have everyone declare the genes that they believe to be in that region, and test the proposed genes." Let the games begin.
May 14, 2001 (http://www.monitor.net/monitor) All Rights Reserved. Contact email@example.com for permission to use in any format.