Sunday, November 15, 2015

the grammar of the genome?

In my research for "The Library of Life" (a chapter of my project that I'll post about soon) I studied the patterns of a common metaphor used for DNA and genomic research: that the genome is a book, made up of nucleotide base-pairs that are its letters, genes which are words or sentences, all susceptible to being read, copied, or edited.  One good study of the metaphor is "From sequencing to annotating: extending the metaphor of the book of life from genetics to genomics" by Iina Hellsten, in New Genetics and Society (2005). For the theme of Species and Print, the implications of the metaphor are that a specimen's genome is a copy of a species-book, ignoring the fact that organisms of a given species do not share the exact genome-text with others. The first heyday of this was around the time that the Human Genome Project was begun, in 1989-90, although as we'll see it began much earlier.

In the last week I've read two prominent news stories, in the New Yorker for November 16, 2015, and the New York Times Magazine for Nov. 15th, about the recently developed CRISPR technique in genomics, devised by Jennifer Doudna at UC-Berkeley, and Feng Zhang at Harvard (and many others no doubt). These pieces again use the same metaphors of the genome text. What should give us pause is that the promise that genetic engineering would be as easy as reading and writing did not come true the first time, because the metaphor was not so solid, and it likely won't come true this time either.


Zhang is quoted in the New Yorker story saying: "Imagine being able to manipulate a specific region of DNA...almost as easily as correcting a typo" and the Times piece echoes this: "Some researchers have compared Crispr to a word processor, capable of effortlessly editing a gene down to the level of a single letter." Note that the sentences end with "typo" and "letter," not "gene" or "nucleotide" or "G, T, A, or C," as if the metaphor were so widely accepted that the actual DNA stuff need not be described or explained.


The most powerful study of the interrelations between genomics, cybernetics, and information is Who Wrote the Book of Life? (Stanford UP, 1999) by Lily Kay. It's a big tome of which I have read only parts so far, but in another interesting paper I read of how “Fred Sanger and Walter Gilbert independently invented methods to deduce the sequence of nucleotides in the DNA molecule in 1975-1977.” They hired software specialists, who borrowed techniques from the early word-processing software being invented at the same time: “they were conceiving the sequences as ‘words’ and using the algorithms--programming orders--applied in searches within texts by the word processors” (Chow-White and Garcia-Sancho in Science, Technology, and Human Values 37 [2012]:132). So it appears that this metaphor arose from the coincidence that word-processing software was developed at the same time as DNA sequencing.


The lines from this weekend's articles, and many many others I've read, suggest, as Hellsten puts it, that "one can read off genetic information unambiguously and straightforwardly" and "correct genetic misprints." But this is a misunderstanding of what genes are like, and how genomics is done. The vast majority of the genome, and therefore of the nucleotide letters, has an unknown function, or no function at all, in the development and operation of the organism. Genes are switched on or off by other genes or by events in the life and environment of the organism. Some genes block the operation of others. Some portions of the human genome, about 0.5%, are "plagiarized" from bacteria that have elbowed their way into the more noble text of the cells. [see Chris Ponting, "Plagiarized bacterial genes in the human book of life" Trends in Genetics 2001]


And the high-throughput sequencers that "read" these genes do not read like we read books. They blast apart the genome into short sections, and then computers search and match and reassemble the sequence: “Since the chain termination method of DNA sequencing can only be used for fairly short strands (100 to 1000 basepairs), longer sequences must be subdivided into smaller fragments, and subsequently re-assembled to give the overall sequence.” The sequencers produce multiple “reads” of each section of these blasted-apart genomes, and then computers match the overlapping tiles and reassemble the entire sequence. In the “next-generation” shotgun methods, instead of just 10 or 12 copies of each fragment, there may be hundreds. More shotgun cartridges, if you will, requiring far more computing power, but resulting in greater reliability. Still, there are errors [Chow-White and Garcia-Sancho, p146]. There are errors in sequence readings, and errors in genes themselves. But can we really call them "errors" corrected by an editor?
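To make the difference from reading a book concrete, here is a toy sketch in Python of how overlapping fragments can be matched and stitched back together. It is only an illustration under simplifying assumptions: the function names (overlap, greedy_assemble) and the 20-letter "genome" are made up for this post, the reads are error-free, and the greedy merging shown is a textbook simplification, not the method of any actual sequencing pipeline.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that is also a prefix of b.

    Returns 0 if the best match is shorter than min_len letters.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the two fragments that overlap the most."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing overlaps any more; leave the pieces unjoined
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        reads.append(merged)
    return reads


# A made-up 20-letter "genome", blasted into 8-letter reads that
# each share 5 letters with the next one.
genome = "ATGCGTACGTTAGCCTAGGA"
reads = [genome[i:i + 8] for i in range(0, len(genome) - 7, 3)]
print(greedy_assemble(reads))
```

Nothing here reads from left to right as a reader does. The program only compares the ends of fragments with one another, and with real data, where reads contain mistakes and genomes are full of repeated passages, the matching becomes far less certain than in this tidy example.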


So although I titled this "the grammar of the genome," in truth there is no grammar, and scarcely a syntax, for although there are significant passages, the loss of which can cause mutation or death, these gems of coherence are tiny, lost amid the inchoate gibberish and pieces of other books. So there is no narrative thread to a genome, no building suspense, and no true beginning or conclusion. There are only fragments. Post-modernists would make much of this, but would not really mean what they say.


Most people use computers most often to write and edit, whether it be novels, scientific papers, school assignments, or, most likely, text messages and social media posts. This revolution in technology has made writing easier, but made the texts more distant from our bodies. Choose what font you may, your writing no longer bears your own hand's print. And the searching and editing features we use can be applied to huge, hidden databases of text, as when we type something into a search engine, or try to find names, titles, or phrases in our computer's drives or our phone's memory. I believe that these ponds and oceans of text have the immensity and mystery of a genome, which may be in our every cell, but is likewise foreign to our consciousness. This is not like rummaging through a shoebox filled with letters from your lover or parent, looking for a memory.