No. 387: GGAGCTCGTTTTATCGAGCTCGAT

2010-06-10

Strip by: Bill Gilliland

{Garfield walks through the room carrying Pookie.}
Jon: ATAGACATCGATAACACGTCTGAGGAAACACATGCCA
CGTAAATAGACATCGATAACACGTCTGAGGAAACACAT
GCCACGTAAATAGACATCGATAACACGTCTGAGGAAAC
ACATGCCACGTAAATAGACATCGATAACACGTCTGAGG

The author writes:

The genetic code is how organisms store the sequences of their proteins. Basically, there are four nucleotides of DNA (A, C, G, T) that combine in three-letter "words" called codons. Each codon specifies an amino acid, and very complicated molecular machinery translates the string of codons into a linear protein molecule. There are 20 different amino acids, and a shorthand way to represent a protein is to use a single letter for each one. The conventional representation uses the whole alphabet except for the letters B, J, O, U, X, and Z. There's also a signal to stop the protein (called a "stop codon"), which is represented by the codons TAA, TAG, and TGA.
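For anyone who wants to read the strip without a codon chart, here is a minimal Python sketch of the translation step. It assumes the standard genetic code, written as the usual 64-letter string with the bases in T, C, A, G order; the names `CODON_TABLE` and `translate` are made up for this example, and stop codons come out as `*`.

```python
from itertools import product

# Standard genetic code: codons enumerated with bases in T, C, A, G order,
# last base varying fastest. "*" marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Read DNA three letters at a time and return one-letter amino acid codes."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Letters of the alphabet that no amino acid uses, and the stop codons.
UNUSED = sorted(set("ABCDEFGHIJKLMNOPQRSTUVWXYZ") - set(AMINO))
STOPS = sorted(codon for codon, aa in CODON_TABLE.items() if aa == "*")

print(translate("GGAGCTCGTTTTATCGAGCTCGAT"))
print(UNUSED, STOPS)
```

Running `translate` on the strip's title gives the letters it spells, and the same function works on Jon's dialogue once the line breaks are removed.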

The word "Garfield" does not contain any of those letters, so it can be read as the protein sequence glycine-alanine-arginine-phenylalanine-isoleucine-glutamic acid-leucine-aspartic acid. This protein sequence actually exists in nature (albeit as part of a larger protein) in at least one bacterium in the sequence database. The residues (GARFIELD) sit in the second line of the sequence below.
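Spelling a word out as amino acids is just a table lookup. The sketch below uses the conventional one-letter codes; `AMINO_ACID_NAMES`, `can_spell`, and `spell_out` are hypothetical names chosen for illustration.

```python
# Conventional one-letter codes for the 20 standard amino acids.
AMINO_ACID_NAMES = {
    "A": "alanine", "C": "cysteine", "D": "aspartic acid",
    "E": "glutamic acid", "F": "phenylalanine", "G": "glycine",
    "H": "histidine", "I": "isoleucine", "K": "lysine", "L": "leucine",
    "M": "methionine", "N": "asparagine", "P": "proline", "Q": "glutamine",
    "R": "arginine", "S": "serine", "T": "threonine", "V": "valine",
    "W": "tryptophan", "Y": "tyrosine",
}


def can_spell(word):
    """True if every letter of the word is a one-letter amino acid code."""
    return all(ch in AMINO_ACID_NAMES for ch in word.upper())


def spell_out(word):
    """Return the word as a hyphen-separated chain of amino acid names."""
    return "-".join(AMINO_ACID_NAMES[ch] for ch in word.upper())


print(can_spell("Garfield"), can_spell("Jon"), can_spell("Odie"))
print(spell_out("Garfield"))
```

"Jon" and "Odie" both fail the check because of the J and the O.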

>gi|94500127|ref|ZP_01306661.1| protein containing QXW lectin repeats [Oceanobacter sp. RED65]
MFKRTFIRQISALATLGFFSLHALAMSATPDPQASVDGDAITQFQNSWAGKALAHQRNMDANKPLAENNI
IGSHNSYNSRKYRNATRYLDPQQIVSIYDQLRLGARFIELDAHWTAHTHGWPWQWGTDLLLCHSGIGVDV
GDLHVGCSLTDRRVEDGIAEVARWINENPKEVIILYFEDHTDGRHQELFNVINKQLGANIYASQGCKAIP
NTLTKNQVLASGKQVIVWKDGGCSGNQNMSNMAFTSLGDINRIWEDRTSIGAIGAFFTNGSVKKIESEDV
IQAFKNGGNIVNLDDMTHSDDRLSAAIWSWDVNEPNNWGGNQDCALQWENGRWDDTSCSNQHFFACQHNE
TQEWNISTYQDAWQAGQQACSLLGNYRFSTPSNSLENEKLKTAKGGISHVWLNASDRTEEGTWITH

(You can see the string "garfiel d" split before the final "d" in the second line of the ORIGIN field of the protein database entry, linked above.)
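The split is easy to reproduce. The sketch below takes the first two lines of the sequence quoted above and reflows them the way an ORIGIN field is typically laid out, which I'm assuming here to be lowercase, 10-letter blocks, six blocks per line (the real field also carries position numbers, which are left out). `origin_lines` is a made-up helper name.

```python
# First two lines of the sequence quoted above (enough to contain the match).
SEQ = (
    "MFKRTFIRQISALATLGFFSLHALAMSATPDPQASVDGDAITQFQNSWAGKALAHQRNMDANKPLAENNI"
    "IGSHNSYNSRKYRNATRYLDPQQIVSIYDQLRLGARFIELDAHWTAHTHGWPWQWGTDLLLCHSGIGVDV"
)


def origin_lines(seq, per_block=10, blocks_per_line=6):
    """Reflow a sequence into lowercase 10-letter blocks (assumed layout)."""
    seq = seq.lower()
    per_line = per_block * blocks_per_line
    lines = []
    for start in range(0, len(seq), per_line):
        chunk = seq[start:start + per_line]
        blocks = [chunk[i:i + per_block] for i in range(0, len(chunk), per_block)]
        lines.append(" ".join(blocks))
    return lines


start = SEQ.find("GARFIELD")          # 0-based index of the match
print(SEQ[start:start + 8])
print(origin_lines(SEQ)[1])           # second line, where the split shows up
```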

This bit of protein could be encoded in DNA as GGAGCTCGTTTTATCGAGCTCGAT (as well as many other ways, since the code is redundant). To make this strip, I just encoded all the dialogue as DNA and added a stop codon for a period. The only hard part was finding a strip that didn't contain any of those six letters. O and U are surprisingly common...
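As a rough sketch of that encoding step in Python: pick one codon per amino acid, turn each letter into its codon, and turn each period into a stop codon. The codon choices in `CODON_FOR` are arbitrary picks among the synonymous options (an assumption of this example, selected so that "Garfield" matches the title), and dropping spaces and apostrophes is my reading of how the strip's dialogue was handled. `CODON_FOR`, `STOP`, and `encode` are hypothetical names.

```python
# One codon per amino acid. The code is redundant, so these are just one
# valid choice (an assumption), picked so "Garfield" matches the strip title.
CODON_FOR = {
    "A": "GCT", "C": "TGT", "D": "GAT", "E": "GAG", "F": "TTT",
    "G": "GGA", "H": "CAT", "I": "ATC", "K": "AAA", "L": "CTC",
    "M": "ATG", "N": "AAC", "P": "CCT", "Q": "CAA", "R": "CGT",
    "S": "TCT", "T": "ACG", "V": "GTT", "W": "TGG", "Y": "TAT",
}
STOP = "TAA"  # any of TAA, TAG, TGA would do for a period


def encode(text):
    """Encode dialogue as DNA; raise ValueError on B, J, O, U, X or Z."""
    out = []
    for ch in text.upper():
        if ch == ".":
            out.append(STOP)
        elif ch in CODON_FOR:
            out.append(CODON_FOR[ch])
        elif ch.isalpha():
            raise ValueError(f"no amino acid is written as {ch!r}")
        # spaces, apostrophes and other punctuation are dropped
    return "".join(out)


print(encode("Garfield"))
print(encode("I didn't see that."))
```

Because of the redundancy, the output for Jon's line will not match the strip letter for letter: the strip itself mixes synonymous codons (its dialogue opens with ATA for one I and ATC for the next), while this sketch always uses the same one.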

Original strip: 1978-10-24.