Saturday, August 4, 2007

Information, please: Part 2

As I mentioned in my last post, the newest tack creationists use is to claim that evolution cannot create any new information.
For evolution in the molecules-to-man sense to occur, there must be a mechanism for information that did not previously exist to arise from non-information. No one has ever observed such a phenomenon occurring.
There are two ways to look at this claim. First, what is meant by “new information”? Second, does evolution produce new information? I’ll look at the first question this time around and examine the evidence for new information next time.

What is “information”? This question is not as simple as it seems.

For example, consider these three strings:

aaaaaaaaaaaaaaa

5kmfd vj=mkrp3

The car is red.

All three of these strings are roughly the same length, about 15 characters. So which one has the most information?

In the scientific sense, information is described in one of two ways: Shannon information and complexity. Shannon information refers to how much raw information a string contains; its “meaning” is irrelevant. That raw information equals the length of the shortest description needed to recreate the entire sequence accurately. In other words, it is measured by the compressibility of the string. (Strictly speaking, this “shortest description” measure is algorithmic, or Kolmogorov, information, but it tracks Shannon’s measure closely for examples like these.) The string “aaaaaaaaaaaaaaa” is extremely compressible: “write ‘a’ 15 times,” or a(15). The string “5kmfd vj=mkrp3” is not very compressible at all. I need a much longer description to recreate it: “write a ‘5’, then a ‘k’…” and so on.

In terms of Shannon information, a book composed entirely of random letters, symbols and numbers contains more information than a real book with an equal number of characters, because the random characters are harder to recreate and less compressible. A practical experiment that reflects Shannon information is to use a file compression program such as WinZip to compare the compressibility of equally long text files. One such experiment compressed 100,000 copies of the letter “a,” 100,000 random letters, and 100,000 letters from Shakespeare’s The Tempest. It found compression rates of 99.3%, 2.5% and 58.4%, respectively.
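To see both ideas in action, here is a minimal Python sketch. It uses the standard library’s zlib module (DEFLATE, the method most .zip files use) in place of WinZip. It also computes per-character Shannon entropy, Σ p·log2(1/p), over the character frequencies of the three short strings above. That entropy looks only at how often each character appears, not at their order, so it is a rough stand-in. I am assuming that “random letters” means lowercase a to z drawn uniformly, and the function names are my own.

```python
import math
import random
import string
import zlib
from collections import Counter


def shannon_entropy_bits(text: str) -> float:
    """Average bits per character, estimated from the string's own character
    frequencies (the order of the characters is ignored)."""
    counts = Counter(text)
    n = len(text)
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def percent_saved(text: str) -> float:
    """Percentage of bytes removed by zlib's DEFLATE compressor at level 9."""
    raw = text.encode("ascii")
    return 100 * (1 - len(zlib.compress(raw, 9)) / len(raw))


# The three short strings from the post.
for s in ["a" * 15, "5kmfd vj=mkrp3", "The car is red."]:
    print(f"{s!r}: {shannon_entropy_bits(s):.2f} bits per character")

# Assumption: "random letters" = lowercase a-z, drawn uniformly.
rng = random.Random(2007)  # fixed seed so the run is repeatable
repeated = "a" * 100_000
random_letters = "".join(rng.choice(string.ascii_lowercase) for _ in range(100_000))
for name, text in [("repeated 'a'", repeated), ("random letters", random_letters)]:
    print(f"{name}: {percent_saved(text):.1f}% saved")
```

The repeated string should shrink by well over 99%, and the random letters by much less. The exact random-letter figure will not match the 2.5% quoted above, because it depends heavily on which characters count as “random.”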

Complexity information, on the other hand, measures how much non-random, non-compressible information a sequence contains. The more of it there is, the more complex the sequence. By this standard, the 100,000 copies of the letter “a” are non-random, but they are so compressible that they carry almost no complex information. The 100,000 random letters are barely compressible, but they are highly random. The Tempest is moderately compressible yet has very low randomness, so it contains the most complex information of the three.

Is it true that information can never be added to an organism’s genome? Shannon information, at least, is obviously being added to genomes all the time through processes such as mutation and gene duplication. The common creationist response is that such processes do not add “new” information. So what type of information do creationists mean when they say that organisms “contain no new information,” or that mutations can only decrease information and never increase it? Very few creationists bother to distinguish between the two types of information. When they do, they invariably mix up the definitions into something meaningless or (not surprisingly) invent their own definition of information:
For there to be information, there must be a multiplicity of distinct possibilities any one of which might happen. When one of these possibilities does happen and the others are ruled out, information becomes actualized. Indeed, information in its most general sense can be defined as the actualization of one possibility to the exclusion of others.
William Dembski, “Intelligent Design as a Theory of Information”
An elegant study that combines a very similar concept with the Shannon definition of information shows that information most certainly increases when it is measured in the standard unit: the bits of information contained at a single site. The paper is too complex for a short explanation; this ten-minute video is a great synopsis of the study (you may want to turn off your sound first!). This video is a good prerequisite.
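To make “bits of information at one site” concrete, here is a short Python sketch. I am assuming the usual Shannon-style definition for DNA. With four possible bases, a position can hold at most log2(4) = 2 bits. The information at a site is that maximum minus the uncertainty of the bases actually observed there across a set of aligned sequences. The function name and the sequences are hypothetical, and the study itself may apply corrections that this sketch leaves out.

```python
import math
from collections import Counter


def site_information_bits(column: str) -> float:
    """Information at one aligned DNA position, in bits: the 2-bit maximum
    for four bases minus the Shannon uncertainty of the bases seen there.
    No small-sample correction is applied."""
    counts = Counter(column)
    n = len(column)
    uncertainty = sum((c / n) * math.log2(n / c) for c in counts.values())
    return 2.0 - uncertainty


# Hypothetical aligned sites, invented for illustration (not data from the study).
sites = ["TATAAT", "TATAAT", "TACAAT", "TATGAT", "TTTAAT", "GATAAT"]
columns = ["".join(bases) for bases in zip(*sites)]
per_site = [site_information_bits(col) for col in columns]

for position, bits in enumerate(per_site, start=1):
    print(f"position {position}: {bits:.2f} bits")
print(f"total: {sum(per_site):.2f} bits")
```

A position where every sequence agrees scores the full 2 bits, and a position where all four bases appear equally often scores 0. By this measure, a set of sites that becomes more uniform over time gains information.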

Creationists will complain that this is not “new” information. They do not, however, define “new” in any meaningful sense. They imply that such information is not “new” because it is a variant of genetic material that already existed in the organism (or, in cases such as horizontal gene transfer, in another organism).

A little biology is in order here. Most people know that an organism’s genetic code is carried in its genes, which are written in DNA. Many people do not realize that genes do not specify things like “build a leg” or “color this yellow.” Instead, genes specify the structure of proteins, and regulatory stretches of DNA control when and how much of each protein is made. The proteins themselves are responsible for patterns of growth and development.

Just as any statement written in English is composed of 26 letters, 10 numerals and a handful of symbols, new information in the genome is written with the existing “letters” of DNA: the four nucleotide bases A, C, G and T. Many of the documented changes in the genomes of organisms involve the creation of DNA sequences that had never before occurred in that organism. Changing a few “letters” in the DNA code can lead to huge changes in how the organism functions, because the altered genes produce new proteins. It is difficult to conceive of any reasonable definition of “new” that excludes such events. Any creationist who says that events such as these do not represent “new information” must then agree that producing any “new” information in the English language is equally impossible. Let us look at two similar events:

ATG GTG GCT GTA GGA ATC TGT CGC ACA GAT GAC
ATG GTG GCT GTA GGA ATC TGT CAC ACA GAT GAC

You must not eat that apple.
You must now eat that apple.

The first two “sentences” are portions (codons 40 to 50) of the 374-codon sequence that codes for alcohol dehydrogenase, an enzyme that breaks down ethyl alcohol by converting it to acetaldehyde. In the first sequence, the “word” CGC at codon 47 codes for the amino acid arginine. In the second, the “word” CAC codes for the amino acid histidine. That single codon is the only difference between the two, yet it produces two different functional enzymes with different rates of alcohol metabolism. People with the first variant, known as ADH Beta 1 (the most common variant in Caucasians), metabolize alcohol more slowly than people with the second version, ADH Beta 2, which is more common in people of East Asian descent. Because the Beta 2 enzyme produces acetaldehyde faster, people who carry it are more likely to display the alcohol flush reaction when they drink. This reaction causes hangover-like symptoms such as facial flushing, nausea and vertigo. These unpleasant effects lead such people to drink less alcohol, so they are less likely to become alcoholics. In this case, a one-“letter” change has produced significant new information.
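A short Python sketch can check the codon arithmetic above. The lookup table is deliberately partial. It contains only the codons that appear in these two fragments, with their standard genetic code assignments, and the names are my own. Numbering starts at codon 40, as stated in the text.

```python
# Partial table: only the codons found in the two fragments (standard genetic code).
CODON_TO_AMINO_ACID = {
    "ATG": "Met", "GTG": "Val", "GCT": "Ala", "GTA": "Val", "GGA": "Gly",
    "ATC": "Ile", "TGT": "Cys", "CGC": "Arg", "CAC": "His", "ACA": "Thr",
    "GAT": "Asp", "GAC": "Asp",
}
FIRST_CODON_NUMBER = 40  # the fragments cover codons 40 to 50

beta1 = "ATG GTG GCT GTA GGA ATC TGT CGC ACA GAT GAC"  # ADH Beta 1 fragment
beta2 = "ATG GTG GCT GTA GGA ATC TGT CAC ACA GAT GAC"  # ADH Beta 2 fragment


def translate(fragment: str) -> list[str]:
    """Translate space-separated codons into three-letter amino acid names."""
    return [CODON_TO_AMINO_ACID[codon] for codon in fragment.split()]


def codon_changes(a: str, b: str) -> list[tuple[int, str, str, str, str]]:
    """List (codon number, old codon, new codon, old amino acid, new amino acid)."""
    rows = zip(a.split(), b.split(), translate(a), translate(b))
    return [
        (FIRST_CODON_NUMBER + i, c1, c2, aa1, aa2)
        for i, (c1, c2, aa1, aa2) in enumerate(rows)
        if c1 != c2
    ]


print(codon_changes(beta1, beta2))
# prints [(47, 'CGC', 'CAC', 'Arg', 'His')]
```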

In the second example, changing one letter in a sentence leads to two sentences with completely different meanings. Again, the one-letter change has produced significant new information.

If rearranging the letters of DNA to create new messages with different meanings than the old messages does not produce new information, then neither does rearranging the letters used in English.
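Treated as plain strings, both pairs of examples differ by a single substituted character. This small Python sketch lists every position where two equal-length strings differ, which is essentially a Hamming-distance comparison. The function name is hypothetical, and the indices count spaces too.

```python
def substitutions(a: str, b: str) -> list[tuple[int, str, str]]:
    """Return (index, old, new) for each position where two equal-length
    strings differ, i.e. the single-character substitutions between them."""
    if len(a) != len(b):
        raise ValueError("strings must have the same length")
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


dna_pair = (
    "ATG GTG GCT GTA GGA ATC TGT CGC ACA GAT GAC",
    "ATG GTG GCT GTA GGA ATC TGT CAC ACA GAT GAC",
)
english_pair = ("You must not eat that apple.", "You must now eat that apple.")

print(substitutions(*dna_pair))      # prints [(29, 'G', 'A')]
print(substitutions(*english_pair))  # prints [(11, 't', 'w')]
```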

Creationists also claim that if a genome is simplified by removing some genetic material, no new information is produced and evolution does not occur. Note that simplification can actually increase the complexity information content of an organism (by decreasing randomness and compressibility), especially if the removed genetic material was duplicative “junk DNA.”
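The compressibility half of that claim can be illustrated with zlib in Python. This is a minimal sketch with several assumptions. The “genome” is a made-up random string of A, C, G and T. The duplicated material is an exact second copy. Compressed size is only a rough proxy for the information measures discussed above.

```python
import random
import zlib


def compressed_fraction(seq: str) -> float:
    """Compressed size as a fraction of raw size (higher = less compressible)."""
    raw = seq.encode("ascii")
    return len(zlib.compress(raw, 9)) / len(raw)


rng = random.Random(42)  # fixed seed; the sequence is invented, not real DNA
unique = "".join(rng.choice("ACGT") for _ in range(5_000))
with_duplicate = unique + unique  # hypothetical genome carrying an exact extra copy

print(f"with duplicate:    {compressed_fraction(with_duplicate):.2f}")
print(f"without duplicate: {compressed_fraction(unique):.2f}")
```

Removing the duplicate halves the raw length, but the compressed size should barely change, because zlib stores the second copy as cheap back-references to the first. The remaining sequence is therefore less compressible per letter.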

Well, that’s quite a bit of heavy info for one post. In the next post, I will examine just one method of increasing information in organisms: gene duplication.
