A group of researchers working at the Human Genome Project will soon announce that they have made an astonishing scientific discovery: they believe the so-called non-coding sequences in human DNA (97% of the genome) are nothing less than the genetic code of an unknown extraterrestrial life form.
The non-coding sequences are common to all living organisms on Earth, from molds to fish to humans. In human DNA, they constitute the larger part of the total genome, says Prof. Sam Chang, the group leader. Non-coding sequences, also known as “junk DNA”, were discovered years ago, and their function remains a mystery.
Unlike normal genes, which carry the information that intracellular machinery uses to synthesize proteins, enzymes and other chemicals produced by our bodies, non-coding sequences are never used for any purpose. They are never expressed, meaning that the information they carry is never read, no substance is synthesized and they have no function at all. We exist on only 3% of our DNA.
The junk genes merely enjoy the ride with the hard-working active genes, passed from generation to generation. What are they? How come these idle genes are in our genome? Those were the questions many scientists posed and failed to answer – until the breakthrough discovery by Prof. Sam Chang and his group.
Trying to understand the origins and meaning of junk DNA, Prof. Chang realized that he first needed a definition of “junk”. Is junk DNA really junk (useless and meaningless), or does it contain some information not claimed by the rest of the DNA for whatever reason? He once mentioned the question to an acquaintance, Dr. Lipshutz, a young theoretical physicist turned Wall Street derivative securities specialist. “Easy,” replied Lipshutz. “We’ll run your sequences through the software I use to analyze market data, and it will show whether they are total garbage, ‘white noise’, or whether there is a message in there.” This new breed of analyst, with a strong background in math, physics and statistics, is becoming more and more popular with Wall Street firms. They sift through gigabytes of market statistics, trying to uncover useful correlations between the various market indexes and individual stocks.
Working evenings and weekends, Lipshutz managed to show that non-coding sequences are not all junk; they carry information. Combining the massive database of the Human Genome Project with thousands of data files developed by geneticists all over the world, Lipshutz calculated the Kolmogorov entropy of the non-coding sequences and compared it with the entropy of regular, active genes. Kolmogorov entropy, introduced by the famous Russian mathematician half a century ago, has been used successfully to quantify the level of randomness in various sequences, from time series of noise in vacuum tubes to sequences of letters in 19th-century Russian poetry. By and large, the technique allows researchers to compare sequences quantitatively and conclude which one carries more information than the other. “To my surprise, the entropy of coding and non-coding DNA sequences was not that different”, continues Lipshutz. “There was noise in both, but it was not junk at all. If the market data were that orderly, I would have already retired.”
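The kind of comparison described here can be sketched in a few lines. This is only an illustration under stated assumptions: it swaps in the Shannon entropy of overlapping k-letter blocks as a simple, computable proxy for the entropy measure named in the text, and the function name `block_entropy` and both test strings are hypothetical stand-ins, not the data or software Lipshutz used.

```python
import math
import random
from collections import Counter


def block_entropy(seq: str, k: int = 3) -> float:
    """Shannon entropy, in bits per symbol, of the overlapping k-letter blocks of seq."""
    blocks = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    counts = Counter(blocks)
    total = len(blocks)
    h = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return h / k


# Hypothetical stand-ins for real sequences (assumption: not data from the article).
rng = random.Random(0)
noise = "".join(rng.choice("ACGT") for _ in range(20000))  # "white noise"
repeat = "ACGTTGCA" * 2500                                 # fully ordered text

print(f"random letters: {block_entropy(noise):.3f} bits/symbol")
print(f"pure repeat:    {block_entropy(repeat):.3f} bits/symbol")
```

With four letters, a sequence with no structure at all approaches 2 bits per symbol, and the more regular a sequence is, the lower the figure falls. Running the same function over a coding and a non-coding sequence gives two numbers that can be compared directly.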
After a year of cooperation with Lipshutz, Chang was convinced that there is hidden information in junk DNA. However, how could one understand its meaning if the information is never used? With active sequences, you watch the cell and see what proteins are being made using the information. This would not work with dormant genes. There would be no experiment to test a hypothesis; one would have to rely on the power of thought. Since there are letters, the text should be tested against some old languages, perhaps Sumerian, Egyptian, Hebrew, and so on. Prof. Sam Chang solicited help from three specialists in the field, but none of them managed to find a solution. There were no cultural clues and no references to other known languages; the field was too alien for the linguists.
“I asked myself: who else can decipher a hidden message?” Chang continues. “Of course, cryptographers! So I began talking with researchers at the National Security Agency. It took me a few months to get them to return my calls. Were they running background checks on me? Or were they too busy lobbying senators on retaining and strengthening their authority to control exports of encryption technologies? Eventually, a junior fellow was assigned to answer my questions. He listened, requested my questions in writing and, after another few months, turned me down. His message was polite but meant, ‘Go to hell with your crazy ideas. We are a serious agency, it’s National Security, dude. We are too busy.’
“Well, Sam, forget the Government, talk to the private sector. So I began approaching computer security consultants. They were genuinely interested, and a couple of them even began working on my project, but their enthusiasm always faded after a month. I kept calling them until one nice fellow told me: ‘I’d love to work on your project if I had more time. I am overbooked. Emissaries of major banks and Fortune 500 companies are begging me to plug the holes in their networks. They pay me $500 an hour. I can give you an educational discount, can you afford $350?’ Scraping together $15 an hour for a postdoctoral researcher is a big deal in academia; $350 sounded like something extraorbital.” Eventually Prof. Chang was referred to Dr. Adnan Mussaelian, a talented cryptographer in the former Soviet republic of Armenia. The poor fellow barely survived on a $15-a-month salary and occasional fees for tutoring the children of Armenian nouveaux riches. A $10,000 research grant was a stroke of luck; he began working like a beaver.
Adnan promptly confirmed the findings of his Wall Street predecessor: the entropy indicated tons of information almost in the clear. It was not a very strong cryptographic system, and it did not appear to be a tough problem. Adnan began applying differential cryptanalysis and similar standard cryptographic techniques.
He was two months into the project when he noticed that non-coding sequences are usually preceded by one short DNA sequence. A very similar sequence usually followed the junk. These segments, known to biologists as Alu sequences, are found all over the human genome. Being non-coding junk sequences themselves, Alu sequences are among the most common genes of all.
Trained as a cryptographer and computer programmer, and having no knowledge of microbiology, Adnan approached the genetic code as if it were computer code. Dealing with 0, 1, 2, 3 (the four bases of the genetic code) instead of the 0s and 1s of binary code was something of a nuisance, but computer code was what he had been analyzing and deciphering all his life. He was on familiar territory. The most common symbol in the code causes no action and is followed by a chunk of dormant code. What is that? Just playing with the analogy, Adnan grabbed the source code of one of his programs and fed it into a program that calculates the statistics of symbols and short sequences, a tool often used in decoding messages. What was the most common symbol? Of course, it was “/”, the comment symbol! He took some Pascal code, and there it was { and }! Of course, the text between the /* and */ markers in C is never executed, and is never meant to be executed; it is not the code, it is the comment to the code!
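A tool that “calculates the statistics of symbols and short sequences” can be approximated as follows. This is a minimal sketch, not the program Adnan used: the function name `ngram_counts` and the three-line C-like snippet are hypothetical, chosen only to show how comment markers surface in such counts.

```python
from collections import Counter


def ngram_counts(text: str, n: int) -> Counter:
    """Count every run of n consecutive characters that contains no whitespace."""
    return Counter(
        text[i:i + n]
        for i in range(len(text) - n + 1)
        if not any(ch.isspace() for ch in text[i:i + n])
    )


# Hypothetical snippet standing in for "one of his programs" (assumption).
source = """
/* old version, kept for reference */
/* int total = sum(a, n); */
int total = fast_sum(a, n);  /* new */
"""

print(ngram_counts(source, 1).most_common(3))  # single symbols
print(ngram_counts(source, 2).most_common(3))  # two-symbol sequences
```

Applied to a DNA string instead of source code, the same counting shows which short sequences recur most often and where they sit relative to the stretches they surround.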
Unable to resist the temptation to play further with the analogy, Adnan began comparing the statistical distributions of comments in computer code and in genetic code. There must be a striking difference, and it should show up in the statistics. Yet, statistically, junk DNA was not much different from active, coding sequences. To be sure, Adnan fed a program into the analyzer: surprisingly, the statistics of code and comments were almost the same. He looked into the source code and realized why: there were very few real comments between the comment markers; it was mostly C code that the author had decided to exclude from execution, a common practice among programmers.
Adnan, a religiously inclined person, was thinking about the divine hand – but after analyzing the spaghetti code inside the sequences he convinced himself that whoever wrote the small code was not God. Whoever wrote the small, active coding part of the human genetic code was not very well organized; he was a rather sloppy programmer. It looked rather like the work of somebody from Microsoft, but at the time the human genetic code was written, there was no Microsoft on Earth.
On Earth? It was like a flash of lightning… Was the genetic code for all life on Earth written by an extraterrestrial programmer and then somehow deposited here for execution? The idea was mad and frightening, and Adnan resisted it for days. Then he decided to proceed. If the non-coding sequences are parts of the program that were rejected or abandoned by the author, there is a way to make them work. The only thing one needs to do is remove the comment symbols, and if the portion between the /*……*/ symbols is a meaningful routine, it may compile and execute! Following this line of thought, Adnan selected only those non-coding sequences that had exactly the same frequency distribution of symbols as the active genes. This procedure excluded the comments in Martian or Q, whatever it was. He selected some 200 non-coding sequences that most closely resembled real genes, stripped them of /*, //, and similar stuff and, after a few days of hesitation, sent an e-mail to his American boss, asking him to find a way to put them in E. coli or whatever host and make them work.
Chang did not reply for two weeks. “I thought I was fired”, confessed Dr. Mussaelian. “With every day of his silence I realized more and more how crazy my idea was. Chang would conclude I was a schizophrenic and would terminate the contract. Chang finally responded and, to my surprise, he did not fire me. He had not bought my extraterrestrial theory but agreed to try to make my sequences work.”
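The selection step can be pictured as ranking candidate sequences by how closely their symbol frequencies match those of a reference gene. The sketch below is an assumption-laden illustration: the text speaks of “exactly the same” distribution, whereas this code ranks by total variation distance and keeps the closest few. The names `symbol_freqs`, `distance`, `closest` and all sequences are hypothetical.

```python
from collections import Counter


def symbol_freqs(seq: str, alphabet: str = "ACGT") -> dict:
    """Relative frequency of each alphabet symbol in seq."""
    counts = Counter(seq)
    total = sum(counts[s] for s in alphabet) or 1  # avoid division by zero
    return {s: counts[s] / total for s in alphabet}


def distance(p: dict, q: dict) -> float:
    """Total variation distance between two frequency tables over the same symbols."""
    return 0.5 * sum(abs(p[s] - q[s]) for s in p)


def closest(candidates: dict, reference: str, keep: int) -> list:
    """Names of the `keep` candidates whose symbol frequencies best match the reference."""
    ref = symbol_freqs(reference)
    ranked = sorted(
        candidates,
        key=lambda name: distance(symbol_freqs(candidates[name]), ref),
    )
    return ranked[:keep]


# Hypothetical toy data (assumption: not sequences from the article).
active_gene = "ACGTACGT"
junk = {"s1": "AAAAAAAT", "s2": "ACGTTGCA", "s3": "AACCGGTA"}

print(closest(junk, active_gene, keep=2))  # → ['s2', 's3']
```

Single-symbol frequencies are the crudest possible fingerprint; the same ranking works unchanged if `symbol_freqs` is replaced by counts of longer blocks.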
Biologists have attempted for years to make junk sequences express, without much success. Sometimes nothing came out; sometimes it was junk again. It was not surprising. Grab an arbitrary portion of excluded computer code and try to compile it. Most likely, it will fail. At best, it will produce bizarre results. Analyze the code carefully, fish out a whole function from the comments, and you may make it work. Thanks to Mussaelian’s careful statistical analysis, 4 of the 200 sequences he selected began working, producing tiny amounts of chemical compounds.
“I was anxiously awaiting the response from Chang,” says Dr. Mussaelian. “Would it be a more or less normal protein or something out of the ordinary?” The answer was shocking: it was a substance known to be produced by several types of leukemia in humans and animals. Surprisingly, the three other sequences also produced cancer-related chemicals. It no longer looked like a coincidence. When one awakens a viable dormant gene, it produces cancer-related proteins. The researchers began searching the Human Genome Project databases for the four genes they had isolated from junk DNA. Eventually, three of the four were found there, listed as active, non-junk genes. This was not a big surprise: since cancer tissues produce the protein, there must be a gene somewhere that codes for it! The surprise came later: in the active, non-junk portion of the code, the gene in question (the researchers called it “jhlg1”, for junk human leukemia gene) was not preceded by the Alu sequence, i.e. the /* symbol was missing. However, the closing */ symbol at the end of “jhlg1” was there. This explained why “jhlg1” was not expressed in the depth of the junk DNA but worked fine in the normal, active part of the genome. The one who wrote the basic genetic code for humans excluded portions of the big code by enclosing them in /*…*/ but missed some of the opening /* symbols. His compiler seems to be garbage, too: a good compiler, even from terrestrial Microsoft, would most likely refuse to compile such a program at all.
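Staying strictly within the programming analogy, the pattern described here, a closing */ with no opening /* before it, is easy to detect mechanically. The sketch below scans plain text for such orphaned closing markers; the function name `unmatched_closers` and the sample string are hypothetical, and nothing here models how real genes are regulated.

```python
def unmatched_closers(text: str, opener: str = "/*", closer: str = "*/") -> list:
    """Return the offsets of closing markers that have no opening marker before them."""
    orphans = []
    inside = False  # True while we are between an opener and its closer
    i = 0
    while i < len(text):
        if not inside and text.startswith(opener, i):
            inside = True
            i += len(opener)
        elif text.startswith(closer, i):
            if inside:
                inside = False      # a properly matched comment ends here
            else:
                orphans.append(i)   # closer without an opener
            i += len(closer)
        else:
            i += 1
    return orphans


# Hypothetical example: the second stretch lost its opening marker.
program = "/* dormant */ active jhlg1 */ active"
print(unmatched_closers(program))  # → [27]
```

Everything between a matched pair stays dormant, while the text ahead of an orphaned closer is treated as live code, which is the situation the researchers describe for “jhlg1”.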
Prof. Sam Chang and his students began searching for genes associated with various cancers, and in almost all instances they discovered that those genes are followed by the Alu sequence (i.e. the comment-closing symbol */) but never preceded by the comment-opening /* sequence! “This explains why diseases result in cell damage and death, whereas cancers lead to cell reproduction and growth. Because only a few fragments of the big code are expressed, they never lead to coherent growth. What we get with cancer is the expression of only a few genes alien to humans, in symbiosis with some genes of bacterial parasites, which leads to illogical, bizarre and apparently meaningless chunks of living cells. The chunks have their own veins, arteries, and their own immune system that vigorously resists all our anti-cancer drugs.
“Our hypothesis is that a higher extraterrestrial life form was engaged in creating new life and planting it on various planets. Earth is just one of them. Perhaps, after programming, our creators grow us the same way we grow bacteria in Petri dishes. We can’t know their motives – whether it was a scientific experiment, a way of preparing new planets for colonization, or a long-running business of seeding life in the universe. If we think about it in our human terms, the extraterrestrial programmers were most probably working on one big code consisting of several projects, and the projects should have produced various life forms for various planets. They were also trying various solutions. They wrote the big code, executed it, did not like some functions, changed them or added new ones, executed it again, made more improvements, and tried again and again. Of course, sooner or later the project fell behind schedule. A few deadlines had already passed. Then the management began pressing for an immediate release. The programmers were ordered to cut all their idealistic plans for the future and concentrate on one project (Earth) to meet the pressing deadline. Very likely in a rush, the programmers drastically cut down the big code and delivered a basic program intended for Earth. However, at that time they were (perhaps) not quite certain which functions of the big code might be needed later and which not, so they kept them all there. Instead of cleaning up the basic program by deleting all the lines of the big code, they converted them into comments, and in the rush they missed a few /* symbols here and there, thus presenting mankind with the illogical growth of masses of cells we know as cancer.”
“There are three possible solutions to the problem. Either delete all the /* symbols and comments, and in this way clean up the basic code, or add all the missing /* symbols and avoid the illogical mixing of the basic code with the big code. Alternatively, as a third option, remove all the /*…*/ symbols and let the basic code work together with the big code as a complete program. Unfortunately, none of these options is within our capacity. If we were able to insert genes efficiently into the chromosomes of living people, our breakthrough discovery would mean an instant cure for all future cancer cases, at least from the programmer’s point of view. Theoretically, we can do it in a laboratory, but we have no practical means of implanting the repaired DNA into living subjects. The mystery of ‘junk DNA’ and cancer seems to be solved, but no quick cure should be expected. The best thing we can do now is to try nurturing a new, cancer-free line of humans with a gradually debugged basic genetic code. That will take a long time. For us and our children, there is no hope on the horizon.
“However, from the programmer’s point of view, there is also a positive outlook in it. What we see in our DNA is a program consisting of two versions, a big code and a basic code. The first fact is that the complete program was positively not written on Earth; that is now a verified fact. The second fact is that genes by themselves are not enough to explain evolution; there must be something more in the game. What it is or where it is, we don’t know. The third fact is that no creator of a new work, be it a composer, engineer or programmer, from Mars or Microsoft, will ever leave his work without the option for improvement or upgrade. The ingenious part is that the upgrade is already enclosed – the ‘junk DNA’ is nothing more than a hidden and dormant upgrade of our basic code! We have known for some time that certain cosmic rays have the power to modify DNA. With this in mind, a plausible solution is available. The extraterrestrial programmers may use just one flash of the right energy from somewhere in the Universe to instruct the basic code to remove all the /*…*/ symbols, fuse itself with the big code (‘junk DNA’) and jump-start the working of our whole DNA. That would change us forever, some of us within months, some of us within generations. The change would not be very physical (except for no more cancers, diseases and short life), but it would catapult us intellectually. Suddenly, we would find ourselves in a time comparable to the coexistence of Neanderthals and Cro-Magnons. The old would be replaced, giving birth to a new cycle. The complete program is elegant, very clever, self-organizing, auto-executing, auto-developing and auto-correcting software for a highly advanced biological computer with a built-in connection to the ageless energy and wisdom of the Universe. Software-wise, within us is either a short and diseased life, or the potential for a super-intelligent super-being with a long and healthy life.
This triggers puzzling questions – was the reduction to the basic code done by sloppy programmers in a rush (as it appears to us), or was the disabling of the big code a purposeful act which can be cancelled by a ‘remote control’ whenever desired?”
Sooner or later, we have to come to grips with the unbelievable notion that every life form on Earth carries the genetic code for its extraterrestrial cousin and that evolution is not what we think it is. This discovery may well shake the very roots of humanity – our beliefs in our concept of God and in our own power over our destiny. With the right paradigm, we may discover one day that all forms of life and the whole Universe are just one huge intellectual exercise in thoughts expressed mathematically, by Design, by a Creator.