About 11 million 50-mer oligonucleotides were chosen from the entire sequenced genome of the sea urchin and synthesized on 27 glass-based slides. Each slide contained ~400,000 oligonucleotides, so these were very high-density arrays, unlike garden-variety cDNA arrays.
The oligonucleotides were selected uniformly from the latest assembled sequence released by Baylor. There was a gap of 10 bases between consecutive oligos on the genome. In simple words, we started from one end of a contig, selected a 50-mer, skipped 10 bases, selected the next 50-mer and continued until we reached the other end or got bored :). The actual probe selection algorithm was more complex, but let's stick to the above description to start with.
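The simplified selection described above can be sketched in a few lines of Python. This is only an illustration of the "50-mer, skip 10 bases" idea, not the actual (more complex) probe selection algorithm; the function name `tile_contig`, its parameters and the toy contig are made up for this example.

```python
def tile_contig(contig, probe_len=50, gap=10):
    """Return (start, probe) pairs tiled along one contig.

    Simplified scheme only: take a probe_len-mer, skip `gap` bases,
    take the next one, and stop when a full probe no longer fits.
    """
    step = probe_len + gap  # distance between consecutive probe starts
    probes = []
    for start in range(0, len(contig) - probe_len + 1, step):
        probes.append((start, contig[start:start + probe_len]))
    return probes


# Toy 200-base contig (hypothetical sequence, for illustration only).
toy_contig = "ACGT" * 50
for start, probe in tile_contig(toy_contig):
    print(start, start + len(probe) - 1, probe)
```

With the defaults, probe starts are 60 bases apart, so a 200-base contig yields probes starting at positions 0, 60 and 120; a fourth probe starting at 180 would run past the end and is dropped.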
The slides were hybridized with an RNA sample extracted from sea-urchin embryos. Purified poly-A RNA, mixed in equal quantities from egg, early blastula, gastrula and prism stage embryos, was used for hybridization. Before hybridization, the RNAs were reverse transcribed, converted into cDNAs, amplified as cRNAs and optically labeled. After hybridization, the slides were washed and laser-scanned to detect signals on all oligos.
What is the meaning of the signal on an oligo? An oligo, by itself, is not labeled and should not emit any optical signal. It emits a signal only if it hybridizes with a labeled cRNA from the sample. Turning the argument around, if you see a strong signal on an oligo, there is a high chance that a number of RNAs matching the oligo existed in the original sample. Why is it "matching" and not "antisense"? Remember, we reverse transcribed the original RNA in one of the processing steps.
Images on this webpage show the genomic coordinates on the X-axis and the normalized signals on all oligonucleotides from the corresponding region on the Y-axis. If an mRNA is transcribed from a genomic region, we expect to see signals on many consecutive oligos within that region. Therefore, if you see large signals on several consecutive probes from a segment of a chromosome, that segment is most likely transcribed. The cutoff between signal and noise is 1.5 for this experiment.
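To make the "several consecutive probes above the cutoff" idea concrete, here is a minimal sketch in Python. It assumes the probes are already sorted by genomic position on one strand of one contig, and that "several" means at least three; the function name `call_transcribed_segments`, the `min_run` parameter and the toy signal values are all assumptions made for this example, and only the 1.5 cutoff comes from the experiment.

```python
def call_transcribed_segments(probes, cutoff=1.5, min_run=3):
    """Find runs of consecutive probes whose signal is at or above cutoff.

    probes:  list of (position, normalized_signal), sorted by position.
    min_run: assumed minimum number of consecutive probes for a call.
    Returns a list of (first_probe_position, last_probe_position).
    """
    segments = []
    run = []  # positions of the current run of above-cutoff probes
    for position, signal in probes:
        if signal >= cutoff:
            run.append(position)
        else:
            if len(run) >= min_run:
                segments.append((run[0], run[-1]))
            run = []
    if len(run) >= min_run:  # close a run that reaches the last probe
        segments.append((run[0], run[-1]))
    return segments


# Toy data (invented values, not from the real arrays).
toy_probes = [(0, 0.4), (60, 2.1), (120, 3.0), (180, 2.5),
              (240, 0.8), (300, 1.9), (360, 0.3)]
print(call_transcribed_segments(toy_probes))  # [(60, 180)]
```

The isolated probe at position 300 is above the cutoff but has no above-cutoff neighbors, so it is treated as noise rather than as evidence of transcription.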
Here is one thing you will generally observe in the images. There is always a gradual drop in signal from the 3' ends to the 5' ends of a gene. This has nothing to do with real biology, and is an artifact of the labeling procedure. Because the labeling starts from the 3'-end and continues to the 5'-end, its efficiency is highest near the 3'-end of a gene.
One peculiar observation in the sea-urchin tiling data is the presence of a large amount of signal in the region antisense to almost all genes. Although antisense signal has been reported in many other organisms, we believe that the particular observation in sea urchin is more likely to be a labeling artifact than real biology. Why? Here is one argument (out of many) supporting this view. You will notice that the antisense signal shows the same splicing pattern as the sense-strand gene. Not only that, its drop from the 3' end to the 5' end is almost identical to that of the sense-strand gene. If a separate antisense RNA existed, wouldn't you expect it to label from the other end?
To summarize, the tiling array experiment is a 21st-century technique for annotating a large genome. The traditional method of computational gene prediction either finds many non-existent genes (false positives) or misses many existing genes (false negatives). The GLEAN set, for example, is very conservative and therefore misses many existing genes.
Please contact manoj.samanta@systemix.org if you have any questions.