- There is no way to retrieve information in this system.
DNA is natively a content-addressable storage system, due to natural base-pairing. But to first address your question in your edit: think of DNA as a pair of singly-linked lists, each with an alphabet of four characters. Each singly-linked list is the "reverse complement" of the other: the reverse sequence with an A-T swap and a C-G swap. At each position there are 2 bits of information, and the other linked list allows for some redundancy.
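Here's a rough sketch of the reverse-complement relationship and the 2-bits-per-base figure, in Python. The function names are just mine for illustration, and strands are modeled as plain strings rather than anything physical.

```python
# Watson-Crick pairing: A pairs with T, C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(strand: str) -> str:
    """Return the partner strand: reverse the sequence, then swap A<->T and C<->G."""
    return "".join(COMPLEMENT[base] for base in reversed(strand))


def bits_stored(strand: str) -> int:
    """Four possible symbols per position works out to 2 bits per base."""
    return 2 * len(strand)


forward = "GATTACA"
partner = reverse_complement(forward)

print(forward, partner, bits_stored(forward))
# The partner strand is redundant: it can always be recomputed from the first.
print(reverse_complement(partner) == forward)
```

Taking the reverse complement twice gets you back to where you started, which is the redundancy the second strand provides.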
To probe for information in a DNA database, you construct the reverse-complement of the desired bit of information, attach a marker to your probe (such as a fluorescent dye, biotin, or a magnetic bead), then physically mix it into your DNA database. A couple of cycles of melting and cooling, and your probe will eventually find its target DNA.
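In software terms the probe step looks roughly like the sketch below. It's a toy model with big assumptions: exact matches only, single strands as Python strings, and no thermodynamics or partial binding. `make_probe` and `hybridize` are made-up names, not anything from a real bioinformatics library.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(strand: str) -> str:
    return "".join(COMPLEMENT[base] for base in reversed(strand))


def make_probe(target: str) -> str:
    """The probe is the reverse complement of the sequence we want to find."""
    return reverse_complement(target)


def hybridize(probe: str, pool: list[str]) -> list[tuple[int, int]]:
    """Return (strand index, offset) for every site the probe pairs with exactly."""
    site = reverse_complement(probe)  # the sequence the probe base-pairs with
    hits = []
    for index, strand in enumerate(pool):
        offset = strand.find(site)
        while offset != -1:
            hits.append((index, offset))
            offset = strand.find(site, offset + 1)
    return hits


# A tiny stand-in for the "puddle of DNA": three single strands.
pool = ["TTGACCGATTACAGGC", "CCCCGGGGAAAATTTT", "AGATTACAT"]
probe = make_probe("GATTACA")
print(probe, hybridize(probe, pool))
```

The loop here scans strands one at a time, which is exactly what the physical system does not do: in the tube, every copy of the probe is searching at once.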
Of course, the thermodynamics of a physical database like this aren't particularly great. I'm not sure of the asymptotic behavior; my intuitive guess is lg(N) just like in B-trees or what have you, but I've never run the numbers or heard of anyone else running it. Also, the constant in front may be just a few orders of magnitude larger than our current systems :)
Reading DNA is getting super cheap these days, and the pace of DNA sequencing technology makes Moore's law look positively wimpy. There are about 30 serious startups working on technologies that fall into a few broad categories, and some, like PacBio, had an IPO this year. Writing DNA is a much more difficult challenge, and I don't know of many people looking into it yet. The market for writing DNA isn't nearly as obvious as it is for sequencing. Of course, if it becomes feasible to write your own pets/plants/children instead of breeding them, the market may explode.
There are several companies that will "write" arbitrary sequences of DNA, to order. One is http://www.mrgene.com , but at $0.39/bp it is still much more economical to isolate and amplify desired sequences using PCR.
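For a sense of scale, here's a back-of-the-envelope calculation of what that price would mean for data storage. It assumes the quoted $0.39/bp applies unchanged to arbitrary sequences and that every base pair carries 2 bits with no overhead for addressing or redundancy; both are simplifications, and the function name is hypothetical.

```python
PRICE_CENTS_PER_BP = 39  # $0.39 per base pair, the price quoted above
BITS_PER_BP = 2          # four possible bases per position


def synthesis_cost_dollars(num_bytes: int) -> float:
    """Cost to synthesize enough base pairs to hold num_bytes of raw data."""
    base_pairs = num_bytes * 8 // BITS_PER_BP  # 4 bp per byte
    return base_pairs * PRICE_CENTS_PER_BP / 100


for size in (1_000, 1_000_000):
    print(f"{size} bytes -> ${synthesis_cost_dollars(size):,.2f}")
```

At 4 bp per byte, each byte costs $1.56 to write, so a megabyte runs to about $1.56 million under these assumptions.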
> To probe for information in a DNA database, you construct the reverse-complement of the desired bit of information
To build the reverse-complement of the information, don't you need to have the information in the first place?
I know it's possible to store information in DNA through various means, but I don't believe it can be done at the density the OP calculated. If we're going to take into account only information storage while ignoring retrieval considerations, then we shouldn't compare naked cells with no DNA duplication to reliable whole hard drives.
Call it human pride, but I think we've beaten mother nature in several aspects ;)
Think of it as a massive key-value store: you construct your query off the key, use that to pull out the key-value pair, and when you sequence your key you continue to sequence more in order to pull out the value. If you prefer sequential addresses, your key could be just that.
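A minimal sketch of that key-value idea, assuming each record is its own molecule with the key sequence followed directly by the value, and that keys never show up inside values. The helper names and the base-4 address scheme are illustrative choices of mine, not an established encoding.

```python
from typing import Optional


def make_record(key: str, value: str) -> str:
    """One molecule: the key sequence followed directly by the value sequence."""
    return key + value


def lookup(pool: list[str], key: str) -> Optional[str]:
    """Find the molecule carrying the key, then keep reading past it for the value."""
    for strand in pool:
        position = strand.find(key)
        if position != -1:
            return strand[position + len(key):]
    return None


def address_key(address: int, length: int = 8) -> str:
    """Encode a sequential address as a fixed-length base-4 string over A, C, G, T."""
    bases = []
    for _ in range(length):
        bases.append("ACGT"[address % 4])
        address //= 4
    return "".join(reversed(bases))


pool = [
    make_record("ACGTACGT", "GGGAAACCC"),
    make_record("TTGGCCAA", "CATCATCAT"),
    make_record(address_key(5), "GGCCGGCC"),  # record stored at "address 5"
]

print(lookup(pool, "TTGGCCAA"))
print(lookup(pool, address_key(5)))
```

In the wet version, `lookup` is the probe pulling out the matching molecule and "keep reading" is the sequencer running past the key.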
And actually, this could be done at a much higher density than what the original poster described, as he's counting the full cell in the density calculation, and DNA is only a small fraction of the cellular volume. You could duplicate all the DNA 10-100 times in the same amount of space once you take out all the ribosomes, proteins and extra water. And as long as it's not stored in direct sunlight or next to your pile of plutonium, DNA is going to be much much more stable than aligning magnetic fields. We're still getting good DNA sequence out of bones that are tens of thousands of years old.
When you think of nanotechnology and miniaturization, think of biology, because that's where all the real nanotechnology is going on. We've not done any better than nature when it comes to making small machinery. Nature has already invented the commodity interchangeable parts (amino acids and nucleic acids) that can self-assemble into rather fantastic machines.
However, we have beaten mother nature on latency: as I alluded to, a DNA database like this would have latency on the order of days for a lookup. On the other hand, as much parallel access as you can imagine is built in, without additional volume. And this isn't a system that has been engineered at all; I'm just talking about the fundamental properties of a little puddle of DNA and water. If half the engineering that went into modern computer hardware were put into a DNA database, it could be quite competitive with our electronic systems.
Sure, a key-value store would work. My point is that the OP's system is not such a store. He just stores 10 PBs of raw data with no indexing and no duplication, so there is no way to retrieve data and comparison with hard disks is meaningless. My post was an answer to his "please correct my math".