A Step Closer To Data Immortality: From Hard Drives To Flash Drives To DNA Drives

DNA, biology’s databank, has long tantalised researchers with its potential as a storage medium: it is fantastically dense, stable and energy efficient, and it has been proven to work over a timespan of some 3.5 billion years. Meanwhile, digital information is accumulating at an astounding rate, straining our ability to store and archive it. New technologies in both DNA synthesis and sequencing are making DNA an increasingly feasible medium for digital storage.

A bioengineer and a geneticist at Harvard’s Wyss Institute have shown that DNA can hold 5.5 petabits of data, around 700 terabytes, in a single gram.

“We developed a strategy to encode arbitrary digital information in DNA, wrote a 5.27-megabit book using DNA microchips, and read the book by using next-generation DNA sequencing”.

The work, carried out by George Church and Sri Kosuri, treats DNA as just another digital storage device. Instead of encoding binary data as magnetic regions on a hard drive platter, they synthesise DNA strands that each store 96 bits, with each base encoding one bit (T and G = 1, A and C = 0). To read the data back, you simply sequence the DNA, just as if you were sequencing the human genome, and convert each base back into binary. Because each strand begins with a 19-bit address block, a whole vat of DNA can be sequenced out of order and then sorted into usable data using the addresses.
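A minimal Python sketch of the one-bit-per-base scheme described above. Only the bit mapping (A or C = 0, G or T = 1) and the layout of a 19-bit address followed by 96 data bits come from the article. The helper names `encode_block` and `decode_block` are hypothetical, and the rule for choosing between the two letters that encode each bit (pick whichever one differs from the previous base) is an illustrative assumption, not necessarily the rule Church and Kosuri used.

```python
# Bit mapping from the article: A or C encode 0, G or T encode 1.
ZERO_BASES = ("A", "C")
ONE_BASES = ("G", "T")
ADDRESS_BITS = 19
PAYLOAD_BITS = 96


def encode_block(address: int, payload: list[int]) -> str:
    """Turn a 19-bit address and a 96-bit payload into a DNA strand (hypothetical helper)."""
    if not 0 <= address < 2 ** ADDRESS_BITS:
        raise ValueError("address does not fit in 19 bits")
    if len(payload) != PAYLOAD_BITS:
        raise ValueError("payload must be exactly 96 bits")
    # Address bits first (most significant bit first), then the data bits.
    address_bits = [(address >> i) & 1 for i in reversed(range(ADDRESS_BITS))]
    bases: list[str] = []
    for bit in address_bits + payload:
        options = ONE_BASES if bit else ZERO_BASES
        # Assumed rule: never repeat the previous base (always possible with two options).
        base = options[1] if bases and bases[-1] == options[0] else options[0]
        bases.append(base)
    return "".join(bases)


def decode_block(strand: str) -> tuple[int, list[int]]:
    """Read a sequenced strand back into (address, payload bits)."""
    if any(base not in "ACGT" for base in strand):
        raise ValueError("unexpected character in strand")
    bits = [1 if base in ONE_BASES else 0 for base in strand]
    address = 0
    for bit in bits[:ADDRESS_BITS]:
        address = (address << 1) | bit
    return address, bits[ADDRESS_BITS:]


# Usage: a 115-base strand (19 address bases + 96 data bases) round-trips cleanly.
payload = [1, 0] * 48
strand = encode_block(5, payload)
print(len(strand))  # 115
print(decode_block(strand) == (5, payload))  # True
```

Decoding never needs to know which of the two letters was chosen, which is what leaves the encoder free to pick letters for other reasons.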

It is only with recent advances in microfluidics and labs-on-a-chip, though, that synthesising and sequencing DNA has become an everyday task. While it took years for the original Human Genome Project to analyse a single human genome (some 3 billion DNA base pairs), modern lab equipment with microfluidic chips can do it in hours. And where some experimental media, like quantum holography, require incredibly cold temperatures and tremendous energy, DNA is stable at room temperature. You can drop it wherever you want, in the desert or your backyard, and it will be there 400,000 years later.

Just think about it for a moment: one gram of DNA can store 700 terabytes of data. That’s 14,000 50-gigabyte Blu-ray discs. Just a gram. To store the same amount of data on hard drives, one of the densest storage media in use today, you’d need 233 3-terabyte drives, weighing close to 100 kg. In Church and Kosuri’s case, they actually stored around 700 kilobytes of data in DNA (Church’s latest book, in fact) and then made 70 billion copies, totalling 44 petabytes of data stored. They jokingly claim this makes it the best-selling book of all time!
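The comparison above is straightforward arithmetic, sketched here in Python. It assumes decimal (SI) units throughout, which is a guess about how the figures were computed, and it leaves out drive weight because that depends on the drive model.

```python
# Decimal (SI) units assumed: 1 TB = 10**12 bytes, 1 GB = 10**9 bytes.
BITS_PER_BYTE = 8
PETABIT = 10 ** 15
TERABYTE = 10 ** 12
GIGABYTE = 10 ** 9

# 5.5 petabits per gram, converted to terabytes.
dna_bytes_per_gram = 5.5 * PETABIT / BITS_PER_BYTE
print(dna_bytes_per_gram / TERABYTE)  # 687.5, i.e. "around 700 TB"

capacity = 700 * TERABYTE  # the rounded figure used in the article
blu_ray_discs = capacity / (50 * GIGABYTE)
hard_drives = capacity / (3 * TERABYTE)
print(blu_ray_discs)  # 14000.0
print(round(hard_drives, 1))  # 233.3
```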

Although other projects have encoded data in the DNA of living bacteria, the Church team used commercial DNA microchips to create standalone DNA. Church said:

“We purposefully avoided living cells. In an organism, your message is a tiny fraction of the whole cell, so there’s a lot of wasted space. But more importantly, almost as soon as a DNA goes into a cell, if that DNA doesn’t earn its keep, if it isn’t evolutionarily advantageous, the cell will start mutating it, and eventually the cell will completely delete it”.

In another departure, the team rejected so-called “shotgun sequencing,” which reassembles long DNA sequences by identifying overlaps between short strands. Instead, they took their cue from information technology and encoded the book in 96-bit data blocks, each with a 19-bit address to guide reassembly. Including JPEG images and HTML formatting, the book required 54,898 of these data blocks, each a unique DNA sequence. “We wanted to illustrate how the modern world is really full of zeroes and ones, not As through Zs alone,” Kosuri said.
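Because every block carries its own address, reassembly reduces to sorting. The Python sketch below is a hypothetical illustration: `reassemble` is an invented helper that keeps the first copy of any duplicated address and ignores sequencing errors and missing blocks, all of which a real pipeline would have to handle. It also checks that the article’s figures agree: 54,898 blocks of 96 bits come to about 5.27 megabits, and a 19-bit address can label up to 524,288 blocks.

```python
import random

ADDRESS_BITS = 19
PAYLOAD_BITS = 96
NUM_BLOCKS = 54_898  # block count reported for the book

# The article's figures are consistent with each other.
total_bits = NUM_BLOCKS * PAYLOAD_BITS
print(total_bits)  # 5270208, i.e. about 5.27 megabits
print(2 ** ADDRESS_BITS >= NUM_BLOCKS)  # True: 524,288 possible addresses


def reassemble(reads: list[tuple[int, str]]) -> str:
    """Sort (address, payload) reads by address and join the payloads."""
    by_address: dict[int, str] = {}
    for address, payload in reads:
        # Keep the first copy seen for each address; later duplicates are ignored.
        by_address.setdefault(address, payload)
    return "".join(by_address[a] for a in sorted(by_address))


# Toy example: split a bit string into addressed blocks, shuffle, reassemble.
message = "".join(random.choice("01") for _ in range(PAYLOAD_BITS * 10))
blocks = [(i, message[i * PAYLOAD_BITS:(i + 1) * PAYLOAD_BITS]) for i in range(10)]
random.shuffle(blocks)
print(reassemble(blocks) == message)  # True
```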

Still, in theory a quaternary scheme (A=0, T=1, G=2, C=3) is more efficient than the redundant scheme used in this paper: each base carries two bits instead of one, doubling the data density, as others have demonstrated on smaller scales. But as I gathered from George Church, that approach has drawbacks as well. Homopolymers, long strings of the same letter like TTTTTTTTT, are notoriously difficult to sequence accurately. With two letters for 0 and two letters for 1, Church’s team could design an algorithm that avoids creating homopolymers. The homopolymer problem may be temporary, though, and as sequencing technology continues to improve, a quaternary scheme may become more appealing.
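The trade-off can be made concrete with a small Python sketch. The quaternary mapping (A=0, T=1, G=2, C=3) is taken from the paragraph above. The binary encoder’s rule (pick whichever of the two letters for a bit differs from the previous base) is an assumed, simplified stand-in for the team’s actual algorithm, and all function names are hypothetical. `longest_run` measures the longest homopolymer in each encoding.

```python
QUATERNARY = "ATGC"  # index = 2-bit value: A=0, T=1, G=2, C=3
BINARY = {0: "AC", 1: "GT"}  # two letters per bit value


def bytes_to_bits(data: bytes) -> list[int]:
    """Most significant bit first within each byte."""
    return [(byte >> i) & 1 for byte in data for i in reversed(range(8))]


def encode_quaternary(data: bytes) -> str:
    """Two bits per base: dense, but repeated values give repeated letters."""
    bits = bytes_to_bits(data)
    return "".join(QUATERNARY[2 * bits[i] + bits[i + 1]] for i in range(0, len(bits), 2))


def encode_binary(data: bytes) -> str:
    """One bit per base, choosing the letter that differs from the previous base."""
    bases: list[str] = []
    for bit in bytes_to_bits(data):
        first, second = BINARY[bit]
        bases.append(second if bases and bases[-1] == first else first)
    return "".join(bases)


def longest_run(strand: str) -> int:
    """Length of the longest homopolymer (run of identical bases)."""
    if not strand:
        return 0
    best = current = 1
    for prev, base in zip(strand, strand[1:]):
        current = current + 1 if base == prev else 1
        best = max(best, current)
    return best


data = b"\x00\x00\xff\xff"  # long runs of identical bits
q, b = encode_quaternary(data), encode_binary(data)
print(len(q), longest_run(q))  # 16 8
print(len(b), longest_run(b))  # 32 1
```

The quaternary strand is half as long but contains runs of eight identical bases, while the binary strand doubles the length and never repeats a base.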

I foresee a world where biological storage would allow us to record anything and everything without reservation. Today, we would not dream of blanketing every square metre of Earth with cameras and recording every moment for posterity: we simply don’t have the storage capacity. There is a reason that backed-up data is usually kept for only a few weeks or months. It just isn’t feasible to maintain warehouses full of hard drives, which could fail at any time, whereas about four grams of DNA could theoretically store all the digital data humankind creates in one year.

The entirety of human knowledge (every book, every uttered word, every video and song) could soon be stored in a few kilograms of DNA. The future is already here. Just imagine the impact this could have on space travel, on surveillance and policing, or on live data exchange between organisms and a unified, centralised biological database. Suddenly James Cameron’s “Tree of Souls” idea from his movie Avatar makes sense, since it is already possible to store data in the DNA of living cells, though only for a short time at the moment. In the future we may be storing data in our own skin, which alone would make today’s implantable technology look outdated.

Credit: illustration by John Goode

Full article available at Wearable Technologies