When DNA Meets Technology

Sometime soon you will be accessing and storing big data, not on a hard drive but in DNA — a massive improvement over how we store data today.


Prepare to have your mind blown. This past summer it was reported that researchers at Harvard Medical School encoded a short movie into the DNA of living cells. According to Nature magazine, this brings technology one step closer to using “living cells to record what happens inside them or in the tissues and fluids that surround them.” Hypothetically, when a person gets sick, these cellular recordings could one day be played back to figure out what happened and, ideally, how to make that person better. It is analogous to an airplane's black box, which holds data that is used in the event of a crash.

These biological black boxes are not possible yet. What is possible is storing and retrieving information in DNA. Yes, sometime soon you may be storing and accessing big data not on a hard drive but in DNA, a massive improvement over how we store data today.

To understand why, let’s look at how DNA and our current digital storage systems work.

DNA is a chemically encoded storage structure that contains the blueprint for nearly all living cells. Encoded in DNA are the directions for traits as diverse as a person’s hair colour, the scent of a rose or the way bacteria break down food. In other words, DNA is Mother Nature’s hard drive that stores all information required for a living organism to grow and replicate.

Today’s digital storage systems are binary-based and use only the numerals 0 and 1 to encode data. The data string of 0s and 1s is divided into blocks of a defined size. These individual blocks are saved wherever space is available on a physical medium, not necessarily in order. Each block carries information on its location in the string so the string of data can be correctly reassembled.

Instead of 0s and 1s, DNA relies entirely on the pairing of four different bases: adenine, thymine, guanine and cytosine, better known as A, T, G and C. The entire sequence combines strands of these bases into one genome. Different segments of the genome contain unique information. Just as a digital data file can be reassembled, edited and then saved again, DNA genomes can be split, cut and extended with additional strands.
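To make the comparison concrete, here is a minimal sketch of how binary data could be written with four letters instead of two digits. With four bases, each base can stand for two bits. The particular mapping below (A=00, C=01, G=10, T=11) and the function names are assumptions chosen for illustration; real DNA storage schemes use more elaborate encodings.

```python
# Hypothetical mapping: each base stands for two bits.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def bytes_to_dna(data: bytes) -> str:
    """Translate bytes into a string of bases, two bits per base."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def dna_to_bytes(strand: str) -> bytes:
    """Reverse the translation: bases back to bits, bits back to bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


strand = bytes_to_dna(b"Hi")
print(strand)                # CAGACGGC
print(dna_to_bytes(strand))  # b'Hi'
```

Under this assumed mapping, every byte of a file becomes exactly four bases, and the translation can be reversed without loss.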

So why is storing data in DNA better? One of the main weaknesses of digital data storage is that the physical medium deteriorates over time and can reliably store information for only a decade or two. DNA, by contrast, is copied with very high fidelity as cells reproduce, and DNA kept under the right conditions can keep data safe for thousands of years.

The cost of sequencing, or reading, DNA is dropping fast. The Human Genome Project, which started in 1990 and was completed in 2003, cost about US$2.7 billion. Illumina, the largest maker of DNA sequencers, notes that the cost of sequencing a complete human genome is now approximately US$1,000.

Last year, researchers at Columbia University and the New York Genome Center developed a technique capable of storing 215 petabytes (215 million gigabytes) of data in one gram of synthetic DNA. A more relatable analogy comes from John Markoff, retired New York Times technology journalist. He reported in 2015 that DNA can store “all of the world’s digital information in roughly nine litres of solution or about the amount of liquid in a case of wine.”

This vast storage capacity is achieved by splitting the binary code into short strings. These are then translated into base pair sequences referred to as droplets. Each droplet contains an identifier in the sequence that tells the researchers where it fits when reassembling the file.
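The idea of labelled droplets can be sketched in a few lines of code. This is a simplified illustration, not the researchers' actual method: it assumes a plain sequence number as the identifier, a two-bits-per-base mapping (A=00, C=01, G=10, T=11) and a four-byte payload per droplet. All function names and sizes here are hypothetical.

```python
import random

BASES = "ACGT"  # assumed mapping: A=00, C=01, G=10, T=11


def to_bases(data: bytes) -> str:
    """Write each byte as four bases, most significant bit pair first."""
    return "".join(
        BASES[(byte >> shift) & 0b11] for byte in data for shift in (6, 4, 2, 0)
    )


def from_bases(strand: str) -> bytes:
    """Read four bases at a time back into one byte."""
    out = bytearray()
    for i in range(0, len(strand), 4):
        byte = 0
        for base in strand[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


def make_droplets(data: bytes, payload_size: int = 4) -> list[str]:
    """Split data into short chunks and prefix each with a 2-byte identifier."""
    droplets = []
    for index, start in enumerate(range(0, len(data), payload_size)):
        chunk = data[start:start + payload_size]
        droplets.append(to_bases(index.to_bytes(2, "big") + chunk))
    return droplets


def reassemble(droplets: list[str]) -> bytes:
    """Decode every droplet, sort by identifier and join the payloads."""
    decoded = [from_bases(droplet) for droplet in droplets]
    decoded.sort(key=lambda raw: int.from_bytes(raw[:2], "big"))
    return b"".join(raw[2:] for raw in decoded)


message = b"DNA stores data"
droplets = make_droplets(message)
random.shuffle(droplets)  # droplets float in solution in no particular order
print(reassemble(droplets) == message)  # True
```

Because every droplet carries its own identifier, the order in which the strands are read back does not matter. The file can be rebuilt from a shuffled pool as long as every droplet is recovered.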

The ability to slice, label and restore DNA information is possible because of a gene-editing technology called CRISPR/Cas9. By 2015, it had been dubbed “the biggest biotech discovery of the century.” For good reason: it promises to be a game-changer in how we address challenges in healthcare, agriculture and the environment, to name just a few.

This post was originally published in CPA Magazine