How Big Is It? A Simple Guide to Understanding Massive Amounts of Data
The terms used in scientific reports about Big Data can sometimes seem like a foreign language. For example, the Large Hadron Collider in Geneva, Switzerland, generates approximately 42 terabytes of data every day. Meanwhile, the National Climatic Data Center in Asheville stores more than 6 petabytes of climate information collected from ships, buoys, weather balloons, radar systems, satellites, and computer models. According to the center's estimates, this figure was expected to reach 20 petabytes by 2020.
According to experts, storing all the information that currently exists in the world would require at least 1,200 exabytes of storage.
Understanding what these numbers actually mean is not easy. Even experts admit that these measurements are astonishing. In their 2013 book Big Data, Oxford University information law specialist Viktor Mayer-Schönberger and journalist Kenneth Cukier wrote:
“There is really no good way to imagine what this amount of data means.”
To understand these measurements, let's start with the smallest units.
A byte is the basic unit of computer memory. Storing one letter of the alphabet or a single character usually requires one or two bytes of memory.
The next unit of measurement is the kilobyte. One kilobyte equals 1,024 bytes, which is two raised to the tenth power, or 2¹⁰.
A megabyte equals 2²⁰ bytes, which is slightly more than one million bytes. This amount of memory is enough to store a short novel. An average MP3 music file occupies about 4 megabytes of storage space. A large photograph may require approximately 5 megabytes, which is also roughly the amount of storage needed for the complete works of William Shakespeare.
The next unit is the gigabyte. One gigabyte equals 2³⁰ bytes, or 1,073,741,824 bytes. This amount of memory is sufficient to store a 90-minute movie, approximately 250 songs, or the text of all the books on an 18-meter-long bookshelf. Today, many smartphones on the market are equipped with 16 gigabytes of memory or more.
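The song count is easy to check by hand. The short Python sketch below uses the binary unit sizes defined above and the 4-megabyte average MP3 size quoted earlier; the variable names are illustrative only.

```python
# Binary unit sizes as defined in the text.
MEGABYTE = 2**20
GIGABYTE = 2**30

# Figure taken from the text: an average MP3 file is about 4 megabytes.
SONG_SIZE = 4 * MEGABYTE

# How many whole songs of that size fit into one gigabyte.
songs_per_gigabyte = GIGABYTE // SONG_SIZE
print(songs_per_gigabyte)  # 256
```

The result, 256, matches the "approximately 250 songs" estimate.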
Next comes the terabyte. One terabyte equals 2⁴⁰ bytes. In 2000, scientists estimated that storing the entire printed collection of the United States Library of Congress would require approximately 10 terabytes of storage.
A petabyte equals 2⁵⁰ bytes. Roughly this much storage would be needed to hold one copy of all the printed information in the world.
There are even larger units of measurement:
Exabyte – 2⁶⁰ bytes
Zettabyte – 2⁷⁰ bytes
Yottabyte – 2⁸⁰ bytes
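Every unit in this ladder is 1,024 (2¹⁰) times larger than the one before it, so the whole scale can be generated from a list of names. The following Python sketch shows one way to do that; the helper names `unit_size` and `describe` are made up for this illustration.

```python
# Unit names in ascending order; each step up multiplies the size by 2**10.
UNITS = ["byte", "kilobyte", "megabyte", "gigabyte", "terabyte",
         "petabyte", "exabyte", "zettabyte", "yottabyte"]


def unit_size(name):
    """Return the number of bytes in one binary unit of the given name."""
    return 2 ** (10 * UNITS.index(name))


def describe(num_bytes):
    """Express a byte count in the largest unit that keeps the value >= 1."""
    index = 0
    value = float(num_bytes)
    while value >= 1024 and index < len(UNITS) - 1:
        value /= 1024
        index += 1
    return f"{value:.1f} {UNITS[index]}s"


# Print the full ladder from kilobyte to yottabyte.
for name in UNITS[1:]:
    print(f"1 {name} = 2^{10 * UNITS.index(name)} = {unit_size(name):,} bytes")

# The world's estimated 1,200 exabytes, expressed one unit higher.
print(describe(1200 * unit_size("exabyte")))  # 1.2 zettabytes
```

Seen this way, the estimated 1,200 exabytes of information in the world is a little more than one zettabyte.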
If all the information stored in the world, approximately 1,200 exabytes of data, were printed in book form, those books would completely cover the Earth with a layer 52 books thick.
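Another way to get a feel for these numbers is to compare them with the figures mentioned earlier. The Python sketch below is only a back-of-the-envelope calculation: it uses the 16-gigabyte smartphone, the Large Hadron Collider's 42 terabytes per day, and the 1,200-exabyte estimate from the text, all in binary units.

```python
# Binary unit sizes as defined in the text.
GIGABYTE = 2**30
TERABYTE = 2**40
PETABYTE = 2**50
EXABYTE = 2**60

world_data = 1200 * EXABYTE   # estimate of all stored information (from the text)
phone = 16 * GIGABYTE         # storage of one smartphone (from the text)
lhc_per_day = 42 * TERABYTE   # daily output of the Large Hadron Collider (from the text)

# Number of 16-gigabyte phones needed to hold all of the world's data.
phones_needed = world_data // phone

# Days the collider needs to produce one petabyte at its daily rate.
days_to_petabyte = PETABYTE / lhc_per_day

print(f"{phones_needed:,}")       # 80,530,636,800
print(f"{days_to_petabyte:.1f}")  # 24.4
```

In other words, holding everything would take more than 80 billion such phones, while the collider alone fills a petabyte in a little over three weeks.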
Now, let us think about how information was stored thousands of years ago.
More than 2,000 years ago, the ancient Greeks created a magnificent library in the Egyptian city of Alexandria. This library resembled a modern university campus, with spaces for walking, thinking, reading, and discussion.
The goal of the Library of Alexandria was extraordinarily ambitious: to gather all written works that existed in the world into one place. According to various estimates, it once contained hundreds of thousands of books and manuscripts collected from different parts of the world known to the Greeks.
However, by today's standards, even this magnificent library appears very small.
If all the information that exists in the world today were distributed equally among everyone on Earth, each person would receive approximately 300 times more information than was contained in the entire Library of Alexandria.
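The per-person share can be estimated with a rough calculation. The Python sketch below assumes a world population of 7 billion, a round figure that does not come from the text, and uses the 1,200-exabyte estimate and the factor of 300 quoted above. The library size it prints is simply what those numbers imply, not an independent measurement.

```python
# Binary unit sizes as defined in the text.
MEGABYTE = 2**20
GIGABYTE = 2**30
EXABYTE = 2**60

world_data = 1200 * EXABYTE       # estimate from the text
population = 7_000_000_000        # ASSUMPTION: round figure, not from the text
library_ratio = 300               # "300 times more" from the text

# Share of the world's data per person, in bytes.
per_person = world_data / population

# Size of the Library of Alexandria implied by the 300-fold ratio.
implied_library = per_person / library_ratio

print(f"Per person: about {per_person / GIGABYTE:.0f} gigabytes")
print(f"Implied library size: about {implied_library / MEGABYTE:.0f} megabytes")
```

Under these assumptions, each person's share comes to a bit under 200 gigabytes, and the entire ancient library would fit in well under a single gigabyte.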