The Case of the Missing Gigabytes

You buy a nice, shiny 500 GB hard drive. But to your horror, you realize that its actual capacity is only 465 GB. Where did 35 GB worth of space go? Your friend claims that it's used by "formatting". But formatting can't really take that much space... or can it?

The Fine Print

"1 GB = 1 billion bytes. Actual formatted capacity will be less."

Nearly all hard drives have this marking somewhere on the box, or even on the drive itself. This is actually the key to the case of the missing gigabytes. Notice that it a) defines what a "gigabyte" (GB) is, and b) adds a disclaimer that the actual formatted capacity will be less.

Formatting

A hard drive is made up of sectors. The number and layout of these sectors are fixed by a low-level format, which in modern drives is done at the factory. The "formatting" you do when you buy a new hard drive is actually the process of a) defining partitions in the MBR or GPT and b) creating an empty file system.

The file system is the abstraction layer between the "files" you see and the physical sectors of the drive, each of which is typically 512 bytes in size. It stores the name and position of every file. Obviously, some space is taken up by this "metadata", and a little more by the partition table and MBR.

However, modern hard drives are sold with their low-level-formatted capacity advertised in terms of 1 GB = 1 billion bytes. "Formatting" takes up negligible space.

What's a Gigabyte?

If formatting doesn't actually take up all of that space, what does? The answer lies in the manufacturer's definition of a "gigabyte".

The SI gigabyte is defined as exactly 1 billion bytes, and this is the figure that manufacturers use to advertise their drives. However, many operating systems (Windows among them) use the binary definition: 1 KB = 1024 B, 1 MB = 1024 KB, and 1 GB = 1024 MB. Thus, 1 GB = 1024 * 1024 * 1024 bytes = 1,073,741,824 bytes as far as such a system is concerned.

Do the math:
500 * 1000 * 1000 * 1000 bytes = X * 1024 * 1024 * 1024
X = 500 * ((1000 * 1000 * 1000) / (1024 * 1024 * 1024))
X = 500 * 0.931322574615478515625
X = 465.66128...
Well, whaddayaknow... 465 GB. (To be precise, Disk Management reports my "500 GB" drive as having a capacity of 465.76 GB, which works out to slightly more than the advertised 500 billion bytes if you do the math!)
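
If you'd rather let the computer do the arithmetic, here is a minimal Python sketch of the same conversion. The function names are made up for illustration; the only inputs are the two definitions of a gigabyte given above.

```python
SI_GB = 1000 ** 3   # bytes in an SI gigabyte (what the box uses)
GIB = 1024 ** 3     # bytes in a binary gigabyte (what the OS may use)


def advertised_gb_to_gib(advertised_gb):
    """Convert an advertised (SI) capacity to binary gigabytes."""
    return advertised_gb * SI_GB / GIB


def gib_to_advertised_gb(reported_gib):
    """Convert a binary capacity, as reported by the OS, back to SI gigabytes."""
    return reported_gib * GIB / SI_GB


# The drive from the example above: 500 GB on the box.
print(f"{advertised_gb_to_gib(500):.2f}")     # 465.66

# Going the other way with the 465.76 figure from Disk Management.
print(f"{gib_to_advertised_gb(465.76):.2f}")  # 500.11
```

The second result is why the drive turns out to hold slightly more than the 500 billion bytes printed on the box.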

Because terms like "kilobyte", "megabyte", and "gigabyte" are ambiguous (each can be read as a multiple of 1000 in SI or of 1024 in binary), a new set of terms was developed for the binary amounts. Here they are:

Common term      SI amount      Binary amount    Binary term
Kilobyte (KB)    1000 bytes     1024 bytes       Kibibyte (KiB)
Megabyte (MB)    1000 SI KB     1024 KiB         Mebibyte (MiB)
Gigabyte (GB)    1000 SI MB     1024 MiB         Gibibyte (GiB)
Terabyte (TB)    1000 SI GB     1024 GiB         Tebibyte (TiB)
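
To see the two columns side by side for a real number, here is a small sketch that prints a byte count using either set of units. The helper name format_bytes and its binary flag are my own invention for this example, not something from a standard library.

```python
SI_UNITS = ["B", "KB", "MB", "GB", "TB"]
BINARY_UNITS = ["B", "KiB", "MiB", "GiB", "TiB"]


def format_bytes(num_bytes, binary=False):
    """Format a byte count with SI (1000) or binary (1024) units."""
    step = 1024 if binary else 1000
    units = BINARY_UNITS if binary else SI_UNITS
    value = float(num_bytes)
    # Move up one unit at a time until the value fits below one step.
    for unit in units[:-1]:
        if value < step:
            return f"{value:.2f} {unit}"
        value /= step
    # Anything left over is expressed in the largest unit we know.
    return f"{value:.2f} {units[-1]}"


capacity = 500 * 1000 ** 3  # the "500 GB" drive, in bytes
print(format_bytes(capacity))               # 500.00 GB
print(format_bytes(capacity, binary=True))  # 465.66 GiB
```

Same drive, same number of bytes; only the label changes.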

Thus, a more accurate label for a "500 GB" drive is "465 GiB". The "missing gigabytes" never existed in the first place, and this is just one way for manufacturers to inflate the apparent capacity of their drives.
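
One consequence of the table that's easy to miss: since every step up multiplies by 1000 on one side and 1024 on the other, the gap widens with each prefix. A quick sketch (function names are illustrative only) that works out the shortfall at each level:

```python
PREFIXES = ["kilo/kibi", "mega/mebi", "giga/gibi", "tera/tebi"]


def si_to_binary_ratio(power):
    """Size of an SI unit relative to its binary counterpart (power 1 = kilo)."""
    return 1000 ** power / 1024 ** power


def shortfall_percent(power):
    """How much smaller the SI unit is than the binary one, in percent."""
    return (1 - si_to_binary_ratio(power)) * 100


for power, name in enumerate(PREFIXES, start=1):
    print(f"{name}: SI unit is {shortfall_percent(power):.1f}% smaller")
```

At the giga level the ratio is the 0.9313... factor from the calculation earlier, and at the tera level the shortfall is larger still.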

Conclusion

So where did the missing gigabytes go? Were they used by formatting? Nope. Did the hard drive manufacturer lie? Nope, but they did exploit the ambiguous definition of "gigabyte". What can you do about it? Nothing, really; just be aware of the difference between advertised capacity and actual capacity.

Note that different operating systems label things differently. Mac OS X 10.6 actually uses the SI definition of a gigabyte, so your 500 GB (advertised) drive will be listed as "499 GB" or even "500 GB". Windows uses the term "gigabyte" to mean a gibibyte, and likewise for the other units. Linux and Unix usage varies, but GNOME uses the marking "GiB" to avoid ambiguity.

Posted on Saturday, September 11, 2010 at 10:08 PM