2014-04-24

On Kilobytes, Megabytes, and other computer-centric factors

I started programming way back.  In those olden days I was working close to the machine - on machine language stuff (assembly languages).  Bits were important: shift left, shift right, AND/OR/XOR.  And memory pages were important too: fitting an important routine within a 256 byte page could really help performance.

These days life is different.  You allocate objects.  If you're storing a boolean, you create a boolean object.  Who knows how that's represented under the hood, but it certainly isn't represented in one bit of RAM.   Most people don't even use the bitwise operators offered through the programming languages given to them.  Sure, some do.  But most do not.

And so now we get into our prefixes: kilo, mega, giga, tera, and peta (and beyond, I suppose!)

Many people still want these prefixes to be based on powers of 2.  One kilobyte is 1024 bytes (2^10).  One megabyte is 1024*1024 bytes (2^20).  Etc.  It's an OK system, but it really makes little sense.  Why is a kilobyte 2^10?  Because, in decimal, 2^10 is the power of two closest to 1,000.  2^11 and 2^9 simply aren't as close to 1,000 as 2^10.
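
Just to hammer the point home, here's the "closest to 1,000" arithmetic as a quick C sketch (nothing assumed beyond the numbers in the paragraph above):

    #include <stdio.h>
    #include <stdlib.h>

    /* Why 2^10 got drafted as "kilo": it is the power of two nearest 1,000. */
    int main(void) {
        for (int exp = 9; exp <= 11; exp++) {
            long value = 1L << exp;                      /* 2^exp */
            printf("2^%d = %4ld, distance from 1,000 = %ld\n",
                   exp, value, labs(value - 1000));
        }
        return 0;
    }
    /* Prints:
       2^9 =  512, distance from 1,000 = 488
       2^10 = 1024, distance from 1,000 = 24
       2^11 = 2048, distance from 1,000 = 1048  */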

Anyhow, all this makes some (but not much) sense in terms of addressing RAM.  Then these same people wanted everything else related to computing to work the same way.

Disk Storage

So the programmers decided that since physical RAM layout was important, and funny math was good for it, others should follow their methods.  The programmers wanted disk drives to follow the same memory conventions.  At first there was some practicality to this: programmers wrote code to stick pages of RAM onto disks in what they called "disk sectors".  This was primarily done because it was very easy to treat a page of RAM (perhaps 2^8 bytes) as a single unit of work.  This was key because performance ruled the day with 1 MHz computers.  (By the way, the M in MHz means exactly "1,000,000".)

But over time the disk drive guys were not interested.  Sectors were a false abstraction: under the hood of the drive, sectors changed size to pack in more bytes, and low-level ECC and other techniques made the abstraction pointless.  Furthermore, programmers were no longer dumping pages of RAM to disk; they just wanted to store files in a file system.

And so the drive guys started to sell disks using normal base-10 units.  A 100 MB drive means 100 million bytes.  This was convenient for a lot of programmers, because most had left base-2 mathematics behind when higher-level languages became practical.  Before long, if a programmer said there were 1K rows in a database, they meant a normal thousand and not 1,024.

And that was the start of the first war.  Programmers screamed at the drive guys for abandoning their "base 2" convention.  The programmers still wanted 1 MB of disk storage to mean 1024*1024.  But why?  Programmers were no longer worrying about pages of RAM and sector sizes.  Those same programmers also complained when they got less storage than the box advertised, thanks to the overhead of how a file system actually works.  And those same programmers were creating a substantial "object" just to store a 3-character string.  Talk about babies: they couldn't even appreciate a file system.  They just liked their silly "my way or it's wrong" math, despite the fact that their way no longer had a purpose.

Let me give you a practical example.  Let's say you are dumping 1 billion records onto a disk, and each record is 40 bytes long.  Quick: do you have enough room if you have 38.1 GB free?  WHO KNOWS!  Because bonehead, holier-than-thou programmers who never shifted a register on any CPU wanted to confuse everyone.
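
Here's that ambiguity spelled out as a little C sketch (the 38.1 figure and the 40-byte records are just the example above; nothing else is assumed):

    #include <stdio.h>

    /* 1 billion 40-byte records: do they fit in "38.1 GB" of free space?
       The answer depends entirely on which GB you mean. */
    int main(void) {
        double needed    = 1e9 * 40.0;                        /* 40,000,000,000 bytes */
        double gb_base10 = 38.1 * 1e9;                        /* 38,100,000,000 bytes */
        double gb_base2  = 38.1 * 1024.0 * 1024.0 * 1024.0;   /* ~40,909,563,494 bytes */

        printf("needed:            %.0f bytes\n", needed);
        printf("38.1 GB (base 10): %.0f bytes -> %s\n",
               gb_base10, gb_base10 >= needed ? "fits" : "does NOT fit");
        printf("38.1 GB (base 2):  %.0f bytes -> %s\n",
               gb_base2, gb_base2 >= needed ? "fits" : "does NOT fit");
        return 0;
    }

Same "38.1" on the free-space line, opposite answers, depending on whose math you believe.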

Networking

Throughout all this, the network guys were not interested in this "new math".  They did things in bits per second.  Bytes?  No way!  A "byte" was an ambiguously-sized number of bits, so they smartly renamed a collection of 8 bits an "octet".  Kilo?  To a network guy, that meant 1,000.  Nothing else.  Mega?  1,000,000.  100 Megabits per second meant 100,000,000 bits in one second.  And it still does to a network guy.
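
The network guy's arithmetic is dead simple, which is the whole point.  A quick C sketch (the only assumption here is the usual 8-bit octet):

    #include <stdio.h>

    /* Network units: kilo = 1,000 and mega = 1,000,000.  Period. */
    int main(void) {
        long bits_per_second   = 100L * 1000000L;      /* 100,000,000 bits/s  */
        long octets_per_second = bits_per_second / 8;  /* 12,500,000 octets/s */

        printf("100 Mb/s = %ld bits/s = %ld octets/s\n",
               bits_per_second, octets_per_second);
        return 0;
    }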

But then the uncultured programmers got in there with their stupid math and wrote some software using weird rules and confused everyone.  They started to apply their way to other realms for NO REASON.

What does 100 MB/second mean?
  • Normal Person: 100,000,000 bytes in one second (100 * 1,000,000)
  • Networking Person: 100,000,000 bytes in one second (100 * 1,000,000)
  • Programmer:
    • Normal: 100,000,000 bytes in one second (100 * 1,000,000)
    • Very Stupid: 104,857,600 bytes in one second (100 * (1024*1024))
    • Very Very Stupid: 102,400,000 bytes in one second (100 * (1024*1000))
Unfortunately, most programmers are at least "Very Stupid".
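
If you want the three readings side by side, here's a throwaway C sketch of the list above (no new numbers, just the same multiplications):

    #include <stdio.h>

    /* Three ways "100 MB/second" gets interpreted, per the list above. */
    int main(void) {
        long normal      = 100L * 1000L * 1000L;   /* 100,000,000 bytes/s */
        long very_stupid = 100L * 1024L * 1024L;   /* 104,857,600 bytes/s */
        long very_very   = 100L * 1024L * 1000L;   /* 102,400,000 bytes/s */

        printf("normal / networking: %ld bytes/s\n", normal);
        printf("very stupid:         %ld bytes/s\n", very_stupid);
        printf("very very stupid:    %ld bytes/s\n", very_very);
        printf("spread: %ld bytes/s of pure confusion\n", very_stupid - normal);
        return 0;
    }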

These same annoying programmers no longer use the bin/oct/hex functions of their HP16C.  In fact, I'd say most of them wouldn't be able to use an HP16C to add two hex numbers together.

Conclusion

It's time to give up the obsolete base-2 notion of kilo, mega, and giga.  If you really love powers of two, use them explicitly like a REAL tech expert would.  My laptop has 2^33 addresses of active RAM.  Now, how many 2^8-byte pages of RAM fit into that address space?  Comment with your simple assembly language program that calculates this number (any architecture).
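
(If assembly isn't handy, the arithmetic itself is a single right shift.  A C sketch using the 2^33 and 2^8 figures above, for anyone who wants to check their answer:)

    #include <stdio.h>

    /* How many 2^8-byte pages fit in a 2^33-byte address space?
       One shift: 2^33 >> 8 = 2^25. */
    int main(void) {
        unsigned long long addresses = 1ULL << 33;   /* 8,589,934,592 bytes */
        unsigned long long page_size = 1ULL << 8;    /* 256 bytes           */
        unsigned long long pages     = addresses >> 8;

        printf("%llu / %llu = %llu pages (2^25)\n", addresses, page_size, pages);
        return 0;
    }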
