Hashing is not encryption

01 Sep, 2025

I'm crazy.
I think I'm crazy.
I think I think I'm crazy.
...
What rabbit hole am I descending into?

Introduction

Let’s clear up a common confusion: hashing and encryption are not the same. They’re somewhat related, sure (both shuffle data around with math; that’s about it), but mistaking one for the other leads to bad assumptions and even worse code.

My goal here is simple: to give you a feel for what hashing actually is, without drowning you in algorithms or cryptography theory. There will be a sprinkle of math, but it’s optional. If math isn’t your thing, skip ahead: the plain-English explanation will carry you through.

(BTW: I am meticulous in my word choice: I don’t use words such as literally or infinite in a hyperbolic way.)

What is encryption?

Most people have a decent idea of what encryption is: it transforms a message so that only the intended receiver can reverse it and read the content.

In the old days, encryption could be simple: every A in a message might be replaced by a B, every B by a C, etc., and Z would wrap around to A. This is known as the Caesar cipher, named after Julius Caesar, who allegedly used it to send secret military messages.

Zpv upp, Csvuvt?

This is an extremely simple example of a substitution cipher, so simple that it is useless in practice. Its protection hinges entirely on keeping the algorithm secret. Once that algorithm is known, decryption is trivial. And the algorithm can usually be discovered just by examining a long enough encrypted message and spotting patterns, such as which letters occur most often.
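Here is a minimal Python sketch of that shift-by-one cipher, in case you want to play with it. The function name `caesar` and the idea of passing the shift as a parameter are my own choices for illustration; only ASCII letters are shifted, everything else passes through untouched.

```python
import string

def caesar(text: str, shift: int) -> str:
    """Shift each ASCII letter by `shift` positions, wrapping around the alphabet."""
    result = []
    for ch in text:
        if ch in string.ascii_uppercase:
            result.append(chr((ord(ch) - ord("A") + shift) % 26 + ord("A")))
        elif ch in string.ascii_lowercase:
            result.append(chr((ord(ch) - ord("a") + shift) % 26 + ord("a")))
        else:
            result.append(ch)  # spaces and punctuation pass through unchanged
    return "".join(result)

ciphertext = caesar("You too, Brutus?", 1)   # encrypt: shift forward by one
plaintext = caesar(ciphertext, -1)           # decrypt: shift back by one
print(ciphertext)                            # Zpv upp, Csvuvt?
print(plaintext)                             # You too, Brutus?
```

Since there are only 25 useful shifts, an eavesdropper can simply try them all, which is why knowing the algorithm is enough to break it.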

A more secure approach is to use keys, which are essentially machine-usable passwords. The message is encrypted with a key and decrypted with the same key.

Actually, nowadays, keys can come in pairs: one for encryption, another related one for decryption. But this post isn’t about cryptography, so…

With a key, someone who manages to intercept the message won't be able to decrypt it even if they know the algorithm that was used. They also need the key.

The point of this short introduction is to highlight that encryption can always be undone, assuming, of course, that the algorithm and key(s) are known.

Encryption is therefore reversible.

message = decrypt( encrypt(message, enc_key), dec_key )

Encrypt a message with an encryption key, then decrypt the result with the corresponding decryption key, and you get the original message back.

This is what encryption and decryption are for. Encryption without decryption is worse than useless.
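To make the round trip concrete, here is a toy sketch in Python. It is not a real cipher: it just XORs the message with a repeating key, which is easy to break. The names `xor_with_key`, `encrypt`, `decrypt` and the key value are hypothetical, and in this symmetric toy the encryption key and decryption key are one and the same.

```python
from itertools import cycle

def xor_with_key(data: bytes, key: bytes) -> bytes:
    """XOR every byte of `data` with the repeating `key` (toy cipher, not secure)."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))

# For this symmetric toy, encrypting and decrypting are the same operation.
encrypt = decrypt = xor_with_key

message = b"Hashing is not encryption"
key = b"hypothetical-key"

ciphertext = encrypt(message, key)
assert ciphertext != message
assert decrypt(ciphertext, key) == message           # the round trip restores the message
assert decrypt(ciphertext, b"wrong-key") != message  # without the right key it does not
```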

What is a hash function?

A hash function takes as input any (finite) sequence of bits (e.g., a file) and produces a fixed-length number (the digest), regardless of input size.

How long that digest is depends on the algorithm. Here’s a short list of well-known hash functions and their digest sizes:

Hash function	Digest size (bits)
MD5	128
SHA-1	160
SHA-224	224
SHA-256	256
SHA-384	384
SHA-512	512
SHA3-224	224
SHA3-256	256
SHA3-384	384
SHA3-512	512

Digests are almost always shown as hexadecimal numbers. For example, if the message is Hello, World!:

MD5( Hello, World! ) = 65a8e27d8879283831b664bd8b7f0ad4
SHA-1( Hello, World! ) = 0a0a9f2a6772942557ab5355d76af442f8f65e01

Each hex character represents 4 bits. The MD5 digest above has 32 hex characters; 32 × 4 = 128 bits, matching the number in the table. I’ll leave the SHA-1 calculation as an exercise for the reader.
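If you'd rather let a computer do the counting, here is a small sketch using Python's standard `hashlib` module. It recomputes the two digests above and derives each digest size from the length of its hex string. I'm assuming a regular Python build here; some locked-down builds disable MD5.

```python
import hashlib

message = b"Hello, World!"

md5_hex = hashlib.md5(message).hexdigest()
sha1_hex = hashlib.sha1(message).hexdigest()
print(md5_hex)
print(sha1_hex)

# Each hex character encodes 4 bits, so length * 4 gives the digest size in bits.
for name in ["md5", "sha1", "sha224", "sha256", "sha384", "sha512",
             "sha3_224", "sha3_256", "sha3_384", "sha3_512"]:
    hex_digest = hashlib.new(name, message).hexdigest()
    print(f"{name:9} {len(hex_digest) * 4:4} bits")
```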

The maximum number of distinct digests a hash function can produce is 2 raised to the power of the digest size in bits. That’s a huge (like, way bigger than your npm dependencies folder) number.

MD5 has a digest size of 128 bits, allowing for 2¹²⁸ distinct digests. To get a feel for how huge that number is, imagine trying to count 1, 2, 3... all the way up to 2¹²⁸. Even if you could rattle off a billion numbers per second, swallowed some immortality pills, and kept going non-stop until the Sun begins to die 5 billion years from now, you’d still only have reached about 1.6×10²⁶.
That means you'd be only about 0.000000000046% of the way to 2¹²⁸.
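For the skeptics, the arithmetic behind those two numbers can be checked in a few lines of Python. The only assumption of mine is the length of a year (365.25 days); the rest comes straight from the paragraph above.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60   # about 3.16e7

# numbers per second * years * seconds per year
counted = 10**9 * 5 * 10**9 * SECONDS_PER_YEAR
digests = 2**128

print(f"counted:  {counted:.2e}")
print(f"2^128:    {digests:.2e}")
print(f"progress: {counted / digests * 100:.2e} %")
```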

In practice, it looks as though each message has a unique digest. This illusion is by design, and it probably fuels the confusion that digests are "encrypted versions" of the message. Even the word digest (Merriam-Webster: a summation or condensation of a body of information) doesn’t help: it suggests the meaning of the message might still be "in there."

When I learned about hashing back in the 1990s I read an article claiming MD5 could uniquely identify every sentence ever written or spoken in English. That might still be true, but MD5 is now considered broken: researchers can deliberately construct different inputs that share the same MD5 digest. Don’t use it for anything security-related. (Note: the article did not claim "every possible sentence in English" of which there are infinitely many!)

Let’s clarify with a thought experiment.

I'm crazy.
I think I'm crazy.
I think I think I'm crazy.

(I considered handsome, but that would have been proof I’m crazy.)

By prepending I think to the previous sentence, you can generate infinitely many unique sentences. Each of these can be hashed.

In math, the set of possible inputs to a function is its domain. The set of possible outputs is the codomain (for hashes, usually called the digest space). For a 128-bit hash, the digest space contains 2¹²⁸ values: huge, but finite.

Now imagine hashing one more than 2¹²⁸ of those I think I’m crazy sentences. By necessity, at least two of them must share the same digest. This is the Pigeonhole Principle: if you have more pigeons than holes, at least one hole holds more than one pigeon.

When two different inputs yield the same digest, that’s called a collision.

Because:

the input space is literally infinite
the digest space is finite
anything finite is negligible compared to infinity

collisions are inevitable.
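Nobody can afford to hash 2¹²⁸ + 1 sentences, but the pigeonhole argument is easy to watch on a shrunken scale. The sketch below is my own toy setup, not a real hash function: `tiny_hash` keeps only the first 16 bits of a SHA-256 digest, so there are just 65,536 pigeonholes, and `find_collision` keeps prepending "I think" until two sentences land in the same hole.

```python
import hashlib

def sentence(n: int) -> str:
    """The crazy sentence with `n` copies of 'I think' in front."""
    return "I think " * n + "I'm crazy."

def tiny_hash(text: str, bits: int = 16) -> int:
    """Hypothetical toy hash: the first `bits` bits of the SHA-256 digest."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") >> (256 - bits)

def find_collision(bits: int = 16) -> tuple:
    """Hash ever-longer sentences until two of them share a toy digest."""
    seen = {}                        # toy digest -> n of the first sentence with it
    for n in range(2**bits + 1):     # pigeonhole: this many inputs must collide
        d = tiny_hash(sentence(n), bits)
        if d in seen:
            return seen[d], n
        seen[d] = n
    raise AssertionError("unreachable: more inputs than possible digests")

i, j = find_collision()
print(f"sentences {i} and {j} share toy digest {tiny_hash(sentence(i)):#06x}")
```

In practice the loop stops long before the pigeonhole limit, because collisions among random-looking values tend to show up early.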

In fact, with infinitely many inputs and only finitely many digests, at least one digest must have literally infinitely many inputs that map to it, and a well-designed hash function spreads its inputs so evenly that the same is expected to hold for every digest it can produce. Which means that one of those ...I think I'm crazy variations could hash to the same digest as the original script of Shakespeare's Julius Caesar.

And that’s the crucial difference from encryption: encryption is always reversible if you know the key, but hashing isn’t. Since infinitely many inputs collapse to the same digest, there can never be an inverse function that maps a digest back to a specific input. That’s why hashing is not a form of encryption.

Properties of cryptographic hash functions

All cryptographic hash functions share these properties:

Fast to compute. Given a message, producing the digest is quick (though still dependent on input size).
Deterministic. The same message always produces the same digest with the same function.
Avalanche effect. Changing even one bit of input changes the digest beyond recognition (on average, about half of the digest's bits flip), hiding patterns.
One-way. Given a digest, it’s computationally infeasible to reconstruct the original message. No inverse function exists.
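Two of these properties, determinism and the avalanche effect, are easy to observe for yourself. This is a small sketch using SHA-256 from Python's `hashlib`; the helper name `sha256_int` is mine. It flips a single input bit and counts how many of the 256 digest bits change.

```python
import hashlib

def sha256_int(data: bytes) -> int:
    """SHA-256 digest of `data`, as one big integer for easy bit counting."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big")

original = bytearray(b"Hello, World!")
flipped = bytearray(original)
flipped[0] ^= 0b00000001          # flip a single bit of the first byte

# Deterministic: hashing the same bytes twice gives the same digest.
assert sha256_int(bytes(original)) == sha256_int(bytes(original))

# Avalanche: count how many of the 256 digest bits differ after a one-bit change.
diff = sha256_int(bytes(original)) ^ sha256_int(bytes(flipped))
differing_bits = bin(diff).count("1")
print(f"{differing_bits} of 256 bits differ")
```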

What can hashes be used for?

There are many uses for hashes. Here are two (categories):

Fingerprints

Say you collect pictures: cats, astronomy, whatever. To avoid duplicates, you can’t just compare filenames (cute_cat.jpg appears often). Comparing each new file with each file already in your collection will take a lot of time (an optimization is to only compare between files of equal size).

A better solution:

Compute the file’s hash.
Check if that hash already exists in a list of hashes of pictures you already own.
- If yes: reject/delete the new file.
- If no: add the hash to your list.

Linux bash script example:

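#!/bin/bash
# Pass one file as $1. Its SHA-256 digest is appended to $DB unless already listed.
DB=hashes.txt
# sha256sum prints "<digest>  <filename>"; awk keeps only the digest.
h=$(sha256sum "$1" | awk '{print $1}')
# grep -q: quiet, -x: match the whole line, -F: fixed string (no regex).
grep -qxF "$h" "$DB" 2>/dev/null || echo "$h" >> "$DB"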

In practice, when hashes match, you should confirm the files are truly identical -- just to be safe.
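Here is one way that "hash first, then confirm" idea could look in Python. Treat it as a sketch: the function names and the in-memory `collection` dictionary are my own invention, and a real tool would persist the digests somewhere.

```python
import filecmp
import hashlib
from pathlib import Path

def file_digest(path: Path) -> str:
    """SHA-256 of a file, read in chunks so large files need little memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def is_duplicate(candidate: Path, collection: dict) -> bool:
    """Return True if `candidate` matches a known file; otherwise record it.

    `collection` maps digest -> path of the first file seen with that digest.
    """
    digest = file_digest(candidate)
    known = collection.get(digest)
    # Same digest is a strong hint; the byte-for-byte comparison confirms it.
    if known is not None and filecmp.cmp(known, candidate, shallow=False):
        return True
    collection.setdefault(digest, candidate)
    return False
```

The expensive byte-for-byte comparison only runs when the digests already match, so it stays rare.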

This only works for bit-for-bit identical files. Resizing an image or even flipping only one pixel will yield a completely different hash (thanks, avalanche effect!).

The bottom line: a hash identifies a sequence of bits, much like your fingerprint identifies you.

One-way functions (no inverse)

When you build a website where users log in with a username and password, it’s a bad idea to store the password exactly as the user typed it. If the site is ever compromised, attackers could read all the passwords in plain text. A better approach is to store only the password’s digest. That way, the actual password remains hidden.

Of course, this means that during login you can’t just compare the password directly. Instead, you hash the password entered by the user and compare that to the stored hash. If they match, you can be reasonably confident the user entered the correct password.

This use case relies on the absence of an inverse for hash functions: given a hash of a password, you cannot recover the original password. And a stolen hash is useless for login too: type it in as the password and the site hashes it again, and hash(hash(password)) ≠ hash(password).

In 2011, a popular Dutch dating site was "hacked" and its user database was published online. The developers had hashed the passwords, but they did it wrong. I downloaded the table and sorted it by how often each digest (= hashed password) occurred in the table. Some digests appeared many times, which suggested that some users had chosen very simple passwords. Sure enough, when I tried those accounts with the password 123456, I got in. As any decent person would, I logged out immediately. No snooping!

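The proper way to store passwords is not only to hash them, but also to add a random salt: a random value, different for every user, that is mixed into the password before hashing and stored next to the digest, so that identical passwords no longer produce identical digests. It’s essential! And in practice, always use a well-tested library in your programming language of choice to handle password hashing for you.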
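To show what a salt buys you, here is a sketch built on PBKDF2 from Python's standard library. I picked PBKDF2 only because it ships with Python; the function names, the 16-byte salt, and the iteration count are my assumptions, not a recommendation for your site. It illustrates the idea and is no substitute for a dedicated password-hashing library.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 200_000  # assumed work factor for this sketch; tune it for real use

def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest) for storage; the salt is random per user."""
    salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest

def verify_password(password: str, salt: bytes, stored: bytes) -> bool:
    """Re-hash the login attempt with the stored salt and compare."""
    attempt = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(attempt, stored)   # constant-time comparison

# Unsalted: two users with the same password get the same digest.
assert hashlib.sha256(b"123456").digest() == hashlib.sha256(b"123456").digest()

# Salted: the same password yields different stored digests.
salt_a, digest_a = hash_password("123456")
salt_b, digest_b = hash_password("123456")
assert digest_a != digest_b
assert verify_password("123456", salt_a, digest_a)
assert not verify_password("654321", salt_a, digest_a)
```

With salts in place, sorting the table by digest frequency no longer reveals which users share a password.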

"Objection!"

As I write this, I can almost hear readers protesting: "But I’ve been on a site where I could enter a digest and it gave me the original message!"

Yes, sites like that exist -- but they are not implementing inverses of hash functions. What they actually do is build a massive list of words and phrases, run them through the hash functions they support, and store the results in a big lookup table:

message	MD5	SHA-1
abbrev	`a8535dd69eea7328daee4b1edb3c5fc9`	`770b7e96e3c50fed56ac2afeda4a6ece3dbd98a8`
...	...	...
Xyzzy	`56f2d4d0b97e43f94505299dc45942a1`	`735257c757150678b0ed75503820cd0a2451ae82`

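This is often loosely called a rainbow table. Strictly speaking, a rainbow table is a more compact structure that stores chains of hashes to save space, but the principle is the same: the work is precomputed.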

When you enter a digest on one of those sites, it simply looks it up in the table and shows the corresponding message. That’s not "inverting" the hash; it’s just a precomputed dictionary attack.
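A miniature version of such a site fits in a dozen lines of Python. The word list and the function name `reverse_lookup` are made up for illustration; the real thing differs mainly in scale.

```python
import hashlib

# Hypothetical word list; real sites precompute vastly more entries.
WORDS = ["abbrev", "password", "123456", "Xyzzy"]

# Precompute digest -> message once, for each supported hash function.
table = {}
for word in WORDS:
    for algo in ("md5", "sha1"):
        table[hashlib.new(algo, word.encode("utf-8")).hexdigest()] = word

def reverse_lookup(digest: str):
    """Return the message if its digest was precomputed, else None."""
    return table.get(digest.lower())

print(reverse_lookup(hashlib.md5(b"Xyzzy").hexdigest()))        # found: it is in the word list
print(reverse_lookup(hashlib.md5(b"not in list").hexdigest()))  # None: never precomputed
```

Notice that `reverse_lookup` never computes anything from the digest. It can only return messages that someone hashed in advance.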

If a true inverse for a hash function existed, it would work for any digest, not just ones that happen to be in a lookup table. To see that these sites have no such inverse, here’s a quick experiment you can try yourself.

We’ll use a Universally Unique Identifier (UUID), or Globally Unique Identifier (GUID), as Microsoft calls them.

Fun fact: this is one of the rare occasions where Microsoft’s naming is more modest. My theory: their marketing team just couldn’t come up with a word bigger than "universal."

A UUID is a 128-bit identifier designed to be unique in space and time. Sounds Sci-Fi, but all it means is that when you generate a UUID, it’s (for all practical purposes) guaranteed never to have been generated before, and never to be generated again. We’ll feed such a UUID into a hash function.

Here’s how to generate a UUID and hash it with SHA-256:

Linux

uuidgen | sha256sum

macOS

uuidgen | shasum -a 256

Windows 11 (PowerShell)

[BitConverter]::ToString([System.Security.Cryptography.SHA256]::Create().ComputeHash([System.Text.Encoding]::UTF8.GetBytes([guid]::NewGuid().ToString()))) -replace '-', ''
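On any platform with Python installed, something like the following should do the same job. It is a sketch using the standard `uuid` and `hashlib` modules; note that it hashes the UUID text without a trailing newline, so for the same UUID its digest would differ from what the `uuidgen |` pipelines above print.

```python
import hashlib
import uuid

identifier = str(uuid.uuid4())    # random (version 4) UUID as 36-character text
digest = hashlib.sha256(identifier.encode("utf-8")).hexdigest()
print(identifier)
print(digest)
```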

Now copy that digest and paste it into one of those "reverse hash" sites.

It will fail.

And that’s the point: there is no inverse to a hash function. That’s why hashing is not encryption.
