Understanding Hash Functions: MD5, SHA-256, and Beyond

Hash functions are among the most fundamental building blocks in computer science and information security. Every time you log into a website, verify a downloaded file, or push a commit to Git, hash functions are working behind the scenes. Despite their ubiquity, many developers use them without fully understanding how they work or when to choose one algorithm over another.

This guide takes a deep dive into cryptographic hash functions — what they are, the properties that make them useful, how the most popular algorithms compare, and the real-world scenarios where each one shines.

What Is a Hash Function?

A hash function is a mathematical algorithm that takes an input of arbitrary size and produces a fixed-size output, called a hash, digest, or checksum. The same input will always produce the same output, but even a tiny change to the input — flipping a single bit — should produce a completely different hash.

For example, here is what happens when you hash two nearly identical strings with SHA-256:

Input:  "Hello, World"
SHA-256: 03675ac53ff9cd1535ccc7dfcdfa2c458c5218371f418dc136f2d19ac1fbe8a5

Input:  "Hello, World!"
SHA-256: dffd6021bb2bd5b0af676290809ec3a53191dd81c7f70a4b28688a362182986f

Adding a single exclamation mark completely changed the output. This property is called the avalanche effect, and it is essential for security applications.
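
The avalanche effect is easy to measure yourself. The sketch below uses Python's standard hashlib module to hash the same two strings and count how many of the 256 output bits differ; the helper names sha256_hex and differing_bits are made up for this illustration.

```python
import hashlib

def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of a UTF-8 string as lowercase hex."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def differing_bits(hex_a: str, hex_b: str) -> int:
    """Count the bit positions at which two equal-length hex digests differ."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")

digest_a = sha256_hex("Hello, World")
digest_b = sha256_hex("Hello, World!")

print(digest_a)
print(digest_b)
# For a well-behaved 256-bit hash, roughly half of the bits should differ.
print(differing_bits(digest_a, digest_b), "of 256 bits differ")
```

If the count were consistently far from half, that would hint at structure an attacker could exploit, which is why a strong avalanche effect matters.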

Key Properties of Cryptographic Hash Functions

Not all hash functions are created equal. A cryptographic hash function must satisfy several specific properties to be considered secure:

Deterministic: The same input always produces the same output. There is no randomness involved. This is what makes hashes useful for verification — you can independently compute the hash and compare.
Fast to compute: Generating the hash for any input should be efficient. However, for password hashing specifically, you actually want the function to be deliberately slow (more on this later).
Pre-image resistance: Given a hash output, it should be computationally infeasible to find an input that produces that hash. In other words, you cannot reverse the function.
Second pre-image resistance: Given an input and its hash, it should be infeasible to find a different input that produces the same hash.
Collision resistance: It should be infeasible to find any two different inputs that produce the same hash. Because hash outputs are fixed-size but inputs are unlimited, collisions must mathematically exist — the goal is to make finding them practically impossible.
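
To get a feel for why output size matters for collision resistance, here is a minimal sketch that deliberately weakens SHA-256 by keeping only its first 24 bits and then searches for a collision. The truncated_hash function and the 24-bit size are assumptions chosen purely so the search finishes quickly; nothing here attacks full SHA-256.

```python
import hashlib
from itertools import count

def truncated_hash(data: bytes, n_bytes: int = 3) -> bytes:
    """Hypothetical weak hash: the first n_bytes of SHA-256 (24 bits by default)."""
    return hashlib.sha256(data).digest()[:n_bytes]

def find_collision(n_bytes: int = 3) -> tuple[bytes, bytes, int]:
    """Birthday search: hash distinct inputs until two share a truncated digest."""
    seen: dict[bytes, bytes] = {}
    for attempts in count(1):
        message = str(attempts).encode()
        digest = truncated_hash(message, n_bytes)
        if digest in seen:
            return seen[digest], message, attempts
        seen[digest] = message

first, second, attempts = find_collision()
print(f"{first!r} and {second!r} collide after {attempts} attempts")
```

With a 24-bit output, a collision typically turns up after a few thousand attempts (around 2^12). The same birthday search against a full 256-bit digest would need on the order of 2^128 attempts, which is what "practically impossible" means here.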

Comparing Popular Hash Algorithms

Let's examine the most widely used hash algorithms, their strengths, and their weaknesses:

MD5 (Message Digest Algorithm 5)

MD5 was designed by Ronald Rivest in 1991 and produces a 128-bit (16-byte) hash, typically displayed as a 32-character hexadecimal string. For over a decade, MD5 was the go-to hash function for everything from password storage to file verification.

However, MD5 is now considered cryptographically broken. In 2004, researchers demonstrated practical collision attacks, and by 2008, researchers were able to create a rogue SSL certificate authority using MD5 collisions. Today, generating an MD5 collision can be done in seconds on commodity hardware.

Should you use it? Not for anything security-related. MD5 is still acceptable as a non-cryptographic checksum for detecting accidental data corruption (such as verifying file transfers), but even there, better options exist.

SHA-1 (Secure Hash Algorithm 1)

SHA-1 produces a 160-bit hash and was published by NIST in 1995 as the successor to SHA-0. It was the most widely used hash algorithm through the 2000s, relied upon by SSL/TLS certificates, Git, and countless other systems.

SHA-1's collision resistance was theoretically broken in 2005, and a practical collision was publicly demonstrated by Google and CWI Amsterdam in the SHAttered project in 2017. That attack needed enormous computational resources (on the order of 2^63 SHA-1 computations), but the cost of such attacks keeps falling as hardware improves.

Should you use it? No, for new applications. Major browsers stopped accepting SHA-1 certificates in 2017. Git still uses SHA-1 internally (with collision detection hardening), but is transitioning to SHA-256. If you encounter SHA-1 in a legacy system, plan to migrate.

SHA-256 and SHA-512 (SHA-2 Family)

SHA-256 and SHA-512 are part of the SHA-2 family, published by NIST in 2001. SHA-256 produces a 256-bit hash, while SHA-512 produces a 512-bit hash. These are the workhorses of modern cryptography — used in TLS, code signing, blockchain (Bitcoin uses double SHA-256), and virtually every security protocol designed in the last two decades.

No practical attacks against the SHA-2 family have been published. While they are structurally similar to SHA-1 (using the Merkle-Damgård construction), the larger state size and more complex internal operations provide a substantial security margin.

Should you use them? Yes. SHA-256 is the default recommendation for most applications in 2026. SHA-512 is often faster than SHA-256 on 64-bit processors that lack dedicated SHA-256 instructions (on CPUs with SHA extensions, SHA-256 is usually the faster of the two), and it produces a longer hash, making it a good choice when you want extra security margin or are working on 64-bit-optimized systems.
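
Relative speed depends heavily on your CPU and on how your Python build links OpenSSL, so it is worth measuring rather than assuming. The following is a rough benchmark sketch; the 1 MiB payload, the repeat count, and the throughput_mib_per_s helper are arbitrary choices for illustration, and the numbers it prints will differ from machine to machine.

```python
import hashlib
import timeit

payload = b"\x00" * (1024 * 1024)  # 1 MiB of input data
repeats = 20

def throughput_mib_per_s(algorithm: str) -> float:
    """Hash the 1 MiB payload `repeats` times and return the throughput in MiB/s."""
    seconds = timeit.timeit(
        lambda: hashlib.new(algorithm, payload).digest(), number=repeats
    )
    return repeats / seconds

for name in ("sha256", "sha512"):
    print(f"{name}: {throughput_mib_per_s(name):.0f} MiB/s")
```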

SHA-3 (Keccak)

SHA-3 was selected through a public competition run by NIST and standardized in 2015. It uses a fundamentally different construction (the sponge construction) compared to SHA-1 and SHA-2, which means a breakthrough attack against SHA-2 would not automatically affect SHA-3.

Should you use it? SHA-3 is a strong choice, especially if you want defense-in-depth against potential future attacks on Merkle-Damgård constructions. In practice, SHA-256 remains more widely deployed, has better hardware acceleration support, and is sufficient for the vast majority of applications.

Algorithm Comparison at a Glance

Algorithm	Output Size	Status	Speed	Recommended?
MD5	128-bit	Broken	Very fast	No (non-security only)
SHA-1	160-bit	Broken	Fast	No
SHA-256	256-bit	Secure	Fast	Yes (default choice)
SHA-512	512-bit	Secure	Fast (64-bit)	Yes
SHA3-256	256-bit	Secure	Moderate	Yes (defense-in-depth)
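
The output sizes in the table can be checked directly with Python's hashlib. This small sketch hashes the empty input with each algorithm and reports the digest length; the algorithm names are the ones hashlib.new() accepts, and MD5 may be unavailable on FIPS-restricted builds.

```python
import hashlib

# Names as accepted by hashlib.new(); "sha3_256" is hashlib's name for SHA3-256.
algorithms = ("md5", "sha1", "sha256", "sha512", "sha3_256")

digest_bits = {}
for name in algorithms:
    digest = hashlib.new(name, b"").digest()  # hash of the empty input
    digest_bits[name] = len(digest) * 8
    print(f"{name:<9} {digest_bits[name]:>3}-bit  {len(digest) * 2} hex characters")
```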

Real-World Use Cases

Password Storage

This is one of the most important — and most frequently misunderstood — applications of hashing. When a user creates an account, the application should hash their password and store only the hash. When the user logs in, the application hashes the provided password and compares it to the stored hash. The actual password is never stored.

However, general-purpose hash functions like SHA-256 are too fast for password hashing. An attacker with a GPU can compute billions of SHA-256 hashes per second, making brute-force attacks feasible. Instead, use dedicated password hashing functions that are deliberately slow and memory-intensive:

bcrypt — The most widely deployed password hashing function. Uses a configurable cost factor to control computation time. Proven track record since 1999.
Argon2 — Winner of the Password Hashing Competition (2015). Configurable time cost, memory cost, and parallelism. The current best practice for new applications.
scrypt — Designed to be memory-hard, making hardware attacks more expensive. Used by some cryptocurrency systems.

Rule of thumb: Never use MD5, SHA-1, or even SHA-256 directly for password storage. Always use bcrypt, Argon2, or scrypt with proper salting.
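
As a rough illustration of the store-and-verify flow, here is a sketch using scrypt from Python's standard library (hashlib.scrypt, which requires a Python build linked against a recent OpenSSL). The function names and the cost parameters are assumptions for this example, not tuned recommendations; bcrypt and Argon2 need third-party packages and follow the same pattern.

```python
import hashlib
import hmac
import os

# Illustrative cost parameters (assumptions, not a tuned recommendation).
SCRYPT_PARAMS = {"n": 2**14, "r": 8, "p": 1}

def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, derived_key) for a new password, using a fresh random salt."""
    salt = os.urandom(16)
    key = hashlib.scrypt(password.encode("utf-8"), salt=salt, dklen=32, **SCRYPT_PARAMS)
    return salt, key

def verify_password(password: str, salt: bytes, expected_key: bytes) -> bool:
    """Re-derive the key with the stored salt and compare in constant time."""
    key = hashlib.scrypt(password.encode("utf-8"), salt=salt, dklen=32, **SCRYPT_PARAMS)
    return hmac.compare_digest(key, expected_key)

salt, stored_key = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, stored_key))
print(verify_password("wrong guess", salt, stored_key))
```

Only the salt and the derived key would be stored. The password itself never needs to be written anywhere.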

File Integrity Verification

When you download software, the publisher often provides a hash (usually SHA-256) of the file. After downloading, you compute the hash of your local file and compare it to the published value. If they match, you can be confident the file was not corrupted or tampered with during transfer.

This is especially important for security-sensitive downloads like operating system images, cryptographic libraries, and firmware updates. Package managers like npm, pip, and apt all use hash-based integrity checks internally to ensure you are installing exactly what you expect.

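# Verify a downloaded file on Linux (on macOS, `shasum -a 256` is the usual equivalent)
sha256sum ubuntu-24.04-desktop-amd64.iso

# Or let the tool compare against the published hash for you
# (replace expected_hash with the published value; note the two spaces before the filename)
echo "expected_hash  ubuntu-24.04-desktop-amd64.iso" | sha256sum --check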
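
The same check can be scripted when a command-line tool is not available. The sketch below reads the file in chunks so that a multi-gigabyte image never has to fit in memory; the function names and the 1 MiB chunk size are choices made for this example.

```python
import hashlib

def sha256_of_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in fixed-size chunks and return the digest as lowercase hex."""
    hasher = hashlib.sha256()
    with open(path, "rb") as handle:
        while chunk := handle.read(chunk_size):
            hasher.update(chunk)
    return hasher.hexdigest()

def matches_published_hash(path: str, published_hex: str) -> bool:
    """Compare the local digest to the publisher's value, ignoring case and whitespace."""
    return sha256_of_file(path) == published_hex.strip().lower()
```

A call such as matches_published_hash("ubuntu-24.04-desktop-amd64.iso", published_value) returns True only when every byte of the file matches what the publisher hashed.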

Digital Signatures

Digital signature schemes like RSA and ECDSA do not sign the actual document — they sign a hash of the document. This is both a performance optimization (signing a 32-byte hash is much faster than signing a 10MB document) and a security measure. The hash function must be collision-resistant; otherwise, an attacker could create two documents with the same hash and trick a signer into signing one while presenting the other as the signed document.

This is precisely why the deprecation of SHA-1 was so critical. A collision attack on the hash function undermines the entire digital signature scheme built on top of it.

Data Structures and Deduplication

Hash functions power many fundamental data structures and algorithms beyond security:

Hash tables: The backbone of dictionaries and maps in nearly every programming language. The hash function distributes keys across buckets for O(1) average-case lookup.
Content-addressable storage: Systems like Git store each object under the hash of its content (SHA-1 by default in Git). This provides automatic deduplication and integrity checking.
Bloom filters: Probabilistic data structures that use multiple hash functions to efficiently test set membership, allowing occasional false positives but never false negatives.
Merkle trees: Binary trees of hashes used in blockchain, certificate transparency, and distributed file systems to efficiently verify large datasets.
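
To make the Merkle tree idea concrete, here is a minimal sketch that reduces a list of data blocks to a single root hash. The details are assumptions for illustration (the 0x00 and 0x01 prefixes that separate leaves from interior nodes, and carrying an unpaired node up unchanged); real systems such as Bitcoin or Certificate Transparency each define their own exact rules.

```python
import hashlib

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves: list[bytes]) -> bytes:
    """Compute a Merkle root by hashing leaves, then hashing pairs level by level."""
    if not leaves:
        raise ValueError("at least one leaf is required")
    # Prefix leaves with 0x00 and interior nodes with 0x01 so the two cannot be confused.
    level = [sha256(b"\x00" + leaf) for leaf in leaves]
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                next_level.append(sha256(b"\x01" + level[i] + level[i + 1]))
            else:
                next_level.append(level[i])  # unpaired node is carried up unchanged
        level = next_level
    return level[0]

blocks = [b"block-0", b"block-1", b"block-2", b"block-3"]
print(merkle_root(blocks).hex())
```

Changing any single block changes the root, so two parties can compare one 32-byte value to confirm they hold the same dataset.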

HMAC: Hash-Based Message Authentication

HMAC (Hash-based Message Authentication Code) combines a hash function with a secret key to produce an authentication tag. Unlike a plain hash, an HMAC proves both integrity (the data was not modified) and authenticity (the sender possesses the secret key). HMACs are used extensively in API authentication, session tokens, and secure communication protocols.

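# Python example: HMAC-SHA256
import hashlib
import hmac

key = b"my-secret-key"  # placeholder; use a long random key in practice
message = b"important data"
tag = hmac.new(key, message, hashlib.sha256).hexdigest()
# tag is a 64-character hex string specific to this key + message combination

# The receiver recomputes the tag and compares it in constant time
expected = hmac.new(key, message, hashlib.sha256).hexdigest()
print(hmac.compare_digest(tag, expected))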

Security Considerations

Length Extension Attacks

Hash functions based on the Merkle-Damgård construction (MD5, SHA-1, SHA-256, SHA-512) are vulnerable to length extension attacks. If you know H(message) and the length of the message, you can compute H(message || padding || extension) without knowing the original message. This is why you should use HMAC rather than simple concatenation, H(key || message), for message authentication: with plain concatenation, an attacker who sees one valid tag can forge a valid tag for the same message with extra data appended, without ever learning the key. SHA-3 is not susceptible to this attack due to its sponge construction.
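
HMAC avoids the problem by hashing twice with two different key-derived pads, so the final digest never exposes an internal state that can be extended. The sketch below builds HMAC-SHA256 by hand from the RFC 2104 definition and checks it against Python's hmac module. It is for illustration only (the name hmac_sha256_manual is made up here); real code should call the hmac module directly.

```python
import hashlib
import hmac

def hmac_sha256_manual(key: bytes, message: bytes) -> bytes:
    """HMAC-SHA256 built by hand from the RFC 2104 definition, for illustration only."""
    block_size = 64  # SHA-256 processes input in 64-byte blocks
    if len(key) > block_size:
        key = hashlib.sha256(key).digest()  # long keys are hashed first
    key = key.ljust(block_size, b"\x00")    # then padded with zeros to one block
    inner_pad = bytes(b ^ 0x36 for b in key)
    outer_pad = bytes(b ^ 0x5C for b in key)
    inner = hashlib.sha256(inner_pad + message).digest()
    return hashlib.sha256(outer_pad + inner).digest()

key = b"my-secret-key"
message = b"important data"

naive_tag = hashlib.sha256(key + message).hexdigest()  # H(key || message): extendable, avoid
safe_tag = hmac_sha256_manual(key, message).hex()

# The hand-rolled version should agree with the standard library.
print(safe_tag == hmac.new(key, message, hashlib.sha256).hexdigest())
```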

Rainbow Tables and Salting

A rainbow table is a precomputed lookup structure that maps hashes back to their original inputs. Attackers can use rainbow tables to recover commonly used passwords from unsalted hashes nearly instantly. The defense is salting: combining a unique random value (the salt) with each password before hashing, and storing the salt alongside the hash. Because the salt differs for every user, a table computed without it is useless, and the attacker would have to build a separate table for every possible salt, which is computationally prohibitive. All modern password hashing functions (bcrypt, Argon2, scrypt) take a salt as part of their design, and most library implementations generate one for you.
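
The effect of a salt can be shown with a toy precomputed table. Plain SHA-256 is used below only to keep the demonstration short (real password storage should use one of the dedicated functions above), and the password list and helper names are invented for the example.

```python
import hashlib
import os

common_passwords = ["123456", "password", "qwerty", "letmein"]

# Attacker's precomputed table: unsalted SHA-256 digest -> password.
precomputed = {hashlib.sha256(p.encode()).hexdigest(): p for p in common_passwords}

def unsalted_hash(password: str) -> str:
    return hashlib.sha256(password.encode()).hexdigest()

def salted_hash(password: str, salt: bytes) -> str:
    return hashlib.sha256(salt + password.encode()).hexdigest()

salt_alice, salt_bob = os.urandom(16), os.urandom(16)

# Unsalted: the table recovers the password with a single lookup.
print(precomputed.get(unsalted_hash("password")))
# Salted: the digest is not in the table, so the lookup finds nothing.
print(precomputed.get(salted_hash("password", salt_alice)))
# Two users with the same password end up with different stored hashes.
print(salted_hash("password", salt_alice) == salted_hash("password", salt_bob))
```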

Choosing the Right Algorithm

Here is a practical decision framework:

Storing passwords? Use Argon2id (preferred) or bcrypt. Never use a general-purpose hash.
Verifying file integrity? Use SHA-256. It is fast, universally supported, and secure.
Digital signatures or certificates? Use SHA-256 or SHA-512. Ensure your toolchain does not fall back to SHA-1.
Building a hash table or non-security data structure? Use a fast non-cryptographic hash like xxHash or MurmurHash for performance.
Need defense-in-depth? Use SHA-3 or BLAKE3 if you want a different internal construction from SHA-2.

Trying It Out

Understanding hash functions conceptually is important, but there is no substitute for hands-on experimentation. Try hashing the same string with different algorithms and observe the output lengths. Change a single character and see how completely the hash changes. Hash an empty string — even that produces a valid, unique digest for each algorithm.

TensorLocal's Hash Generator lets you compute MD5, SHA-1, SHA-256, and SHA-512 hashes instantly, right in your browser. No data is transmitted to any server — everything runs locally, making it safe to hash even sensitive test data.

Conclusion

Hash functions are deceptively simple on the surface — give them data, get a fingerprint — but the subtleties of algorithm selection, security properties, and appropriate use cases matter enormously. MD5 and SHA-1 served the world well for years, but they are now relics that should be retired from any security-sensitive context. SHA-256 is the safe default for the vast majority of modern applications, while specialized tools like Argon2 are essential for password storage.

The most dangerous mistake is not choosing the wrong hash algorithm — it is not understanding why the choice matters. Armed with the knowledge in this guide, you can make informed decisions that keep your data, your users, and your systems secure.