Hashing Algorithms Compared: MD5, SHA-1, SHA-256, SHA-512, and bcrypt
Choose the wrong hashing algorithm and the consequences range from mildly wasteful to catastrophically insecure. MD5 is still plastered across download pages as a "security check." SHA-1 lives on in legacy systems despite being publicly broken since 2017. SHA-256 gets misapplied to password databases. And bcrypt, the right choice for passwords, gets skipped because it feels slow. This guide cuts through the noise: what each algorithm actually does, where it is and isn't safe, and how to make the right call for your use case.
1. What Is a Hash Function?
A hash function takes an input of any length and produces a fixed-length output called a hash, digest, or checksum. Feed it a single character or the entire text of War and Peace — the output is always the same length. This fixed-length fingerprint is the hash's most fundamental property.
Four characteristics define how hash functions behave:
- Deterministic. The same input always produces the same output. Hash "hello" a million times and you always get the same result. This makes hashes useful for verification — compute the hash before and after a transfer, compare them, and you know whether anything changed.
- Fixed output length. SHA-256 always outputs 256 bits (64 hex characters), regardless of input size. Hashing a 4 GB video file and a single space both produce the same length digest.
- One-way. You cannot reverse a hash to recover the original input. Given the SHA-256 digest b94d27b9934d3e08a52e52d7da7dabfac484efe37a5380ee9088f7ace2efcde9, there is no known algorithm that computes the original input; the only option is brute-force guessing.
- Avalanche effect. A tiny change to the input, such as flipping a single bit, produces a completely different output. "hello" and "Hello" (capital H) produce entirely different SHA-256 digests with no visible relationship.
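The properties above are easy to check with Python's standard `hashlib` module. The sketch below is illustrative only; the helper names `sha256_hex` and `differing_bits` are our own, not part of any library.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a 64-character hex string."""
    return hashlib.sha256(data).hexdigest()

def differing_bits(a: bytes, b: bytes) -> int:
    """Count how many of the 256 output bits differ between two inputs' digests."""
    x = int.from_bytes(hashlib.sha256(a).digest(), "big")
    y = int.from_bytes(hashlib.sha256(b).digest(), "big")
    return bin(x ^ y).count("1")

# Deterministic: the same input always yields the same digest.
print(sha256_hex(b"hello") == sha256_hex(b"hello"))  # True

# Fixed length: a single space and a 1 MB buffer both give 64 hex characters.
print(len(sha256_hex(b" ")), len(sha256_hex(b"x" * 1_000_000)))  # 64 64

# Avalanche: changing one character flips roughly half of the 256 output bits.
print(differing_bits(b"hello", b"Hello"))
```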
These properties make hash functions useful for data integrity verification, digital signatures, password storage, caching, and deduplication. But not every hash function delivers all these properties equally well — and some have had specific properties broken by cryptanalysts.
2. Key Properties of Cryptographic Hash Functions
When a hash function is described as "cryptographic," it means it is designed to resist specific classes of attack. Two resistance properties matter most, along with a deliberate choice about speed:
Pre-image Resistance
Given a hash h, it should be computationally infeasible to find any input m such that hash(m) = h. This is the "one-way" property. A hash function without pre-image resistance could be reversed — an attacker who stole your password hash could recover your password directly.
Collision Resistance
It should be computationally infeasible to find two different inputs m1 and m2 such that hash(m1) = hash(m2). Collisions must theoretically exist (infinite inputs, finite outputs) but finding one intentionally should be practically impossible. When collision resistance breaks, an attacker can craft a malicious document that has the same hash as a legitimate one.
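To see why pre-image and collision resistance are different problems, it helps to shrink a hash until both attacks become feasible. The sketch below assumes a deliberately weakened, hypothetical `toy_hash` that keeps only the first 16 bits of SHA-256; real digests are far too long for this to work. Finding a pre-image takes on the order of 2^16 guesses, while a collision typically appears after a few hundred, because any two matching inputs will do.

```python
import hashlib
import itertools

BITS = 16  # toy digest size: keep only the top 16 bits of SHA-256

def toy_hash(data: bytes) -> int:
    """Hypothetical weakened hash: the top BITS bits of SHA-256(data)."""
    full = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return full >> (256 - BITS)

def find_preimage(target: int) -> tuple[bytes, int]:
    """Brute-force ANY input whose toy hash equals `target` (about 2**BITS tries)."""
    for attempt in itertools.count():
        candidate = f"guess-{attempt}".encode()
        if toy_hash(candidate) == target:
            return candidate, attempt + 1

def find_collision() -> tuple[bytes, bytes, int]:
    """Find two different inputs with the same toy hash (about 2**(BITS/2) tries)."""
    seen: dict[int, bytes] = {}
    for attempt in itertools.count():
        candidate = f"msg-{attempt}".encode()
        h = toy_hash(candidate)
        if h in seen:
            return seen[h], candidate, attempt + 1
        seen[h] = candidate

m, preimage_tries = find_preimage(toy_hash(b"secret"))
a, b, collision_tries = find_collision()
print(f"pre-image after {preimage_tries} tries, collision after {collision_tries} tries")
```

Note that the pre-image found is not "secret" itself: any input with a matching hash counts, which is exactly the definition given above.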
Avalanche Effect and Speed
Cryptographic hashes should be fast to compute — this is a feature for checksums and digital signatures, where you need to hash large amounts of data quickly. But speed becomes a vulnerability for password storage, where fast computation means an attacker can attempt billions of guesses per second. This tension between "fast enough for data integrity" and "slow enough for passwords" is why password hashing requires entirely different algorithms.
3. Algorithm Comparison Table
| Algorithm | Output Length | Speed | Security Status | Primary Use Cases |
|---|---|---|---|---|
| MD5 | 128-bit (32 hex) | Very fast | BROKEN | Non-security checksums, file deduplication, caching keys |
| SHA-1 | 160-bit (40 hex) | Fast | DEPRECATED | Legacy systems only; Git object IDs (still the default) |
| SHA-256 | 256-bit (64 hex) | Fast | SECURE | SSL/TLS, digital signatures, code signing, Bitcoin |
| SHA-512 | 512-bit (128 hex) | Fast on 64-bit | SECURE | High-security contexts, large files on 64-bit systems |
| SHA-3 | 224–512-bit | Fast | SECURE | Alternative to SHA-2, future-proofing, government systems |
| bcrypt | 60-char string | Intentionally slow | SECURE | Password hashing (widely supported) |
| Argon2 | Variable | Intentionally slow | PHC WINNER | Password hashing (modern recommended standard) |
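The output lengths in the table can be confirmed directly from `hashlib`, which reports each algorithm's digest size in bytes. A quick sketch (bcrypt and Argon2 are not in Python's standard library, so they are left out):

```python
import hashlib

ALGORITHMS = ("md5", "sha1", "sha256", "sha512", "sha3_224", "sha3_512")

# digest_size is in bytes; multiply by 8 for bits. Each hex character encodes 4 bits.
DIGEST_BITS = {name: hashlib.new(name).digest_size * 8 for name in ALGORITHMS}

for name, bits in DIGEST_BITS.items():
    print(f"{name:<9} {bits:>3} bits = {bits // 4:>3} hex characters")
```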
4. MD5: Still Useful, Never for Security
MD5 (Message Digest 5) was designed by Ron Rivest in 1991 and was the dominant hash function for most of the 1990s. It produces a 128-bit digest and is extremely fast: modern GPUs compute billions of MD5 hashes per second. That speed makes it a poor fit for anything password-related, but it is not why MD5 is broken. MD5 is broken because cryptanalysts found structural weaknesses that let them generate collisions on demand.
The critical flaw is collision vulnerability. In 2004, Wang and Yu demonstrated a practical collision attack against MD5. By 2008, researchers had used MD5 collisions to forge a rogue certificate authority certificate — meaning they created a fraudulent SSL certificate that browsers would trust. In 2012, the Flame malware used a more sophisticated MD5 collision attack to forge a fake Microsoft code-signing certificate. MD5 is not just theoretically weak; it has been weaponized in real attacks.
What MD5 Collisions Look Like in Practice
A collision means two different inputs produce the same MD5 hash. An attacker can craft a malicious document that has an identical MD5 hash to a legitimate document. If you use MD5 to verify "is this the file I trust?", the attacker can swap the legitimate file for the malicious one and the hash check passes. This is why MD5 must never be used to verify software downloads, certificates, or any file where tampering is a concern.
What MD5 Is Still Good For
Despite being cryptographically broken, MD5 remains genuinely useful in contexts where security is not the goal:
- File deduplication. Comparing MD5 hashes to find duplicate files in a storage system. An attacker can't exploit this because no adversarial tampering is involved.
- Caching keys. Using MD5 to turn a long cache key into a short, fixed-length identifier. The only risk is an accidental collision in a very large dataset, not a targeted attack.
- Non-security checksums. Detecting accidental corruption during file transfer (not adversarial tampering). If the MD5 you computed locally matches the MD5 the server says the file should have, the file wasn't corrupted in transit.
- Data partitioning. Distributing data across shards using MD5 as a consistent hash function.
Rule of thumb: If you would care that an attacker could produce two inputs with the same hash, do not use MD5. If you only care about detecting accidental corruption or generating short identifiers, MD5 is fine.
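As a sketch of the non-security uses listed above, the helpers below use MD5 for duplicate detection and for spreading keys across shards. The function names are hypothetical, and `shard_for` uses simple modulo placement rather than a full consistent-hashing ring, so changing `num_shards` moves most keys.

```python
import hashlib
from collections import defaultdict

def md5_hex(data: bytes) -> str:
    # usedforsecurity=False (Python 3.9+) flags this as a non-security use,
    # which can keep MD5 available on FIPS-restricted builds.
    return hashlib.md5(data, usedforsecurity=False).hexdigest()

def find_duplicates(files: dict[str, bytes]) -> list[list[str]]:
    """Group file names whose contents share an MD5 digest."""
    groups: defaultdict[str, list[str]] = defaultdict(list)
    for name, content in files.items():
        groups[md5_hex(content)].append(name)
    return [names for names in groups.values() if len(names) > 1]

def shard_for(key: str, num_shards: int) -> int:
    """Stable shard index for `key`: first 8 digest bytes modulo num_shards."""
    digest = hashlib.md5(key.encode("utf-8"), usedforsecurity=False).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

print(find_duplicates({"a.txt": b"same", "b.txt": b"same", "c.txt": b"other"}))
# [['a.txt', 'b.txt']]
print(shard_for("user:42", 8))
```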
5. SHA-1: Deprecated in 2017
SHA-1 (Secure Hash Algorithm 1) was developed by the NSA and published in 1995 as a NIST standard. It produces a 160-bit digest — 32 more bits than MD5 — and was the dominant hash function in web PKI (the SSL/TLS certificate infrastructure) for over a decade.
Theoretical weaknesses in SHA-1 were identified as far back as 2005 by Wang, Yin, and Yu. NIST began recommending against SHA-1 in new systems in 2011. But the killing blow came in February 2017, when researchers from CWI Amsterdam and Google published the SHAttered attack, the first practical SHA-1 collision. The team produced two different PDF files with identical SHA-1 hashes, at an estimated cost of roughly $110,000 in cloud compute. The cost of such attacks has since dropped substantially as both techniques and compute prices have improved.
Browser vendors responded quickly. Chrome, Firefox, and Safari all removed support for SHA-1 TLS certificates in 2017. The CA/Browser Forum prohibited issuing new SHA-1 certificates. Any SHA-1 certificate encountered on a modern browser triggers a security warning.
SHA-1 in Git
Git uses SHA-1 for commit, tree, and blob IDs by default. Almost every Git commit ID you've seen (those 40-character hex strings) is a SHA-1 digest. This has caused some concern since the SHAttered attack. In practice, the Git threat model differs from PKI: an attacker would need to craft a malicious object with the same SHA-1 as a specific legitimate one and get it into a repository, which is significantly harder than producing an arbitrary collision. Git has also used a collision-detecting SHA-1 implementation by default since version 2.13, which rejects objects showing signs of a SHAttered-style attack. Longer term, the Git project is migrating to SHA-256: newer versions can create SHA-256 repositories with git init --object-format=sha256, although SHA-1 remains the default and interoperability between the two formats is still limited.
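Git's object IDs are plain SHA-1 digests over a short header plus the object's contents, which is why the SHA-1 question matters for Git at all. A minimal sketch for blobs (file contents), assuming a default SHA-1 repository; the function name `git_blob_id` is ours, and for ordinary content it should agree with `git hash-object`:

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    """SHA-1 object ID for a blob in a default (SHA-1) Git repository.

    Git hashes the ASCII header "blob <size>", a NUL byte, then the raw bytes.
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()

print(git_blob_id(b""))                # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 (the empty blob)
print(git_blob_id(b"test content\n"))  # d670460b4b4aece5915caf5c68d12f560a9fe3e4
```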
SHA-1 should be considered deprecated for all new systems. If you are maintaining legacy code that uses SHA-1, plan a migration to SHA-256.
6. SHA-256 and SHA-512: The Current Standard
SHA-256 and SHA-512 are both members of the SHA-2 family, designed by the NSA and published by NIST in 2001. Both are considered secure — no practical attacks against either algorithm have been demonstrated. They differ primarily in output size and hardware optimization characteristics.
SHA-256
SHA-256 produces a 256-bit (32-byte) digest, represented as 64 hexadecimal characters. It is the most widely deployed secure hash algorithm in the world, used in:
- SSL/TLS certificates. Nearly all modern certificates use SHA-256 as the signature hash algorithm (some, particularly ECDSA certificates, use SHA-384).
- Code signing. Windows Authenticode, macOS code signing, and APK signing all default to SHA-256.
- Bitcoin and many other cryptocurrencies. Bitcoin applies SHA-256 twice in its proof-of-work algorithm and in transaction hashing.
- HTTPS pinning and certificate transparency. SHA-256 digests identify certificates in CT logs.
- HMAC authentication. HMAC-SHA256 is the standard for API request signing.
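A minimal HMAC-SHA256 signing and verification sketch using Python's `hmac` module. The helper names are hypothetical, and real schemes such as AWS Signature Version 4 also define exactly which request fields go into the signed message, which this sketch does not attempt. The example inputs come from the RFC 4231 test vectors, so the output can be checked against a published value.

```python
import hashlib
import hmac

def sign_request(secret: bytes, message: bytes) -> str:
    """Return a hex HMAC-SHA256 tag for `message` under the shared `secret`."""
    return hmac.new(secret, message, hashlib.sha256).hexdigest()

def verify_request(secret: bytes, message: bytes, tag: str) -> bool:
    """Recompute the tag and compare in constant time to avoid timing leaks."""
    return hmac.compare_digest(sign_request(secret, message), tag)

tag = sign_request(b"Jefe", b"what do ya want for nothing?")
print(tag)  # 5bdcc146bf60754e6a042426089575c75a003f089d2739839dec58b964ec3843 (RFC 4231, test case 2)
print(verify_request(b"Jefe", b"what do ya want for nothing?", tag))  # True
```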
SHA-256 uses 32-bit operations internally, which means it performs similarly on both 32-bit and 64-bit hardware.
SHA-512
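SHA-512 produces a 512-bit (64-byte) digest, represented as 128 hexadecimal characters. Because it uses 64-bit operations internally and processes 128-byte blocks, it is often faster than SHA-256 in software on 64-bit hardware when hashing large amounts of data. The exception is CPUs with dedicated SHA-256 instructions (such as Intel SHA Extensions or the ARMv8 cryptography extensions), where hardware-accelerated SHA-256 usually comes out ahead. On 32-bit hardware, SHA-512 is slower.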
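Whether SHA-512 actually beats SHA-256 on a given machine depends on the CPU and the OpenSSL build behind `hashlib`, so measuring is more reliable than assuming. A rough best-of-three sketch; absolute numbers will vary from run to run, and the helper name is hypothetical:

```python
import hashlib
import time

def throughput_mib_s(algorithm: str, payload: bytes, repeats: int = 3) -> float:
    """Best-of-`repeats` hashing throughput for one hashlib algorithm, in MiB/s."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        hashlib.new(algorithm, payload).digest()
        best = min(best, time.perf_counter() - start)
    return len(payload) / (1024 * 1024) / best

payload = bytes(16 * 1024 * 1024)  # 16 MiB of zero bytes; content does not affect speed
for algorithm in ("sha256", "sha512"):
    print(f"{algorithm}: {throughput_mib_s(algorithm, payload):,.0f} MiB/s")
```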
SHA-512 is preferred when:
- You need a longer digest for collision resistance in a high-security context (a 512-bit digest raises the generic birthday bound from about 2^128 to about 2^256 operations).
- You are hashing large files on 64-bit hardware and need maximum throughput.
- Your security policy or compliance framework requires 256-bit or greater security level (SHA-512 provides 256-bit security against collision attacks, SHA-256 provides 128-bit).
For most applications (TLS, code signing, API authentication), SHA-256 is the right default. SHA-512's extra margin rarely justifies the doubled output size unless you have a specific reason to need it.
7. SHA-3: The Backup Standard
SHA-3 is NIST's third standardized cryptographic hash algorithm, published in 2015. Unlike SHA-1 and SHA-2, which share the same underlying Merkle–Damgård construction, SHA-3 is based on the Keccak algorithm — a completely different design (a sponge construction) developed by Bertoni, Daemen, Peeters, and Van Assche.
The reason NIST ran a competition to find a SHA-3 candidate (the SHA-3 Competition, 2007–2012) was not because SHA-2 was broken. It was because SHA-1 and SHA-2 share structural similarities — if a fundamental flaw were ever discovered in the Merkle–Damgård construction, both algorithms would be affected simultaneously. SHA-3 provides a structurally independent backup: if SHA-2 were ever broken, the cryptographic community would have a well-analyzed, already-standardized alternative ready to deploy immediately.
SHA-3 supports output lengths of 224, 256, 384, and 512 bits (as SHA3-224, SHA3-256, SHA3-384, SHA3-512). There are also two "extendable-output functions" — SHAKE128 and SHAKE256 — that produce variable-length outputs.
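The extendable-output functions are exposed in Python's `hashlib` as `shake_128` and `shake_256`, whose `hexdigest` takes the desired output length in bytes. A short sketch showing that, for the same input, a shorter SHAKE output is a prefix of a longer one:

```python
import hashlib

data = b"hello world"
short_out = hashlib.shake_256(data).hexdigest(16)  # 16 bytes -> 32 hex characters
long_out = hashlib.shake_256(data).hexdigest(64)   # 64 bytes -> 128 hex characters

print(len(short_out), len(long_out))   # 32 128
print(long_out.startswith(short_out))  # True: one output stream, truncated where you ask
```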
When to use SHA-3:
- When your threat model specifically requires structural independence from SHA-2.
- When working with government or compliance environments that mandate SHA-3.
- When building new systems and you prefer the most modern standard.
- When you need the XOF (extendable output function) variants for key derivation.
SHA-3 is somewhat slower than SHA-2 in software on most platforms (it was designed with hardware efficiency in mind), but the difference is negligible for most applications. In practice, SHA-256 and SHA3-256 are both excellent choices, and the decision often comes down to ecosystem support and compliance requirements.
8. bcrypt, scrypt, and Argon2: Password Hashing Is Different
The fundamental mistake developers make is treating password storage like data integrity: hash the password with SHA-256, store the hash, compare hashes on login. This approach is wrong because SHA-256 is designed to be fast, and fast is the enemy of password security.
With a single modern GPU, an attacker can compute on the order of 10 billion SHA-256 hashes per second. Against a leaked database of SHA-256 password hashes, that one GPU could exhaust all 8-character passwords made of upper- and lowercase letters and digits (62^8, about 2.2 × 10^14 candidates) in roughly six hours, and a multi-GPU rig would do it far faster. Common passwords fall in milliseconds.
Why Password Hashing Needs Slowness
Password hashing algorithms are deliberately designed to be slow (and, in the case of scrypt and Argon2, memory-intensive). The goal is not to make login slow for users (which would be noticeable at, say, 10 seconds) but to make brute-force attacks economically impractical. If bcrypt takes 100 milliseconds to compute on your server, a user doesn't notice. But an attacker is then limited to about 10 guesses per second per CPU core, instead of the roughly 10 billion per second a GPU manages against SHA-256: a 10^9 reduction in attack throughput. Even a massively parallel GPU cluster is limited to tens of thousands of guesses per second against bcrypt.
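The arithmetic behind these claims is easy to reproduce. The guess rates below are assumptions taken from the rough figures in this section (about 10 billion SHA-256 guesses per second on one GPU, and 10,000 per second as the low end of "tens of thousands" of bcrypt guesses across a GPU cluster), not benchmarks:

```python
KEYSPACE = 62 ** 8  # all 8-character passwords over a-z, A-Z, 0-9

# Assumed guess rates from the text (not measurements).
RATES = {
    "SHA-256, one GPU": 10_000_000_000,
    "bcrypt, GPU cluster": 10_000,
}

SECONDS_PER_YEAR = 365 * 24 * 3600

def exhaust_time_seconds(keyspace: int, guesses_per_second: float) -> float:
    """Worst-case time to try every candidate at a constant guess rate."""
    return keyspace / guesses_per_second

for label, rate in RATES.items():
    seconds = exhaust_time_seconds(KEYSPACE, rate)
    print(f"{label}: {seconds / 3600:,.1f} hours ({seconds / SECONDS_PER_YEAR:,.1f} years)")
```

At these rates the same keyspace goes from an afternoon's work to several centuries, which is the whole argument for slow password hashes.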
bcrypt
bcrypt was designed by Niels Provos and David Mazières in 1999 and remains one of the most widely deployed password hashing algorithms. Its key features:
- Automatic salting. bcrypt generates a cryptographically random salt for each password and includes it in the output hash string. This prevents rainbow table attacks — precomputed tables of common password hashes.
- Cost factor (work factor). bcrypt takes a cost parameter (typically 10–14) that controls how many iterations to run. A cost of 10 means 2^10 = 1,024 iterations. A cost of 12 means 2^12 = 4,096 iterations. As hardware gets faster, increase the cost factor to maintain security without changing the algorithm.
- Self-contained hash string. The bcrypt output (like $2b$12$RVcG5Y4t...XzE1y) encodes the algorithm version, cost factor, salt, and hash in one portable string. You store one field per user.
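bcrypt's main limitation is its 72-byte input limit. Many implementations silently truncate longer input, so anything past the 72nd byte does not affect the hash and very long passphrases don't benefit from their full entropy. Because the limit counts bytes rather than characters, a UTF-8 passphrase with many non-ASCII characters reaches it sooner. This is rarely a practical concern, but it is worth noting.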
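Two details above lend themselves to a quick check: each cost increment doubles bcrypt's work, and the 72-byte limit is measured in bytes, not characters. A stdlib-only sketch with hypothetical helper names (it does not call a bcrypt library):

```python
BCRYPT_MAX_BYTES = 72  # bcrypt only uses the first 72 bytes of its input

def bcrypt_work_units(cost: int) -> int:
    """Relative work for a bcrypt cost factor (2**cost expensive key-setup rounds)."""
    return 2 ** cost

def exceeds_bcrypt_limit(password: str) -> bool:
    """True if the password's UTF-8 encoding is longer than 72 bytes."""
    return len(password.encode("utf-8")) > BCRYPT_MAX_BYTES

passphrase = "\u00e9" * 40  # 40 copies of "é": 40 characters, 2 bytes each in UTF-8
print(bcrypt_work_units(12) // bcrypt_work_units(10))  # 4
print(len(passphrase), exceeds_bcrypt_limit(passphrase))  # 40 True
```

A check like this can run before hashing, for example to reject or pre-process over-long passphrases rather than let them be truncated silently.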
scrypt
scrypt (Colin Percival, 2009) is memory-hard as well as time-hard: it requires large amounts of RAM to compute, which makes it expensive to attack with GPUs or ASICs that depend on running many parallel threads with small memory footprints. scrypt is used in Litecoin's proof-of-work and is a reasonable bcrypt alternative, but it is more complex to configure correctly (three parameters: N, r, p) and has less widespread support in web frameworks.
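Python's standard library exposes scrypt as `hashlib.scrypt` (when built against OpenSSL 1.1 or later). A minimal sketch, assuming illustrative parameters N=2^14, r=8, p=1, which need about 16 MiB per hash and should be tuned upward for production. Unlike bcrypt, the raw output does not embed the salt or parameters, so you must store them yourself; the helper names are hypothetical.

```python
import hashlib
import hmac
import secrets

# Illustrative cost parameters (an assumption; tune for your hardware).
# Memory use is about 128 * r * n bytes, i.e. roughly 16 MiB here.
SCRYPT_N, SCRYPT_R, SCRYPT_P = 2 ** 14, 8, 1

def _derive(password: str, salt: bytes) -> bytes:
    return hashlib.scrypt(password.encode("utf-8"), salt=salt,
                          n=SCRYPT_N, r=SCRYPT_R, p=SCRYPT_P, dklen=32)

def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, key); store both, plus the parameters, for later checks."""
    salt = secrets.token_bytes(16)
    return salt, _derive(password, salt)

def verify_password(password: str, salt: bytes, stored_key: bytes) -> bool:
    """Re-derive with the stored salt and compare in constant time."""
    return hmac.compare_digest(_derive(password, salt), stored_key)

salt, key = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, key))  # True
print(verify_password("wrong guess", salt, key))                   # False
```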
Argon2
Argon2 won the Password Hashing Competition in 2015 and is the current state-of-the-art recommendation. It comes in three variants:
- Argon2d: Maximizes resistance to GPU cracking attacks. Not suitable where side-channel attacks are a concern.
- Argon2i: Optimized against side-channel attacks. Use for password hashing in untrusted environments.
- Argon2id: A hybrid that combines both. This is the recommended variant for most use cases.
Argon2id has three tuning parameters: memory cost (how much RAM), time cost (number of iterations), and parallelism (thread count). For most web applications, the OWASP recommendations suggest a memory cost of 19 MiB, time cost of 2, and parallelism of 1 as a starting point, adjusting upward as hardware allows.
Minimum viable rule: Never store passwords as plain SHA-256, SHA-512, or MD5 hashes. Always use bcrypt (minimum cost 10), scrypt, or Argon2id. The 100ms extra login time is invisible to users and makes offline cracking attacks take centuries instead of hours.
9. Checksums vs Cryptographic Hashes
The words "checksum" and "hash" are often used interchangeably, but they represent different threat models. Understanding the difference prevents common misuse.
A checksum is any value computed from data to verify its integrity against accidental corruption. CRC32 is a classic checksum: it detects the kinds of errors that occur in storage and transmission (bit flips, dropped bytes). It is not designed to resist an adversary who deliberately wants to produce a collision, and it does not need to be: if corruption is accidental, a CRC is sufficient.
A cryptographic hash is a checksum that additionally resists intentional manipulation. It provides both error detection and tampering detection. When a software vendor publishes a SHA-256 hash of a download alongside the download itself, they are claiming: "if this hash matches what you computed, the file came from us and was not modified in transit."
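One way to see why a CRC offers no protection against deliberate tampering: CRC32 is affine over GF(2), so for three equal-length messages x, y, and z, crc(x XOR y XOR z) equals crc(x) XOR crc(y) XOR crc(z). That algebraic structure lets an attacker predict and steer checksum changes; a cryptographic hash like SHA-256 has no such relationship. A small sketch with made-up messages:

```python
import hashlib
import zlib

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(p ^ q for p, q in zip(a, b))

def sha256_int(data: bytes) -> int:
    return int.from_bytes(hashlib.sha256(data).digest(), "big")

x = b"PAY ALICE $0010"
y = b"PAY ALICE $9910"
z = b"PAY BOBBY $0010"
combined = xor_bytes(xor_bytes(x, y), z)  # b"PAY BOBBY $9910"

# CRC32 of the combined message is fully predictable from the three parts.
crc_matches = zlib.crc32(combined) == zlib.crc32(x) ^ zlib.crc32(y) ^ zlib.crc32(z)
# SHA-256 shows no such structure.
sha_matches = sha256_int(combined) == sha256_int(x) ^ sha256_int(y) ^ sha256_int(z)
print(crc_matches, sha_matches)  # True False
```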
The catch: publishing a SHA-256 hash on the same server as the download provides no security if an attacker compromises the server — they can replace both the file and the hash. Hash verification for software downloads only provides meaningful security when the hash is published through a different, independently verified channel (a signed announcement, a certificate transparency log, or an immutable record).
MD5 and SHA-1 checksums on download pages are perfectly fine for detecting accidental corruption during transfer. They are not useful for detecting targeted tampering. Using SHA-256 for that purpose is more appropriate, but it still requires the hash to be obtained through a trusted channel.
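Putting this section together, verifying a download means streaming the file through SHA-256 and comparing the result with a digest obtained from a channel you trust. A minimal sketch; the function names are hypothetical, and the check says nothing about where `published_hex` came from:

```python
import hashlib
import os
import tempfile

def sha256_file(path: str, chunk_size: int = 65536) -> str:
    """Stream a file through SHA-256 without loading it all into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_published_digest(path: str, published_hex: str) -> bool:
    """Compare against a published digest, tolerating case and stray whitespace."""
    return sha256_file(path) == published_hex.strip().lower()

# Demo: write a small file, then check it against a digest copied from a release page.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello world")
    demo_path = tmp.name
ok = matches_published_digest(
    demo_path, "B94D27B9934D3E08A52E52D7DA7DABFAC484EFE37A5380EE9088F7ACE2EFCDE9\n"
)
os.remove(demo_path)
print(ok)  # True
```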
10. Code Examples
JavaScript: Browser (Web Crypto API)
```javascript
// SHA-256 using the Web Crypto API (returns a hex string)
// crypto.subtle is only available in secure contexts (HTTPS or localhost)
async function sha256(message) {
  const msgBuffer = new TextEncoder().encode(message);
  const hashBuffer = await crypto.subtle.digest('SHA-256', msgBuffer);
  const hashArray = Array.from(new Uint8Array(hashBuffer));
  return hashArray.map(b => b.toString(16).padStart(2, '0')).join('');
}

// Usage (top-level await needs an ES module or the browser console)
const hash = await sha256('hello world');
console.log(hash);
// => b94d27b9934d3e08a52e52d7da7dabfac484efe37a5380ee9088f7ace2efcde9

// Also supports: 'SHA-1', 'SHA-384', 'SHA-512'
// digest() resolves to an ArrayBuffer; convert it to hex as in sha256() above
const hash512 = await crypto.subtle.digest('SHA-512', new TextEncoder().encode('hello world'));
```
JavaScript: Node.js (crypto module)
```javascript
const crypto = require('crypto');

// SHA-256
const sha256Hash = crypto
  .createHash('sha256')
  .update('hello world')
  .digest('hex');
console.log(sha256Hash);
// => b94d27b9934d3e08a52e52d7da7dabfac484efe37a5380ee9088f7ace2efcde9

// MD5 (for non-security use only)
const md5Hash = crypto
  .createHash('md5')
  .update('hello world')
  .digest('hex');
console.log(md5Hash);
// => 5eb63bbbe01eeed093cb22bb8f5acdc3

// SHA-512
const sha512Hash = crypto
  .createHash('sha512')
  .update('hello world')
  .digest('hex');

// HMAC-SHA256 for API authentication
const hmac = crypto
  .createHmac('sha256', 'your-secret-key')
  .update('message to sign')
  .digest('hex');
```
Python: hashlib
```python
import hashlib

# SHA-256
sha256_hash = hashlib.sha256(b'hello world').hexdigest()
print(sha256_hash)
# => b94d27b9934d3e08a52e52d7da7dabfac484efe37a5380ee9088f7ace2efcde9

# SHA-512
sha512_hash = hashlib.sha512(b'hello world').hexdigest()

# MD5 (non-security use only)
md5_hash = hashlib.md5(b'hello world').hexdigest()
print(md5_hash)
# => 5eb63bbbe01eeed093cb22bb8f5acdc3

# SHA3-256 (Python 3.6+)
sha3_hash = hashlib.sha3_256(b'hello world').hexdigest()

# For large files: stream in 64 KiB chunks instead of reading the whole file
sha256 = hashlib.sha256()
with open('largefile.bin', 'rb') as f:
    for chunk in iter(lambda: f.read(65536), b''):
        sha256.update(chunk)
print(sha256.hexdigest())
```
Python: bcrypt for password hashing
```python
import bcrypt

# Hash a password (automatically generates a salt)
password = b"my_secure_password"
hashed = bcrypt.hashpw(password, bcrypt.gensalt(rounds=12))
print(hashed)
# => b'$2b$12$RVcG5Y4t...XzE1y' (includes salt + hash)

# Verify a password: constant-time comparison (safe against timing attacks)
is_valid = bcrypt.checkpw(password, hashed)
print(is_valid)  # => True
```
Command Line
```shell
# SHA-256 of a file (macOS/Linux)
shasum -a 256 filename.zip

# SHA-512 of a file
shasum -a 512 filename.zip

# MD5 (macOS)
md5 filename.zip

# MD5 (Linux, GNU coreutils)
md5sum filename.zip

# Using OpenSSL (cross-platform)
openssl dgst -sha256 filename.zip
openssl dgst -sha512 filename.zip
openssl dgst -md5 filename.zip

# Hash a string directly (not a file); -n suppresses the trailing newline
echo -n "hello world" | shasum -a 256
echo -n "hello world" | openssl dgst -sha256
```
11. Hash Collisions: What They Actually Mean
A hash collision is not a magic key that unlocks everything. Its practical impact depends entirely on how the hash is being used.
For a hash function with an N-bit output, the birthday paradox tells us collisions are expected after roughly 2^(N/2) random inputs. For MD5 (128-bit), that means about 2^64 random inputs before a chance collision becomes likely. Practical attacks against MD5 do far better than that: identical-prefix collisions can be generated in seconds on ordinary hardware, and chosen-prefix collisions, which let an attacker start from two different meaningful documents, are more expensive but still practical. Either way, the colliding documents can carry meaningful (not random) content.
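The 2^(N/2) figure comes from the standard birthday approximation. For k random inputs hashed to N bits, with k much smaller than 2^N, setting the collision probability to one half gives:

```latex
\begin{aligned}
P(\text{collision among } k \text{ inputs}) &\approx 1 - \exp\!\left(-\frac{k(k-1)}{2^{N+1}}\right) \\
P = \tfrac{1}{2} \;\Rightarrow\; k &\approx \sqrt{2\ln 2}\cdot 2^{N/2} \approx 1.18 \cdot 2^{N/2} \\
N = 128 \text{ (MD5)} \;\Rightarrow\; k &\approx 1.18 \cdot 2^{64} \approx 2.2 \times 10^{19}
\end{aligned}
```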
What a collision attack can do:
- Create a malicious file that has the same hash as a known-good file, defeating hash-based integrity checks.
- Forge digital signatures if the signature scheme relies on the broken hash (this is how the MD5 CA certificate attack worked in 2008).
- Create two versions of a document — one legitimate, one malicious — that produce the same hash, enabling post-signing substitution.
What a collision attack cannot do:
- Reverse a hash to recover the original input (pre-image attack — a different and harder problem, still infeasible for MD5).
- Crack a password hash by finding the collision (collisions don't help if you need to find the specific original password, not just any input with the same hash).
- Compromise systems that use hashes only for deduplication or caching, where an adversary has no incentive or ability to engineer specific collision inputs.
This is why MD5 is still safe for file deduplication. If you are checking whether two files are the same — and neither you nor anyone else has an incentive to manufacture a collision — MD5 works perfectly. The attack surface requires a motivated adversary engineering specific inputs. Without that adversary, there is no practical risk.
12. Choosing the Right Algorithm
Use this decision guide to match your use case to the right algorithm:
Storing Passwords
Use Argon2id (preferred) or bcrypt (cost factor 12 or higher). Never use MD5, SHA-1, SHA-256, or SHA-512 directly for password storage. If your language or framework doesn't have a well-maintained Argon2 library, bcrypt is the safe fallback. Use scrypt as a second alternative if bcrypt is unavailable.
File Integrity and Download Verification
Use SHA-256 or SHA-512. These provide genuine collision resistance. MD5 and SHA-1 can still detect accidental corruption, but any new system publishing checksums should use SHA-256 as the minimum. If your published hash is obtained from the same server as the download, the security value is limited regardless of algorithm — the hash proves the file wasn't corrupted in transit, not that it wasn't tampered with at the source.
Digital Signatures and Certificates
Use SHA-256 (or SHA-384/SHA-512 for higher security margins). SHA-256 is the dominant signature hash for modern TLS certificates. Do not sign with SHA-1 under any circumstances: the signature is forgeable in practice.
HMAC and API Authentication
Use HMAC-SHA256. This is the standard for most API authentication schemes (AWS Signature Version 4, JWT HS256, Stripe webhook signatures). HMAC-SHA512 is also acceptable if your library supports it. Do not use HMAC-MD5 for new systems.
Non-Security Checksums and Caching Keys
MD5 or SHA-1 are fine. Speed matters more than collision resistance here. If you are generating cache keys or deduplication IDs and performance is critical, MD5 is a practical choice, and SHA-1 gives a longer digest at comparable speed. CRC32 is faster still but suits only pure error detection: its 32-bit output makes accidental collisions likely once a dataset reaches tens of thousands of items, so it is a poor choice for identifiers.
General Data Integrity (No Adversary)
Use SHA-256 for anything where you might later need to prove integrity, and MD5 or CRC32 for high-throughput scenarios where only accidental corruption matters.
13. Frequently Asked Questions
Is MD5 safe to use?
MD5 is not safe for any security purpose. Its collision resistance is broken — two different inputs with the same MD5 hash can be engineered in seconds. However, MD5 remains genuinely useful for non-adversarial tasks: file deduplication, caching keys, generating short fixed-length identifiers from longer strings, and detecting accidental (not intentional) file corruption. The rule is simple: if an attacker could benefit from engineering a collision, do not use MD5.
Should I use SHA-256 or SHA-512?
Both are secure; the choice depends on context. SHA-256 is the standard for web PKI, code signing, and API authentication, and it is the safe default for almost everything. SHA-512 produces a longer digest and is often faster than SHA-256 on 64-bit hardware for large inputs (except on CPUs with dedicated SHA-256 instructions), making it preferable when hashing large files on 64-bit servers or when you need a larger security margin (256-bit rather than 128-bit collision resistance). For most developers, SHA-256 is the right answer and SHA-512 is the answer when you have a specific reason to need more.
Why should I not use SHA-256 for passwords?
SHA-256 is fast — intentionally so. A modern GPU can compute tens of billions of SHA-256 hashes per second. Against a leaked password database, an attacker can try billions of common passwords per second, cracking most passwords in a leaked SHA-256 database within hours or days. bcrypt, scrypt, and Argon2 are designed to be slow (100ms per hash is typical) and memory-intensive, limiting attackers to thousands of guesses per second at most. That difference makes brute-force attacks economically infeasible rather than trivially fast.
What is the difference between a checksum and a cryptographic hash?
A checksum is any value computed from data to detect accidental corruption — CRC32 and Adler-32 are classic checksums that detect random bit errors but can be trivially fooled by an attacker. A cryptographic hash is a checksum that additionally resists intentional manipulation: it must be collision-resistant (no two inputs produce the same hash without enormous computation) and pre-image resistant (you cannot reverse the hash). MD5 and SHA-1 sit in an awkward middle ground — they are used as checksums for accidental corruption (fine) and sometimes as security mechanisms (not fine, since their collision resistance is broken).
What is bcrypt and how is it different from SHA-256?
bcrypt is a password hashing function, not a general-purpose hash. It incorporates automatic salting (preventing rainbow table attacks), a configurable cost factor (making computation deliberately slow), and produces a self-contained string encoding the algorithm, version, salt, and hash. SHA-256 was designed to be fast for data integrity — bcrypt was designed to be slow for password storage. The two serve fundamentally different purposes. SHA-256 hashes a password in microseconds; bcrypt takes 50–200 milliseconds by design. That slowness is what makes bcrypt appropriate for passwords and SHA-256 unsuitable.