ML-KEM and ML-DSA in Practice
NIST's FIPS 203 (ML-KEM) and FIPS 204 (ML-DSA) are the new post-quantum standards replacing RSA and elliptic-curve key exchange and signatures. This tutorial explains the math behind lattice-based key encapsulation and signatures, shows how to use them with real code (Python cryptography library + OpenSSL 3.5), and walks through hybrid TLS 1.3 — the production-grade migration path.
Prerequisites: Tutorial 21: Post-Quantum Cryptography Threat Model
Tutorial 21 said “you should be migrating to ML-KEM and ML-DSA.” This tutorial shows how. We’ll build intuition for why lattice-based cryptography works, run the standardized algorithms with real code, integrate them into a TLS 1.3 handshake via OpenSSL 3.5, and measure the realistic performance overhead — so you can quote accurate numbers to the client who asks “will this slow down my site?”
Learning With Errors: the hardness assumption
ML-KEM and ML-DSA are both built on the Module Learning With Errors (MLWE) problem. Start with plain LWE:
LWE problem. Given a uniformly random matrix A ∈ Z_q^{m×n}, a secret vector s with small coefficients, and an error vector e with small coefficients, compute b := A·s + e (mod q).
The LWE problem is: given (A, b), recover s. Without the error e, this is just linear algebra — Gaussian elimination solves it in polynomial time. With the error, it is believed hard: Regev (2005) showed that solving average-case LWE is at least as hard as worst-case lattice problems.
Why this resists Shor: the problem has no exploitable periodic or multiplicative structure. Shor's algorithm works on groups with hidden-subgroup structure (like the integers under multiplication mod N). A noisy linear system b = A·s + e with short errors is a different beast.
Module LWE uses vectors of polynomials instead of integers, allowing shorter keys and faster operations at the same security level. ML-KEM and ML-DSA both work in the polynomial ring R_q = Z_q[X]/(X^256 + 1), with q = 3329 for ML-KEM and q = 8380417 for ML-DSA.
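A toy instance makes the role of the error concrete. A sketch with tiny, completely insecure parameters, plain Python only:

```python
import random

# Toy LWE instance over Z_q (illustration only -- real parameters are far larger).
q = 97
n = 4
A = [[random.randrange(q) for _ in range(n)] for _ in range(n)]
s = [random.randrange(-2, 3) for _ in range(n)]   # small secret coefficients
e = [random.randrange(-2, 3) for _ in range(n)]   # small error coefficients

# The public LWE sample: b = A*s + e (mod q)
b = [(sum(A[i][j] * s[j] for j in range(n)) + e[i]) % q for i in range(n)]

# Without e, (A, b) is an exact linear system and Gaussian elimination over
# Z_q recovers s in polynomial time. The small perturbation e is the entire
# source of hardness.
b_exact = [sum(A[i][j] * s[j] for j in range(n)) % q for i in range(n)]
print("noisy   b:", b)
print("exact A*s:", b_exact)
```

Each coordinate of the noisy `b` differs from the exact product by at most 2 (mod q), yet that small offset is what defeats linear algebra.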
ML-KEM (FIPS 203): key encapsulation
A Key Encapsulation Mechanism (KEM) is a simpler primitive than full public-key encryption. Three algorithms:
- KeyGen() → (public_key, secret_key)
- Encapsulate(public_key) → (ciphertext, shared_secret)
- Decapsulate(secret_key, ciphertext) → shared_secret'
Goal: Alice sends Bob a ciphertext; both derive the same shared_secret (a symmetric key, typically used to encrypt bulk data with AES-GCM or ChaCha20-Poly1305).
ML-KEM under the hood
High-level, for the parameter set ML-KEM-768 (NIST Category 3 = ~192-bit post-quantum security):
KeyGen:
Sample random (A, s, e) with A ∈ R_q^{k×k}, s,e small from R_q^k, k=3
pk ← (A, b := A·s + e)
sk ← s
Encapsulate(pk):
Sample random m ∈ {0,1}^{256}
(r, e1, e2) derived from m via SHAKE256
u ← Aᵀ·r + e1 # 3 polynomials
v ← bᵀ·r + e2 + Compress(m)   # a single polynomial; Compress encodes the message bits
c ← (u, v)
K ← KDF(m || H(c))
return (c, K)
Decapsulate(sk, c):
m' ← Decompress(v - sᵀ·u) # recovers m (small enough errors)
# Re-run encapsulation with m' as the random seed to check
(c', K') ← Encapsulate_deterministic(pk, m')
if c == c': return K' # valid
else: return H(sk || c) # "implicit rejection" — some fixed bogus key
The “re-encapsulate and check” at decap time is the Fujisaki-Okamoto transform — it upgrades a weak IND-CPA encryption scheme to a strong IND-CCA KEM.
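The re-encrypt-and-check pattern is easy to see in a toy model. The sketch below is emphatically NOT ML-KEM and not secure: the "cipher" is a hash-based pad plus a redundancy tag, chosen only so that deterministic re-encapsulation and implicit rejection can be demonstrated with the standard library:

```python
import hashlib, os

# Toy KEM with the FO-style re-encrypt check (illustration only).
def H(*parts: bytes) -> bytes:
    h = hashlib.sha3_256()
    for p in parts:
        h.update(p)
    return h.digest()

def keygen():
    sk = os.urandom(32)
    pk = H(b"pk", sk)                      # stand-in public key
    return pk, sk

def encaps(pk, m=None):
    m = m if m is not None else os.urandom(32)
    body = bytes(a ^ b for a, b in zip(m, H(b"pad", pk)))
    ct = body + H(b"tag", pk, m)           # redundancy makes malformed cts detectable
    K = H(b"kdf", m, H(b"ct", ct))         # shared secret binds m and ct
    return ct, K

def decaps(pk, sk, ct):
    m_prime = bytes(a ^ b for a, b in zip(ct[:32], H(b"pad", pk)))
    ct_check, K = encaps(pk, m_prime)      # deterministic re-encapsulation
    if ct_check == ct:
        return K                           # ciphertext well-formed: real key
    return H(b"reject", sk, ct)            # implicit rejection: fixed bogus key

pk, sk = keygen()
ct, K_enc = encaps(pk)
assert decaps(pk, sk, ct) == K_enc
tampered = ct[:-1] + bytes([ct[-1] ^ 1])
assert decaps(pk, sk, tampered) != K_enc   # tampering never reveals K_enc
print("FO check: valid ct accepted, tampered ct implicitly rejected")
```

The point of implicit rejection is that an attacker probing with malformed ciphertexts gets a key that looks random rather than an error signal it could use as an oracle.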
ML-KEM in Python
Python’s cryptography library (version 44.0+, released late 2024) has native ML-KEM support:
from cryptography.hazmat.primitives.asymmetric import ml_kem
from cryptography.hazmat.primitives import hashes
# --- Alice: generate key pair ---
private_key = ml_kem.MLKEM768PrivateKey.generate()
public_key = private_key.public_key()
pk_bytes = public_key.public_bytes_raw()
print(f"Public key size: {len(pk_bytes)} bytes")
# Public key size: 1184 bytes
# --- Bob: encapsulate ---
from cryptography.hazmat.primitives.asymmetric.ml_kem import MLKEM768PublicKey
bob_pk = MLKEM768PublicKey.from_public_bytes(pk_bytes)
ciphertext, shared_secret_bob = bob_pk.encapsulate()
print(f"Ciphertext size: {len(ciphertext)} bytes")
print(f"Shared secret: {shared_secret_bob.hex()[:16]}...")
# Ciphertext size: 1088 bytes
# --- Alice: decapsulate ---
shared_secret_alice = private_key.decapsulate(ciphertext)
assert shared_secret_alice == shared_secret_bob
print("Both sides agree on shared secret.")
Size comparison with ECDH
| Scheme | Public key | Ciphertext | Shared secret |
|---|---|---|---|
| X25519 (ECDH) | 32 bytes | 32 bytes | 32 bytes |
| RSA-2048 OAEP | 256 bytes | 256 bytes | ~32 bytes |
| ML-KEM-512 | 800 bytes | 768 bytes | 32 bytes |
| ML-KEM-768 | 1184 bytes | 1088 bytes | 32 bytes |
| ML-KEM-1024 | 1568 bytes | 1568 bytes | 32 bytes |
ML-KEM-768 public keys and ciphertexts are roughly 35× larger than X25519's 32-byte shares. This is the main production concern. For high-traffic TLS endpoints, this adds noticeable bandwidth. Practical impact: expect TLS handshakes to grow by ~2 KB; negligible for most web apps, measurable for QUIC connections trying to fit the handshake into 1-2 packets.
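Back-of-envelope, using the sizes from the table above (a sketch of the raw key-share payloads; the actual TLS encoding adds a few bytes of framing):

```python
# Approximate TLS 1.3 key-share growth when moving from pure X25519
# to the X25519MLKEM768 hybrid.
X25519_SHARE = 32
MLKEM768_PK = 1184      # client sends the ML-KEM encapsulation key
MLKEM768_CT = 1088      # server replies with the ML-KEM ciphertext

client_share = X25519_SHARE + MLKEM768_PK   # ClientHello key_share
server_share = X25519_SHARE + MLKEM768_CT   # ServerHello key_share
added = client_share + server_share - 2 * X25519_SHARE

print(f"ClientHello key_share: {client_share} bytes")   # 1216
print(f"ServerHello key_share: {server_share} bytes")   # 1120
print(f"Added vs pure X25519:  {added} bytes")          # 2272
```

That ~2.2 KB of extra handshake data is where the "+1800 bytes" and QUIC-packet concerns in this section come from.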
ML-DSA (FIPS 204): digital signatures
Three algorithms: KeyGen, Sign(sk, message), Verify(pk, message, signature).
ML-DSA sketch
High-level for ML-DSA-65 (Category 3):
KeyGen:
Sample (A, s1, s2, t := A·s1 + s2)
pk ← (A, t)
sk ← (s1, s2)
Sign(sk, msg):
loop (rejection sampling):
y ← sample small vector
w ← A·y
c ← Hash(pk || msg || w) # challenge
z ← y + c·s1
if z's coefficients are too large: restart
return (c, z, hint_data)
Verify(pk, msg, (c, z, hint)):
w' ← A·z - c·t
c' ← Hash(pk || msg || w')
return c == c'
This is Fiat-Shamir with abort (lattice version) — transform an interactive identification protocol into a non-interactive signature via a hash-based challenge, with rejection sampling to avoid secret leakage.
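The abort step can be sketched in one dimension. Hypothetical toy parameters, nothing like real ML-DSA sizes: z = y + c·s would lean toward the secret near the edges of y's range, so those values are thrown away and the signer retries:

```python
import random

# Toy 1-D rejection sampling (illustration only).
GAMMA = 100          # mask range: y uniform in [-GAMMA, GAMMA]
BETA = 5             # bound on |c*s|
s = 3                # small secret coefficient

def sign_attempt():
    y = random.randint(-GAMMA, GAMMA)
    c = random.choice([-1, 0, 1])        # tiny challenge
    z = y + c * s
    if abs(z) > GAMMA - BETA:            # z near the edge would leak s
        return None                      # abort and restart
    return z

attempts = 0
z = None
while z is None:
    attempts += 1
    z = sign_attempt()
print(f"accepted z={z} after {attempts} attempt(s)")
```

After rejection, the distribution of accepted z values is independent of s, which is exactly why ML-DSA signing time varies (the "rejection sampling → variable" note in the benchmarks below).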
ML-DSA in Python
from cryptography.hazmat.primitives.asymmetric import ml_dsa
# --- Generate key pair ---
private_key = ml_dsa.MLDSA65PrivateKey.generate()
public_key = private_key.public_key()
pk_bytes = public_key.public_bytes_raw()
print(f"Public key size: {len(pk_bytes)} bytes")
# Public key size: 1952 bytes
# --- Sign ---
message = b"The critical data you want integrity-protected."
signature = private_key.sign(message)
print(f"Signature size: {len(signature)} bytes")
# Signature size: 3309 bytes
# --- Verify ---
from cryptography.exceptions import InvalidSignature
try:
    public_key.verify(signature, message)
    print("Signature valid")
except InvalidSignature:
    print("Invalid signature")
Size comparison with ECDSA
| Scheme | Public key | Signature |
|---|---|---|
| ECDSA P-256 | 64 bytes | 64 bytes |
| Ed25519 | 32 bytes | 64 bytes |
| RSA-2048 (PSS) | 256 bytes | 256 bytes |
| ML-DSA-44 | 1312 bytes | 2420 bytes |
| ML-DSA-65 | 1952 bytes | 3309 bytes |
| ML-DSA-87 | 2592 bytes | 4627 bytes |
50× larger signatures than Ed25519. Signatures are the biggest pain point in a pure-PQC deployment. For code-signing, document signing, blockchain: signatures appear on-chain or in every release, and size matters. For transient TLS signatures, less of an issue.
Alternative: Falcon — selected by NIST and being standardized as FN-DSA (draft FIPS 206; not final as of early 2026). Smaller signatures (~700 bytes for Falcon-512) but requires floating-point arithmetic — a nightmare for constant-time implementations on embedded hardware.
Alternative: SLH-DSA (SPHINCS+) — hash-based signatures, even larger (8-50 KB) but based on hash functions only, so fundamentally different security assumption from lattices.
Hybrid TLS 1.3 with OpenSSL 3.5
OpenSSL 3.5 (released 2025) has native ML-KEM support. Hybrid TLS 1.3 uses two key-exchange algorithms combined — X25519MLKEM768 is the canonical hybrid, already deployed by Cloudflare and Google Chrome.
Enable hybrid on the server side
# Generate server certificate with ML-DSA-65 signature algorithm
openssl req -x509 -newkey mldsa65 -keyout server.key -out server.crt \
-days 365 -nodes -subj "/CN=test.example.com"
# Inspect the certificate
openssl x509 -in server.crt -text -noout | grep "Signature Algorithm"
# Signature Algorithm: ML-DSA-65
# Start OpenSSL TLS server with hybrid key exchange
openssl s_server -accept 4443 -cert server.crt -key server.key \
-tls1_3 -groups X25519MLKEM768
Client side
# Connect with the X25519MLKEM768 hybrid group
openssl s_client -connect localhost:4443 -tls1_3 \
-groups X25519MLKEM768 -showcerts
# You'll see:
# Peer temporary key: X25519MLKEM768, 1215 bytes
# Server Temp Key: X25519MLKEM768
The handshake uses both X25519 and ML-KEM-768. The TLS session key is derived from both shared secrets combined via HKDF. If ML-KEM has a latent weakness discovered later, X25519 keeps you safe. If Shor breaks X25519 first, ML-KEM keeps you safe.
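The combining step can be sketched with a minimal HKDF (RFC 5869) over the concatenated secrets. This is an illustration of the "both secrets contribute" property, not the actual TLS 1.3 key schedule, which feeds the concatenated hybrid secret into its own HKDF-based derivation:

```python
import hmac, hashlib

# Minimal HKDF-SHA256 (RFC 5869) from the standard library.
def hkdf(ikm: bytes, info: bytes, length: int = 32) -> bytes:
    prk = hmac.new(b"\x00" * 32, ikm, hashlib.sha256).digest()   # extract
    okm, t, counter = b"", b"", 1
    while len(okm) < length:                                      # expand
        t = hmac.new(prk, t + info + bytes([counter]), hashlib.sha256).digest()
        okm += t
        counter += 1
    return okm[:length]

ss_mlkem = b"\x11" * 32    # stand-in for the ML-KEM-768 shared secret
ss_x25519 = b"\x22" * 32   # stand-in for the X25519 shared secret

session_key = hkdf(ss_mlkem + ss_x25519, b"hybrid-demo")
# Changing EITHER input changes the output: breaking one of the two
# algorithms alone does not reveal the session key.
assert hkdf(b"\x99" * 32 + ss_x25519, b"hybrid-demo") != session_key
assert hkdf(ss_mlkem + b"\x99" * 32, b"hybrid-demo") != session_key
print("hybrid session key:", session_key.hex()[:16], "...")
```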
Production measurements
Cloudflare’s public data on hybrid TLS rollout (2024):
| Metric | X25519 only | X25519MLKEM768 hybrid | Delta |
|---|---|---|---|
| Handshake CPU | 0.4 ms | 0.7 ms | +75% |
| Bandwidth | ~500 bytes | ~2300 bytes | +1800 bytes |
| Connection failure rate (middleboxes) | 0.0001% | 0.003% | 30× worse |
The middlebox-failure rate is the interesting number. Some corporate firewalls and outdated inspection proxies choke on the larger TLS ClientHello message. Cloudflare reports this at ~0.003% of connections — non-zero but manageable for most use cases. Worth checking for internal enterprise networks.
Performance numbers (Python)
import time
from cryptography.hazmat.primitives.asymmetric import ml_kem, ml_dsa, ec
def bench(name: str, fn, n: int = 1000):
start = time.perf_counter()
for _ in range(n):
fn()
elapsed = (time.perf_counter() - start) / n * 1e6
print(f"{name:40s} {elapsed:>8.1f} µs/op")
# Key generation
bench("EC-P256 keygen", lambda: ec.generate_private_key(ec.SECP256R1()))
bench("ML-KEM-768 keygen", lambda: ml_kem.MLKEM768PrivateKey.generate())
bench("ML-DSA-65 keygen", lambda: ml_dsa.MLDSA65PrivateKey.generate())
# Encapsulation / sign (on a pre-generated key)
kem_priv = ml_kem.MLKEM768PrivateKey.generate()
kem_pub = kem_priv.public_key()
bench("ML-KEM-768 encapsulate", lambda: kem_pub.encapsulate())
ct, _ = kem_pub.encapsulate()
bench("ML-KEM-768 decapsulate", lambda: kem_priv.decapsulate(ct))
dsa_priv = ml_dsa.MLDSA65PrivateKey.generate()
dsa_pub = dsa_priv.public_key()
msg = b"benchmark message"
bench("ML-DSA-65 sign", lambda: dsa_priv.sign(msg))
sig = dsa_priv.sign(msg)
bench("ML-DSA-65 verify", lambda: dsa_pub.verify(sig, msg))
Typical numbers on a modern laptop CPU (single-threaded, Python binding):
- ML-KEM-768 keygen: ~80 µs
- ML-KEM-768 encapsulate: ~60 µs
- ML-KEM-768 decapsulate: ~70 µs
- ML-DSA-65 keygen: ~200 µs
- ML-DSA-65 sign: ~500 µs (rejection sampling → variable)
- ML-DSA-65 verify: ~150 µs
Compare X25519 keygen: ~30 µs. ECDSA P-256 sign: ~40 µs.
ML-KEM operations are ~2-3× slower than X25519 but not catastrophically so. ML-DSA operations are ~5-10× slower than Ed25519. For high-throughput servers, this matters; for individual user sessions, it’s imperceptible.
Common integration gotchas
- Key size blows up serialized bytes. Anywhere you store a public key (database columns, JSON blobs, config files, X.509 certificates) needs to handle 1+ KB instead of 32 bytes. Check database column sizes.
- TLS fragmentation. QUIC and TLS 1.3 may now need multiple packets for the handshake — usually fine, but low-bandwidth IoT paths may see real slowdown.
- Middleboxes. Corporate firewalls that truncate or inspect TLS handshakes sometimes reject the large PQC ClientHello. Your app should gracefully fall back to pure classical TLS if the hybrid negotiation fails (for now).
- Hardware accelerator support. AES-NI and SHA-NI make classical crypto hardware-fast on modern CPUs. ML-KEM and ML-DSA don't benefit yet; Intel and AMD have PQC instructions on their roadmaps. If you target specific hardware, check.
- Constant-time implementations. The Python bindings use constant-time C code. If you reimplement ML-KEM in Rust or Go or wherever, use a vetted library (`libcrypto`, `bouncycastle-pqc`, `rustcrypto/pqcrypto`) rather than rolling your own — side-channel attacks on lattice crypto are a real and active research area.
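On the middlebox point, server-side fallback mostly comes free from TLS negotiation: advertise the hybrid group first and classical groups after it, and clients that only speak classical groups still connect. A hypothetical nginx sketch, assuming a build linked against OpenSSL 3.5 (which registers the X25519MLKEM768 group name):

```nginx
# Hypothetical config sketch -- verify the group names your OpenSSL accepts.
server {
    listen 443 ssl;
    ssl_protocols TLSv1.3;
    ssl_certificate     /etc/ssl/server.crt;
    ssl_certificate_key /etc/ssl/server.key;
    # Hybrid preferred, classical groups as fallback:
    ssl_ecdh_curve X25519MLKEM768:X25519:prime256v1;
}
```

Note this only controls negotiation; a middlebox that drops the client's large ClientHello fails before the server sees anything, so client-side retry logic is still needed.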
When NOT to use ML-KEM/ML-DSA
Three cases where pure classical crypto is still fine:
- Ephemeral session keys with short-lived data. A 10-minute chat session encrypted with ECDH → AES is post-quantum safe in practice, because the data is worthless by the time Shor exists.
- No regulatory PQC requirement yet. Some compliance standards haven't adopted post-quantum requirements; migrating ahead of regulation is future-proofing, not an obligation.
- Resource-constrained embedded systems. A 1.9 KB ML-DSA-65 public key is too large for some embedded firmware flash budgets. Wait for NIST's standardization of FN-DSA (Falcon), which is several times smaller, or use hash-based signatures for one-off firmware signing.
Exercises
1. Write a ChaCha20-Poly1305 channel with ML-KEM key exchange
Build a simple one-way encrypted-message channel: Alice sends Bob encrypted data using a ChaCha20-Poly1305 key derived from an ML-KEM-768 key encapsulation. Verify decryption on Bob’s side.
Show scaffold
from cryptography.hazmat.primitives.asymmetric import ml_kem
from cryptography.hazmat.primitives.ciphers.aead import ChaCha20Poly1305
from cryptography.hazmat.primitives.kdf.hkdf import HKDF
from cryptography.hazmat.primitives import hashes
import os
alice_key = ml_kem.MLKEM768PrivateKey.generate()
pk_bytes = alice_key.public_key().public_bytes_raw()
# Bob encapsulates
from cryptography.hazmat.primitives.asymmetric.ml_kem import MLKEM768PublicKey
alice_pub = MLKEM768PublicKey.from_public_bytes(pk_bytes)
ct, ss = alice_pub.encapsulate()
# Derive AEAD key from shared secret
key = HKDF(hashes.SHA256(), 32, salt=None, info=b"msg-v1").derive(ss)
nonce = os.urandom(12)
message = b"Quantum-safe hello."
ciphertext = ChaCha20Poly1305(key).encrypt(nonce, message, None)
# Alice decapsulates + decrypts
ss_alice = alice_key.decapsulate(ct)
key_alice = HKDF(hashes.SHA256(), 32, salt=None, info=b"msg-v1").derive(ss_alice)
recovered = ChaCha20Poly1305(key_alice).decrypt(nonce, ciphertext, None)
assert recovered == message
2. Benchmark on your server
Run the benchmark script above and compare against X25519 + Ed25519. How much overhead does your production TLS deployment actually get?
Typical answer
On a modern server CPU (single core): X25519 keygen ~30 µs; ML-KEM-768 keygen ~80 µs. Overhead per handshake: ~100 µs CPU, ~2 KB bandwidth. For a 10k-RPS server: ~1 extra CPU core burned on the PQC key exchange. Usually fine.
3. Check if your clients support hybrid TLS
Write a Bash script that connects to https://cloudflareresearch.com and reports whether the negotiated TLS group was X25519MLKEM768.
Show script
openssl s_client -connect cloudflareresearch.com:443 -servername cloudflareresearch.com -tls1_3 -groups X25519MLKEM768 < /dev/null 2>&1 | grep "Temp Key"
# Server Temp Key: X25519MLKEM768 → working
# Server Temp Key: X25519 → hybrid not negotiated
4. Design a DB migration
Your app stores user public keys in a Postgres column public_key BYTEA with max length 128 bytes. Design the schema migration to support ML-DSA-65 public keys.
Show approach
Don't repurpose the existing column (it breaks length assumptions in existing rows and app code). BYTEA is variable-length in Postgres, so add a new column ml_dsa_public_key BYTEA, optionally with a CHECK on octet_length. Add a migration batch job to generate ML-DSA keypairs for existing users during next login. Keep the old column for a deprecation period (6-12 months); clients use whichever key the server sends. Full migration: retire the classical column once all active users have ML-DSA keys.
What you should take away
- ML-KEM-768 and ML-DSA-65 are the practical default post-quantum choices for Category 3 (128-192 bit PQ security).
- Size blowup is the main deployment issue: public keys 30-50× larger, signatures 50× larger than pre-PQC.
- CPU overhead is 2-5×, noticeable but manageable for most workloads.
- Hybrid mode (X25519MLKEM768) is the production-grade migration path — already deployed by Cloudflare, Google, AWS.
- OpenSSL 3.5 + Python cryptography 44+ give you all the primitives in mainstream toolchains.
Next — and final for this iteration: Auditing a Codebase for Y2Q Readiness. The concrete scanner tool that turns quantum knowledge into an indie product or a consulting deliverable.