Quantum Outpost
algorithms advanced · 28 min read ·

Shor's Algorithm: Factoring, Order-Finding, and the End of RSA

Shor's factoring algorithm reduces integer factorization to the problem of finding the multiplicative order of a random element mod N — and uses quantum phase estimation to solve that in polynomial time. This tutorial derives the full algorithm, runs a small instance in Qiskit, and honestly assesses the real-world resource requirements to break RSA-2048.

Prerequisites: Tutorial 11: QFT and Phase Estimation

Shor’s algorithm is the reason governments take quantum computing seriously. Published in 1994 by Peter Shor at Bell Labs, it factors nn-bit integers in time polynomial in nn — exponentially faster than any known classical algorithm. That threat directly undermines RSA, the public-key cryptography protecting essentially all internet encryption, banking, and signed software. NIST’s post-quantum standardization (ML-KEM, ML-DSA, 2024) exists because of Shor.

This tutorial derives the algorithm carefully. It’s genuinely subtle — much more than “apply QFT, get answer.” The quantum part does one specific thing (find the multiplicative order of an element mod NN), and the factoring follows from classical number theory. Getting the composition right matters.

The classical problem

Given a composite integer NN, find a nontrivial factor. The best known classical algorithm is the General Number Field Sieve (GNFS), with complexity

exp(O ⁣(6493(logN)1/3(loglogN)2/3)).\exp\left(O\!\left(\,\sqrt[3]{\tfrac{64}{9}} \cdot (\log N)^{1/3} \cdot (\log\log N)^{2/3}\,\right)\right).

For NN at 2048 bits (the RSA-2048 standard), GNFS is estimated to require on the order of 103410^{34} operations — infeasible on any classical computer humanity will build this century.

Shor’s quantum algorithm runs in O((logN)2(loglogN)(logloglogN))O((\log N)^2 (\log\log N) (\log\log\log N)) time — polynomial in the number of digits. On a sufficiently large fault-tolerant quantum computer, RSA-2048 falls in hours.

The reduction: factoring → order-finding

Shor reduces factoring to a problem in number theory that turns out to be tractable for quantum computers. Here’s the reduction.

Order of an element. For aa coprime to NN, the multiplicative order of aa mod NN is the smallest positive integer rr such that ar1(modN)a^r \equiv 1 \pmod N. The classical gcd and Euclidean-algorithm operations are cheap. The hard part is computing rr.

Classical number-theoretic fact. If rr is even and ar/2≢1(modN)a^{r/2} \not\equiv -1 \pmod N, then gcd(ar/21,N)\gcd(a^{r/2} - 1, N) and gcd(ar/2+1,N)\gcd(a^{r/2} + 1, N) are both nontrivial factors of NN. (Because ar1=(ar/21)(ar/2+1)0(modN)a^r - 1 = (a^{r/2} - 1)(a^{r/2} + 1) \equiv 0 \pmod N, and under the conditions neither factor is a multiple of NN.)

Probabilistic reduction. Pick a random aa with 1<a<N1 < a < N, gcd(a,N)=1\gcd(a, N) = 1 (if gcd>1\gcd > 1, you already found a factor — stop). Find rr = order of aa. With probability at least 1/21/2, rr is even and ar/2≢1a^{r/2} \not\equiv -1. In that case, factorize using the gcd formulas. Else, try another aa.

Everything but “find rr” is classical and polynomial. The quantum speedup is entirely in the order-finding.

Order-finding as phase estimation

Define the unitary UaU_a acting on log2N\lceil \log_2 N \rceil qubits:

Uax=ax(modN).U_a\,|x\rangle = |ax \pmod N\rangle.

(We extend UaU_a to act as identity on x|x\rangle for xNx \geq N, making it unitary on the full 2logN2^{\lceil \log N \rceil}-dim space.)

Two key facts:

  1. UaU_a‘s eigenvalues are rr-th roots of unity. Its order is rr: Uar=IU_a^r = I.
  2. Its eigenvectors are related to the roots of unity. Specifically, for each k{0,1,...,r1}k \in \{0, 1, ..., r-1\}, define
uk=1rj=0r1e2πikj/rajmodN.|u_k\rangle = \frac{1}{\sqrt{r}}\sum_{j=0}^{r-1} e^{-2\pi i k j / r}\,|a^j \bmod N\rangle.

Then Uauk=e2πik/rukU_a|u_k\rangle = e^{2\pi i k / r}|u_k\rangle.

QPE on UaU_a with eigenvector uk|u_k\rangle yields φ=k/r\varphi = k/r, from which rr can be recovered classically.

The superposition trick

We can’t easily prepare uk|u_k\rangle — we don’t know rr or kk. But a magic identity:

1rk=0r1uk  =  1.\frac{1}{\sqrt{r}}\sum_{k=0}^{r-1}|u_k\rangle \;=\; |1\rangle.

(Expand the sum; the kk-sum projects to j=0j = 0.) So if you prepare 1|1\rangle in the eigenvector register and run QPE, you effectively run QPE on a uniform superposition of all uk|u_k\rangle. The counting-register measurement returns a random φk/r\varphi \approx k/r for a uniformly random k{0,...,r1}k \in \{0, ..., r-1\}.

From k/rk/r to rr: continued fractions. If the measurement returns φ\varphi, apply the continued-fraction expansion to find the rational p/qp/q closest to φ\varphi with q<Nq < N. With high probability, q=rq = r (or a divisor of it). If not, try again.

The full algorithm

  1. Pick random a{2,...,N1}a \in \{2, ..., N-1\} with gcd(a,N)=1\gcd(a, N) = 1.
  2. Use quantum phase estimation on UaU_a with input state 1|1\rangle and 2logN2 \lceil \log N \rceil counting qubits.
  3. Measure the counting register; get a binary fraction φ\varphi.
  4. Run continued-fraction expansion to extract rr.
  5. Verify classically that ar1(modN)a^r \equiv 1 \pmod N. If not, go back to step 2.
  6. If rr is odd or ar/21(modN)a^{r/2} \equiv -1 \pmod N, go back to step 1.
  7. Compute gcd(ar/2±1,N)\gcd(a^{r/2} \pm 1, N). Return the nontrivial factors.

Total quantum queries expected: O(1)O(1) repetitions of step 2 (each succeeds with constant probability). Total classical work: polynomial in logN\log N. Total circuit depth per quantum query: dominated by the controlled modular exponentiations, O((logN)3)O((\log N)^3) or so.

Example: factor N=15N = 15

N=15=3×5N = 15 = 3 \times 5. Pick a=7a = 7 (coprime to 15). Order of 7 mod 15: 71=7,72=494,732813,749117^1 = 7, 7^2 = 49 \equiv 4, 7^3 \equiv 28 \equiv 13, 7^4 \equiv 91 \equiv 1. So r=4r = 4 (even). ar/2=72=494a^{r/2} = 7^2 = 49 \equiv 4. Not 1(mod15)-1 \pmod{15}. Great.

gcd(41,15)=gcd(3,15)=3\gcd(4 - 1, 15) = \gcd(3, 15) = 3. gcd(4+1,15)=gcd(5,15)=5\gcd(4 + 1, 15) = \gcd(5, 15) = 5. 15=3×515 = 3 \times 5. Done.

For the quantum part, we need to implement controlled-Ua2kU_a^{2^k} gates on 4 target qubits (log215=4\lceil \log_2 15 \rceil = 4). For this small instance, we can hardcode the multiplication circuits. For general NN we’d need a proper modular-exponentiation circuit, which is the expensive part.

Qiskit implementation (toy scale)

Because the general modular exponentiation is complex to build from gates, and Qiskit’s didactic Shor class was removed in newer versions, we’ll use the manual QPE approach with a hand-built U7mod15U_7 \bmod 15 circuit. This is not scalable — but it’s honest about what the real algorithm does.

import numpy as np
from fractions import Fraction
from math import gcd
from qiskit import QuantumCircuit, transpile
from qiskit_aer import AerSimulator
from qiskit.circuit.library import QFT


def c_amod15(a: int, power: int) -> QuantumCircuit:
    """Controlled multiplication by a^power mod 15 on 4 target qubits.

    Uses the known permutation structure for a ∈ {2, 4, 7, 8, 11, 13}.
    """
    if a not in (2, 4, 7, 8, 11, 13):
        raise ValueError("a must be coprime to 15 (one of 2, 4, 7, 8, 11, 13).")

    U = QuantumCircuit(4)
    for _ in range(power):
        if a in (2, 13):
            U.swap(2, 3); U.swap(1, 2); U.swap(0, 1)
        if a in (7, 8):
            U.swap(0, 1); U.swap(1, 2); U.swap(2, 3)
        if a in (4, 11):
            U.swap(1, 3); U.swap(0, 2)
        if a in (7, 11, 13):
            for q in range(4): U.x(q)
    controlled = U.to_gate(label=f"{a}^{power} mod 15").control(1)
    return controlled


def qpe_order_finding(a: int, N: int = 15, n_count: int = 8) -> QuantumCircuit:
    qc = QuantumCircuit(n_count + 4, n_count)
    # Counting register into superposition
    qc.h(range(n_count))
    # Target register starts |0001⟩ = |1⟩
    qc.x(n_count)
    # Controlled powers of U_a
    for k in range(n_count):
        qc.append(c_amod15(a, 2 ** k), [k, *range(n_count, n_count + 4)])
    # Inverse QFT on counting register
    qc.append(QFT(n_count, inverse=True, do_swaps=True), range(n_count))
    qc.measure(range(n_count), range(n_count))
    return qc


def shor_factor(N: int, attempts: int = 10) -> tuple[int, int] | None:
    if N % 2 == 0:
        return (2, N // 2)
    sim = AerSimulator()
    for _ in range(attempts):
        a = np.random.choice([2, 4, 7, 8, 11, 13])
        g = gcd(int(a), N)
        if g > 1:
            return (int(g), N // int(g))

        qc = qpe_order_finding(int(a), N)
        tqc = transpile(qc, sim)
        counts = sim.run(tqc, shots=1).result().get_counts()
        bits = next(iter(counts))
        phi = int(bits, 2) / 2 ** len(bits)
        frac = Fraction(phi).limit_denominator(N)
        r = frac.denominator
        if r % 2 != 0 or r == 1: continue
        candidate = pow(int(a), r // 2, N)
        if candidate == N - 1: continue
        f1 = gcd(candidate - 1, N)
        f2 = gcd(candidate + 1, N)
        if 1 < f1 < N: return (int(f1), N // int(f1))
        if 1 < f2 < N: return (int(f2), N // int(f2))
    return None


np.random.seed(42)
factors = shor_factor(15)
print(f"15 = {factors[0]} × {factors[1]}")
# 15 = 3 × 5

The circuit has 8+4=128 + 4 = 12 qubits and a few hundred gates after transpilation. On a real IBM Quantum free-tier machine, results are noisy but recognizable — you’ll often need several repetitions for a clean answer. On a simulator, success is near-certain.

What it would take to factor RSA-2048

Three resource estimates from peer-reviewed work (Gidney & Ekerå 2021 is the standard reference):

ResourceEstimate
Logical qubits~4,100
Physical qubits (assuming surface code at 10310^{-3} gate error)~20 million
Wall-clock time~8 hours
Total Toffoli-equivalent operations~3×1093 \times 10^9
Total T-gates (post-distillation)~101010^{10}

As of 2026, the largest known Shor-type factoring demonstration on real hardware is factoring 21 (Wang et al., 2024) using hybrid variational methods on a 5-qubit device; pure Shor demonstrations have been stuck at N=15N = 15 since Vandersypen et al. 2001. The gap between “15” and “2048-bit” is about 14 orders of magnitude in qubit count.

Google’s Willow (105 physical qubits, Dec 2024) is the first chip to demonstrate below-threshold surface-code error correction. IBM’s Starling roadmap (200 logical qubits by 2029) and Blueprint (2033) are the first credible paths toward thousands of logical qubits. Most experts estimate RSA-2048 falls somewhere between 2030 and 2040.

What post-quantum cryptography actually changes

Shor breaks RSA, DSA, ECDSA, Diffie-Hellman, and all integer-factoring or discrete-log-based public-key cryptography. It does not break:

  • Symmetric ciphers like AES-256, ChaCha20 (Grover gives a quadratic, not exponential, speedup — doubling key size restores security).
  • Hash functions like SHA-256, SHA-3 (Grover quadratic speedup — use 256-bit hashes for 128-bit post-quantum security).
  • Lattice-based schemes like ML-KEM (CRYSTALS-Kyber) and ML-DSA (CRYSTALS-Dilithium) — NIST’s standardized PQC.
  • Code-based (Classic McEliece), hash-based (SPHINCS+), isogeny-based (SIKE — though SIKE was classically broken in 2022).

The NIST PQC migration is specifically about replacing the Shor-vulnerable primitives while keeping the Grover-resistant ones at twice the key length. This is tractable for most infrastructure — but non-trivial, because every TLS implementation, every signed software update chain, every hardware security module has to be audited and upgraded.

Exercises

1. Verify the order-finding

For N=21N = 21 and a=4a = 4, compute the order rr classically. Verify rr is even and ar/2≢1a^{r/2} \not\equiv -1. Then compute the factors via gcd(ar/2±1,N)\gcd(a^{r/2} \pm 1, N).

Show answer

41=4,42=16,43=641(mod21)4^1 = 4, 4^2 = 16, 4^3 = 64 \equiv 1 \pmod{21}. So r=3r = 3. Odd! Retry with another aa.

Try a=5a = 5: 51=5,52=254,53=20,54=10016,55=8017,56=8515^1 = 5, 5^2 = 25 \equiv 4, 5^3 = 20, 5^4 = 100 \equiv 16, 5^5 = 80 \equiv 17, 5^6 = 85 \equiv 1. r=6r = 6, even. 53=20≢120(mod21)5^{3} = 20 \not\equiv -1 \equiv 20 \pmod{21}. Hmm, actually 120-1 \equiv 20, so ar/21a^{r/2} \equiv -1 — bad case, retry.

Try a=2a = 2: 21=2,22=4,23=8,24=16,25=3211,26=2212^1 = 2, 2^2 = 4, 2^3 = 8, 2^4 = 16, 2^5 = 32 \equiv 11, 2^6 = 22 \equiv 1. r=6r = 6, even. 23=8202^3 = 8 \neq 20. Good. gcd(81,21)=7\gcd(8 - 1, 21) = 7, gcd(8+1,21)=3\gcd(8 + 1, 21) = 3. 21=3×721 = 3 \times 7. ✓

2. Continued fractions

Suppose QPE on Ua(modN)U_a \pmod N with 10 counting qubits returns φ=0.3332\varphi = 0.3332. Use the continued-fraction algorithm to extract candidate denominators rr bounded by N=100N = 100.

Show answer

0.3332=[0;3,3003,...]0.3332 = [0; 3, 3003, ...]. The convergents are 0,1/3,...0, 1/3, .... The first convergent with denominator ≤ 100 is 1/31/3. So candidate r=3r = 3. Verify: is a31(modN)a^3 \equiv 1 \pmod N? If yes, r=3r = 3.

In Python:

from fractions import Fraction
Fraction(0.3332).limit_denominator(100)
# Fraction(1, 3)

3. Why 2n2n counting qubits

Why does Shor require 2logN2\lceil \log N \rceil counting qubits rather than logN\lceil \log N \rceil?

Show answer

The phase φ=k/r\varphi = k/r has denominator up to rNr \leq N. To distinguish it from neighboring fractions in continued-fraction post-processing, you need precision at least 1/(2r2)1/(2N2)1/(2r^2) \sim 1/(2N^2). That requires log2(2N2)=2logN+1\lceil \log_2(2N^2) \rceil = 2\lceil \log N \rceil + 1 counting qubits. Shortcuts exist (Kitaev’s 1-qubit iterative QPE) but cost more repetitions.

4. Estimate RSA-1024

If RSA-2048 takes ~20M physical qubits and ~8 hours, roughly what would RSA-1024 take? (Use that factoring nn-bit numbers is O(n3)O(n^3) in gates, and that RSA-1024 is already considered broken classically by well-resourced adversaries.)

Show answer

Halving nn roughly divides gate count by 8 and qubit count by 2. So RSA-1024: ~10M physical qubits, ~1 hour wall-clock. Still infeasible on 2026 hardware but closer — this is why RSA-1024 has been deprecated for classical reasons since ~2013 and is a “canary” target for the first credible Shor demos.

What you should take away

  • Shor reduces factoring to order-finding. The quantum speedup is entirely in finding the order of a(modN)a \pmod N via QPE.
  • QPE on the modular-multiplication unitary recovers k/rk/r for random kk; continued fractions extract rr.
  • Real RSA-2048 attacks need ~20M physical qubits and ~8 hours wall-clock on fault-tolerant hardware. We’re 14 orders of magnitude away from that on demonstrated hardware.
  • Harvest-now-decrypt-later makes this urgent despite the long timeline. Migrate to post-quantum schemes (ML-KEM, ML-DSA) now.
  • Grover and Shor together define the post-quantum threat model: double symmetric key sizes, replace all public-key schemes.

This closes the Algorithms track. Next up: Variational algorithms — VQE, QAOA, and the hybrid classical/quantum paradigm that works on today’s noisy hardware, without waiting for fault tolerance.


Weekly dispatch

Quantum, for people who already code.

One serious tutorial per week, plus the industry moves that actually matter. No hype, no hand-waving.

Free. Unsubscribe anytime. We will never sell your email.