Quantum Outpost
algorithms intermediate · 23 min read ·

Bernstein-Vazirani and Simon: Learning Hidden Structure in One (or O(n)) Queries

Bernstein-Vazirani learns a hidden bit string in a single query. Simon's algorithm learns a hidden shift with O(n) queries where classical algorithms need exponentially many — and was the direct inspiration for Shor's factoring algorithm. This tutorial derives both from scratch with complete Qiskit implementations.

Prerequisites: Tutorial 8: Deutsch-Jozsa

Deutsch-Jozsa told us quantum computers can beat deterministic classical ones in the black-box model. This tutorial covers two algorithms that push the idea further and make it recognizable as learning: Bernstein-Vazirani retrieves a hidden bit string in one query, and Simon’s algorithm cracks a problem where even randomized classical algorithms need exponentially many queries.

Simon’s algorithm is the unsung hero of quantum computing history. When Peter Shor saw it in 1994, he reportedly realized “the same idea would break RSA if I could just replace the Boolean group Z2n\mathbb{Z}_2^n with the integers mod NN.” Shor’s factoring algorithm — the reason governments care about quantum — is a direct descendant.

Bernstein-Vazirani: find the hidden string in one query

Problem. You’re given a Boolean function f:{0,1}n{0,1}f: \{0,1\}^n \to \{0,1\} as a black box, with the promise that there exists a hidden string s{0,1}ns \in \{0,1\}^n such that

f(x)  =  sx(mod2),f(x) \;=\; s \cdot x \pmod 2,

where the dot product is taken bit-by-bit and summed mod 2. Your job: find ss.

Classical bound. nn queries — you have to learn each bit of ss. Query f(ei)f(e_i) where eie_i is the ii-th standard basis vector and read off sis_i. You can’t do better because each query returns only one bit.

Quantum bound. One query. Same circuit as Deutsch-Jozsa, same structure, different interpretation of the output.

Circuit

Literally identical to Deutsch-Jozsa. Prepare 0n1|0^n\rangle|1\rangle, apply Hadamards on everything, apply the oracle, apply Hadamards on the inputs, measure. The output is ss.

Why? Trace the state as in Deutsch-Jozsa. After the oracle with phase kickback:

12nx(1)sxx.\tfrac{1}{\sqrt{2^n}}\sum_x (-1)^{s \cdot x}|x\rangle \otimes |-\rangle.

Now apply HnH^{\otimes n} to the inputs. The Hadamard of a basis state with a sign pattern is:

Hn12nx(1)sxx  =  12nx,y(1)sx+xyy  =  s.H^{\otimes n}\,\frac{1}{\sqrt{2^n}}\sum_x (-1)^{s \cdot x}|x\rangle \;=\; \frac{1}{2^n}\sum_{x,y}(-1)^{s \cdot x + x \cdot y}|y\rangle \;=\; |s\rangle.

The inner sum x(1)x(sy)\sum_x (-1)^{x \cdot (s \oplus y)} is 2n2^n when y=sy = s and 00 otherwise. So you measure ss deterministically. One oracle query, nn classical bits recovered.

OpenQASM for BV with s=1011s = 1011

OPENQASM 2.0;
include "qelib1.inc";

// Bernstein-Vazirani for hidden string s = 1011 (LSB on q[0])
qreg q[5];   // 4 inputs + 1 ancilla
creg c[4];

// Prep
x q[4];
h q[0]; h q[1]; h q[2]; h q[3]; h q[4];

// Oracle f(x) = s · x mod 2, for s = 1011
// Bits of s that are 1 contribute CNOTs to the ancilla:
cx q[0], q[4];   // s_0 = 1
cx q[1], q[4];   // s_1 = 1
// s_2 = 0, skip
cx q[3], q[4];   // s_3 = 1

// Final H on inputs
h q[0]; h q[1]; h q[2]; h q[3];

measure q[0] -> c[0];
measure q[1] -> c[1];
measure q[2] -> c[2];
measure q[3] -> c[3];

Every shot returns 1011 (in Qiskit’s reverse bit order that’s 1101 read right-to-left). → Try it in the playground — paste the QASM above.

Qiskit implementation

from qiskit import QuantumCircuit
from qiskit_aer import AerSimulator

def bv_circuit(s: str) -> QuantumCircuit:
    n = len(s)
    qc = QuantumCircuit(n + 1, n)

    qc.x(n)                 # ancilla = |1⟩
    qc.h(range(n + 1))      # Hadamards everywhere

    # Oracle: one CNOT per '1' bit in s (indexed so s[0] is the MSB)
    for i, bit in enumerate(reversed(s)):
        if bit == "1":
            qc.cx(i, n)

    qc.h(range(n))          # Hadamards back on inputs
    qc.measure(range(n), range(n))
    return qc

s = "1011"
qc = bv_circuit(s)
counts = AerSimulator().run(qc, shots=1).result().get_counts()
print(f"Hidden s = {s}, recovered = {next(iter(counts))}")
# Hidden s = 1011, recovered = 1011

Why the n=1n = 1 classical vs quantum separation looks so stark

If you hand a classical algorithm an oracle for f(x)=sxf(x) = s \cdot x on n=1000n = 1000 bits, it will make exactly 1000 queries to learn ss. The quantum algorithm makes one. That’s a 1000-to-1 query speedup — exponential in the speedup factor, though not in the scaling (both are polynomial in nn in this problem).

The deeper win: the speedup holds against randomized classical algorithms too. Unlike Deutsch-Jozsa, there’s no way for a classical algorithm to get around the information-theoretic lower bound of nn queries — each classical query returns only one bit and you need nn bits of answer. Quantum parallelism + phase kickback + Hadamard readout returns nn bits in one go.

Simon’s algorithm: the real exponential speedup

Problem. You’re given a function f:{0,1}n{0,1}nf: \{0,1\}^n \to \{0,1\}^n as a black box, with the promise that there exists a hidden nonzero string s{0,1}ns \in \{0,1\}^n such that

f(x)=f(y)    x=y or x=ys.f(x) = f(y) \quad \iff \quad x = y \text{ or } x = y \oplus s.

In words: ff is 2-to-1, and every pair of colliding inputs differs by XOR of the hidden ss. Your job: find ss.

Classical bound. Θ(2n/2)\Theta(2^{n/2}) queries by the birthday paradox — you have to keep querying ff until you happen to see the same output twice, and then x1x2=sx_1 \oplus x_2 = s. On n=40n = 40 that’s already 2 million queries.

Quantum bound. O(n)O(n) queries on average. That’s exponential separation against even randomized classical algorithms. No asterisks.

Circuit

Simon uses two nn-qubit registers. Call them the input register and the output register:

  1. Prepare 0n0n|0^n\rangle|0^n\rangle.
  2. Apply HnH^{\otimes n} to the input register:
12nxx0n.\frac{1}{\sqrt{2^n}}\sum_x |x\rangle|0^n\rangle.
  1. Apply the oracle UfU_f which maps x0nxf(x)|x\rangle|0^n\rangle \to |x\rangle|f(x)\rangle:
12nxxf(x).\frac{1}{\sqrt{2^n}}\sum_x |x\rangle|f(x)\rangle.
  1. Measure the output register (and discard the result). This step is subtle and important.

Because ff is 2-to-1, every output f(x)f(x) comes from exactly two inputs {x0,x0s}\{x_0, x_0 \oplus s\}. Measuring the output collapses the input register to an equal superposition over that pair:

12(x0+x0s).\frac{1}{\sqrt{2}}\big(|x_0\rangle + |x_0 \oplus s\rangle\big).

You don’t know x0x_0 — that’s hidden randomness — but the structure "{x0,x0s}\{x_0, x_0 \oplus s\}" is exactly what exposes ss.

  1. Apply HnH^{\otimes n} to the input register. After a page of algebra:
12ny:ys=0(1)yx0y.\frac{1}{\sqrt{2^n}}\sum_{y : y \cdot s = 0} (-1)^{y \cdot x_0}|y\rangle.

Every yy in the superposition has the property ys=0(mod2)y \cdot s = 0 \pmod 2. The random phase (1)yx0(-1)^{y \cdot x_0} is irrelevant — when you measure, you get some yy with ys=0(mod2)y \cdot s = 0 \pmod 2.

  1. Measure the input register. You get a random yy orthogonal to ss over F2\mathbb{F}_2.

Classical post-processing

One run of the quantum circuit gives you one random linear constraint ys=0(mod2)y \cdot s = 0 \pmod 2. Run the circuit n1n-1 times (or slightly more, to be robust) and you’ll have n1n-1 random linear constraints. Solve them as a linear system over F2\mathbb{F}_2 and you get ss (up to the trivial solution s=0s = 0, which the problem promise rules out).

Linear systems over F2\mathbb{F}_2 can be solved by Gaussian elimination in O(n3)O(n^3) classical operations. Total quantum queries: O(n)O(n). Total classical post-processing: O(n3)O(n^3).

OpenQASM for Simon with s=11s = 11

n=2n = 2, smallest meaningful instance. Pick a specific ff: f(00)=f(11)=00f(00) = f(11) = 00, f(01)=f(10)=01f(01) = f(10) = 01 — collisions at differ-by-ss pairs.

OPENQASM 2.0;
include "qelib1.inc";

// Simon, s = 11. 2 input qubits, 2 output qubits.
qreg q[4];
creg c[2];

// Prep & Hadamards on input register
h q[0]; h q[1];

// Oracle: f(x0, x1) = (x0 XOR x1, 0). Collisions at x and x XOR 11.
cx q[0], q[2];
cx q[1], q[2];

// Measure output register (qubits 2,3) first, informally
// (skipping the classical measurement step here; the collapse still happens)

// Hadamards back on input
h q[0]; h q[1];

// Measure input register
measure q[0] -> c[0];
measure q[1] -> c[1];

Every shot returns either 00 or 11 (because the only yy with y11=0(mod2)y \cdot 11 = 0 \pmod 2 are 0000 and 1111). → Try it in the playground.

Qiskit implementation with classical post-processing

import numpy as np
from qiskit import QuantumCircuit
from qiskit_aer import AerSimulator

def simon_circuit(n: int, oracle: QuantumCircuit) -> QuantumCircuit:
    qc = QuantumCircuit(2 * n, n)
    qc.h(range(n))                  # uniform superposition on inputs
    qc.compose(oracle, inplace=True) # apply oracle
    qc.h(range(n))                  # Hadamards back
    qc.measure(range(n), range(n))
    return qc

def simon_oracle(n: int, s: str) -> QuantumCircuit:
    """Oracle for a Simon function with hidden shift s."""
    oc = QuantumCircuit(2 * n)
    # Copy x → output register
    for i in range(n):
        oc.cx(i, n + i)
    # Find first '1' bit of s (MSB in our convention)
    first_one = None
    for i, bit in enumerate(reversed(s)):
        if bit == "1":
            first_one = i
            break
    # Controlled on that bit, XOR s into output → creates the 2-to-1 collision pattern
    for i, bit in enumerate(reversed(s)):
        if bit == "1":
            oc.cx(first_one, n + i)
    return oc

def solve_f2_system(ys: list[str]) -> str | None:
    """Find a nonzero s such that y · s = 0 mod 2 for every y. Gaussian elimination over F_2."""
    n = len(ys[0])
    A = np.array([[int(b) for b in y] for y in ys], dtype=np.int8)
    # Row-reduce
    rows = A.tolist()
    row, col = 0, 0
    while row < len(rows) and col < n:
        pivot = next((i for i in range(row, len(rows)) if rows[i][col]), None)
        if pivot is None:
            col += 1
            continue
        rows[row], rows[pivot] = rows[pivot], rows[row]
        for i in range(len(rows)):
            if i != row and rows[i][col]:
                rows[i] = [(a ^ b) for a, b in zip(rows[i], rows[row])]
        row += 1; col += 1
    # Find a null-space vector
    pivots = set()
    for r in rows:
        for j in range(n):
            if r[j]:
                pivots.add(j); break
    free = [j for j in range(n) if j not in pivots]
    if not free:
        return None
    s = [0] * n
    s[free[0]] = 1
    for r in rows:
        pivot_col = next((j for j in range(n) if r[j]), None)
        if pivot_col is None:
            continue
        s[pivot_col] = sum(r[j] * s[j] for j in range(n)) % 2
    return "".join(str(b) for b in s)

# --- Run ---
n = 3
s = "110"
oracle = simon_oracle(n, s)
qc = simon_circuit(n, oracle)

sim = AerSimulator()
ys = []
shots = 0
while len(ys) < n - 1 and shots < 200:
    counts = sim.run(qc, shots=1).result().get_counts()
    y = next(iter(counts))
    shots += 1
    if y != "0" * n and y not in ys:
        # Sanity check: y · s should be 0 mod 2
        if sum(int(a) * int(b) for a, b in zip(y, s)) % 2 == 0:
            ys.append(y)

recovered = solve_f2_system(ys)
print(f"Hidden s = {s}, shots used = {shots}, recovered = {recovered}")
# Hidden s = 110, shots used = ~3, recovered = 110

Why Simon matters more than Bernstein-Vazirani

BV’s speedup is dramatic-looking (1 vs nn) but polynomial: classical nn queries is already efficient. Simon’s speedup is exponential (nn vs 2n/22^{n/2}) in a setting where no classical algorithm — randomized, deterministic, anything — can do better than O(2n/2)O(2^{n/2}). This is the first genuine exponential quantum-vs-classical separation in query complexity that holds against randomized algorithms.

Shor’s algorithm is, structurally, Simon’s algorithm with Z2n\mathbb{Z}_2^n replaced by ZN\mathbb{Z}_N and the hidden XOR-shift replaced by a hidden multiplicative period. The analogy is precise: Simon’s hidden subgroup is {0,s}\{0, s\}; Shor’s hidden subgroup is {0,r,2r,3r,...}(modN)\{0, r, 2r, 3r, ...\} \pmod N. Same idea, bigger group, bigger trouble for RSA.

Exercises

1. Verify BV algebraically

For s=101s = 101 on n=3n = 3, hand-trace the state through the BV circuit and confirm the final measurement yields ss with probability 1.

Show answer

After phase kickback, input register is 18x(1)sxx\tfrac{1}{\sqrt{8}}\sum_x (-1)^{s \cdot x}|x\rangle with s=101s = 101. The sign of x2x1x0|x_2 x_1 x_0\rangle is (1)x2x0(-1)^{x_2 \oplus x_0}. Hadamard transform gives s=101|s\rangle = |101\rangle. Measurement gives 101 with probability 1.

2. Simon with a specific oracle

Write out the Simon oracle for n=3n = 3, s=110s = 110, as a circuit. Run it in Qiskit and collect 10 random yy‘s. Verify every yy satisfies ys=0(mod2)y \cdot s = 0 \pmod 2.

Show hint

Use simon_oracle(3, "110") from the code above. Run 10 shots, check each outcome: y0y1y2y_0 y_1 y_2 should satisfy y0+y1=0(mod2)y_0 + y_1 = 0 \pmod 2. Expected yy‘s: 000,001,110,111000, 001, 110, 111.

3. Classical birthday attack

Implement a naive classical algorithm for Simon that just queries ff at random inputs until it sees a collision. Measure the number of queries needed for n=10n = 10 over 100 runs. Compare to 2n/2=322^{n/2} = 32.

Hint
import random

def classical_simon(f, n, trials=100):
    lengths = []
    for _ in range(trials):
        seen = {}
        count = 0
        while True:
            x = random.randint(0, 2**n - 1)
            count += 1
            if x in seen: continue
            y = f(x)
            if y in seen.values():
                lengths.append(count); break
            seen[x] = y
    return sum(lengths) / len(lengths)

You’ll see averages around 1.252n/21.25 \cdot 2^{n/2}, consistent with the birthday bound.

4. Think: what Simon leaves out

Simon’s algorithm gives you ss given a specific promise about ff. What happens if the promise is violated — e.g., ff is actually 4-to-1? Would the algorithm fail silently or detectably?

Show answer

If ff is 4-to-1, measuring the output register collapses the input register to an equal superposition over 4 inputs (not 2). The subsequent Hadamard and measurement returns a yy orthogonal to a hidden subgroup of size 4, not a single shift ss. The algorithm “succeeds” in returning such yy‘s, but the linear system you’d solve to extract ss is inconsistent — Gaussian elimination fails or returns multiple solutions. So you’d notice, but only in post-processing. This is why real uses of Simon check the promise holds before trusting results.

What you should take away

  • Bernstein-Vazirani retrieves an nn-bit secret in one oracle query (vs nn classically). Same circuit as Deutsch-Jozsa, different interpretation of the output bit string.
  • Simon’s algorithm finds a hidden XOR-shift in O(n)O(n) queries (vs Θ(2n/2)\Theta(2^{n/2}) classically). The first exponential quantum speedup holding against randomized classical algorithms.
  • Simon → Shor. The same algorithm with Z2n\mathbb{Z}_2^n replaced by ZN\mathbb{Z}_N and XOR-period replaced by multiplicative-period is Shor’s factoring algorithm.
  • Classical post-processing matters. Simon returns random linear constraints; you recover the answer by Gaussian elimination over F2\mathbb{F}_2.

Next: Grover’s search. The other great query-complexity algorithm — O(N)O(\sqrt{N}) vs O(N)O(N) — and the amplitude amplification technique that generalizes it far beyond search.


Weekly dispatch

Quantum, for people who already code.

One serious tutorial per week, plus the industry moves that actually matter. No hype, no hand-waving.

Free. Unsubscribe anytime. We will never sell your email.