algorithms intermediate · 23 min read · By LIPAI WANG · April 22, 2026

Bernstein-Vazirani and Simon: Learning Hidden Structure in One (or O(n)) Queries

Bernstein-Vazirani learns a hidden bit string in a single query. Simon's algorithm learns a hidden shift with O(n) queries where classical algorithms need exponentially many — and was the direct inspiration for Shor's factoring algorithm. This tutorial derives both from scratch with complete Qiskit implementations.

Prerequisites: Tutorial 8: Deutsch-Jozsa

Deutsch-Jozsa told us quantum computers can beat deterministic classical ones in the black-box model. This tutorial covers two algorithms that push the idea further and make it recognizable as learning: Bernstein-Vazirani retrieves a hidden bit string in one query, and Simon’s algorithm cracks a problem where even randomized classical algorithms need exponentially many queries.

Simon’s algorithm is the unsung hero of quantum computing history. When Peter Shor saw it in 1994, he reportedly realized “the same idea would break RSA if I could just replace the Boolean group $\mathbb{Z}_2^n$ with the integers mod $N$ .” Shor’s factoring algorithm — the reason governments care about quantum — is a direct descendant.

Bernstein-Vazirani: find the hidden string in one query

Problem. You’re given a Boolean function $f: \{0,1\}^n \to \{0,1\}$ as a black box, with the promise that there exists a hidden string $s \in \{0,1\}^n$ such that

f(x) \;=\; s \cdot x \pmod 2,

where the dot product is taken bit-by-bit and summed mod 2. Your job: find $s$ .

Classical bound. $n$ queries — you have to learn each bit of $s$ . Query $f(e_i)$ where $e_i$ is the $i$ -th standard basis vector and read off $s_i$ . You can’t do better because each query returns only one bit.

Quantum bound. One query. Same circuit as Deutsch-Jozsa, same structure, different interpretation of the output.

Circuit

Literally identical to Deutsch-Jozsa. Prepare $|0^n\rangle|1\rangle$ , apply Hadamards on everything, apply the oracle, apply Hadamards on the inputs, measure. The output is $s$ .

Why? Trace the state as in Deutsch-Jozsa. After the oracle with phase kickback:

\tfrac{1}{\sqrt{2^n}}\sum_x (-1)^{s \cdot x}|x\rangle \otimes |-\rangle.

Now apply $H^{\otimes n}$ to the inputs. The Hadamard of a basis state with a sign pattern is:

H^{\otimes n}\,\frac{1}{\sqrt{2^n}}\sum_x (-1)^{s \cdot x}|x\rangle \;=\; \frac{1}{2^n}\sum_{x,y}(-1)^{s \cdot x + x \cdot y}|y\rangle \;=\; |s\rangle.

The inner sum $\sum_x (-1)^{x \cdot (s \oplus y)}$ is $2^n$ when $y = s$ and $0$ otherwise. So you measure $s$ deterministically. One oracle query, $n$ classical bits recovered.

OpenQASM for BV with $s = 1011$

OPENQASM 2.0;
include "qelib1.inc";

// Bernstein-Vazirani for hidden string s = 1011 (LSB on q[0])
qreg q[5];   // 4 inputs + 1 ancilla
creg c[4];

// Prep
x q[4];
h q[0]; h q[1]; h q[2]; h q[3]; h q[4];

// Oracle f(x) = s · x mod 2, for s = 1011
// Bits of s that are 1 contribute CNOTs to the ancilla:
cx q[0], q[4];   // s_0 = 1
cx q[1], q[4];   // s_1 = 1
// s_2 = 0, skip
cx q[3], q[4];   // s_3 = 1

// Final H on inputs
h q[0]; h q[1]; h q[2]; h q[3];

measure q[0] -> c[0];
measure q[1] -> c[1];
measure q[2] -> c[2];
measure q[3] -> c[3];

Every shot returns 1011 (in Qiskit’s reverse bit order that’s 1101 read right-to-left). → Try it in the playground — paste the QASM above.

Qiskit implementation

from qiskit import QuantumCircuit
from qiskit_aer import AerSimulator

def bv_circuit(s: str) -> QuantumCircuit:
    n = len(s)
    qc = QuantumCircuit(n + 1, n)

    qc.x(n)                 # ancilla = |1⟩
    qc.h(range(n + 1))      # Hadamards everywhere

    # Oracle: one CNOT per '1' bit in s (indexed so s[0] is the MSB)
    for i, bit in enumerate(reversed(s)):
        if bit == "1":
            qc.cx(i, n)

    qc.h(range(n))          # Hadamards back on inputs
    qc.measure(range(n), range(n))
    return qc

s = "1011"
qc = bv_circuit(s)
counts = AerSimulator().run(qc, shots=1).result().get_counts()
print(f"Hidden s = {s}, recovered = {next(iter(counts))}")
# Hidden s = 1011, recovered = 1011

Why the $n = 1$ classical vs quantum separation looks so stark

If you hand a classical algorithm an oracle for $f(x) = s \cdot x$ on $n = 1000$ bits, it will make exactly 1000 queries to learn $s$ . The quantum algorithm makes one. That’s a 1000-to-1 query speedup — exponential in the speedup factor, though not in the scaling (both are polynomial in $n$ in this problem).

The deeper win: the speedup holds against randomized classical algorithms too. Unlike Deutsch-Jozsa, there’s no way for a classical algorithm to get around the information-theoretic lower bound of $n$ queries — each classical query returns only one bit and you need $n$ bits of answer. Quantum parallelism + phase kickback + Hadamard readout returns $n$ bits in one go.

Simon’s algorithm: the real exponential speedup

Problem. You’re given a function $f: \{0,1\}^n \to \{0,1\}^n$ as a black box, with the promise that there exists a hidden nonzero string $s \in \{0,1\}^n$ such that

f(x) = f(y) \quad \iff \quad x = y \text{ or } x = y \oplus s.

In words: $f$ is 2-to-1, and every pair of colliding inputs differs by XOR of the hidden $s$ . Your job: find $s$ .

Classical bound. $\Theta(2^{n/2})$ queries by the birthday paradox — you have to keep querying $f$ until you happen to see the same output twice, and then $x_1 \oplus x_2 = s$ . On $n = 40$ that’s already 2 million queries.

Quantum bound. $O(n)$ queries on average. That’s exponential separation against even randomized classical algorithms. No asterisks.

Circuit

Simon uses two $n$ -qubit registers. Call them the input register and the output register:

Prepare $|0^n\rangle|0^n\rangle$ .
Apply $H^{\otimes n}$ to the input register:

\frac{1}{\sqrt{2^n}}\sum_x |x\rangle|0^n\rangle.

Apply the oracle $U_f$ which maps $|x\rangle|0^n\rangle \to |x\rangle|f(x)\rangle$ :

\frac{1}{\sqrt{2^n}}\sum_x |x\rangle|f(x)\rangle.

Measure the output register (and discard the result). This step is subtle and important.

Because $f$ is 2-to-1, every output $f(x)$ comes from exactly two inputs $\{x_0, x_0 \oplus s\}$ . Measuring the output collapses the input register to an equal superposition over that pair:

\frac{1}{\sqrt{2}}\big(|x_0\rangle + |x_0 \oplus s\rangle\big).

You don’t know $x_0$ — that’s hidden randomness — but the structure " $\{x_0, x_0 \oplus s\}$ " is exactly what exposes $s$ .

Apply $H^{\otimes n}$ to the input register. After a page of algebra:

\frac{1}{\sqrt{2^n}}\sum_{y : y \cdot s = 0} (-1)^{y \cdot x_0}|y\rangle.

Every $y$ in the superposition has the property $y \cdot s = 0 \pmod 2$ . The random phase $(-1)^{y \cdot x_0}$ is irrelevant — when you measure, you get some $y$ with $y \cdot s = 0 \pmod 2$ .

Measure the input register. You get a random $y$ orthogonal to $s$ over $\mathbb{F}_2$ .

Classical post-processing

One run of the quantum circuit gives you one random linear constraint $y \cdot s = 0 \pmod 2$ . Run the circuit $n-1$ times (or slightly more, to be robust) and you’ll have $n-1$ random linear constraints. Solve them as a linear system over $\mathbb{F}_2$ and you get $s$ (up to the trivial solution $s = 0$ , which the problem promise rules out).

Linear systems over $\mathbb{F}_2$ can be solved by Gaussian elimination in $O(n^3)$ classical operations. Total quantum queries: $O(n)$ . Total classical post-processing: $O(n^3)$ .

OpenQASM for Simon with $s = 11$

$n = 2$ , smallest meaningful instance. Pick a specific $f$ : $f(00) = f(11) = 00$ , $f(01) = f(10) = 01$ — collisions at differ-by- $s$ pairs.

OPENQASM 2.0;
include "qelib1.inc";

// Simon, s = 11. 2 input qubits, 2 output qubits.
qreg q[4];
creg c[2];

// Prep & Hadamards on input register
h q[0]; h q[1];

// Oracle: f(x0, x1) = (x0 XOR x1, 0). Collisions at x and x XOR 11.
cx q[0], q[2];
cx q[1], q[2];

// Measure output register (qubits 2,3) first, informally
// (skipping the classical measurement step here; the collapse still happens)

// Hadamards back on input
h q[0]; h q[1];

// Measure input register
measure q[0] -> c[0];
measure q[1] -> c[1];

Every shot returns either 00 or 11 (because the only $y$ with $y \cdot 11 = 0 \pmod 2$ are $00$ and $11$ ). → Try it in the playground.

Qiskit implementation with classical post-processing

import numpy as np
from qiskit import QuantumCircuit
from qiskit_aer import AerSimulator

def simon_circuit(n: int, oracle: QuantumCircuit) -> QuantumCircuit:
    qc = QuantumCircuit(2 * n, n)
    qc.h(range(n))                  # uniform superposition on inputs
    qc.compose(oracle, inplace=True) # apply oracle
    qc.h(range(n))                  # Hadamards back
    qc.measure(range(n), range(n))
    return qc

def simon_oracle(n: int, s: str) -> QuantumCircuit:
    """Oracle for a Simon function with hidden shift s."""
    oc = QuantumCircuit(2 * n)
    # Copy x → output register
    for i in range(n):
        oc.cx(i, n + i)
    # Find first '1' bit of s (MSB in our convention)
    first_one = None
    for i, bit in enumerate(reversed(s)):
        if bit == "1":
            first_one = i
            break
    # Controlled on that bit, XOR s into output → creates the 2-to-1 collision pattern
    for i, bit in enumerate(reversed(s)):
        if bit == "1":
            oc.cx(first_one, n + i)
    return oc

def solve_f2_system(ys: list[str]) -> str | None:
    """Find a nonzero s such that y · s = 0 mod 2 for every y. Gaussian elimination over F_2."""
    n = len(ys[0])
    A = np.array([[int(b) for b in y] for y in ys], dtype=np.int8)
    # Row-reduce
    rows = A.tolist()
    row, col = 0, 0
    while row < len(rows) and col < n:
        pivot = next((i for i in range(row, len(rows)) if rows[i][col]), None)
        if pivot is None:
            col += 1
            continue
        rows[row], rows[pivot] = rows[pivot], rows[row]
        for i in range(len(rows)):
            if i != row and rows[i][col]:
                rows[i] = [(a ^ b) for a, b in zip(rows[i], rows[row])]
        row += 1; col += 1
    # Find a null-space vector
    pivots = set()
    for r in rows:
        for j in range(n):
            if r[j]:
                pivots.add(j); break
    free = [j for j in range(n) if j not in pivots]
    if not free:
        return None
    s = [0] * n
    s[free[0]] = 1
    for r in rows:
        pivot_col = next((j for j in range(n) if r[j]), None)
        if pivot_col is None:
            continue
        s[pivot_col] = sum(r[j] * s[j] for j in range(n)) % 2
    return "".join(str(b) for b in s)

# --- Run ---
n = 3
s = "110"
oracle = simon_oracle(n, s)
qc = simon_circuit(n, oracle)

sim = AerSimulator()
ys = []
shots = 0
while len(ys) < n - 1 and shots < 200:
    counts = sim.run(qc, shots=1).result().get_counts()
    y = next(iter(counts))
    shots += 1
    if y != "0" * n and y not in ys:
        # Sanity check: y · s should be 0 mod 2
        if sum(int(a) * int(b) for a, b in zip(y, s)) % 2 == 0:
            ys.append(y)

recovered = solve_f2_system(ys)
print(f"Hidden s = {s}, shots used = {shots}, recovered = {recovered}")
# Hidden s = 110, shots used = ~3, recovered = 110

Why Simon matters more than Bernstein-Vazirani

BV’s speedup is dramatic-looking (1 vs $n$ ) but polynomial: classical $n$ queries is already efficient. Simon’s speedup is exponential ( $n$ vs $2^{n/2}$ ) in a setting where no classical algorithm — randomized, deterministic, anything — can do better than $O(2^{n/2})$ . This is the first genuine exponential quantum-vs-classical separation in query complexity that holds against randomized algorithms.

Shor’s algorithm is, structurally, Simon’s algorithm with $\mathbb{Z}_2^n$ replaced by $\mathbb{Z}_N$ and the hidden XOR-shift replaced by a hidden multiplicative period. The analogy is precise: Simon’s hidden subgroup is $\{0, s\}$ ; Shor’s hidden subgroup is $\{0, r, 2r, 3r, ...\} \pmod N$ . Same idea, bigger group, bigger trouble for RSA.

Exercises

1. Verify BV algebraically

For $s = 101$ on $n = 3$ , hand-trace the state through the BV circuit and confirm the final measurement yields $s$ with probability 1.

Show answer

After phase kickback, input register is $\tfrac{1}{\sqrt{8}}\sum_x (-1)^{s \cdot x}|x\rangle$ with $s = 101$ . The sign of $|x_2 x_1 x_0\rangle$ is $(-1)^{x_2 \oplus x_0}$ . Hadamard transform gives $|s\rangle = |101\rangle$ . Measurement gives 101 with probability 1.

2. Simon with a specific oracle

Write out the Simon oracle for $n = 3$ , $s = 110$ , as a circuit. Run it in Qiskit and collect 10 random $y$ ‘s. Verify every $y$ satisfies $y \cdot s = 0 \pmod 2$ .

Show hint

Use simon_oracle(3, "110") from the code above. Run 10 shots, check each outcome: $y_0 y_1 y_2$ should satisfy $y_0 + y_1 = 0 \pmod 2$ . Expected $y$ ‘s: $000, 001, 110, 111$ .

3. Classical birthday attack

Implement a naive classical algorithm for Simon that just queries $f$ at random inputs until it sees a collision. Measure the number of queries needed for $n = 10$ over 100 runs. Compare to $2^{n/2} = 32$ .

Hint

import random

def classical_simon(f, n, trials=100):
    lengths = []
    for _ in range(trials):
        seen = {}
        count = 0
        while True:
            x = random.randint(0, 2**n - 1)
            count += 1
            if x in seen: continue
            y = f(x)
            if y in seen.values():
                lengths.append(count); break
            seen[x] = y
    return sum(lengths) / len(lengths)

You’ll see averages around $1.25 \cdot 2^{n/2}$ , consistent with the birthday bound.

4. Think: what Simon leaves out

Simon’s algorithm gives you $s$ given a specific promise about $f$ . What happens if the promise is violated — e.g., $f$ is actually 4-to-1? Would the algorithm fail silently or detectably?

Show answer

If $f$ is 4-to-1, measuring the output register collapses the input register to an equal superposition over 4 inputs (not 2). The subsequent Hadamard and measurement returns a $y$ orthogonal to a hidden subgroup of size 4, not a single shift $s$ . The algorithm “succeeds” in returning such $y$ ‘s, but the linear system you’d solve to extract $s$ is inconsistent — Gaussian elimination fails or returns multiple solutions. So you’d notice, but only in post-processing. This is why real uses of Simon check the promise holds before trusting results.

What you should take away

Bernstein-Vazirani retrieves an $n$ -bit secret in one oracle query (vs $n$ classically). Same circuit as Deutsch-Jozsa, different interpretation of the output bit string.
Simon’s algorithm finds a hidden XOR-shift in $O(n)$ queries (vs $\Theta(2^{n/2})$ classically). The first exponential quantum speedup holding against randomized classical algorithms.
Simon → Shor. The same algorithm with $\mathbb{Z}_2^n$ replaced by $\mathbb{Z}_N$ and XOR-period replaced by multiplicative-period is Shor’s factoring algorithm.
Classical post-processing matters. Simon returns random linear constraints; you recover the answer by Gaussian elimination over $\mathbb{F}_2$ .

Next: Grover’s search. The other great query-complexity algorithm — $O(\sqrt{N})$ vs $O(N)$ — and the amplitude amplification technique that generalizes it far beyond search.

Bernstein-Vazirani: find the hidden string in one query

Circuit

OpenQASM for BV with s=1011s = 1011s=1011

Qiskit implementation

Why the n=1n = 1n=1 classical vs quantum separation looks so stark

Simon’s algorithm: the real exponential speedup

Circuit

Classical post-processing

OpenQASM for Simon with s=11s = 11s=11

Qiskit implementation with classical post-processing

Why Simon matters more than Bernstein-Vazirani

Exercises

1. Verify BV algebraically

2. Simon with a specific oracle

3. Classical birthday attack

4. Think: what Simon leaves out

What you should take away

Quantum, for people who already code.

OpenQASM for BV with $s = 1011$

Why the $n = 1$ classical vs quantum separation looks so stark

OpenQASM for Simon with $s = 11$