Pre-registered benchmark · Reality Check #3
MNIST: Quantum CNN vs LeNet
MNIST 0/1/2 (3-class, 5,000 samples). Quantum convolutional neural net (Cong-Choi-Lukin 2019) and a variational quantum classifier against LeNet-5, a tiny MLP, and an RBF SVM. Pre-registered seeds; results published whichever direction they point.
TL;DR
LeNet-5 hits 99.67% test accuracy in 32 seconds. The best quantum model in the full sweep, an 8-qubit Cong-Choi-Lukin QCNN with 3× data reuploading, reaches 87.2%; the base 8-qubit QCNN alone takes 412 seconds to train. Even a 2,500-parameter MLP (97.94%) beats every quantum variant tested. Classical models are 12–22 percentage points ahead at an order of magnitude less runtime. No quantum advantage observed, consistent with Bowles et al. (2024) and the broader QML benchmarking literature.
Pre-registration
Committed to git on 2026-04-23, before any runs:
- Dataset: MNIST classes 0/1/2 only; images zero-padded from 28×28 to 32×32, then downsampled to 8×8 via 4×4 average pooling for the QCNN (matches Cong-Choi-Lukin's encoding budget).
- Splits: 5,000 train / 1,000 val / 1,000 test, stratified, seed=42.
- Metric: top-1 test accuracy.
- Contenders: LeNet-5 · Tiny MLP (32-unit hidden) · RBF SVM · QCNN n=8 (Cong-Choi-Lukin, PennyLane default.qubit) · VQC 4-qubit 3-layer (StronglyEntanglingLayers).
- Optimizers: Adam(lr=1e-3, 50 epochs) for classical · COBYLA(maxiter=200), which is gradient-free, for quantum.
- Hardware: simulator only. M-series MacBook Pro CPU.
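For concreteness, the pre-registered downsampling can be sketched in plain NumPy. One caveat: 4×4 average pooling of a raw 28×28 image yields 7×7, so this sketch assumes zero-padding to 32×32 first so the pooling lands exactly on 8×8; it is an illustrative reconstruction, not the notebook's exact preprocessing.

```python
import numpy as np

def downsample_for_qcnn(img28: np.ndarray) -> np.ndarray:
    """28x28 image -> 64 unit-norm values for an 8-qubit amplitude encoding.

    Assumes zero-padding to 32x32 so 4x4 average pooling yields exactly 8x8.
    """
    padded = np.zeros((32, 32), dtype=np.float32)
    padded[2:30, 2:30] = img28                             # center the digit
    pooled = padded.reshape(8, 4, 8, 4).mean(axis=(1, 3))  # 4x4 average pooling
    flat = pooled.flatten()
    norm = np.linalg.norm(flat)
    return flat / norm if norm > 0 else flat               # unit L2 norm for amplitudes

img = np.random.default_rng(0).random((28, 28)).astype(np.float32)
amps = downsample_for_qcnn(img)
print(amps.shape)  # (64,)
```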
Headline results
| Model | Params | Test acc | Runtime |
|---|---|---|---|
| LeNet-5 (PyTorch, full data) | ~60k | 99.67% | 32 s |
| RBF SVM (sklearn) | — | 98.21% | 8 s |
| Tiny MLP — 1 hidden layer × 32 units | ~2.5k | 97.94% | 11 s |
| Quantum CNN — Cong-Choi-Lukin (n=8 qubits) | 78 | 84.42% | 412 s |
| Variational Quantum Classifier (4-qubit, 3-layer) | 36 | 78.18% | 180 s |
Quantum ablations
| Variant | Test acc |
|---|---|
| QCNN, n=4 qubits | 76.13% |
| QCNN, n=8 qubits (headline) | 84.42% |
| QCNN, n=8 + data reuploading × 3 | 87.21% |
| QCNN, n=10 + data reuploading × 3 (compute budget exceeded for full sweep) | 88.09% |
| VQC, 4 qubits, 3 layers (headline) | 78.18% |
| VQC, 4 qubits, 6 layers | 80.24% |
| VQC, 4 qubits, 9 layers | 76.89% |
Adding qubits and reuploading helps (76% → 88%) but plateaus well below classical. VQC plateaus around 4–6 layers; deeper circuits hit barren plateaus and accuracy regresses.
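Data reuploading, the trick behind the 84% → 87% jump above, interleaves data-encoding rotations with trainable ones so the same input enters the circuit several times. A minimal single-qubit NumPy sketch in the spirit of Pérez-Salinas et al. (2020), illustrative only and not the benchmark's 8-qubit circuit:

```python
import numpy as np

def ry(t):
    # single-qubit rotation about Y
    return np.array([[np.cos(t / 2), -np.sin(t / 2)],
                     [np.sin(t / 2),  np.cos(t / 2)]], dtype=complex)

def rz(t):
    # single-qubit rotation about Z
    return np.array([[np.exp(-1j * t / 2), 0],
                     [0, np.exp(1j * t / 2)]], dtype=complex)

def reupload_expval(x, thetas):
    """<Z> after L blocks of [encode x with RY] -> [trainable RZ].

    Re-encoding x in every layer is what lets a shallow circuit
    represent higher-frequency functions of x.
    """
    state = np.array([1.0, 0.0], dtype=complex)  # |0>
    for theta in thetas:                         # one trainable angle per layer
        state = ry(x) @ state                    # data encoding, reuploaded each layer
        state = rz(theta) @ state                # trainable rotation
    return float(abs(state[0]) ** 2 - abs(state[1]) ** 2)  # <Z>

out = reupload_expval(0.7, np.array([0.1, 0.2, 0.3]))  # 3 reuploading layers
print(out)
```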
Code (excerpt)
import numpy as np
import pennylane as qml
import torch, torch.nn as nn, torch.nn.functional as F
from sklearn.svm import SVC
from sklearn.datasets import fetch_openml

# --- Data ---
X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)
mask = np.isin(y.astype(int), [0, 1, 2])
X, y = X[mask].astype(np.float32) / 255.0, y[mask].astype(np.int64)
rng = np.random.default_rng(42)
idx = rng.permutation(len(X))
X, y = X[idx[:7000]], y[idx[:7000]]
X_tr, X_va, X_te = X[:5000], X[5000:6000], X[6000:]
y_tr, y_va, y_te = y[:5000], y[5000:6000], y[6000:]

# --- LeNet-5 ---
class LeNet(nn.Module):
    def __init__(self, n=3):
        super().__init__()
        self.c1 = nn.Conv2d(1, 6, 5, padding=2); self.c2 = nn.Conv2d(6, 16, 5)
        self.f1 = nn.Linear(16 * 5 * 5, 120); self.f2 = nn.Linear(120, 84); self.f3 = nn.Linear(84, n)
    def forward(self, x):
        x = F.avg_pool2d(F.relu(self.c1(x)), 2)
        x = F.avg_pool2d(F.relu(self.c2(x)), 2)
        return self.f3(F.relu(self.f2(F.relu(self.f1(x.flatten(1))))))

# --- QCNN (Cong-Choi-Lukin, 8 qubits) ---
N_QUBITS = 8
dev = qml.device("default.qubit", wires=N_QUBITS)

def qconv(theta, wires):
    qml.RX(theta[0], wires=wires[0]); qml.RX(theta[1], wires=wires[1])
    qml.RY(theta[2], wires=wires[0]); qml.RY(theta[3], wires=wires[1])
    qml.CNOT(wires=wires)

def qpool(theta, wires):
    qml.CRZ(theta[0], wires=wires)
    qml.PauliX(wires=wires[0]); qml.CRX(theta[1], wires=wires); qml.PauliX(wires=wires[0])

@qml.qnode(dev, interface="torch")
def qcnn(x, w):
    # Amplitude-encode the 64-pixel (8x8) downsampled image; pad_with=0.0
    # fills the remaining amplitudes of the 2**8 = 256-dimensional state
    qml.AmplitudeEmbedding(features=x, wires=range(N_QUBITS), pad_with=0.0, normalize=True)
    # Two QCNN layers: convolution + pooling. The excerpt shares 14 weights
    # across qubit pairs for brevity; the full 78-parameter model is in the notebook.
    for i in range(0, N_QUBITS, 2): qconv(w[0:4], [i, i + 1])
    for i in range(1, N_QUBITS - 1, 2): qconv(w[4:8], [i, i + 1])
    for i in range(0, N_QUBITS, 2): qpool(w[8:10], [i, i + 1])
    for i in range(N_QUBITS // 2): qconv(w[10:14], [2 * i, 2 * i + 1])
    return [qml.expval(qml.PauliZ(0)), qml.expval(qml.PauliZ(2)), qml.expval(qml.PauliZ(4))]

# Train classical models in PyTorch; train the QCNN with COBYLA over its 78 params.
# Full notebook with all 5 models + 7 ablations linked at the GitHub URL above.
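The COBYLA loop itself is a few lines of SciPy. A hedged sketch with a stand-in quadratic cost: the real objective evaluates the QCNN on a training batch, and `target` here is purely illustrative.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(42)
target = rng.normal(size=78)  # stand-in optimum; the real cost has no closed form

def cost(w):
    # Stand-in for batch cross-entropy of the 78-parameter QCNN.
    # COBYLA only ever sees scalar cost values -- no gradients required.
    return float(np.sum((w - target) ** 2))

w0 = rng.normal(scale=0.1, size=78)  # small random init
res = minimize(cost, w0, method="COBYLA", options={"maxiter": 200})
print(cost(w0), "->", float(res.fun))
```

Note that every one of those cost evaluations is a full statevector simulation when the real QCNN is plugged in, which is where the 412-second runtime comes from.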
What this means
The Cong-Choi-Lukin QCNN was a beautiful theoretical contribution, bringing convolution-and-pooling structure to the quantum setting with provable expressivity. Empirically, on a real benchmark (downscaled but still natural image data), it underperforms LeNet-5, an architecture from 1998, by 12 percentage points.
This is consistent with Bowles, Ahmed & Schuld (2024), the broader 2022–2025 literature surveyed by Schuld, and the dequantization results from Tang and collaborators. Quantum models whose parameter counts grow only with qubit number get no free lunch: at roughly 1000× fewer parameters than LeNet-5, the test accuracy simply reflects the gap in capacity.
Where might QCNN-style architectures still matter? On quantum data — measurement records from quantum experiments, ground-state wavefunctions, error-syndrome distributions — where classical representation is itself exponentially expensive. Tutorial 17 covers this distinction in depth.
Caveats
- Simulator-only. Hardware noise would make QCNN performance worse, not better.
- 3-class subset only. Full 10-class MNIST would widen the gap further (more data + more classes favors deep classical models).
- QCNN compute budget capped at n = 10 qubits — beyond that, exact statevector simulation gets expensive.
- We used PennyLane's default.qubit. Tensor-network simulators would be faster but wouldn't change the accuracy story.
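The n = 10 cap is easy to sanity-check with back-of-envelope arithmetic: an exact statevector stores 2^n complex amplitudes (16 bytes each at complex128), and every gate touches all of them, so memory and per-gate time double with each added qubit.

```python
def statevector_bytes(n_qubits: int) -> int:
    # 2**n complex128 amplitudes at 16 bytes each
    return (2 ** n_qubits) * 16

print(statevector_bytes(10))  # 16384 bytes: memory is trivial at n = 10
print(statevector_bytes(30))  # ~17 GB: exact simulation stops scaling long before
```

At n = 10 memory is not the bottleneck; runtime is, since a derivative-free optimizer like COBYLA re-simulates the full circuit at every one of its cost evaluations.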
Third entry in the QML Reality Check series. Two more pre-registered: portfolio optimization (Markowitz vs QAOA) and VQE molecules vs CCSD(T). Subscribe to know when each ships.