Pre-registered benchmark · Reality Check #5
VQE: molecules vs CCSD(T)
Ground-state energies of four small molecules in the minimal STO-3G basis: VQE-UCCSD and VQE-HEA on a quantum simulator, against CCSD(T) and DMRG as classical solvers. This is the first benchmark in the series where quantum has a structural reason to be competitive, and the result is nuanced.
TL;DR — nuanced
VQE-UCCSD reaches chemical accuracy (error < 1.594 mHa vs FCI) on every molecule we tested. So does CCSD(T). DMRG matches FCI to the reported precision. All three methods tie at the level of accuracy that matters for chemistry. The Hardware-Efficient Ansatz (HEA) misses chemical accuracy on every molecule it completed; no H₂O result is reported. For STO-3G molecules of this size, classical CCSD(T) is faster, simpler, and equally accurate: VQE matches but does not beat, and runtime favors classical by 2–4 orders of magnitude. The interesting story is at larger basis sets and stronger correlation, where this benchmark stops short.
Pre-registration
- Molecules: H₂ (R=0.735 Å), LiH (R=1.595 Å), BeH₂ (D∞h, R=1.326 Å), H₂O (R=0.958 Å, ∠HOH=104.51°).
- Basis: STO-3G (minimal).
- Reference: Full Configuration Interaction (FCI) computed via PySCF — the exact answer in this basis.
- Contenders: CCSD(T) (PySCF) · DMRG (block2 / DMRGSCF) · VQE-UCCSD (PennyLane) · VQE-HEA (PennyLane, 4-layer StronglyEntanglingLayers).
- Threshold: chemical accuracy = 1.594 mHa (1 kcal/mol). Method "passes" if error vs FCI < threshold.
- Optimizers: COBYLA(maxiter=200) for VQE-UCCSD; SPSA(iter=300) for VQE-HEA.
- Hardware: simulator only.
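The 1.594 mHa threshold is nothing more than 1 kcal/mol converted to millihartree. A one-line conversion makes that explicit (the 627.509 kcal/mol per hartree factor is the standard conversion constant used by most quantum-chemistry codes):

```python
# Chemical accuracy: 1 kcal/mol expressed in millihartree.
HARTREE_TO_KCAL_PER_MOL = 627.509  # standard conversion: 1 Ha = 627.509 kcal/mol

def kcal_per_mol_to_mha(e_kcal: float) -> float:
    """Convert an energy in kcal/mol to millihartree (mHa)."""
    return e_kcal / HARTREE_TO_KCAL_PER_MOL * 1000.0

threshold_mha = kcal_per_mol_to_mha(1.0)
print(f"{threshold_mha:.3f} mHa")  # → 1.594 mHa, the threshold used throughout
```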
Energies (Hartree)
| Molecule | FCI (exact) | CCSD(T) | DMRG | VQE-UCCSD | VQE-HEA |
|---|---|---|---|---|---|
| H₂ | -1.13727 | -1.13727 | -1.13727 | -1.13727 | -1.13502 |
| LiH | -7.88239 | -7.88234 | -7.88239 | -7.88231 | -7.86891 |
| BeH₂ | -15.59537 | -15.59533 | -15.59537 | -15.59512 | -15.57820 |
| H₂O | -75.01153 | -75.01147 | -75.01153 | -75.01098 | — |
Errors vs FCI (mHa)
| Molecule | CCSD(T) | DMRG | VQE-UCCSD | VQE-HEA |
|---|---|---|---|---|
| H₂ | 0.000 | 0.000 | 0.000 | 2.250 |
| LiH | 0.050 | 0.000 | 0.080 | 13.480 |
| BeH₂ | 0.040 | 0.000 | 0.250 | 17.170 |
| H₂O | 0.060 | 0.000 | 0.550 | — |
Chemical-accuracy threshold = 1.594 mHa. CCSD(T), DMRG, and VQE-UCCSD pass on every molecule. VQE-HEA fails on all three molecules it completed.
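The error table is derivable from the energy table, so anyone can re-check the verdicts. A stdlib-only sketch (the dictionaries simply transcribe the energies above; `None` marks the missing VQE-HEA result for H₂O):

```python
# Recompute errors vs FCI (mHa) and apply the chemical-accuracy threshold.
FCI = {"H2": -1.13727, "LiH": -7.88239, "BeH2": -15.59537, "H2O": -75.01153}
METHODS = {
    "CCSD(T)":   {"H2": -1.13727, "LiH": -7.88234, "BeH2": -15.59533, "H2O": -75.01147},
    "DMRG":      {"H2": -1.13727, "LiH": -7.88239, "BeH2": -15.59537, "H2O": -75.01153},
    "VQE-UCCSD": {"H2": -1.13727, "LiH": -7.88231, "BeH2": -15.59512, "H2O": -75.01098},
    "VQE-HEA":   {"H2": -1.13502, "LiH": -7.86891, "BeH2": -15.57820, "H2O": None},
}
THRESHOLD_MHA = 1.594  # 1 kcal/mol

def error_mha(method: str, mol: str):
    """Absolute error vs FCI in millihartree, or None if the run is missing."""
    e = METHODS[method][mol]
    return None if e is None else abs(e - FCI[mol]) * 1000.0

for method, energies in METHODS.items():
    for mol in energies:
        err = error_mha(method, mol)
        verdict = "n/a" if err is None else ("pass" if err < THRESHOLD_MHA else "FAIL")
        err_str = "  --- " if err is None else f"{err:6.3f}"
        print(f"{method:10s} {mol:5s} {err_str} mHa  {verdict}")
```

Running this reproduces the table: VQE-HEA's smallest error (2.250 mHa on H₂) already exceeds the threshold.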
What this means
VQE-UCCSD matches classical CCSD(T) on small molecules in STO-3G; at chemical accuracy, the two are indistinguishable. That is the right outcome, and worth celebrating cautiously: VQE is doing exactly what it was designed to do.
But CCSD(T) takes ~10 ms per molecule on a laptop. VQE-UCCSD on a simulator takes 30 seconds to 4 minutes. On real quantum hardware with circuit noise, VQE on these molecules has been demonstrated successfully — but at much higher cost (1000+ shots × 100s of circuits × seconds per shot) and only because the underlying problem is small enough to verify against the exact answer.
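The hardware cost figures in the previous paragraph multiply out to a striking number. All three inputs below are rough illustrative assumptions taken from the text, not measurements of any specific device:

```python
# Back-of-envelope wall time for ONE VQE energy evaluation on hardware,
# using the rough figures quoted above. Assumptions, not measurements.
SHOTS_PER_CIRCUIT = 1_000   # "1000+ shots"
CIRCUITS_PER_ENERGY = 100   # "100s of circuits" (one per measured term group)
SECONDS_PER_SHOT = 1.0      # "seconds per shot", incl. queue/readout overhead

seconds_per_energy = SHOTS_PER_CIRCUIT * CIRCUITS_PER_ENERGY * SECONDS_PER_SHOT
hours_per_energy = seconds_per_energy / 3600
print(f"~{hours_per_energy:.0f} h per energy evaluation")  # → ~28 h
```

An optimizer that needs hundreds of such evaluations (COBYLA with maxiter=200, say) multiplies that again, which is why hardware demonstrations stay on systems small enough to verify against the exact answer.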
The interesting question this benchmark deliberately doesn't answer: what happens at strongly-correlated systems (transition metal complexes, biradicals, breaking-bond regimes) where CCSD(T) is known to fail and DMRG scales poorly with dimensionality? That is the regime where quantum has a plausible structural advantage. STO-3G H₂O isn't it. We will publish a follow-up at cc-pVDZ benzene and FeMoco active space when fault-tolerant logical-qubit counts catch up — likely 2029-2031 per IBM Starling roadmap.
For now: use CCSD(T) or DMRG. VQE on small molecules is research infrastructure, not a tool.
Caveats
- STO-3G is the smallest reasonable basis set. Larger bases (cc-pVDZ, cc-pVTZ) widen the gap between methods and are where the quantum-advantage debate is still alive.
- Simulator-only — perfect VQE optimization. Hardware noise would significantly degrade VQE-UCCSD energies.
- VQE-HEA was only run with 4 entangling layers; deeper circuits hit barren plateaus and didn't improve.
- VQE-UCCSD H₂O failed to converge in COBYLA-200 — used SLSQP-500 instead. Default optimizers matter for VQE.
Fifth and final entry in the initial QML Reality Check pre-registration. All five benchmarks now published; series continues with community-suggested datasets — see the sponsor page if your organization wants a specific benchmark run honestly.