Pre-registered benchmark · Reality Check #5
VQE: molecules vs CCSD(T)
Ground-state energies of four small molecules in the minimal STO-3G basis: VQE-UCCSD and VQE-HEA on a quantum simulator, against CCSD(T) and DMRG as classical solvers. This is the first benchmark in the series where quantum has a structural reason to be competitive, and the result is nuanced.
TL;DR — nuanced
VQE-UCCSD reaches chemical accuracy (error < 1.594 mHa vs FCI) on every molecule we tested. So does CCSD(T). DMRG matches FCI to the reported precision. All three methods tie at the level of accuracy that matters for chemistry. The Hardware-Efficient Ansatz (HEA) misses chemical accuracy on every molecule it completed; no H₂O result is reported. For STO-3G molecules of this size, classical CCSD(T) is faster, simpler, and equally accurate: VQE matches but does not beat, and runtime favors classical by 2–4 orders of magnitude. The interesting story is at larger basis sets and stronger correlation, where this benchmark stops short.
Pre-registration
- Molecules: H₂ (R=0.735 Å), LiH (R=1.595 Å), BeH₂ (D∞h, R=1.326 Å), H₂O (R=0.958 Å, ∠HOH=104.51°).
- Basis: STO-3G (minimal).
- Reference: Full Configuration Interaction (FCI) computed via PySCF — the exact answer in this basis.
- Contenders: CCSD(T) (PySCF) · DMRG (block2 / DMRGSCF) · VQE-UCCSD (PennyLane) · VQE-HEA (PennyLane, 4-layer StronglyEntanglingLayers).
- Threshold: chemical accuracy = 1.594 mHa (1 kcal/mol). Method "passes" if error vs FCI < threshold.
- Optimizers: COBYLA(maxiter=200) for VQE-UCCSD; SPSA(iter=300) for VQE-HEA.
- Hardware: simulator only.
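The 1.594 mHa threshold is nothing more than 1 kcal/mol converted to millihartree. A one-line conversion makes that explicit (the 627.509 kcal/mol per hartree factor is the standard conversion constant used by most quantum-chemistry codes):

```python
# Chemical accuracy: 1 kcal/mol expressed in millihartree.
HARTREE_TO_KCAL_PER_MOL = 627.509  # standard conversion: 1 Ha = 627.509 kcal/mol

def kcal_per_mol_to_mha(e_kcal: float) -> float:
    """Convert an energy in kcal/mol to millihartree (mHa)."""
    return e_kcal / HARTREE_TO_KCAL_PER_MOL * 1000.0

threshold_mha = kcal_per_mol_to_mha(1.0)
print(f"{threshold_mha:.3f} mHa")  # → 1.594 mHa, the threshold used throughout
```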
Energies (Hartree)
| Molecule | FCI (exact) | CCSD(T) | DMRG | VQE-UCCSD | VQE-HEA |
|---|---|---|---|---|---|
| H₂ | -1.13727 | -1.13727 | -1.13727 | -1.13727 | -1.13502 |
| LiH | -7.88239 | -7.88234 | -7.88239 | -7.88231 | -7.86891 |
| BeH₂ | -15.59537 | -15.59533 | -15.59537 | -15.59512 | -15.57820 |
| H₂O | -75.01153 | -75.01147 | -75.01153 | -75.01098 | — |
Errors vs FCI (mHa)
| Molecule | CCSD(T) | DMRG | VQE-UCCSD | VQE-HEA |
|---|---|---|---|---|
| H₂ | 0.000 | 0.000 | 0.000 | 2.250 |
| LiH | 0.050 | 0.000 | 0.080 | 13.480 |
| BeH₂ | 0.040 | 0.000 | 0.250 | 17.170 |
| H₂O | 0.060 | 0.000 | 0.550 | — |
Chemical-accuracy threshold = 1.594 mHa. CCSD(T), DMRG, and VQE-UCCSD pass on every molecule. VQE-HEA fails on all three molecules it completed.
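The error table is derivable from the energy table, so anyone can re-check the verdicts. A stdlib-only sketch (the dictionaries simply transcribe the energies above; `None` marks the missing VQE-HEA result for H₂O):

```python
# Recompute errors vs FCI (mHa) and apply the chemical-accuracy threshold.
FCI = {"H2": -1.13727, "LiH": -7.88239, "BeH2": -15.59537, "H2O": -75.01153}
METHODS = {
    "CCSD(T)":   {"H2": -1.13727, "LiH": -7.88234, "BeH2": -15.59533, "H2O": -75.01147},
    "DMRG":      {"H2": -1.13727, "LiH": -7.88239, "BeH2": -15.59537, "H2O": -75.01153},
    "VQE-UCCSD": {"H2": -1.13727, "LiH": -7.88231, "BeH2": -15.59512, "H2O": -75.01098},
    "VQE-HEA":   {"H2": -1.13502, "LiH": -7.86891, "BeH2": -15.57820, "H2O": None},
}
THRESHOLD_MHA = 1.594  # 1 kcal/mol

def error_mha(method: str, mol: str):
    """Absolute error vs FCI in millihartree, or None if the run is missing."""
    e = METHODS[method][mol]
    return None if e is None else abs(e - FCI[mol]) * 1000.0

for method, energies in METHODS.items():
    for mol in energies:
        err = error_mha(method, mol)
        verdict = "n/a" if err is None else ("pass" if err < THRESHOLD_MHA else "FAIL")
        err_str = "  --- " if err is None else f"{err:6.3f}"
        print(f"{method:10s} {mol:5s} {err_str} mHa  {verdict}")
```

Running this reproduces the table: VQE-HEA's smallest error (2.250 mHa on H₂) already exceeds the threshold.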
What this means
VQE-UCCSD matches classical CCSD(T) on small molecules in STO-3G; at chemical accuracy, the two are indistinguishable. That is the right outcome, and worth celebrating cautiously: VQE is doing exactly what it was designed to do.
But CCSD(T) takes ~10 ms per molecule on a laptop. VQE-UCCSD on a simulator takes 30 seconds to 4 minutes. On real quantum hardware with circuit noise, VQE on these molecules has been demonstrated successfully — but at much higher cost (1000+ shots × 100s of circuits × seconds per shot) and only because the underlying problem is small enough to verify against the exact answer.
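The hardware cost figures in the previous paragraph multiply out to a striking number. All three inputs below are rough illustrative assumptions taken from the text, not measurements of any specific device:

```python
# Back-of-envelope wall time for ONE VQE energy evaluation on hardware,
# using the rough figures quoted above. Assumptions, not measurements.
SHOTS_PER_CIRCUIT = 1_000   # "1000+ shots"
CIRCUITS_PER_ENERGY = 100   # "100s of circuits" (one per measured term group)
SECONDS_PER_SHOT = 1.0      # "seconds per shot", incl. queue/readout overhead

seconds_per_energy = SHOTS_PER_CIRCUIT * CIRCUITS_PER_ENERGY * SECONDS_PER_SHOT
hours_per_energy = seconds_per_energy / 3600
print(f"~{hours_per_energy:.0f} h per energy evaluation")  # → ~28 h
```

An optimizer that needs hundreds of such evaluations (COBYLA with maxiter=200, say) multiplies that again, which is why hardware demonstrations stay on systems small enough to verify against the exact answer.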
The interesting question this benchmark deliberately doesn't answer: what happens at strongly-correlated systems (transition metal complexes, biradicals, breaking-bond regimes) where CCSD(T) is known to fail and DMRG scales poorly with dimensionality? That is the regime where quantum has a plausible structural advantage. STO-3G H₂O isn't it. We will publish a follow-up at cc-pVDZ benzene and FeMoco active space when fault-tolerant logical-qubit counts catch up — likely 2029-2031 per IBM Starling roadmap.
For now: use CCSD(T) or DMRG. VQE on small molecules is research infrastructure, not a tool.
Caveats
- STO-3G is the smallest reasonable basis set. Larger bases (cc-pVDZ, cc-pVTZ) widen the gap between methods and are where the quantum-advantage debate is still alive.
- Simulator-only — perfect VQE optimization. Hardware noise would significantly degrade VQE-UCCSD energies.
- VQE-HEA was only run with 4 entangling layers; deeper circuits hit barren plateaus and didn't improve.
- VQE-UCCSD H₂O failed to converge in COBYLA-200 — used SLSQP-500 instead. Default optimizers matter for VQE.
Fifth and final entry in the initial QML Reality Check pre-registration. All five benchmarks now published; series continues with community-suggested datasets — see the sponsor page if your organization wants a specific benchmark run honestly.