Quantum Outpost
variational · intermediate · 23 min read

QAOA for Combinatorial Optimization

QAOA encodes a combinatorial problem as a cost Hamiltonian, prepares a variational state by alternating cost and mixer evolutions, and uses a classical optimizer to find approximate solutions. This tutorial derives the MaxCut case from scratch, runs it in Qiskit, and honestly compares QAOA to classical baselines like Goemans-Williamson.

Prerequisites: Tutorial 13: Variational Quantum Eigensolver

QAOA — the Quantum Approximate Optimization Algorithm — is VQE’s cousin, reshaped for combinatorial problems. It was proposed by Farhi, Goldstone, and Gutmann in 2014 as a near-term quantum heuristic for problems like MaxCut, graph coloring, Max-SAT, and portfolio optimization. The promise: use shallow circuits to produce approximately optimal solutions better than classical heuristics.

Ten years later, the reality is nuanced. QAOA with small depth (p = 1, 2) often matches but doesn’t beat strong classical baselines. At deeper p, the circuit gets hard to optimize. This tutorial walks through the math, ships a working implementation, and gives you the honest comparison: what QAOA can and can’t do in 2026.

The problem framing

Combinatorial optimization: find a bit string z \in \{0,1\}^n that maximizes (or minimizes) a cost function C : \{0,1\}^n \to \mathbb{R}. MaxCut is the canonical instance: given a graph G = (V, E), partition V into two sets to maximize the number of edges crossing the cut.

MaxCut cost function:

C(z) = \sum_{(i,j) \in E} \frac{1 - z_i z_j}{2} \qquad \text{with } z_i \in \{-1, +1\}.

Edge (i,j) contributes 1 if z_i \neq z_j (cut) and 0 if they match.
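At this size you can brute-force the cost function directly, which is a useful sanity check before any quantum machinery. A minimal sketch using the 4-cycle that serves as this tutorial's running example:

```python
from itertools import product

edges = [(0, 1), (1, 2), (2, 3), (3, 0)]  # the 4-cycle used later in this tutorial
n = 4

def cut_value(z, edges):
    # z_i in {-1, +1}; each edge contributes (1 - z_i z_j) / 2
    return sum((1 - z[i] * z[j]) // 2 for i, j in edges)

values = [cut_value(z, edges) for z in product([-1, 1], repeat=n)]
print(max(values))                # 4   (the optimal cut)
print(sum(values) / len(values))  # 2.0 (expected cut of a uniform random string)
```

Each edge is cut with probability 1/2 under a uniform random assignment, which is why the average lands at exactly half the edge count.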

Encode the cost as a Hamiltonian

Map z_i \to Z_i (Pauli Z eigenvalues are \pm 1). The cost Hamiltonian is:

H_C = \sum_{(i,j) \in E} \frac{1 - Z_i Z_j}{2}.

Eigenvalues of H_C on computational-basis states equal the MaxCut values of the corresponding bit strings. Since we’re maximizing, the optimal bit string is the highest-energy eigenstate of H_C (equivalently, the ground state of -H_C), encoded as a basis state.
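You can check this correspondence numerically with plain NumPy: build H_C from Kronecker products and confirm it is diagonal, with the cut values of the 16 bit strings on the diagonal. A sketch for the 4-cycle, using Qiskit's little-endian qubit ordering:

```python
import numpy as np

edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
n = 4
I2, Z = np.eye(2), np.diag([1.0, -1.0])

def zz_term(i, j, n):
    # Tensor product with Z at qubits i and j (qubit 0 = rightmost factor)
    ops = [Z if q in (i, j) else I2 for q in reversed(range(n))]
    M = ops[0]
    for op in ops[1:]:
        M = np.kron(M, op)
    return M

H_C = sum((np.eye(2 ** n) - zz_term(i, j, n)) / 2 for i, j in edges)

print(np.allclose(H_C, np.diag(np.diag(H_C))))  # True: H_C is diagonal
print(np.diag(H_C).max())                        # 4.0: top eigenvalue = optimal cut
print(int((np.diag(H_C) == 4).sum()))            # 2: the two alternating assignments
```

The two diagonal entries equal to 4 sit at indices 5 (binary 0101) and 10 (binary 1010), matching the two optimal cuts sampled later in the tutorial.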

The QAOA ansatz

Alternate two unitaries p times:

Cost unitary. U_C(\gamma) = e^{-i\gamma H_C}. For MaxCut, each edge contributes a factor e^{-i\gamma (I - Z_iZ_j)/2}, implementable up to global phase as an R_z rotation on qubit j sandwiched between two CNOTs.

Mixer unitary. U_M(\beta) = e^{-i\beta H_M}, where H_M = \sum_i X_i. Just R_x(2\beta) on every qubit.

Full QAOA-p circuit:

|\psi(\vec{\gamma}, \vec{\beta})\rangle = U_M(\beta_p)\,U_C(\gamma_p)\,\cdots\,U_M(\beta_1)\,U_C(\gamma_1)\,|+\rangle^{\otimes n}.

Initial state is the uniform superposition (easy to prepare with Hadamards). A classical optimizer finds \vec{\gamma}, \vec{\beta} that maximize \langle\psi|H_C|\psi\rangle. Then sample the final state to read off good bit strings.

Why this works (intuition)

At p = \infty with appropriately slow-scheduled \gamma and \beta, QAOA becomes the quantum adiabatic algorithm — evolving slowly from the extremal eigenstate of H_M (the uniform superposition) to the extremal eigenstate of H_C (the optimal bit string); since we’re maximizing, “extremal” here means highest-energy. QAOA-p is a truncation: p steps of a discretized adiabatic evolution.

At small p, the circuit is shallow enough to run on NISQ hardware, and the cost landscape in (\vec\gamma, \vec\beta) is often surprisingly well-structured — you can find good solutions by parameter sweep or gradient descent.
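To see that structure concretely, here is a sketch of a brute-force (γ, β) sweep for p = 1 on the 4-cycle, simulated directly in NumPy (no Qiskit needed at this size). The landscape is smooth, and the best grid point reaches an expected cut close to 3:

```python
import numpy as np

edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
n = 4
# Cost values C(b) for every basis state: the diagonal of H_C
c = np.array([sum(1 for i, j in edges if ((b >> i) & 1) != ((b >> j) & 1))
              for b in range(2 ** n)], dtype=float)

def expected_cut(gamma, beta):
    psi = np.full(2 ** n, 2 ** (-n / 2), dtype=complex)  # |+>^n
    psi = np.exp(-1j * gamma * c) * psi                  # cost layer: diagonal phases
    rx = np.array([[np.cos(beta), -1j * np.sin(beta)],   # e^{-i beta X} on one qubit
                   [-1j * np.sin(beta), np.cos(beta)]])
    U = rx
    for _ in range(n - 1):
        U = np.kron(U, rx)                               # mixer on all qubits
    phi = U @ psi
    return float(np.real(phi.conj() @ (c * phi)))

gammas = np.linspace(0, np.pi, 40)
betas = np.linspace(0, np.pi / 2, 40)
grid = np.array([[expected_cut(g, b) for b in betas] for g in gammas])
print(round(grid.max(), 2))  # ≈ 3.0: QAOA-1's best expected cut on the 4-cycle
```

The single broad maximum (rather than a rugged landscape) is what makes coarse sweeps and local optimizers viable at small p.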

MaxCut on a 4-node graph

Graph: cycle C_4, edges (0,1), (1,2), (2,3), (3,0). Optimal cut: 4 (the alternating color assignment). A uniformly random bit string cuts each edge with probability 1/2, so its expected cut value is 2.

import numpy as np
from qiskit import QuantumCircuit
from qiskit.circuit import Parameter
from qiskit.quantum_info import SparsePauliOp
from qiskit.primitives import StatevectorEstimator
from scipy.optimize import minimize


edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
n = 4


def maxcut_hamiltonian(edges, n) -> SparsePauliOp:
    """H_C = Σ_{(i,j)} (I - Z_i Z_j)/2"""
    terms = []
    for i, j in edges:
        paulis = ["I"] * n
        paulis[i] = "Z"; paulis[j] = "Z"
        terms.append(("".join(reversed(paulis)), -0.5))
        terms.append(("I" * n, 0.5))
    return SparsePauliOp.from_list(terms).simplify()


H_C = maxcut_hamiltonian(edges, n)


def qaoa_circuit(p: int) -> tuple[QuantumCircuit, list[Parameter]]:
    gammas = [Parameter(f"γ_{i}") for i in range(p)]
    betas = [Parameter(f"β_{i}") for i in range(p)]
    qc = QuantumCircuit(n)
    qc.h(range(n))                               # |+⟩^n
    for layer in range(p):
        # Cost unitary: one ZZ rotation per edge (e^{-iγH_C} up to global phase and a fixed rescaling of γ)
        for i, j in edges:
            qc.cx(i, j); qc.rz(2 * gammas[layer], j); qc.cx(i, j)
        # Mixer: e^{-i β H_M}
        for q in range(n):
            qc.rx(2 * betas[layer], q)
    return qc, gammas + betas


p = 2
ansatz, params = qaoa_circuit(p)
estimator = StatevectorEstimator()


def cost(theta: np.ndarray) -> float:
    bound = ansatz.assign_parameters(dict(zip(params, theta)))
    ev = estimator.run([(bound, H_C, [])]).result()[0].data.evs
    return -float(ev)                            # minimize -⟨H_C⟩ to maximize


theta0 = np.array([0.5, 0.5, 0.3, 0.3])           # [γ1, γ2, β1, β2]
result = minimize(cost, theta0, method="COBYLA", options={"maxiter": 200, "rhobeg": 0.2})

print(f"QAOA-{p} expected cut: {-result.fun:.4f}")
print(f"Optimal cut for C₄:   4")
print(f"Random cut (baseline): 2")
# QAOA-2 expected cut: 3.9998 ← basically optimal

At p = 2 on a 4-cycle, QAOA finds the optimal cut essentially exactly. The optimizer converges in ~60 iterations.

Sample the final state for a concrete bit string:

from qiskit.primitives import StatevectorSampler

bound = ansatz.assign_parameters(dict(zip(params, result.x)))
bound_meas = bound.copy(); bound_meas.measure_all()
sampler = StatevectorSampler()
counts = sampler.run([(bound_meas, [])], shots=2048).result()[0].data.meas.get_counts()

for bits, cnt in sorted(counts.items(), key=lambda kv: -kv[1])[:4]:
    z = [1 if b == "0" else -1 for b in reversed(bits)]
    cut_value = sum((1 - z[i] * z[j]) / 2 for i, j in edges)
    print(f"|{bits}⟩: {cnt:>4d} shots, cut = {cut_value}")
# |0101⟩: 1023 shots, cut = 4
# |1010⟩: 1021 shots, cut = 4
# |0110⟩:    2 shots, cut = 2
# ...

The two optimal cuts (0101 and 1010, the two alternating-color assignments) dominate the distribution.

Try a small MaxCut in the playground — paste the cost-unitary QASM and see the diagram.

Benchmarking against Goemans-Williamson

The canonical classical baseline for MaxCut is the Goemans-Williamson (GW) SDP-based approximation algorithm, which achieves a worst-case ratio of \alpha_{GW} \approx 0.878 — the cut it returns has value at least 0.878 \times \text{OPT} in expectation. This is provably optimal under the Unique Games Conjecture; no polynomial-time classical algorithm can do better in the worst case.

How does QAOA compare?

  • QAOA-1 (p = 1): worst-case approximation ratio \approx 0.692 on 3-regular graphs (Farhi et al. 2014). Worse than GW.
  • QAOA-2 (p = 2): average-case ratios around 0.79 on random 3-regular graphs. Still below GW.
  • QAOA-p at large p: ratios approach 1 on many benchmark families, but you need depth that exceeds NISQ capabilities.
  • Warm-started QAOA (Egger et al. 2020): seed the initial state and mixer from a relaxed classical solution such as GW’s; empirically beats GW on specific instances. Still requires running the classical solver first.

Scaling up — Max-k-SAT, portfolio optimization

The QAOA framework is fully general. Any problem expressible as

H_C = \sum_k c_k\,\sigma_k

(where each \sigma_k is a Pauli string) fits. Examples:

  • Max-k-SAT. Each clause C contributes a tensor-product-of-Z-operators term; you build the cost Hamiltonian from the clauses.
  • Portfolio optimization. Markowitz mean-variance H = -\mu^T z + \lambda z^T \Sigma z. The covariance term z^T \Sigma z is quadratic in binary variables, hence expressible as a weighted sum of Z_i Z_j and Z_i terms.
  • Graph coloring. Encode each vertex’s color in a one-hot binary vector; penalize constraint violations.
  • Traveling Salesman. Requires O(n^2) qubits for n cities; large overhead. Not competitive below fault tolerance.
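As a worked instance of the portfolio case, the sketch below (toy numbers, plain NumPy) carries out the binary-to-spin substitution x_i = (1 - z_i)/2 by hand and verifies the resulting constant, linear, and pairwise coefficients against the original cost on all assignments. Replacing each z_i with the Pauli Z_i then gives exactly the promised sum of Z_i and Z_i Z_j terms:

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(7)
n = 3
mu = rng.normal(size=n)                        # toy expected returns (made-up numbers)
A = rng.normal(size=(n, n)); Sigma = A @ A.T   # toy covariance matrix (PSD by construction)
lam = 0.5

# Substitute x_i = (1 - z_i)/2 (binary x_i, spin z_i in {-1,+1}) into
# H = -mu^T x + lam * x^T Sigma x and collect an Ising form:
#   const + sum_i h_i z_i + sum_{i<j} J_ij z_i z_j
# (the z_i^2 = 1 diagonal terms fold into the constant).
const = -mu.sum() / 2 + lam * (Sigma.sum() + np.trace(Sigma)) / 4
h = mu / 2 - lam * Sigma.sum(axis=1) / 2
J = lam * Sigma / 2                            # coefficient of z_i z_j for each pair i < j

def ising_energy(z):
    z = np.asarray(z, dtype=float)
    pairs = sum(J[i, j] * z[i] * z[j] for i in range(n) for j in range(i + 1, n))
    return const + h @ z + pairs

# Sanity check: the Ising form reproduces the original cost on every assignment
ok = all(
    np.isclose(ising_energy(z),
               -mu @ ((1 - np.asarray(z)) / 2)
               + lam * ((1 - np.asarray(z)) / 2) @ Sigma @ ((1 - np.asarray(z)) / 2))
    for z in product([-1, 1], repeat=n)
)
print(ok)  # True
```

The same substitution works for any QUBO, which is why QAOA tooling usually accepts problems in QUBO form and performs this mapping automatically.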

When to use QAOA

Honest guidance for indie developers and small teams:

  • Prototyping hybrid quantum-classical workflows. QAOA teaches the full hybrid stack; the lessons transfer.
  • Education. Any optimization tutorial is cleaner when the problem has both a classical solver and a quantum one you can compare.
  • Research. Algorithmic improvements to QAOA (warm starts, adaptive schedules, symmetry exploitation) are active areas.
  • Production: not yet. Classical heuristics (simulated annealing, Gurobi, specialized graph algorithms) beat QAOA at problem sizes big enough to matter. Revisit in 2028-2030 with fault-tolerant hardware.

Parameter landscapes

A practical curiosity: the QAOA landscape is more structured than you’d expect. For MaxCut on random graphs, Farhi et al. showed the optimal (\vec\gamma, \vec\beta) concentrate around typical values that generalize across instances. Transferring parameters from one small instance to a larger one often works — called parameter transferability or instance concentration.

This means you can:

  1. Find good parameters on a small classical simulator (up to \sim 20 qubits).
  2. Transfer to hardware for larger instances (up to \sim 100 qubits).
  3. Skip the expensive optimization on the noisy hardware.

This is the single biggest practical optimization when running QAOA on real hardware.
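A minimal numerical illustration of the transfer trick, in a toy setting with a plain-NumPy p = 1 simulator on rings: sweep angles on the 4-ring, then reuse them unchanged on the 8-ring.

```python
import numpy as np

def ring_cost_diag(n):
    # Diagonal of H_C for the n-cycle: cut value of each basis state
    edges = [(i, (i + 1) % n) for i in range(n)]
    return np.array([sum(1 for i, j in edges if ((b >> i) & 1) != ((b >> j) & 1))
                     for b in range(2 ** n)], dtype=float)

def expected_cut(n, gamma, beta):
    c = ring_cost_diag(n)
    psi = np.full(2 ** n, 2 ** (-n / 2), dtype=complex)  # |+>^n
    psi = np.exp(-1j * gamma * c) * psi                  # cost layer
    rx = np.array([[np.cos(beta), -1j * np.sin(beta)],
                   [-1j * np.sin(beta), np.cos(beta)]])  # e^{-i beta X}
    U = rx
    for _ in range(n - 1):
        U = np.kron(U, rx)
    phi = U @ psi                                        # mixer layer
    return float(np.real(phi.conj() @ (c * phi)))

# Tune (gamma, beta) on the cheap 4-qubit instance...
gs, bs = np.linspace(0, np.pi, 60), np.linspace(0, np.pi / 2, 60)
score, g, b = max((expected_cut(4, g, b), g, b) for g in gs for b in bs)

# ...and reuse them unchanged on the 8-qubit ring
print(round(score / 4, 3), round(expected_cut(8, g, b) / 8, 3))  # ≈ 0.75 for both
```

Rings are the friendliest case (every edge sees the same local neighborhood, so the per-edge expectation is identical across sizes); on irregular graphs the transferred angles are a strong starting point rather than an exact optimum.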

Exercises

1. Cost Hamiltonian by hand

Write the MaxCut cost Hamiltonian for the triangle graph K_3 (three nodes, three edges). Expand and identify the ground state by inspection.

Show answer

H_C = \tfrac{3}{2}I - \tfrac{1}{2}(Z_0 Z_1 + Z_1 Z_2 + Z_0 Z_2). Max cut value = 2 (any bipartite split of 3 nodes cuts 2 edges). Every bit string except |000\rangle and |111\rangle achieves 2. The degenerate ground space of -H_C is the 6-dimensional subspace spanned by the non-constant bit strings.
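A quick NumPy check of this answer, listing the diagonal of H_C for K_3 over the basis states |b_2 b_1 b_0⟩:

```python
import numpy as np

edges = [(0, 1), (1, 2), (0, 2)]   # triangle K_3
n = 3
# Cut value of each basis state = eigenvalue of H_C on that state
diag = np.array([sum(1 for i, j in edges if ((b >> i) & 1) != ((b >> j) & 1))
                 for b in range(2 ** n)])
print(diag)  # [0 2 2 2 2 2 2 0]: only |000> and |111> miss the max cut of 2
```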

2. QAOA-1 approximation

Implement QAOA at p = 1 for a 6-node random 3-regular graph (3-regular graphs need an even node count). Measure the average approximation ratio vs GW over 10 random instances.

Show approach
import numpy as np
import networkx as nx
import cvxpy as cp  # for the GW SDP

def goemans_williamson(G, n_rounds=50):
    n = G.number_of_nodes()
    X = cp.Variable((n, n), symmetric=True)
    constraints = [X >> 0] + [X[i, i] == 1 for i in range(n)]
    obj = 0.5 * sum(1 - X[i, j] for i, j in G.edges())
    cp.Problem(cp.Maximize(obj), constraints).solve()
    # Randomized hyperplane rounding, averaged over n_rounds
    V = np.linalg.cholesky(X.value + 1e-6 * np.eye(n))
    cuts = []
    for _ in range(n_rounds):
        signs = np.sign(V @ np.random.randn(n))
        cuts.append(sum(1 for i, j in G.edges() if signs[i] != signs[j]))
    return float(np.mean(cuts))

Compare QAOA-1 average cut to GW average cut over 10 instances.

3. Mixer alternatives

The default mixer H_M = \sum_i X_i explores the full 2^n bit-string space. For problems with symmetry (e.g., fixed-weight bit strings), a more structured mixer is better. Think: what’s the mixer for problems where you need exactly 3 ones among 8 bits? (Hint: look up “XY mixer” or “ring mixer.”)

Show sketch

Use H_M = \sum_i (X_i X_{i+1} + Y_i Y_{i+1}) — the XY (swap-preserving) mixer. This preserves total Hamming weight, so the state stays in the “exactly 3 ones” subspace throughout the evolution (provided the initial state, e.g. a Dicke state, lies in that subspace). Useful for graph coloring and portfolio selection problems.
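One way to convince yourself of the weight-preservation claim (a NumPy sketch on a 4-qubit ring): the XY mixer commutes with the Hamming-weight operator, so evolution under it never leaves a fixed-weight subspace.

```python
import numpy as np

n = 4
I2 = np.eye(2)
X = np.array([[0., 1.], [1., 0.]])
Y = np.array([[0., -1j], [1j, 0.]])

def op(paulis, n):
    # Tensor product with the given Paulis at the given qubits (little-endian)
    ops = [paulis.get(q, I2) for q in reversed(range(n))]
    M = ops[0]
    for o in ops[1:]:
        M = np.kron(M, o)
    return M

# Ring XY mixer: sum_i (X_i X_{i+1} + Y_i Y_{i+1}), indices mod n
H_M = sum(op({i: X, (i + 1) % n: X}, n) + op({i: Y, (i + 1) % n: Y}, n)
          for i in range(n))

# Hamming-weight operator: counts the 1s in each basis state
N_op = np.diag([bin(b).count("1") for b in range(2 ** n)]).astype(complex)

print(np.allclose(H_M @ N_op, N_op @ H_M))  # True: [H_M, N] = 0, weight is conserved
```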

4. Compare depth

Run QAOA-1, QAOA-2, QAOA-4 on the 4-cycle. Tabulate: (a) number of parameters, (b) circuit depth, (c) best expected cut, (d) wall-clock optimization time. Is more depth always better?

Expected

QAOA-1: 2 params, depth ~8, expected cut ~3.0. QAOA-2: 4 params, depth ~16, expected cut 4.0. QAOA-4: 8 params, depth ~32, expected cut 4.0 but optimizer stalls more often due to local minima. More depth doesn’t always help; the cost landscape gets more complex.

What you should take away

  • QAOA framework. Encode the cost as Hamiltonian H_C; alternate e^{-i\gamma H_C} and e^{-i\beta \sum_i X_i}; optimize (\vec\gamma, \vec\beta) classically.
  • Works for many problems. MaxCut, SAT, portfolio optimization, coloring — anything expressible as a sum of Pauli strings.
  • Classical baseline is Goemans-Williamson (0.878-approximation for MaxCut). QAOA-1 is worse; QAOA-p beats it only at impractical depth.
  • No quantum advantage demonstrated for real-world optimization as of 2026. Parameter transferability and warm starts are the closest wins.
  • Still worth learning for the hybrid-algorithm framework, which generalizes to VQE, QML, and near-term quantum software patterns.

Next: Quantum ML foundations — using parameterized circuits for classification, the encoding problem, and the parameter-shift rule for gradient-based training.


Weekly dispatch

Quantum, for people who already code.

One serious tutorial per week, plus the industry moves that actually matter. No hype, no hand-waving.

Free. Unsubscribe anytime. We will never sell your email.