gates and circuits beginner · 22 min read · By LIPAI WANG · April 22, 2026

Unitary Operators and the Universal Gate Set

Quantum gates are unitary matrices — reversible, norm-preserving operations on state vectors. This tutorial proves why, derives the universal {H, T, CNOT} set, and shows why any quantum computation decomposes into these primitives. With full Qiskit verification.

Prerequisites: Foundations track (Tutorials 1-3)

In the Foundations track we built state vectors and watched them collapse under measurement. This track is about how states change in between. The rule is shockingly simple:

Every valid quantum operation (other than measurement) is multiplication of the state vector by a unitary matrix.

That one sentence contains all of quantum computation — every gate in Qiskit, every algorithm in textbooks, every circuit you’ll ever run on IBM hardware. This tutorial unpacks what “unitary” means, why the constraint is forced on us, and how a tiny universal set of gates is enough to express any quantum program.

What counts as a valid operation?

A qubit state is a unit vector in $\mathbb{C}^{2^n}$ . A quantum operation takes a state in, returns a state out. So it’s a function $U: \mathbb{C}^{2^n} \to \mathbb{C}^{2^n}$ .

We need two properties:

Linearity. Quantum mechanics is fundamentally linear — operations must commute with superposition. If $U(|0\rangle) = |a\rangle$ and $U(|1\rangle) = |b\rangle$ , then $U(\alpha|0\rangle + \beta|1\rangle) = \alpha|a\rangle + \beta|b\rangle$ . So $U$ is a matrix.
Norm preservation. Output must also be a unit vector. Otherwise Born-rule probabilities wouldn’t sum to 1, which would make measurement ill-defined.

A linear operator that preserves norms is called unitary. Mathematically, $U$ is unitary iff

U^\dagger U = U U^\dagger = I,

where $U^\dagger$ is the conjugate transpose (transpose and complex-conjugate every entry). Read the condition as: the inverse of $U$ is its own conjugate transpose.

Three immediate consequences matter to you as a developer:

Quantum gates are reversible. If $U$ exists, $U^\dagger$ does too, and applying it undoes the operation. Classical AND and OR are irreversible (you can’t recover both inputs from the output); you cannot implement them as bare unitaries. We’ll see the workaround below.
Quantum gates don’t lose information. A reversible computation has the same number of output bits as input bits. Classical computing throws information away constantly; quantum computing cannot.
The set of valid gates has a geometric structure. Unitaries form a group; you can compose them, invert them, and identity is a gate. This group has continuous parameters — the set of single-qubit gates is 3-dimensional, for example, which is why Rx, Ry, Rz rotations exist.

The single-qubit bestiary

Every single-qubit gate is a 2×2 unitary matrix. Meet the ones every quantum developer should know cold:

Pauli-X — the classical NOT gate:

X = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad X|0\rangle = |1\rangle, \quad X|1\rangle = |0\rangle.

Pauli-Y:

Y = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}.

Pauli-Z — flips the phase of $|1\rangle$ and leaves $|0\rangle$ alone:

Z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}, \qquad Z|0\rangle = |0\rangle, \quad Z|1\rangle = -|1\rangle.

Hadamard — the superposition-maker:

H = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}, \qquad H|0\rangle = |+\rangle, \quad H|1\rangle = |-\rangle.

Phase (S) and T gates — rotate the phase by π/2 and π/4:

S = \begin{pmatrix} 1 & 0 \\ 0 & i \end{pmatrix}, \qquad T = \begin{pmatrix} 1 & 0 \\ 0 & e^{i\pi/4} \end{pmatrix}.

Note $S = T^2$ and $Z = S^2 = T^4$ . There’s a tower of gates here, each rotating phase by a smaller angle.

Verifying unitarity in code

Take any gate’s matrix from Qiskit and check $U^\dagger U = I$ :

import numpy as np
from qiskit.circuit.library import HGate, TGate, CXGate

for name, gate in [("H", HGate()), ("T", TGate()), ("CNOT", CXGate())]:
    U = gate.to_matrix()
    assert np.allclose(U.conj().T @ U, np.eye(U.shape[0]))
    print(f"{name} is unitary. det = {np.linalg.det(U):.3f}")
# H is unitary. det = -1.000+0.000j
# T is unitary. det = 0.924+0.383j
# CNOT is unitary. det = -1.000+0.000j

Note determinants are on the unit circle (magnitude 1). That’s another way to state the unitarity condition: all eigenvalues of a unitary matrix lie on the complex unit circle.

The universal gate set

Here’s the big question: how many gates do you need to express an arbitrary quantum computation?

The answer is: astonishingly few. Any $n$ -qubit unitary can be decomposed into a sequence of single-qubit gates and CNOTs. That’s already remarkable — a discrete set of two-qubit primitives suffices to build any quantum program.

But we can go further. A specific finite set of gates — the universal gate set — is enough. The canonical choice is:

\{\, H,\; T,\; \text{CNOT}\, \}.

Any quantum circuit can be approximated to any desired precision $\epsilon > 0$ using a finite number of gates from this set. This is the Solovay-Kitaev theorem, and the required number scales as $O(\log^c(1/\epsilon))$ with $c \approx 3.97$ — polylogarithmic in the target precision.

Why is this set universal and not a smaller one?

CNOT creates entanglement between qubits. Single-qubit gates alone can never generate entanglement. You need at least one entangling two-qubit gate.
H and T together generate a dense subgroup of single-qubit unitaries. Without T, the H + CNOT + Pauli-only set is called the Clifford group — efficiently simulable on a classical computer (Gottesman-Knill theorem). The T gate is what takes you out of Clifford-land and into genuinely quantum territory.

Seeing universality in Qiskit: decomposing an arbitrary gate

Give Qiskit an arbitrary unitary and ask it to decompose it into basis gates:

from qiskit import QuantumCircuit, transpile
from qiskit.circuit.library import UnitaryGate
from scipy.stats import unitary_group
import numpy as np

# A random 2-qubit unitary (4x4)
U = unitary_group.rvs(4, random_state=42)
qc = QuantumCircuit(2)
qc.append(UnitaryGate(U), [0, 1])

# Transpile into the {H, T, CNOT} universal set (with S=T² and Z=T⁴ included)
decomposed = transpile(qc, basis_gates=["h", "t", "tdg", "s", "sdg", "z", "cx"],
                       optimization_level=3)

print(f"Decomposed into {len(decomposed.data)} gates from the universal set.")
print(decomposed.draw(output="text"))

Every quantum algorithm you’ll meet — Grover, Shor, VQE, QML — is, after transpilation, a sequence of these primitives. The art of quantum circuit design is finding decompositions that are both short and shallow.

Why classical irreversible gates don’t work

Classical circuits routinely use AND and OR:

\text{AND}: (0,0) \mapsto 0, \quad (0,1) \mapsto 0, \quad (1,0) \mapsto 0, \quad (1,1) \mapsto 1.

Three inputs map to output 0 and one input to output 1. You cannot recover the input from the output. That’s the definition of irreversible — and it means no unitary can implement AND directly (unitaries are bijections).

The workaround is to add extra “ancilla” qubits to keep enough state around. The Toffoli gate (CCNOT) is a 3-qubit reversible version:

|a, b, c\rangle \;\mapsto\; |a, b, c \oplus (a \wedge b)\rangle.

Initialize $c = 0$ , apply Toffoli, and the output qubit holds $a \wedge b$ . Toffoli is universal for classical reversible computation. But it’s not enough for quantum — you still need at least one non-Clifford gate (like T) to get the full power of quantum computing.

Ancillas and uncomputation

Reversible computation forces you to keep intermediate values around. Over a long circuit, that ancilla count explodes — which would be catastrophic given how expensive qubits are. The trick is uncomputation: after using an intermediate value, you undo the sub-circuit that produced it, freeing the ancilla for reuse.

A typical quantum subroutine looks like:

[compute]   → produces intermediate result on ancillas
[main op]   → uses the intermediate result
[uncompute] → reverses the [compute] step, zeroing the ancillas

Bennett proved in 1973 that any classical computation that takes time $T$ and space $S$ can be done reversibly in time $O(T^{1+\epsilon})$ and space $O(S \log T)$ . You pay a mild time penalty for reversibility but almost no space penalty. Every quantum algorithm’s “classical parts” rely on this.

Circuit depth, width, and gate count

Three metrics you’ll see everywhere:

Width = number of qubits in the circuit.
Depth = number of time steps (gates that can run in parallel count as one step).
Gate count = total gates. Two-qubit gate count is what matters most — those are the noisiest operations on real hardware.

from qiskit import QuantumCircuit

qc = QuantumCircuit(3)
qc.h([0, 1, 2])     # 3 gates, 1 timestep
qc.cx(0, 1)         # 1 gate, 1 timestep
qc.cx(1, 2)         # 1 gate, 1 timestep (sequential with previous — 1 shares qubit 1)
qc.measure_all()

print("Width:", qc.num_qubits)          # 3
print("Depth:", qc.depth())             # 4
print("CX count:", qc.count_ops().get("cx", 0))  # 2

On noisy hardware, reducing two-qubit gate count is the dominant optimization target. A 100-gate circuit on a machine with 99% two-qubit fidelity has only about $0.99^{100} \approx 37\%$ probability of running without error. Cut that to 30 gates and you’re at 74%. Transpilers exist to do exactly this optimization.

Exercises

1. Check your unitarity

Verify by hand that the Hadamard matrix $H = \frac{1}{\sqrt{2}}\begin{pmatrix}1 & 1\\ 1 & -1\end{pmatrix}$ satisfies $H^\dagger H = I$ . Then verify $H^2 = I$ . What does that tell you about applying H twice?

Show answer

$H^\dagger = H$ (real and symmetric). $H H = \frac{1}{2}\begin{pmatrix}1+1 & 1-1\\ 1-1 & 1+1\end{pmatrix} = I$ . So H is its own inverse: applying it twice returns to the original state. $H|0\rangle = |+\rangle$ , then $H|+\rangle = |0\rangle$ .

2. Phase tower

$Z = S^2 = T^4$ . Confirm by computing matrix products. What’s $T^8$ ?

Show answer

$T^8 = Z^2 = I$ . The T gate has order 8: applying it 8 times is the identity. The sequence $T, T^2=S, T^3, T^4=Z, ..., T^8=I$ walks around the unit circle in the $|1\rangle$ amplitude phase in steps of $\pi/4$ .

3. Ancilla for AND

Write a Qiskit circuit that takes classical inputs encoded on two qubits ( $q_0, q_1$ in $|0\rangle$ or $|1\rangle$ ) and writes $q_0 \wedge q_1$ into a third qubit. Verify all four truth-table rows by running the circuit with each combination of initial states.

Show answer

from qiskit import QuantumCircuit
from qiskit_aer import AerSimulator

def and_circuit(a, b):
    qc = QuantumCircuit(3, 3)
    if a: qc.x(0)
    if b: qc.x(1)
    qc.ccx(0, 1, 2)           # Toffoli: q2 ^= q0 & q1
    qc.measure([0, 1, 2], [0, 1, 2])
    return qc

sim = AerSimulator()
for a in (0, 1):
    for b in (0, 1):
        result = sim.run(and_circuit(a, b), shots=1).result().get_counts()
        out = list(result)[0]
        # Qiskit order: q2 q1 q0 -> flip for human reading
        print(f"AND({a},{b}) = {out[0]}  (full register: {out})")
# AND(0,0)=0, AND(0,1)=0, AND(1,0)=0, AND(1,1)=1

4. Count the gates

For a Quantum Fourier Transform on $n$ qubits, the naive decomposition uses $O(n^2)$ gates. Starting from Qiskit’s QFT class, transpile it to the ["h", "t", "cx"] basis for $n = 4, 8, 16$ . What’s the empirical scaling of two-qubit gate count?

Show hint

from qiskit.circuit.library import QFT
from qiskit import transpile

for n in [4, 8, 16]:
    qft = QFT(n).decompose()
    basis = transpile(qft, basis_gates=["h", "t", "tdg", "s", "sdg", "cx"],
                      optimization_level=3)
    cx = basis.count_ops().get("cx", 0)
    print(f"n={n}: CX count = {cx}")

Compare the growth rate to your theoretical $O(n^2)$ prediction.

What you should take away

A quantum gate is a unitary matrix. That’s the definition, and it implies reversibility and norm preservation.
The universal gate set {H, T, CNOT} is enough to approximate any circuit to arbitrary precision (Solovay-Kitaev).
CNOT is the entangler. Without a two-qubit entangling gate, no single-qubit gate set can generate entanglement.
T is the non-Clifford generator. Without T, circuits are efficiently classically simulable. T is where the quantum advantage lives.
Depth, width, and two-qubit count are the three metrics that dominate real-hardware performance. Optimize the two-qubit count first.

Next: single-qubit gates as Bloch-sphere rotations, the full Rx/Ry/Rz family, and how to reason about arbitrary rotations geometrically.

What counts as a valid operation?

The single-qubit bestiary

Verifying unitarity in code

The universal gate set

Seeing universality in Qiskit: decomposing an arbitrary gate

Why classical irreversible gates don’t work

Ancillas and uncomputation

Circuit depth, width, and gate count

Exercises

1. Check your unitarity

2. Phase tower

3. Ancilla for AND

4. Count the gates

What you should take away

Quantum, for people who already code.