Spin is Not a Relativistic Phenomenon

Quantum mechanical spin is regularly described as a phenomenon that arises naturally only in relativistic physics. Here for example is a quote from What Remains to Be Discovered, a book written by a former editor of Nature:

Pauli was the first to guess at an explanation: spin, he argued, exists because of the way the time dimension must be added to the three dimensions of ordinary space to provide a relativistic description of objects. Spin is nature’s way of signaling the correctness of Einstein’s theory of relativity. [pp.72-3]

This sort of perspective is unfortunately not entirely correct, as explained over fifty years ago by Jean-Marc Lévy-Leblond. Spin can be shown to arise in a natural way in the completely nonrelativistic context of the Schrödinger equation. The presentation of this surprising insight here is based on his paper, with a slightly different derivation.

The method used by Lévy-Leblond is to apply the traditional derivation of the Dirac equation to the Schrödinger equation. Dirac had the idea to derive first-order wave equations by factoring the second-order Klein-Gordon equation:

$[\frac{1}{c^{2}} \frac{\partial^{2}}{\partial t^{2}} - \sum_{k} \frac{\partial^{2}}{\partial x_{k}^{2}} + (\frac{m c}{ℏ})^{2}] ψ = 0$

The sum here runs over all spatial variables in the system to which the equation is applied. In practice that is three dimensions, but it is worth keeping in mind that the following procedure can be carried out for other choices using Clifford algebras.

Physicists generally set $ℏ = c = 1$ in this equation to simplify algebra. Explicit constants are retained here to facilitate comparison with the nonrelativistic derivation to come.

To get the Dirac equation, form the linear combinations with constant coefficients

$\begin{array}{l} γ^{0} \frac{1}{c} \frac{\partial}{\partial t} + γ^{k} \frac{\partial}{\partial x_{k}} + i \frac{m c}{ℏ} \\ γ^{0} \frac{1}{c} \frac{\partial}{\partial t} + γ^{k} \frac{\partial}{\partial x_{k}} - i \frac{m c}{ℏ} \end{array}$

where repeated spatial indices are summed over. Multiplying these combinations, the terms linear in the mass cancel and one obtains

$\begin{array}{l} (γ^{0} \frac{1}{c} \frac{\partial}{\partial t} + γ^{k} \frac{\partial}{\partial x_{k}} + i \frac{m c}{ℏ}) (γ^{0} \frac{1}{c} \frac{\partial}{\partial t} + γ^{l} \frac{\partial}{\partial x_{l}} - i \frac{m c}{ℏ}) \\ = (γ^{0})^{2} \frac{1}{c^{2}} \frac{\partial^{2}}{\partial t^{2}} + (γ^{k})^{2} \frac{\partial^{2}}{\partial x_{k}^{2}} + (\frac{m c}{ℏ})^{2} \\ + (γ^{0} γ^{k} + γ^{k} γ^{0}) \frac{1}{c} \frac{\partial^{2}}{\partial t \partial x_{k}} + \sum_{k < l} (γ^{k} γ^{l} + γ^{l} γ^{k}) \frac{\partial^{2}}{\partial x_{k} \partial x_{l}} \end{array}$

The multiplicative order of the constant coefficients with different indices is deliberate: Dirac realized that he could factor the Klein-Gordon equation if the coefficients are noncommuting matrices with the following properties:

$\begin{array}{l} (γ^{0})^{2} = 1 \\ (γ^{k})^{2} = - 1 \end{array} \begin{array}{l} γ^{0} γ^{k} + γ^{k} γ^{0} = 0 \\ γ^{k} γ^{l} + γ^{l} γ^{k} = 0, k \neq l \end{array}$

With these choices, the Dirac equation is either of

$\begin{array}{l} [γ^{0} \frac{1}{c} \frac{\partial}{\partial t} + γ^{k} \frac{\partial}{\partial x_{k}} + i \frac{m c}{ℏ}] ψ = 0 \\ [γ^{0} \frac{1}{c} \frac{\partial}{\partial t} + γ^{k} \frac{\partial}{\partial x_{k}} - i \frac{m c}{ℏ}] ψ = 0 \end{array}$

While the first choice is conventional, either sign on the mass leads to equivalent physics.

There are several explicit representations of the gamma matrices, each suited to a particular emphasis in solving the Dirac equation. In one temporal and three spatial dimensions, 4×4 matrices are the smallest that can supply four mutually anticommuting gamma matrices. In the Dirac basis these matrices are

$γ^{0} = \begin{pmatrix} I & 0 \\ 0 & -I \end{pmatrix}, \quad γ^{k} = \begin{pmatrix} 0 & σ_{k} \\ -σ_{k} & 0 \end{pmatrix}$

where the submatrices are the 2×2 Pauli matrices

$σ_{1} = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad σ_{2} = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \quad σ_{3} = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$

together with the 2×2 identity matrix $I$. The Dirac basis is convenient for describing low-energy systems, and is thus useful for comparison with the Schrödinger equation. Another basis is that of Weyl,

$γ^{0} = \begin{pmatrix} 0 & I \\ I & 0 \end{pmatrix}, \quad γ^{k} = \begin{pmatrix} 0 & -σ_{k} \\ σ_{k} & 0 \end{pmatrix}$

which is more suited to describing high-energy systems. The choice of sign for $γ^{k}$ is related to the form of a fifth gamma matrix not needed here.
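The defining relations of the gamma matrices can be checked numerically for both bases. The following is a minimal sketch in Python with NumPy; the helper names (`dirac_basis`, `weyl_basis`, `satisfies_clifford`) are illustrative and not taken from the source. It builds the 4×4 matrices from 2×2 blocks and tests the anticommutation relations listed earlier, written compactly as $γ^{μ} γ^{ν} + γ^{ν} γ^{μ} = 2 η^{μν}$ with $η = \mathrm{diag}(1, -1, -1, -1)$.

```python
import numpy as np

# Pauli matrices and 2x2 building blocks
I2 = np.eye(2, dtype=complex)
Z2 = np.zeros((2, 2), dtype=complex)
SIGMA = [
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
]

def dirac_basis():
    """Gamma matrices gamma^0..gamma^3 in the Dirac basis given in the text."""
    g0 = np.block([[I2, Z2], [Z2, -I2]])
    gk = [np.block([[Z2, s], [-s, Z2]]) for s in SIGMA]
    return [g0] + gk

def weyl_basis():
    """Gamma matrices in the Weyl basis, with the sign convention given in the text."""
    g0 = np.block([[Z2, I2], [I2, Z2]])
    gk = [np.block([[Z2, -s], [s, Z2]]) for s in SIGMA]
    return [g0] + gk

# Signature (+, -, -, -): (gamma^0)^2 = 1 and (gamma^k)^2 = -1
ETA = np.diag([1.0, -1.0, -1.0, -1.0])

def satisfies_clifford(gammas):
    """True if gamma^mu gamma^nu + gamma^nu gamma^mu = 2 eta^{mu nu} * identity."""
    I4 = np.eye(4)
    for mu in range(4):
        for nu in range(4):
            anti = gammas[mu] @ gammas[nu] + gammas[nu] @ gammas[mu]
            if not np.allclose(anti, 2 * ETA[mu, nu] * I4):
                return False
    return True

print(satisfies_clifford(dirac_basis()), satisfies_clifford(weyl_basis()))  # True True
```

Both checks should pass; replacing an off-diagonal $-σ_{k}$ block by $+σ_{k}$ makes $(γ^{k})^{2} = +1$ and the check fails, which is a quick way to catch transcription errors.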

The point of factoring the Klein-Gordon equation is to allow new details to emerge. This can be illustrated simply by comparing the second-order differential equation

$(\frac{d^{2}}{d x^{2}} + 1) f = 0$

to the corresponding factored equation:

$(\frac{d}{d x} + i) (\frac{d}{d x} - i) f = 0$

The solutions to the first equation are $\sin x$ and $\cos x$. The factored form of the equation picks out the complex exponentials from which the circular functions are built: up to a constant, the first factor has only the solution $e^{- i x}$, while the second has only the solution $e^{i x}$. These two functions are related by the reflection $x \to -x$, and their sum and difference give the even and odd solutions $\cos x$ and $\sin x$. Further, the factors act as projection operators on the constituent function space, each selecting only the single function that satisfies it separately.
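The factoring example can be checked with finite differences. This is a minimal sketch under the assumption that central differences with a small step are accurate enough; the function names are illustrative. It confirms that each exponential is annihilated by exactly one factor and that their even and odd combinations rebuild $\cos x$ and $\sin x$.

```python
import numpy as np

def d_dx(f, x, h=1e-5):
    """Central-difference approximation to df/dx at the points x."""
    return (f(x + h) - f(x - h)) / (2 * h)

def plus_i_factor(f, x):
    """Apply the first factor (d/dx + i) to f."""
    return d_dx(f, x) + 1j * f(x)

def minus_i_factor(f, x):
    """Apply the second factor (d/dx - i) to f."""
    return d_dx(f, x) - 1j * f(x)

def e_minus(x):
    return np.exp(-1j * x)

def e_plus(x):
    return np.exp(1j * x)

x = np.linspace(-3.0, 3.0, 61)

# Each exponential is annihilated by exactly one factor (up to rounding) ...
res_first = np.max(np.abs(plus_i_factor(e_minus, x)))    # tiny
res_second = np.max(np.abs(minus_i_factor(e_plus, x)))   # tiny
cross = np.max(np.abs(minus_i_factor(e_minus, x)))       # close to 2

# ... and their even and odd combinations rebuild the real solutions.
cos_from_exp = (e_plus(x) + e_minus(x)) / 2
sin_from_exp = (e_plus(x) - e_minus(x)) / 2j

print(res_first, res_second, cross)
```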

Similarly, the two forms of the Dirac equation act as projection operators on the space of solutions to the Klein-Gordon equation, bringing out detail that washes out in forming the second-order equation. And just as for the simple factoring example, there will be a relation connecting solutions to the two forms of the Dirac equation.

A parallel factoring of the nonrelativistic Schrödinger equation can therefore be expected to bring out nontrivial details as well.

For physical context, first consider applying the equations above to plane-wave states. For a mass without spin, one can use a scalar with a Lorentz-invariant exponent,

$ψ (x, p) = \exp [- \frac{i}{ℏ} p_{μ} x^{μ}] = \exp [\frac{i}{ℏ} (p \cdot x - E t)]$

where the negative sign on energy ensures the correct relationship to nonrelativistic kinetic energy in the Schrödinger equation. Acting upon this function with the Klein-Gordon equation leads to

$E^{2} = p^{2} c^{2} + m^{2} c^{4}$

which is the relativistic energy-momentum relation for a free particle. For small momenta, one has the nonrelativistic approximation

$E = m c^{2} \sqrt{1 + \frac{p^{2}}{m^{2} c^{2}}} \approx m c^{2} + \frac{p^{2}}{2 m}$
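A quick numerical check of the expansion: the difference between the exact kinetic energy $E - m c^{2}$ and $p^{2} / 2 m$ should match the next term of the square-root expansion, $- p^{4} / (8 m^{3} c^{2})$. The units below ($m = 1$, $c = 100$) are an arbitrary assumption chosen so that the particle is slow.

```python
import math

def kinetic_exact(p, m, c):
    """E - m c^2 for E = sqrt(p^2 c^2 + m^2 c^4).

    Written as p^2 c^2 / (E + m c^2), an algebraically identical form that
    avoids cancellation when p is small compared with m c.
    """
    energy = math.sqrt(p**2 * c**2 + m**2 * c**4)
    return p**2 * c**2 / (energy + m * c**2)

def kinetic_newton(p, m):
    """Leading nonrelativistic term p^2 / (2 m)."""
    return p**2 / (2 * m)

# Illustrative units (an assumption, not from the text): m = 1, c = 100, p = 1,
# so the particle moves at about 1% of the speed of light.
m, c, p = 1.0, 100.0, 1.0
exact = kinetic_exact(p, m, c)
approx = kinetic_newton(p, m)
next_term = -p**4 / (8 * m**3 * c**2)   # next term in the expansion of the square root

print(exact, approx, exact - approx, next_term)
```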

For the four-dimensional Dirac equation, plane waves must carry a four-component bispinor made of two two-component spinors

$ψ (x, p) = [\begin{matrix} u (p) \\ v (p) \end{matrix}] exp [\frac{i}{ℏ} (p \cdot x - E t)]$

that are functions of energy and momentum. Applying the first form of the Dirac equation to this gives

$(γ^{0} E - c γ^{k} p_{k} - m c^{2}) [\begin{matrix} u \\ v \end{matrix}] = 0$

which in the Dirac basis, suitable for low-energy phenomena, becomes

$\begin{bmatrix} E - m c^{2} & - c σ \cdot p \\ c σ \cdot p & - E - m c^{2} \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix} = 0$

which is a matrix equation for the four components of the two spinors $u$ and $v$. Expanding it into component equations gives

$\begin{array}{l} (E - m c^{2}) u = c σ \cdot p \, v \\ c σ \cdot p \, u = (E + m c^{2}) v \end{array}$
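These component equations can be checked numerically. A minimal sketch, assuming units with $m = c = 1$ and an arbitrary momentum (the helper names are illustrative): on the energy shell $E = \sqrt{p^{2} c^{2} + m^{2} c^{4}}$, choosing any $u$ and solving the second equation for $v$ satisfies the first equation as well, and the 4×4 matrix is singular.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
SIGMA = np.array([
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
], dtype=complex)

def sigma_dot(vec):
    """sigma . vec for a real 3-vector."""
    return np.tensordot(vec, SIGMA, axes=1)

def dirac_plane_wave_matrix(E, p, m, c):
    """The 4x4 matrix acting on (u, v) in the Dirac basis, as in the text."""
    sp = sigma_dot(p)
    return np.block([[(E - m * c**2) * I2, -c * sp],
                     [c * sp, (-E - m * c**2) * I2]])

# Illustrative values (assumptions): units with m = c = 1 and an arbitrary momentum.
m, c = 1.0, 1.0
p = np.array([0.3, -0.4, 1.2])
E = np.sqrt(p @ p * c**2 + m**2 * c**4)       # positive-energy solution

u = np.array([1.0, 0.0], dtype=complex)        # arbitrary upper spinor
v = c * sigma_dot(p) @ u / (E + m * c**2)      # second component equation solved for v
psi = np.concatenate([u, v])

M = dirac_plane_wave_matrix(E, p, m, c)
print(np.allclose(M @ psi, 0))                 # True: both component equations hold
print(abs(np.linalg.det(M)))                   # ~0: the matrix is singular on shell
```

The first equation holds automatically because $(σ \cdot p)^{2} = p^{2}$, which turns it into $E^{2} = p^{2} c^{2} + m^{2} c^{4}$.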

Now introduce an electromagnetic field into this relativistic system through minimal coupling in Gaussian units,

$E \to E - e φ, \quad p \to p - \frac{e}{c} A$

so that the component equations become

$\begin{array}{l} (E - m c^{2} - e φ) u = c σ \cdot (p - \frac{e}{c} A) v \\ c σ \cdot (p - \frac{e}{c} A) u = (E + m c^{2} - e φ) v \end{array}$

For small momentum and electrostatic potential, the coefficient on the right-hand side of the second equation can be approximated as

$E + m c^{2} - e φ \approx 2 m c^{2} + \frac{p^{2}}{2 m} - e φ \approx 2 m c^{2}$

and then eliminating the spinor $v$ leads to the single nonrelativistic equation

$[\frac{1}{2 m} σ \cdot (p - \frac{e}{c} A) σ \cdot (p - \frac{e}{c} A) + e φ] u = (E - m c^{2}) u$

where the factor on the right-hand side is the energy measured from the rest energy. Using the Pauli-matrix identity for the product of two dot products,

$(σ \cdot a) (σ \cdot b) = a \cdot b + i σ \cdot a \times b$

the corresponding product in square brackets becomes

$\begin{array}{l} σ \cdot (p - \frac{e}{c} A) σ \cdot (p - \frac{e}{c} A) \\ = (p - \frac{e}{c} A)^{2} + i σ \cdot (p - \frac{e}{c} A) \times (p - \frac{e}{c} A) \\ = (p - \frac{e}{c} A)^{2} - \frac{i e}{c} σ \cdot [p \times A + A \times p] \end{array}$
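The Pauli-matrix identity can be spot-checked with random vectors. This is a minimal sketch for ordinary vectors with commuting components; for operator-valued vectors such as $p - \frac{e}{c} A$ the order of factors must be kept, which is exactly why the cross product of that vector with itself does not vanish above.

```python
import numpy as np

SIGMA = np.array([
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
], dtype=complex)

def sigma_dot(vec):
    """sigma . vec for a 3-vector."""
    return np.tensordot(vec, SIGMA, axes=1)

rng = np.random.default_rng(seed=0)
a = rng.normal(size=3)
b = rng.normal(size=3)

# (sigma . a)(sigma . b) = a . b + i sigma . (a x b)
lhs = sigma_dot(a) @ sigma_dot(b)
rhs = np.dot(a, b) * np.eye(2) + 1j * sigma_dot(np.cross(a, b))
print(np.allclose(lhs, rhs))  # True
```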

If the momentum in the second term is treated as the quantum mechanical operator $p = \frac{ℏ}{i} \nabla$, acting on everything to its right, then the product rule and the antisymmetry of the cross product give

$\begin{array}{l} [p \times A + A \times p] u = [(\frac{ℏ}{i} \nabla \times A) - (A \times \frac{ℏ}{i} \nabla) + (A \times \frac{ℏ}{i} \nabla)] u \\ = [(\frac{ℏ}{i} \nabla \times A)] u = \frac{ℏ}{i} B u \end{array}$

where the last step uses the definition of the magnetic field in terms of the vector potential. The single nonrelativistic equation is then

$[\frac{1}{2 m} (p - \frac{e}{c} A)^{2} + e φ - \frac{e ℏ}{2 m c} σ \cdot B] u = (E - m c^{2}) u$
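The operator step $[p \times A + A \times p] u = \frac{ℏ}{i} B u$ can be spot-checked numerically. A minimal sketch, assuming units with $ℏ = 1$ and using a hypothetical vector potential $A = (y z, x^{2}, \sin x)$ and test function $u$ chosen only for illustration: it evaluates both cross products with central differences at one point and compares the sum with the exact curl times $u$.

```python
import numpy as np

HBAR = 1.0  # assumption: units with hbar = 1

def vector_potential(r):
    """Illustrative smooth vector potential A = (y z, x^2, sin x)."""
    x, y, z = r
    return np.array([y * z, x**2, np.sin(x)])

def magnetic_field(r):
    """Exact curl of the vector potential above."""
    x, y, z = r
    return np.array([0.0, y - np.cos(x), 2 * x - z])

def wavefunction(r):
    """Illustrative smooth complex test function u."""
    x, y, z = r
    return np.exp(1j * (x + 2 * y - z)) * (1 + x * y)

def partial(f, r, j, h=1e-5):
    """Central-difference derivative of f along coordinate j at point r."""
    step = np.zeros(3)
    step[j] = h
    return (f(r + step) - f(r - step)) / (2 * h)

# Levi-Civita symbol for cross products in index form.
EPS = np.zeros((3, 3, 3))
for i, j, k in [(0, 1, 2), (1, 2, 0), (2, 0, 1)]:
    EPS[i, j, k], EPS[i, k, j] = 1.0, -1.0

r0 = np.array([0.3, -0.7, 1.1])
A0 = vector_potential(r0)
u0 = wavefunction(r0)

# (p x A) u: p = (hbar/i) grad acts on the product A_k u.
grad_Au = np.array([partial(lambda r: vector_potential(r) * wavefunction(r), r0, j)
                    for j in range(3)])        # grad_Au[j, k] = d_j (A_k u)
p_cross_A = (HBAR / 1j) * np.einsum("ijk,jk->i", EPS, grad_Au)

# (A x p) u: p acts only on u.
grad_u = np.array([partial(wavefunction, r0, k) for k in range(3)])
A_cross_p = (HBAR / 1j) * np.einsum("ijk,j,k->i", EPS, A0, grad_u)

lhs = p_cross_A + A_cross_p
rhs = (HBAR / 1j) * magnetic_field(r0) * u0
print(np.allclose(lhs, rhs, atol=1e-7))  # True
```

Neither cross product alone equals $\frac{ℏ}{i} B u$; only their sum does, because the terms with derivatives of $u$ cancel.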

The first two terms in square brackets are the Hamiltonian of a charged particle in an external electromagnetic field. The third term couples the spin of a spin-one-half particle, such as the electron, to the magnetic field, with gyromagnetic ratio $g = 2$.
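For the electron, the magnitude of the prefactor $\frac{e ℏ}{2 m c}$ of the $σ \cdot B$ term is the Bohr magneton. A minimal sketch evaluating it in Gaussian units from CODATA 2018 values (the constant names and the helper `spin_coupling_prefactor` are illustrative) gives the familiar value of about $9.274 \times 10^{-21}$ erg/G.

```python
# CGS-Gaussian values derived from CODATA 2018 constants.
C_CGS = 2.99792458e10                      # speed of light, cm/s (exact)
E_ESU = 1.602176634e-19 * 2.99792458e9     # elementary charge, statcoulomb (1 C = 2.99792458e9 statC)
HBAR_CGS = 1.054571817e-27                 # reduced Planck constant, erg s
M_E_CGS = 9.1093837015e-28                 # electron mass, g

def spin_coupling_prefactor(charge, mass):
    """Magnitude of e*hbar/(2 m c), the coefficient of sigma . B in the Pauli equation."""
    return abs(charge) * HBAR_CGS / (2 * mass * C_CGS)

mu_b = spin_coupling_prefactor(E_ESU, M_E_CGS)
print(f"{mu_b:.4e} erg/G")  # 9.2740e-21 erg/G
```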

This nonrelativistic equation is known as the Pauli equation. It appeared a year before Dirac’s derivation of the full relativistic equation, but was considered ad hoc in its construction. The Dirac equation is notable for producing the correct value of the spin term automatically.

The Schrödinger equation for a freely moving mass is

$[i ℏ \frac{\partial}{\partial t} + \frac{ℏ^{2}}{2 m} \sum_{k} \frac{\partial^{2}}{\partial x_{k}^{2}}] ψ = 0$

Multiplying by 2m and applying to plane waves gives

$(2 m E - p^{2}) ψ = 0$

The factorization will be done in terms of these physical variables rather than derivatives, simply because the notation is more compact.

The relevant section of Lévy-Leblond’s paper, “III b) Linearization of the Schrödinger Equation”, is unfortunately not quite as clear as it could be, and contains several obvious typographical errors. There is also an arbitrariness in the selection of coefficient matrices as part of the process.

Rather than attacking the factorization of the Schrödinger equation directly, consider reverse engineering the nonrelativistic approximation to the Dirac equation. This provides a nonarbitrary choice of coefficient matrices which, although different from the choice in the paper, produces the expected physics. The difference is related to the freedom of basis available for the gamma matrices.

Here again is the Dirac equation applied to plane waves in the Dirac basis, before any approximations, with $E_{rel}$ denoting the relativistic energy:

$\begin{bmatrix} E_{rel} - m c^{2} & - c σ \cdot p \\ c σ \cdot p & - E_{rel} - m c^{2} \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix} = 0$

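To convert this to the nonrelativistic case, replace $E_{rel} - m c^{2}$ with the nonrelativistic energy $E$ and approximate $E_{rel} + m c^{2} \approx 2 m c^{2}$. The remaining factors of the speed of light then drop out after absorbing a factor of $c$ into the lower spinor (replacing $c v$ by $v$) and dividing the second row by $c$: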

$\begin{bmatrix} E & - σ \cdot p \\ σ \cdot p & - 2 m \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix} = 0$
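The nonrelativistic matrix encodes the free Schrödinger dispersion. A minimal sketch with an illustrative mass and momentum (assumptions, as are the helper names): at $E = p^{2} / 2 m$ a spinor built from any $u$ with $v = σ \cdot p \, u / 2 m$ solves the equation, and in general the determinant equals $(p^{2} - 2 m E)^{2}$.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
SIGMA = np.array([
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
], dtype=complex)

def sigma_dot(vec):
    """sigma . vec for a real 3-vector."""
    return np.tensordot(vec, SIGMA, axes=1)

def nonrel_matrix(E, p, m):
    """The 4x4 nonrelativistic matrix [[E, -sigma.p], [sigma.p, -2m]] from the text."""
    sp = sigma_dot(p)
    return np.block([[E * I2, -sp], [sp, -2 * m * I2]])

m = 1.0                          # illustrative mass (assumption)
p = np.array([0.5, -1.0, 2.0])   # illustrative momentum (assumption)
p2 = p @ p
E_free = p2 / (2 * m)            # free-particle energy p^2 / (2m)

u = np.array([0.6, 0.8j])                 # arbitrary upper spinor
v = sigma_dot(p) @ u / (2 * m)            # second component equation solved for v
psi = np.concatenate([u, v])

M = nonrel_matrix(E_free, p, m)
print(np.allclose(M @ psi, 0))            # True

# Away from E = p^2/(2m) the determinant equals (p^2 - 2 m E)^2, so it is nonzero.
E_off = E_free + 0.3
print(np.linalg.det(nonrel_matrix(E_off, p, m)), (p2 - 2 * m * E_off) ** 2)
```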

Expand this into component equations

$\begin{array}{l} E u = σ \cdot p \, v \\ σ \cdot p \, u = 2 m v \end{array}$

and introduce an electromagnetic field using the same minimal coupling:

$\begin{array}{l} (E - e φ) u = σ \cdot (p - \frac{e}{c} A) v \\ σ \cdot (p - \frac{e}{c} A) u = 2 m v \end{array}$

These can be combined without approximation into a single equation

$[\frac{1}{2 m} σ \cdot (p - \frac{e}{c} A) σ \cdot (p - \frac{e}{c} A) + e φ] u = E u$

The appearance of the speed of light here is a result of how Gaussian units are defined, not some relativistic effect. The product of dot products is treated as above,

$[\frac{1}{2 m} (p - \frac{e}{c} A)^{2} + e φ - \frac{e ℏ}{2 m c} σ \cdot B] u = E u$

resulting in the same nonrelativistic Pauli equation, with the coupling between spin and magnetic field appearing automatically. This is the crux of the appearance of spin in a natural way mentioned at the outset.

To complete the factoring process, rewrite the free nonrelativistic matrix equation in terms of the Dirac-basis gamma matrices:

$[γ^{0} \frac{E + 2 m}{2} - γ^{k} p_{k} + \frac{E - 2 m}{2}] [\begin{matrix} u \\ v \end{matrix}] = 0$

This is the first factor, representing the Schrödinger equation in a four-component form. Following Lévy-Leblond, there must be some other factor that, when applied on the left, produces the usual Schrödinger equation. Write this second factor with arbitrary coefficients and multiply the two using the properties of the gamma matrices:

$\begin{array}{l} (A γ^{0} + B γ^{k} p_{k} + C) (γ^{0} \frac{E + 2 m}{2} - γ^{k} p_{k} + \frac{E - 2 m}{2}) \\ = A \frac{E + 2 m}{2} + B p^{2} + C \frac{E - 2 m}{2} \\ - γ^{0} γ^{k} p_{k} (A + B \frac{E + 2 m}{2}) + γ^{0} (A \frac{E - 2 m}{2} + C \frac{E + 2 m}{2}) \\ + γ^{k} p_{k} (B \frac{E - 2 m}{2} - C) \end{array}$

Comparing with the Schrödinger equation applied to plane waves immediately gives

$A = \frac{E + 2 m}{2}, \quad B = - 1, \quad C = - \frac{E - 2 m}{2}$

since these choices make the last three terms identically zero, while the first and third terms sum to $2 m E$ and the second gives $- p^{2}$. Given the number of equations involved, this simple result implies quite a bit of symmetry.
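The factorization can be confirmed numerically in the Dirac basis. A minimal sketch with arbitrary illustrative values of $E$, $m$, and the momentum (assumptions): it forms both factors with the coefficients $A$, $B$, $C$ found above and checks that their product is $(2 m E - p^{2})$ times the 4×4 identity, and that the first factor reproduces the nonrelativistic matrix $\begin{bmatrix} E & - σ \cdot p \\ σ \cdot p & - 2 m \end{bmatrix}$.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
Z2 = np.zeros((2, 2), dtype=complex)
I4 = np.eye(4, dtype=complex)
SIGMA = [
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
]
GAMMA0 = np.block([[I2, Z2], [Z2, -I2]])                   # Dirac basis
GAMMAK = [np.block([[Z2, s], [-s, Z2]]) for s in SIGMA]

# Illustrative values (assumptions): any E, m and momentum should work.
m, E = 1.3, 0.7
p = np.array([0.2, -0.5, 0.9])

P = sum(pk * gk for pk, gk in zip(p, GAMMAK))              # gamma^k p_k
half_plus, half_minus = (E + 2 * m) / 2, (E - 2 * m) / 2

first = half_plus * GAMMA0 - P + half_minus * I4           # first factor
A, B, C = half_plus, -1.0, -half_minus                     # coefficients found above
second = A * GAMMA0 + B * P + C * I4                       # second factor

product = second @ first
print(np.allclose(product, (2 * m * E - p @ p) * I4))      # True

# In the Dirac basis the first factor is the nonrelativistic matrix.
sigma_p = sum(pk * s for pk, s in zip(p, SIGMA))
print(np.allclose(first, np.block([[E * I2, -sigma_p], [sigma_p, -2 * m * I2]])))  # True
```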

Restoring derivatives through $E \to i ℏ \frac{\partial}{\partial t}$ and $p_{k} \to - i ℏ \frac{\partial}{\partial x_{k}}$, the Schrödinger equation in four-component form is either of

$\begin{array}{l} [γ^{0} (\frac{i ℏ}{2} \frac{\partial}{\partial t} + m) + i ℏ γ^{k} \frac{\partial}{\partial x_{k}} + \frac{i ℏ}{2} \frac{\partial}{\partial t} - m] ψ = 0 \\ [γ^{0} (\frac{i ℏ}{2} \frac{\partial}{\partial t} + m) + i ℏ γ^{k} \frac{\partial}{\partial x_{k}} - \frac{i ℏ}{2} \frac{\partial}{\partial t} + m] ψ = 0 \end{array}$

or, after dividing by $i ℏ$,

$\begin{array}{l} [\frac{γ^{0} + 1}{2} \frac{\partial}{\partial t} + γ^{k} \frac{\partial}{\partial x_{k}} - i \frac{m}{ℏ} (γ^{0} - 1)] ψ = 0 \\ [\frac{γ^{0} - 1}{2} \frac{\partial}{\partial t} + γ^{k} \frac{\partial}{\partial x_{k}} - i \frac{m}{ℏ} (γ^{0} + 1)] ψ = 0 \end{array}$
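The first simplified form can be tested directly on a plane wave with finite differences. A minimal sketch, assuming units with $ℏ = 1$ and illustrative values for the mass, momentum, and spinor (the helper names are hypothetical): with $E = p^{2} / 2 m$ and lower spinor $σ \cdot p \, u / 2 m$ the operator annihilates the wave, while shifting the energy leaves a residual proportional to the shift.

```python
import numpy as np

HBAR = 1.0  # assumption: units with hbar = 1
I2 = np.eye(2, dtype=complex)
Z2 = np.zeros((2, 2), dtype=complex)
I4 = np.eye(4, dtype=complex)
SIGMA = [
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
]
GAMMA0 = np.block([[I2, Z2], [Z2, -I2]])                   # Dirac basis
GAMMAK = [np.block([[Z2, s], [-s, Z2]]) for s in SIGMA]

def plane_wave(spinor, E, p):
    """psi(t, x) = spinor * exp[(i/hbar)(p.x - E t)]."""
    return lambda t, x: spinor * np.exp(1j / HBAR * (p @ x - E * t))

def first_form(psi, t, x, m, h=1e-5):
    """Apply [(gamma^0 + 1)/2 d/dt + gamma^k d/dx_k - i (m/hbar)(gamma^0 - 1)] to psi,
    using central differences for the derivatives."""
    out = (GAMMA0 + I4) / 2 @ ((psi(t + h, x) - psi(t - h, x)) / (2 * h))
    out = out - 1j * m / HBAR * (GAMMA0 - I4) @ psi(t, x)
    for k in range(3):
        step = np.zeros(3)
        step[k] = h
        out = out + GAMMAK[k] @ ((psi(t, x + step) - psi(t, x - step)) / (2 * h))
    return out

m = 1.0                                     # illustrative mass (assumption)
p = np.array([0.4, -0.3, 0.8])              # illustrative momentum (assumption)
E = p @ p / (2 * m)                         # free-particle energy
u = np.array([0.6, 0.8j])
spinor = np.concatenate([u, sum(pk * s for pk, s in zip(p, SIGMA)) @ u / (2 * m)])

t0, x0 = 0.4, np.array([0.1, 0.2, -0.3])
on_shell = first_form(plane_wave(spinor, E, p), t0, x0, m)
off_shell = first_form(plane_wave(spinor, E + 0.2, p), t0, x0, m)
print(np.max(np.abs(on_shell)), np.max(np.abs(off_shell)))  # ~0 and clearly nonzero
```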

The two forms are related in the same way as the two forms of the Dirac equation, with a change in sign of mass but essentially the same physics.

As a final note, Lévy-Leblond points out that one can include an arbitrary invertible matrix between the two factors of the squared Schrödinger equation. This is equivalent to multiplying the first form of the equation by this same matrix. The first and third coefficients in that equation, those of the time derivative and of the mass term, are in 2×2 block form

$\frac{γ^{0} + 1}{2} = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \quad γ^{0} - 1 = \begin{pmatrix} 0 & 0 \\ 0 & -2 \end{pmatrix}$

Reading past typographical errors, the invertible matrix used by Lévy-Leblond is $\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$, which squares to $-1$ and is therefore a two-dimensional representation of the imaginary unit. Applying this matrix to the coefficient matrices gives

$\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}, \quad \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 0 & 0 \\ 0 & -2 \end{pmatrix} = \begin{pmatrix} 0 & 2 \\ 0 & 0 \end{pmatrix}$

which are the coefficients A and C, apart from mass, chosen by Lévy-Leblond. The second coefficient follows similarly,

$\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} γ^{k} = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 0 & σ_{k} \\ -σ_{k} & 0 \end{pmatrix} = \begin{pmatrix} σ_{k} & 0 \\ 0 & σ_{k} \end{pmatrix}$

providing a connection between the somewhat arbitrarily chosen coefficients in the paper and those employed in this presentation.
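The block products above can be verified with explicit 4×4 matrices, where each block entry $1$ or $I$ stands for the 2×2 identity. A minimal sketch (names such as `J`, `time_coeff`, and `mass_coeff` are illustrative): it checks that the chosen matrix squares to $-1$ and reproduces each of the transformed coefficients.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
Z2 = np.zeros((2, 2), dtype=complex)
I4 = np.eye(4, dtype=complex)
SIGMA = [
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
]
GAMMA0 = np.block([[I2, Z2], [Z2, -I2]])                   # Dirac basis
GAMMAK = [np.block([[Z2, s], [-s, Z2]]) for s in SIGMA]

J = np.block([[Z2, -I2], [I2, Z2]])         # the 2x2 "imaginary unit", with identity blocks
time_coeff = (GAMMA0 + I4) / 2              # [[1, 0], [0, 0]] in block form
mass_coeff = GAMMA0 - I4                    # [[0, 0], [0, -2]] in block form

print(np.allclose(J @ J, -I4))                                            # True
print(np.allclose(J @ time_coeff, np.block([[Z2, Z2], [I2, Z2]])))        # True
print(np.allclose(J @ mass_coeff, np.block([[Z2, 2 * I2], [Z2, Z2]])))    # True
print(all(np.allclose(J @ g, np.block([[s, Z2], [Z2, s]]))
          for g, s in zip(GAMMAK, SIGMA)))                                 # True
```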