Chapter 1
One Tone, Two Tones, a Puzzle
Strike two notes at the same time. Sometimes it sounds beautiful – smooth, round, blending together. Sometimes it sounds rough, wavering, unpleasant. Every human being hears this difference, whether musically trained or not. But why?
The answer begins with an observation two and a half millennia old. Legend has it that Pythagoras walked past a blacksmith’s forge and noticed that certain pairs of hammers sounded harmonious while others did not. He investigated the weights of the hammers and found integer ratios. The story is probably fiction – hammer weights don’t directly determine pitch – but the insight behind it is real: consonance has something to do with simple number ratios.
Frequency: What a Tone Really Is
A tone is a periodic fluctuation in air pressure. The number of oscillations per second is called frequency, measured in Hertz (Hz). Concert pitch A4 oscillates 440 times per second: \(f = 440 \,\text{Hz}\). A lower A3 oscillates exactly half as fast: \(f = 220 \,\text{Hz}\). The ratio is \(2{:}1\) – an octave.
The fundamental question of music theory can be stated this way: Which frequency ratios sound “good,” and why?
The Simplest Ratios
Pythagoras and his students described the octave, fifth, and fourth with simple fractions; later theorists added the thirds, whose ratios involve the number 5:
| Interval | Ratio | Example (Hz) | Perception |
|---|---|---|---|
| Octave | 2:1 | 440 & 880 | Fused, identical |
| Fifth | 3:2 | 440 & 660 | Open, powerful |
| Fourth | 4:3 | 440 & 587 | Stable, calm |
| Major third | 5:4 | 440 & 550 | Warm, bright |
| Minor third | 6:5 | 440 & 528 | Soft, melancholic |
| Tritone | 45:32 | 440 & 619 | Tense, dissonant |
The simpler the ratio, the more consonant the sound. This rule of thumb works remarkably well – but why it works was first explained by Helmholtz and later refined by Plomp & Levelt.
Lissajous Figures: Making Consonance Visible
When you plot two sinusoidal oscillations against each other – one on the x-axis, the other on the y-axis – you get Lissajous figures. For simple frequency ratios, the patterns are closed and simple. For complicated ratios, they become chaotic.
[Figure: Lissajous figures for the octave (2:1), the fifth (3:2), and the major third (5:4)]
The octave (2:1) produces a simple figure eight. The fifth (3:2) yields a somewhat more complex but clearly closed pattern. The major third (5:4) becomes more intricate – but remains closed. A tritone (45:32) would fill nearly the entire surface before the curve closes.
Helmholtz: Beating and Roughness
Hermann von Helmholtz provided the first physical explanation in 1863, in his “On the Sensations of Tone.” When two tones lie close together in frequency, you hear beating – periodic volume fluctuations at the frequency \(|f_1 - f_2|\).
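\[\sin(2\pi f_1 t) + \sin(2\pi f_2 t) = 2\,\cos\!\left(2\pi\,\frac{f_1 - f_2}{2}\,t\right)\sin\!\left(2\pi\,\frac{f_1 + f_2}{2}\,t\right)\]
This is the product formula for the sum of two sine functions. The cosine term is a slow envelope that modulates the amplitude; its magnitude peaks \(|f_1 - f_2|\) times per second, which is the beat frequency. Slow beating (2–6 Hz) is perceived as a pleasant “vibrato.” But in the range of 20–50 Hz, the sensation becomes unpleasant: a rough, grating feeling that Helmholtz called roughness.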
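The product formula for two sine tones can be checked numerically. The following is a minimal Python sketch, not part of the original text: the function names and the example pair of 440 Hz and 444 Hz are assumptions chosen purely for illustration.

```python
import math

def two_tone_sum(f1, f2, t):
    """Left side: two sine tones simply added together."""
    return math.sin(2 * math.pi * f1 * t) + math.sin(2 * math.pi * f2 * t)

def two_tone_product(f1, f2, t):
    """Right side: slow cosine envelope times a fast sine carrier."""
    envelope = 2 * math.cos(2 * math.pi * (f1 - f2) / 2 * t)
    carrier = math.sin(2 * math.pi * (f1 + f2) / 2 * t)
    return envelope * carrier

def beat_frequency(f1, f2):
    """Number of loudness swells per second."""
    return abs(f1 - f2)

# Compare both sides over one second, sampled 8000 times (assumed rate).
max_gap = max(
    abs(two_tone_sum(440, 444, k / 8000) - two_tone_product(440, 444, k / 8000))
    for k in range(8000)
)
print(max_gap, beat_frequency(440, 444))
```

Both sides agree up to floating-point rounding, and the example pair beats four times per second, inside the slow range the text describes as pleasant.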
Helmholtz’s thesis: Dissonance is roughness. Two tones sound dissonant when their frequencies (or those of their overtone pairs) lie so close together that the beating falls in the rough range.
Plomp & Levelt: The Dissonance Curve
In 1965, Reinier Plomp and Willem Levelt refined this idea with a landmark experiment. They played pairs of pure sine tones to listeners and asked them to rate the perceived roughness. The result: a universal dissonance curve.
Roughness is greatest when the frequency separation is about 25% of the critical bandwidth – a concept from psychoacoustics that we’ll return to in Chapter 6. At larger separations, roughness decreases again.
The Plomp-Levelt curve for two pure tones has a simple shape: a maximum at about a quarter of the critical bandwidth, then a monotonic decline. But for complex tones with overtone series (i.e., every real instrument), the roughness is summed over all overtone pairs – and then the simple ratios from the table above appear as sharp minima. The simple fractions aren’t arbitrary: they are the points where the overtone series of both tones overlap maximally instead of beating against each other.
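The summation over overtone pairs can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' procedure: the pure-tone curve uses a parametrization in the style of William Sethares's fit to the Plomp-Levelt data (a source the text does not mention), the constants are the commonly quoted fitted values, and all partials are given equal amplitude for simplicity.

```python
import math

def pure_tone_roughness(f_a, f_b):
    """Roughness of two sine tones (assumed Sethares-style fit)."""
    f_low, f_high = min(f_a, f_b), max(f_a, f_b)
    # The curve is stretched with the lower frequency, which mimics the
    # growth of the critical bandwidth (assumed constants).
    s = 0.24 / (0.021 * f_low + 19.0)
    x = s * (f_high - f_low)
    return math.exp(-3.5 * x) - math.exp(-5.75 * x)

def interval_roughness(f_base, ratio, n_partials=6):
    """Sum the pure-tone roughness over every pair of partials."""
    partials = [k * f_base for k in range(1, n_partials + 1)]
    partials += [k * f_base * ratio for k in range(1, n_partials + 1)]
    total = 0.0
    for i in range(len(partials)):
        for j in range(i + 1, len(partials)):
            total += pure_tone_roughness(partials[i], partials[j])
    return total

# Scan the neighbourhood of the fifth above an assumed base of 261.63 Hz.
for ratio in (1.45, 1.5, 1.55):
    print(ratio, round(interval_roughness(261.63, ratio), 3))
```

In this sketch the fifth sits in a dip: detuning it in either direction pulls the coinciding partials slightly apart, and those near misses land in the rough zone.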
Try it yourself
Move the frequency slider and hear how consonance changes. The dissonance curve shows in real time where the rough spots are. Toggle between a pure sine tone and a timbre with overtone series – and watch the minima appear at the Pythagorean ratios.
Euler and the Gradus Function
In 1739, Leonhard Euler proposed a simple measure for the “simplicity” of a frequency ratio: the Gradus suavitatis (degree of sweetness). For a ratio \(p{:}q\) (in lowest terms, so \(\gcd(p,q) = 1\)):
\[\Gamma(p{:}q) = 1 + \sum_i (p_i - 1)\]
where \(p_i\) are the prime factors of \(p \cdot q\) (with multiplicity). The smaller \(\Gamma\), the more consonant. The octave (2:1) has \(\Gamma = 2\), the fifth (3:2) has \(\Gamma = 4\), the tritone (45:32) has \(\Gamma = 14\). Euler’s formula was ahead of its time – it quantifies the same intuition that Plomp and Levelt confirmed experimentally more than two centuries later.
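Euler's measure is easy to compute: factor \(p \cdot q\) into primes and add up each prime minus one, plus one. The sketch below is a minimal Python version; the function names are assumptions, and the trial-division factorization is only meant for the small numbers that occur in interval ratios.

```python
from math import gcd

def prime_factors(n):
    """Prime factors of n with multiplicity, by trial division."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors

def gradus(p, q):
    """Gradus suavitatis of the ratio p:q (reduced to lowest terms first)."""
    g = gcd(p, q)
    p, q = p // g, q // g
    return 1 + sum(f - 1 for f in prime_factors(p * q))

for name, (p, q) in {"octave": (2, 1), "fifth": (3, 2), "tritone": (45, 32)}.items():
    print(name, gradus(p, q))
```

The three values match those quoted in the text: 2, 4, and 14.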
Status: Consonance is not arbitrary. Simple frequency ratios sound good because they produce minimal roughness – this is experimentally confirmed (Plomp & Levelt 1965) and theoretically understood (Helmholtz 1863). But the picture is not yet complete: roughness does not explain why major sounds happy and minor sounds sad. For that, we first need the harmonic series.
Chapter 2
The Harmonic Series
When you pluck a guitar string, you don’t hear one tone. You hear dozens – simultaneously. What you perceive as “a single tone” is actually a chord of overtone frequencies that your brain fuses into a single timbre.
Standing Waves on a String
A string of length \(L\), fixed at both ends, can only vibrate in certain ways. The boundary conditions require the displacement to be zero at both ends. The only functions that satisfy this are sine waves whose half-wavelength fits an integer number of times into \(L\):
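\[\lambda_n = \frac{2L}{n}, \qquad f_n = \frac{c}{\lambda_n} = n\,\frac{c}{2L} = n\,f_1, \qquad n = 1, 2, 3, \ldots\]
Here \(c\) is the speed at which waves travel along the string. The fundamental \(f_1\) has \(n=1\): a single half sine wave. The second partial \(f_2 = 2f_1\) has two half-waves and vibrates an octave higher. The third, \(f_3 = 3f_1\), lies a fifth above that. And so on.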
These discrete frequencies \(f_n = n \cdot f_1\) form the harmonic series. It is not a human invention – it follows directly from the physics of vibrating strings. And it contains, all by itself, the intervals of music:
| Partial | Ratio to \(f_1\) | Interval | Example (C = 131 Hz) |
|---|---|---|---|
| 1 | 1 | Fundamental | C (131 Hz) |
| 2 | 2 | Octave | C (262 Hz) |
| 3 | 3 | Octave + fifth | G (393 Hz) |
| 4 | 4 | Two octaves | C (524 Hz) |
| 5 | 5 | Two oct. + major third | E (655 Hz) |
| 6 | 6 | Two oct. + fifth | G (786 Hz) |
| 7 | 7 | Natural seventh (not a piano note!) | B♭ (917 Hz) |
Notice: partials 4, 5, 6 form the ratio \(4{:}5{:}6\). That is a major triad. Nature “plays” major all by itself. Minor has to be sought out.
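How far each partial lies from the nearest piano key can be measured in cents (hundredths of an equal-tempered semitone). The following Python sketch is an illustration only; the helper names are assumptions.

```python
import math

def cents(ratio):
    """Size of a frequency ratio in cents (1200 cents per octave)."""
    return 1200 * math.log2(ratio)

def deviation_from_12tet(ratio):
    """Signed distance in cents from the nearest equal-tempered note."""
    c = cents(ratio)
    return c - 100 * round(c / 100)

# Partials 1 to 7 relative to the fundamental.
for partial in range(1, 8):
    print(partial, round(deviation_from_12tet(partial), 1))
```

Partials 1, 2, and 4 land exactly on piano notes, partials 3 and 6 miss by about 2 cents, partial 5 by about 14 cents, and partial 7 by about 31 cents, which is why the natural seventh has no key of its own.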
Why Every Instrument Sounds Different
The harmonic series is the same for every instrument – the frequencies are always \(f, 2f, 3f, \ldots\). What differs are the relative loudnesses of the partials: the amplitudes \(A_1, A_2, A_3, \ldots\). This is precisely what makes up timbre.
A flute has almost nothing but the fundamental – nearly a pure sine wave. A clarinet has strong odd-numbered partials – hence its hollow quality. A trumpet has many strong overtones – hence its brilliant sound. A piano starts with an enormous number of partials (from the hammer strike), which then decay at different rates.
The Fourier Decomposition: Every Sound Is a Sum
The mathematical formulation is elegant. Any periodic function can be represented as a sum of sines and cosines – the Fourier series:
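\[x(t) = \frac{a_0}{2} + \sum_{n=1}^{\infty}\bigl[a_n\cos(2\pi n f_1 t) + b_n\sin(2\pi n f_1 t)\bigr] = \frac{a_0}{2} + \sum_{n=1}^{\infty} A_n \sin(2\pi n f_1 t + \phi_n)\]
The coefficients \(a_n, b_n\) (or equivalently the amplitudes \(A_n = \sqrt{a_n^2 + b_n^2}\) and phases \(\phi_n\)) are the fingerprint of the sound. The spectrum is the timbre.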
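Building a sound from its partials and reading the amplitudes back out can be sketched as follows. This is a minimal Python illustration, not a production analysis routine: it works on exactly one period of the waveform, sets all phases to zero, and uses a hypothetical three-partial recipe.

```python
import cmath
import math

def synthesize(amplitudes, n_samples):
    """One period of a tone; amplitudes[n-1] is the amplitude of partial n."""
    return [
        sum(a * math.sin(2 * math.pi * n * k / n_samples)
            for n, a in enumerate(amplitudes, start=1))
        for k in range(n_samples)
    ]

def partial_amplitude(samples, n):
    """Amplitude of partial n, found by correlating with a complex sinusoid."""
    count = len(samples)
    c = sum(x * cmath.exp(-2j * math.pi * n * k / count)
            for k, x in enumerate(samples))
    return 2 * abs(c) / count

recipe = [1.0, 0.5, 0.25]          # hypothetical timbre: three partials
samples = synthesize(recipe, 64)
recovered = [partial_amplitude(samples, n) for n in (1, 2, 3, 4)]
print([round(a, 6) for a in recovered])
```

The analysis returns the recipe that went in and finds nothing at the fourth partial, which was never added.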
Try it yourself
Mix the amplitudes of the first eight partials and hear how the timbre changes. Start with a pure sine (only \(A_1\)), then add partials one by one. Can you recreate a flute, a clarinet, or a harpsichord?
Why Overtone Series Explain Consonance
Now the circle closes back to Chapter 1. When two tones stand in the ratio \(3{:}2\) (a fifth), the 3rd partial of the lower tone is identical to the 2nd partial of the upper tone. The overtone series overlap – they fuse. For a frequency ratio of \(45{:}32\) (a tritone), the overtone series only overlap at very high partials – and in the meantime, closely spaced overtone pairs create roughness.
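The overlap argument can be made concrete with exact fractions. The Python sketch below is illustrative; the function names, the 440 Hz lower tone, the limit of eight partials, and the 30 Hz "near miss" threshold are all assumptions.

```python
from fractions import Fraction

def first_shared_partial(ratio):
    """Lowest partial numbers (lower tone, upper tone) that coincide."""
    r = Fraction(ratio)
    # n_low * f = n_high * ratio * f  means  n_low / n_high = ratio.
    return r.numerator, r.denominator

def near_misses(f_low, ratio, n_partials, max_gap_hz):
    """Partial pairs that almost coincide: closer than max_gap_hz, not equal."""
    f_high = Fraction(f_low) * Fraction(ratio)
    pairs = []
    for n in range(1, n_partials + 1):
        for m in range(1, n_partials + 1):
            gap = abs(n * Fraction(f_low) - m * f_high)
            if 0 < gap < max_gap_hz:
                pairs.append((n, m))
    return pairs

print(first_shared_partial(Fraction(3, 2)), near_misses(440, Fraction(3, 2), 8, 30))
print(first_shared_partial(Fraction(45, 32)), near_misses(440, Fraction(45, 32), 8, 30))
```

For the fifth, the 3rd and 2nd partials coincide and nothing among the first eight partials lands close enough to beat. For the tritone, the first true coincidence is at partials 45 and 32, while the 7th and 5th partials sit only 13.75 Hz apart.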
The Plomp-Levelt dissonance curve for complex tones is nothing other than the sum of roughness across all overtone pairs. The consonance of an interval is not a metaphysical property – it is a calculable quantity that follows from the physics of standing waves.
Status: Every sound is a sum of harmonic partials (\(f, 2f, 3f, \ldots\)). The relative amplitudes determine the timbre. Consonance arises when overtone series overlap. And the harmonic series already contains the major triad: partials 4, 5, 6. But what about minor?
Chapter 3
Major, Minor, and the Question of Emotion
Major sounds happy, minor sounds sad – it’s one of the most persistent clichés in music theory. And like all good clichés, it contains a kernel of truth, but also an oversimplification.
The Major Triad: 4:5:6
A major triad consists of the root, major third, and fifth. In just intonation, the frequency ratios are:
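\[f_{\text{root}} : f_{\text{third}} : f_{\text{fifth}} = 1 : \tfrac{5}{4} : \tfrac{3}{2} = 4 : 5 : 6\]
This means: if the root is at 400 Hz, the third is at 500 Hz and the fifth at 600 Hz. Three consecutive integer multiples of 100 Hz. Remarkably simple.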
What’s more: the frequencies 400, 500, and 600 Hz are all multiples of 100 Hz (the 4th, 5th, and 6th partials). A major triad sounds as if it were a fragment of a single overtone series – because it is. This explains the perceived fusion: the brain interprets the three tones as parts of a single sound with the (virtual) fundamental frequency of 100 Hz.
The Minor Triad: 10:12:15
A minor triad consists of the root, minor third, and fifth. The frequency ratios:
\[f_{\text{root}} : f_{\text{third}} : f_{\text{fifth}} = 1 : \tfrac{6}{5} : \tfrac{3}{2} = 10 : 12 : 15\]
Instead of \(4{:}5{:}6\), we have \(10{:}12{:}15\) – the numbers are significantly larger. The numbers 10, 12, and 15 are not consecutive multiples of a single frequency. The virtual fundamental is ambiguous: is it the 10th partial of something? The 12th? The fusion is incomplete.
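The step from interval ratios to whole-number partials, and from there to the virtual fundamental, can be sketched in Python. The function names and the 400 Hz root are assumptions for illustration.

```python
from fractions import Fraction
from functools import reduce
from math import gcd

def _lcm(a, b):
    return a * b // gcd(a, b)

def harmonic_numbers(ratios):
    """Smallest whole numbers standing in the same proportion as ratios."""
    fracs = [Fraction(r) for r in ratios]
    common = reduce(_lcm, (f.denominator for f in fracs))
    numbers = [int(f * common) for f in fracs]
    shared = reduce(gcd, numbers)
    return [n // shared for n in numbers]

def virtual_fundamental(root_hz, ratios):
    """Frequency of which every chord tone is a whole-number multiple."""
    return Fraction(root_hz) / harmonic_numbers(ratios)[0]

major = [1, Fraction(5, 4), Fraction(3, 2)]
minor = [1, Fraction(6, 5), Fraction(3, 2)]
print(harmonic_numbers(major), virtual_fundamental(400, major))
print(harmonic_numbers(minor), virtual_fundamental(400, minor))
```

With a 400 Hz root, the major triad points to a common fundamental of 100 Hz, two octaves below the root. The minor triad only shares a fundamental at 40 Hz, more than three octaves down, which is one way to picture why its fusion is weaker.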
Norman Cook put it this way in 2007: major suggests one speaker (a coherent overtone source), minor suggests multiple speakers (an ambiguous source attribution). The sadness of minor would then be not a direct emotion, but a cognitive uncertainty: the brain cannot neatly assign the tones to a single source.
What Does Psychology Say?
Empirical research confirms the major-happy/minor-sad effect for Western listeners with enormous consistency. In studies with thousands of participants, over 90% of Western-socialized adults classify major as “happy” and minor as “sad” (Kastner & Crowder, 1990; Dalla Bella et al., 2001).
But is it biology or culture?
The Tsimane Study: An Experiment in the Rainforest
In 2016, Josh McDermott and colleagues published a remarkable study in the journal Nature. They traveled to the Tsimane, an indigenous group in the Bolivian Amazon who had little exposure to Western music.
The result was surprising: the Tsimane rated consonant and dissonant intervals as equally pleasant. They had no preference for octaves, fifths, or major triads over dissonant combinations. For Western listeners, this preference feels so natural that it seems innate – but the Tsimane data show: the preference for consonance is (at least partly) learned.
However: the Tsimane could perceive roughness perfectly well – they found rough sounds unpleasant. What they lacked was the aesthetic preference for simple frequency ratios. The sensory foundation (roughness detection) seems universal; the emotional valuation (major = happy) is culturally shaped.
Nature or Nurture? Both.
The current scientific consensus can be summarized as follows:
Universal (Nature):
• Roughness detection is innate (even infants react to dissonance).
• The harmonic series is a physical fact, not a cultural convention.
• The ability to distinguish pitch is innate.
Culturally shaped (Nurture):
• The preference for consonance vs. dissonance.
• The emotional association major/happy, minor/sad.
• Which scales and tunings feel “right.”
• Which chord progressions create “tension” and “resolution.”
Biology provides the raw material (harmonic series, roughness detection, frequency analysis in the inner ear). Culture shapes it into musical grammars. Major sounds happy – to us. But not to every human being on the planet.
Try it yourself
Play a major triad and a minor triad. See the frequency ratios, hear the sound, observe how the overtone series overlap. Shift the root note – the pattern stays the same, and so does the perception.
Beyond Major and Minor
Western music limits itself to 12 tones per octave. But other cultures use entirely different systems:
• Arabic maqam music uses quarter-tone steps – 24 tones per octave, with intervals that don’t exist in the Western system.
• Javanese gamelan music uses slendro (5 tones) and pelog (7 tones) – scales that deliberately sound “out of tune” to produce beating that is considered beautiful.
• Indian raga music defines not just scales but also rules for ascending and descending, along with emotional associations with times of day and seasons of the year.
• Balinese gamelan music even deliberately tunes instrument pairs slightly out of tune with each other, to create a shimmering “ombak” (wave) effect.
What all these systems share: they use the harmonic series as raw material, but shape it into entirely different aesthetic landscapes. The physics of vibration is universal; the music is not.
Status: Major (4:5:6) suggests a coherent overtone source – hence the perceived “clarity.” Minor (10:12:15) is more ambiguous. But the emotional valuation is culturally shaped: the Tsimane show no consonance preference. Biology provides the raw material (harmonic series, roughness detection), culture shapes the music. And now we face a practical problem: how do we tune an instrument?
Chapter 4
The Comma of Pythagoras
Imagine you’re tuning a piano. You start from a C and tune upward in pure fifths: C → G → D → A → E → B → F♯ → C♯ → G♯ → D♯ → A♯ → E♯ → B♯. After 12 fifths, you should theoretically arrive back at the starting C – seven octaves higher.
Theoretically. In practice, something inconvenient happens.
The Calculation
Twelve pure fifths upward means: multiply the frequency 12 times by \(3/2\).
\[\left(\tfrac{3}{2}\right)^{12} = \frac{531441}{4096} \approx 129.75\]
Seven octaves upward means: multiply by \(2^7 = 128\).
The ratio of the two:
\[\frac{(3/2)^{12}}{2^{7}} = \frac{3^{12}}{2^{19}} = \frac{531441}{524288} \approx 1.01364\]
That is not 1. It’s off by \(1.36\%\) – equivalent to 23.46 cents. (A cent is one hundredth of an equal-tempered semitone. Most people can hear differences from about 5–10 cents.) The difference of 23.46 cents is clearly audible.
This tiny but persistent ratio \(3^{12}/2^{19}\) is called the Pythagorean comma. It is the mathematical proof that the circle of fifths does not close exactly.
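The comma can be reproduced with exact rational arithmetic. A minimal Python sketch follows; the variable names are assumptions, and the last line computes the size of a fifth that has to absorb the whole comma.

```python
import math
from fractions import Fraction

def cents(ratio):
    """Size of a frequency ratio in cents."""
    return 1200 * math.log2(float(ratio))

twelve_fifths = Fraction(3, 2) ** 12
seven_octaves = Fraction(2) ** 7
pythagorean_comma = twelve_fifths / seven_octaves

print(pythagorean_comma, round(cents(pythagorean_comma), 2))

# A fifth narrowed by the full comma, compared with a pure fifth of ~701.96 cents.
narrowed_fifth_cents = cents(Fraction(3, 2)) - cents(pythagorean_comma)
print(round(narrowed_fifth_cents, 2))
```

The exact value is 531441/524288, or about 23.46 cents, and a fifth shrunk by that amount measures roughly 678.5 cents.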
Why This Is a Fundamental Problem
The reason lies in number theory. Pure fifths are powers of 3 (more precisely: \(3^n / 2^m\)), and octaves are powers of 2. A pure fifth would fit exactly into octaves if there existed integers \(n, m\) with \((3/2)^n = 2^m\), i.e., \(3^n = 2^{n+m}\). But the equation
\[3^n = 2^k\]
has no solution for positive integers \(n, k\). (Proof: the left side is odd, the right side is even. Contradiction.) The Pythagorean comma is not an inaccuracy – it is a theorem. Fifths and octaves are incommensurable, like the diagonal of a square and its side.
The Wolf Fifth
In practice, the Pythagorean comma means: if you tune 11 out of 12 fifths pure, the last fifth must absorb the entire comma – it becomes 23.46 cents too narrow. This mutilated fifth sounds so bad that it earned the name wolf fifth – it “howls like a wolf.”
In Pythagorean tuning, the wolf fifth typically fell between G♯ and E♭. This meant: keys with many sharps or flats were unusable. A composer could write beautifully in C major, but F♯ major sounded ghastly. This constrained music for centuries.
Other Commas: The Syntonic Comma
The Pythagorean comma is not the only problem. There is also the syntonic comma (81:80, roughly 21.5 cents): the difference between a Pythagorean major third (\(81/64\), four fifths up) and a pure major third (\(5/4 = 80/64\)).
The Pythagorean system uses only the primes 2 and 3. As soon as you add the prime 5 (for pure thirds), a new comma appears. You can have pure fifths or pure thirds – but not both at once. This is the fundamental tension of tuning theory.
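The trade-off between pure fifths and pure thirds can be checked with a few lines of Python. This sketch is illustrative; the variable names are assumptions, and the "tempered fifth" at the end is simply the fifth you get by demanding that four of them produce a pure third.

```python
import math
from fractions import Fraction

def cents(ratio):
    """Size of a frequency ratio in cents."""
    return 1200 * math.log2(float(ratio))

# Four pure fifths up, then two octaves down, gives the Pythagorean third.
pythagorean_third = Fraction(3, 2) ** 4 / 4
pure_third = Fraction(5, 4)
syntonic_comma = pythagorean_third / pure_third
print(pythagorean_third, syntonic_comma, round(cents(syntonic_comma), 2))

# Conversely: the fifth whose fourth power is exactly 5 (a pure third
# plus two octaves) has to be narrower than 3/2.
tempered_fifth = 5 ** 0.25
print(round(cents(tempered_fifth), 2), round(cents(Fraction(3, 2)), 2))
```

Keeping the fifths pure leaves the third about 21.5 cents wide; keeping the third pure forces each fifth down to about 696.6 cents instead of 702.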
Historical Tuning Systems Compared
| Tuning | Fifths | Thirds | All keys? | Era |
|---|---|---|---|---|
| Pythagorean | 11 pure, 1 wolf | All too wide | No | Antiquity – 1400 |
| Quarter-comma meantone | 11 narrow, 1 wolf | 8 pure | No | 1500 – 1700 |
| Werckmeister III | Irregular | Irregular | Yes (with character) | 1691 |
| Equal temperament (12-TET) | All ~2 cents narrow | All ~14 cents wide | Yes (identical) | from ~1800 |
Try it yourself
Choose a tuning system and hear how the same chord sounds in different keys. In Pythagorean tuning, C major sounds perfect, but F♯ major sounds ghastly. In equal temperament, everything sounds the same – but nothing is perfectly pure.
Status: The Pythagorean comma (\(3^{12}/2^{19} \approx 1.0136\)) is a number-theoretic theorem: fifths and octaves are incommensurable. No tuning system can simultaneously offer pure fifths, pure thirds, and transposability. Every tuning is a compromise. The most radical compromise – equal temperament – took centuries to prevail.
Chapter 5
The Well-Tempered Clavier
In 1722, Johann Sebastian Bach wrote a collection of 24 preludes and fugues – one in each major and minor key. He called it “The Well-Tempered Clavier.” The title was a statement: all keys are playable.
Exactly which tuning Bach used remains debated to this day. It was not equal temperament – that came later. It was most likely a well temperament (such as Werckmeister III or similar), in which the fifths are irregularly narrowed so that no wolf fifth exists, but each key retains its own character.
The Road to Equal Temperament
The idea of making all 12 semitones exactly equal in size is older than you might think. The Chinese scholar Zhu Zaiyu calculated it as early as 1584. In Europe, Simon Stevin published the same idea around 1585. But implementation took centuries – musicians rejected equal temperament because the thirds sounded too impure.
The mathematical formulation is elegant. We seek 12 equal steps that together make one octave (factor 2). Each step must therefore have the factor
\[\sqrt[12]{2} = 2^{1/12} \approx 1.05946\]
The frequency of the \(n\)-th semitone above a fundamental \(f_0\) is:
\[f_n = f_0 \cdot 2^{n/12}\]
Starting from A4 = 440 Hz, this gives for example:
| Note | Semitones above A | Equal temp. (Hz) | Just (Hz) | Difference (cents) |
|---|---|---|---|---|
| A | 0 | 440.00 | 440.00 | 0.0 |
| C♯ | 4 | 554.37 | 550.00 | +13.7 |
| D | 5 | 587.33 | 586.67 | +1.9 |
| E (fifth) | 7 | 659.26 | 660.00 | −1.9 |
The equal-tempered fifth is only 1.9 cents too narrow – barely audible. But the equal-tempered major third is 13.7 cents too wide – trained ears can clearly hear that. This is the price of equal temperament: perfect transposability in exchange for slightly impure thirds.
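The table values follow directly from the rule that each semitone multiplies the frequency by the twelfth root of two. A minimal Python sketch, with assumed function names:

```python
import math

def equal_tempered(f0, semitones):
    """Frequency a given number of equal-tempered semitones above f0."""
    return f0 * 2 ** (semitones / 12)

def cents_between(f_a, f_b):
    """Signed distance from f_b up to f_a in cents."""
    return 1200 * math.log2(f_a / f_b)

# (semitones above A4, just-intonation frequency in Hz)
just = {4: 550.0, 5: 440 * 4 / 3, 7: 660.0}
for semitones, just_hz in just.items():
    tempered = equal_tempered(440, semitones)
    print(semitones, round(tempered, 2), round(cents_between(tempered, just_hz), 2))
```

The major third comes out roughly 13.7 cents wide, while the fourth and fifth deviate by only about 2 cents in opposite directions.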
The Logarithm Idea
Why 12? Why not 19 or 31 or 53 tones per octave? The answer lies in approximation theory. The fifth has the frequency ratio \(3/2\). In equal temperament with \(N\) tones per octave, it is approximated by \(2^{k/N}\), where \(k\) is the number of semitones. We seek:
\[2^{k/N} \approx \frac{3}{2} \quad\Longleftrightarrow\quad \frac{k}{N} \approx \log_2\frac{3}{2} = 0.58496\ldots\]
The best continued-fraction approximations of \(0.58496\ldots\) are: \(1/2\), \(3/5\), \(7/12\), \(24/41\), \(31/53\), \(\ldots\) The denominators give the number of tones per octave: 2, 5, 12, 41, 53. Twelve is the first denominator that delivers an excellent approximation of the fifth (\(7/12 = 0.58333\ldots\), error only 1.9 cents). For even purer fifths and thirds, 53 tones per octave would be ideal – but 53 keys per octave are not practical for human hands.
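The list of fractions can be generated with the standard continued-fraction recurrence. The Python sketch below works in floating point, which is accurate enough for the first handful of convergents but not for arbitrarily many; the function name is an assumption.

```python
import math

def convergents(x, count):
    """First `count` continued-fraction convergents of x as (numerator, denominator)."""
    h_prev, h = 0, 1
    k_prev, k = 1, 0
    result = []
    for _ in range(count):
        a = math.floor(x)
        h_prev, h = h, a * h + h_prev
        k_prev, k = k, a * k + k_prev
        result.append((h, k))
        remainder = x - a
        if remainder == 0:
            break
        x = 1 / remainder
    return result

target = math.log2(3 / 2)
for steps, tones in convergents(target, 7):
    error_cents = 1200 * (steps / tones - target)
    print(f"{steps}/{tones}: fifth off by {error_cents:+.2f} cents")
```

The output starts with the trivial fractions 0/1 and 1/1 and then runs through 1/2, 3/5, 7/12, 24/41, and 31/53, with the error of the fifth shrinking at every step.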
The Irony of Equal Temperament
Mathematically speaking, equal temperament is a sacrifice of all pure harmony in favor of a single property: translational invariance. Every semitone step is the same size, so every key sounds the same. Modulation becomes free. Transposition becomes trivial.
In the language of mathematics: equal temperament replaces the multiplicative group of rational frequency ratios with the cyclic group \(\mathbb{Z}_{12}\). Integer ratios are replaced by irrational numbers – \(\sqrt[12]{2}\) is irrational, like Pythagoras’ diagonal. The cost: no interval except the octave is pure. The gain: every interval is the same everywhere.
Historically, this was a radical compromise. Baroque musicians rejected it because they valued the character of individual keys – D major sounded different from B-flat major in Werckmeister tuning, and that was desired. Only the 19th century, with its growing need for modulation and chromaticism, made equal temperament the standard tuning. Today it is so dominant that most people consider it “natural” – which it decidedly is not.
Microtonal Renaissance
In the 20th and 21st centuries, there has been a counter-movement. Composers like Harry Partch (43 tones per octave), Ben Johnston (just intonation), and Sevish (electronic microtonal music) are exploring tuning systems beyond 12-TET. Software synthesizers allow arbitrary tunings without physical constraints. The question “How many tones does an octave need?” is open once again.
Status: Equal temperament with \(f_n = f_0 \cdot 2^{n/12}\) sacrifices purity for transposability. Twelve tones are not a law of nature, but the best continued-fraction approximation that still fits on a keyboard. Bach’s Well-Tempered Clavier celebrated the victory over the wolf fifth. But – why does our brain accept these compromises? How does the ear process frequencies at all?
Chapter 6
The Well-Tempered Ear
So far, we have treated music as a physical phenomenon – frequencies, ratios, standing waves. But music does not exist in the air. It exists in the brain. And between sound wave and consciousness lies a surprisingly complex organ: the inner ear.
The Cochlea: A Biological Fourier Analysis
Deep within the inner ear lies the cochlea – a spiral-shaped, fluid-filled canal roughly the size of a pea. Uncoiled, it measures about 3.5 centimeters. Running along its entire length is the basilar membrane – and this membrane is the key to everything.
The basilar membrane is narrow and stiff at the entrance (near the oval window) and wide and flexible at the end (apex). High frequencies cause the stiff beginning to vibrate; low frequencies excite the flexible end. Each location on the membrane responds most strongly to a specific frequency – the tonotopic map.
This is remarkable: the cochlea essentially performs a Fourier analysis in hardware. It decomposes incoming sound into its frequency components – not through mathematics, but through mechanics. Georg von Békésy received the Nobel Prize in 1961 for experimentally confirming this theory.
Critical Bands: The Resolution of the Ear
The basilar membrane has a limited frequency resolution. Each point responds not to a single frequency alone, but to a frequency range – the so-called critical band.
The width of a critical band depends on frequency. The approximation formula after Barkhausen/Zwicker:
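\[\Delta f_{\text{CB}} \approx 25 + 75\,\bigl[1 + 1.4\,(f/\text{kHz})^2\bigr]^{0.69}\ \text{Hz}\]
At low frequencies (below 500 Hz), the critical band is about 100 Hz wide. At high frequencies, it grows to several hundred Hertz. This explains why low chords quickly “muddy up”: at 100 Hz, the critical band is about as wide as the note’s own frequency, so even wide intervals fall inside a single band and the overtone pairs land in the rough zone.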
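To get a feeling for the numbers, the sketch below evaluates a widely used approximation for the critical bandwidth, attributed to Zwicker and Terhardt. Treat the choice of formula as an assumption: it may not be the exact expression the author had in mind, and the function names are illustrative.

```python
def critical_bandwidth_hz(f_hz):
    """Approximate critical bandwidth in Hz (assumed Zwicker-Terhardt fit)."""
    f_khz = f_hz / 1000.0
    return 25.0 + 75.0 * (1.0 + 1.4 * f_khz ** 2) ** 0.69

def relative_bandwidth(f_hz):
    """Bandwidth as a fraction of the centre frequency."""
    return critical_bandwidth_hz(f_hz) / f_hz

for f in (100, 250, 500, 1000, 4000):
    print(f, round(critical_bandwidth_hz(f)), round(relative_bandwidth(f), 2))
```

In absolute terms the band widens with frequency, but relative to the note itself it is widest in the bass: around 100 Hz the band is as wide as the frequency, while around 1000 Hz it is only about a sixth of it.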
Now everything connects: the Plomp-Levelt dissonance from Chapter 1 is precisely the roughness that arises when two frequencies fall within the same critical band. The dissonance curve is a direct consequence of the physical resolution of the cochlea.
Combination Tones: When the Ear Invents
The ear is not a passive receiver – it is an active signal processor. When two loud tones with frequencies \(f_1\) and \(f_2\) sound simultaneously, the inner ear produces additional tones that are not physically present: combination tones.
The most prominent is the difference tone: \(f_d = f_2 - f_1\). For a pure fifth (660 Hz and 440 Hz), the difference tone is \(660 - 440 = 220\) Hz – exactly one octave below the lower tone. The difference tone reinforces the fundamental. For a second (440 Hz and 495 Hz), the difference tone is 55 Hz – a low hum that matches neither tone. Dissonance.
There are also cubic difference tones (\(2f_1 - f_2\)) and higher orders. These combination tones arise from the nonlinearity of the inner ear – the outer hair cells actively amplify sound and introduce slight distortions in the process. What sounds like a defect is actually a feature: combination tones help the brain with fundamental pitch recognition.
The Missing Fundamental
One of the most fascinating phenomena in psychoacoustics: when you hear the frequencies 400, 500, 600, 700 Hz, you perceive the pitch 100 Hz – even though 100 Hz is physically absent. The brain computes the missing fundamental from the spacing of the harmonic series.
This is why you hear bass on small laptop speakers, even though they physically cannot reproduce frequencies below 150 Hz. Your brain reconstructs the fundamental from the partials that are present. This works because the harmonic series has a unique pattern: evenly spaced frequencies with a spacing of \(f_1\).
This is also why the major triad (4:5:6) sounds so clear: the brain immediately recognizes the missing fundamental (1) and assigns the three tones to a single virtual source. With the minor triad (10:12:15), the missing fundamental is more ambiguous – hence the perceived complexity.
Octave Equivalence: Why C Is Always C
One of the most universal phenomena in music: tones an octave apart are perceived as “the same.” The low C and the high C are different tones, but they carry the same name, the same function. This octave equivalence is found in every known musical culture.
The physical explanation: when \(f\) and \(2f\) sound together, every overtone of \(2f\) is also an overtone of \(f\). The harmonic series of \(2f\) is a proper subset of the harmonic series of \(f\). For the cochlea, the two excitation patterns overlap perfectly – \(2f\) adds nothing new, it only reinforces.
In mathematical terms: pitch perception is cyclic modulo the octave. If we represent the frequency space logarithmically (\(\text{pitch} = \log_2(f/f_0)\)), the octave becomes the interval \([0, 1)\), and the “chroma” (pitch classes C, D, E, ...) lie on the unit circle \(\mathbb{R}/\mathbb{Z}\). Musicians call it the pitch circle, mathematicians a quotient group. Same structure.
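Both halves of the argument, the subset relation and the wrap-around of pitch, fit into a short Python sketch. The function names and the 440 Hz reference are assumptions for illustration.

```python
import math

def harmonic_series(f, n_partials):
    """Set of the first n_partials partial frequencies of a tone at f."""
    return {k * f for k in range(1, n_partials + 1)}

def chroma(f, f_ref=440.0):
    """Position on the pitch circle: log-frequency taken modulo one octave."""
    return math.log2(f / f_ref) % 1.0

# Every partial of the upper octave already occurs in the lower tone.
print(harmonic_series(880, 8) <= harmonic_series(440, 16))

# Tones an octave apart share a chroma; the fifth sits elsewhere on the circle.
print(chroma(220), chroma(440), chroma(880), round(chroma(660), 5))
```

All A's map to the same point on the circle, while the fifth above A lands at about 0.585 of the way around, the same number that appears when approximating the fifth with equal steps.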
Try it yourself
Enter a frequency and see which location on the basilar membrane is maximally excited. Observe how the critical band, measured as a musical interval, is wider at low frequencies than at high ones. Play two tones simultaneously and see the overlap of excitation patterns – the greater the overlap, the rougher the sound.
The Ear as a Nonlinear Signal Processor
Let us summarize what the auditory system accomplishes:
• Frequency analysis (cochlea = mechanical Fourier transform)
• Dynamic compression (outer hair cells amplify quiet sounds, dampen loud ones – a dynamic range of 120 dB is compressed to 40 dB)
• Combination tone generation (nonlinear distortion as a feature, not a bug)
• Fundamental reconstruction (missing fundamental from the overtone pattern)
• Temporal analysis (phase locking up to ~5 kHz – the brain also uses timing, not just frequency)
No technical audio analyzer matches the performance of the human cochlea. It has a frequency resolution of about 3,500 channels (inner hair cells), a dynamic range of 120 dB (a factor of 1,000,000 in amplitude), and it processes everything in real time at a power consumption of microwatts.
Status: The cochlea is a mechanical Fourier analyzer. Critical bands explain roughness. Combination tones and the missing fundamental show that the ear is an active signal processor. Octave equivalence follows from the subset relationship of harmonic series. And now the final connection: what does all of this have to do with physics, Fourier, and eigenvalues?
Chapter 7
Everything Is Vibration
In Chapter 2, we accepted the harmonic series as a physical fact: a string vibrates at frequencies \(f, 2f, 3f, \ldots\). But why exactly these frequencies? The answer lies in one of the deepest ideas in mathematics – and it connects music theory with quantum mechanics, image compression, and artificial intelligence.
The Wave Equation and Its Eigenvalues
The vibration of a string is described by the wave equation:
\[\frac{\partial^2 y}{\partial t^2} = c^2\,\frac{\partial^2 y}{\partial x^2}\]
where \(c = \sqrt{T/\mu}\) is the wave speed (\(T\) = string tension, \(\mu\) = mass per length). The boundary conditions \(y(0,t) = y(L,t) = 0\) (string fixed at both ends) constrain the solutions.
We seek solutions of the form \(y(x,t) = X(x) \cdot T(t)\) (separation of variables). Substituting yields:
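\[\frac{1}{c^2}\,\frac{T''(t)}{T(t)} = \frac{X''(x)}{X(x)}\]
The left side depends only on \(t\), the right side only on \(x\) (here \(T(t)\) is the time factor, not the string tension). For both to be equal for all \(x\) and \(t\), they must both equal the same constant. This constant is called \(-\lambda\). For the spatial part, we get an eigenvalue problem:
\[-X''(x) = \lambda\,X(x), \qquad X(0) = X(L) = 0\]
The solutions are:
\[X_n(x) = \sin\!\left(\frac{n\pi x}{L}\right), \qquad \lambda_n = \left(\frac{n\pi}{L}\right)^2, \qquad n = 1, 2, 3, \ldots\]
The eigenfunctions are sine waves. The eigenvalues \(\lambda_n\) determine the allowed frequencies: the time factor oscillates with angular frequency \(c\sqrt{\lambda_n}\), so \(f_n = \frac{c}{2L}\,n\). The harmonic series is nothing other than the spectrum of an eigenvalue problem.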
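The eigenvalue claim can be tested numerically: apply a finite-difference second derivative to \(\sin(n\pi x/L)\) and check that the result is the same function scaled by \((n\pi/L)^2\). The Python sketch below does this; the function names and the string parameters (length, tension, mass per length) are hypothetical values chosen for illustration.

```python
import math

def eigenvalue_estimate(n, length, x, h=1e-4):
    """Estimate lambda in -X'' = lambda X for X = sin(n pi x / L) at point x."""
    def mode(s):
        return math.sin(n * math.pi * s / length)
    second_derivative = (mode(x + h) - 2 * mode(x) + mode(x - h)) / h ** 2
    return -second_derivative / mode(x)

def string_frequency(n, length, tension, mass_per_length):
    """Frequency of the n-th mode: f_n = n c / (2 L) with c = sqrt(T / mu)."""
    wave_speed = math.sqrt(tension / mass_per_length)
    return n * wave_speed / (2 * length)

# The numerical estimate should match (n pi / L)^2 for any interior point.
print(eigenvalue_estimate(3, 1.0, 0.2), (3 * math.pi) ** 2)

# Hypothetical string: 0.65 m long, 70 N tension, 0.4 g per metre.
print([round(string_frequency(n, 0.65, 70.0, 0.0004), 1) for n in (1, 2, 3)])
```

The estimate agrees with \((3\pi)^2 \approx 88.83\), and the mode frequencies of the example string come out as whole-number multiples of the first.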
Stop. Read that last sentence again. Music theory – consonance, harmonic series, timbre, all of it – follows from an eigenvalue problem. The very same mathematical concept that plays the central role in artificial intelligence and in quantum mechanics.
Schrödinger and the String
The time-independent Schrödinger equation for a particle in a box of length \(L\):
\[-\frac{\hbar^2}{2m}\,\frac{d^2\psi}{dx^2} = E\,\psi(x), \qquad \psi(0) = \psi(L) = 0\]
The same equation. The same boundary conditions. The same solutions: \(\psi_n(x) = \sin(n\pi x/L)\) with energies \(E_n = \frac{\hbar^2 \pi^2}{2mL^2}\,n^2\). The allowed energy levels of a quantum particle in a box are the eigenvalues of the same problem that also determines the harmonic series of a string. The string and the quantum particle are mathematical siblings.
Erwin Schrödinger titled his epoch-making 1926 paper: “Quantization as an Eigenvalue Problem.” The discrete energy levels of the hydrogen atom follow from the same principle as the discrete overtone frequencies of a guitar string: boundary conditions enforce quantization.
Chladni Figures: Making Eigenvalues Visible
In 1787, Ernst Florens Friedrich Chladni showed that the vibration modes of a plate can be made visible. He sprinkled fine sand on a metal plate and drew a violin bow across the edge. The sand collected on the nodal lines – the places where the plate does not vibrate. The patterns that emerge are called Chladni figures.
Mathematically, Chladni figures are the eigenfunctions of the two-dimensional wave equation. On a rectangular plate of size \(a \times b\):
\[u_{mn}(x, y) = \sin\!\left(\frac{m\pi x}{a}\right)\sin\!\left(\frac{n\pi y}{b}\right)\]
with eigenvalues \(\lambda_{mn} = \pi^2\bigl(\frac{m^2}{a^2} + \frac{n^2}{b^2}\bigr)\). The nodal lines are the zeros of \(u_{mn}\). The higher the eigenvalue, the finer the pattern – just as higher overtone frequencies produce finer standing waves.
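For the idealized rectangular plate described here, eigenvalues and nodal lines follow directly from the mode numbers. The Python sketch below lists them; the function names are assumptions, and a real metal plate with free edges behaves in a more complicated way than this simple model.

```python
import math

def plate_eigenvalue(m, n, a, b):
    """Eigenvalue of mode (m, n) on an a-by-b plate in the idealized model."""
    return math.pi ** 2 * (m ** 2 / a ** 2 + n ** 2 / b ** 2)

def nodal_lines(m, n, a, b):
    """Interior lines where sin(m pi x / a) sin(n pi y / b) vanishes."""
    vertical = [i * a / m for i in range(1, m)]      # x positions
    horizontal = [j * b / n for j in range(1, n)]    # y positions
    return vertical, horizontal

# Modes of a unit square, ordered from coarse to fine pattern.
modes = sorted(
    ((m, n) for m in range(1, 4) for n in range(1, 4)),
    key=lambda mn: plate_eigenvalue(mn[0], mn[1], 1.0, 1.0),
)
for m, n in modes:
    print((m, n), round(plate_eigenvalue(m, n, 1.0, 1.0), 2), nodal_lines(m, n, 1.0, 1.0))
```

On a square plate, modes such as (1, 2) and (2, 1) share the same eigenvalue, so any mixture of the two is also a valid pattern. This is one reason square plates show diagonal and curved figures.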
Chladni demonstrated his experiments to Napoleon Bonaparte, who was so impressed that he offered a prize for their mathematical explanation. Sophie Germain won it in 1816 – one of the earliest recognized scientific achievements by a woman in the modern era.
Try it yourself
Choose the vibration mode (\(m, n\)) and observe the Chladni figure: the pattern of nodal lines on a vibrating plate. Higher modes = more complex patterns = higher eigenvalues = higher frequencies.
Fourier Everywhere: From the String to JPEG
The Fourier decomposition – any function as a sum of sine waves – is the spectral decomposition of an operator. The sine and cosine functions are the eigenfunctions of the second-derivative operator \(d^2/dx^2\). When we decompose a sound into its overtone components, we are performing an eigenvalue decomposition.
The same mathematics underlies technologies you use every day:
MP3 compression: Music is divided into short blocks. Each block is decomposed into frequency components via the modified discrete cosine transform (MDCT). Inaudible components are discarded – guided by a psychoacoustic model that accounts for the critical bands of the cochlea. The MDCT is a discrete version of the Fourier decomposition – an eigenvalue decomposition on a finite grid.
JPEG image compression: The image is divided into 8×8-pixel blocks. Each block is decomposed into frequency components via the discrete cosine transform (DCT). High-frequency components (fine detail) are quantized more coarsely than low-frequency components (coarse shapes). The DCT basis functions, the discrete eigenfunctions on a finite grid, are the two-dimensional analogues of Chladni figures.
Speech recognition: Audio is divided into short windows. Each window is transformed to its spectrum via the Fast Fourier Transform (FFT). Then Mel-frequency cepstral coefficients (MFCCs) are computed – a representation modeled on the logarithmic frequency resolution of the cochlea. Here too: eigenvalue decomposition, inspired by the biology of the ear.
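The block-transform idea behind these formats can be sketched in a few lines. This is a simplified one-dimensional illustration, not the JPEG or MP3 algorithm: it uses an orthonormal DCT-II on a single 8-sample block, and `keep_low` is a hypothetical stand-in for real quantization that simply zeroes the higher coefficients.

```python
import math

def dct(block):
    """Orthonormal DCT-II of a 1D block."""
    n = len(block)
    out = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(scale * sum(x * math.cos(math.pi * (j + 0.5) * k / n)
                               for j, x in enumerate(block)))
    return out

def idct(coeffs):
    """Inverse of dct(): rebuild the block from its cosine coefficients."""
    n = len(coeffs)
    block = []
    for j in range(n):
        total = 0.0
        for k, c in enumerate(coeffs):
            scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
            total += scale * c * math.cos(math.pi * (j + 0.5) * k / n)
        block.append(total)
    return block

def keep_low(coeffs, count):
    """Crude stand-in for quantization: zero all but the first `count` coefficients."""
    return [c if k < count else 0.0 for k, c in enumerate(coeffs)]

if __name__ == "__main__":
    ramp = [float(v) for v in range(8)]      # a smooth 8-sample gradient
    coeffs = dct(ramp)
    approx = idct(keep_low(coeffs, 3))
    print([round(c, 3) for c in coeffs])
    print([round(v, 2) for v in approx])
```

For a smooth block such as the ramp, almost all of the energy sits in the first few coefficients, which is why discarding or coarsely storing the rest costs little visible or audible quality.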
The Grand Connection
Let us make the connections explicit:
| Domain | Operator | Eigenfunctions | Eigenvalues |
|---|---|---|---|
| String | \(-d^2/dx^2\) | \(\sin(n\pi x/L)\) | \((n\pi/L)^2 \to f_n\) |
| Quantum mechanics | \(-\frac{\hbar^2}{2m}\nabla^2 + V\) | \(\psi_n(x)\) | \(E_n\) (energy levels) |
| Chladni plate | \(-\nabla^2\) | \(\sin(m\pi x/a)\sin(n\pi y/b)\) | \(\lambda_{mn}\) (frequencies) |
| MP3/JPEG | DCT matrix | Cosine basis functions | Frequency coefficients |
| AI (kernel) | Kernel matrix \(K\) | Mercer eigenfunctions | \(\lambda_n\) (spectrum) |
Five different domains, one pattern: an operator with boundary conditions produces a discrete spectrum of eigenvalues and eigenfunctions. The spectrum determines everything – which tones are possible, which energies are allowed, which patterns form on a plate, which information is preserved during compression, what an algorithm learns.
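The last row of the table can be made concrete with a toy example. The sketch below assumes a Gaussian kernel on six one-dimensional points (both the kernel and the points are arbitrary choices made here, not taken from the text) and uses plain power iteration to find the largest eigenvalue \(\lambda\) and its eigenvector \(\boldsymbol{\alpha}\) of the kernel matrix \(K\).

```python
import math

def gaussian_kernel_matrix(points, sigma=1.0):
    """K[i][j] = exp(-(x_i - x_j)^2 / (2 sigma^2)) for 1D points."""
    return [[math.exp(-((p - q) ** 2) / (2 * sigma ** 2)) for q in points]
            for p in points]

def matvec(K, v):
    """Matrix-vector product K v for a list-of-lists matrix."""
    return [sum(k_ij * v_j for k_ij, v_j in zip(row, v)) for row in K]

def top_eigenpair(K, iterations=500):
    """Power iteration: repeatedly apply K and normalize."""
    v = [1.0] * len(K)
    for _ in range(iterations):
        w = matvec(K, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    lam = sum(a * b for a, b in zip(v, matvec(K, v)))  # Rayleigh quotient
    return lam, v

if __name__ == "__main__":
    K = gaussian_kernel_matrix([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
    lam, v = top_eigenpair(K)
    print(round(lam, 4), [round(x, 3) for x in v])
```

The leading eigenvector comes out smooth and single-signed, much like the fundamental mode of a string; eigenvectors belonging to smaller eigenvalues oscillate more.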
Fourier and the Sound of Mathematics
In 1822, Jean-Baptiste Joseph Fourier published his Théorie analytique de la chaleur – a treatise on heat conduction. His central claim, which he had first presented in 1807: any (sufficiently well-behaved) function can be represented as a sum of sines and cosines. The mathematical world was skeptical. Lagrange had objected to the 1807 version that this couldn’t hold for discontinuous functions.
Fourier was partly right and partly wrong – the precise convergence conditions took well over a century to resolve (Dirichlet in 1829, Carleson in 1966). But the core idea proved to be one of the most fruitful in all of mathematics. Fourier analysis permeates physics, engineering, signal processing, medical imaging, cryptography, and – as we have seen – music.
It is no coincidence that the eigenfunctions of the wave equation are sine waves. It is a consequence of symmetry: the second-derivative operator is translation-invariant, and the complex exponentials \(e^{ikx}\), whose real and imaginary parts are cosine and sine waves, are the only bounded functions that merely scale by a factor under translation. In the language of group theory: they are the irreducible representations of the translation group. The harmonic series follows from the symmetry of space itself.
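The scaling property can be checked numerically. In the sketch below (the wave number `k`, the shift, and the sample points are arbitrary assumptions), translating \(e^{ikx}\) by a distance \(a\) multiplies it by the constant factor \(e^{ika}\) at every point. A single real sine wave picks up a cosine component instead, so the clean scaling appears once sine and cosine are combined into \(e^{ikx}\).

```python
import cmath
import math

def translation_ratios(f, k, shift, xs):
    """Return f(k*(x + shift)) / f(k*x) for each sample point x."""
    return [f(k * (x + shift)) / f(k * x) for x in xs]

def complex_wave(phase):
    """e^{i * phase}: cosine in the real part, sine in the imaginary part."""
    return cmath.exp(1j * phase)

if __name__ == "__main__":
    k, shift = 3.0, 0.4
    xs = [0.1, 0.7, 1.3, 2.9]
    print(translation_ratios(complex_wave, k, shift, xs))  # one common factor
    print(translation_ratios(math.sin, k, shift, xs))      # ratio varies with x
```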
Status: The harmonic series follows from an eigenvalue problem: \(X'' = -\lambda X\) with boundary conditions. The same mathematics describes quantum mechanics, Chladni figures, JPEG compression, and AI. Fourier analysis is eigenvalue decomposition. Everything is vibration – and every vibration has a spectrum.
Epilogue
Pythagoras Was Right – and Wrong
Pythagoras believed the universe was built from number ratios. The music of the spheres – the notion that the planets in their orbits produce tones obeying harmonic ratios – was for him not metaphor but literal truth.
He was wrong in the details: planets do not produce audible tones. Pythagorean tuning fails at its own comma. And the emotional impact of music is culturally shaped, not mathematically determined.
But he was right about the core idea: there is a deep connection between mathematics and perception. The harmonic series is not an invention of music theory, but an eigenvalue spectrum. Consonance is not an aesthetic whim, but a consequence of the physical resolution of the cochlea. And the Fourier decomposition – Pythagoras’ dream of reducing everything to integer ratios, extrapolated to infinity – is the mathematical tool that connects quantum mechanics, signal processing, and artificial intelligence.
In this blog, we have now seen three facets of the same Glass Bead Game pattern:
Quantum mechanics: \(\hat{H}\psi = E\psi\) — eigenvalues determine the allowed energies
Music: \(X'' = -\lambda X\) — eigenvalues determine the overtone frequencies
AI: \(K\boldsymbol{\alpha} = \lambda\boldsymbol{\alpha}\) — eigenvalues determine what is learned
Three different stages, the same principle: an operator with boundary conditions produces a discrete spectrum. Physics forces nature into discrete modes – whether those are vibration modes of a string, energy levels of an atom, or eigenvectors of a kernel matrix.
Pythagoras heard numbers in music. Schrödinger found the same numbers in the atom. And today we find them in the algorithms that model our language.
The music of the spheres may not exist in outer space. But it exists in mathematics – and every time you pluck a string, open a JPEG, or query an AI, you hear an echo of it.
Perhaps Hermann Hesse was right when he wrote in The Glass Bead Game: “Music and mathematics … share almost the same attitude toward the mind, almost the same degree of rigor and precision in their results.”
Frequently Asked Questions
Why does major sound happy and minor sound sad?
Major (4:5:6) contains consecutive partials of a single harmonic series – the brain interprets it as a coherent source. Minor (10:12:15) is more ambiguous. The emotional valuation is also culturally shaped, however: the Tsimane in the Bolivian rainforest, who have had little exposure to Western music, show no preference for consonant over dissonant chords.
What is the Pythagorean comma?
The Pythagorean comma is the ratio \(3^{12}/2^{19} \approx 1.0136\) (23.46 cents). It shows that 12 pure fifths do not exactly equal 7 octaves. This is a mathematical theorem: \(3^n \neq 2^k\) for positive integers. It forces every tuning system into compromises.
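The arithmetic is easy to reproduce. This short sketch uses exact fractions from the Python standard library; the helper `cents` is an illustrative name for the usual conversion of a frequency ratio to cents (1200 cents per octave).

```python
import math
from fractions import Fraction

def cents(ratio):
    """Size of a frequency ratio in cents (1200 cents = one octave)."""
    return 1200 * math.log2(ratio)

# Twelve pure fifths (3/2 each) versus seven octaves (2/1 each).
twelve_fifths = Fraction(3, 2) ** 12
seven_octaves = Fraction(2, 1) ** 7
comma = twelve_fifths / seven_octaves

if __name__ == "__main__":
    print(comma)                            # 531441/524288
    print(round(float(comma), 4))           # 1.0136
    print(round(cents(float(comma)), 2))    # 23.46
```

Because the numerator is a power of 3 and the denominator a power of 2, the fraction can never reduce to exactly 1.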
What does music have to do with eigenvalues?
The harmonic series of a string follows from an eigenvalue problem: \(X'' = -\lambda X\) with boundary conditions. The same mathematics describes quantum mechanics (Schrödinger equation), image compression (DCT), and machine learning (kernel eigenvalues). Eigenvalues are the common thread.
Why does a piano have exactly 12 tones per octave?
Twelve is the denominator of a continued-fraction approximation to \(\log_2(3/2)\) that provides both a good fifth (\(7/12\) of an octave, just under 2 cents off) and a usable third while remaining practical for a keyboard. Better approximations (e.g., 53 tones) would be mathematically superior but unmanageable for human hands.
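The candidates can be listed by expanding \(\log_2(3/2)\), the size of a pure fifth measured in octaves, as a continued fraction. The sketch below is a floating-point illustration that is only reliable for the first handful of terms; `convergents` and `fifth_error_cents` are names chosen for this example.

```python
import math
from fractions import Fraction

def convergents(x, count):
    """First `count` continued-fraction convergents of x as Fractions."""
    result = []
    h_prev, h = 1, int(math.floor(x))   # numerators
    k_prev, k = 0, 1                    # denominators
    result.append(Fraction(h, k))
    frac = x - math.floor(x)
    while len(result) < count and frac > 1e-12:
        x = 1 / frac
        a = int(math.floor(x))
        frac = x - a
        h_prev, h = h, a * h + h_prev
        k_prev, k = k, a * k + k_prev
        result.append(Fraction(h, k))
    return result

def fifth_error_cents(step_fraction):
    """Deviation of a tempered fifth (as a fraction of an octave) from 3:2."""
    return 1200 * (float(step_fraction) - math.log2(3 / 2))

if __name__ == "__main__":
    for c in convergents(math.log2(3 / 2), 7):
        print(c, round(fifth_error_cents(c), 3))
```

The denominators that appear include 5, 12, 41, and 53: each is a number of equal steps per octave whose fifth is closer to 3:2 than any smaller division manages.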
What is the missing fundamental?
When only overtone frequencies are present (e.g., 400, 500, 600 Hz), the brain perceives the pitch 100 Hz – even though it is physically absent. The auditory system recognizes the pattern of evenly spaced overtones and reconstructs the fundamental from them. This is why you can hear bass even on small speakers.
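As a rough model, the implied fundamental is the largest frequency of which every partial is a whole-number multiple. The sketch below makes that simplifying assumption and works with integer frequencies in Hz; the example partials are chosen here for illustration, and real pitch perception is more tolerant and more subtle than a greatest common divisor.

```python
from functools import reduce
from math import gcd

def implied_fundamental(partials_hz):
    """Largest frequency of which every partial is an integer multiple."""
    return reduce(gcd, partials_hz)

if __name__ == "__main__":
    print(implied_fundamental([600, 800, 1000]))   # 200
    print(implied_fundamental([440, 660, 880]))    # 220
```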