Blog Post · Mathematics · Growth · Chance

Why e Is Special

2.718… – an unassuming number. And yet it shows up in bank interest, in the decay of uranium, in the distribution of prime numbers, in the probability that nobody gets their own hat back, and in the loss that an artificial intelligence minimizes. A journey in ten chapters.

KI-Mathias · ~45 min read

Chapter 1

Why 2.718… of all numbers?

Some numbers are forced into existence by mathematics itself. The number \(\pi\) appears because circles exist. The imaginary unit \(i\) appears because some quadratic equations have no real solutions. And then there is \(e\).

\(e = 2.71828182845904523536\ldots\)

Not a nice round number. No obvious geometric meaning. And yet it confronts anyone who digs deep enough into physics, biology, statistics, computer science, or finance. It doesn't appear because we invite it – it's already there before we go looking.

This chapter is an inventory. Before we explain why e appears everywhere, let's first establish where. Here are twelve places where e is waiting:

Compound interest: \(\lim_{n\to\infty}\bigl(1+\tfrac{1}{n}\bigr)^n = e\)
Exponential growth: Bacteria, viruses, capital: \(y = y_0 \cdot e^{kt}\)
Radioactive decay: C-14 with half-life 5,730 years: \(N(t) = N_0 \cdot e^{-\lambda t}\)
Newton's law of cooling: Coffee cools by \(T(t) = T_\infty + (T_0 - T_\infty)\cdot e^{-kt}\)
Eigenfunction: \(\tfrac{d}{dx}e^x = e^x\) – the only function that is its own derivative
Prime numbers: The density of primes near \(x\) is \(\approx 1/\ln x\)
Derangements: P(no hat fits) \(\to 1/e\)
Secretary problem: Optimal strategy: reject the first \(n/e\)
Normal distribution: Bell curve \(\propto e^{-x^2/2}\)
Boltzmann factor: \(p(E) \propto e^{-E/kT}\) – how nature distributes energy
Catenary: Hanging chain: \(y = a\cosh(x/a)\), where \(\cosh x = \tfrac{e^x+e^{-x}}{2}\)
Shannon entropy: \(H = -\sum p_i \ln p_i\) – ln is the logarithm base e

Twelve appearances, twelve different disciplines. What do they have in common? That's the question we'll pursue over the next nine chapters. The answer, given upfront: e appears whenever a quantity changes in proportion to itself. The rate of change equals the value itself. That's all. And it's an enormous amount.

Let's begin at the beginning – with a question from the year 1683, when a Swiss mathematician asked about bank interest and accidentally produced the most important number in calculus.

Chapter 2

Bernoulli's Question

Jakob Bernoulli was no romantic mathematician stumbling upon laws of nature in solitary moments. He was a pragmatist from Basel, and in 1683 he was occupied with a thoroughly down-to-earth question: how much money do you get if you compound interest not annually, but more and more frequently?

The Compound Interest Experiment

Imagine we invest 1 euro at 100% annual interest. Annual compounding gives after one year: \(1 \cdot (1 + 1)^1 = 2\) euros.

Semi-annual compounding – twice at 50%:

$$\left(1 + \frac{1}{2}\right)^2 = 1.5^2 = 2.25$$

Quarterly – four times at 25%:

$$\left(1 + \frac{1}{4}\right)^4 \approx 2.4414$$

Monthly:

$$\left(1 + \frac{1}{12}\right)^{12} \approx 2.6130$$

Daily:

$$\left(1 + \frac{1}{365}\right)^{365} \approx 2.7146$$

Hourly, by the minute, by the second – the value keeps growing, but more and more slowly. Bernoulli recognized that this process converges to a limit. He could pin it between 2 and 3, but didn't know its exact value. The name came later: Leonhard Euler designated it with the letter \(e\) in 1731.

$$e = \lim_{n \to \infty} \left(1 + \frac{1}{n}\right)^n = 2.71828182845904\ldots$$

This is not a number anyone invented. It was discovered – as the limit of a completely natural process. And it is irrational and even transcendental: it cannot be written as a fraction, and it satisfies no algebraic equation.
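If you want to watch the convergence yourself, a few lines of Python suffice (a quick numerical sketch, not part of Bernoulli's story):

```python
# Compute (1 + 1/n)^n for ever-finer compounding and watch it approach e.
import math

for n in [1, 2, 4, 12, 365, 10**6]:
    print(f"n = {n:>7}: (1 + 1/n)^n = {(1 + 1/n)**n:.6f}")
print(f"           e             = {math.e:.6f}")
```

The gap to \(e\) shrinks roughly like \(e/(2n)\) – visible growth at first, then a long, slow crawl.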

The Series Representation

There's another route to \(e\). Expanding \((1+x/n)^n\) for large \(n\) with the binomial theorem yields, in the limit, the exponential series:

$$e = \sum_{n=0}^{\infty} \frac{1}{n!} = 1 + 1 + \frac{1}{2} + \frac{1}{6} + \frac{1}{24} + \frac{1}{120} + \cdots$$

Just the first five terms give \(1 + 1 + 0.5 + 0.1\overline{6} + 0.041\overline{6} = 2.708\overline{3}\) – only 0.4% away from the true value. The series converges breathtakingly fast, because the factorial in the denominator grows explosively.
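The speed of convergence is easy to verify – a short sketch of the partial sums:

```python
# Partial sums of e = sum over n of 1/n! — rapid convergence,
# because the factorial in the denominator explodes.
import math

s = 0.0
for n in range(10):
    s += 1 / math.factorial(n)
    print(f"after 1/{n}!: partial sum = {s:.10f}")
print(f"e           = {math.e:.10f}")
```

Ten terms already agree with \(e\) to better than \(10^{-6}\).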

Continuous Compounding

What does this mean for the bank? If you invest capital \(K_0\) at continuous interest rate \(r\), it grows after time \(t\) to:

$$K(t) = K_0 \cdot e^{rt}$$

This is not an approximation – it's the exact formula for continuous compounding. 1,000 euros at 5% continuous annual interest for 10 years: \(1000 \cdot e^{0.05 \cdot 10} = 1000 \cdot e^{0.5} \approx 1648.72\) euros.
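A quick check of the bank example (a sketch; the helper name `grow` is mine, purely illustrative):

```python
# Continuous compounding: K(t) = K0 * exp(r * t).
import math

def grow(K0, r, t):
    """Capital after time t under continuous compounding at rate r."""
    return K0 * math.exp(r * t)

print(round(grow(1000, 0.05, 10), 2))  # 1648.72
```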

Chapter 2 in one sentence:

e emerges as the natural limit of the compound interest process: the more frequently you compound, the closer you get to e – but never beyond it.

Chapter 3

The Only Function That Equals Itself

Here's a question that sounds strange at first: Which function is its own derivative?

That means: if you compute the slope of \(f\) at every point, you get \(f\) itself back. Mathematically: \(f'(x) = f(x)\).

There is exactly one family of answers: \(f(x) = C \cdot e^x\) for any constant \(C\). And that's no coincidence – it's the deepest property of \(e\).

e^x as the Eigenfunction of Differentiation

Anyone who has read the post on eigenvalues knows the principle: a linear operator \(L\) has eigenfunctions – functions that the operator doesn't change, only scales. The operator \(L = d/dx\) (differentiation) has the eigenfunctions \(e^{\lambda x}\), each scaled by its eigenvalue \(\lambda\). One member of this family stands out:

$$\frac{d}{dx} e^x = e^x$$

The eigenvalue is \(\lambda = 1\). The function \(e^x\) is scaled by a factor of 1 under differentiation – meaning it isn't changed at all.

What about other bases? Take \(2^x\). Its derivative is:

$$\frac{d}{dx} 2^x = 2^x \cdot \ln 2 \approx 0.693 \cdot 2^x$$

Not clean. You get the function back, but with a correction factor of \(\ln 2\). For \(3^x\) the factor is \(\ln 3 \approx 1.099\). For \(e^x\) the factor is \(\ln e = 1\) – exactly 1, nothing left over. That's why e is the natural base for exponential functions: it's the only base where differentiation produces no extra constants.

That's also why physicists almost never write \(2^x\) or \(10^x\) when they have a choice: they write \(e^{kx}\) with a constant \(k\) in the exponent. This isn't aesthetics – it's efficiency.
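The correction factors can be estimated numerically – a small sketch using a central difference (the step size `h` is an arbitrary choice):

```python
# Estimate the derivative of a^x at x = 0 with a central difference.
# The slope should come out as ln(a) — and as exactly 1 for a = e.
import math

h = 1e-6
for a in [2, 3, math.e]:
    slope = (a**h - a**(-h)) / (2 * h)   # numerical derivative at x = 0
    print(f"a = {a:.5f}: slope = {slope:.6f}, ln(a) = {math.log(a):.6f}")
```

For base 2 the slope is \(\approx 0.693\), for base 3 \(\approx 1.099\), and for base \(e\) exactly 1 (up to rounding).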

What Follows from This?

A function that is its own derivative has an important consequence: its growth depends only on itself. The slope at every point equals the current value. The larger the function, the faster it grows – and precisely proportionally so.

That's the equation \(f'(x) = f(x)\). And in the language of physics it becomes: the rate of change is proportional to the current state. This sounds abstract. In the next chapter we'll see what it means across four concrete disciplines.

Something to think about:

What other functions do you know that have a special relationship with differentiation? Sine and cosine return to themselves under double differentiation – with a sign change. No coincidence: \(\sin x = \text{Im}(e^{ix})\). Euler's formula connects everything.

Chapter 3 in one sentence:

\(e^x\) is the eigenfunction of the differentiation operator with eigenvalue 1 – the only function that remains completely unchanged by differentiation, without any correction factor.

Chapter 4

Growth, Decay, and a Single Equation

Here is the most important differential equation in the applied sciences:

$$\frac{dy}{dt} = k \cdot y$$

In plain English: the rate of change of \(y\) is proportional to \(y\) itself. The only solution is:

$$y(t) = y_0 \cdot e^{kt}$$

Depending on the sign of \(k\), the same equation describes completely different phenomena. Four examples from four disciplines:

Bacterial Growth (\(k > 0\))

An E. coli bacterium divides under ideal conditions every 20 minutes. Start: 1 bacterium. After 20 minutes: 2. After 40 minutes: 4. After one hour: 8. After 24 hours, the theoretical count would be \(2^{72} \approx 4.7 \times 10^{21}\) bacteria – comparable to the number of stars in the observable universe.

The growth coefficient is \(k = \ln 2 / T_\text{doubling} = \ln 2 / 20\,\text{min} \approx 0.0347\,\text{min}^{-1}\). And there appears \(\ln\) (the logarithm base e!) as the natural companion of e.

Radioactive Decay (\(k < 0\)): Carbon-14

C-14 is a radioactive carbon isotope formed in the atmosphere by cosmic radiation. Living organisms absorb it; after death the uptake stops and the C-14 decays. The half-life is 5,730 years.

$$N(t) = N_0 \cdot e^{-\lambda t}, \quad \lambda = \frac{\ln 2}{5730\,\text{yr}} \approx 1.21 \times 10^{-4}\,\text{yr}^{-1}$$

An old piece of wood still contains 73% of its original C-14. How old is it? We solve: \(0.73 = e^{-\lambda t}\), so \(t = -\ln(0.73)/\lambda \approx 2600\) years. That's radiocarbon dating – a measurement technique that rests entirely on \(e\).
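The dating calculation can be reproduced in a few lines (a sketch of the arithmetic above):

```python
# Radiocarbon dating: solve 0.73 = exp(-lambda * t) for the age t.
import math

half_life = 5730                   # years (C-14)
lam = math.log(2) / half_life      # decay constant lambda
fraction = 0.73                    # fraction of original C-14 remaining
t = -math.log(fraction) / lam
print(f"estimated age: {t:.0f} years")
```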

Newton's Law of Cooling (\(k < 0\)): Coffee

A coffee at 90°C sits in a 20°C room. Newton's law of cooling states: the cooling rate is proportional to the temperature difference between the coffee and its surroundings.

$$\frac{dT}{dt} = -k(T - T_{\infty}) \quad \Rightarrow \quad T(t) = T_{\infty} + (T_0 - T_{\infty}) \cdot e^{-kt}$$

With \(T_0 = 90\,{}^\circ\text{C}\), \(T_\infty = 20\,{}^\circ\text{C}\), \(k \approx 0.1\,\text{min}^{-1}\) (typical for a ceramic mug), the coffee has cooled after 10 minutes to \(20 + 70 \cdot e^{-1} \approx 20 + 25.7 = 45.7\,{}^\circ\text{C}\). Remarkably accurate.

Capacitor Discharge (\(k < 0\)): Electronics

A charged capacitor (capacitance \(C\), initial voltage \(V_0\)) discharges through a resistor \(R\). The current is proportional to the voltage. The voltage drops as:

$$V(t) = V_0 \cdot e^{-t/(RC)}$$

The product \(RC\) is called the time constant \(\tau\). After one time constant, the voltage has dropped to \(V_0/e \approx 36.8\%\). Every electronics engineer knows this number by heart.

Four completely different systems – biology, physics, thermodynamics, electrical engineering – one equation. This is no coincidence. It's the mathematical consequence of a single assumption: the rate of change is proportional to the current state.

Chapter 4 in one sentence:

\(dy/dt = ky\) – one equation, four disciplines, one solution: always \(y(t) = y_0 \cdot e^{kt}\). e is the universal answer to proportional change.

Chapter 5

e in the Primes

Now things get philosophically uncomfortable. Prime numbers seem to embody chaos itself: 2, 3, 5, 7, 11, 13, 17, 19, 23, 29 – no pattern, no rule that predicts the next prime. And yet their global distribution holds a remarkable regularity – and e plays the starring role.

Gauss and the Density of Primes

Carl Friedrich Gauss was 15 or 16 years old (sources disagree) when he was studying a table of prime numbers and noticed: the larger the numbers get, the more thinly scattered the primes become. And the density follows a pattern.

Let \(\pi(x)\) be the number of primes less than or equal to \(x\). Gauss observed empirically:

$$\pi(x) \approx \frac{x}{\ln x}$$

This is the prime number theorem, proved in 1896 by Hadamard and de la Vallée Poussin. The \(\ln\) here is the natural logarithm – base e.

Concretely: near the number 1,000,000, roughly one in every \(\ln(10^6) = 6 \ln 10 \approx 13.8\) numbers is prime. The actual count of primes up to one million is 78,498; the formula gives \(10^6 / \ln(10^6) \approx 72{,}382\) – an error of under 8%.
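Both numbers can be verified with a simple sieve of Eratosthenes (a sketch; the helper `prime_count` is mine, not from the original post):

```python
# Compare the actual prime count pi(x) with the estimate x / ln(x).
import math

def prime_count(x):
    """Count primes <= x with a sieve of Eratosthenes."""
    sieve = bytearray([1]) * (x + 1)
    sieve[0] = sieve[1] = 0
    for p in range(2, math.isqrt(x) + 1):
        if sieve[p]:
            # Strike out all multiples of p starting at p*p.
            sieve[p*p::p] = bytearray(len(range(p*p, x + 1, p)))
    return sum(sieve)

x = 10**6
print(prime_count(x), round(x / math.log(x)))  # 78498 72382
```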

Why Does e Appear?

That's the deeper question. Primes are entirely discrete, entirely deterministic – and yet their distribution behaves as if it were a continuous, exponentially thinning process. The logarithm is the inverse of the exponential function. Its appearance in prime density hints that primes are evenly distributed on a logarithmic scale.

A more precise statement: if you pick a large number \(n\) at random, the probability that it's prime is approximately \(1/\ln n\). Primes become ever rarer – logarithmically, in the rhythm of e.

Riemann and the Error Terms

Gauss knew his formula was only asymptotically correct. Bernhard Riemann showed in 1859 how to describe the deviations precisely: through the zeros of the Riemann zeta function. The famous Riemann hypothesis – that all non-trivial zeros lie on the line \(\text{Re}(s) = 1/2\) – remains unproven to this day and is considered the most important unsolved problem in mathematics.

But that's another story. The key takeaway: e appears in the primes not because primes "grow" or "decay." It appears because the logarithmic scale is the natural scale for multiplicative structures. And the natural base of the logarithm is e.

Chapter 5 in one sentence:

The density of primes near \(x\) is \(\approx 1/\ln x\) – e appears not in the primes themselves, but in their statistical distribution, because primes are evenly scattered on a logarithmic scale.

Chapter 6

e in Chance

e also turns up in situations that, at first glance, have nothing to do with growth or logarithms: in pure combinatorics and probability theory. Three classic problems, all with e as the answer.

6a: The Hat Problem – Derangements

Imagine: \(n\) people check their hats at a cloakroom. The attendant has lost the tickets and returns the hats at random. What's the probability that nobody gets their own hat?

A permutation in which no element ends up in its original position is called a derangement. The number of derangements of \(n\) elements is:

$$D(n) = n! \cdot \sum_{k=0}^{n} \frac{(-1)^k}{k!}$$

The probability that a random permutation is a derangement equals:

$$P(D_n) = \frac{D(n)}{n!} = \sum_{k=0}^{n} \frac{(-1)^k}{k!} \approx e^{-1} = \frac{1}{e} \approx 0.3679$$

For \(n = 2\): exactly 50%. For \(n = 3\): \(2/6 = 33.3\%\). For \(n = 10\): already 36.79% – virtually indistinguishable from \(1/e\). From \(n \geq 5\) onward the probability stabilizes at \(1/e\), regardless of how many people are involved.

This is striking: the probability of total chaos is universal and depends on e. (Strictly speaking, this isn't a coincidence but a consequence of the exponential series: \(e^{-1} = \sum_{k=0}^\infty (-1)^k/k!\).)
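The convergence to \(1/e\) is easy to compute exactly (a small sketch of the alternating sum above):

```python
# Exact derangement probability P(D_n) = sum_{k=0..n} (-1)^k / k!,
# compared against 1/e.
import math

def p_derangement(n):
    return sum((-1)**k / math.factorial(k) for k in range(n + 1))

for n in [2, 3, 5, 10]:
    print(f"n = {n:>2}: P = {p_derangement(n):.6f}")
print(f"1/e    : {1/math.e:.6f}")
```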

6b: The Secretary Problem – the Optimal Strategy

You must choose the best from \(n\) applicants. You interview them one by one in random order. After each interview you must decide immediately: hire or pass. You can't go back to earlier candidates. How do you maximize your chance of finding the best?

The optimal strategy: reject the first \(r\) candidates outright (the learning phase), then hire the next one who is better than everyone before. Which \(r\) maximizes the probability of success?

The answer: \(r^* \approx n/e\). So you reject the first \(1/e \approx 37\%\) of applicants, then hire the next one who surpasses all previous. The success probability of this strategy converges to – you guessed it – \(1/e \approx 37\%\).

$$r^* = \lfloor n/e \rfloor, \qquad P(\text{choose best}) \to \frac{1}{e} \approx 0.3679$$

This is initially sobering: even the optimal strategy finds the best candidate only 37% of the time. But there's no better one. And both the optimal rejection threshold and the maximum success probability are \(1/e\) – the same number, from the same limit.
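The 37% strategy can be tested by simulation – a Monte Carlo sketch (candidate count, trial count, and seed are arbitrary choices of mine):

```python
# Monte Carlo check of the 1/e strategy: reject the first ~n/e candidates,
# then take the first one who beats everybody seen so far.
import math, random

def pick_best_rate(n, trials=50_000, rng=random.Random(0)):
    r = round(n / math.e)              # learning phase: reject these outright
    wins = 0
    for _ in range(trials):
        ranks = list(range(n))         # 0 = worst, n-1 = best candidate
        rng.shuffle(ranks)
        threshold = max(ranks[:r])     # best seen during the learning phase
        # First later candidate who beats the threshold; else forced to take the last.
        chosen = next((c for c in ranks[r:] if c > threshold), ranks[-1])
        wins += chosen == n - 1
    return wins / trials

print(f"success rate: {pick_best_rate(100):.3f}  (1/e = {1/math.e:.3f})")
```

For \(n = 100\) the simulated success rate lands close to \(0.37\) – just as the theory promises.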

6c: The Bell Curve

The normal distribution – the Gaussian bell curve – is the most commonly occurring distribution in nature. Its density is:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \cdot e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

The decisive term: \(e^{-x^2/2}\). The bell curve is the exponential function, base e, applied to the negative square of the distance from the mean. The further a value is from the mean, the exponentially less likely it is. This isn't a design choice – the central limit theorem guarantees that sums of many independent random variables converge to the normal distribution.

Chapter 6 in one sentence:

e appears in probability through the exponential series: the probability of total chaos (derangements), the optimal rejection threshold (secretary problem), and the bell curve all converge to expressions involving \(e^{-1}\).

Chapter 7

e in Physics

Physics is full of e. Not because physicists find exponential functions beautiful – but because nature loves proportionality. Two particularly elegant examples: the distribution of energy in thermal systems, and the hanging chain.

The Boltzmann Factor: How Nature Distributes Energy

In a system in thermal equilibrium at temperature \(T\), the following holds: the probability of finding a state with energy \(E\) occupied is proportional to:

$$p(E) \propto e^{-E/(k_B T)}$$

This is the Boltzmann factor, named after Ludwig Boltzmann. \(k_B \approx 1.38 \times 10^{-23}\,\text{J/K}\) is Boltzmann's constant.

What does it say? The higher the energy of a state, the exponentially less likely it is at a given temperature. States with very high energy are exponentially rare. This applies to gas molecules, electrons in solids, chemical reactions (transition states), and cosmic radiation.

Why e? Because entropy is additive (independent systems add their entropies) and probability is multiplicative (independent systems multiply their probabilities). The only continuous function that converts addition into multiplication is the exponential function – base e.

This is no coincidence. It is the deepest reason why e is omnipresent in physics.

The Catenary: Gaudí's Models

What shape does a heavy chain take when suspended from two points? You might guess: a parabola. Galileo Galilei thought so. But he was wrong.

Leibniz, Huygens, and Johann Bernoulli (Jakob's brother) solved the problem in 1691: the shape is a catenary (Latin: catenaria), described by:

$$y = a \cdot \cosh\!\left(\frac{x}{a}\right) = a \cdot \frac{e^{x/a} + e^{-x/a}}{2}$$

e appears here twice over: \(\cosh\) is the hyperbolic cosine, defined as the average of \(e^x\) and \(e^{-x}\).

The Catalan architect Antoni Gaudí knew this result and used it brilliantly: he built hanging chain models from weights and strings for the Sagrada Família and then flipped them upside down. An inverted catenary stands under pure compression – no bending, no breaking. The Gateway Arch in St. Louis (1965) also takes the form of a (slightly modified) catenary.

Chapter 7 in one sentence:

The Boltzmann factor \(e^{-E/kT}\) arises because probability is multiplicative and entropy is additive – e is the only bridge. The catenary \(\cosh(x/a)\) is literally assembled from \(e^x\) and \(e^{-x}\).

Chapter 8

e in Information

Claude Shannon defined in 1948 how to measure information. He called it entropy, borrowing from the thermodynamic term – no coincidence, since the mathematics is identical:

$$H = -\sum_{i} p_i \cdot \ln p_i$$

When you use the natural logarithm (base e) instead of \(\log_2\), you measure entropy in "nats" instead of bits. In theory these are equivalent; in practice – especially in machine learning – nats are preferred.

Cross-Entropy in AI

Large language models like GPT or Claude are trained with a loss measure called cross-entropy loss:

$$\mathcal{L} = -\sum_{i} p_i \cdot \ln q_i$$

Here \(p\) is the true distribution (e.g., the next correct word) and \(q\) is the distribution predicted by the model. The model learns by minimizing \(\mathcal{L}\) – and this \(\mathcal{L}\) is full of \(\ln\), which means full of e.
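A minimal sketch of cross-entropy in nats (the two distributions are toy values of mine, not from a real model):

```python
# Cross-entropy in nats: H(p, q) = -sum_i p_i * ln(q_i).
import math

def cross_entropy(p, q):
    # Skip terms with p_i = 0 (their contribution is zero by convention).
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

p = [1.0, 0.0, 0.0]          # true next token (one-hot)
q = [0.7, 0.2, 0.1]          # model's predicted distribution
print(f"{cross_entropy(p, q):.4f} nats")  # -ln(0.7) ≈ 0.3567
```

The better the model's probability on the correct token, the smaller the loss – at \(q_1 = 1\) it would be exactly 0 nats.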

So whenever you prompt an LLM (like the one that may have helped format this text), you're using a model whose training ran this calculation millions of times per step – with e as the base of every logarithm. Euler would have appreciated that.

Anyone who has read the post on emergence in language models will find there the connection between cross-entropy and the emergence of complex language capabilities.

Why the Natural Logarithm?

You could measure entropy with \(\log_2\) instead. But then a factor of \(\ln 2 \approx 0.693\) would appear everywhere. The natural logarithm is the "right" base because it delivers the derivative \(\frac{d}{dx} \ln x = 1/x\) without any prefactors – keeping all optimizations clean.

Chapter 8 in one sentence:

Shannon entropy \(H = -\sum p_i \ln p_i\) and the cross-entropy loss of modern AI are both formulated in nats (base e), because the natural logarithm produces no correction factors.

Chapter 9

Stirling's Formula

Here's a surprise: e also appears in factorials. The factorial \(n!\) (read: "n factorial") is the product of all natural numbers from 1 to n. It grows astonishingly fast: \(10! = 3{,}628{,}800\), \(20! \approx 2.43 \times 10^{18}\), \(100! \approx 9.33 \times 10^{157}\).

James Stirling developed a remarkable approximation in 1730:

$$n! \approx \sqrt{2\pi n} \cdot \left(\frac{n}{e}\right)^n$$

Look at that: two of the most important constants in mathematics appear side by side – \(e\) and \(\pi\). Where does \(e\) come from? From the \((n/e)^n\) term. Where does \(\pi\) come from? From the Gaussian bell curve (more precisely: from the Wallis product and the Gauss integral \(\int_{-\infty}^\infty e^{-x^2}\,dx = \sqrt{\pi}\)).

How Good Is the Approximation?

For \(n = 10\): Stirling gives \(\sqrt{20\pi} \cdot (10/e)^{10} \approx 3{,}598{,}696\), whereas \(10! = 3{,}628{,}800\). Error: under 1%. For \(n = 100\) the relative error is less than 0.08%. As \(n \to \infty\) it is asymptotically zero.
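The error figures can be reproduced directly (a sketch of Stirling's formula against the exact factorial):

```python
# Stirling's approximation n! ≈ sqrt(2*pi*n) * (n/e)^n vs. the exact value.
import math

def stirling(n):
    return math.sqrt(2 * math.pi * n) * (n / math.e) ** n

for n in [10, 20, 100]:
    rel_err = abs(stirling(n) - math.factorial(n)) / math.factorial(n)
    print(f"n = {n:>3}: relative error = {rel_err:.2%}")
```

The relative error shrinks roughly like \(1/(12n)\): under 1% at \(n = 10\), under 0.1% at \(n = 100\).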

Why e in Factorials?

Intuitively: \(n! = \prod_{k=1}^{n} k\). Taking the logarithm turns the product into a sum: \(\ln(n!) = \sum_{k=1}^n \ln k\). This sum can be approximated by an integral: \(\int_1^n \ln x\,dx = n\ln n - n + 1 \approx n\ln n - n\). Exponentiated, that gives \(n^n \cdot e^{-n}\) – and so e ends up inside \(n!\).

Stirling's formula is indispensable in combinatorics. Whenever you ask: "How many ways are there to arrange \(n\) things?" – in entropy calculations, statistical mechanics, information theory – you need factorials, and hence Stirling, and hence e.

Chapter 9 in one sentence:

\(n! \approx \sqrt{2\pi n}\cdot(n/e)^n\) – Stirling's formula reveals that e (and \(\pi\)!) are embedded deep in the structure of factorials, because taking logarithms converts products into sums that can be expressed via \(e^{-n}\).

Chapter 10

Why e and Not 2 or 3 or \(\pi\)?

We've now seen e in twelve contexts. But one question has gone unanswered: Why e and not some other number? What makes e so special that nature reaches for it again and again?

Every Exponential Function Is e to the Power of Something

The first argument is purely algebraic. Every exponential function with any base can be expressed through e:

$$a^x = e^{x \cdot \ln a}$$

Examples: \(2^x = e^{x \ln 2}\), \(10^x = e^{x \ln 10}\), \(\pi^x = e^{x \ln \pi}\). This means: e is always at the core of every exponential function. You can hide it, but you can't get rid of it.

That's why every differential equation of the form \(y' = y\) (with \(a^x\) instead of \(e^x\)) is ultimately an equation about e – just with a scaling factor in the exponent. e is not one base among many; it is the natural base on which all other bases depend.

The Deepest Answer: Proportionality and Continuity

Imagine a function \(f\) with the following property: it converts addition in the input into multiplication in the output. That means:

$$f(x + y) = f(x) \cdot f(y)$$

This functional equation has – under minimal continuity assumptions, and apart from the trivial solution \(f \equiv 0\) – exactly one class of solutions: \(f(x) = e^{cx}\) for a constant \(c\). Every continuous, nonzero function that converts additive inputs into multiplicative outputs is an exponential function base e.

That is the deepest reason. e doesn't appear because mathematicians like it. It appears because the multiplicativity of growth, probability, and energy necessarily forces a specific base – and that base is e.

Euler's Identity: The Final Chord

When we apply the exponential series to imaginary arguments, we get Euler's formula:

$$e^{i\theta} = \cos\theta + i\sin\theta$$

This is no magic – it's a direct consequence of the exponential series. For \(\theta = \pi\) we get the most famous formula in mathematics:

$$e^{i\pi} + 1 = 0$$

Five fundamental constants – \(e\), \(i\), \(\pi\), \(1\), \(0\) – in a single equation. None of the five was invented for this purpose; they all arrived from different directions and met here.
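Even a computer can pay its respects – a two-line numerical check (floating point allows only an approximate zero):

```python
# Numerically verify Euler's identity e^{i*pi} + 1 = 0.
import cmath

z = cmath.exp(1j * cmath.pi) + 1
print(abs(z))  # not exactly 0, but ~1e-16: floating-point rounding
```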

In the post on the Glass Bead Game we interpreted \(e^{i\theta}\) as a rotation in the complex plane – a rotating pointer that generates cosine and sine as projections. This is no metaphor; it is literally what \(e^{i\theta}\) means.

Euler's identity is not an isolated curiosity. It shows that e is the connection between growth (the real exponential function), oscillation (sine/cosine), and rotation (complex numbers). All three phenomena – as different as they seem – are aspects of the same function.

Chapter 10 in one sentence:

e is the universal base because it is the only continuous solution to the functional equation \(f(x+y) = f(x)\cdot f(y)\) – and Euler's identity shows that growth, oscillation, and rotation are just three faces of the same exponential function.

Epilogue

The Number That Is Always Waiting

We've followed e through ten chapters. In Bernoulli's bank-interest question. In the derivative that returns itself. In the decay of C-14 and the cooling of coffee. In the uncanny thinning of primes around large numbers. In the chaotic muddle of hats and the cool strategy of the secretary problem. In the hanging chain that Gaudí flipped, and in the Boltzmann factor that governs how nature distributes energy. In Shannon's entropy and in the loss that AI models minimize every day. And in Stirling's formula, which shows that even factorials can't do without e.

What is e? It's not a circle number like \(\pi\), born from geometric intuition. It's not a logical necessity like \(i\), born from an equation with no solution. e arose from a pragmatic question about compound interest – and turned out to be the foundation beneath most of the changing processes we know.

The common thread through all chapters was a single equation:

$$\frac{dy}{dt} = k \cdot y$$

The rate of change is proportional to the state. This is no obscure differential equation – it's the mathematical formulation of proportionality itself. And e is the answer. Not because mathematicians chose it, but because it's the only consistent solution.

There's something slightly unsettling about this realization. The number that doubles bacteria, decays uranium, cools coffee, distributes primes, shuffles hats, describes hanging chains, and trains AI – it's the same one. 2.71828…

Perhaps that's what Leibniz meant with his ars combinatoria and what Hesse dreamed of in the Glass Bead Game: that behind the different languages of the disciplines, the same patterns are waiting. Not as metaphor, but literally. e is one of those patterns – clear, precise, relentless.

It lurks in every compound interest contract. In every radioactive sample. In every prime number table. In every neural network. It waits. It was already there before we went looking.

Frequently Asked Questions

What exactly is Euler's number e?

e = 2.71828182845904… is an irrational and even transcendental number. It is the limit of \((1+1/n)^n\) as \(n \to \infty\) and simultaneously the sum of the series \(\sum_{n=0}^\infty 1/n!\). Leonhard Euler introduced the letter e around 1731, although the number itself was implicitly discovered by Jakob Bernoulli in 1683.

Why does e appear in so many different fields?

e appears whenever a quantity changes in proportion to its current value: \(dy/dt = ky\). This applies to bacterial growth, radioactive decay, cooling, compound interest, and capacitor discharge. Additionally, e appears in probability through the exponential series and in information theory through the natural logarithm.

What does it mean that e^x is its own derivative?

\(d/dx\, e^x = e^x\) means that the slope of \(e^x\) at every point equals the function value itself. This makes \(e^x\) the eigenfunction of the differentiation operator with eigenvalue 1 – the only function with this property (up to scaling). Any other base \(a^x\) produces a correction factor \(\ln a\) upon differentiation.

What does e have to do with prime numbers?

The prime number theorem states that the count of primes up to \(x\) is asymptotically equal to \(x/\ln x\). The natural logarithm (base e) appears because primes are evenly distributed on a logarithmic scale. The probability that a randomly chosen number near \(n\) is prime is approximately \(1/\ln n\).

What is Euler's identity and why is it considered the most beautiful formula in mathematics?

\(e^{i\pi} + 1 = 0\) connects the five most fundamental constants of mathematics: e (natural base), i (imaginary unit), \(\pi\) (circle number), 1 (multiplicative identity), and 0 (additive identity). It follows directly from the exponential series: \(e^{ix} = \cos x + i\sin x\), substituted at \(x = \pi\). All five constants were discovered independently and meet in this one equation.