Chapter 1
Try It Yourself
Before we talk — fly first.
Everything runs in the browser. No install, no download.
Fly now →

Keyboard controls:

SPACE: Flap wings
A/D: Turn
W: Dive
S: Climb
T: Webcam on/off
Did it work? Good. Then let me tell you how this thing came to life in a single afternoon.
Chapter 2
Phase 1 — The Scaffold
A Blank Canvas
Every project starts with a blank canvas. In our case: a terminal, npm create vite@latest, and the decision to go with Three.js as the 3D engine. No Unity, no Unreal — pure JavaScript code that runs directly in the browser.
Why Three.js? Because it has the lowest barrier to entry. No plugin, no app store, no download. The user opens a URL and flies. That was the vision from the start.
The first scene was simple: a camera, a sky, light.
const scene = new THREE.Scene();
const sky = new Sky();
sky.scale.setScalar(450000);
scene.add(sky);
const camera = new THREE.PerspectiveCamera(60, w/h, 2, 8000);
const renderer = new THREE.WebGLRenderer({ antialias: true });
In 20 minutes, the first empty scene with a blue sky was up. No terrain, no trees, no bird. But a beginning. And a beginning is all you need to build momentum.
Chapter 3
Phase 2 — A World from Nothing
How do you build a landscape without placing a single polygon by hand?
The answer: with math. More precisely, with parabolas. Imagine a flat table. Now place invisible hills on top — each defined by a center point, a radius and a height. At every point on the terrain you ask: how many of these hills overlap here? And add up their contributions.
function getTerrainHeight(x, z, arcs) {
  let h = 0;
  for (const arc of arcs) {
    const dx = x - arc.cx, dz = z - arc.cz;
    const distSq = dx * dx + dz * dz;
    const contribution = 1 - distSq / (arc.radius * arc.radius);
    if (contribution > 0) h += arc.height * contribution;
  }
  return h;
}
900 positive arcs form the hills. 270 negative arcs carve out valleys. A radial falloff ensures the island gently slopes into the sea at its edges. The result: a landscape that looks slightly different on every load, yet always plausible.
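To make the stacking concrete, here is the height function from above applied to two hypothetical arcs (the values are invented; the real generator produces 1,170 of them):

```javascript
// Same height function as above, repeated so the example is self-contained.
function getTerrainHeight(x, z, arcs) {
  let h = 0;
  for (const arc of arcs) {
    const dx = x - arc.cx, dz = z - arc.cz;
    const distSq = dx * dx + dz * dz;
    const contribution = 1 - distSq / (arc.radius * arc.radius);
    if (contribution > 0) h += arc.height * contribution;
  }
  return h;
}

const arcs = [
  { cx: 0,  cz: 0, radius: 100, height: 40 },  // a hill
  { cx: 50, cz: 0, radius: 80,  height: -10 }, // a negative arc: a dip
];

getTerrainHeight(0, 0, arcs); // 40 from the hill, minus ~6.1 from the dip
```

At the origin the hill contributes its full height, while the overlapping negative arc subtracts part of its depth, scaled by distance. Where no arc reaches, the height is simply 0: sea level.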
But a beautiful landscape is useless if it stutters. Computing 900 arcs per vertex? A 400×400 grid has 160,000 vertices; times 900 arcs, that is over 140 million operations per frame. That won't fly.
Octree and Frustum Culling
The solution: an Octree divides the world into spatial cells. Instead of rendering all 100,000 trees and 900 buildings every frame, the engine simply asks: which cells are actually in the camera’s field of view? Frustum Culling discards everything outside the view cone. Out of 100,000 trees, typically only 3,000–5,000 are actually drawn.
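The query pattern can be sketched in plain JavaScript. This stand-in uses a flat grid of cells and an axis-aligned view rectangle instead of the project's real octree and frustum test (in Three.js that would be THREE.Frustum.intersectsBox), but the idea is identical: reject whole cells, not individual trees.

```javascript
// Sketch, not the project's code: bucket trees into spatial cells.
function buildCells(trees, cellSize) {
  const cells = new Map();
  for (const t of trees) {
    const key = `${Math.floor(t.x / cellSize)},${Math.floor(t.z / cellSize)}`;
    if (!cells.has(key)) cells.set(key, []);
    cells.get(key).push(t);
  }
  return cells;
}

// Keep only trees in cells that overlap the view bounds.
function visibleTrees(cells, cellSize, view) {
  const out = [];
  for (const [key, trees] of cells) {
    const [cx, cz] = key.split(',').map(Number);
    const minX = cx * cellSize, minZ = cz * cellSize;
    // Skip the whole cell if its box lies outside the view rectangle.
    if (minX + cellSize < view.minX || minX > view.maxX ||
        minZ + cellSize < view.minZ || minZ > view.maxZ) continue;
    out.push(...trees);
  }
  return out;
}
```

One cell test replaces hundreds of per-tree tests, which is why culling 100,000 trees down to a few thousand costs almost nothing per frame.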
Summary
900 parabolas + negative arcs + radial island falloff = a natural landscape, without ever placing a single vertex by hand.
Chapter 4
Phase 3 — The Bird Takes Flight
A Camera with a Fear of Heights
The terrain was there. But a bird that doesn’t fly is just a camera with a fear of heights.
The first approach was a simple arcade model: press a key, go up. Let go, sink down. It worked — technically. But it felt like an elevator with a panoramic window. No gliding, no lift, no flying.
So we built an aerodynamic model. The key insight: lift depends on the square of the velocity. That’s dynamic pressure — the same physics that keeps real birds and aircraft in the air.
const dynamicPressure = 0.5 * AIR_DENSITY * speed * speed;
const liftMag = dynamicPressure * wingArea * CL / BIRD_MASS;
velocity.y += liftMag * dt;  // lift from the wing's incidence angle
velocity.y += GRAVITY * dt;  // GRAVITY = -9.81 m/s²
The first wing flap produced… nothing. Gravity won. Every time. The bird fell like a rock, no matter how furiously you hammered the spacebar.
The problem: we had forgotten the wing incidence angle. Real bird wings aren’t flat — they have a built-in angle of attack of about 3°. This angle generates minimal lift even in level flight. Without it, the wing is a plank, and a plank falls.
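With illustrative constants (not the game's actual tuning), the balance reads like this:

```javascript
// Hedged sketch of lift vs. gravity; all constants are made up for the demo.
const AIR_DENSITY = 1.225; // kg/m³
const GRAVITY = -9.81;     // m/s²
const BIRD_MASS = 1.0;     // kg
const wingArea = 0.15;     // m²
const CL = 1.2;            // lift coefficient at a small incidence angle

function verticalAccel(speed) {
  const dynamicPressure = 0.5 * AIR_DENSITY * speed * speed;
  const lift = dynamicPressure * wingArea * CL / BIRD_MASS;
  return lift + GRAVITY; // positive: climbing, negative: sinking
}
```

At low speed the v² term is tiny and gravity dominates, so the bird sinks no matter what; past a crossover speed the net vertical acceleration turns positive. Without the incidence angle, CL in level flight is near zero and that crossover never comes, which is exactly why the first flap went nowhere.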
It took six iterations. Six versions of the physics model until the balance of gravity, lift and wing flapping was right. Until you could press the spacebar and feel the bird climbing. Until a dive felt dangerous and a loop made your stomach clench.
Summary
Realistic physics ≠ fun physics. We needed 6 iterations before flapping actually gained altitude.
Chapter 5
Phase 4 — Hands Become Wings
The Wild Part
And then came the wild part. The idea: what if you didn’t use the keyboard, but your arms? What if you flapped your arms in front of the webcam and the bird in the game actually climbed?
MediaPipe Pose Detection makes this possible. Google’s ML model detects 33 body landmarks in real time — right in the browser, no server, no cloud. Shoulders, elbows, wrists. Everything we need.
The logic: we measure the height of both wrists relative to the shoulders. High hands = wings up. Low hands = wings down. The rate of change between them = wing flap.
const delta = Math.abs(recentAvg - olderAvg);
if (delta > 0.025) {
  flapStrength = clamp(delta * 8, 0.5, 1);
}
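Filled out as a self-contained function: the sliding-window shape here is an assumption, but the threshold and clamp come from the snippet above.

```javascript
const clamp = (v, lo, hi) => Math.min(hi, Math.max(lo, v));

// `history` holds recent wrist heights (relative to the shoulders),
// newest first; four frames is an illustrative window size.
function detectFlap(history) {
  const recentAvg = (history[0] + history[1]) / 2;
  const olderAvg = (history[2] + history[3]) / 2;
  const delta = Math.abs(recentAvg - olderAvg);
  // Jitter below the threshold is ignored as detection noise.
  return delta > 0.025 ? clamp(delta * 8, 0.5, 1) : 0;
}
```

A fast arm movement saturates to a full-strength flap; a perfectly still pose produces none.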
The challenges were subtler than expected. The webcam image is mirrored — left is right. Sometimes landmarks disappear because the arm is outside the frame. And detection accuracy fluctuates with lighting.
Our solution: Graceful Degradation. When your hands leave the frame, the bird glides on gently — instead of crashing. When MediaPipe detects no pose, the system silently falls back to keyboard controls. No error, no popup, no frustration.
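In code, that fallback is just a branch at the top of the input path. This sketch uses made-up names (getControlInput, poseResult), not the project's actual API:

```javascript
// Hypothetical input selector: pose when available, keyboard otherwise.
function getControlInput(poseResult, keyboardFlap) {
  if (!poseResult || !poseResult.landmarks) {
    // Graceful degradation: no pose detected, so keep gliding and let
    // the keyboard take over, with no error and no popup.
    return { flap: keyboardFlap, source: 'keyboard' };
  }
  return { flap: poseResult.flapStrength, source: 'pose' };
}
```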
The feeling when you raise your arms for the first time and the bird climbs — that’s the moment the project tips from “technically interesting” to “I want to show this to people.”
Chapter 6
The Last 20%
80% of the Time
The last 20% takes 80% of the time. That’s universal. And in this project, the last 20% was mostly about one thing: textures.
Five Layers of Terrain
A single-color landscape looks like Play-Doh. So we built a custom shader with five texture layers: sand on the beaches, grass on the hills, earth at mid-elevation, rock on the steep slopes, snow on the peaks. Each layer blends smoothly into the next.
float grassFactor = smoothstep(sandEnd, sandEnd + 8.0, h)
    * (1.0 - smoothstep(grassEnd - 8.0, grassEnd + 8.0, h));
float rockFactor = smoothstep(earthEnd, earthEnd + 5.0, h)
    * (1.0 - smoothstep(rockEnd - 8.0, rockEnd + 8.0, h));
float snowFactor = smoothstep(rockEnd - 8.0, rockEnd + 8.0, h);
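The blend is easy to port to plain JavaScript to see how the weights partition by height. The threshold constants here are illustrative; the real values live in the terrain shader.

```javascript
// Same curve as GLSL's built-in smoothstep().
function smoothstep(e0, e1, x) {
  const t = Math.min(1, Math.max(0, (x - e0) / (e1 - e0)));
  return t * t * (3 - 2 * t);
}

const sandEnd = 10, grassEnd = 60, rockEnd = 200; // invented thresholds

// Grass fades in above the sand band and back out toward the earth band.
const grassFactor = h =>
  smoothstep(sandEnd, sandEnd + 8, h) *
  (1 - smoothstep(grassEnd - 8, grassEnd + 8, h));

// Snow fades in across the top of the rock band.
const snowFactor = h => smoothstep(rockEnd - 8, rockEnd + 8, h);
```

Each factor is a smooth bump (or ramp) over a height range, so adjacent layers cross-fade over a few meters instead of meeting at a hard line.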
The snow texture was gray. We spent three hours hunting the bug in the shader logic. We checked the blending factors, the height thresholds, the UV coordinates. Everything looked correct. Until we realized: the logic was correct. The texture just wasn't white enough. A 200×200-pixel JPEG with an average brightness of 180 instead of 250. Three hours of debugging for an image-editing problem.
Buildings and Settlements
An island without civilization feels empty. So we generated buildings — five types: houses, barns, towers, hotels and high-rises. Each with procedural brick or concrete textures, random proportions and color variations.
The placement uses noise functions: settlements emerge in flat areas near the coast. Forests take over the hills and mountains. A second noise layer separates urban zones from forest zones, so trees don’t grow through rooftops.
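A placement rule of that shape might look like this. Here noise2D stands in for the project's noise function, and every threshold is invented for the sketch:

```javascript
// Illustrative building-placement predicate, not the project's code.
function canPlaceBuilding(x, z, terrainHeight, slope, noise2D) {
  const nearCoast = terrainHeight > 2 && terrainHeight < 40; // flat lowland
  const flatEnough = slope < 0.15;                           // no steep lots
  const urbanZone = noise2D(x * 0.01, z * 0.01) > 0.3;       // 2nd noise layer
  return nearCoast && flatEnough && urbanZone;               // else: forest
}
```

Because the urban/forest split comes from its own noise layer rather than per-object randomness, settlements and woods form coherent regions, and trees never end up poking through rooftops.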
Summary
Snow-covered peaks, 900 buildings, 100,000 trees — all procedurally generated.
Chapter 7
What This Says About AI and Software Development
Human and Machine
Let’s be honest: this project would have taken weeks without AI. Not because the concepts are hard (the Three.js documentation is excellent, and MediaPipe ships with examples), but because iteration was so much faster.
The roles were clear:
The human provided vision, aesthetic judgment and the irreplaceable “this feels wrong.” No prompt in the world could have automated the realization that the wing flap felt “mushy.” Or that the trees looked like green sticks from above.
The AI provided architecture, boilerplate, debugging and an iteration speed no single developer can match. When the human said “more trees,” the AI added 100,000 — complete with octree optimization, LOD system and frustum culling so performance wouldn’t tank.
But — and this is the point — the AI didn’t improve the trees on its own. It placed 100,000 identical green cones and was satisfied. Only when the human said “those look like plastic” did texture variations, different tree types and color gradations appear.
The Future
In five years, any teenager will be able to build something like this in an afternoon. The tools will get better, the models smarter, the barrier to entry lower. But the decisive factor will remain the same: someone has to know what looks good, feels right and is fun.
The bird flies. But it doesn’t fly because the AI wanted it to on its own. It flies because a human said: “This needs to feel like flying.” And then iterated until it did.
Convinced? Or at least curious?
Fly now →