Sound, Waves, & Music

I have always loved music. I've played piano since I was a kid and taught myself the guitar during COVID. But lately I have become curious about what music actually is at a physical level. What is a sound wave? How does a vinyl groove hold an entire orchestra? Why do some chords feel tense and others feel resolved? This page is my attempt to teach myself by teaching you. We'll start simple and build up to how recorded sound actually works. Stick with me.

What is Sound?

Sound is a pressure wave. When something vibrates, it pushes the air around it, which pushes the air next to that, and so on, creating a disturbance that travels outward from the source. What reaches your ear is just air pressure rising and falling very rapidly.

The simplest possible sound is a pure tone, a single, perfectly regular oscillation. Mathematically, it is a sine wave. Every other sound you have ever heard is built from combinations of these.

Two things define a pure tone:

Frequency -- how many times the wave completes a full cycle per second, measured in Hz. This is what you hear as pitch. 440 Hz is the note A4, the standard concert tuning reference.
Amplitude -- how large the pressure swings are. This is what you hear as loudness.

Play with the sliders below. Notice how raising the frequency makes the pitch higher and raising the amplitude makes it louder. The wave on screen changes shape in real time.

[Interactive demo: Pure Sine Wave, with sliders for Frequency (default 440 Hz) and Amplitude (default 50%)]

▶ Go deeper: the sine wave equation

A pure tone is described by:

y(t) = A · sin(2π · f · t)

where A is amplitude, f is frequency in Hz, and t is time in seconds. The 2π converts cycles to radians, which is the natural unit for circular motion. A sine wave is literally what you get when you take a point moving in a circle and project its position onto a straight line. At 440 Hz the wave completes 440 full cycles every second.

The human ear can detect roughly 20 Hz to 20,000 Hz. Below 20 Hz you feel it more than hear it. Above 20,000 Hz most people cannot detect it at all, and the upper limit drops as you age.
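To make the equation concrete, here is a small sketch that samples y(t) = A · sin(2π · f · t) the way a computer would, then counts the cycles to check the "440 per second" claim. The 44,100 samples-per-second rate and the function names are my own choices for illustration, not anything the demo on this page is known to use.

```python
import math

SAMPLE_RATE = 44100  # samples per second; a common audio rate, assumed here

def sine_samples(frequency_hz, amplitude, duration_s, sample_rate=SAMPLE_RATE):
    """Sample y(t) = A * sin(2*pi*f*t) at evenly spaced times t = i / sample_rate."""
    count = int(duration_s * sample_rate)
    return [amplitude * math.sin(2 * math.pi * frequency_hz * i / sample_rate)
            for i in range(count)]

def count_cycles(samples):
    """Count upward zero crossings; each one marks the start of a new cycle."""
    return sum(1 for prev, cur in zip(samples, samples[1:]) if prev < 0 <= cur)

one_second = sine_samples(frequency_hz=440, amplitude=0.5, duration_s=1.0)
print("samples:", len(one_second))
print("peak:", round(max(one_second), 3))
print("cycles counted:", count_cycles(one_second))
```

The count can come out one short of 440 because the very first cycle starts exactly at t = 0, where there is no earlier sample to cross from.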

Pure Tones vs Real Instruments

Here is something worth sitting with: a piano and a violin can both play the note A4 at 440 Hz. If it's the same frequency, why do they sound completely different?

Because no real instrument produces a pure sine wave. When a piano string vibrates at 440 Hz, it also vibrates at 880 Hz, 1320 Hz, 1760 Hz, and beyond. These are called harmonics, and they are exact integer multiples of the original frequency (called the fundamental).

Different instruments emphasize different harmonics. A clarinet's cylindrical bore behaves like a tube closed at one end, which strongly emphasizes the 1st, 3rd, and 5th harmonics (the odd-numbered ones, meaning every other step up the harmonic ladder) while suppressing the 2nd, 4th, and 6th. A bowed violin string excites both odd and even harmonics, in varying proportions. A flute is close to a pure sine wave, which is why it sounds so clean; its breathy edge comes from air noise layered on top of the tone. These harmonic fingerprints are what give instruments their distinct character, a quality called timbre.

Try the presets below. The left shows the pure fundamental tone. The right shows what happens when harmonics are added.

[Interactive demo: two waveforms side by side, "Pure tone (fundamental only)" and "With harmonics added", with amplitude sliders for the Fundamental (220 Hz, default 1.00) and the 2nd to 5th harmonics (440 Hz, 660 Hz, 880 Hz, 1100 Hz, each default 0.00)]

Why does everything still sound like a synthesizer? Because the browser is generating mathematically perfect waves. A real flute player's breath creates tiny imperfections, pressure variations, and turbulence that no simple equation fully captures. These imperfections are not flaws -- they are what makes sound feel human and alive. A speaker can reproduce a flute recording convincingly because it plays back those imperfections along with everything else. But when the source is a perfect sine wave, there is nothing to reproduce except the math.

▶ Go deeper: Fourier series

Any periodic wave can be written as a sum of sine waves at integer multiples of a fundamental frequency. This is a Fourier series:

y(t) = A₁ · sin(2πf₀t) + A₂ · sin(4πf₀t) + A₃ · sin(6πf₀t) + ...

(In general each term also carries its own phase offset, left out here for simplicity.) The coefficients A₁, A₂, A₃... determine how much of each harmonic is present. The full set of these coefficients is called the spectrum of the sound. A clarinet has strong A₁, A₃, A₅ and weak A₂, A₄, A₆. A violin has both. A flute is mostly just A₁.

Jean-Baptiste Joseph Fourier argued in his 1822 treatise on heat that periodic functions can be decomposed this way, and later mathematicians pinned down exactly when it holds. It is one of the most useful results in all of physics and engineering, and it is what makes recorded audio, image compression, and spectrograms possible.
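Here is a minimal sketch of that sum in code. The three spectra are made-up amplitude lists I picked to illustrate "fundamental only", "odd harmonics only", and "all harmonics"; they are not measurements of real instruments, and the function name is my own.

```python
import math

def harmonic_sum(t, f0, amplitudes):
    """Evaluate A1*sin(2*pi*1*f0*t) + A2*sin(2*pi*2*f0*t) + ... at time t."""
    return sum(a * math.sin(2 * math.pi * n * f0 * t)
               for n, a in enumerate(amplitudes, start=1))

# Hypothetical spectra (A1, A2, A3, ...), chosen only for illustration.
PURE_TONE = [1.0]
ODD_ONLY = [1.0, 0.0, 0.5, 0.0, 0.25]      # clarinet-like: 1st, 3rd, 5th
ALL_HARMONICS = [1.0, 0.5, 0.33, 0.25, 0.2]

f0 = 220.0
period = 1 / f0
for name, spectrum in [("pure", PURE_TONE), ("odd only", ODD_ONLY),
                       ("all", ALL_HARMONICS)]:
    now = harmonic_sum(0.001, f0, spectrum)
    one_period_later = harmonic_sum(0.001 + period, f0, spectrum)
    print(name, round(now, 6), round(one_period_later, 6))
```

Each pair of printed numbers should match: however many harmonics you pile on, the wave still repeats once every 1/f₀ seconds, which is why the pitch stays the same while the timbre changes.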

Adding Waves Together

When two sound waves occupy the same space at the same time, they combine by simple addition. At every moment, the air pressure is just the sum of what each wave would have caused on its own. This is called superposition.

Sometimes the waves reinforce each other (constructive interference) and sometimes they partially cancel (destructive interference). Try it below. Notice how two clean sine waves can add up to something that looks surprisingly tangled.

[Interactive demo: Wave A (Frequency 220 Hz, Amplitude 70%), Wave B (Frequency 330 Hz, Amplitude 50%), and their sum, A + B Combined]

▶ Go deeper: interference and beating

Superposition is just addition:

y(t) = y₁(t) + y₂(t)

When two waves have slightly different frequencies, they interfere to produce beating: the combined sound pulses in loudness at a rate equal to the difference in frequencies. Play 440 Hz and 444 Hz together and you will hear the volume pulse 4 times per second. Musicians use this to tune instruments -- as two strings approach the same pitch the beating slows down, and when they are perfectly in tune it disappears entirely.

Noise-canceling headphones put destructive interference to work on purpose. A microphone samples the incoming noise, the electronics generate an inverted copy of that wave, and the two cancel each other out.
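The beat rate falls out of a standard trig identity, sin a + sin b = 2 · sin((a + b)/2) · cos((a − b)/2). The sketch below checks numerically that the plain sum and the "fast wave times slow envelope" form are the same signal. It assumes two equal-amplitude tones, and the function names are mine.

```python
import math

def two_tone(t, f1, f2):
    """Plain superposition of two equal-amplitude sine waves."""
    return math.sin(2 * math.pi * f1 * t) + math.sin(2 * math.pi * f2 * t)

def beat_form(t, f1, f2):
    """The same signal rewritten as a tone at the average frequency
    multiplied by a slowly varying cosine."""
    return 2 * math.sin(math.pi * (f1 + f2) * t) * math.cos(math.pi * (f1 - f2) * t)

def envelope(t, f1, f2):
    """Loudness envelope: the size of the slow cosine factor."""
    return 2 * abs(math.cos(math.pi * (f1 - f2) * t))

def beat_rate(f1, f2):
    """Loudness pulses per second: |cos| peaks twice per cosine cycle."""
    return abs(f1 - f2)

for t in (0.0, 0.0625, 0.125, 0.1875, 0.25):
    print(t, round(envelope(t, 440, 444), 3))
print("beats per second:", beat_rate(440, 444))
```

The slow cosine runs at half the frequency difference, but loudness follows its absolute value, so the pulses arrive at the full difference: 4 per second for 440 Hz and 444 Hz.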

Cancellation and Noise-Canceling

Superposition goes both ways. If two waves can reinforce each other, they can also cancel each other out. Take any wave, flip it upside down (invert it), and add the two together. Every peak meets a trough of equal size. The result is silence.

This is called destructive interference, and it is the exact principle inside noise-canceling headphones.

Here is how it works in practice: a tiny microphone on the outside of the headphone samples the incoming noise. The headphone's electronics analyze that wave and generate an inverted copy in real time. That inverted copy plays through the speaker alongside your music. The noise and its inverted copy cancel each other out. Your music, which was not inverted, plays through fine.

Try it below. Wave A is the noise. Wave B starts as a perfect inverted copy. The combined result is silence. Then use the imperfection slider to see what happens when the cancellation is not quite right.

[Interactive demo: Wave A (noise; Noise Frequency 180 Hz, Noise Amplitude 80%), Wave B (inverted copy, with a Cancellation Quality slider that starts at Perfect), and the combined result, which starts as silence]

Why noise-canceling works better on some sounds than others. The electronics need a moment to sample the incoming noise and generate the inverted copy. For steady, predictable sounds like engine hum or air conditioning (low frequency, slow-changing waves), the system has plenty of time to catch up and cancellation is nearly perfect. For sudden or rapidly changing sounds like a voice or a door slamming, the wave changes faster than the system can respond and cancellation is incomplete. This is why noise-canceling headphones feel like magic on a plane but only muffle voices rather than eliminating them.
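One way to see why low frequencies cancel better is to model the system's lag as a single fixed delay. That is my simplifying assumption, not a description of how any real headphone works, and the 50 microsecond figure below is invented for illustration. If the anti-noise arrives late by a delay d, the leftover wave sin(2πft) − sin(2πf(t − d)) has a peak of 2 · |sin(πfd)|, which grows with frequency.

```python
import math

def residual_amplitude(frequency_hz, delay_s):
    """Peak of sin(2*pi*f*t) - sin(2*pi*f*(t - delay)), i.e. 2*|sin(pi*f*delay)|."""
    return 2 * abs(math.sin(math.pi * frequency_hz * delay_s))

def residual_sampled(frequency_hz, delay_s, steps=2000):
    """The same quantity measured directly by sampling one period of the leftover."""
    peak = 0.0
    for i in range(steps):
        t = i / (steps * frequency_hz)
        noise = math.sin(2 * math.pi * frequency_hz * t)
        late_anti_noise = -math.sin(2 * math.pi * frequency_hz * (t - delay_s))
        peak = max(peak, abs(noise + late_anti_noise))
    return peak

ASSUMED_DELAY_S = 50e-6  # hypothetical lag, for illustration only
for f in (100, 500, 2000):
    leftover = residual_amplitude(f, ASSUMED_DELAY_S)
    print(f, "Hz ->", round(leftover, 3), "of the original amplitude")
```

With this made-up delay, a 100 Hz hum is cut to about 3% of its original amplitude while a 2000 Hz tone keeps about 62%, which matches the "magic on a plane, muffled voices" experience.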

The same thing happens in audio mixing. If two tracks carry nearly the same signal (the left and right channels of a centered instrument, say, or two microphones on the same source) and one of them is accidentally inverted, summing them cancels the instrument and it vanishes from the mix. Audio engineers call this a "phase issue" and it is one of the more disorienting things that can happen in a recording session -- everything looks fine on screen but a whole instrument disappears when you hit play.

▶ Go deeper: phase and the inverted wave

Inverting a wave is the same as shifting its phase by 180 degrees (or π radians):

-sin(2πft) = sin(2πft + π)

When you add a wave to its phase-shifted copy:

sin(2πft) + sin(2πft + π) = 0

Perfect cancellation.

In practice, imperfections come from two sources. First, amplitude mismatch: if the inverted copy is slightly louder or quieter, cancellation is incomplete and you hear a faint residual wave. Second, phase mismatch: if the inverted copy arrives slightly early or late (even by microseconds), the peaks and troughs do not align perfectly. The slider above simulates both effects simultaneously. Real noise-canceling headphones use digital signal processing to minimize both in real time, typically achieving 20 to 30 dB of noise reduction on steady low-frequency sounds.
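To get a feel for how tight the match has to be, here is a sketch that combines both kinds of mismatch. For a unit sine plus an inverted copy with relative gain g and phase error φ, the leftover wave has amplitude √(1 + g² − 2g · cos φ). The code checks that formula against brute-force sampling and converts the result to dB. The specific gain and phase values are hypothetical, and I do not know how the slider on this page combines the two effects.

```python
import math

def residual_closed_form(gain, phase_error_rad):
    """Amplitude of sin(x) + gain*sin(x + pi + phase_error)."""
    return math.sqrt(1 + gain ** 2 - 2 * gain * math.cos(phase_error_rad))

def residual_sampled(gain, phase_error_rad, steps=10000):
    """Measure the same amplitude by sampling one full cycle."""
    peak = 0.0
    for i in range(steps):
        x = 2 * math.pi * i / steps
        leftover = math.sin(x) + gain * math.sin(x + math.pi + phase_error_rad)
        peak = max(peak, abs(leftover))
    return peak

def reduction_db(gain, phase_error_rad):
    """Noise reduction in dB relative to the uncancelled wave."""
    residual = residual_closed_form(gain, phase_error_rad)
    return math.inf if residual == 0 else -20 * math.log10(residual)

# Hypothetical mismatch: copy is 5% too quiet and off by 0.05 radians.
print(round(residual_closed_form(0.95, 0.05), 4))
print(round(residual_sampled(0.95, 0.05), 4))
print(round(reduction_db(0.95, 0.05), 1), "dB")
```

Even a copy that is only 5% too quiet and about 3 degrees out of step leaves roughly 7% of the noise, or about 23 dB of reduction, which sits inside the 20 to 30 dB range quoted above.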

Chords and Harmony

A chord is superposition applied to musical notes. When you play a C major chord, you are combining three notes simultaneously: C, E, and G. The reason some combinations sound pleasant and others sound tense comes down to frequency ratios.

Simple ratios produce waves that align periodically in a predictable, regular way. Your brain perceives this regularity as consonance, or "pleasantness." Complex ratios take much longer to repeat and sound more tense or unresolved.
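"Takes longer to repeat" can be made exact. If the upper note is p/q times the lower one (in lowest terms), both notes are whole multiples of a shared base frequency f/q, so the combined wave repeats once every q cycles of the lower note. This sketch assumes exact just-intonation ratios, and the function name is my own.

```python
from fractions import Fraction

def cycles_until_repeat(ratio):
    """Cycles of the lower note before the two-note waveform repeats.

    `ratio` is upper frequency / lower frequency as an exact fraction.
    Fraction reduces to lowest terms, so the denominator is q.
    """
    return Fraction(ratio).denominator

INTERVALS = {
    "octave (2:1)": Fraction(2, 1),
    "perfect fifth (3:2)": Fraction(3, 2),
    "major third (5:4)": Fraction(5, 4),
    "tritone (45:32)": Fraction(45, 32),
}

for name, ratio in INTERVALS.items():
    print(name, "repeats every", cycles_until_repeat(ratio), "cycles of the lower note")
```

The octave lines up on every cycle and the fifth on every second cycle, while the tritone needs 32 cycles before the pattern comes back around.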

Start simple: one note, then octaves. An octave is exactly a 2:1 frequency ratio. C4 is 261.63 Hz. C5 is 523.25 Hz. They sound so similar we give them the same name.

Now build up to full chords and extensions. Notice how adding the 7th creates a tension that wants to move somewhere.

[Interactive demo: chord and interval selector]

Why does C7 want to resolve to F? The dominant 7th chord (C7) contains a tritone between E and Bb -- an interval whose frequency ratio is so complex (roughly 45:32) that it creates strong auditory tension. The closest simple-ratio landing spot that resolves both notes by small movements is F major: the E moves up a half step to F, and the Bb moves down a half step to A. Your brain has learned to anticipate this resolution after years of hearing it, which is why it feels almost inevitable.

▶ Go deeper: frequency ratios and equal temperament

In just intonation (pure mathematical ratios), a perfect fifth is exactly 3:2. An octave is 2:1. A major third is 5:4. These ratios produce perfectly regular combined waves.

But here is the problem: if you stack 12 perfect fifths (3:2 each time), you should end up exactly 7 octaves higher. You do not. You end up slightly sharp -- a discrepancy called the Pythagorean comma. This means you cannot tune a keyboard so that every key has perfect ratios simultaneously.

The solution used today is equal temperament: divide the octave into 12 equal semitones, each with a frequency ratio of the 12th root of 2 (about 1.0595). No interval except the octave is perfectly pure, but all are close enough. A4 is 440 Hz and A5, one octave up, is 880 Hz. The E above A4 is 440 × 2^(7/12) ≈ 659.3 Hz rather than the pure 660 Hz (3:2 ratio). The difference is tiny but audible to trained ears.
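The arithmetic here is easy to check for yourself. This sketch computes the Pythagorean comma as an exact fraction and compares the equal-tempered fifth with the pure one. The helper names are mine, and the cents conversion (1200 · log₂ of the ratio) is a standard unit I am bringing in for the comparison.

```python
import math
from fractions import Fraction

def equal_tempered(base_hz, semitones):
    """Frequency `semitones` above base_hz in 12-tone equal temperament."""
    return base_hz * 2 ** (semitones / 12)

def cents(ratio):
    """Size of an interval in cents (100 cents = one equal-tempered semitone)."""
    return 1200 * math.log2(float(ratio))

# Twelve pure fifths up, seven octaves back down: not quite 1.
pythagorean_comma = Fraction(3, 2) ** 12 / Fraction(2) ** 7

print("comma:", pythagorean_comma, "=", round(float(pythagorean_comma), 5))
print("comma in cents:", round(cents(pythagorean_comma), 2))
print("equal-tempered E5:", round(equal_tempered(440, 7), 2), "Hz")
print("pure E5:", 440 * Fraction(3, 2), "Hz")
print("pure fifth minus tempered fifth:", round(cents(Fraction(3, 2)) - 700, 3), "cents")
```

Equal temperament spreads the comma evenly across the 12 fifths, which is why each tempered fifth ends up just under 2 cents flat of the pure 3:2.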

Spectrograms

A waveform shows amplitude over time but hides what frequencies are present. A spectrogram reveals the full picture: frequency on the y-axis, time on the x-axis, brightness showing how much energy is at each frequency at each moment.

A pure tone shows as a single horizontal line. A chord shows multiple lines, one per note. Speech shows shifting, complex patterns. The spectrogram is basically an X-ray of sound.

Use the buttons below to play tones directly into the spectrogram and watch it update live.

[Interactive demo: Live Spectrogram, scrolling over the last 5 seconds (time axis from -5s to now)]

This is basically how Shazam works. Shazam analyzes the spectrogram of a recording and identifies the brightest peaks, their frequencies, and their timing relationships. It turns those peaks into a compact "fingerprint." That fingerprint gets matched against a database of pre-computed fingerprints for millions of songs. The reason it works even with background noise is that the strongest spectral peaks tend to survive noise better than quieter features. The core algorithm was published in 2003 and is surprisingly elegant.
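I have not seen Shazam's production code, so what follows is only a toy in the spirit of the idea above, not their implementation. It assumes the peaks have already been extracted as (time step, frequency bin) pairs, pairs each peak with the next few peaks to form small hashes of (frequency 1, frequency 2, time gap), and lets matching hashes vote on where the clip sits inside the song. Every name and number in it is made up for illustration.

```python
from collections import Counter

def fingerprints(peaks, fan_out=3):
    """Map (f1, f2, time_gap) -> list of anchor times.

    `peaks` is a list of (time_step, freq_bin) sorted by time.
    Each peak is paired with the next `fan_out` peaks.
    """
    table = {}
    for i, (t1, f1) in enumerate(peaks):
        for t2, f2 in peaks[i + 1:i + 1 + fan_out]:
            table.setdefault((f1, f2, t2 - t1), []).append(t1)
    return table

def best_offset(song_peaks, clip_peaks):
    """Return (offset, votes) for the most popular song-minus-clip time offset."""
    song = fingerprints(song_peaks)
    votes = Counter()
    for key, clip_times in fingerprints(clip_peaks).items():
        for song_time in song.get(key, []):
            for clip_time in clip_times:
                votes[song_time - clip_time] += 1
    return votes.most_common(1)[0] if votes else None

# Hypothetical peak lists: the clip is the song from step 3 on, plus one noise peak.
SONG = [(0, 10), (1, 25), (2, 13), (3, 40), (4, 22), (5, 31), (6, 18), (7, 27)]
CLIP = [(0, 40), (1, 22), (1, 99), (2, 31), (3, 18)]

print(best_offset(SONG, CLIP))
```

The stray peak at bin 99 creates hashes that match nothing in the song, so it simply fails to vote. That is the intuition behind the noise robustness described above: wrong peaks mostly add no votes rather than wrong votes.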

▶ Go deeper: the Fourier Transform and STFT

A spectrogram is computed using the Short-Time Fourier Transform (STFT). Take a short window of the audio signal, compute its Fourier Transform to decompose it into component frequencies, then slide the window forward in time and repeat.

The Fourier Transform of a signal x(t) is:

X(f) = ∫ x(t) · e^(-2πift) dt

In practice we use the Fast Fourier Transform (FFT) algorithm, which computes this in O(n log n) rather than O(n²). The Web Audio API's AnalyserNode is running an FFT right now to generate the visualization above.

There is a fundamental tradeoff: more precise frequency resolution requires longer time windows, but longer windows mean less precise time resolution. This is the time-frequency uncertainty principle, a direct analog of Heisenberg's uncertainty principle in quantum mechanics. They share the same mathematical root.
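Here is a bare-bones version of the windowing idea. To keep it dependency-free it uses the slow O(n²) discrete Fourier transform rather than an FFT, and it skips the tapering window function that real spectrograms usually apply. The sample rate, window size, and hop size are arbitrary choices of mine, picked so the test tones land exactly on frequency bins.

```python
import cmath
import math

def dft_magnitudes(frame):
    """Magnitude of each frequency bin, computed the slow O(n^2) way."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(frame)))
            for k in range(n // 2)]

def stft_peak_frequencies(signal, sample_rate, window=256, hop=128):
    """Slide a window along the signal; report the loudest frequency in each."""
    peaks = []
    for start in range(0, len(signal) - window + 1, hop):
        mags = dft_magnitudes(signal[start:start + window])
        loudest_bin = max(range(len(mags)), key=mags.__getitem__)
        peaks.append(loudest_bin * sample_rate / window)
    return peaks

SAMPLE_RATE = 8000  # assumed; gives bins spaced 8000 / 256 = 31.25 Hz apart

# Test signal: 512 samples of 500 Hz followed by 512 samples of 1000 Hz.
signal = [math.sin(2 * math.pi * (500 if i < 512 else 1000) * i / SAMPLE_RATE)
          for i in range(1024)]

print(stft_peak_frequencies(signal, SAMPLE_RATE))
```

The tradeoff shows up directly in the parameters: each bin is sample_rate / window Hz wide, so doubling the window halves the frequency uncertainty but blurs the moment the note changes across twice as many samples.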

How Vinyl Works

Now we can actually answer the original question.

In a recording studio, every instrument produces sound waves that travel through the air and hit microphones. The microphones convert those pressure waves into electrical signals. Those signals are then summed together into a single combined waveform representing the total sound at every moment. One waveform. All instruments included.

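That combined signal is physically cut into a vinyl groove as a side-to-side wiggle. (That is the mono picture; a stereo groove carries the left and right channels on its two walls, but the principle is the same.) A string quartet and a full orchestra both end up as one continuous line, just a more complex one. The groove does not store each instrument separately. It stores their sum.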

When you play the record, the needle tracks that wiggle. Its motion is converted back to an electrical signal, amplified, and sent to your speakers. The speakers push and pull the air in very nearly the pattern of the original combined wave. That wave reaches your ears. And then something remarkable happens: your inner ear decomposes it. The basilar membrane in your cochlea is tuned to respond to different frequencies at different positions along its length. High frequencies trigger cells near the base, low frequencies near the apex. In effect, your ear runs a rough biological Fourier Transform, separating the combined wave back into components, letting you hear the violin and the trumpet and the piano as distinct sounds even though they all traveled to you as one wave.

The full chain: instruments produce waves, waves are summed, the sum is stored as one groove, the groove is replayed as one wave, your ear decomposes it back into components. The math making this possible is Fourier analysis. Nature invented it in your cochlea. Fourier wrote it down in 1822.
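The whole chain can be mimicked in a few lines: add some tones into one list of numbers (the "groove"), then recover how much of each frequency is in there by correlating the sum against a sine and cosine at that frequency. This is a simplified stand-in for what the cochlea does, not a model of it, and the three "instruments" are just pure tones with amplitudes I made up.

```python
import math

SAMPLE_RATE = 8000  # assumed sample rate for the sketch

def mix(components, n_samples, sample_rate=SAMPLE_RATE):
    """Sum sine waves given as (frequency_hz, amplitude) into one waveform."""
    return [sum(a * math.sin(2 * math.pi * f * i / sample_rate)
                for f, a in components)
            for i in range(n_samples)]

def amplitude_at(signal, frequency_hz, sample_rate=SAMPLE_RATE):
    """Estimate how strongly one frequency is present in the signal.

    Accurate when the signal spans a whole number of cycles of every
    component, as it does for integer frequencies over exactly one second.
    """
    n = len(signal)
    sin_part = sum(x * math.sin(2 * math.pi * frequency_hz * i / sample_rate)
                   for i, x in enumerate(signal))
    cos_part = sum(x * math.cos(2 * math.pi * frequency_hz * i / sample_rate)
                   for i, x in enumerate(signal))
    return 2 * math.hypot(sin_part, cos_part) / n

# Hypothetical "instruments": three pure tones with different loudness.
groove = mix([(220, 0.6), (440, 0.3), (660, 0.1)], n_samples=SAMPLE_RATE)

for f in (220, 440, 550, 660):
    print(f, "Hz:", round(amplitude_at(groove, f), 4))
```

Nothing in `groove` marks where one tone ends and another begins, yet each amplitude comes back out, and the probe at 550 Hz (a frequency nobody played) comes back empty.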

Use the buttons below to add instruments one at a time. The faint colored lines are individual waves. The bright line is their sum -- what goes into the groove.

[Interactive demo: The groove, showing individual waves (faint) and their sum (bright)]

Playground

You have the theory. Now just play around.

Tone Generator

[Interactive demo: Tone Generator, with Frequency (default 440 Hz), Waveform, and Volume (default 40%) controls]

Chord Builder

Toggle notes on and off to build any combination.