nx1.info | Creating a realtime Music Visualizer
What is digital music?
At the most fundemental level, a digital audio file is simply a series of numbers written
in binary. We can see these numbers by using the python soundfile library (pip install soundfile).
read_flac.py
------------
import numpy as np
import soundfile as sf
file_path = 'Troya - Schizophrenic.flac'
data, sample_rate = sf.read(file_path)
print(f'Sample Rate = {sample_rate}')
for d in data:
print(d)
Output
------
Sample Rate = 44100
[ 0.00665283 -0.00546265]
[ 0.00637817 -0.00521851]
[ 0.00628662 -0.00494385]
[ 0.0065918 -0.00500488]
[ 0.00646973 -0.0050354 ]
[ 0.00643921 -0.00506592]
[ 0.00640869 -0.00509644]
[ 0.00628662 -0.00531006]
[ 0.0062561 -0.00558472]
[ 0.00613403 -0.00552368]
[ 0.00601196 -0.00561523]
[ 0.00582886 -0.00546265]
The left and right columns correspond to signal S(t) the normalized amplitudes (0 < S(t) < 1) of
the left and right speaker channels.
To decode the series, we require some key pieces of meta information, for my flac file these are:
- Sample Rate (f_s) = 44100 Hz
- Bits per Sample = 16 bits
- Bitrate (R) = 904 kbps
- Number of Channels = 2 Channels
Each sample in our flac file is made up of 16 bits, we can see how the bits are made up by converting
the decimal representation to binary.
to convert the signal float values to 16bit values we do
data = (data * (2**15)).astype(np.int16)
and to print the values and their binary representation we can change our print function to:
print(f'{d} {d[0]:016b} {d[1]:016b}')
This now outputs:
[ 218 -179] 0000000011011010 -000000010110011
[ 209 -171] 0000000011010001 -000000010101011
[ 206 -162] 0000000011001110 -000000010100010
[ 216 -164] 0000000011011000 -000000010100100
[ 212 -165] 0000000011010100 -000000010100101
[ 211 -166] 0000000011010011 -000000010100110
[ 210 -167] 0000000011010010 -000000010100111
[ 206 -174] 0000000011001110 -000000010101110
[ 205 -183] 0000000011001101 -000000010110111
[ 201 -181] 0000000011001001 -000000010110101
[ 197 -184] 0000000011000101 -000000010111000
[ 191 -179] 0000000010111111 -000000010110011
[ 185 -184] 0000000010111001 -000000010111000
[ 182 -185] 0000000010110110 -000000010111001
[ 184 -178] 0000000010111000 -000000010110010
A signed 16 bit number allows to hold any integer value plus or minus 2^15=32768 for a total of 2^16=65536 unique values.
If every sample in our flac file is 16 bits (2 bytes), and we have 44100 of these
samples every second and two channels, surely our file has a bitrate of
2 * 16 * 44,100 = 1,411,200 / 1024 = 1378.125 kbps ? As would any other 16 bit audio file.
But no, my file has a bitrate of 904 kbps, this is due to the flac compression which I wont cover in detail here.
see:
https://en.wikipedia.org/wiki/Sampling_(signal_processing)
https://en.wikipedia.org/wiki/Bit_rate
https://en.wikipedia.org/wiki/Audio_bit_depth
https://en.wikipedia.org/wiki/FLAC
How can we process the stream of music?
Now we know what digital music is, we need to think about how we can use a continually ariving stream
of numbers to drive some sort of visualization.
The simplest visualization I can think of, is simply to print the value of the signal for every frame
as the music plays. This involves to things 1. Playing the music, 2. printing the value.
Streaming the Sample Data
Using pygame, we can have a loop that takes a chunk of samples and performs some operation on it.
In this example I have calculated the fft and mean of a 2**15=32768 sized chunk of audio.
Game runs at 60fps with a average frame time of 17ms.
However since 2**15 samples turns out to be less than a second, so we probably would like to increase this.
Running at 2**20=1,048,576 (~23secs) we begin to see the frame-rate drop below 60fps to 50 with our 400x400
output.
10^19 samples gives a average frametime of ~34ms, we'll use this for now.
audio_streaming.py
------------------
import numpy as np
import pygame, sys
from pygame.locals import *
import soundfile as sf
file_path = 'Troya - Schizophrenic.flac'
data, sample_rate = sf.read(file_path)
data_T = data.T
data_int16 = (data * (2**15)).astype(np.int16)
fft_chunksize = 2**15
pygame.init()
screen = pygame.display.set_mode((500, 400))
clock = pygame.time.Clock()
font = pygame.font.SysFont(name='consolas', size=15)
antialias = False
color = (255, 255, 255)
while True:
t = pygame.time.get_ticks()
mspf = clock.tick(60) # Miliseconds per frame
fps = clock.get_fps()
d = data_int16[t]
d_l, d_r = data_T[0], data_T[1]
chunk_start = t + 10
chunk_end = t + 10 + fft_chunksize
chunk = d_r[chunk_start:chunk_end]
fft = np.fft.fft(chunk)
fft_mean = np.mean(fft.real)
screen.fill((0, 0, 0))
screen.blit(font.render(f'{mspf}', antialias, color), (50,10))
screen.blit(font.render(f'{fps}', antialias, color), (50,30))
screen.blit(font.render(f'{t}', antialias, color), (50,50))
screen.blit(font.render(f'{d_l[t]:.2f} {d_r[t]:.2f}', antialias, color), (50,70))
screen.blit(font.render(f'{d[1]:016b}', antialias, color), (50,90))
screen.blit(font.render(f'{d[0]:016b}', antialias, color), (50,110))
screen.blit(font.render(f'{chunk_start} - {chunk_end} ({fft_chunksize})', antialias, color), (50,150))
screen.blit(font.render(f'{fft_mean:.2f}', antialias, color), (50,170+5*fft_mean))
v1 = abs(225 * fft_mean)
v2 = np.sin(v1)
pygame.draw.rect(screen, (255,v1,v1), [250, v1, v1, 50], 4)
pygame.display.update()
The structure of dance music.
Certain styles of electronic music, notably house, techno and trance are keen to operate on
a 16 or 32 beat structure for the addition and subtraction of elements.
This means that in theory, if we can detect the beats per minute (BPM) of the track we might
create a visualization that is lagged by 1 or more beats from the actual input signal, but then
predict where the next structural change of the track is going to be.
actual (current)beat = 33 (addition of a hi-hat)
visualization beat = 32 (What the visualization is showing)
beat delay = 1 We know we are 1 beat away from changing the visualization.
if the BPM=134 for example, a 1 beat=134/60=0.44 seconds This means that as long as we can
do the frame drawing in 0.44 seconds, we can just sync everything up 1 beat behind.
BELIEVE IT OR NOT THIS IS A MUSICAL INSTRUMENT... LOOK MUM NO COMPUTER
https://www.youtube.com/watch?v=UcbkyYdV2kE
// python
// User selects sin wave.
//
// User selects Acos(kx)
// Display a circular sine wave
- Norman Khan 2024 (nx1.info)