AudioVisual works in Monolith
This aims to be a best-practices guide and tutorial for making audio-visual works in Monolith. The code described below generates a square wave in the sound domain and a square in the visual domain. An LFO signal modulates both the frequency of the square wave and the scaling factor of the square.
singing_square.scm is the main file. It's Scheme code that
controls the show and renders the sound.
(monolith:load "ugens.scm")
<<singing-square>>
<<render-block>>
(define (render-singing-square fps dur)
  <<enable-offline-mode>>
  <<realloc-blocksize>>
  <<init-janet>>
  <<setup-video>>
  <<setup-sound>>
  <<render>>
  <<finish>>
  )
(render-singing-square 60 10)
singing_square.janet is a Janet file used to draw the graphics.
<<gfx-init>>
<<draw>>
<<render-frame-block>>
singing_square.sh is a little shell script that runs
monolith, converts the wav to mp3 via lame, and then
generates an mp4 file via ffmpeg.
<<run-monolith>>
<<mp3-conversion>>
<<ffmpeg>>
How to run
First, tangle up all the code using
worgle from the
Once tangled, run the generated shell script:
With any luck, a file called
AV stuff is limited to offline rendering only. This is because there is no way to render video in realtime.
Monolith will render an h264 video file (via ) and an audio file (wav). These two are then stitched
together into an mp4 file via ffmpeg. The wav file can
optionally be converted to an mp3 file via lame, because
some video players don't support wav.
In practice, sounds usually get designed in a realtime configuration and rendered with the video later. Video design tends to be more guess-and-wait-and-check.
In order to set up the renderer, Monolith must be put in
offline mode. This can be done with
The internal patchwerk configuration is reallocated to use
a block size of 49 with monolith:realloc. This makes audio
blocks line up with video frames, which the default block
size of 64 does not do.
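At a 44.1kHz sampling rate and 60 fps, one frame is
44100 / 60 = 735 samples, and 735 = 15 × 49, so every frame
is exactly 15 blocks. (63 also divides 44100 evenly and may
be worth trying out, but 735 is not a multiple of 63, so at
60 fps it would not line up with frames.)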
(monolith:realloc 8 10 49)
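A quick way to sanity-check a candidate block size is to test whether it evenly divides the number of samples in one video frame (sr / fps). The shell sketch below is plain integer arithmetic and not part of Monolith. The helper name check_blocksize is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not part of Monolith): does a patchwerk block size
# line up with video frames at a given sample rate and frame rate?
# Usage: check_blocksize SR FPS BLOCKSIZE
check_blocksize() {
    sr=$1 fps=$2 bs=$3
    if [ $((sr % fps)) -ne 0 ]; then
        echo "no: $sr is not a multiple of $fps"
    elif [ $((sr / fps % bs)) -eq 0 ]; then
        echo "yes: $((sr / fps)) samples per frame = $((sr / fps / bs)) blocks of $bs"
    else
        echo "no: $((sr / fps)) samples per frame is not a multiple of $bs"
    fi
}

check_blocksize 44100 60 49   # yes: 735 samples per frame = 15 blocks of 49
check_blocksize 44100 60 64   # no: 735 samples per frame is not a multiple of 64
check_blocksize 44100 60 63   # no: 735 samples per frame is not a multiple of 63
check_blocksize 44100 25 63   # yes: 1764 samples per frame = 28 blocks of 63
```

The last line shows that 63 does line up at 25 fps. The right block size therefore depends on the frame rate as well as the sampling rate.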
Graphics are pretty much always done using Janet, which is embedded in Monolith and controlled
from inside Scheme. Janet is initialized with
It is best to have a top-level Janet file to import, and then a top-level Janet function to initialize things with.
(monolith:janet-init)
(monolith:janet-eval "(import singing_square)")
(monolith:janet-eval "(singing_square/gfx-init)")
A h264 video file is opened using the
I tend to prefer a framerate of 60 fps.
(monolith:h264-begin "singing_square.h264" fps)
Next, a Monolith patch is created and set up to render to a wav file. More on this later.
(singing-square)
(wavout zz "singing_square.wav")
(out zz)
Finally, the actual rendering happens. This is done with
the monolith:repeat function, which calls a function a
certain number of times. Each time the function is called,
a new frame is written along with a block of audio that
spans the frame. Multiplying the intended duration in
seconds by the FPS gives the number of frames that need to
be rendered.
(monolith:repeat render-block (* dur fps))
render-block is a Scheme function in charge of rendering
a frame of video and a block of sound. I often put Janet in
charge of rendering the frame block instead of Scheme, so
this function simply evaluates a Janet function with no
arguments.
(define (render-block)
  (monolith:janet-eval "(singing_square/render-block)"))
After rendering, things are wrapped up with
That's the overall structure of the program!
gfx-init is a Janet function, called from Scheme during
initialization. At the very least, it initializes the
framebuffer.
(defn gfx-init []
  (monolith/gfx-fb-init))
After lots of trial and error, I've found that the cleanest approach for creating a frame block is to draw first, then compute the audio block, and then append the frame. This guarantees that something gets drawn on the first frame. It can introduce some AV latency, but I correct for that with some hacks involving delays, which are tolerable.
monolith/compute is used to compute the block. The block
size is sr / fps, where sr is the sampling rate and fps is
the frames per second. In other words, it is the number of
audio samples that make up one frame of video: 44100 / 60 =
735 samples here.
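The Janet code floors sr / fps. For any frame rate that does not divide the sampling rate evenly, a fraction of a sample is therefore dropped every frame, and audio and video slowly drift apart. The awk-in-shell sketch below computes the floored samples per frame and the total shortfall over a render. The helper name frame_drift is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: samples per frame (floored, as render-block does with
# math/floor) and how far the audio falls short of DUR seconds as a result.
# Usage: frame_drift SR FPS DUR
frame_drift() {
    awk -v sr="$1" -v fps="$2" -v dur="$3" 'BEGIN {
        spf = int(sr / fps)                  # int() equals floor for positive values
        frames = dur * fps                   # number of frames rendered
        missing = sr * dur - spf * frames    # samples short of DUR seconds of audio
        printf "%d samples/frame, %d samples short (%.2f ms)\n", spf, missing, 1000 * missing / sr
    }'
}

frame_drift 44100 60 10   # 735 samples/frame, 0 samples short (0.00 ms)
frame_drift 44100 24 10   # 1837 samples/frame, 120 samples short (2.72 ms)
```

At 24 fps the exact value is 1837.5 samples per frame, so half a sample is lost on each of the 240 frames. At 60 fps the division is exact and nothing accumulates.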
It's helpful to have some kind of progress indicator. One approach is to keep track of the frame position and print it every second (every 60 frames, in this case).
(var framepos 0)
(var fps 60)
(var sr 44100)

(defn render-block []
  (draw)
  # progress: print the frame position once per second
  (if (= (% framepos fps) 0) (print framepos))
  (monolith/compute (math/floor (/ sr fps)))
  (monolith/h264-append)
  (set framepos (+ framepos 1)))
The Singing Square
What to draw? How 'bout a nice blue square. The size of the square is modulated by a signal from the audio domain, stored in channel 0.
(defn draw []
  (var allports @[0x32 0x72 0x9c])
  (var blue-romance @[0xd2 0xf9 0xde])
  # LFO value written to channel 0 by the audio patch
  (def scale (monolith/chan-get 0))
  (def size (+ 30 (* 80 scale)))
  # center the square in the framebuffer
  (def cx (/ (monolith/gfx-width) 2))
  (def cy (/ (monolith/gfx-height) 2))
  (def x (math/floor (- cx (/ size 2))))
  (def y (math/floor (- cy (/ size 2))))
  # clear the background, then draw the square
  (monolith/gfx-fill (blue-romance 0) (blue-romance 1) (blue-romance 2))
  (monolith/gfx-rect-fill x y size size (allports 0) (allports 1) (allports 2)))
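To get a feel for the numbers in draw, here is the same arithmetic in awk, wrapped in a hypothetical shell helper. The framebuffer size is an assumption: 640×480 is used only as an example, and the real size comes from monolith/gfx-width and monolith/gfx-height. With the LFO in the range 0 to 1, the side of the square ranges from 30 to 110 pixels.

```shell
#!/bin/sh
# Hypothetical helper mirroring the arithmetic in draw.
# Usage: square_geom SCALE WIDTH HEIGHT   (SCALE is the 0..1 LFO value)
square_geom() {
    awk -v scale="$1" -v w="$2" -v h="$3" '
        function floor(v) { return (v < 0 && v != int(v)) ? int(v) - 1 : int(v) }
        BEGIN {
            size = 30 + 80 * scale          # 30 px at 0, 110 px at 1
            x = floor(w / 2 - size / 2)     # top-left corner, centered horizontally
            y = floor(h / 2 - size / 2)     # and vertically
            printf "x=%d y=%d size=%g\n", x, y, size
        }'
}

# Assumed 640x480 framebuffer: LFO at rest, halfway, and at its peak.
for s in 0 0.5 1; do
    square_geom "$s" 640 480
done
```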
What to squawk? How 'bout a nice filtered square oscillator, whose frequency is modulated by a sinusoidal LFO? A copy of this LFO will be stored in monolith channel 0 to scale the square mentioned previously.
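(define (singing-square)
  ; LFO: 0.2 Hz sine rescaled to the range 0..1
  (biscale (sine 0.2 1) 0 1)
  ; store one copy in channel 0 for the visuals
  (bdup)
  (monset zz 0)
  ; map the LFO to MIDI notes 48-60, plus a 6 Hz vibrato
  (scale zz 48 60)
  (sine 6 0.3)
  (add zz zz)
  (mtof zz)
  (blsquare zz 0.5 0.5)
  ; lowpass at 1 kHz
  (butlp zz 1000)
  <<some-reverb>>
  )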
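The pitch range follows from the patch. This assumes that scale maps the 0..1 LFO onto MIDI notes 48 to 60 and that mtof uses the usual A4 = 440 Hz tuning. Under those assumptions, the square wave sweeps from about 130.81 Hz (C3) to 261.63 Hz (C4), and the 6 Hz vibrato adds up to ±0.3 semitones on top. The shell mtof below is a stand-in for checking those numbers, not the Monolith ugen.

```shell
#!/bin/sh
# Hypothetical stand-in for the mtof ugen: MIDI note to Hz,
# assuming the usual tuning of note 69 (A4) = 440 Hz.
mtof() {
    awk -v m="$1" 'BEGIN { printf "%.2f\n", 440 * 2 ^ ((m - 69) / 12) }'
}

# LFO extremes (notes 48 and 60), then the same notes pushed
# down/up by the 0.3-semitone vibrato.
for note in 48 60 47.7 60.3; do
    echo "$note -> $(mtof "$note") Hz"
done
```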
Oh heck. Let's add some reverb too. Or as John Chowning (allegedly) calls it, "adding some ketchup".
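; two extra copies of the dry signal feed the reverb
(bdup)
(bdup)
(revsc zz zz 0.93 10000)
; keep one reverb channel and bring it down to -20 dB
(bdrop)
(mul zz (ampdb -20))
(dcblock zz)
; mix the reverb back in with the dry signal
(add zz zz)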
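The -20 passed to ampdb is in decibels. Assuming the usual 10^(dB/20) conversion, it scales the reverb to a tenth of its amplitude before it is added back to the dry signal. Below is a shell stand-in for the conversion; it is a hypothetical helper, not Monolith's ampdb.

```shell
#!/bin/sh
# Hypothetical stand-in for ampdb: decibels to linear gain, 10^(dB/20).
ampdb() {
    awk -v db="$1" 'BEGIN { printf "%.4f\n", 10 ^ (db / 20) }'
}

ampdb -20   # 0.1000, the gain applied to the reverb
ampdb -6    # 0.5012, roughly half amplitude
```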
Running and Rendering
So! Those are all the basic parts. The Scheme file can be
run with monolith from inside the
./monolith -l p/monolith.scm singing_square.scm
Two files will be generated,
Encode wav to mp3 with lame:
lame --preset insane singing_square.wav
Then, stitch things into an mp4 file with ffmpeg:
ffmpeg -y -i singing_square.mp3 \
    -i singing_square.h264 \
    -vf format=yuv420p singing_square.mp4
The colorspace is converted manually with
-vf format=yuv420p because Monolith saves to yuv444 by
default. yuv444 is much better for pixel-art style videos
where every pixel counts, but it's not always supported in
video players, so yuv420 is used to maximize portability.
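To see why yuv420 loses detail, compare how many samples each format stores per frame. In planar yuv444, the two chroma planes are at full resolution. In yuv420 each chroma plane is half width and half height, so every 2×2 block of pixels shares one color sample. The hypothetical helper below does the arithmetic for 8-bit frames. The 640×480 size is only an example, not necessarily Monolith's framebuffer size.

```shell
#!/bin/sh
# Hypothetical helper: bytes in one 8-bit planar frame, yuv444 vs yuv420.
# Usage: yuv_frame_bytes WIDTH HEIGHT
yuv_frame_bytes() {
    w=$1 h=$2
    luma=$((w * h))                                 # Y: one sample per pixel in both formats
    chroma444=$((w * h))                            # U and V each at full resolution
    chroma420=$(( (w + 1) / 2 * ((h + 1) / 2) ))    # U and V each at half width and height
    echo "yuv444: $((luma + 2 * chroma444)) bytes, yuv420: $((luma + 2 * chroma420)) bytes"
}

yuv_frame_bytes 640 480   # yuv444: 921600 bytes, yuv420: 460800 bytes
```

Halving the data this way is invisible in most footage. In pixel art, however, a single-pixel color detail gets its color averaged with three neighbors.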