AudioVisual works in Monolith

This aims to be a best practices guide/tutorial to making audio-visual works in Monolith. The code described below will generate a square wave in the sound domain, and a square in the visual domain. An LFO signal is used to modulate both the frequency of the square wave and the scaling factor of the square.

Tangled Files

singing_square.scm is the main file. It's Scheme code which controls the show and renders the sound.

<<singing_square.scm>>=
(monolith:load "ugens.scm")

<<singing-square>>
<<render-block>>

(define (render-singing-square fps dur)
    <<enable-offline-mode>>
    <<realloc-blocksize>>
    <<init-janet>>
    <<setup-video>>
    <<setup-sound>>
    <<render>>
    <<finish>>
)

(render-singing-square 60 10)

singing_square.janet is the Janet file used to draw the square.

<<singing_square.janet>>=
<<gfx-init>>
<<draw>>
<<render-frame-block>>

singing_square.sh is a little shell script that runs monolith, converts wav to mp3 via lame, and then generates an mp4 file via ffmpeg.

<<singing_square.sh>>=
<<run-monolith>>
<<mp3-conversion>>
<<ffmpeg>>

How to run

First, tangle up all the code using worgle from the top-level directory:

worgle doc/audiovisual.org

Once rendered, run the generated shell script:

sh singing_square.sh

With any luck, a file called singing_square.mp4 will appear.

Main setup

AV stuff is limited to offline rendering only. This is because there is no way to render video in realtime.

Monolith will render an h264 video file and an audio file (wav). These two are then stitched together into an mp4 file via ffmpeg. The wav file can optionally be converted to an mp3 file via lame, since some video players don't support wav.

In practice, sounds usually get designed in a realtime configuration first, then rendered together with the video later. Video design tends to be more guess-and-wait-and-check.

Offline Mode

In order to set up the renderer, monolith must be started in offline mode. This can be done with monolith:start-offline.

<<enable-offline-mode>>=
(monolith:start-offline)

Block Reallocation

The internal graforge configuration is reallocated to a block size of 49 with monolith:realloc. This is done to make blocks line up with frames better, as the default size of 64 does not. 49 divides the samples up evenly when the sampling rate is 44.1kHz (63 also works; may want to try that out).

<<realloc-blocksize>>=
(monolith:realloc 8 10 49)
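
To see why 49 lines up nicely, here is a quick sanity check of the arithmetic (illustrative only, not part of the tangle), assuming 60 fps and a sampling rate of 44.1kHz:

; one frame of video spans 44100 / 60 = 735 samples
(display (/ 44100 60)) (newline)
; a block size of 49 splits a frame into exactly 15 blocks
(display (/ 735 49)) (newline)
; the default block size of 64 leaves a remainder, so blocks drift against frames
(display (modulo 735 64)) (newline)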

Janet Setup

Graphics are pretty much always done using Janet, which is embedded in Monolith, and controlled from inside of Scheme. Janet is initialized with (monolith:janet-init).

It is best to have a top-level Janet file to import, and then a top-level Janet function to initialize stuff with.

<<init-janet>>=
(monolith:janet-init)
(monolith:janet-eval "(import singing_square)")
(monolith:janet-eval "(singing_square/gfx-init)")

h264 setup

An h264 video file is opened using monolith:h264-begin. I tend to prefer a framerate of 60 fps.

<<setup-video>>=
(monolith:h264-begin "singing_square.h264" fps)

Patch Setup

Now a monolith patch is created and set up to render to a wav file. More on this later.

<<setup-sound>>=
(singing-square)
(wavout zz "singing_square.wav")
(out zz)

Rendering

Finally, the actual rendering happens. This is done using the monolith:repeat function, which calls a function a certain number of times. Each time the function is called, a new frame is written along with the block of audio that encompasses that frame. Multiplying the intended duration in seconds by the FPS gives the number of frames that need to be rendered.

<<render>>=
(monolith:repeat render-block (* dur fps))

render-block is a Scheme function in charge of rendering a frame of video and a block of sound. I often put Janet in charge of rendering the frame block instead of Scheme, so this function simply evaluates a Janet function with no arguments.

<<render-block>>=
(define (render-block)
  (monolith:janet-eval "(singing_square/render-block)"))

Finishing up

After rendering, things are wrapped up with monolith:h264-end.

<<finish>>=
(monolith:h264-end)

That's the overall structure of the program!

Janet Stuff

gfx-init is the Janet function called from the Scheme setup above. At the very least, it initializes the framebuffer.

<<gfx-init>>=
(defn gfx-init []
    (monolith/gfx-fb-init))

After lots of trial and error, I've found that the cleanest approach for creating a frame block is to draw, then compute the block, and then append. This is the best approach because it guarantees that something gets drawn on the first frame. Some AV latency issues may occur because of this, but they can be corrected with some tolerable delay hacks.

monolith/compute is used to compute the block of audio. Its size is determined by sr / fps, where sr is the sampling rate and fps is the frames per second. In other words, this is the number of audio samples needed to cover one frame of video.

It's helpful to have some kind of progress indicator. One approach is to keep track of the frame position and print it once every second (every 60 frames, in this case).

<<render-frame-block>>=
(var framepos 0)
(var fps 60)
(var sr 44100)
(defn render-block []
    (draw)
    (if (= (% framepos fps) 0) (print framepos))
    (monolith/compute (math/floor (/ sr fps)))
    (monolith/h264-append)
    (set framepos (+ framepos 1)))

The Singing Square

Visuals

What to draw? How 'bout a nice blue square. The scaling of the square can be modulated by a signal from the audio domain, stored in monolith channel 0.

<<draw>>=
(defn draw []
  # square and background colors (RGB)
  (var allports @[0x32 0x72 0x9c])
  (var blue-romance @[0xd2 0xf9 0xde])
  # LFO signal written to channel 0 by the audio patch, in the range 0-1
  (def scale (monolith/chan-get 0))

  # square size sweeps from 30 to 110 pixels, centered on screen
  (def size (+ 30 (* 80 scale)))
  (def cx (/ (monolith/gfx-width) 2))
  (def cy (/ (monolith/gfx-height) 2))

  (def x (math/floor (- cx (/ size 2))))
  (def y (math/floor (- cy (/ size 2))))

  # clear the frame with the background color
  (monolith/gfx-fill
   (blue-romance 0)
   (blue-romance 1)
   (blue-romance 2))

  # draw the square in the foreground color
  (monolith/gfx-rect-fill x y size size
   (allports 0)
   (allports 1)
   (allports 2)))

Sound

What to squawk? How 'bout a nice filtered square oscillator, whose frequency is modulated by a sinusoidal LFO? A copy of this LFO will be stored in monolith channel 0 to scale the square mentioned previously.

<<singing-square>>=
(define (singing-square)
    ; LFO: a 0.2Hz sine, rescaled from bipolar to the range 0-1
    (biscale (sine 0.2 1) 0 1)
    (bdup)
    ; store a copy of the LFO in monolith channel 0 for the visuals
    (monset zz 0)
    ; map the LFO onto a MIDI note range of 48-60
    (scale zz 48 60)
    ; add a little 6Hz vibrato
    (sine 6 0.3)
    (add zz zz)
    ; convert MIDI note to frequency, and drive a filtered square oscillator
    (mtof zz)
    (blsquare zz 0.5 0.5)
    (butlp zz 1000)
    <<some-reverb>>
)

Oh heck. Let's add some reverb too. Or as John Chowning (allegedly) calls it, "adding some ketchup".

<<some-reverb>>=
; keep a dry copy, and feed two copies into the stereo reverb
(bdup)
(bdup)
(revsc zz zz 0.93 10000)
; keep only one of the reverb's stereo outputs, turned down by 20dB
(bdrop)
(mul zz (ampdb -20))
(dcblock zz)
; mix the reverb back in with the dry signal
(add zz zz)
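
As an aside, ampdb converts decibels to a linear amplitude (in the usual Csound sense), so (ampdb -20) should bring the reverb return down to about a tenth of its level. A quick check of that arithmetic (illustrative only, not part of the tangle):

; -20dB as a linear gain: 10^(-20/20), which is 0.1
(display (expt 10.0 (/ -20 20))) (newline)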

Running and Rendering

So! That's all the basic parts. The Scheme file can be rendered with monolith from inside the doc directory with:

<<run-monolith>>=
./monolith -l p/monolith.scm singing_square.scm

Two files will be generated, singing_square.h264 and singing_square.wav.

Encode wav to mp3 with lame:

<<mp3-conversion>>=
lame --preset insane singing_square.wav

Then, stitch things into an mp4 file with ffmpeg:

<<ffmpeg>>=
ffmpeg -y -i singing_square.mp3 \
-i singing_square.h264 \
-vf format=yuv420p singing_square.mp4

The colorspace is manually converted to yuv420 because monolith saves to yuv444 by default. yuv444 is much better for pixel-art style videos where every pixel counts, but it's not always supported by video players. yuv420 is used to maximize portability.