Representing audio data as video data

I doubt I'll find a simple answer, but maybe there are some programmers or scientists here who can help or suggest something.

We are working with a video artist who uses his own neural network algorithms to process and morph visual files: videos and pictures.

We are looking for a way to process audio files with his algorithms as well, so we need to "convert" audio into a "video" format, process it, and convert it back to hear the results.

So I'm wondering: are there methods to do this? I'd appreciate any feedback!

You mean visualization like found in Magix Music Maker and M4L stuff?


I’d imagine the neural net is trained to work on some kind of formatted data, rather than “just a raw stream of bits”… so it makes sense to have a video-like encoding of the audio. Like @Schnork says, a visualisation seems a good idea. Get an old computer and run the audio through Winamp :slight_smile:


No, I mean a representation of audio data in a format which can be processed with special visual morphing algorithms, with the results then converted back to audio.

It's not as easy as simple file conversion or WinAmp : )

this area fascinates me, i’ve always found audio -> image representation lacking.

Are we at the stage where there's a computer powerful enough to track and process all the audio data within a 32-bit floating-point, 192 kHz stream in real time?

(sorry i know that's not what you're asking, but this application of a neural net algorithm sounds intriguing, i wonder how it will "learn" and translate its own processes!?)


Given we can mix and affect dozens of channels of this in realtime, I'd say "yes".


audio mixing sure, but translating into visualisation…?

even tracking 20,000 Hz over (x) dB at x ticks a second…
seems like a tall order, and that's before bringing stereo into the equation…


Which format would that be, I mean which format does the algorithm expect?


The author of Sunvox has some software related to this topic; not sure if it will do exactly what you are asking, but it's kind of interesting :slight_smile:


The idea is not as simple as an audio visualization. Actually, it doesn't even presuppose visualization at all : )

So what he does with visuals is: he analyzes pictures with neural network algorithms, the network finds patterns and learns the "characteristics" of the pictures, and afterwards it's possible to morph between different pictures or adjust their characteristics.

For example, he analyzes women's faces of different ages and men's faces with beards, and afterwards he can generate a new woman's face, tweak her age as needed and bring "beard" elements into it, or morph it into a man, adjust "age", "race" or any other characteristic, and it looks really natural, high-end and scary.

Or he analyzes Van Gogh paintings and Leonardo paintings, the neural network finds patterns, and afterwards it can generate random art which is 40% Leonardo and 60% Van Gogh : )

It’s all super slow and rendering takes dayz : )

So the idea is to take his algorithms and apply them to audio. For example, to make a model of a human voice and morph it into elephant talk, or analyze Autechre and Amanda Lear music and generate completely new music which is 30% Lear, 70% Autechre. And it's not basic mixing; it's generating new material based on neural network analysis.

It's a bit more complicated than that, but I've tried to explain it as simply as possible.

So, the idea is to interpret audio as visual information. It's possible to do so with some spectral analysis. I'm looking into ways to do this, and most of them are super scientific; there's no ready-made tool for it…
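For what it's worth, the spectral route can be sketched in a few lines of plain Python: slice the audio into frames, take the magnitude spectrum of each frame, and scale the result to 8-bit pixel values, giving a tiny time-by-frequency "picture". Everything here (the naive DFT, frame size, test signal) is purely illustrative, not any tool from this thread:

```python
import cmath, math

def dft_mag(frame):
    # magnitude spectrum of one short frame (naive DFT, fine for a sketch)
    N = len(frame)
    return [abs(sum(frame[n] * cmath.exp(-2j * math.pi * k * n / N)
                    for n in range(N))) for k in range(N)]

# toy signal -> frames -> spectrogram -> 8-bit grayscale "image" rows
signal = [math.sin(0.4 * n) for n in range(64)]
frames = [signal[i:i + 16] for i in range(0, 64, 16)]
spec = [dft_mag(f) for f in frames]

peak = max(max(row) for row in spec)
image = [[round(255 * v / peak) for v in row] for row in spec]  # pixels 0-255

print(len(image), len(image[0]))  # 4 16: a tiny time x frequency picture
```

A real implementation would use an FFT and log-scaled magnitudes, but the shape of the idea is the same.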

Yes, I know the author of Sunvox… Also sent him a message with this question : )

He also has a nice free visual synth, ANS. It's in the same direction, but his algorithms are more audio-related: he does spectral analysis, interprets audio as a spectrogram, then applies "blur" and other visual effects to process the spectrogram and plays it back as a kind of "visual" score.
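That "blur the spectrogram" idea can be mocked up directly. This is only a toy sketch (naive DFT, hard-coded test signal, a crude box blur), not the ANS implementation:

```python
import cmath, math

def dft_mag(frame):
    # magnitude spectrum of one frame (naive DFT, fine for a sketch)
    N = len(frame)
    return [abs(sum(frame[n] * cmath.exp(-2j * math.pi * k * n / N)
                    for n in range(N))) for k in range(N)]

# a toy signal split into frames -> a spectrogram (time x frequency "image")
signal = [math.sin(0.5 * n) + 0.5 * math.sin(1.3 * n) for n in range(128)]
frames = [signal[i:i + 16] for i in range(0, 128, 16)]
spec = [dft_mag(f) for f in frames]

def blur(image):
    # 3-tap box blur along the frequency axis: the visual "blur" effect
    # applied directly to the spectrogram, treated as an image
    out = []
    for row in image:
        out.append([(row[max(k - 1, 0)] + row[k] + row[min(k + 1, len(row) - 1)]) / 3
                    for k in range(len(row))])
    return out

blurred = blur(spec)
print(len(blurred), len(blurred[0]))  # 8 16: same image shape, smoothed content
```

(Resynthesizing the blurred spectrogram back to audio would additionally need phase information, which this sketch discards.)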

Here we want to go a bit further… It's quite an experimental field, and it's hard to predict what the results will be… That's why it's interesting.


It is indeed interesting, I’d like to hear the result if you find a good solution!


It works with any visual files - pics / jpegs or video / mpegs.

But if we just "convert" audio into a spectrogram, I guess it won't give us anything useful; or rather, it would just be the basic spectral morphing already available in many forms.

here are some examples of similar techniques:

His algorithms and technique are more advanced, but I don't want to post them here for certain reasons… These examples can give some idea of it, though.

So the question is how to let his algorithms "understand" audio, process it, and convert the result back to sound.

I guess some kind of "neural network audio analysis" is needed… I'm reading about it now; it's mostly tools for analyzing voice and text… hmm… easy to get lost…


You could use something like OpenAI Jukebox, which Broccaloo puts to good use:


Convert the audio via a Fourier transform into its component frequencies over time. If you want to create a video, go ahead, but it would be better to feed the spectral data directly into the neural net.

Reversing the process is a little easier using additive synthesis.

Ask if this isn’t clear to you.
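Here's a minimal sketch of that round trip, using a naive DFT and its inverse. The inverse is exactly additive synthesis: each frequency bin is one sinusoid, and summing them rebuilds the waveform. The signal and sizes are arbitrary choices for illustration:

```python
import cmath, math

def dft(x):
    # forward transform: time-domain samples -> complex frequency bins
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

def resynthesize(X):
    # inverse transform, i.e. additive resynthesis: sum one sinusoid per bin
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)).real / N
            for n in range(N)]

# a 440 Hz test tone at an 8 kHz sample rate
sr = 8000
tone = [math.sin(2 * math.pi * 440 * n / sr) for n in range(64)]

spectrum = dft(tone)            # the spectral data a neural net could consume
restored = resynthesize(spectrum)

error = max(abs(a - b) for a, b in zip(tone, restored))
print(error < 1e-9)  # True: the transform is invertible, nothing is lost
```

In practice you'd use an FFT and process the audio in short overlapping windows, but the invertibility is the point here.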

ADDED: If you'd like some other, more technical ideas for NetEncoders for audio, read this excellent paper from Wolfram:


Thanks for this. Yeah, after spending last night reading things of the same nature, I'm finally starting to see some light : ) for example, I discovered the Fourier transform for the first time : )

Keep in mind that I'm not a programmer and am far from being very technical, though I understand quite well how digital sound and synthesis work.

The problem for me is that no one provides a "ready-made" solution with a user-friendly interface; everything has to be coded.
Another thing is that most uses of audio with neural networks relate to voice recognition and analysis, some "auto" sound synthesis, utilitarian things… I haven't seen examples of using it with "artistic goals" in mind.

So I'll study this text tonight and, if you don't mind, ask you about the things I don't understand. I still believe this field is quite new and has yet to be used by artists.

Yeah, cool! I'll learn how this was done!

It would be interesting to experiment with it in a more "delicate" way… for example, extract some "fundamentals" of a particular kind of ethnic music (like Inuit or Japanese music) and generate MIDI data based on that analysis to create electronic music with our synths.

Again, the lack of ready-made solutions scares me, since I'm not a programmer at all.

Hmm, so I also can't really think of anything else besides the Fourier transform. Or actually getting a spectrogram, rather than just a spectrum, from your audio source.

You kinda want to keep the visualization as lossless as possible, since you want to reconstruct the audio from it at some point.
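To illustrate why losslessness matters: a complex spectrum (magnitude plus phase) reconstructs the waveform exactly, while a magnitude-only picture, which is what a typical spectrogram image keeps, does not. A toy demonstration with a naive DFT (all values here are illustrative):

```python
import cmath, math

def dft(x):
    # time-domain samples -> complex frequency bins
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

def idft(X):
    # complex frequency bins -> time-domain samples
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)).real / N
            for n in range(N)]

signal = [math.sin(0.3 * n) for n in range(32)]
bins = dft(signal)

with_phase = idft(bins)                        # keep complex values: lossless
magnitude_only = idft([abs(b) for b in bins])  # what a plain picture keeps

err_lossless = max(abs(a - b) for a, b in zip(signal, with_phase))
err_lossy = max(abs(a - b) for a, b in zip(signal, magnitude_only))
print(err_lossless < 1e-9, err_lossy > 0.1)  # True True: dropping phase breaks it
```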

Are there no spectrogram makers online that do this? Google spits out a few results, but I haven't tested how reliable they are, of course.

What does your AI person say to this?


Yeah, you don’t even need to use audio as the input data. It’d be much easier, and possibly more effective, to analyse MIDI data based on recordings of the music you mentioned.
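As a toy sketch of that MIDI-analysis idea: learn a first-order Markov model of pitch transitions from a note list, then generate a new line from it. The melody here is hard-coded and hypothetical; real MIDI files would need a parser (e.g. a library like mido):

```python
import random
from collections import defaultdict

# toy "MIDI" analysis: the notes are plain note numbers, hard-coded for the sketch
melody = [60, 62, 64, 62, 60, 64, 65, 64, 62, 60]  # hypothetical source material

transitions = defaultdict(list)
for a, b in zip(melody, melody[1:]):
    transitions[a].append(b)  # first-order Markov model of the melody

random.seed(1)  # fixed seed so the sketch is repeatable
note = melody[0]
generated = [note]
for _ in range(8):
    note = random.choice(transitions[note])  # walk the learned transitions
    generated.append(note)

print(generated)  # a new melody drawn from the learned transition patterns
```

This is obviously far from a neural network, but it's the same "analyze, then generate in the learned style" loop in miniature.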

If you don’t mind starting small, it really isn’t hard to get into music-related programming. For the longest time I avoided it because I thought I couldn’t “get” it, but really, it’s just another set of tools that you learn to manipulate in order to get what you want. AI frameworks are a bit of a different thing that I haven’t touched yet, but a lot of the time these things are more about setting parameters than having to think about any of the low-level stuff (again though, I don’t know anything about AI stuff yet, so I may be wrong).

Plus, it has the added benefit of discoveries along the way potentially being more interesting than your original idea.


He's actually very focused on his algorithms (and for good reason: they are really nice, the visual results are amazing, and he's one of the top visual artists here, having started some 30 years ago…), and he doesn't want to rewrite code or modify his workflow to make it audio-friendly. I believe that's not a simple task; it's more like starting from scratch.

So basically he says: if you want to process audio with my algorithms, you have to provide your data in a visual format; it's the only way. And there's no guarantee that the result will be anything useful.

Anyway, since I started learning about AI and neural networks in relation to audio, I've begun to see that there's a lot more to do than what we're discussing here… So maybe I can find other paths. But the idea is the same: use AI and neural networks to analyze audio and morph between different models, process acoustic sounds, and create sounds and pieces based on analysis of other sounds and music pieces…