
Software (Classic/AI-based) to Translate Audio to Image

Raevenlord

Hey guys. I'd like to use the brains of TPU's Forums for a personal project.

The basic idea is this: I write poetry and regularly go to poetry readings. I found myself thinking about how every reading of a poem is different, even when the text stays the same. That comes from differences in tone, highs, lows, pauses, in-breaths, and all the usual speech-related details. Even more so, obviously, if the reader is different.

This got me thinking: is there a way I could use a recording/an audio file to generate an image? The diagram is simple.

Record > Output audio file > Import audio file to an application > the application reads the audio, analyzes frequencies, and uses that to generate an image based on a preset algorithm (if it allows for variable changes when it comes to the image's creation, that would be best) > output. Of course, one could also just analyze the audio and export it in some automated way into text, which is then used as the input for a program that creates the image from it.
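As a rough illustration of the pipeline above (audio file in, analysis, image out), here is a minimal Python sketch that uses only the standard library. It is one possible reading of the idea, not an existing tool. It assumes a 16-bit PCM WAV file, measures loudness and zero-crossing rate per frame (a crude stand-in for real frequency analysis), and paints one coloured column per frame. The function names, the colour mapping, and the file name `reading.wav` are all made up for the example.

```python
import math
import struct
import wave


def read_mono_samples(path):
    """Read a 16-bit PCM WAV file and return (samples in -1..1, sample_rate)."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM WAV files")
        channels = wav.getnchannels()
        rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())
    values = struct.unpack("<%dh" % (len(raw) // 2), raw)
    # Keep only the first channel if the file is stereo.
    return [v / 32768.0 for v in values[::channels]], rate


def frame_features(samples, frame_size=1024):
    """Return one (loudness, zero_crossing_rate) pair per frame."""
    features = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        rms = math.sqrt(sum(s * s for s in frame) / frame_size)
        crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
        features.append((rms, crossings / (frame_size - 1)))
    return features


def features_to_pixels(features, height=64):
    """Paint one column per frame: bar height = loudness, colour = zero-crossing rate."""
    peak = max((rms for rms, _ in features), default=0.0) or 1.0
    rows = [[(0, 0, 0)] * len(features) for _ in range(height)]
    for x, (rms, zcr) in enumerate(features):
        bar = round(rms / peak * height)
        warmth = min(1.0, zcr * 4)  # arbitrary scaling, chosen only for this sketch
        colour = (round(255 * warmth), 80, round(255 * (1 - warmth)))
        for y in range(height - bar, height):
            rows[y][x] = colour
    return rows


def write_ppm(path, rows):
    """Save pixel rows as a binary PPM image."""
    with open(path, "wb") as out:
        out.write(b"P6\n%d %d\n255\n" % (len(rows[0]), len(rows)))
        for row in rows:
            out.write(bytes(channel for pixel in row for channel in pixel))
```

To try it, something like `samples, rate = read_mono_samples("reading.wav")` followed by `write_ppm("reading.ppm", features_to_pixels(frame_features(samples)))` should produce an image file. Changing `frame_size`, `height`, or the colour formula is where the "variable changes" to the image would come in.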

I know there are some random fractal generators that create images based on a pretty basic algorithm. I'm thinking something like that, but not fractal-based. I don't know what - if anything - is out there in this regard.

I looked a bit into AI-based GANs and ML-based animation tools such as GANBreeder, but I haven't found anything that can generate an image from an audio input.

I liked the concept from this music visualization work, and this random walkers tutorial which seems like it could create something I like. I also really enjoyed the creations from this generative art tutorial.
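For the random-walker idea, a hedged sketch of how audio could steer the drawing: the walker picks a random direction for each audio frame and travels further when the frame is louder. This is not the method from the linked tutorial, just one simple way to combine the two ideas. It expects a list of loudness values between 0 and 1, however they were measured, and the names `loudness_walk` and `write_pgm` are invented for the example.

```python
import random


def loudness_walk(loudness, size=200, max_step=6, seed=0):
    """Trace a random walk whose step length follows a loudness curve (values 0..1)."""
    rng = random.Random(seed)  # fixed seed: the same input gives the same picture
    grid = [[0] * size for _ in range(size)]
    x = y = size // 2
    for level in loudness:
        dx, dy = rng.choice([(1, 0), (-1, 0), (0, 1), (0, -1)])
        for _ in range(1 + round(level * max_step)):
            # Wrap around the edges so the walker never leaves the canvas.
            x = (x + dx) % size
            y = (y + dy) % size
            grid[y][x] += 1
    return grid


def write_pgm(path, grid):
    """Save visit counts as a greyscale PGM image (brighter = visited more often)."""
    peak = max(max(row) for row in grid) or 1
    with open(path, "wb") as out:
        out.write(b"P5\n%d %d\n255\n" % (len(grid[0]), len(grid)))
        for row in grid:
            out.write(bytes(255 * v // peak for v in row))
```

Because the seed is fixed, two different readings of the same poem differ only through their loudness curves, so the differences between the images come from the voice rather than from chance.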

I feel like this post is somehow all over the place, but I hope you get the gist of it. Sometimes it's hard to focus with all these ideas swirling around, so, thanks for reading.

1623754742739.png


TL;DR: I want to create images, perhaps like the one on top, from audio files. How?

PS: I have zero coding skills, so I'd be looking at mixing and matching different apps, if needed, to achieve the desired effect.
 
I guess simple audio waveforms aren't what you're looking for?
Spectrograms?
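For reference, a spectrogram is a picture with time running left to right, frequency bottom to top, and brightness showing how strong each frequency is at each moment. The sketch below is a minimal, dependency-free way to compute one from a list of samples in the range -1..1. It uses a naive discrete Fourier transform, so it is only practical for short clips; a numerical library or an audio editor's built-in spectrogram view would be much faster. The function names and default sizes are assumptions made for the example.

```python
import cmath
import math


def spectrogram(samples, frame_size=256, bins=64):
    """Return one list of magnitudes per frame, lowest frequency first."""
    # A Hann window softens the frame edges so the spectrum smears less.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_size - 1))
              for n in range(frame_size)]
    frames = []
    # Frames overlap by half so that short sounds are not missed.
    for start in range(0, len(samples) - frame_size + 1, frame_size // 2):
        frame = [s * w for s, w in zip(samples[start:start + frame_size], window)]
        # Naive discrete Fourier transform: slow but needs no extra libraries.
        frames.append([
            abs(sum(v * cmath.exp(-2j * math.pi * k * n / frame_size)
                    for n, v in enumerate(frame)))
            for k in range(bins)
        ])
    return frames


def spectrogram_rows(frames):
    """Turn magnitudes into greyscale rows (0..255), low frequencies at the bottom."""
    peak = max((m for frame in frames for m in frame), default=0.0) or 1.0
    bins = len(frames[0]) if frames else 0
    return [
        # Logarithmic scaling keeps quiet details visible next to loud ones.
        [round(255 * math.log1p(9 * frame[k] / peak) / math.log(10)) for frame in frames]
        for k in reversed(range(bins))
    ]
```

Each row returned by `spectrogram_rows` is one line of pixels, so the result can be saved with any greyscale image writer or used as input for further stylisation.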
 
I ran across a program ages ago, called Mathematica. Not sure if this fits what you need.

I also found this video which talks a little bit about the program, though I didn't watch the whole thing.
 
I guess simple audio waveforms aren't what you're looking for?
Spectrograms?
Hey, kayjay. Audio waveforms could be what I'm looking for, though I haven't found a program that generates them in a style I like. Could I then run the waveforms through a program (maybe a fractal generator?) that does something with them?


I ran across a program ages ago, called Mathematica. Not sure if this fits what you need.

I also found this video which talks a little bit about the program, though I didn't watch the whole thing.

Hey. It seems interesting, will look into it some more. That's the basis of today's AI art, I guess. It does seem to require coding skills, of which I have, well, zero.
 
Hey, kayjay. Audio waveforms could be what I'm looking for, though I haven't found a program that generates them in a style I like. Could I then run the waveforms through a program (maybe a fractal generator?) that does something with them?
Like you, I have very little experience in coding or programming, so I'm purely speculating here, but I would imagine Audacity could export the waveform in some image-based format that a fractal-generating algorithm could use.
 