Software (Classic/AI-based) to Translate Audio to Image

Raevenlord · Jun 15, 2021

Hey guys. I'd like to use the brains of TPU's Forums for a personal project.

The basic idea is this: I write poetry, and I usually go to poetry readings and all of that. I found myself thinking about how every reading of a poem is different, even if everything else stays the same. This comes from differences in tone, highs, lows, pauses, in-breaths, and all of those usual speech-related details. Even more so, obviously, if the reader is different.

This got me thinking: is there a way I could use a recording/an audio file to generate an image? The diagram is simple.

Record > Output audio file > Import audio file to an application > the application reads the audio, analyzes frequencies, and uses that to generate an image based on a preset algorithm (if it allows for variable changes when it comes to the image's creation, that would be best) > output. Of course, one could also just analyze the audio and export it in some automated way into text, which is then used as the input for a program that creates the image from it.
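For anyone who does want to try this in code, the chain above (audio file > analysis > image from a preset rule) can be sketched with nothing but Python's standard library. This is a minimal sketch under assumptions, not an existing tool: the function names, the 16-bit mono/stereo WAV input, and the colour rule (loudness drives red, zero-crossing rate stands in as a rough "brightness of the voice" measure and drives blue) are all made up for illustration. The `gain` parameter is the kind of variable one could tweak to change the result.

```python
import math
import struct
import wave


def read_wav_mono(path):
    """Read a 16-bit PCM WAV file; return (samples scaled to -1..1, sample rate)."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        channels = wf.getnchannels()
        rate = wf.getframerate()
        raw = wf.readframes(wf.getnframes())
    values = struct.unpack("<%dh" % (len(raw) // 2), raw)
    # Keep only the first channel if the file is stereo.
    return [v / 32768.0 for v in values[::channels]], rate


def frame_features(samples, frame_size=1024):
    """Cut the audio into frames; return (loudness, zero-crossing rate) per frame."""
    features = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        rms = math.sqrt(sum(s * s for s in frame) / frame_size)
        crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
        features.append((rms, crossings / (frame_size - 1)))
    return features


def features_to_pixels(features, gain=4.0):
    """Hypothetical colour rule: loudness -> red, zero-crossing rate -> blue."""
    pixels = []
    for rms, zcr in features:
        red = min(255, int(rms * gain * 255))
        blue = min(255, int(zcr * gain * 255))
        pixels.append((red, (red + blue) // 2, blue))
    return pixels


def write_ppm(path, pixels, width=64):
    """Write pixels row by row as a plain-text PPM image, padding the last row."""
    height = max(1, math.ceil(len(pixels) / width))
    padded = pixels + [(0, 0, 0)] * (width * height - len(pixels))
    with open(path, "w") as f:
        f.write("P3\n%d %d\n255\n" % (width, height))
        for r, g, b in padded:
            f.write("%d %d %d\n" % (r, g, b))


# Example use (file names are placeholders):
# samples, rate = read_wav_mono("reading.wav")
# write_ppm("reading.ppm", features_to_pixels(frame_features(samples)))
```

PPM is used here only because it needs no extra libraries; most image viewers and editors (GIMP, IrfanView) can open it and convert it to PNG.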

I know there are some random fractal generators that create images based on a pretty basic algorithm. I'm thinking something like that, but not fractal-based. I don't know what - if anything - is out there in this regard.

I looked a bit into some AI-based GANs and all of that, ML-based animations, such as GANBreeder, but I haven't found anything that can generate an image output from an audio input.

I liked the concept from this music visualization work, and this random walkers tutorial which seems like it could create something I like. I also really enjoyed the creations from this generative art tutorial.

I feel like this post is somehow all over the place, but I hope you get the gist of it. Sometimes it's hard to focus with all these ideas swirling around, so, thanks for reading.

TL;DR: I want to create images, perhaps like the one on top, from audio files. How?

PS: I have zero coding skills, so I'd be looking at mix and matching different apps, if needed, to achieve the desired effect.

kayjay010101 · Jun 15, 2021

I guess simple audio waveforms aren't what you're looking for?
Spectrograms?
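For reference, a spectrogram is just the audio cut into short overlapping slices, with the strength of each frequency in each slice drawn as a pixel. A rough sketch of that idea in plain Python follows; it uses a slow, naive DFT instead of a proper FFT library, and the function names and default sizes are assumptions chosen for illustration.

```python
import cmath
import math


def spectrogram(samples, frame_size=256, hop=128, bins=64):
    """Return one column per time slice; each column lists `bins` magnitudes,
    ordered from low to high frequency. `bins` should not exceed frame_size // 2."""
    # Hann window softens the edges of each slice to reduce smearing.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_size - 1))
              for n in range(frame_size)]
    columns = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_size)]
        column = []
        for k in range(bins):
            # Naive DFT: correlate the slice with a wave of frequency bin k.
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                      for n in range(frame_size))
            column.append(abs(acc))
        columns.append(column)
    return columns


def to_pgm(columns):
    """Render as plain-text greyscale PGM: time runs left to right,
    low frequencies sit at the bottom, brighter means stronger."""
    peak = max((m for col in columns for m in col), default=0.0) or 1.0
    height = len(columns[0]) if columns else 0
    lines = ["P2", "%d %d" % (len(columns), height), "255"]
    for row in range(height - 1, -1, -1):
        lines.append(" ".join(str(int(255 * col[row] / peak)) for col in columns))
    return "\n".join(lines) + "\n"
```

Bin `k` corresponds to a frequency of `k * sample_rate / frame_size`, so with the defaults and an 8 kHz recording the image covers 0 to 2 kHz, which is where most of a speaking voice sits.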

DeathtoGnomes · Jun 15, 2021

I ran across a program ages ago, called Mathematica. Not sure if this fits what you need.

I also found this video, which talks a little bit about the program. I didn't watch the whole thing.

Raevenlord · Jun 15, 2021

kayjay010101 said:
I guess simple audio waveforms aren't what you're looking for?
Spectrograms?

Hey, kayjay. Audio waveforms could be what I'm looking for, though I haven't found a program that generates them in a style I like. Maybe I could then pass the waveforms through another program (a fractal generator?) that does something with them?

DeathtoGnomes said:
I ran across a program ages ago, called Mathematica. Not sure if this fits what you need.

I also found this video, which talks a little bit about the program. I didn't watch the whole thing.

Hey. It seems interesting, will look into it some more. That's the basis of today's AI art, I guess. It does seem to require coding skills, of which I have, well, zero.

kayjay010101 · Jun 15, 2021

Raevenlord said:
Hey, kayjay. Audio waveforms could be what I'm looking for, though I haven't found a program that generates them in a style I like. Maybe I could then pass the waveforms through another program (a fractal generator?) that does something with them?

Like you, I have very little experience in coding or programming, so I'm purely speculating here, but I would imagine Audacity could export the waveform in some image-based format that a fractal-generating algorithm could then use.
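An alternative to going through an exported picture is to feed the waveform's numbers straight into a drawing routine. The sketch below is an assumption-heavy illustration, not a fractal generator: it takes samples already loaded as floats between -1 and 1, reduces them to a loudness outline, and lets a "walker" (in the spirit of the random walkers tutorial mentioned in the first post) turn and stride according to that outline. The names `envelope`, `walk`, `turn`, and `stride` are hypothetical, and the walk is fully determined by the audio, so two different readings of the same poem would trace two different paths.

```python
import math


def envelope(samples, block=256):
    """Peak absolute amplitude per block: a coarse outline of the waveform."""
    return [max(abs(s) for s in samples[i:i + block])
            for i in range(0, len(samples), block)]


def walk(levels, size=128, turn=2.4, stride=6.0):
    """Trace a path on a size x size grid; louder blocks turn more and travel
    further. Returns the grid of brightness values (0..255)."""
    grid = [[0] * size for _ in range(size)]
    x = y = size / 2.0
    heading = 0.0
    for level in levels:
        heading += turn * level
        length = 1.0 + stride * level
        steps = int(math.ceil(length))
        for _ in range(steps):
            # Wrap around the edges so the path never leaves the canvas.
            x = (x + math.cos(heading) * length / steps) % size
            y = (y + math.sin(heading) * length / steps) % size
            row, col = int(y) % size, int(x) % size
            grid[row][col] = min(255, grid[row][col] + 40)
    return grid


def grid_to_pgm(grid):
    """Serialise the grid as a plain-text greyscale PGM image."""
    header = "P2\n%d %d\n255\n" % (len(grid[0]), len(grid))
    return header + "\n".join(" ".join(map(str, row)) for row in grid) + "\n"
```

Changing `turn` and `stride` changes the character of the drawing without touching the audio, which gives the adjustable "preset algorithm" asked for in the first post.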
