Software (Classic/AI-based) to Translate Audio to Image

Raevenlord

News Editor
Joined
Aug 12, 2016
Messages
3,755 (1.33/day)
Location
Portugal
System Name The Ryzening
Processor AMD Ryzen 9 5900X
Motherboard MSI X570 MAG TOMAHAWK
Cooling Lian Li Galahad 360mm AIO
Memory 32 GB G.Skill Trident Z F4-3733 (4x 8 GB)
Video Card(s) Gigabyte RTX 3070 Ti
Storage Boot: Transcend MTE220S 2TB, Kingston A2000 1TB, Seagate Firewolf Pro 14 TB
Display(s) Acer Nitro VG270UP (1440p 144 Hz IPS)
Case Lian Li O11DX Dynamic White
Audio Device(s) iFi Audio Zen DAC
Power Supply Seasonic Focus+ 750 W
Mouse Cooler Master Masterkeys Lite L
Keyboard Cooler Master Masterkeys Lite L
Software Windows 10 x64
Hey guys. I'd like to use the brains of TPU's Forums for a personal project.

The basic idea is this: I write poetry and regularly go to poetry readings. I found myself thinking about how every reading of a poem is different, even when the text stays the same. This comes from differences in tone, highs, lows, pauses, in-breaths, and all the other usual details of speech. Even more so, obviously, if the reader is different.

This got me thinking: is there a way I could use a recording/an audio file to generate an image? The workflow is simple.

Record > output an audio file > import the audio file into an application > the application reads the audio, analyzes its frequencies, and uses them to generate an image based on a preset algorithm (ideally one that lets me change variables in how the image is created) > output. Of course, one could also just analyze the audio and export the results in some automated way as text, which then becomes the input for a program that creates the image.
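For anyone who does want to try a bit of code, the chain above (audio file > frequency analysis > image) can be sketched in Python with nothing but the standard library. This is only a minimal sketch under several assumptions: the input is a 16-bit PCM WAV file, the function names (`read_mono_samples`, `bin_strength`, `samples_to_pixels`, `write_ppm`) are made up for illustration, and the mapping (loudness sets bar height, three frequency bins set red/green/blue) is an arbitrary choice, not an established algorithm.

```python
import cmath
import math
import struct
import wave


def read_mono_samples(path):
    """Read a 16-bit PCM WAV file; return (sample_rate, samples scaled to -1..1)."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        channels = wav.getnchannels()
        rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())
    values = struct.unpack("<%dh" % (len(raw) // 2), raw)
    # Samples are interleaved; keep only the first channel of a stereo file.
    return rate, [v / 32768.0 for v in values[::channels]]


def bin_strength(frame, k):
    """Magnitude of DFT bin k for one frame (naive DFT, fine for a few bins)."""
    n = len(frame)
    total = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
    return abs(total) / n


def samples_to_pixels(samples, width=64, height=32, frame_len=256):
    """One column per slice of audio: bar height = loudness, colour = 3 bins."""
    step = max(1, len(samples) // width)
    rows = [[(0, 0, 0)] * width for _ in range(height)]
    for x in range(width):
        # Slices overlap when step < frame_len and leave gaps when it is larger.
        frame = samples[x * step : x * step + frame_len]
        if len(frame) < 8:
            continue  # not enough audio left for this column
        n = len(frame)
        rms = math.sqrt(sum(s * s for s in frame) / n)
        # Arbitrary choice: a low, a middle and a high bin drive R, G and B.
        colour = tuple(
            min(255, int(bin_strength(frame, k) * 2 * 255))
            for k in (max(1, n // 64), n // 16, n // 4)
        )
        bar = min(height, int(rms * height * 2))
        for y in range(height - bar, height):
            rows[y][x] = colour
    return rows


def write_ppm(path, rows):
    """Save rows of (r, g, b) tuples as a binary PPM image."""
    with open(path, "wb") as f:
        f.write(b"P6\n%d %d\n255\n" % (len(rows[0]), len(rows)))
        for row in rows:
            for pixel in row:
                f.write(bytes(pixel))
```

Usage would be something like `rate, samples = read_mono_samples("reading.wav")` followed by `write_ppm("reading.ppm", samples_to_pixels(samples))`, where `reading.wav` is a placeholder file name. The variables one could play with are the image size, the three bins, and the scaling factors.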

I know there are some random fractal generators that create images based on a pretty basic algorithm. I'm thinking something like that, but not fractal-based. I don't know what - if anything - is out there in this regard.

I looked a bit into AI-based GANs and ML-based animations, such as GANBreeder, but I haven't found anything that can generate an image from an audio input.

I liked the concept behind this music visualization work, and this random walkers tutorial, which seems like it could create something I'd like. I also really enjoyed the creations from this generative art tutorial.
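To make the random walker idea concrete, here is a rough Python sketch of a walker whose stride depends on how loud the recording is at each moment. It is not the method from the tutorial mentioned above, just one possible reading of the concept; the name `audio_walk`, the grid size, the chunk length and the "loudness times 4" stride rule are all assumptions picked for illustration. The samples are expected as a list of numbers between -1 and 1.

```python
import math
import random


def audio_walk(samples, size=64, chunk=200, seed=1):
    """Walk over a size x size grid; louder chunks of audio take longer strides."""
    rng = random.Random(seed)  # fixed seed: the same recording gives the same image
    grid = [[0] * size for _ in range(size)]
    x = y = size // 2  # start in the middle
    for start in range(0, len(samples) - chunk + 1, chunk):
        part = samples[start : start + chunk]
        loudness = math.sqrt(sum(s * s for s in part) / chunk)
        stride = 1 + int(loudness * 4)  # silence still moves one cell
        dx, dy = rng.choice([(1, 0), (-1, 0), (0, 1), (0, -1)])
        for _ in range(stride):
            x = (x + dx) % size  # wrap around the edges
            y = (y + dy) % size
            grid[y][x] += 1  # count visits; more visits = brighter pixel
    return grid
```

The returned grid holds visit counts, so it still has to be scaled to grey levels or colours before saving. Pauses in a reading show up as tight knots, loud passages as long straight runs.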

I feel like this post is somehow all over the place, but I hope you get the gist of it. Sometimes it's hard to focus with all these ideas swirling around, so, thanks for reading.

1623754742739.png


TL;DR: I want to create images, perhaps like the one above, from audio files. How?

PS: I have zero coding skills, so I'd be looking at mixing and matching different apps, if needed, to achieve the desired effect.
Joined
May 28, 2020
Messages
752 (0.53/day)
System Name Main PC
Processor AMD Ryzen 9 5950X
Motherboard ASUS X570 Crosshair VIII Hero (Wi-Fi)
Cooling EKWB X570 VIII Hero Monoblock, 2x XD5, Heatkiller IV SB block for chipset,Alphacool 3090 Strix block
Memory 4x16GB 3200-14-14-14-34 G.Skill Trident RGB (OC: 3600-14-14-14-28)
Video Card(s) ASUS RTX 3090 Strix OC
Storage 500GB+500GB SSD RAID0, Fusion IoDrive2 1.2TB, Huawei HSSD 2TB, 11TB on server used for steam
Display(s) Dell LG CX48 (custom res: 3840x1620@120Hz) + Acer XB271HU 2560x1440@144Hz
Case Corsair 1000D
Audio Device(s) Sennheiser HD599, Blue Yeti
Power Supply Corsair RM1000i
Mouse Logitech G502 Lightspeed
Keyboard Corsair Strafe RGB MK2
Software Windows 10 Pro 20H2
I guess simple audio waveforms aren't what you're looking for?
Spectrograms?
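For reference, a spectrogram is just a grid: time runs left to right, frequency bottom to top, and brightness shows how strong each frequency is at that moment. A minimal Python sketch, assuming the audio is already loaded as a list of numbers between -1 and 1 and holds at least one full frame; the names `spectrogram` and `to_pgm` are invented here, and a naive DFT is used instead of a proper FFT to keep it short.

```python
import cmath
import math


def spectrogram(samples, frame_len=64):
    """Return a grid [frequency bin][time frame] of DFT magnitudes."""
    bins = frame_len // 2
    # The DFT factors are the same for every frame, so compute them once.
    twiddle = [
        [cmath.exp(-2j * math.pi * k * t / frame_len) for t in range(frame_len)]
        for k in range(bins)
    ]
    grid = [[] for _ in range(bins)]
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start : start + frame_len]
        for k in range(bins):
            total = sum(s * w for s, w in zip(frame, twiddle[k]))
            grid[k].append(abs(total) / frame_len)
    return grid


def to_pgm(grid):
    """Render the grid as a plain-text PGM image, low frequencies at the bottom."""
    peak = max(max(row) for row in grid) or 1.0  # avoid dividing by zero on silence
    lines = ["P2", "%d %d" % (len(grid[0]), len(grid)), "255"]
    for row in reversed(grid):
        lines.append(" ".join(str(int(255 * v / peak)) for v in row))
    return "\n".join(lines) + "\n"
```

Writing the result of `to_pgm(spectrogram(samples))` to a `.pgm` file gives a greyscale image that image editors such as GIMP can open and restyle.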
 
Joined
Jul 16, 2014
Messages
8,119 (2.27/day)
Location
SE Michigan
System Name Dumbass
Processor AMD Ryzen 7800X3D
Motherboard ASUS TUF gaming B650
Cooling Arctic Liquid Freezer 2 - 420mm
Memory G.Skill Sniper 32gb DDR5 6000
Video Card(s) GreenTeam 4070 ti super 16gb
Storage Samsung EVO 500gb & 1Tb, 2tb HDD, 500gb WD Black
Display(s) 1x Nixeus NX_EDG27, 2x Dell S2440L (16:9)
Case Phanteks Enthoo Primo w/8 140mm SP Fans
Audio Device(s) onboard (realtek?) - SPKRS:Logitech Z623 200w 2.1
Power Supply Corsair HX1000i
Mouse SteelSeries Esports Wireless
Keyboard Corsair K100
Software windows 10 H
Benchmark Scores https://i.imgur.com/aoz3vWY.jpg?2
I ran across a program ages ago called Mathematica. Not sure if this fits what you need.

I also found this video, which talks a little bit about the program. I didn't watch the whole thing.
 

Raevenlord

News Editor
Joined
Aug 12, 2016
Messages
3,755 (1.33/day)
Location
Portugal
I guess simple audio waveforms aren't what you're looking for?
Spectrograms?
Hey, kayjay. Audio waveforms could be what I'm looking for (I haven't found a program that generates them in a style I like, though). Could I then run the waveforms through a program (maybe a fractal generator?) that does something with them?


I ran across a program ages ago called Mathematica. Not sure if this fits what you need.

I also found this video, which talks a little bit about the program. I didn't watch the whole thing.

Hey. It seems interesting, will look into it some more. That's the basis of today's AI art, I guess. It does seem to require coding skills, of which I have, well, zero.
 
Joined
May 28, 2020
Messages
752 (0.53/day)
Hey, kayjay. Audio waveforms could be what I'm looking for (I haven't found a program that generates them in a style I like, though). Could I then run the waveforms through a program (maybe a fractal generator?) that does something with them?
Like you, I have very little experience in coding or programming, so I'm purely speculating here, but I would imagine Audacity could export the waveform in some image-based format that a fractal-generating algorithm could then use.
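Purely as an illustration of what "feeding a waveform to a fractal generator" could mean, here is a small Python sketch that boils a recording down to two numbers (loudness and zero-crossing rate) and uses them to pick the constant of a Julia set. This says nothing about what Audacity itself can export; the samples are assumed to be a list of numbers between -1 and 1 obtained some other way, and the names `julia_constant` and `julia_rows` as well as the mapping onto a circle of radius 0.6 to 0.8 are arbitrary choices made up for this example.

```python
import math


def julia_constant(samples):
    """Map two simple waveform features to a complex constant c (arbitrary mapping)."""
    n = len(samples)
    rms = math.sqrt(sum(s * s for s in samples) / n)
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0))
    rate = crossings / n  # between 0 and 1, a rough stand-in for pitch
    # Angle comes from the crossing rate, radius (0.6 to 0.8) from the loudness.
    radius = 0.6 + 0.2 * min(1.0, rms)
    angle = 2 * math.pi * rate
    return complex(radius * math.cos(angle), radius * math.sin(angle))


def julia_rows(c, size=64, max_iter=60):
    """Escape-time Julia set for z -> z*z + c, as rows of grey levels 0..255."""
    rows = []
    for py in range(size):
        row = []
        for px in range(size):
            # Map the pixel to a point in the square from -1.5 to 1.5 on both axes.
            z = complex(-1.5 + 3.0 * px / (size - 1), -1.5 + 3.0 * py / (size - 1))
            i = 0
            while abs(z) <= 2.0 and i < max_iter:
                z = z * z + c
                i += 1
            row.append(255 * i // max_iter)
        rows.append(row)
    return rows
```

Two different readings of the same poem would give slightly different values of `c`, and Julia sets are sensitive enough to `c` that the resulting images can look noticeably different.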