
AI Instruments

Gray Wolf

Active member
This thread focuses on AI instruments. Not AI-generated music, but using AI as a sound source in place of traditional VST instruments. I’m intrigued by this emerging frontier.

To be clear, this is not an endorsement of ACE Studio. They simply happen to be implementing AI instruments in a compelling way, which makes it a useful reference point for discussing the technology more broadly. I’m sure other companies are exploring similar approaches.

Of course, it’s 100% your music. You played it. The only thing changing here is the sound source, the instrument generating audio from the MIDI notes you performed. In that respect, there’s really nothing to argue about when it comes to AI in this particular context.

In these cases, it's not triggering samples like a traditional Kontakt library; it's driving a trained neural network model that generates audio in real time based on musical intent. For example, the neural network learns:
  • How a violin sounds at different pitches
  • How it transitions between notes
  • How vibrato evolves over time
  • How dynamics shape the tone
  • How phrasing affects timbre
It learns patterns, not samples.
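To make the distinction concrete, here is a toy Python sketch of the idea. The class and method names are made up for illustration and are not any product's actual API; a real neural instrument would run a trained network where this sketch just fakes the output:

```python
# Illustrative sketch only: a hypothetical MIDI-conditioned "neural instrument".
# Names (NeuralViolin, render_block) are invented; real products expose their own APIs.
import numpy as np

class NeuralViolin:
    """Stand-in for a trained model that maps musical intent to audio."""
    def __init__(self, sample_rate=48000):
        self.sample_rate = sample_rate

    def render_block(self, pitch_hz, velocity, vibrato_depth, block_size=256):
        # A real model would run neural-network inference here; we fake it with a
        # sine whose vibrato and level depend on the "musical intent" inputs.
        t = np.arange(block_size) / self.sample_rate
        vibrato = vibrato_depth * np.sin(2 * np.pi * 5.5 * t)   # ~5.5 Hz vibrato
        phase = 2 * np.pi * (pitch_hz * t + vibrato)
        return (velocity / 127.0) * np.sin(phase)

violin = NeuralViolin()
audio = violin.render_block(pitch_hz=440.0, velocity=96, vibrato_depth=0.002)
```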

Here is an example of that approach with strings, where a composer feeds it his MIDI tracks that were previously playing BBCSO.

[Embedded video]

Those strings sound pretty good to me (mileage may vary). The solo violin from that model was also pretty impressive:

[Embedded video]

I'm very curious about the CPU, RAM and latency footprint of a larger collection of neural network instruments like that.
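For a rough sense of what "real time" asks of such a model, here's a back-of-the-envelope budget; the numbers are just typical low-latency settings, not measurements of any product:

```python
# Rough real-time budget: how long a neural instrument has to produce one audio block.
sample_rate = 48000          # Hz
block_size = 256             # samples per processing block (a common low-latency setting)
budget_ms = 1000.0 * block_size / sample_rate
print(f"{budget_ms:.2f} ms per block")   # ~5.33 ms before the audio buffer underruns
```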
  • Are neural network models perhaps the future of software instruments?
  • Will we at some point have orchestral libraries that don't take up terabytes of hard drive space and take hours to download?
Thoughts?
 
Yes to both questions, but I've yet to see any part of the software industry that hasn't expanded its products to fit the assumed space available, so I'm sure that AI instrument makers will find a way to use any space freed up by not needing so many samples!

I'm less impressed than this guy by the results he's getting in this test, but I'm sure that - at least in part - that's because we've become used to hearing 'perfect' renditions by VSTs, and getting variations more representative of human players actually generates a less perfect result! There's no doubt that the technology is impressive, and it will undoubtedly get better. I do like that he emphasises that it's a tool for perhaps improving mock-ups, not for creating commercial music.
 
Fascinating topic and something new that deserves consideration. :)

As I explained in more detail in another topic, everything involved with computers, audio hardware like Fender and MOTU external digital audio and MIDI processors, Digital Audio Workstation (DAW) applications, VSTi virtual instruments and their sampled-sound libraries, and VST effect plug-ins is AI in one way or another.

The operating systems (Linux, macOS, and Windows) are AI, as well; and the practical perspective on software engineering is that it also is an AI activity, where the way applications behave is based on the AI provided by Software Engineers (SE), which I suppose makes it SE/AI.

There are different levels of what AI does and is capable of doing, of course; but how much of it one decides to use is a subjective matter, at least depending on how absolutely and restrictively you define "AI Sound Source", since as noted everything in the digital music production universe is AI in one degree or another.

One of the things I have been doing for decades is old-time science fiction radio plays; and for the first few decades I read the scripts for all the characters; but over a year ago, I discovered ElevenLabs AI and now am using it for female voices and certain male voices, which I enhance with various VST effect plug-ins, audio clip editing in Fender Studio Pro 8, and Melodyne to make the voices more personalized, tailored, and in some instances alien in outer space ways.

Years ago, I started using Realivox Blue (RealiTone) as a virtual soprano in Kontakt; and she is my favorite virtual soprano and sings melodies composed in music notation using words provided in custom phonetic language scripts that are created via the Realivox Blue GUI running in Kontakt.

For a long time, I did not use highly automated musical ensembles; but starting several years ago, I decided to use a few of what one can call "AI ensembles". I select them and provide the music notation, where perhaps the best example is "Hollywood Pop Brass" (EW ComposerCloud+), which has a virtual festival of different horn section riffs but also can be used to play music notation for individual brass instruments.

Reason (Reason Studios) is MIDI-based; and it has advanced "players" that can be programmed to do all sorts of useful musical activities, including chord patterns, drumkit rhythms, and so forth, all of which can be used in Fender Studio Pro 8 via the Reason Rack Plug-in (VST and VSTi).

My perspective is that AI generally maps to Glenn Miller and Tommy Dorsey using musicians for their songs and Elvis Presley using the Jordanaires for singing backup, which also is like the Beatles using symphonic orchestras for some of their songs, with arrangements usually done by George Martin, along with other arrangers.

The practical aspect of my perspective is that the way things sound is what matters ultimately.

If it sounds good, then it works here in the sound isolation studio, which, considering I have an artificial heart valve and would be dead without it, is the rule in the world according to me. (y)

THOUGHTS

The OP topic ". . . using AI as a sound source in place of traditional VST instruments" makes what I think is an incorrect presumption that VSTi virtual instruments are not AI, because while the notes generally are played by trained musicians using professional-quality instruments, the notes nevertheless (a) are recorded, digitized, and enhanced using AI equipment and software and (b) are no less artificial than supposedly "AI sound sources", which themselves are likely to be based on digitized sampled sounds.

Yet one of the distinguishing factors appears to be the way AI sound sources automagically handle things that, with more traditional VSTi virtual instruments, are handled and played either (a) by requiring elaborately articulated, stylized, and dynamic music notation symbols and language--although only when the respective engine supports them--which is vastly and annoyingly visually cluttering but cannot be avoided when the focus is on providing printed sheet music for human musicians, or (b) by using specific sampled-sound library subsets where the instruments are played in specific articulations, styles, and dynamics, which can be enhanced via VST effect plug-ins. For example, in what I call a "diatonically sampled" library only every other note is sampled, which requires the non-sampled notes to be computed using logarithmic interpolation; that by its nature changes the rate and depth of motion effects like tremolo and vibrato, hence the only way to ensure consistent rates and depths is to add the motion with VST effect plug-ins.
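A tiny sketch of why resampled in-between notes shift the baked-in motion; the numbers here are just illustrative assumptions:

```python
# Illustrative only: when a sampler creates an in-between note by resampling a
# neighbouring sample, any vibrato or tremolo recorded in that sample is sped up
# (or slowed down) by the same pitch ratio.
ratio = 2 ** (1 / 12)               # one semitone up, equal temperament
baked_in_vibrato_hz = 5.5           # assumed vibrato rate recorded in the sample
print(f"pitch ratio: {ratio:.4f}")                                    # ~1.0595
print(f"shifted vibrato rate: {baked_in_vibrato_hz * ratio:.2f} Hz")  # ~5.83 Hz
```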

Based on the example video, it appears that the AI aspect perhaps maps to a better interpretation of some articulations, styles, and dynamics, although I think the only practical way to determine this is to do a side-by-side comparison with the way specific sampled-sound subsets respond when played by simple music notation, where "simple" specifically means providing no articulations, styles, or dynamics in the music notation, which for reference is the way I use music notation.

In this regard, the comparison should exclude using MIDI keyboards and things like "mod wheels", expression pedals (una corda, sostenuto, and damper), and other onboard controlling devices, because a MIDI keyboard can control things which in music notation are articulations, playing styles, dynamics, and so forth.
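For reference, those controllers ride on standard MIDI control-change numbers; here's a quick sketch using the mido Python library, with the values chosen arbitrarily as examples:

```python
# Standard MIDI control-change numbers for the controllers mentioned above,
# shown with mido (a common Python MIDI library); the values 0-127 are illustrative.
import mido

messages = [
    mido.Message('control_change', control=1,  value=100),  # CC1  modulation (mod wheel)
    mido.Message('control_change', control=11, value=90),   # CC11 expression
    mido.Message('control_change', control=64, value=127),  # CC64 damper / sustain pedal
    mido.Message('control_change', control=66, value=127),  # CC66 sostenuto pedal
    mido.Message('control_change', control=67, value=127),  # CC67 soft (una corda) pedal
]
for msg in messages:
    print(msg)
```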

Explained another way, it appears that this new style of AI sound sourcing automagically does what otherwise is done (a) in music notation by specifying articulations, playing styles, dynamics, and other expression symbols and language--but again only if the respective engine supports them--or (b) by engine-supported sampled-sound libraries that respond directly to advanced controls on MIDI keyboards.

One relevant example is the Gypsy Jazzy (UVI) collection, which responds to some music notation symbols and language, as well as certain MIDI keyboard controls, as do the Orange Tree Samples SLIDE Lap Steel sample library and engine for Kontakt (Native Instruments) and World Instruments (UVI) for its VSTi virtual instrument UVI Workstation player. The advanced features of these require either (a) elaborate music notation or (b) an advanced MIDI keyboard, where the latter is required for doing string bends, slides, and other articulations in controlled and very realistic ways, although they can be used with simple music notation that does not provide specific articulations, playing styles, dynamics, and other information. For reference, I prefer to control dynamics with compressor-limiters rather than with the dynamic marks in music notation that are used for pianissimo and fortissimo.

[NOTE: It appears that UVI has stopped providing Gypsy Jazzy in its subscription and as a single product, but it appears to be included in the Sonic Bundle. Gypsy Jazzy continues to work nicely in the UVI Workstation player, which is all I know about its status at present, other than that I like it and actually use it for Gypsy violin. I have a full license for Gypsy Jazzy, so it works in UVI Workstation.]

If I want it loud, then I crank it up; but if I want it not so loud, then I either (a) crank it down or (b) remove it, based on the idea that if you cannot hear it, then it's noise and probably is not necessary.

Can the apparently new "AI sound source" technology do this?

It appears that it does this, although doing it probably requires some user intervention at least to provide guidance on what the AI should be doing.

And if you connect a few relevant dots, I think Tonalic (Celemony) at least does some of this, which it appears it can, and that would make it one of the new "AI sound source" technologies for doing more while requiring less elaborate music notation and fewer MIDI keyboard activities.

SUMMARY

This looks to be intriguing; and while I do not distinguish it so much conceptually from strategies which already exist--including automated automagical ensembles like World Instruments (UVI)--the new "AI Sound Source" technology certainly appears to do more of the desired work automagically.

Lots of FUN (y)

[NOTE: I wrote "Chapter 1: On The Asteroid" in the late 1970's and did the first version about two decades ago with music using a 1999 Fender American Deluxe Stratocaster and an Alesis ION Analog Modeling Synthesizer (for outer space sounds). Last year, I started updating the older chapters using more elaborate music, AI ensembles, and ElevenLabs AI voices for some of the characters. Whether any of the AI sounds are controlled solely by AI depends on the way one defines "solely by AI", since I used VST effect plug-ins to control volume levels and a few of the motion effects. If one allows audio-engineering, then the AI sounds are not done "solely by AI", but I think one can be too "purist" in that regard. If the AI sounds are done primarily by AI, then this is consistent with my ongoing series of old-time science fiction radio plays, "Extreme Gravity", where the strategy is to compose a script for the AI voice; upload it to the ElevenLabs AI website, where it is rendered to audio based on the AI voice you select; and then download the resulting audio clip and modify it using VST effect plug-ins, FSP8 audio clip editing, and Melodyne (Celemony). :)]

ElevenLabs AI Voice Library

[Embedded videos]
 

Attachments

  • SW-Realivox-Blue-Kontakt.jpg (205,1 KB)
  • SW-External-Digital-Audio-MIDI-Interfaces.jpg (211,4 KB)
You're casting your AI net very wide. In general AI is associated with autonomous learning, reasoning, problem-solving, perception, and decision-making (Wikipedia). A system that by itself can't deliberately go beyond its initial programming (e.g. an operating system) is not considered to have artificial intelligence. Makes AI a bit more special ;)
 
You're casting your AI net very wide. In general AI is associated with autonomous learning, reasoning, problem-solving, perception, and decision-making (Wikipedia). A system that by itself can't deliberately go beyond its initial programming (e.g. an operating system) is not considered to have artificial intelligence. Makes AI a bit more special ;)
I cast a wide net; and I think it's justified. :)

Roll back the clock over three decades and AI was called "Knowledge Engineering", and there were programming libraries and tools that software engineers used to create knowledge systems that ran in Windows and other operating systems.

There were distinct steps, but there also are distinct steps in AI using current software engineering strategies, rules, and all that stuff.

Consider my Apple Watch Ultra for a moment . . .

It monitors my sleep, heart, and lots of other stuff, including doing real-time EKG and warning me if I might be having atrial fibrillation, which happens every once in a while and my cardiologist says it's not a problem. I take regular strength Bayer aspirin and don't worry about it.

I think my Apple Watch Ultra is an AI device; and I think my iMac is an AI device, as is Fender Studio Pro 8 and all the VSTi virtual instruments and VST effects plug-ins.

Yet, as you expressed, considering everything--hardware and software--to be AI is all inclusive; but I think it helps folks to understand (a) that none of it is magic and (b) at least so far it has not publicly declared itself to be conscious, although how would we know and would it tell us?

For reference, the same terminology can be applied to songs, arranging, composing, performing, producing, and audio-engineering.

In this context, I suggest songs at a quantum level have phonons--tiny packets of waves similar to photons--that have been proven to be able to become entangled with photons, which then leads me to hypothesize that since the human mind is composed of quantum particles, waves, and all that stuff, it can become entangled with carefully designed and created phonons in songs.

I call this "Quantum Sonic Entanglement (QSE)"; and I think it explains the observed fact that certain songs have snippets of sounds which become entangled with the minds of listeners, where for example if you listen to "Billie Jean" (Michael Jackson) with studio-quality headphones like SONY MDR-7506 headphones (a personal favorite), when your hearing is trained sufficiently, you will be able to hear and to count over 150 of what I call "QSE Sparkles", which are all the hiccups, oohs, ahhs, and other vocal sounds Michael Jackson makes, as well as all the short and rapid instrumental sounds Quincy Jones and Michael Jackson had the musicians make, including some orchestral sounds and Latin percussion instrument sounds like maracas, guiros, clickers, and so forth.

I suggest this is real and tangible, if only because it takes time to design, perform, record, edit, and audio-engineer all those sounds.

It's not something that was done in an hour or two; and I think it was planned and rehearsed in advance the same way as Michael Jackson taught himself how to moonwalk.

Elvis Presley had a nervous tic where he would hiccup; but rather than be embarrassed by it, he noticed that hiccuping made the teenage girls in the audience go berserk; so while at first it was a nervous tic, he learned how and when to hiccup on demand, which I think is the way Michael Jackson discovered the value of hiccuping in songs.

[NOTE: I like to explain QSE Sparkles and the general concept of phonons in songs becoming entangled with the listeners' minds as a type of Hermann Hesse's "Glass Bead Game", where the spin aspect is focused on where a sound tends to move. If it has left spin, then it tends to move left, but with right spin it tends to move right. These are things the producer and audio-engineer control and manage; and they usually are done with various types of panning, although tremolo and vibrato are two more ways sounds can be put in motion. For folks who might think this is the goofiest hypothesis ever, I ask one simple and profound question: "Why are there over 150 QSE Sparkles in 'Billie Jean', and why does Elvis hiccup in his songs?" It might have been a bit of serendipity the first time or two it happened, but then it moved to something done intentionally by design that took hours of recording and studio time to achieve. It could be a simple mistake like the singer coming in too early after the lead guitar solo in "Louie Louie" (The Kingsmen); and it could be vast serendipity that the singer mumbled and slurred the lyrics to such an extent that after two years the FBI gave up trying to determine whether the actual lyrics were as "naughty" as teenagers at the time believed--where for reference they are "naughty" and every teenage garage band knew them--but (a) stuff like that happens and (b) I think it can be done productively by design and intent, where another example is the Beatles singing "Yeah, Yeah, Yeah" in "She Loves You", which is the primary time they did it so clearly, yet it was monumental and definitive, hence I suggest it required design and intent, as well as rehearsing, producing, and audio-engineering.]

Lots of FUN

 

Attachments

  • SW-Quantum-Sonic-Spin.jpg (110,1 KB)
  • SW-Glass-Bead-Complex.jpg (41 KB)
It depends on the definition of AI I guess, which may be different depending on where you live. But somehow a machine that correctly deduces, from me pressing a button, that I want coffee shouldn't qualify as AI when it is a coffee machine ;)
 
It depends on the definition of AI I guess, which may be different depending on where you live. But somehow a machine that correctly deduces, from me pressing a button, that I want coffee shouldn't qualify as AI when it is a coffee machine ;)

I knew I was doing something wrong!
 
Hence, my often-used gum-ball machine analogy. Anywhere there's an algorithm, there's practically someone claiming "AI". There's so much sleight of hand because there's a market for it. So for the purposes of what's been presented here, sure. Acoustic and dynamic variability, sympathetic vibrations, and any way to gain realism of an instrument or timbre is likely a good thing. It's just that this has been continually going on as development anyway. So dropping in the idea of AI, as if it formulates something new, is something that must be further explained.
JMO, but once a person has requested some known artist (whoever it may be) to sit in on their song to create some line they couldn't come up with themselves, then they've just bastardized their product. Some of you may not agree with that, but then I wouldn't care anyway.
 
It depends on the definition of AI I guess, which may be different depending on where you live. But somehow a machine that correctly deduces, from me pressing a button, that I want coffee shouldn't qualify as AI when it is a coffee machine ;)
One way to ponder the thought is to consider two lighting devices, (a) a simple incandescent light and (b) an LED light, where, excluding wandering into the universe of quantum mechanics, I think (a) is just a light bulb and generally is not an AI device.

On the other hand, if (b) has any type of processor and requires an algorithm, then it could be an AI device, although that depends on how the LED works and what it does.

Similarly, if a coffee machine is simple like the electric coffee makers from the 1930's onward and perhaps a bit earlier (an "electric percolator"), then it's probably not an AI device; but what about a Keurig pod device that makes coffee based on various rules and configurations?

I suggest the best way to answer the Keurig AI question is to ask Google AI, which clearly indicates the Keurig device is an AI device and that, among other things, some of them respond to voice commands and interact with Amazon Alexa to order refills and to explore new features and trends.

Google AI suggests there are "AI driven LEDs", perhaps currently more for changing colors to adjust to moods or for entertainment; but whether a light-emitting diode (LED) by itself is an AI device wanders into wondering whether a simple incandescent light bulb can be considered an AI device once one includes the quantum mechanics aspects of particle physics. As Richard Feynman explained in one of his lectures, in the quantum world it's a matter of probabilities, and while most of the possibilities cancel, leaving only what actually happens, it's a bit strange in the sense of the classic 1935 Schrödinger's cat Gedankenexperiment and what Einstein, over a decade later in 1947, described as "spooky action at a distance" when referring to quantum entanglement. The perhaps "spooky" part, as it relates to my hypothesis of "Quantum Sonic Entanglement", can describe what happens when a song is broadcast by a radio station and then heard hundreds of miles away by a listener who, for some real but perhaps not so well understood reason, develops a crush on Elvis Presley or becomes a fan of the Beatles, all from listening to a song.

I suggest that people tend to remember certain events very clearly, and I think one such event is the first time they hear a Beatles song and "get it", which for me did not happen the first time but happened a few listens later, and the song was "She Loves You", which I bought as a 45 RPM record and still have (Swan label).

I clearly remember being mesmerized in 1957 by "Great Balls Of Fire" (Jerry Lee Lewis), which was the first record I bought (a 78 RPM record).

"Walk Don't Run" (The Ventures) is another song that mesmerized me; and I would play it over-and-over trying to discover how to play the drumkit rhythm pattern and what I now call "straight" snare drum rolls (as contrasted to Big Band and Symphonic drum rolls, which are entirely too "blurry"). Google AI says it's a "buzz" roll, which makes it a Big Band and Symphonic drum roll; but I disagree and never heard it that way and never played it that way at the time. In retrospect, I think I played the snare drum rolls "straight" like a machine gun because I never had drum lessons and didn't know how to do "buzz" rolls. Basically, other than singing soprano in a liturgical boys choir, I taught myself everything else mostly by listening to records and practicing until I could do what was on the records, which at first required playing records at half-speed for lead guitar but at double-speed to identify bass parts, where first I taught myself how to play bass, since at double-speed the bass parts sounded like soprano singing, which I could do intuitively by ear; and then when I started teaching myself how to play lead guitar, I played the lead guitar solos at half-speed, which made them sound like bass parts and by that time I knew bass, so identifying lead guitar parts was easier and more intuitive, where the difficult part was discovering how to play everything rapidly, which mostly was a matter playing phrases over-and-over for hours at a time until I could play them naturally without needing to think about it, something that literally and physically requires rewiring the brain (a) to create new neural pathways between the Frontal Eye Fields (FEF) and Auditory Cortex and (b) to develop the ability to let the unconscious mind ("id" in Freudian terminology) run the show for a while, since conscious mentation ("ego" and "superego") is too slow, at least if you expect to be able to compose and to play a note every 50 to 100 milliseconds in real-time on the fly, which basically maps to developing what folks call "muscle memory".

Curiously, Ringo Starr plays snare drum rimshots in an occasionally "buzzy" way and at times does "buzz" rolls, although it's more subtle, in a filler sort of way, playing something not so loud in the space between the distinct snare drum notes; even then I hear it as the type of snare drum rimshot Charlie Watts played at certain times. I am thinking about experimenting with adding some "buzzing" to the starts and tails of snare drum rimshots, which is easy to see Ringo Starr playing in early Beatles concerts. Adding "buzzing" to the tails is easier than to the starts, since for "buzzing" tails it's just a matter of letting the drumstick bounce when playing on a real snare drum, but it is a bit more difficult when doing it with music notation and a virtual snare drum. It might be easier to do with a rapid, repeating echo; but it's something I am thinking about exploring.

For reference, Charlie Watts had a brilliant way of playing certain snare drum rimshots where he only played the snare drum rimshot and nothing else for that instant in time. It looks awkward as you can see in the YouTube music video; but I think it's something he discovered; and for that instant in time, it propels him to the front of The Rolling Stones and makes him the leader; so for that instant in time, it's Charlie Watts and The Rolling Stones. (y)

[Embedded videos]
 
Hmm, then a pencil would be intelligent too: You push one end and the other end converts your thoughts to text. I think we can agree that's far-fetched (as are the lightbulbs).:p
 
Hmm, then a pencil would be intelligent too: You push one end and the other end converts your thoughts to text. I think we can agree that's far-fetched (as are the lightbulbs).:p
Ask Google AI, "Are LED lights AI?"

"LED lights are not inherently AI, but they are increasingly integrated with Artificial Intelligence to create smart, adaptive, and energy-efficient lighting systems. These AI-driven LEDs use sensors, machine learning, and connectivity to adjust brightness, color, and timing based on user behavior, occupancy, and environmental data."

Lots of FUN 💡
 
I'm not seeing the lightbulb analogy either. I mean, these are incremental developments that man has achieved using different alloys, filaments, and such, so let's give a little credit to actual human intelligence and milestones on its own merit. For example, if any type of light turns on due to motion sensing, that's not AI, no matter how one spins it. My Keurig coffee machine isn't AI. We're getting ahead of ourselves. ; )

Why doesn't it surprise me that this thread will run in all directions regardless of what the OP (respectfully) tried to keep on the rails?
...let's see.
 
I'm not seeing the lightbulb analogy either. I mean, these are incremental developments that man has achieved using different alloys, filaments, and such, so let's give a little credit to actual human intelligence and milestones on its own merit. For example, if any type of light turns on due to motion sensing, that's not AI, no matter how one spins it. My Keurig coffee machine isn't AI. We're getting ahead of ourselves. ; )

Why doesn't it surprise me that this thread will run in all directions regardless of what the OP (respectfully) tried to keep on the rails?
...let's see.
I'm just the messenger. ;)

Ask Google AI . . .

Google AI says some LED lights and Keurig coffee machines are AI devices depending on their specific capabilities and behaviors.

Some of the newer Keurig coffee machines can "talk" to Amazon Alexa--itself clearly an AI entity--and order refills and do other stuff automagically.

Considering it has been suggested that I am AI, well it's . . .

Lots of FUN :)
 
I'm just the messenger. ;)

Ask Google AI . . .

Google AI says some LED lights and Keurig coffee machines are AI devices depending on their specific capabilities and behaviors.

Some of the newer Keurig coffee machines can "talk" to Amazon Alexa--itself clearly an AI entity--and order refills and do other stuff automagically.

Considering it has been suggested that I am AI, well it's . . .

Lots of FUN :)
Spoken like a true AI! :) I have a question for you, just wondering, are you a real person or are you an AI bot? Just asking, no offense intended. I'm just curious to know.
 
I'm just the messenger. ;)

Ask Google AI . . .
"Ask Google AI" ?
Here's a thought. If I am any form of intelligence, in this universe, I know that I wouldn't want to have my species be coined as artificial in front of our level of intelligence. That phrase was pigeonholed by humans. That said, I'm perfectly fine with calling ninety percent of humans "artificial humans". Seems fitting from that perspective.

* * * * *
French philosopher René Descartes, who coined the phrase “I think, therefore I am,” laid the foundation for modern philosophy, asserting the certainty of one's existence through the act of thought.
 
Forget about analogies, metaphors, and similes involving light bulbs, coffee machines, operating systems, DAW applications, VSTi virtual instruments, VST effect plug-ins, and all that stuff. :)

I explored the ACE Studio website; and it's similar to what ElevenLabs provides.

I use ElevenLabs AI voices for some of the characters in my ongoing series of old-time science fiction radio plays; so I have experience using this stuff.

ElevenLabs and ACE Studio are cloud-based services and require an Internet connection for the services they provide.

In the case of ElevenLabs, I have a monthly subscription which costs $22 a month; is billed in advance month-by-month; and does not require a contract.

If I need a voice for a female character, then I compose a script for what I want her to say and then upload it to ElevenLabs.

ElevenLabs then generates the voice and I download the MP4 audio file of the generated voice.

I studied the information at the ACE Studio website; and it's very similar to the way ElevenLabs works; but is focused more on generating music than generating voices.

FINANCIAL ASPECTS

The ACE Studio "Artist Pro" subscription costs $22 a month, which is the same as ElevenLabs for its "Creator" subscription; but ElevenLabs also has a "PRO" subscription for $99 per month; and ElevenLabs has Business subscriptions, which are more expensive but provide more minutes and other features.

For what I am doing, the "Creator" subscription is all I need.

ACE Studio uses a two-year "rent-to-own" plan, where after subscribing and paying for two years, you are granted a license to use the cloud-based service.

At $22 per month, this is $264 for each of the two years for a total of $528 (USD).

My perspective, based on subscribing to and using ElevenLabs AI voice services, is that it's worth $22 a month, because it creates male and female voices that I can use to enhance the voice-overs I do myself. If I need a female voice for spoken word in one of the chapters of my old-time science fiction radio play, then this is what ElevenLabs provides based on the voice actor I select and the script I provide, where the script is simple text with relevant punctuation, for example "Look at the monitor and chart the trajectory of the asteroid."

Then, I edit the audio clip in Fender Studio Pro 8 (FSP8) to adjust the timing and volume level; and I process the edited audio clip with a few VST effect plug-ins, which can include processing with Melodyne to change the vocal pitch and other characteristics, all of which takes the raw audio generated by ElevenLabs and tailors it to what I desire.
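For anyone curious what that round trip looks like in practice, here is a minimal Python sketch; the URL, header, and field names are placeholders I made up for illustration, not ElevenLabs' or ACE Studio's documented API:

```python
# Hypothetical sketch of a script-to-audio round trip; the endpoint, auth header,
# and JSON fields are placeholders, not a documented API.
import requests

API_URL = "https://api.example.com/v1/text-to-speech/VOICE_ID"   # placeholder URL
script = "Look at the monitor and chart the trajectory of the asteroid."

response = requests.post(
    API_URL,
    headers={"api-key": "YOUR_KEY"},          # placeholder auth header
    json={"text": script},                    # the spoken-word script
)
response.raise_for_status()

with open("female_voice_line.mp3", "wb") as f:   # save the generated clip for the DAW
    f.write(response.content)
```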

THOUGHTS AND SPECIFICS

In all these new AI services, I think it makes sense to focus on specific details and the way things happen.

For ACE Studio, the marketing blurb is that everything is done in what I call an automagical way that does not use traditional VSTi virtual instruments, sampled-sound libraries, or VST effect plug-ins.

However, ACE Studio partnered with EastWest in January 2026; and since I have been using EastWest instruments and sounds for years, I know what EastWest does and what EastWest can provide, which basically is technology and sampled sounds for virtual instruments and virtual voices.

ACE Studio Partners with EASTWEST Sounds

This is fine with me; and it looks to provide ACE Studio with oodles of sampled-sound libraries, since at present EastWest has over 40,000 virtual instruments and sounds.

Regarding using ACE Studio, I watched the YouTube video; and the process begins with the user providing MIDI sequences and various information to tell ACE Studio what the user desires to happen.

In this respect, ACE Studio is not psychic, hence needs information from the user to develop an accurate sense of what the user desires.

This ACE Studio web page provides a short, high-level overview of the AI Violin and how to use it; but it does not explain how to provide the information necessary for ACE Studio to determine how to do the desired articulations, playing styles, dynamics, and all that detailed music notation, VSTi virtual instrument, and VST effect plug-in stuff.

AI Violin (ACE Studio)

Yet, although I do not have a subscription for ACE Studio, I am certain the user needs to provide more information than just "I like violins".

As noted, neither ACE Studio nor ElevenLabs is psychic.

If anyone needs a clue, then ask Amazon Alexa, Apple Siri, Google AI, or ChatGPT, "What am I thinking?" 🤪

ChatGPT

Based on the information on the AI Violin at the ACE Studio website, it needs to be provided with a MIDI sequence to generate the violin audio; and in the same or similar way as Tonalic (Celemony), I think it needs more information to know what to do.

SUMMARY

If it's possible to try ACE Studio for one-month only at $22 without being obligated to continue the service and without any cancellation fees or penalties, then I might try it to develop a perspective on how it works and what it does.

I do everything with music notation in FSP8; but as demonstrated in the YouTube video I made earlier this morning (Sunday February 8, 2026) to help a forum member learn how to use the FSP8 Arranger Track, Signature Track, and Tempo Track to specify various time signatures and tempos for the sound track of a film, it's easy in FSP8 to switch from (a) music notation to (b) MIDI and vice-versa; hence the ACE Studio requirement for MIDI is not a problem for me.

Since at minimum I have to provide MIDI or music notation, I am not certain what advantage using ACE Studio can provide, especially since I do not specify articulations, playing styles, dynamics, and all that visually cluttering music notation nonsense. Instead, I prefer (a) to use sampled-sound libraries where the trained musicians are playing in the desired articulations and playing styles, (b) to use VST effect plug-ins to control dynamics, and (c) for what I call "diatonically" sampled-sound libraries, where only every other note is sampled and the in-between notes are computer-generated using logarithmic interpolation, to use dry samples and then add motion effects like tremolo, vibrato, and echoes via VST effect plug-ins, FSP8 Automation, Melodyne (Celemony), and whatever else avoids having motion embedded in the sampled sounds--based on the rule that it's easy to add stuff to audio but nearly impossible to remove it once it's embedded in the audio.
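As a small illustration of point (c), here is a sketch of adding tremolo to a dry sample as a post effect, so the rate and depth stay consistent no matter which note was sampled; all the numbers are made-up examples:

```python
# Adding tremolo (amplitude modulation) to a dry sample after the fact, so the
# modulation rate and depth stay consistent across all notes; numbers are examples.
import numpy as np

sample_rate = 48000
t = np.arange(sample_rate) / sample_rate          # one second of audio
dry = np.sin(2 * np.pi * 220.0 * t)               # stand-in for a dry sampled note

rate_hz, depth = 5.0, 0.3                          # tremolo settings chosen by the user
lfo = 1.0 - depth * 0.5 * (1 + np.sin(2 * np.pi * rate_hz * t))  # dips between 0.7 and 1.0
wet = dry * lfo                                    # tremolo applied as a post effect
```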

Lots of FUN :)

P. S. If this is too much information and too many words, then it's not my fault. I do this every day somewhere and have been writing tens of thousands of words every day for decades. I enjoy touch-typing and composing prose in real-time on the fly. It's one of the things I do; and I hope it helps folks, but regardless (a) it helps me and (b) it's a key aspect of the way I learn new stuff.

I hope the information in this post is helpful! (y)

 
Spoken like a true AI! :) I have a question for you, just wondering, are you a real person or are you an AI bot? Just asking, no offense intended. I'm just curious to know.
Thanks for asking! :)

As I explained in detail in another topic, when I was in the 5th grade I liked to crawl around underneath the classroom during the hour the teacher was explaining how to multiply single integers.

After a week or two of "sneaking" out of the classroom and crawling around underneath, one day the teacher asked me if I liked to solve puzzles; and I told her I like solving puzzles; so she asked me to help a fellow in another classroom who was having trouble solving puzzles, which I did.

During the hour of multiplication instruction, the teacher would tap her foot on the floor to let me know when there was a Pop Test; and I would "sneak" back into the classroom and take the Pop Test, always making a perfect score.

All I knew at the time was that a week or two after helping the fellow solve all his puzzles, the teacher told me I could crawl around underneath the classroom any time I desired and that I did not need to "sneak" out.

Two decades later, I told my sister, who is a physician, that I think I am smart.

Her reply was, "Didn't mother tell you about the I. Q. Test in the 5th grade?"

And I told her "No, nobody told me about it, but I remembered helping a fellow who was having a hard time solving a bunch of puzzles."

My sister told me that the result of the I. Q. Test in the 5th grade was that my I. Q. was "well over 170".

It's very specialized intelligence; and at times I am more like Forrest Gump than Albert Einstein; but that's the way it works for me.

After the first week in elementary school I was so traumatized by the German NAZI teacher that I developed a severe stutter; and it took me nearly 15 minutes to say "I want an apple".

The school sent me to the University of Texas Speech Pathology Laboratory to see if they could determine why I was stuttering; but the folks there were very smart and were not mean; and I felt so comfortable and relaxed that I did not stutter, at all.

At the end of the evaluation week, I went back to elementary school and told the principal, "T-T-T-T-T-heeeeyyyy saaaaaaaaiiiiiid I-I-I d-d-d-d-d-ooooooo-nnnnnnnn't ssssss-ttttttt-uuuuuuuu-tt-er."

Fast forward a decade or so; and I psychoanalyzed myself and worked through being traumatized as a child and for the most part stopped stuttering, although if I get stopped by the police, I start stuttering intentionally, since it usually makes them feel sorry for me and usually not give me a ticket.

FACT: I am not an AI bot. I'm just very smart in a few ways, one of which maps to being able to write this post at touch-typing speed (70 words per minute on average).

I learn by writing and drawing diagrams; and this is one of the things I do, along with composing music and writing science fiction radio plays and books on music.

I also taught myself how to play bass, rhythm guitar, lead guitar, and drums; but the most FUN of all was the experiment I did to determine whether I could teach myself how to play grand piano solely by directed-dreaming and watching videos of Liberace, Floyd Cramer, Chico Marx, and John Lennon to study and memorize the motions they made playing piano and a Vox Continental Organ.

This experiment took about 20 years, during which time I dreamed about playing grand piano and once a year actually did a test where I played grand piano for a few minutes, usually with less than stellar results.

Progress was slow until I got my first drumkit and then had the epiphany that the keys of a grand piano are like tiny drums, cymbals, and Latin percussion instruments; hence since I could play drums, and I knew a good bit of music theory, I should be able to play grand piano like it was a drumkit.

The YouTube music video is a recording of me composing and playing an 88-weighted key KORG Triton Workstation with a Grand Piano and synthesized fog preset, which I played to accompany a set of rhythm guitar chords, bass line, and drumkit that I already had recorded, hence knew the key and chords.

Everything you hear for the grand piano was composed and played in real-time on the fly on the first take; and for reference I played it through cascading echo units, because I discovered that with a bit of practice you can play melodies in the future once you get the echo units primed, which is a nice way to play harmony without needing to do overdubbing.

It's obvious none of the grand piano was composed in advance and written to sheet music, because it would be vastly difficult to transcribe all that stuff, including the glissandi that I copied from Jerry Lee Lewis and his 1957 hit song "Great Balls Of Fire".

For a while, I thought the grand piano playing was just a bunch of what for electric guitar colloquially is called "chicken pickin'"; but then I discovered Arnold Schönberg's "twelve-tone technique", which some folks suggest is total garbage, except other folks study it for years in colleges and other folks enjoy listening to it. 🤪

And for reference, if you play drums and have a good ear for music, then I just taught you how to play grand piano once you develop the required finger dexterity and rewire your brain (Frontal Eye Fields and Auditory Cortex) to be able to play as many as 40 notes per second while you have suspended all conscious thought and have allowed your unconscious mind to run the show, since the amazing thing about the unconscious mind is that it's like a cloud computer which knows everything you have learned, even though consciously you forgot nearly all of it.

". . . as many as 40 notes per second"?

Tap the four fingers of each hand (index, middle, ring, and pinky) rapidly at a diagonal so they are tapped in sequence rather than all at the same time, alternating from right hand to left hand; and then practice this until you can do it five times in one second. When you can do that, you are tapping 40 times per second; and it sounds like the low-pitch "E" string of Paul McCartney's Höfner Violin Bass.

The knowledge is there; and the key to accessing it is to learn how to let your unconscious mind run the show, where remembering to breathe and not to drool is important when you are suspending nearly all conscious thought. (y)

Lots of FUN :)

[Embedded video]
 
I think that information could have been consolidated.

Most of us know East West, and its huge assortment of library sampled sounds.
Pay the $22 and see what you come up with in or for ACE Studio. Post it, even. We (and more importantly, you) can determine how real it is or ultimately if it works for you. In the final analysis, if you like it, that is what is important.

Respectfully: If you don't mind, it might be more helpful to trim your responses down in size. Thanks.
If you have to go to such lengthy responses, I honestly don't have the time to read them. Nor watch the videos (some posts having numerous videos). Just sayin'
😊
 