Generative AI tools like DALL-E 2, Dreamstudio, and Midjourney have opened the world's eyes to the possibilities of generating license-free image content for ads, personal projects, and idea visualization. Creators have taken notice, and "prompt engineering" - the study of which words and techniques can be used to generate useful images - has taken off.
What about music? Can a YouTuber use a text-to-music tool, the way they might use Midjourney for text-to-image, to create an original audio track? Imagine typing "2 minutes of soft jazz with saxophone" and getting a custom-made background track for your next webinar.
I'm the cofounder and CEO of Kapwing, an AI-powered video editor. We help creators generate informational videos about topics, like "how to cook broccoli," using AI, and our technology pulls in background music for the video montage. Kapwing also has a popular embedded tool for DALL-E image generation that creators leverage for stock content, thumbnail backgrounds, and more. We would love to make the same functionality available for music, so that you can generate original background music with certain styles and qualities from a text prompt. That's why we're staying on top of new companies, research, and approaches to this problem.
In this article, I'll cover the existing tools available for using AI to generate music or audio and how you can best leverage them for useful outputs. Because we know that a demo is very different from a production-ready application, we've only included tools that you can actually sign up for and use today.
❓MusicLM by Google
This article was partly inspired by Google's Generative AI announcements last week at I/O. Unfortunately, I'm not able to access MusicLM, the text-to-audio functionality, yet because I'm still on the waitlist. To access MusicLM, you need to register for Google's "Test Kitchen," a launchpad for its new Generative AI products. The website says "Join the waitlist," and you'll need to answer a few questions about your profession, country, and use case.
Google published a research paper about the underlying technology.
I cannot review MusicLM first-hand yet, but I will share more here about the quality of the generated output as soon as I get off the waitlist.
I was happy to find a Text to Music generator available for free on Mubert. You can try it out on Mubert's website; the generator won Product Hunt's #1 slot when it launched. I love that creators can set a duration and use the dropdown to switch between "track" and other types of music like "jingle."
However, I was disappointed by the quality of the generated output. I asked for a "low-key track with instrumental jazz and saxophone," but the track I got does not seem to have any sax sounds at all. Most musicians would not classify it as jazz; it sounds more like soft rock.
Also, on the free version, there's a jarring "Mubert" sound bite that interrupts my track every 15 seconds or so.
This product has promise, but the underlying technology still has a long way to go.
AIVA does not yet support text to music, but it is an impressive music-to-music generative AI product. Starting with a seed file, creators can specify the length and key of a new "Composition" that is generated using AI. They can then edit other aspects of the composition, like the melody, chords, and bass.
Billed as a "Creative Assistant," AIVA also publishes helpful tutorials and demos on YouTube for artists using their product.
Pricing: You can try AIVA out for free and get 3 downloads each month for non-commercial use. For a full commercial license to the downloaded compositions, creators pay €49/month for an AIVA subscription, or €33/month when billed annually.
Soundraw does not support text to music or natural language inputs, but the tracks that it generates sound good. Creators start by choosing a genre and duration, and Soundraw then generates dozens of original tracks to choose from. Users can easily customize tempo and other elements from the web UI.
Pricing: The premium Soundraw subscription is $16.99/month and includes a commercial license to the music tracks you create.
API: Soundraw has an API interest form on its website where businesses can sign up, but it does not seem to have a public API available.
Currently, there are few providers of text-to-music technology. It seems that the underlying models - and policy debates about copyright and authorization - are still developing. However, with the pace of development in Generative AI and Google's MusicLM becoming more available, I predict that prompt engineering will become a more important field of study for audio producers and sound effects artists in the future.
We will continue to update this article with new text-to-music apps and vendors that we discover in the wild. DM us on Twitter if you have a contribution that we haven't included here!