Posted on: 7 July 2023
Ph.D - Chief Experience Officer
Making Music with HAL
During a recent meetup focused on the application of ChatGPT, a startup presented their new AI-assisted music generation tool. The purpose of this tool is to aid musicians in overcoming writer’s block rather than replacing them entirely.
I won’t delve into the creators’ naivety in assuming their tool would only enhance human creativity and not replace it. It’s like selling sticks and then being surprised when people use them to hit each other. Instead, I’m more interested in discussing how users interact with the AI, in particular the design of prompts.
All AI systems rely on the instructions they receive, and prompt engineering involves designing how users provide those instructions. It entails carefully nudging the input to achieve the desired output. Poorly designed prompts lead to inaccurate or irrelevant responses.
As I listened to the startup’s presentation, I was reminded of a project I worked on as a consultant. About a decade ago, I helped WaveDNA, then a startup, create a music generation tool (see the case study). Like the NYC startup, Toronto-based WaveDNA used an algorithm to generate music samples. It required an initial piece of music as a seed, and from that seed it could generate numerous variations in different beats, keys, and tempos. Fast forward to today: with the advances in AI, the NYC startup can generate several samples based solely on the user’s chosen beats per minute, key, and intended emotion.
While I have some knowledge of music theory, I lack experience in composing. Even so, neither approach to prompting the respective algorithms seems ideal to me. WaveDNA’s method of generating hundreds of variations was overwhelming, and many of the samples were hard to tell apart by ear. The NYC startup’s approach of prompting with only a key, beats per minute, and an emotional keyword is tricky in a different way. If you generate samples using the keyword “Happy” and dislike the results, should you try “elated,” “joyful,” or “exuberant” next? The user has no way of knowing which emotional keyword will yield the desired output, so they are left to guess.
Instead, I propose a method that guides users as they explore AI-generated music samples. I drew some inspiration from a simple chart I saw in my son’s grade 1 class. Each day, the kids placed their names on a mood meter to indicate how they were feeling, and then considered what they needed to change in order to reach a target mood. One axis of the chart represented their energy level (low to high), and the other represented their affect (sad to happy). So “ecstatic” represented high energy and happiness, while “serene” represented low energy and happiness. Figure 1 below shows an example of this mapping from emotions to words.
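To make the two-axis idea concrete, here is a minimal Python sketch of the mood meter as coordinates. The -1 to +1 scale, the numeric positions, the `Mood`, `quadrant`, and `change_needed` names, and the words “agitated” and “gloomy” are illustrative assumptions; only the placement of “ecstatic” (high energy, happy) and “serene” (low energy, happy) comes from the chart described above. `change_needed` mirrors the classroom exercise of working out what has to shift to reach a target mood.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mood:
    """A point on a two-axis mood meter; both axes run from -1.0 to +1.0 (assumed scale)."""
    energy: float  # -1.0 = low energy, +1.0 = high energy
    affect: float  # -1.0 = sad, +1.0 = happy


# Only "ecstatic" and "serene" are placed by the chart described in the text;
# the numbers and the other two words are illustrative assumptions.
MOOD_WORDS = {
    "ecstatic": Mood(energy=0.9, affect=0.9),    # high energy, happy
    "serene": Mood(energy=-0.9, affect=0.9),     # low energy, happy
    "agitated": Mood(energy=0.9, affect=-0.9),   # high energy, sad end of the axis (assumed)
    "gloomy": Mood(energy=-0.9, affect=-0.9),    # low energy, sad (assumed)
}


def quadrant(mood: Mood) -> str:
    """Name the quadrant of the chart that a mood falls into."""
    energy = "high energy" if mood.energy >= 0 else "low energy"
    affect = "happy" if mood.affect >= 0 else "sad"
    return f"{energy}, {affect}"


def change_needed(current: Mood, target: Mood) -> tuple[float, float]:
    """How far energy and affect must shift to move from the current mood to the target."""
    return (round(target.energy - current.energy, 2),
            round(target.affect - current.affect, 2))
```

For example, `change_needed(MOOD_WORDS["ecstatic"], MOOD_WORDS["serene"])` returns `(-1.8, 0.0)`: getting from ecstatic to serene means lowering energy while staying just as happy.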
If the AI prompt is implemented in this manner, the user can enter a key and beats per minute to generate a hundred easily explorable variations. If a sample sounds too energetic, they can move further down the chart; if it sounds overly happy, they can check out samples to the left. What I appreciate about this approach is that it lets users explore a wide range of emotions without having to pinpoint the exact emotional prompt, and to move easily between adjacent samples.
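The same idea can be sketched as a grid of prompts. In this hypothetical Python sketch, one key and tempo are fanned out into a 10 x 10 grid (100 variations), with energy falling down the rows and happiness rising to the right, so “too energetic” means moving down and “too happy” means moving left. The `SamplePrompt` fields, the 0 to 1 scales, the grid size, and the feedback phrases are assumptions for illustration; how each prompt is turned into audio by the generation model is outside the scope of the sketch.

```python
from dataclasses import dataclass

GRID_SIZE = 10  # assumed 10 x 10 layout = 100 variations


@dataclass(frozen=True)
class SamplePrompt:
    """Hypothetical prompt for one generated sample."""
    key: str
    bpm: int
    energy: float  # 0.0 = low energy, 1.0 = high energy
    affect: float  # 0.0 = sad, 1.0 = happy


def build_grid(key: str, bpm: int, size: int = GRID_SIZE) -> list[list[SamplePrompt]]:
    """Fan one key/tempo out into a size x size grid of prompts.

    Row 0 holds the most energetic samples (energy falls going down);
    column 0 holds the saddest samples (affect rises going right).
    """
    step = 1.0 / (size - 1)
    return [
        [
            SamplePrompt(key, bpm,
                         energy=round((size - 1 - row) * step, 3),
                         affect=round(col * step, 3))
            for col in range(size)
        ]
        for row in range(size)
    ]


# Listener feedback mapped to a (row, column) move on the grid.
MOVES = {
    "too energetic": (1, 0),          # move down: less energy
    "not energetic enough": (-1, 0),  # move up: more energy
    "too happy": (0, -1),             # move left: less happy
    "not happy enough": (0, 1),       # move right: happier
}


def next_cell(row: int, col: int, feedback: str, size: int = GRID_SIZE) -> tuple[int, int]:
    """Return the adjacent cell to audition next, clamped to the grid edges."""
    d_row, d_col = MOVES[feedback]
    new_row = min(max(row + d_row, 0), size - 1)
    new_col = min(max(col + d_col, 0), size - 1)
    return new_row, new_col
```

For example, `build_grid("A minor", 120)` yields 100 prompts, and `next_cell(4, 5, "too happy")` returns `(4, 4)`, the neighbouring, slightly less happy sample. Clamping at the edges keeps the user on the chart instead of raising an error.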
Interacting with AI tools
Hundreds of new generative AI tools are being released, but how users interact with and prompt these tools is still not well understood. The design of the prompt may well end up being the key factor in determining which AI tools succeed and which fail.
Dan firmly believes that technology must be created with the user in mind. Never shy about critiquing a bad design, Dan uses the Akendi blog to shine a spotlight on usability mistakes… and their solutions. Drawing on his background in engineering, computer science, psychology, and anthropology, Dan offers a unique perspective on the latest UX trends and techniques.