New AI-Based System Helps Convert Text to Audio Clips

Reviewed by Skyla Baily, Mar 16 2023

According to scientists from the University of Surrey, who are inviting the public to try out their new text-to-audio model, generative artificial intelligence (AI) systems will spark an explosion of creativity in the music industry and beyond.

AudioLDM is a new AI-based system from Surrey that lets users submit a text prompt, which is then used to generate a matching audio clip. The system can process prompts and deliver clips using less computational power than current AI systems, without compromising sound quality or users' ability to manipulate the clips.

Members of the public can try out AudioLDM by visiting its Hugging Face space. The code is also open-sourced on GitHub, where it has more than 1,000 stars.

Sound designers could use such a system in a range of applications, including filmmaking, digital art, game design, the metaverse, virtual reality, and digital assistants for the visually impaired.

“Generative AI has the potential to transform every sector, including music and sound creation. With AudioLDM, we show that anyone can create high-quality and unique samples in seconds with very little computing power.”

Haohe Liu, Study Project Lead, University of Surrey

Liu added, “While there are some legitimate concerns about the technology, there is no doubt that AI will open doors for many within these creative industries and inspire an explosion of new ideas.”

Surrey’s open-source model is built using a semi-supervised approach based on a method known as Contrastive Language-Audio Pretraining (CLAP). CLAP allows AudioLDM to be trained on large amounts of audio data without text labels, which considerably increases the model's capacity.
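The article does not spell out CLAP's training objective. As a rough illustration only, the sketch below assumes a CLIP-style symmetric contrastive (InfoNCE) loss: each text embedding should be most similar to its own audio embedding and dissimilar to the others in the batch, and vice versa. The function names, the toy two-dimensional embeddings, and the temperature of 0.07 are hypothetical choices for illustration, not details taken from Surrey's code.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cross_entropy(logits, target):
    """Softmax cross-entropy for one row, using the max trick for stability."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


def clap_style_loss(text_embs, audio_embs, temperature=0.07):
    """Hypothetical symmetric contrastive loss over a batch of (text, audio) pairs.

    Pair i is (text_embs[i], audio_embs[i]); every other pairing in the batch
    is treated as a negative example.
    """
    n = len(text_embs)
    # sims[i][j]: scaled similarity between text i and audio j
    sims = [[cosine(t, a) / temperature for a in audio_embs] for t in text_embs]
    # Text -> audio: row i should pick out audio i
    text_to_audio = sum(cross_entropy(sims[i], i) for i in range(n)) / n
    # Audio -> text: column j should pick out text j
    columns = [[sims[i][j] for i in range(n)] for j in range(n)]
    audio_to_text = sum(cross_entropy(columns[j], j) for j in range(n)) / n
    return (text_to_audio + audio_to_text) / 2


if __name__ == "__main__":
    texts = [[1.0, 0.0], [0.0, 1.0]]
    matched_audio = [[0.9, 0.1], [0.1, 0.9]]  # each clip aligned with its caption
    swapped_audio = [[0.1, 0.9], [0.9, 0.1]]  # clips paired with the wrong captions
    print("matched:", clap_style_loss(texts, matched_audio))
    print("swapped:", clap_style_loss(texts, swapped_audio))
```

With correctly paired embeddings the loss is close to zero, while swapping the clips makes it large. Once text and audio are pulled into a shared embedding space this way, an audio clip's own embedding can plausibly stand in for a missing caption, which is one reading of how CLAP lets a model learn from unlabelled audio.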

“What makes AudioLDM special is not just that it can create sound clips from text prompts, but that it can create new sounds based on the same text without requiring retraining.”

Wenwu Wang, Professor in Signal Processing and Machine Learning, University of Surrey

Wang added, “This saves time and resources since it doesn't require additional training. As generative AI becomes part and parcel of our daily lives, it's important that we start thinking about the energy required to power up the computers that run these technologies. AudioLDM is a step in the right direction.”

Users have created music clips in a range of genres with AudioLDM.

AudioLDM is a research demonstration project and relies on the current UK copyright exception for data mining for non-commercial research.

Source:

https://www.surrey.ac.uk/