Team Members

Mahmoud Mamdouh

Team Leader

Maged Magdy

Team Member

Omar Adel

Team Member

Ahmed Mohamed

Team Member

Supervisors

Dr. Diaa Salama

Professor

Eng. Nada Nofal

Teaching Assistant

Abstract

In audio enhancement, traditional signal processing methods now compete with deep learning and generative AI solutions, which are narrowing the gap in audio enhancement at an unprecedented pace. This is particularly evident with vintage recordings made on older audio equipment and with recordings from challenging, noisy environments. Our proposed system takes a multimodal approach built on generative AI, integrating audio super-resolution (Audio SR) and retrieval-based voice conversion (RVC) to raise the overall and perceived quality of user-uploaded audio files. Because superior audio quality is not the product of a single technology, the system combines a suite of advanced techniques to deliver an enhanced listening experience. It is designed to be accessible across platforms and compatible with a wide range of devices and browsers, and it provides an intuitive interface that lets users without advanced audio knowledge upload and improve their audio content and perform authentic voice conversion.

System Objectives

Develop a high-fidelity vocal replication system using retrieval-based voice conversion (RVC) and text-to-speech (TTS) synthesis, producing natural, high-quality output that closely resembles the original recordings.

Employ audio super-resolution (Audio SR) to increase the bit depth and sample rate of audio files and recordings, contributing to a significantly improved audio experience.

Enhance the overall and perceived quality of user-uploaded audio files by developing and integrating a multimodal approach based on generative AI and deep learning, encompassing audio super-resolution (Audio SR) and retrieval-based voice conversion (RVC).

Introduce new configurations and parameter adjustments to optimize transfer learning between technologies and achieve greater accuracy and fidelity in audio replication, recognizing that superior audio quality requires a combination of techniques rather than a single technology.

Develop an intuitive web application designed for simplicity and ease of use, enabling users without advanced audio knowledge to enhance audio quality and perform voice conversions effortlessly.
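
To make the two properties named in the Audio SR objective concrete: sample rate is the number of samples per second, and bit depth is the number of bits used to store each sample. The sketch below is a naive, non-learned baseline and is not the project's Audio SR model. The function names `upsample_linear` and `widen_bit_depth` are hypothetical and chosen only for illustration.

```python
def upsample_linear(samples, factor):
    """Naive upsampling: add linearly interpolated points between neighbours.

    Illustrative baseline only (an assumption, not the project's method).
    """
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    if not samples:
        return []
    out = []
    for a, b in zip(samples, samples[1:]):
        for k in range(factor):
            # k = 0 reproduces the original sample, k > 0 fills the gap
            out.append(a + (b - a) * k / factor)
    out.append(samples[-1])
    return out


def widen_bit_depth(samples, src_bits, dst_bits):
    """Rescale signed integer samples from src_bits to dst_bits per sample."""
    if dst_bits < src_bits:
        raise ValueError("dst_bits must be >= src_bits")
    shift = dst_bits - src_bits
    return [int(s) << shift for s in samples]


if __name__ == "__main__":
    print(upsample_linear([0, 4], 2))
    print(widen_bit_depth([-128, 0, 127], 8, 16))
```

Interpolation and rescaling of this kind change the container format without recovering any detail that was lost in the original recording. Recovering plausible high-frequency content is the part that learned super-resolution models aim to address.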


System Scope

The system shall provide a multimodal audio enhancement platform that integrates text-to-speech (TTS) synthesis, retrieval-based voice conversion (RVC), and audio super-resolution (Audio SR).

The system shall employ deep learning, machine learning algorithms, and generative AI techniques for high-fidelity vocal replication and the improvement of perceived audio quality.

The system shall provide an intuitive interface, allowing users to easily upload and enhance audio files, ensuring a seamless experience for those without advanced audio knowledge.

The system shall enable users to share their enhanced audio and videos with others, providing a platform for feedback and ratings on exported files.

The system shall offer additional features such as speech-to-text functionality and enhancements to the sound quality of uploaded videos.

The system shall be accessible across various platforms, ensuring compatibility with a wide range of web browsers.
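
The integration described in the first scope item can be pictured as a chain of stages, each taking audio in and returning audio out. The following is a minimal sketch under stated assumptions: `Audio`, `run_pipeline`, `voice_conversion_stub`, and `super_resolution_stub` are hypothetical names, and the stubs are placeholders that do not call any real RVC or Audio SR model.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Audio:
    samples: List[float]
    sample_rate: int  # samples per second


# A stage is any function that maps Audio to Audio.
Stage = Callable[[Audio], Audio]


def run_pipeline(audio: Audio, stages: Sequence[Stage]) -> Audio:
    """Apply each stage in order and return the final result."""
    for stage in stages:
        audio = stage(audio)
    return audio


def voice_conversion_stub(audio: Audio) -> Audio:
    # Placeholder (assumption): a real stage would run a voice conversion model.
    return Audio(list(audio.samples), audio.sample_rate)


def super_resolution_stub(audio: Audio, target_rate: int = 48000) -> Audio:
    # Placeholder (assumption): repeats each sample to reach the target rate.
    # A real stage would run a learned super-resolution model instead.
    factor = max(1, target_rate // audio.sample_rate)
    samples = [s for s in audio.samples for _ in range(factor)]
    return Audio(samples, audio.sample_rate * factor)


if __name__ == "__main__":
    source = Audio([0.1, -0.2], 16000)
    result = run_pipeline(source, [voice_conversion_stub, super_resolution_stub])
    print(result.sample_rate, len(result.samples))
```

Keeping every stage behind the same function shape means stages can be reordered, skipped, or replaced without touching the rest of the chain.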

Documents and Presentations

Proposal

Here you will find the document and presentation for our proposal.

SRS

Here you will find the document and presentation for our SRS.

SDD

Here you will find the document and presentation for our SDD.

Thesis

Here you will find the document and presentation for our thesis.

Document

Presentation

Accomplishments

Publications

Competitions
