Inkyvoices is an initiative towards teaching human Arabic speech to machines.

Inkyvoices is the first step towards teaching human Arabic speech to machines. Although 25 countries consider Arabic their first language and there are approximately 375 million Arabic speakers, the language is still underrepresented in the NLP domain.

Through this platform, we aim to bring Arabic NLP to the same level as English NLP.

The first step in teaching human speech to a machine is to provide it with enough samples. Inkyvoices is an initiative to gather as much data as possible through the power of crowdsourcing, so that we can push the community to innovate and be creative.

This dataset invites members of the Arabic-speaking community to make their individual contributions to the largest available corpus of annotated Arabic audio.


What does the Inkyvoices dataset consist of?

The Inkyvoices dataset will consist of texts paired with corresponding audio recordings. We aim to annotate the audio with additional information so that it can be categorised, which broadens the range of potential uses for the dataset. The goal is to improve Arabic audio processing by drawing on the innovative power of a community with unrestricted access to a rich, public dataset.
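As an illustration only, one entry of such a dataset could be pictured as a text paired with a recording, plus optional annotations. The sketch below is not the actual Inkyvoices format: the `Sample` class, its field names, the file paths, and the example values are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Sample:
    """One hypothetical dataset entry: a text and its recording."""
    text: str                      # the Arabic sentence that was read aloud
    audio_path: str                # assumed location of the recording
    # Optional speaker annotations; all three field names are assumptions.
    origin: Optional[str] = None
    age: Optional[int] = None
    gender: Optional[str] = None


def filter_by_origin(samples: List[Sample], origin: str) -> List[Sample]:
    """Return only the samples whose speaker origin equals `origin`."""
    return [s for s in samples if s.origin == origin]


# Made-up entries, used only to show how annotations enable categorisation.
samples = [
    Sample("مرحبا", "clips/0001.wav", origin="Tunisia", age=30, gender="female"),
    Sample("شكرا", "clips/0002.wav", origin="Egypt"),
    Sample("صباح الخير", "clips/0003.wav"),  # no annotations yet
]

tunisian = filter_by_origin(samples, "Tunisia")
```

Keeping the annotations optional means a recording can be contributed first and categorised later, which suits a crowdsourced collection.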

Why Inkyvoices
Potential uses for the Inkyvoices data

Text to speech

One of the main reasons that pushed us to create the platform is the desire for a dataset that can be used to build systems that convert Arabic text into speech.

Speech to text

There are many uses for a system that can automatically transcribe speech into text, for example to generate subtitles, and this dataset will help build one.

Audio Classification

The ability to classify the speaker's origin, age, gender and similar attributes can be valuable in a wide variety of research fields.

Other uses

By making this dataset open and public, we make room for new and innovative tasks that the community can collectively envision and realise.
