As of July 1, 2021, there were more than 4.8 billion Internet users on the planet, or 61% of the world's population. That figure keeps growing, as does the volume of content published online. Yet at least 80% of this content is available in only ten Western languages and is never translated into African languages, not even the most widely spoken ones, such as Swahili. Machine translation is therefore meant to speed up access to knowledge for populations who speak neither English nor French, for example.
From Lingala to Oromo
On May 11, 2022, Google announced that Bambara (Mali), Ewe (Ghana, Togo), Krio (Sierra Leone), Lingala (Central Africa), Luganda (Uganda, Rwanda), Oromo (Ethiopia), Sepedi (South Africa), Tigrinya (Eritrea, Ethiopia), Tsonga (South Africa) and Twi (Ghana) could now be translated by Google Translate. On the face of it, this is good news, especially since Google Translate is built into many third-party sites and major platforms such as Facebook and Twitter. However, it does raise some questions.
We’re adding 24 new languages to Google Translate, the first using a revolutionary machine learning approach called Zero-Shot Machine Translation, in which the model learns a new language without ever seeing the direct translation of it. #GoogleIO https://t.co/5Imnj6ff1E
Google (@Google), May 11, 2022
For example, why were these particular languages chosen from among the roughly 2,000 spoken on the continent? We know very little about what guided the American company's engineers in their choice. Yet African experts could help the company extend this initiative. Before this announcement, Google Translate already supported other African languages, such as Yoruba, often with approximate translations. Should Google keep adding new languages, or improve the ones it already has?
Yet the way Google Translate works remains rather opaque. To produce relevant results, Google ingests millions of pieces of data of every kind, across many fields. But where does it find such data for African languages, given that speakers of these languages produce very little written content on the web? And are the people who produce the data Google draws on fairly compensated for their contribution to the tool? The tool is free (for now), but those contributions are intellectual work feeding a commercial enterprise whose revenue reached $182.52 billion in 2020. Our challenge, then, is to revitalize our languages, which have real value, while keeping control of our data and protecting its integrity.
This data, whose sources are unknown, is not freely available to developers who want to build practical tools for people cut off from digital technology by their language. There is no public website where it can be consulted. Moreover, by adding new languages, Google, with its commercial and communication power, overshadows more collective initiatives whose data is openly available. Some of these initiatives deserved to be supported and encouraged by the multinational.
The Idemi Africa collective therefore calls on the company to make its policy for integrating African languages more transparent, to collaborate more closely with existing players, and to make this data accessible. The point is to treat speakers of African languages as co-creators of these tools, for example by paying them, rather than as people whose language is being siphoned off.
Language is more than a set of words; it is a way of thinking and of relating to others. As Souleymane Bachir Diagne says, quoting Ngugi wa Thiong'o, translation is "the language of languages", and it deserves to be approached with humility and backed by real human resources.