"Language diversity on the African continent is both a cultural strength and a technological challenge. With over 2,000 spoken languages, mainstream artificial intelligence tools have largely ignored the vast majority of African languages, creating a significant barrier to the full participation of
...
African communities in the global digital economy. But that’s about to change, thanks to the work of Lelapa AI that recently launched InkubaLM, a groundbreaking AI language model designed for low-resource African languages." (Introduction)
"This document is a detailed report of best practices for the digital inclusion of speakers of endangered Indigenous languages. The proclamation of the period between 2022 and 2032 as the International Decade of Indigenous Languages (IDIL 2022-2032), as per Resolution A/RES/74/135 from United Nation
...
s General Assembly, shines a spotlight on the issue and invites the world to pay more attention to the critical and fragile situation of many Indigenous languages. This document aims to provide a blueprint for the addition of Indigenous languages into software in order to aid digital inclusion efforts, and to provide a set of recommended steps to follow in order to successfully achieve it. As new generations of Indigenous people increase their literacy and use of technology, it is crucial that they are able to use their native language in digital formats to avoid the endangerment and loss of language. With the main goals of promoting the written form of a language in a natural vehicle such as technology, bringing awareness to the endangered Indigenous languages and working towards their survival, Motorola has open-sourced over 800,000 translated Indigenous words as of April 2023 through its official website, enabling other OEMs and companies to promote the languages through their interfaces and paving the way for broader use and revitalization efforts." (Introduction)
"AI tools, from ChatGPT to Google Translate, are useless to billions of people in the Global South who don't work in western languages. Researchers and startups from Africa and other parts of the world are changing that." (Introduction)
"In the context of the International Year and International Decade of Indigenous Languages, a Global Call for Research Papers was commissioned with the aim to show a diversity of scholarship in the field of Indigenous languages and related issues. The international peer-review team carried out a com
...
prehensive review process of nearly 300 papers from 63 countries, facilitated the final compilation of 38 selected papers by researchers from 26 countries, and the production of 10 articles by peer-reviewers. Overall, this multilingual collection includes 48 articles by researchers and analytical pieces by peer reviewers from 30 countries. This initiative aims to contribute to the creation of favourable conditions for knowledge-sharing and dissemination of good practices on Indigenous languages, as well as growth and development through elaboration of new knowledge." (Back cover)
"The report documents the quality and usability of Facebook and YouTube content moderation policies in four languages. It found that the translations of Facebook and YouTube's content moderation policies in these languages are far below a standard that would be considered acceptable by the average u
...
ser. Each of the translations showed numerous and systematic errors that impacted readers' ability to understand the policy without referencing the original version in English. The report also discusses how translation impacts the entire value chain of content moderation and platform governance - from end-users' ability to report content to moderators' ability to detect and remove harmful content and to regulators' knowledge about what content is on the platform. The report includes recommendations for improving translation policies and processes, as well as for advancing best practices in translation across the industry." (Publisher description)
"This white paper seeks to unpack the use of Indigenous or non-majority language in the existing digital landscape. This ties into ideas about digital colonialism (Kwett, 2022), wherein hegemonic, or dominant, languages are threatening and jeopardising the ability for local language speakers to expr
...
ess themselves and communicate in digital spaces. We hope to analyse a sample of existing scholarship on digital inclusion to examine how it plays out specifically through the use of local language on social media. We map key issues at work when local languages are used on social media platforms. These may concern issues that build on the theme of the digital divide to raise questions about digital equality, participation, citizenship, belonging and identity. Through this white paper, we aim to understand how the digital onboarding of language may empower, limit, extend and enrich user engagement. We also seek to unpack themes of access, safety and usability that the average user in these contexts may experience when using digital platforms for communication and daily life." (Aim of the paper, page 4)
"This book discusses how digital inequalities today may lead to other types of inequalities in the Global South. Contributions to this collection move past discussing an access problem - a binary division between 'haves and have-nots' - to analyse complex inequalities in the internet use, benefits,
...
and opportunities of people in the Global South region. Using specific case studies, this book underlines how communities in the Global South are now attempting to participate in the information age despite high costs, a lack of infrastructure, and more barriers to entry. Contributions discuss the recent changes in the Global South. These changes include greater technological availability, the spread of digital literacy programs and computer courses, and the overall growth in engagement of people from different backgrounds, ethnicities, and languages in digital environments." (Publisher description)
"This research aims at finding the current state of (open) voice datasets in Indian languages, including information about their volume, quality, mode of collection, and availability. The present research also explores the challenges for the creation and maintenance of open voice datasets in India.
...
The report makes practice-oriented recommendations for future sustainable voice data collection based on both extensive desk research and expert interviews."
"FUNREDES and Union Latine have designed an original research method to measure linguistic diversity in cyberspace. The aim was to use search engines and a sample of word-concepts to measure the proportionate presence of these concepts in their various linguistic equivalences (in Latin languages, En
...
glish and German) in cyberspace. The research, undertaken from 1996 to 2008, has enabled interesting indicators to be built in order to measure linguistic diversity. Additionally, some basic evaluations of the cultural projections associated with these languages (mentioned above) were undertaken. This paper describes the research method and its results, advantages and limitations. It also provides an overview of existing alternative methods and results, for comparison. The paper concludes with the examination of different perspectives in a field which have in the past been considered to have been characterized by a lack of scientific rigor. This has led to some misinformation about the dominant presence of English on the Web. It is a topic that is only now slowly attracting due attention from international organizations and the academic world." (Abstract, page V)
"In order to promote and bolster linguistic and cultural diversity in cyberspace, the most underprivileged languages need help to gain access to it. If it is possible to do this with a small, oral, unwritten, endangered language, there is all the more reason why this should be possible with all poor
...
ly endowed languages which are in somewhat better circumstances. The first stage consists in undertaking the necessary studies in order to develop the linguistic resources that are indispensable: a list of phonemes, an alphabet, a spelling system, a grammar, a dictionary and a collection of texts. The second stage involves work on computerization of the language in order to identify or develop compatible IT resources: a character set in at least one font, a virtual keyboard and corpus processing programmes, which may also be used to fine-tool linguistic analysis of the language and enhance its linguistic resources. The third stage consists in developing and adapting cultural resources so that they may be shared in cyberspace. This means recording and digitizing as many text, sound and graphic records as possible and making them ready for posting on websites. It is also necessary to design the various ingredients of a website, such as menus, navigation bars, titles and other texts for human-machine communication. In some cases, it will be necessary to localize programmes in order to develop the language as a working tool and endow it with supplementary IT resources. Finally, it is useful to learn to develop websites in the poorly endowed language, possibly in tandem with a more widely used language. All tools necessary for such training and tools for creating forums and localizing freeware may be found on the Internet. Once it has a website, a forum, a mailing list, IP telephony, music, still photographs and video, the lesser-used language can now be well ensconced in cyberspace, but to survive there, a community capable of using it intensively must be developed. Assistance to local associations in developing such communities will contribute to the promotion and enhancement of the diversity of languages and cultures in cyberspace." (Conclusion, page 45-46)