Decolonise the language of science
Many words common to science have never been written in African languages. Now, researchers from across Africa are changing that.
SARAH WILD / NATURE MAGAZINE
There’s no original isiZulu word for dinosaur. Germs are called amagciwane, but there are no separate words for viruses or bacteria. A quark is ikhwakhi (pronounced kwa-ki); there is no term for red shift. And researchers and science communicators using the language, which is spoken by more than 14 million people in southern Africa, struggle to agree on words for evolution.
IsiZulu is one of approximately 2 000 languages spoken in Africa. Modern science has ignored the overwhelming majority of these languages, but now a team of researchers from Africa wants to change that.
A research project called Decolonise Science plans to translate 180 scientific papers from the AfricArXiv preprint server into six African languages: isiZulu and Northern Sotho from southern Africa; Hausa and Yoruba from West Africa; and Luganda and Amharic from East Africa.
These languages are collectively spoken by around 98 million people. Earlier this month, AfricArXiv called for submissions from authors interested in having their papers considered for translation.
The translated papers will span many disciplines of science, technology, engineering and mathematics. The project, launched a year ago, is backed by philanthropic and government funders from Europe and North America, and by Google.
Left behind
The lack of scientific terms in African languages has real-world consequences, particularly in education. In South Africa, for example, fewer than 10% of citizens speak English as their home language, yet it is the main language of teaching in schools, something that scholars say is an obstacle to learning science and mathematics.
African languages are being left behind in the online revolution, says Kathleen Siminyu, a Kenya-based specialist in machine learning and natural language processing for African languages.
“African languages are seen as something you speak at home, not in the classroom, not showing up in the business setting. It is the same thing for science,” she says.
Eventually, Decolonise Science aims to create freely available online glossaries of scientific terms in the six languages, and use them to train machine-learning algorithms for translation. The researchers hope to complete this project by the beginning of 2022. But there’s a wider ambition: to reduce the risk of these languages becoming obsolete by giving them a stronger foothold online.
Permeate
Decolonise Science will employ translators to work on papers from AfricArXiv, says principal investigator Jade Abbott, a machine-learning specialist based in Johannesburg, South Africa.
Words that do not have an equivalent in the target language will be flagged so that terminology specialists and science communicators can develop new terms.
“It is not like translating a book, where the words might exist,” Abbott says. “This is a terminology-creating exercise.”
The team will offer its glossaries as free tools for journalists and science communicators, as well as national language boards, universities and technology companies, which are increasingly providing automated translation.
“If you create a term and it isn’t being used by others, it isn’t going to permeate into the language,” says Sibusiso Biyela, a science communicator at ScienceLink in Johannesburg.
Researchers at Masakhane, a grassroots network of African natural-language-processing researchers, say that global technology companies have historically ignored African languages but have begun funding research in the field in recent years.
“We’re aware that the many thousands of African languages are currently under-represented in translation software,” a Google spokesperson told Nature.
The tech giant wants to expand Google Translate to include more African languages, but it needs “speakers of those languages to help us improve the quality of our translations” before those languages can be integrated into the service.
“The big idea is cultural ownership of science,” Biyela explains. Both he and Abbott say it is crucial to decolonise science by allowing people to do research and speak about science in their own languages. At the moment, it is possible to use African languages to talk about politics and sport, but not science, says Biyela.
Similarly, English is the dominant language of environmental stewardship and conservation — but unless people understand the meaning of specific terms and concepts and can talk about them in their home languages, they can feel disconnected from efforts to preserve ecosystems and species.
The researchers are concerned that if African languages are not included in the algorithms behind online services, they could eventually become obsolete and forgotten. “These are languages [people] speak. These are languages they use every day, and they live with and see the reality that in x number of years, their language might be dead because there is no digital footprint,” says Siminyu.
Namibian Sun