“Once the project is completed, it will give us a framework under which hundreds of researchers can build interesting AI models that can be used every day in middle-income countries like Bangladesh and India, as well as with other Bangla language speakers worldwide,” said Amin. “This will potentially reduce the digital divide on a global scale.”
A Popular Language With a Critical Weakness
Artificial intelligence has become an essential part of everyday life, from online spell-checkers to voice-enabled devices like Amazon Echo. Before they can be useful, these tools must first be trained to understand human language. Their natural language processing (NLP) systems need to absorb large amounts of data in order to recognize all the unique parts of a language, including idioms, metaphors, and even sarcasm.
English, the primary language of the internet, has a plethora of online texts to learn from, including a mature corpus: a large collection of English texts, assembled by academics, that is used to build NLP systems for the English language. It includes text from social media, newspapers, and blogs.
But that isn’t the case for the national language of Bangladesh, known as Bangla—one of the most widely spoken languages in the world.
“When it comes to English, Google first understands a query, processes the query, and then provides a user with the best result. But that doesn’t happen with Bangla and other low-resource languages,” said Amin. “Google provides very good search results for some languages like English, Chinese, and Spanish. With Bangla, Google provides search results, but it can’t analyze that data because it doesn’t have a foundation of Bangla semantics information to draw from. Google does not understand the language, linguistically. So Google search results in English are very dynamic, but not in Bangla.”
A Global Project to Develop Artificial Intelligence Abroad
Over the next two years, Amin will work with Giga Tech, a global technology company in Bangladesh, to develop the first Bangla corpus.
“We want to create a large dataset labeled with grammatical properties by linguistic experts, which will then be able to identify people, places, and things. This will strengthen the Bangla national language’s NLP framework. Then we will develop a large-scale computational algorithm that can automatically detect those things from Bangla texts,” said Amin. “In the future, researchers can improve the model and local industries can build applications with it. That is the Bangladesh government’s goal—to create the framework so that information and communication technology within the country can lift off.”
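For readers curious about what such a labeled dataset might look like in practice, the short Python sketch below shows one common convention, known as BIO tagging, for marking people and places in a sentence. It is an illustration only, not the project's actual format or algorithm: the example sentence, the tag names (`PER`, `LOC`), and the function `extract_entities` are all assumptions made for this sketch.

```python
from typing import List, Tuple


def extract_entities(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Collect (text, type) pairs from a BIO-tagged sentence.

    "B-X" starts an entity of type X, "I-X" continues it, and "O"
    marks a word outside any entity. This tag scheme is an assumption
    for illustration, not the project's confirmed annotation format.
    """
    entities: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_type = None

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity begins; store any entity still in progress.
            if current_tokens:
                entities.append((" ".join(current_tokens), current_type))
            current_tokens = [token]
            current_type = tag[2:]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # Continuation of the entity currently being built.
            current_tokens.append(token)
        else:
            # "O" or an inconsistent tag closes any open entity.
            if current_tokens:
                entities.append((" ".join(current_tokens), current_type))
            current_tokens = []
            current_type = None

    # Store an entity that runs to the end of the sentence.
    if current_tokens:
        entities.append((" ".join(current_tokens), current_type))
    return entities


# Hypothetical hand-labeled example: "Rahim lives in Dhaka."
tokens = ["রহিম", "ঢাকায়", "থাকেন"]
tags = ["B-PER", "B-LOC", "O"]

for text, entity_type in extract_entities(tokens, tags):
    print(entity_type, text)
```

In a real corpus, linguists would supply labels like these by hand for many thousands of sentences, and a statistical model would then learn to predict them for text it has never seen.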
Amin is originally from Bangladesh. He was born and raised in the capital city, Dhaka, and immigrated to the U.S. in 2013. That same year, he developed Bangladesh’s first national search engine—Pipilika, which ran for a total of eight years—in a project co-funded by Telenor, Accenture, and a2i, a Bangladesh government program that aims to improve access to public services through new technology.
Research Guided by Ignatian Philosophy
In 2019, Amin joined Fordham’s faculty, where he teaches and conducts research with undergraduate and graduate students. He also collaborates with academic institutions in North America, Europe, Australia, and Southeast Asia.
“Most of the problems solved by my team are local to the U.S., but that does not mean we have to only solve problems here. We can do the same thing for other languages and nations from where we are,” said Amin, who is working remotely with Giga Tech and the Bangladesh government on this project.
Amin said that his research, whether it’s conducted in the U.S. or in Bangladesh, is always guided by Fordham’s Ignatian principles.
“I am deeply motivated by the Ignatian principles, and I believe that education is one of the best ways to help people. We should continue to spread the knowledge we create within Fordham to touch people outside the University,” said Amin. “The best way to do it is through collaboration with outside entities—not just through academic research, but implementation that touches people’s lives beyond binaries and boundaries.”