Ruhul Amin, Ph.D., wants to understand patterns all around us.

And with the aid of technology and data, he says, there is nothing that one can’t sort through. Want to know the common themes of 30,000 books? There’s an algorithm for that, and his team has developed methods to understand how syntax and themes influence a book’s success. Maybe you’d like to help a country better manage its response to a pandemic? There’s an algorithm for that, too, and he has used it in studies such as “Adjusted Dynamics of COVID-19 Pandemic due to Herd Immunity in Bangladesh.”

“I feel like, as a scientist, we all dream of impacting the actual lives of people. It’s not just that we will limit ourselves to theoretical contributions only. I figured that our work could reach the public by working side by side with the government, especially policymakers. This is how I thought it would be the best way to achieve a common good,” he said.

“I love data science because, with data science, you can work on so many diverse projects.”

Predicting COVID-19 Spikes

Amin, a native of Bangladesh who joined the department of computer and information science as an assistant professor in 2019, has focused his data analysis tools on an array of areas, most recently the pandemic.

In “Adjusted Dynamics,” he and four collaborators examined data from the Bangladeshi government and created a new model to predict how many people will become infected with COVID-19. A new model was needed because testing in Bangladesh is prohibitively expensive, unlike in the United States, where it’s free. As a result, Bangladeshi residents wait longer to get tested after initial exposure. Because COVID-19 can be spread by people who are not showing symptoms, they may infect others in the meantime, which causes the positivity rate to skew higher.

They started with SIRD (Susceptible-Infectious-Recovered-Deceased), a common compartmental model in epidemiology, and modified it using a Kalman filter, an algorithm traditionally used in engineering to estimate the trajectory of moving objects. For each of the country’s 64 districts, they assigned color codes of green, yellow, and red and plotted them on a timeline from May 2021 to May 2022. Ultimately, the model predicted with 95% accuracy where COVID-19 rates would rise and where they would fall. Amin shared the methodology with the Bangladeshi government, which adopted some of the recommendations regarding actions such as lockdowns.
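The article does not spell out how the Kalman filter was wired into the SIRD model, so what follows is only a minimal Python sketch under one plausible assumption: the filter tracks a time-varying transmission rate (beta) from daily counts of new infections, treating beta as a random walk. The function names (`sird_step`, `kalman_track_beta`), the parameter values, and the synthetic data are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class SIRDState:
    s: float  # susceptible
    i: float  # infectious
    r: float  # recovered
    d: float  # deceased


def sird_step(state, beta, gamma, mu, n):
    """Advance the SIRD compartments by one day (forward Euler step)."""
    new_infections = beta * state.s * state.i / n
    new_recoveries = gamma * state.i
    new_deaths = mu * state.i
    return SIRDState(
        s=state.s - new_infections,
        i=state.i + new_infections - new_recoveries - new_deaths,
        r=state.r + new_recoveries,
        d=state.d + new_deaths,
    )


def kalman_track_beta(states, new_cases, n, beta0=0.1, p0=1.0, q=1e-4, r=1.0):
    """Estimate a time-varying transmission rate with a scalar Kalman filter.

    Hypothetical model, assumed for illustration:
      beta_t = beta_{t-1} + w_t,   w_t ~ N(0, q)   (random-walk prior)
      y_t    = h_t * beta_t + v_t, v_t ~ N(0, r),  where h_t = S_t * I_t / N
    and y_t is the number of new infections observed on day t.
    """
    beta, p = beta0, p0
    estimates = []
    for state, y in zip(states, new_cases):
        # Predict: the random walk keeps beta but adds uncertainty q.
        p = p + q
        # Update: correct the prediction using the observed new cases.
        h = state.s * state.i / n
        k = p * h / (h * h * p + r)
        beta = beta + k * (y - h * beta)
        p = (1 - k * h) * p
        estimates.append(beta)
    return estimates


# Synthetic epidemic with a known beta, so we can check what the filter recovers.
n = 1_000_000
true_beta, gamma, mu = 0.3, 0.1, 0.01
state = SIRDState(s=n - 100, i=100, r=0, d=0)
states, new_cases = [], []
for _ in range(60):
    states.append(state)
    new_cases.append(true_beta * state.s * state.i / n)
    state = sird_step(state, true_beta, gamma, mu, n)

estimates = kalman_track_beta(states, new_cases, n)
print(f"estimated beta on day 60: {estimates[-1]:.3f}")
```

Because the synthetic observations here are noise-free, the final estimate should land very close to the true value of 0.3. With real, noisy case counts, the noise parameters `q` and `r` set the trade-off between reacting quickly to a change in transmission and smoothing out reporting noise.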

Forecasting a Book’s Success

Computational research is extremely flexible and therefore highly interdisciplinary, Amin said. When he learned that one of his graduate students had earned a bachelor’s degree in English literature, the two teamed up on a project that requires a deeper understanding of both linguistics and natural language processing (NLP). Using language features such as syntax, along with the conceptual framework on which a piece of literature is built, they created NLP models to predict a book’s success. The models were trained on the properties of other successful books to learn either their ranking on Goodreads or the number of times they had been downloaded.

In a related study, “Stereotypical Gender Associations in Language Have Decreased Over Time” (Sociological Science, 2020), Amin used an automated process to scan a million English-language books published between 1800 and 2000. He found that while stereotypical gender associations in language have decreased over time, career and science terms still show a positive male gender bias, while family and arts terms still show a negative male gender bias. He later extended the work at Fordham in another study, “A Comparative Study of Language Dependent Gender Bias in the Online Newspapers of Conservative, Semi-conservative and Western Countries.”
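As a rough illustration of how gender associations in text can be quantified, one common approach (assumed here; not necessarily the exact procedure used in the study) compares how close a word’s vector sits to male terms versus female terms in a word embedding. The minimal Python sketch below uses hypothetical, hand-made three-dimensional vectors, so its numbers illustrate only the sign convention and do not reproduce any published result. The helper names (`cosine`, `mean_similarity`, `male_bias`) are likewise hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_similarity(word, group, vectors):
    """Average cosine similarity between one word and a group of words."""
    return sum(cosine(vectors[word], vectors[g]) for g in group) / len(group)


def male_bias(word, male_terms, female_terms, vectors):
    """Positive: the word sits closer to the male terms; negative: closer to the female terms."""
    return (mean_similarity(word, male_terms, vectors)
            - mean_similarity(word, female_terms, vectors))


# Hypothetical toy vectors, invented for illustration; not from any trained model.
vectors = {
    "he": [1.0, 0.1, 0.0],
    "man": [0.9, 0.2, 0.1],
    "she": [0.1, 1.0, 0.0],
    "woman": [0.2, 0.9, 0.1],
    "engineer": [0.8, 0.3, 0.5],
    "family": [0.3, 0.8, 0.5],
}
male, female = ["he", "man"], ["she", "woman"]

for w in ["engineer", "family"]:
    print(w, round(male_bias(w, male, female, vectors), 3))
```

With real embeddings, the vectors would come from a model trained on a large corpus, and averaging the score over many career, science, family, or arts terms would give the kind of group-level bias the study describes.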

The success of studies such as these has made Amin confident that similar techniques can be used to analyze documents for everything from political leanings to racial bias.

Finding Patterns in Mental Health Hotline Calls

Amin is also working in the area of mental health. In collaboration with NYU and the University of Toronto, he is analyzing five years’ worth of recorded phone conversations from a popular mental health “befriending” hotline in Bangladesh. The goal is to find patterns in past records that can inform future decisions. Healthcare professionals could use these patterns to tailor messages to the public or to adjust staffing levels more efficiently.

“The interesting thing is what people really discuss during, let’s say, the weekend. Is it different from the weekdays? When do you get the most calls? Is it right after you post something where you say, ‘Hey, we’re a befriending service, we’re providing this kind of help?’ When do you get suicidal calls? You can literally change this area by using this modeling,” he said.

As long as there is enough data and computing power, Amin believes the possibilities for projects using algorithms are nearly endless. One of his projects, for instance, analyzes a billion tweets to determine what constitutes offensive and biased language. Eventually, he hopes to build a tool that works the way Grammarly does for grammatical mistakes, but that helps us identify blind spots in our perspectives instead.

“I actually published a paper in gender bias, and so I thought, ‘I’m a person without any biases.’ But when I took this psychological test recently, I found that I’m still male-biased,” he said.

“We’re coming from different backgrounds, and all have these kinds of stereotypes within us. So I want to develop a tool that can suggest to you how biased and how offensive the language is that you just wrote to any person or community.”

Even Fordham itself could make a good research project; Amin has set his sights on the collections of the University’s library system. “We’re constantly conceptualizing the whole world,” he said. “Why not Fordham?”