
Beyond the Doctor’s Office: Mining the Online Pool of Health Tips


Through his Big Data Analytics in Healthcare Social Media Project, W. “R.P.” Raghupathi is organizing cancer-related blogs into a powerful research tool.

In his research, W. “R.P.” Raghupathi has seen all kinds of offbeat medical information online—like the blog post saying that cancer patients can use meat tenderizer to unclog their feeding tubes.

While they may mention it to patients, “doctors are not going to prescribe that,” said Raghupathi, PhD, a professor of information systems at the Gabelli School of Business.

And that, in a way, is the point of his current research project. A “big data” expert interested in health care and information technology, Raghupathi is looking for a way to harness all the informal cancer-related information that patients and doctors post online via social media. He and his students are analyzing thousands of blog posts, hoping to create a research tool that’s far more powerful than your average hit-or-miss web search.

“The idea would be to develop some kind of decision support system that individuals can use to zoom in on whatever they’re looking for” on the Internet, said Raghupathi, founding director of Fordham’s Center for Digital Transformation. “Right now it’s an ocean out there. I mean, there’s just so much data.”

He envisions something that would cluster related posts together and show their connections rather than produce page after page of search results. In addition to helping people locate information, he said, his project would provide a patient-driven knowledge bank to complement the more established sources—like medical schools and institutes—and help set directions for research.

“There are all these large amounts of data. Let the data speak for itself,” he said.

A Web of Closely Linked Content

These goals call for something more advanced than a search engine, which might rank sites “but doesn’t actually go into their content,” he said.

The program he’s building would do that. Instead of listing sites that happen to share a few keywords, as a search engine does, it would digest the sites in their entirety and relate them to others that cover the same type of thing, creating a web of closely linked content. Results would probably be presented as a chart or a table, rather than a list, Raghupathi said.
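To make the idea of relating posts by their full content more concrete, here is a minimal sketch in Python. It is not Raghupathi's program, whose internals the article does not describe. It assumes a simple bag-of-words comparison with cosine similarity, and the function names, the 0.3 threshold, and the sample posts are all invented for illustration.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Turn a post into a bag-of-words vector (word -> count)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors (0.0 to 1.0)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def related_pairs(posts, threshold=0.3):
    """Return (i, j, score) for every pair of posts at or above the threshold.

    The threshold is an arbitrary illustrative value, not a recommended one.
    """
    vectors = [vectorize(p) for p in posts]
    links = []
    for i in range(len(posts)):
        for j in range(i + 1, len(posts)):
            score = cosine_similarity(vectors[i], vectors[j])
            if score >= threshold:
                links.append((i, j, round(score, 2)))
    return links


# Made-up sample posts, loosely modeled on the examples in this article.
sample_posts = [
    "green tea eased my nausea during chemo",
    "tea helped my nausea after chemo",
    "meat tenderizer cleared a clogged feeding tube",
]

links = related_pairs(sample_posts)
```

In this toy example the two tea-and-nausea posts end up linked to each other, while the feeding-tube post stands alone. A table or chart of such links is one plausible way to present a web of related content instead of a ranked list.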



His hope is that people will more easily find obscure-but-valuable postings they wouldn’t have known to look for, like the one about meat tenderizer, or another one he saw, which said Japanese tea can help cancer patients control their nausea. He also hopes his program would help scientists find interesting connections to follow up on.

Raghupathi started the project three years ago, and hopes to eventually incorporate other kinds of social media besides blog posts. He and his students are analyzing the posts with various statistical or data mining techniques—clustering, word count, word association, pattern recognition—and weeding out common words like “an” and “the” to get cleaner correlations.
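Three of the techniques named above (weeding out common words, word count, and word association) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's actual code: the stop-word list is a tiny hand-picked set, "association" is taken to mean two words appearing in the same post, and the sample posts are made up.

```python
import re
from collections import Counter
from itertools import combinations

# A deliberately tiny, hand-picked stop-word list (an assumption for this
# sketch; real projects typically use a much longer one).
STOP_WORDS = {"a", "an", "the", "and", "or", "to", "of", "in", "for",
              "can", "is", "my", "with", "their"}


def tokenize(text):
    """Lowercase a post, split it into words, and drop stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


def word_counts(posts):
    """Count how often each remaining word appears across all posts."""
    counts = Counter()
    for post in posts:
        counts.update(tokenize(post))
    return counts


def word_associations(posts):
    """Count how many posts each pair of words appears in together."""
    pairs = Counter()
    for post in posts:
        unique_words = sorted(set(tokenize(post)))
        pairs.update(combinations(unique_words, 2))
    return pairs


# Made-up sample posts, loosely modeled on the examples in this article.
posts = [
    "Meat tenderizer can unclog a feeding tube",
    "Japanese tea can help control the nausea",
    "Tea helped with my nausea after chemo",
]

counts = word_counts(posts)
associations = word_associations(posts)
```

Here `counts["tea"]` is 2 and the pair `("nausea", "tea")` co-occurs in two posts, while words like "the" and "can" never enter the tallies. Dropping them first is what keeps the strongest associations from being dominated by filler words.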

He noted that users would always need to consider the information’s source, although the program could be designed to highlight some of the more reputable sites.

This kind of work has proven its value in the health arena, Raghupathi said, pointing to Google’s analysis of online chatter to help pinpoint flu clusters around the country. He also noted the digital “mining” of medical records that showed a link between Vioxx and strokes and heart attacks, leading Merck to pull it from the market in 2004.

Raghupathi foresees far more work of this type, given the explosion of online information and the advent of new tools for analyzing it and making “apples-to-oranges” comparisons among different types of media.

“We are collecting audio data, video data—YouTube, Twitter, tweets, all of that—at such an exponential rate,” he said. “There are mountains of data now available.”

