Parsing the mountains of data that are generated daily by computers is a daunting task, said Lilian Wu, Ph.D., an IBM executive who spoke at Fordham.
But advances in data analytics are beginning to put the torrent of information to good use, she said.
“What many of us experience is that data is rushing at us in such huge quantities that it’s completely unmanageable and undigestible,” said Wu, who serves on the Fordham University Board of Trustees.
She offered several examples of how technology and analytics are using data to solve problems and create new possibilities in people’s everyday lives.
Wu mentioned an IBM project that sought to decrease preventable medical errors at hospitals. She referenced a study by the Institute of Medicine that estimated 98,000 deaths nationwide each year are caused by such errors.
“Hospitals are not safe places at all; they are very chaotic and difficult environments,” she said. “This was an area where technology offered just the right solution.”
IBM experimented with bar coding medications at Vassar Brothers Medical Center, a 350-bed hospital in Poughkeepsie, N.Y. Before any medication was given, the nurse swiped a bar code on the medication, on the patient, and on herself.
“Before the system was installed, the nurses registered about 250 alerts in a year that dealt with medication issues,” Wu said. “After the system came online, the number of alerts jumped into the thousands.
“That is a very straightforward and simple technology that will save many lives. It will prevent a double dose of medication from being given, or a child being given an adult dose—which happens all the time due to the environment,” she said.
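The kind of check Wu described can be illustrated with a minimal Python sketch. It is not the system installed at Vassar Brothers; the `Order` record, the `check_administration` function, and every field name are hypothetical, chosen only to show how three scans (medication, patient, nurse) might be compared against a medication order before a dose is given.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Order:
    """Hypothetical medication order; all fields are illustrative assumptions."""
    patient_id: str
    medication_id: str
    max_dose_mg: float
    min_interval: timedelta  # shortest allowed gap between two doses


def check_administration(order: Order,
                         scanned_patient: str,
                         scanned_medication: str,
                         scanned_nurse: str,
                         dose_mg: float,
                         now: datetime,
                         last_given: Optional[datetime] = None) -> dict:
    """Compare the three scans against the order and collect any alerts."""
    alerts = []
    if scanned_patient != order.patient_id:
        alerts.append("wrong patient")
    if scanned_medication != order.medication_id:
        alerts.append("wrong medication")
    if dose_mg > order.max_dose_mg:
        # Would catch, for example, an adult dose scanned for a child.
        alerts.append("dose exceeds order")
    if last_given is not None and now - last_given < order.min_interval:
        alerts.append("possible double dose")
    # The nurse's scan is kept so that each attempt is attributable.
    return {"nurse": scanned_nurse, "time": now, "alerts": alerts}
```

In this sketch an empty `alerts` list means the dose may proceed, while any entry would prompt the nurse to stop and recheck. Logging every attempt, not just the failures, is one plausible reason the number of recorded alerts would rise once such a system is in place.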
Wu gave another example of how sensors that deliver data in real time can help tackle a global problem, namely, increasing water scarcity.
Citrus grower Sun World International, LLC, worked with IBM to design and embed moisture-measuring sensors throughout its citrus fields. As a result, only areas that need water will receive it.
“We’re dealing with the problem of water scarcity by breaking it up into smaller pieces that are more manageable,” she said. “We’re starting to think of strategies on the micro-level. There’s great potential.”
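The micro-level strategy Wu described, watering only the areas that need it, can be sketched in a few lines of Python. The zone names, the moisture scale (a fraction between 0 and 1), and the 0.25 threshold are all assumptions made for illustration; the article does not describe how Sun World's sensors report their readings.

```python
def zones_to_irrigate(readings: dict, threshold: float = 0.25) -> list:
    """Return the field zones whose soil moisture is below the threshold.

    `readings` maps a hypothetical zone name to its latest moisture reading.
    """
    return sorted(zone for zone, moisture in readings.items()
                  if moisture < threshold)


# Example with made-up readings: only the dry zones are selected.
latest = {"north": 0.10, "east": 0.30, "south": 0.20}
dry_zones = zones_to_irrigate(latest)
```

Here `dry_zones` contains only "north" and "south", so the zone with adequate moisture receives no water.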
Wu then switched from discussing structured data—which are captured intentionally and sorted with purpose—to unstructured data, such as information expressed in natural language.
The most famous natural-language processor is Watson, the IBM supercomputer that bested top human competitors on the game show Jeopardy! Wu explained why natural language is so difficult for computers to comprehend by using this sentence:
“If leadership is an art, then surely Jack Welch has proven himself a master painter during his tenure at GE.”
“Maybe GE is an art school, so that is vague. If you took what was implicitly said there, you would get the wrong impression,” she explained.
The IBM-Watson team chose to tackle Jeopardy! because it is full of puns, slang and abbreviations that mean something else. It is impossible to build a database that could handle such complex language.
Also, the answers must be phrased with a high level of precision and there is a single correct answer for each question.
The IBM team built Watson to generate a series of 10 to 15 hypotheses from the clue given by Jeopardy! host Alex Trebek. Watson then generates multiple possible correct answers.
“Then it looks to its corpus of knowledge and scores each answer on multiple dimensions based on evidence that it has in its knowledge base,” Wu said. “Then, combining the scores, one particular evaluation of the multiple hypotheses is generated.”
Watson then gauges its confidence in the chosen evaluation, and if the confidence is high enough, it will buzz in.
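The sequence Wu outlined (score each candidate answer on several dimensions, combine the scores, then buzz in only if confidence is high enough) can be sketched roughly as follows. This is not IBM's implementation: the weighted sum, the logistic squashing, the dimension names, and the 0.5 threshold are assumptions chosen to keep the example small.

```python
import math


def combine(scores: dict, weights: dict) -> float:
    """Merge per-dimension evidence scores into one confidence in (0, 1).

    A weighted sum passed through a logistic function; this is an assumed
    combination rule, used here only for illustration.
    """
    z = sum(weights[name] * value for name, value in scores.items())
    return 1.0 / (1.0 + math.exp(-z))


def decide(candidates: dict, weights: dict, buzz_threshold: float = 0.5):
    """Pick the best-scoring candidate and decide whether to buzz in."""
    best = max(candidates, key=lambda answer: combine(candidates[answer], weights))
    confidence = combine(candidates[best], weights)
    return best, confidence, confidence >= buzz_threshold


# Made-up candidates and evidence scores for a single clue.
candidates = {
    "Chicago": {"type_match": 1.0, "passage_support": 2.0},
    "Toronto": {"type_match": -1.0, "passage_support": 0.5},
}
weights = {"type_match": 1.0, "passage_support": 0.5}
answer, confidence, buzz = decide(candidates, weights)
```

With these invented numbers the sketch selects "Chicago" and chooses to buzz; raising `buzz_threshold` makes it more cautious, which mirrors the trade-off between answering and staying silent.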
“Over time, we’re going to be able to include more unstructured data in analytics,” she said. “It’s difficult to imagine how that’s going to go, but it will happen.”
The 2011 Clavius Distinguished Lecture was sponsored by Fordham’s Department of Computer and Information Science.