The Scientist 12[10]:12, May 11, 1998
Profession
By Robert Finn
Date: May 11, 1998
It seems like such an obvious idea once it's stated: With the explosive growth of scientific literature and the concomitant fragmentation of the scientific community into narrow specialties, there must be undisclosed connections lurking. Suppose one field of science has linked medical condition A with symptom B, and a completely different field has linked dietary deficiency C with that same symptom B. The literature then would contain an implicit logical link between A and C, but unless a researcher from one field stumbled upon the other field's literature, that link would never become explicit.
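The A, B, C reasoning above can be sketched in a few lines of Python. This is only an illustrative toy under stated assumptions, not Swanson's program: the function name `implicit_links`, the dictionary inputs, and the labels in the sample data are all hypothetical.

```python
def implicit_links(a_to_b, c_to_b, known_pairs=frozenset()):
    """Return {(a, c): shared_b_terms} for A-C pairs joined only through B.

    a_to_b and c_to_b map a topic to the set of intermediate terms (B) that
    its literature mentions. known_pairs holds (a, c) pairs that already
    appear together in print; they are skipped because the link is explicit.
    """
    links = {}
    for a, a_terms in a_to_b.items():
        for c, c_terms in c_to_b.items():
            if (a, c) in known_pairs:
                continue
            shared = a_terms & c_terms
            if shared:
                links[(a, c)] = shared
    return links


# Hypothetical toy data: two fields that never cite each other.
field_one = {"condition A": {"symptom B", "symptom X"}}
field_two = {
    "deficiency C": {"symptom B"},
    "deficiency D": {"symptom Y"},
}

found = implicit_links(field_one, field_two)
print(found)
```

Only the pair joined through the shared symptom is reported. Passing that pair in `known_pairs` would suppress it, which mirrors the idea that a link already stated in the literature is no longer a hidden one.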
PROGRAM DEVELOPER: Don R. Swanson, professor emeritus of information science at the University of Chicago, has developed a computer program that will allow systematic searching for links in scientific literature.
Swanson prefers to call his approach text-based informatics. Other enthusiasts, such as Michael D. Gordon, professor of computer and information systems at the University of Michigan Business School in Ann Arbor, refer to it as literature-based discovery. "I think that [Swanson] has made concrete and operational this idea that there are discoveries that can be made within the public literature," notes Gordon. "He has performed very, very well-conceived and well-executed experiments. . . . And there has been follow-on work in the medical community that has been based on his reporting these connections."
"He's opening up a new way of doing information science, a new avenue for approaching information," adds Henry Small, director of contract research at the Philadelphia-based Institute for Scientific Information Inc. "Whether or not this pans out, I think people are thinking in a new way about information. They're thinking of knowledge discovery rather than just information retrieval."
One of Swanson's first successes with text-based informatics came in 1986: "Raynaud's disease is a circulatory disorder for which neither the cure nor the cause is known," he explains. "I found literature that indicated that at least a subgroup of Raynaud's patients have certain abnormalities in their blood. Separately [I learned that] dietary fish oil tended to correct those abnormalities, such as high viscosity. And so putting those together suggested that dietary fish oil should be of benefit to Raynaud's patients. About two years later somebody did a clinical trial that did show a beneficial effect." (Swanson's original paper is: D.R. Swanson, Perspectives in Biology and Medicine, 30[1]:7-18, 1986; and the papers reporting the clinical trial that validated his suggestion are: B.B. Chang et al., Surgical Forum, 39:324-326, 1988; and R.A. DiGiacomo et al., American Journal of Medicine, 86[2]:158-164, 1989.)
Then in 1988 Swanson used his method to establish a relationship between migraine headaches and magnesium deficiency (D. R. Swanson, Perspectives in Biology and Medicine, 31:526-557, 1988). Despite the fact that at the time there were virtually no records in the MEDLINE database mentioning both migraine and magnesium, Swanson found that there were 11 intermediate effects linking the neurological disorder with the dietary deficiency. To give just two examples, writes Swanson, "magnesium can inhibit spreading depression in the cortex, and spreading depression may be implicated in migraine attacks; magnesium-deficient rats have been used as a model of epilepsy, and epilepsy has been associated with migraine." The relationship Swanson found between migraine and magnesium has been validated repeatedly in the clinic.
More recently Swanson has collaborated with Neil R. Smalheiser, now an assistant professor of psychiatry at the University of Illinois at Chicago, who says that he has encouraged Swanson to take a slightly different approach to text-based informatics. Previously Swanson had focused on cases in which no link between a particular A and a particular C had been suspected. Now the collaborators are focusing on cases in which researchers have already established at least a tenuous connection between A and C. "They either know that A and C are linked experimentally, but may not know how they're linked," explains Smalheiser, "or they want to see what is already known about possible relationships between A and C."
COLLABORATOR: Neil R. Smalheiser, assistant professor of psychiatry at the University of Illinois at Chicago, has worked with Swanson in developing ARROWSMITH.
On the one hand, ARROWSMITH uncovered, in Smalheiser's words, "a whole cottage literature in schizophrenia arguing that patients shared markers of various kinds suggestive of chronic oxidative stress." On the other hand, the program turned up information about an animal model of chronic oxidative stress that's produced by depriving rats of vitamin E and selenium for a period of six weeks. These rats show selective increases in calcium-independent phospholipase A2.
"We saw this A-to-B-to-C link and proposed not only that this could explain what they saw in people, but that this chronic oxidative stress model could be [useful] for studying this phenomenon," notes Smalheiser.
Swanson describes ARROWSMITH (which is available for use at no cost at http://kiwi.uchicago.edu) as, "interactive software that extends the capability of a MEDLINE search. It operates on the output of a conventional search in a way that lets the user see new relationships and form novel hypotheses. . . . The user puts in a broader field as a conjecture--'I'd like to look at all dietary substances and see any connections to migraine.' And that's organized into a ranked list on which magnesium appears at or near the top. Therefore the hypothesis is suggested to the user, rather than asking the user to come up with the hypothesis."
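The idea of turning a broad conjecture into a ranked list can be illustrated with a short Python sketch. The article does not say how ARROWSMITH scores its list, so the rule used here (count the intermediate terms each candidate shares with the target literature) is purely an assumption, as are the function name `rank_candidates` and most of the sample terms; only "spreading depression" and "epilepsy" come from the migraine example described earlier.

```python
def rank_candidates(target_terms, candidate_terms):
    """Rank candidates by how many intermediate terms they share with the target.

    Counting shared terms is an assumed scoring rule chosen for illustration;
    it is not taken from the ARROWSMITH software.
    """
    scored = []
    for name, terms in candidate_terms.items():
        shared = sorted(target_terms & terms)
        if shared:
            scored.append((name, len(shared), shared))
    # Highest count first; ties are broken alphabetically for a stable order.
    scored.sort(key=lambda row: (-row[1], row[0]))
    return scored


# Terms found in the target (migraine) literature; "vascular tone" is invented.
migraine = {"spreading depression", "epilepsy", "vascular tone"}

# Hypothetical dietary substances and the terms found in their literatures.
dietary = {
    "magnesium": {"spreading depression", "epilepsy", "bone density"},
    "vitamin c": {"collagen", "vascular tone"},
    "fiber": {"digestion"},
}

ranking = rank_candidates(migraine, dietary)
for name, count, shared in ranking:
    print(name, count, shared)
```

With this toy input, magnesium lands at the top because it shares two intermediate terms with the migraine set, and candidates with nothing in common are dropped. The user still has to judge whether each shared term is meaningful.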
Despite the promise of Swanson's approach, information scientists and researchers have identified a number of shortcomings, both in the ARROWSMITH program and, more broadly, in literature-based discovery itself. ISI's Small, for example, voices a general doubt: "I have to say I'm skeptical, but it's not so farfetched that it's not worth some serious attention. I think everyone has some skepticism that you can find something new from what's already out there."
Of Swanson, Small remarks, "He's not doing anything in the laboratory. He's not collecting any new experimental results. And that seems to be the primary way discoveries occur, through the experimental method, testing hypotheses and so forth. So to think, say in the case of a disease, that the answer to a particular disease . . . is actually sitting out there somewhere hidden, is a little bit of a stretch. . . . What if the migraine headache was caused by something we haven't even looked at yet? In that case, no matter how hard the machine flails away at this, it's not going to find an answer."
On the other hand, allows Small, "In the cases where the explanation and the problem are already out there and just have to be put together, his method is a very, very reasonable one to try."
A more specific shortcoming of ARROWSMITH is that the current version of the software operates only on an article's title, not on the full text, the abstract, or even its medical subject headings (although Swanson has a version in development that does examine subject headings). Swanson defends this approach by noting that most titles in the biomedical literature are highly descriptive, and by pointing out that having the program examine additional fields would greatly increase processing time.
NOT JUST SCIENCE: Kenneth A. Cory used a Swanson approach to find a link between the poetry of Robert Frost and ancient Greek philosopher Carneades.
Another problem with ARROWSMITH is that it requires a significant investment in time and effort from its users, who must first conduct two detailed MEDLINE searches and download the results in a specific text format. "I think the most basic [limitation] is that this is really meant for the person who already knows how to do database searching," Swanson acknowledges. "It doesn't replace conventional searching, and the more the user knows about database searching, the better the user can exploit ARROWSMITH."
Then users must upload the results to the ARROWSMITH site for processing. "When the Internet slows down, the responsiveness of the system slows down," notes Swanson.
Smalheiser agrees that this can be a problem. "The trouble with the web site is that when you're downloading very large files of the sort that a literature would consist of--schizophrenia has thousands and thousands of titles--it takes a long time to do that over the Internet." And then, once ARROWSMITH produces its output--a process that takes from 30 minutes to several hours, depending on the size of the input files--a researcher who is highly knowledgeable in the subjects under investigation must take the time to interpret the results.
"Even though the computer is doing a lot, the user has to invest the time to study the output and use the output as a guide to the literature," Swanson notes. "It's not a magical solution in which the computer comes up with new ideas and new hypotheses. The role of the computer is to display information that stimulates the user to see new connections. So it depends on the sophistication and the abilities of the user."
Despite the technique's promise, literature-based discovery seems to have generated more interest within the information science community than among life scientists. Swanson points out that very few individuals have made use of ARROWSMITH despite its free availability to anyone in the world with an Internet connection. Smalheiser observes that the papers he's co-authored with Swanson reporting the associations they discovered--including associations between indomethacin and Alzheimer's disease and between estrogen and Alzheimer's disease--have been cited only for the results they produced. The innovative method itself has apparently not caught many eyes.
"I find it almost unbelievable that this isn't something that has captured everyone's attention," complains Gordon, who co-authored a paper (M.D. Gordon, R.K. Lindsay, Journal of the American Society for Information Science, 47:116-128, 1996) replicating and extending Swanson's original results on migraine. "It's both extremely exciting intellectually and extremely important in terms of the kind of results it potentially can produce. I don't understand why it hasn't been seized upon."
Smalheiser suggests one reason: "There is a certain amount of work that is involved, because when you're linking two literatures there's a significant number of terms--a B list--that you have to wade through. So you have to know what you're looking for and why to make sense of this, to quickly go through separating the wheat from the chaff."
This is much more difficult, he notes, than using one of the publicly available sequence databases to perform a simple DNA homology search. "When you do a DNA homology search, they will give you a list of the 100 most homologous genes. . . . Whereas if we generate a B list, the raw B list might be 600 terms. And then we manually edit it down to 150 terms. Then we scan through the titles quickly, knowing what kind of things we're looking for, but it still could be a stack of 20 pages."
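A rough Python sketch can show how a B list might be built from titles alone and then trimmed. It assumes that a B term is simply a word appearing in the titles of both literatures, which is a simplification made for illustration. The stoplist, the function names, and the sample titles are all hypothetical, and the `exclude` argument stands in for the manual editing Smalheiser describes.

```python
import re

# A tiny illustrative stoplist; a practical one would be far longer.
STOPWORDS = {"a", "an", "and", "in", "of", "on", "the", "with", "for", "to"}


def title_words(titles):
    """Collect the lowercase words appearing in a list of titles."""
    words = set()
    for title in titles:
        words.update(re.findall(r"[a-z0-9]+", title.lower()))
    return words


def b_list(titles_a, titles_c, exclude=frozenset()):
    """Return the sorted words shared by both title sets, minus stopwords
    and any terms the user has chosen to exclude by hand."""
    shared = title_words(titles_a) & title_words(titles_c)
    return sorted(shared - STOPWORDS - set(exclude))


# Invented titles standing in for two downloaded search results.
titles_a = [
    "Spreading depression and the migraine aura",
    "Epilepsy in patients with migraine",
]
titles_c = [
    "Magnesium inhibits spreading depression in the cortex",
    "Epilepsy in magnesium-deficient rats",
]

raw = b_list(titles_a, titles_c)
edited = b_list(titles_a, titles_c, exclude={"depression"})
print(raw)
print(edited)
```

Even in this toy case the raw list contains a word that means little on its own, which hints at why a list of several hundred real terms needs an informed reader to prune it.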
"It takes an expert to evaluate the different hypotheses," ISI's Small observes. "And how many medical experts are going to have the patience to do this kind of thing? I think this might be the crux of the issue, because a clinical practitioner is not going to go to MEDLINE and try to churn away at something like this."
And then there may be a psychological barrier preventing researchers from allowing a computer to generate hypotheses. But researchers need have no fear that they'll be supplanted by machines, says Gordon. After all, accountants weren't rendered obsolete by computerized spreadsheet programs.
The most detailed expression of Swanson's ideas on text-based informatics can be found in the paper: D.R. Swanson, N.R. Smalheiser, "An interactive system for finding complementary literatures: a stimulus to scientific discovery," Artificial Intelligence, 91:183-203, 1997. The full text of this article, as well as the gateway to the ARROWSMITH program, is on Swanson's web site: http://kiwi.uchicago.edu/
The bottom line, concludes Gordon, is that Swanson's work has provided a valuable set of tools. "Put them in the hands of someone who really understands medicine. They ought to be able to comb through a good body of literature in a much more systematic way than they could have otherwise. They can have brought to their attention concepts and articles that otherwise, because of their specialization, they very likely never would have considered. They could be much more efficient with this tool working for them, and ultimately they could probably be more effective, more creative."
Robert Finn, a freelance science writer based in Long Beach, Calif., can be reached online at finn@nasw.org.