Excitation Exploration:
Looking for Information Space with Excite

Recently, I wrote a paper dealing with what I called "information space." The idea is comparable to the "vector space model" tradition in information retrieval, but has some differences. My goal for this search is to identify papers similar to the paper I wrote. Thus, I am the audience for this review's topic.

The review might be useful to anyone who wants to use a Web search engine to find papers or other works "similar" to a particular paper or work. For example, you could take an abstract or outline for something you wrote, and see if anything similar is "out there."

The audience for Excite is general: it's intended to be useful to anyone searching for anything on the Internet. However, an article I saw in the June 1996 issue of Wired magazine indicated that Excite uses some advanced experimental retrieval methods that I was familiar with; in fact, they are similar to the methods my paper is about!

Excite has a "power user" page, which lets you specify which terms are required ("MUST OCCUR") and which are desired ("MAY OCCUR"). Essentially, the MUST terms work like a Boolean AND: a document has to contain every one of them. So, I first tried entering just the title of my paper, with "information space" as MUST and "building" as MAY:

building information space

The results were terrible. All I got were pages about people constructing new buildings where the space was very nice and lots of information would be stored. In other words, press releases and other publicity material, announcements, and descriptions.
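To illustrate why a query like this can pull in press releases, here is a minimal sketch of MUST/MAY matching. It is not Excite's actual method, which I don't know in detail; the function `must_may_search`, the single-word treatment of "information space," and the three toy documents are all assumptions made up for the example.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def must_may_search(docs, must, may):
    """Keep documents containing every MUST term; rank them by MAY-term hits.

    docs: dict mapping a (hypothetical) document name to its text.
    must, may: lists of single-word terms.
    """
    results = []
    for name, text in docs.items():
        words = set(tokenize(text))
        # Boolean AND over the MUST terms: one missing term rejects the document.
        if not all(term in words for term in must):
            continue
        # MAY terms are optional and only affect the ordering.
        score = sum(1 for term in may if term in words)
        results.append((score, name))
    # Highest score first; ties broken alphabetically by name.
    results.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in results]

# Invented toy documents, for illustration only.
docs = {
    "press_release": "Our new building has plenty of space for information storage.",
    "ir_paper": "We construct a metric information space for retrieval.",
    "unrelated": "A recipe for bread.",
}
ranked = must_may_search(docs, must=["information", "space"], may=["building"])
print(ranked)
```

In this toy setup the press release satisfies both MUST words and also picks up the MAY word "building," so it outranks the retrieval paper, which is roughly the behavior I saw.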

So, instead, I tried a "query-by-example" model. By this I mean I tried to give enough text so that "similar" documents would come up in Excite. I copied the whole abstract from my article into the MAY box:

A set of procedures for the creation of metric multidimensional information space for information retrieval (IR) are presented. The rationale for spatial approaches to the representation of information is discussed, and a comparison is made between various representation methods for IR. The uses of information space for different types of IR systems are introduced, along with means for assessing information space.
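In vector space terms, query-by-example amounts to treating the pasted text as one more document and ranking the collection by similarity to it. The following is a minimal sketch under that assumption, using raw term counts and cosine similarity; it is not a description of what Excite does internally, and the names `term_vector`, `cosine`, and `rank_by_example` and the toy documents are hypothetical.

```python
import math
import re
from collections import Counter

def term_vector(text):
    """Represent a text as a bag of lowercase words with raw counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine of the angle between two term-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank_by_example(example, docs):
    """Return (score, name) pairs, most similar to the example text first."""
    query = term_vector(example)
    scored = [(cosine(query, term_vector(text)), name) for name, text in docs.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored

# Invented toy documents, for illustration only.
docs = {
    "spatial": "a multidimensional space for representing information in retrieval systems",
    "math": "multidimensional analysis for engineers",
    "cooking": "bread recipes and baking tips",
}
example = "metric multidimensional information space for information retrieval"
ranking = rank_by_example(example, docs)
for score, name in ranking:
    print(f"{score:.3f}  {name}")
```

Note that this sketch does no stop-word removal or term weighting, so a common word like "for" counts as much as "multidimensional." That alone is enough to give a purely mathematical document a nonzero score against a retrieval abstract.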

The first few documents in the list (I had Excite retrieve the top 50 with brief summaries) were disappointing. But further down were two documents that were very good matches. One was a link to an abstract of a Swiss fellow's paper dealing with a very similar topic to mine [1]. I followed this link and discovered someone working in this area whom I hadn't known about before.

I used this document for "relevance feedback," via the option labeled "Click here to perform a search for documents like this one," but the results were disappointing: the search returned a whole bunch of other material by the same author or his research group, and nothing else. It made me wonder whether the domain name was considered in doing "similar" searches.

The second useful link was about multidimensional analysis for engineers [2]. This was interesting, because although I mentioned "multidimensional" in my query, I didn't use any other terms I thought were really related to analysis or mathematical techniques. When I tried this for feedback, all I got was more technical mathematical documents, which really didn't help.

In the end, I found a link to one researcher and his group that was relevant and interesting. I also found a number of links to mathematical treatments of a topic tangentially related to my paper. This was not great, but wasn't terrible. The main thing that this search demonstrated was that the "query-by-example" concept for Excite that I read about in Wired doesn't really work that well; or at least it didn't when I entered my whole abstract. Perhaps using an entire document (not just an abstract) would work better.

References

  1. The Swiss fellow: http://www.ifi.unizh.ch/groups/hess/hess/swan21.html

  2. The multidimensional analysis link: http://www.georgehart.com/ Interestingly, only one such document showed up on this list. Perhaps Excite deliberately varies its responses?
