<?xml version="1.0"?>
<!DOCTYPE Paper SYSTEM "XML2000.dtd">

<Paper track="society">
<Title>Will XML and Information Retrieval make Society Transparent?
</Title>
<AuthorInfo>
<Name>
<First>Gregory
</First>
<Middle>B.
</Middle>
<Last>Newby
</Last>
</Name>
<JobTitle>Professor
</JobTitle>
<Affiliation>University of North Carolina at Chapel Hill
</Affiliation>
<Address>
<Street>CB 3360 Manning Hall
</Street>
<City>Chapel Hill
</City>
<State>NC
</State>
<Country>USA
</Country>
<ZipCode>27599-3360
</ZipCode>
<Email>gbnewby@ils.unc.edu
</Email>
<WebSite>ils.unc.edu/gbnewby
</WebSite>
</Address>
<Bio>
<Para>Gregory B. Newby received his undergraduate degree with majors in
Communication and Psychology, and his Master's degree in
Communication, at the State University of New York at Albany.  He
originally studied mass media and organizational communication, but
took a new focus after starting to make regular use of BITNET in the
early 1980s and later, the Internet.  Newby examined issues
surrounding new electronic communication media use during his PhD
studies at Syracuse University.  While at Syracuse, he developed a new
virtual reality laboratory and worked on development of a visual
interface to information space as part of his dissertation.  After his
PhD, Newby took a position as Assistant Professor in the Graduate
School of Library and Information Science at the University of
Illinois at Urbana-Champaign from 1991 to 1997.</Para>
<Para>At the University of Illinois, Prof. Newby worked extensively to
update the information technology curriculum and to integrate
education of technological skills for all students.  During this time,
he founded Prairienet, a public-access community computing system.  He
was also given responsibility to develop a new technology-based
distance education option for the MS degree at UIUC.  He has written
on information retrieval, human-computer interaction, electronic
publishing, uses and norms for the Internet, and new technologies for
business use.  Newby has taught courses dealing with Internet use
since 1988.</Para>
<Para>He is currently an Assistant Professor in the School of Information
and Library Science at the University of North Carolina in Chapel
Hill.  His research interests are focused on information retrieval,
information space, human-computer interaction, and impacts of new
electronic media.  His courses include "Information Security" and
"Distributed Systems and Administration."</Para>
</Bio>
</AuthorInfo>




<Sect>
<SubSect1>
<Title>Abstract</Title>
<Para>
This work presents an argument that information retrieval (IR) will be
augmented by increased use of XML.  This increased searchability of
various types of data will result in risks to personal privacy.  These
risks will come from at least three types of sources: large data
stores held by corporations or government; information brokers or data
miners, who are able to gain access to a variety of data sets; and the
Internet, where XML-tagged documents will be more searchable than
current HTML-tagged documents or plain text.
</Para>
</SubSect1>

<SubSect1>
<Title>Information Retrieval: Current Status
</Title>

<Para>
Information retrieval (IR) is a subdiscipline of information science, which
is an academic field related to computer science, library science and other
fields including communication, sociology and management.  The modern
history of information retrieval is usually traced either to
Vannevar Bush's seminal "As we may think" article (<a href="#bush">Bush, 1945</a>), or to the more recent proliferation of tape-based document matching
systems of the 1960s.
</Para>

<Para>
IR has mostly to do with text retrieval, although other targets such
as sounds or images are also of interest.  The challenge of IR is that
<i>language is imprecise</i>.  This imprecision is a problem when
people express their information needs using language (where these
information needs might also be ill-formed, per the Anomalous State of
Knowledge hypothesis [<a href="#Belkin">Belkin et al., 1982</a>]).  The
imprecision is also a problem in the documents themselves: document authors
(or indexers, if the documents are described using fixed vocabulary
systems such as Library of Congress Subject Headings) can use a variety
of terms to express a concept.
</Para>

<Para>
In order to find those documents or document fragments that best 
satisfy a human information need, an IR system must somehow deal
with the ambiguity of human language, the difficulties of expressing
(ill-formed) information needs, and problems of imprecision in
how documents are written or described.  This is a very different
challenge than is found in <i>database systems</i>, in which
a match is sought between items that are usually assumed to be
unambiguous (such as date ranges or numeric values).  
</Para>

<Para>
Most IR research for the past two decades has focused on 
full text retrieval.  The idea is to create a searchable 
index of all words in a collection of documents, in order that
the best documents can be retrieved in response to a query.
Computer storage and the availability of entire documents
(as opposed to abstracts) has increased, but many IR systems
still treat all of a document's terms equally - that is, 
without attention to document structure.  (There are many
methods for looking at the relative importance of different
words, however.)
</Para>
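<Para>
As a minimal sketch of the searchable index described above (an
illustration written for this discussion, not the implementation of any
particular IR system), the following Python fragment builds an inverted
index over a small collection and answers a Boolean AND query.  The
function names and the three sample documents are assumptions made for
the example.  Note that every term is treated equally, regardless of
where in the document it occurs.
</Para>
<Para><pre><![CDATA[
```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each term to the set of document ids whose text contains it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every query term (Boolean AND)."""
    postings = [index.get(term, set()) for term in tokenize(query)]
    return set.intersection(*postings) if postings else set()


# Hypothetical three-document collection, for illustration only.
docs = {
    "d1": "XML will augment information retrieval",
    "d2": "Information brokers mine data sets",
    "d3": "Retrieval of XML documents",
}
index = build_inverted_index(docs)
print(sorted(search(index, "xml retrieval")))  # ['d1', 'd3']
```
]]></pre></Para>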

<Para>
For the past few years, public awareness of IR systems has grown
as a result of Web search engines' popularity.  Systems such
as Google, Lycos and Altavista have demonstrated the utility of
IR techniques for finding Web documents based on short queries.
These systems are tuned for retrieval of documents formatted
with HTML, and have made some effort to increase their performance
by looking at document structure.  Specifically, examination of
search engine results indicates that the &lt;title&gt; tag is given
special emphasis, so that documents with search terms in their
&lt;title&gt; are ranked more highly than documents with the search
terms elsewhere.
</Para>
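<Para>
The effect of such emphasis can be sketched as follows.  This is a
hedged illustration only: the weight of 3.0, the scoring function and
the two sample pages are assumptions, since the search engines named
above do not publish how they weight the &lt;title&gt; tag.
</Para>
<Para><pre><![CDATA[
```python
import re

TITLE_WEIGHT = 3.0  # assumed boost; actual engine weights are not public


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(doc, query, title_weight=TITLE_WEIGHT):
    """Count query-term matches, counting each title match title_weight times."""
    terms = set(tokenize(query))
    title_hits = sum(1 for t in tokenize(doc["title"]) if t in terms)
    body_hits = sum(1 for t in tokenize(doc["body"]) if t in terms)
    return title_weight * title_hits + body_hits


def rank(docs, query):
    """Return document ids ordered from highest to lowest score."""
    return sorted(docs, key=lambda d: score(docs[d], query), reverse=True)


# Hypothetical pages: "a" mentions the term twice in its body,
# "b" mentions it once, but in its title.
pages = {
    "a": {"title": "Gardening tips",
          "body": "Privacy hedges and privacy screens for gardens"},
    "b": {"title": "Privacy online", "body": "How to browse safely"},
}
print(rank(pages, "privacy"))  # ['b', 'a']
```
]]></pre></Para>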

<Para>
The challenge with HTML markup, though, is that it is often
used for layout rather than actual structure.  For example, tables
are more often used for layout than for the presentation of tabular
data.  Even various headings (&lt;h1&gt;, &lt;h2&gt;, etc.) are
not necessarily indicators of major sections or divisions in the work.
Furthermore, HTML documents can cover a wide range of topics, so
decisions about which tags are related to what structure can seldom
if ever be made unambiguously (for example, an unordered list might
be used for references in a scholarly article, but for ingredients in
a recipe -- these are quite different structural elements, for very
different types of documents).
</Para>

</SubSect1>

<SubSect1>
<Title>Information Retrieval: How XML will Help
</Title>

<Para>
What XML offers over HTML and unformatted text is an opportunity
to treat different structural elements differently for IR.  Many
search engines have already demonstrated some of this potential
with HTML, by treating the &lt;title&gt; tag specially so that
search terms are more likely to be matched in a document title than
in the body.
</Para>

<Para>
To the extent that standardized or similar DTDs are used for
XML documents dealing with similar content, IR systems will be
able to act more like database systems, bypassing some of the
ambiguity of language.  An example might be the retrieval of
documents based on their author or other metadata characteristics.
Although author/title searches are common in online library catalog
systems, they are seldom effective in Web searches except for
highly unusual names.  Consider the current author's name:  Newby
is not an unusual last name, but more importantly it is also used
as an indicator of a new network user.  There is little in 
HTML markup (except perhaps for the &lt;meta&gt; tag) that helps to
disambiguate these two uses.  An XML &lt;author&gt; tag is a clearly
superior markup method.
</Para>
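<Para>
The difference can be illustrated with a short sketch using Python's
standard xml.etree.ElementTree module.  The two documents and their
element names (including &lt;author&gt;) are invented for the example
and do not follow any particular DTD.  A full-text match on "newby"
returns both documents, while a lookup restricted to the
&lt;author&gt; element returns only the one written by that author.
</Para>
<Para><pre><![CDATA[
```python
import xml.etree.ElementTree as ET

# Two hypothetical XML documents; element names are invented for illustration.
DOCS = {
    "paper.xml": "<article><author>Gregory B. Newby</author>"
                 "<body>XML and information retrieval.</body></article>",
    "forum.xml": "<post><author>J. Smith</author>"
                 "<body>I am a newby on this network, please help.</body></post>",
}


def full_text_matches(docs, word):
    """Ids of documents that mention the word anywhere in their text."""
    word = word.lower()
    return sorted(
        doc_id for doc_id, source in docs.items()
        if word in "".join(ET.fromstring(source).itertext()).lower()
    )


def author_matches(docs, word):
    """Ids of documents whose <author> element mentions the word."""
    word = word.lower()
    hits = []
    for doc_id, source in docs.items():
        root = ET.fromstring(source)
        if any(word in (a.text or "").lower() for a in root.iter("author")):
            hits.append(doc_id)
    return sorted(hits)


print(full_text_matches(DOCS, "newby"))  # both documents mention the word
print(author_matches(DOCS, "newby"))     # only the paper lists it as an author
```
]]></pre></Para>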

<Para>
The most important benefit of XML for IR, though, is yet to be clearly
seen.  The author anticipates that content-specific structural
markup will enable increased IR performance for all purposes.  There
is already evidence for the utility of discourse-level linguistic
information in IR (<a href="#Liddy">Liddy, 1997</a>), as well as
for different sub-languages that are indicative of different
disciplines (<a href="#Haas">Haas, 1997</a>).  It is reasonable
to assume that many of the most common IR techniques will be more
effective when document structures are able to be handled appropriately
by IR systems.
</Para>

<Para>
Such an advance for IR will require moving away from the common
"bag of words" approach that has been favored by IR researchers since
the days of online abstracts and complete documents.  Whereas a
current IR system might have only one inverted index of terms for
a collection (in which the list of documents containing a particular
term may be quickly found), new IR systems might require multiple
indices for different DTDs or document types, and for different
tags or structural elements within the documents.
</Para>
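<Para>
One way such multiple indices might look is sketched below, under
stated assumptions: one inverted index is kept per element name, so
that a term can be looked up within a single structural element.  The
sample documents and element names are hypothetical, and the sketch
indexes only the text that directly follows each opening tag, which is
sufficient for the flat documents shown.
</Para>
<Para><pre><![CDATA[
```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_element_indexes(docs):
    """Build one inverted index per element name: tag -> term -> doc ids."""
    indexes = defaultdict(lambda: defaultdict(set))
    for doc_id, source in docs.items():
        for element in ET.fromstring(source).iter():
            # element.text is the text directly after the opening tag.
            for term in tokenize(element.text or ""):
                indexes[element.tag][term].add(doc_id)
    return indexes


def search_in(indexes, tag, term):
    """Look a term up only within the index for one element name."""
    return sorted(indexes.get(tag, {}).get(term.lower(), set()))


# Hypothetical documents of two different types.
collection = {
    "r1": "<recipe><title>Apple pie</title>"
          "<ingredient>apple</ingredient><ingredient>flour</ingredient></recipe>",
    "p1": "<paper><title>Flour milling history</title>"
          "<abstract>Apple orchards are not discussed.</abstract></paper>",
}
indexes = build_element_indexes(collection)
print(search_in(indexes, "ingredient", "flour"))  # ['r1']
print(search_in(indexes, "title", "flour"))       # ['p1']
```
]]></pre></Para>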

<Para>
Better IR might also require some additional specification by the
human information seekers.  For example, a searcher might
need to specify the type of document(s) she is seeking, or provide
hints as to the tags or document substructures that are most likely to
contain the information she is seeking.  Although there is strong
evidence that Web searchers, especially, are lazy (<a
href="#Silver">Silverstein &amp; Henzinger, 1999</a>), we have also
seen evidence of search engines such as Northern Light that are able
to solicit such user feedback relatively painlessly and with good
benefit.
</Para>

<Para>
Every year, the Text REtrieval Conference (TREC, see <a
href="#Voorhees">Voorhees and Harman, 2000</a>) offers IR researchers
the opportunity to demonstrate their new techniques for retrieval,
in a head-to-head comparison against the leading systems and methods.
Due to a shortage of medium or large test collections of XML
documents, we have not yet seen specific attention to XML.  However,
it is likely that within the next year or two there will be some
XML-specific activity in TREC, as well as some increased activity
for XML searching by the Web search engines.  We have also seen
considerable commercial sector activity in adding XML capability
to database systems.  All of these developments in the near future 
will provide opportunities for developing IR systems that capitalize
on XML.
</Para>

<Para>
To summarize, trends and research in IR, database systems and
search engines indicate a strong probability that XML documents will
enable a leap in IR performance:
</Para>

<Para>
<ul>
<li>XML will enable unambiguous database-style lookup for some
items such as author name, key words, dates and other items.  This
is not generally feasible with HTML or plain text documents.</li>
<li>XML will provide means for accessing subdocuments that are 
sensitive to context, sublanguage, genre, etc.  This will
enable segmentation of document collections based on type, and searching
to focus on particular tags or structural elements.</li>
<li>XML search engines will develop that change searchers' habits
and expectations.  In order to gain improved precision in IR results,
searchers will be willing to provide additional details on their
information need to make better use of XML tagged content.</li>
</ul>
</Para>

<Para>
We might reasonably expect that XML will also be a solution to
the problem of scaling: as the number of interesting documents
has grown, our techniques for effective IR have not kept up
(<a href="#Newby">Newby, 2000</a>).  One of the most promising
methods for dealing with larger collections of documents is to
be able to segment them based on content type or structural
elements.  While this is impractical with HTML or plain text
due to the inherent ambiguities of hypertext markup and language,
the use of domain-specific DTDs with XML makes segmenting collections
feasible.
</Para>
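<Para>
A rough sketch of such segmentation follows.  It groups documents by
the name of their root element, which is used here as a stand-in for
the DTD or document type; that proxy, the function name and the sample
collection are assumptions made for illustration.  Each resulting
segment could then be indexed and searched separately.
</Para>
<Para><pre><![CDATA[
```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def segment_by_root(docs):
    """Group document ids by the name of their root element."""
    segments = defaultdict(list)
    for doc_id, source in docs.items():
        segments[ET.fromstring(source).tag].append(doc_id)
    return dict(segments)


# Hypothetical mixed collection of recipes and papers.
collection = {
    "a.xml": "<recipe><title>Apple pie</title></recipe>",
    "b.xml": "<paper><title>XML and IR</title></paper>",
    "c.xml": "<recipe><title>Bread</title></recipe>",
}
segments = segment_by_root(collection)
print(segments["recipe"])  # ['a.xml', 'c.xml']
print(segments["paper"])   # ['b.xml']
```
]]></pre></Para>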

<Para>
The remainder of this paper addresses the risks to society and
to individuals as a result of improved IR via XML.  Although it
is impossible to predict the exact IR techniques and societal
risks involved, there are several classes of risk that seem
inevitable.
</Para>

</SubSect1>


<SubSect1>
<Title>Risks by Data Accumulation 
</Title>

<Para>
Below, we will turn to issues of publicly available information, such as
that found on the Internet today.  In this section, we will consider the
role of powerful entities in accumulating information about individuals
and groups, and how better IR through XML might be harmful to privacy.
</Para>

<Para>
The U.S. has historically fought most attempts to create
a national citizen identification number and vast centralized database.
Nevertheless, these things do exist.  The Social Security Number (SSN)
is used for identification purposes, often in direct opposition to
laws surrounding the SSN's use.  Financial transactions such
as loans, mortgages, credit card applications, and so forth are
especially subject to indexing by SSN.  Some of the largest centralized
databases are the Internal Revenue Service's tax systems and the private
sector's credit worthiness and reporting systems.
</Para>

<Para>
Access to credit reporting systems, the IRS databases, and state-run
systems such as driver's license bureaus and associated records is
limited mostly to companies or government organizations.  The risk of
easier IR is that additional data will accumulate from other sources, and
be more easily integrated.
</Para>

<Para>
Consider records from health, employment, education and civic activities.
Without consistent markup, there is no good way to integrate data
from these types of records with each other, or with the larger 
national systems described above.  Despite the existence of large-scale
organizations such as the <a href="http://www.mib.com">Medical
Information Bureau</a> that track virtually all medical conditions
and procedures paid for by insurance companies, the MIB and other 
organizations do not have an easy way of incorporating their data with
other data sources.
</Para>

<Para>
The drive to integrate data from multiple sources will come from
many sources, including legislation, law enforcement and other
public-sector areas.  The biggest drive, however, is likely to come
from the private sector.  Consider that an insurance company or
bank would like access to more than an individual's credit records
or health history: they would benefit from driver's records, taxpayer
information, employment history, etc.  By merging data from multiple
sources, a much more detailed profile can be achieved.
</Para>

<Para>
Such a profile could be used for marketing, for making decisions about
pricing or the availability of loans or insurance coverage, or for
resale to potential employers.  The tools of IR, along with the
emerging tools for XML (such as query languages, schema and
interchange formats) will facilitate this type of information sharing.
</Para>

<Para>
Although IR is not a panacea for integrating disparate databases
marked up with XML, various IR techniques have been successful at
resolving the ambiguities of language and information need.  When
combined with strong data typing, shared XML schema, and the 
imperative for competitive advantage in the marketplace, we can
anticipate great advances in how information may be aggregated
for many purposes.  
</Para>
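<Para>
To make the aggregation risk concrete, the following sketch merges
records from two sources that share an identifying element.  It is an
illustration under stated assumptions, not a description of any real
system: the element names, the shared "id" key and the fictional
records are all invented for the example.
</Para>
<Para><pre><![CDATA[
```python
import xml.etree.ElementTree as ET

# Fictional record sets from two hypothetical sources that share an <id> element.
CREDIT = ("<records>"
          "<record><id>p-001</id><score>640</score></record>"
          "</records>")
DRIVING = ("<records>"
           "<record><id>p-001</id><violations>2</violations></record>"
           "<record><id>p-002</id><violations>0</violations></record>"
           "</records>")


def merge_profiles(*sources, key="id"):
    """Combine the fields of all records that share the same key value."""
    profiles = {}
    for source in sources:
        for record in ET.fromstring(source).iter("record"):
            person = record.findtext(key)
            if person is None:
                continue  # skip records that lack the shared identifier
            fields = {child.tag: child.text
                      for child in record if child.tag != key}
            profiles.setdefault(person, {}).update(fields)
    return profiles


profiles = merge_profiles(CREDIT, DRIVING)
print(profiles["p-001"])  # {'score': '640', 'violations': '2'}
```
]]></pre></Para>
<Para>
The merge is only this simple because both sources use the same element
for the identifier; without consistent markup, matching records across
sources would require far more effort.
</Para>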

<Para>
There are many specific risks associated with data accumulation
and aggregation by government and private entities:
</Para>

<Para>
<ul>
<li>Individuals will not have direct access to records about
them.  (In the U.S., the Fair Credit Reporting Act provides some
recourse for credit information, but no such mechanism exists for
discovering most other types of information about us.)</li>
<li>Records may be incomplete or erroneous.  Credit and health
information has a high error rate of at least a few percentage
points (consumer groups find higher error rates than the credit
bureaus and insurance agencies will admit to, making the actual
error rate elusive).  Individuals are stymied at correcting data
about them, even when such data are available.</li>
<li>Transformations or decisions based on aggregated data may
be held as proprietary.  Credit decisions are based on a credit
worthiness score assigned by credit bureaus or banks.  Individuals
do not have access to their scores, though, only the raw data
used to make the score.  With other types of accumulated data, we
can anticipate an increased role for automated summarization or
scoring to satisfy the needs of private industry or government.
Such scoring or summarization could have an extremely important role
in many areas affecting individuals' lives, but not be accessible
to their scrutiny.</li>
</ul>
</Para>

<Para>
With standard IR on text collections, we can identify documents or
subdocuments relevant to a particular information need.  We can
anticipate a time when text documents are part of a larger collection
of XML-tagged data -- something we don't have now, but are likely to
have in the future.  This advance will enable easier integration of
accumulated datasets with (ambiguous) text.  For example, an employer
might combine an IR query on performance evaluations with a dataset of
financial and health information to determine an employee's eligibility
for a promotion.
</Para>

<Para>
Standards for electronic data interchange (EDI) evolved as a result
of businesses wanting to streamline operations, share information and
thereby save money.  Many current standardization efforts and 
products with XML are directed towards the same purpose: making
exchange of marked up data simpler, in order to maximize the utility
of the shared data.  
</Para>

<Para>
We cannot anticipate all of the technical, social and legal
impediments that might block some types of data incorporation into
extremely large datasets from heterogeneous sources.  But it would be
naive to think that the development of such datasets is anything but
assured.  When information retrieval techniques are integrated with
database techniques, for large datasets, we can anticipate greatly
enhanced abilities to retrieve data about individuals.
</Para>
</SubSect1>

<SubSect1>
<Title>Risks by Data Acquisition
</Title>

<Para>
Information brokering is a growing business.  Many skilled 
individuals and specialized businesses are able to mine various
data sources to find information about individuals.  Even when
information is not centralized as described above, there is
a considerable risk to personal privacy as a result of XML
plus IR.
</Para>

<Para>
There are many legitimate purposes for data mining, such as
tracking alimony or child support non-payers, finding missing
persons, or seeking ideal life partners.  But potential for
abuse is serious, and examples are common in the pre-XML world:
mistaken identities, erroneous data, and witch hunts.
</Para>

<Para>
As with large-scale national databases, an individual's access to
information about him is often limited.  More importantly, access to
information about others may be restricted.  Yet, potentially
sensitive data such as driving records, bank records, employment
records, etc. may be revealed to a licensed information broker,
government agencies, or a clever social engineer.
</Para>

<Para>
Despite the shortage of regulation for large data sharing enterprises,
there is some regulation (at least for credit and driving records).
We can't anticipate such regulation for small-scale information
brokerage operations, though -- even if regulated, the nature of the
business is to circumvent blocks to access.
</Para>

<Para>
It may be that the most visible risk to individual privacy will be for
public figures.  The opportunity for muckraking will be greater with
better standards for data interchange and formatting, and further
facilitated by IR.
</Para>
</SubSect1>


<SubSect1>
<Title>Risks by Seeking
</Title>

<Para>
Typing a name, a concept or a few words into a Web search engine will
result in documents with the words typed.  The challenge of IR, as
mentioned above, is to address the ambiguity in language so that
the information need behind the words typed is matched.  Although we
do not yet know how Web search engines will meet the new and different
demands of XML (notably the wide variety of DTDs), we can be sure
they will.
</Para>

<Para>
The two previous sections addressed the risks to individual privacy
that will result when IR techniques are applied to XML data held by
private industry and government.  In this section, we will turn to the
large and varied dataset of the public Internet.  For this section, we
will assume that XML will largely replace HTML as the language of
choice on the public Internet, although whether this will occur is a
matter for conjecture.
</Para>

<Para>
What XML offers the Web is content-based structure, as opposed to
the current layout-based structure of HTML.  This will be tremendously
useful for IR (as exemplified in Web search engines), in that it will
reduce the ambiguity in language by enabling context-sensitive queries.
</Para>

<Para>
Currently, it is difficult to find all Web-based materials written
by an individual.  It is also hard to find all items about an
individual, an entity or an event.  XML will facilitate this type of
retrieval.
</Para>

<Para>
The main immediate risk to individual privacy is the difficulty
of anonymity.  In the wider view, we must consider the risk of 
full disclosure of all public events.  For anonymity, the technical
facilities for anonymity (such as email anonymizers and network
path obfuscators) are in opposition to efforts of law enforcement
and policy makers to provide for accountability.  At the same
time, the push for electronic commerce has been accompanied by 
strong ties between an individual's online presence and her
credit card.
</Para>

<Para>
For full disclosure, consider the growing body of publicly
available online data.  Newspapers and local government are
scrambling to get their content online and searchable.  Colleges
and other agencies have similar efforts.  At the same time,
individuals are putting an ever-larger proportion of their lives
online.  The risk is clear: XML will provide a much easier way
for IR systems to link data from disparate sources to a single
individual, entity or event.
</Para>

</SubSect1>


<SubSect1>
<Title>Towards Transparency
</Title>

<Para>
In <i>The Transparent Society</i> (<a href="#Brin">1998</a>), David
Brin envisions a world in which video cameras and other surveillance
devices are rampant.  Indeed, he argues, such surveillance already
occurs in many locations, from gas stations to school playgrounds.
What Brin suggests is that we put the surveillance devices to work
by making their data streams accessible by anyone, anywhere -- instead
of just to the privileged few engaged in law enforcement or other
monitoring.
</Para>

<Para>
What Brin suggests is happening already with Internet-based activities.
It's ever harder for people to hide their identities when chatting,
sending email or posting Web pages.  As the 2000 legal action by Metallica
against their fans using Napster demonstrated, even an assurance of
confidentiality is not a match for a determined third party.  
</Para>

<Para>
Transparency on the Internet is aided particularly by information retrieval.
Through IR, it is quick and easy to find network news messages, mailing
list archives or other publicly posted discussion items through centralized
databases.  This ease is based on the relative stability of email addresses
for individuals.  
</Para>

<Para>
What we can expect, when XML becomes the mode of choice for posting
online data, is that IR will work even better.  It will be reasonable to
start with a document, and find similar documents or documents by similar
authors (many IR techniques are quite successful at finding documents
"similar to" a target document).  We can envision and start to plan for
a time when all aspects of our online lives are searchable, except
those we work hard to keep private or anonymous.  
</Para>
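<Para>
One common family of "similar to" techniques compares term vectors by
cosine similarity.  The sketch below is a minimal, unweighted version
written for illustration; production systems typically add term
weighting and other refinements, and the function names and sample
texts are assumptions.
</Para>
<Para><pre><![CDATA[
```python
import math
import re
from collections import Counter


def term_vector(text):
    """Represent a text as a bag of lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def most_similar(target, docs):
    """Return the id of the document most similar to the target text."""
    target_vector = term_vector(target)
    return max(docs, key=lambda d: cosine(target_vector, term_vector(docs[d])))


# Hypothetical collection and target document.
postings = {
    "x": "xml retrieval and personal privacy",
    "y": "apple pie recipe with flour",
}
print(most_similar("privacy risks of xml", postings))  # x
```
]]></pre></Para>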

<Para>
What of the hidden data sources?  This is where Brin's vision could 
have an impact, but such an impact does not seem likely.  Imagine if
all health, government, financial and other data sources were completely
open to public search and scrutiny.  As Brin points out, this would
change the nature of privacy: when no-one's data are private, we will
discover that we all have embarrassing events in our past, or health
problems, or financial difficulties.  
</Para>

<Para>
Apart from the problem of complete transparency being counter to
many people's expectations, and beyond the problem of prejudice that 
we might imagine would still exist in a transparent society, it
seems the main challenge to such a change will be the self-interests
of the parties that hold the data currently.  As Simson Garfinkel
(<a href="#Garfinkle">2000</a>) points out in his dystopian
look at who possesses data and what they are doing with it, data
are big business.  From financial institutions to government, the
data that agencies or businesses possess, manage, repackage, mine
or resell have very high value.
</Para>

<Para>
Nevertheless, as the above sections have tried to argue, XML plus
IR will result in real advantages in information seeking for those
with access to data.  While individuals might only have access to
publicly available data or, in some cases, data concerning themselves,
we can anticipate a continued growth in data brokering and mining
activities.  
</Para>
</SubSect1>




<SubSect1>
<Title>Conclusion</Title>

<Para>
The theme of this paper has been that XML will make data easier to find,
notably data concerning individuals.  There seems little doubt that
a main selling point of XML is its ability to apply standardized
DTDs to data, or to merge data from multiple sources by matching
DTDs or other content-based tags.
</Para>

<Para>
Based on the history and status of information retrieval, it is the
author's view that IR will be greatly enhanced for XML-tagged data
versus comparable plain text or HTML data.  Web search engines, 
which in many ways are the most advanced IR systems today, will need
to change to take advantage of XML data.  End users may be solicited for
additional input to aid in determining appropriate tags or other 
content-specific elements when there is a clear advantage in doing so,
as search engines such as Northern Light have already demonstrated.
</Para>

<Para>
Personal privacy threats will result from the enhanced searchability
of XML data with new IR techniques.  These threats are not new, but will
grow as a result of enhanced searchability.  Threats include those presented
by large organizations that accumulate and trade data, by various 
information brokers, miners and intermediaries, and by publicly searchable
data on the Internet.
</Para>

<Para>
We cannot predict with confidence the specific challenges to personal
privacy that XML plus IR will introduce, but there seems to be little
reason to doubt that the challenges will come.  Although some authors, such
as Garfinkel, advocate strong new privacy laws to protect against possible
misuse of data, such laws do not appear to be forthcoming.  Instead, 
the best prophylaxis against privacy risks in the near term will be
using existing means to stem the flow of personal data.  These include:
</Para>

<Para>
<ul>
<li>Registering with services offered by the Direct Marketing
Association (<a href="http://www.the-dma.org">DMA</a>) to decrease
exchange of your personal data for direct mail and telemarketing,
as well as spam.</li>
<li>Utilizing anonymizers when possible to mask yourself and your
data on the Internet.</li>
<li>Instructing all agencies with your sensitive personal data, and their
data sharing businesses, to limit exchange of personal data.  For example,
mortgage holders will usually heed your request to not forward any data
about you to any other party (except as required by law).  Credit reporting
agencies and the Medical Information Bureau (MIB) offer opt-out methods for
preventing non-authorized use of your credit and health data.</li>
<li>Reading privacy policies before divulging personal data on the
Internet.  Conglomeration in online media (notably the Time-Warner
acquisition by AOL) offers unprecedented means for merging data about
you from disparate sources.</li>
</ul>
</Para>

<Para>
In addition, technologists need to work to inform policy makers of
the increased risks to personal privacy that XML plus IR will
generate.  By enabling additional personal control over data
about ourselves, we may be able to shape a future where the benefits
of XML plus IR outweigh the dangers.
</Para>



</SubSect1>



<SubSect1>
<Title>References</Title>
<Para>
<ul>
<li><a name="Belkin"></a>Belkin, N.J., Oddy, R.N., &amp; Brooks, H.M.
1982.  "ASK for information retrieval: Parts I &amp; II."
J. Documentation 38:61-71, 145-164.</li>
<li><a name="Bush"></a>Bush, Vannevar.  1945.  "As we may think."  Atlantic
Monthly 176: 101-108.</li>
<li><a name="Brin"></a>Brin, David.  1998.  The Transparent Society:
Will Technology Force Us to Choose Between Privacy and Freedom?
Reading, Massachusetts: Perseus Books.</li>
<li><a name="Garfinkle"></a>Garfinkel, Simson. 2000. Database Nation:
The Death of Privacy in the 21st Century.  Sebastopol, California:
O'Reilly &amp; Associates.</li>
<li><a name="Haas"></a>Haas, Stephanie W.  1997.  "Disciplinary
variation in automatic sublanguage term identification."  Journal of
the American Society for Information Science, 48(1): 67-79.</li>
<li><a name="Liddy"></a>Liddy, E.D.  1997.  "Natural Language
Processing for Information Retrieval and Knowledge Discovery."
Proceedings of the 34th Annual Data Processing Clinic.  University of
Illinois at Urbana-Champaign.</li>
<li><a name="Newby"></a>Newby, Gregory B.  2000.  "The Science of
Large-Scale Information Retrieval." Internet Archive 2000
Colloquium. San Francisco, March 8-9.</li>
<li><a name="Silver"></a>Silverstein, C., &amp; Henzinger, M.  1999.
"Analysis of a Very Large Web Search Engine Query Log."  SIGIR Forum
31(1).</li>
<li><a name="Voorhees"></a>Voorhees, Ellen &amp; Harman, Donna.
(Eds.).  2000.  TREC-9 Proceedings.  Gaithersburg, Maryland: National
Institute of Standards and Technology.</li>
</ul>

</Para></SubSect1>


</Sect>
</Paper>
