| |  | XIE, Hong | Shifts in information-seeking strategies in information retrieval in the digital age. A planned-situational model | 2007 |
| |  | Marchionini, Gary | Exploratory search: from finding to understanding. Abstract: From the earliest days of computers, search has been a fundamental application that has driven research and development. For example, a paper published in the inaugural year of the IBM journal 36 years ago outlined challenges of text retrieval that continue to the present [4]. Today's data storage and retrieval applications range from database systems that manage the bulk of the world's structured data to Web search engines that provide access to petabytes of text and multimedia data. As computers have become consumer products and the Internet has become a mass medium, searching the Web has become a daily activity for everyone from children to research scientists. | 2006 |
| |  | Han, H. | Two supervised learning approaches for name disambiguation in author citations. Abstract: Due to name abbreviations, identical names, name misspellings, and pseudonyms in publications or bibliographies (citations), an author may have multiple names and multiple authors may share the same name. Such name ambiguity affects the performance of document retrieval, Web search, database integration, and may cause improper attribution to authors. We investigate two supervised learning approaches to disambiguate authors in the citations. One approach uses the naive Bayes probability model, a generative model; the other uses support vector machines (SVMs) [V. Vapnik (1995)] and the vector space representation of citations, a discriminative model. Both approaches utilize three types of citation attributes: coauthor names, the title of the paper, and the title of the journal or proceedings. We illustrate these two approaches on two types of data, one collected from the Web, mainly publication lists from homepages, the other collected from the DBLP citation databases. | 2004 |
| |  | Wildemuth, Barbara M. | The effects of domain knowledge on search tactic formulation. Abstract: A search tactic is a set of search moves that are temporally and semantically related. The current study examined the tactics of medical students searching a factual database in microbiology. The students answered problems and searched the database on three occasions over a 9-month period. Their search moves were analyzed in terms of the changes in search terms used from one cycle to the next, using two different analysis methods. Common patterns were found in the students' search tactics; the most common approach was the specification of a concept, followed by the addition of one or more concepts, gradually narrowing the retrieved set before it was displayed. It was also found that the search tactics changed over time as the students' domain knowledge changed. These results have important implications for designers in developing systems that will support users' preferred ways of formulating searches. In addition, the research methods used (the coding scheme and the two data analysis methods: zero-order state transition matrices and maximal repeating patterns [MRP] analysis) are discussed in terms of their validity in future studies of search tactics. | 2004 |
| |  | Bollacker, K. D. | Discovering relevant scientific literature on the Web. Abstract: Scientific literature on the Web makes up a massive, noisy, disorganized database. Unlike large, single-source databases such as a corporate customer database, the Web database draws from many sources, each with its own organization. Also, owing to its diversity, most records in this database are irrelevant to an individual researcher. Furthermore, the database is constantly growing in content and changing in organization. All these characteristics make the Web a difficult domain for knowledge discovery. To quickly and easily gather useful knowledge from such a database, users need the help of an information filtering system that automatically extracts only relevant records as they appear in a stream of incoming records. To this end, we have developed CiteSeer. CiteSeer is an automatic generator of digital libraries of scientific literature. It uses sophisticated acquisition, parsing, and presentation methods to eliminate most of the manual effort of finding useful publications on the Web. | 2000 |
| |  | Montebello, M. | Information overload – an IR problem? Abstract: Information overload on the World Wide Web (WWW) is a well recognised problem. Research to subdue this problem and extract maximum benefit from the Internet is still in its infancy. With huge amounts of information connected to the Internet, efficient and effective discovery of resources and knowledge has become an imminent research issue. A vast array of network services is growing up around the Internet and a massive amount of information is added every day. Despite the potential benefits of existing indexing, retrieving and searching techniques in assisting users in the browsing process, little has been done to ensure that the information presented is of a high recall and precision standard. Therefore, search for specific information on this massive and exploding information resource base becomes highly critical. The author discusses the issues involved in resolving the information overload over the WWW and argues that this is solely an information retrieval problem. As a contribution to the field he proposes a general architecture to subdue information overload and describes how this architecture has been instantiated in a functional system he developed. | 1998 |
| |  | Schatz, Bruce R. | Information Retrieval in Digital Libraries: Bringing Search to the Net. Abstract: this article owes as much to Bush's fame at the time (he had been director of the Office of Scientific Research and Development, coordinating all U.S. technology efforts during the war) as to the actual article itself | 1997 |
| |  | Mizzaro, Stefano M. | A Cognitive Analysis of Information Retrieval. Abstract: The lack of a formal account is probably one of the most evident of the shortcomings of information retrieval: concepts like information, information need, and relevance are neither well understood nor formally defined. This paper sketches a cognitive framework that makes it possible to analyze these three central concepts of the information retrieval scenario. The framework consists of concepts such as cognitive agents acting in the world, knowledge states possessed by the cognitive agents, transitions among knowledge states, and inferences. On the basis of the framework, information is formally defined as a pair representing the difference between two knowledge states; this definition clarifies the distinction among data, knowledge, and information and allows the subjectiveness of information to be discussed. On this ground, the concept of information need is examined: it is defined, it is studied in the context of the interaction between an information retrieval system and a user, and the well-known classification into verificative, conscious topical, and muddled needs is analyzed. On the basis of the above definitions of information and information need, relevance is formally defined, and some critical features of this concept are discussed. | 1996 |
| |  | Sheth, B. | Evolving agents for personalized information filtering. Abstract: Describes how techniques from artificial life can be used to evolve a population of personalized information filtering agents. The technique of artificial evolution and the technique of learning from feedback are combined to develop a semi-automated information filtering system which dynamically adapts to the changing interests of the user. Results of a set of experiments are presented in which a small population of information filtering agents was evolved to make a personalized selection of news articles from the USENET newsgroups. The results show that the artificial evolution component of the system is responsible for improving the recall rate of the selected set of articles, while the learning-from-feedback component improves the precision rate. | 1993 |