2007
| |  | Telles, G. P. | Normalized compression distance for visual analysis of document collections. Abstract: In a world flooded by text from various sources, it is of strategic importance to find ways to map the information present in written documents into a form that helps users locate and associate important information within a particular text data set. Content-based maps can support extremely useful explorations of text data sets. This paper proposes and evaluates the use of Kolmogorov complexity approximations as a means to detect similarity between general textual documents, in order to support mapping and visualization techniques for corpus exploration. Computing this similarity measure requires no intermediate representation of a corpus (such as a vector representation) and therefore no pre-processing or parametrization steps. That makes it very attractive for a wider range of exploratory applications than conventional measures that need vector-based text representations. The visual layout used here is based on fast distance-based multidimensional projections. It is shown that the similarity measure and the resulting maps present very good precision and that the approach can be used successfully for visual analysis of automatically generated text maps. | 2007 |
| |  | Rodriguez, Marko A. | A practical ontology for the large-scale modeling of scholarly artifacts and their usage. Abstract: No abstract available. | 2007 |
| |  | | Collaborative structuring: organizing document repositories effectively and efficiently. Abstract: No abstract available. | 2007 |
| |  | Micarelli, Aless | Web Document Modeling. Abstract: A very common issue for adaptive Web-based systems is the modeling of documents. Such documents represent domain-specific information for a number of purposes. Application areas such as Information Search, Focused Crawling and Content Adaptation (among many others) benefit from several techniques and approaches to model documents effectively. For example, a document usually needs preliminary processing in order to obtain the relevant information in an effective and useful format, so that it can be processed automatically by the system. The objective of this chapter is to support other chapters by providing a basic overview of the most common and useful techniques and approaches related to document modeling. This chapter describes high-level techniques to model Web documents, such as the Vector Space Model, and a number of AI approaches, such as Semantic Networks, Neural Networks and Bayesian Networks. This chapter is not meant to act as a substitute for more comprehensive discussions of the topics presented. Rather, it provides a brief and informal introduction to the main concepts of document modeling, also focusing on the systems that are presented in the rest of the book as concrete examples of the related concepts. | 2007 |
| |  | Collins, Linn M. | Information visualization and large-scale repositories. Abstract: Purpose - To describe how information visualization can be used in the design of interface tools for large-scale repositories. Design/methodology/approach - One challenge for designers in the context of large-scale repositories is to create interface tools that help users find specific information of interest. In order to be most effective, these tools need to leverage the cognitive characteristics of the target users. At the Los Alamos National Laboratory, the authors' target users are scientists and engineers who can be characterized as higher-order, analytical thinkers. In this paper, the authors describe a visualization tool they have created for making the authors' large-scale digital object repositories more usable for them: SearchGraph, which facilitates data set analysis by displaying search results in the form of a two- or three-dimensional interactive scatter plot. Findings - Using SearchGraph, users can view a condensed, abstract visualization of search results. They can view the same dataset from multiple perspectives by manipulating several display, sort, and filter options. Doing so allows them to see different patterns in the dataset. For example, they can apply a logarithmic transformation in order to create more scatter in a dense cluster of data points, or they can apply filters in order to focus on a specific subset of data points. Originality/value - SearchGraph is a creative solution to the problem of how to design interface tools for large-scale repositories. It is particularly appropriate for the authors' target users, who are scientists and engineers. It extends the work of the first two authors on ActiveGraph, a read-write digital library visualization tool. | 2007 |
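The Telles entry approximates Kolmogorov complexity with a real compressor. A standard way to turn compressed sizes into a similarity measure is the normalized compression distance, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is compressed length in bytes. The sketch below is a minimal illustration under stated assumptions: the abstract does not name the compressor, so zlib is an assumption here, and the helper names `compressed_size` and `ncd` are hypothetical.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Approximate Kolmogorov complexity by zlib-compressed length (assumed compressor)."""
    return len(zlib.compress(data, 9))


def ncd(x: str, y: str) -> float:
    """Normalized compression distance between two texts (hypothetical helper).

    Values near 0 mean the texts share most of their information;
    values near 1 mean compressing them together saves little.
    """
    bx, by = x.encode("utf-8"), y.encode("utf-8")
    cx, cy = compressed_size(bx), compressed_size(by)
    cxy = compressed_size(bx + by)
    return (cxy - min(cx, cy)) / max(cx, cy)


if __name__ == "__main__":
    # Toy documents for illustration only; no pre-processing or vectorization is needed.
    doc_a = "the cluster hypothesis states that relevant documents tend to be similar " * 5
    doc_b = "the cluster hypothesis says that relevant documents tend to resemble each other " * 5
    doc_c = "thalidomide was evaluated for pancreatitis, hepatitis C and gastritis " * 5
    print(f"ncd(a, b) = {ncd(doc_a, doc_b):.3f}")
    print(f"ncd(a, c) = {ncd(doc_a, doc_c):.3f}")
```

A pairwise NCD matrix computed this way could then feed a distance-based projection, which is the role the abstract describes for the measure.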
2006
| |  | Jin, Wei | Knowledge Discovery across Documents through Concept Chain Queries. Abstract: This paper focuses on detecting links between two concepts across text documents (e.g., two persons). We interpret such a query as finding the most meaningful evidence trail across documents that connects these two concepts. Here we propose a fast and efficient algorithm to perform this task. It is based on the idea of hypothesis generation originated by Swanson, called "complementary structures in disjoint literatures" (CSD). We adapted the technique by (i) developing an alternative method of generating semantic profiles and (ii) extending the technique to generate concept chains. A counterterrorism corpus is used to evaluate the performance of this approach, and the results demonstrate the effectiveness of our algorithm. | 2006 |
| |  | Sahami, Mehran | A web-based kernel function for measuring the similarity of short text snippets. Abstract: No abstract available. | 2006 |
| |  | Plake, Conrad | ALIBABA: PubMed as a graph. Abstract: The biomedical literature contains a wealth of information on associations between many different types of objects, such as protein-protein interactions, gene-disease associations and subcellular locations of proteins. When searching such information using conventional search engines, e.g. PubMed, users see the data only one abstract at a time and hidden in natural language text. ALIBABA is an interactive tool for graphical summarization of search results. It parses the set of abstracts that fit a PubMed query and presents extracted information on biomedical objects and their relationships as a graphical network. ALIBABA extracts associations between cells, diseases, drugs, proteins, species and tissues. Several filter options allow for a more focused search. Thus, researchers can grasp complex networks described in various articles at a glance. Availability: http://alibaba.informatik.hu-berlin.de/ Contact: hakenberg@informatik.hu-berlin.de DOI: 10.1093/bioinformatics/btl408 | 2006 |
2003
| |  | Weeber, Marc | Generating Hypotheses by Discovering Implicit Associations in the Literature: A Case Report of a Search for New Potential Therapeutic Uses for Thalidomide. Abstract: The availability of scientific bibliographies through online databases provides a rich source of information for scientists to support their research. However, the risk of this pervasive availability is that an individual researcher may fail to find relevant information that is outside the direct scope of interest. Following Swanson’s ABC model of disjoint but complementary structures in the biomedical literature, we have developed a discovery support tool to systematically analyze the scientific literature in order to generate novel and plausible hypotheses. In this case report, we employ the system to find potentially new target diseases for the drug thalidomide. We find solid bibliographic evidence suggesting that thalidomide might be useful for treating acute pancreatitis, chronic hepatitis C, Helicobacter pylori-induced gastritis, and myasthenia gravis. However, experimental and clinical evaluation is needed to validate these hypotheses and to assess the trade-off between therapeutic benefits and toxicities. | 2003 |
1999
| |  | Chakrabarti, Soumen | Topic Distillation and Spectral Filtering. Abstract: This paper discusses topic distillation, an information retrieval problem that is emerging as a critical task for the WWW. Algorithms for this problem must distill a small number of high-quality documents addressing a broad topic from a large set of candidates. We give a review of the literature, and compare the problem with related tasks such as classification, clustering, and indexing. We then describe a general approach to topic distillation with applications to searching and partitioning, based on the algebraic properties of matrices derived from particular documents within the corpus. Our method, which we call spectral filtering, combines the use of terms, hyperlinks and anchor text to improve retrieval performance. We give results for broad-topic queries on the WWW, and also give some anecdotal results applying the same techniques to US Supreme Court law cases, US patents, and a set of Wall Street Journal newspaper articles. | 1999 |
| |  | Boley, Daniel | Document Categorization and Query Generation on the World Wide Web Using WebACE. Abstract: No abstract available. | 1999 |
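The Chakrabarti abstract describes ranking documents through the algebraic properties of matrices derived from the corpus, but does not spell out the spectral filtering algorithm itself. As a hedged illustration of that general family, the sketch below swaps in a related and well-known technique, Kleinberg's HITS: authority and hub scores are the principal eigenvectors of AᵀA and AAᵀ for a link matrix A, found here by power iteration. This is not the paper's method; the function name `hits` and the four-document graph are hypothetical.

```python
from math import sqrt


def hits(adjacency: list[list[int]], iterations: int = 50) -> tuple[list[float], list[float]]:
    """HITS-style power iteration (a related technique, not spectral filtering itself).

    adjacency[i][j] == 1 means document i links to document j.
    Returns (hub scores, authority scores), each normalized to unit length.
    """
    n = len(adjacency)
    hubs = [1.0] * n
    auths = [1.0] * n
    for _ in range(iterations):
        # Authority of j: sum of hub scores of documents linking to j (a = A^T h).
        auths = [sum(hubs[i] * adjacency[i][j] for i in range(n)) for j in range(n)]
        # Hub of i: sum of authority scores of documents i links to (h = A a).
        hubs = [sum(adjacency[i][j] * auths[j] for j in range(n)) for i in range(n)]
        # Normalize so the iteration converges to the principal eigenvectors.
        a_norm = sqrt(sum(x * x for x in auths)) or 1.0
        h_norm = sqrt(sum(x * x for x in hubs)) or 1.0
        auths = [x / a_norm for x in auths]
        hubs = [x / h_norm for x in hubs]
    return hubs, auths


if __name__ == "__main__":
    # Toy graph: documents 0, 1, 2 all link to 3; document 0 also links to 2.
    graph = [
        [0, 0, 1, 1],
        [0, 0, 0, 1],
        [0, 0, 0, 1],
        [0, 0, 0, 0],
    ]
    hub_scores, auth_scores = hits(graph)
    print("hubs:", [round(h, 3) for h in hub_scores])
    print("authorities:", [round(a, 3) for a in auth_scores])
```

In this toy graph, document 3 emerges as the strongest authority and document 0 as the strongest hub; the abstract's method additionally mixes in terms and anchor text, which this sketch ignores.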
1998
| |  | Han, Eui-Hong | WebACE: a Web agent for document categorization and exploration. Abstract: No abstract available. | 1998 |
| |  | Dumais, Susan | Inductive learning algorithms and representations for text categorization. Abstract: No abstract available. | 1998 |
1997
| |  | Allan, J. | Interactive Cluster Visualization for Information Retrieval. Abstract: This study investigates the ability of cluster visualization to help a user rapidly identify relevant documents. It provides added support for the truth of the Cluster Hypothesis on retrieved documents and shows that clustering of relevant documents is readily visible. The study then shows the visual effect of a technique similar to relevance feedback and shows how to enhance that effect to further help the user locate relevant material. A ranked list returned by a text search engine purports... | 1997 |
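The Cluster Hypothesis mentioned in the Allan entry says that relevant documents tend to be more similar to each other than to non-relevant ones. One simple way to check this on a result list is to compare average relevant-relevant similarity with average relevant-nonrelevant similarity. The sketch below assumes a bag-of-words vector space model (the model named in the Micarelli entry) with raw term frequencies and cosine similarity; it is not the study's own procedure, and the function names and toy documents are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def tf_vector(text: str) -> Counter:
    """Raw term-frequency vector using lowercased whitespace tokens (assumed tokenizer)."""
    return Counter(text.lower().split())


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def cluster_hypothesis_check(relevant: list[str], nonrelevant: list[str]) -> tuple[float, float]:
    """Return (mean relevant-relevant similarity, mean relevant-nonrelevant similarity)."""
    rel = [tf_vector(d) for d in relevant]
    non = [tf_vector(d) for d in nonrelevant]
    rr = [cosine(a, b) for a, b in combinations(rel, 2)]
    rn = [cosine(a, b) for a in rel for b in non]
    return sum(rr) / len(rr), sum(rn) / len(rn)


if __name__ == "__main__":
    relevant_docs = ["cluster visualization retrieval", "cluster visualization relevance", "cluster retrieval relevance"]
    nonrelevant_docs = ["thalidomide pancreatitis hepatitis", "patent law cases"]
    within, across = cluster_hypothesis_check(relevant_docs, nonrelevant_docs)
    # The hypothesis is supported on this toy data if within > across.
    print(f"relevant-relevant: {within:.3f}, relevant-nonrelevant: {across:.3f}")
```

On this toy data each pair of relevant documents shares two of three terms, so the first value is 2/3 while the second is 0.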
1992
| |  | Lin, X. | Visualization for the document space. Abstract: An information retrieval framework that promotes graphical displays, and that will make documents in the computer visualizable to the searcher, is described. As examples of such graphical displays, two simulation results of using a Kohonen feature map to generate map displays for information retrieval are presented and discussed. The map displays are a mapping from a high-dimensional document space to a two-dimensional space. They show document relationships by various visual cues, such as dots, links, clusters, and areas, as well as their measurement and spatial arrangement. Using the map displays as an interface for document retrieval systems, the user is provided with richer visual information to support browsing and searching. | 1992 |
1991
| |  | Swanson, Don R. | Complementary structures in disjoint science literatures. Abstract: Difficult and intriguing information retrieval (IR) problems derive from what I call complementary but disjoint (CBD) structures within the literature of science. Complementary refers to the relationship between two separate scientific arguments which, when combined, yield important inferences and insights not apparent in the separate arguments. Corresponding to the two arguments are two complementary literatures. Each literature (ideally) is the “complete” set of articles that contain the argument in question. Disjoint literatures have no articles in common, do not cite or mention each other, and are not co-cited. If two complementary literatures are also disjoint, the possibility is worth investigating that the combined arguments and the inferences to which they lead might not be made explicit anywhere within the published record of science. The ever-increasing fragmentation of science into mutually-isolated specialties probably assures a limitless supply and combinatorial growth of implicit connections, some of which may be unknown solutions to important problems. These solutions are worth seeking.... | 1991 |
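Swanson's complementary-but-disjoint idea, which the Weeber and Jin entries also build on, is often summarized as the ABC model: if a starting concept A co-occurs with intermediate concepts B, and those B concepts co-occur with a concept C that never co-occurs with A directly, then A-C is a candidate hidden connection. The sketch below is a minimal "open discovery" version under stated assumptions: co-occurrence is given as a plain term-to-terms mapping, candidates are ranked simply by how many distinct B terms bridge to them, and the function name `abc_candidates` and the placeholder terms are hypothetical rather than taken from any of the cited systems.

```python
from collections import Counter


def abc_candidates(cooccur: dict[str, set[str]], a_term: str) -> list[tuple[str, int]]:
    """Rank C terms reachable from a_term only through intermediate B terms.

    cooccur maps each term to the set of terms it co-occurs with in the literature.
    Terms already linked directly to a_term are treated as B terms, not C candidates.
    """
    b_terms = cooccur.get(a_term, set())
    counts: Counter = Counter()
    for b in b_terms:
        for c in cooccur.get(b, set()):
            # Skip A itself and anything A already co-occurs with (not "disjoint").
            if c != a_term and c not in b_terms:
                counts[c] += 1
    # More distinct B terms bridging A and C suggests a stronger hypothesis.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Toy co-occurrence data with placeholder terms; not real literature counts.
    toy = {
        "A": {"B1", "B2"},
        "B1": {"A", "C1"},
        "B2": {"A", "C1", "C2"},
        "C1": {"B1", "B2"},
        "C2": {"B2"},
    }
    print(abc_candidates(toy, "A"))
```

Here C1 is reached through both B1 and B2 and so ranks above C2; real systems replace raw co-occurrence with weighted semantic profiles, as the Jin entry notes.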