| |  | Zhao, Huimin | Semantic matching across heterogeneous data sources. Abstract: not available. | 2007 |
| |  | Specia, Lucia | Integrating Folksonomies with the Semantic Web. Abstract: While tags in collaborative tagging systems serve primarily an indexing purpose, facilitating search and navigation of resources, the use of the same tags by more than one individual can yield a collective classification schema. We present an approach for making explicit the semantics behind the tag space in social tagging systems, so that this collaborative organization can emerge in the form of groups of concepts and partial ontologies. This is achieved by using a combination of shallow pre-processing strategies and statistical techniques together with knowledge provided by ontologies available on the semantic web. Preliminary results on the del.icio.us and Flickr tag sets show that the approach is very promising: it generates clusters with highly related tags corresponding to concepts in ontologies and meaningful relationships among subsets of these tags can be identified. | 2007 |
| |  | Oliver, D. E. | Tools for loading MEDLINE into a local relational database. Abstract: BACKGROUND: Researchers who use MEDLINE for text mining, information extraction, or natural language processing may benefit from having a copy of MEDLINE that they can manage locally. The National Library of Medicine (NLM) distributes MEDLINE in eXtensible Markup Language (XML)-formatted text files, but it is difficult to query MEDLINE in that format. We have developed software tools to parse the MEDLINE data files and load their contents into a relational database. Although the task is conceptually straightforward, the size and scope of MEDLINE make the task nontrivial. Given the increasing importance of text analysis in biology and medicine, we believe a local installation of MEDLINE will provide helpful computing infrastructure for researchers. RESULTS: We developed three software packages that parse and load MEDLINE, and ran each package to install separate instances of the MEDLINE database. For each installation, we collected data on loading time and disk-space utilization to provide examples of the process in different settings. Settings differed in terms of commercial database-management system (IBM DB2 or Oracle 9i), processor (Intel or Sun), programming language of installation software (Java or Perl), and methods employed in different versions of the software. The loading times for the three installations were 76 hours, 196 hours, and 132 hours, and disk-space utilization was 46.3 GB, 37.7 GB, and 31.6 GB, respectively. Loading times varied due to a variety of differences among the systems. Loading time also depended on whether data were written to intermediate files or not, and on whether input files were processed in sequence or in parallel. Disk-space utilization depended on the number of MEDLINE files processed, amount of indexing, and whether abstracts were stored as character large objects or truncated. CONCLUSIONS: Relational database (RDBMS) technology supports indexing and querying of very large datasets, and can accommodate a locally stored version of MEDLINE. RDBMS systems support a wide range of queries and facilitate certain tasks that are not directly supported by the application programming interface to PubMed. Because there is variation in hardware, software, and network infrastructures across sites, we cannot predict the exact time required for a user to load MEDLINE, but our results suggest that performance of the software is reasonable. Our database schemas and conversion software are publicly available at http://biotext.berkeley.edu. | 2004 |
| |  | schuh, Siegfried | On deep annotation. Abstract: not available. | 2003 |
| |  | Cohen, S. | XSEarch: A semantic search engine for XML. Abstract: XSEarch, a semantic search engine for XML, is presented. XSEarch has a simple query language, suitable for a naive user. It returns semantically related document fragments that satisfy the user's query. Query answers are ranked using extended information-retrieval techniques and are generated in an order similar to the ranking. Advanced indexing techniques were developed to facilitate efficient implementation of XSEarch. The performance of the different techniques as well as the recall... | 2003 |
| |  | Kutlu, G. | Support Tools for Visual Information Management. Abstract: Visual applications need to represent, manipulate, store, and retrieve both raw and processed visual data. Existing relational and object-oriented database systems fail to offer satisfactory visual data management support because they lack the kinds of representations, storage structures, indices, access methods, and query mechanisms needed for visual data. We argue that extensible visual object stores offer feasible and effective means to address the data management needs of visual... | 1996 |
| |  | Koperski, K. | Data mining methods for the analysis of large geographic databases. Abstract: In this paper, a number of methods based on knowledge discovery techniques for large databases are presented. These methods may overcome some of the weaknesses of statistical analysis. Our study is focused on efficient methods for mining strong spatial association rules in geographic information databases. A spatial association rule is a rule indicating a certain association relationship among a set of spatial and possibly some non-spatial predicates. For example, a rule 80% of gas stations in rural... | 1996 |