2007
| |  | Porter, Michael D. | Detecting local regions of change in high-dimensional criminal or terrorist point processes. Abstract: A method is presented for detecting changes to the distribution of a criminal or terrorist point process between two time periods using a non-model-based approach. By treating the criminal/terrorist point process as an intelligent site selection problem, changes to the process can signify changes in the behavior or activity level of the criminals/terrorists. The locations of past events and an associated vector of geographic, environmental, and socio-economic feature values are employed in the analysis. By modeling the locations of events in each time period as a marked point process, we can then detect differences in the intensity of each component process. A modified PRIM (patient rule induction method) is implemented to partition the high-dimensional feature space, which can include mixed variables, into the most likely change regions. Monte Carlo simulations are easily and quickly generated under random relabeling to test a scan statistic for significance. By detecting local regions of change, not only can it be determined if change has occurred in the study area, but the specific spatial regions where change occurs are also identified. An example is provided of breaking and entering crimes over two time periods to demonstrate the use of this technique for detecting local regions of change. This methodology also applies to detecting regions of differences between two types of events such as in case-control data. | 2007 |
| |  | Congdon, Peter | Mixtures of spatial and unstructured effects for spatially discontinuous health outcomes. Abstract: Mixture models are used for spatially adaptive smoothing of health event data (e.g. mortality or illness totals). Such models allow for spatial pooling of strength where appropriate but adopt a mixture strategy that also reflects health risks that are discordant with those of surrounding areas. Mixing is either discrete or based on beta densities. A fully Bayesian estimation and specification strategy is applied with fit based on DIC and BIC criteria. Illustrative applications are to long term illness in 133 London small areas, where event counts are large, and to lip cancer in Scottish counties where the majority of event totals are under 10. | 2007 |
| |  | Auer, S. | What Have Innsbruck and Leipzig in Common? Extracting Semantics from Wiki Content. Abstract: Wikis are established means for the collaborative authoring, versioning and publishing of textual articles. The Wikipedia project, for example, succeeded in creating by far the largest encyclopedia just on the basis of a wiki. Recently, several approaches have been proposed on how to extend wikis to allow the creation of structured and semantically enriched content. However, the means for creating semantically enriched structured content are already available and are, although unconsciously, even used by Wikipedia authors. In this article, we present a method for revealing this structured content by extracting information from template instances. We suggest ways to efficiently query the vast amount of extracted information (e.g. more than 8 million RDF statements for the English Wikipedia version alone), leading to astonishing query answering possibilities (such as for the title question). We analyze the quality of the extracted content, and propose strategies for quality improvements with just minor modifications of the wiki systems being currently used. | 2007 |
| |  | Ghoniem, Mohammad | NewsLab: Exploratory Broadcast News Video Analysis (no abstract available) | 2007 |
| |  | Ozonoff, Al | Effect of spatial resolution on cluster detection: a simulation study (no abstract available) | 2007 |
| |  | Frey, Brendan J. | Clustering by Passing Messages Between Data Points. Abstract: Clustering data by identifying a subset of representative examples is important for processing sensory signals and detecting patterns in data. Such exemplars can be found by randomly choosing an initial subset of data points and then iteratively refining it, but this only works well if that initial choice is close to a good solution. We describe a new method called affinity propagation, which takes as input measures of similarity between pairs of data points. Real-valued messages are exchanged between data points until a high-quality set of exemplars and corresponding clusters gradually emerges. We used affinity propagation to cluster images of faces, detect genes in microarray data, identify representative sentences in this manuscript and identify cities that are efficiently accessed by airline travel. Affinity propagation found clusters with much lower error than those found by other methods, and it did so in less than one-hundredth the amount of time. | 2007 |
| |  | Giacomo, E. | Graph Visualization Techniques for Web Clustering Engines. Abstract: One of the most challenging issues in mining information from the World Wide Web is the design of systems that present the data to the end user by clustering them into meaningful semantic categories. We show that the analysis of the results of a clustering engine can significantly take advantage of enhanced graph drawing and visualization techniques. We propose a graph-based user interface for Web clustering engines that makes it possible for the user to explore and visualize the different semantic categories and their relationships at the desired level of detail. | 2007 |
| |  | Al-Khalifa, Hend S. | Towards better understanding of folksonomic patterns (no abstract available) | 2007 |
| |  | Caldarelli, Guido | Folksonomies and clustering in the collaborative system CiteULike. Abstract: We analyze CiteULike, an online collaborative tagging system where users bookmark and annotate scientific papers. Such a system can be naturally represented as a tripartite graph whose nodes represent papers, users and tags connected by individual tag assignments. The semantics of tags is studied here, in order to uncover the hidden relationships between tags. We find that the clustering coefficient reflects the semantical patterns among tags, providing useful ideas for the designing of more efficient methods of data classification and spam detection. | 2007 |
| |  | Segaran, Toby | Programming Collective Intelligence: Building Smart Web 2.0 Applications. Abstract: Want to tap the power behind search rankings, product recommendations, social bookmarking, and online matchmaking? This fascinating book demonstrates how you can build Web 2.0 applications to mine the enormous amount of data created by people on the Internet. With the sophisticated algorithms in this book, you can write smart programs to access interesting datasets from other web sites, collect data from users of your own applications, and analyze and understand the data once you've found it. Programming Collective Intelligence takes you into the world of machine learning and statistics, and explains how to draw conclusions about user experience, marketing, personal tastes, and human behavior in general -- all from information that you and others collect every day. Each algorithm is described clearly and concisely with code that can immediately be used on your web site, blog, Wiki, or specialized application. This book explains:
- Collaborative filtering techniques that enable online retailers to recommend products or media
- Methods of clustering to detect groups of similar items in a large dataset
- Search engine features -- crawlers, indexers, query engines, and the PageRank algorithm
- Optimization algorithms that search millions of possible solutions to a problem and choose the best one
- Bayesian filtering, used in spam filters for classifying documents based on word types and other features
- Using decision trees not only to make predictions, but to model the way decisions are made
- Predicting numerical values rather than classifications to build price models
- Support vector machines to match people in online dating sites
- Non-negative matrix factorization to find the independent features in a dataset
- Evolving intelligence for problem solving -- how a computer develops its skill by improving its own code the more it plays a game
Each chapter includes exercises for extending the algorithms to make them more powerful. Go beyond simple database-backed applications and put the wealth of Internet data to work for you. "Bravo! I cannot think of a better way for a developer to first learn these algorithms and methods, nor can I think of a better way for me (an old AI dog) to reinvigorate my knowledge of the details." -- Dan Russell, Google. "Toby's book does a great job of breaking down the complex subject matter of machine-learning algorithms into practical, easy-to-understand examples that can be directly applied to analysis of social interaction across the Web today. If I had this book two years ago, it would have saved precious time going down some fruitless paths." -- Tim Wolters, CTO, Collective Intellect | 2007 |
| |  | Lian, Min | Using geographic information systems and spatial and space-time scan statistics for a population-based risk analysis of the 2002 equine West Nile epidemic in six contiguous regions of Texas (no abstract available) | 2007 |
| |  | Ali, Mohammad | Spatial risk for gender-specific adult mortality in an area of southern China (no abstract available) | 2007 |
2005
| |  | Freeman, Hp | Excess Cervical Cancer Mortality: A Marker for Low Access to Health Care in Poor Communities (no abstract available) | 2005 |
2004
| |  | Abello, James | Matrix Zoom: A Visual Interface to Semi-external Graphs. Abstract: In web data, telecommunications traffic and in epidemiological studies, dense subgraphs correspond to subsets of subjects (i.e. users, patients) that share a collection of attribute values (i.e. accessed web pages, email-calling patterns or disease diagnostic profiles). Visual and computational identification of these "clusters" becomes useful when domain experts desire to determine those factors of major influence in the formation of access and communication clusters or in the detection and contention of disease spread. With the current increases in graphic hardware capabilities and RAM sizes, it is more useful to relate graph sizes to the available screen real estate S and the amount of available RAM M, instead of the number of edges or nodes in the graph. We offer a visual interface that is parameterized by M and S and is particularly suited for navigation tasks that require the identification of subgraphs whose edge density is above a certain threshold. This is achieved by providing a zoomable matrix view of the underlying data. This view is strongly coupled to a hierarchical view of the essential information elements present in the data domain. We illustrate the applicability of this work to the visual navigation of cancer incidence data and to an aggregated sample of phone call traffic. | 2004 |
| |  | Naaman, M. | Automatic organization for digital photographs with geographic coordinates. Abstract: We describe PhotoCompas, a system that utilizes the time and location information embedded in digital photographs to automatically organize a personal photo collection. PhotoCompas produces browseable location and event hierarchies for the collection. These hierarchies are created using algorithms that interleave time and location to produce an organization that mimics the way people think about their photo collections. In addition, the algorithm annotates the generated hierarchy with geographical names. We tested our approach in case studies of three real-world collections and verified that the results are meaningful and useful for the collection owners. | 2004 |
2003
| |  | | Power comparisons for disease clustering tests (no abstract available) | 2003 |
1997
| |  | Allan, J. | Interactive Cluster Visualization for Information Retrieval. Abstract: This study investigates the ability of cluster visualization to help a user rapidly identify relevant documents. It provides added support for the truth of the Cluster Hypothesis on retrieved documents and shows that clustering of relevant documents is readily visible. The study then shows the visual effect of a technique similar to relevance feedback and shows how to enhance that effect to further help the user locate relevant material. A ranked list returned by a text search engine purports... | 1997 |
1991
| |  | Besag, Julian | The detection of clusters in rare diseases (no abstract available) | 1991 |
| |  | Bouman, C. | Multiple Resolution Segmentation of Textured Images (no abstract available) | 1991 |
| |  | Matthews, G. | Clustering Without a Metric (no abstract available) | 1991 |