Semiparametric temporal clustering
-
Title
-
Semiparametric temporal clustering
-
Identifier
-
d_2009_2013:3a1cf3f7fed7:12001
-
Identifier
-
12651
-
Creator
-
Tamang, Suzanne
-
Contributor
-
Simon Parsons
-
Date
-
2013
-
Language
-
English
-
Publisher
-
City University of New York.
-
Subject
-
Computer science | clustering | data mining | electronic health records | temporal analysis | unsupervised learning
-
Abstract
-
Although temporal data provides critical context for many real-world reasoning tasks, incorporating the temporal dimension into an analysis can present methodological challenges. Traditional statistical methods are limited in their ability to process noisy, large-scale secondary data sources. Data mining approaches are better suited to these types of problems, but have primarily focused on static data sets. However, few real-world data sets are static or measure stationary phenomena; rather, they are dynamic. To facilitate the meaningful use of abundant, unlabeled temporal data, I develop a new temporal clustering method that can assist in the preprocessing, exploration, and discovery of new knowledge from secondary data sources that are subject to arbitrary sampling schemes and contain observation sequences of different durations. My approach builds on the semiparametric time series clustering framework, which has demonstrated clear benefits over fully parametric or fully nonparametric methods. The framework combines beneficial parametric assumptions, such as the Markov or hidden-state assumption, to model temporal systems, with a more agnostic, nonparametric approach for clustering the embedded models. Using digital health data as a case study, I broaden the range of scenarios to which semiparametric clustering can be successfully applied. Specifically, I develop a method that uses a state-of-the-art continuous-time Bayesian network to represent temporal information more naturally, addressing limitations of discrete-time methods. Also, as an alternative to spectral methods, I pair model-based abstraction with a nonparametric Bayesian clustering technique that allows the number of clusters, k, to be expressed as a function of the size and complexity of the patient population, avoiding the need to prespecify k using a heuristic.
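The abstract does not name the nonparametric Bayesian clustering technique. A common choice that behaves as described is a Dirichlet process mixture, whose prior over partitions is the Chinese restaurant process (CRP). The sketch below assumes a CRP with a concentration parameter `alpha` (a hypothetical setting, not taken from the dissertation) and shows how the number of clusters grows with the number of patients instead of being fixed in advance: the expected count is E[K_n] = sum over i = 0..n-1 of alpha / (alpha + i), which grows roughly like alpha * log(n). The helpers `crp_expected_clusters` and `crp_sample` are hypothetical illustrations, not the author's implementation.

```python
import random
from typing import List


def crp_expected_clusters(n: int, alpha: float) -> float:
    """Expected number of occupied clusters after n draws from a CRP(alpha)."""
    # Draw i (0-indexed) opens a new cluster with probability alpha / (alpha + i).
    return sum(alpha / (alpha + i) for i in range(n))


def crp_sample(n: int, alpha: float, rng: random.Random) -> List[int]:
    """Sample cluster labels for n patients from a CRP(alpha) prior (illustrative only)."""
    counts: List[int] = []  # counts[k] = number of patients currently in cluster k
    labels: List[int] = []
    for i in range(n):
        # Existing cluster k has weight counts[k]; a new cluster has weight alpha.
        # The weights sum to i + alpha because i patients are already seated.
        r = rng.uniform(0.0, i + alpha)
        chosen = len(counts)  # default: open a new cluster
        cumulative = 0.0
        for k, c in enumerate(counts):
            cumulative += c
            if r < cumulative:
                chosen = k
                break
        if chosen == len(counts):
            counts.append(1)
        else:
            counts[chosen] += 1
        labels.append(chosen)
    return labels


if __name__ == "__main__":
    rng = random.Random(0)
    for n in (100, 1000, 10000):
        labels = crp_sample(n, alpha=1.0, rng=rng)
        # Print population size, expected cluster count, and sampled cluster count.
        print(n, round(crp_expected_clusters(n, 1.0), 2), len(set(labels)))
```

With `alpha = 1.0` the expected count is the n-th harmonic number, so it rises from 1.5 for two patients to roughly 7.5 for a thousand; a larger `alpha` favors more clusters. In a full Dirichlet process mixture, this prior is combined with a likelihood over the embedded per-patient models, so the inferred k also reflects how heterogeneous the population is.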
To demonstrate that this approach produces meaningful results, I evaluate the clusters using intrinsic and extrinsic validation. In addition, I compare cluster assignments with those of temporal clustering systems reported in the research literature, showing a 20% relative improvement over the best system's performance and recognizable differences among the detected patient clusters.
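The abstract does not name the extrinsic measure used. As an illustration only, the sketch below assumes the Rand index, a standard extrinsic measure that compares a cluster assignment against a reference labeling pair by pair, and also shows how a relative improvement such as the reported 20% is computed from two scores. The function names and example scores are hypothetical.

```python
from itertools import combinations
from typing import Hashable, Sequence


def rand_index(pred: Sequence[Hashable], ref: Sequence[Hashable]) -> float:
    """Fraction of item pairs on which two labelings agree (same vs. different cluster)."""
    if len(pred) != len(ref):
        raise ValueError("labelings must cover the same items")
    pairs = list(combinations(range(len(pred)), 2))
    if not pairs:
        return 1.0  # fewer than two items: trivially in agreement
    # A pair agrees if both labelings put it together, or both keep it apart.
    agree = sum((pred[i] == pred[j]) == (ref[i] == ref[j]) for i, j in pairs)
    return agree / len(pairs)


def relative_improvement(new: float, baseline: float) -> float:
    """Relative (not absolute) gain of `new` over `baseline`; 0.2 means 20%."""
    return (new - baseline) / baseline


if __name__ == "__main__":
    # Label names are irrelevant; only the grouping matters.
    print(rand_index([0, 0, 1, 1], ["a", "a", "b", "b"]))
    # Hypothetical scores: an absolute gain of 0.1 on a baseline of 0.5.
    print(relative_improvement(0.6, 0.5))
```

With these hypothetical scores, `relative_improvement(0.6, 0.5)` is about 0.2: a 20% relative gain even though the absolute gain is only 0.1. In practice a chance-corrected variant such as the adjusted Rand index is often preferred, because the raw Rand index tends to be high even for unrelated labelings when there are many clusters.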
-
Type
-
dissertation
-
Source
-
2009_2013.csv
-
Degree
-
Ph.D.
-
Program
-
Computer Science