Paper title

AUTOSHAPE: An Autoencoder-Shapelet Approach for Time Series Clustering

Authors

Guozhong Li, Byron Choi, Jianliang Xu, Sourav S. Bhowmick, Daphne Ngar-yin Mah, Grace Lai-Hung Wong

Abstract

Time series shapelets are discriminative subsequences that have recently been found effective for time series clustering (TSC). Shapelets are also convenient for interpreting the clusters. Thus, the main challenge for TSC is to discover high-quality variable-length shapelets that discriminate between different clusters. In this paper, we propose a novel autoencoder-shapelet approach (AUTOSHAPE), which is the first study to take advantage of both autoencoders and shapelets for determining shapelets in an unsupervised manner. The autoencoder is specially designed to learn high-quality shapelets. More specifically, to guide the latent representation learning, we employ the latest self-supervised loss to learn unified embeddings for variable-length shapelet candidates (time series subsequences) of different variables, and propose a diversity loss to select the discriminating embeddings in the unified space. We introduce a reconstruction loss to recover shapelets in the original time series space for clustering. Finally, we adopt the Davies-Bouldin index (DBI) to inform AUTOSHAPE of the clustering performance during learning. We present extensive experiments on AUTOSHAPE. To evaluate the clustering performance on univariate time series (UTS), we compare AUTOSHAPE with 15 representative methods using UCR archive datasets. To study the performance on multivariate time series (MTS), we evaluate AUTOSHAPE on 30 UEA archive datasets against 5 competitive methods. The results show that AUTOSHAPE performs best among all the methods compared. We interpret clusters with shapelets and obtain interesting intuitions about the clusters in two UTS case studies and one MTS case study.
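
To make two ideas from the abstract concrete, the sketch below shows how a shapelet can turn a time series into a feature (its distance to the best-matching window) and how the Davies-Bouldin index scores a clustering of those features (lower is better). It is not the AUTOSHAPE implementation: there is no autoencoder here, the length-normalized Euclidean distance is an assumption borrowed from common shapelet practice, and the function names and the toy data are hypothetical.

```python
import numpy as np


def shapelet_distance(series, shapelet):
    """Smallest length-normalized Euclidean distance between the shapelet
    and any window of the series with the same length (assumed definition)."""
    series = np.asarray(series, dtype=float)
    shapelet = np.asarray(shapelet, dtype=float)
    # All contiguous windows of len(shapelet), shape (n_windows, len(shapelet)).
    windows = np.lib.stride_tricks.sliding_window_view(series, len(shapelet))
    dists = np.sqrt(((windows - shapelet) ** 2).mean(axis=1))
    return float(dists.min())


def shapelet_transform(dataset, shapelets):
    """Map each series to a vector of distances, one entry per shapelet."""
    return np.array([[shapelet_distance(s, sh) for sh in shapelets]
                     for s in dataset])


def davies_bouldin(features, labels):
    """Davies-Bouldin index: mean over clusters of the worst
    (scatter_i + scatter_j) / centroid_distance_ij ratio."""
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    ids = np.unique(labels)
    centroids = np.array([features[labels == k].mean(axis=0) for k in ids])
    # Scatter: mean distance of a cluster's points to its own centroid.
    scatter = np.array([
        np.linalg.norm(features[labels == k] - c, axis=1).mean()
        for k, c in zip(ids, centroids)
    ])
    worst = []
    for i in range(len(ids)):
        ratios = [
            (scatter[i] + scatter[j])
            / np.linalg.norm(centroids[i] - centroids[j])
            for j in range(len(ids)) if j != i
        ]
        worst.append(max(ratios))
    return float(np.mean(worst))


# Toy data (hypothetical): two series contain a bump, two do not.
dataset = [
    [0, 0, 1, 2, 1, 0, 0, 0],
    [0, 0, 0, 1, 2, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 0, 0, 0],
]
bump = [1, 2, 1]  # a hand-picked candidate shapelet

features = shapelet_transform(dataset, [bump])
dbi_good = davies_bouldin(features, [0, 0, 1, 1])  # bump vs. no bump
dbi_bad = davies_bouldin(features, [0, 1, 0, 1])   # mixed grouping
print(features.ravel(), dbi_good, dbi_bad)
```

In this toy setting the grouping that separates bump series from the rest gets the lower index, which is the kind of signal the abstract describes feeding back into learning.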
