Paper Title
Lyric Video Analysis Using Text Detection and Tracking
Authors
Abstract
We attempt to recognize and track lyric words in lyric videos. A lyric video is a music video that displays the lyric words of a song. The main characteristic of lyric videos is that the lyric words appear in frames synchronously with the music. Recognizing and tracking the lyric words is difficult because (1) the words are often decorated and geometrically distorted and (2) the words move arbitrarily and drastically within the video frame. The purpose of this paper is to analyze the motion of lyric words in lyric videos, as a first step toward automatic lyric video generation. To analyze this motion, we first apply a state-of-the-art scene text detector and recognizer to each video frame. Then, lyric-frame matching is performed to establish the optimal correspondence between lyric words and frames. After determining the motion trajectories of individual lyric words from this correspondence, we analyze the trajectories by k-medoids clustering and dynamic time warping (DTW).
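The final analysis step, clustering word trajectories by k-medoids with DTW as the distance, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the local cost, the initialization, or the update rule, so the Euclidean point cost, the "first k items" initialization, the alternating medoid update, and the toy trajectories below are all assumptions made for illustration.

```python
import math


def dtw_distance(a, b):
    """DTW distance between two trajectories given as lists of (x, y) points.

    Assumption: the local cost is the Euclidean distance between points.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def assign(dist, medoids):
    """Label every item with the index of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda c: dist[i][medoids[c]])
            for i in range(len(dist))]


def k_medoids(dist, k, max_iter=100):
    """Alternating k-medoids on a precomputed distance matrix.

    Assumption: medoids are initialized to the first k items (deterministic).
    """
    medoids = list(range(k))
    for _ in range(max_iter):
        labels = assign(dist, medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:
                # Keep the old medoid if a cluster became empty.
                new_medoids.append(medoids[c])
                continue
            # New medoid: the member with the smallest total distance to the others.
            new_medoids.append(min(members, key=lambda i: sum(dist[i][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(dist, medoids)


# Hypothetical toy trajectories: two horizontal and two vertical motions,
# sampled with different timing so that plain point-wise comparison would fail.
trajs = [
    [(0, 0), (1, 0), (2, 0), (3, 0)],
    [(0, 0), (0, 1), (0, 2), (0, 3)],
    [(0, 0), (0.5, 0), (1, 0), (2, 0), (3, 0)],
    [(0, 0), (0, 1), (0, 1.5), (0, 2), (0, 3)],
]
dist = [[dtw_distance(p, q) for q in trajs] for p in trajs]
medoids, labels = k_medoids(dist, k=2)
print(medoids, labels)
```

DTW lets trajectories of different lengths and speeds be compared, and k-medoids only needs pairwise distances, which is why the two combine naturally: no averaging of variable-length trajectories is required, and each cluster is represented by an actual observed trajectory.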