Dynamic Time Warping Algorithm


http://www.cnblogs.com/luxiaoxun/archive/2013/05/09/3069036.html

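Dynamic Time Warping (DTW) is a method for measuring the similarity between two time series. It is used mainly in speech recognition, to decide whether two utterances represent the same word.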

1. How DTW works

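The two time series whose similarity we want to compare may differ in length; in speech recognition this shows up as different speakers talking at different speeds. The speed of individual phonemes within the same word also varies: one speaker may draw out the sound "A", another may cut the sound "i" very short. In addition, two time series may differ only by a shift along the time axis, so that they coincide once the shift is undone. In these cases the traditional Euclidean distance cannot effectively measure the distance (or similarity) between the two time series.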
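To make the limitation concrete, here is a small Python sketch. The helper name euclidean and the sample data are made up for illustration; the two sequences contain the same bump, shifted by one time step.

```python
import math

def euclidean(x, y):
    """Point-by-point Euclidean distance; only defined for equal lengths."""
    if len(x) != len(y):
        raise ValueError("Euclidean distance needs sequences of equal length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

# Illustrative data: the same bump, shifted by one time step.
x = [0, 0, 1, 2, 1, 0, 0]
y = [0, 1, 2, 1, 0, 0, 0]

# Nonzero although the two shapes are identical up to a shift.
shifted_distance = euclidean(x, y)
```

The point-by-point comparison penalizes the shift, and it cannot be applied at all when the lengths differ.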

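DTW computes the similarity between two time series by stretching and compressing them along the time axis:

In Figure 1 (image not reproduced here), the two solid lines, one above the other, are the two time series, and the dashed lines between them connect their corresponding (similar) points. DTW measures the similarity of the two series by the sum of the distances between all these matched points, called the warp path distance.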

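2. Computing DTW

Let X and Y be the two time series to compare, with lengths |X| and |Y|.

Warp path

A warp path has the form W = w1, w2, ..., wK, where max(|X|, |Y|) <= K < |X| + |Y|.

Each wk has the form (i, j), where i is an index into X and j is an index into Y.

The warp path W must start at w1 = (1, 1) and end at wK = (|X|, |Y|); together with the monotonicity constraint below, this ensures that every index of X and of Y appears in W.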

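In addition, the indices i and j of the elements (i, j) of W must be monotonically increasing, so that the dashed lines in Figure 1 do not cross. Monotonically increasing here means: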
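The formula that originally followed is not preserved in this copy. A reconstruction, assuming the usual DTW step constraint in which each index advances by at most one per step, would be:

```latex
w_k = (i, j),\quad w_{k+1} = (i', j')
\quad\Longrightarrow\quad
i \le i' \le i + 1,\qquad j \le j' \le j + 1
```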

The warp path we are ultimately looking for is the one with the smallest distance:
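The formula is missing here as well. Assuming the warp path distance is the sum of the point-to-point distances along the path, as described in section 1, it can be written as:

```latex
Dist(W) = \sum_{k=1}^{K} Dist(w_{ki}, w_{kj})
```

Here Dist(w_{ki}, w_{kj}) denotes the distance between the point of X at index i and the point of Y at index j for the k-th element w_k = (i, j); this notation is an assumption made for the reconstruction. The optimal warp path is the W that minimizes Dist(W).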

The resulting warp path distance is D(|X|, |Y|), which is computed by dynamic programming:
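The recurrence itself is not preserved in this copy. A reconstruction that agrees with the three-predecessor pseudocode quoted from Wikipedia later in this document would be:

```latex
D(i, j) = Dist(i, j) + \min\bigl\{\, D(i-1, j),\; D(i, j-1),\; D(i-1, j-1) \,\bigr\}
```

with D(1, 1) = Dist(1, 1) and any term whose index falls outside the matrix treated as infinity. Dist(i, j) is assumed to be the distance between point i of X and point j of Y.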

The cost matrix D (figure not reproduced here) has entries D(i, j), the warp path distance between the first i points of X and the first j points of Y.

3. DTW implementation

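MATLAB code:

function dist = dtw(t,r)
% DTW distance between feature sequences t (n rows) and r (m rows),
% one frame per row.
n = size(t,1);
m = size(r,1);
% Frame-to-frame distance matrix (squared Euclidean distance)
d = zeros(n,m);
for i = 1:n
    for j = 1:m
        d(i,j) = sum((t(i,:)-r(j,:)).^2);
    end
end
% Accumulated distance matrix
D = ones(n,m) * realmax;
D(1,1) = d(1,1);
% Dynamic programming.
% Note: the predecessors used here are (i-1,j), (i-1,j-1) and (i-1,j-2),
% not the symmetric step pattern (i-1,j), (i,j-1), (i-1,j-1).
for i = 2:n
    for j = 1:m
        D1 = D(i-1,j);
        if j>1
            D2 = D(i-1,j-1);
        else
            D2 = realmax;
        end
        if j>2
            D3 = D(i-1,j-2);
        else
            D3 = realmax;
        end
        D(i,j) = d(i,j) + min([D1,D2,D3]);
    end
end
dist = D(n,m);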

C++ implementation:

dtwrecoge.h

dtwrecoge.cpp

C++ code download: DTW算法.rar

 

http://blog.csdn.net/vanezuo/article/details/5586727

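Dynamic Time Warping (DTW)
Dynamic time warping (DTW) was once a mainstream method in speech recognition.
The idea is this: a speech signal is highly random. Even when the same speaker says the same word, each utterance comes out differently and never has exactly the same duration. So when an unknown word is matched against stored models, its time axis has to be warped or bent non-uniformly so that its features line up with those of the template. Time warping is a powerful alignment technique and is very effective at improving recognition accuracy.
DTW is a typical optimization problem: it uses a time warping function W(n) that satisfies certain conditions to describe the time correspondence between the input template and the reference template, and it solves for the warping function that minimizes the accumulated distance when the two templates are matched.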
 
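  • It combines time warping with a distance measure and uses dynamic programming to compare two patterns of different sizes, which addresses the problem of varying speaking rates in speech recognition.
  • It is a nonlinear time-warping pattern-matching algorithm.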
 
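DTW (Dynamic Time Warping) is a set of methods based on dynamic programming (DP) that can greatly reduce the time spent searching for the best match.
The goal of DTW is to find the shortest distance between two vectors. In general, for two vectors x and y in n-dimensional space, the distance between them can be defined as the straight-line distance between the two points, called the Euclidean distance.
dist(x, y) = |x - y|
But if the two vectors have different lengths, the formula above cannot be used. In general, suppose the element positions in the two vectors represent time. Because we must tolerate deviations along the time axis, we do not know which elements of the two vectors correspond to each other, so we need an efficient procedure to find the best correspondence.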
 
 
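The general idea of dynamic programming
The basic idea of dynamic programming is to break the problem to be solved into a number of subproblems.
However, the subproblems obtained this way are often not independent of one another. The number of distinct subproblems is often only polynomial, yet a naive solution recomputes some of them many times.
If we store the answer to each subproblem once it is solved and look it up when it is needed again, we avoid a great deal of repeated computation and obtain a polynomial-time algorithm.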
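As a sketch of this idea applied to DTW, the Python function below writes the recursion top-down and lets functools.lru_cache store each subproblem's answer. The function name dtw_memo and the cost |a - b| on plain numbers are assumptions made for this example.

```python
from functools import lru_cache

def dtw_memo(x, y):
    """DTW distance via top-down recursion with cached subproblems."""

    @lru_cache(maxsize=None)
    def D(i, j):
        # D(i, j): warp path distance between x[:i+1] and y[:j+1] (0-based).
        if i < 0 or j < 0:
            return float("inf")
        cost = abs(x[i] - y[j])
        if i == 0 and j == 0:
            return cost
        # Each of the three subproblems is solved once, then read from the cache.
        return cost + min(D(i - 1, j), D(i, j - 1), D(i - 1, j - 1))

    return D(len(x) - 1, len(y) - 1)
```

Without the cache the same (i, j) pairs would be recomputed exponentially often; with it there are only len(x) * len(y) distinct subproblems. The recursion depth grows with the sequence lengths, so this form suits short sequences only.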
 
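Basic steps of dynamic programming
  1. Characterize the properties of an optimal solution and describe its structure.
  2. Define the optimal value recursively.
  3. Compute the optimal value bottom-up.
  4. Construct an optimal solution from the information recorded while computing the optimal value.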
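The four steps can be sketched for DTW as follows. This is an illustrative Python version, assuming numeric sequences, the cost |a - b|, and 1-based path indices; the name dtw_with_path is made up for the example.

```python
def dtw_with_path(x, y):
    """Return (distance, warp path) for two non-empty numeric sequences."""
    n, m = len(x), len(y)
    inf = float("inf")
    # Step 2: D[i][j] is the optimal value for the prefixes x[:i] and y[:j].
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    # Step 3: fill the table bottom-up.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    # Step 4: walk back from (n, m) to (1, 1) to recover one optimal path.
    i, j = n, m
    path = [(i, j)]
    while (i, j) != (1, 1):
        candidates = [
            (D[i - 1][j - 1], i - 1, j - 1),
            (D[i - 1][j], i - 1, j),
            (D[i][j - 1], i, j - 1),
        ]
        _, i, j = min(candidates)
        path.append((i, j))
    path.reverse()
    return D[n][m], path
```

For example, dtw_with_path([1, 2, 3], [1, 2, 2, 3]) pairs the single 2 in the first sequence with both 2s in the second.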
 
https://en.wikipedia.org/wiki/Dynamic_time_warping

Dynamic time warping

From Wikipedia, the free encyclopedia
 
 
 
Not to be confused with the Time Warp mechanism for discrete event simulation, or the Time Warp Operating System that used this mechanism.

In time series analysis, dynamic time warping (DTW) is an algorithm for measuring similarity between two temporal sequences which may vary in time or speed. For instance, similarities in walking patterns could be detected using DTW, even if one person was walking faster than the other, or if there were accelerations and decelerations during the course of an observation. DTW has been applied to temporal sequences of video, audio, and graphics data; indeed, any data which can be turned into a linear sequence can be analyzed with DTW. A well-known application has been automatic speech recognition, to cope with different speaking speeds. Other applications include speaker recognition and online signature recognition. It can also be used in partial shape matching applications.

In general, DTW is a method that calculates an optimal match between two given sequences (e.g. time series) with certain restrictions. The sequences are "warped" non-linearly in the time dimension to determine a measure of their similarity independent of certain non-linear variations in the time dimension. This sequence alignment method is often used in time series classification. Although DTW measures a distance-like quantity between two given sequences, it doesn't guarantee the triangle inequality to hold.
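A small worked counterexample shows the failure of the triangle inequality. The sketch below assumes the cost d(a, b) = |a - b| and the standard three-predecessor recurrence; the sequences x, y and z are chosen purely for illustration.

```python
def dtw(s, t):
    """Plain DTW distance with cost |a - b|."""
    n, m = len(s), len(t)
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(s[i - 1] - t[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

x, y, z = [0], [1, 2], [2, 2, 2]

direct = dtw(x, z)               # 2 + 2 + 2 = 6
detour = dtw(x, y) + dtw(y, z)   # 3 + 1 = 4
```

Here going through y is cheaper than going directly from x to z, so DTW(x, z) > DTW(x, y) + DTW(y, z).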

 

 

Implementation

This example illustrates the implementation of the dynamic time warping algorithm when the two sequences s and t are strings of discrete symbols. For two symbols x and y, d(x, y) is a distance between the symbols, e.g. d(x, y) = | x - y |.

int DTWDistance(s: array [1..n], t: array [1..m]) {
    DTW := array [0..n, 0..m]

    for i := 1 to n
        DTW[i, 0] := infinity
    for i := 1 to m
        DTW[0, i] := infinity
    DTW[0, 0] := 0

    for i := 1 to n
        for j := 1 to m
            cost:= d(s[i], t[j])
            DTW[i, j] := cost + minimum(DTW[i-1, j  ],    // insertion
                                        DTW[i  , j-1],    // deletion
                                        DTW[i-1, j-1])    // match

    return DTW[n, m]
}
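A minimal runnable sketch of the pseudocode above in Python, assuming numeric symbols and d(x, y) = |x - y| as the default cost. The function name dtw_distance and the optional d parameter are choices made for this example.

```python
def dtw_distance(s, t, d=lambda a, b: abs(a - b)):
    """DTW distance between sequences s and t under the symbol distance d."""
    n, m = len(s), len(t)
    inf = float("inf")
    # Row 0 and column 0 act as the infinite border from the pseudocode.
    DTW = [[inf] * (m + 1) for _ in range(n + 1)]
    DTW[0][0] = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = d(s[i - 1], t[j - 1])  # Python lists are 0-based
            DTW[i][j] = cost + min(
                DTW[i - 1][j],      # insertion
                DTW[i][j - 1],      # deletion
                DTW[i - 1][j - 1],  # match
            )
    return DTW[n][m]
```

For instance, dtw_distance([1, 2, 3], [1, 2, 2, 3]) is 0 because the repeated 2 can be absorbed by the warping, while any other symbol distance can be supplied through d.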

We sometimes want to add a locality constraint. That is, we require that if s[i] is matched with t[j], then | i - j | is no larger than w, a window parameter.

We can easily modify the above algorithm to add a locality constraint. However, the modification works only if | n - m | is no larger than w, i.e. the end point is within the window length from the diagonal. To make the algorithm work, the window parameter w must be adapted so that | n - m | ≤ w (see the line marked with (*) in the code).

int DTWDistance(s: array [1..n], t: array [1..m], w: int) {
    DTW := array [0..n, 0..m]

    w := max(w, abs(n-m)) // adapt window size (*)

    for i := 0 to n
        for j := 0 to m
            DTW[i, j] := infinity
    DTW[0, 0] := 0

    for i := 1 to n
        for j := max(1, i-w) to min(m, i+w)
            cost := d(s[i], t[j])
            DTW[i, j] := cost + minimum(DTW[i-1, j  ],    // insertion
                                        DTW[i  , j-1],    // deletion
                                        DTW[i-1, j-1])    // match

    return DTW[n, m]
}
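The windowed variant can be sketched in Python in the same way. As before, the function name and the default cost |a - b| are assumptions made for this example.

```python
def dtw_distance_windowed(s, t, w, d=lambda a, b: abs(a - b)):
    """DTW distance with the locality constraint |i - j| <= w."""
    n, m = len(s), len(t)
    w = max(w, abs(n - m))  # adapt window size (*)
    inf = float("inf")
    DTW = [[inf] * (m + 1) for _ in range(n + 1)]
    DTW[0][0] = 0
    for i in range(1, n + 1):
        # Only cells inside the band around the diagonal are filled in.
        for j in range(max(1, i - w), min(m, i + w) + 1):
            cost = d(s[i - 1], t[j - 1])
            DTW[i][j] = cost + min(
                DTW[i - 1][j],      # insertion
                DTW[i][j - 1],      # deletion
                DTW[i - 1][j - 1],  # match
            )
    return DTW[n][m]
```

With w = 0 and equal lengths only the diagonal is allowed, so the result is the plain sum of point-by-point costs; a large w gives the unconstrained DTW distance.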

Fast computation

Computing DTW takes O(N^2) time in general. Fast techniques for computing DTW include SparseDTW[1] and FastDTW.[2] A common task, retrieval of similar time series, can be accelerated by using lower bounds such as LB_Keogh[3] or LB_Improved.[4] In a survey, Wang et al. reported slightly better results with the LB_Improved lower bound than with the LB_Keogh bound, and found that other techniques were inefficient.[5]
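As a rough sketch of how a lower bound speeds up retrieval, the Python block below computes an envelope bound in the spirit of LB_Keogh and uses it to skip full DTW computations. The function names, the restriction to equal-length sequences, the cost |a - b| and the window handling are all assumptions made for illustration; this is not the exact definition from the cited papers.

```python
def dtw_windowed(s, t, w):
    """DTW with cost |a - b| and the locality constraint |i - j| <= w."""
    n, m = len(s), len(t)
    w = max(w, abs(n - m))
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            cost = abs(s[i - 1] - t[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

def envelope_lower_bound(query, candidate, w):
    """Cheap lower bound on dtw_windowed for equal-length sequences."""
    n = len(query)
    total = 0
    for i, c in enumerate(candidate):
        # Every point the window allows candidate[i] to be matched with.
        window = query[max(0, i - w): min(n, i + w + 1)]
        upper, lower = max(window), min(window)
        if c > upper:
            total += c - upper
        elif c < lower:
            total += lower - c
    return total

def nearest(query, candidates, w):
    """Linear scan that skips DTW whenever the bound rules a candidate out."""
    best, best_dist = None, float("inf")
    for c in candidates:
        if envelope_lower_bound(query, c, w) >= best_dist:
            continue  # cannot beat the best match found so far
        dist = dtw_windowed(query, c, w)
        if dist < best_dist:
            best, best_dist = c, dist
    return best, best_dist
```

The bound holds because each candidate point must be matched with at least one query point inside its window, and that match costs at least the distance to the window's envelope.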

Average sequence

Averaging for dynamic time warping is the problem of finding an average sequence for a set of sequences. The average sequence is the sequence that minimizes the sum of squared distances to the sequences in the set. NLAAF[6] is the exact method for two sequences. For more than two sequences, the problem is related to the multiple alignment problem and requires heuristics. DBA[7] is currently the reference method to average a set of sequences consistently with DTW. COMASA[8] efficiently randomizes the search for the average sequence, using DBA as a local optimization process.

Supervised Learning

A nearest-neighbour classifier can achieve state-of-the-art performance when it uses dynamic time warping as its distance measure.[9]
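A minimal sketch of such a classifier in Python, assuming numeric sequences, the cost |a - b|, and a training set given as (sequence, label) pairs. The names classify_1nn and training_set and the toy labels are hypothetical.

```python
def dtw(s, t):
    """Plain DTW distance with cost |a - b|."""
    n, m = len(s), len(t)
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(s[i - 1] - t[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

def classify_1nn(sample, training_set):
    """Return the label of the training sequence closest to sample under DTW."""
    _, label = min(
        ((dtw(sample, seq), label) for seq, label in training_set),
        key=lambda pair: pair[0],
    )
    return label

# Toy training data, made up for illustration.
training_set = [
    ([0, 1, 2, 1, 0], "bump"),
    ([2, 1, 0, 1, 2], "dip"),
]
```

A stretched bump such as [0, 0, 1, 2, 2, 1, 0] is still labelled "bump", because warping absorbs the repeated points.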

Alternative approach

An alternative technique for DTW is based on functional data analysis, in which the time series are regarded as discretizations of smooth (differentiable) functions of time and therefore continuous mathematics is applied.[10] Optimal nonlinear time warping functions are computed by minimizing a measure of distance of the set of functions to their warped average. Roughness penalty terms for the warping functions may be added, e.g., by constraining the size of their curvature. The resultant warping functions are smooth, which facilitates further processing. This approach has been successfully applied to analyze patterns and variability of speech movements.[11][12]

Open Source software

  • The lbimproved C++ library implements Fast Nearest-Neighbor Retrieval algorithms under the GNU General Public License (GPL). It also provides a C++ implementation of Dynamic Time Warping as well as various lower bounds.
  • The FastDTW library is a Java implementation of DTW and a FastDTW implementation that provides optimal or near-optimal alignments with an O(N) time and memory complexity, in contrast to the O(N^2) requirement for the standard DTW algorithm. FastDTW uses a multilevel approach that recursively projects a solution from a coarser resolution and refines the projected solution.
  • FastDTW fork (Java) published to Maven Central
  • The R package dtw implements most known variants of the DTW algorithm family, including a variety of recursion rules (also called step patterns), constraints, and substring matching.
  • The mlpy Python library implements DTW.
  • The pydtw C++/Python library implements the Manhattan and Euclidean flavoured DTW measures including the LB_Keogh lower bounds.
  • The dtw Python library.
  • The cudadtw C++/CUDA library implements subsequence alignment of Euclidean-flavoured DTW and z-normalized Euclidean Distance similar to the popular UCR-Suite on CUDA-enabled accelerators.
  • The JavaML machine learning library implements DTW.
  • The ndtw C# library implements DTW with various options.
  • Sketch-a-Char uses Greedy DTW (implemented in JavaScript) as part of a LaTeX symbol classifier program.
  • MatchBox implements DTW to match Mel-Frequency Cepstral Coefficients of audio signals.
  • Sequence averaging: a GPL Java implementation of DBA.[7]
  • A C/Python library implements DTW with some variations (distance functions, step patterns and windows).

Applications

Spoken word recognition

Due to different speaking rates, a non-linear fluctuation occurs in the speech pattern along the time axis, and it needs to be eliminated.[13] DP-matching, a pattern matching algorithm discussed in the paper "Dynamic Programming Algorithm Optimization For Spoken Word Recognition" by Hiroaki Sakoe and Seibi Chiba, uses a time normalisation effect in which the fluctuations in the time axis are modeled using a non-linear time-warping function. Considering any two speech patterns, we can get rid of their timing differences by warping the time axis of one so that maximum coincidence is attained with the other. Moreover, if the warping function is allowed to take any possible value, very little distinction can be made between words belonging to different categories. So, to enhance the distinction between words belonging to different categories, restrictions were imposed on the slope of the warping function.

References

  1. Al-Naymat, G., Chawla, S., & Taheri, J. (2012). SparseDTW: A Novel Approach to Speed up Dynamic Time Warping.
  2. Stan Salvador & Philip Chan, FastDTW: Toward Accurate Dynamic Time Warping in Linear Time and Space. KDD Workshop on Mining Temporal and Sequential Data, pp. 70-80, 2004.
  3. Keogh, E.; Ratanamahatana, C. A. (2005). "Exact indexing of dynamic time warping". Knowledge and Information Systems 7 (3): 358–386. doi:10.1007/s10115-004-0154-9.
  4. Lemire, D. (2009). "Faster Retrieval with a Two-Pass Dynamic-Time-Warping Lower Bound". Pattern Recognition 42 (9): 2169–2180. doi:10.1016/j.patcog.2008.11.030.
  5. Wang, Xiaoyue; et al. "Experimental comparison of representation methods and distance measures for time series data". Data Mining and Knowledge Discovery 2010: 1–35.
  6. Gupta, L.; Molfese, D. L.; Tammana, R.; Simos, P. G. (1996). "Nonlinear alignment and averaging for estimating the evoked potential". IEEE Transactions on Biomedical Engineering 43 (4): 348–356. doi:10.1109/10.486255. PMID 8626184.
  7. Petitjean, F. O.; Ketterlin, A.; Gançarski, P. (2011). "A global averaging method for dynamic time warping, with applications to clustering". Pattern Recognition 44 (3): 678. doi:10.1016/j.patcog.2010.09.013.
  8. Petitjean, F. O.; Gançarski, P. (2012). "Summarizing a set of time series by averaging: From Steiner sequence to compact multiple alignment". Theoretical Computer Science 414: 76. doi:10.1016/j.tcs.2011.09.029.
  9. Ding, Hui; Trajcevski, Goce; Scheuermann, Peter; Wang, Xiaoyue; Keogh, Eamonn (2008). "Querying and mining of time series data: experimental comparison of representations and distance measures". Proc. VLDB Endow 1 (2): 1542–1552. doi:10.14778/1454159.1454226.
  10. Lucero, J. C.; Munhall, K. G.; Gracco, V. G.; Ramsay, J. O. (1997). "On the Registration of Time and the Patterning of Speech Movements". Journal of Speech, Language, and Hearing Research 40: 1111–1117.
  11. Howell, P.; Anderson, A.; Lucero, J. C. (2010). "Speech motor timing and fluency". In Maassen, B.; van Lieshout, P. Speech Motor Control: New Developments in Basic and Applied Research. Oxford University Press. pp. 215–225. ISBN 978-0199235797.
  12. Koenig, Laura L.; Lucero, Jorge C.; Perlman, Elizabeth (2008). "Speech production variability in fricatives of children and adults: Results of functional data analysis". The Journal of the Acoustical Society of America 124 (5): 3158–3170. doi:10.1121/1.2981639. ISSN 0001-4966. PMC 2677351. PMID 19045800.
  13. Sakoe, Hiroaki; Chiba, Seibi. "Dynamic programming algorithm optimization for spoken word recognition". IEEE Transactions on Acoustics, Speech and Signal Processing 26 (1): 43–49. doi:10.1109/tassp.1978.1163055.

Further reading

  • Vintsyuk, T.K. (1968). "Speech discrimination by dynamic programming". Kibernetika 4: 81–88.
  • Sakoe, H.; Chiba (1978). "Dynamic programming algorithm optimization for spoken word recognition". IEEE Transactions on Acoustics, Speech and Signal Processing 26 (1): 43–49. doi:10.1109/tassp.1978.1163055.
  • C. S. Myers and L. R. Rabiner.
    A comparative study of several dynamic time-warping algorithms for connected word recognition.
    The Bell System Technical Journal, 60(7):1389-1409, September 1981.
  • L. R. Rabiner and B. Juang.
    Fundamentals of speech recognition.
    Prentice-Hall, Inc., 1993 (Chapter 4)
  • Muller, M., Information Retrieval for Music and Motion, Ch. 4 (available online at http://www.springer.com/cda/content/document/cda_downloaddocument/9783540740476-1.pdf?SGWID=0-0-45-452103-p173751818), Springer, 2007, ISBN 978-3-540-74047-6
  • Rakthanmanon, Thanawin (September 2013). "Addressing Big Data Time Series: Mining Trillions of Time Series Subsequences Under Dynamic Time Warping". ACM Transactions on Knowledge Discovery from Data 7 (3): 10:1–10:31. doi:10.1145/2510000/2500489.
