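SVD (Singular Value Decomposition)
Pros and cons of the algorithm:
- Pros: simplifies the data, removes noise, and improves the results of downstream algorithms
- Cons: the transformed data can be hard to interpret
- Works with: numeric data
Idea behind the algorithm: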
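In many cases a small part of the data carries most of its information. Linear algebra offers many matrix factorization techniques that rewrite a matrix in a new form that is easier to work with, and different methods suit different situations. The most common one is SVD, which factors an m×n data matrix into three matrices: U (m×m), Sigma (m×n) and VT (n×n). Sigma is a diagonal matrix whose diagonal entries are the singular values, and these tell us which features matter most.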
The implementation here again relies on a numpy function, linalg.svd().
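As a minimal sketch of the decomposition described above, the snippet below factors a small made-up matrix with numpy's linalg.svd() and multiplies the factors back together. The matrix values and the variable names are illustrative assumptions, not part of the original example. Note that numpy returns the singular values as a 1-D array, so the m×n Sigma has to be filled in by hand.

```python
import numpy as np

# A small made-up 4x3 data matrix (illustrative values only).
A = np.array([[1.0, 2.0, 0.0],
              [2.0, 4.0, 0.0],
              [0.0, 0.0, 3.0],
              [1.0, 2.0, 3.0]])
m, n = A.shape

# U is m x m, VT is n x n; sigma holds min(m, n) singular values, largest first.
U, sigma, VT = np.linalg.svd(A)

# Place the singular values on the diagonal of an m x n matrix.
Sigma = np.zeros((m, n))
Sigma[:len(sigma), :len(sigma)] = np.diag(sigma)

# Multiplying the three factors back together recovers A (up to rounding).
A_rebuilt = U @ Sigma @ VT
print(U.shape, sigma.shape, VT.shape)
print(np.allclose(A, A_rebuilt))
```

Keeping only the first few entries of sigma, together with the matching columns of U and rows of VT, gives the low-rank approximation used for compression.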
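Example: image compression with SVD
The data set here is one of the samples used earlier for handwriting recognition; each image is 32*32 = 1024 pixels. Because the data turns into floating-point numbers after SVD, the print function has to be adapted by adding a threshold (the value chosen affects how the result is displayed). After compression, keeping two singular values requires only those two values plus the matching parts of U and VT: 64 + 64 + 2 = 130 numbers in total, a compression ratio of close to 8x (1024/130 ≈ 7.9), and the restored image is largely unchanged. The run shown here keeps three singular values (numSV=3), which takes 96 + 96 + 3 = 195 numbers.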
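The storage arithmetic and the thresholding step from the paragraph above can be sketched in a few lines. The helper names compressed_size and binarize, and the sample row of floats, are hypothetical and introduced only for illustration.

```python
def compressed_size(m, n, k):
    """Numbers to store when keeping k singular values of an m x n matrix."""
    # k columns of U, k singular values, k rows of VT.
    return m * k + k + k * n


def binarize(row, thresh=0.8):
    """Turn a row of floats back into a string of 0/1 characters."""
    return ''.join('1' if v > thresh else '0' for v in row)


original = 32 * 32
for k in (2, 3):
    size = compressed_size(32, 32, k)
    print(k, size, round(original / size, 1))
# 2 130 7.9
# 3 195 5.3

# The same made-up float row rendered with two different thresholds.
row = [0.05, 0.62, 0.93, 1.04, 0.71, -0.02]
print(binarize(row, thresh=0.8))  # 001100
print(binarize(row, thresh=0.5))  # 011110
```

A lower threshold turns more borderline values into 1s, which is why the choice of threshold changes how the reconstructed image looks.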
Output of a run (the original matrix printed first is the input data):
*********orignal matrix**************
00000000000000110000000000000000
00000000000011111100000000000000
00000000000111111110000000000000
00000000001111111111000000000000
00000000111111111111100000000000
00000001111111111111110000000000
00000000111111111111111000000000
00000000111111100001111100000000
00000001111111000001111100000000
00000011111100000000111100000000
00000011111100000000111110000000
00000011111100000000011110000000
00000011111100000000011110000000
00000001111110000000001111000000
00000011111110000000001111000000
00000011111100000000001111000000
00000001111100000000001111000000
00000011111100000000001111000000
00000001111100000000001111000000
00000001111100000000011111000000
00000000111110000000001111100000
00000000111110000000001111100000
00000000111110000000001111100000
00000000111110000000011111000000
00000000111110000000111111000000
00000000111111000001111110000000
00000000011111111111111110000000
00000000001111111111111110000000
00000000001111111111111110000000
00000000000111111111111000000000
00000000000011111111110000000000
00000000000000111111000000000000
****reconstructed matrix using 3 singular values******
00000000000000000000000000000000
00000000000000000000000000000000
00000000000010111110000000000000
00000000000011111110000000000000
00000000000111111111000000000000
00000000001111111111110000000000
00000000001111111111110000000000
00000000011100000000111000000000
00000000111100000000111100000000
00000001111100000000111100000000
00000001111100000000011100000000
00000001111100000000011100000000
00000001111100000000011100000000
00000000111100000000001111000000
00000000111100000000001111000000
00000000111100000000001111000000
00000000111100000000001111000000
00000000111100000000001111000000
00000000111100000000001111000000
00000000111100000000001110000000
00000000111100000000001111000000
00000000111100000000001111000000
00000000111100000000001111000000
00000000111100000000001111000000
00000000111100000000001110000000
00000000111100000000111100000000
00000000001111111111111000000000
00000000001111111111110000000000
00000000001111111111110000000000
00000000000011111111110000000000
00000000000011111111100000000000
00000000000000000000000000000000
Code:

    # coding=utf-8
    import numpy as np


    def printMat(inMat, thresh=0.8):
        """Print a 32x32 matrix as 0/1 characters, thresholding each entry."""
        for i in range(32):
            for j in range(32):
                # After SVD the entries are floats, so binarize them for display.
                print(1 if float(inMat[i, j]) > thresh else 0, end='')
            print()


    def imgCompress(numSV=3, thresh=0.8):
        # Read the 32x32 digit image: one row of '0'/'1' characters per line.
        myl = []
        with open('0_5.txt') as f:
            for line in f:
                myl.append([int(line[i]) for i in range(32)])
        myMat = np.array(myl)
        print('*********orignal matrix**************')
        printMat(myMat, thresh)
        # sigma comes back as a 1-D array of singular values, largest first.
        U, sigma, VT = np.linalg.svd(myMat)
        # Rebuild the image from the top numSV singular values only.
        SigRecon = np.diag(sigma[:numSV])
        reconMat = U[:, :numSV] @ SigRecon @ VT[:numSV, :]
        print("****reconstructed matrix using %d singular values******" % numSV)
        printMat(reconMat, thresh)


    def main():
        imgCompress()


    if __name__ == '__main__':
        main()
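One rough way to judge the effect of numSV is to count how many pixels change after reconstruction and thresholding. The sketch below does that on a synthetic 32x32 image, since 0_5.txt is not reproduced here. The helpers reconstruct and count_mismatches and the hollow-rectangle image are assumptions made for illustration, not part of the original program.

```python
import numpy as np


def reconstruct(mat, num_sv):
    """Rank-num_sv approximation of mat built from its largest singular values."""
    U, sigma, VT = np.linalg.svd(mat)
    return U[:, :num_sv] @ np.diag(sigma[:num_sv]) @ VT[:num_sv, :]


def count_mismatches(mat, num_sv, thresh=0.8):
    """Number of pixels that differ after reconstruction and thresholding."""
    binary = (reconstruct(mat, num_sv) > thresh).astype(int)
    return int(np.sum(binary != mat))


# Stand-in for the digit file: a hollow rectangle drawn on a 32x32 grid.
img = np.zeros((32, 32), dtype=int)
img[4:28, 8:24] = 1
img[9:23, 13:19] = 0

# Print the mismatch count for several choices of num_sv.
for k in (1, 2, 3, 5):
    print(k, count_mismatches(img, k))
```

This stand-in image has rank 2, so two singular values already reproduce it exactly. A real handwritten digit is less regular and generally needs more, which is the trade-off that numSV controls.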