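SVD (Singular Value Decomposition)
Pros and cons:
- Pros: simplifies the data, removes noise, and can improve the results of downstream algorithms
- Cons: the transformed data can be hard to interpret
- Works with: numeric data
Idea: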
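In many cases a small part of the data carries most of its information. Linear algebra offers many matrix factorization techniques that rewrite a matrix in a new form that is easier to work with, and different methods suit different situations. The most common one is SVD, which factors an m×n data matrix into three matrices: U (m×m), Σ (m×n) and VT (n×n). Σ is a diagonal matrix whose diagonal entries are the singular values, conventionally sorted from largest to smallest, and they tell us which features of the data are important.
The implementation here again uses NumPy's linalg.svd() function.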
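As a minimal sketch of what linalg.svd() hands back (the small matrix `A` is made-up illustration data, not part of the dataset): NumPy returns the singular values as a 1-D array rather than as the full m×n matrix, so Σ has to be rebuilt before the three factors can be multiplied back together.

```python
import numpy as np

# Hypothetical 4x3 example matrix (illustration only).
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 1.0],
              [2.0, 0.0, 1.0],
              [1.0, 1.0, 1.0]])

# Full SVD (the default): U is m x m, VT is n x n,
# sigma is a 1-D array holding min(m, n) singular values.
U, sigma, VT = np.linalg.svd(A)
print(U.shape, sigma.shape, VT.shape)   # (4, 4) (3,) (3, 3)

# Rebuild the m x n Sigma with the singular values on its diagonal.
Sigma = np.zeros(A.shape)
Sigma[:len(sigma), :len(sigma)] = np.diag(sigma)

# Multiplying the three factors recovers A up to rounding error.
A_rebuilt = U @ Sigma @ VT
print(np.allclose(A, A_rebuilt))        # True
```

Because the singular values come back sorted from largest to smallest, keeping only the first few entries of `sigma` is what the compression example relies on.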
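Example: image compression with SVD
The data here is one of the samples used earlier for handwriting recognition, a 32×32 = 1024-pixel image. After SVD the reconstructed data is a set of floating-point numbers rather than 0/1 pixels, so the print function is modified to apply a threshold (the choice of threshold affects how the result is displayed). With two singular values we only need to keep those two values plus the matching parts of U and VT: 32×2 + 2×32 + 2 = 64 + 64 + 2 = 130 values instead of 1024. That is a compression ratio of roughly 8 to 1 (1024/130 ≈ 7.9), and the restored image is largely unchanged. Note that the listing below defaults to numSV=3, which would store 96 + 96 + 3 = 195 values, or about 5 to 1.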
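The storage arithmetic above can be checked with a short sketch. The helper names `storage_count` and `rank_k_approx` and the synthetic image are assumptions made for illustration; they are not part of the original code or dataset.

```python
import numpy as np

def storage_count(m, n, k):
    """Values kept by a rank-k SVD: k columns of U, k rows of VT, k singular values."""
    return m * k + k * n + k

def rank_k_approx(mat, k):
    """Rebuild `mat` from its k largest singular values."""
    U, sigma, VT = np.linalg.svd(mat)
    return U[:, :k] @ np.diag(sigma[:k]) @ VT[:k, :]

m = n = 32
for k in (2, 3):
    kept = storage_count(m, n, k)
    print(k, kept, round(m * n / kept, 2))
# 2 130 7.88
# 3 195 5.25

# Synthetic 32x32 binary image: a filled rectangle with a rectangular hole.
# It is the difference of two rank-1 blocks, so its rank is 2 and two
# singular values are enough to reproduce it (illustration only).
img = np.zeros((m, n))
img[4:28, 8:24] = 1
img[10:22, 13:19] = 0

# Threshold the floating-point reconstruction back to 0/1 pixels.
approx = (rank_k_approx(img, 2) > 0.8).astype(int)
print(int((approx != img).sum()), "pixels differ after thresholding")
```

A real handwritten digit is not exactly low rank, so some pixels will differ after thresholding; the synthetic image only makes the bookkeeping easy to verify.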
The data:

Output:
    *********orignal matrix**************
    00000000000000110000000000000000
    00000000000011111100000000000000
    00000000000111111110000000000000
    00000000001111111111000000000000
    00000000111111111111100000000000
    00000001111111111111110000000000
    00000000111111111111111000000000
    00000000111111100001111100000000
    00000001111111000001111100000000
    00000011111100000000111100000000
    00000011111100000000111110000000
    00000011111100000000011110000000
    00000011111100000000011110000000
    00000001111110000000001111000000
    00000011111110000000001111000000
    00000011111100000000001111000000
    00000001111100000000001111000000
    00000011111100000000001111000000
    00000001111100000000001111000000
    00000001111100000000011111000000
    00000000111110000000001111100000
    00000000111110000000001111100000
    00000000111110000000001111100000
    00000000111110000000011111000000
    00000000111110000000111111000000
    00000000111111000001111110000000
    00000000011111111111111110000000
    00000000001111111111111110000000
    00000000001111111111111110000000
    00000000000111111111111000000000
    00000000000011111111110000000000
    00000000000000111111000000000000
    ****reconstructed matrix using 3 singular values******
    00000000000000000000000000000000
    00000000000000000000000000000000
    00000000000010111110000000000000
    00000000000011111110000000000000
    00000000000111111111000000000000
    00000000001111111111110000000000
    00000000001111111111110000000000
    00000000011100000000111000000000
    00000000111100000000111100000000
    00000001111100000000111100000000
    00000001111100000000011100000000
    00000001111100000000011100000000
    00000001111100000000011100000000
    00000000111100000000001111000000
    00000000111100000000001111000000
    00000000111100000000001111000000
    00000000111100000000001111000000
    00000000111100000000001111000000
    00000000111100000000001111000000
    00000000111100000000001110000000
    00000000111100000000001111000000
    00000000111100000000001111000000
    00000000111100000000001111000000
    00000000111100000000001111000000
    00000000111100000000001110000000
    00000000111100000000111100000000
    00000000001111111111111000000000
    00000000001111111111110000000000
    00000000001111111111110000000000
    00000000000011111111110000000000
    00000000000011111111100000000000
    00000000000000000000000000000000
Code (ported to Python 3):

    import numpy as np


    def printMat(inMat, thresh=0.8):
        """Print a 32x32 matrix as rows of 0/1, thresholding each entry."""
        for i in range(32):
            print(''.join('1' if float(inMat[i, j]) > thresh else '0'
                          for j in range(32)))


    def imgCompress(numSV=3, thresh=0.8):
        # Read the 32x32 binary digit image, one row of characters per line.
        myl = []
        with open('0_5.txt') as f:
            for line in f:
                myl.append([int(line[i]) for i in range(32)])
        myMat = np.array(myl)
        print('*********orignal matrix**************')
        printMat(myMat, thresh)

        # sigmal is a 1-D array of singular values, largest first.
        U, sigmal, VT = np.linalg.svd(myMat)

        # Keep only the numSV largest singular values and rebuild the image.
        SigRecon = np.diag(sigmal[:numSV])
        reconMat = U[:, :numSV] @ SigRecon @ VT[:numSV, :]
        print("****reconstructed matrix using %d singular values******" % numSV)
        printMat(reconMat, thresh)


    def main():
        imgCompress()


    if __name__ == '__main__':
        main()
