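While tidying up my pictures recently I found that a lot of them were nearly identical, so I wrote the code below to delete the duplicates. There are two methods.
Note: Method 1 is fast, but it is only reliably accurate for sequential images (for example, frames captured from the same video), because it makes just n-1 comparisons for n files. Method 2 is more accurate but slower: in the worst case, when nothing is similar, it compares every pair, which is n(n-1)/2 comparisons.
Method 1: compare each pair of adjacent files. If the two are similar, add the second one to a new list. When the pass is finished, delete everything in the new list in one go. (Each file can be added at most once, so the code does not need a separate deduplication step.)
Example: given files 1-10, first compare 1 with 2. If they are similar, add 2 to the new list. Then compare 2 with 3. If they are not similar, move on and compare 3 with 4, and so on to the end. Finally, delete the images in the new list.
The code is as follows: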
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# @Time    : 2019/1/15 9:19
# @Author  : xiaodai
import os

import cv2
# compare_ssim was renamed structural_similarity and moved to skimage.metrics;
# the old skimage.measure import no longer exists in current scikit-image.
# channel_axis=-1 (scikit-image >= 0.19) replaces multichannel=True.
from skimage.metrics import structural_similarity

# import shutil
# def yidong(filename1, filename2):
#     shutil.move(filename1, filename2)


def delete(filename1):
    os.remove(filename1)


if __name__ == '__main__':
    path = r'D:\camera_pic\test\rec_pic'
    # save_path_img = r'E:\0115_test\rec_pic'
    # os.makedirs(save_path_img, exist_ok=True)
    imgs_n = []  # files judged similar to the file just before them
    img_files = [os.path.join(rootdir, file)
                 for rootdir, _, files in os.walk(path)
                 for file in files if file.endswith('.jpg')]

    # Stop one short of the end so that currIndex + 1 is always a valid index
    # (this also covers a folder that holds zero or one image).
    for currIndex in range(len(img_files) - 1):
        if not os.path.exists(img_files[currIndex]):
            print('not exist', img_files[currIndex])
            break
        img = cv2.imread(img_files[currIndex])
        img1 = cv2.imread(img_files[currIndex + 1])
        # cv2.imread returns None for unreadable files, and SSIM needs equal shapes.
        if img is None or img1 is None or img.shape != img1.shape:
            print('skip', img_files[currIndex], img_files[currIndex + 1])
            continue
        ssim = structural_similarity(img, img1, channel_axis=-1)
        if ssim > 0.9:
            imgs_n.append(img_files[currIndex + 1])
            print(img_files[currIndex], img_files[currIndex + 1], ssim)
        else:
            print('small_ssim', img_files[currIndex], img_files[currIndex + 1], ssim)

    for image in imgs_n:
        # yidong(image, save_path_img)
        delete(image)
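To see what the adjacent-pair pass does without any image handling, here is a minimal sketch that runs on plain numbers. The names `flag_adjacent_duplicates` and `close` are made up for this illustration, and `close` is only a toy stand-in for the SSIM > 0.9 test.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def flag_adjacent_duplicates(items: Sequence[T],
                             is_similar: Callable[[T, T], bool]) -> List[T]:
    """Return every item that is similar to the item directly before it."""
    flagged = []
    # zip pairs each item with its successor: (0, 1), (1, 2), (2, 3), ...
    for prev, curr in zip(items, items[1:]):
        if is_similar(prev, curr):
            flagged.append(curr)
    return flagged


def close(a: int, b: int) -> bool:
    """Toy similarity test (hypothetical): numbers differing by at most 1."""
    return abs(a - b) <= 1


frames = [1, 2, 3, 10, 11, 20]
print(flag_adjacent_duplicates(frames, close))  # [2, 3, 11]
```

Two things stand out in this toy run. First, 3 is flagged because it is close to 2, even though 2 is itself going to be deleted and 3 is not close to 1, so a slow drift across many frames can remove more than intended. Second, duplicates that are not next to each other (for example `[1, 10, 1]`) are never compared. Both are reasons this method suits sequential frames best.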
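Method 2: compare each file against every file after it. If a later file is similar, remove it from the original list and add it to a new list; if not, keep going.
Example: given files 1-10, first compare 1 with 2. If they are similar, remove 2 from the original list and add it to the new list. Then compare 1 with 3. If they are not similar, go on to compare 1 with 4, and so on through the last file. The next round would normally start from 2, but 2 has already been removed, so it starts from 3 and repeats the same procedure. At the end, delete everything in the new list.
The code is as follows: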
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# @Time    : 2019/1/16 12:03
# @Author  : xiaodai
import os
import shutil

import cv2
# compare_ssim was renamed structural_similarity and moved to skimage.metrics;
# channel_axis=-1 (scikit-image >= 0.19) replaces multichannel=True.
from skimage.metrics import structural_similarity


def yidong(filename1, filename2):
    # Optional alternative to delete(): move the file instead of removing it.
    shutil.move(filename1, filename2)


def delete(filename1):
    # The original also printed an elapsed time here using two undefined
    # variables, which raised NameError; that line has been dropped.
    os.remove(filename1)


def load_small(filename):
    # Shrink to 46x46 so that each SSIM computation is cheap.
    img = cv2.imread(filename)
    return cv2.resize(img, (46, 46), interpolation=cv2.INTER_CUBIC)


if __name__ == '__main__':
    path = r'F:\temp\demo'
    # save_path_img = r'F:\temp\demo_save'
    # os.makedirs(save_path_img, exist_ok=True)
    for root, dirs, _ in os.walk(path):
        for dirc in dirs:
            if dirc != 'rec_pic':
                continue
            img_path = os.path.join(root, dirc)
            del_list = []
            img_files = [os.path.join(rootdir, file)
                         for rootdir, _, files in os.walk(img_path)
                         for file in files if file.endswith('.jpg')]
            # img_files shrinks as duplicates are popped; only entries after
            # currIndex are ever removed, so the outer iteration stays valid.
            for currIndex, filename in enumerate(img_files):
                if not os.path.exists(filename):
                    print('not exist', filename)
                    break
                img = load_small(filename)
                j = currIndex + 1
                while j < len(img_files):
                    if os.path.getsize(img_files[j]) < 512:
                        # Files under 512 bytes are queued for deletion without comparison.
                        del_list.append(img_files.pop(j))
                        continue
                    ssim = structural_similarity(img, load_small(img_files[j]),
                                                 channel_axis=-1)
                    if ssim > 0.9:
                        print(filename, img_files[j], ssim)
                        # After the pop, the next file slides into slot j.
                        del_list.append(img_files.pop(j))
                    else:
                        print('small_ssim', filename, img_files[j], ssim)
                        j += 1
            for image in del_list:
                # yidong(image, save_path_img)
                delete(image)
                print('delete', image)
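The pop-while-scanning logic of Method 2 is easier to follow on plain numbers. The sketch below is only an illustration: `split_unique_and_duplicates` and `close` are hypothetical names, and `close` is a toy stand-in for the SSIM > 0.9 test.

```python
def split_unique_and_duplicates(items, is_similar):
    """Compare each kept item with every later one; pull out the similar ones."""
    kept = list(items)   # work on a copy so the caller's list is untouched
    removed = []
    i = 0
    while i < len(kept):
        j = i + 1
        while j < len(kept):
            if is_similar(kept[i], kept[j]):
                # Do not advance j: the next candidate slides into slot j.
                removed.append(kept.pop(j))
            else:
                j += 1
        i += 1
    return kept, removed


def close(a, b):
    """Toy similarity test (hypothetical): numbers differing by at most 1."""
    return abs(a - b) <= 1


kept, removed = split_unique_and_duplicates([1, 10, 2, 11, 1, 20], close)
print(kept)     # [1, 10, 20]
print(removed)  # [2, 1, 11]
```

Unlike the adjacent-pair pass, this finds duplicates anywhere in the list, and every comparison is made against a file that is actually kept. With the same toy test, the input `[1, 2, 3]` keeps `[1, 3]` here, whereas the adjacent-pair approach would flag both 2 and 3.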
If you have a better approach, please leave a comment.