Converting file encodings with Python


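I previously wrote up a way to convert file encodings with PowerShell, but PowerShell's support is fairly limited, so it was not very pleasant to use. Here is another version written in Python.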

Approach

Detect the encoding of the source file. If it is not UTF-8, convert it; otherwise skip it.

Code

import chardet
import sys
import codecs


def findEncoding(s):
    # Read the raw bytes and let chardet guess the encoding (may be None).
    with open(s, mode='rb') as file:
        buf = file.read()
    return chardet.detect(buf)['encoding']


def convertEncoding(s):
    encoding = findEncoding(s)
    if encoding is None:
        # chardet could not make a guess, e.g. for an empty or binary file.
        print("%s encoding is unknown, skipped" % s)
    elif encoding.lower() not in ('utf-8', 'ascii'):
        print("convert %s (%s) to utf-8" % (s, encoding))
        # Decode with the detected encoding, then rewrite the file in place.
        with codecs.open(s, "r", encoding) as sourceFile:
            contents = sourceFile.read()

        with codecs.open(s, "w", "utf-8") as targetFile:
            targetFile.write(contents)

    else:
        print("%s encoding is %s, there is no need to convert" % (s, encoding))


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: %s filename" % sys.argv[0])
    else:
        convertEncoding(sys.argv[1])

In practice, the conversion works.
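One way to check a conversion without relying on detection is a round trip: write known text in a legacy encoding, convert the file, then decode the bytes as UTF-8 and compare. The sketch below is only an illustration under stated assumptions. `reencode_to_utf8` and `round_trip_ok` are hypothetical helper names, GBK is an arbitrary sample encoding, and the built-in `open` with `newline=""` is used in place of `codecs.open` so that line endings are left untouched.

```python
import os
import tempfile


def reencode_to_utf8(path, source_encoding):
    """Rewrite the file at path from source_encoding to UTF-8, in place."""
    # newline="" disables newline translation, so \r\n stays \r\n.
    with open(path, "r", encoding=source_encoding, newline="") as source_file:
        contents = source_file.read()
    with open(path, "w", encoding="utf-8", newline="") as target_file:
        target_file.write(contents)


def round_trip_ok(text, source_encoding="gbk"):
    """Write text in source_encoding, convert the file, and compare the result."""
    fd, path = tempfile.mkstemp(suffix=".c")
    os.close(fd)
    try:
        with open(path, "wb") as f:
            f.write(text.encode(source_encoding))
        reencode_to_utf8(path, source_encoding)
        with open(path, "rb") as f:
            return f.read().decode("utf-8") == text
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(round_trip_ok("// 中文注释\r\nint main(void) { return 0; }\r\n"))
```

Because the source encoding is supplied by hand here, this checks only the rewrite step, not the accuracy of the detection.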

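Key points

  1. chardet: this module detects encodings. When detection finishes, it returns a dict. The keys of interest are `encoding` and `confidence`, and their names are self-explanatory.
  2. with ... as: this syntax is very handy, especially for opening files. The file is closed automatically, which avoids problems such as a file staying locked because it was never closed.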
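The `confidence` value is not used by the script, but it can guard against weak guesses. The following is a minimal sketch, not part of the original script: `pickEncoding` is a hypothetical helper, the 0.8 threshold is an arbitrary assumption, and the sample dicts are written by hand in the shape that `chardet.detect` returns rather than taken from real output.

```python
def pickEncoding(result, min_confidence=0.8):
    """Return the detected encoding, or None when the guess is missing or weak.

    result is a dict shaped like the one chardet.detect() returns,
    for example {'encoding': 'GB2312', 'confidence': 0.99}.
    """
    encoding = result.get('encoding')
    confidence = result.get('confidence') or 0.0
    if encoding is None or confidence < min_confidence:
        return None
    return encoding


if __name__ == "__main__":
    # Hand-written sample results, not real chardet output.
    print(pickEncoding({'encoding': 'GB2312', 'confidence': 0.99}))
    print(pickEncoding({'encoding': 'windows-1252', 'confidence': 0.4}))
    print(pickEncoding({'encoding': None, 'confidence': 0.0}))
```

A caller could skip any file for which this returns None instead of rewriting it with a doubtful encoding.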

Batch conversion

import chardet
import sys
import codecs
import os


def findEncoding(s):
    # Read the raw bytes and let chardet guess the encoding (may be None).
    with open(s, mode='rb') as file:
        buf = file.read()
    return chardet.detect(buf)['encoding']


def convertEncoding(s):
    # Skip files that cannot be written, since the conversion is in place.
    if not os.access(s, os.W_OK):
        print("%s read only" % s)
        return

    encoding = findEncoding(s)
    if encoding is None:
        # chardet could not make a guess, e.g. for an empty or binary file.
        print("%s encoding is unknown, skipped" % s)
    elif encoding.lower() not in ('utf-8', 'ascii'):
        print("convert %s (%s) to utf-8" % (s, encoding))
        # Decode with the detected encoding, then rewrite the file in place.
        with codecs.open(s, "r", encoding) as sourceFile:
            contents = sourceFile.read()

        with codecs.open(s, "w", "utf-8") as targetFile:
            targetFile.write(contents)

    else:
        print("%s encoding is %s, there is no need to convert" % (s, encoding))


def getAllFile(path, suffix=''):
    """Recursively collect files under path whose names end with suffix.

    The default empty suffix matches every file.
    """
    fpath = []

    for root, dirs, files in os.walk(path):
        for name in files:
            if name.endswith(suffix):
                fpath.append(os.path.join(root, name))

    return fpath


def convertAll(path):
    # Only C sources and headers are converted.
    fclist = getAllFile(path, ".c")
    fhlist = getAllFile(path, ".h")
    flist = fclist + fhlist
    for fname in flist:
        convertEncoding(fname)


if __name__ == "__main__":
    if len(sys.argv) == 1:
        # No argument: work on the current directory.
        path = os.getcwd()
    elif len(sys.argv) == 2:
        path = sys.argv[1]
    else:
        print("error parameter")
        sys.exit(1)

    convertAll(path)

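You can pass a directory, or run it with no argument to use the current directory. Either way, the directory tree is traversed recursively.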

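Key points

  1. os.walk: walks a directory tree and yields every file in it.
  2. os.access: checks a file's permissions, here whether it is writable.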
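The two calls can also be combined in a single pass. The sketch below is an illustrative variant, not the script's own function: `listWritableSources` is a hypothetical name, and it relies on `str.endswith` accepting a tuple of suffixes, so one walk covers both `.c` and `.h` files. The demo creates throwaway files in a temporary directory.

```python
import os
import tempfile


def listWritableSources(path, suffixes=('.c', '.h')):
    """Walk path recursively; return sorted writable files ending in one of suffixes."""
    found = []
    for root, dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            # str.endswith accepts a tuple, so one walk covers every suffix.
            if name.endswith(suffixes) and os.access(full, os.W_OK):
                found.append(full)
    return sorted(found)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as top:
        os.makedirs(os.path.join(top, "sub"))
        for rel in ("a.c", "notes.txt", os.path.join("sub", "b.h")):
            with open(os.path.join(top, rel), "w") as f:
                f.write("")
        for p in listWritableSources(top):
            print(os.path.relpath(p, top))
```

Compared with calling a collector once per suffix, this touches each directory only once, which may matter on large source trees.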

