Unicode 范圍以及python中生成所有Unicode的方法

本文轉載自查看原文 2017-11-02 14:22 1728 Python / Python Web Development

Unicode范圍和表示語言

Unicode是一個通用的字符集，包含了65535個字符。計算機在處理特殊字符（除了ASCII表以外的所有字符）時都是把Unicode按照一種編碼來保存的。當然了，unicode的統一花了不少人的精力，而且不同編碼到今天還有一些不兼容的問題，不過平常的代碼中了解一些基礎也就夠了。

Unicode字符表示語言的范圍參考下文：

http://www.cnblogs.com/chenwenbiao/archive/2011/08/17/2142718.html

中文（包括日文韓文同用）的范圍：

Python生成所有Unicode

python2 版本：

def print_unicode(start, end):
    with open('unicode_set.txt', 'w') as f:
        loc_start = start
        ct = 0
        while loc_start <= end:
            try:
                ustr = hex(loc_start)[2:]
                od = (4 - len(ustr)) * '0' + ustr # 前補0
                ustr = unichr(loc_start) #'\u' + od
                index = loc_start - start + 1
                f.write(str(index) + '\t' + '0x' + od + '\t' + ustr.encode('utf-8', 'ignore'))
                loc_start = loc_start + 1
            except Exception as e:
                traceback.print_exc()
                loc_start += 1
                print(loc_start)

由於python3對編碼的處理方式變化（str和unicode合並，去掉unicode關鍵字；bytes替代python2的str），上述代碼python2不能使用

python3版本如下

import traceback
def print_unicode3(start, end):
    #'wb' must be set, or f.write(str) will report error
    with open('unicode_set.txt', 'wb') as f:
        loc_start = start
        ct = 0
        while loc_start <= end:
            try:
                tmpstr = hex(loc_start)[2:]
                od = (4 - len(tmpstr)) * '0' + tmpstr # 前補0
                ustr = chr(loc_start) #
                index = loc_start - start + 1
                line = (str(index) + '\t' + '0x' + od + '\t' + ustr + '\r\n').encode('utf-8')
                f.write(line)
                loc_start = loc_start + 1
            except Exception as e:
                traceback.print_exc()
                loc_start += 1
                print(loc_start)

def expect_test(expected, actual):
    if expected != actual:
        print('expected ', expected, 'actual', actual)

# 測試：
print_unicode3(0x4e00, 0x9fbf)
expect_test('在', '\u5728')

生成結果

中文

可以看到有些是不能顯示的。

免責聲明！

本站轉載的文章為個人學習借鑒使用，本站對版權不負任何法律責任。如果侵犯了您的隱私權益，請聯系本站郵箱yoyou2525@163.com刪除。

猜您在找 關於python中生成器之Send方法 VS中生成GUID的方法 sqlsever中生成GUID的方法 PHP中生成json信息的方法 Java中生成隨機字符的方法總結 .NET中生成水印更好的方法 Python中生成隨機數 python中生成器generator echart 圖表在.net中生成圖片的方法 python中生成器對象和return 還有循環的區別

Unicode 范圍以及python中生成所有Unicode的方法

Unicode范圍和表示語言

Python生成 所有Unicode

生成結果

免責聲明！

Python生成所有Unicode