A Study of How to Use the text_factory Attribute of Python's sqlite3 Module

This article is a repost. 2015-05-14 16:26

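This article grew out of a script I needed to write: batch-loading CSV files (encoded in GBK or UTF-8) into a SQLite database.

Python version: 2.7.9

The sqlite3 module creates a database with a call such as con = sqlite3.connect("D:\\text_factory.db3") (if the file does not exist, a new database is created). The database encoding defaults to UTF-8, and it can be set with one of these special SQL statements: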

PRAGMA encoding = "UTF-8";
PRAGMA encoding = "UTF-16";
PRAGMA encoding = "UTF-16le";
PRAGMA encoding = "UTF-16be";

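However, the encoding must be set before the main database is created; after that it cannot be changed. See https://www.sqlite.org/pragma.html#pragma_encoding

Most people probably first learn about the text_factory attribute through the following error: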
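The timing rule for PRAGMA encoding can be checked with a short sketch. It is written for Python 3 (an assumption; the article itself uses 2.7.9), and the table name t is made up for the illustration. Per the SQLite documentation linked above, the second PRAGMA is expected to be silently ignored.

```python
import sqlite3

# A fresh in-memory database: nothing has been created yet,
# so the encoding can still be chosen.
con = sqlite3.connect(":memory:")
con.execute('PRAGMA encoding = "UTF-16le"')
before = con.execute("PRAGMA encoding").fetchone()[0]

# Creating the first table (hypothetical name "t") fixes the encoding.
con.execute("CREATE TABLE t (x)")
con.execute('PRAGMA encoding = "UTF-8"')  # too late: expected to be a no-op
after = con.execute("PRAGMA encoding").fetchone()[0]

print(before, after)
```

If both values come back identical, the late PRAGMA had no effect, which is why the test script below sets the encoding immediately after connecting.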

sqlite3.ProgrammingError: You must not use 8-bit bytestrings unless you use a text_factory that can interpret 8-bit bytestrings (like text_factory = str). It is highly recommended that you instead just switch your application to Unicode strings.

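Roughly, it recommends converting strings to unicode strings before inserting them into the database. If you want to use bytestrings (byte strings in an encoding such as ASCII, GBK, or UTF-8), you need to add the statement text_factory = str.

Python 2 has two string types. A standard string (str) is a sequence of bytes and may contain binary data and embedded null characters. A Unicode string (unicode) is a sequence of Unicode code points. Internally each code unit occupies two bytes on a narrow build and four bytes on a wide build; narrow builds represent characters beyond U+FFFF with surrogate pairs, so the full Unicode range can be stored either way.

The default is text_factory = unicode. I originally assumed that unicode and str here were function pointers, but apparently they are not: they are the type objects <type 'unicode'> and <type 'str'> (which are themselves callable).

Below is a piece of test code to verify the behavior: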
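The recommended route, decoding to Unicode before the insert, might look like the following minimal sketch. It uses Python 3 syntax (an assumption; on 2.7 the decoded value would be a unicode object), and the table name cities is made up for the illustration.

```python
import sqlite3

gb_bytes = b"\xB8\xA3\xD6\xDD"     # GBK bytes for 福州
text = gb_bytes.decode("gbk")      # decode first: now a Unicode string

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE cities (city)")   # hypothetical table name
con.execute("INSERT INTO cities (city) VALUES (?)", (text,))

# hex() exposes the bytes SQLite actually stored for the TEXT value.
city, stored_hex = con.execute(
    "SELECT city, hex(city) FROM cities").fetchone()
print(repr(city), stored_hex)
```

The hex dump shows that the module encoded the Unicode string as UTF-8 on the way in, so no text_factory change is needed for this path.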

# -*- coding: utf-8 -*-
import sqlite3
'''
Code points and byte sequences of the two test characters:
GBK   Unicode  UTF-8
B8A3  798F     E7 A6 8F  福
D6DD  5DDE     E5 B7 9E  州
'''

con = sqlite3.connect(":memory:")
# con = sqlite3.connect("D:\\text_factory1.db3")
# con.executescript('PRAGMA encoding = "UTF-16";')
cur = con.cursor()

# The same city name as ASCII, GBK bytes, UTF-8 bytes and a unicode object
a_text      = "Fu Zhou"
gb_text     = "\xB8\xA3\xD6\xDD"
utf8_text   = "\xE7\xA6\x8F\xE5\xB7\x9E"
unicode_text= u"\u798F\u5DDE"

# Part 1: bytestrings are accepted on insert and returned as str
print 'Part 1: con.text_factory=str'
con.text_factory = str
print type(con.text_factory)
cur.execute("CREATE TABLE table1 (city);")
cur.execute("INSERT INTO table1 (city) VALUES (?);",(a_text,))
cur.execute("INSERT INTO table1 (city) VALUES (?);",(gb_text,))
cur.execute("INSERT INTO table1 (city) VALUES (?);",(utf8_text,))
cur.execute("INSERT INTO table1 (city) VALUES (?);",(unicode_text,))
cur.execute("select city from table1")
res = cur.fetchall()
print "--  result: %s"%(res)

# Part 2: the default; the commented-out inserts raise ProgrammingError
print 'Part 2: con.text_factory=unicode'
con.text_factory = unicode
print type(con.text_factory)
cur.execute("CREATE TABLE table2 (city);")
cur.execute("INSERT INTO table2 (city) VALUES (?);",(a_text,))
# cur.execute("INSERT INTO table2 (city) VALUES (?);",(gb_text,))
# cur.execute("INSERT INTO table2 (city) VALUES (?);",(utf8_text,))
cur.execute("INSERT INTO table2 (city) VALUES (?);",(unicode_text,))
cur.execute("select city from table2")
res = cur.fetchall()
print "--  result: %s"%(res)

# Part 3: insert with str, then switch to OptimizedUnicode for reading only
print 'Part 3: OptimizedUnicode'
con.text_factory = str
cur.execute("CREATE TABLE table3 (city);")
cur.execute("INSERT INTO table3 (city) VALUES (?);",(a_text,))
#cur.execute("INSERT INTO table3 (city) VALUES (?);",(gb_text,))
cur.execute("INSERT INTO table3 (city) VALUES (?);",(utf8_text,))
cur.execute("INSERT INTO table3 (city) VALUES (?);",(unicode_text,))
con.text_factory = sqlite3.OptimizedUnicode
print type(con.text_factory)
cur.execute("select city from table3")
res = cur.fetchall()
print "--  result: %s"%(res)

# Part 4: a custom callable that decodes every stored value as GBK
print 'Part 4: custom function'
con.text_factory = lambda x: unicode(x, "gbk", "ignore")
print type(con.text_factory)
cur.execute("CREATE TABLE table4 (city);")
cur.execute("INSERT INTO table4 (city) VALUES (?);",(a_text,))
cur.execute("INSERT INTO table4 (city) VALUES (?);",(gb_text,))
cur.execute("INSERT INTO table4 (city) VALUES (?);",(utf8_text,))
cur.execute("INSERT INTO table4 (city) VALUES (?);",(unicode_text,))
cur.execute("select city from table4")
res = cur.fetchall()
print "--  result: %s"%(res)

Output:

Part 1: con.text_factory=str
<type 'type'>
--  result: [('Fu Zhou',), ('\xb8\xa3\xd6\xdd',), ('\xe7\xa6\x8f\xe5\xb7\x9e',), ('\xe7\xa6\x8f\xe5\xb7\x9e',)]
Part 2: con.text_factory=unicode
<type 'type'>
--  result: [(u'Fu Zhou',), (u'\u798f\u5dde',)]
Part 3: OptimizedUnicode
<type 'type'>
--  result: [('Fu Zhou',), (u'\u798f\u5dde',), (u'\u798f\u5dde',)]
Part 4: custom function
<type 'function'>
--  result: [(u'Fu Zhou',), (u'\u798f\u5dde',), (u'\u7ec2\u5fd3\u7a9e',), (u'\u7ec2\u5fd3\u7a9e',)]

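Part 1: the unicode object was converted to UTF-8, while the UTF-8 and GBK bytestrings were passed through unchanged and written to the database. To print the GBK string after fetching it, it has to be converted with something like 'gbk chars'.decode("cp936").encode("utf_8").

Part 2: the default setting. Either of the commented-out statements would raise the classic error shown above; input is restricted to unicode objects or pure ASCII.

Part 3: automatic optimization. ASCII text comes back as str objects, and non-ASCII text comes back as unicode objects.

Part 4: GBK was converted correctly. The UTF-8 and unicode inputs were both stored in the database in the default encoding, UTF-8, i.e. as '\xe7\xa6\x8f\xe5\xb7\x9e'. Decoding those bytes as GBK: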

In[16]: unicode('\xe7\xa6\x8f\xe5\xb7\x9e','gbk')
Out[16]: u'\u7ec2\u5fd3\u7a9e'

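which is how the result above comes about.

Next, let's look at how the data is actually stored in the database, using other tools.

I looked at it with both the official sqlite3.exe and SqliteSpy. sqlite3.exe is a command-line tool and the console displays text as GBK, whereas SqliteSpy displays text as UTF-8, so the GBK data shows up garbled there. This again confirms that when a GBK string is allowed into the database, the raw bytes are stored; they are not forcibly converted to the database's default encoding, UTF-8.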
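The garbling in Part 4 is reversible as long as no bytes were dropped, as this small Python 3 sketch suggests (the variable names are made up for the illustration). Note that the "ignore" error handler used in Part 4 can discard undecodable bytes, in which case this round trip would not recover the original text.

```python
utf8_bytes = b"\xe7\xa6\x8f\xe5\xb7\x9e"   # 福州 encoded as UTF-8

# Wrong decoding: six UTF-8 bytes read as three 2-byte GBK characters.
garbled = utf8_bytes.decode("gbk")

# Undo it: re-encode with the wrong codec, then decode with the right one.
recovered = garbled.encode("gbk").decode("utf-8")

print(ascii(garbled), ascii(recovered))
```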
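The stored bytes can also be inspected from Python rather than an external viewer. The sketch below is Python 3 and rests on one assumption worth flagging: Python 3 binds bytes parameters as BLOBs, so CAST(? AS TEXT) is used here to imitate the Python 2 text_factory = str insert path, which stores the bytes as TEXT. The table name cities is made up.

```python
import sqlite3

gb_bytes = b"\xB8\xA3\xD6\xDD"   # GBK bytes for 福州

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE cities (city)")   # hypothetical table name
# In a UTF-8 database, CAST(... AS TEXT) relabels the bytes as TEXT
# without transcoding them (assumed stand-in for the Python 2 path).
con.execute("INSERT INTO cities (city) VALUES (CAST(? AS TEXT))", (gb_bytes,))

# typeof() and hex() return plain ASCII, so the default factory can read them.
kind, stored_hex = con.execute(
    "SELECT typeof(city), hex(city) FROM cities").fetchone()

# Reading the value itself requires a factory that does not assume UTF-8.
con.text_factory = bytes
raw = con.execute("SELECT city FROM cities").fetchone()[0]

print(kind, stored_hex, raw)
```

The column is typed as text, yet the hex dump is still the original GBK byte sequence, which matches what the two viewers showed.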

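Connection.text_factory: use this attribute to control what objects are returned for the TEXT type (my note: this also confirms that when writing to the database you have to handle the encoding yourself and cannot rely on this attribute). By default it is set to unicode, and the sqlite3 module returns Unicode objects for TEXT. If you want bytestrings returned instead, set it to str.

For efficiency reasons, there is also a way to return Unicode objects only for non-ASCII data and bytestrings for everything else. To activate it, set this attribute to sqlite3.OptimizedUnicode.

You can also set it to any other callable that accepts a single bytestring parameter and returns the resulting object. (Excerpted from http://www.360doc.com/content/11/1102/10/4910_161017252.shtml)

The passage above is an excerpt of the text_factory description from the Chinese translation of the official documentation.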
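The read-side behavior described in this excerpt can be sketched as follows. This is Python 3, where the default factory is str and the bytestring type is bytes (the Python 2 names unicode and OptimizedUnicode are not used here); fetch_city, describe, and the table name cities are hypothetical names for the illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE cities (city)")   # hypothetical table name
con.execute("INSERT INTO cities (city) VALUES (?)", ("\u798f\u5dde",))

def fetch_city(factory):
    """Read the same row back with a given text_factory."""
    con.text_factory = factory
    return con.execute("SELECT city FROM cities").fetchone()[0]

def describe(raw):
    """Custom factory: report what the module hands to the callable."""
    return (type(raw).__name__, len(raw))

as_text = fetch_city(str)        # decoded text (the Python 3 default)
as_bytes = fetch_city(bytes)     # the stored UTF-8 bytes, untouched
as_custom = fetch_city(describe) # whatever the callable returns

print(ascii(as_text), as_bytes, as_custom)
```

The same row yields three different Python objects, which illustrates that text_factory only shapes what comes out of the database, not what goes in.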

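To sum up, here are my views* and usage recommendations:

1) When the sqlite3 module executes an INSERT, it writes raw data. Before writing, it checks the type of the value according to the text_factory attribute; by default it checks whether the value is a unicode object.

2) When reading from the database with fetchall(), values are converted according to the text_factory attribute.

3) If the input string is a GBK-encoded bytestring, either decode it to unicode before writing, or set text_factory = str and write it directly, in which case it is still GBK when read back. This assumes the database encoding is UTF-8; note that the data appears garbled in SqliteSpy.

4) If the input string is a UTF-8 encoded bytestring, you can set text_factory = str and write and read it directly; SqliteSpy displays it correctly.

5) Unless the scenario is performance-critical, converting to unicode before inserting costs very little. I can no longer find my test data, but rather than spending a whole day researching this one line of code as I did, it is better to let the machine run a few tenths of a second longer each time.

* (I have not read the source code of the sqlite3 module, so these are only guesses.)

In addition, here are the results produced when the database is set to UTF-16 encoding. They are even messier, so this is not recommended.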
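Recommendations 3) to 5) can be combined into one small routine that normalizes everything to Unicode before the insert. This is only a sketch in Python 3 under stated assumptions: to_unicode and the table name cities are hypothetical, and trying UTF-8 first with a GBK fallback is a heuristic, since some GBK byte sequences also happen to be valid UTF-8.

```python
import sqlite3

def to_unicode(raw, fallback="gbk"):
    """Hypothetical helper: try UTF-8 first, then fall back to GBK."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode(fallback)

# The article's test values: ASCII, GBK bytes, and UTF-8 bytes.
samples = [b"Fu Zhou", b"\xB8\xA3\xD6\xDD", b"\xE7\xA6\x8F\xE5\xB7\x9E"]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE cities (city)")   # hypothetical table name
con.executemany("INSERT INTO cities (city) VALUES (?)",
                [(to_unicode(s),) for s in samples])

rows = [r[0] for r in con.execute("SELECT city FROM cities ORDER BY rowid")]
print([ascii(r) for r in rows])
```

With this approach the database only ever holds UTF-8 text, so the default text_factory works and external viewers display the data correctly.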

Part 1: con.text_factory=str
<type 'type'>
--  result: [('Fu Zhou',), ('\xc2\xb8\xc2\xa3\xef\xbf\xbd\xef\xbf\xbd',), ('\xe7\xa6\x8f\xe5\xb7\x9e',), ('\xe7\xa6\x8f\xe5\xb7\x9e',)]
Part 2: con.text_factory=unicode
<type 'type'>
--  result: [(u'Fu Zhou',), (u'\u798f\u5dde',)]
Part 3: OptimizedUnicode
<type 'type'>
--  result: [('Fu Zhou',), (u'\u798f\u5dde',), (u'\u798f\u5dde',)]
Part 4: custom function
<type 'function'>
--  result: [(u'Fu Zhou',), (u'\u8d42\u62e2\u951f\u65a4\u62f7',), (u'\u7ec2\u5fd3\u7a9e',), (u'\u7ec2\u5fd3\u7a9e',)]
