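This article records a problem I hit during development: concurrent operations on a SQLite database caused the database to be locked.
First, a brief overview of SQLite3's characteristics: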
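- SQLite is a lightweight embedded database. It uses very few resources and is fast, efficient and reliable; on an embedded device a few hundred KB of memory may be enough. That is why, since the boom in mobile devices, it has remained one of the most common data-persistence solutions;
- SQLite's API supports multithreaded access (depending on its threading mode, see below), and multithreaded access inevitably raises data-safety concerns;
- SQLite3 supports concurrent read transactions: several processes/threads can read from the database at the same time;
- SQLite3 does not support concurrent write transactions: several processes/threads cannot write to the database at the same time. A write transaction locks the entire database file, not just one table, so no matter how many threads you start, and even if they write to different tables, SQLite ends up executing their writes one after another;
- Local storage only; no network access.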
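A quick way to see that the write lock covers the whole database file rather than a single table is the sketch below. It is not from the original project: it uses a throwaway database in a temp directory with two hypothetical tables `a` and `b`, and opens both connections with `timeout=0` so the conflict is reported immediately instead of after the default 5-second wait.

```python
import os
import sqlite3
import tempfile


def second_writer_on_other_table():
    """Two writers touching *different* tables still conflict, because
    SQLite's write lock is taken on the whole database file."""
    path = os.path.join(tempfile.mkdtemp(), "demo.db")
    setup = sqlite3.connect(path)
    setup.execute("CREATE TABLE a (x INTEGER)")
    setup.execute("CREATE TABLE b (x INTEGER)")
    setup.commit()
    setup.close()

    # timeout=0: report SQLITE_BUSY at once instead of waiting.
    # isolation_level=None: we issue BEGIN ourselves.
    w1 = sqlite3.connect(path, timeout=0, isolation_level=None)
    w2 = sqlite3.connect(path, timeout=0, isolation_level=None)
    try:
        w1.execute("BEGIN")
        w1.execute("INSERT INTO a VALUES (1)")  # w1 now holds the database's RESERVED lock
        w2.execute("BEGIN")
        try:
            w2.execute("INSERT INTO b VALUES (1)")  # other table, same database file
            return "second writer succeeded"
        except sqlite3.OperationalError as exc:
            return str(exc)
    finally:
        w1.close()
        w2.close()


if __name__ == "__main__":
    print(second_writer_on_other_table())
```

The second writer is refused even though it only touches table `b`.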
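Problem 1: Description
During development: SQLite allows only one thread to write at any given moment, but many server programs start many threads, one per client. If several clients issue write requests at the same time, one thread may still be writing and holding the database lock, so the other threads cannot finish their writes within the allowed time and raise an exception: "database is locked". Below I reproduce the problem.
Problem 1: Reproduction
Run the following code, in which several threads write at the same time (it expects an operation_log table to exist in the database):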
```python
# coding:utf-8
"""
Test SQLite database locking.
"""
import threading
import time
import contextlib
import traceback
import sqlite3
import os

# Path = "/tmp"
Path = r"D:\PythonProject\testProject"
Name = "openmptcprouter.db"


class DbError(Exception):
    def __str__(self):
        return "DB Error"


class Job(object):
    """
    An indicator that marks whether the job is finished.
    """

    def __init__(self):
        self._finished = False

    def is_finished(self):
        return self._finished

    def finish(self):
        self._finished = True


@contextlib.contextmanager
def transaction(path=Path, name=Name):
    """
    Automatically handle transaction COMMIT/ROLLBACK. You MUST call trans.finish()
    if you want to COMMIT; otherwise (not called, or an exception occurs), ROLLBACK.

    >>> with transaction() as (trans, cursor):
    ...     cursor.execute(...)
    ...     if something_went_wrong:
    ...         # to skip the commit, simply don't call trans.finish()
    ...         return
    ...     # to commit, call:
    ...     trans.finish()

    @param path: directory containing the database file
    @param name: database file name
    """
    db_path = os.path.join(path, name)
    conn = sqlite3.connect(db_path)  # default timeout: 5 seconds
    # conn.row_factory = dict_factory
    cursor = conn.cursor()
    trans = Job()
    cursor.execute("BEGIN TRANSACTION")
    try:
        yield trans, cursor
        # The COMMIT runs here, i.e. when the caller's with block exits
        if trans.is_finished():
            conn.commit()
        else:
            conn.rollback()
    except Exception:
        conn.rollback()
        raise DbError
    finally:
        cursor.close()
        conn.close()


def write_fun():
    ip = "172.0.0.1"
    user_id = "1"
    path = "/status/vpn"
    params = "{work_fun1}"
    info = "0000 Fetched VPN list status successfully"
    cost_time = "5"
    print("wating to synchronize write")
    ready.wait()  # all threads start writing at (roughly) the same moment
    try:
        print("=================start sqlite connection=================")
        with transaction() as (trans, cursor):
            print("starting to write")
            cursor.execute(
                """
                insert into operation_log(ip,user_id,path,params,info,cost_time)
                values(?,?,?,?,?,?)
                """, (ip, user_id, path, params, info, cost_time))
            print("wating to commit")
            # time.sleep(3)  # delaying here makes "database is locked" appear
            trans.finish()
            # Printed before the real COMMIT, which runs when this with block exits
            print("write commit complete")
        print("=================close sqlite connection=================")
    except Exception:
        print(traceback.format_exc())


if __name__ == '__main__':
    ready = threading.Event()
    threads = [threading.Thread(target=write_fun) for _ in range(3)]
    for t in threads:
        t.start()
    time.sleep(1)
    print("Setting ready")
    ready.set()
    for t in threads:
        t.join()
```
Output:
wating to synchronize write
wating to synchronize write
wating to synchronize write
Setting ready
=================start sqlite connection=================
=================start sqlite connection=================
=================start sqlite connection=================
starting to write
starting to write
starting to write
wating to commit
write commit complete
=================close sqlite connection=================
wating to commit
write commit complete
=================close sqlite connection=================
wating to commit
write commit complete
=================close sqlite connection=================
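The output shows that when three threads write to the database concurrently, the writes do not run in parallel but one after another: while one write is unfinished, the others have to wait.
Then my problem showed up:
If, after executing the SQL but before committing, we block for 3 seconds (add a sleep just before trans.finish()) to stretch out each write, only two threads manage to complete their writes; the remaining one raises a locked-database exception (sqlite3.OperationalError: database is locked).
Note: if you'd rather not add a delay, you can reproduce the lock by running many more threads at once, e.g. 500; in my runs the error usually appeared somewhere between roughly 200 and 300 threads.
Output with the 3-second delay: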
wating to synchronize write
wating to synchronize write
wating to synchronize write
Setting ready
=================start sqlite connection=================
=================start sqlite connection=================
=================start sqlite connection=================
starting to write
starting to write
starting to write
wating to commit
write commit complete
=================close sqlite connection=================
wating to commit
Traceback (most recent call last):
File "D:/PythonProject/testProject/test_lock_sqlite.py", line 67, in transaction
yield trans, cursor
File "D:/PythonProject/testProject/test_lock_sqlite.py", line 100, in write_fun
""", (ip, user_id, path, params, info, cost_time))
sqlite3.OperationalError: database is locked
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "D:/PythonProject/testProject/test_lock_sqlite.py", line 104, in write_fun
print("write commit complete")
File "D:\developer\Python37-64\lib\contextlib.py", line 130, in __exit__
self.gen.throw(type, value, traceback)
File "D:/PythonProject/testProject/test_lock_sqlite.py", line 76, in transaction
raise DbError
DbError: DB Error
write commit complete
=================close sqlite connection=================
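So my first guess was: writes run sequentially, each one waiting for the previous write to finish, and a write can wait about 2 × 3 s = 6 seconds before the database is reported as locked. I guessed this because with 12 threads and a 1-second delay, only 6 threads completed their writes.
Digging further, I found that this waiting time is in fact the connection's timeout setting (the timeout parameter of sqlite3.connect(): how long a connection waits for a lock before giving up), which defaults to 5 seconds. With timeout set to 10 seconds, 12 threads and a 1-second delay, 11 threads completed their writes.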
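To check that the waiting time really is the connection timeout, here is a sketch under these assumptions: a throwaway database in a temp directory, one connection holding the write lock via `BEGIN IMMEDIATE`, and a second connection opened with `timeout=1.0`. Python hands `timeout` to SQLite as a busy timeout in milliseconds, which `PRAGMA busy_timeout` reports back.

```python
import os
import sqlite3
import tempfile
import time


def wait_on_locked_database(timeout):
    """Hold the write lock in one connection and measure how long a second
    connection, opened with the given timeout, waits before giving up."""
    path = os.path.join(tempfile.mkdtemp(), "demo.db")
    holder = sqlite3.connect(path, isolation_level=None)
    waiter = sqlite3.connect(path, timeout=timeout, isolation_level=None)
    try:
        # The timeout (seconds) becomes SQLite's busy timeout (milliseconds)
        busy_ms = waiter.execute("PRAGMA busy_timeout").fetchone()[0]
        holder.execute("CREATE TABLE t (x INTEGER)")
        holder.execute("BEGIN IMMEDIATE")  # take the write lock and keep it
        start = time.monotonic()
        try:
            waiter.execute("BEGIN IMMEDIATE")  # SQLite retries until the timeout runs out
            error = None
        except sqlite3.OperationalError as exc:
            error = str(exc)
        return busy_ms, error, time.monotonic() - start
    finally:
        holder.close()
        waiter.close()


if __name__ == "__main__":
    print(wait_on_locked_database(1.0))
```

The elapsed time should come out close to the timeout, which matches the observation that raising the timeout lets more queued writers finish.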
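Problem 1: Solution
My final solution was to use a thread queue: put every database write operation into the queue, and have a single thread execute the writes from the queue.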
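A minimal sketch of that idea, assuming a hypothetical `SingleWriter` class (the author's actual code is not shown): producer threads only enqueue `(sql, params)` pairs, and one dedicated thread owns the only write connection, so this process never has two SQLite writers at once.

```python
import queue
import sqlite3
import threading

_STOP = object()  # sentinel that tells the writer thread to exit


class SingleWriter:
    """Hypothetical helper: all writes are queued and executed, in order,
    by one dedicated thread that owns the only write connection."""

    def __init__(self, db_path):
        self._db_path = db_path
        self._jobs = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        # The connection is created and used only inside this thread.
        conn = sqlite3.connect(self._db_path)
        try:
            while True:
                job = self._jobs.get()
                if job is _STOP:
                    break
                sql, params = job
                try:
                    with conn:  # commits on success, rolls back on error
                        conn.execute(sql, params)
                except sqlite3.Error as exc:
                    print("write failed:", exc)
        finally:
            conn.close()

    def submit(self, sql, params=()):
        """Called from any thread; returns immediately."""
        self._jobs.put((sql, params))

    def close(self):
        """Let the queued writes finish, then stop the writer thread."""
        self._jobs.put(_STOP)
        self._thread.join()
```

Request threads call `writer.submit("insert into operation_log(...) values(...)", params)` and move on. The trade-off is that callers get no result back; a fuller version might return a `concurrent.futures.Future` per job.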
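Problem 2: Description
During development, when several clients issue write and read requests at the same time and one read request takes too long without closing its connection, it keeps holding its lock on the database. The other writes hang waiting for it and then raise an exception: "database is locked". Below I reproduce the problem.
Problem 2: Reproduction
Run code with 10 writer threads and 1 reader thread: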
```python
# coding:utf-8
"""
Test SQLite database locking.
"""
import threading
import time
import contextlib
import traceback
import sqlite3
import os
import datetime

Path = r"D:\PythonProject\testProject"
Name = "openmptcprouter.db"


class DbError(Exception):
    def __str__(self):
        return "DB Error"


class Job(object):
    """
    An indicator that marks whether the job is finished.
    """

    def __init__(self):
        self._finished = False

    def is_finished(self):
        return self._finished

    def finish(self):
        self._finished = True


@contextlib.contextmanager
def transaction(path=Path, name=Name):
    """
    Automatically handle transaction COMMIT/ROLLBACK. You MUST call trans.finish()
    if you want to COMMIT; otherwise (not called, or an exception occurs), ROLLBACK.

    >>> with transaction() as (trans, cursor):
    ...     cursor.execute(...)
    ...     trans.finish()  # omit this call to roll back instead

    @param path: directory containing the database file
    @param name: database file name
    """
    db_path = os.path.join(path, name)
    conn = sqlite3.connect(db_path, timeout=10)  # wait up to 10 s for a lock
    # conn.row_factory = dict_factory
    cursor = conn.cursor()
    trans = Job()
    cursor.execute("BEGIN TRANSACTION")
    try:
        yield trans, cursor
        # The COMMIT runs here, i.e. when the caller's with block exits
        if trans.is_finished():
            conn.commit()
        else:
            conn.rollback()
    except Exception:
        conn.rollback()
        raise DbError
    finally:
        cursor.close()
        conn.close()


def write_fun():
    ip = "172.0.0.1"
    user_id = "1"
    path = "/status/vpn"
    params = "{work_fun1}"
    info = "0000 Fetched VPN list status successfully"
    cost_time = "5"
    print("wating to synchronize write")
    ready.wait()
    try:
        print("=================start sqlite connection=================")
        with transaction() as (trans, cursor):
            print("starting to write")
            cursor.execute(
                """
                insert into operation_log(ip,user_id,path,params,info,cost_time)
                values(?,?,?,?,?,?)
                """, (ip, user_id, path, params, info, cost_time))
            print("{}:wating to commit".format(datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")))
            # time.sleep(2)
            trans.finish()
            # Printed before the real COMMIT, which runs when this with block exits
            print("{}:write commit complete".format(datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")))
        print("=================close sqlite connection=================")
    except Exception:
        print(traceback.format_exc())


def read_fun(delay):
    print("Wating to read_fun")
    ready.wait()
    # time.sleep(delay)
    with transaction() as (trans, cursor):
        print("connect read_fun")
        cursor.execute("select * from operation_log")
        print("{}:read_fun sleep".format(datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")))
        # The read transaction, and the lock it holds, stays open during this sleep
        time.sleep(delay)
        print("{}:read_fun Done".format(datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")))


if __name__ == '__main__':
    ready = threading.Event()
    threads = [threading.Thread(target=write_fun) for _ in range(10)]
    threads.append(threading.Thread(target=read_fun, args=(15,)))  # one slow reader
    for t in threads:
        t.start()
    time.sleep(1)
    print("Setting ready")
    ready.set()
    for t in threads:
        t.join()
```
Output:
D:\python_XZF\py37env\Scripts\python.exe D:/PythonProject/testProject/test_lock_sqlite.py
wating to synchronize write
wating to synchronize write
wating to synchronize write
wating to synchronize write
wating to synchronize write
wating to synchronize write
wating to synchronize write
wating to synchronize write
wating to synchronize write
wating to synchronize write
Wating to read_fun
Setting ready
=================start sqlite connection=================
=================start sqlite connection=================
=================start sqlite connection=================
=================start sqlite connection=================
=================start sqlite connection=================
=================start sqlite connection=================
=================start sqlite connection=================
=================start sqlite connection=================
=================start sqlite connection=================
=================start sqlite connection=================
starting to writestarting to write
starting to write
starting to write
starting to write
starting to write
connect read_fun
starting to writestarting to write
starting to writestarting to write
2021-10-11 14:19:13:read_fun sleep
2021-10-11 14:19:13:wating to commit
2021-10-11 14:19:13:write commit complete
Traceback (most recent call last):
File "D:/PythonProject/testProject/test_lock_sqlite.py", line 71, in transaction
conn.commit()
sqlite3.OperationalError: database is locked
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "D:/PythonProject/testProject/test_lock_sqlite.py", line 104, in write_fun
print("{}:write commit complete".format(datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")))
File "D:\developer\Python37-64\lib\contextlib.py", line 119, in __exit__
next(self.gen)
File "D:/PythonProject/testProject/test_lock_sqlite.py", line 77, in transaction
raise DbError
DbError: DB Error
2021-10-11 14:19:24:wating to commit
2021-10-11 14:19:24:write commit complete
Traceback (most recent call last):
File "D:/PythonProject/testProject/test_lock_sqlite.py", line 68, in transaction
yield trans, cursor
File "D:/PythonProject/testProject/test_lock_sqlite.py", line 100, in write_fun
""", (ip, user_id, path, params, info, cost_time))
sqlite3.OperationalError: database is locked
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "D:/PythonProject/testProject/test_lock_sqlite.py", line 104, in write_fun
print("{}:write commit complete".format(datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")))
File "D:\developer\Python37-64\lib\contextlib.py", line 130, in __exit__
self.gen.throw(type, value, traceback)
File "D:/PythonProject/testProject/test_lock_sqlite.py", line 77, in transaction
raise DbError
DbError: DB Error
Traceback (most recent call last):
File "D:/PythonProject/testProject/test_lock_sqlite.py", line 68, in transaction
yield trans, cursor
File "D:/PythonProject/testProject/test_lock_sqlite.py", line 100, in write_fun
""", (ip, user_id, path, params, info, cost_time))
sqlite3.OperationalError: database is locked
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "D:/PythonProject/testProject/test_lock_sqlite.py", line 104, in write_fun
print("{}:write commit complete".format(datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")))
File "D:\developer\Python37-64\lib\contextlib.py", line 130, in __exit__
self.gen.throw(type, value, traceback)
File "D:/PythonProject/testProject/test_lock_sqlite.py", line 77, in transaction
raise DbError
DbError: DB Error
Traceback (most recent call last):
File "D:/PythonProject/testProject/test_lock_sqlite.py", line 68, in transaction
yield trans, cursor
File "D:/PythonProject/testProject/test_lock_sqlite.py", line 100, in write_fun
""", (ip, user_id, path, params, info, cost_time))
sqlite3.OperationalError: database is locked
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "D:/PythonProject/testProject/test_lock_sqlite.py", line 104, in write_fun
print("{}:write commit complete".format(datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")))
File "D:\developer\Python37-64\lib\contextlib.py", line 130, in __exit__
self.gen.throw(type, value, traceback)
File "D:/PythonProject/testProject/test_lock_sqlite.py", line 77, in transaction
raise DbError
DbError: DB Error
Traceback (most recent call last):
File "D:/PythonProject/testProject/test_lock_sqlite.py", line 68, in transaction
yield trans, cursor
File "D:/PythonProject/testProject/test_lock_sqlite.py", line 100, in write_fun
""", (ip, user_id, path, params, info, cost_time))
sqlite3.OperationalError: database is locked
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "D:/PythonProject/testProject/test_lock_sqlite.py", line 104, in write_fun
print("{}:write commit complete".format(datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")))
File "D:\developer\Python37-64\lib\contextlib.py", line 130, in __exit__
self.gen.throw(type, value, traceback)
File "D:/PythonProject/testProject/test_lock_sqlite.py", line 77, in transaction
raise DbError
DbError: DB Error
Traceback (most recent call last):
File "D:/PythonProject/testProject/test_lock_sqlite.py", line 68, in transaction
yield trans, cursor
File "D:/PythonProject/testProject/test_lock_sqlite.py", line 100, in write_fun
""", (ip, user_id, path, params, info, cost_time))
sqlite3.OperationalError: database is locked
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "D:/PythonProject/testProject/test_lock_sqlite.py", line 104, in write_fun
print("{}:write commit complete".format(datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")))
File "D:\developer\Python37-64\lib\contextlib.py", line 130, in __exit__
self.gen.throw(type, value, traceback)
File "D:/PythonProject/testProject/test_lock_sqlite.py", line 77, in transaction
raise DbError
DbError: DB Error
Traceback (most recent call last):
File "D:/PythonProject/testProject/test_lock_sqlite.py", line 68, in transaction
yield trans, cursor
File "D:/PythonProject/testProject/test_lock_sqlite.py", line 100, in write_fun
""", (ip, user_id, path, params, info, cost_time))
sqlite3.OperationalError: database is locked
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "D:/PythonProject/testProject/test_lock_sqlite.py", line 104, in write_fun
print("{}:write commit complete".format(datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")))
File "D:\developer\Python37-64\lib\contextlib.py", line 130, in __exit__
self.gen.throw(type, value, traceback)
File "D:/PythonProject/testProject/test_lock_sqlite.py", line 77, in transaction
raise DbError
DbError: DB Error
Traceback (most recent call last):
File "D:/PythonProject/testProject/test_lock_sqlite.py", line 68, in transaction
yield trans, cursor
File "D:/PythonProject/testProject/test_lock_sqlite.py", line 100, in write_fun
""", (ip, user_id, path, params, info, cost_time))
sqlite3.OperationalError: database is locked
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "D:/PythonProject/testProject/test_lock_sqlite.py", line 104, in write_fun
print("{}:write commit complete".format(datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")))
File "D:\developer\Python37-64\lib\contextlib.py", line 130, in __exit__
self.gen.throw(type, value, traceback)
File "D:/PythonProject/testProject/test_lock_sqlite.py", line 77, in transaction
raise DbError
DbError: DB Error
Traceback (most recent call last):
File "D:/PythonProject/testProject/test_lock_sqlite.py", line 68, in transaction
yield trans, cursor
File "D:/PythonProject/testProject/test_lock_sqlite.py", line 100, in write_fun
""", (ip, user_id, path, params, info, cost_time))
sqlite3.OperationalError: database is locked
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "D:/PythonProject/testProject/test_lock_sqlite.py", line 104, in write_fun
print("{}:write commit complete".format(datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")))
File "D:\developer\Python37-64\lib\contextlib.py", line 130, in __exit__
self.gen.throw(type, value, traceback)
File "D:/PythonProject/testProject/test_lock_sqlite.py", line 77, in transaction
raise DbError
DbError: DB Error
2021-10-11 14:19:28:read_fun Done
=================close sqlite connection=================
Process finished with exit code 0
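The run shows that while the read transaction stays open (from 14:19:13 to 14:19:28), the writes hang. Note that "write commit complete" is printed before the real COMMIT, which only runs when the with block exits: the first writer inserted its row at 14:19:13, but its COMMIT failed after waiting out the timeout, and most of the remaining writers timed out while still trying to insert. Only the writer that got its turn at 14:19:24 succeeded, committing once the reader finished (the single "close sqlite connection" line). Any write that waits longer than the timeout (10 seconds in this script, 5 seconds by default) raises "database is locked".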
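Problem 2: Solution
My final solution was to close the connection immediately after each database read, and not do any work unrelated to the database while the connection is open.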
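How it works
1. How does SQLite achieve thread safety?
A: SQLite's API supports multithreaded access, and multithreaded access inevitably raises data-safety issues.
To keep the database safe, SQLite internally abstracts two kinds of mutex (their concrete implementation depends on the host platform) to deal with thread concurrency: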
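- fullMutex
  - can be thought of as the connection mutex, bound to the connection handle (the sqlite3 struct)
  - guarantees that, at any moment, at most one thread is executing on that connection
- coreMutex
  - a lock within the current process, bound to the database file
  - protects database-related critical resources, so that at most one thread accesses them at any moment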
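How should fullMutex be understood? All of SQLite's data-access APIs go through the connection handle sqlite3. With fullMutex, if several threads call an API on the same connection at once, e.g. sqlite3_exec(db, ...), SQLite guards that API's logic with the connection's mutex: only one thread runs it, and the others are blocked on the mutex.
coreMutex, in turn, protects database-related critical resources.
Users can configure both mutexes, and the combinations give the three threading modes SQLite supports: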
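- single-thread
  - both coreMutex and fullMutex are disabled
  - the application must ensure that only one thread uses the API at any time; otherwise errors (crashes) occur
- multi-thread
  - coreMutex is kept, fullMutex is disabled
  - several threads can access the database concurrently through different connections, but a single connection may only be used by one thread at a time
  - concurrent use of a single connection causes errors (crashes)
  - error message: illegal multi-threaded access to database connection
- serialized
  - both coreMutex and fullMutex are kept
  - a single connection can be shared safely by several threads; SQLite serializes their calls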
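From Python you can check which mode your SQLite library was built with. This sketch assumes Python 3.11 or later, where `sqlite3.threadsafety` is derived from SQLite's compile-time threading mode (0 = single-thread, 1 = multi-thread, 3 = serialized); earlier versions always report 1. It also shows that Python adds its own guard on top of SQLite's modes: with the default `check_same_thread=True`, a connection may only be used by the thread that created it.

```python
import sqlite3
import threading

# DB-API threadsafety levels used by Python 3.11+ for SQLite's three modes
THREADING_MODES = {0: "single-thread", 1: "multi-thread", 3: "serialized"}


def compiled_threading_mode():
    return THREADING_MODES.get(sqlite3.threadsafety, "unknown")


def use_connection_from_other_thread():
    """Use a connection from a thread other than the one that created it."""
    conn = sqlite3.connect(":memory:")  # check_same_thread=True by default
    outcome = []

    def worker():
        try:
            conn.execute("SELECT 1")
            outcome.append("ok")
        except sqlite3.ProgrammingError:  # raised by Python's same-thread check
            outcome.append("ProgrammingError")

    t = threading.Thread(target=worker)
    t.start()
    t.join()
    conn.close()
    return outcome[0]


if __name__ == "__main__":
    print(sqlite3.sqlite_version, compiled_threading_mode())
    print(use_connection_from_other_thread())
```

Passing `check_same_thread=False` turns Python's check off, after which you rely on SQLite's own mode (and your own locking) to keep a shared connection safe.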
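2. How well does SQLite support concurrent reads and writes, i.e. read and write transactions running at the same time?
A: The answer depends on the journal mode you choose; what follows also explains the exact cause of Problem 2.
SQLite supports two journaling methods, or journal models: rollback and WAL. SQLite's default journal mode is rollback.
Here is a brief summary of the rollback journal mode (for WAL mode, see https://zhangbuhuai.com/post/sqlite.html):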
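- Each write transaction involves two write I/Os (one to create the journal file, e.g. openmptcprouter.db-journal, and one to modify the database file)
- Several read transactions can run at the same time
- Several write transactions cannot run at the same time
- Reads affect writes: with many read transactions, a write transaction often hits SQLITE_BUSY in its commit phase (when acquiring the EXCLUSIVE lock)
- Writes affect reads: during a write transaction's commit phase, read transactions cannot proceed
- A write transaction has many points at which it can hit SQLITE_BUSY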
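The "reads affect writes" point is exactly what Problem 2 ran into, and it can be reproduced in a few lines. The sketch below is an illustration, not the author's code: it uses a throwaway database, `timeout=0` so nothing waits, and runs the same scenario once in the default rollback mode (`DELETE`) and once in `WAL` mode for comparison.

```python
import os
import sqlite3
import tempfile


def commit_while_reading(journal_mode):
    """Try to COMMIT a write while another connection holds an open read
    transaction; return the outcome of that COMMIT."""
    path = os.path.join(tempfile.mkdtemp(), "demo.db")
    setup = sqlite3.connect(path)
    setup.execute("PRAGMA journal_mode=" + journal_mode).fetchone()  # DELETE is the default rollback mode
    setup.execute("CREATE TABLE t (x INTEGER)")
    setup.execute("INSERT INTO t VALUES (1)")
    setup.commit()
    setup.close()

    # timeout=0 fails immediately; isolation_level=None lets us issue BEGIN/COMMIT
    reader = sqlite3.connect(path, timeout=0, isolation_level=None)
    writer = sqlite3.connect(path, timeout=0, isolation_level=None)
    try:
        reader.execute("BEGIN")
        reader.execute("SELECT * FROM t").fetchall()  # read transaction stays open

        writer.execute("BEGIN")
        writer.execute("INSERT INTO t VALUES (2)")  # RESERVED lock: allowed alongside readers
        try:
            writer.execute("COMMIT")  # rollback mode needs EXCLUSIVE here
            return "committed"
        except sqlite3.OperationalError as exc:
            return str(exc)
    finally:
        reader.close()
        writer.close()


if __name__ == "__main__":
    for mode in ("DELETE", "WAL"):
        print(mode, commit_while_reading(mode))
```

In rollback mode the writer's INSERT succeeds but its COMMIT is refused while the reader is open; in WAL mode the same COMMIT goes through.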
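Summary
When writing a highly concurrent server program, you must manage SQLite3 writes and reads carefully. There are four common approaches:
- Use a thread queue: put all writes into the queue so that only one thread executes database-write code at any given moment;
- Use a lock so that threads compete to enter a critical section, again ensuring that only one thread executes database-write code at a time;
- Set the timeout parameter when connecting, i.e. the maximum time to wait while the database is locked. The default timeout of sqlite3.connect() is 5 seconds, which is usually too short for server programs. The right value depends on the application; this approach is concise but less general than the first two;
- Keep reads and writes short: close the connection as soon as you are done with the database, and don't do anything unrelated to the database while it is open.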
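For the second approach, a minimal sketch assuming one module-level `threading.Lock` (the names `_write_lock` and `locked_write` are hypothetical) and a 30-second timeout chosen only for illustration. The lock serializes writers within this process; it cannot see other processes, which is why a generous timeout is still set.

```python
import sqlite3
import threading

_write_lock = threading.Lock()  # one lock guarding every write in this process


def locked_write(db_path, sql, params=(), timeout=30.0):
    """Threads compete for the lock, so at most one of them is inside
    SQLite's write path at a time."""
    with _write_lock:
        conn = sqlite3.connect(db_path, timeout=timeout)
        try:
            with conn:  # commit on success, roll back on exception
                conn.execute(sql, params)
        finally:
            conn.close()  # release the connection before the next writer enters
```

Compared with the queue, callers block until their write is done and see any exception directly, at the cost of request threads waiting on each other.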
For a detailed explanation of how SQLite3 works, see: https://zhangbuhuai.com/post/sqlite.html