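Background:
A customer reported that after a power loss and reboot, the device lost its existing configuration and was bricked.
Analysis:
The direct cause of the bricking is the lost configuration. The configuration is lost because the data on flash is already corrupted at boot, so reading it fails;
further analysis shows that the main cause is data that was not completely written to flash.
Why did the previously released yaffs2 file system have no problem, while the current ubi file system does?
Looking at how the app layer writes data to flash, it mainly goes through the following steps: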
fopen -> fwrite -> fsync -> fclose
However, the correct sequence is:
fopen -> fwrite -> fflush -> fsync -> fclose
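A stream opened with fopen is buffered I/O. Calling fflush pushes the data held in the C library buffer down to the OS layer, while fsync only syncs the data already in the OS layer to the medium;
so without fflush, fsync runs before the C library buffer has been handed to the kernel. The later fclose does flush that buffer to the kernel, but nothing syncs it to the medium afterwards, so a power loss right after fclose leaves the data on flash incomplete.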
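The corrected sequence can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function name save_config_stdio and its parameters are hypothetical. Note that fsync takes a file descriptor rather than a FILE pointer, so the sketch obtains the descriptor with fileno().

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: write len bytes of buf to path and push them
 * all the way to the storage medium. Returns 0 on success, -1 on failure. */
int save_config_stdio(const char *path, const char *buf, size_t len)
{
    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return -1;

    int ok = 1;

    /* fwrite only copies the data into the C library's user-space buffer. */
    if (fwrite(buf, 1, len, fp) != len)
        ok = 0;

    /* fflush hands the buffered bytes over to the kernel. */
    if (ok && fflush(fp) != 0)
        ok = 0;

    /* fsync asks the kernel to write its cached data to the medium.
     * It needs a file descriptor, obtained here with fileno(). */
    if (ok && fsync(fileno(fp)) != 0)
        ok = 0;

    /* Always close the stream; a failed fclose also counts as a failure. */
    if (fclose(fp) != 0)
        ok = 0;

    return ok ? 0 : -1;
}
```

A caller would treat a return value of -1 as "the configuration may not be on flash" and report or retry accordingly.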
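As for why yaffs2 had no problem, the explanation given by the kernel side is that the yaffs2 file system is unbuffered, so fclose is enough to trigger flushing the remaining buffered data to the medium;
After reading through the man pages, the conclusions are:
1. If data must be synced to the medium before the descriptor is closed, fsync has to be called, especially for critical data (data whose loss would cause serious problems);
2. For files opened with fopen, if fsync is needed, the correct call order is: fopen -> fwrite -> fflush -> fsync -> fclose
3. For files opened with open, if fsync is needed, the correct call order is: open -> write -> fsync -> close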
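For the open-based path in conclusion 3, a minimal sketch could look like the following. The function name save_config_fd, the open flags, and the 0644 mode are assumptions made for illustration only. There is no C library buffer on this path, so no fflush step is needed, but write may be interrupted or write fewer bytes than requested, which the loop accounts for.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: open -> write -> fsync -> close.
 * Returns 0 on success, -1 on failure. */
int save_config_fd(const char *path, const char *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    int ok = 1;
    size_t done = 0;

    /* write() may write fewer bytes than requested, so loop until done. */
    while (done < len) {
        ssize_t n = write(fd, buf + done, len - done);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            ok = 0;
            break;
        }
        done += (size_t)n;
    }

    /* Sync the kernel's cached data for this file to the medium. */
    if (ok && fsync(fd) != 0)
        ok = 0;

    /* Check close() as well; it can report earlier write errors. */
    if (close(fd) != 0)
        ok = 0;

    return ok ? 0 : -1;
}
```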
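Fix:
1. In the interface that writes the configuration file, call fflush before fsync, and release a temporary build
2. Search the project for every file opened with fopen, and add an fflush call before each fsync
3. Given the special nature of this business, search the project for every interface that calls fclose or close without calling fsync first, and add the missing fsync call
Below are some notes on the relevant interfaces, excerpted from the man pages:
Understanding close() (from https://linux.die.net/man/2/close):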
Not checking the return value of close() is a common but nevertheless serious programming error. It is quite possible that errors on a previous write(2) operation are first reported at the final close(). Not checking the return value when closing the file may lead to silent loss of data. This can especially be observed with NFS and with disk quota.
A successful close does not guarantee that the data has been successfully saved to disk, as the kernel defers writes. It is not common for a file system to flush the buffers when the stream is closed. If you need to be sure that the data is physically stored use fsync(2). (It will depend on the disk hardware at this point.)
It is probably unwise to close file descriptors while they may be in use by system calls in other threads in the same process. Since a file descriptor may be reused, there are some obscure race conditions that may cause unintended side effects
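Some write errors may only be reported when close is called, so if the return value of close is not checked, data may already have been lost without anyone noticing. This is seen especially with NFS and with disk quotas;
close does not guarantee that the data has been written to the medium. To guarantee that, fsync must be called. According to the fsync man page, for media that have their own cache, even fsync cannot guarantee that the data has truly been written.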
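As a rough illustration of the two points above, the sketch below checks the return values of both fsync and close instead of ignoring them. The helper name sync_and_close is hypothetical, and printing with perror is only one possible way to surface the error.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: sync a descriptor to the medium and close it,
 * reporting any failure instead of silently ignoring it.
 * Returns 0 if both steps succeed, -1 otherwise. */
int sync_and_close(int fd)
{
    int rc = 0;

    /* fsync failure means the data may not have reached the medium. */
    if (fsync(fd) != 0) {
        perror("fsync");
        rc = -1;
    }

    /* close() may be the first place an earlier write error shows up. */
    if (close(fd) != 0) {
        perror("close");
        rc = -1;
    }

    return rc;
}
```

The descriptor is closed even when fsync fails, so it does not leak; the caller only needs to look at the return value to know whether the data can be trusted.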
Understanding fclose() (from the man page):
Note that fclose() only flushes the user space buffers provided by the C library. To ensure that the data is physically stored on disk the kernel buffers must be flushed too,
for example, with sync(2) or fsync(2).
fclose only flushes the user-space buffers provided by the C library; getting the data onto the physical medium still requires sync or fsync.