Why do ls and du report different file sizes?

Reposted article, dated 2012-02-19 14:23. Tags: du, Linux, sparse file, ls

On several occasions I have checked the size of a file with both ls and du and found that the two disagree. For example:

bl@d3:~/test/sparse_file$ ls -l fs.img
-rw-r--r-- 1 bl bl 1073741824 2012-02-17 05:09 fs.img

bl@d3:~/test/sparse_file$ du -sh fs.img
0       fs.img

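Here ls reports the size of fs.img as 1073741824 bytes (1 GiB), while du reports 0.

I never looked into this properly before, so today I am making up for it.

There are two main reasons for the difference:

Sparse files
The sizes reported by ls and du mean different things

Let's look at sparse files first. A sparse file is a file that contains "holes". For example, the following C program creates a file with a hole: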

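#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    /* O_CREAT requires a third argument giving the permission bits. */
    int fd = open("sparse.file", O_RDWR | O_CREAT, 0644);
    if (fd < 0)
        return 1;

    /* Move 1024 bytes past the current offset (0 for a newly opened file),
       then write one byte there. */
    if (lseek(fd, 1024, SEEK_CUR) == (off_t)-1 || write(fd, "\0", 1) != 1) {
        close(fd);
        return 1;
    }

    /* Bytes 0..1023 were never written, so they form the hole. */
    return close(fd) == 0 ? 0 : 1;
}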

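As this program shows, a hole is created by using lseek to move the file offset past the end of the file and then calling write. The range that was skipped over becomes the hole. With a hole this small the file system may still allocate a full block, so the saving only becomes visible with larger offsets.

A sparse file can also be created from the shell: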

$ dd if=/dev/zero of=sparse_file.img bs=1M seek=1024 count=0
0+0 records in
0+0 records out

The advantage of sparse files is described as follows (quoted from Wikipedia):

The advantage of sparse files is that storage is only allocated when actually needed: disk space is saved, and large files can be created even if there is insufficient free space on the file system.

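In other words, the holes in a sparse file do not need to occupy any storage.

Now let's look at what the sizes printed by ls and du actually mean (from Wikipedia):

The du command prints the occupied space, while ls prints the apparent size.

Put differently, ls shows the "logical" size of a file, while du shows its "physical" size: the figure du prints is computed from the number of blocks the file occupies on disk. For example: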

bl@d3:~/test/sparse_file$ echo -n 1 > 1B.txt
bl@d3:~/test/sparse_file$ ls -l 1B.txt
-rw-r--r-- 1 bl bl 1 2012-02-19 05:17 1B.txt
bl@d3:~/test/sparse_file$ du -h 1B.txt
4.0K    1B.txt

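Here we first create a file, 1B.txt, that is one byte long. ls reports its size as 1 byte, whereas the figure from du is derived from the N blocks that 1B.txt occupies on disk multiplied by the size of each block (4.0K in this case). I write N rather than a concrete number because many details are hidden behind the scenes, such as the fragment size, which we will discuss another time.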
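Both numbers can be read directly with stat(2): st_size holds the apparent size that ls -l prints, and st_blocks counts the allocated blocks that the du figure is based on. The sketch below is only illustrative. The function names are hypothetical, and it assumes st_blocks is expressed in 512-byte units, as it is on Linux.

```c
#include <sys/stat.h>
#include <sys/types.h>

/* Apparent (logical) size in bytes, as shown by `ls -l`.
 * Returns -1 if the file cannot be examined. */
long long apparent_size(const char *path)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    return (long long)st.st_size;
}

/* Allocated (physical) size in bytes, the quantity du reports.
 * Assumption: st_blocks is counted in 512-byte units (true on Linux).
 * Returns -1 if the file cannot be examined. */
long long allocated_size(const char *path)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    return (long long)st.st_blocks * 512;
}
```

For a sparse file, allocated_size can be much smaller than apparent_size; for a tiny file such as 1B.txt it is usually larger, because space is allocated in whole blocks.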

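Of course, everything above describes the default behavior of ls and du; both provide options that change it. Examples are the -s option of ls (print the allocated size of each file, in blocks) and the --apparent-size option of du (print apparent sizes, rather than disk usage; although the apparent size is usually smaller, it may be larger due to holes in (`sparse') files, internal fragmentation, indirect blocks, and the like).

In addition, when copying a sparse file, cp by default applies an optimization that speeds up the copy. For example: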

strace cp fs.img fs.img.copy >log 2>&1

Opening the log file, we find that cp only calls read and lseek for this file and never calls write.

stat("fs.img.copy", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
stat("fs.img", {st_mode=S_IFREG|0644, st_size=1073741824, ...}) = 0
stat("fs.img.copy", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
open("fs.img", O_RDONLY)                = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=1073741824, ...}) = 0
open("fs.img.copy", O_WRONLY|O_TRUNC)   = 4
fstat(4, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
mmap(NULL, 532480, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f90df965000
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 524288) = 524288
lseek(4, 524288, SEEK_CUR)              = 524288
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 524288) = 524288
lseek(4, 524288, SEEK_CUR)              = 1048576
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 524288) = 524288
lseek(4, 524288, SEEK_CUR)              = 1572864

This is related to the sparse options of cp. From the cp man page:

By default, sparse SOURCE files are detected by a crude heuristic and the corresponding DEST file is made sparse as well. That is the behavior selected by --sparse=auto. Specify --sparse=always to create a sparse DEST file whenever the SOURCE file contains a long enough sequence of zero bytes. Use --sparse=never to inhibit creation of sparse files.

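I took a look at the cp source code and found that after each read, cp checks whether the data it just read consists entirely of zero bytes. If so, it only calls lseek on the destination and skips the write.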

Of course, all of this handling of sparse files is transparent to the user.
