sort Command Explained

Reposted article. 2019-12-30 16:47, Linux Commands in Practice

The sort command

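The sort command is very useful on Linux: it sorts the lines of files and writes the sorted result to standard output. sort can read its input either from the specified files or from stdin.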

Syntax

sort (options) (arguments)

Options

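-b: ignore blank characters at the start of each line;
-c: check whether the file is already sorted;
-d: when sorting, consider only English letters, digits and blank characters, ignoring all other characters;
-f: when sorting, treat lowercase letters as uppercase;
-i: when sorting, ignore every character outside ASCII 040 to 176 (octal), i.e. consider only printable characters;
-m: merge several already-sorted files;
-M: sort the first three letters as month abbreviations;
-n: sort by numeric value;
-o<output file>: write the sorted result to the specified file;
-r: sort in reverse order;
-t<separator>: specify the field separator used for sorting;
+<start field>-<end field>: sort on the specified fields, from the start field up to the field before the end field (an obsolete syntax; -k is its modern replacement).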
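A minimal sketch of -o and the order check from the list above, assuming a POSIX shell and GNU coreutils sort (the quiet check -C is listed in the man page that follows). The scratch file and its three lines are made up for the demo:

```shell
#!/bin/sh
# Assumption: GNU coreutils sort; LC_ALL=C gives plain byte order.
export LC_ALL=C

tmp=$(mktemp)                      # hypothetical scratch file
printf '%s\n' ccc aaa bbb > "$tmp"

# -C checks order silently: exit status 0 if sorted, 1 if not.
if sort -C "$tmp"; then before=sorted; else before=unsorted; fi

# -o may name one of the input files, so this sorts the file in place.
# (Redirecting with "sort f > f" would truncate f before sort reads it.)
sort -o "$tmp" "$tmp"

if sort -C "$tmp"; then after=sorted; else after=unsorted; fi
content=$(cat "$tmp")
rm -f "$tmp"

echo "before: $before, after: $after"
printf '%s\n' "$content"
```

Use -c instead of -C if you also want sort to report the first out-of-order line on stderr.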

SORT(1)                                  User Commands                                  SORT(1)

NAME
       sort - sort lines of text files

SYNOPSIS
       sort [OPTION]... [FILE]...
       sort [OPTION]... --files0-from=F

DESCRIPTION
       Write sorted concatenation of all FILE(s) to standard output.

       Mandatory arguments to long options are mandatory for short options too.  Ordering options:

       -b, --ignore-leading-blanks
              ignore leading blanks

       -d, --dictionary-order
              consider only blanks and alphanumeric characters

       -f, --ignore-case
              fold lower case to upper case characters

       -g, --general-numeric-sort
              compare according to general numerical value

       -i, --ignore-nonprinting
              consider only printable characters

       -M, --month-sort
              compare (unknown) < 'JAN' < ... < 'DEC'

       -h, --human-numeric-sort
              compare human readable numbers (e.g., 2K 1G)

       -n, --numeric-sort
              compare according to string numerical value

       -R, --random-sort
              sort by random hash of keys

       --random-source=FILE
              get random bytes from FILE

       -r, --reverse
              reverse the result of comparisons

       --sort=WORD
              sort according to WORD: general-numeric -g, human-numeric -h, month -M, numeric -n, random -R, version -V

       -V, --version-sort
              natural sort of (version) numbers within text

       Other options:

       --batch-size=NMERGE
              merge at most NMERGE inputs at once; for more use temp files

       -c, --check, --check=diagnose-first
              check for sorted input; do not sort

       -C, --check=quiet, --check=silent
              like -c, but do not report first bad line

       --compress-program=PROG
              compress temporaries with PROG; decompress them with PROG -d

       --debug
              annotate the part of the line used to sort, and warn about questionable usage to stderr

       --files0-from=F
              read input from the files specified by NUL-terminated names in file F; If F is - then read names from 
              standard input

       -k, --key=KEYDEF
              sort via a key; KEYDEF gives location and type

       -m, --merge
              merge already sorted files; do not sort

       -o, --output=FILE
              write result to FILE instead of standard output

       -s, --stable
              stabilize sort by disabling last-resort comparison

       -S, --buffer-size=SIZE
              use SIZE for main memory buffer

       -t, --field-separator=SEP
              use SEP instead of non-blank to blank transition

       -T, --temporary-directory=DIR
              use DIR for temporaries, not $TMPDIR or /tmp; multiple options specify multiple directories

       --parallel=N
              change the number of sorts run concurrently to N

       -u, --unique
              with -c, check for strict ordering; without -c, output only the first of an equal run

       -z, --zero-terminated
              end lines with 0 byte, not newline

       --help display this help and exit

       --version
              output version information and exit

       KEYDEF  is  F[.C][OPTS][,F[.C][OPTS]] for start and stop position, where F is a field number and C a character 
       position in the field; both are origin 1, and the stop position defaults to the line's end.
       If neither -t nor -b is in effect, characters in a field are counted from the beginning of the preceding 
       whitespace.  OPTS is one or more single-letter ordering  options  [bdfgiMhnRrV],  which  override
       global ordering options for that key.  If no key is given, use the entire line as the key.

       SIZE may be followed by the following multiplicative suffixes: % 1% of memory, b 1, K 1024 (default), and so 
       on for M, G, T, P, E, Z, Y.

       With no FILE, or when FILE is -, read standard input.

       *** WARNING *** The locale specified by the environment affects sort order.  Set LC_ALL=C to get the traditional
       sort order that uses native byte values.

       GNU coreutils online help: <http://www.gnu.org/software/coreutils/> Report sort translation bugs to 
       <http://translationproject.org/team/>

AUTHOR
       Written by Mike Haertel and Paul Eggert.

COPYRIGHT
       Copyright © 2013 Free Software Foundation, Inc.  License GPLv3+: GNU GPL version 3 or later 
       <http://gnu.org/licenses/gpl.html>.
       This is free software: you are free to change and redistribute it.  There is NO WARRANTY, to the extent permitted 
       by law.

SEE ALSO
       uniq(1)

       The full documentation for sort is maintained as a Texinfo manual.  If the info and sort programs are properly 
       installed at your site, the command

              info coreutils 'sort invocation'

       should give you access to the complete manual.

GNU coreutils 8.22                                    April 2018                                     SORT(1)
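The man page above also lists -h, -V and -M. A short sketch, assuming GNU coreutils sort and the C locale (so month names are the English abbreviations); the input values are invented for illustration:

```shell
#!/bin/sh
export LC_ALL=C   # month names are then the English abbreviations

# -h: compare by suffix first (none < K < M < G ...), then by the number.
human=$(printf '%s\n' 1G 512 2K | sort -h)
# -V: numbers embedded in text are compared as numbers.
version=$(printf '%s\n' file10 file2 file1 | sort -V)
# -M: JAN < FEB < ... < DEC.
month=$(printf '%s\n' Mar Jan Feb | sort -M)

printf '%s\n' "$human" "$version" "$month"
```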

Arguments

file: the list of files to sort.

Examples

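sort treats each line of a file or text as one unit and compares the lines with one another character by character, starting from the first character, then outputs them in ascending order. Comparing by ASCII (byte) value is the behavior of the C locale; other locales apply their own collation rules, as the WARNING in the man page above notes.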

[root@test sort]# cat sort.txt 
aaa:10:1.1
ccc:30:3.3
ddd:40:4.4
bbb:20:2.2
eee:50:5.5
eee:50:5.6
eee:50:5.5
[root@test sort]# sort sort.txt 
aaa:10:1.1
bbb:20:2.2
ccc:30:3.3
ddd:40:4.4
eee:50:5.5
eee:50:5.5
eee:50:5.6
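The "ASCII order" described above only holds in the C locale. A minimal sketch, assuming GNU or any POSIX sort; the four sample lines are hypothetical, not the article's sort.txt:

```shell
#!/bin/sh
# In the C locale sort compares bytes, so uppercase letters
# (A=65 ... Z=90) sort before lowercase ones (a=97 ... z=122).
export LC_ALL=C

result=$(printf '%s\n' b B a A | sort)
printf '%s\n' "$result"
```

Running the same command under a locale such as en_US.UTF-8 may interleave upper- and lowercase letters instead.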

To drop duplicate lines, use the -u option or uniq:

[root@test sort]# cat sort.txt 
aaa:10:1.1
ccc:30:3.3
ddd:40:4.4
bbb:20:2.2
eee:50:5.5
eee:50:5.5
eee:50:5.6
eee:50:5.5

[root@test sort]# sort -u sort.txt 
aaa:10:1.1
bbb:20:2.2
ccc:30:3.3
ddd:40:4.4
eee:50:5.5
eee:50:5.6

# Or use the uniq command. Remember: uniq only removes duplicate lines that are adjacent
[root@test sort]# uniq sort.txt 
aaa:10:1.1
ccc:30:3.3
ddd:40:4.4
bbb:20:2.2
eee:50:5.5
eee:50:5.6
eee:50:5.5
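To see the adjacency rule in isolation, here is a small sketch, assuming a POSIX shell; the input, in which the duplicates are deliberately not adjacent, is made up:

```shell
#!/bin/sh
export LC_ALL=C

# Hypothetical input: repeated lines that are never next to each other.
input='eee
aaa
eee
aaa
eee'

uniq_only=$(printf '%s\n' "$input" | uniq)          # removes adjacent repeats only
sort_uniq=$(printf '%s\n' "$input" | sort | uniq)   # sort first, then collapse
sort_u=$(printf '%s\n' "$input" | sort -u)          # same result in one step

printf 'uniq:\n%s\n' "$uniq_only"
printf 'sort | uniq:\n%s\n' "$sort_uniq"
printf 'sort -u:\n%s\n' "$sort_u"
```

sort -u and sort | uniq agree here because the whole line is the key. Once -k or -f is involved, sort -u treats lines whose keys compare equal as duplicates, while plain uniq still compares whole lines.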

Using the -n, -r, -k and -t options of sort:

[root@test sort]# cat sort.txt 
AAA:BB:CC
aaa:30:1.6
ccc:50:3.3
ddd:20:4.2
bbb:10:2.5
eee:40:5.4
eee:60:5.1

# Sort by the BB column numerically, smallest first
[root@test sort]# sort -nk 2 -t: sort.txt 
AAA:BB:CC
bbb:10:2.5
ddd:20:4.2
aaa:30:1.6
eee:40:5.4
ccc:50:3.3
eee:60:5.1

# Sort by the CC column numerically, largest first
[root@test sort]# sort -nrk 3 -t: sort.txt 
eee:40:5.4
eee:60:5.1
ddd:20:4.2
ccc:50:3.3
bbb:10:2.5
aaa:30:1.6
AAA:BB:CC

# -n sorts by numeric value
# -r reverses the order
# -k specifies the field (key) to sort on
# -t specifies the field separator, here a colon
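Why -n matters: without it, numbers are compared character by character. A sketch assuming the C locale; the colon-separated rows are hypothetical, not the article's sort.txt:

```shell
#!/bin/sh
export LC_ALL=C

# Hypothetical rows: the second colon-separated field holds numbers of different lengths.
data='x:9
y:100
z:10'

text=$(printf '%s\n' "$data" | sort -t: -k2,2)   # text: "10" < "100" < "9"
num=$(printf '%s\n' "$data" | sort -t: -k2,2n)   # numeric: 9 < 10 < 100
rev=$(printf '%s\n' "$data" | sort -t: -k2,2nr)  # numeric, descending

printf '%s\n' "$text" "$num" "$rev"
```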

Detailed syntax of the -k option:

FStart[.CStart][Modifier][,FEnd[.CEnd][Modifier]]
---------Start----------- ,---------End-----------
  FStart.CStart Modifier  ,  FEnd.CEnd Modifier

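The comma divides this syntax into two parts, Start and End. The Start part itself has three components; its Modifier is the kind of option discussed earlier, such as n and r. Let's focus on FStart and CStart in the Start part.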

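.CStart can be omitted, in which case the key starts at the beginning of the field. In FStart.CStart, FStart is the field to use and CStart is the character position within field FStart where the "first sort character" is taken from. If the End part is not set, End is taken to be the end of the line.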

Likewise, in the End part you can set FEnd.CEnd. If you omit .CEnd, the key ends at the end of the field, i.e. its last character. Setting CEnd to 0 (zero) also means the end of the field.
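The difference between an open-ended key (-k2) and a bounded one (-k2,2) shows up when the rest of the line differs. A sketch assuming the C locale; the two input lines are invented:

```shell
#!/bin/sh
export LC_ALL=C

data='a 1 z
b 1 y'

# -k2: the key runs from field 2 to the end of the line (" 1 z" vs " 1 y").
to_eol=$(printf '%s\n' "$data" | sort -k2)
# -k2,2: the key is field 2 only (" 1" vs " 1"), so the keys tie
# and sort falls back to comparing the whole lines.
field_only=$(printf '%s\n' "$data" | sort -k2,2)

printf '%s\n' "$to_eol" "$field_only"
```

The whole-line fallback in the second command is the "last-resort comparison" that -s disables in the man page.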

Columns: name, height, age, salary
[root@test sort]# cat info.txt 
zhangsan 175 20 5000
lisi 170 25 6000
wangwu 170 28 5000
zhangxiaoliu 165 30 6000

# Sort by employee name
[root@test sort]# sort -t ' ' -k 1 info.txt 
lisi 170 25 6000
wangwu 170 28 5000
zhangsan 175 20 5000
zhangxiaoliu 165 30 6000
To sort by name, comparing the first field is enough: -k 1.

# Sort by employee height
[root@test sort]# sort -t ' ' -n -k 2 info.txt 
zhangxiaoliu 165 30 6000
lisi 170 25 6000
wangwu 170 28 5000
zhangsan 175 20 5000
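Height is a number, so add -n, and compare on the second field: -k 2.
lisi and wangwu have the same height, though. When the keys tie, sort falls back to comparing the entire line, which starts with the name, so lisi comes first.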

# Sort by height; employees of the same height are sorted by salary in ascending order.
[root@test sort]# sort -t ' ' -n -k2 -k4 info.txt 
zhangxiaoliu 165 30 6000
wangwu 170 28 5000
lisi 170 25 6000
zhangsan 175 20 5000
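To sort by height and then salary, give two keys, -k2 -k4: lines are compared on the second field first and,
if that ties, on the fourth field. (You can keep appending keys as needed.)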
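A variant of the height-then-salary example, with each key bounded to one field (-k2,2 and -k4,4) and n attached to each key, so that no key spills into the rest of the line. A sketch assuming the C locale; the data is the article's info.txt:

```shell
#!/bin/sh
export LC_ALL=C

# The article's info.txt data: name height age salary
info='zhangsan 175 20 5000
lisi 170 25 6000
wangwu 170 28 5000
zhangxiaoliu 165 30 6000'

# Height ascending, then salary ascending; each key covers exactly one field.
result=$(printf '%s\n' "$info" | sort -t ' ' -k2,2n -k4,4n)
printf '%s\n' "$result"
```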

# Sort by salary in descending order; equal salaries are sorted by age in ascending order
[root@test sort]# sort -t ' ' -n -k4r -k3 info.txt 
lisi 170 25 6000
zhangxiaoliu 165 30 6000
zhangsan 175 20 5000
wangwu 170 28 5000
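Salary is compared first, so the fourth-field key goes first, and because it is descending it is written -k4r; age is compared next in the default ascending order, -k3. Hence -n -k4r -k3.
Because both keys are numeric, n can also be attached to each key instead: -k4rn -k3n. This form is safer: a key that carries its own options (such as r) does not inherit the global -n, so in -n -k4r -k3 the salary key is actually compared as text and only works because every salary has four digits.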
[root@test sort]# sort -t ' ' -k4rn -k3n info.txt 
lisi 170 25 6000
zhangxiaoliu 165 30 6000
zhangsan 175 20 5000
wangwu 170 28 5000

# Sort by the second letter of the name; if equal, sort by salary in descending order
[root@test sort]# sort -t ' ' -k1.2,1.2 -k4nr info.txt 
wangwu 170 28 5000
zhangxiaoliu 165 30 6000
zhangsan 175 20 5000
lisi 170 25 6000
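-k1.2  alone would compare from the second character of the first field to the end of the line.
       zhangsan and zhangxiaoliu would then differ at "hangs" vs "hangx"; s comes before x, so zhangsan would come first and the salary key would never be reached.
-k1.2,1.2 restricts the comparison to the second letter of the name, so the first key must give both the start and the end position: -k1.2,1.2.
          Salary, the fourth field, is then compared numerically in descending order, hence -k4nr.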
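All examples here pass -t ' '. That starts to matter when character positions are used in any field after the first: per the KEYDEF note in the man page, without -t or -b the characters of a field are counted from the preceding whitespace. A sketch with hypothetical two-line input, assuming the C locale:

```shell
#!/bin/sh
export LC_ALL=C

# Hypothetical input; field 2 is "ab" or "ba".
data='p ab
q ba'

# Without -t (and without -b), field 2 is " ab" including the leading blank,
# so character 2 of field 2 is the first letter: a vs b.
no_t=$(printf '%s\n' "$data" | sort -k2.2,2.2)
# With -t ' ', field 2 is "ab", so character 2 is the second letter: b vs a.
with_t=$(printf '%s\n' "$data" | sort -t ' ' -k2.2,2.2)

echo "without -t:"; printf '%s\n' "$no_t"
echo "with -t:";    printf '%s\n' "$with_t"
```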

Sorting by the company's English name starting from its second letter (when a line contains both text and numeric fields):

[root@test sort]# cat company.txt 
dangdang 50 6000
baidu 100 5000
sohu 100 4500
google 110 5000
guge 50 3000

[root@test sort]# sort -t ' ' -k 1.2 company.txt 
baidu 100 5000
dangdang 50 6000
sohu 100 4500
google 110 5000
guge 50 3000

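-k 1.2 sorts on the string that starts at the second character of the first field and runs to the end of the line. baidu and dangdang both have a as their second character, but at the third character baidu's i comes before dangdang's n, so baidu comes first. sohu and google both have o as their second character, but sohu's h comes before google's o, so sohu comes before google. guge, with u, comes last.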

Sorting only on the second letter of the company name, with ties broken by employee salary:

[root@test sort]# cat company.txt 
dangdang 50 6000
baidu 100 5000
sohu 100 4500
google 110 5000
guge 50 3000

# Sort only on the second letter of the company name; ties are broken by salary in descending order
[root@test sort]# sort -t ' ' -k 1.2,1.2 -k3nr company.txt 
dangdang 50 6000
baidu 100 5000
google 110 5000
sohu 100 4500
guge 50 3000

# Sort only on the second letter of the company name; ties are broken by salary in ascending order
[root@test sort]# sort -t ' ' -k 1.2,1.2 -k3n company.txt 
baidu 100 5000
dangdang 50 6000
sohu 100 4500
google 110 5000
guge 50 3000

Because we want to sort on the second letter only, we use -k 1.2,1.2 (also written -k1.2,1.2), which limits the key to that single character.

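(Why not just -k 1.2? Because the End part is omitted, the key would run from the second letter to the end of the line. Every line then has a different first key, so the salary key -k3nr or -k3n never gets a chance to break a tie.)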
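The point about -k 1.2 can be checked directly. A sketch assuming the C locale, reusing the article's first company.txt data (field 3 is the salary):

```shell
#!/bin/sh
export LC_ALL=C

# The article's company.txt data; field 3 is the salary.
company='dangdang 50 6000
baidu 100 5000
sohu 100 4500
google 110 5000
guge 50 3000'

# -k1.2: the key runs to the end of the line, so no two lines tie
# and the -k3nr key is never consulted.
open_end=$(printf '%s\n' "$company" | sort -t ' ' -k1.2 -k3nr)
first_only=$(printf '%s\n' "$company" | sort -t ' ' -k1.2)
# -k1.2,1.2: the key is the single second letter, so ties fall through to -k3nr.
one_char=$(printf '%s\n' "$company" | sort -t ' ' -k1.2,1.2 -k3nr)

printf '%s\n' "$open_end" "$one_char"
```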

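Once the second letter of the company name has been compared, salary comes next. Here we use -k3n or -k3nr (also written -k 3n or -k 3nr); because this field is a salary (a number), n must be attached to the key.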

錯誤的示范，如下：（這個 n 必須在本域3的后面，在本域（3）前面加 n 會出錯；若加在 k 前面也會得不到想要的結果。）

[root@test sort]# cat company.txt 
dangdang 50 6000
baidu 100 50000
sohu 100 4500
google 110 5000
guge 50 3000

[root@test sort]# sort -t ' ' -k 1.2,1.2 -k n3 company.txt 
sort: invalid number at field start: invalid count at start of ‘n3’

[root@test sort]# sort -t ' ' -k 1.2,1.2 -nk 3 company.txt 
guge 50 3000
sohu 100 4500
google 110 5000
dangdang 50 6000
baidu 100 50000

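# Some examples online use -nrk 3,3; here -n and -r become global options that also apply to -k1.2,1.2, so this does not give the intended result either.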
[root@test sort]# sort -t ' ' -k 1.2,1.2 -nrk 3,3 company.txt 
baidu 100 50000
dangdang 50 6000
google 110 5000
sohu 100 4500
guge 50 3000
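The rule behind the last two results: global ordering options such as -n and -r apply only to keys that carry no options of their own (see the KEYDEF note in the man page). A sketch with hypothetical rows, assuming GNU coreutils sort and the C locale:

```shell
#!/bin/sh
export LC_ALL=C

# 1) A key with its own option (r) ignores the global -n.
rows='a 900
b 10000'
# -k2r is compared as text, reversed: "900" > "10000" because '9' > '1'.
text_key=$(printf '%s\n' "$rows" | sort -t ' ' -n -k2r)
# -k2rn makes the key itself numeric: 10000 > 900.
num_key=$(printf '%s\n' "$rows" | sort -t ' ' -k2rn)

# 2) A key without options inherits the global -n, as -k1.2,1.2 did above.
pairs='a 1
b 2'
# With -n, the letter key compares as 0 for both lines, so -k2,2r decides.
with_global_n=$(printf '%s\n' "$pairs" | sort -t ' ' -n -k1,1 -k2,2r)
# Without -n, the letter key is compared as text and a comes first.
without_n=$(printf '%s\n' "$pairs" | sort -t ' ' -k1,1 -k2,2r)

printf '%s\n' "$text_key" "$num_key" "$with_global_n" "$without_n"
```

GNU sort's --debug option, listed in the man page, can help spot keys whose comparison differs from what you intended.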
