[Text Processing Commands] The awk command in detail

Reposted article, 2019-10-28 15:40. Category: text processing commands.

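1. Introduction to awk

awk is a small language well suited to text processing and report generation. Its syntax will look familiar because it borrows from other languages, C in particular. It handles a large share of everyday text work on Linux systems, and learning it pays off quickly. Of the "three swordsmen" of text processing (grep, sed and awk), awk is the most capable.

awk is a programming language for processing text and data on Linux/Unix. The data can come from standard input (stdin), from one or more files, or from the output of another command. It supports user-defined functions and dynamic regular expressions, which makes it a powerful programming tool on Linux/Unix. It can be used directly on the command line, but is more often run as a script. awk has many built-in facilities, such as arrays and functions, much as C does (although awk arrays are associative); flexibility is its greatest strength.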

2. Usage

2.1 Syntax

awk 'pattern {action}' file(s)
awk [options] 'script' var=value file(s)
awk [options] -f scriptfile var=value file(s)

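Here pattern is what awk looks for in the data, and action is the series of commands run when a match is found. Neither part is mandatory: a rule with no pattern runs its action on every line, and a rule with no braces simply prints each matching line. The braces ({}) group the statements that belong to a particular pattern. A pattern is often a regular expression enclosed in slashes, but it can also be an expression such as $1 > 2, or one of the special patterns BEGIN and END.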
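The three shapes a rule can take (pattern only, action only, pattern plus action) can be sketched as follows. The sample data and the variable names are made up for illustration; only standard awk features are used.

```shell
# Hypothetical sample data, written to a temporary file for the demonstration.
sample=$(mktemp)
printf 'alpha 1\nbeta 2\ngamma 3\n' > "$sample"

# Pattern only: the default action prints each matching line.
pattern_only=$(awk '/^b/' "$sample")

# Action only: the action runs on every line.
action_only=$(awk '{ print $1 }' "$sample")

# Pattern plus action: the action runs only on lines where the pattern is true.
both=$(awk '$2 >= 2 { print $1 }' "$sample")

printf '%s\n' "$pattern_only" "$action_only" "$both"
rm -f "$sample"
```

The first command prints the line "beta 2", the second prints the three names, and the third prints beta and gamma.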

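2.2 Common options

-F fs          sets the input field separator; fs can be a single character, a string, or a regular expression (so several separators can be given at once)
-v var=value   assigns a user-defined variable, which is how outside values are passed into awk
-f scriptfile  reads the awk program from a script file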
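A minimal sketch that combines the three options in one call. The file names users.txt and show.awk and the variable prefix are hypothetical, chosen only for this illustration.

```shell
workdir=$(mktemp -d)
printf 'root:x:0\nbin:x:1\n' > "$workdir/users.txt"

# -f reads the program from a file instead of from the command line.
printf '%s\n' '{ print prefix $1, $3 }' > "$workdir/show.awk"

# -F sets the field separator, -v passes a value in from the shell.
opts_out=$(awk -F ':' -v prefix='user=' -f "$workdir/show.awk" "$workdir/users.txt")

printf '%s\n' "$opts_out"
rm -rf "$workdir"
```

With this input the output is "user=root 0" followed by "user=bin 1".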

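2.3 Built-in variables

$n : the nth field of the current record; $1 is the first field, $2 the second, and so on.
$0 : the full text of the current record (line).
ARGC : the number of command-line arguments.
ARGIND : the index in ARGV of the file currently being processed (ARGV[0] is the command name; gawk only).
ARGV : the array of command-line arguments.
CONVFMT : the number-to-string conversion format (default %.6g).
ENVIRON : associative array of environment variables.
ERRNO : a description of the last system error (gawk only).
FIELDWIDTHS : a space-separated list of field widths, for fixed-width input (gawk only).
FILENAME : the name of the current input file.
NR : the number of records read so far; while a line is being processed this is its line number.
FNR : like NR, but counted from the start of the current file.
FS : the input field separator (default a single space, which means any run of spaces and tabs).
IGNORECASE : if true, matching ignores case, e.g. IGNORECASE=1 (gawk only).
NF : the number of fields in the current record; print $NF prints the last field of a line.
OFMT : the output format for numbers (default %.6g).
OFS : the output field separator (default a single space).
ORS : the output record separator (default a newline).
RS : the input record separator (default a newline).
RSTART : the position of the first character matched by the match function.
RLENGTH : the length of the string matched by the match function.
SUBSEP : the array subscript separator (default "\034").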

2.4 Operators

2.5 Regular expressions

3. Examples

The examples below work on the passwd file. Back it up first and run the commands against the copy (here /tmp/passwd).

[root@VM_0_10_centos shellScript]# awk -F ":" '{print $1}' /tmp/passwd
root
bin
# A comma between the items of print separates them with a space in the output. Writing $1 $3 with only a space between them concatenates the two, so the first and third columns run together
[root@VM_0_10_centos shellScript]# awk -F ":" '{print $1,$3}' /tmp/passwd
root 0
bin 1
[root@VM_0_10_centos shellScript]# awk -F ":" '{print $1 $3}' /tmp/passwd
root0
bin1
# Or put the space in explicitly
[root@VM_0_10_centos shellScript]# awk -F ":" '{print $1 " " $3}' /tmp/passwd
root 0
bin 1
# Print the fields separated by a tab
[root@VM_0_10_centos shellScript]# awk -F ":" '{print "user:"$1 "\tuid:"$3}' /tmp/passwd
user:root    uid:0
user:bin     uid:1
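The space that a comma produces is really the value of OFS, so the separator can be changed without touching the print statement. A small sketch on a single made-up passwd-style line:

```shell
line='root:x:0:0:root:/root:/bin/bash'

# Comma: the items are joined with OFS (a single space by default).
with_comma=$(printf '%s\n' "$line" | awk -F ':' '{ print $1, $3 }')

# No comma: the items are concatenated with nothing between them.
concatenated=$(printf '%s\n' "$line" | awk -F ':' '{ print $1 $3 }')

# Same print statement, but OFS is set to a tab.
tabbed=$(printf '%s\n' "$line" | awk -F ':' -v OFS='\t' '{ print $1, $3 }')

printf '%s\n' "$with_comma" "$concatenated" "$tabbed"
```

The three results are "root 0", "root0", and root and 0 separated by a tab.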

1) Show only lines 20 to 30 of the passwd file (which has fewer than 100 lines). (Interview question)

# No -F is given, so $1 is the first whitespace-separated field: lines that contain a space are cut at that space
[root@VM_0_10_centos shellScript]# awk '{if(NR>=20 && NR<=30) print $1}' /tmp/passwd
abrt:x:173:173::/etc/abrt:/sbin/nologin
sshd:x:74:74:Privilege-separated
postfix:x:89:89::/var/spool/postfix:/sbin/nologin
chrony:x:997:995::/var/lib/chrony:/sbin/nologin
tcpdump:x:72:72::/:/sbin/nologin
syslog:x:996:994::/home/syslog:/bin/false
mysql:x:27:27:MySQL
nagcmd:x:603:1000::/home/nagcmd:/sbin/nologin
nagios:x:604:1001::/home/nagios:/sbin/nologin
apache:x:1000:1002::/home/apache:/bin/bash
nginx:x:602:993:Nginx
# Without $1, lines 20-30 are printed in full
[root@VM_0_10_centos shellScript]# awk '{if(NR>=20 && NR<=30) print}' /tmp/passwd
abrt:x:173:173::/etc/abrt:/sbin/nologin
sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin
postfix:x:89:89::/var/spool/postfix:/sbin/nologin
chrony:x:997:995::/var/lib/chrony:/sbin/nologin
tcpdump:x:72:72::/:/sbin/nologin
syslog:x:996:994::/home/syslog:/bin/false
mysql:x:27:27:MySQL Server:/var/lib/mysql:/bin/false
nagcmd:x:603:1000::/home/nagcmd:/sbin/nologin
nagios:x:604:1001::/home/nagios:/sbin/nologin
apache:x:1000:1002::/home/apache:/bin/bash
nginx:x:602:993:Nginx web server:/var/lib/nginx:/sbin/nologin
[root@VM_0_10_centos shellScript]# nl /tmp/passwd | awk '{if(NR>=20 && NR<=30) print}'
    20    abrt:x:173:173::/etc/abrt:/sbin/nologin
    21    sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin
    22    postfix:x:89:89::/var/spool/postfix:/sbin/nologin
    23    chrony:x:997:995::/var/lib/chrony:/sbin/nologin
    24    tcpdump:x:72:72::/:/sbin/nologin
    25    syslog:x:996:994::/home/syslog:/bin/false
    26    mysql:x:27:27:MySQL Server:/var/lib/mysql:/bin/false
    27    nagcmd:x:603:1000::/home/nagcmd:/sbin/nologin
    28    nagios:x:604:1001::/home/nagios:/sbin/nologin
    29    apache:x:1000:1002::/home/apache:/bin/bash
    30    nginx:x:602:993:Nginx web server:/var/lib/nginx:/sbin/nologin
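The if inside the action can also be written as a pattern, which is the more idiomatic awk form. A sketch of three equivalent ways to select lines 20 to 30, using seq 100 as stand-in input rather than the real passwd file:

```shell
# Condition as a pattern; the default action prints the line.
by_condition=$(seq 100 | awk 'NR >= 20 && NR <= 30')

# Range pattern: from the line where NR == 20 to the line where NR == 30.
by_range=$(seq 100 | awk 'NR == 20, NR == 30')

# Stop reading once past line 30, which saves time on large files.
early_exit=$(seq 100 | awk 'NR > 30 { exit } NR >= 20')

printf '%s\n' "$by_condition"
```

All three variables hold the numbers 20 through 30, one per line.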

2) Given the test.txt shown below, print Poe and 33794712 in the form "Poe 33794712"

[root@VM_0_10_centos shellScript]# cat test.txt
I am Poe,my qq is 33794712

# Use several separators at once: the bracket expression [ ,] makes either a space or a comma a field separator
[root@VM_0_10_centos shellScript]# awk -F "[ ,]" '{print $3,$7}' test.txt
Poe 33794712
# Or, with + so that a run of separators counts as one
[root@VM_0_10_centos shellScript]# awk -F '[ ,]+' '{print $3,$7}' test.txt
Poe 33794712
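The two separators give the same result on this file, but they differ as soon as two separator characters sit next to each other. A sketch using a made-up variant of the line with a space after the comma:

```shell
text='I am Poe, my qq is 33794712'

# [ ,] splits at every single space or comma, so ", " yields an empty field
# and everything after it shifts one position to the right.
single=$(printf '%s\n' "$text" | awk -F '[ ,]' '{ print $3, $7 }')

# [ ,]+ treats the whole run ", " as one separator.
repeated=$(printf '%s\n' "$text" | awk -F '[ ,]+' '{ print $3, $7 }')

printf '%s\n' "$single" "$repeated"
```

Here the first command prints "Poe is" and the second prints "Poe 33794712".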

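3) Setting the field separator with -F

# -F is equivalent to setting the built-in variable FS
# test.txt now holds the four lines listed under 7) below; only the last line contains commas
[root@VM_0_10_centos shellScript]# awk 'BEGIN{FS=","} {print $1,$2}' test.txt
2 this is a test
3 Are you like awk
This's a test
10 There are orange apple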

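4) Setting variables with -v

[root@VM_0_10_centos shellScript]# cat test.txt
I am Poe,my qq is 33794712
2
7
# -vx=12 sets x to 12 (-v x=12, with a space, is the portable spelling). A non-numeric $1 such as "I" counts as 0 in arithmetic
[root@VM_0_10_centos shellScript]# awk -vx=12 '{print $1,$1+x}' test.txt
I 12
2 14
7 19
# $(1+x) is field 13, which is empty on every line, so only $1 is visible
[root@VM_0_10_centos shellScript]# awk -vx=12 '{print $1,$(1+x)}' test.txt
I
2
7
# $1y concatenates $1 with the value of y
[root@VM_0_10_centos shellScript]# awk -vx=12 -vy=i '{print $1,$1+x,$1y}' test.txt
I 12 Ii
2 14 2i
7 19 7i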
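Because the awk program is normally wrapped in single quotes, shell variables are not expanded inside it; -v is the usual way to hand them over. A minimal sketch, where threshold, column, limit and col are names invented for the example:

```shell
threshold=2
column=2

# Pass a shell value in as the awk variable "limit" and use it in a pattern.
above=$(printf '1\n2\n3\n4\n' | awk -v limit="$threshold" '$1 > limit')

# A variable can also select which field to print: $col is field number col.
picked=$(printf 'a b c\nd e f\n' | awk -v col="$column" '{ print $col }')

printf '%s\n' "$above" "$picked"
```

The first command prints 3 and 4; the second prints b and e.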

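5) Formatted output

# %-8s prints a string in a field at least 8 characters wide. The - left-justifies it, so the padding goes on the right; without the -, the padding goes on the left. \n ends the line
# Two values are printed here ($1 and $4), so the format string has two conversions: %-8s and %-10s
[root@VM_0_10_centos shellScript]# awk '{printf "%-8s %-10s\n",$1,$4}' test.txt
I        qq
2
7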
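A short sketch of how width, justification and precision interact in printf. The values are arbitrary, and the | characters are there only to make the padding visible:

```shell
formatted=$(awk 'BEGIN {
    # %-8s  left-justified string, width 8
    # %8s   right-justified string, width 8
    # %5d   integer, width 5
    # %6.2f floating point, width 6, two decimals
    printf "%-8s|%8s|%5d|%6.2f\n", "Tom", "Tom", 42, 3.14159
}')

printf '%s\n' "$formatted"
```

This prints `Tom     |     Tom|   42|  3.14`.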

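6) Filtering: first column greater than 2, first column equal to 2, and first column greater than 2 with second column equal to 'Are'

[root@VM_0_10_centos shellScript]# awk '$1>2 {print $1,$3}' test.txt
3 you
This's test
10 are
# This's is not a number, so it is compared with 2 as a string and sorts after it, which is why that line is printed
# Note that a single = cannot be used for the test: it is an assignment. Use == to test for equality
[root@VM_0_10_centos shellScript]# awk '$1==2 {print $1,$3}' test.txt
2 is
# With a single =, every $1 is overwritten with 2 and the pattern is always true
[root@VM_0_10_centos shellScript]# awk '$1=2 {print $1,$3}' test.txt
2 is
2 you
2 test
2 are
[root@VM_0_10_centos shellScript]# awk '$1>2 && $2=="Are" {print $1,$3}' test.txt
3 you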
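The This's line appears in the $1>2 output because a field that does not look like a number is compared as a string. A sketch of the difference, and of forcing a numeric comparison by adding 0; the input values are made up:

```shell
# "abc" is not numeric, so $1 > 2 compares the strings "abc" and "2".
mixed=$(printf '3\nabc\n10\n1\n' | awk '$1 > 2')

# Adding 0 converts the field to a number first ("abc" becomes 0).
numeric=$(printf '3\nabc\n10\n1\n' | awk '$1 + 0 > 2')

printf '%s\n' "$mixed" "$numeric"
```

The first command keeps 3, abc and 10; the second keeps only 3 and 10.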

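7) Using the built-in variables

NF: number of fields    NR: record (line) number    FNR: record number within the current file

FS: input field separator    RS: input record separator (default \n)    FILENAME: name of the current file

OFS: output field separator (default a single space), written between the comma-separated items of print

ORS: output record separator (default a newline)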

[root@VM_0_10_centos shellScript]# cat test.txt
2 this is a test
3 Are you like awk
This's a test
10 There are orange,apple,mongo
# FS and OFS are a space and ORS and RS are a newline here, so those columns show up only as whitespace
[root@VM_0_10_centos shellScript]# awk 'BEGIN{printf "%4s %4s %4s %4s %4s %4s %4s %4s %4s\n","FILENAME","ARGC","FNR","FS", "NF","NR","OFS","ORS","RS";printf "---------------------------------------------\n"} {printf "%4s %4s %4s %4s %4s %4s %4s %4s %4s\n",FILENAME,ARGC,FNR,FS,NF,NR,OFS,ORS,RS}' test.txt
FILENAME ARGC FNR FS NF NR OFS ORS RS
---------------------------------------------
test.txt 2    1         5    1
test.txt 2    2         5    2
test.txt 2    3         3    3
test.txt 2    4         4    4
[root@VM_0_10_centos shellScript]# awk -F "," 'BEGIN{printf "%4s %4s %4s %4s %4s %4s %4s %4s %4s\n","FILENAME","ARGC","FNR","FS","NF","NR","OFS","ORS","RS";printf "---------------------------------------------\n"} {printf "%4s %4s %4s %4s %4s %4s %4s %4s %4s\n",FILENAME,ARGC,FNR,FS,NF,NR,OFS,ORS,RS}' test.txt
FILENAME ARGC FNR FS NF NR OFS ORS RS
---------------------------------------------
test.txt 2    1    ,    1    1
test.txt 2    2    ,    1    2
test.txt 2    3    ,    1    3
test.txt 2    4    ,    3    4

8) Printing the record number (NR) and the per-file record number (FNR)

# $0 prints the whole line
[root@VM_0_10_centos shellScript]# awk '{print NR,FNR,$0}' test.txt
1 1 2 this is a test
2 2 3 Are you like awk
3 3 This's a test
4 4 10 There are orange,apple,mongo
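With a single input file NR and FNR are always equal; the difference only shows with two or more files. A sketch with two small made-up files, one.txt and two.txt:

```shell
tmpdir=$(mktemp -d)
printf 'a\nb\n' > "$tmpdir/one.txt"
printf 'c\nd\ne\n' > "$tmpdir/two.txt"

# NR keeps counting across files; FNR restarts at 1 for each file.
nr_fnr=$(cd "$tmpdir" && awk '{ print FILENAME, NR, FNR, $0 }' one.txt two.txt)

printf '%s\n' "$nr_fnr"
rm -rf "$tmpdir"
```

The lines of two.txt are reported with NR 3, 4, 5 but FNR 1, 2, 3.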

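9) Setting the output separator

# OFS is set here to " $ ", with a space on each side of the $. The spaces are optional; without them the output is more compact
# Lines 3 and 4 have no fifth field, so nothing follows the last separator
[root@VM_0_10_centos shellScript]# awk '{print $1,$2,$5}' OFS=" $ " test.txt
2 $ this $ test
3 $ Are $ awk
This's $ a $ 
10 $ There $ 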
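OFS is only applied when awk builds the output from separate items; a bare print of an unmodified record keeps the original separators. A sketch of the common $1 = $1 idiom that forces the record to be rebuilt, on a made-up input line:

```shell
# print with no arguments outputs $0 unchanged, so OFS is not used.
unchanged=$(echo 'a:b:c' | awk -F ':' -v OFS='-' '{ print }')

# Assigning to any field makes awk rebuild $0 using OFS.
rebuilt=$(echo 'a:b:c' | awk -F ':' -v OFS='-' '{ $1 = $1; print }')

# Comma-separated print items are always joined with OFS.
joined=$(echo 'a:b:c' | awk -F ':' -v OFS='-' '{ print $1, $3 }')

printf '%s\n' "$unchanged" "$rebuilt" "$joined"
```

The results are a:b:c, a-b-c and a-c.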

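10) Regular-expression matching: select the lines whose second column contains "th" and print the second and fourth columns

~ is the match operator: $2 ~ /th/ is true when $2 matches the pattern between the slashes. !~ is its negation and is true when the field does not match.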

[root@VM_0_10_centos shellScript]# awk '$2 ~ /th/ {print $2,$4}' test.txt 
this a
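Matching is case-sensitive, which is why the line whose second column is There is not selected above. A sketch of ~, !~ and a portable case-insensitive match with tolower, run on a temporary copy of the four-line test.txt:

```shell
data=$(mktemp)
cat > "$data" <<'EOF'
2 this is a test
3 Are you like awk
This's a test
10 There are orange,apple,mongo
EOF

# Second column contains "th".
matching=$(awk '$2 ~ /th/ { print $2 }' "$data")

# Second column does not contain "th".
not_matching=$(awk '$2 !~ /th/ { print $2 }' "$data")

# Lower-case the field first to ignore case.
any_case=$(awk 'tolower($2) ~ /th/ { print $2 }' "$data")

printf '%s\n' "$matching" "$not_matching" "$any_case"
rm -f "$data"
```

The first command prints this; the second prints Are, a and There; the third prints this and There.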

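4. awk script mode

For each input line, awk runs every code block in the script once. In many programs, however, some initialisation has to happen before awk starts on the text of the input file. For that case awk lets you define a BEGIN block.

Because awk runs the BEGIN block before it starts processing the input, it is the natural place to set FS (the field separator), print a header, or initialise global variables that the program uses later.
awk also provides another special block, the END block. awk runs it after all lines of the input have been processed. The END block is typically used for final calculations or for printing a summary at the end of the output.

Format:

BEGIN { statements run before any input is read }
END { statements run after all lines have been processed }
{ statements run for each line }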
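The three kinds of block can be sketched in a few lines: a header from BEGIN, a running total from the main block, and a summary from END. The numbers are arbitrary sample input.

```shell
summary=$(printf '10\n20\n30\n' | awk '
BEGIN { print "value" }                      # runs once, before any input
      { sum += $1; print $1 }                # runs for every line
END   { if (NR > 0)                          # runs once, after the last line
            printf "sum=%d avg=%.1f\n", sum, sum / NR }')

printf '%s\n' "$summary"
```

The output is the header, the three values, and a final line reading sum=60 avg=20.0. The NR > 0 check avoids a division by zero on empty input.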

Examples:

1) Running a script and viewing its output

[root@VM_0_10_centos shellScript]# cat score.txt
Marry 2143 78 84 77
Jack 2321 66 78 45
Tom 2122 48 77 71
Mike 2537 87 97 95
Bob 2415 40 57 62
[root@VM_0_10_centos shellScript]# cat awk_score.awk
#!/bin/awk -f
# Note the interpreter line above: it is awk -f, not bash
# Before processing: print the header
BEGIN {
    Chinese = 0
    Math = 0
    English = 0
    printf "NAME NO. Chinese Math English TOTAL\n"
    printf "-------------------------------------------------------------\n"
}
# During processing: accumulate the totals and print one row per student
{
    Chinese += $3
    Math += $4
    English += $5
    printf "%-8s %-8s %6d %10d %10d %12d\n",$1,$2,$3,$4,$5, $3+$4+$5
}
# After processing: print the totals and averages
END {
    printf "-------------------------------------------------------------\n"
    printf " TOTAL:%16d %10d %10d \n",Chinese,Math,English
    printf "AVERAGE:%16.2f %10.2f %10.2f\n",Chinese/NR,Math/NR,English/NR
}
[root@VM_0_10_centos shellScript]# awk -f awk_score.awk score.txt
NAME NO. Chinese Math English TOTAL
-------------------------------------------------------------
Marry 2143         78         84         77          239
Jack 2321         66         78         45          189
Tom 2122         48         77         71          196
Mike 2537         87         97         95          279
Bob 2415         40         57         62          159
-------------------------------------------------------------
TOTAL: 319        393        350
AVERAGE: 63.80      78.60      70.00

2) Summing file sizes

[root@VM_0_10_centos shellScript]# ls -l *.sh
-rwxr-xr-x 1 root root  675 Oct  8 14:36 addUser.sh
-rwxr-xr-x 1 root root 1148 Oct 10 09:34 autoCreateUser.sh
-rwxr-xr-x 1 root root  559 Oct  9 08:46 checkMem.sh
-rwxr-xr-x 1 root root  338 Oct  9 08:58 checkRoot.sh
-rwxr-xr-x 1 root root  574 Oct 10 10:28 createUsers.sh
-rwxr-xr-x 1 root root  425 Oct 10 10:22 delUsers.sh
-rwxr-xr-x 1 root root  628 Oct 14 09:16 modifyExtension.sh
-rwxr-xr-x 1 root root  121 Oct 12 18:19 mulTable.sh
-rwxr-xr-x 1 root root  844 Oct 10 10:56 numSort.sh
-rwxr-xr-x 1 root root  518 Oct 12 17:41 progressBar2.sh
-rwxr-xr-x 1 root root  784 Oct 12 16:38 progressBar.sh
-rwxr-xr-x 1 root root  213 Oct 14 15:54 randowName.sh
-rwxr-xr-x 1 root root  239 Oct 14 16:05 sum.sh
-rwxr-xr-x 1 root root   33 Oct  8 14:50 test.sh
[root@VM_0_10_centos shellScript]# ls -l *.sh | awk '{sum+=$5} END {print sum}'
7099

3) Printing the 9x9 multiplication table

[root@VM_0_10_centos shellScript]# seq 9 | sed 'H;g' | awk -v RS='' '{for(i=1;i<=NF;i++)printf("%dx%d=%d%s", i, NR, i*NR, i==NR?"\n":"\t")}'
1x1=1
1x2=2    2x2=4
1x3=3    2x3=6    3x3=9
1x4=4    2x4=8    3x4=12    4x4=16
1x5=5    2x5=10    3x5=15    4x5=20    5x5=25
1x6=6    2x6=12    3x6=18    4x6=24    5x6=30    6x6=36
1x7=7    2x7=14    3x7=21    4x7=28    5x7=35    6x7=42    7x7=49
1x8=8    2x8=16    3x8=24    4x8=32    5x8=40    6x8=48    7x8=56    8x8=64
1x9=9    2x9=18    3x9=27    4x9=36    5x9=45    6x9=54    7x9=63    8x9=72    9x9=81
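In the pipeline above, sed 'H;g' appends each line to its hold space and prints everything collected so far, so after the nth line awk receives a blank line followed by 1 to n. With RS='' awk treats each blank-line-separated block as one record, which means record NR has exactly NR fields. The same table can also be produced by awk alone with two nested loops; a sketch, limited to three rows to keep the output short:

```shell
table=$(awk 'BEGIN {
    for (row = 1; row <= 3; row++)
        for (col = 1; col <= row; col++)
            # End the line after the last column of a row, otherwise add a tab.
            printf "%dx%d=%d%s", col, row, col * row, (col == row ? "\n" : "\t")
}')

printf '%s\n' "$table"
```

Changing the limit in the outer loop from 3 to 9 gives the full table.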

4) Counting the accounts in passwd

[root@VM_0_10_centos shellScript]# awk '{count++;} END{print "USER Total:" count}' /tmp/passwd
USER Total:35
[root@VM_0_10_centos shellScript]# awk 'BEGIN {COUNT=0;print "[start]:" COUNT} {COUNT++;} END{print "USER Total:" COUNT }' /tmp/passwd
[start]:0
USER Total:35
[root@VM_0_10_centos shellScript]# awk 'BEGIN {COUNT=0;print "[start]:" COUNT} {COUNT+=1;} END{print "USER Total:" COUNT }' /tmp/passwd
[start]:0
USER Total:35

5) Getting the size of a file in bytes

[root@VM_0_10_centos shellScript]# ll users.txt | awk 'BEGIN {SIZE=0} {SIZE=$5+SIZE} END {print "[end] SIZE:" SIZE}'
[end] SIZE:68
# Or, converted to megabytes
[root@VM_0_10_centos shellScript]# ll numSort.sh | awk 'BEGIN {SIZE=0} {SIZE=$5+SIZE} END {print "[end] SIZE:" SIZE/1024/1024 ,"M"}'
[end] SIZE:0.000804901 M

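Note on unit conversion: 1 KB = 1024 bytes and 1 MB = 1024 KB, which is why the second command divides by 1024 twice.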

6) A classic: counting server connections by state

[root@VM_0_10_centos shellScript]# netstat -an|awk '/^tcp/ {++s[$NF]} END{for(a in s)print a,s[a]}'
LISTEN 7
ESTABLISHED 3
TIME_WAIT 3
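The command relies on an associative array indexed by the last field ($NF), which is the connection state. A sketch of the same counting idiom on a few invented netstat-style lines, so that it runs without a live network. The order of a for (key in array) loop is unspecified, so the result is piped through sort here.

```shell
states=$(printf '%s\n' \
    'tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN' \
    'tcp 0 0 10.0.0.5:22 10.0.0.9:51000 ESTABLISHED' \
    'udp 0 0 0.0.0.0:68 0.0.0.0:*' \
    'tcp 0 0 0.0.0.0:80 0.0.0.0:* LISTEN' |
  awk '/^tcp/ { ++count[$NF] }                 # one counter per state
       END    { for (state in count) print state, count[state] }' |
  sort)

printf '%s\n' "$states"
```

The udp line is skipped by the /^tcp/ pattern, leaving ESTABLISHED 1 and LISTEN 2.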

7) Summarising access-log traffic

awk '{a[$7]+=$10;++b[$7];total+=$10}END{for(x in a)print b[x],x,a[x]|"sort -rn -k1";print "total size is :"total}' /app/log/access_log
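One detail of this command is easy to miss: lines piped to sort only appear once the pipe is closed, which by default happens when awk exits, so the total can be printed before the sorted lines. A minimal sketch of one way to control the order, assuming the common Apache/nginx access-log layout in which field 7 is the request path and field 10 the response size. The three log lines are invented sample data.

```shell
log=$(mktemp)
cat > "$log" <<'EOF'
10.0.0.1 - - [28/Oct/2019:15:40:01 +0800] "GET /index.html HTTP/1.1" 200 100
10.0.0.2 - - [28/Oct/2019:15:40:02 +0800] "GET /about.html HTTP/1.1" 200 50
10.0.0.3 - - [28/Oct/2019:15:40:03 +0800] "GET /index.html HTTP/1.1" 200 300
EOF

report=$(awk '
{ bytes[$7] += $10; hits[$7]++; total += $10 }
END {
    sorter = "sort -rn -k1"
    for (path in bytes) print hits[path], path, bytes[path] | sorter
    close(sorter)                       # wait for sort to finish first
    print "total size is: " total
}' "$log")

printf '%s\n' "$report"
rm -f "$log"
```

Each output line shows the hit count, the path and the bytes served, with the busiest path first and the total on the last line.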
