Using grep and sed on Linux

Reposted from the original article (link below), 2015-11-30 14:50. Tags: grep, sed

http://www.cnblogs.com/zhuyp1015/archive/2012/07/01/2572289.html

I had heard that sed and awk are quite powerful, so I set out to study them.

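Using these shell tools requires some knowledge of regular expressions. First, here is what some special symbols (the POSIX character classes) mean: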

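Special symbol	Meaning
[:alnum:]	Any English letter (upper or lower case) or digit, i.e. 0-9, A-Z, a-z
[:alpha:]	Any English letter, upper or lower case, i.e. A-Z, a-z
[:blank:]	The space and [Tab] characters
[:cntrl:]	Control characters, including CR, LF, Tab, Del, etc.
[:digit:]	Digits only, i.e. 0-9
[:graph:]	All printable characters except whitespace (space and [Tab])
[:lower:]	Lowercase letters, i.e. a-z
[:print:]	Any printable character
[:punct:]	Punctuation symbols, i.e. " ' ? ! ; : # $ ...
[:upper:]	Uppercase letters, i.e. A-Z
[:space:]	Any character that produces whitespace, including space, [Tab], CR, etc.
[:xdigit:]	Hexadecimal digits, i.e. the digits and letters 0-9, A-F, a-f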
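In a pattern, a class name is only special inside a bracket expression, so [:digit:] is written as [[:digit:]]. A minimal sketch, assuming a Linux shell with grep and a hypothetical four-line excerpt of the practice file; LC_ALL=C is set so the demo does not depend on your locale:

```shell
#!/bin/sh
# Sketch: POSIX character classes inside bracket expressions.
# The sample lines are a hypothetical excerpt of regular_express.txt.
export LC_ALL=C                      # assume plain ASCII behaviour for the demo
f=$(mktemp)
cat > "$f" <<'EOF'
apple is my favorite food.
However, this dress is about $ 3183 dollars.
Oh!     My god!
# I am VBird
EOF

# A class name is only special inside [...], hence the double brackets.
digits=$(grep -n '[[:digit:]]' "$f")                                    # lines with a digit
upper_start=$(grep -n '^[[:upper:]]' "$f" | cut -d: -f1 | tr '\n' ' ')  # lines starting uppercase
punct_start=$(grep -c '^[[:punct:]]' "$f")                              # lines starting with punctuation
rm -f "$f"

printf '%s\n' "$digits" "$upper_start" "$punct_start"
```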

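Let's practice on examples, using the following text (regular_express.txt). The ^M at the end of some lines stands for a DOS-style carriage return (CR):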

"Open Source" is a good mechanism to develop programs.
apple is my favorite food.
Football game is not use feet only.
this dress doesn't fit me.
However, this dress is about $ 3183 dollars.^M
GNU is free air not free beer.^M
Her hair is very beauty.^M
I can't finish the test.^M
Oh! The soup taste good.^M
motorcycle is cheap than car.
This window is clear.
the symbol '*' is represented as start.
Oh!     My god!
The gd software is a library for drafting programs.^M
You are the best is mean you are the no. 1.
The world <Happy> is the same with "glad".
I like dog.
google is the best tools for search keyword.
goooooogle yes!
go! go! Let's go.
# I am VBird


Search for a specific string:

[root@www ~]# grep -n 'the' regular_express.txt
8:I can't finish the test.
12:the symbol '*' is represented as start.
15:You are the best is mean you are the no. 1.
16:The world <Happy> is the same with "glad".
18:google is the best tools for search keyword.

To invert the match, use the -v option:

[root@www ~]# grep -vn 'the' regular_express.txt

All lines containing 'the' are then omitted from the output.

To ignore case, use the -i option:

[root@www ~]# grep -in 'the' regular_express.txt
8:I can't finish the test.
9:Oh! The soup taste good.
12:the symbol '*' is represented as start.
14:The gd software is a library for drafting programs.
15:You are the best is mean you are the no. 1.
16:The world <Happy> is the same with "glad".
18:google is the best tools for search keyword.
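The three options above can be compared side by side. This is a small sketch on a hypothetical three-line excerpt of the practice file; it only uses the -n, -v and -i options shown above:

```shell
#!/bin/sh
# Sketch of the grep options above (-n, -v, -i) on a hypothetical three-line excerpt.
f=$(mktemp)
cat > "$f" <<'EOF'
I can't finish the test.
Oh! The soup taste good.
I like dog.
EOF

plain=$(grep -n 'the' "$f")        # case-sensitive: only line 1 has a lowercase "the"
inverted=$(grep -vn 'the' "$f")    # -v: every line that does NOT contain "the"
nocase=$(grep -in 'the' "$f" | cut -d: -f1 | tr '\n' ' ')   # -i: "The" counts too
rm -f "$f"

printf '%s\n---\n%s\n---\n%s\n' "$plain" "$inverted" "$nocase"
```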

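Use square brackets [] to search for a set of characters. A bracket expression matches exactly one character, so t[ae]st matches either 'test' or 'tast':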

[root@www ~]# grep -n 't[ae]st' regular_express.txt
8:I can't finish the test.
9:Oh! The soup taste good.
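To see what the bracket expression actually matched, grep's -o option prints only the matching text. A sketch, assuming GNU or BSD grep; the third line is a hypothetical addition showing that "taest" is not matched, since [ae] stands for a single character:

```shell
#!/bin/sh
# Sketch: a bracket expression matches exactly ONE character from the set.
# Line 3 is hypothetical, added to show that "taest" does not match t[ae]st.
f=$(mktemp)
cat > "$f" <<'EOF'
I can't finish the test.
Oh! The soup taste good.
A taest does not match.
EOF

matches=$(grep -o 't[ae]st' "$f")   # -o prints only the matched text, one match per line
count=$(grep -c 't[ae]st' "$f")     # number of matching lines
rm -f "$f"

printf '%s\n%s\n' "$matches" "$count"
```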

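To match a string but exclude cases where it is preceded by certain characters (match oo, but not when the character before it is g). A line is still listed if any other oo in it qualifies, which is why lines 18 and 19 appear: 'tools' contains 'too', and in 'goooooogle' some oo pairs are preceded by another o: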

[root@www ~]# grep -n '[^g]oo' regular_express.txt
2:apple is my favorite food.
3:Football game is not use feet only.
18:google is the best tools for search keyword.
19:goooooogle yes!
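Printing only the matched text makes it clear why google-like lines still show up. A sketch, assuming GNU grep; the third line ("google" on its own) is a hypothetical addition that has no qualifying oo and is therefore not listed:

```shell
#!/bin/sh
# Sketch: which part of each line satisfies [^g]oo?
f=$(mktemp)
cat > "$f" <<'EOF'
google is the best tools for search keyword.
goooooogle yes!
google
EOF

lines=$(grep -n '[^g]oo' "$f" | cut -d: -f1 | tr '\n' ' ')   # numbers of matching lines
m1=$(sed -n '1p' "$f" | grep -o '[^g]oo')                    # match on line 1: from "tools"
m2=$(sed -n '2p' "$f" | grep -o '[^g]oo' | sed -n '1p')      # first match on line 2: o + oo
rm -f "$f"

printf '%s\n%s\n%s\n' "$lines" "$m1" "$m2"
```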

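On my machine (Ubuntu 11.04), however, the following command did not match. The likely cause is the locale: in locales such as en_US.UTF-8, older grep versions interpret a range like [a-z] by the locale's collation order (aAbB...zZ), so it also covers most uppercase letters, including the F of Football. Running the command with LC_ALL=C restores plain ASCII ranges: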

[root@www ~]# grep -n '[^a-z]oo' regular_express.txt
3:Football game is not use feet only. # expected output, but nothing was printed

Switching to the [:lower:] class instead worked:

[root@www ~]# grep -n '[^[:lower:]]oo' regular_express.txt
3:Football game is not use feet only.
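The two commands can be compared under the C locale, where [a-z] is exactly the 26 ASCII lowercase letters. A sketch on a hypothetical three-line excerpt; the locale-dependent behaviour seen on Ubuntu 11.04 is deliberately not reproduced, since it varies with grep version and locale:

```shell
#!/bin/sh
# Sketch: compare a range with a POSIX class under the C locale.
f=$(mktemp)
cat > "$f" <<'EOF'
apple is my favorite food.
Football game is not use feet only.
google is the best tools for search keyword.
EOF

by_range=$(LC_ALL=C grep -n '[^a-z]oo' "$f")        # ASCII range, C locale
by_class=$(LC_ALL=C grep -n '[^[:lower:]]oo' "$f")  # class: "lowercase letter" in any locale
# Without LC_ALL=C the first result depends on your grep version and locale.
rm -f "$f"

printf '%s\n%s\n' "$by_range" "$by_class"
```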

Line-start and line-end anchors: ^ and $

[root@www ~]# grep -n '^the' regular_express.txt
12:the symbol '*' is represented as start.

Match lines that start with a lowercase letter:

[root@www ~]# grep -n '^[a-z]' regular_express.txt
2:apple is my favorite food.
4:this dress doesn't fit me.
10:motorcycle is cheap than car.
12:the symbol '*' is represented as start.
18:google is the best tools for search keyword.
19:goooooogle yes!
20:go! go! Let's go.

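Match lines that do not start with an English letter. (I was puzzled that [^a-zA-Z] works here while [^a-z] above did not; presumably listing both ranges covers every letter whatever the locale's collation order.)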

[root@www ~]# grep -n '^[^a-zA-Z]' regular_express.txt
1:"Open Source" is a good mechanism to develop programs.
21:# I am VBird

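Note: the ^ symbol means different things inside and outside a bracket expression []! As the first character inside [] it means "negate the set"; outside [] it anchors the match at the start of the line!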
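The two meanings of ^ can be seen side by side. A sketch on a hypothetical four-line excerpt, with LC_ALL=C assumed so the ranges are plain ASCII; nums is a hypothetical helper that keeps only grep's line numbers:

```shell
#!/bin/sh
# Sketch: ^ outside brackets anchors to line start; as the first character
# inside brackets it negates the set. Sample lines are a hypothetical excerpt.
export LC_ALL=C
f=$(mktemp)
cat > "$f" <<'EOF'
"Open Source" is a good mechanism to develop programs.
apple is my favorite food.
the symbol '*' is represented as start.
# I am VBird
EOF

nums() { cut -d: -f1 | tr '\n' ' '; }            # keep only the line numbers from grep -n
starts_the=$(grep -n '^the' "$f" | nums)          # "the" at the start of a line
lower_start=$(grep -n '^[a-z]' "$f" | nums)       # first character is a lowercase letter
not_letter=$(grep -n '^[^a-zA-Z]' "$f" | nums)    # first character is NOT a letter
any_not_letter=$(grep -c '[^a-zA-Z]' "$f")        # no anchor: any non-letter anywhere (e.g. a space)
rm -f "$f"

printf '%s\n' "$starts_the" "$lower_start" "$not_letter" "$any_not_letter"
```

Without the leading anchor, [^a-zA-Z] matches every line here, because each one contains at least a space.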

Match empty lines:

[root@www ~]# grep -n '^$' regular_express.txt
22:
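Since ^$ means "nothing between line start and line end", combining it with -v is a common way to drop empty lines. A sketch on a hypothetical excerpt that also drops # comment lines using the ^ anchor:

```shell
#!/bin/sh
# Sketch: ^$ matches a line with nothing between start and end, i.e. an empty line.
f=$(mktemp)
cat > "$f" <<'EOF'
# I am VBird

go! go! Let's go.
EOF

empty=$(grep -n '^$' "$f")                     # line number of the empty line
content=$(grep -v '^$' "$f" | grep -v '^#')    # neither empty nor a comment
rm -f "$f"
# Caveat: a line holding only spaces, or only a DOS carriage return (^M),
# is not empty and is not matched by ^$.

printf '%s\n%s\n' "$empty" "$content"
```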

Any single character '.' and the repetition character '*'

[root@www ~]# grep -n 'goo*g' regular_express.txt
18:google is the best tools for search keyword.
19:goooooogle yes!

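In a regular expression, '.' matches exactly one arbitrary character, and '*' repeats the preceding RE character zero or more times. So 'goo*g' means g, o, zero or more further o's, then g, and '.*' matches any string, including an empty one.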
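A sketch contrasting '.', '*' and '.*' on a hypothetical four-line excerpt; nums is a hypothetical helper that keeps only grep's line numbers. The last command shows why '*' alone is rarely useful: zero repetitions always match.

```shell
#!/bin/sh
# Sketch: "." is exactly one character; "*" repeats the preceding RE character 0+ times.
f=$(mktemp)
cat > "$f" <<'EOF'
"Open Source" is a good mechanism to develop programs.
I like dog.
google is the best tools for search keyword.
goooooogle yes!
EOF

nums() { cut -d: -f1 | tr '\n' ' '; }
dot=$(grep -n 'e.e' "$f" | nums)        # exactly one character between two e's ("eve" in "develop")
star=$(grep -n 'goo*g' "$f" | nums)     # g, o, zero or more further o's, g
dotstar=$(grep -n 'g.*g' "$f" | nums)   # a g, then anything (even nothing), then another g
every=$(grep -c 'o*' "$f")              # o* can match zero o's, so every line matches
rm -f "$f"

printf '%s\n' "$dot" "$star" "$dotstar" "$every"
```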

Limiting the number of repetitions: {} (in basic regular expressions the braces must be escaped as \{ and \})

[root@www ~]# grep -n 'go\{2,5\}g' regular_express.txt
18:google is the best tools for search keyword.

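The expression means: g, then 2 to 5 o's, then g. google matches with two o's, while goooooogle has more than five o's and is not listed.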

If the second number is omitted, it matches the first number of repetitions or more:

[root@www ~]# grep -n 'go\{2,\}g' regular_express.txt
18:google is the best tools for search keyword.
19:goooooogle yes!
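The three interval forms \{n,m\}, \{n,\} and \{n\} can be compared directly. A sketch on a hypothetical excerpt; nums is a hypothetical helper that keeps only grep's line numbers:

```shell
#!/bin/sh
# Sketch of interval expressions in basic regular expressions (braces escaped).
f=$(mktemp)
cat > "$f" <<'EOF'
google is the best tools for search keyword.
goooooogle yes!
go! go! Let's go.
EOF

nums() { cut -d: -f1 | tr '\n' ' '; }
two_to_five=$(grep -n 'go\{2,5\}g' "$f" | nums)   # 2 to 5 o's between the g's
two_or_more=$(grep -n 'go\{2,\}g' "$f" | nums)    # 2 or more o's
exactly_two=$(grep -n 'go\{2\}g' "$f" | nums)     # exactly 2 o's
rm -f "$f"

printf '%s\n' "$two_to_five" "$two_or_more" "$exactly_two"
```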

The basic regular expression special characters are summarized below:

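RE character	Meaning and example
^word	Meaning: the string to search for (word) is at the start of the line. Example: find lines beginning with # and print their line numbers: grep -n '^#' regular_express.txt
word$	Meaning: the string to search for (word) is at the end of the line. Example: print lines ending with ! along with their line numbers: grep -n '!$' regular_express.txt
.	Meaning: matches exactly one arbitrary character. Example: the matched string can be (eve), (eae), (eee) or (e e), but not just (ee); there must be exactly one character between the two e's, and a space counts as a character: grep -n 'e.e' regular_express.txt
\	Meaning: escape character; removes the special meaning of the following symbol. Example: find lines containing a single quote ': grep -n \' regular_express.txt
*	Meaning: repeats the preceding RE character zero or more times. Example: find strings such as (es), (ess), (esss); because * allows zero repetitions, es also matches. Since * repeats the preceding RE character, it must directly follow one; for example, "any string" is written .*: grep -n 'ess*' regular_express.txt
[list]	Meaning: a bracket expression listing the characters to match. Example: find lines containing (gl) or (gd). Note that [] stands for only one character; for example, a[afl]y matches aay, afy or aly, because [afl] means a, f or l: grep -n 'g[ld]' regular_express.txt
[n1-n2]	Meaning: a bracket expression listing a range of characters to match. Example: [0-9] matches any digit, and all uppercase letters are [A-Z]. Note that a hyphen - inside [] is special: it means every character between the two given characters. Which characters lie "between" depends on the encoding and locale, so make sure yours is set correctly (in bash, check the LANG and LANGUAGE variables). Find lines containing an uppercase letter: grep -n '[A-Z]' regular_express.txt
[^list]	Meaning: a bracket expression listing characters or ranges NOT to match. Example: the string can be (oog) or (ood) but not (oot); inside [], a leading ^ means "negate". For example, "not an uppercase letter" is [^A-Z]. Be careful: grep -n '[^A-Z]' regular_express.txt lists every line in the file, because [^A-Z] means "one character that is not uppercase", and every line has one (the first line's "Open Source" alone has p, e, n, o ...): grep -n 'oo[^t]' regular_express.txt
\{n,m\}	Meaning: n to m consecutive occurrences of the preceding RE character; \{n\} means exactly n occurrences, and \{n,\} means n or more. Example: strings with 2 to 3 o's between two g's, i.e. (goog) or (gooog): grep -n 'go\{2,3\}g' regular_express.txt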
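Two rows of the table are easy to trip over: [^A-Z] matches almost every line, and the escaped quote. A sketch on a hypothetical three-line sample (LC_ALL=C assumed); anchoring the negated idea as ^[A-Z]*$ is one way to ask for lines made only of uppercase letters:

```shell
#!/bin/sh
# Sketch: the [^A-Z] pitfall and escaping a single quote. Hypothetical sample.
export LC_ALL=C
f=$(mktemp)
cat > "$f" <<'EOF'
"Open Source" is a good mechanism to develop programs.
I can't finish the test.
GNU
EOF

not_upper=$(grep -c '[^A-Z]' "$f")                     # lines with at least one non-uppercase char
only_upper=$(grep -n '^[A-Z]*$' "$f" | cut -d: -f1)    # lines made only of A-Z (an empty line would match too)
quote=$(grep -n \' "$f" | cut -d: -f1)                 # the shell passes a bare ' to grep
rm -f "$f"

printf '%s\n%s\n%s\n' "$not_upper" "$only_upper" "$quote"
```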

The sed tool:

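[root@www ~]# sed [-nefr] [action]
Options and parameters:
-n : silent mode. Normally sed prints every line it reads from STDIN
to the screen; with -n, only the lines (or actions) that sed
specially processes are printed.
-e : edit the sed actions directly on the command line;
-f : put the sed actions in a file; -f filename runs the sed actions
in filename;
-r : sed actions use extended regular expression syntax (the default is basic regular expression syntax);
-i : modify the file being read in place instead of printing to the screen.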
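A sketch of the options above on a three-line sample, assuming GNU sed (for -r). The sed script file is a hypothetical temporary file created for the demo:

```shell
#!/bin/sh
# Sketch of sed's -n, -e, -f and -r options (assumes GNU sed for -r).
f=$(mktemp)
printf 'line one\nline two\nline three\n' > "$f"

default=$(sed '2p' "$f")       # without -n: every line, plus line 2 printed again by p
quiet=$(sed -n '2p' "$f")      # with -n: only what p prints
multi=$(sed -e 's/one/1/' -e 's/three/3/' "$f")   # several -e actions in one run

script=$(mktemp)               # hypothetical script file for -f
printf 's/line/row/\n' > "$script"
fromfile=$(sed -f "$script" "$f")

extended=$(sed -r 's/(one|three)/X/' "$f")   # -r: ( ) and | need no backslashes
rm -f "$f" "$script"

printf '%s\n---\n' "$default" "$quiet" "$multi" "$fromfile" "$extended"
```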

Action syntax: [n1[,n2]]function
n1, n2 : optional; they usually select the range of lines the action applies to. For example,
if the action should run on lines 10 through 20, write '10,20[action]'.

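function can be one of the following:
a : append; a is followed by a string, which appears on a new line (after the current line);
c : change; c is followed by a string, which replaces the lines from n1 to n2;
d : delete; since it deletes, d is usually not followed by anything;
i : insert; i is followed by a string, which appears on a new line (before the current line);
p : print; prints the selected data. p is usually run together with sed -n;
s : substitute; performs replacements directly. The s action is usually combined with regular expressions.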
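The functions can be tried without touching /etc/passwd. A sketch, assuming GNU sed (the one-line "2a text" form is a GNU extension; POSIX sed wants "a\" followed by a newline) and a hypothetical six-line sample of account names:

```shell
#!/bin/sh
# Sketch of the a, i, c, d and p functions with line addresses (assumes GNU sed).
f=$(mktemp)
printf 'root\nbin\ndaemon\nadm\nlp\nsync\n' > "$f"   # hypothetical sample

deleted=$(sed '2,5d' "$f")                  # d: drop lines 2-5
appended=$(sed '2a drink tea' "$f")         # a: new line AFTER line 2
inserted=$(sed '2i drink tea' "$f")         # i: new line BEFORE line 2
changed=$(sed '2,5c No 2-5 number' "$f")    # c: replace lines 2-5 with one line
printed=$(sed -n '5,6p' "$f")               # p with -n: only lines 5-6
rm -f "$f"

printf '%s\n---\n' "$deleted" "$appended" "$inserted" "$changed" "$printed"
```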

Example 1: list the contents of /etc/passwd with line numbers, and delete lines 2 through 5!
[root@www ~]# nl /etc/passwd | sed '2,5d'
1 root:x:0:0:root:/root:/bin/bash
6 sync:x:5:0:sync:/sbin:/bin/sync
7 shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
.....(rest omitted).....

Example 2: continuing from above, add the words 'drink tea' after line 2 (i.e., as the new line 3)!
[root@www ~]# nl /etc/passwd | sed '2a drink tea'
1 root:x:0:0:root:/root:/bin/bash
2 bin:x:1:1:bin:/bin:/sbin/nologin
drink tea
3 daemon:x:2:2:daemon:/sbin:/sbin/nologin
.....(rest omitted).....

Example 3: how do I replace the contents of lines 2-5 with "No 2-5 number"?
[root@www ~]# nl /etc/passwd | sed '2,5c No 2-5 number'
1 root:x:0:0:root:/root:/bin/bash
No 2-5 number
6 sync:x:5:0:sync:/sbin:/bin/sync
.....(rest omitted).....

Example 4: print only lines 5-7 of /etc/passwd
[root@www ~]# nl /etc/passwd | sed -n '5,7p'
5 lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
6 sync:x:5:0:sync:/sbin:/bin/sync
7 shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown

Searching for and replacing part of the data

Basic usage:

sed 's/string to be replaced/new string/g'
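The trailing g flag decides whether only the first match on each line or every match is replaced, and a line address in front limits which lines are touched. A minimal sketch on a line from the practice file:

```shell
#!/bin/sh
# Sketch: s/old/new/ replaces the first match on each line; the g flag replaces all.
line='GNU is free air not free beer.'
first=$(printf '%s\n' "$line" | sed 's/free/FREE/')
every=$(printf '%s\n' "$line" | sed 's/free/FREE/g')
# An address in front restricts s to certain lines (here: line 2 only).
line2=$(printf 'a a\na a\na a\n' | sed '2s/a/b/g')

printf '%s\n%s\n%s\n' "$first" "$every" "$line2"
```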

Step 1: look at the raw output first; use /sbin/ifconfig to see what the IP address is:
[root@www ~]# /sbin/ifconfig eth0
eth0 Link encap:Ethernet HWaddr 00:90:CC:A6:34:84
inet addr:192.168.1.100 Bcast:192.168.1.255 Mask:255.255.255.0
inet6 addr: fe80::290:ccff:fea6:3484/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
.....(rest omitted).....
# We haven't covered IP addresses yet, so a rough idea is enough here. What matters is the
# second line, the one with 192.168.1.100. First, grab that line by keyword!

Step 2: use grep with a keyword to extract the one line we need
[root@www ~]# /sbin/ifconfig eth0 | grep 'inet addr'
inet addr:192.168.1.100 Bcast:192.168.1.255 Mask:255.255.255.0
# Only one line is left! Next, delete everything from the start of the line up to addr:, in:
# inet addr:192.168.1.100 Bcast:192.168.1.255 Mask:255.255.255.0
# The key to this deletion is '^.*inet addr:'. Here come regular expressions! ^_^

Step 3: delete the part before the IP
[root@www ~]# /sbin/ifconfig eth0 | grep 'inet addr' | \
> sed 's/^.*addr://g'
192.168.1.100 Bcast:192.168.1.255 Mask:255.255.255.0
# Compare with the previous step: the leading part is gone! Next, delete the trailing part, in:
# 192.168.1.100 Bcast:192.168.1.255 Mask:255.255.255.0
# The regular expression needed here is 'Bcast.*$'.

Step 4: delete the part after the IP
[root@www ~]# /sbin/ifconfig eth0 | grep 'inet addr' | \
> sed 's/^.*addr://g' | sed 's/Bcast.*$//g'
192.168.1.100
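Because newer systems may not have ifconfig or an eth0 interface, the same pipeline can be tried on the "inet addr" line copied from the output above. A sketch under that assumption; the second variant, a single sed with two -e actions and ' *Bcast', is an alternative that also drops the space left before Bcast:

```shell
#!/bin/sh
# Sketch: the four steps above, run on the line copied from the ifconfig output.
line='inet addr:192.168.1.100 Bcast:192.168.1.255 Mask:255.255.255.0'

# Same steps as above: keep the line, strip up to "addr:", strip from "Bcast" on.
ip=$(printf '%s\n' "$line" | grep 'inet addr' | sed 's/^.*addr://g' | sed 's/Bcast.*$//g')
printf '[%s]\n' "$ip"    # brackets make the leftover trailing space visible

# Alternative: one sed call; " *Bcast" also removes the space before Bcast.
ip2=$(printf '%s\n' "$line" | grep 'inet addr' | sed -e 's/^.*addr://' -e 's/ *Bcast.*$//')
printf '[%s]\n' "$ip2"
```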

Example 6: use sed to replace a trailing . with ! on every line of regular_express.txt
[root@www ~]# sed -i 's/\.$/\!/g' regular_express.txt

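# The -i option above makes sed modify the named file directly instead of printing to the screen! Note that '.' must be escaped as \., and $ is needed so that only a '.' at the end of a line is replaced.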
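A sketch of the in-place edit on a temporary copy, assuming GNU sed (BSD/macOS sed needs -i '' instead of a bare -i). The sample lines are hypothetical; the last command shows what goes wrong if the dot is not escaped:

```shell
#!/bin/sh
# Sketch: in-place replacement of a trailing "." (assumes GNU sed's -i).
f=$(mktemp)
printf 'apple is my favorite food.\nOh! The soup taste good.\nabout 3.5 dollars\n' > "$f"

sed -i 's/\.$/!/' "$f"      # only a literal dot at the very end of a line
result=$(cat "$f")          # the interior "3.5" is untouched
rm -f "$f"

unescaped=$(printf 'dollars\n' | sed 's/.$/!/')   # unescaped . replaces ANY last character

printf '%s\n---\n%s\n' "$result" "$unescaped"
```

Lines that still end in a DOS carriage return (^M) will not match \.$, because their last character is the CR rather than the dot.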

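While we're at it, let's put this into practice. I had a file created under DOS; after copying it to a Unix system, an ugly "^M" appeared at the end of each line. sed removes it easily: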

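[root@www ~]# sed -i 's/\r$//' regular_express.txt
# The ^M shown by editors is the carriage return (CR, \r) that DOS/Windows puts at the end of each line; GNU sed understands \r. The pattern 's/\^M$//' would only match a literal two-character "^M" string, not the CR itself.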
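A sketch that builds a small DOS-style file with printf and strips the carriage returns, assuming GNU sed (for \r in the pattern and for -i). It also checks that the caret-M pattern leaves a real CR in place:

```shell
#!/bin/sh
# Sketch: remove DOS carriage returns (assumes GNU sed).
f=$(mktemp)
printf 'GNU is free air not free beer.\r\nHer hair is very beauty.\r\n' > "$f"
cr=$(printf '\r')                             # a literal CR character

before=$(grep -c "$cr" "$f")                  # lines containing a CR
sed -i 's/\r$//' "$f"
after=$(grep -c "$cr" "$f" || true)           # grep -c prints 0 (and exits 1) when nothing matches
text=$(cat "$f")
rm -f "$f"

literal=$(printf 'beer.\r\n' | sed 's/\^M$//')   # \^M is a literal caret + M: the CR survives

printf '%s %s\n' "$before" "$after"
```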

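Note: this material comes from VBird's "Linux Private Kitchen" (鳥哥的私房菜). I ran all of these examples myself on Ubuntu 11.04; the last one is a real problem I solved with sed.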

