Linux Learning 29: Extracting log information with awk and ranking IP access counts


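Preface

Given a log file, we need to analyze it and rank the top 10 IPs by access count, to check for abnormal traffic that may indicate an attack.

Extracting the log fields

Each line of the log below carries a lot of information. We want to extract the IP, the access time, the request method, the request path (without query parameters), and the status code.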

123.125.72.61 - - [05/Dec/2018:00:00:02 +0000] "GET /yoyo/artical?locale=en HTTP/1.1" 200 12164 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)" 0.032 0.032 .
123.125.72.61 - - [05/Dec/2018:00:00:02 +0000] "GET /index?page=1 HTTP/1.1" 200 16739 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)" 0.120 0.120 .
141.1.142.111 - - [05/Dec/2018:00:00:02 +0000] "GET /index?page=61 HTTP/1.1" 200 16739 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)" 0.120 0.120 .
141.1.142.131 - - [05/Dec/2018:00:00:02 +0000] "GET /yoyoketang?page=62 HTTP/1.1" 200 16739 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)" 0.120 0.120 .
141.1.142.131 - - [05/Dec/2018:00:00:02 +0000] "GET /blog?page=3 HTTP/1.1" 200 16739 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)" 0.120 0.120 .
142.22.12.132 - - [05/Dec/2018:00:00:02 +0000] "GET /blog?page=1 HTTP/1.1" 200 16739 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)" 0.120 0.120 .
142.22.12.132 - - [05/Dec/2018:00:00:02 +0000] "POST /blog?page=1 HTTP/1.1" 200 16739 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)" 0.120 0.120 .
142.22.12.132 - - [05/Dec/2018:00:00:02 +0000] "POST /blog?page=3 HTTP/1.1" 200 16739 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)" 0.120 0.120 .

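awk can reformat the log. By default it splits each line into fields on whitespace, so the first column is the IP, i.e. '{print $1}', and the other columns follow the same pattern.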

awk '{print $1,$4,$6,$7,$9}' log.txt

[root@VM_0_2_centos ~]# awk '{print $1,$4,$6,$7,$9}' log.txt
123.125.72.61 [05/Dec/2018:00:00:02 "GET /yoyo/artical?locale=en 200
123.125.72.61 [05/Dec/2018:00:00:02 "GET /index?page=1 200
141.1.142.111 [05/Dec/2018:00:00:02 "GET /index?page=61 200
141.1.142.131 [05/Dec/2018:00:00:02 "GET /yoyoketang?page=62 200
141.1.142.131 [05/Dec/2018:00:00:02 "GET /blog?page=3 200
142.22.12.132 [05/Dec/2018:00:00:02 "GET /blog?page=1 200
142.22.12.132 [05/Dec/2018:00:00:02 "POST /blog?page=1 200
142.22.12.132 [05/Dec/2018:00:00:02 "POST /blog?page=3 200

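Next we need to drop the extra [ and " characters and the parameters after ?. This can be done by splitting on more separator characters with -F. The bracket expression below makes each of [ , space " and ? a separator on its own, so two separators in a row produce an empty field. That is why the field numbers jump to $5, $8, $9 and $13.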

awk -F '[[, ",?]' '{print $1,$5,$8,$9,$13}' log.txt

[root@VM_0_2_centos ~]# awk -F '[[, ",?]'  '{print $1,$5,$8,$9,$13}' log.txt
123.125.72.61 05/Dec/2018:00:00:02 GET /yoyo/artical 200
123.125.72.61 05/Dec/2018:00:00:02 GET /index 200
141.1.142.111 05/Dec/2018:00:00:02 GET /index 200
141.1.142.131 05/Dec/2018:00:00:02 GET /yoyoketang 200
141.1.142.131 05/Dec/2018:00:00:02 GET /blog 200
142.22.12.132 05/Dec/2018:00:00:02 GET /blog 200
142.22.12.132 05/Dec/2018:00:00:02 POST /blog 200
142.22.12.132 05/Dec/2018:00:00:02 POST /blog 200
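
One caveat about the -F approach: the field numbers depend on every request containing a ?. For a request without a query string the fields after the path shift by one, and $13 becomes the response size instead of the status code. A minimal sketch of an alternative, assuming the same log layout as the sample above, keeps the default whitespace splitting and cleans each field afterwards. The function name extract_fields is hypothetical and only used for illustration.

```shell
# Hypothetical helper: print ip, time, method, path (no query string), status.
# Assumes the log layout shown in the sample above.
extract_fields() {
    awk '{
        ts = substr($4, 2)       # drop the leading "[" from the timestamp
        method = substr($6, 2)   # drop the leading double quote from the method
        path = $7
        sub(/[?].*/, "", path)   # drop the query string, if there is one
        print $1, ts, method, path, $9
    }' "$1"
}
```

Usage: extract_fields log.txt. Because the path is cleaned with sub() rather than by splitting, the status code stays in $9 whether or not the request has parameters.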

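Counting IP occurrences

To rank the top 10 by access count, first use sort to order the lines (by default sort compares lines as text), then uniq -c to count identical lines, then sort -k 1 -n -r to order by that count from highest to lowest, and finally head -10 to keep the first ten lines. Note that this pipeline counts identical extracted records (IP, time, method, path and status code together), not the number of requests per IP alone.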

[root@VM_0_2_centos ~]# awk -F '[[, ",?]'  '{print $1,$5,$8,$9,$13}' log.txt  | sort | uniq -c | sort -k 1 -n -r |head -10
      2 142.22.12.132 05/Dec/2018:00:00:02 POST /blog 200
      1 142.22.12.132 05/Dec/2018:00:00:02 GET /blog 200
      1 141.1.142.131 05/Dec/2018:00:00:02 GET /yoyoketang 200
      1 141.1.142.131 05/Dec/2018:00:00:02 GET /blog 200
      1 141.1.142.111 05/Dec/2018:00:00:02 GET /index 200
      1 123.125.72.61 05/Dec/2018:00:00:02 GET /yoyo/artical 200
      1 123.125.72.61 05/Dec/2018:00:00:02 GET /index 200
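
The pipeline above groups by the whole extracted record. To rank by IP only, which is what the goal in the preface asks for, one option is to print just the first field before counting. This is a minimal sketch under the same log layout; the function name count_ips is hypothetical.

```shell
# Hypothetical helper: print "count ip" for the 10 most frequent client IPs.
# Assumes the client IP is the first whitespace-separated field of each line.
count_ips() {
    awk '{print $1}' "$1" |   # keep only the IP column
        sort |                # bring identical IPs next to each other
        uniq -c |             # prefix each distinct IP with its count
        sort -k1,1nr |        # order by the count, highest first
        head -10              # keep the top ten
}
```

Usage: count_ips log.txt. On the sample log the first line reported is 142.22.12.132 with a count of 3.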

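The uniq command removes duplicate lines, but it only compares adjacent lines, so it is normally combined with sort: sort first, then deduplicate.
uniq -u prints only the lines that are not repeated. uniq -c prefixes every line with the number of times it occurred. For sort -k 1 -n -r, see the detailed description of the sort options further down.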
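
The adjacency rule is easy to check on a small input. The commands below are an illustrative sketch using printf to generate throwaway data; the comments describe what each one prints.

```shell
# uniq only merges adjacent duplicates, so unsorted input keeps its repeats.
printf 'b\na\nb\na\n' | uniq -c          # four lines, each with count 1
printf 'b\na\nb\na\n' | sort | uniq -c   # two lines: a and b, each with count 2
printf 'b\na\nb\nc\n' | sort | uniq -d   # only the repeated line: b
printf 'b\na\nb\nc\n' | sort | uniq -u   # only the lines that occur once: a and c
```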
sort options and parameters:

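  • -f : ignore case differences, e.g. A and a are treated as equal;
  • -b : ignore leading blank characters;
  • -M : sort by month name, e.g. JAN, DEC and so on;
  • -n : sort numerically (the default is to compare as text);
  • -r : reverse the sort order;
  • -u : like uniq, output only one line for each group of equal lines;
  • -t : field separator character; by default, fields are separated by the transition from non-blank to blank characters;
  • -k : which field (key) to sort by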
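
A few of these options are easiest to understand on a tiny input. The following is an illustrative sketch with made-up data; the comments give the resulting line order.

```shell
# Default text comparison versus numeric comparison.
printf '10\n9\n100\n' | sort         # text order: 10, 100, 9
printf '10\n9\n100\n' | sort -n      # numeric order: 9, 10, 100
printf '10\n9\n100\n' | sort -n -r   # reversed numeric order: 100, 10, 9

# -t sets the field separator, -k picks the field to sort on.
printf 'b:2\na:10\nc:1\n' | sort -t : -k 2,2n   # by 2nd field: c:1, b:2, a:10
```

This is why the counting pipeline needs -n: without it a count of 10 would sort before a count of 9.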

