Python in Practice: Analyzing Text Files

Reposted article, 2020-01-14 19:03, Python

Finding files whose names end with a given suffix
Computing file sizes
Analyzing Apache access logs with Python

Finding the files in a directory that end with .py

[smcuser@smc-postman-script test]$ ll
total 4
-rw-rw-r--. 1 smcuser smcuser   0 Jan 12 23:13 1.txt
-rw-rw-r--. 1 smcuser smcuser   0 Jan 12 23:13 2.txt
-rw-rw-r--. 1 smcuser smcuser   0 Jan 12 23:13 3.txt
-rw-rw-r--. 1 smcuser smcuser   0 Jan 12 23:13 4.txt
-rw-rw-r--. 1 smcuser smcuser   0 Jan 12 23:13 5.txt
-rw-rw-r--. 1 smcuser smcuser   0 Jan 12 23:14 a.py
-rw-rw-r--. 1 smcuser smcuser   0 Jan 12 23:14 b.py
-rw-rw-r--. 1 smcuser smcuser   0 Jan 12 23:14 c.py
-rw-rw-r--. 1 smcuser smcuser   0 Jan 12 23:14 d.py
-rw-rw-r--. 1 smcuser smcuser   0 Jan 12 23:14 e.py
-rw-rw-r--. 1 smcuser smcuser 116 Jan 12 23:35 test.py

#!/usr/bin/env python
import os

# Collect every entry in the current directory whose name ends with .py
test = [item for item in os.listdir('.') if item.endswith('.py')]
print(test)

Output:

[smcuser@smc-postman-script test]$ python test.py
['a.py', 'b.py', 'c.py', 'd.py', 'e.py', 'test.py']
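The same idea can be wrapped in a small reusable function. The sketch below is only an illustration: the name `find_by_suffix` and its parameters are made up for this example and do not come from the original article. It relies on the fact that `str.endswith` accepts a tuple of suffixes, and it skips directories that happen to match.

```python
import os


def find_by_suffix(directory, suffixes):
    """Return the sorted names of regular files in `directory`
    that end with any of the given suffixes."""
    # str.endswith accepts a tuple, so several suffixes are checked at once.
    suffixes = tuple(suffixes)
    return sorted(
        name
        for name in os.listdir(directory)
        if name.endswith(suffixes)
        and os.path.isfile(os.path.join(directory, name))
    )


# Example: list .py and .txt files in the current directory.
print(find_by_suffix(".", [".py", ".txt"]))
```

Sorting the result is a design choice here: `os.listdir` returns names in arbitrary order, so sorting keeps the output stable between runs.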

Computing file sizes

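#!/usr/bin/env python
import os

# Directory to scan; list and measure the files in the same place
directory = '/tmp/test'
txt = [item for item in os.listdir(directory) if item.endswith('.txt')]

# Add up the size in bytes of every .txt file
sum_size = sum(os.path.getsize(os.path.join(directory, item)) for item in txt)

print(sum_size)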
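When the total alone is not enough, the per-file sizes can be kept in a dictionary and summed afterwards. This is a minimal sketch, not the article's code: the helper name `sizes_by_suffix` is hypothetical, and it assumes only regular files should be counted.

```python
import os


def sizes_by_suffix(directory, suffix):
    """Map each regular file name ending with `suffix` to its size in bytes."""
    sizes = {}
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        # Skip directories and other non-regular entries.
        if name.endswith(suffix) and os.path.isfile(path):
            sizes[name] = os.path.getsize(path)
    return sizes


# Example: report each .txt file in the current directory, then the total.
sizes = sizes_by_suffix(".", ".txt")
for name, size in sorted(sizes.items()):
    print("{0}\t{1} bytes".format(name, size))
print("total: {0} bytes".format(sum(sizes.values())))
```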

Analyzing Apache access logs with Python

Sample Apache log
193.252.243.232 - - [29/Mar/2009:06:05:34+0200] "GET /index.php HTTP/1.1" 200 8714 "-" "Mozilla/5.0 (compatible; PJBot/3.0; +http://crawl.pagesjaunes.fr/robot)"
193.252.243.232 - - [29/Mar/2009:06:05:34+0200] "GET /index.html HTTP/1.1" 200 8714 "-" "Mozilla/5.0 (compatible; PJBot/3.0; +http://crawl.pagesjaunes.fr/robot)"
193.252.243.231 - - [29/Mar/2009:06:05:34+0200] "GET /index.php HTTP/1.1" 200 8714 "-" "Mozilla/5.0 (compatible; PJBot/3.0; +http://crawl.pagesjaunes.fr/robot)"
193.252.243.230 - - [29/Mar/2009:06:05:34+0200] "GET /index.php HTTP/1.1" 200 8714 "-" "Mozilla/5.0 (compatible; PJBot/3.0; +http://crawl.pagesjaunes.fr/robot)"
193.252.243.237 - - [29/Mar/2009:06:05:34+0200] "GET /index.php HTTP/1.1" 200 8714 "-" "Mozilla/5.0 (compatible; PJBot/3.0; +http://crawl.pagesjaunes.fr/robot)"
193.252.243.237 - - [29/Mar/2009:06:05:34+0200] "GET /index.php HTTP/1.1" 200 8714 "-" "Mozilla/5.0 (compatible; PJBot/3.0; +http://crawl.pagesjaunes.fr/robot)"
193.252.243.230 - - [29/Mar/2009:06:05:34+0200] "GET /index.html HTTP/1.1" 400 8714 "-" "Mozilla/5.0 (compatible; PJBot/3.0; +http://crawl.pagesjaunes.fr/robot)"
193.252.243.232 - - [29/Mar/2009:06:05:34+0200] "GET /index.php HTTP/1.1" 503 8714 "-" "Mozilla/5.0 (compatible; PJBot/3.0; +http://crawl.pagesjaunes.fr/robot)"

Note: in this sample the timestamp and its UTC offset are written with no space between them, and the field indexes passed to split() in the scripts below rely on that layout. Apache's standard combined log format puts a space before the offset, which shifts every later whitespace-separated field by one position.
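Because indexes produced by `split()` depend on the exact spacing of each line, a regular expression with named groups is one way to make field extraction less fragile. The pattern below is a sketch written for this example, not part of the original article. It assumes lines in the usual Apache layout (bracketed timestamp, request line in straight double quotes), and it accepts the timestamp with or without a space before the offset.

```python
import re

# Hypothetical pattern for illustration: client IP, two ignored fields,
# bracketed timestamp, quoted request line, status code, response size.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_line(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


sample = ('193.252.243.232 - - [29/Mar/2009:06:05:34 +0200] '
          '"GET /index.php HTTP/1.1" 200 8714 "-" "Mozilla/5.0"')
record = parse_line(sample)
print(record["ip"], record["path"], record["status"])
```

Returning `None` for lines that do not match lets the caller decide whether to skip or report malformed entries.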

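Compute the site's PV and UV from the client IP addresses (PV is the number of requests the site received; UV is the number of unique visitors).

#!/usr/bin/env python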

ips = []
with open('access.log') as f:
    for line in f:
        ips.append(line.split()[0])

print("pv is {0}".format(len(ips)))
print("uv is {0}".format(len(set(ips))))
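For a large log, the list of every IP is not strictly needed: a running counter plus a set gives the same two numbers. The following is a minimal sketch under that assumption. The function name `count_pv_uv` is invented for this example, and it takes any iterable of lines so it can be fed an open file or an in-memory list.

```python
def count_pv_uv(lines):
    """Return (pv, uv) for an iterable of access-log lines."""
    pv = 0
    visitors = set()
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        pv += 1
        visitors.add(fields[0])  # the first field is the client IP
    return pv, len(visitors)


# Shortened, made-up lines; only the first field matters here.
sample = [
    "193.252.243.232 - - GET /index.php",
    "193.252.243.232 - - GET /index.html",
    "193.252.243.231 - - GET /index.php",
]
pv, uv = count_pv_uv(sample)
print("pv is {0}".format(pv))
print("uv is {0}".format(uv))
```

To use it on a real log, pass the open file object directly, for example `count_pv_uv(f)` inside a `with open('access.log') as f:` block.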

Find the most requested resources on the site. Counter is a subclass of dict, and for plain counting it is more convenient than a regular dictionary.

#!/usr/bin/env python

from collections import Counter

c = Counter()

with open('access.log') as f:
    for line in f:
        c[line.split()[5]] += 1

print(c.most_common(10))
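The counting step can also be written as a function that takes the field position as a parameter, since that position depends on how the log lines are laid out. This is an illustrative sketch: `top_resources` and its `field_index` parameter are assumptions made for this example, with the default of 5 mirroring the index used in the script above.

```python
from collections import Counter


def top_resources(lines, n=10, field_index=5):
    """Return the `n` most common values found at `field_index`."""
    counts = Counter(
        fields[field_index]
        for fields in (line.split() for line in lines)
        if len(fields) > field_index  # skip short or blank lines
    )
    return counts.most_common(n)


# Made-up lines in the same layout as the sample log.
sample = [
    '10.0.0.1 - - [t] "GET /index.php HTTP/1.1" 200 8714',
    '10.0.0.2 - - [t] "GET /index.php HTTP/1.1" 200 8714',
    '10.0.0.3 - - [t] "GET /index.html HTTP/1.1" 200 8714',
]
print(top_resources(sample, n=2))
```

`Counter.most_common(n)` returns a list of `(value, count)` pairs ordered from most to least frequent, so no manual sorting is required.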

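Measure user experience: a request whose HTTP status code is 4xx or 5xx counts as a failed request, and the script computes the share of failed requests.

#!/usr/bin/env python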
#
d = {}
with open('access.log') as f:
    for line in f:
        key = line.split()[7]
        d.setdefault(key,0)
        d[key] += 1
print(d)
sum_requests = 0
error_requests = 0

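# items() works on both Python 2 and Python 3 (iteritems() exists only in Python 2)
for key, val in d.items():
    if int(key) >= 400:
        error_requests += val
    sum_requests += val

# 100.0 forces floating-point division so the percentage is not truncated on Python 2
print('error rate: {0:.2f}%'.format(error_requests * 100.0 / sum_requests))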
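The calculation can be separated from the file reading so that it is easy to check by hand. The sketch below is an assumption-laden illustration rather than the article's method: `error_rate` and `count_by_class` are hypothetical helper names, and the input is a mapping from status-code strings to counts like the dictionary `d` built above. It also guards against an empty log, which would otherwise cause a division by zero.

```python
from collections import Counter


def error_rate(status_counts):
    """Percentage of requests whose status code is 400 or higher."""
    total = sum(status_counts.values())
    if total == 0:
        return 0.0  # empty log: avoid dividing by zero
    errors = sum(
        count for code, count in status_counts.items() if int(code) >= 400
    )
    return errors * 100.0 / total


def count_by_class(status_counts):
    """Group counts by status class, e.g. '2xx', '4xx', '5xx'."""
    classes = Counter()
    for code, count in status_counts.items():
        classes[str(code)[0] + "xx"] += count
    return dict(classes)


# Status counts matching the eight sample log lines: six 200s, one 400, one 503.
counts = {"200": 6, "400": 1, "503": 1}
print("error rate: {0:.2f}%".format(error_rate(counts)))
print(count_by_class(counts))
```

With the sample counts, two of the eight requests failed, so the rate works out to 25.00%.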
