Luogu Site-Wide Ranking Generator (Luogu-question-rank)

This article is a repost. 2022-04-21 16:00 · Category: just-for-fun projects

Foreword

This is the very first blog post by a juruo (newbie) OIer, so please read on!

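Background

Over the winter break I signed up for the Luogu Online School spring camp, basic algorithms track (this is not an ad!).

Seeing how many top-tier coders there were on the team, I couldn't help sinking into thought...

"How can I find out where I rank within the team?"

After a moment's thought, I realized that ranking only within the team would be thinking too small.

I'm going to rank every user on Luogu!

Let's get this stunt started!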

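Initial design

If we're going to pull a stunt, it has to be a big one!

After a few days of thinking, I listed the features this system must have:

1. Fetch a user's rank by Gu value (咕值)
2. Fetch how many problems the user has solved at each difficulty level
3. Gather all the data in one place (an Excel spreadsheet, for example)
4. Have no impact on the Luogu site itself (if Luogu goes down yet again, I can't afford to pay for it)

To build all this I'll have to fall back on my side gig, Python. Yes, this will be done with a web scraper.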

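Data source

To keep the program efficient without putting load on the Luogu site, I chose Luogu-card as the data source.

It looks like this ↓

Luogu-card refreshes its data every half day, so it should not affect the main Luogu site.

As for locating the data to extract... matching against the HTML source with regular expressions does the job.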
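
To make the regex step concrete, here is a minimal sketch. The markup in `sample` is a made-up fragment for illustration only, not the real Luogu-card output, and `extract_count` is a hypothetical helper name.

```python
import re

# Made-up fragment for illustration; the real card's markup may differ.
sample = '''
<text x="10" y="15" class="text">Easy</text>
<text x="90" y="15" class="text">42 solved</text>
'''

def extract_count(label, html):
    """Return the number that follows `label` in the markup, or None if absent."""
    # re.escape keeps characters such as '+' in a label from acting as regex syntax.
    pattern = re.escape(label) + r'</text>\s*<text[^>]*>(\d+) solved</text>'
    match = re.search(pattern, html)
    return int(match.group(1)) if match else None

print(extract_count('Easy', sample))
print(extract_count('Hard', sample))
```

Returning `None` for a missing label lets the caller decide what to write into the spreadsheet instead of crashing on `.group()`.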

Enough talk, on to the solution ヾ(≧▽≦*)o

Program walkthrough

First, these are the libraries to import:

import re        # regular expressions
import requests  # the most basic HTTP/scraping library
import xlwt      # library for writing Excel spreadsheets

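Next, put the difficulty labels to match into a list and initialize the spreadsheet:

# Difficulty labels (or fragments of them) to look for on the card;
# they are used as regex patterns below and double as column headers.
dic = ['未評定','入門','普及-','普及/提高-','/提高','省選-','省選/NOI-','CTSC','寫掛了']
workbook = xlwt.Workbook(encoding='ascii')     # create a new workbook
worksheet = workbook.add_sheet("Luogu-rank")   # add a worksheet to the workbook
worksheet.write(0, 0, '用戶ID', style2)        # "User ID"; worksheet.write(i, j, k) writes k at row i, column j
worksheet.write(0, 1, '用戶名', style2)        # "Username"; style2 is a cell style, defined in the full listing
worksheet.write(0, 2, '咕值排名', style2)      # "Gu-value rank"
worksheet.write(0, 3, '總通過題數', style2)    # "Total problems solved"
for i in range(13):
    if i == 0 or i > 2:
        worksheet.col(i).width = 256 * 10  # narrow: the ID column and the numeric columns
    else:
        worksheet.col(i).width = 256 * 20  # wide: username and rank
    if i > 3:
        worksheet.write(0, i, dic[i - 4], style2)  # difficulty labels in the header row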
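
A note on the column-width condition: the intent is "column 0, or any column past 2". In Python that calls for the boolean `or`. The bitwise `|` binds tighter than comparisons, so `i == 0 | i > 2` is parsed as the chained comparison `i == (0 | i) > 2`, which is simply `i > 2`. The two function names below are hypothetical and exist only to show the difference:

```python
def intended(i):
    # Boolean `or`: true for column 0 and for every column past 2.
    return i == 0 or i > 2

def with_bitwise_or(i):
    # Parsed as i == (0 | i) > 2, which reduces to i > 2.
    return i == 0 | i > 2

for i in range(5):
    print(i, intended(i), with_bitwise_or(i))
```

The two versions disagree only for `i == 0`, so with `|` the ID column ends up with the wide setting.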

Everything is prepared, so here comes the real meat! (The code block is a bit long, but it's hard to split up, so bear with it.)

for i in range(1, 10000):  # the range of user IDs to generate data for
    worksheet.write(i, 0, i)
    responses = requests.get('https://statcard.vercel.app/practice?id=' + str(i))  # fetch the card from the data site
    # Cards whose data cannot be fetched have fewer than 900 characters of source (found by trial)
    if len(responses.text) > 900:
        # A nonexistent user's nickname shows up as "NULL" (also found by trial)
        if re.search("NULL", responses.text) is not None:
            worksheet.write_merge(i, i, 1, 12, '無此用戶', style2)  # merge cells and write "no such user"
        else:
            # Match the username from its surrounding context in the HTML source
            username = re.search(r'(.+)\s+</text>\s+<text x=".*" y=".*" class="title" font-weight="normal">\s+的賀題情況', responses.text)
            if username is None:
                # A second place where the name can be matched, kept as a fallback
                username = re.search(r'<text x=".*" y=".*" fill=".*" font-weight=".*" textLength=".*">\s+(.*)\s+</text>\s+<svg xmlns="http://www.w3.org/2000/svg" x=".*" y=".*" width=".*" height=".*"', responses.text)
            worksheet.write(i, 1, username.group(1).lstrip(), style)
            problem_summ = re.search(r'已賀(.*)題, 被(.*)人吊打', responses.text)  # match the AC count and the rank
            if problem_summ.group(2) == 'INF':
                worksheet.write(i, 2, '暫無數據', style3)  # "no data yet": users without a Gu value show INF (found by trial once more)
            else:
                worksheet.write(i, 2, int(problem_summ.group(2)))
            worksheet.write(i, 3, int(problem_summ.group(1)))
            for j in range(9):
                # Match the AC count for each difficulty level in turn
                search_obj = re.search(dic[j] + r'</text>\s+<text x=".*" y="15" class="text">(.*)題</text>', responses.text)
                worksheet.write(i, j + 4, int(search_obj.group(1)))
    else:
        # "The user has enabled full privacy protection; failed to fetch data"
        worksheet.write_merge(i, i, 1, 12, '用戶開啟了“完全隱私保護”，獲取數據失敗', style2)
    print(i)  # debug output; feel free to remove it
workbook.save("Luogu-rank1.xls")  # save the workbook; the program ends here
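
The loop calls `.group()` directly on each `re.search` result, so a card that does not contain the expected text raises `AttributeError`. As a minimal sketch of a more defensive approach, the hypothetical helper `parse_summary` below reuses the summary pattern from the loop (with non-greedy groups) and returns `None` when nothing matches. The sample strings in the usage lines are made up to fit that pattern.

```python
import re

# Same text as the summary pattern in the loop above, with non-greedy groups.
SUMMARY = re.compile(r'已賀(.*?)題, 被(.*?)人吊打')

def parse_summary(html):
    """Return (solved, rank); rank is None when the card shows 'INF'.

    Returns None when the summary text is not found at all.
    """
    match = SUMMARY.search(html)
    if match is None:
        return None
    solved, rank = match.group(1), match.group(2)
    return int(solved), (None if rank == 'INF' else int(rank))

print(parse_summary('已賀120題, 被3456人吊打'))
print(parse_summary('已賀0題, 被INF人吊打'))
print(parse_summary('no summary here'))
```

The caller can then write a placeholder cell whenever the result, or its rank field, is `None`.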

Finally, here is the complete code:

import re
import requests
import xlwt

style = xlwt.XFStyle()
font = xlwt.Font()
font.name = '宋體'
style.font = font
al = xlwt.Alignment()
al.horz = 0x01  # left-aligned
al.vert = 0x01  # vertically centered
style.alignment = al

style2 = xlwt.XFStyle()
font2 = xlwt.Font()
font2.name = '宋體'
style2.font = font2
al2 = xlwt.Alignment()
al2.horz = 0x02  # horizontally centered
al2.vert = 0x01  # vertically centered
style2.alignment = al2

style3 = xlwt.XFStyle()
font3 = xlwt.Font()
al3 = xlwt.Alignment()
al3.horz = 0x03  # right-aligned
al3.vert = 0x01  # vertically centered
style3.alignment = al3  # these are the cell styles used earlier; the parameters are all documented online, so I won't go into detail

dic = ['未評定','入門','普及-','普及/提高-','/提高','省選-','省選/NOI-','CTSC','寫掛了']
workbook = xlwt.Workbook(encoding='ascii')
worksheet = workbook.add_sheet("Luogu-rank")
worksheet.write(0, 0, '用戶ID', style2)
worksheet.write(0, 1, '用戶名', style2)
worksheet.write(0, 2, '咕值排名', style2)
worksheet.write(0, 3, '總通過題數', style2)

for i in range(13):
    if i == 0 or i > 2:
        worksheet.col(i).width = 256 * 10
    else:
        worksheet.col(i).width = 256 * 20
    if i > 3:
        worksheet.write(0, i, dic[i - 4], style2)

for i in range(1, 10000):
    worksheet.write(i, 0, i)
    responses = requests.get('https://statcard.vercel.app/practice?id=' + str(i))
    if len(responses.text) > 900:
        if re.search("NULL", responses.text) is not None:
            worksheet.write_merge(i, i, 1, 12, '無此用戶', style2)
        else:
            username = re.search(r'(.+)\s+</text>\s+<text x=".*" y=".*" class="title" font-weight="normal">\s+的賀題情況', responses.text)
            if username is None:
                username = re.search(r'<text x=".*" y=".*" fill=".*" font-weight=".*" textLength=".*">\s+(.*)\s+</text>\s+<svg xmlns="http://www.w3.org/2000/svg" x=".*" y=".*" width=".*" height=".*"', responses.text)
            worksheet.write(i, 1, username.group(1).lstrip(), style)
            problem_summ = re.search(r'已賀(.*)題, 被(.*)人吊打', responses.text)
            if problem_summ.group(2) == 'INF':
                worksheet.write(i, 2, '暫無數據', style3)
            else:
                worksheet.write(i, 2, int(problem_summ.group(2)))
            worksheet.write(i, 3, int(problem_summ.group(1)))
            for j in range(9):
                search_obj = re.search(dic[j] + r'</text>\s+<text x=".*" y="15" class="text">(.*)題</text>', responses.text)
                worksheet.write(i, j + 4, int(search_obj.group(1)))
    else:
        worksheet.write_merge(i, i, 1, 12, '用戶開啟了“完全隱私保護”，獲取數據失敗', style2)
    print(i)
workbook.save("Luogu-rank1.xls")
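
The loop above sends its requests back to back. One way to go easier on the data site is to pause between requests. This is only a sketch under assumptions: `fetch_all`, its `delay` parameter, and the injected `fetch` callable are hypothetical names, and a one-second pause is an arbitrary choice rather than a limit published by Luogu-card.

```python
import time

def fetch_all(user_ids, fetch, delay=1.0, sleep=time.sleep):
    """Call fetch(uid) for each id, pausing `delay` seconds between calls.

    `sleep` is injectable so the pacing can be checked without really waiting.
    """
    results = {}
    for index, uid in enumerate(user_ids):
        if index > 0:
            sleep(delay)  # wait before every request except the first
        results[uid] = fetch(uid)
    return results

# Example with a stand-in fetcher and no real waiting:
cards = fetch_all([1, 2, 3], lambda uid: 'card %d' % uid, delay=0.5, sleep=lambda s: None)
print(cards)
```

In the real program, `fetch` could be something like `lambda uid: requests.get('https://statcard.vercel.app/practice?id=' + str(uid), timeout=10).text`, where the `timeout` keeps a stalled request from hanging the whole run.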

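Notes

Most of what's in the program I learned from online resources, so anything you don't understand can be looked up too.
When choosing the ID range, don't make it too large; if the program sends too many requests, Luogu-card may stop serving it.
As of writing, the largest Luogu user ID is 719011; beyond that nothing is found (yet again found by trial).
If the program is interrupted by an error, the data fetched so far is not saved. The workaround is to fetch a small batch at a time and paste the spreadsheets together afterwards.
While the program runs, don't leave a file with the same name as the output file in the same directory, especially one that is open in Excel, or saving may fail with an error.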
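
Another way to keep partial results after a crash is to wrap the loop in `try`/`finally` so that the save step always runs. The sketch below is generic and makes assumptions: `run_with_save`, `process`, and `save` are hypothetical names. In the program above, `process` would hold the body of the main loop and `save` could be `lambda: workbook.save("Luogu-rank1.xls")`.

```python
def run_with_save(user_ids, process, save):
    """Process ids in order; always call save(), even if process() raises."""
    done = 0
    try:
        for uid in user_ids:
            process(uid)
            done += 1
    finally:
        save()  # runs on normal completion and on exceptions alike
    return done

# Example: the third id fails, yet save() is still called once.
saved = []

def flaky(uid):
    if uid == 3:
        raise RuntimeError('simulated failure for id 3')

try:
    run_with_save([1, 2, 3, 4], flaky, lambda: saved.append('saved'))
except RuntimeError as error:
    print('stopped early:', error)
print(saved)
```

The exception still propagates, so the failure stays visible, but whatever was written before it has already been saved.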

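Final result

This picture shows the first 20 IDs; with an Excel spreadsheet the data can be sorted easily.

~~The more data you generate, the more weird usernames turn up [doge]~~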

Afterword

I never imagined that the first blog post I'd write as an OIer would be about web scraping 😅

Feel free to message me on Luogu!
