Preface

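I have started learning Python for tool development, and I have to say it really is simple and pleasant to use. As a first project I plan to put together the workflow code for batch-checking ThinkPHP log disclosure. I downloaded and installed PyCharm, but the activation code I had did not work. If anyone has a newer activation code, please help me out; when the trial expires I will see whether to apply a patch. For now I will just play with it as is.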

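1. Start by fetching the content of the first page, beginning with the link.

// The idea here is to build the link for the first results page, then request it with requests to get the page content.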

import base64

import requests

search_data = 'app="thinkphp"'
url = 'https://fofa.so/result?qbase64='
# b64encode returns bytes, so convert the result back to str
search_data_bs = str(base64.b64encode(search_data.encode('utf-8')), 'utf-8')
urls = url + search_data_bs
result = requests.get(urls).content
print(result.decode('utf-8'))

// The response comes back successfully. Next, find what we need inside it: the IP addresses.

2. Extract the IPs from the first page
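The URL construction above can be wrapped in a small helper. This is only a minimal sketch: the name `build_search_url` is hypothetical, and percent-encoding the Base64 text with `urllib.parse.quote` is my own addition, on the assumption that `+`, `/` and `=` should be escaped inside a query string. It builds the URL only and makes no request.

```python
import base64
from urllib.parse import quote

# Base address taken from the example above.
BASE_URL = 'https://fofa.so/result?qbase64='


def build_search_url(query, base_url=BASE_URL):
    """Return the search URL for a query string (hypothetical helper)."""
    # b64encode works on bytes and returns bytes, hence encode/decode.
    encoded = base64.b64encode(query.encode('utf-8')).decode('ascii')
    # Assumption: escape '+', '/' and '=' so they survive in a query string.
    return base_url + quote(encoded, safe='')


if __name__ == '__main__':
    print(build_search_url('app="thinkphp"'))
```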

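import base64

import requests
from lxml import etree

search_data = 'app="thinkphp"'
url = 'https://fofa.so/result?qbase64='
# b64encode returns bytes, so convert the result back to str
search_data_bs = str(base64.b64encode(search_data.encode('utf-8')), 'utf-8')
urls = url + search_data_bs
# Fetch the fofa result page
result = requests.get(urls).content
# Parse the response into a tree that etree can query
soup = etree.HTML(result)
# Take the href attribute we want; the class and target filters exclude unrelated links
ip_data = soup.xpath('//div[@class="re-domain"]/a[@target="_blank"]/@href')
# Join the list into a single string, one entry per line
ipdata = '\n'.join(ip_data)
print(ipdata)
# Create ip.txt if needed and append the results
# (the with block closes the file, so no explicit close() is needed)
with open(r'ip.txt', 'a+') as f:
    f.write(ipdata + '\n')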
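Because ip.txt is opened in append mode, running the script a second time writes the same links again. Here is a minimal sketch of one way to avoid that, using only the standard library. The helper name `append_unique` is hypothetical and not part of the script above.

```python
import os


def append_unique(path, links):
    """Append links not already in the file; return the ones written."""
    seen = set()
    if os.path.exists(path):
        with open(path, encoding='utf-8') as f:
            seen = {line.strip() for line in f if line.strip()}
    added = []
    for link in links:
        link = link.strip()
        # Skip blanks and anything already stored or seen in this batch.
        if link and link not in seen:
            seen.add(link)
            added.append(link)
    if added:
        with open(path, 'a', encoding='utf-8') as f:
            f.write('\n'.join(added) + '\n')
    return added
```

Calling `append_unique('ip.txt', ip_data)` in place of the plain write would keep one line per distinct link across runs.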

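3. Paging and loading a logged-in session

// This version mainly adds a page variable and login authentication: the content is fetched with a cookie. Replace the cookie with your own when you use it. Since I am not a paying member, I can only fetch 5 pages.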

import base64

import requests
from lxml import etree

search_data = 'app="thinkphp" && country="CN"'
headers = {
    'cookie': 'Hm_lvt_9490413c5eebdadf757c2be2c816aedf=1616029,161698533,16447,16171152; search_history=app%3D%22thinkphp%22; _fofapro_ars_session=1e1e66e681f5ca085635005; referer_url=%2Fresult%3Fq%3Dapp%253D%2522thinkphp%2522%26qbase64%3DYXBwPSJ0aGlua3BocCI%253D%26file%3D%26file%3D; Hm_lpvt_9490413c5eebdadf757c2be2c816aedf=16175430'
}
# b64encode returns bytes, so convert the result back to str
# (the query is the same for every page, so encode it once)
search_data_bs = str(base64.b64encode(search_data.encode('utf-8')), 'utf-8')
for yeshu in range(1, 6):
    url = 'https://fofa.so/result?page=' + str(yeshu) + '&qbase64='
    urls = url + search_data_bs
    print('Extracting page ' + str(yeshu))
    try:
        # Fetch the fofa result page with the login cookie
        result = requests.get(urls, headers=headers, timeout=3).content
        # Parse the response into a tree that etree can query
        soup = etree.HTML(result)
        # Take the href attribute we want; the class and target filters exclude unrelated links
        ip_data = soup.xpath('//div[@class="re-domain"]/a[@target="_blank"]/@href')
        # Join the list into a single string, one entry per line
        ipdata = '\n'.join(ip_data)
        print(ipdata)
        # Create ip.txt if needed and append the results
        with open(r'ip.txt', 'a+') as f:
            f.write(ipdata + '\n')
    except Exception as e:
        # Report the failure and continue with the next page
        print('Page ' + str(yeshu) + ' failed: ' + str(e))

// The results are fetched and saved to ip.txt.
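The paging loop above mixes network access with error handling. As a sketch, the two can be separated so that failed pages are collected and the loop can be tried without touching the network. The names `crawl_pages` and `fetch` are hypothetical, and the pause between pages is my own assumption about polite use, not something the site documents.

```python
import time


def crawl_pages(fetch, pages, delay=1.0):
    """Call fetch(page) for each page; return (links, failed_pages)."""
    links, failed = [], []
    for page in pages:
        try:
            links.extend(fetch(page))
        except Exception as exc:
            # Record the failure so it can be inspected or retried later.
            print('page %d failed: %s' % (page, exc))
            failed.append(page)
        time.sleep(delay)
    return links, failed


if __name__ == '__main__':
    # Stand-in fetcher for illustration only; a real one would request
    # the page with requests and run the XPath query shown above.
    def fake_fetch(page):
        if page == 2:
            raise TimeoutError('simulated timeout')
        return ['http://example.com/%d' % page]

    print(crawl_pages(fake_fetch, range(1, 4), delay=0))
```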

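Of course, you can adapt this yourself, for example by using the sys module to pass values in as command-line arguments. My own view is that a regular (non-member) account has to copy a fresh cookie every time anyway, so you might as well edit the other values in place at the same time. If you have a permanent membership, a script that authenticates with an API key would of course be more convenient, or you could use an existing tool. You could also parameterize the script and then package the Python code as an exe.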
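As a sketch of the parameterized version mentioned above, the following uses `argparse` from the standard library instead of reading `sys.argv` by hand. The function name `parse_args` and the option names (`--query`, `--pages`, `--cookie`, `--output`) are my own choices for illustration.

```python
import argparse


def parse_args(argv=None):
    """Parse command-line options; argv=None means use sys.argv."""
    parser = argparse.ArgumentParser(
        description='Collect links from fofa result pages.')
    parser.add_argument('-q', '--query', default='app="thinkphp"',
                        help='search expression')
    parser.add_argument('-p', '--pages', type=int, default=5,
                        help='number of result pages to fetch')
    parser.add_argument('-c', '--cookie', required=True,
                        help='cookie string copied from a logged-in session')
    parser.add_argument('-o', '--output', default='ip.txt',
                        help='file to append the results to')
    return parser.parse_args(argv)
```

The script could then read `args = parse_args()` at the top and use `args.query`, `range(1, args.pages + 1)`, `args.cookie` and `args.output` in place of the hard-coded values.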

References

Fixing PyCharm being unable to use the lxml library (illustrated): http://iis7.com/a/nr/wz/202102/8956.html

With thanks to: Xiaodi's Python development course
