[Python Web Analysis] Redirect handling with the httplib library


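1. Web page flow

The figure below shows the packet capture from performing the operation in the browser; the other steps are not described here.

1) Start from the selected POST /main.aspx.

2) The server then replies with a 302 redirect to the /cd_chose.aspx page.

3) The capture shows a GET for the redirect URL; the GETs for the CSS and JS files are not covered here.

4) A POST to /cd_chose.aspx follows.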

image
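The four steps above can be sketched in code. This is only a minimal illustration, not the method used in this post: it is written for Python 3, where httplib was renamed to http.client, and the function name post_then_follow and its parameters are hypothetical. It assumes the redirect stays on the same host and that conn behaves like an http.client.HTTPConnection.

```python
from urllib.parse import urlencode, urljoin, urlsplit


def post_then_follow(conn, path, form, base_headers):
    """POST a form; on a 302, GET the Location target on the same connection.

    conn is expected to behave like http.client.HTTPConnection.
    Returns (status, body) of the last response received.
    """
    post_headers = dict(base_headers)
    post_headers["Content-Type"] = "application/x-www-form-urlencoded"
    conn.request("POST", path, urlencode(form), headers=post_headers)
    resp = conn.getresponse()
    data = resp.read()  # drain the body before reusing the connection
    if resp.status != 302:
        return resp.status, data

    # Resolve Location against the request path, then keep only the path
    # and query (assumption: the redirect points at the same host).
    target = urlsplit(urljoin(path, resp.getheader("Location", "")))
    next_path = target.path or "/"
    if target.query:
        next_path += "?" + target.query

    # The follow-up GET carries the base headers only; the form
    # Content-Type was added to a copy above.
    conn.request("GET", next_path, headers=dict(base_headers))
    resp = conn.getresponse()
    return resp.status, resp.read()
```

With a real server, conn would be created as HTTPConnection(host, timeout=60) from http.client and passed in together with the form fields taken from the capture.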

 

 

2. Python simulation

2.1 Packet capture analysis: the follow-up GET request is not sent

 

image

 

Next, look at the capture taken in IE

image

No GET request appears

image

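I suspected that a direct POST was needed, but trying that still failed. Looking closely at the POST content, however, the headers include a GET header. Since I am not familiar with how IE displays headers, I did not dig any further.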

image

 

2.2 Check the message format

The HTTP headers had already been defined before the GET for the redirect page was sent:

image

 

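Comparing against the headers that the browser actually sent successfully, I found that my Python code defined one extra header, "Content-Type", mainly because the preceding POST method needs this header.

In the real flow, other earlier GET messages need this header, but this message does not.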

 

image
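The comparison described above can also be done mechanically instead of by eye. A minimal sketch, assuming both header sets are available as dictionaries; the helper name header_diff and the sample header values are hypothetical and are not taken from the capture.

```python
def header_diff(script_headers, browser_headers):
    """Return (extra, missing) header names, compared case-insensitively.

    extra:   sent by the script but not by the browser
    missing: sent by the browser but not by the script
    """
    script = {name.lower() for name in script_headers}
    browser = {name.lower() for name in browser_headers}
    return sorted(script - browser), sorted(browser - script)


# Hypothetical header sets for the GET of the redirect page.
script_get = {
    "User-Agent": "demo-agent",
    "Cookie": "ASP.NET_SessionId=abc",
    "Content-Type": "application/x-www-form-urlencoded",
}
browser_get = {
    "user-agent": "demo-agent",
    "cookie": "ASP.NET_SessionId=abc",
}

print(header_diff(script_get, browser_get))  # (['content-type'], [])
```

Header names are case-insensitive in HTTP, which is why both sides are lowercased before the comparison.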

 

Remove this header

The Python message flow now looks normal

 

image
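One way to avoid this kind of leak is to derive the headers for each request from a shared base rather than mutating the shared dictionary. The sketch below is an illustration only; the helper name headers_for is hypothetical, and it assumes that only form POSTs need Content-Type. If some GET in your flow really does require that header, pass it explicitly for that request.

```python
FORM_TYPE = "application/x-www-form-urlencoded"


def headers_for(method, base_headers):
    """Build per-request headers without touching base_headers.

    Any Content-Type in the base is dropped, and the form type is
    added back only for POST requests.
    """
    headers = {
        name: value
        for name, value in base_headers.items()
        if name.lower() != "content-type"
    }
    if method.upper() == "POST":
        headers["Content-Type"] = FORM_TYPE
    return headers
```

Because a new dictionary is returned each time, a header added for a POST cannot carry over to the GET that follows it.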

 

 

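Because my HTTP fundamentals were shaky, I was not sure which direction to look in when I hit this problem, and I kept assuming the redirect flow involved some other complicated processing. That cost me a lot of time,

and in the end it was just a matter of one header.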

 

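Finally, here are my wrapped HTTP GET and POST methods. They call the httplib library, which is flexible and convenient: based on the front-end JS code, you can generate special fields yourself to authenticate with the server.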

 

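# Imports required by the two methods below (Python 2).
import re
import time
from copy import deepcopy
from gzip import GzipFile
from httplib import HTTPConnection
from StringIO import StringIO

# Both methods belong to a class (not shown) that provides self.host and self.headers.
    def http_get(self,connDefault=None,url='',bodyFlag=False,refererFresh=False,referer = ''):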

        status,infor = 1,''       
        if connDefault is None:
            conn = HTTPConnection(self.host,timeout=60)
        else:
            conn = connDefault

        try:

            print 'http_get -> enter to get ',url
            start = time.time()           
           
            print 'http_get -> connect init OK'
            conn.request('GET',url,headers=self.headers)

            print 'http_get -> wait the  response...'
            response = conn.getresponse()
            end = time.time()
            print "http_get -> info:",end - start,response.status

            print 'http_get -> response headers' ,response.getheaders()

            # Status code
            status = response.status
            if status != 200:
                print 'http_get -> http status error',status
                infor = 'error'

            else:
                # Read the cookie; the format looks like
                # ASP.NET_SessionId=pzt0bs55tc2fjrbv0canht45; path=/; HttpOnly
                cookie = response.getheader('Set-Cookie','')

                # Accumulate cookies across requests
                if cookie != '':
                    # Two kinds of cookie keys are recognised
                    print 'http_get -> peer Set-Cookie'  , cookie
                    pattern = re.compile(r'(key=[\w=+/]+;|ASP\.NET_SessionId=[\w=+/]+;)')
                    match = pattern.search(cookie)
                    if match is not None:
                        # Drop the trailing ';' from the matched pair
                        pair = str(match.group(1)[:-1])
                        oCookie = self.headers.get('Cookie','')
                        if oCookie == '':
                            self.headers["Cookie"] = pair
                        else:
                            self.headers["Cookie"] = oCookie + ';' + pair
                        print 'http_get -> request Cookie' ,self.headers["Cookie"]

                # Update the Referer

                if refererFresh:
                    if referer != '':
                        self.headers["Referer"] = "http://" + self.host + referer
                    else:
                        self.headers["Referer"] = "http://" + self.host + url


                # Read the content encoding; gzip is declared explicitly in the header
                content_encoding = response.getheader('Content-Encoding','')
                if bodyFlag:
                    # Decompress gzip bodies
                    if content_encoding == 'gzip':
                        buf = StringIO(response.read())
                        infor = GzipFile(fileobj=buf).read()
                    else:
                        infor = response.read()

        except Exception as ex:
            print 'http_get -> error:',ex
            status,infor = 1,ex
        finally:
            # Close only connections that were created here
            if connDefault is None:
                conn.close()
        return status,infor


    def http_post(self,connDefault=None,url='',PostStr=''):
        status,response = 1,''
        # Pick the connection before entering try so that conn is always bound in finally
        if connDefault is None:
            conn = HTTPConnection(self.host,timeout=60)
        else:
            conn = connDefault
        try:
            headers = deepcopy(self.headers)
            headers["Content-Type"] ="application/x-www-form-urlencoded"
            start = time.time()

            headers["Content-Length"] = len(PostStr)
            conn.request('POST',url,PostStr,headers=headers)
            response = conn.getresponse()
            end = time.time()
            print "http_post info:",end - start,response.status
           
            # Redirect: return the Location target to the caller
            if response.status == 302:
                Location=response.getheader('Location','')
                status,response = 302,Location
            # Normal submission
            elif response.status == 200:
                status,response = 200,''
            else:
                status,response = response.status,'does not support'
        except Exception as ex:
            print 'http_post -> error:',ex
            status,response = 1,ex
        finally:
            # Close only connections that were created here
            if connDefault is None:
                conn.close()
        return status,response
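
For readers on Python 3, the cookie accumulation and gzip handling inside http_get can be isolated into two small helpers. This is a rough sketch rather than a port of the methods above: the names merge_cookie and decode_body are hypothetical, the regular expression follows the one in http_get (with the dots escaped), and gzip.decompress stands in for the StringIO plus GzipFile combination.

```python
import gzip
import re

# Same two cookie kinds as in http_get; the ';' stays outside the group.
COOKIE_RE = re.compile(r"(key=[\w=+/]+|ASP\.NET_SessionId=[\w=+/]+);")


def merge_cookie(existing, set_cookie):
    """Append the first recognised name=value pair to the Cookie string."""
    match = COOKIE_RE.search(set_cookie)
    if match is None:
        return existing
    pair = match.group(1)
    return pair if not existing else existing + "; " + pair


def decode_body(raw, content_encoding):
    """Return the body bytes, decompressing them when gzip is declared."""
    if content_encoding.lower() == "gzip":
        return gzip.decompress(raw)
    return raw


set_cookie = "ASP.NET_SessionId=pzt0bs55tc2fjrbv0canht45; path=/; HttpOnly"
print(merge_cookie("", set_cookie))
# ASP.NET_SessionId=pzt0bs55tc2fjrbv0canht45
```

Like the original, this only picks up the first matching pair in a Set-Cookie header, so it suits a server that sets one of these two cookies per response.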

