[Movie Search Tool] How to Download BT Torrents Directly from the DHT Network


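Open-source DHT crawler: https://github.com/h31h31/H31DHTDEMO

Open-source data processing program: https://github.com/h31h31/H31DHTMgr

Articles in the DHT series:

1. [Movie Search Tool] How a DHT network crawler works in P2P

2. [Movie Search Tool] Implementing a DHT network crawler in code

3. [Movie Search Tool] First open-source release of the C++ DHT network crawler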

--------------------------------------------------------------------------------------------------------------------

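To follow this article you need to have understood the series listed above, and you need some background in TCP network programming and bencode encoding. If none of it makes sense, head over to the entertainment area at http://www.sosobta.com and take a break...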

 

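After so many installments, this last one covers a fairly important protocol in the BT network, torrent download, so that you know how to fetch a torrent directly from the DHT network.

First, how do we currently download movies and other files? If we have a BT torrent, we can download the corresponding files. But if all we have is a file name, how do we find the torrent?

We can search for a magnet link and then use that string to download the torrent file and the movie information. But if no website offers the torrent for download, how do we find it?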
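The string that matters in a magnet link is the info hash, which is also the hash the DHT search later in this article works with. A minimal sketch of extracting it, assuming the link carries a 40-character hex btih value (base32-encoded hashes are not handled); info_hash_from_magnet is a hypothetical helper name, not part of H31DHTDEMO:

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: copy the 20-byte info hash out of a magnet link.
// Returns false if no 40-character hex "btih" value is present.
bool info_hash_from_magnet(const std::string& magnet, unsigned char out[20])
{
    assert(out != nullptr);
    const std::string key = "xt=urn:btih:";
    const std::size_t pos = magnet.find(key);
    if (pos == std::string::npos) return false;
    const std::size_t start = pos + key.size();
    if (magnet.size() < start + 40) return false;

    for (std::size_t i = 0; i < 20; ++i) {
        int value = 0;
        for (std::size_t j = 0; j < 2; ++j) {
            const unsigned char c =
                static_cast<unsigned char>(magnet[start + 2 * i + j]);
            if (!std::isxdigit(c)) return false;
            // Convert one hex digit (either case) to its numeric value.
            const int digit = std::isdigit(c) ? c - '0'
                                              : std::tolower(c) - 'a' + 10;
            value = value * 16 + digit;
        }
        out[i] = static_cast<unsigned char>(value);
    }
    return true;
}
```

The resulting 20 bytes are what would be passed as the hash argument of the search shown later.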

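There are currently two ways to download a BT torrent:

  1. Over HTTP directly from a WEB server, which is simple and convenient, for example from the Xunlei servers.
  2. Through BT software: the BT network has a dedicated protocol for fetching torrents. It only fetches the torrent; once the torrent has been downloaded, it is handed to the BT software to download the data.

To download torrents from the DHT network, first read these two protocol documents: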

http://www.bittorrent.org/beps/bep_0009.html

http://www.bittorrent.org/beps/bep_0010.html

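They describe the protocol, but the workflow still needs some explanation to make it easier to understand.

Our code still has to build on the open-source DHT crawler at https://github.com/h31h31/H31DHTDEMO, because the data comes from the DHT network and the steps that follow run on top of it.

The earlier DHT code already contains SEARCH logic that finds which IPs offer a given HASH for download.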

        /* This is how you trigger a search for a torrent hash.  If port (the second argument) is non-zero, it also performs an announce.
           Since peers expire announced data after 30 minutes, it's a good idea to reannounce every 28 minutes or so. */
        if(searching) {
            //m_dht.dht_random_bytes((void*)hashList[2],20);
            if(m_soListen >= 0)
                m_dht.dht_search(hashList[2], 0, AF_INET, DHT_callback, this);
            if(s6 >= 0)
                m_dht.dht_search(hashList[2], 0, AF_INET6, DHT_callback, this);
            searching = 0;
        }
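BEP 5 defines a compact peer format of 6 bytes per peer: a 4-byte IPv4 address followed by a 2-byte port, both in network byte order. Assuming the search callback hands over such a buffer (this demo's modified callback signature may differ), a minimal decoding sketch could look like this; PeerAddr and decode_compact_peers are hypothetical names:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

struct PeerAddr {            // hypothetical type for this sketch
    std::string ip;          // dotted-quad IPv4 address
    unsigned short port;     // host byte order
};

// Decode BEP 5 compact peer info; trailing bytes that do not form
// a whole 6-byte entry are ignored.
std::vector<PeerAddr> decode_compact_peers(const unsigned char* data, std::size_t len)
{
    assert(data != nullptr || len == 0);
    std::vector<PeerAddr> peers;
    for (std::size_t i = 0; i + 6 <= len; i += 6) {
        char ip[16];
        std::snprintf(ip, sizeof(ip), "%u.%u.%u.%u",
                      static_cast<unsigned>(data[i]),
                      static_cast<unsigned>(data[i + 1]),
                      static_cast<unsigned>(data[i + 2]),
                      static_cast<unsigned>(data[i + 3]));
        PeerAddr p;
        p.ip = ip;
        // The port is big-endian on the wire.
        p.port = static_cast<unsigned short>((data[i + 4] << 8) | data[i + 5]);
        peers.push_back(p);
    }
    return peers;
}
```

Each decoded entry is a candidate TCP endpoint for the torrent download steps described further down.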

Once the search has returned peer IP addresses and ports, look at the function dht_periodic(const void *buf, size_t buflen,const struct sockaddr *fromAddr, int fromlen,time_t *tosleep,dht_callback *callback, void *closure) in dht.c. Its ANNOUNCE_PEER branch handles incoming announce requests, and each request carries the ID with which the sender identifies itself for this torrent. The code below treats that ID as the peerid. (Strictly speaking, the id field of a DHT message is the sender's DHT node ID.)

The ANNOUNCE_PEER branch of dht_periodic:

        case ANNOUNCE_PEER:
            _dout("Announce peer!From IP:%s:%d\n",inet_ntoa(tempip->sin_addr),ntohs(tempip->sin_port)); // sin_port is in network byte order
            new_node(id, fromAddr, fromlen, 1);

            if(id_cmp(info_hash, zeroes) == 0) 
            {
                _dout("Announce_peer with no info_hash.\n");
                send_error(fromAddr, fromlen, tid, tid_len,203, "Announce_peer with no info_hash");
                break;
            }
            if(!token_match(token, token_len, fromAddr)) {
                _dout("Incorrect token for announce_peer.\n");
                send_error(fromAddr, fromlen, tid, tid_len,203, "Announce_peer with wrong token");
                break;
            }
            if(port == 0) {
                _dout("Announce_peer with forbidden port %d.\n", port);
                send_error(fromAddr, fromlen, tid, tid_len,203, "Announce_peer with forbidden port number");
                break;
            }
            if(callback) 
            {
                (*callback)(closure, DHT_EVENT_ANNOUNCE_PEER_VALUES, info_hash,(void *)fromAddr, port,id);//this id is what we use as the peerid
            }

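Knowing the IP, the port, and the torrent ID, we can send a request to the peer.

Hashes are collected over UDP, but the torrent itself is downloaded over TCP. The peer acts as a TCP server: we connect to it and download the torrent for that info hash directly.

Downloading a BT torrent from the DHT network: the workflow

 Start with the protocol description at http://www.bittorrent.org/beps/bep_0010.html. Before anything else we must complete the handshake. The code sets reserved byte 5 to 0x10, which is how BEP 10 signals support for the extension protocol.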

 

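 This packet is simple to build: assemble it according to the format and send it. The peer then replies, identifying which client software is offering the torrent.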

void CH31BTMgr::Encode_handshake()
{
	//a byte with value 19 (the length of the string that follows);
	//the UTF-8 string "BitTorrent protocol" (which is the same as in ASCII);
	//eight reserved bytes used to mark extensions;
	//the 20 bytes of the torrent info hash;
	//the 20 bytes of the peer ID.
	char btname[256];
	memset(btname,0,sizeof(btname));
	sprintf(btname,"BitTorrent protocol");
	char msg[1280];
	memset(msg,0,sizeof(msg));
	msg[0]=19;
	memcpy(&msg[1],btname,19);
	char ext[8];
	memset(ext,0,sizeof(ext));
	ext[5]=0x10;

	memcpy(&msg[20],ext,8);
	memcpy(&msg[28],m_hash,20);
	memcpy(&msg[48],m_peer_id,20);
	int res1=Write(msg, 68);//send the message over TCP
}
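The peer answers with a 68-byte handshake of the same layout, and it is worth checking it before sending anything else. A minimal sketch of such a check, assuming the first 68 bytes of the reply are already in a buffer; check_handshake_reply is a hypothetical helper, not part of the original code:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Returns true when the reply is a well-formed BitTorrent handshake for the
// expected info hash. *supports_ext is set when the peer advertises the
// BEP 10 extension protocol (bit 0x10 of reserved byte 5).
bool check_handshake_reply(const unsigned char* reply, std::size_t len,
                           const unsigned char expected_hash[20],
                           bool* supports_ext)
{
    assert(reply != nullptr && expected_hash != nullptr && supports_ext != nullptr);
    static const char kProtocol[] = "BitTorrent protocol";
    if (len < 68) return false;                                   // incomplete handshake
    if (reply[0] != 19) return false;                             // protocol name length
    if (std::memcmp(reply + 1, kProtocol, 19) != 0) return false;
    if (std::memcmp(reply + 28, expected_hash, 20) != 0) return false;
    // Reserved bytes occupy offsets 20..27, so reserved byte 5 is offset 25.
    *supports_ext = (reply[25] & 0x10) != 0;
    return true;
}
```

If supports_ext comes back false, the peer does not speak the extension protocol, and a metadata request to it would not be expected to succeed.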

After sending the handshake we can go on to send the torrent data request packets, which requires studying http://www.bittorrent.org/beps/bep_0009.html :

extension header
The metadata extension uses the extension protocol (specified in BEP 0010 ) to advertize its existence. It adds the "ut_metadata" entry to the "m" dictionary in the extension header hand-shake message. This identifies the message code used for this message. It also adds "metadata_size" to the handshake message (not the "m" dictionary) specifying an integer value of the number of bytes of the metadata.

Example extension handshake message:

{'m': {'ut_metadata', 3}, 'metadata_size': 31235}
extension message
The extension messages are bencoded. There are 3 different kinds of messages:

0 request 
1 data 
2 reject 
The bencoded messages have a key "msg_type" which value is an integer corresponding to the type of message. They also have a key "piece", which indicates which part of the metadata this message refers to.

In order to support future extensability, an unrecognized message ID MUST be ignored.

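This requires bencode code, which you can find online and compile. If you really cannot get it working, leave your email address and I will send you the code; it was itself collected and tidied up from the web.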

void CH31BTMgr::Encode_Ext_handshake()
{
    entry m;
    m["ut_metadata"] = 1;   // any non-zero ID of our choosing; per BEP 10, 0 marks the extension as disabled
    entry e;
    e["m"]=m;

    char msg[200];
    char* header = msg;
    char* p = &msg[6];
    int len = bencode(p, e);
    int total_size = 2 + len;
    namespace io = detail;
    io::write_uint32(total_size, header);
    io::write_uint8(20, header);
    io::write_uint8(0, header);

    int res1=Write(msg, len + 6);
}
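For readers without a bencode library at hand, the two messages needed here are small enough to write out by hand. The sketch below frames an extended message as a 4-byte big-endian length, message ID 20, the extended ID, and the payload, then builds the extension handshake and a metadata request. All three function names are hypothetical, and the request must use the ut_metadata ID that the peer advertised in its own handshake:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Frame: <4-byte big-endian length><20><ext_id><payload>.
// The length counts the two ID bytes plus the payload.
std::string frame_extended_message(unsigned char ext_id, const std::string& payload)
{
    const std::size_t length = 2 + payload.size();
    std::string msg;
    msg.push_back(static_cast<char>((length >> 24) & 0xFF));
    msg.push_back(static_cast<char>((length >> 16) & 0xFF));
    msg.push_back(static_cast<char>((length >> 8) & 0xFF));
    msg.push_back(static_cast<char>(length & 0xFF));
    msg.push_back(static_cast<char>(20));      // BEP 10 extended message ID
    msg.push_back(static_cast<char>(ext_id));
    msg += payload;
    return msg;
}

// Extension handshake (extended ID 0): {'m': {'ut_metadata': our_id}}.
std::string build_ext_handshake(int our_ut_metadata_id)
{
    assert(our_ut_metadata_id > 0 && our_ut_metadata_id < 256);
    const std::string payload =
        "d1:md11:ut_metadatai" + std::to_string(our_ut_metadata_id) + "eee";
    return frame_extended_message(0, payload);
}

// Metadata request: {'msg_type': 0, 'piece': piece}, sent with the
// ut_metadata ID taken from the peer's handshake.
std::string build_metadata_request(unsigned char peer_ut_metadata_id, int piece)
{
    assert(piece >= 0);
    const std::string payload =
        "d8:msg_typei0e5:piecei" + std::to_string(piece) + "ee";
    return frame_extended_message(peer_ut_metadata_id, payload);
}
```

For example, build_metadata_request(3, 0) yields the bytes that ask a peer advertising ut_metadata as 3 for piece 0.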

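If the peer answers with msg_type 2, just quit: it has rejected your request.

If it answers with msg_type 1, the message carries a block of torrent data. Every block is 16 KiB except the last one, which may be shorter.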
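The piece arithmetic follows directly from the 16 KiB rule. A small sketch, with hypothetical helper names, that computes how many pieces a given metadata_size needs and how long each one is:

```cpp
#include <cassert>

const int kMetadataPieceSize = 16 * 1024;   // fixed by BEP 9

// Number of pieces, rounding up so an exact multiple gets no extra piece.
int metadata_piece_count(int metadata_size)
{
    assert(metadata_size > 0);
    return (metadata_size + kMetadataPieceSize - 1) / kMetadataPieceSize;
}

// Length of one piece: 16 KiB for all but the last, the remainder for the last.
int metadata_piece_length(int metadata_size, int piece)
{
    assert(piece >= 0 && piece < metadata_piece_count(metadata_size));
    const int remaining = metadata_size - piece * kMetadataPieceSize;
    return remaining < kMetadataPieceSize ? remaining : kMetadataPieceSize;
}
```

With the metadata_size of 31235 from the BEP 9 example, this gives two pieces of 16384 and 14851 bytes.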

 

data
The data message adds another entry to the dictionary, "total_size". This key has the same semantics as the "metadata_size" in the extension header. This is an integer.

The metadata piece is appended to the bencoded dictionary, it is not a part of the dictionary, but it is a part of the message (the length prefix MUST include it).

If the piece is the last piece of the metadata, it may be less than 16kiB. If it is not the last piece of the metadata, it MUST be 16kiB.

Example:

{'msg_type': 1, 'piece': 0, 'total_size': 3425}
d8:msg_typei1e5:piecei0e10:total_sizei34256eexxxxxxxx...
The x represents binary data (the metadata).
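Because the raw piece is appended after the dictionary, the receiver has to find where the dictionary ends. Searching for a marker string works in simple cases, but the binary data may contain the same bytes, so a sketch that walks the bencoded structure is shown here. It assumes the payload passed in starts right after the two ID bytes; bencode_skip and split_data_message are hypothetical names:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns the index just past the bencoded element starting at pos,
// or std::string::npos if the input is malformed or truncated.
std::size_t bencode_skip(const std::string& buf, std::size_t pos)
{
    if (pos >= buf.size()) return std::string::npos;
    const char c = buf[pos];
    if (c == 'i') {                                  // integer: i<digits>e
        const std::size_t end = buf.find('e', pos);
        return end == std::string::npos ? end : end + 1;
    }
    if (c == 'l' || c == 'd') {                      // list/dict: elements until 'e'
        ++pos;
        while (pos < buf.size() && buf[pos] != 'e') {
            pos = bencode_skip(buf, pos);
            if (pos == std::string::npos) return pos;
        }
        return pos < buf.size() ? pos + 1 : std::string::npos;
    }
    if (c >= '0' && c <= '9') {                      // string: <length>:<bytes>
        std::size_t n = 0;
        while (pos < buf.size() && buf[pos] >= '0' && buf[pos] <= '9')
            n = n * 10 + static_cast<std::size_t>(buf[pos++] - '0');
        if (pos >= buf.size() || buf[pos] != ':') return std::string::npos;
        ++pos;
        if (n > buf.size() - pos) return std::string::npos;
        return pos + n;
    }
    return std::string::npos;
}

// Split a data message payload into its bencoded dictionary and the raw piece.
bool split_data_message(const std::string& payload, std::string* dict, std::string* piece)
{
    assert(dict != nullptr && piece != nullptr);
    const std::size_t end = bencode_skip(payload, 0);
    if (end == std::string::npos || payload[0] != 'd') return false;
    *dict = payload.substr(0, end);
    *piece = payload.substr(end);
    return true;
}
```

The piece returned this way can then be written at offset piece * 16384 of the output file.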

The code below shows how to send a request for a particular piece:

void CH31BTMgr::write_metadata_packet(int type, int piece)
{
    ASSERT(type >= 0 && type <= 2);
    ASSERT(piece >= 0);

    entry e;
    e["msg_type"] = type;
    e["piece"] = piece;

    char const* metadata = 0;
    int metadata_piece_size = 0;

    if (type == 1)
    {
        e["total_size"] = 14132;
        int offset = piece * 16 * 1024;
        //metadata = m_tp.metadata().begin + offset;
        metadata_piece_size = (std::min)(int(14132 - offset), 16 * 1024);
    }

    char msg[200];
    char* header = msg;
    char* p = &msg[6];
    int len = bencode(p, e);
    int total_size = 2 + len + metadata_piece_size;
    namespace io = detail;
    io::write_uint32(total_size, header);
    io::write_uint8(20, header);
    io::write_uint8(m_message_index, header);

    int res1=Write(msg, len + 6);
}

Only after one piece has been received can we request the next. The code below shows how a received packet is parsed:

// Process one complete packet
bool CH31BTMgr::DeCodeFrameData(char * buffer,int buflen)
{
    char * p = (char *)mhFindstr(buffer, buflen, "ut_metadatai", 12);
    if(p) 
    {
        m_message_index=atoi(&p[12]);
        if(m_message_index==2)
        {
            return false;
        }
        write_metadata_packet(0,0);
        char filename[256];
        memset(filename,0,sizeof(filename));
        sprintf(filename,"%s\\torrent.txt",m_workPath);
        DelFile(filename);
    } 

    p = (char *)mhFindstr(buffer, buflen, "metadata_sizei", 14);
    if(p) 
    {
        m_metadata_size=atoi(&p[14]);
        m_fileCnt=(int)((m_metadata_size+16383)/16384); // number of 16 KiB pieces, rounded up
    } 

    p = (char *)mhFindstr(buffer, buflen, "msg_typei", 9);
    if(p) 
    {
        int type1=atoi(&p[9]);
        if(type1==1)
        {
            p = (char *)mhFindstr(buffer, buflen, "piecei", 6);
            if(p) 
            {
                int piece=atoi(&p[6]);
                p = (char *)mhFindstr(buffer, buflen, "total_sizei", 11);
                if(p) 
                {
                    int total_size=atoi(&p[11]);
                    p = (char *)mhFindstr(buffer, buflen, "ee", 2);
                    if(p) 
                    {
                        //save the data
                        FILE* pfile=NULL;
                        char filename[256];

                        memset(filename,0,sizeof(filename));
                        sprintf(filename,"%s\\torrent.txt",m_workPath);
                        char openmethod[5]="ab"; // binary mode, so Windows does not translate newline bytes in the metadata
                        if(piece==0)
                            sprintf(openmethod,"wb");
                        if((pfile=fopen(filename,openmethod))!=NULL)
                        {
                            if((piece+1)*16*1024<total_size)
                            {
                                fseek(pfile,(piece)*16*1024,SEEK_SET);
                                fwrite(&p[2],1,16*1024,pfile);
                                write_metadata_packet(0,piece+1);
                                fclose(pfile);
                            }
                            else
                            {
                                fwrite(&p[2],1,total_size-(piece)*16*1024,pfile);
                                fclose(pfile);
                                ManageTorrentFileToRealFile(filename);
                            }
                        }
                    }
                }
            }
        }
        else if(type1==2)
        {
            return false;
        }
    } 
    
    return true;
}

void * mhFindstr(const void *haystack, size_t haystacklen,const void *needle, size_t needlelen)
{
    const char *h =(const char *) haystack;
    const char *n =(const char *) needle;
    size_t i;

    /* size_t is unsigned */
    if(needlelen > haystacklen)
        return NULL;

    for(i = 0; i <= haystacklen - needlelen; i++) {
        if(memcmp(h + i, n, needlelen) == 0)
            return (void*)(h + i);
    }
    return NULL;
}
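DeCodeFrameData assumes it is handed one complete packet, but TCP delivers a byte stream, so a single read may contain half a message or several messages. A minimal sketch of reassembling length-prefixed messages is given below. It assumes the 68-byte handshake, which has no length prefix, has already been consumed; MessageAssembler is a hypothetical class:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

class MessageAssembler {     // hypothetical helper for this sketch
public:
    // Append bytes exactly as they were read from the socket.
    void feed(const char* data, std::size_t len) { buffer_.append(data, len); }

    // Extract one complete message body (without its 4-byte length prefix).
    // Returns false while the next message is still incomplete.
    bool next(std::string* body)
    {
        assert(body != nullptr);
        if (buffer_.size() < 4) return false;
        const unsigned char* p =
            reinterpret_cast<const unsigned char*>(buffer_.data());
        // The length prefix is big-endian.
        const std::size_t length = (static_cast<std::size_t>(p[0]) << 24) |
                                   (static_cast<std::size_t>(p[1]) << 16) |
                                   (static_cast<std::size_t>(p[2]) << 8) |
                                    static_cast<std::size_t>(p[3]);
        if (buffer_.size() - 4 < length) return false;
        body->assign(buffer_, 4, length);
        buffer_.erase(0, 4 + length);
        return true;
    }

private:
    std::string buffer_;     // bytes received but not yet consumed
};
```

A receive loop would call feed after every read and then call next repeatedly, passing each body to the parser. A zero-length body is a keep-alive and can be skipped.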

 

 

Next, how to debug quickly

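       The first time I debugged this, I naively waited for data to arrive from the DHT network. That takes a long time, and the peers I did reach either never answered or rejected me. After a while, and after asking friends who could not pin the problem down either, it turned out that my protocol messages were built incorrectly. Here is a summary of the points to watch:

1. You must have received the other side's PEERID before you can talk to them; otherwise they will ignore you.

2. Do not debug protocol construction against the public network. Download the mono-monotorrent source code, step through it, and run a server locally.

3. Debugging locally against mono-monotorrent lets you see exactly what is wrong, for example which part of the protocol is packaged incorrectly.

4. Torrents downloaded through the DHT network are the newest ones; they may not even be available for WEB download yet.

5. Torrents downloaded through this protocol do not seem to contain announce-list. I do not know why this content is not provided; perhaps I am missing some key step. From the mono-monotorrent code it looks as if it simply is not offered for download. I hope an expert can advise.

6. The TCPClient receive buffer needs to be larger than 16K, which makes processing easier. If you can splice consecutive packets together, even better.

7. If you need the C++ bencode code, leave a comment here or email h31h31#163.com.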
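On point 5: BEP 9 transfers only the info dictionary of a torrent, whereas announce and announce-list are top-level keys of the .torrent file, outside that dictionary, so they would not be part of what is downloaded. A sketch of wrapping the downloaded metadata into a minimal .torrent file follows; wrap_info_as_torrent is a hypothetical helper, and the tracker URL is an optional value the caller would have to supply from elsewhere:

```cpp
#include <cassert>
#include <string>

// Wrap a raw info dictionary (the bytes fetched via ut_metadata) into a
// top-level torrent dictionary. Keys are emitted in sorted order:
// "announce" before "info".
std::string wrap_info_as_torrent(const std::string& info_dict,
                                 const std::string& announce_url)
{
    assert(!info_dict.empty() && info_dict[0] == 'd');
    std::string out = "d";
    if (!announce_url.empty()) {
        out += "8:announce";
        out += std::to_string(announce_url.size());
        out += ":";
        out += announce_url;
    }
    out += "4:info";
    out += info_dict;   // inserted verbatim so its SHA-1, the info hash, is unchanged
    out += "e";
    return out;
}
```

Passing an empty announce_url produces a trackerless torrent, which a client can still use through the DHT.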

 

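If this article is hard to follow, read the earlier articles first and debug through the code; coming back to this one afterwards should make more sense.

I hope readers who know this area will share what they know so we can all improve. Leave a comment here to discuss.

 

Please recommend this article... your recommendations are what motivate the next installment...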

 

 

