title: Weibo User Identifiers Explained
date: 2017-09-06 03:15:27
tags: [Web Crawler]

Weibo User Identifiers Explained

Weibo user IDs

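Weibo identifies a user in three main ways:

Nickname: the name displayed on the page
Username: the user's name in the system
User ID (UID): the user's numeric ID in the system

Of these, the nickname can be changed; the other two cannot.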

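For example, take the Weibo user 吾愛破解論壇 (http://weibo.com/52pojie), shown in the figure below:

Its nickname is 吾愛破解論壇, its username is 52pojie, and its user ID is 1780478695.

Whether a Weibo link we want to crawl identifies the user by nickname or by username, we always convert it to the user ID in the end, which makes later processing easier.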
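Before converting, a crawler has to tell which kind of identifier a link carries. Below is a minimal Java sketch under assumed URL shapes that the post does not spell out: `/u/<digits>` or a bare numeric path is treated as a UID, `/n/<nickname>` as a nickname (the same `/n/` form the nickname lookup in this post requests on m.weibo.com), and any other single path segment such as `/52pojie` as a username. The class, enum and method names are hypothetical.

```java
import java.net.URI;

/** Sketch: tell which kind of user identifier a Weibo profile URL carries. */
public class WeiboUrlKind {
    enum Kind { UID, NICKNAME, USERNAME, UNKNOWN }

    /** An identifier kind together with its raw value, e.g. USERNAME / "52pojie". */
    static final class Id {
        final Kind kind;
        final String value;

        Id(Kind kind, String value) {
            this.kind = kind;
            this.value = value;
        }

        @Override
        public String toString() {
            return kind + "(" + value + ")";
        }
    }

    // Assumed URL shapes (not confirmed by the post):
    //   /u/<digits> or /<digits>  -> UID
    //   /n/<nickname>             -> nickname
    //   /<name>                   -> username, e.g. /52pojie
    static Id classify(String url) {
        String path = URI.create(url).getPath(); // getPath() decodes %XX escapes
        if (path == null) {
            return new Id(Kind.UNKNOWN, "");
        }
        // Trim leading/trailing slashes, then split into path segments
        String[] parts = path.replaceAll("^/+|/+$", "").split("/");
        if (parts.length == 2 && parts[0].equals("u") && parts[1].matches("\\d+")) {
            return new Id(Kind.UID, parts[1]);
        }
        if (parts.length == 2 && parts[0].equals("n") && !parts[1].isEmpty()) {
            return new Id(Kind.NICKNAME, parts[1]);
        }
        if (parts.length == 1 && parts[0].matches("\\d+")) {
            return new Id(Kind.UID, parts[0]);
        }
        if (parts.length == 1 && !parts[0].isEmpty()) {
            return new Id(Kind.USERNAME, parts[0]);
        }
        return new Id(Kind.UNKNOWN, path);
    }

    public static void main(String[] args) {
        System.out.println(classify("http://weibo.com/52pojie"));       // USERNAME(52pojie)
        System.out.println(classify("https://weibo.com/u/1780478695")); // UID(1780478695)
    }
}
```

Only the UID case maps to a containerId without a network round trip; nicknames and usernames still need one of the lookups shown later.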

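User containerId

Through a user's specific containerId we can fetch any user's information, such as all the posts they have published and the list of accounts they follow.

If we only need to crawl all the posts a user has published, the containerId is the string 107603 followed by the UID (string concatenation, not addition).
For example, to crawl the posts of 吾愛破解論壇, the corresponding containerId is 1076031780478695.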

Visiting the following link returns the user's first 25 posts (page is the page number, count the number of posts requested):

https://m.weibo.cn/api/container/getIndex?page=1&count=25&containerid=1076031780478695
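A small helper that assembles this URL from its three query parameters, as a sketch: `WeiboIndexUrl` and `build` are hypothetical names, and the paging loop in `main` assumes that raising `page` returns the following posts, which the post does not confirm.

```java
/** Sketch: build the getIndex URL for one page of a user's posts. */
public class WeiboIndexUrl {
    // Endpoint as given in the post
    static final String BASE = "https://m.weibo.cn/api/container/getIndex";

    static String build(String containerId, int page, int count) {
        // containerId is numeric (107603 + UID), so it needs no URL encoding
        if (containerId == null || !containerId.matches("\\d+")) {
            throw new IllegalArgumentException("containerId must be digits: " + containerId);
        }
        if (page < 1 || count < 1) {
            throw new IllegalArgumentException("page and count must be positive");
        }
        // Parameter order matches the example URL in the post
        return BASE + "?page=" + page + "&count=" + count + "&containerid=" + containerId;
    }

    public static void main(String[] args) {
        // Hypothetical paging loop: assumes page=2, 3, ... return the following posts
        for (int page = 1; page <= 3; page++) {
            System.out.println(build("1076031780478695", page, 25));
        }
    }
}
```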

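The result looks like this:

As shown above, the user's posts are contained in the JSON response, and we can pick out whatever fields we need, for example:

Nickname: screen_name
User ID: user.id
Attached images: pics
...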
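To issue the same request from Java, one option is the JDK 11+ `java.net.http.HttpClient`, swapped in here for the Apache HttpClient used in the post's own code. This is a sketch: the `User-Agent` string is a placeholder, the class name is hypothetical, and whether the endpoint needs cookies or further headers is not covered by the post. Extracting `screen_name`, `user.id` or `pics` is left to a JSON library of your choice.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Sketch: download the getIndex JSON with the JDK 11+ HTTP client. */
public class WeiboIndexFetcher {
    // Hypothetical placeholder; the post's own code uses a USER_AGENT constant it never shows
    static final String USER_AGENT = "Mozilla/5.0 (Linux; Android 8.0) AppleWebKit/537.36 Mobile Safari/537.36";

    static HttpRequest buildRequest(String url) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("User-Agent", USER_AGENT)
                .GET()
                .build();
    }

    /** Returns the raw JSON text; parsing the fields is left to a JSON library. */
    static String fetch(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> response = client.send(buildRequest(url), HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("unexpected HTTP status " + response.statusCode());
        }
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        String json = fetch("https://m.weibo.cn/api/container/getIndex?page=1&count=25&containerid=1076031780478695");
        // Print only the beginning of the response
        System.out.println(json.substring(0, Math.min(300, json.length())));
    }
}
```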

Code for converting between the IDs

In the code below, containerId means the containerId of a user's post page (107603 + UID).

UID to containerId

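```java
/**
 * UID to containerId (the user's post page).
 * @author yanximin
 */
static String uidToContainerId(String uid) {
    // A UID is purely numeric, e.g. "1780478695"
    if (uid == null || !uid.matches("\\d+"))
        throw new IllegalArgumentException("uid must be a non-empty string of digits: " + uid);
    // String concatenation, not arithmetic: "107603" + "1780478695" -> "1076031780478695"
    return "107603" + uid;
}
```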
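The opposite direction is a prefix strip. A minimal sketch, assuming the containerId follows the 107603 + UID rule above; the class and method names are hypothetical.

```java
/** Sketch: the reverse of uidToContainerId, turning a user-post containerId back into a UID. */
public class ContainerIdToUid {
    // Prefix taken from the post: user-post containerId = "107603" + UID
    static final String USER_POSTS_PREFIX = "107603";

    static String containerIdToUid(String containerId) {
        if (containerId == null
                || !containerId.matches("\\d+")
                || !containerId.startsWith(USER_POSTS_PREFIX)
                || containerId.length() == USER_POSTS_PREFIX.length()) {
            throw new IllegalArgumentException("not a user-post containerId: " + containerId);
        }
        // Everything after the fixed prefix is the UID
        return containerId.substring(USER_POSTS_PREFIX.length());
    }

    public static void main(String[] args) {
        System.out.println(containerIdToUid("1076031780478695")); // 1780478695
    }
}
```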

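Nickname to containerId

```java
import java.io.IOException;
import java.net.URLEncoder;
import org.apache.http.Header;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;

/**
 * Nickname to containerId.
 * m.weibo.com/n/<nickname> answers with a 302 redirect whose Location header contains the UID.
 * @author yanximin
 * @throws IOException on network errors (ClientProtocolException is a subclass of IOException)
 */
static String nicknameToContainerId(String nickname) throws IOException {
    if (nickname == null)
        throw new IllegalArgumentException("nickname is null");
    // Nicknames are usually Chinese, so percent-encode them; URLEncoder writes spaces as '+', which a path does not accept
    String url = "http://m.weibo.com/n/" + URLEncoder.encode(nickname, "UTF-8").replace("+", "%20");
    try (CloseableHttpClient httpClient = HttpClients.createDefault()) {
        // With POST, HttpClient's default redirect strategy does not follow the 302,
        // so the Location header stays readable
        HttpPost post = new HttpPost(url);
        post.setHeader("User-Agent", USER_AGENT); // USER_AGENT: browser UA constant defined elsewhere in the class
        try (CloseableHttpResponse response = httpClient.execute(post)) {
            Header location = response.getLastHeader("Location");
            if (response.getStatusLine().getStatusCode() != 302 || location == null)
                return null;
            String target = location.getValue();
            // The UID starts at a fixed offset (27) of the Location URL; this depends on the redirect format of the time
            return target.length() > 27 ? "107603" + target.substring(27) : null;
        }
    }
}
```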

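Username to containerId

```java
import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

/**
 * Username to containerId.
 * The weibo.cn profile page links to "/<uid>/info"; the UID is taken from that link.
 * @author yanximin
 * @throws IOException on network errors (ClientProtocolException is a subclass of IOException)
 */
static String usernameToContainerId(String name) throws IOException {
    String url = "https://weibo.cn/" + name;
    try (CloseableHttpClient httpClient = HttpClients.createDefault()) {
        HttpGet get = new HttpGet(url);
        get.setHeader("User-Agent", USER_AGENT); // USER_AGENT: browser UA constant defined elsewhere in the class
        try (CloseableHttpResponse response = httpClient.execute(get)) {
            String html = EntityUtils.toString(response.getEntity(), "utf-8");
            // \d+ instead of [\d]*? so an empty match such as href="//info" is rejected
            Matcher matcher = Pattern.compile("href=\"/(\\d+)/info\"").matcher(html);
            // Only the first match is needed, so use find() once instead of a while loop
            return matcher.find() ? "107603" + matcher.group(1) : null;
        }
    }
}
```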
