# Scrape product info from http://bj.58.com/pbdn/0/pn2/, excluding Zhuanzhuan and promoted items
# coding:utf-8
# 爬取58同城二手電腦信息
# 進入http://bj.58.com/pbdn/0/pn2/頁面
# 爬取列表中除轉轉、推廣商品外的正常商品
from bs4 import BeautifulSoup
import requests
import time
def get_links_from(who_sells, page=2):
    """Collect detail-page URLs of normal listings from a 58.com list page.

    Zhuanzhuan (resale platform) and promoted items also match the
    ``tr > td.t > a.t`` selector; normal listings are told apart by the
    length of their canonical URL (53 characters, observed empirically).

    :param who_sells: seller-category segment of the URL (0 = personal).
    :param page: list page number (default 2, matching the original script).
    :return: list of detail-page URLs with query strings stripped.
    """
    list_view = 'http://bj.58.com/pbdn/{}/pn{}/'.format(str(who_sells), page)
    wb_data = requests.get(list_view, timeout=10)  # don't hang forever on a dead host
    soup = BeautifulSoup(wb_data.text, 'lxml')
    urls = []
    # Page analysis shows product links live under tr > td.t > a.t.
    for link in soup.select('tr td.t a.t'):
        href = link.get('href').split('?')[0]
        # Normal (non-Zhuanzhuan, non-promoted) links are exactly 53 chars long.
        if len(href) == 53:
            urls.append(href)
    return urls
def get_views(url):
    """Fetch the view counter for one listing via 58.com's counter API.

    :param url: detail-page URL whose last path segment is '<info_id>x.shtml'.
    :return: view count as a string (text after the last '=' in the response).
    """
    tail = url.split('/')[-1]
    # The original used str.strip('x.shtml'), which strips any of the
    # characters {x, ., s, h, t, m, l} from both ends — not the literal
    # suffix. It only worked because the ids are all digits. Remove the
    # exact suffix instead.
    suffix = 'x.shtml'
    info_id = tail[:-len(suffix)] if tail.endswith(suffix) else tail
    api = 'http://jst1.58.com/counter?infoid={}'.format(info_id)
    js = requests.get(api, timeout=10)  # bounded wait for the counter service
    views = js.text.split('=')[-1]
    return views
def get_item_info(who_sells=0):
    """Scrape and print details for every normal listing found.

    :param who_sells: 0 for personal sellers, anything else for merchants.
    """
    urls = get_links_from(who_sells)
    for url in urls:
        time.sleep(2)  # throttle requests to be polite to the server
        web_data = requests.get(url, timeout=10)
        soup = BeautifulSoup(web_data.text, 'lxml')
        # Look each fragment up once and guard the [0] indexing: the
        # original checked find_all('span', 'c_25d') but then indexed
        # select('.c_25d'), which can raise IndexError when the two
        # queries disagree; price/date were not guarded at all.
        price_tags = soup.find_all('span', 'price c_f50')
        area_tags = soup.select('.c_25d')
        date_tags = soup.select('.time')
        data = {
            'title': soup.title.text,
            'price': price_tags[0].text if price_tags else None,
            'area': list(area_tags[0].stripped_strings) if area_tags else None,
            'date': date_tags[0].text if date_tags else None,
            'cate': '個人' if who_sells == 0 else '商家',
            'views': get_views(url),
        }
        print(data)
if __name__ == '__main__':
    # Run the scraper only when executed as a script, not when imported.
    get_item_info()