Python crawler + beauty scoring: finding your Mrs. Right among 5000+ images

Reposted article, 2018-08-03 09:39

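Love at first sight is not about love, it is about the face.
Love that grows over time is not about the face, it is about love.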

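Project overview

This project uses a Python crawler and the Baidu face recognition API to crawl user photos from the Jianshu (簡書) dating column (they will be removed on request if they infringe) and score them.
The project covers the following:

Image crawler
Using the face recognition API
Scoring the photos and sorting the files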

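Image crawler

On most dating sites some users post photos of themselves. This article crawls every post in the Jianshu dating column (https://www.jianshu.com/c/bd38bd199ec6), opens each post's detail page, and downloads all of its images to the local disk.

Code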

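import os
import time

import requests
from lxml import etree

headers = {
    'User-Agent':'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36'
}

# Downloaded images are saved here; create the folder up front so open() does not fail.
IMG_DIR = 'row_img'
os.makedirs(IMG_DIR, exist_ok=True)

def get_url(url):
    """Fetch one list page of the column and crawl every post linked from it."""
    res = requests.get(url, headers=headers)
    html = etree.HTML(res.text)
    infos = html.xpath('//ul[@class="note-list"]/li')
    root = 'https://www.jianshu.com'
    for info in infos:
        url_path = root + info.xpath('div/a/@href')[0]
        # print(url_path)
        get_img(url_path)
    time.sleep(3)  # pause between list pages to go easy on the site

def save_img(path, content):
    """Write the raw image bytes to path."""
    with open(path, 'wb') as fp:
        fp.write(content)

def get_img(url):
    """Download every image found on a post's detail page."""
    res = requests.get(url, headers=headers)
    html = etree.HTML(res.text)
    title = html.xpath('//div[@class="article"]/h1/text()')[0].strip('|').split('，')[0]
    name = html.xpath('//div[@class="author"]/div/span/a/text()')[0].strip('|')
    infos = html.xpath('//div[@class = "image-package"]')
    # The index advances for every image block, even when a block has no image.
    for i, info in enumerate(infos, start=1):
        try:
            img_url = info.xpath('div[1]/div[2]/img/@data-original-src')[0]
        except IndexError:
            continue  # this block has no image URL
        print(img_url)
        # data-original-src is protocol-relative, so prepend the scheme
        data = requests.get('http:' + img_url, headers=headers)
        try:
            save_img(os.path.join(IMG_DIR, title + '+' + name + '+' + str(i) + '.jpg'), data.content)
        except OSError:
            # The title may contain characters that are not allowed in file names;
            # fall back to a name built from the author only.
            save_img(os.path.join(IMG_DIR, name + '+' + str(i) + '.jpg'), data.content)

if __name__ == '__main__':
    # Crawl list pages 1 to 200 of the column, newest additions first.
    urls = ['https://www.jianshu.com/c/bd38bd199ec6?order_by=added_at&page={}'.format(str(i)) for i in range(1,201)]
    for url in urls:
        get_url(url)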
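
The crawler falls back to a shorter file name when open() raises OSError, which typically happens when the post title contains characters the file system rejects. An alternative is to clean the name before saving. The following is only a sketch: safe_filename is a hypothetical helper, not part of the original script, and the set of replaced characters is an assumption based on what Windows disallows.

```python
import re

# Characters that Windows rejects in file names; '/' is also invalid on Linux and macOS.
_ILLEGAL = re.compile(r'[\\/:*?"<>|\r\n\t]')

def safe_filename(title, name, index, ext='.jpg'):
    """Build 'title+name+index.jpg' with illegal characters replaced by '_'."""
    parts = [_ILLEGAL.sub('_', part).strip() for part in (title, name)]
    # Drop empty parts so a blank title does not leave a leading '+'.
    parts = [p for p in parts if p]
    parts.append(str(index))
    return '+'.join(parts) + ext
```

With this helper, safe_filename('a/b: c?', 'LP', 1) gives 'a_b_ c_+LP+1.jpg', so the title is kept in the file name instead of being dropped.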

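Using the face recognition API

Because every image under each post was downloaded, the collection contains all kinds of pictures, including ones with no face in them. The goal is to find good-looking women, and filtering by hand would be slow and tedious, so Baidu's face recognition API is used here to filter the images and score them.

Applying for a face recognition application

First, go to the Baidu face recognition site (http://ai.baidu.com/tech/face), click 立即使用 (Use Now), and log in with a Baidu account (register one if you do not have one).

Create an application. Once that is done, click 管理應用 (Manage Applications) to see the AppID and the other credentials, which are needed when calling the API.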
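
The scripts in this article keep the credentials as empty string constants that you fill in by hand. If you would rather not store them in the source file, one option is to read them from environment variables. This is a minimal sketch: load_credentials and the three variable names are hypothetical and are not something the Baidu SDK defines.

```python
import os

def load_credentials(env=os.environ):
    """Return (APP_ID, API_KEY, SECRET_KEY) from a mapping, failing early if any is missing."""
    # The variable names below are hypothetical; use whatever names you prefer.
    names = ('BAIDU_APP_ID', 'BAIDU_API_KEY', 'BAIDU_SECRET_KEY')
    missing = [n for n in names if not env.get(n)]
    if missing:
        raise RuntimeError('missing credentials: ' + ', '.join(missing))
    return tuple(env[n] for n in names)

# Usage with the AipFace client from this article:
# aipFace = AipFace(*load_credentials())
```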

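Calling the API

Let's test the waters with a photo of Yang Chaoyue (楊超越). The result shows a score of 75, which is fairly high (I also tried some internet celebrities and stars: their scores averaged around 80, and none went above 90).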

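# AipFace comes from Baidu's Python SDK (pip install baidu-aip)
from aip import AipFace
import base64

# Fill in the credentials shown on the application management page
APP_ID = ''
API_KEY = ''
SECRET_KEY = ''

aipFace = AipFace(APP_ID, API_KEY, SECRET_KEY)

filePath = r'C:\Users\LP\Desktop\6.jpg'

def get_file_content(filePath):
    """Read an image file and return its content as a Base64 string."""
    with open(filePath, 'rb') as fp:
        content = base64.b64encode(fp.read())
        return content.decode('utf-8')

imageType = "BASE64"

# Ask the API to return age, gender and beauty for each detected face
options = {}
options["face_field"] = "age,gender,beauty"

result = aipFace.detect(get_file_content(filePath),imageType,options)
print(result)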
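
The printed result is a nested dictionary. The sketch below pulls out just the gender and beauty of each face, following the keys that the sorting script later in this article reads (result, face_list, gender.type, beauty). summarize_faces is a hypothetical helper, the sample dictionary is hand-written with made-up values, and treating a missing or None 'result' as "no faces" is an assumption.

```python
def summarize_faces(result):
    """Return a list of (gender, beauty) tuples, or [] when no face data is present."""
    # Assumption: on an error response 'result' is missing or None.
    data = result.get('result') or {}
    faces = data.get('face_list') or []
    return [(f['gender']['type'], f['beauty']) for f in faces]

# Hand-written sample in the shape the scripts in this article read; values are made up.
sample = {
    'error_code': 0,
    'result': {'face_list': [{'gender': {'type': 'female'}, 'beauty': 75.0}]},
}

print(summarize_faces(sample))
```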

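Scoring the photos and sorting the files

Finally, the image data and the beauty scores are combined. The code filters out images that contain no person as well as images of men, gets the score for each remaining photo (converted here to a 10-point scale), and moves the photos into separate folders by score.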

from aip import AipFace
import base64
import os
import time

APP_ID = ''
API_KEY = ''
SECRET_KEY = ''

aipFace = AipFace(APP_ID, API_KEY, SECRET_KEY)

def get_file_content(filePath):
    """Read an image file and return its content as a Base64 string."""
    with open(filePath, 'rb') as fp:
        content = base64.b64encode(fp.read())
        return content.decode('utf-8')

imageType = "BASE64"

options = {}
options["face_field"] = "age,gender,beauty"

file_path = 'row_img'

# os.rename fails if the target folder is missing, so create the score folders first.
# Folder names: '8分' = 8 and above, '7分' = 7 to 8, and so on; '其他分' = everything below 5.
for folder in ('8分', '7分', '6分', '5分', '其他分'):
    os.makedirs(folder, exist_ok=True)

file_lists = os.listdir(file_path)
for file_list in file_lists:
    result = aipFace.detect(get_file_content(os.path.join(file_path,file_list)),imageType,options)
    error_code = result['error_code']
    if error_code == 222202:  # no face detected in the image
        continue

    try:
        # When several faces are detected, only the last one in the list is used.
        sex_type = result['result']['face_list'][-1]['gender']['type']
        if sex_type == 'male':
            continue
        beauty = result['result']['face_list'][-1]['beauty']
        new_beauty = round(beauty/10,1)  # scale the 0-100 score down to a 10-point scale
        print(file_list,new_beauty)
        # Move the file into its score folder, prefixing the name with the score.
        if new_beauty >= 8:
            os.rename(os.path.join(file_path,file_list),os.path.join('8分',str(new_beauty) +  '+' + file_list))
        elif new_beauty >= 7:
            os.rename(os.path.join(file_path,file_list),os.path.join('7分',str(new_beauty) +  '+' + file_list))
        elif new_beauty >= 6:
            os.rename(os.path.join(file_path,file_list),os.path.join('6分',str(new_beauty) +  '+' + file_list))
        elif new_beauty >= 5:
            os.rename(os.path.join(file_path,file_list),os.path.join('5分',str(new_beauty) +  '+' + file_list))
        else:
            os.rename(os.path.join(file_path,file_list),os.path.join('其他分',str(new_beauty) +  '+' + file_list))
        time.sleep(1)
    except KeyError:
        # the response lacks the expected fields; skip this file
        pass
    except TypeError:
        # 'result' is None (for example on an API error); skip this file
        pass
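
The if/elif chain above maps a score to a folder. The same mapping can be written as a small function, which makes the thresholds easy to check in isolation. This is a sketch only: score_folder is a hypothetical helper that reuses the folder names and the rounding from the script.

```python
def score_folder(beauty):
    """Map a raw 0-100 beauty value to (score on a 10-point scale, folder name)."""
    score = round(beauty / 10, 1)
    # Check the thresholds from highest to lowest, as the if/elif chain does.
    for threshold in (8, 7, 6, 5):
        if score >= threshold:
            return score, '{}分'.format(threshold)
    return score, '其他分'
```

For the test photo scored 75 earlier, score_folder(75) returns (7.5, '7分'), so the file would land in the 7分 folder.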

In the end, very few photos scored 8 or above, as shown in the picture (it will be removed on request if it infringes).


Author: 羅羅攀
Link: https://www.jianshu.com/p/7ba9c90ff12d
Source: Jianshu (簡書)
