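As the title says.
With some time on my hands, I picked a lottery site at random and had the sudden idea of pulling down its historical draw results and analyzing them, to see whether any pattern in them could help people who buy lottery tickets.
First, I used the packet-capture tool Charles to work out how the request for the historical draw results is made.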


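At first glance the request takes only two parameters, but there is also a cookie, and it is the key piece. Reading the site's JS code shows that a login endpoint is called to obtain this cookie, which is the sessionId. Once you have it, put it into the cookies of the historical-data request and the data comes back without trouble~~~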
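The cookie step can be sketched on its own. This is a minimal illustration, assuming the login response's message field has a key=value shape (as the `_OLID_=...` query string in the crawler's comments suggests) and that the history endpoint expects the value under the cookie name `87c1bf6e271f` used in the crawler. The class and method names here are hypothetical and not part of the original project.

```java
// Hypothetical helper: turns the login "message" into a Cookie header value.
final class SessionCookieSketch {

    /**
     * Keeps everything from the first '=' onward in the login message and
     * prefixes it with the cookie name the history endpoint expects.
     */
    static String buildCookieHeader(String cookieName, String loginMessage) {
        int eq = loginMessage.indexOf('=');
        if (eq < 0) {
            // Assumption violated: the message is not in key=value form.
            throw new IllegalArgumentException("no '=' in login message: " + loginMessage);
        }
        return cookieName + loginMessage.substring(eq);
    }

    public static void main(String[] args) {
        String message = "_OLID_=2ba5aebd150775549fa83a5cfba750297d887bd1";
        System.out.println(buildCookieHeader("87c1bf6e271f", message));
    }
}
```

Failing loudly when the message has no `=` makes a changed login response easier to spot than a `StringIndexOutOfBoundsException` further down.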

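However, the response is not JSON but HTML, so I used the well-known jsoup library to parse it directly; you can look up the details of how to use it on Baidu.
Here is the source code~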
package com.wsm.lottery.JSSC10;

import com.alibaba.fastjson.JSON;
import com.wsm.lottery.dao.LotteryJsscDAO;
import com.wsm.lottery.dao.LotteryJsscDAOImpl;
import com.wsm.lottery.dao.LotteryJsscDO;
import com.wsm.lottery.utils.DateUtils;
import com.wsm.lottery.utils.HttpUtils;
import com.wsm.lottery.model.JSSC10;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

import java.util.*;

public class JSSC10Crawler {

    private static final String JSSC10Url = "**************";
    private static final LotteryJsscDAO lotteryDao = new LotteryJsscDAOImpl();

    public static void main(String[] args) throws Exception {
        String today = DateUtils.getCurrentDate();
        System.out.println(today);
        Date date = new Date();
        // Crawl one day per request, from 20 days ago up to 6 days ago.
        int i = 20;
        while (i > 5) {
            Date newDate = DateUtils.addDay(date, -i);
            i--;
            String todayNew = DateUtils.dateToString(newDate);
            spiderDataIntoDB(todayNew);
        }
    }

    public static void spiderDataIntoDB(String today) throws Exception {
        // 1. Get the sessionId from the login endpoint
        String cashUrl = JSSC10Url + "/cashlogin";
        Map param = new HashMap();
        param.put("account", "!guest!");
        param.put("password", "!guest!");
        String sessionId = JSON.parseObject(HttpUtils.postForm(cashUrl, param)).getString("message");

        // 2. Obtain guest permissions by injecting the session. Testing showed this is not needed.
        // GET member/agreement?_OLID_=2ba5aebd150775549fa83a5cfba750297d887bd1 HTTP/1.1
        System.out.println(sessionId);
        // HttpUtils.getSessionId(JSSC10Crawler+"member/agreement?"+sessionId);

        // 3. Request the history records
        // 87c1bf6e271f = sessionId
        Map headMap = new HashMap();
        headMap.put("Cookie", "87c1bf6e271f" + sessionId.substring(sessionId.indexOf("=")));
        String jsUrl = JSSC10Url + "member/dresult?lottery=PK10JSC&date=";
        String resHtml = HttpUtils.get(jsUrl + today, headMap);
        Document resultDocument = Jsoup.parse(resHtml);
        // #drawTable > table > tbody > tr:nth-child(1) > td.period
        // #drawTable > table > tbody > tr:nth-child(1) > td.drawTime
        // #drawTable > table > tbody > tr:nth-child(1) > td:nth-child(3) > span
        // #drawTable > table > tbody > tr:nth-child(1104) > td:nth-child(3) > span

        // 4. Parse the HTML, one table row per draw
        List<JSSC10> jssc10List = new ArrayList<>();
        int i = 1;
        while (true) {
            String trChild = "tr:nth-child(" + i + ")";
            Elements row = resultDocument.select("#drawTable")
                    .select("table").select("tbody").select(trChild);
            Elements period = row.select("td.period");
            Elements drawTime = row.select("td.drawTime");
            if (period.isEmpty()) {
                break; // no more rows
            }
            JSSC10 jssc10 = new JSSC10();
            jssc10.setPeriod(period.text());
            // The cell starts with MM-dd and ends with HH:mm:ss.
            // NOTE: the year is hardcoded to 2018 here.
            String time = drawTime.text();
            time = "2018-" + time.substring(0, 5) + " " + time.substring(time.length() - 8);
            jssc10.setDrawTime(DateUtils.stringToDate(time, DateUtils.DATE_TIME_FORMAT));
            // Columns 3 to 12 hold the ten balls
            List<Integer> ballNames = new ArrayList<>();
            for (int j = 3; j <= 12; j++) {
                String tdChild = "td:nth-child(" + j + ")";
                Elements ballName = row.select(tdChild).select("span");
                ballNames.add(Integer.valueOf(ballName.text()));
            }
            jssc10.setBallNames(ballNames);
            System.out.println(jssc10);
            jssc10List.add(jssc10);
            i++;
        }

        // Data to analyze
        System.out.println(jssc10List);

        // Insert into the DB
        for (JSSC10 jssc10 : jssc10List) {
            LotteryJsscDO lotteryJsscDO = new LotteryJsscDO();
            lotteryJsscDO.setCreatePin("siming.wang");
            lotteryJsscDO.setCreateTime(new Date());
            lotteryJsscDO.setPeriod(jssc10.getPeriod());
            lotteryJsscDO.setDrawTime(jssc10.getDrawTime());
            List<Integer> ballNames = jssc10.getBallNames();
            lotteryJsscDO.setBallOne(ballNames.get(0));
            lotteryJsscDO.setBallTwo(ballNames.get(1));
            lotteryJsscDO.setBallThree(ballNames.get(2));
            lotteryJsscDO.setBallFour(ballNames.get(3));
            lotteryJsscDO.setBallFive(ballNames.get(4));
            lotteryJsscDO.setBallSix(ballNames.get(5));
            lotteryJsscDO.setBallSeven(ballNames.get(6));
            lotteryJsscDO.setBallEight(ballNames.get(7));
            lotteryJsscDO.setBallNine(ballNames.get(8));
            lotteryJsscDO.setBallTen(ballNames.get(9));
            lotteryJsscDO.setYn("Y");
            // Insert only if this period is not stored yet (query once and reuse the result)
            Map paramDb = new HashMap();
            paramDb.put("period", lotteryJsscDO.getPeriod());
            List<LotteryJsscDO> lotteryJsscDOS = lotteryDao.selectListByMap(paramDb);
            if (lotteryJsscDOS.isEmpty()) {
                lotteryDao.insert(lotteryJsscDO);
            }
        }
    }
}
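The crawler builds the draw time by prefixing a fixed "2018-" to the month-day and time taken from the table cell. A possible alternative, sketched below with java.time, takes the year as a parameter (for example from the date being queried). It assumes, as the crawler's substring calls imply, that the cell text starts with MM-dd and ends with HH:mm:ss. `DrawTimeSketch` and `parseDrawTime` are hypothetical names, not part of the original project.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.time.MonthDay;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: rebuilds a full timestamp from the drawTime cell text.
final class DrawTimeSketch {

    private static final DateTimeFormatter MONTH_DAY = DateTimeFormatter.ofPattern("MM-dd");

    /** Assumes cellText starts with MM-dd and ends with HH:mm:ss. */
    static LocalDateTime parseDrawTime(String cellText, int year) {
        String monthDayPart = cellText.substring(0, 5);
        String timePart = cellText.substring(cellText.length() - 8);
        MonthDay monthDay = MonthDay.parse(monthDayPart, MONTH_DAY);
        LocalTime time = LocalTime.parse(timePart); // ISO HH:mm:ss
        return LocalDateTime.of(monthDay.atYear(year), time);
    }

    public static void main(String[] args) {
        // The year is taken from the queried date instead of being hardcoded.
        int year = LocalDate.parse("2018-09-14").getYear();
        System.out.println(parseDrawTime("09-14 07:00:30", year));
    }
}
```

One caveat: a single day of results can run past midnight into the next calendar date, so around New Year the year of the queried date may be one behind for the late rows.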
Here is the corresponding database table structure as well:
CREATE TABLE `lottery_jssc` (
  `sys_no` bigint(20) NOT NULL AUTO_INCREMENT,
  `period` varchar(125) DEFAULT NULL,
  `draw_time` datetime DEFAULT NULL,
  `ball_one` int(2) DEFAULT NULL,
  `ball_two` int(2) DEFAULT NULL,
  `ball_three` int(2) DEFAULT NULL,
  `ball_four` int(2) DEFAULT NULL,
  `ball_five` int(2) DEFAULT NULL,
  `ball_six` int(2) DEFAULT NULL,
  `ball_seven` int(2) DEFAULT NULL,
  `ball_eight` int(2) DEFAULT NULL,
  `ball_nine` int(2) DEFAULT NULL,
  `ball_ten` int(2) DEFAULT NULL,
  `create_time` datetime DEFAULT NULL,
  `create_pin` varchar(20) DEFAULT NULL,
  `Yn` varchar(1) DEFAULT NULL,
  PRIMARY KEY (`sys_no`),
  KEY `idx_period` (`period`),
  KEY `idx_draw_time` (`draw_time`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=103311 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
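Persistence uses mybatis + druid. I have already pushed the related configuration to GitHub; the address is at the end of this post.
Finally, the analysis~
I will paste the analysis results directly, using one day as an example. The code can be downloaded from GitHub.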
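----------Start analyzing data from 2018-09-14 07:00:30 to 2018-09-15 05:59:15-----------
Analysis 1: when a pair appears, buy 13568 10:
Wins on purchased draw 1: 331
Wins on purchased draw 2: 113
Wins on purchased draw 3: 52
Wins on purchased draw 4: 9
Wins on purchased draw 5: 4
Wins on purchased draw 6: 1
Analysis 2: when a pair of 12359 appears, buy 68 10:
Wins on purchased draw 1: 102
Wins on purchased draw 2: 88
Wins on purchased draw 3: 67
Wins on purchased draw 4: 36
Wins on purchased draw 5: 27
Wins on purchased draw 6: 14
Wins on purchased draw 7: 5
Wins on purchased draw 8: 2
Wins on purchased draw 9: 1
Analysis 3: when a pair of 46780 appears, buy 135:
Wins on purchased draw 1: 87
Wins on purchased draw 2: 79
Wins on purchased draw 3: 52
Wins on purchased draw 4: 24
Wins on purchased draw 5: 20
Wins on purchased draw 6: 14
Wins on purchased draw 7: 4
Wins on purchased draw 8: 4
Wins on purchased draw 9: 3
Wins on purchased draw 10: 1
---------------2018-09-14 data analysis finished!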
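As a rough baseline for reading these counts, the sketch below computes how many buying sequences would be expected to first win on draw k if every draw were an independent event with a fixed win probability p. It assumes (the post does not state this) that "buy 13568 10" means covering the six numbers 1, 3, 5, 6, 8 and 10 out of ten equally likely ones, so p = 0.6. The class and method names are made up for illustration, and this is not the analysis code from the repository.

```java
import java.util.Arrays;

// Hypothetical baseline: first-win counts if each draw wins independently with probability p.
final class BaselineSketch {

    /**
     * Returns, for k = 1..maxK, the expected number of sequences whose first win
     * is on draw k: total * (1 - p)^(k - 1) * p.
     */
    static double[] expectedFirstWinCounts(int totalSequences, double p, int maxK) {
        double[] expected = new double[maxK];
        double stillLosing = 1.0; // probability of having lost every earlier draw
        for (int k = 0; k < maxK; k++) {
            expected[k] = totalSequences * stillLosing * p;
            stillLosing *= (1.0 - p);
        }
        return expected;
    }

    public static void main(String[] args) {
        // Observed counts from "Analysis 1" in the output above.
        int[] observed = {331, 113, 52, 9, 4, 1};
        int total = Arrays.stream(observed).sum();
        double[] expected = expectedFirstWinCounts(total, 0.6, observed.length);
        for (int k = 0; k < observed.length; k++) {
            System.out.printf("draw %d: observed %d, baseline %.1f%n", k + 1, observed[k], expected[k]);
        }
    }
}
```

Under the same reading, Analyses 2 and 3 each cover three numbers, so the corresponding baseline would use p = 0.3. Comparing the observed counts against such a baseline is one way to judge whether a rule does better than chance.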
GitHub address: https://github.com/wangchaun/lottery-crawlers
Feel free to get in touch and discuss~
