How to download web resources




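Goal

China Machine Press recently announced that it is opening its Engineering Technology Digital Library to everyone for free, to help people get through this difficult period together.

I found that some of the books are presented as web pages, one page at a time, which makes them hard to download in one go.

Is there a way to download a whole book at once?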


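Investigation

Test 1: Chrome extensions

A web search turns up many Chrome download extensions, but none of them could detect the links inside the page. That is because the page contains no real links at all.

For example, a chapter link in the page looks like this: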

<a href="javascript:void(0);" onclick="probation.readBook(this);" id="678612" ref="/openresources/teach_ebook/uncompressed/13780/OEBPS/Text/chapter33.html#heading_id_3">3.1 協商原則</a>

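When clicked, this link actually resolves to http://www.hzcourse.com/resource/readBook?path=/openresources/teach_ebook/uncompressed/13780/OEBPS/Text/chapter33.html

The real path is stored in the ref attribute and href is only javascript:void(0), so it is no wonder the extensions do not recognize it.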
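The mapping from the ref attribute to the final URL can be sketched as follows. This is a minimal illustration using only the standard library; the names RefCollector, ref_to_url and chapter_urls are hypothetical, and it assumes the HTML passed in is the rendered page (for example copied out of the browser), since the anchors are generated by JavaScript. Dropping the #fragment mirrors the difference between the ref value and the URL shown above.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag

# Endpoint taken from the resolved URL shown above.
READ_BOOK = "http://www.hzcourse.com/resource/readBook?path="


class RefCollector(HTMLParser):
    """Collect the non-standard `ref` attribute of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.refs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        ref = dict(attrs).get("ref")
        if ref:
            self.refs.append(ref)


def ref_to_url(ref):
    """Drop the #fragment and prepend the readBook endpoint."""
    path, _fragment = urldefrag(ref)
    return READ_BOOK + path


def chapter_urls(html):
    """Return one download URL per <a ref="..."> found in the HTML."""
    collector = RefCollector()
    collector.feed(html)
    return [ref_to_url(ref) for ref in collector.refs]
```

Calling chapter_urls on the anchor shown above should yield the readBook URL for chapter33.html.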

It looks like I will have to write a downloader myself.

The simplest option is Python.

Testing the link above:

C:\Users\cutep>python -m wget http://www.hzcourse.com/resource/readBook?path=/openresources/teach_ebook/uncompressed/13780/OEBPS/Text/chapter33.html -o 33.html
100% [................................................................................] 4000 / 4000
Saved under 33.html

It worked!
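The same single-page download can also be done inside a Python process instead of spawning python -m wget. The following is a minimal sketch using only the standard library; the function name download, its return value and the 30-second timeout are illustrative choices and not part of the test above.

```python
import os
import shutil
import urllib.request


def download(url, dest, timeout=30.0):
    """Fetch `url`, write the response body to `dest`, return the file size."""
    with urllib.request.urlopen(url, timeout=timeout) as resp, open(dest, "wb") as out:
        # Stream the body to disk instead of loading it all into memory.
        shutil.copyfileobj(resp, out)
    return os.path.getsize(dest)
```

For the test above, the call would be download('http://www.hzcourse.com/resource/readBook?path=/openresources/teach_ebook/uncompressed/13780/OEBPS/Text/chapter33.html', '33.html').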

Test 2: the Python script I ended up writing

import os
import requests

def my_system(cmd):
	# Echo the command, then run it through the shell (Windows cmd is assumed).
	print(cmd)
	os.system(cmd)

def download(url, file):
	# Download url to file using the command-line mode of the wget package.
	cmd = 'python -m wget %s -o %s'%(url, file)
	my_system(cmd)

def download_chapter(click_url, file):
	# click_url is the chapter path taken from the ref attribute.
	download('http://www.hzcourse.com/resource/readBook?path=%s'%click_url, file)

def get_bookname(cont):
	# The title is the first <span> inside <div class="book-name">.
	s='<div class="book-name">'
	p1 = cont.find(s)
	p1 = p1 + len(s)
	p1 = cont.find('<span>', p1)
	p1 = p1 + len('<span>')

	p2 = cont.find('</span>', p1)
	name=cont[p1:p2]
	return name
	
def get_value_token(cont):
	# ebookId and token are the value attributes of two inputs in the main page.
	s='"ebookId" value="'
	p1 = cont.find(s)
	p1 = p1 + len(s)
	p2 = cont.find('"/>', p1)
	ebookId=cont[p1:p2]
	# The token input is searched for after the ebookId input.
	s2 = 'name="token" value="'
	p3 = cont.find(s2, p2)
	p3 = p3 + len(s2)
	p4 = cont.find('"/>', p3)
	token=cont[p3:p4]
	print('ebookId, token %s %s'%(ebookId, token))
	return [ebookId, token]
	
def download_book(main_link):
	# Remove any main page left over from a previous run (Windows command).
	my_system('del main*.html')

	# Fetch the book's main page and read ebookId, token and title from it.
	download(main_link, 'main.html')
	with open('main.html', 'r', encoding='utf-8') as f:
		main_cont = f.read()
	[ebookId, token] = get_value_token(main_cont)
	bookname = get_bookname(main_cont)
	print(bookname)

	# Skip books that have already been downloaded.
	if os.path.isdir(bookname): return

	# Start from an empty staging directory.
	my_system('rd/s/q my_temp')
	my_system('md my_temp')
	os.chdir('my_temp')
	my_system('cd')

	# Ask the server for the chapter list, as the page's own JavaScript does.
	response = requests.post('http://www.hzcourse.com/web/refbook/queryAllChapterList', data={'ebookId':ebookId,'token':token})
	resp_json = response.json()
	for i in resp_json['data']['data']:
		ref_link = i['ref']
		# Use the last path component as the local file name.
		file = ref_link[ref_link.rfind('/')+1:]
		print(ref_link, file)
		download_chapter(ref_link, file)
	os.chdir('..')
	my_system('cd')
	# Copy the staged chapters into a directory named after the book.
	my_system('md "%s"'%bookname)
	my_system('xcopy /c/d/e/y my_temp "%s"'%bookname)
	
# One probationAll link per book to download.
download_book('http://www.hzcourse.com/web/refbook/probationAll/6736/e87436c8bc7849c397a1db2f27c0ba5d')
download_book('http://www.hzcourse.com/web/refbook/probationAll/6856/e87436c8bc7849c397a1db2f27c0ba5d')
download_book('http://www.hzcourse.com/web/refbook/probationAll/7899/e87436c8bc7849c397a1db2f27c0ba5d')
download_book('http://www.hzcourse.com/web/refbook/probationAll/7249/e87436c8bc7849c397a1db2f27c0ba5d')
download_book('http://www.hzcourse.com/web/refbook/probationAll/7165/e87436c8bc7849c397a1db2f27c0ba5d')
download_book('http://www.hzcourse.com/web/refbook/probationAll/7186/e87436c8bc7849c397a1db2f27c0ba5d')
download_book('http://www.hzcourse.com/web/refbook/probationAll/7523/e87436c8bc7849c397a1db2f27c0ba5d')
download_book('http://www.hzcourse.com/web/refbook/probationAll/6965/e87436c8bc7849c397a1db2f27c0ba5d')
download_book('http://www.hzcourse.com/web/refbook/probationAll/6826/e87436c8bc7849c397a1db2f27c0ba5d')
download_book('http://www.hzcourse.com/web/refbook/probationAll/6166/e87436c8bc7849c397a1db2f27c0ba5d')
download_book('http://www.hzcourse.com/web/refbook/probationAll/6188/e87436c8bc7849c397a1db2f27c0ba5d')
download_book('http://www.hzcourse.com/web/refbook/probationAll/6853/e87436c8bc7849c397a1db2f27c0ba5d')
download_book('http://www.hzcourse.com/web/refbook/probationAll/4599/e87436c8bc7849c397a1db2f27c0ba5d')
download_book('http://www.hzcourse.com/web/refbook/probationAll/6759/e87436c8bc7849c397a1db2f27c0ba5d')
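The three values that the script pulls out of the main page with str.find can also be extracted with regular expressions, which fail loudly when the markup is not what was expected. This is only a sketch: the function name parse_main_page is hypothetical, and the patterns assume exactly the markers the script searches for ("ebookId" value=", name="token" value=" and the book-name div).

```python
import re

# Patterns mirror the literal markers used by get_value_token and get_bookname.
EBOOK_ID_RE = re.compile(r'"ebookId" value="([^"]*)"')
TOKEN_RE = re.compile(r'name="token" value="([^"]*)"')
BOOK_NAME_RE = re.compile(r'<div class="book-name">.*?<span>(.*?)</span>', re.S)


def parse_main_page(html):
    """Return (ebook_id, token, book_name) or raise ValueError if one is missing."""
    matches = {
        "ebookId": EBOOK_ID_RE.search(html),
        "token": TOKEN_RE.search(html),
        "book name": BOOK_NAME_RE.search(html),
    }
    missing = [name for name, m in matches.items() if m is None]
    if missing:
        raise ValueError("not found in main page: " + ", ".join(missing))
    return (
        matches["ebookId"].group(1),
        matches["token"].group(1),
        matches["book name"].group(1).strip(),
    )
```

Unlike the str.find version, a missing marker raises an error instead of silently returning a wrong slice of the page.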

Test result

Saved under chapter51.xhtml
/openresources/teach_ebook/uncompressed/16571/OEBPS/Text/chapter52.xhtml chapter52.xhtml
python -m wget http://www.hzcourse.com/resource/readBook?path=/openresources/teach_ebook/uncompressed/16571/OEBPS/Text/chapter52.xhtml -o chapter52.xhtml
100% [................................................................................] 1058 / 1058
Saved under chapter52.xhtml
/openresources/teach_ebook/uncompressed/16571/OEBPS/Text/chapter53.xhtml chapter53.xhtml
python -m wget http://www.hzcourse.com/resource/readBook?path=/openresources/teach_ebook/uncompressed/16571/OEBPS/Text/chapter53.xhtml -o chapter53.xhtml
100% [................................................................................] 4625 / 4625
Saved under chapter53.xhtml
/openresources/teach_ebook/uncompressed/16571/OEBPS/Text/chapter54.xhtml chapter54.xhtml
python -m wget http://www.hzcourse.com/resource/readBook?path=/openresources/teach_ebook/uncompressed/16571/OEBPS/Text/chapter54.xhtml -o chapter54.xhtml
100% [..................................................................................] 705 / 705
Saved under chapter54.xhtml
/openresources/teach_ebook/uncompressed/16571/OEBPS/Text/chapter55.xhtml chapter55.xhtml
python -m wget http://www.hzcourse.com/resource/readBook?path=/openresources/teach_ebook/uncompressed/16571/OEBPS/Text/chapter55.xhtml -o chapter55.xhtml
100% [................................................................................] 1814 / 1814
Saved under chapter55.xhtml
/openresources/teach_ebook/uncompressed/16571/OEBPS/Text/chapter56.xhtml chapter56.xhtml
python -m wget http://www.hzcourse.com/resource/readBook?path=/openresources/teach_ebook/uncompressed/16571/OEBPS/Text/chapter56.xhtml -o chapter56.xhtml
100% [..............................................................................] 10025 / 10025
Saved under chapter56.xhtml
/openresources/teach_ebook/uncompressed/16571/OEBPS/Text/chapter57.xhtml chapter57.xhtml
python -m wget http://www.hzcourse.com/resource/readBook?path=/openresources/teach_ebook/uncompressed/16571/OEBPS/Text/chapter57.xhtml -o chapter57.xhtml
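The script stages each book in my_temp and copies it with Windows shell commands (rd, md, xcopy). A portable variant of that staging step might look like the sketch below, which uses only the standard library. The name stage_and_publish and the fill callback are hypothetical; fill stands in for whatever downloads the chapters into the staging directory.

```python
import shutil
import tempfile
from pathlib import Path


def stage_and_publish(book_dir, fill):
    """Download into a temporary directory, then move it to `book_dir`.

    Returns False without doing anything if `book_dir` already exists,
    which matches the script's "already downloaded" check.
    """
    target = Path(book_dir)
    if target.is_dir():
        return False
    # Create the staging directory next to the target so the move stays
    # on the same filesystem.
    tmp = Path(tempfile.mkdtemp(prefix="my_temp_", dir=str(target.parent)))
    try:
        fill(tmp)  # caller writes the chapter files into tmp
        shutil.move(str(tmp), str(target))
    except Exception:
        # Do not leave a half-filled staging directory behind.
        shutil.rmtree(tmp, ignore_errors=True)
        raise
    return True
```

Because the book directory only appears after every chapter has been fetched, an interrupted run does not leave a partial book that a later run would skip.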


Miscellaneous

Which framework is the snippet below written in?

A: avalonjs

                            <li ms-for="bookChapter in @bookChapters">
                            	<a href="javascript:void(0);" onclick="probation.readBook(this);" ms-attr="{id : bookChapter.id, ref : bookChapter.ref}">{{bookChapter.title}}</a>
                            </li>

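Where is bookChapter defined? It iterates over bookChapters, which the AJAX success callback in the page's script (excerpt below) fills with obj.data.data: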

var probation = {
	search:function(){
		var key = $.trim($("#condition").val());
		ebookRead.queryEbookChapterList(key);
	},
	queryEbookChapterList:function(key){
		var ebookId = $.trim($("#ebookId").val());
		var token = $.trim($("#token").val());
		debugger;
		jQuery.ajax({
	    	type : "post" , 
	    	url : "web/refbook/queryAllChapterList", 
	    	dataType : "json" , 
	    	data : {ebookId:ebookId,key:key,token:token},
	    	success : function(obj) {
	    		if(obj.data.code==1){
	    			var bookChapters = obj.data.data;
	    			if(bookChapters.length > 0){
	    				bookChaptertCtrl.bookChapters = bookChapters;
	    				$("#chapterCont").load();
	    				$("#directories").find("li").first().children("a").click();
	    			}
	    		} else {
	    			alert(obj.data.message);
	    		}
	    	}
	    });
	},
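On the Python side, the response handled by this callback can be turned into a download list. The sketch below assumes the JSON shape implied by the JavaScript above (data.code, data.message, and data.data as a list of objects with a ref field); the function name chapter_files is hypothetical. It also strips any #fragment and skips repeated files, on the assumption that several entries may point into the same chapter file, as the ref with #heading_id_3 suggests.

```python
import posixpath
from urllib.parse import urldefrag


def chapter_files(resp_json):
    """Return [(path, file_name), ...] with one entry per distinct chapter file."""
    data = resp_json["data"]
    # The page's JavaScript compares with ==, so accept 1 or "1".
    if str(data.get("code")) != "1":
        raise RuntimeError(data.get("message", "chapter list request failed"))
    seen = set()
    result = []
    for chapter in data["data"]:
        path, _fragment = urldefrag(chapter["ref"])
        if path in seen:
            continue  # same file already listed under another heading
        seen.add(path)
        result.append((path, posixpath.basename(path)))
    return result
```

Each returned path can be appended to the readBook?path= URL, and the file name used as the local output name.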


How do you find the links?

With the ever-useful Chrome DevTools (F12).


