Introduction to the PHP curl_setopt Function, Part 2


This part focuses mainly on examples.

1. Basic examples

demo1.php

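<?php
    $user = "admin123";
    $pass = "admin456";
    // Test one: send the data as a URL-encoded string
    // $curlPost = "user=$user&pass=$pass";
    // Test two: send the data as an array (cURL posts an array as multipart/form-data)
    $curlPost = array('a' => 123, 'b' => 456, 'c' => 789);

    $ch = curl_init();                                   // initialize a cURL handle
    // cURL needs a full URL; this assumes demo2.php is served from localhost
    curl_setopt($ch, CURLOPT_URL, "http://localhost/demo2.php");
    // 0: curl_exec() prints the response directly and returns true.
    // 1: curl_exec() returns the response as a string instead, so $data must be echoed.
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 0);
    curl_setopt($ch, CURLOPT_POST, 1);                   // send a POST request
    curl_setopt($ch, CURLOPT_POSTFIELDS, $curlPost);
    $data = curl_exec($ch);                              // run cURL and request the page
    // echo $data;                                       // needed only when RETURNTRANSFER is 1
    curl_close($ch);
?>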

To check the result, the receiving page (demo2.php) prints what it was sent:

<?php
// echo "aaaaa";
echo "<pre>";
var_dump($_POST);
echo "<br/>";
echo $_POST['b'];


echo "<br/>";

$a = array('1'=>1234,'2'=>567);
var_dump($a);

?>
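demo1.php shows two ways of passing the data: test one builds a query string by hand, test two passes an array. With an array, cURL sends multipart/form-data; with a string, it sends application/x-www-form-urlencoded, and PHP fills $_POST on the receiving page either way. The sketch below (sample values are made up) uses the standard functions http_build_query() and parse_str() to show how an array becomes a properly escaped string body, which the hand-built string in test one does not guarantee, and how the receiving side decodes it, with every value arriving as a string.

```php
<?php
// Encode form fields the way "test one" intends, but with proper escaping.
$fields = ['user' => 'admin 123', 'pass' => 'a&b=c', 'c' => 789];
$body = http_build_query($fields);
echo $body, "\n";            // user=admin+123&pass=a%26b%3Dc&c=789

// Decode the body the way PHP does before filling $_POST on the receiving page.
parse_str($body, $decoded);
var_dump($decoded['pass']);  // string(5) "a&b=c"
var_dump($decoded['c']);     // string(3) "789": numbers arrive as strings
```

In demo1.php, $body could then be passed with curl_setopt($ch, CURLOPT_POSTFIELDS, $body).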

The next example fetches a web page, including its response headers, into a string:

<?php
// Initialize a cURL handle
$curl = curl_init();

// Set the URL to fetch
curl_setopt($curl, CURLOPT_URL, 'http://www.baidu.com');

// Include the response headers in the output
curl_setopt($curl, CURLOPT_HEADER, 1);

// Return the response from curl_exec() as a string instead of printing it
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);

// Run cURL and request the page
$data = curl_exec($curl);

// Close the cURL handle
curl_close($curl);

// Display the data received
var_dump($data);
?>
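Because CURLOPT_HEADER is set to 1, $data holds the raw response headers followed by the body. One common way to separate them is the header length reported by curl_getinfo($curl, CURLINFO_HEADER_SIZE), which must be read before curl_close(). The helper below is a hypothetical sketch (splitResponse is not a cURL function); to keep it testable it takes the raw string and the header size as arguments instead of making a request, and it assumes a single header block, i.e. no redirects were followed.

```php
<?php
/**
 * Hypothetical helper: split a response fetched with CURLOPT_HEADER = 1
 * into its status line, headers and body. Assumes a single header block.
 * $headerSize would come from curl_getinfo($curl, CURLINFO_HEADER_SIZE).
 */
function splitResponse(string $raw, int $headerSize): array
{
    $headerText = substr($raw, 0, $headerSize);
    $body = substr($raw, $headerSize);

    $lines = preg_split("/\r\n|\n/", trim($headerText));
    $status = array_shift($lines);           // e.g. "HTTP/1.1 200 OK"

    $headers = [];
    foreach ($lines as $line) {
        if (strpos($line, ':') === false) {
            continue;                         // not a "Name: value" line
        }
        [$name, $value] = explode(':', $line, 2);
        // Header names are case-insensitive, so normalize them to lower case
        $headers[strtolower(trim($name))] = trim($value);
    }
    return ['status' => $status, 'headers' => $headers, 'body' => $body];
}
```

In the example above it would be called as splitResponse($data, $headerSize), with $headerSize read from curl_getinfo() just before curl_close($curl).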

2. Scraping data

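In a real project, I needed to collect the product list of a website. Right-clicking the page and viewing the source showed no product information at all; analyzing the site revealed that the product data is loaded via Ajax.

So I fetched the product information by running through the data pages directly: after analyzing the Ajax URL, I found that it returns the product data as JSON. After decoding it, I mapped each product attribute onto its matching field and inserted the results into a database.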
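The decode-and-map step can be isolated into a small function. This is a minimal sketch, assuming the response looks like {"rows":[...]} with the field names used in the author's script below (CatalogNo, PermaLink, PMID, Author, Journal, PubDate, App, Species, Subject); the function name mapProductRows is hypothetical. It returns plain arrays keyed by the ptg table's column names and leaves escaping to the database layer.

```php
<?php
// Hypothetical helper: decode one page of Ajax JSON and map each product's
// attributes onto the columns of the ptg table.
function mapProductRows(string $json): array
{
    $data = json_decode($json, true);
    if (!is_array($data) || empty($data['rows'])) {
        return [];                            // invalid JSON or no more rows
    }
    $records = [];
    foreach ($data['rows'] as $row) {
        $records[] = [
            'catalog'      => trim($row['CatalogNo'] ?? ''),
            'catalog_addr' => 'http://www.ptgcn.com/Products/' . trim($row['PermaLink'] ?? '') . '.html',
            'pubmed_id'    => trim($row['PMID'] ?? ''),
            'author'       => trim($row['Author'] ?? ''),
            'journal'      => trim($row['Journal'] ?? '') . ',' . trim($row['PubDate'] ?? ''),
            'application'  => trim($row['App'] ?? ''),
            'species'      => trim($row['Species'] ?? ''),
            'title'        => trim($row['Subject'] ?? ''),
        ];
    }
    return $records;
}
```

Keeping the mapping separate from the HTTP and SQL code makes it easy to check against a saved JSON response before anything is written to the database.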

The code is as follows:

<?php
header('Content-Type:text/html;charset=UTF-8');

set_time_limit(0);

// Page index of the Ajax list, carried in the URL (?id=1, ?id=2, ...)
$id = isset($_GET['id']) ? intval($_GET['id']) : 1;

$listurl = "http://www.ptgcn.com/Handler/PublicationHandler.ashx?r=0.38488502931227453&func=GetDataByPCD&index={$id}&size=20&sort=&dir=asc&param=";
// echo $listurl;
// exit;    // safety stop: uncomment to only print the URL without scraping

// 1. Initialize the cURL handle
$ch = curl_init();
// 2. Set the options, including the URL
curl_setopt($ch, CURLOPT_URL, $listurl);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_HTTPHEADER, array("Content-Type: text/xml; charset=utf-8", "Expect: 100-continue"));

// 3. Execute the request and get the response body (JSON)
$output = curl_exec($ch);
$httpCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
curl_close($ch);

$json_b = ($httpCode == 200) ? json_decode($output, true) : null;

// Stop when the request fails or a page comes back without rows
if (empty($json_b['rows'])) {
    echo 'finished !';
    exit;
}

// The mysql_* functions were removed in PHP 7, so mysqli is used here.
// Report failures through return values (PHP 8.1+ throws exceptions by default).
mysqli_report(MYSQLI_REPORT_OFF);
$con = mysqli_connect('localhost', 'root', 'root', 'wh');   // connect once per page, not per row
mysqli_set_charset($con, 'utf8');

// Trim, escape and single-quote a value for use in the SQL string
$q = function ($value) use ($con) {
    return "'" . mysqli_real_escape_string($con, trim($value)) . "'";
};

foreach ($json_b['rows'] as $row) {
    // var_dump($row); exit;   // debug: inspect a single row
    $catalog      = $q($row['CatalogNo']);
    $catalog_addr = $q("http://www.ptgcn.com/Products/" . trim($row['PermaLink']) . ".html");
    $pubmed_id    = $q($row['PMID']);
    $author       = $q($row['Author']);
    $journal      = $q(trim($row['Journal']) . "," . trim($row['PubDate']));
    $application  = $q($row['App']);
    $species      = $q($row['Species']);
    $title        = $q($row['Subject']);

    $sql = "insert into ptg(catalog,catalog_addr,pubmed_id,author,journal,application,species,title)"
         . " values({$catalog},{$catalog_addr},{$pubmed_id},{$author},{$journal},{$application},{$species},{$title})";
    echo $sql;

    if (mysqli_query($con, $sql)) {
        echo "success!";
    } else {
        // Record the rows that failed to insert
        file_put_contents('error.txt', $catalog . "\r\n", FILE_APPEND);
    }
}
mysqli_close($con);
?>
<script>
// Reload the page with the next index to scrape the next page
function JumpUrl() {
    location.href = '?id=<?php echo ($id + 1); ?>';
}
setTimeout(JumpUrl, 1);
</script>
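The script above moves to the next page by printing a JavaScript redirect, so it has to be driven from a browser. When running from the command line, the same paging can be written as an ordinary loop. The sketch below is an alternative under stated assumptions, not the author's method: buildListUrl and crawlPages are hypothetical names, the query parameters are copied from $listurl (the r parameter looks like a random cache-buster and is simply kept fixed here), and the page fetcher is passed in as a callable so the loop can be tried without network access. Paging stops at the first page without rows, or after a safety limit.

```php
<?php
// Hypothetical: rebuild the list URL from the parameters seen in $listurl.
function buildListUrl(int $index, int $size = 20): string
{
    return 'http://www.ptgcn.com/Handler/PublicationHandler.ashx?' . http_build_query([
        'r'     => '0.38488502931227453',
        'func'  => 'GetDataByPCD',
        'index' => $index,
        'size'  => $size,
        'sort'  => '',
        'dir'   => 'asc',
        'param' => '',
    ]);
}

/**
 * Hypothetical: fetch pages starting at $start until one has no rows.
 * $fetchPage receives a URL and returns the response body, or false on failure.
 */
function crawlPages(callable $fetchPage, int $start = 1, int $maxPages = 1000): array
{
    $allRows = [];
    for ($index = $start; $index < $start + $maxPages; $index++) {
        $body = $fetchPage(buildListUrl($index));
        $data = is_string($body) ? json_decode($body, true) : null;
        if (empty($data['rows'])) {
            break;                            // request failed or no more data
        }
        foreach ($data['rows'] as $row) {
            $allRows[] = $row;
        }
    }
    return $allRows;
}
```

In real use, $fetchPage would wrap the curl_init()/curl_exec() calls from the script above and return false whenever CURLINFO_HTTP_CODE is not 200.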

While we are on the subject of scraping, here is one more method:

3. Scraping data with phpQuery

Download: http://www.jb51.net/article/59522.htm

Git: https://github.com/TobiaszCudnik/phpquery

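1. When to use it: right-click the website and view the page source. If the source contains the same information the page displays (i.e. the content is not loaded via Ajax), consider scraping it with phpQuery.

2. As the name suggests, phpQuery lets you work with the page source the way jQuery does; all calls then use jQuery-style syntax.

3. Usage: first include the library with include 'phpQuery/phpQuery.php';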

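For example, three simple lines of code are enough to fetch a headline: first include the core file phpQuery.php in the program, then load the target page, and finally output the content under the corresponding tag.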

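pq() is a powerful function that works just like jQuery's $(): practically all jQuery selectors can be used in phpQuery, and method chaining is written with "->" instead of ".".

In the example above, pq(".blkTop h1:eq(0)") selects the DIV element whose class attribute is blkTop and finds the first h1 tag inside it; html() then returns the content of that h1 tag (including HTML tags), which is the headline we want. The text() method would return only the headline's text.

Of course, the key to using phpQuery well is locating the nodes in the document that hold the content you need.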
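If phpQuery is not available, PHP's built-in DOM extension can perform the same lookup with an XPath expression. The sketch below is a swapped-in technique, not phpQuery's API: firstHeadline is a hypothetical name, it translates .blkTop h1:eq(0) into XPath, and returns both the inner HTML (like html()) and the plain text (like text()). The '<?xml encoding="UTF-8">' prefix is a common workaround that makes DOMDocument::loadHTML() treat the input as UTF-8.

```php
<?php
// Hypothetical helper: DOM/XPath equivalent of pq(".blkTop h1:eq(0)")->html() and ->text().
function firstHeadline(string $html): ?array
{
    $doc = new DOMDocument();
    $prev = libxml_use_internal_errors(true);            // real-world HTML is rarely valid
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);   // hint UTF-8 so non-ASCII text survives
    libxml_clear_errors();
    libxml_use_internal_errors($prev);

    $xpath = new DOMXPath($doc);
    // First <h1> inside any element whose class list contains "blkTop"
    $nodes = $xpath->query(
        "(//*[contains(concat(' ', normalize-space(@class), ' '), ' blkTop ')]//h1)[1]"
    );
    if ($nodes->length === 0) {
        return null;
    }
    $h1 = $nodes->item(0);

    $inner = '';
    foreach ($h1->childNodes as $child) {                // like html(): inner markup, tags included
        $inner .= $doc->saveHTML($child);
    }
    return ['html' => $inner, 'text' => $h1->textContent];  // like text(): text only
}
```

The XPath class test is the usual way to emulate a CSS class selector, since @class may hold several space-separated names.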

4. The complete scraping code:

<?php
header('Content-Type:text/html;charset=UTF-8');
include 'phpQuery/phpQuery.php';

set_time_limit(0);

// id of the row in the ptg table whose product page is scraped in this request
$id = isset($_GET['id']) ? intval($_GET['id']) : 1;

// 14172 is the last id in the author's ptg table
if ($id > 14172) {
    echo "finish!";
    exit;
}

// The mysql_* functions were removed in PHP 7, so mysqli is used here.
// Report failures through return values (PHP 8.1+ throws exceptions by default).
mysqli_report(MYSQLI_REPORT_OFF);
$con = mysqli_connect('localhost', 'root', 'root', 'wh');
mysqli_set_charset($con, 'utf8');

// Read the product page URL saved by the previous script
$query = mysqli_query($con, "select catalog_addr from ptg where id = {$id}");
$row = mysqli_fetch_assoc($query);
$url = stripslashes(trim($row['catalog_addr']));
$url = str_replace("html", "htm", $url);   // the author swaps the saved .html extension for .htm

// $url is just a plain URL; load the page into phpQuery
phpQuery::newDocumentFile($url);

// Text of cell eq($td) in row eq($tr) of the table matched by $selector
function cellText($selector, $tr, $td)
{
    $html = pq($selector)->find('tr')->eq($tr)->find('td')->eq($td)->html();
    return trim(strip_tags((string) $html));
}

// No.1: product title -> column title
$one = trim(strip_tags((string) pq(".proTitle")->find('h1')->html()));
echo $one . '<br/>';

// No.2: second cell of #dvProApp without the text of its link -> column house_app
$two_a = trim(strip_tags((string) pq("#dvProApp")->find('tr td')->eq(1)->html()));
$two_b = trim(strip_tags((string) pq("#dvProApp")->find('tr td')->eq(1)->find('a')->html()));
$two = str_replace($two_b, "", $two_a);
$two = trim(str_replace(" ", "", $two));
echo $two . '<br/>';

// No.3 - No.9: single cells of the #dvProApp and #dvProImm tables
$three = cellText("#dvProApp", 1, 1);   // -> pub_app
$four  = cellText("#dvProApp", 2, 1);   // -> spe_spe
$five  = cellText("#dvProApp", 3, 1);   // -> pub_spe
$six   = cellText("#dvProImm", 2, 1);   // -> gen_num
$seven = cellText("#dvProImm", 2, 3);   // -> gen_id
$ba    = cellText("#dvProImm", 3, 1);   // -> gene_sym
$nine  = cellText("#dvProImm", 3, 3);   // -> syn
echo implode('<br/>', array($three, $four, $five, $six, $seven, $ba, $nine)) . '<br/>';

// Escape and quote each value, then insert the record
$values = array();
foreach (array($one, $two, $three, $four, $five, $six, $seven, $ba, $nine) as $v) {
    $values[] = "'" . mysqli_real_escape_string($con, trim($v)) . "'";
}
$sql_in = "insert into ptg_page(title,house_app,pub_app,spe_spe,pub_spe,gen_num,gen_id,gene_sym,syn)"
        . " values(" . implode(',', $values) . ")";
echo $sql_in . '<br/>';

if (mysqli_query($con, $sql_in)) {
    echo "success!";
} else {
    // Record the URLs whose data failed to insert
    file_put_contents('error_url.txt', $url . "\r\n", FILE_APPEND);
}
mysqli_close($con);
?>
<script>
// Reload with the next id to scrape the next product page
function JumpUrl() {
    location.href = '?id=<?php echo ($id + 1); ?>';
}
setTimeout(JumpUrl, 1);
</script>
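Both scripts build their INSERT statements by escaping and quoting every value by hand. A prepared statement lets the database driver handle the values instead. The sketch below uses PDO as an alternative to the hand-quoted SQL strings above and, only so it can run on its own, an in-memory SQLite database, which assumes the pdo_sqlite extension is installed; against MySQL the DSN would be along the lines of mysql:host=localhost;dbname=wh;charset=utf8. insertRecords is a hypothetical name, and the table here has only three of the ptg columns.

```php
<?php
// Hypothetical helper: insert records through one prepared statement.
function insertRecords(PDO $pdo, array $records): int
{
    $stmt = $pdo->prepare(
        'INSERT INTO ptg (catalog, catalog_addr, title) VALUES (:catalog, :catalog_addr, :title)'
    );
    $inserted = 0;
    foreach ($records as $r) {
        // Values are bound, not concatenated, so quotes in the data are harmless
        $stmt->execute([
            ':catalog'      => $r['catalog'],
            ':catalog_addr' => $r['catalog_addr'],
            ':title'        => $r['title'],
        ]);
        $inserted += $stmt->rowCount();
    }
    return $inserted;
}

$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);   // throw on SQL errors
$pdo->exec('CREATE TABLE ptg (catalog TEXT, catalog_addr TEXT, title TEXT)');

$count = insertRecords($pdo, [
    ['catalog' => 'A-1', 'catalog_addr' => 'http://example.com/a.html', 'title' => "O'Brien et al."],
]);
```

The statement is prepared once and executed per record, so the apostrophe in the sample title needs no addslashes() or manual quoting.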
