Fetching a Page with PHP and Extracting a div's Content

This article is a repost. 2012-08-06 10:05, category: PHP

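Highlights:

1. PHP alone can cut a div out of a fetched page. The approach here is only a starting point; readers are welcome to suggest a more complete solution.

2. The extraction logic is wrapped in a single function that can be called directly.

3. As a bonus, the example extracts the tag cloud from a cnblogs page: getWebDiv('id="taglist"','http://www.cnblogs.com/Zjmainstay/tag/');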


<?php
    header("Content-type: text/html; charset=utf-8"); 
    function getWebDiv($div_id,$url=false,$data=false){
        if($url !== false){
            $data = file_get_contents( $url );
        }
        $charset_pos = stripos($data,'charset');
        if($charset_pos) {
            if(stripos($data,'charset=utf-8',$charset_pos)) {
                $data = iconv('utf-8','utf-8',$data);
            }else if(stripos($data,'charset=gb2312',$charset_pos)) {
                $data = iconv('gb2312','utf-8',$data);
            }else if(stripos($data,'charset=gbk',$charset_pos)) {
                $data = iconv('gbk','utf-8',$data);
            }
        }
        
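        preg_match_all('/<div\b/i',$data,$pre_matches,PREG_OFFSET_CAPTURE);    // offsets of every opening <div
        preg_match_all('/<\/div\b/i',$data,$suf_matches,PREG_OFFSET_CAPTURE); // offsets of every closing </div
        $hit = strpos($data,$div_id);
        if($hit === false) return false;    // not found: strpos() returns false, never -1
        $divs = array();    // merged map: offset => 'p' (opening) or 's' (closing)
        foreach($pre_matches[0] as $pre_div){
            $divs[(int)$pre_div[1]] = 'p';
        }
        // Looped separately so unequal numbers of opening and closing tags cause no undefined-index notices
        foreach($suf_matches[0] as $suf_div){
            $divs[(int)$suf_div[1]] = 's';
        }
        
        // Sort the tag offsets into document order
        $sort = array_keys($divs);
        sort($sort);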
        
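        foreach($pre_matches[0] as $index=>$pre_div){
            // The div is hit when $hit lies between this "<div" and the next one (or this is the last "<div")
            if(($pre_div[1] < $hit) && (!isset($pre_matches[0][$index+1]) || $hit < $pre_matches[0][$index+1][1])){
                $deeper = 0;
                // Discard every offset up to and including the hit div's own opening tag
                while($sort && array_shift($sort) != $pre_div[1]) continue;
                // Walk the remaining tags: an opening tag goes one level deeper ($deeper + 1),
                // a closing tag comes back up ($deeper - 1); a closing tag seen at depth 0
                // closes the hit div, which gives its length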
                foreach($sort as $key){
                    if($divs[$key] == 'p') $deeper++;
                    else if($deeper == 0) {
                        $length = $key-$pre_matches[0][$index][1];
                        break;
                    }else {
                        $deeper--;
                    }
                }
                if(isset($length)) $hitDivString = substr($data,$pre_matches[0][$index][1],$length).'</div>';
                break;
            }
        }
        return isset($hitDivString) ? $hitDivString : false;    // false when no balanced div was found
    }
    
    echo getWebDiv('id="taglist"','http://www.cnblogs.com/Zjmainstay/tag/');

//End_php

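Because the quoting around an id varies from page to page, the caller supplies the full attribute string, for example id="u".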
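Since the first argument is matched as a literal substring, id="u" will not find markup written as id='u' or id=u. A minimal sketch of a quote-tolerant lookup follows; the helper name findIdOffset is hypothetical and is not part of the original code.

```php
<?php
// Hypothetical helper (not in the original article): find the byte offset of an
// id attribute whether it is written id="u", id='u' or id=u.
function findIdOffset($html, $id) {
    // (?<![\w-]) keeps attributes such as data-id from matching.
    // (["']?) captures the optional opening quote; \1 requires the same closing quote.
    // The lookahead stops id=u from matching id=user.
    $pattern = '/(?<![\w-])id\s*=\s*(["\']?)' . preg_quote($id, '/') . '\1(?=[\s>\/])/i';
    if (preg_match($pattern, $html, $m, PREG_OFFSET_CAPTURE)) {
        return $m[0][1];   // offset of the attribute inside $html
    }
    return false;          // no such id
}

var_dump(findIdOffset("<div id='u'>x</div>", 'u'));     // int(5)
var_dump(findIdOffset('<div id="user">x</div>', 'u'));  // bool(false)
```

The returned offset could stand in for the strpos() result inside the function, at the cost of passing only the bare id value instead of the full attribute string.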

Note: this code only extracts the content of div elements that carry an id.
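The matching idea can be tried without any network access: starting from the opening tag that contains the id, count nested openings and closings until the depth returns to zero. The sketch below is a simplified restatement, not the author's exact code; the function name sliceBalancedDiv and the sample markup are assumptions for illustration.

```php
<?php
// Simplified, hypothetical restatement of the depth-counting idea.
// Returns the <div ...>...</div> block whose opening tag contains $needle, or false.
function sliceBalancedDiv($html, $needle) {
    $hit = strpos($html, $needle);
    if ($hit === false) {
        return false;
    }
    // The opening tag is the last "<div" that starts before the needle.
    $start = strripos(substr($html, 0, $hit), '<div');
    if ($start === false) {
        return false;
    }
    // Collect every opening and closing div tag from $start onward, in document order.
    preg_match_all('/<(\/?)div\b/i', $html, $tags, PREG_OFFSET_CAPTURE | PREG_SET_ORDER, $start);
    $depth = 0;
    foreach ($tags as $tag) {
        $depth += ($tag[1][0] === '/') ? -1 : 1;   // closing tag: up one level; opening: down one
        if ($depth === 0) {
            // $tag[0][1] is the offset of "</div"; extend to its ">" to include the closing tag.
            $end = strpos($html, '>', $tag[0][1]);
            return $end === false ? false : substr($html, $start, $end - $start + 1);
        }
    }
    return false; // unbalanced markup
}

$html = '<div id="a"><div id="b"><div>x</div></div><div>y</div></div>';
echo sliceBalancedDiv($html, 'id="b"'), "\n";   // <div id="b"><div>x</div></div>
```

Like the article's function, this treats the markup as a flat string, so a "<div" inside a comment or script would still be counted.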

---------- Improvement: match any closable tag that carries an id ----------


<?php
    header("Content-type: text/html; charset=utf-8");
    function getWebTag($tag_id,$url=false,$tag='div',$data=false){
        if($url !== false){
            $data = file_get_contents( $url );
        }
        $charset_pos = stripos($data,'charset');
        if($charset_pos) {
            if(stripos($data,'charset=utf-8',$charset_pos)) {
                $data = iconv('utf-8','utf-8',$data);
            }else if(stripos($data,'charset=gb2312',$charset_pos)) {
                $data = iconv('gb2312','utf-8',$data);
            }else if(stripos($data,'charset=gbk',$charset_pos)) {
                $data = iconv('gbk','utf-8',$data);
            }
        }
        
        $tag_re = preg_quote($tag,'/');    // escape the tag name before embedding it in a pattern
        preg_match_all('/<'.$tag_re.'\b/i',$data,$pre_matches,PREG_OFFSET_CAPTURE);    // offsets of every opening tag
        preg_match_all('/<\/'.$tag_re.'\b/i',$data,$suf_matches,PREG_OFFSET_CAPTURE); // offsets of every closing tag
        $hit = strpos($data,$tag_id);
        if($hit === false) return false;    // not found: strpos() returns false, never -1
        $divs = array();    // merged map: offset => 'p' (opening) or 's' (closing)
        foreach($pre_matches[0] as $pre_div){
            $divs[(int)$pre_div[1]] = 'p';
        }
        // Looped separately so unequal numbers of opening and closing tags cause no undefined-index notices
        foreach($suf_matches[0] as $suf_div){
            $divs[(int)$suf_div[1]] = 's';
        }
        
        // Sort the tag offsets into document order
        $sort = array_keys($divs);
        sort($sort);
        
        foreach($pre_matches[0] as $index=>$pre_div){
            // The tag is hit when $hit lies between this opening tag and the next one (or this is the last one)
            if(($pre_div[1] < $hit) && (!isset($pre_matches[0][$index+1]) || $hit < $pre_matches[0][$index+1][1])){
                $deeper = 0;
                // Discard every offset up to and including the hit tag's own opening
                while($sort && array_shift($sort) != $pre_div[1]) continue;
                // Walk the remaining tags: an opening tag goes one level deeper ($deeper + 1),
                // a closing tag comes back up ($deeper - 1); a closing tag seen at depth 0
                // closes the hit tag, which gives its length
                foreach($sort as $key){
                    if($divs[$key] == 'p') $deeper++;
                    else if($deeper == 0) {
                        $length = $key-$pre_div[1];
                        break;
                    }else {
                        $deeper--;
                    }
                }
                if(isset($length)) $hitDivString = substr($data,$pre_div[1],$length).'</'.$tag.'>';
                break;
            }
        }
        return isset($hitDivString) ? $hitDivString : false;    // false when no balanced tag was found
    }
    
    echo getWebTag('id="nav"','http://mail.163.com/html/mail_intro/','ul');
    echo getWebTag('id="homeBanners"','http://mail.163.com/html/mail_intro/');
    echo getWebTag('id="performance"','http://mail.163.com/html/mail_intro/','section');

//End_php

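Fix: the check stripos($data,'charset=utf-8',$charset_pos) now includes the charset= prefix, so a gb2312 page that merely mentions utf-8 somewhere in its content is no longer misdetected. Alternatively, the function can be modified to accept an explicit charset parameter.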
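The second suggestion, passing a known charset explicitly, might look like the sketch below. The helper name toUtf8 and its $charset parameter are assumptions for illustration; when no charset is given, it falls back to sniffing the first charset= declaration in the markup.

```php
<?php
// Hypothetical helper: convert page data to UTF-8.
// $charset: explicit source encoding (e.g. 'gbk'); null means "detect from the markup".
function toUtf8($data, $charset = null) {
    if ($charset === null &&
        preg_match('/charset\s*=\s*["\']?([\w-]+)/i', $data, $m)) {
        $charset = $m[1];                // first declared charset, e.g. gb2312
    }
    if ($charset === null || strcasecmp($charset, 'utf-8') === 0) {
        return $data;                    // nothing declared, or already UTF-8
    }
    // iconv() returns false for an unknown charset or invalid bytes; keep the input then.
    $converted = @iconv($charset, 'UTF-8', $data);
    return $converted === false ? $data : $converted;
}

$gbk = "<meta charset=\"gbk\">\xD6\xD0";       // the character "中" encoded as GBK
echo bin2hex(substr(toUtf8($gbk), -3)), "\n";  // e4b8ad, the UTF-8 bytes of "中"
```

Calling toUtf8($data, 'gbk') skips detection entirely, which avoids the misdetection problem when the caller already knows the page encoding.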

Demo: parseDiv
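In response to the invitation at the top of the article for better solutions: PHP's bundled DOM extension can select an element by id with a real HTML parser instead of offset arithmetic. This is a different technique from the one in the article, shown only as a sketch; the function name getElementHtmlById is hypothetical, and the code assumes the DOM extension is enabled and the input is already UTF-8.

```php
<?php
// Alternative technique (not the article's method): let DOMDocument parse the
// markup and serialize the element that has the requested id.
function getElementHtmlById($html, $id) {
    $doc = new DOMDocument();
    // Real-world pages are rarely valid; collect parse warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    // The XML declaration prefix is a common hint that makes libxml read the input as UTF-8.
    $doc->loadHTML('<?xml encoding="utf-8" ?>' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query('//*[@id="' . $id . '"]');   // $id is assumed to contain no quotes
    if ($nodes === false || $nodes->length === 0) {
        return false;
    }
    return $doc->saveHTML($nodes->item(0));             // node argument needs PHP >= 5.3.6
}

$html = '<div id="a"><ul id="nav"><li>Home</li></ul></div>';
echo getElementHtmlById($html, 'nav'), "\n";
```

Because the parser tracks nesting itself, this works for any tag name and quoting style, though the serialized output may be normalized rather than byte-identical to the source.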
