Java Basics: Using Jsoup on the Backend to Extract an Element's HTML from a String by Its id (the Backend Equivalent of the Frontend's getElementById)


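1. Scenario

Product descriptions are essential in e-commerce projects, and each project needs different descriptions, so there is no single format that fits them all.

Some of the data inside a product description is what we actually need, such as the price or the size chart.

The question is how to extract that data without relying on the frontend.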

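2. Approach

First, look at how the product description is stored: in my case the whole description is stored as a single string in a table column.

Since it is a string, we can parse it with Jsoup to get a Document object (other approaches would work too).

Then call getElementById("tag id") to get the element.

In my case I want the element itself, HTML tags included,

so calling toString() is enough. If you only need the inner content without the HTML tags, use the text() method instead.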
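
As a minimal sketch of the difference between these calls, the snippet below parses a tiny, made-up HTML string (not the real product description) and returns the element with its tags, its inner HTML, and its plain text. The class name ElementByIdSketch, the helper method names, and the sample string are assumptions for illustration, and the sketch assumes the jsoup library is on the classpath.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;

public class ElementByIdSketch {
    // Hypothetical sample input, much smaller than a real product description
    public static final String SAMPLE =
            "<p>intro</p><div id=\"chart\"><b>Size</b> 100</div>";

    // Parse the string and look up an element by id; returns null if the id is absent
    public static Element find(String html, String id) {
        return Jsoup.parse(html).getElementById(id);
    }

    // The element including its own tag (the same string toString() returns)
    public static String outerHtmlOf(String html, String id) {
        Element el = find(html, id);
        return el == null ? null : el.outerHtml();
    }

    // Only the markup inside the element
    public static String innerHtmlOf(String html, String id) {
        Element el = find(html, id);
        return el == null ? null : el.html();
    }

    // Only the text, with all tags stripped
    public static String textOf(String html, String id) {
        Element el = find(html, id);
        return el == null ? null : el.text();
    }

    public static void main(String[] args) {
        System.out.println(outerHtmlOf(SAMPLE, "chart"));
        System.out.println(innerHtmlOf(SAMPLE, "chart"));
        System.out.println(textOf(SAMPLE, "chart")); // prints "Size 100"
    }
}
```

Calling toString() on a Jsoup element gives the same string as outerHtml(), so either one returns the tag together with its contents.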

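3. Desired result

The goal is to pull the HTML of the <div> elements with the ids sizechart-template1 and sizechart-template2 out of the description string.

4. Code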

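// Jsoup imports used below; @Test comes from whichever JUnit version the project uses
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

/**
* Description: look up elements by tag id inside an HTML string, in Java
*
* @author 王子威
*/
@Test
public void extractChart()
{
    // Product description: fake data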
    String desc = "<p align=\"center\"></p>\n" + "<p align=\"center\">啊啊啊啊</p>\n" + "<p align=\"center\"></p>\n" + "<p " + "align=\"center\">\n" + "</p>\n"
            + "<div id=\"sizechart-template1\">\n" + "<table " + "border=\"1\" style=\"width:800px;margin:10px auto;\">\n" + "<thead> <tr>\n" + "<td "
            + "style=\"font-size:11pt;font-weight:700;font-family:Arial;padding:5px;text-align:center;" + "\">Size</td>\n"
            + "<td style=\"font-size:11pt;font-weight:700;font-family:Arial;padding:5px;" + "text-align:center;\">Label Size</td>\n"
            + "<td style=\"font-size:11pt;font-weight:700;" + "font-family:Arial;padding:5px;text-align:center;\">Bust</td>\n" + "<td style=\"font-size"
            + ":11pt;font-weight:700;font-family:Arial;padding:5px;text-align:center;\">Waist</td>\n"
            + "<td style=\"font-size:11pt;font-weight:700;font-family:Arial;padding:5px;text-align:center;" + "\">Length</td>\n"
            + "<td style=\"font-size:11pt;font-weight:700;font-family:Arial;" + "padding:5px;text-align:center;\">Height</td>\n" + "</tr>\n" + "</thead> <tbody>\n" + "<tr>\n"
            + "<td style=\"font-size:11pt;font-weight:700;font-family:Arial;padding:5px;text-align:center;\">100</td>\n"
            + "<td style=\"font-size:11pt;font-weight:700;font-family:Arial;padding:5px;text-align:center;\">56cm/22.0</td>\n"
            + "<td style=\"font-size:11pt;font-weight:700;font-family:Arial;padding:5px;text-align:center;\">23cm/9.1</td>\n"
            + "<td style=\"font-size:11pt;font-weight:700;font-family:Arial;padding:5px;text-align:center;\">11cm/4.3</td>\n"
            + "<td style=\"font-size:11pt;font-weight:700;font-family:Arial;padding:5px;text-align:center;\">36cm/14.2</td>\n"
            + "<td style=\"font-size:11pt;font-weight:700;font-family:Arial;padding:5px;text-align:center;\">36cm/14.2</td>\n" + "</tr>\n"
            + "<tr><td style=\"font-size:11pt;font-weight:700;font-family:Arial;padding:5px;text-align:center;\">100</td>\n"
            + "<td style=\"font-size:11pt;font-weight:700;font-family:Arial;padding:5px;text-align:center;\">56cm/22.0</td>\n"
            + "<td style=\"font-size:11pt;font-weight:700;font-family:Arial;padding:5px;text-align:center;\">23cm/9.1</td>\n"
            + "<td style=\"font-size:11pt;font-weight:700;font-family:Arial;padding:5px;text-align:center;\">11cm/4.3</td>\n"
            + "<td style=\"font-size:11pt;font-weight:700;font-family:Arial;padding:5px;text-align:center;\">36cm/14.2</td>\n"
            + "<td style=\"font-size:11pt;font-weight:700;font-family:Arial;padding:5px;text-align:center;\">36cm/14.2</td>\n" + "</tr>\n" + "</tbody>\n" + "</table>\n"
            + "</div>\n" + "<br />\n" + "<br />\n" + "<div id=\"sizechart-template2\"><table border=\"1\" style=\"width:800px;margin:10px auto;\">\n" + "<tbody>\n" + "<tr>\n"
            + "<td style=\"font-size:11pt;font-weight:700;font-family:Arial;padding:5px;text-align:center;\">Size:100</td>\n"
            + "<td style=\"font-size:11pt;font-weight:700;font-family:Arial;padding:5px;text-align:center;\">Label Size:56cm/22.0</td>\n"
            + "<td style=\"font-size:11pt;font-weight:700;font-family:Arial;padding:5px;text-align:center;\">Bust:23cm/9.1</td>\n"
            + "<td style=\"font-size:11pt;font-weight:700;font-family:Arial;padding:5px;text-align:center;\">Waist:11cm/4.3</td>\n"
            + "<td style=\"font-size:11pt;font-weight:700;font-family:Arial;padding:5px;text-align:center;\">Length:36cm/14.2</td>\n"
            + "<td style=\"font-size:11pt;font-weight:700;font-family:Arial;padding:5px;text-align:center;\">Height:36cm/14.2</td>\n" + "</tr>\n" + "<tr>\n"
            + "<td style=\"font-size:11pt;font-weight:700;font-family:Arial;padding:5px;text-align:center;\">Size:100</td>\n"
            + "<td style=\"font-size:11pt;font-weight:700;font-family:Arial;padding:5px;text-align:center;\">Label Size:56cm/22.0</td>\n"
            + "<td style=\"font-size:11pt;font-weight:700;font-family:Arial;padding:5px;text-align:center;\">Bust:23cm/9.1</td>\n"
            + "<td style=\"font-size:11pt;font-weight:700;font-family:Arial;padding:5px;text-align:center;\">Waist:11cm/4.3</td>\n"
            + "<td style=\"font-size:11pt;font-weight:700;font-family:Arial;padding:5px;text-align:center;\">Length:36cm/14.2</td>\n"
            + "<td style=\"font-size:11pt;font-weight:700;font-family:Arial;padding:5px;text-align:center;\">Height:36cm/14.2</td>\n" + "</tr>\n" + "</tbody>\n" + "</table>\n"
            + "</div>\n" + "<p></p>\n" + "<p align=\"center\"></p>\n" + "<p align=\"center\" style=\"text-align:left;\"></p>\n" + "<p align=\"center\"></p>\n"
            + "<p align=\"center\"><img src=\"https://gimg2.baidu.com/image_search/src=http%3A%2F%2Fup.enterdesk.com%2Fedpic%2F09%2F3a%2Fbc%2F093abce7b31f4c8ffdbf345375ff4abb.jpg&refer=http%3A%2F%2Fup.enterdesk.com&app=2002&size=f9999,10000&q=a80&n=0&g=0n&fmt=auto?sec=1652421336&t=d2da9a6657364617cdcdbf0aa8e0002e\" /></p>\n"
            + "<p align=\"center\"></p>\n"
            + "<p align=\"center\"><p align=\"center\"><img src=\"https://gimg2.baidu.com/image_search/src=http%3A%2F%2Fimg.jj20.com%2Fup%2Fallimg%2F1111%2F04261Q53521%2F1P426153521-1-1200.jpg&refer=http%3A%2F%2Fimg.jj20.com&app=2002&size=f9999,10000&q=a80&n=0&g=0n&fmt=auto?sec=1652421677&t=05a703168ad75e76ce2bddf3b32382dd\" /></p>\n" + "</p>";
    // Parse the string into a Document object
    Document doc = Jsoup.parse(desc);

    // Look up the <div> elements by id (getElementById returns null if no element has that id)
    Element elementById1 = doc.getElementById("sizechart-template1");
    Element elementById2 = doc.getElementById("sizechart-template2");
    // Convert the elements to strings, HTML tags included
    String a = elementById1.toString();
    System.out.println("a = " + a);
    String b = elementById2.toString();
    System.out.println("b = " + b);
    // Get only the text content, without HTML tags
    String text = elementById1.text();
    System.out.println("text = " + text);
}

 

Result

Element 1

Element 2

Element text content
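
The test above calls toString() directly on the lookup result, which would throw a NullPointerException if a description lacked the expected id. One possible way to guard against that, and to turn the size chart into structured rows instead of a single text string, is sketched below. The class name SizeChartSketch, the helper names, and the sample HTML are hypothetical; the sketch assumes jsoup is on the classpath and that the chart is a plain table of <tr> and <td> cells, as in the fake data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;

public class SizeChartSketch {
    // Hypothetical, shortened size chart in the same shape as the fake data
    public static final String SAMPLE =
            "<div id=\"sizechart\"><table><tbody>"
            + "<tr><td>Size</td><td>Bust</td></tr>"
            + "<tr><td>100</td><td>23cm/9.1</td></tr>"
            + "</tbody></table></div>";

    // Returns the element's HTML (tags included), or an empty Optional if the id is absent
    public static Optional<String> outerHtmlById(String html, String id) {
        Element el = Jsoup.parse(html).getElementById(id);
        return Optional.ofNullable(el).map(Element::outerHtml);
    }

    // Returns one list of cell texts per table row; empty if the id is absent
    public static List<List<String>> rowsById(String html, String id) {
        List<List<String>> rows = new ArrayList<>();
        Element el = Jsoup.parse(html).getElementById(id);
        if (el == null) {
            return rows;
        }
        for (Element tr : el.select("tr")) {
            List<String> cells = new ArrayList<>();
            for (Element td : tr.select("td")) {
                cells.add(td.text());
            }
            rows.add(cells);
        }
        return rows;
    }

    public static void main(String[] args) {
        System.out.println(rowsById(SAMPLE, "sizechart")); // prints [[Size, Bust], [100, 23cm/9.1]]
        System.out.println(outerHtmlById(SAMPLE, "no-such-id").isPresent()); // prints false
    }
}
```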

 

