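I needed a Python script to extract img tags and the contents of their src attributes. Saving the data tripped me up for a long time: I was putting a list straight into the SQL statement, and it kept raising an error.
My brain short-circuited. I assumed the problem was escape characters inside the src, and kept digging in that direction.
Only later did I realize that I had passed the list itself to the SQL format() call, so the square brackets were converted into the string along with it, e.g. '[' http://www.baidu.com ']'
On execution, '[' was treated as a complete one-character string, so of course the http....... that followed could not be parsed. Ugh, how dumb.
Fix: insert each element of the list at the placeholder position in the SQL, instead of putting imgSrc itself where the img placeholder goes.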
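To make the mistake concrete, here is a minimal sketch that needs no database. The product ID and the /a.jpg path are made-up sample values, and the %s-style statement is only built as a (sql, params) pair rather than executed.

```python
# Hypothetical sample data: one product with a single extracted image URL.
product_id = 42
img_src = ['http://www.baidu.com/a.jpg']  # re.findall always returns a list

template = "update ExtraBookInfo set YImage='{img}' WHERE ProductID='{pID}'"

# The bug: formatting the whole list puts its string form,
# brackets and inner quotes included, into the SQL text.
broken_sql = template.format(img=img_src, pID=product_id)
print(broken_sql)

# The fix: loop over the list and use one element per statement.
fixed_sql = [template.format(img=img, pID=product_id) for img in img_src]
print(fixed_sql[0])

# Variant with driver-side placeholders: keep the SQL constant and
# hand the values over separately as a tuple.
param_sql = 'update ExtraBookInfo set YImage=%s WHERE ProductID=%s'
statements = [(param_sql, (img, product_id)) for img in img_src]
print(statements)
```

With a real connection, each pair would go to cur.execute(sql, params), which is the same form the insert statement in the script uses. Letting the driver handle quoting also avoids trouble when a URL itself contains a single quote.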
The source code is as follows:
# coding=utf-8
import re

import pymssql


def getConnection():
    # Connection settings are masked in this post; fill in your own.
    return pymssql.connect(server='****', user='User', password='****',
                           database='*****', charset='cp936')


def fetchNext(cur):
    # Fetch the next row, trying again when a row fails to decode as cp936.
    # Like the original loop, this assumes the cursor has moved past the bad row.
    while True:
        try:
            return cur.fetchone()
        except UnicodeDecodeError:
            continue


def connectDB():
    # Build a list of [ProductID, [img tags]] for products whose Content has an img tag.
    conn = getConnection()
    cur = conn.cursor()
    cur.execute('select ProductID,Content from Products WHERE (not Content IS NULL )')
    resultList = []
    row = fetchNext(cur)
    while row:
        result = parseContent(row[1])
        if result:
            resultList.append([row[0], result])
        row = fetchNext(cur)
    conn.close()
    return resultList


def parseContent(content):
    # Only self-closing <img ... /> tags are matched.
    return re.findall('<img[^>]*/>', content)


def saveImg(resultList):
    productIdList = getExtraBookProductIDList()
    conn = getConnection()
    cur = conn.cursor()
    for productId, imgTags in resultList:
        # getImgSrc returns a list, so pass one element per statement;
        # never put the whole list into the SQL string.
        imgSrc = getImgSrc(imgTags)
        for img in imgSrc:
            if productId in productIdList:
                cur.execute('update ExtraBookInfo set YImage=%s WHERE ProductID=%s',
                            (img, productId))
            else:
                cur.execute('insert into ExtraBookInfo (ProductID,YImage) values(%s,%s)',
                            (productId, img))
        conn.commit()
    conn.close()


def getExtraBookProductIDList():
    conn = getConnection()
    cur = conn.cursor()
    cur.execute('select ProductID from ExtraBookInfo')
    productIdList = []
    row = fetchNext(cur)
    while row:
        productIdList.append(row[0])
        row = fetchNext(cur)
    conn.close()
    return productIdList


def getImgSrc(imgTags):
    # Collect the .jpg URLs from every img tag, not just one of them.
    srcList = []
    for tag in imgTags:
        srcList.extend(re.findall(r'http.*?\.jpg', tag))
    return srcList


if __name__ == '__main__':
    saveImg(connectDB())
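The two regular expressions in the script can be tried out on their own, without a database. The sketch below runs them on a made-up Content string; the HTML and the image URLs are hypothetical sample data.

```python
import re

# Hypothetical Content value, made up for illustration.
content = ('<p>intro</p><img src="http://www.baidu.com/a.jpg" />'
           '<img alt="x" src="http://www.baidu.com/b.jpg"/>')

# Step 1: pull out every self-closing img tag.
img_tags = re.findall(r'<img[^>]*/>', content)

# Step 2: pull the .jpg URL out of each tag and flatten into one list.
img_srcs = []
for tag in img_tags:
    img_srcs.extend(re.findall(r'http.*?\.jpg', tag))

print(img_tags)
print(img_srcs)

# A tag written without the closing slash is not matched by this pattern.
not_self_closing = re.findall(r'<img[^>]*/>', '<img src="http://www.baidu.com/c.jpg">')
print(not_self_closing)
```

Note that both steps return lists, which is exactly why the values have to be unpacked element by element before they reach the SQL statement.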
Don't be stubborn........