How does Python crawl JD product information and comments and store them in MySQL?

Table of Contents

Build mysql data table

First version:

Second version:

Third version:


PHPz

May 26, 2023, 07:58 PM

mysql python

Build mysql data table

Question: In SQLAlchemy, autoincrement=True has no effect on a non-primary-key column. I want such a column to serve only as an index, so how can it be made to auto-increment? In the tables below, both id and sku_id are declared as primary-key columns, so id is part of a composite key and can auto-increment.

# Saved as jd_mysqldb.py (the crawler scripts below import from this module)
from sqlalchemy import String, Integer, Text, Column
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
from sqlalchemy.orm import scoped_session
from sqlalchemy.ext.declarative import declarative_base

engine = create_engine(
    "mysql+pymysql://root:root@127.0.0.1:3306/jdcrawl?charset=utf8",
    pool_size=200,
    max_overflow=300,
    echo=False
)

BASE = declarative_base()  # create the declarative base class

class Goods(BASE):
    __tablename__ = 'goods'
    # id and sku_id form a composite primary key; only id auto-increments
    id = Column(Integer(), primary_key=True, autoincrement=True)
    sku_id = Column(String(200), primary_key=True, autoincrement=False)
    name = Column(String(200))
    price = Column(String(200))
    comments_num = Column(Integer)
    shop = Column(String(200))
    link = Column(String(200))

class Comments(BASE):
    __tablename__ = 'comments'
    id = Column(Integer(), primary_key=True, autoincrement=True, nullable=False)
    sku_id = Column(String(200), primary_key=True, autoincrement=False)
    comments = Column(Text())

BASE.metadata.create_all(engine)  # create the tables if they do not exist yet
Session = sessionmaker(engine)
sess_db = scoped_session(Session)  # thread-local session shared by the crawler


First version:

Problem: After a few pages of comments have been crawled, the crawler starts receiving blank pages. This still happens after adding a Referer header.

Attempted fix: Fetch the comments in a single thread instead of a thread pool, and wait 1 s after each page of comments.
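The attempted fix can be sketched in isolation as follows. This is a minimal illustration, not the article's code: fetch_pages_sequentially and fake_fetch_page are hypothetical names, and the fake fetcher stands in for the real HTTP request so the loop can be run without touching JD.

```python
import time
from typing import Callable, List


def fetch_pages_sequentially(
    fetch_page: Callable[[int], List[str]],
    max_page: int,
    delay_seconds: float = 1.0,
) -> List[str]:
    """Fetch pages 0..max_page-1 one at a time, pausing after each request."""
    comments: List[str] = []
    for page in range(max_page):
        comments.extend(fetch_page(page))
        time.sleep(delay_seconds)  # throttle so requests arrive slowly and in order
    return comments


# Hypothetical stand-in for a real HTTP call; returns two fake comments per page.
def fake_fetch_page(page: int) -> List[str]:
    return [f"page {page} comment {n}" for n in range(2)]


result = fetch_pages_sequentially(fake_fetch_page, max_page=3, delay_seconds=0)
print(len(result))  # 6
```

With the default delay of 1 s, 100 pages take at least 100 s per product, which is the price paid for avoiding the thread pool.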

# Do not crawl too fast, or the comments will not be returned!

from bs4 import BeautifulSoup
import requests
from urllib import parse
import json
import re
import threadpool
import time
from jd_mysqldb import Goods, Comments, sess_db

headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36',
    'Cookie': '__jdv=76161171|baidu|-|organic|%25E4%25BA%25AC%25E4%25B8%259C|1613711947911; __jdu=16137119479101182770449; areaId=7; ipLoc-djd=7-458-466-0; PCSYCityID=CN_410000_0_0; shshshfpa=07383463-032f-3f99-9d40-639cb57c6e28-1613711950; shshshfpb=u8S9UvxK66gfIbM1mUNrIOg%3D%3D; user-key=153f6b4d-0704-4e56-82b6-8646f3f0dad4; cn=0; shshshfp=9a88944b34cb0ff3631a0a95907b75eb; __jdc=122270672; 3AB9D23F7A4B3C9B=SEELVNXBPU7OAA3UX5JTKR5LQADM5YFJRKY23Z6HDBU4OT2NWYGX525CKFFVHTRDJ7Q5DJRMRZQIQJOW5GVBY43XVI; jwotest_product=99; __jda=122270672.16137119479101182770449.1613711948.1613738165.1613748918.4; JSESSIONID=C06EC8D2E9384D2628AE22B1A6F9F8FC.s1; shshshsID=ab2ca3143928b1b01f6c5b71a15fcebe_5_1613750374847; __jdb=122270672.5.16137119479101182770449|4.1613748918',
    'Referer': 'https://www.jd.com/'
}

num = 0  # number of products
comments_num = 0  # number of comments
 
# Fetch product info and the SKU id
def getIndex(url):
    global num
    session = requests.Session()
    session.headers = headers
    res = session.get(url, headers=headers)
    print(res.status_code)
    res.encoding = res.apparent_encoding
    soup = BeautifulSoup(res.text, 'lxml')
    items = soup.select('li.gl-item')
    for item in items[:3]:  # crawl 3 products as a test
        title = item.select_one('.p-name a em').text.strip().replace(' ', '')
        price = item.select_one('.p-price strong').text.strip().replace('￥', '')
        try:
            shop = item.select_one('.p-shopnum a').text.strip()  # shop selector used for books
        except AttributeError:
            shop = item.select_one('.p-shop a').text.strip()  # shop selector used for other products
        link = parse.urljoin('https://', item.select_one('.p-img a').get('href'))
        SkuId = re.search(r'\d+', link).group()
        comments_num = getCommentsNum(SkuId, session)
        print(SkuId, title, price, shop, link, comments_num)
        print("Saving to the database...")
        try:
            IntoGoods(SkuId, title, price, shop, link, comments_num)
        except Exception as e:
            print(e)
            sess_db.rollback()
        num += 1
        print("Fetching comments...")
        # Get the total number of comment pages
        url1 = f'https://club.jd.com/comment/productPageComments.action?productId={SkuId}&score=0&sortType=5&page=0&pageSize=10'
        headers['Referer'] = f'https://item.jd.com/{SkuId}.html'
        headers['Connection'] = 'keep-alive'
        res2 = session.get(url1, headers=headers)
        res2.encoding = res2.apparent_encoding
        json_data = json.loads(res2.text)
        max_page = json_data['maxPage']  # in testing, at most 100 pages of 10 comments each are available
        args = []
        for i in range(0, max_page):
            # This URL returns the comments as JSON
            url2 = f'https://club.jd.com/comment/productPageComments.action?productId={SkuId}&score=0&sortType=5&page={i}&pageSize=10'
            # This URL returns the comments wrapped in a callback (not plain JSON), so the payload must be extracted
            # url2_2=f'https://club.jd.com/comment/productPageComments.action?callback=jQuery9287224&productId={SkuId}&score=0&sortType=5&page={i}&pageSize=10'
            args.append(([session, SkuId, url2], None))
        pool2 = threadpool.ThreadPool(2)  # 2 threads
        reque2 = threadpool.makeRequests(getComments, args)  # create the tasks
        for r in reque2:
            pool2.putRequest(r)  # submit the task to the thread pool
        pool2.wait()
 
# Fetch the total number of comments
def getCommentsNum(SkuId, sess):
    headers['Referer'] = f'https://item.jd.com/{SkuId}.html'
    url = f'https://club.jd.com/comment/productCommentSummaries.action?referenceIds={SkuId}'
    res = sess.get(url, headers=headers)
    try:
        res.encoding = res.apparent_encoding
        json_data = json.loads(res.text)  # parse the JSON text into a dict
        num = json_data['CommentsCount'][0]['CommentCount']
        return num
    except Exception:
        return None  # stored as NULL; comments_num is an Integer column, so 'Error' would not fit

# Fetch one page of comments
def getComments(sess, SkuId, url2):
    global comments_num
    print(url2)
    headers['Referer'] = f'https://item.jd.com/{SkuId}.html'
    res2 = sess.get(url2, headers=headers)
    res2.encoding = 'gbk'
    json_data = res2.text
    '''
    # When using url2_2, extract the JSON from the callback wrapper like this:
    start = res2.text.find('jQuery9287224(') + len('jQuery9287224(')
    end = res2.text.find(');')
    json_data=res2.text[start:end]
    '''
    dict_data = json.loads(json_data)  # raises ValueError when the page is blank
    try:
        comments = dict_data['comments']
        for item in comments:
            comment = item['content'].replace('\n', '')
            # print(comment)
            comments_num += 1
            try:
                IntoComments(SkuId, comment)
            except Exception as e:
                print(e)
                sess_db.rollback()
    except Exception:
        pass  # skip pages without a usable comment list

# Save product info to the database
def IntoGoods(SkuId, title, price, shop, link, comments_num):
    goods_data = Goods(
        sku_id=SkuId,
        name=title,
        price=price,
        comments_num=comments_num,
        shop=shop,
        link=link
    )
    sess_db.add(goods_data)
    sess_db.commit()

# Save a comment to the database
def IntoComments(SkuId, comment):
    comments_data = Comments(
        sku_id=SkuId,
        comments=comment
    )
    sess_db.add(comments_data)
    sess_db.commit()
 
if __name__ == '__main__':
    start_time = time.time()
    urls = []
    KEYWORD = parse.quote(input("Enter the keyword to search for: "))
    for i in range(1, 2):  # crawl one page as a test
        url = f'https://search.jd.com/Search?keyword={KEYWORD}&wq={KEYWORD}&page={i}'
        urls.append(([url, ], None))  # argument format required by threadpool
    pool = threadpool.ThreadPool(2)  # thread pool with 2 threads
    reque = threadpool.makeRequests(getIndex, urls)  # create the tasks
    for r in reque:
        pool.putRequest(r)  # submit the task to the thread pool
    pool.wait()  # wait for all tasks to finish
    print("Fetched {} products and {} comments in {} s".format(num, comments_num, time.time() - start_time))
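The commented-out url2_2 variant in the code above returns the comments wrapped in a callback such as jQuery9287224(...); rather than as plain JSON. A minimal sketch of unwrapping such a response is shown here; strip_jsonp and the sample payload are hypothetical and only illustrate the idea. It searches for the last closing parenthesis, on the assumption that a comment body could itself contain the characters ");".

```python
import json


def strip_jsonp(text: str, callback: str) -> dict:
    """Return the JSON payload wrapped in `callback( ... );`."""
    prefix = callback + "("
    start = text.find(prefix)
    end = text.rfind(")")  # last ')' so a ');' inside a comment is not taken as the end
    if start == -1 or end < start:
        raise ValueError("response is not wrapped in the expected callback")
    return json.loads(text[start + len(prefix):end])


# Made-up sample response; note the ');' inside the comment text.
sample = 'jQuery9287224({"maxPage": 2, "comments": [{"content": "ok :);"}]});'
data = strip_jsonp(sample, "jQuery9287224")
print(data["maxPage"])  # 2
```

Using the first ");" instead, as in the commented-out snippet, would cut this sample off in the middle of the comment text.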


Second version:

Testing confirms that blank pages no longer appear.

Further optimization: Fetch the comments of two or more products at the same time.
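The idea behind this optimization can be sketched with the standard library's concurrent.futures instead of the third-party threadpool package used in this article: each product gets its own worker, and within a worker the pages are still fetched one by one with a delay. The names fetch_page, crawl_product and crawl_products are hypothetical, and fetch_page is a stand-in for the real request.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def fetch_page(sku_id: str, page: int) -> List[str]:
    """Hypothetical stand-in for one HTTP request for one page of comments."""
    return [f"{sku_id}-p{page}-c{n}" for n in range(2)]


def crawl_product(sku_id: str, max_page: int, delay_seconds: float) -> List[str]:
    """Fetch one product's pages in order, pausing after each page."""
    comments: List[str] = []
    for page in range(max_page):
        comments.extend(fetch_page(sku_id, page))
        time.sleep(delay_seconds)
    return comments


def crawl_products(sku_ids: List[str], max_page: int,
                   delay_seconds: float = 1.0, workers: int = 3) -> Dict[str, List[str]]:
    """Run one crawl_product task per SKU on a small thread pool."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {sku: pool.submit(crawl_product, sku, max_page, delay_seconds)
                   for sku in sku_ids}
        return {sku: fut.result() for sku, fut in futures.items()}


results = crawl_products(["1001", "1002"], max_page=2, delay_seconds=0)
print({sku: len(c) for sku, c in results.items()})  # {'1001': 4, '1002': 4}
```

Note that the total request rate seen by the server grows with the number of workers, even though each worker keeps its own delay.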

# Do not crawl too fast, or the comments will not be returned!
from bs4 import BeautifulSoup
import requests
from urllib import parse
import json
import re
import threadpool
import time
from jd_mysqldb import Goods, Comments, sess_db

headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36',
    'Cookie': '__jdv=76161171|baidu|-|organic|%25E4%25BA%25AC%25E4%25B8%259C|1613711947911; __jdu=16137119479101182770449; areaId=7; ipLoc-djd=7-458-466-0; PCSYCityID=CN_410000_0_0; shshshfpa=07383463-032f-3f99-9d40-639cb57c6e28-1613711950; shshshfpb=u8S9UvxK66gfIbM1mUNrIOg%3D%3D; user-key=153f6b4d-0704-4e56-82b6-8646f3f0dad4; cn=0; shshshfp=9a88944b34cb0ff3631a0a95907b75eb; __jdc=122270672; 3AB9D23F7A4B3C9B=SEELVNXBPU7OAA3UX5JTKR5LQADM5YFJRKY23Z6HDBU4OT2NWYGX525CKFFVHTRDJ7Q5DJRMRZQIQJOW5GVBY43XVI; jwotest_product=99; __jda=122270672.16137119479101182770449.1613711948.1613738165.1613748918.4; JSESSIONID=C06EC8D2E9384D2628AE22B1A6F9F8FC.s1; shshshsID=ab2ca3143928b1b01f6c5b71a15fcebe_5_1613750374847; __jdb=122270672.5.16137119479101182770449|4.1613748918',
    'Referer': 'https://www.jd.com/'
}

num = 0  # number of products
comments_num = 0  # number of comments
 
# Fetch product info and the SKU id
def getIndex(url):
    global num
    session = requests.Session()
    session.headers = headers  # same dict object, so later changes to headers reach the session
    res = session.get(url, headers=headers)
    print(res.status_code)
    res.encoding = res.apparent_encoding
    soup = BeautifulSoup(res.text, 'lxml')
    items = soup.select('li.gl-item')
    for item in items[:2]:  # crawl 2 products as a test
        title = item.select_one('.p-name a em').text.strip().replace(' ', '')
        price = item.select_one('.p-price strong').text.strip().replace('￥', '')
        try:
            shop = item.select_one('.p-shopnum a').text.strip()  # shop selector used for books
        except AttributeError:
            shop = item.select_one('.p-shop a').text.strip()  # shop selector used for other products
        link = parse.urljoin('https://', item.select_one('.p-img a').get('href'))
        SkuId = re.search(r'\d+', link).group()
        headers['Referer'] = f'https://item.jd.com/{SkuId}.html'
        headers['Connection'] = 'keep-alive'
        comments_num = getCommentsNum(SkuId, session)
        print(SkuId, title, price, shop, link, comments_num)
        print("Saving the product to the database...")
        try:
            IntoGoods(SkuId, title, price, shop, link, comments_num)
        except Exception as e:
            print(e)
            sess_db.rollback()
        num += 1
        print("Fetching comments...")
        # Get the total number of comment pages
        url1 = f'https://club.jd.com/comment/productPageComments.action?productId={SkuId}&score=0&sortType=5&page=0&pageSize=10'
        res2 = session.get(url1, headers=headers)
        res2.encoding = res2.apparent_encoding
        json_data = json.loads(res2.text)
        max_page = json_data['maxPage']  # in testing, at most 100 pages of 10 comments each are available
        print("{} has {} pages of comments".format(SkuId, max_page))
        if max_page == 0:
            IntoComments(SkuId, '0')  # store '0' as a placeholder when there are no comment pages
        else:
            for i in range(0, max_page):
                # This URL returns the comments as JSON
                url2 = f'https://club.jd.com/comment/productPageComments.action?productId={SkuId}&score=0&sortType=5&page={i}&pageSize=10'
                # This URL returns the comments wrapped in a callback (not plain JSON), so the payload must be extracted
                # url2_2=f'https://club.jd.com/comment/productPageComments.action?callback=jQuery9287224&productId={SkuId}&score=0&sortType=5&page={i}&pageSize=10'
                print("Fetching comment page {}: {}".format(i + 1, url2))
                getComments(session, SkuId, url2)
                time.sleep(1)  # wait 1 s between pages
 
# Fetch the total number of comments
def getCommentsNum(SkuId, sess):
    url = f'https://club.jd.com/comment/productCommentSummaries.action?referenceIds={SkuId}'
    res = sess.get(url)
    try:
        res.encoding = res.apparent_encoding
        json_data = json.loads(res.text)  # parse the JSON text into a dict
        num = json_data['CommentsCount'][0]['CommentCount']
        return num
    except Exception:
        return None  # stored as NULL; comments_num is an Integer column, so 'Error' would not fit

# Fetch one page of comments
def getComments(sess, SkuId, url2):
    global comments_num
    res2 = sess.get(url2)
    res2.encoding = res2.apparent_encoding
    json_data = res2.text
    '''
    # When using url2_2, extract the JSON from the callback wrapper like this:
    start = res2.text.find('jQuery9287224(') + len('jQuery9287224(')
    end = res2.text.find(');')
    json_data=res2.text[start:end]
    '''
    dict_data = json.loads(json_data)  # raises ValueError when the page is blank
    comments = dict_data['comments']
    for item in comments:
        comment = item['content'].replace('\n', '')
        # print(comment)
        comments_num += 1
        try:
            IntoComments(SkuId, comment)
        except Exception as e:
            print(e)
            sess_db.rollback()

# Save product info to the database
def IntoGoods(SkuId, title, price, shop, link, comments_num):
    goods_data = Goods(
        sku_id=SkuId,
        name=title,
        price=price,
        comments_num=comments_num,
        shop=shop,
        link=link
    )
    sess_db.add(goods_data)
    sess_db.commit()

# Save a comment to the database
def IntoComments(SkuId, comment):
    comments_data = Comments(
        sku_id=SkuId,
        comments=comment
    )
    sess_db.add(comments_data)
    sess_db.commit()
 
if __name__ == '__main__':
    start_time = time.time()
    urls = []
    KEYWORD = parse.quote(input("Enter the keyword to search for: "))
    for i in range(1, 2):  # crawl one page as a test
        url = f'https://search.jd.com/Search?keyword={KEYWORD}&wq={KEYWORD}&page={i}'
        urls.append(([url, ], None))  # argument format required by threadpool
    pool = threadpool.ThreadPool(2)  # thread pool with 2 threads
    reque = threadpool.makeRequests(getIndex, urls)  # create the tasks
    for r in reque:
        pool.putRequest(r)  # submit the task to the thread pool
    pool.wait()  # wait for all tasks to finish
    print("Fetched {} products and {} comments in {} s".format(num, comments_num, time.time() - start_time))
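The comment URLs in the code above are assembled with f-strings, and the SKU id is pulled out of the product link with a regular expression. As a small sketch of the same two steps using only the standard library, the helpers below build the query string with urlencode and fail loudly when a link has no digits. The helper names and the sample SKU are hypothetical; the endpoint and parameters are the ones used in this article.

```python
import re
from urllib.parse import urlencode

COMMENT_ENDPOINT = "https://club.jd.com/comment/productPageComments.action"


def comment_page_url(sku_id: str, page: int, page_size: int = 10) -> str:
    """Build the URL of one page of comments for a product."""
    params = {
        "productId": sku_id,
        "score": 0,
        "sortType": 5,
        "page": page,
        "pageSize": page_size,
    }
    return f"{COMMENT_ENDPOINT}?{urlencode(params)}"


def sku_from_link(link: str) -> str:
    """Return the first run of digits in a product link as the SKU id."""
    match = re.search(r"\d+", link)
    if match is None:
        raise ValueError(f"no SKU id found in {link!r}")
    return match.group()


sku = sku_from_link("https://item.jd.com/100012043978.html")  # made-up SKU
print(comment_page_url(sku, 0))
# https://club.jd.com/comment/productPageComments.action?productId=100012043978&score=0&sortType=5&page=0&pageSize=10
```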


Third version:

No luck: blank pages appear again.

# Do not crawl too fast, or the comments will not be returned!
from bs4 import BeautifulSoup
import requests
from urllib import parse
import json
import re
import threadpool
import time
from jd_mysqldb import Goods, Comments, sess_db

headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36',
    'Cookie': '__jdv=76161171|baidu|-|organic|%25E4%25BA%25AC%25E4%25B8%259C|1613711947911; __jdu=16137119479101182770449; areaId=7; ipLoc-djd=7-458-466-0; PCSYCityID=CN_410000_0_0; shshshfpa=07383463-032f-3f99-9d40-639cb57c6e28-1613711950; shshshfpb=u8S9UvxK66gfIbM1mUNrIOg%3D%3D; user-key=153f6b4d-0704-4e56-82b6-8646f3f0dad4; cn=0; shshshfp=9a88944b34cb0ff3631a0a95907b75eb; __jdc=122270672; 3AB9D23F7A4B3C9B=SEELVNXBPU7OAA3UX5JTKR5LQADM5YFJRKY23Z6HDBU4OT2NWYGX525CKFFVHTRDJ7Q5DJRMRZQIQJOW5GVBY43XVI; jwotest_product=99; __jda=122270672.16137119479101182770449.1613711948.1613738165.1613748918.4; JSESSIONID=C06EC8D2E9384D2628AE22B1A6F9F8FC.s1; shshshsID=ab2ca3143928b1b01f6c5b71a15fcebe_5_1613750374847; __jdb=122270672.5.16137119479101182770449|4.1613748918',
    'Referer': 'https://www.jd.com/'
}

num = 0  # number of products
comments_num = 0  # number of comments
 
# Fetch product info and the SKU id
def getIndex(url):
    global num
    skuids = []
    session = requests.Session()
    session.headers = headers  # same dict object, so later changes to headers reach the session
    res = session.get(url, headers=headers)
    print(res.status_code)
    res.encoding = res.apparent_encoding
    soup = BeautifulSoup(res.text, 'lxml')
    items = soup.select('li.gl-item')
    for item in items[:3]:  # crawl 3 products as a test
        title = item.select_one('.p-name a em').text.strip().replace(' ', '')
        price = item.select_one('.p-price strong').text.strip().replace('￥', '')
        try:
            shop = item.select_one('.p-shopnum a').text.strip()  # shop selector used for books
        except AttributeError:
            shop = item.select_one('.p-shop a').text.strip()  # shop selector used for other products
        link = parse.urljoin('https://', item.select_one('.p-img a').get('href'))
        SkuId = re.search(r'\d+', link).group()
        skuids.append(([SkuId, session], None))
        headers['Referer'] = f'https://item.jd.com/{SkuId}.html'
        headers['Connection'] = 'keep-alive'
        comments_num = getCommentsNum(SkuId, session)  # number of comments
        print(SkuId, title, price, shop, link, comments_num)
        print("Saving the product to the database...")
        try:
            IntoGoods(SkuId, title, price, shop, link, comments_num)
        except Exception as e:
            print(e)
            sess_db.rollback()
        num += 1
    print("Fetching comments and saving them to the database...")
    pool2 = threadpool.ThreadPool(3)  # fetch the comments of 3 products at the same time
    task = threadpool.makeRequests(getComments, skuids)
    for r in task:
        pool2.putRequest(r)
    pool2.wait()
 
# Fetch all comments of one product
def getComments(SkuId, sess):
    # Get the total number of comment pages
    url1 = f'https://club.jd.com/comment/productPageComments.action?productId={SkuId}&score=0&sortType=5&page=0&pageSize=10'
    res2 = sess.get(url1, headers=headers)
    res2.encoding = res2.apparent_encoding
    json_data = json.loads(res2.text)
    max_page = json_data['maxPage']  # in testing, at most 100 pages of 10 comments each are available
    print("{} has {} pages of comments".format(SkuId, max_page))
    if max_page == 0:
        IntoComments(SkuId, '0')  # store '0' as a placeholder when there are no comment pages
    else:
        for i in range(0, max_page):
            # This URL returns the comments as JSON
            url2 = f'https://club.jd.com/comment/productPageComments.action?productId={SkuId}&score=0&sortType=5&page={i}&pageSize=10'
            # This URL returns the comments wrapped in a callback (not plain JSON), so the payload must be extracted
            # url2_2=f'https://club.jd.com/comment/productPageComments.action?callback=jQuery9287224&productId={SkuId}&score=0&sortType=5&page={i}&pageSize=10'
            print("Fetching comment page {}: {}".format(i + 1, url2))
            getComments_one(sess, SkuId, url2)
            time.sleep(1)  # wait 1 s between pages

# Fetch the total number of comments
def getCommentsNum(SkuId, sess):
    url = f'https://club.jd.com/comment/productCommentSummaries.action?referenceIds={SkuId}'
    res = sess.get(url)
    try:
        res.encoding = res.apparent_encoding
        json_data = json.loads(res.text)  # parse the JSON text into a dict
        num = json_data['CommentsCount'][0]['CommentCount']
        return num
    except Exception:
        return None  # stored as NULL; comments_num is an Integer column, so 'Error' would not fit

# Fetch a single page of comments
def getComments_one(sess, SkuId, url2):
    global comments_num
    res2 = sess.get(url2)
    res2.encoding = res2.apparent_encoding
    json_data = res2.text
    '''
    # When using url2_2, extract the JSON from the callback wrapper like this:
    start = res2.text.find('jQuery9287224(') + len('jQuery9287224(')
    end = res2.text.find(');')
    json_data=res2.text[start:end]
    '''
    dict_data = json.loads(json_data)  # raises ValueError when the page is blank
    comments = dict_data['comments']
    for item in comments:
        comment = item['content'].replace('\n', '')
        # print(comment)
        comments_num += 1
        try:
            IntoComments(SkuId, comment)
        except Exception as e:
            print(e)
            print("rollback!")
            sess_db.rollback()

# Save product info to the database
def IntoGoods(SkuId, title, price, shop, link, comments_num):
    goods_data = Goods(
        sku_id=SkuId,
        name=title,
        price=price,
        comments_num=comments_num,
        shop=shop,
        link=link
    )
    sess_db.add(goods_data)
    sess_db.commit()

# Save a comment to the database
def IntoComments(SkuId, comment):
    comments_data = Comments(
        sku_id=SkuId,
        comments=comment
    )
    sess_db.add(comments_data)
    sess_db.commit()
 
if __name__ == '__main__':
    start_time = time.time()
    urls = []
    KEYWORD = parse.quote(input("Enter the keyword to search for: "))
    for i in range(1, 2):  # crawl one page as a test
        url = f'https://search.jd.com/Search?keyword={KEYWORD}&wq={KEYWORD}&page={i}'
        urls.append(([url, ], None))  # argument format required by threadpool
    pool = threadpool.ThreadPool(2)  # thread pool with 2 threads
    reque = threadpool.makeRequests(getIndex, urls)  # create the tasks
    for r in reque:
        pool.putRequest(r)  # submit the task to the thread pool
    pool.wait()  # wait for all tasks to finish
    print("Fetched {} products and {} comments in {} s".format(num, comments_num, time.time() - start_time))
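The article stops at the point where blank pages return. One possible way to cope with them, which is not part of the original code and has not been tested against JD, is to treat an empty or unparseable body as a temporary failure and retry after a growing pause. The sketch below assumes a hypothetical fetch_text callable that performs the request and returns the response text; the retry count and delays are arbitrary.

```python
import json
import time
from typing import Callable, Optional


def fetch_json_with_retry(
    fetch_text: Callable[[], str],
    retries: int = 3,
    base_delay: float = 2.0,
) -> Optional[dict]:
    """Call fetch_text until it returns parseable JSON, waiting longer each time."""
    for attempt in range(retries):
        text = fetch_text().strip()
        if text:
            try:
                return json.loads(text)
            except json.JSONDecodeError:
                pass  # treat an unparseable body like a blank page
        time.sleep(base_delay * (2 ** attempt))  # 2 s, 4 s, 8 s with the defaults
    return None  # give up after the last attempt


# Simulated server: blank twice, then a real payload.
responses = iter(["", "", '{"maxPage": 1, "comments": []}'])
data = fetch_json_with_retry(lambda: next(responses), base_delay=0)
print(data)  # {'maxPage': 1, 'comments': []}
```

Returning None instead of raising lets the caller decide whether to skip the page or stop crawling that product.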
