新学python,在做一个把爬虫爬下来的股票数据从文件插入到mysql数据库,速度很慢,不知道瓶颈是文件处理还是sql插入问题,望大神不吝赐教,感激不尽!
# encoding: utf-8
#!/usr/bin/python
import MySQLdb
import os
db = MySQLdb.connect(host="localhost",user="root",passwd="4242",\
db="STOCKS", charset="utf8", use_unicode=True)
cursor = db.cursor()
files = os.listdir("shanghaiHistory")
for fileName in files:
print "start insert " + fileName
isTheFirstLine = True
for line in open("shanghaiHistory/" + fileName):
if isTheFirstLine:
isTheFirstLine = False
continue
lineList = line.split(',')
lineList.insert(0,fileName[:6])
sql = "insert into shanghaiHistory(Code,Date,Open,High,Low,Close,Volume,AdjClose)\
values ('%s','%s','%s','%s','%s','%s','%s','%s')" % tuple(lineList)
try:
cursor.execute(sql)
db.commit()
except:
db.rollback()
db.close()
文件格式是形如:
Date,Open,High,Low,Close,Volume,Adj Close
2012-12-21,0.9151,0.9151,0.9151,0.9151,000,0.9151
2012-12-20,0.896,0.898,0.896,0.898,22924300,0.898
2012-12-19,0.895,0.898,0.895,0.897,37858100,0.897
2012-12-18,0.896,0.896,0.895,0.896,43542500,0.896
2012-12-17,0.895,0.897,0.895,0.896,41877700,0.896
2012-12-14,0.895,0.897,0.894,0.897,35734700,0.897
2012-12-13,0.894,0.896,0.894,0.895,11613800,0.895
2012-12-12,0.89,0.895,0.89,0.894,26586500,0.894
两个多小时才插入45W条数据,怎么才能提速一点呢.
Copyright 2014-2025 https://www.php.cn/ All Rights Reserved | php.cn | 湘ICP备2023035733号
很常识性的优化:批量插入,然后再提交,而不是插一条提交一条。
另一个常识性的优化:使用 executemany 来一次执行多条(但也不要太多)行的插入。
最后,但是也是非常重要的,请尽量不要自己拼接 SQL 语句!
.execute方法可以接受两个参数,第一个是 SQL 模板,第二个是参数值的 tuple。加粗文字