Analyzing a large table with SQLite
My table has 649,876,358 rows. I need to find all rows that share a duplicate value in one column and write each group to a separate file, but SQLite is working very slowly (after 10 minutes, not a single group had been written). What can I do to make this faster?
I'm using Python 3 + sqlite3:
import json
import os
import sqlite3 as sql

db = sql.connect(path_to_db)
with db:
    db_cursor = db.cursor()
    countLine = 649876358  # db_cursor.execute("SELECT COUNT(*) FROM 'T'").fetchall()[0][0]
    for i in range(1, countLine + 1):
        line = db_cursor.execute(f"SELECT * FROM 'T' WHERE ROWID = {i}").fetchall()[0]
        value = line[0]
        fileName = f"{path_to_save}{value}.json"
        if os.path.exists(fileName):
            continue
        allValue = db_cursor.execute(f"SELECT * FROM 'T' WHERE value = '{value}'").fetchall()
        if len(allValue) > 1:
            with open(fileName, "w") as file:
                json.dump(allValue, file)
There are a few things that would help, but first of all I would add an index to your database:
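A minimal sketch of the index, assuming the duplicate-checked column really is named `value` as in your queries (the schema below is hypothetical; building the index on 650 million rows will itself take a while, but it pays off on every later lookup):

```python
import sqlite3

db = sqlite3.connect(":memory:")  # use your path_to_db in practice
db.execute("CREATE TABLE T (value TEXT, payload TEXT)")  # hypothetical schema
# Without this index, every "WHERE value = ..." query scans all rows.
db.execute("CREATE INDEX IF NOT EXISTS idx_t_value ON T(value)")
db.commit()
```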
I just created a database with 3 columns (id, name, number) and 1 million rows like this (you have 600+ million rows):
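A sketch of how such a test table can be filled (column names follow the answer's description; the random values are made up):

```python
import random
import sqlite3

db = sqlite3.connect(":memory:")  # a file path in real use
db.execute("CREATE TABLE T (id INTEGER PRIMARY KEY, name TEXT, number INTEGER)")
# executemany with a generator streams the rows in without
# building a 1-million-element list in memory.
db.executemany(
    "INSERT INTO T (name, number) VALUES (?, ?)",
    ((f"name{random.randint(0, 99_999)}", random.randint(0, 9_999))
     for _ in range(1_000_000)),
)
db.commit()
```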
Searching for 10 names within my database was slow before the index; after the index was created, the same search returned almost immediately.
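One way to measure that difference yourself, on a smaller synthetic table so it runs quickly (exact timings vary by machine):

```python
import sqlite3
import time

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE T (id INTEGER PRIMARY KEY, name TEXT, number INTEGER)")
db.executemany(
    "INSERT INTO T (name, number) VALUES (?, ?)",
    ((f"name{i % 50_000}", i) for i in range(200_000)),
)
db.commit()

def search_ten(names):
    start = time.perf_counter()
    for n in names:
        db.execute("SELECT * FROM T WHERE name = ?", (n,)).fetchall()
    return time.perf_counter() - start

names = [f"name{i}" for i in range(10)]
before = search_ten(names)   # each lookup scans the whole table
db.execute("CREATE INDEX idx_t_name ON T(name)")
after = search_ten(names)    # each lookup walks the index instead
print(f"before index: {before:.4f}s  after index: {after:.4f}s")
```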
You can also set up a second cursor to do the second search rather than doing a count/range loop over ROWIDs. Your first query could be:
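A sketch of that first query: a single GROUP BY ... HAVING pass lists every duplicated value, instead of re-querying once per ROWID (table and column names follow your code; the sample rows are made up):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE T (value TEXT, payload TEXT)")
db.executemany(
    "INSERT INTO T VALUES (?, ?)",
    [("a", "1"), ("a", "2"), ("b", "3"), ("c", "4"), ("c", "5")],
)
# One pass over the table finds every value that occurs more than once.
dupes = db.execute(
    "SELECT value FROM T GROUP BY value HAVING COUNT(*) > 1 ORDER BY value"
).fetchall()
print(dupes)  # [('a',), ('c',)]
```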
Then use the results to generate a second query:
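A sketch of the second step: one cursor iterates the duplicated values while a second cursor fetches each group (the parameterized `?` placeholder also avoids the quoting bugs of the f-string query):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE T (value TEXT, payload TEXT)")
db.executemany(
    "INSERT INTO T VALUES (?, ?)",
    [("a", "1"), ("a", "2"), ("b", "3"), ("c", "4"), ("c", "5")],
)

first = db.cursor()
second = db.cursor()
groups = {}
for (value,) in first.execute(
    "SELECT value FROM T GROUP BY value HAVING COUNT(*) > 1"
):
    # In the real script, this is where each group would be json.dump'ed.
    groups[value] = second.execute(
        "SELECT * FROM T WHERE value = ?", (value,)
    ).fetchall()
```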
Finally, SQLite might not be the best database for this volume of rows. You could use a more performant database: dump the SQLite file and load it into something like MariaDB:
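For example, the sqlite3 command-line tool can dump the whole database to SQL text (the file names here are placeholders):

```shell
sqlite3 path_to_db .dump > dump.sql
```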
Then load the dump into your new database with a command like:
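Something like the following (user and database names are placeholders); note that a SQLite dump usually needs small syntax adjustments, such as removing PRAGMA lines, before MariaDB/MySQL will accept it:

```shell
mysql -u username -p your_database < dump.sql
```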