使用 python 将非唯一行附加到另一个数据库
嘿大家, 我有两个数据库。一个有 145000 行,大约。 12 列。我有另一个数据库,约有 40000 行和 5 列。我正在尝试根据两列值进行比较。例如,如果在 CSV#1 中,第 1 列显示 100-199,第二列显示 Main St(意味着该行包含在主街的 100 个街区内),我将如何将其与 CSV# 中类似的两列进行比较2.我需要将 CSV#1 中的每一行与 CSV#2 中的每一行进行比较。如果存在匹配项,我需要将每个匹配行的 5 列附加到 CSV#2 行的末尾。因此,无论列的排序方式如何,CSV#2 的列数都会显着增加并具有重复条目。有关如何将两列与单独数据库中的另外两列进行比较,然后迭代所有行的任何建议。到目前为止,我一直在使用 python 和 import csv 来完成其余的工作,但这部分问题让我难住了。
提前致谢 -约翰
Hey all,
I have two databases. One with 145000 rows and approx. 12 columns. I have another database with around 40000 rows and 5 columns. I am trying to compare based on two columns values. For example if in CSV#1 column 1 says 100-199 and column two says Main St(meaning that this row is contained within the 100 block of main street), how would I go about comparing that with a similar two columns in CSV#2. I need to compare every row in CSV#1 to each single row in CSV#2. If there is a match I need to append the 5 columns of each matching row to the end of the row of CSV#2. Thus CSV#2's number of columns will grow significantly and have repeat entries, doesnt matter how the columns are ordered. Any advice on how to compare two columns with another two columns in a separate database and then iterate across all rows. I've been using python and the import csv so far with the rest of the work, but this part of the problem has me stumped.
Thanks in advance
-John
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
csv 文件不是数据库。 csv 文件只是文本块的行;适当的数据库(例如 PostgreSQL 或 Mysql 或 SQL Server 或 SQLite 或许多其他)为您提供正确的数据类型和表连接和索引以及行迭代以及对多个匹配和许多匹配的正确处理其他您确实不想从头开始重写的东西。
它怎么知道Address("100-199")==Address("Main Street")?您必须想出某种知识库,将每一位文本转换为规范的地址或地址范围,然后您可以进行比较;请参阅哪里有一个好的地址解析器,但要注意它处理的是单一地址(不是地址范围)。
编辑:
感谢斯文;如果您使用的是真实的数据库,您可以执行类似
if街道号码和街道名称完全匹配的操作;否则你仍然需要考虑上面的第 2 点。
A csv file is NOT a database. A csv file is just rows of text-chunks; a proper database (like PostgreSQL or Mysql or SQL Server or SQLite or many others) gives you proper data types and table joins and indexes and row iteration and proper handling of multiple matches and many other things which you really don't want to rewrite from scratch.
How is it supposed to know that Address("100-199")==Address("Main Street")? You will have to come up with some sort of knowledge-base which transforms each bit of text into a canonical address or address-range which you can then compare; see Where is a good Address Parser but be aware that it deals with singular addresses (not address ranges).
Edit:
Thanks to Sven; if you were using a real database, you could do something like
if streetnumber and streetname are exact matches; otherwise you still need to consider point #2 above.