Node import and performance question
For one of my clients, I have to import a CSV of Medicare plans provided by the government (part one provided here) into Drupal 7. The CSV has about 500,000 rows, most of which differ only in the FIPS county code field: every county in which a plan is available counts as one row.
Should I import all 500k rows into Drupal 7 as individual nodes, or create a single node per plan and put the FIPS codes associated with that plan in a multi-value text field? I opted for the latter route to begin with, but when I looked at the plan data, it turned out that some plans are available in more than 10,000 counties. I'd like to find the most efficient, Drupal-esque way to store all these plans and where they are available.
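Before choosing between the two layouts, it can help to check how many counties each plan covers and whether rows of the same plan really differ only by county code. The following is a minimal sketch of such a check in plain Python, outside Drupal. The column names (`plan_id`, `plan_name`, `premium`, `fips_county_code`) and the sample rows are made up for illustration; the real government CSV will have its own headers.

```python
import csv
import io
from collections import defaultdict


def summarize_plans(rows, plan_key="plan_id", county_key="fips_county_code"):
    """Group rows by plan; report county counts and columns that vary within a plan."""
    counties = defaultdict(set)   # plan id -> set of FIPS codes seen for it
    first_row = {}                # plan id -> first row seen for that plan
    varying = defaultdict(set)    # plan id -> columns whose values differ between rows
    for row in rows:
        plan = row[plan_key]
        counties[plan].add(row[county_key])
        base = first_row.setdefault(plan, row)
        for column, value in row.items():
            if column not in (plan_key, county_key) and value != base[column]:
                varying[plan].add(column)
    return {plan: (len(codes), sorted(varying[plan])) for plan, codes in counties.items()}


# Tiny inline sample standing in for the real file (hypothetical columns and values).
SAMPLE = """plan_id,plan_name,premium,fips_county_code
H0001,Alpha Care,10.00,01001
H0001,Alpha Care,10.00,01003
H0002,Beta Health,25.00,01001
H0002,Beta Health,27.50,01005
"""

summary = summarize_plans(csv.DictReader(io.StringIO(SAMPLE)))
for plan, (county_count, varying_columns) in summary.items():
    print(plan, county_count, varying_columns)
```

A plan whose list of varying columns is empty can safely collapse to one record plus a set of county codes; a plan with a non-empty list has rows that carry more than just a different county.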
Comments (2)
Generally, it is very useful to avoid storing duplicate data, so you are right: creating 500k rows as individual nodes is a bad idea. I would rather create two content types (using CCK), one for Medicare plans and one for FIPS county codes.
Then create a many-to-many relationship between them (using CCK Node Reference, and perhaps Corresponding node references if you need the relationship in both directions).
You can then create a view that will list all FIPS County codes attached to a particular Medicare Plan.
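To make the many-to-many shape concrete, here is a rough sketch using Python's built-in `sqlite3` module rather than Drupal itself. The table names (`plan`, `county`, `plan_county`), the column names, and the sample rows are all hypothetical; Drupal's node reference fields use their own storage tables, so this only illustrates the data model, not the actual schema.

```python
import sqlite3

# Flattened rows as they might come out of the CSV: (plan id, plan name, FIPS code).
rows = [
    ("H0001", "Alpha Care", "01001"),
    ("H0001", "Alpha Care", "01003"),
    ("H0002", "Beta Health", "01001"),
]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE plan (plan_id TEXT PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE county (fips TEXT PRIMARY KEY);
    CREATE TABLE plan_county (
        plan_id TEXT NOT NULL REFERENCES plan(plan_id),
        fips TEXT NOT NULL REFERENCES county(fips),
        PRIMARY KEY (plan_id, fips)
    );
""")

# Each plan and each county is stored once; only the pairing is stored per CSV row.
for plan_id, name, fips in rows:
    conn.execute("INSERT OR IGNORE INTO plan VALUES (?, ?)", (plan_id, name))
    conn.execute("INSERT OR IGNORE INTO county VALUES (?)", (fips,))
    conn.execute("INSERT OR IGNORE INTO plan_county VALUES (?, ?)", (plan_id, fips))


def counties_for_plan(plan_id):
    """Roughly what the view would list: all FIPS codes attached to one plan."""
    cursor = conn.execute(
        "SELECT fips FROM plan_county WHERE plan_id = ? ORDER BY fips", (plan_id,)
    )
    return [fips for (fips,) in cursor]


def plans_for_county(fips):
    """The reverse lookup: all plans available in one county."""
    cursor = conn.execute(
        "SELECT plan_id FROM plan_county WHERE fips = ? ORDER BY plan_id", (fips,)
    )
    return [plan_id for (plan_id,) in cursor]


print(counties_for_plan("H0001"))
print(plans_for_county("01001"))
```

The pairing table still has one row per plan and county combination, but those rows hold only two keys each, which is the part of this design that avoids repeating the plan details.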
I ended up going with a row per plan; as it turned out, there were subtle differences between them that I had missed. Thanks to all who answered!