MLS database schema
I'm trying to build an app that will take real estate MLS (Multiple Listing Service) data from a CSV and insert it into a database. I have the CSV parsing taken care of, but I'm having trouble making the database efficient. The problem is that MLS data providers are known to change the format of the property elements quickly and without much notice, so having one table that matches up one-to-one with the data would likely cause problems when loading data in the future.
It seems most developers put each element in its own row, i.e. my current setup:
id = int
property_id = longint
element_key = char
element_value = text
As you can imagine, this is very slow with thousands of properties, each with 80+ elements.
How can I make this more efficient but keep the database flexible?
And yes, I know about memcache and plan on using it.
Comments (2)
You are at the mercy of the data providers, unless there's a way to bring them under control. This has been the bane of database work for about fifty years now, and it isn't likely to change any time soon. The use of CSV has little to do with the underlying problem.
I suspect that it isn't just the format of the data that changes, but also the semantics of the data, even though you didn't say so.
Your best bet is to have one or more staging tables that will record the CSV data pretty much in the format you receive it. Be prepared to change these tables whenever the providers change things on you. Then write some procedures that transform this data into a suitable form for your base tables, and copy the transformed data to the base tables. These procedures will need periodic maintenance, but your base tables will remain more stable, as long as you don't have to add more information storing capabilities in order to match the changes offered by the suppliers.
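A minimal sketch of the staging-table idea, using Python's built-in sqlite3 and csv modules. The feed contents, the provider column names (ListingID, ListPrice, Beds, City), the table names and COLUMN_MAP are all hypothetical and only illustrate the shape of the approach; they are not taken from any real MLS feed.

```python
import csv
import io
import sqlite3

# Hypothetical feed: the column names are whatever the provider currently sends.
FEED = """ListingID,ListPrice,Beds,City
1001,250000,3,Austin
1002,,2,Dallas
"""


def to_int(text):
    """Convert a staged text value to an integer, treating blanks as missing."""
    return int(text) if text not in (None, "") else None


# Hypothetical mapping: stable base column -> (provider column, converter).
# When the provider renames or reformats a field, only this mapping changes.
COLUMN_MAP = {
    "property_id": ("ListingID", to_int),
    "price": ("ListPrice", to_int),
    "bedrooms": ("Beds", to_int),
    "city": ("City", str),
}


def load_staging(conn, feed_text):
    """Copy the CSV into a staging table as received, every column as text."""
    reader = csv.DictReader(io.StringIO(feed_text))
    cols = reader.fieldnames
    col_defs = ", ".join(f'"{c}" TEXT' for c in cols)
    conn.execute("DROP TABLE IF EXISTS staging_listing")
    conn.execute(f"CREATE TABLE staging_listing ({col_defs})")
    marks = ", ".join("?" for _ in cols)
    conn.executemany(
        f"INSERT INTO staging_listing VALUES ({marks})",
        ([row[c] for c in cols] for row in reader),
    )


def transform_to_base(conn):
    """Convert staged rows and upsert them into the stable base table."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS property (
               property_id INTEGER PRIMARY KEY,
               price INTEGER,
               bedrooms INTEGER,
               city TEXT)"""
    )
    cur = conn.execute("SELECT * FROM staging_listing")
    names = [d[0] for d in cur.description]
    targets = ", ".join(COLUMN_MAP)
    marks = ", ".join("?" for _ in COLUMN_MAP)
    for raw in cur.fetchall():
        record = dict(zip(names, raw))
        values = [conv(record[src]) for src, conv in COLUMN_MAP.values()]
        conn.execute(
            f"INSERT OR REPLACE INTO property ({targets}) VALUES ({marks})", values
        )
```

Calling load_staging(conn, FEED) followed by transform_to_base(conn) on a sqlite3 connection fills the property table. The staging table is rebuilt from whatever header the CSV carries, so a format change only touches COLUMN_MAP and the converters, not the base table or the queries against it.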
If your database has to be completely dynamic in order to keep up with the changing inputs, then you're stuck with some model like EAV, which is completely ignorant of the logical structure of the data. That ignorance makes it possible for EAV to be very dynamic, but it will create havoc for you when you try to turn the EAV data into meaningful information.
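To make that cost concrete, here is a rough sketch in Python with sqlite3 of what a simple search looks like against the key/value layout from the question. The table name property_element and the sample keys (price, bedrooms, city) are made up for illustration; the four columns follow the question's setup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE property_element (
           id INTEGER PRIMARY KEY,
           property_id INTEGER NOT NULL,
           element_key TEXT NOT NULL,
           element_value TEXT)"""
)
conn.executemany(
    "INSERT INTO property_element (property_id, element_key, element_value) "
    "VALUES (?, ?, ?)",
    [
        (1001, "price", "250000"), (1001, "bedrooms", "3"), (1001, "city", "Austin"),
        (1002, "price", "410000"), (1002, "bedrooms", "4"), (1002, "city", "Dallas"),
    ],
)


def find_properties(conn, min_bedrooms, max_price):
    """Pivot the key/value rows back into one row per property, then filter.

    Every attribute used in the search needs its own CASE expression and its
    own cast, because the table itself knows nothing about types or columns.
    """
    return conn.execute(
        """
        SELECT property_id, price, bedrooms, city FROM (
            SELECT property_id,
                   MAX(CASE WHEN element_key = 'price'
                            THEN CAST(element_value AS INTEGER) END) AS price,
                   MAX(CASE WHEN element_key = 'bedrooms'
                            THEN CAST(element_value AS INTEGER) END) AS bedrooms,
                   MAX(CASE WHEN element_key = 'city'
                            THEN element_value END) AS city
            FROM property_element
            GROUP BY property_id
        )
        WHERE bedrooms >= ? AND price <= ?
        ORDER BY property_id
        """,
        (min_bedrooms, max_price),
    ).fetchall()
```

With around 80 elements per property, the pivot grows by one expression per attribute the application needs, and a renamed key silently turns into NULLs rather than an error.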
It really depends what you want to do with the data. A document-style database plus a full-text indexer may be enough for you (really, just a persistent form of memcache). You would just store all the item data in a single row/document then, and unpack it when you need it.
Perhaps some of the stuff here may be useful.
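As a rough illustration of the single row/document idea, the sketch below stores each property's elements as one JSON string and unpacks it on read. It uses Python's sqlite3 and json modules as a stand-in for a real document store, leaves out the full-text indexer, and the table name property_doc and the helper names are assumptions made for this example.

```python
import json
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a one-table store: one row per property, one JSON doc."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS property_doc (
               property_id INTEGER PRIMARY KEY,
               doc TEXT NOT NULL)"""
    )
    return conn


def save_property(conn, property_id, elements):
    """Store all elements of a property as a single JSON document."""
    conn.execute(
        "INSERT OR REPLACE INTO property_doc (property_id, doc) VALUES (?, ?)",
        (property_id, json.dumps(elements, sort_keys=True)),
    )


def load_property(conn, property_id):
    """Fetch and unpack one property's document, or None if it is not stored."""
    row = conn.execute(
        "SELECT doc FROM property_doc WHERE property_id = ?", (property_id,)
    ).fetchone()
    return json.loads(row[0]) if row else None
```

Loading a property is then one row read instead of 80+, and a new or renamed element from the provider needs no schema change. The trade-off is that searching by an individual element needs a separate index, which is the role the full-text indexer would play.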