Core Data model design: search and relationships?
I'm familiar with Core Data basics and have done some dabbling, but I haven't built any major apps with it. Now I need to plan one. The question is not specifically about Core Data but more about data design in general, though I am going to use Core Data to implement it on iPhone, which matters when considering performance.
Imagine I am making an email app, where emails are the core object. I need to provide multiple views into the email store: search by user as well as by many other criteria, say, "all emails with more than two recipients", "all emails where subject is longer than X", "all emails containing word X", etc.
Some objects, like people (senders/recipients), lend themselves naturally to being modeled as first-class objects, so I could do that and just create many-to-many relations between people and emails. Other searches, such as some of the examples above, are more artificial and there is no natural way to model them. However, I am able to enumerate the searches in advance, i.e., I know beforehand what the criteria will be.
So, to do things like "emails with >2 recipients" and "emails where subject is longer than X", I think I have two strategies:
1) model these as special "search" objects and create many-to-many relations between emails and search objects when inserting new objects into the store, so that searching is a simple join query;
2) not model anything beyond the core email object and just run searches against the store with predicates at runtime.
My question is:
Based on your Core Data instincts, how big is the difference between these two strategies from a performance perspective? My gut tells me #1 will always be faster, but if the difference is 10%, I am willing to take the performance hit in order to be more flexible with #2. But if #2 will be 200% slower, I need to put more work into modeling the search object and essentially pre-generating all the search results.
I know the exact answer will depend on the specifics of the data, but there must be a gut feeling you have :) Let's say there are on the order of tens of thousands, but not millions, of content objects, and each record is a few paragraphs of content text with several fields of metadata.
Typically, I would recommend going with strategy two and only spending time researching and developing other techniques if you actually run into performance issues during testing. Core Data is often faster than people think, especially on the iPhone.
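For what it's worth, here is a minimal sketch of what strategy two could look like. It is not a definitive implementation: the `Email` entity and its `subject`, `body` and `recipientCount` attributes are names I made up for illustration, the model is built in code, and an in-memory store is used only so the snippet stands on its own.

```swift
import CoreData

// Hypothetical helper: builds one required attribute for the sketch model.
func attribute(_ name: String, _ type: NSAttributeType) -> NSAttributeDescription {
    let a = NSAttributeDescription()
    a.name = name
    a.attributeType = type
    a.isOptional = false
    return a
}

// Assumed single-entity model; a real app would normally use an .xcdatamodeld file.
func makeModel() -> NSManagedObjectModel {
    let email = NSEntityDescription()
    email.name = "Email"
    email.properties = [
        attribute("subject", .stringAttributeType),
        attribute("body", .stringAttributeType),
        attribute("recipientCount", .integer64AttributeType),
    ]
    let model = NSManagedObjectModel()
    model.entities = [email]
    return model
}

// In-memory store, so nothing is written to disk.
func makeContext(model: NSManagedObjectModel) -> NSManagedObjectContext {
    let container = NSPersistentContainer(name: "Sketch", managedObjectModel: model)
    let description = NSPersistentStoreDescription()
    description.type = NSInMemoryStoreType
    container.persistentStoreDescriptions = [description]
    container.loadPersistentStores { _, error in
        if let error = error { fatalError("Store failed to load: \(error)") }
    }
    return container.viewContext
}

func insertEmail(subject: String, body: String, recipientCount: Int,
                 into context: NSManagedObjectContext) {
    let email = NSEntityDescription.insertNewObject(forEntityName: "Email", into: context)
    email.setValue(subject, forKey: "subject")
    email.setValue(body, forKey: "body")
    email.setValue(recipientCount, forKey: "recipientCount")
}

// Strategy two: nothing is precomputed; each search is a predicate applied at fetch time.
func emails(matching predicate: NSPredicate,
            in context: NSManagedObjectContext) throws -> [NSManagedObject] {
    let request = NSFetchRequest<NSManagedObject>(entityName: "Email")
    request.predicate = predicate
    return try context.fetch(request)
}

let context = makeContext(model: makeModel())
insertEmail(subject: "Lunch?", body: "Are you free at noon?", recipientCount: 1, into: context)
insertEmail(subject: "Invoice 42", body: "Please find the invoice attached.", recipientCount: 3, into: context)
insertEmail(subject: "Team offsite", body: "Agenda for next week.", recipientCount: 4, into: context)
try context.save()

let manyRecipients = try emails(matching: NSPredicate(format: "recipientCount > 2"), in: context)
let mentionsInvoice = try emails(matching: NSPredicate(format: "body CONTAINS[cd] %@", "invoice"), in: context)
print(manyRecipients.count, mentionsInvoice.count)
```

Adding a new search here is just a new predicate, which is the flexibility you mentioned. Wrapping the fetches in a timer against a realistically sized store would tell you whether you need anything more.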
However, if you are able to determine all the possible searches ahead of time, that does give you an advantage. It sounds like, as each email is created, you would check it and add it to all the appropriate "search" objects. My gut feeling is that strategy one would be significantly faster, especially at tens of thousands of email objects.
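A rough sketch of strategy one, under the same caveats: the `SavedSearch` entity, the `searches`/`emails` relationship names and the two criteria are all assumptions for illustration. Each new email is checked against the known criteria at insert time and linked to the matching search objects, so a later lookup only follows the relationship.

```swift
import CoreData

func attribute(_ name: String, _ type: NSAttributeType) -> NSAttributeDescription {
    let a = NSAttributeDescription()
    a.name = name
    a.attributeType = type
    return a
}

func toMany(_ name: String, to destination: NSEntityDescription) -> NSRelationshipDescription {
    let r = NSRelationshipDescription()
    r.name = name
    r.destinationEntity = destination
    r.minCount = 0
    r.maxCount = 0  // a maxCount of 0 makes the relationship to-many
    r.deleteRule = .nullifyDeleteRule
    return r
}

// Assumed model: Email <<->> SavedSearch (many-to-many).
func makeModel() -> NSManagedObjectModel {
    let email = NSEntityDescription()
    email.name = "Email"
    let search = NSEntityDescription()
    search.name = "SavedSearch"
    let searches = toMany("searches", to: search)
    let emails = toMany("emails", to: email)
    searches.inverseRelationship = emails
    emails.inverseRelationship = searches
    email.properties = [attribute("subject", .stringAttributeType),
                        attribute("recipientCount", .integer64AttributeType),
                        searches]
    search.properties = [attribute("name", .stringAttributeType), emails]
    let model = NSManagedObjectModel()
    model.entities = [email, search]
    return model
}

func makeContext(model: NSManagedObjectModel) -> NSManagedObjectContext {
    let container = NSPersistentContainer(name: "Sketch", managedObjectModel: model)
    let description = NSPersistentStoreDescription()
    description.type = NSInMemoryStoreType
    container.persistentStoreDescriptions = [description]
    container.loadPersistentStores { _, error in
        if let error = error { fatalError("Store failed to load: \(error)") }
    }
    return container.viewContext
}

// Strategy one: classify each email once, when it is inserted.
func insertEmail(subject: String, recipientCount: Int,
                 criteria: [String: NSPredicate],
                 searches: [String: NSManagedObject],
                 into context: NSManagedObjectContext) {
    let email = NSEntityDescription.insertNewObject(forEntityName: "Email", into: context)
    email.setValue(subject, forKey: "subject")
    email.setValue(recipientCount, forKey: "recipientCount")
    for (name, predicate) in criteria where predicate.evaluate(with: email) {
        if let search = searches[name] {
            email.mutableSetValue(forKey: "searches").add(search)
        }
    }
}

let context = makeContext(model: makeModel())

// The criteria known in advance (hypothetical names and thresholds).
let criteria: [String: NSPredicate] = [
    "manyRecipients": NSPredicate(format: "recipientCount > 2"),
    "longSubject": NSPredicate(format: "subject.length > 20"),
]

// One SavedSearch object per criterion.
var searches: [String: NSManagedObject] = [:]
for name in criteria.keys {
    let search = NSEntityDescription.insertNewObject(forEntityName: "SavedSearch", into: context)
    search.setValue(name, forKey: "name")
    searches[name] = search
}

insertEmail(subject: "Lunch?", recipientCount: 1, criteria: criteria, searches: searches, into: context)
insertEmail(subject: "Quarterly planning meeting agenda", recipientCount: 3, criteria: criteria, searches: searches, into: context)
insertEmail(subject: "Hi", recipientCount: 5, criteria: criteria, searches: searches, into: context)
try context.save()

// Searching is now a join on the relationship, not a scan over email contents.
let request = NSFetchRequest<NSManagedObject>(entityName: "Email")
request.predicate = NSPredicate(format: "ANY searches.name == %@", "manyRecipients")
let manyRecipients = try context.fetch(request)
let longSubject = searches["longSubject"]?.value(forKey: "emails") as? Set<NSManagedObject> ?? []
print(manyRecipients.count, longSubject.count)
```

The trade-off is the one you already identified: every insert does extra work, and a criterion added later means re-checking the emails that are already stored.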