Core Data model design: search and relationships?

Posted on 08-18 08:42

I'm familiar with the basics of Core Data and have dabbled with it, but I haven't built a major app yet. Now I need to plan one. The question is less about Core Data specifically than about data design in general, though I am going to implement it with Core Data on iPhone, which matters when considering performance.

Imagine I am making an email app, where emails are the core object. I need to provide multiple views into the email store: search by user as well as by many other criteria, for example "all emails with more than two recipients", "all emails where the subject is longer than X", "all emails containing word X", etc.

Some objects, like people (senders/recipients), lend themselves naturally to being modeled as first-class objects, so I could do that and just create many-to-many relations between people and emails. Other searches, such as some of the examples above, are more artificial and there is no natural way to model them. However, I am able to enumerate the searches in advance, i.e., I know beforehand what the criteria will be.

So, to do things like "emails with >2 recipients" and "emails where subject is longer than X", I think I have two strategies:

1) model these as special "search" objects, and create many-to-many relations between emails and search objects when inserting new emails into the store, so that searching becomes a simple join query;

2) not model anything beyond the core email object and just run searches with predicates against the store at runtime.
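To make the two strategies concrete, here is a minimal sketch in Python using plain SQLite rather than Core Data itself. It assumes the store behaves like a SQLite database; the table names, column names, search names, and the value standing in for "X" are all hypothetical and chosen only for illustration.

```python
import sqlite3

# Hypothetical schema: names are illustrative, not what Core Data generates.
SCHEMA = """
CREATE TABLE email (id INTEGER PRIMARY KEY, subject TEXT, body TEXT);
CREATE TABLE recipient (email_id INTEGER, address TEXT);
CREATE TABLE search_match (search_name TEXT, email_id INTEGER);
CREATE INDEX idx_match ON search_match (search_name, email_id);
"""

SUBJECT_LIMIT = 20  # assumed value standing in for "X"

# The criteria known in advance, written as plain Python checks.
SEARCHES = {
    "many_recipients": lambda subject, recipients: len(recipients) > 2,
    "long_subject": lambda subject, recipients: len(subject) > SUBJECT_LIMIT,
}


def insert_email(db, subject, body, recipients):
    """Insert an email and, for strategy 1, record which searches it matches."""
    cur = db.execute(
        "INSERT INTO email (subject, body) VALUES (?, ?)", (subject, body)
    )
    email_id = cur.lastrowid
    db.executemany(
        "INSERT INTO recipient VALUES (?, ?)",
        [(email_id, address) for address in recipients],
    )
    for name, matches in SEARCHES.items():
        if matches(subject, recipients):
            db.execute("INSERT INTO search_match VALUES (?, ?)", (name, email_id))
    return email_id


def precomputed(db, name):
    """Strategy 1: look up the relation that was filled in at insert time."""
    rows = db.execute(
        "SELECT email_id FROM search_match WHERE search_name = ? ORDER BY email_id",
        (name,),
    )
    return [row[0] for row in rows]


def runtime_many_recipients(db):
    """Strategy 2: evaluate 'more than two recipients' when the search runs."""
    rows = db.execute(
        "SELECT email_id FROM recipient GROUP BY email_id "
        "HAVING COUNT(*) > 2 ORDER BY email_id"
    )
    return [row[0] for row in rows]


def runtime_long_subject(db, limit=SUBJECT_LIMIT):
    """Strategy 2: evaluate 'subject longer than X' when the search runs."""
    rows = db.execute(
        "SELECT id FROM email WHERE length(subject) > ? ORDER BY id", (limit,)
    )
    return [row[0] for row in rows]


def demo():
    """Build a tiny in-memory store with two sample emails."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    insert_email(db, "Hi", "short note", ["a@example.com"])
    insert_email(
        db,
        "Quarterly planning meeting agenda",
        "details",
        ["a@example.com", "b@example.com", "c@example.com"],
    )
    return db
```

Both strategies return the same emails; the difference is where the work happens. Strategy 1 pays at insert time and has to keep `search_match` in sync whenever an email or a criterion changes, while strategy 2 pays on every search but needs no extra bookkeeping.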

My question is:

Based on your Core Data instincts, how big is the difference between these two strategies from a performance perspective? My gut tells me #1 will always be faster, but if the difference is only 10%, I am willing to take the performance hit in order to be more flexible with #2. But if #2 is 200% slower, I need to put more work into modeling the search objects and essentially pre-generating all the search results.

I know the exact answer will depend on the specifics of the data, but there must be a gut feeling you have :) Let's say there are on the order of tens of thousands, but not millions, of content objects, and each record is a few paragraphs of content text with several fields of metadata.
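For a data set of roughly this size, one way to get a number to put next to the gut feeling is to time both approaches on synthetic data. The sketch below again uses plain in-memory SQLite from Python, not Core Data on a device, so it can only hint at relative cost. The schema, the function names, the subject-length limit, and the generated text are all assumptions made for illustration.

```python
import random
import sqlite3
import time


def build_store(n_emails=20_000, limit=40, seed=0):
    """Create an in-memory store of synthetic emails (hypothetical schema)."""
    rng = random.Random(seed)
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE email (id INTEGER PRIMARY KEY, subject TEXT, body TEXT);
        CREATE TABLE long_subject (email_id INTEGER PRIMARY KEY);
        """
    )
    for i in range(1, n_emails + 1):
        subject = "s" * rng.randint(5, 80)
        body = "lorem ipsum " * rng.randint(20, 60)  # a few paragraphs of filler
        db.execute("INSERT INTO email VALUES (?, ?, ?)", (i, subject, body))
        # Strategy 1: record the match once, at insert time.
        if len(subject) > limit:
            db.execute("INSERT INTO long_subject VALUES (?)", (i,))
    db.commit()
    return db


def timed(db, sql, params=()):
    """Run a query, returning the matching ids and the elapsed seconds."""
    start = time.perf_counter()
    ids = [row[0] for row in db.execute(sql, params)]
    return ids, time.perf_counter() - start


def compare(n_emails=20_000, limit=40):
    """Time the precomputed lookup against the runtime predicate."""
    db = build_store(n_emails, limit)
    pre_ids, pre_s = timed(db, "SELECT email_id FROM long_subject ORDER BY email_id")
    run_ids, run_s = timed(
        db, "SELECT id FROM email WHERE length(subject) > ? ORDER BY id", (limit,)
    )
    assert pre_ids == run_ids  # both strategies must agree on the result
    return pre_s, run_s
```

Calling `compare()` returns the two elapsed times in seconds. The absolute values depend on the machine and say nothing about flash storage or Core Data's own overhead, so the same measurement would need to be repeated on the device with the real model before drawing conclusions.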
