如何建立“标签” 支持使用CouchDB吗?
我使用以下视图函数来迭代数据库中的所有项目(以便找到标签),但我认为如果数据集很大,性能会很差。 还有其他方法吗?
def by_tag(tag):
return '''
function(doc) {
if (doc.tags.length > 0) {
for (var tag in doc.tags) {
if (doc.tags[tag] == "%s") {
emit(doc.published, doc)
}
}
}
};
''' % tag
I'm using the following view function to iterate over all items in the database (in order to find a tag), but I think the performance is very poor if the dataset is large.
Any other approach?
def by_tag(tag):
return '''
function(doc) {
if (doc.tags.length > 0) {
for (var tag in doc.tags) {
if (doc.tags[tag] == "%s") {
emit(doc.published, doc)
}
}
}
};
''' % tag
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(4)
免责声明:我没有对此进行测试,也不知道它是否可以表现得更好。
创建单个 perm 视图:
并查询
_view/your_view/all?startkey=['your_tag_here']&endkey=['your_tag_here', {}]
生成的 JSON 结构会略有不同,但您仍然会得到发布日期排序。
Disclaimer: I didn't test this and don't know if it can perform better.
Create a single perm view:
And query with
_view/your_view/all?startkey=['your_tag_here']&endkey=['your_tag_here', {}]
Resulting JSON structure will be slightly different but you will still get the publish date sorting.
正如巴哈迪尔建议的那样,您可以定义一个永久视图。 但是,在进行此类索引时,不要输出每个键的文档。 相反,emit([tag, doc.published], null)。 在当前的发行版本中,您必须对每个文档进行单独的查找,但是 SVN trunk 现在支持在查询字符串中指定“include_docs=True”,并且 CouchDB 会自动将文档合并到您的视图中,而无需空间开销。
You can define a single permanent view, as Bahadir suggests. when doing this sort of indexing, though, don't output the doc for each key. Instead, emit([tag, doc.published], null). In current release versions you'd then have to do a separate lookup for each doc, but SVN trunk now has support for specifying "include_docs=True" in the query string and CouchDB will automatically merge the docs into your view for you, without the space overhead.
您的观点非常正确。 不过,有一个想法列表:
视图生成是增量的。 如果您的读取流量大于写入流量,那么您的视图根本不会引起问题。 对此感到担忧的人通常不应该如此。 参考框架,如果您将数百条记录转储到视图中而不进行更新,您应该担心。
发出整个文档会减慢速度。 您应该只发出使用视图所需的内容。
不确定 val == "%s" 的性能会如何,但你不应该想太多。 如果有标签数组,您应该发出标签。 当然,如果您期望标签数组包含非字符串,则忽略它。
You are very much on the right track with the view. A list of thoughts though:
View generation is incremental. If you're read traffic is greater than you're write traffic, then your views won't cause an issue at all. People that are concerned about this generally shouldn't be. Frame of reference, you should be worried if you're dumping hundreds of records into the view without an update.
Emitting an entire document will slow things down. You should only emit what is necessary for use of the view.
Not sure what the val == "%s" performance would be, but you shouldn't over think things. If there's a tag array you should emit the tags. Granted if you expect a tags array that will contain non-strings, then ignore this.
byTag 映射函数返回“key”中每个唯一标签的数据,然后返回
value
中带有该标签的每个帖子,因此当您获取 key =“mytag”时,它将检索带有该标签的所有帖子标签“mytag”。我已经针对大约 10 个条目对其进行了测试,每个查询似乎需要大约 0.0025 秒,不确定它对于大型数据集的效率如何。
The byTag map function returns the data with each unique tag in the "key", then each post with that tag in
value
, so when you grab key = "mytag", it will retrieve all posts with the tag "mytag".I've tested it against about 10 entries and it seems to take about 0.0025 seconds per query, not sure how efficient it is with large data sets..