I want to categorize data so that content can be found by searching its tags. However, I also want a tag to have relationships with other tags, so that a search can also match related tags.
For example, I might have a database of metadata for pictures of living things, and I would like to tag each picture by species or even by breed. An image of a German Shepherd dog would be tagged German Shepherd. But if someone searches for Dog or Canine, that picture should be included in the search results too, because I have defined the relationships that German Shepherd is a Dog and Dog is also a Canine (and Canine is also a Mammal and an Animal, and so on).
As we can see, this is a complex arrangement of sets and subsets, and I don't know of any existing solution designed for this kind of system.
One solution is to model the hierarchy as a directed semantic graph and store it in relational table form.
The table would have to hold the expanded form of the graph: one row for every pair of terms connected by a directed path in the graph, not only directly linked terms, using two columns: specific and generic.
For example,
To go from a high level of the hierarchy to a low one, a single query on the generic term returns all the specific terms needed to search the tags, and vice versa. You don't need recursive queries.
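The example table from the original answer is not preserved here. As a minimal sketch of the idea, assuming a hypothetical two-column table named tag_closure, a hypothetical image_tag table, and sample tags taken from the question, the single non-recursive lookup might look like this in Python with SQLite:

```python
import sqlite3

# Hypothetical schema (not from the original answer):
#   tag_closure(specific, generic): one row per pair joined by a directed path
#   image_tag(image, tag): the tag actually stored on each picture
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tag_closure (specific TEXT, generic TEXT,
                              PRIMARY KEY (specific, generic));
    CREATE TABLE image_tag (image TEXT, tag TEXT);
""")
conn.executemany(
    "INSERT INTO tag_closure VALUES (?, ?)",
    [
        ("German Shepherd", "Dog"), ("German Shepherd", "Canine"),
        ("German Shepherd", "Mammal"), ("German Shepherd", "Animal"),
        ("Dog", "Canine"), ("Dog", "Mammal"), ("Dog", "Animal"),
        ("Wolf", "Canine"), ("Wolf", "Mammal"), ("Wolf", "Animal"),
        ("Canine", "Mammal"), ("Canine", "Animal"),
        ("Cat", "Mammal"), ("Cat", "Animal"),
        ("Mammal", "Animal"),
    ],
)
conn.executemany(
    "INSERT INTO image_tag VALUES (?, ?)",
    [("shepherd.jpg", "German Shepherd"), ("wolf.jpg", "Wolf"), ("cat.jpg", "Cat")],
)

def images_for(conn, term):
    """Images tagged with `term` itself or with any more specific term."""
    rows = conn.execute(
        """
        SELECT DISTINCT image FROM image_tag
        WHERE tag = :term
           OR tag IN (SELECT specific FROM tag_closure WHERE generic = :term)
        ORDER BY image
        """,
        {"term": term},
    )
    return [row[0] for row in rows]

print(images_for(conn, "Dog"))     # ['shepherd.jpg']
print(images_for(conn, "Canine"))  # ['shepherd.jpg', 'wolf.jpg']
```

The subquery is a plain lookup on the generic column, so the cost of walking the hierarchy is paid when the table is built rather than at search time.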
Of course, you can always devise a table containing just the direct relationships between pairs of terms:
Then use an algorithm that runs recursive queries to expand it and find all directed paths in the graph. The same algorithm can even be used to generate the expanded table automatically. This works for a simple hierarchy, but if your hierarchy has special cases, the expanded form with hand-tweaked exceptions would work better.
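The direct-relationship table from the original answer is also missing. The expansion step can be sketched as follows, assuming the direct links are available as (specific, generic) pairs; the function name expand and the sample rows are illustrative, not taken from the answer:

```python
from collections import defaultdict

def expand(direct_edges):
    """Return every (specific, generic) pair connected by a directed path."""
    parents = defaultdict(set)
    for specific, generic in direct_edges:
        parents[specific].add(generic)

    closure = set()
    for start in list(parents):
        stack = list(parents[start])
        seen = set()  # guards against cycles and repeated work
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            closure.add((start, node))
            stack.extend(parents.get(node, ()))
    return closure

# Hypothetical direct links, using the tags from the question.
direct = [
    ("German Shepherd", "Dog"),
    ("Dog", "Canine"),
    ("Canine", "Mammal"),
    ("Mammal", "Animal"),
]
expanded = expand(direct)
print(len(expanded))                              # 10
print(("German Shepherd", "Animal") in expanded)  # True
```

The resulting pairs could be bulk-inserted as the rows of the expanded table, and any special cases adjusted by hand afterwards. Databases that support recursive common table expressions can do the same expansion in SQL.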
This kind of search is done very often in databases designed for text search, such as Elasticsearch.
The type of index that these databases use is called an "inverted index", which is basically a highly compressed mapping from each search term to all the documents in which it appears. It is efficient for searching for many terms simultaneously.
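To make the idea concrete, here is a toy, uncompressed inverted index in plain Python. It only illustrates the data structure and is not how any particular product implements it; the function names and sample documents are assumptions.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each lowercased tag to the set of document ids that carry it."""
    index = defaultdict(set)
    for doc_id, tags in docs.items():
        for tag in tags:
            index[tag.lower()].add(doc_id)
    return index

def search_any(index, terms):
    """Return the ids of documents matching at least one of the terms."""
    hits = set()
    for term in terms:
        hits |= index.get(term.lower(), set())
    return sorted(hits)

# Hypothetical documents, using the tags from the question.
docs = {
    "shepherd.jpg": ["German Shepherd"],
    "wolf.jpg": ["Wolf"],
    "cat.jpg": ["Cat"],
}
index = build_inverted_index(docs)
print(search_any(index, ["German Shepherd", "Wolf"]))  # ['shepherd.jpg', 'wolf.jpg']
```

Each term costs one dictionary lookup, and the per-term results are merged with set unions, which is why searching for many terms at once stays cheap.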
Typically, when you search for text in a product like this, your search terms go through an analysis step that includes "stemming", which reduces each word to a root form so that variants of the same word match (for example, "searching" and "searched" both reduce to "search"). Many of these products can also expand a term with configured synonyms, so that a search for "solar" could also match "sun". This is pretty much exactly what you want to do.
If you have your own mapping from tags to related tags, then these kinds of search indexes/products will let you do the kind of search you want just by adding all the related tags to the query.
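A minimal sketch of that query expansion, assuming a hypothetical mapping called narrower from each tag to its directly more specific tags, and a simple tag-to-images index standing in for the search product:

```python
def expand_query(term, narrower):
    """Return the term plus every tag reachable below it in the hierarchy."""
    expanded, stack = set(), [term]
    while stack:
        tag = stack.pop()
        if tag not in expanded:
            expanded.add(tag)
            stack.extend(narrower.get(tag, ()))
    return expanded

def search(term, narrower, index):
    """OR together the results for the term and all of its related tags."""
    hits = set()
    for tag in expand_query(term, narrower):
        hits |= index.get(tag, set())
    return sorted(hits)

# Hypothetical tag hierarchy and index contents, based on the question.
narrower = {
    "Animal": ["Mammal"],
    "Mammal": ["Canine", "Feline"],
    "Canine": ["Dog", "Wolf"],
    "Dog": ["German Shepherd"],
    "Feline": ["Cat"],
}
index = {
    "German Shepherd": {"shepherd.jpg"},
    "Wolf": {"wolf.jpg"},
    "Cat": {"cat.jpg"},
}
print(search("Dog", narrower, index))     # ['shepherd.jpg']
print(search("Canine", narrower, index))  # ['shepherd.jpg', 'wolf.jpg']
```

With a real search product, the expanded set of tags would be sent as a single query that matches any of them, instead of being looked up in a Python dictionary.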