Solr-powered tag cloud

Posted on 2024-11-03 04:13:50

I seem to be stuck on the logic behind a tag cloud powered by Solr faceting. First of all, I'm using OpenNLP to parse my documents and extract the relevant words from them, so every document gets split into n words.
Here's basically what my Solr response looks like:

<docID>
<title>My Doc Title</title>
<content>My Doc Title</content>
<date_published>My Doc Title</date_published>
</docID>

I believe there must be a way to integrate the words in here. I first thought of something like this:

<docID>
<title>My Doc Title</title>
<content>My Doc Title</content>
<date_published>My Doc Title</date_published>
<words>word</words>
<words1>word1</words1>
<words2>word2</words2>
<words3>word3</words3>
<wordsN>wordN</wordsN>
</docID>

But faceting wouldn't be possible this way: I have no idea how many word fields I would get per docID, so the faceting would have to be done across fields (and I'm not even sure that's possible). I've been looking for possible answers, but I seem to be stuck. In the end, I need to facet on the n words of every single document in my index. Thoughts would be highly appreciated.

Comments (1)

So尛奶瓶 2024-11-10 04:13:50

I would suggest using a single multivalued words field that stores the list of words for each document.
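As a rough illustration of that suggestion, here is a minimal Python sketch of what one document might look like with a single multivalued field. The field name `words`, the `id` field, the helper `build_solr_doc`, and the sample values are all assumptions made for the example; it also assumes the schema declares `words` with `multiValued="true"`. In Solr's JSON update format a multivalued field is sent as a JSON array.

```python
import json


def build_solr_doc(doc_id, title, content, date_published, words):
    """Build one document dict with all extracted words in a single field.

    Hypothetical helper: field names mirror the question's example,
    plus an assumed `id` field and an assumed multivalued `words` field.
    """
    return {
        "id": doc_id,
        "title": title,
        "content": content,
        "date_published": date_published,
        # One multivalued field instead of words1..wordsN
        "words": list(words),
    }


doc = build_solr_doc(
    "doc-1",
    "My Doc Title",
    "Some body text",
    "2024-11-03T00:00:00Z",
    ["solr", "facet", "tag", "cloud"],
)

# A JSON array of documents, as it could be posted to an update handler
payload = json.dumps([doc], indent=2)
print(payload)
```

However many words OpenNLP extracts for a document, they all land in the same field, so the set of field names stays fixed.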

Having an unbounded number of word\d+ fields will complicate things.

If you use a single multivalued words field, you can facet on it to get all the words along with their frequencies, which should be enough to create the tag cloud.
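A sketch of how the facet counts could be turned into a tag cloud, under stated assumptions: the field is called `words`, the response is requested as JSON with the default flat facet layout (`[term, count, term, count, ...]`), and the function names, pixel sizes, and sample response below are made up for illustration. No request is actually sent; the code only builds the query parameters and parses a response-shaped dict.

```python
from urllib.parse import urlencode


def facet_query_params(field="words", limit=50):
    """Standard Solr facet parameters for one field; rows=0 skips the docs."""
    return {
        "q": "*:*",
        "rows": 0,
        "wt": "json",
        "facet": "true",
        "facet.field": field,
        "facet.limit": limit,
        "facet.mincount": 1,
    }


def parse_facet_counts(response, field="words"):
    """Turn the flat [term, count, term, count, ...] list into pairs."""
    flat = response["facet_counts"]["facet_fields"][field]
    return list(zip(flat[0::2], flat[1::2]))


def tag_sizes(counts, min_px=12, max_px=36):
    """Linearly scale each count to a font size between min_px and max_px."""
    if not counts:
        return {}
    lo = min(c for _, c in counts)
    hi = max(c for _, c in counts)
    span = hi - lo
    sizes = {}
    for term, count in counts:
        # If every count is equal, give all tags the largest size
        scale = (count - lo) / span if span else 1.0
        sizes[term] = round(min_px + scale * (max_px - min_px))
    return sizes


query_string = urlencode(facet_query_params())

# Hypothetical response fragment in the flat facet layout
sample_response = {
    "facet_counts": {
        "facet_fields": {"words": ["solr", 10, "facet", 6, "cloud", 2]}
    }
}
counts = parse_facet_counts(sample_response)
sizes = tag_sizes(counts)
print(sizes)
```

The counts from field faceting are the number of matching documents that contain each word, so a word repeated inside one document still counts once for that document.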
