Data structure for storing word associations

Posted on 2024-12-15 05:14:45

I'm trying to implement prediction by analyzing sentences. Consider the following [rather boring] sentences:

Call ABC
Call ABC again
Call DEF

I'd like to have a data structure for the above sentences as follows:

Call: (ABC, 2), (again, 1), (DEF, 1)
ABC: (Call, 2), (again, 1)
again: (Call, 1), (ABC, 1)
DEF: (Call, 1)

In general, Word: (Word_it_appears_with, Frequency), ...

Please note the inherent redundancy in this type of data. Obviously, if the frequency of ABC is 2 under Call, the frequency of Call is 2 under ABC. How do I optimize this?

The idea is to use this data when a new sentence is being typed. For example, if Call has been typed, from the data, it's easy to say ABC is more likely to be present in the sentence, and offer it as the first suggestion, followed by again and DEF.

I realise this is one of a million possible ways of implementing prediction, and I eagerly look forward to suggestions of other ways to do it.

Thanks

浅唱ヾ落雨殇 2024-12-22 05:14:45

Maybe use a bidirectional graph. You can store the words as nodes and the frequencies as edge weights.
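As a rough illustration of the graph idea, here is a minimal Java sketch. It assumes that words are separated by whitespace and that a pair is counted at most once per sentence; the names WordGraph, addSentence, weight and neighborsOf are made up for this example. Each unordered pair is stored exactly once under a sorted key, which would remove the redundancy the question mentions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: an undirected weighted graph of word co-occurrences.
class WordGraph {
    // Each unordered pair is stored once, keyed by the two words in sorted order.
    private final Map<List<String>, Integer> edgeWeights = new HashMap<>();
    // Adjacency sets hold neighbor names only; the counts live in edgeWeights.
    private final Map<String, Set<String>> neighbors = new HashMap<>();

    // Canonical key so that (Call, ABC) and (ABC, Call) map to the same edge.
    private static List<String> key(String a, String b) {
        return a.compareTo(b) <= 0 ? Arrays.asList(a, b) : Arrays.asList(b, a);
    }

    // Assumption: whitespace-separated words, each distinct pair counted once per sentence.
    void addSentence(String sentence) {
        List<String> words = new ArrayList<>(
                new LinkedHashSet<>(Arrays.asList(sentence.trim().split("\\s+"))));
        for (int i = 0; i < words.size(); i++) {
            for (int j = i + 1; j < words.size(); j++) {
                String a = words.get(i);
                String b = words.get(j);
                edgeWeights.merge(key(a, b), 1, Integer::sum);
                neighbors.computeIfAbsent(a, k -> new HashSet<>()).add(b);
                neighbors.computeIfAbsent(b, k -> new HashSet<>()).add(a);
            }
        }
    }

    // Symmetric by construction: weight(a, b) == weight(b, a).
    int weight(String a, String b) {
        return edgeWeights.getOrDefault(key(a, b), 0);
    }

    Set<String> neighborsOf(String word) {
        return neighbors.getOrDefault(word, Collections.emptySet());
    }
}
```

The trade-off is one extra lookup per neighbor when ranking suggestions, in exchange for keeping a single count per pair.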

悲喜皆因你 2024-12-22 05:14:45

You can use the following data structure too:

Map<String, Map<String, Long>>
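To make the nested-map suggestion concrete, here is a small sketch of how it might be filled and queried. The class and method names (CooccurrenceIndex, addSentence, suggest) are invented for illustration, and the sketch assumes whitespace-separated words with each pair counted once per sentence. The inner maps are LinkedHashMap instances so that ties keep first-seen order; that is a choice made here, not something the answer specifies.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch of the Map<String, Map<String, Long>> layout.
class CooccurrenceIndex {
    private final Map<String, Map<String, Long>> counts = new HashMap<>();

    // Records every ordered pair (a, b) with a != b, so each count is stored twice.
    void addSentence(String sentence) {
        List<String> words = new ArrayList<>(
                new LinkedHashSet<>(Arrays.asList(sentence.trim().split("\\s+"))));
        for (String a : words) {
            for (String b : words) {
                if (!a.equals(b)) {
                    counts.computeIfAbsent(a, k -> new LinkedHashMap<>())
                          .merge(b, 1L, Long::sum);
                }
            }
        }
    }

    // Companion words, most frequent first; the stable sort keeps first-seen order on ties.
    List<String> suggest(String word) {
        return counts.getOrDefault(word, Collections.emptyMap()).entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```

After adding the three sentences from the question, suggest("Call") lists ABC first, followed by again and DEF. This layout keeps the redundancy but makes the lookup for a typed word a single map access.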

抱着落日 2024-12-22 05:14:45

I would consider one of two options:

Option 1:

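class Freq {
    String otherWord; // the word that co-occurs with the map key
    int freq;         // how often the two words appear together
}

// Multimap here is presumably Guava's com.google.common.collect.Multimap
Multimap<String, Freq> mymap;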

Option 2, maybe a Table:

// Generic type arguments must be reference types, so Integer rather than int
Table<String, String, Integer>

Given the above Freq: you might want to do bi-directional mapping:

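class Freq {
    String thisWord;  // the word this entry belongs to
    int otherFreq;    // the co-occurrence count for the pair
    Freq otherWord;   // link to the mirrored entry held by the other word
}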

This would allow for very quick updating of data pairs.
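One way to read the bi-directional idea is that both words point at the same mutable counter, so a single increment updates both directions. The sketch below takes that interpretation, which is a simplification of the linked Freq objects above rather than the answer's exact design. PairCount, SharedPairIndex, increment and count are hypothetical names, and only the standard library is used.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical shared counter for one unordered pair of words.
class PairCount {
    final String first;
    final String second;
    int count;

    PairCount(String first, String second) {
        this.first = first;
        this.second = second;
    }
}

// Hypothetical index in which both words of a pair reference the same PairCount.
class SharedPairIndex {
    private final Map<String, Map<String, PairCount>> byWord = new HashMap<>();

    // Creates the shared counter on first sight of the pair, then bumps it once.
    void increment(String a, String b) {
        PairCount pc = byWord.computeIfAbsent(a, k -> new HashMap<>()).get(b);
        if (pc == null) {
            pc = new PairCount(a, b);
            byWord.get(a).put(b, pc);
            byWord.computeIfAbsent(b, k -> new HashMap<>()).put(a, pc);
        }
        pc.count++; // visible from both directions because the object is shared
    }

    // Returns 0 for pairs that have never been seen together.
    int count(String from, String to) {
        Map<String, PairCount> row = byWord.get(from);
        PairCount pc = (row == null) ? null : row.get(to);
        return pc == null ? 0 : pc.count;
    }
}
```

Here the count itself exists once per pair, while the two map entries that reach it are cheap references.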

Author: 自在安然
