Data structure for storing word associations

Posted 2024-12-15 05:14:45

I'm trying to implement word prediction by analyzing sentences. Consider the following [rather boring] sentences:

Call ABC
Call ABC again
Call DEF

I'd like to have a data structure for the above sentences as follows:

Call: (ABC, 2), (again, 1), (DEF, 1)
ABC: (Call, 2), (again, 1)
again: (Call, 1), (ABC, 1)
DEF: (Call, 1)

In general, Word: (Word_it_appears_with, Frequency), ...

Please note the inherent redundancy in this type of data: if the frequency of ABC under Call is 2, then the frequency of Call under ABC is also 2. How do I optimize this?

The idea is to use this data when a new sentence is being typed. For example, if Call has been typed, from the data, it's easy to say ABC is more likely to be present in the sentence, and offer it as the first suggestion, followed by again and DEF.

I realise this is one of a million possible ways of implementing prediction, and I eagerly look forward to suggestions of other ways to do it.

Thanks

Comments (3)

浅唱ヾ落雨殇 2024-12-22 05:14:45

Maybe use a bidirectional (undirected) graph. You can store the words as nodes and the co-occurrence frequencies as edge weights, so each pair is recorded only once.
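
A minimal sketch of the graph idea, assuming plain Java with standard-library collections only. The class name WordGraph and its methods are hypothetical, and the tab separator assumes that words never contain a tab. Each pair's count is stored once under a canonically ordered key, which removes the redundancy the question mentions; only the neighbor names are kept per word.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: an undirected weighted graph over words.
final class WordGraph {
    // One entry per unordered pair; the value is the edge weight (frequency).
    private final Map<String, Integer> edgeWeights = new HashMap<>();
    // Adjacency index: word -> names of its neighbors (no counts duplicated here).
    private final Map<String, Set<String>> neighbors = new HashMap<>();

    // Canonical key: the two words in sorted order, joined by a tab.
    // Assumption: a tab never occurs inside a word.
    private static String key(String a, String b) {
        return a.compareTo(b) <= 0 ? a + "\t" + b : b + "\t" + a;
    }

    // Record that a and b appeared together once.
    void addCooccurrence(String a, String b) {
        edgeWeights.merge(key(a, b), 1, Integer::sum);
        neighbors.computeIfAbsent(a, k -> new HashSet<>()).add(b);
        neighbors.computeIfAbsent(b, k -> new HashSet<>()).add(a);
    }

    // Frequency of the pair, regardless of argument order; 0 if never seen.
    int weight(String a, String b) {
        return edgeWeights.getOrDefault(key(a, b), 0);
    }

    // Words that have appeared together with the given word.
    Set<String> neighborsOf(String word) {
        return neighbors.getOrDefault(word, Collections.emptySet());
    }

    // Number of distinct pairs stored.
    int edgeCount() {
        return edgeWeights.size();
    }
}
```

The trade-off is an extra key construction on every lookup in exchange for having a single count to update per pair.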

悲喜皆因你 2024-12-22 05:14:45

You can use the following data structure too:

Map<String, Map<String, Long>>
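
As a rough illustration of how this nested map could be filled and queried, here is a sketch that assumes whitespace-separated words and standard-library Java only. The class CooccurrenceCounts and its method names are hypothetical, not part of the answer above. Note that this layout keeps the redundancy from the question: every pair is counted in both rows.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; names are illustrative only.
final class CooccurrenceCounts {
    // word -> (word it appears with -> frequency)
    private final Map<String, Map<String, Long>> counts = new HashMap<>();

    // Count every ordered pair of distinct positions in one sentence.
    void addSentence(String sentence) {
        String[] words = sentence.trim().split("\\s+");
        for (int i = 0; i < words.length; i++) {
            for (int j = 0; j < words.length; j++) {
                if (i != j) {
                    counts.computeIfAbsent(words[i], k -> new HashMap<>())
                          .merge(words[j], 1L, Long::sum);
                }
            }
        }
    }

    // How often b was seen in a sentence with a; 0 if never.
    long frequency(String a, String b) {
        Map<String, Long> row = counts.getOrDefault(a, Collections.emptyMap());
        return row.getOrDefault(b, 0L);
    }

    // Words seen with the given word, most frequent first.
    List<String> suggest(String word) {
        Map<String, Long> row = counts.getOrDefault(word, Collections.emptyMap());
        List<String> result = new ArrayList<>(row.keySet());
        result.sort((x, y) -> {
            int byCount = Long.compare(row.get(y), row.get(x)); // descending frequency
            return byCount != 0 ? byCount : x.compareTo(y);     // ties: String order
        });
        return result;
    }
}
```

After feeding in the three example sentences, suggest("Call") puts ABC first; the other two words tie at a frequency of 1 and are ordered by String.compareTo.
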

抱着落日 2024-12-22 05:14:45

I would consider one of two options:

Option 1:

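// Multimap is assumed to be Guava's com.google.common.collect.Multimap.
class Freq {
    String otherWord; // the word this one appears with
    int freq;         // how often the pair was seen
}

Multimap<String, Freq> mymap;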

or maybe a Table (the value type must be Integer, since Java generics do not accept primitives)

Table<String, String, Integer>

Given the Freq class above, you might want to make the mapping bidirectional:

class Freq {
    String thisWord;
    int otherFreq;
    Freq otherWord;
}

This would allow for very quick updating of data pairs, since each entry can reach its counterpart directly.
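
One possible reading of the bidirectional Freq idea, sketched with standard-library Java. This is an interpretation rather than the answer's exact design: the names PairEntry, mirror, and LinkedPairIndex are assumptions. Each entry keeps a reference to the entry for the same pair filed under the other word, so both counts are updated after a single lookup.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical entry type: one direction of a word pair.
final class PairEntry {
    final String otherWord; // the co-occurring word
    int freq;               // how often the pair was seen
    PairEntry mirror;       // the same pair, as filed under otherWord

    PairEntry(String otherWord) {
        this.otherWord = otherWord;
    }
}

// Hypothetical index: word -> (other word -> entry).
final class LinkedPairIndex {
    private final Map<String, Map<String, PairEntry>> index = new HashMap<>();

    // Record one co-occurrence of a and b; a word paired with itself is ignored.
    void add(String a, String b) {
        if (a.equals(b)) {
            return;
        }
        Map<String, PairEntry> rowA = index.computeIfAbsent(a, k -> new HashMap<>());
        PairEntry ab = rowA.get(b);
        if (ab == null) {
            // First sighting: create both directions and link them to each other.
            ab = new PairEntry(b);
            PairEntry ba = new PairEntry(a);
            ab.mirror = ba;
            ba.mirror = ab;
            rowA.put(b, ab);
            index.computeIfAbsent(b, k -> new HashMap<>()).put(a, ba);
        }
        // One lookup, two updates: the mirror is reached through the link.
        ab.freq++;
        ab.mirror.freq++;
    }

    // Frequency of b as seen from a; 0 if the pair was never recorded.
    int frequency(String a, String b) {
        Map<String, PairEntry> row = index.get(a);
        PairEntry entry = (row == null) ? null : row.get(b);
        return entry == null ? 0 : entry.freq;
    }
}
```

The counts are still stored twice here; the link only makes keeping them in sync cheap.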
