Data structure for storing word associations
I'm trying to implement word prediction by analyzing sentences. Consider the following [rather boring] sentences:
Call ABC
Call ABC again
Call DEF
I'd like to have a data structure for the above sentences as follows:
Call: (ABC, 2), (again, 1), (DEF, 1)
ABC: (Call, 2), (again, 1)
again: (Call, 1), (ABC, 1)
DEF: (Call, 1)
In general: Word: (Word_it_appears_with, Frequency), ...
Please note the inherent redundancy in this type of data. Obviously, if the frequency of ABC is 2 under Call, the frequency of Call is 2 under ABC. How do I optimize this?
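One common way to remove the redundancy is to store each unordered pair only once, under a canonical key such as the two words in sorted order. This is a minimal Python sketch, not a definitive implementation: the helper names (`pair_key`, `count_pairs`, `frequency`) are hypothetical, and it assumes two words "appear with" each other when they occur in the same sentence, which reproduces the counts listed above.

```python
from __future__ import annotations

from collections import Counter
from itertools import combinations


def pair_key(a: str, b: str) -> tuple[str, str]:
    """Order the two words so (a, b) and (b, a) map to the same key."""
    return (a, b) if a <= b else (b, a)


def count_pairs(sentences: list[str]) -> Counter:
    """Count how many sentences each unordered word pair appears in together."""
    counts: Counter = Counter()
    for sentence in sentences:
        # Drop repeated words within one sentence, keeping first-seen order.
        words = list(dict.fromkeys(sentence.split()))
        for a, b in combinations(words, 2):
            counts[pair_key(a, b)] += 1
    return counts


def frequency(counts: Counter, a: str, b: str) -> int:
    """Look up a pair count regardless of argument order (0 if never seen)."""
    return counts[pair_key(a, b)]


counts = count_pairs(["Call ABC", "Call ABC again", "Call DEF"])
print(frequency(counts, "Call", "ABC"))  # 2
print(frequency(counts, "ABC", "Call"))  # 2
```

The trade-off: each count is stored once instead of twice, but listing every partner of a single word now means scanning all pairs, so for suggestions you may still want a per-word index alongside it.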
The idea is to use this data while a new sentence is being typed. For example, if Call has been typed, the data makes it easy to tell that ABC is the most likely word to appear in the sentence, so it can be offered as the first suggestion, followed by "again" and DEF.
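A minimal sketch of the suggestion step, built on the per-word structure from the question. The scoring rule is an assumption, not something stated above: each candidate's score is the sum of its co-occurrence counts with the words typed so far. `build_index` and `suggest` are hypothetical names.

```python
from collections import defaultdict
from itertools import combinations


def build_index(sentences):
    """Map each word to {co-occurring word: count}, as in the question."""
    index = defaultdict(lambda: defaultdict(int))
    for sentence in sentences:
        words = list(dict.fromkeys(sentence.split()))
        for a, b in combinations(words, 2):
            index[a][b] += 1
            index[b][a] += 1
    return index


def suggest(index, typed):
    """Rank candidate words by summed co-occurrence with the typed words."""
    typed_words = typed.split()
    already_typed = set(typed_words)
    scores = defaultdict(int)
    for word in typed_words:
        # .get avoids creating empty entries for unknown words.
        for other, count in index.get(word, {}).items():
            if other not in already_typed:
                scores[other] += count
    # Highest score first; sorted() is stable, so ties keep first-seen order.
    return sorted(scores, key=lambda w: -scores[w])


index = build_index(["Call ABC", "Call ABC again", "Call DEF"])
print(suggest(index, "Call"))  # ['ABC', 'again', 'DEF']
```

Summing across typed words lets later words refine the ranking: after "Call ABC", "again" (seen with both) outranks DEF (seen only with Call).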
I realise this is one of a million possible ways of implementing prediction, and I eagerly look forward to suggestions of other ways to do it.
Thanks
Comments (3)
Maybe use a bidirectional graph: store the words as nodes and the frequencies as edge weights.
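A minimal sketch of such a graph in Python, assuming an edge links two words whenever they occur in the same sentence. Both endpoints reference one shared `Edge` object, so each frequency is stored once, which also addresses the redundancy raised in the question. `Edge` and `WordGraph` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import combinations


@dataclass
class Edge:
    """An undirected edge; weight is the co-occurrence frequency."""
    a: str
    b: str
    weight: int = 0


class WordGraph:
    """Words are nodes; each co-occurring pair shares a single Edge object."""

    def __init__(self) -> None:
        # node -> {neighbour: Edge}; both endpoints point at the same Edge.
        self.adj: dict[str, dict[str, Edge]] = {}

    def add_sentence(self, sentence: str) -> None:
        words = list(dict.fromkeys(sentence.split()))
        for a, b in combinations(words, 2):
            edge = self.adj.get(a, {}).get(b)
            if edge is None:
                edge = Edge(a, b)
                self.adj.setdefault(a, {})[b] = edge
                self.adj.setdefault(b, {})[a] = edge
            edge.weight += 1  # one increment is visible from both ends

    def neighbours(self, word: str) -> list[tuple[str, int]]:
        """Neighbours of word, heaviest edge first."""
        edges = self.adj.get(word, {})
        return sorted(((w, e.weight) for w, e in edges.items()), key=lambda p: -p[1])


g = WordGraph()
for s in ["Call ABC", "Call ABC again", "Call DEF"]:
    g.add_sentence(s)
print(g.neighbours("Call"))  # [('ABC', 2), ('again', 1), ('DEF', 1)]
```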
You can use the following data structure too:
I would consider one of two options:
Option 1:
or maybe a Table
Given the above Freq, you might want to use a bidirectional mapping:
This would allow for very quick updating of data pairs.
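The code for both options is missing from this answer. As a purely hypothetical illustration, not the answerer's code, here is one reading of "table" plus "bidirectional mapping" in Python: give each word an integer id, keep both word-to-id and id-to-word lookups, and store pair counts in a sparse table keyed by (lower id, higher id). `PairTable` and its methods are invented for this sketch.

```python
from __future__ import annotations

from itertools import combinations


class PairTable:
    """Hypothetical: word <-> id maps plus a sparse table of pair counts."""

    def __init__(self) -> None:
        self.word_to_id: dict[str, int] = {}
        self.id_to_word: list[str] = []
        # Upper-triangular sparse table: (low id, high id) -> count.
        self.counts: dict[tuple[int, int], int] = {}

    def word_id(self, word: str) -> int:
        """Return the id for word, assigning the next free id if it is new."""
        if word not in self.word_to_id:
            self.word_to_id[word] = len(self.id_to_word)
            self.id_to_word.append(word)
        return self.word_to_id[word]

    def add_sentence(self, sentence: str) -> None:
        ids = list(dict.fromkeys(self.word_id(w) for w in sentence.split()))
        for i, j in combinations(ids, 2):
            key = (i, j) if i < j else (j, i)
            self.counts[key] = self.counts.get(key, 0) + 1

    def frequency(self, a: str, b: str) -> int:
        """Pair count in either order; 0 if either word is unknown."""
        i, j = self.word_to_id.get(a), self.word_to_id.get(b)
        if i is None or j is None:
            return 0
        return self.counts.get((min(i, j), max(i, j)), 0)


t = PairTable()
for s in ["Call ABC", "Call ABC again", "Call DEF"]:
    t.add_sentence(s)
print(t.frequency("ABC", "Call"))  # 2
```

Each update is a constant number of dictionary operations, which fits the answer's point about quick updates of data pairs.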