How to get single-column answers in `textplot_xray()`?
I want to scan a multilingual parallel corpus to evaluate possible equivalences. For that I need `textplot_xray()` to return multiple answers in a single column.
In the first search, the word of Latin origin is used in the same way in English, Italian and Spanish, so some degree of equivalence can be read from the plot. This is not the case for French, where human => l'homme.
library(quanteda)
library(quanteda.corpora)    # provides data_corpus_udhr
library(quanteda.textplots)
corpusa <- data_corpus_udhr[c('ita', 'eng', 'eus', 'spa', 'fra')]
# recent quanteda releases expect kwic() to receive a tokens object
quanteda.textplots::textplot_xray(kwic(tokens(corpusa), pattern = '*uman*'))
Figure: results of the search across five languages (four with matches, one with none)
When searching more closely, I would like to summarise the equivalents in the one relevant column.
# one search term per language
bilaketa <- c('umani', 'human', 'giza', 'humanos', "l'homme")
quanteda.textplots::textplot_xray(kwic(tokens(corpusa), pattern = phrase(bilaketa)))
Results reducible to a single relevant column
Is there a way to resolve such queries?
Comments (1)
You can use a dictionary as the pattern in `kwic()`, although you will then get one column for the dictionary key as a whole rather than one for each individual (pattern) value, as in the five-column case.
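The dictionary approach described in this answer could be sketched as follows. This is a minimal illustration, not a tested solution for the full UDHR data: the key name `human` and the short toy sentences are assumptions made so the example is self-contained, and in practice `corpusa` would be the `data_corpus_udhr` subset from the question. Whether `l'homme` is matched depends on how the installed quanteda version tokenizes elisions.

```r
library(quanteda)
library(quanteda.textplots)

# Toy stand-in for the UDHR subset (assumption: replace with the real corpusa)
corpusa <- corpus(c(
  eng = "All human beings are born free. Human rights are universal.",
  ita = "Tutti gli esseri umani nascono liberi. I diritti umani sono universali.",
  spa = "Todos los seres humanos nacen libres. Los derechos humanos son universales.",
  eus = "Gizon-emakume guztiak aske jaiotzen dira. Giza eskubideak unibertsalak dira.",
  fra = "Tous les êtres humains naissent libres. Les droits de l'homme sont universels."
))
toks <- tokens(corpusa)

# A single dictionary key groups the language-specific equivalents;
# the key name "human" is hypothetical and only serves as the column label
dict <- dictionary(list(
  human = c("umani", "human", "giza", "humanos", "l'homme")
))

# Matches are reported under the dictionary key, not under each value
kw <- kwic(toks, pattern = dict)
unique(kw$pattern)

# One column per key, one row per document
textplot_xray(kw)
```

Adding further keys to the dictionary (for example one per concept to compare) would give one column per concept while still pooling the per-language variants inside each column.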