How do I get single-column answers in `textplot_xray()`?

Posted on 2025-02-05 05:08:29

I want to scan a multilingual parallel corpus to evaluate possible equivalences. For that I need textplot_xray() to return multiple answers in a single column.

In the first search, the word of Latin origin is used in much the same form in English, Italian and Spanish, so a single wildcard pattern already suggests some degree of equivalence. That is not the case for French, where human => l'homme.

# require(quanteda)
# require(quanteda.corpora)
# require(quanteda.textplots)
corpusa <- data_corpus_udhr[c('ita', 'eng', 'eus', 'spa', 'fra')]
quanteda.textplots::textplot_xray(kwic(x = tokens(corpusa), pattern = '*uman*'))

Results of the search across five languages: matches in four, no result in one

When I search more precisely, with one term per language, I would like to summarise the equivalents in a single relevant column.

bilaketa <- c('umani', 'human', 'giza', 'humanos', "l'homme")
quanteda.textplots::textplot_xray(kwic(tokens(corpusa), pattern = phrase(bilaketa)))

Results spread over five columns that should be reducible to a single relevant column

Is there a way to resolve such queries?

无所的.畏惧 2025-02-12 05:08:29

You can use a dictionary as the pattern in kwic(). All matches are then pooled under the dictionary key, so the plot shows one column labelled with the key instead of one column per individual pattern value, as in the five-column case.

library("quanteda")
## Package version: 3.2.1
## Unicode version: 14.0
## ICU version: 70.1
## Parallel computing: 8 of 8 threads used.
## See https://quanteda.io for tutorials and examples.
library("quanteda.textplots")

data(data_corpus_udhr, package = "quanteda.corpora")
corpusa <- data_corpus_udhr[c("ita", "eng", "eus", "spa", "fra")]

bilaketa <- c("umani", "human", "giza", "humanos", "l'homme")

corpusa %>%
  tokens() %>%
  kwic(pattern = dictionary(list(human = bilaketa))) %>%
  textplot_xray()
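
If you want to summarise more than one concept this way, the same idea should extend to a dictionary with several keys, one column per key. The following is only a sketch under stated assumptions: the toy corpus and the "rights" terms are invented for illustration and are not taken from data_corpus_udhr. The individual matched tokens remain available in the keyword column of the kwic result, so nothing is lost by pooling.

```r
library("quanteda")
library("quanteda.textplots")

# Toy stand-in for the parallel corpus (invented sentences, NOT the UDHR text)
toy <- corpus(c(
  eng = "All human beings have rights. Human rights are universal.",
  spa = "Todos los seres humanos tienen derechos.",
  fra = "Les droits de l'homme sont universels."
))

# One key per concept; the values are the language-specific equivalents.
# The "rights" entry is a hypothetical second concept added for illustration.
equivalents <- dictionary(list(
  human  = c("human", "humanos", "l'homme"),
  rights = c("rights", "derechos", "droits")
))

kw <- kwic(tokens(toy), pattern = equivalents)

# The pattern column holds the dictionary key, the keyword column the actual match
table(kw$keyword, kw$pattern)

# One column per dictionary key, one row per document
textplot_xray(kw)
```

The cross-tabulation is a quick way to check which language-specific term produced each hit before relying on the pooled column in the plot.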