Extracting synonyms from wordnet with synonyms()

Posted 2024-12-06 10:34:09

Suppose I pull the synonyms of "help" from wordnet with the synonyms() function and get the following:

Str = synonyms("help")    
Str
[1] "c(\"aid\", \"assist\", \"assistance\", \"help\")"     
[2] "c(\"aid\", \"assistance\", \"help\")"                 
[3] "c(\"assistant\", \"helper\", \"help\", \"supporter\")"
[4] "c(\"avail\", \"help\", \"service\")"  

Then I can flatten the result into a single character vector using

unique(unlist(lapply(parse(text=Str),eval)))

which gives:

[1] "aid"        "assist"     "assistance" "help"       "assistant"  "helper"     "supporter" 
[8] "avail"      "service"

The above process was suggested by Gabor Grothendieck. The solution works well for "help", but I can't figure out why I get an error message when I change the query term to "company", "boy", or some other word.

One possible reason is that the sixth synonym of "company" (see below) is a single bare term and does not follow the format "c(\"company\")".

synonyms("company")

[1] "c(\"caller\", \"company\")"                                    
[2] "c(\"company\", \"companionship\", \"fellowship\", \"society\")"
[3] "c(\"company\", \"troupe\")"                                    
[4] "c(\"party\", \"company\")"                                     
[5] "c(\"ship's company\", \"company\")"                            
[6] "company"

Could someone kindly help me solve this problem?
Many thanks.

Comments (2)

情场扛把子 2024-12-13 10:34:09

You can solve this by creating a little helper function that uses R's try mechanism to catch errors. If eval produces an error, the helper returns the original string; otherwise it returns the result of eval:

Create a helper function:

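evalOrValue <- function(expr, ...){
  # Evaluate the parsed expression; silent = TRUE suppresses the printed error
  z <- try(eval(expr, ...), silent = TRUE)
  # On failure (e.g. the bare symbol company), fall back to the term itself
  if(inherits(z, "try-error")) as.character(expr) else unlist(z)
}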

unique(unlist(sapply(parse(text=Str), evalOrValue)))

Produces:

[1] "caller"         "company"        "companionship" 
[4] "fellowship"     "society"        "troupe"        
[7] "party"          "ship's company"
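
One caveat with evaluating the strings: a bare term is looked up as a variable, so a word that happens to name an existing object (help, for instance, is a function) evaluates to that object instead of raising an error. As a rough alternative that avoids eval altogether, the quoted terms can be pulled out with a regular expression. This is only a sketch: extract_terms is a hypothetical helper name, and it assumes no term contains an embedded double quote.

```r
# Same data as in this answer.
Str <- c("c(\"caller\", \"company\")",
         "c(\"company\", \"companionship\", \"fellowship\", \"society\")",
         "c(\"company\", \"troupe\")", "c(\"party\", \"company\")",
         "c(\"ship's company\", \"company\")",
         "company")

# Hypothetical helper: extract terms without parsing or evaluating anything.
extract_terms <- function(x) {
  # For each element, find every double-quoted chunk.
  quoted <- regmatches(x, gregexpr('"[^"]*"', x))
  terms <- lapply(seq_along(x), function(i) {
    if (length(quoted[[i]]) > 0) {
      gsub('"', "", quoted[[i]], fixed = TRUE)  # strip the surrounding quotes
    } else {
      x[i]                                      # bare term such as "company"
    }
  })
  unique(unlist(terms))
}

extract_terms(Str)
```

Because nothing is evaluated, the result does not depend on which objects happen to exist in the workspace.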

The Str used above is your data, which I recreated and then dumped with dput:

Str <- c("c(\"caller\", \"company\")", "c(\"company\", \"companionship\", \"fellowship\", \"society\")", 
"c(\"company\", \"troupe\")", "c(\"party\", \"company\")", "c(\"ship's company\", \"company\")", 
"company")
白鸥掠海 2024-12-13 10:34:09

Those synonyms are in a form that looks like an R expression, so you should be able to parse them as you illustrated. However, when I run your original code I get an error from the synonyms call, because it includes no part-of-speech argument.

> synonyms("help")
Error in charmatch(x, WN_synset_types) : 
  argument "pos" is missing, with no default

Note that synonyms calls getSynonyms, which already wraps its result in unique, so none of the pre-processing you are doing is needed any more (if you update the package):

> synonyms("company", "NOUN")
[1] "caller"         "companionship"  "company"       
[4] "fellowship"     "party"          "ship's company"
[7] "society"        "troupe"        
> synonyms
function (word, pos) 
{
    filter <- getTermFilter("ExactMatchFilter", word, TRUE)
    terms <- getIndexTerms(pos, 1L, filter)
    if (is.null(terms)) 
        character()
    else getSynonyms(terms[[1L]])
}
<environment: namespace:wordnet>

> getSynonyms
function (indexterm) 
{
    synsets <- .jcall(indexterm, "[Lcom/nexagis/jawbone/Synset;", 
        "getSynsets")
    sort(unique(unlist(lapply(synsets, getWord))))
}
<environment: namespace:wordnet>
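
Since synonyms needs a part of speech, one way to get every synonym of a word is to query each part of speech in turn and merge the results. The following is a minimal sketch, not part of the wordnet package: all_synonyms and its lookup parameter are hypothetical names, and the labels "VERB", "ADJECTIVE" and "ADVERB" are assumed to be accepted alongside the "NOUN" shown above. The lookup function is passed in so the helper can be tried without a WordNet installation.

```r
# Hypothetical helper: gather synonyms across several parts of speech.
# `lookup` is any function(word, pos) returning a character vector;
# with the wordnet package set up you would pass wordnet::synonyms.
all_synonyms <- function(word, lookup,
                         pos = c("NOUN", "VERB", "ADJECTIVE", "ADVERB")) {
  found <- lapply(pos, function(p) {
    # A failed lookup for one part of speech should not abort the others.
    tryCatch(lookup(word, p), error = function(e) character())
  })
  sort(unique(unlist(found)))
}

# Stand-in lookup so the sketch runs anywhere; this is NOT real WordNet data.
fake_lookup <- function(word, pos) {
  switch(pos,
         NOUN = c("aid", "help"),
         VERB = c("assist", "help"),
         character())
}

all_synonyms("help", fake_lookup)
```

With a working installation, the call would presumably look like all_synonyms("company", wordnet::synonyms).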