Bing/Google/Flickr API: how to find an image for each of 150,000 Japanese sentences?

Posted 2024-11-05 14:18:38

I'm doing a part-of-speech and morphological analysis project for Japanese sentences. Each sentence will have its own webpage. To make each page more visual, I want to show one picture that is somehow related to the sentence. For example, for the sentence "私は学生です" ("I'm a student"), relevant pictures would be pictures of a school, a Japanese textbook, students, etc. What I have: part-of-speech tagging for every word. My current approach: take 2-3 nouns from each sentence and retrieve the first image from the search results using the Bing Images API. Note: all the sentence processing up to this point was done in Java.
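A minimal sketch of that approach, assuming the tagged nouns are already available as a list of strings: take up to three of them, join them into one query, and build an image-search request URL. The endpoint and the `q`, `mkt`, and `count` parameters are assumptions based on the Bing Image Search v7 documentation (the request would also need the subscription key in an `Ocp-Apim-Subscription-Key` header), so check them against the current docs. `ImageQueryBuilder` is a hypothetical name; sending the request and parsing the JSON response are not shown.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: builds an image-search URL from a sentence's nouns.
// ASSUMPTION: endpoint and parameter names follow the Bing Image Search v7 docs.
class ImageQueryBuilder {
    static final String ENDPOINT = "https://api.bing.microsoft.com/v7.0/images/search";

    // Joins at most maxNouns nouns with spaces, e.g. [学生, 学校] -> "学生 学校".
    static String buildQuery(List<String> nouns, int maxNouns) {
        return nouns.stream().limit(maxNouns).collect(Collectors.joining(" "));
    }

    // URL-encodes the query as UTF-8 and asks for one result from the Japanese market.
    static String buildUrl(List<String> nouns) {
        String q = URLEncoder.encode(buildQuery(nouns, 3), StandardCharsets.UTF_8);
        return ENDPOINT + "?q=" + q + "&mkt=ja-JP&count=1";
    }

    public static void main(String[] args) {
        // Only the first three nouns are used; 日本語 is dropped.
        System.out.println(buildUrl(List.of("学生", "学校", "教科書", "日本語")));
    }
}
```

Capping the query at three nouns mirrors the "2-3 nouns" in the question; `URLEncoder.encode(String, Charset)` needs Java 10 or later.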


I have a couple of questions, though:
1) Which is better for searching Japanese nouns (richer corpus and more powerful search): the Google Images API, the Bing Images API, the Flickr API, or something else?
2) How do you select the most important noun in a sentence to use as the image search query, without doing complicated topic modeling and the like?
Thanks!

Comments (2)

月野兔 2024-11-12 14:18:38

Japanese WordNet has links to OpenClipart pictures. That could be another relevant source. The authors describe this in their paper "Enhancing the Japanese WordNet".

看海 2024-11-12 14:18:38

I think you could start by choosing any noun that comes directly before は, が, or を and giving those nouns priority, probably in that order.
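A minimal sketch of this priority rule, assuming the tagger output has been reduced to a list of tokens with a surface form and a coarse tag. The `Token` class and the `"NOUN"`/`"PARTICLE"` labels are hypothetical stand-ins for whatever your tagger actually emits. The rule checks は first, then が, then を, and returns the noun immediately before the first match.

```java
import java.util.List;

// Sketch of the heuristic: pick the noun directly before は, else が, else を.
// ASSUMPTION: Token and its "NOUN"/"PARTICLE" tags are hypothetical stand-ins
// for your own POS tagger's output.
class ParticlePriority {
    static final class Token {
        final String surface;
        final String pos;
        Token(String surface, String pos) { this.surface = surface; this.pos = pos; }
    }

    static final List<String> PRIORITY = List.of("は", "が", "を");

    // Returns the noun immediately preceding the highest-priority particle, or null.
    static String pickNoun(List<Token> tokens) {
        for (String particle : PRIORITY) {
            for (int i = 1; i < tokens.size(); i++) {
                Token prev = tokens.get(i - 1);
                Token t = tokens.get(i);
                if (t.pos.equals("PARTICLE") && t.surface.equals(particle)
                        && prev.pos.equals("NOUN")) {
                    return prev.surface;
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // 猫が魚を食べる ("the cat eats fish"): no は, so the が-noun wins.
        List<Token> s = List.of(new Token("猫", "NOUN"), new Token("が", "PARTICLE"),
                new Token("魚", "NOUN"), new Token("を", "PARTICLE"),
                new Token("食べる", "VERB"));
        System.out.println(pickNoun(s)); // pickNoun returns 猫
    }
}
```

With the question's own example, 私は学生です, this rule returns 私 if the tagger labels 私 as an ordinary noun, which would make a poor image query; skipping pronouns or using a small stoplist would be an additional, hypothetical refinement.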

But that assumes your part-of-speech tagging is good enough to tell when は is marking the subject (as you probably know, は is not always the subject marker).

I looked at a bunch of sample sentences with this technique in mind and found that it worked about as well as could be expected, except in sentences that use none of these particles, which are fairly rare.

Then there are sentences like the one below, where, if there is no を or は, you might consider looking for で and the noun before it. Notice that the word 人 (person/people), which comes before が here, doesn't really tell you anything about what is being said. Without parsing the context properly, you don't even know whether the noun means "person" or "people".

毎年 交通事故で 多くの人が 死にます
(many people die in traffic accidents every year)

But basically, couldn't you implement a priority/fallback type system like this?
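One way such a priority/fallback chain could look, sketched under assumptions: rules are tried in order (a noun before は or を, then before で, then before が), and if none match, the first noun in the sentence is used. Placing が after で is an interpretation of the 人 example, not an order stated above. The `Token` class, its tags, and the hand segmentation (including 交通事故 as a single noun) are hypothetical; a real tagger may split compounds differently.

```java
import java.util.List;
import java.util.Set;

// Hypothetical fallback chain: each rule is a set of particles, and the first
// rule that matches wins. ASSUMPTION: the order (は/を, で, が) is one interpretation.
class FallbackChain {
    static final class Token {
        final String surface;
        final String pos;
        Token(String surface, String pos) { this.surface = surface; this.pos = pos; }
    }

    static final List<Set<String>> RULES =
            List.of(Set.of("は", "を"), Set.of("で"), Set.of("が"));

    static String pickNoun(List<Token> tokens) {
        for (Set<String> rule : RULES) {
            // Scan left to right for a noun directly followed by a particle in this rule.
            for (int i = 1; i < tokens.size(); i++) {
                Token prev = tokens.get(i - 1);
                Token t = tokens.get(i);
                if (t.pos.equals("PARTICLE") && rule.contains(t.surface)
                        && prev.pos.equals("NOUN")) {
                    return prev.surface;
                }
            }
        }
        // Last resort: the first noun in the sentence, or null if there is none.
        for (Token t : tokens) {
            if (t.pos.equals("NOUN")) return t.surface;
        }
        return null;
    }

    public static void main(String[] args) {
        // 毎年 交通事故で 多くの人が 死にます, segmented by hand (an assumption).
        List<Token> s = List.of(new Token("毎年", "NOUN"), new Token("交通事故", "NOUN"),
                new Token("で", "PARTICLE"), new Token("多く", "NOUN"),
                new Token("の", "PARTICLE"), new Token("人", "NOUN"),
                new Token("が", "PARTICLE"), new Token("死にます", "VERB"));
        System.out.println(pickNoun(s)); // pickNoun returns 交通事故, not 人
    }
}
```

Keeping the rules as data makes it easy to reorder them after checking the results on a sample of your 150,000 sentences.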

By the way, I hope your sentences all use kanji; otherwise, when you see はし (as in one of the linked sentences), you won't know whether to show a bridge (橋) or chopsticks (箸), and showing the wrong one would probably not be good.
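A small sketch of one way to act on this, under an assumption that goes beyond the original: treat nouns written entirely in hiragana (like はし) as possibly ambiguous, and prefer another noun from the sentence, or show no image at all, when that happens. It relies on the standard `Character.UnicodeScript` lookup. Treating katakana loanwords and nouns containing kanji as safe is itself a simplification, and `KanaCheck` is a hypothetical name.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical helper: avoid image queries for nouns written only in hiragana,
// since homophones such as はし (橋 "bridge" / 箸 "chopsticks") can't be told apart.
class KanaCheck {
    // True if the word is non-empty and every code point is in the Hiragana script.
    static boolean isHiraganaOnly(String word) {
        return !word.isEmpty() && word.codePoints()
                .allMatch(cp -> Character.UnicodeScript.of(cp) == Character.UnicodeScript.HIRAGANA);
    }

    // Returns the first noun that is not hiragana-only; empty means "show no image".
    static Optional<String> preferUnambiguous(List<String> nouns) {
        return nouns.stream().filter(n -> !isHiraganaOnly(n)).findFirst();
    }

    public static void main(String[] args) {
        System.out.println(preferUnambiguous(List.of("はし", "学生"))); // Optional[学生]
        System.out.println(preferUnambiguous(List.of("はし")));         // Optional.empty
    }
}
```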
