Retrieving the first Urban Dictionary result for a term in Python

Posted on 2025-01-04 23:34:43


I have written some fairly simple code to get the first result for any term on urbandictionary.com. I started by writing a quick function to see how their page source is formatted.

import urllib

def parseudtest(searchurl):
    url = 'http://www.urbandictionary.com/define.php?term=%s' % searchurl
    url_info = urllib.urlopen(url)
    for lines in url_info:
        print lines

As a test, I searched for 'cats' by passing it as searchurl. The output is of course a gigantic page, but here is the part I care about:

<meta content='He set us up the bomb. Also took all our base.' name='Description' />

<meta content='He set us up the bomb. Also took all our base.' property='og:description' />

<meta content='cats' property='og:title' />

<meta content="http://static3.urbandictionary.com/rel-1e0b481/images/og_image.png" property="og:image" />

<meta content='Urban Dictionary' property='og:site_name' />

As you can see, the first "meta content" element on the page holds the first definition for the search term. So I wrote this code to retrieve it:

import urllib
from xml.dom import minidom

def parseud(searchurl):
    url = 'http://www.urbandictionary.com/define.php?term=%s' % searchurl
    url_info = urllib.urlopen(url)
    if (url_info):
        xmldoc = minidom.parse(url_info)
    if (xmldoc):
        definition = xmldoc.getElementsByTagName('meta content')[0].firstChild.data
        print definition

For some reason the parsing doesn't work and fails with an error every time. This is especially confusing because the site appears to use basically the same format as other sites I have successfully retrieved specific data from. If anyone could help me figure out what I am messing up here, it would be greatly appreciated.


断肠人 2025-01-11 23:34:43

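Since you haven't posted a traceback of the error, it's hard to be specific, but I think that although the site claims to be XHTML, it isn't actually valid XML. You'd be better off using Beautiful Soup, because it is designed for parsing HTML and will handle broken markup gracefully.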
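
The answer above recommends Beautiful Soup. As a dependency-free illustration of the same idea (a parser built for HTML that tolerates markup which is not well-formed XML), here is a sketch in Python 3 that swaps in the standard library's html.parser instead of Beautiful Soup. It assumes, as in the question's output, that the definition sits in a <meta name="Description"> tag; whether the live page still looks like that is not guaranteed. MetaDescriptionParser, first_meta_description and fetch_first_definition are hypothetical names.

```python
import urllib.parse
import urllib.request
from html.parser import HTMLParser


class MetaDescriptionParser(HTMLParser):
    """Remembers the content attribute of the first <meta name="Description"> tag."""

    def __init__(self):
        super().__init__()
        self.description = None

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; stop once we have a match.
        # Self-closing <meta ... /> tags are routed here by handle_startendtag.
        if tag != "meta" or self.description is not None:
            return
        attributes = dict(attrs)
        # Attribute values keep their case, so normalise "Description" ourselves.
        if (attributes.get("name") or "").lower() == "description":
            self.description = attributes.get("content")


def first_meta_description(html_text):
    """Return the description text from an HTML string, or None if absent."""
    parser = MetaDescriptionParser()
    parser.feed(html_text)
    parser.close()
    return parser.description


def fetch_first_definition(term):
    """Download the define.php page for `term` and extract its description.

    Assumption: the page is UTF-8 and still carries a <meta name="Description">
    tag like the one shown in the question.
    """
    query = urllib.parse.urlencode({"term": term})  # URL-encodes the search term
    url = "http://www.urbandictionary.com/define.php?" + query
    with urllib.request.urlopen(url) as response:
        html_text = response.read().decode("utf-8", errors="replace")
    return first_meta_description(html_text)


# Offline usage with markup copied from the question, plus an unclosed <br>
# that an XML parser such as minidom would reject.
sample = """
<meta content='He set us up the bomb. Also took all our base.' name='Description' />
<meta content='cats' property='og:title' />
<br>
"""
print(first_meta_description(sample))
```

Because html.parser does not require matching end tags, the unclosed <br> in the sample is simply ignored rather than aborting the parse.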


听风吹 2025-01-11 23:34:43


I've never used the minidom parser, but I think the problem is that you call:

xmldoc.getElementsByTagName('meta content')

while the tag name is meta; content is just the first attribute (as the syntax highlighting of your HTML shows quite clearly).

Try replacing that bit with:

xmldoc.getElementsByTagName('meta')
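
Taking that fix one step further: getElementsByTagName('meta') finds the elements, but <meta /> has no child nodes, so .firstChild is None and .firstChild.data would still raise an AttributeError. The value lives in the content attribute. The sketch below (Python 3, standard-library xml.dom.minidom; first_meta_content is a hypothetical helper) runs on a hand-made well-formed snippet. It only works if the input is well-formed XML, which, per the other answer, the live page may not be.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def first_meta_content(xml_text):
    """Return the content attribute of the first <meta> element, or None."""
    doc = minidom.parseString(xml_text)
    # Search by tag name only: "content" is an attribute of <meta>, not part of its name.
    metas = doc.getElementsByTagName("meta")
    if not metas:
        return None
    # <meta ... /> is empty, so firstChild is None; read the attribute instead.
    return metas[0].getAttribute("content")


well_formed = """<html><head>
<meta content='He set us up the bomb. Also took all our base.' name='Description' />
<meta content='cats' property='og:title' />
</head><body></body></html>"""

print(first_meta_content(well_formed))

# HTML-style markup with unclosed tags is rejected by the XML parser.
try:
    first_meta_content("<html><head><meta content='x'><br></head></html>")
except ExpatError as err:
    print("not well-formed XML:", err)
```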
