Retrieving the first Urban Dictionary result for a term in Python

Posted 2025-01-04 23:34:43

I wrote some fairly simple code to get the first result for any term on urbandictionary.com. I started with a small function to see how their markup is formatted.

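import urllib  # Python 2; in Python 3, urlopen lives in urllib.request

def parseudtest(searchurl):
    # Fetch the definition page for the term and print it line by line
    url = 'http://www.urbandictionary.com/define.php?term=%s' % searchurl
    url_info = urllib.urlopen(url)
    for lines in url_info:
        print lines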

As a test, I searched for 'cats' by passing it as the variable searchurl. The output is of course a gigantic page, but here is the part I care about:

<meta content='He set us up the bomb. Also took all our base.' name='Description' />

<meta content='He set us up the bomb. Also took all our base.' property='og:description' />

<meta content='cats' property='og:title' />

<meta content="http://static3.urbandictionary.com/rel-1e0b481/images/og_image.png" property="og:image" />

<meta content='Urban Dictionary' property='og:site_name' />

As you can see, the first time the element "meta content" appears on the page, it holds the first definition for the search term. So I wrote this code to retrieve it:

import urllib
from xml.dom import minidom

def parseud(searchurl):
    url = 'http://www.urbandictionary.com/define.php?term=%s' % searchurl
    url_info = urllib.urlopen(url)
    if (url_info):
        xmldoc = minidom.parse(url_info)
    if (xmldoc):
        definition = xmldoc.getElementsByTagName('meta content')[0].firstChild.data
        print definition

For some reason the parsing doesn't work and raises an error every time. This is especially confusing since the site appears to use basically the same format as other sites I have successfully retrieved specific data from. If anyone could help me figure out what I am messing up here, it would be greatly appreciated.

Comments (2)

断肠人 2025-01-11 23:34:43

Since you don't give the traceback for the errors that occur, it's hard to be specific, but I assume that although the site claims to be XHTML, it's not actually valid XML. You'd be better off using Beautiful Soup, as it is designed for parsing HTML and will correctly handle broken markup.
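
Beautiful Soup is a third-party package, so as a dependency-free illustration of the same idea (a lenient HTML parser instead of a strict XML one), here is a rough sketch using the standard library's html.parser in Python 3. This swaps in a different parser than the one recommended above. The names MetaDescriptionParser, first_description, and SAMPLE_HTML are made up for this example, and the sample markup is the snippet from the question with some deliberately unclosed tags appended; the live page may look different.

```python
from html.parser import HTMLParser


class MetaDescriptionParser(HTMLParser):
    """Remembers the content of the first <meta name="Description"> tag."""

    def __init__(self):
        super().__init__()
        self.description = None

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; attribute values keep their case.
        if tag != "meta" or self.description is not None:
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "description":
            self.description = attributes.get("content")


# Hypothetical sample: the meta tags from the question plus unclosed tags
# that a strict XML parser would reject.
SAMPLE_HTML = """
<html><head>
<meta content='He set us up the bomb. Also took all our base.' name='Description' />
<meta content='cats' property='og:title' />
</head><body><p>unclosed paragraph<br>
"""


def first_description(html_text):
    """Return the first meta description in html_text, or None if absent."""
    parser = MetaDescriptionParser()
    parser.feed(html_text)
    return parser.description


print(first_description(SAMPLE_HTML))
```

The parser never requires the document to be well formed, which is the property that matters here; Beautiful Soup offers the same tolerance with a more convenient search API.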

听风吹 2025-01-11 23:34:43

I never used the minidom parser, but I think the problem is that you call:

xmldoc.getElementsByTagName('meta content')

while the tag name is meta; content is just the first attribute (as shown pretty well by the highlighting of your HTML code).

Try to replace that bit with:

xmldoc.getElementsByTagName('meta')
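
A small sketch of that change in Python 3, assuming the input is well-formed XML (the real page may not be, in which case minidom will still raise a parse error). Note that the definition is stored in the content attribute and a meta element has no child nodes, so the sketch reads it with getAttribute rather than firstChild.data. The names first_meta_content and SAMPLE_XML are made up for this example, and the sample is a trimmed stand-in for the page head.

```python
from xml.dom import minidom

# Hypothetical well-formed stand-in for the page head shown in the question
SAMPLE_XML = """<head>
<meta content='He set us up the bomb. Also took all our base.' name='Description' />
<meta content='cats' property='og:title' />
</head>"""


def first_meta_content(xml_text):
    """Return the content attribute of the first <meta> element, or None."""
    xmldoc = minidom.parseString(xml_text)
    # 'meta' is the tag name; 'meta content' matches no element at all
    metas = xmldoc.getElementsByTagName('meta')
    if not metas:
        return None
    # <meta> is an empty element, so firstChild is None; read the attribute instead
    return metas[0].getAttribute('content')


print(first_meta_content(SAMPLE_XML))
```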