Example Python scripts that use DBpedia?

Posted 2024-12-05 16:32:54 · 247 words · 8 views · 0 comments

I am writing a Python script to extract entity names from a collection of thousands of news articles from several countries and in several languages.

I would like to make use of the amazing structured knowledge in DBpedia, for example to look up the names of "artists in Egypt" and of "companies in Canada".

(If this information were in SQL form, I would have no problem.)

I would prefer to download the DBpedia content and use it offline. Any ideas on what is needed to do so, and how to query it locally from Python?

Comments (2)

不念旧人 2024-12-12 16:32:54

DBpedia content is in RDF format. Dumps are available for download from the DBpedia site.

DBpedia is a large RDF dataset, and to handle that amount of data you need triple store technology. For DBpedia you will need a native triple store; I recommend either Virtuoso or 4store. I personally prefer 4store.

Once you have your triple store set up with DBpedia loaded into it, you can use SPARQL to query the DBpedia RDF triples. There are Python libraries that can help with that, but 4store and Virtuoso can both return results as JSON, so you can easily get by without any libraries.
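For the kind of lookup in the question ("companies in Canada"), a query can be assembled from a class URI, a country URI and the property that links them. This is only a minimal sketch: the helper name `build_type_in_country_query` is made up for illustration, and the URIs used in the example (`dbo:Company`, `dbo:locationCountry`) are assumptions about how the data is modelled, so check them against the dump you actually load.

```python
def build_type_in_country_query(type_uri, country_uri, link_property_uri,
                                lang="en", limit=100):
    """Return a SPARQL query listing entities of a type linked to a country.

    Hypothetical helper: every URI passed in is the caller's assumption
    about the DBpedia data, not something this function verifies.
    """
    return f"""
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?entity ?name WHERE {{
  ?entity a <{type_uri}> ;
          <{link_property_uri}> <{country_uri}> ;
          rdfs:label ?name .
  FILTER (lang(?name) = "{lang}")
}}
LIMIT {int(limit)}
""".strip()


# Assumed URIs; verify that the class and property exist in your copy of the data.
companies_in_canada = build_type_in_country_query(
    "http://dbpedia.org/ontology/Company",
    "http://dbpedia.org/resource/Canada",
    "http://dbpedia.org/ontology/locationCountry",
)
print(companies_in_canada)
```

The `lang` filter keeps one label per language, which may matter when the articles are in several languages. The resulting string can be passed as `q` to a `query` function like the one in this answer.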

A simple urllib script like this ...

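import sys
import traceback
import urllib
import urllib2

# Python 2 code: send SPARQL query q to endpoint epr and return the raw response body.
def query(q, epr, f='application/json'):
    try:
        params = urllib.urlencode({'query': q})
        opener = urllib2.build_opener(urllib2.HTTPHandler)
        request = urllib2.Request(epr + '?' + params)
        request.add_header('Accept', f)  # ask the endpoint for JSON results
        request.get_method = lambda: 'GET'
        url = opener.open(request)
        return url.read()
    except Exception, e:
        traceback.print_exc(file=sys.stdout)
        raise  # re-raise, keeping the original traceback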

... can help you run SPARQL queries. For instance:

>>> q1 = """
... select ?birthPlace where {
... <http://dbpedia.org/resource/Claude_Monet> <http://dbpedia.org/property/birthPlace> ?birthPlace .
...  }"""
>>> print query(q1,"http://dbpedia.org/sparql")

{ "head": { "link": [], "vars": ["birthPlace"] },
  "results": { "distinct": false, "ordered": true, "bindings": [
    { "birthPlace": { "type": "literal", "xml:lang": "en", "value": "Paris, France" }} ] } }
>>> 
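The string that comes back is a SPARQL JSON results document, so the bound values can be pulled out with the standard `json` module. A minimal Python 3 sketch, assuming the response has the same shape as the output above; `extract_values` is a hypothetical helper name, not part of any library:

```python
import json


def extract_values(result_text, var):
    """Return the plain values bound to `var` in a SPARQL JSON results document."""
    doc = json.loads(result_text)
    # Each binding maps a variable name to {"type": ..., "value": ...};
    # a variable can be absent from a row, so skip rows that lack it.
    return [row[var]["value"]
            for row in doc["results"]["bindings"]
            if var in row]


# The response shown above, pasted in as a string so no network access is needed.
sample = '''{ "head": { "link": [], "vars": ["birthPlace"] },
  "results": { "distinct": false, "ordered": true, "bindings": [
    { "birthPlace": { "type": "literal", "xml:lang": "en", "value": "Paris, France" }} ] } }'''

print(extract_values(sample, "birthPlace"))  # ['Paris, France']
```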

I hope this gives you an idea of how to start.

往日 2024-12-12 16:32:54

In Python 3, using the requests library, the answer looks like this:

import sys

import requests

def query(q, epr, f='application/json'):
    try:
        params = {'query': q}
        # requests URL-encodes the query parameters itself
        resp = requests.get(epr, params=params, headers={'Accept': f})
        return resp.text
    except Exception as e:
        print(e, file=sys.stdout)
        raise
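
Since the question is about querying offline, the same request can also be pointed at a locally running triple store using only the Python 3 standard library. This is a rough sketch under stated assumptions: `LOCAL_ENDPOINT` is a guess (port 8890 is a common default for a Virtuoso SPARQL endpoint, but your installation may differ), and `build_request` is a made-up helper that only constructs the request so it can be inspected without a server running.

```python
import urllib.parse
import urllib.request

# Assumption: a local store exposing SPARQL over HTTP at this address.
# Change it to whatever your own Virtuoso or 4store instance uses.
LOCAL_ENDPOINT = "http://localhost:8890/sparql"


def build_request(q, epr=LOCAL_ENDPOINT, f="application/json"):
    """Build (but do not send) a GET request carrying SPARQL query `q`."""
    url = epr + "?" + urllib.parse.urlencode({"query": q})
    return urllib.request.Request(url, headers={"Accept": f})


def query(q, epr=LOCAL_ENDPOINT, f="application/json", timeout=30):
    """Send the query and return the response body as text."""
    with urllib.request.urlopen(build_request(q, epr, f), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Inspect the request without contacting any server.
req = build_request("SELECT * WHERE { ?s ?p ?o } LIMIT 1")
print(req.get_method(), req.full_url)
```

Keeping request construction separate from sending makes it easy to check the encoded URL before the store is up, and the `timeout` stops a slow query from hanging the script.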