Example Python script that uses DBPedia?

Posted 2024-12-05 16:32:54

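I am writing a Python script to extract "entity names" from a collection of thousands of news articles from several countries and languages.

I would like to make use of the amazing structured knowledge in DBPedia, for example to look up the names of "Egyptian artists" and the names of "companies in Canada".

(If this information were in SQL form, I would have no problem.)

I would prefer to download the DBPedia content and use it offline. Any ideas on what needs to be done, and how to query it locally from Python?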

不念旧人 2024-12-12 16:32:54

DBpedia content is in RDF format. The dumps can be downloaded from the DBpedia downloads page.

DBpedia is a large RDF dataset; to handle that volume of data you need a triple store. For DBpedia you will want one of the native triple stores, and I recommend either Virtuoso or 4store. I personally prefer 4store.

Once your triple store is set up and loaded with DBpedia, you can query the RDF triples with SPARQL. There are Python libraries that can help with that, but 4store and Virtuoso can both return results as JSON, so you can easily get by without one.

A simple urllib script like this ...

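# Python 2 only: urllib2 and urllib.urlencode do not exist under these names in Python 3.
import sys
import traceback
import urllib
import urllib2

def query(q, epr, f='application/json'):
    """Send SPARQL query q to endpoint epr and return the raw response body."""
    try:
        params = urllib.urlencode({'query': q})
        opener = urllib2.build_opener(urllib2.HTTPHandler)
        request = urllib2.Request(epr + '?' + params)
        request.add_header('Accept', f)  # ask the endpoint for JSON results
        request.get_method = lambda: 'GET'
        url = opener.open(request)
        return url.read()
    except Exception:
        traceback.print_exc(file=sys.stdout)
        raise  # re-raise, keeping the original traceback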

... can help you run SPARQL queries. For instance:

>>> q1 = """
... select ?birthPlace where {
... <http://dbpedia.org/resource/Claude_Monet> <http://dbpedia.org/property/birthPlace> ?birthPlace .
...  }"""
>>> print query(q1,"http://dbpedia.org/sparql")

{ "head": { "link": [], "vars": ["birthPlace"] },
  "results": { "distinct": false, "ordered": true, "bindings": [
    { "birthPlace": { "type": "literal", "xml:lang": "en", "value": "Paris, France" }} ] } }
>>>

I hope this gives you an idea of how to start.
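
As a follow-up to the output shown above, here is a minimal sketch of turning that JSON text into plain Python values with the standard json module. It is written for Python 3, the helper name bindings_to_rows is hypothetical, and the sample string is simply the response printed above; it assumes every endpoint response has the same "results" / "bindings" layout.

```python
import json


def bindings_to_rows(raw_json):
    """Flatten a SPARQL JSON result into a list of {variable: value} dicts."""
    doc = json.loads(raw_json)
    rows = []
    for binding in doc["results"]["bindings"]:
        # Each binding maps a variable name to a dict with "type", "value", ...
        rows.append({var: cell["value"] for var, cell in binding.items()})
    return rows


# Sample copied from the response shown above (not fetched live).
sample = """
{ "head": { "link": [], "vars": ["birthPlace"] },
  "results": { "distinct": false, "ordered": true, "bindings": [
    { "birthPlace": { "type": "literal", "xml:lang": "en", "value": "Paris, France" }} ] } }
"""

rows = bindings_to_rows(sample)
print(rows)
```

Note that this keeps only the "value" field of each binding; the language tag and the literal/URI type are dropped, which may or may not suit your entity-matching step.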

往日 2024-12-12 16:32:54

In Python 3, the same function looks like this using the requests library:

import sys

import requests

def query(q, epr, f='application/json'):
    """Send SPARQL query q to endpoint epr and return the response text."""
    try:
        params = {'query': q}
        resp = requests.get(epr, params=params, headers={'Accept': f})
        return resp.text
    except Exception as e:
        print(e, file=sys.stdout)
        raise
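
For the kind of lookup in the question ("Egyptian artists", "companies in Canada"), one possible way to build the SPARQL text to pass to query is sketched below. The helper name build_entity_query is hypothetical, and the class, property and resource URIs in the example are illustrative assumptions that should be checked against the DBpedia data you actually load. In particular, birthPlace often points at a city rather than a country, so a real query may need a different property or an extra join.

```python
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def build_entity_query(class_uri, property_uri, value_uri, lang="en", limit=100):
    """Build a SPARQL query for labelled entities of a class with a given property value."""
    return (
        "SELECT DISTINCT ?entity ?name WHERE {\n"
        f"  ?entity a <{class_uri}> ;\n"
        f"          <{property_uri}> <{value_uri}> ;\n"
        f"          <{RDFS_LABEL}> ?name .\n"
        # Keep only labels in the requested language.
        f'  FILTER (lang(?name) = "{lang}")\n'
        "}\n"
        f"LIMIT {int(limit)}"
    )


# Assumed URIs, for illustration only: artists whose birth place is Egypt.
q = build_entity_query(
    "http://dbpedia.org/ontology/Artist",
    "http://dbpedia.org/ontology/birthPlace",
    "http://dbpedia.org/resource/Egypt",
    limit=5,
)
print(q)
```

The resulting string can be passed as the first argument of query, with the second argument set to whichever SPARQL endpoint your local Virtuoso or 4store instance exposes.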
