Loading a very large RDF file into the openrdf Sesame ontology manager

Posted on 2024-09-16 05:40:44


I need to load a very large ontology, represented as an N-Triples file (1 GB), into the openrdf Sesame application. I'm using the Workbench interface to do that. I know this file is too big to be loaded in one request, so to get around that I split it into files of 100 MB each. But I still get an error from the openrdf Sesame server:

HTTP ERROR 500

Problem accessing /openrdf-workbench/repositories/business/add. Reason:

    Unbuffered entity enclosing request can not be repeated.
Caused by:

org.apache.commons.httpclient.ProtocolException: Unbuffered entity enclosing request can not be repeated.
 at org.apache.commons.httpclient.methods.EntityEnclosingMethod.writeRequestBody(EntityEnclosingMethod.java:487)

Does anyone have good knowledge of openrdf Sesame, or of another ontology manager I could use for my task?

Thanks a lot for your input

K.
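N-Triples is a line-based format (one triple per line), so a dump can be split safely on line boundaries. A minimal sketch of such a splitter; the file names and the 100 MB chunk size are illustrative assumptions:

    import java.io.*;
    import java.nio.charset.StandardCharsets;

    // Splits an N-Triples file into ~100 MB chunks, always cutting on a
    // line boundary so that every chunk remains valid N-Triples on its own.
    public class NTriplesSplitter {
        public static void main(String[] args) throws IOException {
            final long maxBytes = 100L * 1024 * 1024;   // target chunk size
            int part = 1;
            long written = 0;
            try (BufferedReader in = new BufferedReader(new InputStreamReader(
                    new FileInputStream("ontology.nt"), StandardCharsets.UTF_8))) {
                BufferedWriter out = newChunk(part);
                String line;
                while ((line = in.readLine()) != null) {
                    if (written > maxBytes) {    // roll over only between lines
                        out.close();
                        out = newChunk(++part);
                        written = 0;
                    }
                    out.write(line);
                    out.newLine();
                    written += line.getBytes(StandardCharsets.UTF_8).length + 1;
                }
                out.close();
            }
        }

        private static BufferedWriter newChunk(int part) throws IOException {
            return new BufferedWriter(new OutputStreamWriter(
                    new FileOutputStream("ontology-part" + part + ".nt"),
                    StandardCharsets.UTF_8));
        }
    }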

Comments (3)

清风挽心 2024-09-23 05:40:44


The Sesame Workbench is really not the ideal tool for these kinds of tasks, although I would expect it to be able to cope with 100 MB files. It might be that the Tomcat on which you run Sesame has a POST size limit set. You could also ask around on Sesame's mailing list; there are quite a few knowledgeable people there as well. But here are two possible ideas to get things done:

One way to handle this is to do your upload programmatically, using Sesame's Repository API. Have a look at the user documentation on the Sesame website for code examples.
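A minimal sketch of such a programmatic upload with the Sesame 2 Repository API. The repository ID business is taken from the error URL; the server URL and file name are assumptions for a default setup:

    import java.io.File;

    import org.openrdf.repository.Repository;
    import org.openrdf.repository.RepositoryConnection;
    import org.openrdf.repository.http.HTTPRepository;
    import org.openrdf.rio.RDFFormat;

    public class BulkLoad {
        public static void main(String[] args) throws Exception {
            // Connect to the remote repository behind the Workbench
            Repository repo = new HTTPRepository(
                    "http://localhost:8080/openrdf-sesame", "business");
            repo.initialize();
            RepositoryConnection con = repo.getConnection();
            try {
                // Streams the file to the server; no manual splitting needed
                con.add(new File("ontology.nt"), null, RDFFormat.NTRIPLES);
            } finally {
                con.close();
                repo.shutDown();
            }
        }
    }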

Alternatively, if you are using a Sesame native store, you could do a 'dirty' workaround using Sesame's command-line console: create a local native triple store and upload your data to that local store (this should be much quicker because no HTTP communication is necessary). Then shut down your Sesame server, copy the data files of the local native store over the store's data files on the server, and restart.
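A rough transcript of that workaround, assuming Sesame 2 console syntax (commands are terminated with a period) and illustrative repository and file names:

    $ ./console.sh
    > connect default.        # use the console's own local data directory
    > create native.          # prompts for a repository id, e.g. businesslocal
    > open businesslocal.
    > load /data/ontology.nt. # parsed and stored locally, no HTTP involved
    > close.
    > exit.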

空城仅有旧梦在 2024-09-23 05:40:44


I had the same problem. When I tried to upload a "large" RDF file (around 40 MB), the upload process failed with the error:

Unbuffered entity enclosing request can not be repeated.

I tried other versions of Tomcat and of Sesame, but without success. Then I tried using the Sesame console with a local repository (not localhost on the Tomcat server, as Jeen suggests in another answer), and it showed me another error:

Malformed document: JAXP00010001: The parser has encountered more than "64000" entity expansions in this document; this is the limit imposed by the JDK. [line 1, column 1]

So I think the error about the entity limit is masked somewhere in Tomcat by the error about the unbuffered entity.

Then I found this topic, What's causing these ParseError exceptions when reading off an AWS SQS queue in my Storm cluster, and added this statement before starting Tomcat:

export JAVA_OPTS="${JAVA_OPTS} -Djdk.xml.entityExpansionLimit=0"

This statement disables the entity-expansion limit in the XML parser (the default is 64,000, as the error message says). After this step I was able to load "large" RDF files (tested on 40-800 MB).
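On a stock Tomcat this flag can be made permanent by placing it in bin/setenv.sh (created if absent), which catalina.sh sources on startup; the path assumes a default Tomcat layout:

    # $CATALINA_BASE/bin/setenv.sh -- sourced by catalina.sh at startup
    export JAVA_OPTS="${JAVA_OPTS} -Djdk.xml.entityExpansionLimit=0"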

橘亓 2024-09-23 05:40:44


I don't know exactly what task you hope to achieve, but you may want to look here for a list of scalable triple stores with informal (mainly self-claimed) scalability results. In that list, Sesame only reports handling 70M statements (not so many... which might be the cause of your troubles).
