Loading a very large RDF file into the openrdf Sesame ontology manager



I need to load a very large ontology, represented as an N-Triples file (1 GB), into the openrdf Sesame application. I'm using the Workbench interface to do that. I know this file is too big to be loaded in one request, so to get around that I split it into files of 100 MB each. But I still get an error from the openrdf Sesame server:

HTTP ERROR 500

Problem accessing /openrdf-workbench/repositories/business/add. Reason:

    Unbuffered entity enclosing request can not be repeated.
Caused by:

org.apache.commons.httpclient.ProtocolException: Unbuffered entity enclosing request can not be repeated.
 at org.apache.commons.httpclient.methods.EntityEnclosingMethod.writeRequestBody(EntityEnclosingMethod.java:487)

Does anyone have good knowledge of openrdf Sesame, or of another ontology manager that I could use for my task?

Thanks a lot for your input

K.


Comments (3)

清风挽心 2024-09-23 05:40:44


The Sesame Workbench is really not the ideal tool for these kinds of tasks, although I would expect it to be able to cope with 100 MB files. It might be that the Tomcat on which you run Sesame has a POST size limit set? You could ask around on Sesame's mailing list; there are quite a few knowledgeable people there as well. But here are two possible ideas to get things done:

One way to handle this is to do your upload programmatically, using Sesame's Repository API. Have a look at the user documentation on the Sesame website for code examples.
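A minimal sketch of such a programmatic upload, assuming Sesame 2's HTTPRepository. The server URL and file name are placeholders, and the repository ID "business" is taken from the error message above:

    import java.io.File;

    import org.openrdf.repository.Repository;
    import org.openrdf.repository.RepositoryConnection;
    import org.openrdf.repository.http.HTTPRepository;
    import org.openrdf.rio.RDFFormat;

    public class BulkUpload {
        public static void main(String[] args) throws Exception {
            // Assumed server URL; "business" is the repository ID from
            // the /repositories/business/add path in the error message.
            Repository repo = new HTTPRepository(
                    "http://localhost:8080/openrdf-sesame", "business");
            repo.initialize();

            RepositoryConnection con = repo.getConnection();
            try {
                // Streams the file to the server; no need to split it
                // into chunks by hand.
                con.add(new File("ontology.nt"),
                        "http://example.org/", RDFFormat.NTRIPLES);
            } finally {
                con.close();
                repo.shutDown();
            }
        }
    }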

Alternatively, if you are using a Sesame native store, you could do a 'dirty' workaround using Sesame's command line console: create a local native triple store and upload your data to that local store (this should be much quicker because no HTTP communication is necessary). Then, shut down your Sesame server, copy the data files of the local native store over the store's data files on your server, and restart.
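A rough sketch of the local-load half of that workaround, done through the Repository API rather than the console (the console's create and load commands achieve the same thing); the store directory and file name are placeholders:

    import java.io.File;

    import org.openrdf.repository.Repository;
    import org.openrdf.repository.RepositoryConnection;
    import org.openrdf.repository.sail.SailRepository;
    import org.openrdf.rio.RDFFormat;
    import org.openrdf.sail.nativerdf.NativeStore;

    public class LocalNativeLoad {
        public static void main(String[] args) throws Exception {
            // Local on-disk native store; loading here avoids HTTP
            // entirely.
            Repository repo = new SailRepository(
                    new NativeStore(new File("/tmp/local-native-store")));
            repo.initialize();

            RepositoryConnection con = repo.getConnection();
            try {
                con.add(new File("ontology.nt"),
                        "http://example.org/", RDFFormat.NTRIPLES);
            } finally {
                con.close();
                repo.shutDown(); // flushes the store's data files to disk
            }
        }
    }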

空城仅有旧梦在 2024-09-23 05:40:44


I had the same problem. When I tried to upload a "large" RDF file (around 40 MB), the upload process failed with the error:

Unbuffered entity enclosing request can not be repeated.

I tried other versions of Tomcat and of Sesame, but without success. Then I tried the Sesame console with a local repository (not localhost on the Tomcat server, as Jeen suggests in another answer), and it showed me another error:

Malformed document: JAXP00010001: The parser has encountered more than "64000" entity expansions in this document; this is the limit imposed by the JDK. [line 1, column 1]

So I think the error about the entity limit is masked somewhere in Tomcat by the error about the unbuffered entity.

Then I found this topic, What's causing these ParseError exceptions when reading off an AWS SQS queue in my Storm cluster, and added this statement before starting Tomcat:

export JAVA_OPTS="${JAVA_OPTS} -Djdk.xml.entityExpansionLimit=0"

This statement disables the entity limit in the XML parser (the default is 64,000, as the error message says). After this step I was able to load "large" RDF files (tested with 40-800 MB).
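On Tomcat, the usual place for such an export is a bin/setenv.sh script that Tomcat sources at startup. For a standalone Java program, the same JDK property can also be set programmatically, as long as it happens before the first XML parse; a minimal sketch (the class name is made up):

    public class DisableEntityLimit {
        public static void main(String[] args) {
            // Equivalent to passing -Djdk.xml.entityExpansionLimit=0 in
            // JAVA_OPTS; 0 lifts the JDK's 64,000 entity-expansion cap.
            System.setProperty("jdk.xml.entityExpansionLimit", "0");
            // Any XML parsing from this point on uses the relaxed limit.
        }
    }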

橘亓 2024-09-23 05:40:44


I don't know exactly what task you hope to achieve, but you may want to check out here for a list of scalable triple stores with informal (mainly self-claimed) scalability results. In that list, Sesame only reports handling 70M statements (not that many... which might be the cause of your troubles).
