Best way to use an InputStream with regard to persistence and XML

Posted on 2024-07-15 03:06:18

I have a REST web service that listens for POST requests and receives an XML payload from the client. The payload is initially available as an InputStream, i.e. you can call getStream() on the Representation object.

I want to use the XML held in the InputStream, and I am beginning to think it would be wise to persist it so that I can interrogate the data multiple times, since once you have read through the stream it is exhausted and cannot be read again. So I thought about converting the InputStream to a String. That does not look like a good idea, as DocumentBuilder.parse() from the javax.xml.parsers library only accepts:

  • InputStreams
  • Files
  • URLs
  • SAX InputSources

not strings.

What should I really be doing with the InputStream in order to parse the XML out of it? Bear in mind that I will want to re-interrogate that XML in later processing steps.


Comments (6)

昵称有卵用 2024-07-22 03:06:18

If you have an InputStream and want to use it as an XML document, why aren't you simply parsing it and passing around the Document object? If you want to persist this object, use a serializer to write it back out as text.

As I noted in my comment to Tom Hawtin, encoding is very important when dealing with XML. Rather than write a long posting here that may miss your specific situation, here's an article that I wrote.

Edit: actually, since my article doesn't specifically talk about web services, I should dive into it a little here. There are two places where the content encoding can be specified: in the XML prolog, or in the Content-Type HTTP header. According to the XML spec, the former is the one that you want to use, and it's what the parser will use. In most cases the difference doesn't matter: a web service set up by someone who doesn't know the spec will typically use text/xml without a character set specification (which is incorrect but probably not going to cause harm). If they do things correctly, they'll specify application/xml with UTF-8 encoding. However, you should verify what you're getting, so that you don't end up with some strange encoding that the parser can't handle.
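A minimal sketch of the "parse it, pass the Document around, serialize when you need text" approach, using only JDK classes. The class name `ParseAndSerialize`, the helper names, and the sample payload are hypothetical; the byte array merely stands in for the POSTed request body. Because the parser is handed raw bytes, it picks up the encoding from the XML prolog itself.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

class ParseAndSerialize {

    // Parse the stream once; the parser detects the encoding from the bytes.
    static Document parse(InputStream in) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
    }

    // Write the Document back out as text with the JDK's identity transformer.
    static String serialize(Document doc) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical payload standing in for the request body.
        byte[] payload =
                "<?xml version=\"1.0\" encoding=\"UTF-8\"?><order id=\"42\"><item>book</item></order>"
                        .getBytes(StandardCharsets.UTF_8);

        Document doc = parse(new ByteArrayInputStream(payload));
        System.out.println(doc.getDocumentElement().getAttribute("id"));
        System.out.println(serialize(doc));
    }
}
```

In a real service the `InputStream` would come from `getStream()` on the Representation object rather than from a byte array.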

终止放荡 2024-07-22 03:06:18

I would advise using the Apache Commons IO library. The IOUtils class contains many convenience methods for converting InputStreams to Strings and vice versa.

乖乖 2024-07-22 03:06:18

Generally, when we're talking about persistence, we're talking about writing to disk or other media. There's a performance hit there, and you have to think about disk space. You'll want to weigh that against the value of having that XML around for the long term.

If you're just talking about holding it in memory (which sounds like what you're asking), then you could allocate a byte array and read the whole thing into it. Then you can use ByteArrayInputStream to read and re-read the data.

The cost of that is two-fold. First, you're holding a copy in memory, and you need to weigh that against your scalability requirements. Second, parsing XML is somewhat expensive, so it's best to parse it only once, if possible, and save the result in an object.

Edit:

To allocate and read the byte array, you can often (but not always) rely on InputStream's available() method to tell you how much to allocate. You can then wrap the InputStream in a DataInputStream so that you can call readFully() to suck the whole thing into the byte array with one call.

Edit again:

Read Steen's comment below. He's right that it's a bad idea to use available() in this case.
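The in-memory option can be sketched as follows. Instead of trusting available(), this version reads until read() returns -1 and collects the bytes in a ByteArrayOutputStream, then hands out a fresh ByteArrayInputStream each time the data is needed. The class name `BufferedPayload` and its methods are hypothetical, chosen only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class BufferedPayload {
    private final byte[] bytes;

    // Drain the stream completely; available() is never consulted.
    BufferedPayload(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
        this.bytes = out.toByteArray();
    }

    // Each call returns an independent stream positioned at the start.
    InputStream newStream() {
        return new ByteArrayInputStream(bytes);
    }

    int size() {
        return bytes.length;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical payload standing in for the request body.
        InputStream request = new ByteArrayInputStream(
                "<order id=\"42\"/>".getBytes(StandardCharsets.UTF_8));

        BufferedPayload payload = new BufferedPayload(request);
        System.out.println("buffered " + payload.size() + " bytes");

        // The same data can now be read as many times as needed.
        InputStream first = payload.newStream();
        InputStream second = payload.newStream();
        System.out.println((char) first.read());
        System.out.println((char) second.read());
    }
}
```

Keeping raw bytes rather than a String also leaves the encoding decision to the XML parser, at the cost of holding a full copy of every payload in memory.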

风吹短裙飘 2024-07-22 03:06:18

If you want to use the XML multiple times, why not parse it once from the InputStream (which is the heavy work), and then hold on to the Document that is returned?
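As an illustration of "parse once, interrogate many times", the sketch below keeps the parsed Document in a small holder object and answers repeated questions with the JDK's XPath API. The class name `ParsedOrder`, the `query` method, and the sample document are assumptions made for this example, not part of the original question.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

class ParsedOrder {
    private final Document doc;
    private final XPath xpath = XPathFactory.newInstance().newXPath();

    // The expensive step: the stream is consumed and parsed exactly once.
    ParsedOrder(InputStream in) throws Exception {
        this.doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
    }

    // Cheap, repeatable lookups against the in-memory Document.
    String query(String expression) throws XPathExpressionException {
        return xpath.evaluate(expression, doc);
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical payload standing in for the request body.
        byte[] payload = "<order id=\"42\"><item>book</item><item>pen</item></order>"
                .getBytes(StandardCharsets.UTF_8);

        ParsedOrder order = new ParsedOrder(new ByteArrayInputStream(payload));
        System.out.println(order.query("/order/@id"));
        System.out.println(order.query("/order/item[2]"));
    }
}
```

Later processing steps can receive the `ParsedOrder` (or the Document itself) instead of the stream, so the one-shot nature of the InputStream stops being a concern.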

べ映画 2024-07-22 03:06:18

I think you should look into structures that are better suited to preserving encodings (i.e. more encoding-agnostic). For a low-level structure, consider byte[] (but be careful about how long you keep it in memory!), or you could try to design a data type that fits your needs.

You could read the InputStream into a ByteArrayOutputStream (using one of the read() methods) and extract the byte[] from there.

丘比特射中我 2024-07-22 03:06:18

java.io.StringReader will let you pass a String to the parser: wrap it in an InputSource.

You might want to store the data in a byte[] and then read it with ByteArrayInputStream. If it's particularly large, you might want to consider compression. The data can then be read back with GZIPInputStream, which should usually be wrapped in a BufferedInputStream.
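Both suggestions can be sketched with JDK classes only. The class name `XmlStorage` and its helper methods are hypothetical. `parseString` shows the StringReader/InputSource route; `compress` and `open` show one way to keep the payload gzip-compressed in a byte[] and read it back through a buffered GZIPInputStream.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class XmlStorage {

    // A String can be parsed by wrapping it in a StringReader and an InputSource.
    static Document parseString(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Compress the raw payload bytes for storage.
    static byte[] compress(byte[] raw) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(out)) {
            gzip.write(raw);
        }
        return out.toByteArray();
    }

    // Open a fresh, buffered, decompressing stream over the stored bytes.
    static InputStream open(byte[] compressed) throws IOException {
        return new BufferedInputStream(
                new GZIPInputStream(new ByteArrayInputStream(compressed)));
    }

    // Parse a stream and report the name of the root element.
    static String rootName(InputStream in) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
        return doc.getDocumentElement().getNodeName();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical payload standing in for the request body.
        String xml = "<order id=\"42\"><item>book</item></order>";

        System.out.println(parseString(xml).getDocumentElement().getNodeName());

        byte[] stored = compress(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(rootName(open(stored)));
    }
}
```

Note that gzip adds a fixed overhead, so for very small payloads the compressed form can be larger than the original; it tends to pay off only for larger documents.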
