Large amounts of data - what is the best way to send them?
We have this scenario:
A server that holds the data, and a client component that needs it.
Two types of data are stored on the server:
- some information - basically just a couple of strings
- binary data
We have a problem with transferring the binary data. Both sides are written in Java 5, so we have a couple of options...
A Web Service is not the best solution because of speed, memory, etc.
So, what would you prefer?
I would like to avoid low-level socket connections if possible...
Thanks in advance,
Vitek
Comments (8)
I think the only way to move LARGE amounts of data is going to be raw socket access.
With most other methods you will hit out-of-memory errors on large files.
Socket handling is really pretty straightforward in Java, and it lets you stream the data without loading the entire file into memory (loading it all is what happens behind the scenes when you don't do your own buffering).
Using this strategy I managed to build a system that allowed the transfer of arbitrarily large files (I was using a 7+ GB DVD image to test the system) without hitting memory issues.
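To make the buffering idea concrete, here is a minimal sketch of a fixed-buffer copy over a socket. It is not the system described above, just an illustration: the class name `StreamCopyDemo`, the 8 KB buffer size, and the loopback setup are all assumptions, and in-memory streams stand in for the file streams you would use in practice. It sticks to Java 5 syntax.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;

public class StreamCopyDemo {

    // Copies everything from in to out through one fixed-size buffer,
    // so memory use stays constant no matter how much data passes through.
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192]; // assumed size; tune as needed
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        out.flush();
        return total;
    }

    // Sends payload over a loopback socket and returns what the receiver read.
    static byte[] roundTrip(final byte[] payload) throws Exception {
        final ServerSocket server = new ServerSocket(0); // any free port
        Thread sender = new Thread(new Runnable() {
            public void run() {
                try {
                    Socket s = server.accept();
                    try {
                        // In a real server this would be a FileInputStream.
                        copy(new ByteArrayInputStream(payload), s.getOutputStream());
                    } finally {
                        s.close(); // closing signals end-of-stream to the client
                    }
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        });
        sender.start();

        ByteArrayOutputStream received = new ByteArrayOutputStream();
        Socket client = new Socket("127.0.0.1", server.getLocalPort());
        try {
            // In a real client this would be a FileOutputStream.
            copy(client.getInputStream(), received);
        } finally {
            client.close();
        }
        sender.join();
        server.close();
        return received.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        byte[] data = new byte[1000000];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i;
        }
        byte[] back = roundTrip(data);
        System.out.println("received " + back.length + " bytes");
    }
}
```

The only part that matters for memory is `copy`: whatever the size of the source, at most one buffer's worth of data is held at a time.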
Take a look at the W3C standard MTOM for transferring binary data as part of a SOAP service. It is efficient in that it sends the data as raw binary rather than as Base64 text, and it can also send it in buffered chunks. It will also interoperate with other clients or providers:
How to do MTOM Interop
Server Side - Sending Attachments with SOAP
You might want to have a look at protobuf, the library that Google uses to exchange data. It's very efficient and extensible. On a side note, never underestimate the bandwidth of a station wagon full of 1TB hard disks!
I've tried converting the binary data to Base64 and then sending it over via SOAP calls and it's worked for me. I don't know if that counts as a web service, but if it does, then you're pretty much stuck with sockets.
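One thing worth keeping in mind with the Base64 route is the size cost: every 3 raw bytes become 4 text characters, so the payload grows by about a third before any SOAP envelope is added. A small sketch of that arithmetic follows. Note the assumptions: it uses `java.util.Base64`, which only exists from Java 8 onward (on Java 5 you would need a third-party codec), and the class and method names are made up for illustration.

```java
import java.util.Base64;

public class Base64Overhead {

    // Length of the padded Base64 text for n raw bytes:
    // 4 characters for every group of 3 bytes, rounding the last group up.
    static long encodedLength(long n) {
        return 4 * ((n + 2) / 3);
    }

    public static void main(String[] args) {
        byte[] raw = new byte[3000000];
        // Requires Java 8+; not available on the Java 5 runtime from the question.
        String text = Base64.getEncoder().encodeToString(raw);
        System.out.println(raw.length + " bytes -> " + text.length() + " characters");
        System.out.println("predicted: " + encodedLength(raw.length));
    }
}
```

Encoding this way also means the whole payload exists in memory as both a byte array and a string, which is fine for small blobs but works against you on very large ones.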
Some options:
- You could use RMI, which will hide the socket-level stuff for you and can perhaps gzip the data... but if the connection fails it won't resume for you. You will probably run into memory issues too.
- Just send the data over HTTP with a binary MIME type (again, perhaps configuring gzip on the web server). Same problem with resuming.
- Spawn something like wget (I think this can resume).
- If the client already has the data (a previous version of it), rsync would copy only the changes.
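For the plain HTTP option, resuming is possible if you handle it yourself and the server supports range requests. Here is a rough sketch of that idea using only `HttpURLConnection`. The class name `ResumableDownload` and its helper are hypothetical, and it assumes the server answers a `Range` header with status 206; a server that ignores the header answers 200 and the file is simply downloaded again from the start.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class ResumableDownload {

    // Builds the Range header value asking for everything from offset onward.
    static String rangeFrom(long offset) {
        return "bytes=" + offset + "-";
    }

    // Downloads url into target, continuing from target's current length
    // if a partial file is already there.
    static void download(URL url, File target) throws IOException {
        long have = target.exists() ? target.length() : 0;
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        if (have > 0) {
            conn.setRequestProperty("Range", rangeFrom(have));
        }
        int status = conn.getResponseCode();
        // 206: server honoured the range, so append.
        // 200: server is sending the whole file, so overwrite.
        boolean append = status == HttpURLConnection.HTTP_PARTIAL;
        if (!append && status != HttpURLConnection.HTTP_OK) {
            // Includes 416, which a server may send if the file is already complete.
            throw new IOException("Unexpected HTTP status " + status);
        }
        InputStream in = conn.getInputStream();
        OutputStream out = new FileOutputStream(target, append);
        try {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        } finally {
            out.close();
            in.close();
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.out.println("usage: ResumableDownload <url> <file>");
            return;
        }
        download(new URL(args[0]), new File(args[1]));
    }
}
```

Calling `download` again after a dropped connection picks up where the partial file ends, which is roughly what wget's continue mode does for you.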
What about good old FTP, affordable and robust? You can, for example, easily embed an FTP server in your server-side components and then code an FTP client. FTP was born exactly for that (it is the File Transfer Protocol, after all), while SOAP with attachments was not designed with this in mind and can perform very badly.
For example, you could have a look at:
http://mina.apache.org/ftpserver/
But there are other implementations out there; Apache MINA is just the first one I can recall.
Good luck & regards
Is sneakernet an option? :P
RMI is well known for its ease-of-use and its memory leaks. Be warned. Depending on just how much data we're talking about, sneakernet and sockets are both good options.
Consider GridFTP as your transport layer. See also this question.