RFC: What is a good way to remotely edit very large binary files?

Posted on 2024-09-07 00:51:33

I have a number of rather large binary files (fixed-length records, whose layout is described in a separate text file). Data files can be as large as 6 GB. Layout files (COBOL copybooks) are small, usually less than 5 KB.

All data files are stored on a GNU/Linux server (although they were generated on a mainframe).

I need to provide the testers with the means to edit those binary files. There is a free product called RecordEdit (http://record-editor.sourceforge.net/), but it has two severe drawbacks:

  1. It forces the testers to download the huge files through SFTP, only to upload them again every time a slight change has been made. Very inefficient.

  2. It loads the entire file into working memory, rendering it useless for all but the relatively small data files.

What I have in mind is a client/server architecture based on Java:

  • The server would run a permanent process, listening for editing-oriented requests coming from the client. Such requests would include things like:

    • return the list of available files

    • lock a certain file for editing

    • modify this data in that record

    • return the n-th page of records

    and so on…

  • The client could take any form (a desktop RCP application, which is my first candidate; ncurses on the same server; an intermediate web application…) as long as it is able to send requests to the server.
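As a rough illustration of two of the requests listed above ("return the n-th page of records" and "modify this data in that record"), here is a minimal sketch using a `FileChannel` with positional reads and writes, so only the requested bytes are touched and the file is never loaded whole. The class name `RecordPager`, its method names, and the idea of passing the record length as a plain parameter (rather than deriving it from the copybook) are all assumptions made for the example, not part of any existing tool.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: pages through and patches a file of fixed-length records. */
final class RecordPager {
    private RecordPager() {}

    /** Returns the records of page pageIndex (0-based); the last page may be short. */
    static List<byte[]> readPage(Path file, int recordLength, int pageSize, long pageIndex)
            throws IOException {
        if (pageIndex < 0) {
            throw new IllegalArgumentException("pageIndex must be >= 0");
        }
        List<byte[]> records = new ArrayList<>();
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            long totalRecords = ch.size() / recordLength;
            long first = pageIndex * pageSize;
            long last = Math.min(first + pageSize, totalRecords);
            for (long r = first; r < last; r++) {
                ByteBuffer buf = ByteBuffer.allocate(recordLength);
                long pos = r * recordLength;
                // A positional read may return fewer bytes than requested, so loop.
                while (buf.hasRemaining()) {
                    if (ch.read(buf, pos + buf.position()) < 0) {
                        throw new IOException("Unexpected end of file");
                    }
                }
                records.add(buf.array());
            }
        }
        return records;
    }

    /** Overwrites value.length bytes inside one record, in place. */
    static void writeField(Path file, int recordLength, long recordIndex,
                           int fieldOffset, byte[] value) throws IOException {
        if (fieldOffset < 0 || fieldOffset + value.length > recordLength) {
            throw new IllegalArgumentException("field does not fit inside the record");
        }
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.WRITE)) {
            if (recordIndex < 0 || (recordIndex + 1) * recordLength > ch.size()) {
                throw new IndexOutOfBoundsException("no such record: " + recordIndex);
            }
            ByteBuffer buf = ByteBuffer.wrap(value);
            long pos = recordIndex * recordLength + fieldOffset;
            while (buf.hasRemaining()) {
                pos += ch.write(buf, pos);
            }
        }
    }
}
```

A server process could call `readPage` to answer a paging request and `writeField` to apply an edit; locking and decoding the bytes according to the copybook are left out of this sketch.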

I've been exploring NIO (because of its buffers) and MINA (because of its protocol transparency) in order to implement the scheme. However, before going any further, I would like to collect your expert opinions.
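For what it's worth, one way NIO buffers could fit here is memory-mapping only a window of the file, one page of records at a time, instead of the whole 6 GB. The sketch below is an assumption about how that might look, not a tested design; `RecordWindow` and `mapPage` are made-up names. Note that a single `FileChannel.map` call cannot map more than `Integer.MAX_VALUE` bytes, so a file this large has to be handled in windows anyway.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper: maps one page of fixed-length records into memory. */
final class RecordWindow {
    private RecordWindow() {}

    /**
     * Maps the records of page pageIndex (0-based) read/write.
     * The mapping stays valid after the channel is closed.
     */
    static MappedByteBuffer mapPage(Path file, int recordLength, int pageSize, long pageIndex)
            throws IOException {
        if (pageIndex < 0) {
            throw new IllegalArgumentException("pageIndex must be >= 0");
        }
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            long usableBytes = (ch.size() / recordLength) * recordLength;
            long start = pageIndex * pageSize * recordLength;
            long length = Math.min((long) pageSize * recordLength, usableBytes - start);
            if (length <= 0) {
                throw new IndexOutOfBoundsException("no such page: " + pageIndex);
            }
            return ch.map(FileChannel.MapMode.READ_WRITE, start, length);
        }
    }
}
```

To overwrite the second record of a page, a caller would do something like `window.position(recordLength); window.put(newBytes); window.force();`, where `force()` asks the operating system to write the changed pages back to the file.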

Is mine a reasonable way to frame the problem?

Is it feasible to do it using the language and frameworks I'm thinking of? Is it convenient?

Do you know of any patterns, blueprints, success cases or open projects that resemble or are related to what I'm trying to do?

Comments (3)

她说她爱他 2024-09-14 00:51:33

As I see it, the tricky thing here is decoding the files on the server. Once you've written that, it should be pretty easy.

I would suggest that, whatever you use on the client side, it should basically upload a 'diff' of the person's changes.
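To make the 'diff' idea a bit more concrete, here is one possible shape for it: a list of small changes, each naming a record, an offset inside that record, and the replacement bytes, which the server applies in place. This is only a sketch under my own assumptions; the names `RecordDiff` and `Change` and the layout of a change are invented for the example.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Path;
import java.util.List;

/** Hypothetical server-side helper that applies a client's diff to a record file. */
final class RecordDiff {
    private RecordDiff() {}

    /** One change: replacement bytes for a slice of a single record. */
    static final class Change {
        final long recordIndex;
        final int offset;
        final byte[] bytes;

        Change(long recordIndex, int offset, byte[] bytes) {
            this.recordIndex = recordIndex;
            this.offset = offset;
            this.bytes = bytes.clone();
        }
    }

    /** Validates every change first, then writes them all in place. */
    static void apply(Path file, int recordLength, List<Change> changes) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw")) {
            long totalRecords = raf.length() / recordLength;
            // Check everything up front so a bad change cannot leave a half-applied diff.
            for (Change c : changes) {
                boolean badRecord = c.recordIndex < 0 || c.recordIndex >= totalRecords;
                boolean badSlice = c.offset < 0 || c.offset + c.bytes.length > recordLength;
                if (badRecord || badSlice) {
                    throw new IllegalArgumentException(
                            "change outside record " + c.recordIndex);
                }
            }
            for (Change c : changes) {
                raf.seek(c.recordIndex * recordLength + c.offset);
                raf.write(c.bytes);
            }
        }
    }
}
```

With this shape, a tester who edits a few fields sends a few dozen bytes over the network rather than re-uploading the whole file.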

Might it make sense to make something that acts like a database (or use an existing database) for this data? Or is there just too much of it?

Depending on how many people need to do this, the quick-and-dirty solution is to run the program via X forwarding. That eliminates a number of the issues, as long as the server has quite a lot of free RAM.

神也荒唐 2024-09-14 00:51:33

Is mine a reasonable way to frame the problem?

IMO, yes.

Is it feasible to do it using the language and frameworks I'm thinking of?

I think so. But there are other alternatives. For example:

  • Put the records into a database, and access by a key consisting of a filename + a record number. Could be a full RDBMS, or a more lightweight solution.

  • Implement as a RESTful web service with a UI implemented in HTML + javascript.

  • Implement using a scalable distributed file-system.
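To illustrate the first alternative (access by a key made of filename + record number), here is a small sketch of such a composite key. A sorted in-memory map stands in for the database purely so the example runs on its own; in a real setup the same key would presumably become the primary key of a table or the key of a lightweight key-value store. All names here (`RecordStore`, `Key`) are invented for the example.

```java
import java.util.Collection;
import java.util.Comparator;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical store keyed by (file name, record number); a TreeMap stands in for a database. */
final class RecordStore {
    /** Composite key: which file, and which record inside it. */
    static final class Key {
        final String fileName;
        final long recordNumber;

        Key(String fileName, long recordNumber) {
            this.fileName = fileName;
            this.recordNumber = recordNumber;
        }
    }

    // Order by file name first, then by record number, like a composite index.
    private static final Comparator<Key> ORDER =
            Comparator.comparing((Key k) -> k.fileName)
                      .thenComparingLong(k -> k.recordNumber);

    private final NavigableMap<Key, byte[]> rows = new TreeMap<>(ORDER);

    void put(String fileName, long recordNumber, byte[] record) {
        rows.put(new Key(fileName, recordNumber), record.clone());
    }

    /** Returns a copy of the record, or null if it does not exist. */
    byte[] get(String fileName, long recordNumber) {
        byte[] record = rows.get(new Key(fileName, recordNumber));
        return record == null ? null : record.clone();
    }

    /** Records [from, to) of one file in order, like a range query on the key. */
    Collection<byte[]> range(String fileName, long from, long to) {
        return rows.subMap(new Key(fileName, from), true,
                           new Key(fileName, to), false).values();
    }
}
```

Because the key sorts by file name and then record number, fetching "page n" of a file is a single range lookup.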

Also, from your description there doesn't seem to be a pressing need for a highly scalable, transport-independent layer ... unless you need to support hundreds of simultaneous users.

Is it convenient?

Convenient for whom? If you mean you, the developer, it depends on whether you are already familiar with those frameworks.

九歌凝 2024-09-14 00:51:33

Have you considered using a distributed file system like OpenAFS? That should be able to handle very large files. Then you can write a client-side app for editing the files as if they are local.
