Is there a finer granularity than LOAD/NEXT for reading structured data?

Posted on 2024-09-30 05:30:04

Imagine that I have a long file of Rebol-formatted data, with a million lines, that look something like

REBOL []

[
    [employee name: {Tony Romero} salary: $10,203.04]
    [employee name: {Marcus "Marco" Marcami} salary: default]
    [employee name: {Serena Derella} salary: ($10,000 + $203.04)]

...

    [employee name: {Stacey Christie} salary: (10% * $102,030.40)]
]

If the enclosing block wasn't there, I could use LOAD/NEXT to read through the employee items one at a time (as opposed to parsing the entire file into structured data with LOAD). Is there any way to do something similar if the enclosing block is there?

What if I wanted to go back to a previously visited item? Could there be a "structural seek"?

Is there a viable database solution for this kind of Rebol-structured data, one that might even permit random-access insertions?

Comments (3)

水溶 2024-10-07 05:30:04

I recall that it was you who proved that this should be doable in PARSE? ;-)

Nevertheless, to give you a useful answer: the code I wrote for the link text can be described exactly as parsing (in essence) REBOL without using the default LOAD/NEXT when something else is needed. So, have a look, read the documentation, run the tests, write some tests, and if you have more questions, just ask.

内心激荡 2024-10-07 05:30:04

If you are happy to tweak your file format a little, so that it has one record per line with no enclosing block and no REBOL header:

employee-name: {Tony Romero} salary: $10203.04
employee-name: {Marcus "Marco" Marcami} salary: 'default
employee-name: {Serena Derella} salary: ($10000 + $203.04)
employee-name: {Stacey Christie} salary: (10% * $102030.40)

Then....

data: read/lines %data-file.txt

....gets you a block of unloaded strings

One way to work with them is like this:

foreach record data [
    record: make object! load/all record
    probe record
]

I had to tweak your data format too to make it easily loadable by REBOL:

  • employee-name rather than employee name
  • $10203.04 rather than $10'203.04
  • 10% -- only works with REBOL3

If you can't tweak the data format like that, you could always do some edits on each string prior to LOAD/ALL to normalise it for REBOL.
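
As a minimal sketch of that kind of per-line edit (the sample line is a hypothetical input in the question's original style, and only the `employee name:` fix is shown; this assumes Rebol 2 behaviour for `replace` and `load/all`):

```rebol
REBOL []

; Hypothetical raw line, assumed to look like the question's data.
line: copy {employee name: {Tony Romero} salary: $10'203.04}

; Join the two words into one set-word so the loaded block
; becomes a valid object spec.
replace line "employee name:" "employee-name:"

; LOAD/ALL always returns a block; MAKE OBJECT! then evaluates it.
record: make object! load/all line

probe record/employee-name
probe record/salary
```

A bare word such as `default` would still need to be turned into `'default` (or otherwise handled) before `make object!` evaluates the block, so real data may need more rules than this one.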

韶华倾负 2024-10-07 05:30:04

Sunanda's answer does not work well here, because a record can span multiple lines.
You can use something like this instead:

data: {REBOL []

[
    [employee name: {Tony Romero} salary: $10'203.04]
    [employee name: {Marcus "Marco" Marcami} salary: default]
    [employee name: {Serena Derella} salary: ($10'000 + $203.04)]
]}

unless all [
    set [value data] load/next data
    value = 'REBOL
][  print "Not a REBOL data file!" halt ]
set [header data] load/next data
print ["data-file-header:" mold header]
data: find/tail data #"["

attempt [
    ;you must use attempt as there will be at least one error at the end of file!
    ;** Syntax Error: Missing [ at end-of-block
    indexes: copy []
    while [
        append indexes data
        set [loaded-row data] load/next data
        data
    ][
        probe loaded-row
    ]

]
print "done"

remove back tail indexes ;removes the last erroneous position

foreach data-at-pos reverse indexes [
    probe first load/next data-at-pos
]

So the output would be:

[employee name: "Tony Romero" salary: $10203.04]
[employee name: {Marcus "Marco" Marcami} salary: default]
[employee name: "Serena Derella" salary: ($10000.00 + $203.04)]
done
[employee name: "Serena Derella" salary: ($10000.00 + $203.04)]
[employee name: {Marcus "Marco" Marcami} salary: default]
[employee name: "Tony Romero" salary: $10203.04]
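
The saved positions can also serve as a rough "structural seek": any row can be re-read by calling LOAD/NEXT again at its stored string position. The following is only a sketch under the same assumptions as the code above (Rebol 2, where `load/next` returns a block of the loaded value and the remaining string); the two-row sample data and the word `pos` are illustrative, not part of the original answer.

```rebol
REBOL []

data: {[
    [employee name: {Tony Romero} salary: $10'203.04]
    [employee name: {Serena Derella} salary: ($10'000 + $203.04)]
]}

; Skip the opening bracket of the enclosing block.
data: find/tail data #"["

; Remember the string position at which each row starts.
indexes: copy []
attempt [
    ; LOAD/NEXT is expected to raise a syntax error at the closing "]",
    ; which ATTEMPT catches and which ends the loop.
    while [not tail? data] [
        pos: data
        set [row data] load/next data
        ; Only positions that loaded successfully are stored.
        append indexes pos
    ]
]

; Seek straight to the second row without re-reading the first.
probe first load/next pick indexes 2
```

Because a position is stored only after its row has loaded, there is no trailing erroneous entry to remove afterwards. The positions are plain string series, so this gives random read access, not random-access insertion: inserting text into the string would shift every later position.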