Good conventions for embedding a schema in a flat file
We receive lots of data as flat files: delimited or fixed-length records. It's sometimes hard to find out what the files actually contain.
Are there any well-established practices for embedding a file's schema at the beginning or end of the file to make it self-explanatory?
To give an idea, imagine something like this:
<data name="test" records="2" type="fixed">
<field name="foo" start="0" length="2" type="numeric"/>
<field name="bar" start="2" length="4" type="text"/>
</data>
11test
12ing
We would parse the XML at the beginning of the file and use it to read the records.
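To make the idea concrete, here is a minimal Python sketch of such a reader. It is only an illustration under stated assumptions: the header is written as well-formed XML (quoted attribute values, self-closing field elements), the header ends at a line containing only `</data>`, and each record sits on its own line. The function name `read_self_describing` and the `SAMPLE` text are hypothetical, not part of any established format.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample: an XML schema header followed by fixed-length records.
SAMPLE = """<data name="test" records="2" type="fixed">
<field name="foo" start="0" length="2" type="numeric"/>
<field name="bar" start="2" length="4" type="text"/>
</data>
11test
12ing
"""


def read_self_describing(stream):
    """Parse the embedded header, then slice each record according to it."""
    # Collect header lines up to and including the closing tag (assumed layout).
    header_lines = []
    for line in stream:
        header_lines.append(line)
        if line.strip() == "</data>":
            break
    schema = ET.fromstring("".join(header_lines))

    # Each field is described by name, start offset, length, and type.
    fields = [
        (f.get("name"), int(f.get("start")), int(f.get("length")), f.get("type"))
        for f in schema.findall("field")
    ]

    # The rest of the stream holds the records, one per line.
    records = []
    for line in stream:
        line = line.rstrip("\r\n")
        if not line:
            continue
        record = {}
        for name, start, length, ftype in fields:
            raw = line[start:start + length].strip()
            record[name] = int(raw) if ftype == "numeric" else raw
        records.append(record)

    # Use the declared record count as a cheap integrity check.
    expected = int(schema.get("records"))
    if len(records) != expected:
        raise ValueError(f"expected {expected} records, found {len(records)}")
    return records


rows = read_self_describing(io.StringIO(SAMPLE))
```

The declared `records` count doubles as a sanity check against truncated files, which is one practical benefit of carrying the schema with the data.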
As far as I'm aware, no, or at least nothing widely adopted.
The only widely accepted convention I know of is making the first row of the data file the column names. That works for delimited records; for fixed-length records it's harder, especially if your data can contain multiple record types (which I've found to be far more likely with fixed-length than with delimited files).
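For the delimited case, the header-row convention is simple to consume. As a rough sketch, Python's standard `csv.DictReader` treats the first row as the field names; the sample data and the helper name `read_with_header` below are made up for illustration.

```python
import csv
import io

# Hypothetical delimited file whose first row carries the column names.
SAMPLE_CSV = "foo,bar\n11,test\n12,ing\n"


def read_with_header(stream, delimiter=","):
    """Return the column names and the rows keyed by those names."""
    reader = csv.DictReader(stream, delimiter=delimiter)
    names = reader.fieldnames  # reads the header row on first access
    return names, list(reader)


names, rows = read_with_header(io.StringIO(SAMPLE_CSV))
```

Note that this only recovers names: every value comes back as a string, so types and lengths still have to be documented somewhere else.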
From where I sit, I'd suggest that you can't really embed the definition in the file. I'm assuming you're getting the data from external sources, so you're unlikely to get help from them. Even if you do, you immediately create new challenges: you can't, for example, easily open the files in Excel if necessary.
Thinking a bit laterally, you could, if using XML, potentially embed the file in the definition (as one big lump of CDATA). This is a slightly more practical solution, as it puts a wrapper around your external data rather than asking that the data itself be modified. I'm not sure how practical this is, but it feels better to me than the other way round.
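A rough sketch of what reading such a wrapper might look like in Python follows. The element name `records`, the overall layout, and the function `read_wrapped` are assumptions made up for this example; the original file content sits untouched inside the CDATA section.

```python
import xml.etree.ElementTree as ET

# Hypothetical wrapper: the definition encloses the unmodified flat file.
WRAPPED = """<data name="test" type="fixed">
  <field name="foo" start="0" length="2" type="numeric"/>
  <field name="bar" start="2" length="4" type="text"/>
  <records><![CDATA[11test
12ing
]]></records>
</data>"""


def read_wrapped(xml_text):
    """Parse the wrapper, then slice the CDATA payload using its field list."""
    root = ET.fromstring(xml_text)
    fields = [
        (f.get("name"), int(f.get("start")), int(f.get("length")), f.get("type"))
        for f in root.findall("field")
    ]
    # The parser hands back the CDATA content as ordinary element text.
    payload = root.findtext("records", default="")
    rows = []
    for line in payload.splitlines():
        if not line.strip():
            continue
        row = {}
        for name, start, length, ftype in fields:
            raw = line[start:start + length].strip()
            row[name] = int(raw) if ftype == "numeric" else raw
        rows.append(row)
    return rows


rows = read_wrapped(WRAPPED)
```

One caveat with this approach: a payload that itself contains the sequence `]]>` would end the CDATA section early, so the wrapping step would need to split or escape it.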