What are some "mainstream" lightweight alternatives to storing files in .csv format?

Posted 2024-11-02 11:40:31, 653 characters, 6 views, 0 comments

I'm on a project which heavily favors the use of .csv files for data storage. I see many issues with using .csv, especially for storing relational data. Parsing .csv is generally a pain, particularly when using ad-hoc column assignments.

I've advocated the use of XML and minimal databases such as SQLite, but I'm looking for "faster, better, cheaper" alternatives.

What are some other, "mainstream" lightweight alternatives to .csv files?

Also, what about CouchDB? How does it compare to SQLite in terms of being lightweight?

EDIT: I missed it. This question has been asked before.

Comments (5)

疯到世界奔溃 2024-11-09 11:40:31

I would argue there is no direct replacement for a CSV file. CSV is a flat-file format in which fields are identified by position. It doesn't matter if you replace commas with pipes or whatnot. It's the same thing with slightly different rules.
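
To illustrate the point about delimiters, here is a minimal sketch using Python's standard csv module. The sample rows and the helper name round_trip are made up for this example; they are not from the original answer.

```python
import csv
import io

# Hypothetical sample data: a header row plus two records.
ROWS = [
    ["id", "name", "city"],
    ["1", "Ada", "London"],
    ["2", "Grace", "New York"],
]


def round_trip(rows, delimiter):
    """Write rows with the given delimiter, then parse them back."""
    buffer = io.StringIO()
    csv.writer(buffer, delimiter=delimiter).writerows(rows)
    text = buffer.getvalue()
    buffer.seek(0)
    parsed = list(csv.reader(buffer, delimiter=delimiter))
    return text, parsed


comma_text, comma_rows = round_trip(ROWS, ",")
pipe_text, pipe_rows = round_trip(ROWS, "|")

# The text on disk differs, but both variants parse to the same positional
# rows: fields are still addressed by index and no relations are expressed.
print(comma_rows == pipe_rows == ROWS)
```

Either file carries exactly the same information, which is why changing the separator is not really an alternative to CSV.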

With that being said, I often opt for SQLite when the data is in my control.

Using SQLite consistently means using the same tooling everywhere. It can serve as either an ad-hoc store or a relational model, has a "step up" path to a standalone RDBMS, provides a query language (DQL) "for free" (which is a big plus for me), etc. Unless space is a real issue or there is no data-access support on the platform, why not? (Modern Firefox also uses SQLite.)
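
As a rough sketch of what that looks like in practice, the following uses Python's built-in sqlite3 module. The customers/orders schema and the sample values are invented for illustration, and the foreign-key check assumes a SQLite build with foreign-key support (the usual case).

```python
import sqlite3

# An in-memory database keeps the sketch self-contained; pass a file path
# (for example "data.db", a made-up name) to persist it as a single file.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical schema: the relation that a pair of CSV files could only imply.
conn.executescript(
    """
    CREATE TABLE customers (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        total       REAL NOT NULL
    );
    """
)

conn.executemany(
    "INSERT INTO customers (id, name) VALUES (?, ?)",
    [(1, "Ada"), (2, "Grace")],
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(1, 10.0), (1, 5.5), (2, 20.0)],
)
conn.commit()

# The query language comes "for free": join and aggregate in one statement.
totals = conn.execute(
    """
    SELECT c.name, SUM(o.total)
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.id
    ORDER BY c.name
    """
).fetchall()
print(totals)

# With foreign keys enabled, an order for a missing customer is rejected.
try:
    conn.execute(
        "INSERT INTO orders (customer_id, total) VALUES (?, ?)", (99, 1.0)
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)
```

The same SQL would carry over largely unchanged to a standalone RDBMS, which is the "step up" path mentioned above.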

(There are also a number of object databases out there, such as DB4O, or even simpler key/value hierarchical stores, etc. I'm not trying to say SQLite is the only way to get relationships in a micro/embedded database.)

One downside of SQLite compared with, say, XML is that special tooling (the SQLite library or an adapter) is required. XML, while not the most human-friendly format, can be edited just fine in Notepad. Additionally, XML has no extra overhead (fragmentation or structure) beyond the markup and data itself, and it is generally quite amenable to compression. There are also many libraries that map an entire object graph to XML (and thus maintain relationships), so that might be a nice feature.
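
A small sketch of both points (relationships expressed by nesting, and compression) using only Python's standard library. The element and attribute names are invented for the example, and the compression ratio depends entirely on the data; tiny documents may not shrink at all.

```python
import gzip
import xml.etree.ElementTree as ET

# Hypothetical object graph: 200 customers, each owning three orders.
# Nesting expresses the parent/child relation directly in the markup.
root = ET.Element("customers")
for i in range(200):
    node = ET.SubElement(root, "customer", name=f"customer-{i}")
    for j in range(3):
        ET.SubElement(node, "order", total=str(10 * j + i))

xml_bytes = ET.tostring(root, encoding="utf-8")

# Repetitive markup tends to compress well.
compressed = gzip.compress(xml_bytes)
print(len(xml_bytes), len(compressed))

# Read it back: decompress, parse, and walk the relation again.
parsed = ET.fromstring(gzip.decompress(compressed))
orders_per_customer = {
    customer.get("name"): [order.get("total") for order in customer.findall("order")]
    for customer in parsed.findall("customer")
}
print(orders_per_customer["customer-0"])
```

Note that attribute values come back as strings; XML itself carries no type information unless a schema or a mapping library adds it.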

Other formats like JSON are also out there -- but if the format is opaque then it doesn't really make a difference over XML (it's more of a matter of tooling support).

So... "it depends".

绝不放开 2024-11-09 11:40:31

It looks like YAML is relatively compact compared to formats such as XML, but slightly more descriptive than JSON (YAML is essentially a superset of JSON). It's another candidate I'll consider.

活雷疯 2024-11-09 11:40:31

It's all about use-case.

My rule of thumb: use SQLite if there are dependencies or relations between two pieces of data; use CSV (or some other "flat" format) if it's just flat data files. The simplest thing that just works is often the most reliable solution as well.

(Note: Ensure the CSV is well formed. Nobody likes having to hack around bad CSV implementations.)
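
As a minimal illustration of what "well formed" means here, the sketch below compares naive string joining with Python's standard csv module. The sample record is made up; it deliberately contains a comma, double quotes, and a line break.

```python
import csv
import io

# Hypothetical record containing a comma, double quotes, and a line break.
record = ["42", 'Smith, "Jo"', "line one\nline two"]

# Naive joining produces text that cannot be split back unambiguously.
naive_line = ",".join(record)
naive_fields = naive_line.split(",")

# The csv module quotes and escapes fields as needed when writing.
buffer = io.StringIO(newline="")
csv.writer(buffer).writerow(record)
well_formed = buffer.getvalue()

# Reading it back with the same module recovers the original fields.
parsed = next(csv.reader(io.StringIO(well_formed, newline="")))

print(len(record), len(naive_fields))
print(parsed == record)
```

Using a real CSV writer and reader on both ends, rather than join and split, avoids most of the hacks the note warns about.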

心头的小情儿 2024-11-09 11:40:31

HDF5 is a good choice for storing large tabular datasets, if you do not require concurrent writes.

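In Python, pandas + PyTables are very easy to use.
Example from the pandas documentation (an older release: the session assumes that NumPy is imported as np and that HDFStore, Series, DataFrame, Panel, date_range and randn are already in the namespace; Panel was removed in pandas 0.25, so the wp lines will not run on current versions):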

In [259]: store = HDFStore('store.h5')

In [260]: print(store)
<class 'pandas.io.pytables.HDFStore'>
File path: store.h5
Empty

Objects can be written to the file just like adding key-value pairs to a dict:

In [261]: np.random.seed(1234)

In [262]: index = date_range('1/1/2000', periods=8)

In [263]: s = Series(randn(5), index=['a', 'b', 'c', 'd', 'e'])

In [264]: df = DataFrame(randn(8, 3), index=index,
   .....:                columns=['A', 'B', 'C'])
   .....: 

In [265]: wp = Panel(randn(2, 5, 4), items=['Item1', 'Item2'],
   .....:            major_axis=date_range('1/1/2000', periods=5),
   .....:            minor_axis=['A', 'B', 'C', 'D'])
   .....: 

# store.put('s', s) is an equivalent method
In [266]: store['s'] = s

In [267]: store['df'] = df

In [268]: store['wp'] = wp

# the type of stored data
In [269]: store.root.wp._v_attrs.pandas_type
Out[269]: 'wide'

In [270]: store
Out[270]: 
<class 'pandas.io.pytables.HDFStore'>
File path: store.h5
/df            frame        (shape->[8,3])  
/s             series       (shape->[5])    
/wp            wide         (shape->[2,5,4])

顾北清歌寒 2024-11-09 11:40:31

XML is designed to be mainstream and relatively "lightweight". JSON is another popular choice, but it is much better suited to object modeling than to data storage.

MySQL is a good option if you need relational querying capabilities.
