I am trying to write a small program that searches a key-value type structure. My goal is to find the FASTEST possible approach for these key-value lookups.
I would prefer to use C# for this program, unless another language gives me a significant advantage. Another limitation I am imposing is that everything has to be on the same computer. I don't want to use an Oracle or SQL Server database, because I believe the other options will be much faster. The data is mostly read and rarely written. Whenever there are changes or updates to the data, a new set is created, and it is OK if writing the data takes time.
Assumptions:
The data is sorted in a numeric order.
The structure is as simple as this:
Char3 file: (This file will only store 3 character keys)
Key|Value
100|2,5,6,7:9:3,4,5:3,4,5:2,5,6,7
999|2,5,6,7:9:3,4:3:2,5
Char5 file: (This file will only store 5 character keys)
Key|Value
A1000|2,5,6,7:9:3,4,5:3,4,5:2,5,6,7
Char3 and Char5 follow the same storage structure but have different types of keys. Within a given file, however, all keys are the same length.
I have multiple files like these, and each follows the same structure. The only variation is the key length in each file.
The task: given a set of 1-200 keys (of variable lengths), find all the data related to each key.
I am generating this data from a database and thus can create the data in any format.
For the FileStream test I am going to pad each line for a given file and then use FileStream.Seek to quickly jump to a given location based on the padding.
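A minimal sketch of that padded-record idea (the 64-byte record length, ASCII encoding, and file layout here are illustrative assumptions, not the actual format): because the keys are sorted and every line in a given file is padded to the same byte length, a binary search over `FileStream.Seek` offsets can find a key without scanning the file.

```csharp
using System;
using System.IO;
using System.Text;

class FixedWidthLookup
{
    // Assumption: every line is padded to RecordLength bytes,
    // including the newline, so record i starts at byte i * RecordLength.
    const int RecordLength = 64;

    static string ReadRecord(FileStream fs, long index)
    {
        var buffer = new byte[RecordLength];
        fs.Seek(index * RecordLength, SeekOrigin.Begin);
        fs.Read(buffer, 0, RecordLength); // sketch: assumes a full record is read
        return Encoding.ASCII.GetString(buffer).TrimEnd(' ', '\r', '\n');
    }

    // Binary search over the sorted, fixed-width records.
    static string FindValue(FileStream fs, string key)
    {
        long lo = 0, hi = fs.Length / RecordLength - 1;
        while (lo <= hi)
        {
            long mid = lo + (hi - lo) / 2;
            string line = ReadRecord(fs, mid);
            string recordKey = line.Substring(0, line.IndexOf('|'));
            int cmp = string.CompareOrdinal(recordKey, key);
            if (cmp == 0) return line.Substring(line.IndexOf('|') + 1);
            if (cmp < 0) lo = mid + 1; else hi = mid - 1;
        }
        return null; // key not present
    }
}
```

If you instead keep a line-number index on the side, the binary search collapses to a single `Seek` per key, which is presumably what the padding is meant to enable.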
What I want to do is find out which of these approaches would be the fastest:
- FileStream - I will eventually also look at memory-mapped files. (Open to other options)
- Embedded SQL - SQLite (Open to other options)
- NoSql - ?? (Looking for suggestions)
My question is what I should be using in each of these categories for a proper comparison. For example, if I was using FileStream without FileStream.Seek, then it would not be a proper comparison.
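For the embedded-SQL category, the analogous "proper" approach would be a parameterized lookup against a primary-key (indexed) column, so SQLite does a B-tree search rather than a table scan. A sketch, assuming the Microsoft.Data.Sqlite NuGet package and a hypothetical table `Char3(Key TEXT PRIMARY KEY, Value TEXT)` created during the data-generation step:

```csharp
using Microsoft.Data.Sqlite;

class SqliteLookup
{
    // Sketch only: one lookup per key against an indexed column.
    // Reuse the open connection and command across keys for fair timing.
    static string FindValue(SqliteConnection conn, string key)
    {
        using (var cmd = conn.CreateCommand())
        {
            cmd.CommandText = "SELECT Value FROM Char3 WHERE Key = @key";
            cmd.Parameters.AddWithValue("@key", key);
            return (string)cmd.ExecuteScalar(); // null if the key is not found
        }
    }
}
```

Without the PRIMARY KEY (or an explicit index), SQLite would scan the table and the comparison against FileStream.Seek would be as skewed as FileStream without Seek.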
I would eventually also like to run the searches in parallel as much as I can. My primary requirement is SEARCH performance.
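For the parallel part, a batch of 1-200 keys can be fanned out with `Parallel.ForEach`. In this sketch, `lookup` is a hypothetical stand-in for whichever single-key search wins the benchmark; note that each worker should open its own `FileStream` or SQLite connection, since those objects are not safe to share across threads.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

static class BatchSearch
{
    // Look up a batch of keys in parallel; results land in a
    // thread-safe dictionary keyed by the search key.
    public static IDictionary<string, string> FindAll(
        IEnumerable<string> keys, Func<string, string> lookup)
    {
        var results = new ConcurrentDictionary<string, string>();
        Parallel.ForEach(keys, key => results[key] = lookup(key));
        return results;
    }
}
```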
Any ideas or suggestions would be great.
Thanks,
UPDATE: I will list the option details and results as I process them
Find 5000 random entries (by line number or some other similar characteristic) in a file that contains 10K lines, 2.28 MB.
- FileStream option - Best time: 00:00:00.0398530
Your best bet is Berkeley DB, via the C# API (which uses key-value pair storage). Berkeley DB is a library, so it links into your application. There is no separate server to install and no client/server overhead. Berkeley DB is extremely fast, scalable, and reliable, and is designed to do exactly what you describe here.
Disclaimer: I'm the Product Manager for Berkeley DB, so I'm a little biased. But I'm serious when I say that this is exactly the scenario that Berkeley DB is designed for.
As far as I understand, your data is already in a database, indexed and ready to be searched. What you want to do is to extract it from the database and implement your custom search scheme, where you manually manipulate byte offsets in a file etc. IMHO this approach is bound to fail.
Not using a database because of one's beliefs is known to not be the best approach to performance tuning. :-)