OleDb 连接到 Excel; 如何选择固定宽度、无限高度?
我正在使用 OleDb 从 Excel 电子表格中选择数据。 每个电子表格可以包含许多小表格,可能还包含标题和标签等家具。 所以它可能看起来像这样,我们有两个表和一些标题;
A B C D 1 . . . . 2 . . . . 3 Table1 . . . 4 Header1 HEADER2 . . 5 h huey . . 6 d dewey . . 7 l loius . . 8 s scrooge . . 9 . . . . 10 . . . . 11 . . . . 12 . . . . 13 . Table 2 . . 14 . HEADER1 HEADER2 HEADER3 15 . 1 foo x 16 . 2 bar y 17 . 3 baz z 18 . . . . 19 . . . .
在上一步中,用户选择了他们感兴趣的表的标题; 在这种情况下,查看表 2,他们将选择范围 B14:D14
。
这些设置已保存,然后我需要查询该表。 随着电子表格数据的更新,这种情况可能会反复发生; 可以随时添加更多行,但标题始终是固定的。 有一个哨兵(空白行)标记数据的结尾
要选择表中的数据,我正在编写这样的查询;
SELECT * FROM [Sheet1$B14:D65535]
选择表2中的数据,然后手动检查前哨行,但这似乎并不令人满意。 Excel 2003 只能读取 65,535 行 (uint16),但 Excel 2007 可以读取更多行 (uint32),因此我必须编写代码,根据文件的扩展名(.xls 与 .xls)为 Excel 2003 和 2007 提供不同的查询。 xls?)。
有谁知道一种编写查询的方法;
- “选择 B14 下方和右侧的所有内容”?
- '选择 B->D 列中的所有内容' '
- 选择 B12:D*' 其中 * 表示“您可以选择的所有内容”
I'm using OleDb to select data from excel spreadsheets. Each spreadsheet can contain many small tables, and possibly furniture like titles and labels. So it might look like this, where we have two tables and some titles;
A B C D 1 . . . . 2 . . . . 3 Table1 . . . 4 Header1 HEADER2 . . 5 h huey . . 6 d dewey . . 7 l loius . . 8 s scrooge . . 9 . . . . 10 . . . . 11 . . . . 12 . . . . 13 . Table 2 . . 14 . HEADER1 HEADER2 HEADER3 15 . 1 foo x 16 . 2 bar y 17 . 3 baz z 18 . . . . 19 . . . .
In a previous step, the user has selected the headers of the table they are interested in; in this case, looking at table 2 they will have selected the range B14:D14
.
These settings are saved, and then I need to query that table. It may happen over and over, as the spreadsheet data is updated; more rows may be added at any time, but the headers are always fixed. There is a sentinel (blank row) marking the end of data
To select the data in the table, I'm writing a query like this;
SELECT * FROM [Sheet1$B14:D65535]
to select the data in table 2, and then manually checking for the sentinel row, but this seems unsatisfying. Excel 2003 can only read 65,535 rows (uint16), but excel 2007 can read many more (uint32), so I have to write code which gives a different query for Excel 2003 and 2007 based on the extension of the file (.xls vs .xls?).
Does anyone know of a way to write a query that says either;
- 'select everything down and right of B14'?
- 'select everything in columns B->D'
- 'select B12:D*' where * means 'everything you can'
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(5)
先决条件:您可以轻松地确定代码中的最大行数是多少。
假设 (1) 每个 SELECT 有很大的开销,因此一次选择一行很慢 (2) 选择 64K 或 8M 行(即使是空白)很慢......所以你想看看中间的某个地方是否可以快点。 试试这个:
一次选择 CHUNKSIZE(例如 100 或 1000)行(当您超出 MAX_ROWS 时选择较少的行)。 扫描每个块以查找标记数据结束的空白行。
更新:实际上回答明确的问题:
问:有谁知道如何编写一个查询,说:
Q1:“选择下面和右边的所有内容B14'?
A1:
select * from [Sheet1$B12:]
不起作用。 您必须在 Excel 2003 中执行...B12:IV
以及在 Excel 2007 中执行的任何操作。但是您不需要这样做,因为您知道最右边的列是什么; 见下文。Q2:“选择 B->D 列中的所有内容”
A2:
选择
* 来自 [Sheet1$B:D]
Q3:“选择 B12:D
*
”,其中*
表示“您可以做的一切”A3: select * from [Sheet1$B12:D]
使用以下代码使用 Python 2.5 进行测试:
Pre-requisite: you can easily determine in your code what the maximum number number of rows is.
Assuming (1) there's a big overhead per SELECT, so SELECTing a row at a time is slow (2) SELECTing 64K or 8M rows (even if blank) is slow ... so you want to see if somewhere in the middle can be faster. Try this:
Select CHUNKSIZE (e.g. 100 or 1000) rows at a time (less when you would otherwise over-run MAX_ROWS). Scan each chunk for the blank row that marks end-of-data.
UPDATE: Actually answering the explicit questions:
Q: Does anyone know of a way to write a query that says either;
Q1: 'select everything down and right of B14'?
A1:
select * from [Sheet1$B12:]
doesn't work. You would have to do...B12:IV
in Excel 2003 and whatever it is in Excel 2007. However you don't need that because you know what your rightmost column is; see below.Q2: 'select everything in columns B->D'
A2:
select
* from [Sheet1$B:D]
Q3: 'select B12:D
*
' where*
means 'everything you can'A3: select * from [Sheet1$B12:D]
Tested with Python 2.5 using the following code:
几个可能的解决方案:
希望有帮助。
编辑
这是第三个想法:
我不确定您使用什么来查询数据库,但如果您的查询引擎支持变量(例如,Sql Server),您可以存储结果。 ..
SELECT COUNT(*) FROM NameOfServer...Sheet1$
...在名为 @UsedRowCount 的变量中,这将为您提供工作表中实际使用的行数。 因此,@UsedRowCount = LastRowUsed - InitialBlankRows。
然后,您可以使用字符串连接将“65535”替换为@UsedRowCount + @InitialBlankRows。 您必须将 @InitialBlankRows 设置为常量(在您的示例中,该常量为 3,因为第一个表的标题行位于第 4 行)。
Couple possible solutions:
Hope that helps.
EDIT
Here's a third idea:
I'm not sure what you're using to query your database, but if your query engine supports variables (like Sql Server, for example) you could store the result of...
SELECT COUNT(*) FROM NameOfServer...Sheet1$
...in a variable called @UsedRowCount, that will give you the number of rows actually used in the worksheet. So, @UsedRowCount = LastRowUsed - InitialBlankRows.
You might then be able to use string concatenation to replace "65535" with @UsedRowCount + @InitialBlankRows. You would have to set @InitialBlankRows to a constant (in your example, it would be 3, since the heading row of the first table is located at Row 4).
您说在上一步中,用户已经选择了标题。 谁敢说在当前感兴趣的区域下方没有几个空白行,后面跟着另一个不相关的表? 我建议您让他们选择他们感兴趣的整个范围——这应该可以解决这两个问题。
You say that in a previous step, the users have selected the headers. Who's to say that below the region of current interest there aren't a few blank rows followed by another unrelated table? I suggest that you get them to select the whole range that they are interested in -- that should fix both problems.
我会采用 John 的解决方案(一次读取 1000 行)。
如果您安装了 Excel,您还可以使用 OLE 自动化。
我在 Excel 中录制了一个简单的宏,它选择当前表中的最后一个单元格。
现在您只需将其翻译为 C# 并读取活动单元格的地址。
I would go with the solution from John ( reading 1000 rows at a time ).
If you have Excel installed you could also use OLE automation.
I have recorded a simple macro in Excel which select the last cell in the current table.
Now you just need to translate this in C# and read the address of the active cell.
我们读取整个电子表格(即:SELECT * FROM [Sheet1$])并处理应用程序代码中的其他所有内容。 很容易通过生成的 OleDbDataReader 到达数据的起点并开始处理。
它可能不是从 Excel 中获取数据的绝对最快的方法,但它是可靠的。
We read the entire spreadsheet (ie: SELECT * FROM [Sheet1$]) and handle everything else in our application code. It's easy enough to race through the resultant OleDbDataReader to get to the starting point of your data and start processing.
It may not be the absolutely fastest way to suck data from Excel, but it is reliable.