OleDb 连接到 Excel; 如何选择固定宽度、无限高度?

发布于 2024-07-27 02:52:54 字数 1582 浏览 1 评论 0原文

我正在使用 OleDb 从 Excel 电子表格中选择数据。 每个电子表格可以包含许多小表格,可能还包含标题和标签等家具。 所以它可能看起来像这样,我们有两个表和一些标题;

            A           B         C          D
    1    .           .         .          .
    2    .           .         .          .
    3    Table1      .         .          .
    4    Header1     HEADER2   .          .
    5    h           huey      .          .
    6    d           dewey     .          .
    7    l           loius     .          .
    8    s           scrooge   .          .
    9    .           .         .          .
    10   .           .         .          .
    11   .           .         .          .
    12   .           .         .          .
    13   .           Table 2   .          .
    14   .           HEADER1   HEADER2    HEADER3
    15   .           1         foo        x
    16   .           2         bar        y
    17   .           3         baz        z
    18   .           .         .          .
    19   .           .         .          .

在上一步中,用户选择了他们感兴趣的表的标题; 在这种情况下,查看表 2,他们将选择范围 B14:D14

这些设置已保存,然后我需要查询该表。 随着电子表格数据的更新,这种情况可能会反复发生; 可以随时添加更多行,但标题始终是固定的。 有一个哨兵(空白行)标记数据的结尾

要选择表中的数据,我正在编写这样的查询;

SELECT * FROM [Sheet1$B14:D65535]

选择表2中的数据,然后手动检查前哨行,但这似乎并不令人满意。 Excel 2003 只能读取 65,535 行 (uint16),但 Excel 2007 可以读取更多行 (uint32),因此我必须编写代码,根据文件的扩展名(.xls 与 .xls)为 Excel 2003 和 2007 提供不同的查询。 xls?)。

有谁知道一种编写查询的方法;

  • “选择 B14 下方和右侧的所有内容”?
  • '选择 B->D 列中的所有内容' '
  • 选择 B12:D*' 其中 * 表示“您可以选择的所有内容”

I'm using OleDb to select data from excel spreadsheets. Each spreadsheet can contain many small tables, and possibly furniture like titles and labels. So it might look like this, where we have two tables and some titles;

            A           B         C          D
    1    .           .         .          .
    2    .           .         .          .
    3    Table1      .         .          .
    4    Header1     HEADER2   .          .
    5    h           huey      .          .
    6    d           dewey     .          .
    7    l           loius     .          .
    8    s           scrooge   .          .
    9    .           .         .          .
    10   .           .         .          .
    11   .           .         .          .
    12   .           .         .          .
    13   .           Table 2   .          .
    14   .           HEADER1   HEADER2    HEADER3
    15   .           1         foo        x
    16   .           2         bar        y
    17   .           3         baz        z
    18   .           .         .          .
    19   .           .         .          .

In a previous step, the user has selected the headers of the table they are interested in; in this case, looking at table 2 they will have selected the range B14:D14.

These settings are saved, and then I need to query that table. It may happen over and over, as the spreadsheet data is updated; more rows may be added at any time, but the headers are always fixed. There is a sentinel (blank row) marking the end of data

To select the data in the table, I'm writing a query like this;

SELECT * FROM [Sheet1$B14:D65535]

to select the data in table 2, and then manually checking for the sentinel row, but this seems unsatisfying. Excel 2003 can only read 65,535 rows (uint16), but excel 2007 can read many more (uint32), so I have to write code which gives a different query for Excel 2003 and 2007 based on the extension of the file (.xls vs .xls?).

Does anyone know of a way to write a query that says either;

  • 'select everything down and right of B14'?
  • 'select everything in columns B->D'
  • 'select B12:D*' where * means 'everything you can'

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(5

请帮我爱他 2024-08-03 02:52:54

先决条件:您可以轻松地确定代码中的最大行数是多少。

假设 (1) 每个 SELECT 有很大的开销,因此一次选择一行很慢 (2) 选择 64K 或 8M 行(即使是空白)很慢......所以你想看看中间的某个地方是否可以快点。 试试这个:

一次选择 CHUNKSIZE(例如 100 或 1000)行(当您超出 MAX_ROWS 时选择较少的行)。 扫描每个块以查找标记数据结束的空白行。

更新:实际上回答明确的问题:

问:有谁知道如何编写一个查询,说:

Q1:“选择下面和右边的所有内容B14'?

A1:select * from [Sheet1$B12:] 不起作用。 您必须在 Excel 2003 中执行 ...B12:IV 以及在 Excel 2007 中执行的任何操作。但是您不需要这样做,因为您知道最右边的列是什么; 见下文。

Q2:“选择 B->D 列中的所有内容”

A2:选择
* 来自 [Sheet1$B:D]

Q3:“选择 B12:D*”,其中 * 表示“您可以做的一切”

A3: select * from [Sheet1$B12:D]

使用以下代码使用 Python 2.5 进行测试:

import win32com.client
import sys
filename, sheetname, range = sys.argv[1:4]
DSN= """
    PROVIDER=Microsoft.Jet.OLEDB.4.0;
    DATA SOURCE=%s;
    Extended Properties='Excel 8.0;READONLY=true;IMEX=1';
    """ % filename
conn = win32com.client.Dispatch("ADODB.Connection")
conn.Open(DSN)
rs = win32com.client.Dispatch("ADODB.Recordset")
sql = (
    "SELECT * FROM [Excel 8.0;HDR=NO;IMEX=1;Database=%s;].[%s$%s]"
    % (filename, sheetname, range)
    )
rs.Open(sql, conn)
nrows = 0
while not rs.EOF:
    nrows += 1
    nf = rs.Fields.Count
    values = [rs.Fields.Item(i).Value for i in xrange(nf)]
    print nrows, values
    if not any(value is not None for value in values):
        print "sentinel found"
        break
    rs.MoveNext()
rs.Close()
conn.Close()

Pre-requisite: you can easily determine in your code what the maximum number number of rows is.

Assuming (1) there's a big overhead per SELECT, so SELECTing a row at a time is slow (2) SELECTing 64K or 8M rows (even if blank) is slow ... so you want to see if somewhere in the middle can be faster. Try this:

Select CHUNKSIZE (e.g. 100 or 1000) rows at a time (less when you would otherwise over-run MAX_ROWS). Scan each chunk for the blank row that marks end-of-data.

UPDATE: Actually answering the explicit questions:

Q: Does anyone know of a way to write a query that says either;

Q1: 'select everything down and right of B14'?

A1: select * from [Sheet1$B12:] doesn't work. You would have to do ...B12:IV in Excel 2003 and whatever it is in Excel 2007. However you don't need that because you know what your rightmost column is; see below.

Q2: 'select everything in columns B->D'

A2: select
* from [Sheet1$B:D]

Q3: 'select B12:D*' where * means 'everything you can'

A3: select * from [Sheet1$B12:D]

Tested with Python 2.5 using the following code:

import win32com.client
import sys
filename, sheetname, range = sys.argv[1:4]
DSN= """
    PROVIDER=Microsoft.Jet.OLEDB.4.0;
    DATA SOURCE=%s;
    Extended Properties='Excel 8.0;READONLY=true;IMEX=1';
    """ % filename
conn = win32com.client.Dispatch("ADODB.Connection")
conn.Open(DSN)
rs = win32com.client.Dispatch("ADODB.Recordset")
sql = (
    "SELECT * FROM [Excel 8.0;HDR=NO;IMEX=1;Database=%s;].[%s$%s]"
    % (filename, sheetname, range)
    )
rs.Open(sql, conn)
nrows = 0
while not rs.EOF:
    nrows += 1
    nf = rs.Fields.Count
    values = [rs.Fields.Item(i).Value for i in xrange(nf)]
    print nrows, values
    if not any(value is not None for value in values):
        print "sentinel found"
        break
    rs.MoveNext()
rs.Close()
conn.Close()
楠木可依 2024-08-03 02:52:54

几个可能的解决方案:

  1. 将表放在单独的工作表上,然后简单地查询整个工作表。
  2. 为 Excel 中的每个表格指定一个名称(在 Excel 2007 中,选择表格,右键单击,然后选择命名范围...),然后在查询中使用此名称而不是“Sheet1$B14:D65535”。

希望有帮助。

编辑

这是第三个想法:

我不确定您使用什么来查询数据库,但如果您的查询引擎支持变量(例如,Sql Server),您可以存储结果。 ..

SELECT COUNT(*) FROM NameOfServer...Sheet1$

...在名为 @UsedRowCount 的变量中,这将为您提供工作表中实际使用的行数。 因此,@UsedRowCount = LastRowUsed - InitialBlankRows。

然后,您可以使用字符串连接将“65535”替换为@UsedRowCount + @InitialBlankRows。 您必须将 @InitialBlankRows 设置为常量(在您的示例中,该常量为 3,因为第一个表的标题行位于第 4 行)。

Couple possible solutions:

  1. Put your tables on separate worksheets, then simply query the whole worksheet.
  2. Give each table in Excel a name (in Excel 2007, select the table, right-click, and choose Name a range...), then in your query, use this name instead of "Sheet1$B14:D65535".

Hope that helps.

EDIT

Here's a third idea:

I'm not sure what you're using to query your database, but if your query engine supports variables (like Sql Server, for example) you could store the result of...

SELECT COUNT(*) FROM NameOfServer...Sheet1$

...in a variable called @UsedRowCount, that will give you the number of rows actually used in the worksheet. So, @UsedRowCount = LastRowUsed - InitialBlankRows.

You might then be able to use string concatenation to replace "65535" with @UsedRowCount + @InitialBlankRows. You would have to set @InitialBlankRows to a constant (in your example, it would be 3, since the heading row of the first table is located at Row 4).

归途 2024-08-03 02:52:54

您说在上一步中,用户已经选择了标题。 谁敢说在当前感兴趣的区域下方没有几个空白行,后面跟着另一个不相关的表? 我建议您让他们选择他们感兴趣的整个范围——这应该可以解决这两个问题。

You say that in a previous step, the users have selected the headers. Who's to say that below the region of current interest there aren't a few blank rows followed by another unrelated table? I suggest that you get them to select the whole range that they are interested in -- that should fix both problems.

泪意 2024-08-03 02:52:54

我会采用 John 的解决方案(一次读取 1000 行)。

如果您安装了 Excel,您还可以使用 OLE 自动化。

我在 Excel 中录制了一个简单的宏,它选择当前表中的最后一个单元格。


Sub Macro2()
    Range("B14").Select
    Selection.End(xlDown).Select
    //MsgBox ActiveCell.Address, vbOKOnly
End Sub

现在您只需将其翻译为 C# 并读取活动单元格的地址。

I would go with the solution from John ( reading 1000 rows at a time ).

If you have Excel installed you could also use OLE automation.

I have recorded a simple macro in Excel which select the last cell in the current table.


Sub Macro2()
    Range("B14").Select
    Selection.End(xlDown).Select
    //MsgBox ActiveCell.Address, vbOKOnly
End Sub

Now you just need to translate this in C# and read the address of the active cell.

酷炫老祖宗 2024-08-03 02:52:54

我们读取整个电子表格(即:SELECT * FROM [Sheet1$])并处理应用程序代码中的其他所有内容。 很容易通过生成的 OleDbDataReader 到达数据的起点并开始处理。

它可能不是从 Excel 中获取数据的绝对最快的方法,但它是可靠的。

We read the entire spreadsheet (ie: SELECT * FROM [Sheet1$]) and handle everything else in our application code. It's easy enough to race through the resultant OleDbDataReader to get to the starting point of your data and start processing.

It may not be the absolutely fastest way to suck data from Excel, but it is reliable.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文