How to parse an Excel (.xls) file stored in a varbinary in MS SQL 2005?

Posted 2024-07-05 13:29:38

problem

how to best parse/access/extract "excel file" data stored as binary data in an SQL 2005 field?

(so all the data can ultimately be stored in other fields of other tables.)

background

basically, our customer requires a large volume of verbose data from their users. unfortunately, our customer cannot require any kind of db export from their users. so our customer must supply some sort of UI for their users to enter the data. the UI our customer decided would be acceptable to all of their users was excel, as it has a reasonably robust UI. given all that, our customer needs this data parsed and stored in their db automatically.

we've tried to convince our customer that the users will do this exactly once and then insist on db export! but the customer cannot require db export of their users.

  • our customer is requiring us to parse an excel file
  • the customer's users are using excel as the "best" user interface to enter all the required data
  • the users are given blank excel templates that they must fill out
    • these templates have a fixed number of uniquely named tabs
    • these templates have a number of fixed areas (cells) that must be completed
    • these templates also have areas where the user will insert up to thousands of identically formatted rows
  • when complete, the excel file is submitted from the user by standard html file upload
  • our customer stores this file raw into their SQL database

given

  • a standard excel (".xls") file (native format, not comma or tab separated)
  • file is stored raw in a varbinary(max) SQL 2005 field
  • excel file data may not necessarily be "uniform" between rows -- i.e., we can't just assume one column is all the same data type (e.g., there may be row headers, column headers, empty cells, different "formats", ...)

requirements

  • code completely within SQL 2005 (stored procedures, SSIS?)
  • be able to access values on any worksheet (tab)
  • be able to access values in any cell (no formula data or dereferencing needed)
  • cell values must not be assumed to be "uniform" between rows -- i.e., we can't just assume one column is all the same data type (e.g., there may be row headers, column headers, empty cells, formulas, different "formats", ...)

preferences

  • no filesystem access (no writing temporary .xls files)
  • retrieve values in defined format (e.g., actual date value instead of a raw number like 39876)
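To illustrate the last preference: a raw number such as 39876 is an Excel date serial, a count of days. The following is a minimal Python sketch of the conversion, assuming the workbook uses Excel's default 1900 date system; the function name `excel_serial_to_datetime` is hypothetical and not part of any library. The same arithmetic could be ported to T-SQL or .NET.

```python
from datetime import datetime, timedelta

# Assumption: the workbook uses Excel's default 1900 date system.
# Using 1899-12-30 as the epoch compensates for the nonexistent
# 1900-02-29 that Excel counts, so the result is only correct for
# serials from 61 (1900-03-01) onward.
EXCEL_EPOCH = datetime(1899, 12, 30)


def excel_serial_to_datetime(serial: float) -> datetime:
    """Convert an Excel 1900-system date serial to a datetime.

    The integer part is the day count; the fractional part is the
    time of day (0.5 is noon).
    """
    if serial < 61:
        raise ValueError("serials below 61 are not handled by this sketch")
    return EXCEL_EPOCH + timedelta(days=serial)


if __name__ == "__main__":
    print(excel_serial_to_datetime(39876).date())  # 2009-03-04
```

Note that the serial alone does not say whether a cell holds a date or a plain number; that information lives in the cell's number format, which is why a parser that exposes formats is needed.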

Comments (5)

轮廓§ 2024-07-12 13:29:38

It sounds like you're trying to store an entire database table inside a spreadsheet and then inside a single table's field. Wouldn't it be simpler to store the data in a database table to begin with and then export it as an XLS when required?

Without opening up an instance of Excel and having Excel resolve worksheet references, I'm not sure it's doable at all.

小兔几 2024-07-12 13:29:38

My thought is that anything can be done, but there is a price to pay. In this particular case, the price seems to be too high.

I don't have a tested solution for you, but I can share how I would make a first attempt at a problem like that.

My first approach would be to install Excel on the SQL Server machine, code some assemblies that consume the files in your rows using the Excel API, and then load them into SQL Server as assembly procedures.

As I said, this is just an idea. I don't have details, but I'm sure others here can complement or criticize it.

But my real advice is to rethink the whole project. It makes no sense to read tabular data from binary files stored in a cell of a row of a database table.

旧伤慢歌 2024-07-12 13:29:38

This looks like an "I wouldn't start from here" kind of a question.

The "install Excel on the server and start coding" answer looks like the only route, but it simply has to be worth exploring alternatives first: it's going to be painful, expensive and time-consuming.

I strongly feel that we're looking at a "requirement" that is the answer to the wrong problem.

What business problem is creating this need? What's driving that? Try the Five Whys as a possible way to explore the history.

我恋#小黄人 2024-07-12 13:29:38

Well, the whole setup seems a bit twisted :-) as others have already pointed out.

If you really cannot change the requirements and the whole setup: why don't you explore components such as Aspose.Cells or Syncfusion XlsIO, native .NET components that allow you to read and interpret native Excel (XLS) files? I'm pretty sure that with either of the two you should be able to read your binary Excel into a MemoryStream and then feed that into one of those Excel-reading components, and off you go.

So with a bit of .NET development and SQL CLR, I guess this should be doable - not sure if it's the best way to do it, but it should work.
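The in-memory step can be sketched independently of any particular component. The Python example below is illustrative only: it wraps the raw bytes in an in-memory stream (the analogue of a .NET MemoryStream) after checking that they start with the OLE2 compound-file signature that native .xls files carry. The helper name `open_xls_blob` is hypothetical, and how the bytes are fetched from the varbinary(max) column is assumed rather than shown; neither Aspose.Cells nor Syncfusion XlsIO is used here.

```python
import io

# Native .xls workbooks are OLE2 compound files, which begin with
# this 8-byte signature.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def open_xls_blob(blob: bytes) -> io.BytesIO:
    """Return an in-memory stream over the blob, with no temporary file.

    Hypothetical helper: `blob` stands for the bytes read from the
    varbinary(max) column. Raises ValueError when the data does not
    look like a native .xls file (for example an .xlsx or a CSV
    renamed to .xls).
    """
    if blob[:8] != OLE2_MAGIC:
        raise ValueError("not an OLE2 compound file; cannot be a native .xls")
    return io.BytesIO(blob)


if __name__ == "__main__":
    # Fake payload: the signature followed by padding, not a real workbook.
    fake = OLE2_MAGIC + b"\x00" * 504
    stream = open_xls_blob(fake)
    print(stream.read(8) == OLE2_MAGIC)  # True
```

The returned stream would then be handed to whichever Excel-reading component is chosen, which satisfies the "no filesystem access" preference as long as that component accepts a stream.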

你与清晨阳光 2024-07-12 13:29:38

Could you write the varbinary to a Raw File Destination? And then use an Excel Source as your input to whatever step is next in your precedence constraints.

I haven't tried it, but that's what I would try.
