将逗号分隔文本文件读取到 C# DataTable,列被截断为 255 个字符
我们正在从 CSV 导入到 SQL。 为此,我们读取 CSV 文件并使用 schema.ini 写入临时 .txt 文件。 (我还不确定为什么要写入这个临时文件,但这就是代码当前的工作方式)。 从那里,我们使用以下连接字符串(对于 ASCII 文件)通过 OleDB 加载 DataTable。
"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + sPath + ";Extended Properties=\"text;HDR=Yes;FMT=Delimited\"";
我们遇到的问题是超过 255 个字符的字段会被截断。 我在网上阅读过有关此问题的信息,似乎默认情况下,文本字段会因此被截断。
我在 HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Excel
中设置了注册表设置 ImportMixedTypes=Majority Type
和 TypeGuessRows=0
,希望 mycolumns
不再被解释为文本。 执行此操作后,可以从 CSV 文件正确写入临时 txt 文件,但是当我调用 dataAdapter.Fill 时,生成的 DataTable 仍然具有被截断的值。
这是有问题的列定义。 CommaDelimited#txt 注释 2 假 234 真 130 0 0
任何帮助,将不胜感激。 目前,我对使用任何 3d 方代码来解决这个问题不感兴趣,必须有一种使用内置工具的方法。
这是表定义:
<Columns>
<TABLE_NAME>CommaDelimited#txt</TABLE_NAME>
<COLUMN_NAME>Notes</COLUMN_NAME>
<ORDINAL_POSITION>2</ORDINAL_POSITION>
<COLUMN_HASDEFAULT>false</COLUMN_HASDEFAULT>
<COLUMN_FLAGS>234</COLUMN_FLAGS>
<IS_NULLABLE>true</IS_NULLABLE>
<DATA_TYPE>130</DATA_TYPE>
<CHARACTER_MAXIMUM_LENGTH>0</CHARACTER_MAXIMUM_LENGTH>
<CHARACTER_OCTET_LENGTH>0</CHARACTER_OCTET_LENGTH>
</Columns>
谢谢,
Greg
我尝试编辑 schema.ini 指定宽度的文本,但这没有帮助(之前已设置为备忘录)
[CommaDelimited.txt] 格式=CSVDelimited 小数符号=. Col1=注释文本宽度 5000
We are importing from CSV to SQL. To do so, we are reading the CSV file and writing to a temporary .txt file using a schema.ini. (I'm not sure yet exactly why are are writing to this temporary file, but that's how the code currently works). From there, we are loading a DataTable via OleDB using the following connection string (for ASCII files).
"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + sPath + ";Extended Properties=\"text;HDR=Yes;FMT=Delimited\"";
The problem we are having is that fields with more than 255 characters get truncated. I've read online about this problem and it seems that by default, text fields get truncated thusly.
I set my registry settings ImportMixedTypes=Majority Type
and TypeGuessRows=0
in HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Excel
, hoping that mycolumns
will no longer be interpreted as text. After doing that, the temporary txt file is being written correctly from the CSV file, but when I call dataAdapter.Fill
, the resulting DataTable still has a truncated value.
Here is the column definition in question.
CommaDelimited#txt Notes 2 false 234 true 130 0 0
Any help would be appreciated. At this time, I'm not interested in using any 3d party code to solve this problem, there must be a way using built in tools.
Here is the table definition:
<Columns>
<TABLE_NAME>CommaDelimited#txt</TABLE_NAME>
<COLUMN_NAME>Notes</COLUMN_NAME>
<ORDINAL_POSITION>2</ORDINAL_POSITION>
<COLUMN_HASDEFAULT>false</COLUMN_HASDEFAULT>
<COLUMN_FLAGS>234</COLUMN_FLAGS>
<IS_NULLABLE>true</IS_NULLABLE>
<DATA_TYPE>130</DATA_TYPE>
<CHARACTER_MAXIMUM_LENGTH>0</CHARACTER_MAXIMUM_LENGTH>
<CHARACTER_OCTET_LENGTH>0</CHARACTER_OCTET_LENGTH>
</Columns>
Thanks,
Greg
I tried editing the schema.ini specifying text with a width, and that did not help (it was set to memo before)
[CommaDelimited.txt]
Format=CSVDelimited
DecimalSymbol=.
Col1=Notes Text Width 5000
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(6)
我的 .Net 代码也遇到了类似的问题,只需将注册表设置更改为多数类型(如原始帖子中所述)即可使其正常工作。 我在我的案例中所做的不同之处是:
因为我们尝试从 CSV 而不是 Excel 导入。 所以我必须更改注册表中文本的设置(不是 Excel)。
尝试这样做,我认为它应该有效
I was facing a similar issue with my .Net code, and i was able to make it work just by changing the Registry settings to Majortity Type as mentioned in the original post. What i did different in my case is:
Since we are trying to import from a CSV and not an Excel. So i had to change the settings for Text in the Registry (Not Excel).
Try doing it and i think it should work
我认为最好的方法是使用以下博客中的 CSVReader:
http://ronaldlemmen.blogspot.com/2008/ 03/stopping-and-continuing-save-event.html
I think the best way to do it is by using CSVReader in the following blog:
http://ronaldlemmen.blogspot.com/2008/03/stopping-and-continuing-save-event.html
我倾向于在读取 CSV 文件时直接创建 DataTable,而不是执行将数据写入不同文本文件的额外步骤,然后再次将其读回内存。
就这个问题而言,最终如何将数据从 DataTable 获取到 SQL 数据库中? 如果您只是循环访问 DataTable 并执行一堆 INSERT 语句,为什么不跳过两个中间人并在最初读取 CSV 文件时调用相同的 INSERT 语句呢?
My inclination would be to create the DataTable directly when reading the CSV file, rather than going through the extra step of writing the data out to a different text file, only to read it back into memory a second time.
For that matter, how are you ultimately getting the data from the DataTable into the SQL database? If you're just looping through the DataTable and doing a bunch of INSERT statements, why not skip two middlemen and call the same INSERT statements while you're initially reading the CSV file?
您可以通过正确指定 schema.ini 文件。 我相信这两个选项是将列设置为备忘录类型,或者将宽度设置为“备注”。 255.
You can correct this by correctly specifying your schema.ini file. I believe the two options are to either set the column to a Memo type, or to set the Width > 255.
这是一个简单的类,用于读取分隔文件并返回不截断字符串的 DataTable(所有字符串)。 它有一个重载方法来指定列名(如果它们不在文件中)。 也许你可以用它?
导入的命名空间
代码
所需的命名空间
以下是调用它并上传到 SQL 数据库的示例:
Here's a simple class for reading a delimited file and returning a DataTable (all strings) that doesn't truncate strings. It has an overloaded method to specify column names if they're not in the file. Maybe you can use it?
Imported Namespaces
Code
Required Namespaces
Here's an example of calling it and uploading to a SQL Database:
如果您要求 Jet 数据库引擎根据备注处理数据,Jet 数据库引擎会截断备注字段:聚合、去重、格式化等。
http://allenbrowne.com/ser-63.html
The Jet database engine truncates memo fields if you ask it to process the data based on the memo: aggregating, de-duplicating, formatting, and so on.
http://allenbrowne.com/ser-63.html