Determining which Unicode characters SSIS does not handle

Posted on 2024-12-09 19:51:23

I'm working with an SSIS package that takes data from SQL Server and creates text files to ship to a vendor. Currently the files are encoded as ANSI 1252, and the Unicode checkbox is not checked on the Flat File Connection Manager.

The package failed when it encountered this symbol: ♥

This led me to believe that the step would fail if it attempted to write out any non-ASCII character. However, it successfully handles "ş" by converting it to a standard "s". For our purposes this behavior is great, and if it did something similar with the heart symbol, there would be no issue. I'm trying to avoid sending a Unicode file, as the file is already very large and doubling its size is not preferable.

What I'm looking for is the range of Unicode characters that SSIS will not automatically convert for me. Then I'll need to add a REPLACE to the original SQL statement to clear out characters like the ♥.

We started with REPLACE(NAME, SUBSTRING(NAME, PATINDEX('%[^ -ÿ]%', NAME COLLATE Latin1_General_BIN2), 1), ''), but this will replace the "ş" with a space, which we are attempting to avoid since SSIS handles the "ş" just fine.

Thanks for reading this question!

Comments (1)

倚栏听风 2024-12-16 19:51:23

You're getting Windows's “best-fit fallback” encoding. Exactly which characters it converts is not officially documented, and the behaviour differs depending on the locale. Many of the replacements are inappropriate in many cases, and there can even be security problems. It is almost always best avoided.
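One way to avoid depending on the undocumented best-fit table is to do the substitution explicitly before the data reaches the flat file. The following is only a sketch in Python, not what SSIS does internally: the function name `to_cp1252_safe` and its `replacement` parameter are made up for illustration, and the rule it applies (keep characters that code page 1252 can represent, otherwise strip diacritics via Unicode decomposition, otherwise drop the character) is an assumption about what behaviour would be acceptable here.

```python
import unicodedata


def to_cp1252_safe(text, replacement=""):
    """Return text containing only characters representable in cp1252.

    Hypothetical helper: an explicit, deterministic substitute for the
    Windows best-fit mapping, not a reproduction of it.
    """
    out = []
    for ch in text:
        try:
            # Characters that cp1252 can encode exactly are kept as-is.
            ch.encode("cp1252")
            out.append(ch)
            continue
        except UnicodeEncodeError:
            pass
        # Decompose the character and drop combining marks,
        # e.g. "ş" becomes "s" followed by a combining cedilla, then "s".
        base = "".join(
            c for c in unicodedata.normalize("NFKD", ch)
            if not unicodedata.combining(c)
        )
        try:
            base.encode("cp1252")
            out.append(base)
        except UnicodeEncodeError:
            # Nothing sensible to map to (for example "♥"): substitute.
            out.append(replacement)
    return "".join(out)


print(to_cp1252_safe("Yaşar ♥ café"))
```

With this rule "ş" becomes "s" and "♥" is removed, while characters that are already in code page 1252, such as "é", pass through unchanged. The same idea could be applied in the SQL query or in a script component; which characters to drop or map is a decision for the data owner, not something this sketch settles.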

I'm trying to avoid sending a Unicode file, as the file is already very large and doubling its size is not preferable.

UTF-16LE (what Microsoft tools call “Unicode”) may be twice the size of ASCII, but why not another UTF, most obviously UTF-8?
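To make the size argument concrete, here is a small Python sketch that compares the encoded length of the same string in the three encodings under discussion. The sample string and the helper name `encoded_sizes` are illustrative only; the cp1252 figure uses "?" substitution for characters the code page lacks, which is not the same as the best-fit behaviour described above.

```python
def encoded_sizes(text):
    """Return the number of bytes text occupies in each encoding."""
    return {
        # errors="replace" writes "?" for characters cp1252 cannot encode.
        "cp1252": len(text.encode("cp1252", errors="replace")),
        "utf-8": len(text.encode("utf-8")),
        "utf-16-le": len(text.encode("utf-16-le")),
    }


sample = "Alice ♥ Yaşar"  # 13 characters, 11 of them ASCII
for name, size in encoded_sizes(sample).items():
    print(name, size)
```

For this sample the result is 13 bytes in cp1252, 16 in UTF-8 and 26 in UTF-16LE. UTF-8 spends one byte on each ASCII character and more only on the others, so for mostly-ASCII data the file grows only slightly rather than doubling.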
