Problem using Unicode in the URL with cgi.PATH_INFO in ColdFusion

Posted 2024-08-31 02:34:45

My ColdFusion (MX7 on IIS 6) site has search functionality which appends the search term to the URL e.g. http://www.example.com/search.cfm/searchterm.

The problem I'm running into is that this is a multilingual site, so the search term may be in another language, e.g. القاهرة, leading to a search URL such as http://www.example.com/search.cfm/القاهرة

The problem arises when I come to retrieve the search term from the URL. I'm using cgi.PATH_INFO to retrieve the path of the search page and the search term (e.g. /search.cfm/searchterm) and extracting the search term from that. However, when Unicode characters are used in the search, they are converted to question marks, e.g. /search.cfm/??????.
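For reference, the extraction step amounts to stripping the script name off the front of PATH_INFO. The sketch below uses Python purely to illustrate the string handling (it is not ColdFusion code), and the helper name `extract_search_term` is hypothetical. It also shows why the problem cannot be fixed at this stage: if the characters have already been replaced, the helper can only return the question marks it was given.

```python
def extract_search_term(path_info: str, script_name: str = "/search.cfm") -> str:
    # Hypothetical helper: assumes PATH_INFO has the form "/search.cfm/<term>",
    # as in the question, and returns everything after the script name.
    prefix = script_name.rstrip("/") + "/"
    if not path_info.startswith(prefix):
        return ""
    return path_info[len(prefix):]


print(extract_search_term("/search.cfm/searchterm"))  # searchterm

# If the characters were replaced before the value reached the script,
# the helper can only hand back what it was given.
print(extract_search_term("/search.cfm/??????"))  # ??????
```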

These appear as actual question marks; it isn't a case of the browser being unable to render the Unicode characters, or of them being mangled on output.

I can't find any information about whether ColdFusion supports Unicode in the URL, or how I can resolve this and get hold of the complete URL in some way. Does anyone have any ideas?

Cheers,

Tom

Edit: Further research has led me to believe the issue may be related to IIS rather than ColdFusion, but my original query still stands.

Further edit

The result of GetPageContext().GetRequest().GetRequestUrl().ToString() is http://www.example.com/search.cfm/searchterm/????? so it appears the issue goes fairly deep.


Comments (3)

未央 2024-09-07 02:34:45

Yeah, it's not really ColdFusion's fault. It's a common problem.

It's mostly the fault of the original CGI specification, which specifies that PATH_INFO has to be %-decoded, thus losing the original %xx byte sequences that would have allowed you to work out which real characters were meant.
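To illustrate why the undecoded %xx sequences matter, here is a small sketch using Python's standard `urllib.parse` module (Python is used only to demonstrate the mechanics; the path value is an assumed example of what a browser sends for القاهرة). While the escapes are intact you can recover the exact bytes and pick the right charset; once only a decoded string is passed on, that choice has already been made for you.

```python
from urllib.parse import unquote, unquote_to_bytes

# Assumed example: the path a browser sends for /search.cfm/القاهرة
raw_path = "/search.cfm/%D8%A7%D9%84%D9%82%D8%A7%D9%87%D8%B1%D8%A9"

# While the %xx escapes are intact, the exact bytes can be recovered...
raw_bytes = unquote_to_bytes(raw_path)

# ...and decoded with whichever charset the client actually used.
print(raw_bytes.decode("utf-8"))  # /search.cfm/القاهرة

# Decoding with a different charset yields mojibake. If only this decoded
# string is passed on (as with a %-decoded PATH_INFO), the bytes are gone.
print(unquote(raw_path, encoding="cp1252", errors="replace"))
```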

And it's partly IIS's fault, because it always tries to read the %xx bytes submitted in the path part as UTF-8-encoded Unicode (unless the path isn't a valid UTF-8 byte sequence, in which case it falls back to the Windows default code page, but gives you no way to find out this has happened). Having done so, it puts the result into environment variables as a Unicode string (environment variables are Unicode under Windows).
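The decoding behaviour described above can be modelled roughly as follows. This is a Python sketch of the described logic under stated assumptions (the function name is hypothetical and this is not IIS code; cp1252 is assumed as the default code page of a Western install). Note that the caller receives a plain string either way and cannot tell which branch produced it.

```python
def iis_style_decode(raw: bytes, fallback: str = "cp1252") -> str:
    # A rough model of the behaviour described above (an assumption, not IIS
    # code): try UTF-8 first, otherwise fall back to the ANSI code page.
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Nothing in the returned string records that this branch was taken.
        return raw.decode(fallback, errors="replace")


print(iis_style_decode("القاهرة".encode("utf-8")))  # القاهرة
print(iis_style_decode(b"caf\xe9"))  # café (not valid UTF-8, so cp1252 is used)
```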

However, most byte-based tools using C stdio (and I'm assuming this applies to ColdFusion, as it does under Perl, Python 2, PHP etc.) then try to read the environment variables as bytes, and the MS C runtime encodes the Unicode contents again using the Windows default code page. So any characters that don't fit in the default code page are lost for good. This would include your Arabic characters when running on a Western Windows install.
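The lossy step can be reproduced with Python's codecs. This sketch assumes cp1252 as the default code page of a Western Windows install and uses Python's `errors="replace"` handler, which substitutes `?` for unmappable characters; the real C runtime conversion may differ in detail, but the outcome for Arabic text is the same kind of loss.

```python
term = "القاهرة"

# Roughly what happens when the Unicode value is converted back to bytes in
# the default code page (assumed cp1252): unmappable characters become '?'.
ansi_bytes = term.encode("cp1252", errors="replace")
print(ansi_bytes)  # b'???????'

# The damage is permanent: decoding again cannot bring the Arabic back.
print(ansi_bytes.decode("cp1252"))  # ???????

# ASCII-only terms survive the round trip, which is why "searchterm" works.
print("searchterm".encode("cp1252", errors="replace").decode("cp1252"))  # searchterm
```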

A clever script with direct access to the Win32 GetEnvironmentVariableW API could call it to retrieve a native-Unicode environment variable, which it could then encode to UTF-8 or whatever else it wanted, assuming the input was also UTF-8 (which is what you'd generally want today). However, I don't think ColdFusion gives you this access, and in any case it only works from IIS6 onwards; IIS5.x will throw away any non-default-code-page characters before they even reach the environment variables.

Otherwise, your best bet is URL-rewriting. If a layer above CF can convert that search.cfm/القاهرة to search.cfm/?q=القاهرة then you don't face the same problem, as the QUERY_STRING variable, unlike PATH_INFO, is not specified to be %-decoded, so the %xx bytes remain where a tool at CF's level can see them.
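A sketch of the rewrite idea, written in Python for illustration. It assumes the rewriting layer sees the raw, still-escaped request URI; the regex rule and the `rewrite` function are hypothetical stand-ins for whatever rewrite module sits in front of ColdFusion. Because the %xx escapes are copied into the query string unchanged, they can still be decoded as UTF-8 afterwards.

```python
import re
from urllib.parse import parse_qs, urlsplit

# Hypothetical rewrite rule: /search.cfm/<term>  ->  /search.cfm/?q=<term>
# It operates on the raw request URI, so the %xx escapes are copied unchanged.
REWRITE = re.compile(r"^/search\.cfm/([^/?#]+)$")


def rewrite(raw_uri: str) -> str:
    match = REWRITE.match(raw_uri)
    if match is None:
        return raw_uri  # not a search URL; leave it alone
    return "/search.cfm/?q=" + match.group(1)


raw_uri = "/search.cfm/%D8%A7%D9%84%D9%82%D8%A7%D9%87%D8%B1%D8%A9"
new_uri = rewrite(raw_uri)
query = urlsplit(new_uri).query  # still contains the %xx bytes
params = parse_qs(query, encoding="utf-8")
print(params["q"][0])  # القاهرة
```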

可爱咩 2024-09-07 02:34:45

Here's what you could do:

<cfset url.searchTerm = URLEncodedFormat("القاهر", "utf-8") >

<cfset myVar = URLDecode(url.searchTerm , "utf-8") >
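For comparison, a rough Python analogue of that URLEncodedFormat/URLDecode round trip, using `urllib.parse.quote` and `unquote`. This is an illustration of the same idea (encode the text as UTF-8 bytes, then %-escape each byte), not a description of ColdFusion's exact escaping rules, which may differ for some characters.

```python
from urllib.parse import quote, unquote

term = "القاهر"

# Rough analogue of URLEncodedFormat(term, "utf-8"):
# encode to UTF-8 bytes, then %-escape each byte.
encoded = quote(term, safe="", encoding="utf-8")
print(encoded)  # %D8%A7%D9%84%D9%82%D8%A7%D9%87%D8%B1

# ...and of URLDecode(encoded, "utf-8").
decoded = unquote(encoded, encoding="utf-8")
print(decoded == term)  # True
```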

Of course, I'd recommend that you work with something like this in that case:

yourtemplate.cfm?searchTerm=%D8%A7%D9%84

Then do the URL rewriting in IIS (if your framework or the rest of the app doesn't already do it) to match your pattern: http://learn.iis.net/page.aspx/461/creating-rewrite-rules-for-the-url-rewrite-module/

指尖微凉心微凉 2024-09-07 02:34:45

You can set the character encoding of the URL and FORM scopes using the setEncoding() function:

http://www.adobe.com/livedocs/coldfusion/7/htmldocs/wwhelp/wwhimpl/common/html/wwhelp.htm?context=ColdFusion_Documentation&file=00000623.htm

You need to do this before you access any of the variables in these scopes.
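To show what choosing the scope's charset controls, here is a Python sketch (an analogue for illustration, not ColdFusion's implementation) that decodes the same escaped query string with two different charsets. The parameter name and value are assumed examples. Note that this only applies to a query string that still carries its %xx bytes, not to an already-decoded PATH_INFO.

```python
from urllib.parse import parse_qs

# Assumed example query string carrying القاهرة as UTF-8 %xx escapes.
query_string = "searchTerm=%D8%A7%D9%84%D9%82%D8%A7%D9%87%D8%B1%D8%A9"

# Decoding the escaped bytes as UTF-8 recovers the original term.
as_utf8 = parse_qs(query_string, encoding="utf-8")["searchTerm"][0]
print(as_utf8)  # القاهرة

# Decoding the same bytes with a single-byte charset gives mojibake instead.
as_latin1 = parse_qs(query_string, encoding="iso-8859-1")["searchTerm"][0]
print(as_latin1 == as_utf8)  # False
```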

However, the default encoding of those scopes is already UTF-8, so this may not help. Also, it probably would not affect the CGI scope.

Is the IIS server logging the correct characters to the request log?
