Problem using Unicode in the URL with cgi.PATH_INFO in ColdFusion
My ColdFusion (MX7 on IIS 6) site has search functionality which appends the search term to the URL, e.g. http://www.example.com/search.cfm/searchterm.
The problem I'm running into is that this is a multilingual site, so the search term may be in another language, e.g. القاهرة, leading to a search URL such as http://www.example.com/search.cfm/القاهرة
The problem arises when I come to retrieve the search term from the URL. I'm using cgi.PATH_INFO to retrieve the path of the search page and the search term, and extracting the search term from this, e.g. /search.cfm/searchterm. However, when Unicode characters are used in the search they are converted to question marks, e.g. /search.cfm/??????.
These appear to be actual question marks, rather than the browser not being able to format Unicode characters, or them being mangled on output.
I can't find any information about whether ColdFusion supports Unicode in the URL, or how I can go about resolving this and getting hold of the complete URL in some way. Does anyone have any ideas?
Cheers,
Tom
Edit: Further research has led me to believe the issue may be related to IIS rather than ColdFusion, but my original query still stands.
Further edit: The result of GetPageContext().GetRequest().GetRequestUrl().ToString() is http://www.example.com/search.cfm/searchterm/????? so it appears the issue goes fairly deep.
Comments (3)
Yeah, it's not really ColdFusion's fault. It's a common problem.
It's mostly the fault of the original CGI specification, which specifies that PATH_INFO has to be %-decoded, thus losing the original %xx byte sequences that would have allowed you to work out which real characters were meant.
And it's partly IIS's fault, because it always tries to read submitted %xx bytes in the path part as UTF-8-encoded Unicode (unless the path isn't a valid UTF-8 byte sequence, in which case it plumps for the Windows default code page, but gives you no way to find out this has happened). Having done so, it puts it in environment variables as a Unicode string (as envvars are Unicode under Windows).
However, most byte-based tools using the C runtime library (and I'm assuming this applies to ColdFusion, as it does under Perl, Python 2, PHP etc.) then try to read the environment variables as bytes, and the MS C runtime encodes the Unicode contents again using the Windows default code page. So any characters that don't fit in the default code page are lost for good. This would include your Arabic characters when running on a Western Windows install.
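To make the loss concrete, here is a rough Python sketch of that last step. It is only a simulation: cp1252 stands in for a Western default code page and cp1256 for an Arabic one, and the helper name through_code_page is made up for illustration. It is not what IIS or ColdFusion literally execute.

```python
# Search term taken from the question.
term = "القاهرة"

def through_code_page(text: str, code_page: str = "cp1252") -> str:
    """Encode text with a legacy code page, replacing anything that does
    not fit, then decode it again. This approximates what a byte-based
    reader ends up with after the runtime has re-encoded the variable."""
    return text.encode(code_page, errors="replace").decode(code_page)

# On a "Western" code page every Arabic letter turns into a literal '?'.
mangled = through_code_page("/search.cfm/" + term)
print(mangled)

# On an Arabic code page the same characters survive the round trip.
preserved = through_code_page("/search.cfm/" + term, "cp1256")
print(preserved)
```

The question marks produced this way are real '?' characters in the string, which matches what the question describes.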
A clever script that has direct access to the Win32 GetEnvironmentVariableW API could call that to retrieve a native-Unicode environment variable, which it could then encode to UTF-8 or whatever else it wanted, assuming that the input was also UTF-8 (which is what you'd generally want today). However, I don't think ColdFusion gives you this access, and in any case it only works from IIS6 onwards; IIS5.x will throw away any non-default-codepage characters before they even reach the environment variables.
Otherwise, your best bet is URL rewriting. If a layer above CF can convert search.cfm/القاهرة to search.cfm/?q=القاهرة then you don't face the same problem, as the QUERY_STRING variable, unlike PATH_INFO, is not specified to be %-decoded, so the %xx bytes remain where a tool at CF's level can see them.
Here's what you could do:
Of course, I'd recommend that you work with something like this in that case:
yourtemplate.cfm?searchTerm=%C3%98%C2%A7%C3%99%E2%80%9E
And then you do URL rewriting in IIS (if not already done by the framework or the rest of the app) to match your pattern: http://learn.iis.net/page.aspx/461/creating-rewrite-rules-for-the-url-rewrite-module/
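As a rough illustration of the mapping such a rewrite rule would need to express, here is a Python sketch. It is not IIS rewrite-rule syntax; the search.cfm path comes from the question, while the rewrite function and the exact pattern are assumptions made for the example. It works on the raw, still percent-encoded path so the %xx bytes are carried over unchanged.

```python
import re

# Hypothetical pattern: /search.cfm/<term>, where <term> is still %-encoded.
_SEARCH_PATH = re.compile(r"^/search\.cfm/([^/?]+)$")

def rewrite(raw_path: str) -> str:
    """Move a trailing path segment into a searchTerm query parameter.
    Paths that do not match the pattern are returned unchanged."""
    match = _SEARCH_PATH.match(raw_path)
    if match is None:
        return raw_path
    return "/search.cfm?searchTerm=" + match.group(1)

print(rewrite("/search.cfm/%D8%A7%D9%84"))
print(rewrite("/other.cfm/whatever"))
```

Whether the rewriting layer hands you the path before or after decoding depends on the tool, so that is worth checking before relying on this.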
You can set the character encoding of the URL and FORM scope using the setEncoding() function:
http://www.adobe.com/livedocs/coldfusion/7/htmldocs/wwhelp/wwhimpl/common/html/wwhelp.htm?context=ColdFusion_Documentation&file=00000623.htm
You need to do this before you access any of the variables in this scope.
But, the default encoding of those scopes is already UTF-8, so this may not help. Also, this would probably not affect the CGI scope.
Is the IIS Server logging the correct characters into the request log?