Handling accented and other special characters with Oracle, PHP and oci8

Posted on 2024-08-23 04:38:59

Hi, I am trying to store names in an Oracle database and fetch them back using PHP and oci8.

However, if I insert an é directly into the Oracle database and use oci8 to fetch it back, I just receive an e.

Do I have to encode all special characters (including é) as HTML entities (i.e. &eacute;) before inserting them into the database, or am I missing something?

Thx


UPDATE: Mar 1 at 18:40

I found this function:
http://www.php.net/manual/en/function.utf8-decode.php#85034

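function charset_decode_utf_8($string) {
    // Nothing to convert when the string contains no high bytes
    // (ereg() and the /e modifier were removed in PHP 7, so PCRE callbacks are used here)
    if (!preg_match('/[\x80-\x9F\xA1-\xFF]/', $string)) {
        return $string;
    }
    // Three-byte UTF-8 sequences -> numeric HTML entities
    $string = preg_replace_callback('/([\xE0-\xEF])([\x80-\xBF])([\x80-\xBF])/', function ($m) {
        return '&#' . ((ord($m[1]) - 224) * 4096 + (ord($m[2]) - 128) * 64 + (ord($m[3]) - 128)) . ';';
    }, $string);
    // Two-byte UTF-8 sequences -> numeric HTML entities
    $string = preg_replace_callback('/([\xC0-\xDF])([\x80-\xBF])/', function ($m) {
        return '&#' . ((ord($m[1]) - 192) * 64 + (ord($m[2]) - 128)) . ';';
    }, $string);
    return $string;
}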

It seems to work, although I'm not sure it's the optimal solution.


UPDATE: Mar 8 at 15:45

Oracle's character set is ISO-8859-1.
In PHP I added:

putenv("NLS_LANG=AMERICAN_AMERICA.WE8ISO8859P1");

to force the oci8 connection to use that character set.
Retrieving the é using oci8 from PHP now works! (For VARCHARs, that is; for CLOBs I had to use utf8_encode to extract it.)
So then I tried saving data from PHP to Oracle, and it doesn't work: somewhere along the way from PHP to Oracle the é becomes a ?


UPDATE: Mar 9 at 14:47

So, getting closer.
After adding the NLS_LANG variable, direct oci8 inserts with é work.

The problem is actually on the PHP side.
I am using the ExtJS framework, which encodes form values with encodeURIComponent on submit.
So é is sent as %C3%A9 and then decoded into the two-byte UTF-8 form of é.
Its length is now 2 (strlen($my_sent_value) = 2) and not 1.
And in PHP, $my_sent_value == 'é' evaluates to FALSE.

I think that if I can re-encode all these characters in PHP back into single bytes before inserting them into Oracle, it should work.

Still no luck though.
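
The length mismatch described above can be reproduced without a database. The following is a minimal sketch, assuming the form value arrives as the UTF-8 bytes produced by encodeURIComponent and that the iconv extension is available; the variable names are illustrative only.

```php
<?php
// What PHP holds after URL-decoding the value ExtJS sent for "é".
$sent = rawurldecode('%C3%A9');                  // UTF-8 bytes 0xC3 0xA9

// strlen() counts bytes, not characters, so the UTF-8 form has length 2.
$utf8Length = strlen($sent);

// Convert to the single-byte encoding that the oci8 connection expects.
$latin1 = iconv('UTF-8', 'ISO-8859-1', $sent);   // single byte 0xE9
$latin1Length = strlen($latin1);

printf("UTF-8:      %s (%d bytes)\n", bin2hex($sent), $utf8Length);
printf("ISO-8859-1: %s (%d bytes)\n", bin2hex($latin1), $latin1Length);
```

The comparison with a literal é fails for the same reason when the script file itself is saved as ISO-8859-1: the literal is one byte, the submitted value is two.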


UPDATE: Mar 10 at 11:05

I keep thinking I am so close (yet so far away).

putenv("NLS_LANG=AMERICAN_AMERICA.WE8ISO8859P9"); works very sporadically.

I created a small PHP script to test:

header('Content-Type: text/plain; charset=ISO-8859-1');
putenv("NLS_LANG=AMERICAN_AMERICA.WE8ISO8859P9");
$conn= oci_connect("user", "pass", "DB");
$stmt = oci_parse($conn, "UPDATE temp_tb SET string_field = '|é|'");
oci_execute($stmt, OCI_COMMIT_ON_SUCCESS);

After running this once and logging into the Oracle database directly, I see that STRING_FIELD is set to |¿|. Obviously not what I had come to expect from my previous experience.
However, if I refresh that PHP page twice quickly... it works!
In Oracle I correctly see |é|.

It seems the environment variable is not being set or picked up in time for the first execution of the script, but is available for the second.

My next experiment is to export the variable into PHP's environment; however, I need to restart Apache for that, so we'll see what happens. Hopefully it works.

Comments (5)

芸娘子的小脾气 2024-08-30 04:38:59

I presume you are aware of these facts:

  • There are many different character sets: you have to pick one and, of course, know which one you are using.
  • Oracle is perfectly capable of storing text without HTML entities (&eacute;). HTML entities are used in, well, HTML. Oracle is not a web browser ;-)

You must also know that HTML entities are not bound to a specific charset; on the contrary, they're used to represent characters in a charset-independent context.

You talk about ISO-8859-1 and UTF-8 interchangeably. What charset do you want to use? ISO-8859-1 is easy to use, but it can only store text in some Western European languages (such as Spanish) and it lacks some common characters like the € symbol. UTF-8 is trickier to use, but it can encode every character defined by the Unicode Consortium (which includes everything you'll ever need).

Once you've made the decision, you must configure Oracle to hold data in that charset and choose an appropriate column type. E.g., VARCHAR2 stores text in the database character set, while NVARCHAR2 uses the national character set, which is a Unicode encoding in current Oracle versions.
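
To illustrate the point that an entity is independent of any charset, here is a small sketch using PHP's html_entity_decode. It decodes the same entity into two target encodings; the variable names are illustrative only.

```php
<?php
// The same entity, decoded into two different target encodings.
$asUtf8   = html_entity_decode('&eacute;', ENT_QUOTES | ENT_HTML401, 'UTF-8');
$asLatin1 = html_entity_decode('&eacute;', ENT_QUOTES | ENT_HTML401, 'ISO-8859-1');

// Same character, different bytes: the entity itself carries no encoding.
printf("UTF-8:      %s\n", bin2hex($asUtf8));
printf("ISO-8859-1: %s\n", bin2hex($asLatin1));
```

The entity only becomes bytes once a charset is chosen, which is why it is the charset of the connection and the column, not the entity, that has to be decided.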

红墙和绿瓦 2024-08-30 04:38:59

This is what I finally ended up doing to solve this problem:

Modified the profile of the daemon running PHP to have:

NLS_LANG=AMERICAN_AMERICA.WE8ISO8859P1

So that the oci8 connection uses ISO-8859-1.

Then, in my PHP configuration, I set the default charset to ISO-8859-1:

default_charset = "iso-8859-1"

When I am inserting into an Oracle Table via oci8 from PHP, I do:

utf8_decode($my_sent_value)

And when receiving data from Oracle, printing the variable just works:

echo $my_received_value

However, when sending that data over AJAX I had to use:

utf8_encode($my_received_value)
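
The conversions above can be wrapped in two small helpers. This is only a sketch under stated assumptions: the function names to_db_charset and from_db_charset are hypothetical, iconv is used in place of utf8_decode/utf8_encode (which are deprecated as of PHP 8.2), and the iconv extension is assumed to be available.

```php
<?php
// Hypothetical helper: UTF-8 (browser/AJAX input) -> ISO-8859-1 (oci8 connection).
function to_db_charset($utf8)
{
    return iconv('UTF-8', 'ISO-8859-1', $utf8);
}

// Hypothetical helper: ISO-8859-1 (value fetched from Oracle) -> UTF-8.
function from_db_charset($latin1)
{
    return iconv('ISO-8859-1', 'UTF-8', $latin1);
}

$sent      = "Jos\xC3\xA9";            // "José" as submitted in UTF-8
$forInsert = to_db_charset($sent);     // bytes to hand to oci8

// json_encode() only accepts UTF-8, which is why AJAX responses need the
// conversion back even though a plain echo works on an ISO-8859-1 page.
$forAjax = json_encode(array('name' => from_db_charset($forInsert)));

echo bin2hex($forInsert), "\n";
echo $forAjax, "\n";
```

Characters that do not exist in ISO-8859-1 cannot survive this round trip, so the approach only fits data that stays within that character set.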
巷雨优美回忆 2024-08-30 04:38:59

If the server-side code (PHP in this case) and the Oracle database use different charsets, you should set the server-side charset on the Oracle connection; Oracle then performs the conversion.

Example: let's assume:

  • PHP charset: UTF-8 (default).
  • Oracle charset: AMERICAN_AMERICA.WE8ISO8859P1

In the connection to Oracle made by PHP you should set UTF8 (the fourth parameter).

oci_pconnect("USER", "PASS", "URL", "UTF8");

Doing this, you write code in UTF-8 (without doing any conversion at all) and get UTF-8 from the database through this connection.

So you could write something like SELECT * FROM SOME_TABLE WHERE TEXT = 'SOME TEXT LIKE áéíóú Ñ' and also get UTF-8 text as a result.

According to the PHP documentation, by default the Oracle client (oci_pconnect) takes the character set from the NLS_LANG environment variable of the operating system. Some Debian-based systems have no NLS_LANG environment variable, so I think the Oracle client uses its own charset (AMERICAN_AMERICA.WE8ISO8859P1) if we don't specify the charset parameter.
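
A slightly fuller sketch of the same idea follows. The function name find_by_text is hypothetical, and the credentials, connection string and table are whatever the caller supplies (SOME_TABLE and TEXT are taken from the example above). It assumes the oci8 extension is loaded, and it uses AL32UTF8, Oracle's name for standard UTF-8, instead of UTF8; that substitution is a choice made here, not part of the answer.

```php
<?php
// Hypothetical helper: query SOME_TABLE over a connection whose client
// character set is declared as UTF-8, so no manual conversion is needed.
function find_by_text($user, $pass, $dsn, $text)
{
    // Fourth argument: character set used by the client side of this connection.
    $conn = oci_connect($user, $pass, $dsn, 'AL32UTF8');
    if (!$conn) {
        $e = oci_error();
        throw new RuntimeException($e['message']);
    }

    // A bind variable keeps the UTF-8 text out of the SQL string itself.
    $stmt = oci_parse($conn, 'SELECT * FROM SOME_TABLE WHERE TEXT = :text');
    oci_bind_by_name($stmt, ':text', $text);
    oci_execute($stmt);

    $rows = array();
    while (($row = oci_fetch_assoc($stmt)) !== false) {
        $rows[] = $row;   // column values arrive as UTF-8
    }

    oci_free_statement($stmt);
    oci_close($conn);
    return $rows;
}
```

A call such as find_by_text('USER', 'PASS', 'URL', 'áéíóú Ñ') would then pass and return UTF-8 text, with Oracle converting to and from the database character set.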

待天淡蓝洁白时 2024-08-30 04:38:59

If you really cannot change the character set that Oracle will use, then how about Base64-encoding your data before storing it in the database? That way, you can accept characters from any character set and store them as ISO-8859-1 (because Base64 outputs a subset of the ASCII character set, which maps exactly to ISO-8859-1). Base64 encoding increases the length of the string by about 33% (roughly 37% when MIME line breaks are added).

If your data is only ever going to be displayed as HTML, then you might as well store HTML entities as you suggested, but be aware that a single entity can take up to 10 characters per unencoded character, e.g. &thetasym; is ϑ.
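
As a rough sketch of the Base64 approach in PHP (the variable names are illustrative, and the input is assumed to be UTF-8):

```php
<?php
$name = "Jos\xC3\xA9 \xCF\x91";             // "José ϑ" as UTF-8 bytes

// Encode before storing: the result contains only ASCII characters.
$stored = base64_encode($name);

// Decode after fetching; strict mode rejects characters outside the alphabet.
$restored = base64_decode($stored, true);

printf("%d bytes -> %d ASCII characters\n", strlen($name), strlen($stored));
```

The trade-off is that the stored value can no longer be searched or sorted as text inside Oracle.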

最后的乘客 2024-08-30 04:38:59

I had to face this problem: Latin American special characters were stored as "?" or "¿" in my Oracle database, and I couldn't change the NLS_CHARACTER_SET because we're not the database owners.

So, I found a workaround:

1) ASP.NET code
Create a function that converts string to hexadecimal characters:

    public string ConvertirStringAHex(String input)
    {
        Encoding encoding = System.Text.Encoding.GetEncoding("ISO-8859-1");
        Byte[] stringBytes = encoding.GetBytes(input);
        StringBuilder sbBytes = new StringBuilder(stringBytes.Length);
        foreach (byte b in stringBytes)
        {
            sbBytes.AppendFormat("{0:X2}", b);
        }
        return sbBytes.ToString();
    }

2) Apply the function above to the variable you want to encode, like this:

     myVariableHex = ConvertirStringAHex( myVariable );

In ORACLE, use the following:

 PROCEDURE STORE_IN_TABLE( iTEXTO IN VARCHAR2 )
 IS
 BEGIN
   INSERT INTO myTable( SPECIAL_TEXT )  
   VALUES ( UTL_RAW.CAST_TO_VARCHAR2(HEXTORAW( iTEXTO )) );
   COMMIT;
 END;

Of course, iTEXTO is the Oracle parameter that receives the value of "myVariableHex" from the ASP.NET code.

Hope it helps. If there's something to improve, please don't hesitate to post your comments.
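
Since the question is about PHP, the hex-encoding step could look like this there. It is only a sketch: string_to_latin1_hex is a hypothetical name mirroring ConvertirStringAHex, the input is assumed to be UTF-8, and the iconv extension is assumed to be available.

```php
<?php
// Hypothetical PHP counterpart of ConvertirStringAHex: convert the text to
// ISO-8859-1 bytes, then render each byte as two uppercase hex digits.
function string_to_latin1_hex($utf8)
{
    $latin1 = iconv('UTF-8', 'ISO-8859-1', $utf8);
    return strtoupper(bin2hex($latin1));
}

$hex = string_to_latin1_hex("Ni\xC3\xB1o");   // "Niño" as UTF-8 bytes
echo $hex, "\n";
```

The resulting string contains only the characters 0-9 and A-F, so it passes through any client character set unchanged before HEXTORAW turns it back into bytes.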

Sources:
http://www.nullskull.com/faq/834/convert-string-to-hex-and-hex-to-string-in-net.aspx
https://forums.oracle.com/thread/44799
