使用 PHP 将希伯来语文本插入 MySQL(垃圾文本)

发布于 2024-12-07 08:36:42 字数 5492 浏览 0 评论 0原文

我在将希伯来语文本插入 mysql 时遇到了一个奇怪的问题。

基本上问题是:
我有一个 php 脚本,它从 csv 文件中获取希伯来语文本,然后将其发送到 mysql 数据库。数据库和表的所有字段的字符集都设置为UTF8,排序规则设置为utf8_bin。但是当我使用 mysql 插入它时,随机垃圾值出现在文本中,这使得它对于输出完全无用。注意:我仍然可以看到一半的单词正确显示。

这是我的作业,可能会帮助您理解:
1. 正如我提到的,表字符集和排序规则是 utf8。
2.我已经发送 header('Content-Type: text/html; charset=utf-8')
3. 如果我回显文本,它看起来很完美。当我使用 utf-8_encode 转换它时 它得到正确转换。 (例如,שפפת 转换为 ×©× ×פת)
4.当我对转换后的变量使用utf-8_decode并使用echo时,它仍然显示完美。
之后使用了这些

5.我在 mysql_connect
mysql_query("SET character_set_client = 'utf8';");
mysql_query("设置字符集结果 = 'utf8';");
mysql_query("设置名称'utf8'");
mysql_set_charset('utf8');

甚至尝试过这个:
mysql_query(“SET character_set_results = 'utf8',character_set_client = 'utf8',character_set_connection = 'utf8',character_set_database = 'utf8',character_set_server = 'utf8'”,$ con)

  1. 添加了default_charset =“UTF-8”我的 php.ini 文件。
  2. 我不知道 csv 中使用的编码文件,但当我用notepad++打开它时,编码是没有BOM的utf-8。
  3. 这是实际垃圾的示例:
    原文: שй פת
    utf8_encode 后的文本:×©× ×פת
    同一脚本中 utf8_decode 后的文本:שй פת(完美)
    文本发送到mysql数据库:ש×? ×?פת(注意中间的 ?)
    如果我们从 mysql 回显文本:ש�? ?פת(输出接近)
  4. 在utf8_encoding之前使用了addslashes和stripslashes。 (甚至在没有运气后尝试)
  5. 服务器位于运行 xamp 1.7.4 的 Windows 上
    • 阿帕奇2.2.17
    • MySQL 5.5.8(社区服务器)
    • PHP 5.3.5(VC6 X86 32位)

编辑1:只是为了澄清我确实在网站上搜索了类似的问题并实施了找到的建议(SET NAME UTF8)和很多其他选择等)但它没有成功。因此,请不要将此问题标记为重复。

编辑2: 这是完整的脚本:

    <?php
header('Content-Type: text/html; charset=utf-8'); 

if (isset($_GET['filename'])==true)
{
$databasehost = "localhost";
$databasename = "what_csv";


$databaseusername="root";
$databasepassword="";
$databasename= "csv";

$fieldseparator = "\n";
$lineseparator = "@contact\n";


$csvfile = $_GET['filename'];
/********************************/


if(!file_exists($csvfile)) {
    echo "File not found. Make sure you specified the correct path.\n";
    exit;
}

$file = fopen($csvfile,"r");

if(!$file) {
    echo "Error opening data file.\n";
    exit;
}

$size = filesize($csvfile);

if(!$size) {
    echo "File is empty.\n";
    exit;
}

$csvcontent = fread($file,$size);

fclose($file);

$con = @mysql_connect($databasehost,$databaseusername,$databasepassword) or die(mysql_error());

mysql_query( "SET NAMES utf8" );
mysql_set_charset('utf8',$con);
/*
mysql_query("SET character_set_client = 'utf8';"); 
mysql_query("SET character_set_result = 'utf8';");

mysql_query("SET NAMES 'utf8'");
mysql_set_charset('utf8');

mysql_query("SET character_set_results = 'utf8', character_set_client = 'utf8', character_set_connection = 'utf8', character_set_database = 'utf8', character_set_server = 'utf8'", $con);
*/

@mysql_select_db($databasename) or die(mysql_error());



$lines = 0;
$queries = "";
$linearray = array();

foreach(explode($lineseparator,$csvcontent) as $line) {

$Name="";
$Landline1="";
$Landline2="";
$Mobile="";
$Address="";
$Email="";
$IMEI="temp";
$got_imei=false;

//echo $line.'<br>';
    $lines++;

    $line = trim($line," \t");

    $line = str_replace("\r","",$line);

    $linearray = explode($fieldseparator,$line);
    //check for values to insert
    foreach($linearray as $field)
    {
    if (is_numeric($field)){ $got_imei=true;$IMEI=trim($field);}
    if (stristr($field, 'Name:')) {$Name=trim(str_replace("Name:", "", $field));}   
    if (stristr($field, 'Landline:')) {$Landline1=trim(str_replace("Landline:", "", $field));}  
    if (stristr($field, 'Landline2:')) {$Landline2=trim(str_replace("Landline2:", "", $field));}    
    if (stristr($field, 'Mobile:')) {$Mobile=trim(str_replace("Mobile:", "", $field));} 
    if (stristr($field, 'Address:')) {$Address=trim(str_replace("Address:", "", $field));}
    if (stristr($field, 'Email:')) {$Email=trim(str_replace("Email:", "", $field));}



    }
    if ($got_imei==true)
    {

    $query = "UPDATE $databasetable SET imei=$IMEI where imei='temp'";
        mysql_query($query);

    }



    else if (($Name=="") &&  ($Landline1=="" ) && ($Landline2=="")  && ($Mobile=="")  && ($Address=="")) {echo "";}
    else
    {
        //$Name = utf8_encode("$Name");
        //$Name = addslashes("$Name");
        $Name = utf8_encode(mysql_real_escape_string("$Name"));

        echo"$Name,$Landline1,$Landline2,$Address,$IMEI<br>";
        $query = "insert into $databasetable (imei, name, landline1, landline2, mobile, address, email) values('$IMEI','$Name', '$Landline1','$Landline2','$Mobile', '$Address', '$Email');";
        mysql_query($query);
        $Name = utf8_decode(($Name));   
        echo $Name."<br>";

    }
}
@mysql_close($con);



echo "Found a total of $lines records in this csv file.\n";

}
?>


<form>
Enter file name <input type="text" name="filename" /><br />
<input type="submit" value="Submit" /><br>
NOTE : File must be present in same directory as this script. Please include full filename, for example filename.csv.
</form>

这是 csv 文件的示例:

@contact
Name: שי יפת
Mobile: 0547939898

@IMEI
355310042074173

编辑 3:

如果我直接通过 cmd 输入字符串,我会收到此警告:

Warning Code : 1366
Incorrect string value: '\xD7\xA9\xD7\x99 \xD7...' for column 'name' at row 1

这是我在网上找到的可能相关的内容,有什么帮助吗? http://bugs.mysql.com/bug.php?id=30131

I'm facing a weird problem with inserting hebrew text into mysql.

Basically the problem is :
I have a php script which picks up hebrew text from a csv file then send it to mysql database. The charset of both database and all fields of tables are set to UTF8 and collation to utf8_bin. But when I insert it using mysql, random garbage value appears inside the text which renders it completely useless for output. NOTE : I can still see half of the words appear correctly.

Here is my homework which might help you in understanding :
1. As I mentioned the table charset and collation are utf8.
2. I've send header('Content-Type: text/html; charset=utf-8')
3. If I echo out the text, it appears perfectly. When I convert it using utf-8_encode
it get converted properly. (eg. שי יפת get converted to ×©× ×פת)
4. When I use utf-8_decode on the converted variable and use echo, it still displays perfectly.
5. I've used these after mysql_connect

mysql_query("SET character_set_client = 'utf8';");
mysql_query("SET character_set_result = 'utf8';");
mysql_query("SET NAMES 'utf8'");
mysql_set_charset('utf8');

and even tried this :
mysql_query("SET character_set_results = 'utf8', character_set_client = 'utf8', character_set_connection = 'utf8', character_set_database = 'utf8', character_set_server = 'utf8'", $con)

  1. Added default_charset = "UTF-8" in my php.ini file.
  2. I am unaware of the encoding used in csv file but when I open it with notepad++ the encoding is utf-8 without BOM.
  3. Here is a sample of the actual garbage :
    original text : שי יפת
    text after utf8_encode : ×©× ×פת
    text after utf8_decode in same script : שי יפת (perfect)
    text send to mysql database : ש×? ×?פת (notice the ? in between)
    text if we echo from mysql : ש�? �?פת (the output is close)
  4. Used addslashes and stripslashes before utf8_encoding. (even tried after no luck)
  5. Server is on windows running xamp 1.7.4
    • Apache 2.2.17
    • MySQL 5.5.8 (Community Server)
    • PHP 5.3.5 (VC6 X86 32bit)

EDIT 1 : Just to clarify that I did searched the site for similar questions and did implemented the suggestions found (SET NAME UTF8 and alot other options etc) but it didn't work out. So please don't mark this question as repeat.

EDIT 2 :
Here is the full script :

    <?php
header('Content-Type: text/html; charset=utf-8'); 

if (isset($_GET['filename'])==true)
{
$databasehost = "localhost";
$databasename = "what_csv";


$databaseusername="root";
$databasepassword="";
$databasename= "csv";

$fieldseparator = "\n";
$lineseparator = "@contact\n";


$csvfile = $_GET['filename'];
/********************************/


if(!file_exists($csvfile)) {
    echo "File not found. Make sure you specified the correct path.\n";
    exit;
}

$file = fopen($csvfile,"r");

if(!$file) {
    echo "Error opening data file.\n";
    exit;
}

$size = filesize($csvfile);

if(!$size) {
    echo "File is empty.\n";
    exit;
}

$csvcontent = fread($file,$size);

fclose($file);

$con = @mysql_connect($databasehost,$databaseusername,$databasepassword) or die(mysql_error());

mysql_query( "SET NAMES utf8" );
mysql_set_charset('utf8',$con);
/*
mysql_query("SET character_set_client = 'utf8';"); 
mysql_query("SET character_set_result = 'utf8';");

mysql_query("SET NAMES 'utf8'");
mysql_set_charset('utf8');

mysql_query("SET character_set_results = 'utf8', character_set_client = 'utf8', character_set_connection = 'utf8', character_set_database = 'utf8', character_set_server = 'utf8'", $con);
*/

@mysql_select_db($databasename) or die(mysql_error());



$lines = 0;
$queries = "";
$linearray = array();

foreach(explode($lineseparator,$csvcontent) as $line) {

$Name="";
$Landline1="";
$Landline2="";
$Mobile="";
$Address="";
$Email="";
$IMEI="temp";
$got_imei=false;

//echo $line.'<br>';
    $lines++;

    $line = trim($line," \t");

    $line = str_replace("\r","",$line);

    $linearray = explode($fieldseparator,$line);
    //check for values to insert
    foreach($linearray as $field)
    {
    if (is_numeric($field)){ $got_imei=true;$IMEI=trim($field);}
    if (stristr($field, 'Name:')) {$Name=trim(str_replace("Name:", "", $field));}   
    if (stristr($field, 'Landline:')) {$Landline1=trim(str_replace("Landline:", "", $field));}  
    if (stristr($field, 'Landline2:')) {$Landline2=trim(str_replace("Landline2:", "", $field));}    
    if (stristr($field, 'Mobile:')) {$Mobile=trim(str_replace("Mobile:", "", $field));} 
    if (stristr($field, 'Address:')) {$Address=trim(str_replace("Address:", "", $field));}
    if (stristr($field, 'Email:')) {$Email=trim(str_replace("Email:", "", $field));}



    }
    if ($got_imei==true)
    {

    $query = "UPDATE $databasetable SET imei=$IMEI where imei='temp'";
        mysql_query($query);

    }



    else if (($Name=="") &&  ($Landline1=="" ) && ($Landline2=="")  && ($Mobile=="")  && ($Address=="")) {echo "";}
    else
    {
        //$Name = utf8_encode("$Name");
        //$Name = addslashes("$Name");
        $Name = utf8_encode(mysql_real_escape_string("$Name"));

        echo"$Name,$Landline1,$Landline2,$Address,$IMEI<br>";
        $query = "insert into $databasetable (imei, name, landline1, landline2, mobile, address, email) values('$IMEI','$Name', '$Landline1','$Landline2','$Mobile', '$Address', '$Email');";
        mysql_query($query);
        $Name = utf8_decode(($Name));   
        echo $Name."<br>";

    }
}
@mysql_close($con);



echo "Found a total of $lines records in this csv file.\n";

}
?>


<form>
Enter file name <input type="text" name="filename" /><br />
<input type="submit" value="Submit" /><br>
NOTE : File must be present in same directory as this script. Please include full filename, for example filename.csv.
</form>

Here is a sample of csv file :

@contact
Name: שי יפת
Mobile: 0547939898

@IMEI
355310042074173

EDIT 3 :

If I directly enter the string via cmd I get this warning:

Warning Code : 1366
Incorrect string value: '\xD7\xA9\xD7\x99 \xD7...' for column 'name' at row 1

Here is something I found on the net that could be related, any help?
http://bugs.mysql.com/bug.php?id=30131

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

拥抱影子 2024-12-14 08:36:42

我也有这个问题。这些台词解决了这个问题:

mysql_query( "SET NAMES utf8" );
mysql_query( "SET CHARACTER SET utf8" );

Shana Tova

I had this problem too. Thees lines solve it:

mysql_query( "SET NAMES utf8" );
mysql_query( "SET CHARACTER SET utf8" );

Shana Tova

逐鹿 2024-12-14 08:36:42

使用 Text/LongText 而不是 varchar。还使用排序规则作为 utf8_general_ci

希望这会帮助你@Ajit

Use Text/LongText instead of varchar. Also use Collation as utf8_general_ci

Hope this will help you @Ajit

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文