PHP-UTF-8下的PHP全角标点转为半角的疑问
PHP全角标点转为半角
<?php
$str = "0123ABCDFWS\",.?<>{}[]*&^%#@!~()+-|:;";
echo "$str";
echo "<br />";
$str = preg_replace('/xa3([xa1-xfe])/e', 'chr(ord(1)-0x80)', $str);
echo $str;
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(5)
如果文件是utf8的,那么可以通过
$str = iconv('utf-8', 'gbk', $str); 转换编码后,再
preg_replace('/xa3([xa1-xfe])/e', 'chr(ord(1)-0x80)', $str);
下边是不同编码对应的正则匹配编码的范围:
UTF8: [x01-x7f]|[xc0-xdf][x80-xbf]|[xe0-xef][x80-xbf]{2}|[xf0-xff][x80-xbf]{3}
UTF16: [x00-xd7][xe0-xff]|[xd8-xdf][x00-xff]{2}
Big5: [x01-x7f]|[x81-xfe]([x40-x7e]|[xa1-xfe])
GBK :[x01-x7f]|[x81-xfe][x40-xfe]
GB2312汉字: [xb0-xf7][xa0-xfe]
GB2312半角标点符号及特殊符号: xa1[xa2-xfe]
GB2312罗马数组及项目序号: xa2([xa1-xaa]|[xb1-xbf]|[xc0-xdf]|[xe0-xe2]|[xe5-xee]|[xf1-xfc])
GB2312全角标点及全角字母: xa3[xa1-xfe]
GB2312日文平假名: xa4[xa1-xf3]
GB2312日文片假名:xa5[xa1-xf6]
GB18030: [x00-x7f]|[x81-xfe][x40-xfe]|[x81-xfe][x30-x39][x81-xfe][x30-x39]
JIS [x20-x7e]|[x21-x5f]|[x21-x7e]{2}
SJIS [x20-x7e]|[xa1-xdf]|([x81-x9f]|[xe0-xef])([x40-x7e]|[x80-xfc])
SJIS全角空格 (?:x81x81)
SJIS全角数字 (?:x82[x4f-x58])
SJIS全角大写英文 (?:x82[x60-x79])
SJIS全角小写英文 (?:x82[x81-x9a])
SJIS全角平假名 (?:x82[x9f-xf1])
SJIS全角平假名扩展 (?:x82[x9f-xf1]|x81[x4ax4bx54x55])
SJIS全角片假名 (?:x83[x40-x96])
SJIS全角片假名扩展 (?:x83[x40-x96]|x81[x45x5bx52x53])
EUC_JP [x20-x7e]|x81[xa1-xdf]|[xa1-xfe][xa1-xfe]|x8f[xa1-xfe]{2}
EUC_JP标点符号及特殊字符 [xa1-xa2][xa0-xfe]
EUC_JP全角数字 xa3[xb0-xb9]
EUC_JP全角大写英文 xa3[xc1-xda]
EUC_JP全角小写英文 xa3[xe1-xfa]
EUC_JP全角平假名 xa4[xa1-xf3]
EUC_JP全角片假名 xa3[xb0-xb9]|xa3[xc1-xda]|xa5[xa1-xf6][xa3][xb0-xfa]|[xa1][xbc-xbe]|[xa1][xdd]
EUC_JP全角汉字 [xb0-xcf][xa0-xd3]|[xd0-xf4][xa0-xfe]|[xB0-xF3][xA1-xFE]|[xF4][xA1-xA6]|[xA4][xA1-xF3]|[xA5][xA1-xF6]|[xA1][xBC-xBE]
EUC_JP全角空格 (?:xa1xa1)
EUC半角片假名 (?:x8e[xa6-xdf])
日文半角空格 x20
$queue = Array('0' => '0', '1' => '1', '2' => '2', '3' => '3', '4' => '4', '5' => '5', '6' => '6', '7' => '7', '8' => '8', '9' => '9',
'A' => 'A', 'B' => 'B', 'C' => 'C', 'D' => 'D', 'E' => 'E', 'F' => 'F', 'G' => 'G', 'H' => 'H', 'I' => 'I', 'J' => 'J',
'K' => 'K', 'L' => 'L', 'M' => 'M', 'N' => 'N', 'O' => 'O', 'P' => 'P', 'Q' => 'Q', 'R' => 'R', 'S' => 'S', 'T' => 'T',
'U' => 'U', 'V' => 'V', 'W' => 'W', 'X' => 'X', 'Y' => 'Y', 'Z' => 'Z', 'a' => 'a', 'b' => 'b', 'c' => 'c', 'd' => 'd',
'e' => 'e', 'f' => 'f', 'g' => 'g', 'h' => 'h', 'i' => 'i', 'j' => 'j', 'k' => 'k', 'l' => 'l', 'm' => 'm', 'n' => 'n',
'o' => 'o', 'p' => 'p', 'q' => 'q', 'r' => 'r', 's' => 's', 't' => 't', 'u' => 'u', 'v' => 'v', 'w' => 'w', 'x' => 'x',
'y' => 'y', 'z' => 'z');
echo preg_replace_callback("/([xEF][xBC][x90-x99]|[xEF][xBD][x81-x9AxA1-xBA])/", 'next_fchar', '0');
function next_fchar($matches){
global $queue;
return $queue[$matches[1]];
}
其实直接把这些特殊字符放到一个全局数组里也可以的:
function convertChar($str)
{
$arr = array('0' => '0', '1' => '1', '2' => '2', '3' => '3', '4' => '4','5' => '5', '6' => '6', '7' => '7', '8' => '8', '9' => '9', 'A' => 'A', 'B' => 'B', 'C' => 'C', 'D' => 'D', 'E' => 'E','F' => 'F', 'G' => 'G', 'H' => 'H', 'I' => 'I', 'J' => 'J', 'K' => 'K', 'L' => 'L', 'M' => 'M', 'N' => 'N', 'O' => 'O','P' => 'P', 'Q' => 'Q', 'R' => 'R', 'S' => 'S', 'T' => 'T',U' => 'U', 'V' => 'V', 'W' => 'W', 'X' => 'X', 'Y' => 'Y','Z' => 'Z', 'a' => 'a', 'b' => 'b', 'c' => 'c', 'd' => 'd','e' => 'e', 'f' => 'f', 'g' => 'g', 'h' => 'h', 'i' => 'i','j' => 'j', 'k' => 'k', 'l' => 'l', 'm' => 'm', 'n' => 'n','o' => 'o', 'p' => 'p', 'q' => 'q', 'r' => 'r', 's' => 's', 't' => 't', 'u' => 'u', 'v' => 'v', 'w' => 'w', 'x' => 'x', 'y' => 'y', 'z' => 'z','(' => '(', ')' => ')', '〔' => '[', '〕' => ']', '【' => '[','】' => ']', '〖' => '[', '〗' => ']', '“' => '[', '”' => ']','‘' => '[', ''' => ']', '{' => '{', '}' => '}', '《' => '<','》' => '>','%' => '%', '+' => '+', '—' => '-', '-' => '-', '~' => '-',':' => ':', '。' => '.', '、' => ',', ',' => '.', '、' => '.', ';' => ',', '?' => '?', '!' => '!', '…' => '-', '‖' => '|', '”' => '"', ''' => '`', '‘' => '`', '|' => '|', '〃' => '"',' ' => ' ');
return strtr($str, $arr);
}
我觉得没必要那么麻烦,直接写个方法替换就完了,而且正则的效率还低,我就是这么弄的:
/**
* 字符串半角和全角间相互转换
* @param string $str 待转换的字符串
* @param int $type TODBC:转换为半角;TOSBC,转换为全角
* @return string 返回转换后的字符串
*/
function convertStrType($str, $type) {
$dbc = array(
'0' , '1' , '2' , '3' , '4' ,
'5' , '6' , '7' , '8' , '9' ,
'A' , 'B' , 'C' , 'D' , 'E' ,
'F' , 'G' , 'H' , 'I' , 'J' ,
'K' , 'L' , 'M' , 'N' , 'O' ,
'P' , 'Q' , 'R' , 'S' , 'T' ,
'U' , 'V' , 'W' , 'X' , 'Y' ,
'Z' , 'a' , 'b' , 'c' , 'd' ,
'e' , 'f' , 'g' , 'h' , 'i' ,
'j' , 'k' , 'l' , 'm' , 'n' ,
'o' , 'p' , 'q' , 'r' , 's' ,
't' , 'u' , 'v' , 'w' , 'x' ,
'y' , 'z' , '-' , ' ' , ':' ,
'.' , ',' , '/' , '%' , '#' ,
'!' , '@' , '&' , '(' , ')' ,
'<' , '>' , '"' , ''' , '?' ,
'[' , ']' , '{' , '}' , '\' ,
'|' , '+' , '=' , '_' , '^' ,
'¥' , ' ̄' , '`'
);
$sbc = array( //半角
'0', '1', '2', '3', '4',
'5', '6', '7', '8', '9',
'A', 'B', 'C', 'D', 'E',
'F', 'G', 'H', 'I', 'J',
'K', 'L', 'M', 'N', 'O',
'P', 'Q', 'R', 'S', 'T',
'U', 'V', 'W', 'X', 'Y',
'Z', 'a', 'b', 'c', 'd',
'e', 'f', 'g', 'h', 'i',
'j', 'k', 'l', 'm', 'n',
'o', 'p', 'q', 'r', 's',
't', 'u', 'v', 'w', 'x',
'y', 'z', '-', ' ', ':',
'.', ',', '/', '%', ' #',
'!', '@', '&', '(', ')',
'<', '>', '"', ''','?',
'[', ']', '{', '}', '\',
'|', '+', '=', '_', '^',
'¥','~', '`'
);
if($type == 'TODBC'){
return str_replace( $sbc, $dbc, $str ); //半角到全角
}elseif($type == 'TOSBC'){
return str_replace( $dbc, $sbc, $str ); //全角到半角
}else{
return $str;
}
}
看大家发了怎么多方法,我也贴一个:
php全角转半角函数
/**
* 将一个字串中含有全角的数字字符、字母、空格或'%+-()'字符转换为相应半角字符
*
* @access public
* @param string $str 待转换字串
*
* @return string $str 处理后字串
*/
function make_semiangle($str)
{
$arr = array('0' => '0', '1' => '1', '2' => '2', '3' => '3', '4' => '4',
'5' => '5', '6' => '6', '7' => '7', '8' => '8', '9' => '9',
'A' => 'A', 'B' => 'B', 'C' => 'C', 'D' => 'D', 'E' => 'E',
'F' => 'F', 'G' => 'G', 'H' => 'H', 'I' => 'I', 'J' => 'J',
'K' => 'K', 'L' => 'L', 'M' => 'M', 'N' => 'N', 'O' => 'O',
'P' => 'P', 'Q' => 'Q', 'R' => 'R', 'S' => 'S', 'T' => 'T',
'U' => 'U', 'V' => 'V', 'W' => 'W', 'X' => 'X', 'Y' => 'Y',
'Z' => 'Z', 'a' => 'a', 'b' => 'b', 'c' => 'c', 'd' => 'd',
'e' => 'e', 'f' => 'f', 'g' => 'g', 'h' => 'h', 'i' => 'i',
'j' => 'j', 'k' => 'k', 'l' => 'l', 'm' => 'm', 'n' => 'n',
'o' => 'o', 'p' => 'p', 'q' => 'q', 'r' => 'r', 's' => 's',
't' => 't', 'u' => 'u', 'v' => 'v', 'w' => 'w', 'x' => 'x',
'y' => 'y', 'z' => 'z',
'(' => '(', ')' => ')', '〔' => '[', '〕' => ']', '【' => '[',
'】' => ']', '〖' => '[', '〗' => ']', '“' => '[', '”' => ']',
'‘' => '[', '’' => ']', '{' => '{', '}' => '}', '《' => '<',
'》' => '>',
'%' => '%', '+' => '+', '—' => '-', '-' => '-', '~' => '-',
':' => ':', '。' => '.', '、' => ',', ',' => '.', '、' => '.',
';' => ',', '?' => '?', '!' => '!', '…' => '-', '‖' => '|',
'”' => '"', '’' => '`', '‘' => '`', '|' => '|', '〃' => '"',
' ' => ' ');
return strtr($str, $arr);
}
全角与半角之区别(来自中文维基百科)
全角,又称全形、全宽,是电脑字符的一种格式,字面意思是比普通字符(或半角字符)宽的字符。
传统上,英语或拉丁字母语言使用一字节的空间来存储,而汉字、日语等常使用两字节存储,在使用固定宽度文字的地方,为了使字体看起来整齐,英文字母、数字及其他符号,也由原来只占用一个字空间,改为一概占用两个字的空间来显示,并且使用两个字节来存储。
可以参考这个看看具体的:http://hi.baidu.com/clx2575167/blog/item/2f10071f3abdc3fee1fe0bfb.html