How do I figure out what the whitespaces in a string are?
I am scraping some information from a 10-year-old website that was built in ASP using FrontPage (originally) and Dreamweaver (lately). I am using PHP.
I am getting back strings containing whitespace that is not ordinary spaces. Using PHP's trim function, some of the whitespace is removed, but not all.
original string: string(47) " School Calendar"
trimmed string: string(34) " School Calendar"
How do I figure out what the whitespaces are so I can remove them?
My page showing var_dumps of the original and trimmed strings is here.
Comments (3)
It looks like (if you view source on your page) your string has &nbsp;
"spaces" that aren't being trimmed by PHP's trim function.
The best option is probably to replace these in advance, by calling str_replace prior to trim:
$stringToTrim = str_replace("&nbsp;", " ", $original);
$trimmed = trim($stringToTrim);
(Not using standard code formatting because it wasn't handling the &nbsp;
correctly)
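If other entities might be mixed in, one possible variation is to decode all HTML entities first and then replace the resulting no-break space character. This is only a sketch: the helper name normalize_scraped_text is hypothetical, and it assumes PHP 7 or later and that the scraped text is UTF-8.

```php
<?php
// Hypothetical helper, not part of the original answer.
// Assumes the input is UTF-8 and may contain HTML entities such as &nbsp;.
function normalize_scraped_text(string $raw): string
{
    // Turn entities (&nbsp;, &amp;, ...) into real characters, encoded as UTF-8.
    $decoded = html_entity_decode($raw, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // U+00A0 (no-break space) is the two bytes C2 A0 in UTF-8.
    // Replace the whole sequence rather than passing it to trim()'s
    // character list, which works byte by byte.
    $spaced = str_replace("\xC2\xA0", ' ', $decoded);

    // Now only ordinary whitespace is left at the edges.
    return trim($spaced);
}

$original = "&nbsp;&nbsp; School Calendar&nbsp;";
var_dump(normalize_scraped_text($original)); // string(15) "School Calendar"
```

Replacing the full two-byte sequence avoids damaging other UTF-8 characters that happen to contain the byte A0.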
echo ord($trimmed_string)
will tell you what the character code of the whitespace in that example is. (It gives the character code of the first character; strictly speaking, the first byte.)
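To look at more than the first byte, the same idea can be applied across the start of the string. The following is a minimal sketch, assuming PHP 7 or later; describe_bytes is a hypothetical helper name, and the 16-byte limit is arbitrary.

```php
<?php
// Hypothetical helper: show the first bytes of a string so that
// invisible characters become visible.
function describe_bytes(string $s, int $limit = 16): string
{
    $parts = [];
    foreach (str_split(substr($s, 0, $limit)) as $byte) {
        $code = ord($byte);
        // Printable ASCII is shown as-is; spaces, control bytes and
        // non-ASCII bytes are shown as hex.
        $parts[] = ($code > 0x20 && $code < 0x7F)
            ? $byte
            : sprintf('0x%02X', $code);
    }
    return implode(' ', $parts);
}

echo describe_bytes("\xC2\xA0 School"), "\n";
// 0xC2 0xA0 0x20 S c h o o l
```

In UTF-8 the pair C2 A0 is U+00A0, the no-break space; a lone A0 byte would instead suggest a single-byte encoding such as Latin-1.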
Unicode has plenty of "different" spaces:
http://en.wikipedia.org/wiki/Space_%28punctuation%29#Table_of_spaces
http://www.brunildo.org/test/space-chars.html
trim doesn't know about all of them. If you need to get rid of them all, use a regular expression.
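A regex-based trim along these lines might look like the sketch below. It assumes PHP 7 or later and UTF-8 input; unicode_trim is a hypothetical name, and the choice of the \p{Z} (separator) property is an assumption about which characters count as "space". Zero-width characters such as U+200B are not in that class and would need to be added explicitly.

```php
<?php
// Hypothetical helper: trim ASCII whitespace plus Unicode separators.
function unicode_trim(string $s): string
{
    // \s     : ASCII whitespace (space, tab, newline, ...)
    // \p{Z}  : Unicode separators, e.g. U+00A0, U+2003, U+3000
    // /u     : treat pattern and subject as UTF-8
    $result = preg_replace('/^[\s\p{Z}]+|[\s\p{Z}]+$/u', '', $s);

    // preg_replace returns null on failure (for example invalid UTF-8);
    // in that case hand back the input unchanged.
    return $result ?? $s;
}

var_dump(unicode_trim("\u{00A0}\u{2003} School Calendar\u{3000}\t"));
// string(15) "School Calendar"
```

Only the edges are touched, so a no-break space between two words stays where it is.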