So I have an interesting problem: I have a string, and for the most part I know what to expect:
http://www.someurl.com/st=????????
Except in this case, the ?'s are either upper case letters or numbers. The problem is, the string has garbage mixed in: the string is broken up into 5 or 6 pieces, and in between there's lots of junk: unprintable characters, foreign characters, as well as plain old normal characters. In short, stuff that's apt to look like this: Nyþ=mî;ëMÝ×nüqÏ
Usually the last 8 characters (the ?'s) are together right at the end, so at the moment I just have PHP grab the last 8 chars and hope for the best. Occasionally, that doesn't work, so I need a more robust solution.
The problem is technically unsolvable, but I think the best solution is to grab characters from the end of the string while they are upper case or numeric. If I get 8 or more, assume that is correct. Otherwise, find the st= and grab characters going forward, as many as I need to fill up the 8 character quota. Is there a regex way to do this, or will I need to roll up my sleeves and go nested-loop style?
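The fallback strategy described above can be sketched roughly as follows. This is only one possible reading of it, not a definitive solution: the function name extractCode(), the constant CODE_LENGTH, and the choice to use the last occurrence of the marker are all assumptions made for illustration. It also assumes the st= marker itself survives intact, which the garbage may not guarantee. The patterns deliberately omit the /u modifier so they work on raw bytes even when the input is not valid UTF-8.

```php
<?php
// Hypothetical helper: the name, constant, and marker default are illustrative.
const CODE_LENGTH = 8;

function extractCode(string $input, string $marker = 'st='): ?string
{
    // Step 1: take the run of uppercase letters/digits at the very end.
    // The pattern always matches (possibly an empty string).
    preg_match('/[A-Z0-9]*\z/', $input, $m);
    $tail = $m[0];

    if (strlen($tail) >= CODE_LENGTH) {
        return substr($tail, -CODE_LENGTH);
    }

    // Step 2: the code was split, so look right after the marker
    // for the characters that are still missing.
    $pos = strrpos($input, $marker); // assumption: last occurrence wins
    if ($pos === false) {
        return null;
    }
    $start     = $pos + strlen($marker);
    $tailStart = strlen($input) - strlen($tail);
    // Only consider bytes between the marker and the tail, so the
    // same characters are never counted twice.
    $between = substr($input, $start, $tailStart - $start);

    $missing = CODE_LENGTH - strlen($tail);
    if (!preg_match('/\A[A-Z0-9]{' . $missing . '}/', $between, $h)) {
        return null; // not enough usable characters after the marker
    }
    return $h[0] . $tail;
}
```

For example, extractCode("http://www.someurl.com/st=ABCDE\xFE\xFF=m;FGH") stitches "ABCDE" and "FGH" back together, while an input with no marker and a short tail returns null. If the garbage itself contains uppercase letters or digits directly after the marker or directly before the tail, this sketch will still pick them up, which is the part of the problem that stays unsolvable.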
Update:
To clear up some confusion, I get an input string that's like this:
[garbage]http:/[garbage]/somewe[garbage]bsite.co[garbage]m/something=[garbage]????????
except the garbage is in unpredictable locations in the string (although the very end is never garbage) and has unpredictable length (at least, I have not been able to find a pattern in either). Usually the ?s are all together, which is why I just grab the last 8 chars, but sometimes they aren't, which results in some missing data and returned garbage.
Hah, that was a joke. Here's a regex for you:
As stated, the problem is unsolvable. If the garbage can contain "plain old normal characters", and the garbage can fall at the end of the string, then you cannot know whether the target string from this sample is "ABCDEFGH" or "BCDEFGHI":
What do these values represent? If you want to retain all of it, just without having to deal with garbage in your database, maybe you should hex-encode it using bin2hex().
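As a rough illustration of that suggestion, the snippet below hex-encodes a made-up fragment containing non-printable bytes and then restores it. The sample value in $raw is purely hypothetical. bin2hex() and hex2bin() are standard PHP functions.

```php
<?php
// Illustrative input: ASCII mixed with non-printable bytes.
$raw = "st=AB\x00\xFE";

// Hex-encode so the value can be stored as plain ASCII.
$hex = bin2hex($raw);

// hex2bin() reverses the encoding and returns the original bytes.
$restored = hex2bin($hex);
```

The encoded string is twice as long as the input, which is the price of keeping every byte without storing raw garbage.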
You can use this regular expression:
if (preg_match('/[\'^£$%&*()}{@#~?><,|=_+¬-]/', $string) === 1) {
    // $string contains at least one of the listed special characters
}