Extracting variables from a string with a regular expression?
My puzzle: as a PHP newbie I am trying to extract some data from a string using a regular expression, but I cannot find the correct syntax.
The string contains HTML scraped from a website, consisting of several images. I want the final output to be 3 separate variables: "$Number1", "$Number2" and "$Status".
An example of the content of the input string $html:
<div id="system">
<img alt="2" height="35" src="/images/numbers/2.jpg" width="18" /><img alt="2" height="35" src="/images/numbers/2.jpg" width="18" /><img alt=".5" height="35" src="/images/numbers/point5.jpg" style="margin-left: -4px" width="26" /><img alt="system statusA" height="35" src="/images/numbers/statusA.jpg" width="37" /><img alt="2" height="35" src="/images/numbers/2.jpg" width="18" /><img alt="1" height="35" src="/images/numbers/1.jpg" width="18" /><img alt=".0" height="35" src="/images/numbers/point0.jpg" style="margin-left: -4px" width="26" />
</div>
The possible values which can appear in this string are:
- 0.jpg
- 1.jpg
- 2.jpg
- 3.jpg
- 4.jpg
- 5.jpg
- 6.jpg
- 7.jpg
- 8.jpg
- 9.jpg
- point0.jpg
- point5.jpg
- statusA.jpg
- statusB.jpg
- statusC.jpg
- statusD.jpg
- statusE.jpg
- statusF.jpg
The result should be variables:
- "Number1" (XX.X) based upon the first two numbers (0-9) and .0 or .5
- "Status" (statusX) based upon the status
- "Number2" (XX.X) based upon the last two numbers (0-9) and .0 or .5
Code so far:
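$regex = '/\balt="(.*?)"/';               // pattern needs delimiters; the alt values are double-quoted
preg_match_all($regex, $html, $matches);  // preg_match would stop after the first image
var_dump($matches[1]);                    // all captured alt values, in document order
echo implode(' ', $matches[1]);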
I probably have to do this in multiple steps or use another function. Can anyone help me?
Comments (3)
The very first thing that you should ask yourself is: "in what format is my input data". Since in this case it is clearly a snippet of HTML, you should feed that snippet to an HTML parser, and not to a regular expression engine.
I don't know the exact function names, but your code should look like this:
So you need to find an HTML parser that parses a string into a tree of nodes. The nodes should have methods for finding the nodes inside them based on CSS classes, element names or node IDs. For Python this library is called BeautifulSoup, for Java it is JSoup, and I'm sure that there is something similar for PHP.
The examples provided with simplehtmldom look promising.
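As a rough illustration of the parser approach, here is a minimal sketch using PHP's built-in DOM extension rather than simplehtmldom. It is not the answerer's original code. It assumes the images always appear in the order shown in the question (number, status, number) and that the status is one of statusA to statusF; the sample markup is a shortened copy with the height, width and style attributes left out.

```php
<?php
// Shortened copy of the question's sample markup.
$html = '<div id="system">'
    . '<img alt="2" src="/images/numbers/2.jpg" />'
    . '<img alt="2" src="/images/numbers/2.jpg" />'
    . '<img alt=".5" src="/images/numbers/point5.jpg" />'
    . '<img alt="system statusA" src="/images/numbers/statusA.jpg" />'
    . '<img alt="2" src="/images/numbers/2.jpg" />'
    . '<img alt="1" src="/images/numbers/1.jpg" />'
    . '<img alt=".0" src="/images/numbers/point0.jpg" />'
    . '</div>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$doc->loadHTML($html);
libxml_clear_errors();

$Number1 = '';
$Number2 = '';
$Status  = '';

foreach ($doc->getElementsByTagName('img') as $img) {
    $alt = $img->getAttribute('alt');
    if (preg_match('/status[A-F]/', $alt, $m)) {
        $Status = $m[0];            // e.g. "statusA"
    } elseif ($Status === '') {
        $Number1 .= $alt;           // pieces before the status image
    } else {
        $Number2 .= $alt;           // pieces after the status image
    }
}

echo "$Number1 $Status $Number2\n"; // 22.5 statusA 21.0
```

The alt attributes already carry the digits and the ".0" / ".5" suffix, so concatenating them in document order is enough; the image file names do not need to be parsed.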
Possibly DOM: http://www.php.net/manual/en/book.dom.php
See also the question "Robust and Mature HTML Parser for PHP".
Do you just want the alt values? Try this XPath example:
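A minimal sketch of what such a query might look like with PHP's DOMXPath (an illustration, not the answerer's original code). The expression `//div[@id="system"]/img/@alt` is an assumption based on the markup in the question, and the sample string is shortened to three images.

```php
<?php
// Shortened sample markup based on the question.
$html = '<div id="system">'
    . '<img alt="2" src="/images/numbers/2.jpg" />'
    . '<img alt=".5" src="/images/numbers/point5.jpg" />'
    . '<img alt="system statusA" src="/images/numbers/statusA.jpg" />'
    . '</div>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Select the alt attribute of every img directly inside <div id="system">.
$alts = [];
foreach ($xpath->query('//div[@id="system"]/img/@alt') as $attr) {
    $alts[] = $attr->nodeValue;
}

print_r($alts);                     // "2", ".5", "system statusA"
```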