How to scrape the whole text of an element when using an XPath selector
I ran into a problem where I cannot get all of the text when using an XPath selector. This is the element as shown in the browser's developer tools:
<address class="location-row-address" data-qa-target="provider-office-address">
230 W 13th St Ste 1b<!--
--> <!--
-->New York<!--
-->, <!--
-->NY<!--
--> <!--
-->10011<!--
-->
</address>
The XPath selector that I use is
response.xpath('//*[@id="summary-section"]/div[1]/div[2]/div/div/div[2]/div[1]/address/text()').get()
The result I am getting is
230 W 13th St Ste 1b
The result I am expecting is
230 W 13th St Ste 1b New York, NY 10011
I am using Scrapy for scraping. Thank you, your help is appreciated.
Edit:
The problem above is solved. I used the XPath string() function together with get() to obtain the full string value of the element node.
response.xpath('string(//*[@id="summary-section"]/div[1]/div[2]/div/div/div[2]/div[1]/address)').get()
Comments (1)
Your XPath expression returns all the text nodes that are children of the address element. There are several of them, because comment nodes separate them.

Back in Python land, you are calling get() on the result, which returns only the first node of the node-set. If you called getall() instead, you would retrieve a list of strings, which you could concatenate to produce the text you want.

A simpler approach is to use the XPath function string() to get the "string value" of the address element. The XPath 1.0 spec defines the string value of an element node this way:

"The string-value of an element node is the concatenation of the string-values of all text node descendants of the element node in document order."

Applying this function to the address element returns a single string value, which you can then access using the get() method in Scrapy, as in the edit to the question.