How to get a specific value from a web page?

Posted 2024-12-19 13:50:56

I have a page with many <div>s and other content, and somewhere in the middle of those divs is this specific line:

<input name="extWarrantyProds" type="hidden" value="23814298 ^ true"/>

How can I get the "value" part of this tag, given that it sits in the middle of a page full of other content?

I'm trying with urllib but I don't even know where to start =/


埋情葬爱 2024-12-26 13:50:56

The easiest way I can think of:

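import urllib.request

urlStr = "http://www..."

with urllib.request.urlopen(urlStr) as fileObj:
    for raw_line in fileObj:
        # urlopen yields bytes in Python 3, so decode before searching
        line = raw_line.decode("utf-8", errors="replace")
        if '<input name="extWarrantyProds"' in line:
            startIndex = line.find('value="') + len('value="')
            endIndex = line.find('"', startIndex)
            print(line[startIndex:endIndex])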
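Because that snippet needs a live URL, here is a minimal offline sketch of the same line-by-line search, run against an in-memory copy of the markup. The helper name `value_from_lines` and the sample page are hypothetical. Like the answer, it assumes the whole `<input>` tag sits on one line; it only narrows the `value="` search to text after the marker.

```python
from typing import Iterable, Optional

MARKER = '<input name="extWarrantyProds"'


def value_from_lines(lines: Iterable[str]) -> Optional[str]:
    """Return the value of the first extWarrantyProds input, or None.

    Hypothetical helper mirroring the answer's approach: it assumes the
    whole tag, including value="...", sits on a single line.
    """
    for line in lines:
        tag_start = line.find(MARKER)
        if tag_start == -1:
            continue
        # Look for value=" only after the marker, so an earlier element
        # on the same line cannot be picked up by mistake.
        start = line.find('value="', tag_start)
        if start == -1:
            continue
        start += len('value="')
        end = line.find('"', start)
        if end != -1:
            return line[start:end]
    return None


# Hypothetical sample page standing in for the downloaded HTML.
page = """<div>
<div><span>other stuff</span></div>
<input name="extWarrantyProds" type="hidden" value="23814298 ^ true"/>
</div>"""

print(value_from_lines(page.splitlines()))  # prints: 23814298 ^ true
```

With a real page, you would pass the decoded lines from `urllib.request.urlopen(...)` instead of `page.splitlines()`.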


冷月断魂刀 2024-12-26 13:50:56

import lxml.html as lh

html = '''
<input name="extWarrantyProds" type="hidden" value="23814298 ^ true"/>
'''

# If you want to parse from a URL:
# tree = lh.parse('http://example.com')

tree = lh.fromstring(html)

print(tree.xpath("//input[@name='extWarrantyProds']/@value"))
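If lxml is not installed, Python's standard-library `html.parser` can do the same lookup. This is a swapped-in technique, not part of the answer above; the class name `WarrantyValueFinder` and the sample markup are hypothetical. Unlike plain string or regex matching, a parser does not care about attribute order or line breaks inside the tag.

```python
from html.parser import HTMLParser
from typing import List


class WarrantyValueFinder(HTMLParser):
    """Collect the value of every <input name="extWarrantyProds"> tag.

    Hypothetical helper class; the target name is hard-coded for this question.
    """

    def __init__(self) -> None:
        super().__init__()
        self.values: List[str] = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag and attribute names
        # are already lowercased by HTMLParser
        attributes = dict(attrs)
        if tag == "input" and attributes.get("name") == "extWarrantyProds":
            self.values.append(attributes.get("value") or "")

    # Self-closing tags such as <input ... /> go through handle_startendtag,
    # whose default implementation calls handle_starttag, so no override is needed.


# Hypothetical sample page; in practice this would be the decoded
# response body from urllib.request.urlopen(...).read().
page = """
<div><div>other stuff</div>
<input type="hidden"
       value="23814298 ^ true" name="extWarrantyProds"/>
</div>
"""

finder = WarrantyValueFinder()
finder.feed(page)
finder.close()
print(finder.values)  # prints: ['23814298 ^ true']
```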


长安忆 2024-12-26 13:50:56

No need for anything too fancy if that's all you need. Download the page using urllib and look for the value using re.findall().

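import re
import urllib.request

url = 'http://...'
html = urllib.request.urlopen(url).read().decode('utf-8', errors='replace')
# Find every extWarrantyProds <input> tag (DOTALL lets a tag span lines)
matches = re.findall(r'<input name="extWarrantyProds.*?>', html, re.DOTALL)
for i in matches:
    # Pull the value attribute out of each matched tag
    print(re.findall(r'value="(.*?)"', i))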
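Here is an offline sketch of the same regex idea, run on a hypothetical sample string instead of a downloaded page. It is a variation, not the answer's exact code: it folds the two `re.findall()` calls into one pattern with a capture group. Like the answer's pattern, it still assumes `name` comes before `value` inside the tag and that both use double quotes, so a real HTML parser is safer if the markup varies.

```python
import re

# One pattern: match the extWarrantyProds <input>, then capture value="...".
# [^>]*? keeps the match inside a single tag; it also matches newlines,
# so re.DOTALL is not needed here.
PATTERN = re.compile(r'<input\s+name="extWarrantyProds"[^>]*?\svalue="([^"]*)"')

# Hypothetical sample standing in for the downloaded page.
html = """<div><div>other stuff</div>
<input name="extWarrantyProds" type="hidden" value="23814298 ^ true"/>
<input name="somethingElse" type="hidden" value="ignore me"/>
</div>"""

print(PATTERN.findall(html))  # prints: ['23814298 ^ true']
```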

