Python Scraper for JavaScript?

Posted 2024-09-03 03:17:43

Can anyone direct me to a good Python screen-scraping library for pages that rely on JavaScript (hopefully one with good documentation/tutorials)? I'd like to see what options are out there, but most of all I want the one that is easiest to learn and gives the fastest results. Does anyone have experience with this? I've heard a bit about SpiderMonkey, but maybe there are better options out there?

Specifically, I use BeautifulSoup and Mechanize to get as far as the link below, but I need a way to open the JavaScript popup, submit data, and download/parse the results shown in that popup.

<a href="javascript:openFindItem(12510109)" onclick="s_objectID=&quot;javascript:openFindItem(12510109)_1&quot;;return this.s_oc?this.s_oc(e):true">Find Item</a>

I'd like to implement this with Google App Engine and Django. Thanks!
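
For links like the one above, a possible first step is to pull the function name and its argument out of the `javascript:` href, so the item ID can be reused in a direct request. The following is only a minimal sketch using the standard library. The names `JS_HREF`, `JsLinkParser`, and `find_js_calls` are made up for illustration, and what `openFindItem` actually does (for example, which URL the popup loads) is not known from the question; it would have to be read from the page's own scripts.

```python
import re
from html.parser import HTMLParser

# Matches hrefs of the form "javascript:someFunction(arg1, arg2)".
JS_HREF = re.compile(r"^javascript:\s*(\w+)\((.*)\)\s*;?\s*$")


class JsLinkParser(HTMLParser):
    """Collect (function_name, [args]) pairs from <a href="javascript:..."> links."""

    def __init__(self):
        super().__init__()
        self.calls = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        match = JS_HREF.match(href)
        if match:
            name, raw_args = match.groups()
            # Split on commas and drop surrounding quotes; this is naive and
            # would mis-handle arguments that themselves contain commas.
            args = [a.strip().strip("'\"") for a in raw_args.split(",") if a.strip()]
            self.calls.append((name, args))


def find_js_calls(html):
    """Return every javascript: call found in the anchors of an HTML string."""
    parser = JsLinkParser()
    parser.feed(html)
    return parser.calls


sample = '<a href="javascript:openFindItem(12510109)">Find Item</a>'
print(find_js_calls(sample))
```

If the popup turns out to be an ordinary page loaded with that ID in its URL or form data, the extracted value could then be fed to Mechanize directly. If the popup content is built by script, a real browser or a JavaScript-capable renderer is still needed.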

Comments (3)

极致的悲 2024-09-10 03:17:43

In these cases I usually automate an actual browser and grab the processed HTML from there.

Edit:

Here's an example that automates Internet Explorer to navigate to a URL and print the title and location once the page has loaded.

import time
from ctypes import byref, windll, wintypes

import win32con
from win32com.client import Dispatch

# Value of ReadyState once the document has finished loading.
READYSTATE_COMPLETE = 4

def wait_until_ready(ie):
    # wintypes.MSG uses pointer-sized fields, so it is also correct on 64-bit Windows.
    msg = wintypes.MSG()

    while True:
        # Pump pending window messages so the COM-hosted browser can make progress.
        while windll.user32.PeekMessageW(byref(msg), None, 0, 0, win32con.PM_REMOVE) != 0:
            windll.user32.TranslateMessage(byref(msg))
            windll.user32.DispatchMessageW(byref(msg))

        if ie.ReadyState == READYSTATE_COMPLETE:
            break

        # Avoid spinning at 100% CPU while the page loads.
        time.sleep(0.05)


ie = Dispatch("InternetExplorer.Application")

ie.Visible = True

ie.Navigate("http://google.com/")

wait_until_ready(ie)

print("title:", ie.Document.Title)
print("location:", ie.Document.location)

萌化 2024-09-10 03:17:43

I use the Python bindings to WebKit to render basic JavaScript, and Chickenfoot for more advanced interactions. See this WebKit example for more info.

冰雪梦之恋 2024-09-10 03:17:43

You can also use a "programmatic web browser" named Spynner. I found this to be the best solution, and it is relatively easy to use.
