How can I find which JavaScript function generates a specific variable?
I am trying to scrape an e-commerce website using the requests library only:
https://b2b.baidu.com/
Every link and every request on the website comes with fid and pi parameters, like this:
https://b2b.baidu.com/s?q=sfsdf&from=search&fid=0,1645713795553&pi=b2b.index.search...161635618068288
However, since requests receives only the original HTML, without any JavaScript-generated content, the response to requests.get("https://b2b.baidu.com/") doesn't contain those two parameters.
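To make the two parameters concrete, here is a minimal sketch that pulls fid and pi out of the example URL above using only the standard library. The helper names are hypothetical, and reading the number after the comma in fid as milliseconds since the Unix epoch is only an assumption based on its magnitude; nothing here confirms how the site actually builds it.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs, urlsplit

# The example URL quoted in the question, copied verbatim.
EXAMPLE_URL = (
    "https://b2b.baidu.com/s?q=sfsdf&from=search"
    "&fid=0,1645713795553&pi=b2b.index.search...161635618068288"
)


def extract_tracking_params(url):
    """Return the fid and pi query parameters of a URL (None if absent)."""
    query = parse_qs(urlsplit(url).query)
    return {name: query.get(name, [None])[0] for name in ("fid", "pi")}


def guess_fid_timestamp(fid):
    """Read the part after the comma as milliseconds since the Unix epoch.

    ASSUMPTION: this interpretation is a guess, not documented behaviour.
    """
    _, millis = fid.split(",", 1)
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    return epoch + timedelta(milliseconds=int(millis))


params = extract_tracking_params(EXAMPLE_URL)
print(params)
print(guess_fid_timestamp(params["fid"]).isoformat())
```

If the guess holds, the second half of fid could be reproduced with the current time in milliseconds, which would leave pi as the value that still needs to be traced in the scripts.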
I understand that the alternative is to use selenium or requests_html, but for the challenge and, of course, for science, I would like to figure out the API instead and scrape using requests only.
Using Chrome DevTools, I've figured out which JS scripts the page loads after the HTML, but I can't figure out how to catch the moment when those two values are generated or which function generates them. Breakpoints can't catch it, since the DOM elements (basically every element on the page contains fid and pi) aren't being changed, so I don't know how to find out how those numbers are made :(
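One static alternative to breakpoints is to search the text of the loaded scripts for the parameter names and read the surrounding code. The sketch below does that for a string of JavaScript source. The function name and the sample script are made up for illustration and are not taken from the site; in practice the input would be the script text downloaded from the URLs listed in DevTools.

```python
import re


def find_param_mentions(js_source, names=("fid", "pi"), context=40):
    """Yield (name, offset, snippet) for each whole-word mention of a name."""
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, names)) + r")\b")
    for match in pattern.finditer(js_source):
        start = max(match.start() - context, 0)
        end = min(match.end() + context, len(js_source))
        yield match.group(1), match.start(), js_source[start:end]


# HYPOTHETICAL script text, invented only to exercise the helper.
SAMPLE_JS = (
    'function track(u){var fid="0,"+Date.now();'
    'return u+"&fid="+fid+"&pi="+page.pi}'
)

mentions = list(find_param_mentions(SAMPLE_JS))
for name, offset, snippet in mentions:
    print(name, offset, snippet)
```

A plain text search like this can miss names that minified code assembles at runtime, so an empty result does not prove the scripts never produce the parameter.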
Any advice on what methodology I should use? I've never had any experience with actually reverse-engineering JS scripts before.