Write a script that loops over a word list and fetches a specific part of a web page for offline use
I have a specific use case. I am preparing for the GRE. Every time I come across a new word, I look it up at www.mnemonicdictionary.com for its meanings and mnemonics. I want to write a script, preferably in Python (a pointer to an existing tool would also do, as I don't know much Python yet but am learning it now), which takes a list of words from a text file, looks each one up on this site, fetches just the relevant portion (meaning and mnemonics), and stores it in another text file for offline use. Is it possible to do so? I also tried looking at the source of these pages, but along with the HTML tags they contain some AJAX functions.
Could someone provide a complete approach for this?
Example: for the word impecunious, the related HTML source looks like this:
<ul class='wordnet'><li><p>(adj.) not having enough money to pay for necessities</p><u>synonyms</u> : <a href='http://www.mnemonicdictionary.com/word/hard up' onclick="ajaxSearch('hard up','click'); return false;">hard up</a> , <a href='http://www.mnemonicdictionary.com/word/in straitened circumstances' onclick="ajaxSearch('in straitened circumstances','click'); return false;">in straitened circumstances</a> , <a href='http://www.mnemonicdictionary.com/word/penniless' onclick="ajaxSearch('penniless','click'); return false;">penniless</a> , <a href='http://www.mnemonicdictionary.com/word/penurious' onclick="ajaxSearch('penurious','click'); return false;">penurious</a> , <a href='http://www.mnemonicdictionary.com/word/pinched' onclick="ajaxSearch('pinched','click'); return false;">pinched</a><p></p></li></ul>
but the web page renders like this:
•(adj.) not having enough money to pay for necessities
synonyms : hard up , in straitened circumstances , penniless , penurious , pinched
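As a rough illustration of how the markup above maps to the rendered text, here is a minimal Python sketch that uses only the standard library's html.parser. It assumes the meanings always sit inside a ul element with class wordnet, as in the snippet; the names WordnetExtractor and extract_meanings are made up for this example, and the mnemonic section is not handled because its markup is not shown here.

```python
from html.parser import HTMLParser


class WordnetExtractor(HTMLParser):
    """Collects the visible text of each <li> inside a <ul class="wordnet">."""

    def __init__(self):
        super().__init__()
        self.depth = 0     # > 0 while inside a wordnet list (counts nested <ul>)
        self.chunks = []   # text fragments of the list item being read
        self.entries = []  # one cleaned-up string per list item

    def handle_starttag(self, tag, attrs):
        if tag == "ul":
            classes = (dict(attrs).get("class") or "").split()
            if self.depth or "wordnet" in classes:
                self.depth += 1
        elif tag == "li" and self.depth:
            self.chunks = []

    def handle_endtag(self, tag):
        if not self.depth:
            return
        if tag == "p":
            # A closed paragraph renders as a line break.
            self.chunks.append("\n")
        elif tag == "li":
            raw = "".join(self.chunks)
            # Collapse runs of whitespace and drop empty lines.
            lines = [" ".join(line.split()) for line in raw.splitlines()]
            text = "\n".join(line for line in lines if line)
            if text:
                self.entries.append(text)
            self.chunks = []
        elif tag == "ul":
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def extract_meanings(page_html):
    """Return a list with one text entry per meaning found in page_html."""
    parser = WordnetExtractor()
    parser.feed(page_html)
    parser.close()
    return parser.entries
```

Calling extract_meanings on the snippet above should give a single entry whose first line is the definition and whose second line is the synonyms list, which matches what the browser shows.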
Comments (2)
If you have Bash (version 4+) and wget, an example:
Sample output:
Update: now includes the mnemonic.
Use curl and sed from a Bash shell (either Linux, Mac, or Windows with Cygwin).
If I get a second I will write a quick script ... gotta give the baby a bath now though.
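Since the question asked for Python, the same curl-and-sed idea could be sketched roughly as below, with urllib standing in for curl and regular expressions standing in for sed. This is only a sketch under several assumptions: the URL pattern is the one seen in the links quoted in the question, the file names words.txt and offline.txt are placeholders, the function names are invented for illustration, and only the wordnet list is extracted because the markup of the mnemonic section is not shown in the question. If the site loads that part through AJAX, a plain page fetch like this may not see it.

```python
import html
import re
import time
import urllib.error
import urllib.parse
import urllib.request

# URL pattern taken from the links quoted in the question (an assumption
# about how the site is addressed today).
BASE_URL = "http://www.mnemonicdictionary.com/word/"

WORDNET_RE = re.compile(r"<ul class=['\"]wordnet['\"]>(.*?)</ul>", re.S | re.I)
TAG_RE = re.compile(r"<[^>]+>")


def strip_to_text(page):
    """Return the text of every wordnet list in page, one line per paragraph."""
    lines = []
    for block in WORDNET_RE.findall(page):
        # Paragraph and list-item ends become line breaks; other tags are dropped.
        block = re.sub(r"</(p|li)>", "\n", block, flags=re.I)
        text = html.unescape(TAG_RE.sub("", block))
        lines.extend(" ".join(l.split()) for l in text.splitlines() if l.strip())
    return "\n".join(lines)


def fetch(word, timeout=10):
    """Download the page for one word and return it as text."""
    url = BASE_URL + urllib.parse.quote(word)
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def build_offline_file(words_path, out_path, fetch_page=fetch, delay=1.0):
    """Look up every word listed in words_path and write the results to out_path."""
    with open(words_path, encoding="utf-8") as src:
        words = [w.strip() for w in src if w.strip()]
    with open(out_path, "w", encoding="utf-8") as out:
        for word in words:
            try:
                text = strip_to_text(fetch_page(word))
            except (urllib.error.URLError, OSError) as exc:
                text = f"(lookup failed: {exc})"
            out.write(f"{word}\n{text or '(nothing found)'}\n\n")
            time.sleep(delay)  # pause between requests to go easy on the site


# Example call (file names are placeholders):
# build_offline_file("words.txt", "offline.txt")
```

The fetch_page parameter is there so the download step can be swapped out, for instance to read pages already saved with curl or wget instead of hitting the site again.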