Writing a script to fetch a specific part of a web page in a loop for offline use

Posted on 2024-11-02 18:13:17

I have a specific use case. I am preparing for the GRE. Every time I come across a new word, I look it up at www.mnemonicdictionary.com for its meanings and mnemonics. I want to write a script, preferably in Python (a pointer to an existing tool would also do, as I don't know Python well yet but am learning it), which takes a list of words from a text file, looks each one up on this site, fetches just the relevant portion (meaning and mnemonics), and stores it in another text file for offline use. Is this possible? I also looked at the source of these pages, but along with HTML tags they contain some AJAX functions.
Could someone outline a complete way to go about this?

Example, for the word impecunious:

The relevant HTML source looks like this:

<ul class='wordnet'><li><p>(adj.) not having enough money to pay for necessities</p><u>synonyms</u> : <a href='http://www.mnemonicdictionary.com/word/hard up' onclick="ajaxSearch('hard up','click'); return false;">hard up</a> , <a href='http://www.mnemonicdictionary.com/word/in straitened circumstances' onclick="ajaxSearch('in straitened circumstances','click'); return false;">in straitened circumstances</a> , <a href='http://www.mnemonicdictionary.com/word/penniless' onclick="ajaxSearch('penniless','click'); return false;">penniless</a> , <a href='http://www.mnemonicdictionary.com/word/penurious' onclick="ajaxSearch('penurious','click'); return false;">penurious</a> , <a href='http://www.mnemonicdictionary.com/word/pinched' onclick="ajaxSearch('pinched','click'); return false;">pinched</a><p></p></li></ul>

but the web page renders like this:

•(adj.) not having enough money to pay for necessities
synonyms : hard up , in straitened circumstances , penniless , penurious , pinched
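
The rendered text is essentially the source with the tags removed, so the fragment above can be reduced to plain text with sed. The following is only a minimal sketch: `strip_tags` is a hypothetical helper name, the sample is a shortened copy of the fragment quoted above, and real pages may contain markup this does not handle.

```shell
#!/bin/bash
# strip_tags (hypothetical helper): turn an HTML fragment on stdin into plain text.
strip_tags() {
    sed -e 's/<\/p>/ /g' \
        -e 's/<[^>]*>//g' \
        -e 's/&nbsp;/ /g' \
        -e 's/  */ /g' \
        -e 's/ *$//'
    # 1) closing </p> becomes a space so adjacent blocks do not run together
    # 2) every remaining tag is deleted
    # 3) &nbsp; entities become ordinary spaces
    # 4) runs of spaces collapse to one, 5) trailing spaces are removed
}

# Shortened copy of the fragment from the question, used as sample input.
strip_tags <<'EOF'
<ul class='wordnet'><li><p>(adj.)&nbsp;not having enough money to pay for necessities</p><u>synonyms</u> : <a href='http://www.mnemonicdictionary.com/word/penniless' onclick="ajaxSearch('penniless','click'); return false;">penniless</a> , <a href='http://www.mnemonicdictionary.com/word/pinched' onclick="ajaxSearch('pinched','click'); return false;">pinched</a><p></p></li></ul>
EOF
```

For this sample the function prints the definition followed by `synonyms : penniless , pinched` on a single line. The onclick handlers do not matter here because they are attributes inside the tags that get deleted.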

Comments (2)

半岛未凉 2024-11-09 18:13:17

If you have Bash (version 4+) and wget, here is an example:

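#!/bin/bash
# Query URL for the site's ajaxSearch.php endpoint; %s is replaced by the word.
template="http://www.mnemonicdictionary.com/include/ajaxSearch.php?word=%s&event=search"
# Read one word per line from the file named "file".
while read -r word
do
    url=$(printf "$template" "$word")
    data=$(wget -O- -q "$url")    # fetch the response body quietly to stdout
    data=${data#* }               # drop everything up to and including the first space
    echo "$word: ${data%%<*}"     # print the text that precedes the next "<"
done < file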

Sample output

> more file
synergy
tranquil
jester

> bash dict.sh
synergy: the working together of two things (muscles or drugs for example) to produce an effect greater than the sum of their individual effects
tranquil: (of a body of water) free from disturbance by heavy waves
jester: a professional clown employed to entertain a king or nobleman in the Middle Ages
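
The two parameter expansions do all the parsing in the script, so it may help to see them on their own. This is a small sketch: `first_definition` is a hypothetical helper, and the input string is made up for illustration, since the real response format of the endpoint is not shown in this thread.

```shell
#!/bin/bash
# first_definition (hypothetical helper): apply the same two expansions as the script.
first_definition() {
    local data=$1
    data=${data#* }                 # remove the shortest prefix ending at the first space
    printf '%s\n' "${data%%<*}"     # remove everything from the first "<" onward
}

# Made-up response text, used only to show the expansions; the real reply may differ.
first_definition "<p>(adj.) calm and quiet</p><u>synonyms</u>"
# prints: calm and quiet
```

`#` strips the shortest matching prefix and `%%` strips the longest matching suffix, which is why only the first definition survives.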

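Update: include the mnemonic

#!/bin/bash
template="http://www.mnemonicdictionary.com/include/ajaxSearch.php?word=%s&event=search"
while read -r word
do
    url=$(printf "$template" "$word")
    data=$(wget -O- -q "$url")
    data=${data#* }
    m=${data#*class=\'mnemonic\'}    # skip to just past the class='mnemonic' attribute
    m=${m%%</p>*}                    # cut at the first closing </p> after it
    m="${m##* }"                     # keep only what follows the last space in that fragment
    echo "$word: ${data%%<*}, mnemonic: $m"
done < file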
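
One caveat: the template inserts the word into the query string unchanged, so a multi-word entry such as "hard up" (one of the synonyms in the question) would produce a URL containing a raw space. A minimal sketch of percent-encoding the word first, assuming ASCII input only; `urlencode` is a hypothetical helper, not something wget or the site provides.

```shell
#!/bin/bash
# urlencode (hypothetical helper): percent-encode an ASCII string for use in a query string.
urlencode() {
    local LC_ALL=C s=$1 out="" c i
    for (( i = 0; i < ${#s}; i++ )); do
        c=${s:i:1}
        case $c in
            [a-zA-Z0-9.~_-]) out+=$c ;;                       # unreserved characters pass through
            *) printf -v c '%%%02X' "'$c"; out+=$c ;;         # everything else becomes %XX
        esac
    done
    printf '%s\n' "$out"
}

urlencode "hard up"
# prints: hard%20up
```

In the loop above, this would be used as `url=$(printf "$template" "$(urlencode "$word")")`.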

阪姬 2024-11-09 18:13:17

Use curl and sed from a Bash shell (either Linux, Mac, or Windows with Cygwin).

If I get a second I will write a quick script ... gotta give the baby a bath now though.
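
A rough sketch of what the curl and sed route could look like. The endpoint is the one quoted in the other answer and its response format is assumed rather than verified; `first_paragraph`, `words.txt`, and `meanings.txt` are hypothetical names chosen for illustration.

```shell
#!/bin/bash
# Assumption: endpoint taken from the other answer in this thread; not verified here.
template='http://www.mnemonicdictionary.com/include/ajaxSearch.php?word=%s&event=search'

# first_paragraph (hypothetical helper): print the text of the first <p>...</p> on stdin.
first_paragraph() {
    tr '\n' ' ' | sed -e 's/<\/p>.*//' -e 's/.*<p>//' -e 's/&nbsp;/ /g'
    # tr joins the input into one line; sed then cuts from the first </p> onward,
    # drops everything up to the opening <p>, and turns &nbsp; into a space.
}

# Look up every word in words.txt (one per line) and save the results for offline use.
if [ -f words.txt ]; then
    while read -r word; do
        url=$(printf "$template" "$word")
        printf '%s: %s\n' "$word" "$(curl -s "$url" | first_paragraph)"
    done < words.txt > meanings.txt
fi
```

If the response contains no `<p>` element, the helper passes the text through largely unchanged, so the sed expressions would need adjusting to whatever the endpoint actually returns.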
