A Python script that automatically enters text into a website and retrieves the resulting page source

Posted on 2024-12-18 15:23:19

I am doing biomedical named-entity extraction using Python.

Now I have to cross-check my results by entering the text into http://text0.mib.man.ac.uk/software/geniatagger/ and parsing the HTML source of the page I get back after submitting it.

I want to do the same thing from within my own GUI. I want to take the input from the GUI I have built, submit the text to this website, and get the page source back, so that I don't have to open the browser every time I cross-check.

Thanks in advance

Answer from 还不是爱你 (2024-12-25 15:23:19):

Actually, this is a great question!

The first thing to do is to explore the website's source code a little.
If you look at it, you will see this block of code:

```html
<form method="POST" action="a.cgi">
<p>
Please enter a text that you want to analyze.
</p>
<p>
<textarea name="paragraph" rows="15" cols="80" wrap="soft">
... some text here ...
### This is a sample. Replace this with your own text.

</textarea>
</p>
<p>
<input type="submit" value="Submit Text" />
<input type="reset" />
</p>
</form>
```
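You can also extract the same details programmatically instead of reading the form by eye. The sketch below runs the standard library's `html.parser.HTMLParser` on a trimmed copy of the form above. The class name `FormInspector` is hypothetical. It only looks at `<input>`, `<textarea>` and `<select>` controls, so treat it as a starting point rather than a complete form parser.

```python
from html.parser import HTMLParser


class FormInspector(HTMLParser):
    """Collect the action, method and field names of each <form> on a page."""

    def __init__(self):
        super().__init__()
        self.forms = []  # one dict per <form> element encountered

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self.forms.append({
                "action": attrs.get("action", ""),
                # HTML forms default to GET when no method attribute is given
                "method": attrs.get("method", "get").upper(),
                "fields": [],
            })
        elif tag in ("input", "textarea", "select") and self.forms:
            name = attrs.get("name")
            if name:  # controls without a name (like this submit button) are not sent
                self.forms[-1]["fields"].append(name)


FORM_HTML = """
<form method="POST" action="a.cgi">
<textarea name="paragraph" rows="15" cols="80" wrap="soft"></textarea>
<input type="submit" value="Submit Text" />
<input type="reset" />
</form>
"""

inspector = FormInspector()
inspector.feed(FORM_HTML)
print(inspector.forms)  # [{'action': 'a.cgi', 'method': 'POST', 'fields': ['paragraph']}]
```

The parser reports the same three facts the rest of the answer relies on: the action, the method, and the field name.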

The form sends its request to the relative address a.cgi. Since the page itself lives at

http://text0.mib.man.ac.uk/software/geniatagger/

the data is posted to that address with a.cgi appended:

http://text0.mib.man.ac.uk/software/geniatagger/a.cgi
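The "appending" here is really relative-URL resolution, and `urllib.parse.urljoin` applies the same rule. In the small sketch below, note that the trailing slash on the page URL matters.

```python
from urllib.parse import urljoin

# Page the form was loaded from; the trailing slash marks a directory
PAGE_URL = "http://text0.mib.man.ac.uk/software/geniatagger/"

# Resolve the form's relative action="a.cgi" against the page URL
ACTION_URL = urljoin(PAGE_URL, "a.cgi")
print(ACTION_URL)  # http://text0.mib.man.ac.uk/software/geniatagger/a.cgi

# Without the trailing slash, the last path segment is replaced instead
print(urljoin("http://text0.mib.man.ac.uk/software/geniatagger", "a.cgi"))
# http://text0.mib.man.ac.uk/software/a.cgi
```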

But what do we send there?
The form's method attribute is POST and its textarea is named "paragraph", so the text has to be sent as a POST parameter called "paragraph".
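To see what actually goes over the wire, `urllib.parse.urlencode` produces the `application/x-www-form-urlencoded` body that a form without an `enctype` attribute sends. Here is a sketch with a made-up input string.

```python
from urllib.parse import parse_qs, urlencode

# The form has one named field, so the POST body carries a single key
body = urlencode({"paragraph": "IL-2 gene & NF-kappa B"})
print(body)  # paragraph=IL-2+gene+%26+NF-kappa+B

# Spaces become "+" and "&" is escaped, so decoding recovers the original text
print(parse_qs(body))  # {'paragraph': ['IL-2 gene & NF-kappa B']}
```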

We can submit it with this Python code:

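```python
from urllib.parse import urlencode
from urllib.request import urlopen

text = """
        Further, while specific constitutive binding to the peri-kappa B site is seen in monocytes, stimulation with phorbol esters induces additional, specific binding. Understanding the monocyte-specific function of the peri-kappa B factor may ultimately provide insight into the different role monocytes and T-cells play in HIV pathogenesis. 

### This is a sample. Replace this with your own text.
        """
data = {
        "paragraph": text
       }

# URL-encode the form field; urlopen() expects the POST body as bytes
encoded_data = urlencode(data).encode("ascii")

# Supplying a body makes urlopen() send a POST request to a.cgi
with urlopen("http://text0.mib.man.ac.uk/software/geniatagger/a.cgi",
             encoded_data) as content:
    # Decode using the charset the server declares, falling back to UTF-8
    charset = content.headers.get_content_charset("utf-8")
    html = content.read().decode(charset, errors="replace")

print(html)
```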
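If the GUI will submit text repeatedly, it may help to wrap the request in a function. This is a sketch under assumptions. The helper names `build_tagger_request` and `fetch_tagger_html` and the 30-second timeout are hypothetical choices, not something the site defines. Splitting out request construction lets you inspect the POST without touching the network.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Form endpoint, taken from the page's action attribute
TAGGER_URL = "http://text0.mib.man.ac.uk/software/geniatagger/a.cgi"


def build_tagger_request(text, url=TAGGER_URL):
    """Build, without sending, the POST request the web form would make (hypothetical helper)."""
    body = urlencode({"paragraph": text}).encode("ascii")
    # A Request carrying a body uses POST unless a method is given explicitly
    return Request(url, data=body)


def fetch_tagger_html(text, url=TAGGER_URL, timeout=30):
    """Send the text and return the result page decoded to str (hypothetical helper)."""
    with urlopen(build_tagger_request(text, url), timeout=timeout) as resp:
        charset = resp.headers.get_content_charset("utf-8")
        return resp.read().decode(charset, errors="replace")


# Inspect the request offline before sending anything
req = build_tagger_request("IL-2 gene expression")
print(req.get_method(), req.full_url)  # POST http://text0.mib.man.ac.uk/software/geniatagger/a.cgi
print(req.data)  # b'paragraph=IL-2+gene+expression'
```

A button callback can then call `fetch_tagger_html(text)` to get the same page source the browser would receive.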

So what do we have so far? An "engine" for your GUI program.
Optionally, you can parse the returned HTML with Python's HTMLParser class (from the html.parser module).
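Here is a minimal sketch of that optional parsing step. It assumes you only want the page's visible text as lines. The layout of the real result page isn't shown here, so any further splitting into tags or columns depends on what that page actually contains. `TextExtractor` and `html_to_lines` are hypothetical names, and the sample HTML is made up.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect a page's text content, skipping <script> and <style> bodies."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def html_to_lines(html):
    """Return the non-blank, stripped text lines of an HTML document."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    text = "".join(parser.chunks)
    return [line.strip() for line in text.splitlines() if line.strip()]


SAMPLE_HTML = (  # made-up page for illustration, not real tagger output
    "<html><head><style>body {color: red}</style></head><body>\n"
    "<h1>Result</h1>\n<pre>line one\nline two</pre>\n</body></html>"
)
print(html_to_lines(SAMPLE_HTML))  # ['Result', 'line one', 'line two']
```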
You also mentioned that you want to display this in your GUI.
You can do that with GTK or Qt by mapping this function to a single button. Any introductory tutorial for either toolkit covers what you need, and for this purpose it's really easy. If you run into problems, just comment on this post and I can extend this answer with the GUI part.
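As a sketch of the single-button idea, here is a version using tkinter in place of GTK or Qt. It is swapped in only because it ships with most Python installs and needs no extra packages. `fetch_tagger_html`, `make_submit_handler` and `main` are hypothetical names. The network call is passed into the handler, so the button logic can be tested with a fake fetch function.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Form endpoint, taken from the page's action attribute
TAGGER_URL = "http://text0.mib.man.ac.uk/software/geniatagger/a.cgi"


def fetch_tagger_html(text, timeout=30):
    """POST text as the "paragraph" field and return the result page (hypothetical helper)."""
    body = urlencode({"paragraph": text}).encode("ascii")
    with urlopen(TAGGER_URL, body, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset("utf-8")
        return resp.read().decode(charset, errors="replace")


def make_submit_handler(get_input, show_output, fetch=fetch_tagger_html):
    """Build a zero-argument callback for a button's command= option."""
    def on_submit():
        try:
            show_output(fetch(get_input()))
        except OSError as exc:  # URLError, HTTPError and timeouts all derive from OSError
            show_output(f"Request failed: {exc}")
    return on_submit


def main():
    # Imported here so the functions above can be used (and tested) without Tk
    import tkinter as tk

    root = tk.Tk()
    root.title("GENIA tagger cross-check")
    input_box = tk.Text(root, height=10, width=80)
    output_box = tk.Text(root, height=20, width=80)

    def show(result):
        # Replace whatever the output box currently holds
        output_box.delete("1.0", tk.END)
        output_box.insert(tk.END, result)

    submit = tk.Button(
        root,
        text="Submit Text",
        # "end-1c" drops the trailing newline that Text widgets always keep
        command=make_submit_handler(lambda: input_box.get("1.0", "end-1c"), show),
    )
    input_box.pack()
    submit.pack()
    output_box.pack()
    root.mainloop()
```

Call `main()` to open the window. The request runs on the Tk event loop, so the window stops responding until the server answers. If responses are slow, moving the fetch into a worker thread would keep the GUI responsive.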
