Scraping text from a secure web page? (C#)
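The site I am trying to scrape text from is secure and, as far as I can tell, only accessible to people on the VPN the organization has set up.
When I tested my tool today, connected to that network on a computer that can open the site, the tool was not able to access it. I am wondering whether I am missing something that someone can point out.
My entire source code is below. For context, the "Demo Results" fields are only there for testing against arbitrary sites: right now I can paste any XPath and URL into the text boxes, and later this will be fixed to one specific URL and XPath. As you can see, I am using HtmlAgilityPack. This works on any publicly accessible site, but on this specific site I get an error saying that the object reference is not set to an instance of an object. The site works 100% fine in a web browser.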
using System;
using System.Windows.Forms;

namespace ToolConcept
{
    public partial class Form1 : Form
    {
        public string Results1;
        public string Results2;
        public string DemoResults;

        public Form1()
        {
            InitializeComponent();
            DemoResults = "";
            Results1 = "YOUR RESULTS HERE";
            Results2 = "RESULTS SHOWN HERE!";
        }

        // Loads the URL from urlField and stores the InnerText of the last
        // node matching the XPath in pathField. The args parameter is unused.
        public void Scrape(string args)
        {
            HtmlAgilityPack.HtmlWeb web = new HtmlAgilityPack.HtmlWeb();
            HtmlAgilityPack.HtmlDocument doc = web.Load(urlField.Text);
            // SelectNodes returns null when nothing matches the XPath, so this
            // loop is where the "object not set to an instance" error is thrown.
            foreach (var item in doc.DocumentNode.SelectNodes(pathField.Text))
            {
                DemoResults = item.InnerText;
            }
        }

        private void button1_Click(object sender, EventArgs e)
        {
            if (textBox1.Text == "R1")
            {
                textBox2.Text = Results1;
            }
            else if (textBox1.Text == "R2")
            {
                textBox2.Text = Results1 + Results2;
            }
            else if (textBox1.Text == "0")
            {
                Scrape(DemoResults);
                HtmlAgilityPack.HtmlWeb search = new HtmlAgilityPack.HtmlWeb(); // unused
                textBox2.Text = DemoResults;
            }
            // TextBox.Text is never null (an empty box yields ""), so this
            // branch does not run as written.
            if (textBox1.Text == null)
            {
                MessageBox.Show("Not Found!");
            }
        }

        private void button2_Click(object sender, EventArgs e)
        {
            textBox2.Clear();
        }
    }
}
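One possible way to narrow this down, offered as a sketch rather than a confirmed diagnosis: the error is consistent with SelectNodes returning null because the HTML the tool receives is not the page the browser shows (for example a login or error page). The helper below separates downloading from parsing and guards against the null result. The class and method names (ScrapeSketch, ExtractText, DownloadAsync) are hypothetical, and sending the logged-in user's Windows credentials is an assumption about how the site authenticates; it may use something else entirely.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Threading.Tasks;
using HtmlAgilityPack;

public static class ScrapeSketch
{
    // Returns the InnerText of every node matching xpath,
    // or an empty list when nothing matches.
    public static List<string> ExtractText(string html, string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null, not an empty collection, when no node matches.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(xpath);
        var results = new List<string>();
        if (nodes == null)
        {
            return results;
        }

        foreach (HtmlNode node in nodes)
        {
            results.Add(node.InnerText);
        }
        return results;
    }

    // Assumption: the site accepts the Windows credentials of the current user.
    public static async Task<string> DownloadAsync(string url)
    {
        using (var handler = new HttpClientHandler { UseDefaultCredentials = true })
        using (var client = new HttpClient(handler))
        using (HttpResponseMessage response = await client.GetAsync(url))
        {
            // Print the status so a 401/403 or an unexpected page is visible.
            Console.WriteLine($"{(int)response.StatusCode} {response.ReasonPhrase}");
            return await response.Content.ReadAsStringAsync();
        }
    }
}
```

Calling DownloadAsync with the URL and then passing the returned HTML and the XPath to ExtractText shows both the HTTP status and whether the XPath matched anything. An empty list with a 200 status would suggest the returned HTML differs from what the browser renders; a 401 or 403 would point at authentication instead.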