Scraping text from a secure web page? (C#)

Posted on 2025-01-11 07:47:34

The site I am trying to scrape text from is secured and, as far as I can tell, only reachable from the VPN the organization has set up.

When I tested my tool today on a computer that can reach the site, while connected to the organization's network, the tool still could not access it. Is there something I am missing?

My entire source code is below. For context, the "Demo Results" fields exist only so I can test against any site: at the moment I can paste whatever URL and XPath I like into the text boxes. Later this will be changed to a fixed URL and XPath. As you can see, I am using HtmlAgilityPack. This works on any publicly accessible site, but on this specific site I get an error saying the object reference is not set to an instance of an object. The site works perfectly in a web browser.

namespace ToolConcept
{
    public partial class Form1 : Form
    {
        public string Results1;
        public string Results2;
        public string DemoResults;


        public Form1()
        {
            InitializeComponent();


            DemoResults = "";
            Results1 = "YOUR RESULTS HERE";
            Results2 = "RESULTS SHOWN HERE!";
        }
        public void Scrape(string args)
        {
            HtmlAgilityPack.HtmlWeb web = new HtmlAgilityPack.HtmlWeb();
            HtmlAgilityPack.HtmlDocument doc = web.Load(urlField.Text);
            foreach(var item in doc.DocumentNode.SelectNodes(pathField.Text))
            {
                DemoResults = item.InnerText;
            }

        }
        private void button1_Click(object sender, EventArgs e)
        {
            if (textBox1.Text == "R1")
            {
                textBox2.Text = Results1;
            }
            else if (textBox1.Text == "R2")
            {
                textBox2.Text = Results1 + Results2;
            }
            else if (textBox1.Text == "0")
            {
                Scrape(DemoResults);
                HtmlAgilityPack.HtmlWeb search = new HtmlAgilityPack.HtmlWeb();

                textBox2.Text = DemoResults;
            }
            if(textBox1.Text == null)
            {
                MessageBox.Show("Not Found!");
            }
        }

        private void button2_Click(object sender, EventArgs e)
        {
            textBox2.Clear();
        }
    }
}
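
The post does not say why the request fails, so the following is only a sketch under stated assumptions, not a confirmed fix. Two things are worth separating. First, `SelectNodes` returns null rather than an empty collection when the XPath matches nothing, so the `foreach` in `Scrape` throws a null-reference error whenever the downloaded page is not the one expected (a login or error page, for example). Second, one possible reason for getting a different page is that the browser sends the Windows sign-in credentials automatically while the tool does not. The class name `ScrapeSketch`, its two methods, and the use of `HttpClient` with default credentials are assumptions introduced here for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Threading.Tasks;
using HtmlAgilityPack;

// Hypothetical helper class; not part of the original tool.
public static class ScrapeSketch
{
    // Returns the trimmed inner text of every node matching the XPath,
    // or an empty list when nothing matches.
    public static List<string> ExtractText(string html, string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when no node matches.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(xpath);
        var results = new List<string>();
        if (nodes == null)
        {
            return results;
        }

        foreach (HtmlNode node in nodes)
        {
            results.Add(node.InnerText.Trim());
        }
        return results;
    }

    // Assumption: the site accepts the signed-in Windows identity
    // (integrated authentication). If it uses a login form or a proxy
    // instead, this will not help.
    public static async Task<string> FetchHtmlAsync(string url)
    {
        using (var handler = new HttpClientHandler { UseDefaultCredentials = true })
        using (var client = new HttpClient(handler))
        {
            HttpResponseMessage response = await client.GetAsync(url);
            // Throws on 401/403/5xx so the failure is visible instead of
            // being parsed as if it were the expected page.
            response.EnsureSuccessStatusCode();
            return await response.Content.ReadAsStringAsync();
        }
    }
}
```

In `Scrape`, the result of `FetchHtmlAsync(urlField.Text)` could be passed to `ExtractText` together with `pathField.Text`. An empty list would then mean the XPath matched nothing in the page actually returned, and an exception from `EnsureSuccessStatusCode` would point to an authentication or access problem rather than a parsing one.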
