Modular programming to accommodate future changes (website-scraping software)

Posted 2024-11-27 15:40:10

I have developed an application in C#, using Windows Forms, that scrapes selected websites for images.

  1. The first problem is that the websites I monitor constantly change their look and feel, so my code keeps needing updates. I switched to using XPath to isolate the divs I am looking for, but the div ids change too. I have thought of keeping a text file with the div XPath for each site, which the software would read at startup, so that I would no longer have to edit and recompile the code. Is there a better way to solve this problem? Maybe CodeDom?

  2. Second, since every website uses different formatting and encoding, I had to rewrite parts of the code with HtmlDocument, HtmlWebResponse, HtmlNodes and others for each site, and that ended up accounting for nearly half of my code. I could not merge them, because some sites need extra scraping and pagination and some do not. Is there a way to simplify this?

  3. Lastly, all of the code is in one class file of around 600 lines. The only methods I have are the BackgroundWorker handlers, the UI event handlers, one scraping method per site, and one method to save the images. Is it alright to have all the code in one class? When I wrote Java, I often used multiple classes and instantiated them as objects, which made it easier to change particular sections. Can I do the same in C#?
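For the first question, one option that avoids both recompiling and CodeDom is to treat the selectors as data: what changes when a site is redesigned is the XPath, not the logic, so it can live in a config file that is re-read at startup. Compiling code at runtime (with CodeDom on .NET Framework, for instance) is arguably far heavier than needed for what is essentially one string per site. A minimal sketch, assuming a JSON file (hypothetically `sites.json`) parsed with `System.Text.Json` (part of .NET Core 3.0 and later); the `SiteConfig` fields and the `SiteConfigLoader` name are illustrative, not taken from the original program:

```csharp
#nullable enable
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical per-site settings; the field names are illustrative only.
// NextPageXPath is null for sites that have no pagination.
public sealed record SiteConfig(string Name, string StartUrl, string ImageXPath, string? NextPageXPath);

public static class SiteConfigLoader
{
    // Lets camelCase keys in the JSON ("imageXPath") match the PascalCase properties.
    private static readonly JsonSerializerOptions Options = new() { PropertyNameCaseInsensitive = true };

    // Parses the text of the config file; the caller reads it (e.g. with File.ReadAllText).
    public static List<SiteConfig> Parse(string json) =>
        JsonSerializer.Deserialize<List<SiteConfig>>(json, Options) ?? new List<SiteConfig>();
}
```

At startup the program would call something like `SiteConfigLoader.Parse(File.ReadAllText("sites.json"))` and hand each `ImageXPath` to the existing XPath queries; after a redesign, only the JSON file needs editing.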
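For the second question, the shared steps (fetch a page, collect image URLs, follow a "next" link if there is one) can be written once in a base class, and each site overrides only what differs. This is the template method pattern. A minimal sketch under stated assumptions: page fetching is passed in as a `Func<string, string>` so it can be faked, the class names are hypothetical, and the regular expressions only stand in for the HtmlDocument/XPath queries a real implementation would use:

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical base class: the crawl-and-paginate loop is written once here.
public abstract class SiteScraper
{
    private readonly Func<string, string> _fetchPage; // url -> HTML, injected so it can be faked

    protected SiteScraper(Func<string, string> fetchPage) => _fetchPage = fetchPage;

    // Every site must say how to find image URLs in one page of HTML.
    protected abstract IEnumerable<string> ExtractImageUrls(string html);

    // Default: no pagination. Sites with "next" links override this.
    protected virtual string? GetNextPageUrl(string html) => null;

    public List<string> ScrapeAll(string startUrl, int maxPages = 50)
    {
        var images = new List<string>();
        string? url = startUrl;
        // Stops when a page has no next link or after maxPages pages.
        for (int page = 0; url != null && page < maxPages; page++)
        {
            string html = _fetchPage(url);
            images.AddRange(ExtractImageUrls(html));
            url = GetNextPageUrl(html);
        }
        return images;
    }
}

// Hypothetical site with pagination.
public sealed class PagedGallerySite : SiteScraper
{
    public PagedGallerySite(Func<string, string> fetchPage) : base(fetchPage) { }

    protected override IEnumerable<string> ExtractImageUrls(string html)
    {
        foreach (Match m in Regex.Matches(html, "<img src=\"([^\"]+)\""))
            yield return m.Groups[1].Value;
    }

    protected override string? GetNextPageUrl(string html)
    {
        Match m = Regex.Match(html, "<a class=\"next\" href=\"([^\"]+)\"");
        return m.Success ? m.Groups[1].Value : null;
    }
}

// Hypothetical single-page site: only the extraction differs.
public sealed class SinglePageSite : SiteScraper
{
    public SinglePageSite(Func<string, string> fetchPage) : base(fetchPage) { }

    protected override IEnumerable<string> ExtractImageUrls(string html)
    {
        foreach (Match m in Regex.Matches(html, "data-full=\"([^\"]+)\""))
            yield return m.Groups[1].Value;
    }
}
```

Sites that do not paginate simply leave `GetNextPageUrl` alone. Differing encodings could be handled the same way, as another overridable member; that is left out of this sketch.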
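For the third question: yes, C# supports the same approach as Java, and the Windows Forms designer already splits each form into a partial class across two files (e.g. `Form1.cs` and `Form1.Designer.cs`). Moving non-UI work out of the form keeps the event handlers short. A minimal sketch of one such class, assuming each image is saved under the last segment of its URL path (the `ImageSaver` name and that naming rule are hypothetical):

```csharp
using System;
using System.IO;

// Hypothetical class pulled out of the Form: it only knows how to store images.
public sealed class ImageSaver
{
    private readonly string _folder;

    public ImageSaver(string folder)
    {
        _folder = folder;
        Directory.CreateDirectory(folder); // does nothing if the folder already exists
    }

    // Uses the last path segment as the file name, ignoring any query string.
    public string BuildPath(string imageUrl)
    {
        string name = Path.GetFileName(new Uri(imageUrl).AbsolutePath);
        if (string.IsNullOrEmpty(name))
            name = Guid.NewGuid().ToString("N") + ".img"; // fallback for URLs ending in "/"
        return Path.Combine(_folder, name);
    }

    // Writes the downloaded bytes and returns the full path of the saved file.
    public string Save(string imageUrl, byte[] data)
    {
        string path = BuildPath(imageUrl);
        File.WriteAllBytes(path, data);
        return path;
    }
}
```

The form's `BackgroundWorker.DoWork` handler would then only coordinate: ask a scraper for image URLs, download each one, and pass the bytes to `ImageSaver.Save`. Each class can sit in its own file and be changed without touching the others.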

Is there a more efficient way to structure the software? I was thinking of making a class for each site, so that changes could be made directly in the class in question, but that would repeat a lot of lines in each class. Or is it okay to keep everything in one class file?
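An alternative to one class per site is to describe each site as a small data object holding only the parts that differ, and keep a single shared crawl loop; adding or fixing a site then means editing one entry in a table. A minimal sketch of this composition-based design, assuming page fetching is injected as a `Func<string, string>`; `SiteDefinition`, `Scraper`, `Sites` and the example URLs are hypothetical, and the regular expressions stand in for real HTML parsing:

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical: a site is a start URL plus two small functions, not a whole class.
public sealed record SiteDefinition(
    string StartUrl,
    Func<string, IEnumerable<string>> ExtractImages,
    Func<string, string?>? NextPage = null); // null = site has no pagination

public static class Scraper
{
    // The one shared crawl loop that every site reuses.
    public static List<string> Run(SiteDefinition site, Func<string, string> fetchPage, int maxPages = 50)
    {
        var images = new List<string>();
        string? url = site.StartUrl;
        for (int page = 0; url != null && page < maxPages; page++)
        {
            string html = fetchPage(url);
            images.AddRange(site.ExtractImages(html));
            url = site.NextPage?.Invoke(html); // stays null for single-page sites
        }
        return images;
    }

    // Builds an extractor returning the first capture group of every match.
    public static Func<string, IEnumerable<string>> CaptureAll(string pattern) =>
        html => FirstGroups(pattern, html);

    private static IEnumerable<string> FirstGroups(string pattern, string html)
    {
        foreach (Match m in Regex.Matches(html, pattern))
            yield return m.Groups[1].Value;
    }
}

// Hypothetical registry: one entry per site, all in one place.
public static class Sites
{
    public static readonly Dictionary<string, SiteDefinition> All = new()
    {
        ["gallery"] = new SiteDefinition(
            "https://example.com/gallery",
            Scraper.CaptureAll("<img src=\"([^\"]+)\""),
            html =>
            {
                Match m = Regex.Match(html, "<a class=\"next\" href=\"([^\"]+)\"");
                return m.Success ? m.Groups[1].Value : null;
            }),
        ["blog"] = new SiteDefinition(
            "https://example.org/blog",
            Scraper.CaptureAll("data-full=\"([^\"]+)\"")),
    };
}
```

Compared with a subclass per site, this keeps everything that varies in one table; a site that later needs genuinely different logic can still be given its own class.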

Thanks.

PS: This software is for personal use only, but I think it is a good opportunity to learn and apply good programming practices.
