Modular programming to accommodate future changes (website-scraping software)
I developed a piece of software in C# with Windows Forms that scrapes selected websites for images.
The first problem is that the websites I monitor constantly change their look and feel, so my code keeps needing updates. I switched to using XPath to isolate the divs I look for, but the div ids change too. I have thought of keeping the div XPath for each site in a text file that the software would read, which would save me from editing and recompiling the code. Is there a better way to solve this problem? Maybe CodeDom?
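A plain data file is usually enough when only the selectors change; runtime compilation with CodeDom would only be needed if the scraping logic itself had to change without a rebuild. A minimal sketch, assuming a hypothetical one-site-per-line format `Name|StartUrl|ImageXPath` with `#` comment lines; `SiteConfig` and `SiteConfigLoader` are illustrative names, not an existing API.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical per-site settings: when a site is redesigned, only the XPath
// in the data file has to change, not the compiled code.
public sealed class SiteConfig
{
    public SiteConfig(string name, Uri startUrl, string imageXPath)
    {
        Name = name;
        StartUrl = startUrl;
        ImageXPath = imageXPath;
    }

    public string Name { get; }
    public Uri StartUrl { get; }
    public string ImageXPath { get; }
}

public static class SiteConfigLoader
{
    // Parses lines in the assumed format "Name|StartUrl|ImageXPath".
    // Blank lines and lines starting with '#' are skipped.
    public static List<SiteConfig> Parse(IEnumerable<string> lines)
    {
        var sites = new List<SiteConfig>();
        int lineNumber = 0;
        foreach (var raw in lines)
        {
            lineNumber++;
            var line = raw.Trim();
            if (line.Length == 0 || line.StartsWith("#", StringComparison.Ordinal))
                continue;

            // At most 3 parts, so an XPath union such as "//a | //b" stays intact.
            var parts = line.Split(new[] { '|' }, 3);
            if (parts.Length != 3)
                throw new FormatException(
                    "Line " + lineNumber + ": expected Name|StartUrl|ImageXPath");

            sites.Add(new SiteConfig(
                parts[0].Trim(),
                new Uri(parts[1].Trim(), UriKind.Absolute),
                parts[2].Trim()));
        }
        return sites;
    }
}
```

In the application the lines would come from something like `File.ReadLines("sites.txt")`. If the HTML classes are HtmlAgilityPack's (as the names suggest), each entry's `ImageXPath` can be passed to `doc.DocumentNode.SelectNodes(...)`, so a redesign only means editing the file.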
Secondly, since every website uses different formatting and encoding, I had to rewrite parts of the code with HtmlDocument, HtmlWebResponse, HtmlNodes and others for each of them, which ended up accounting for nearly half of my code. I could not merge them because some sites need extra scraping and pagination and some do not. Is there a way to simplify this?
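The encoding part, at least, can be written once instead of per site. A minimal sketch, assuming the raw response bytes and the `Content-Type` header value are available (for example from `HttpWebResponse.ContentType`); `HtmlDecoder` is a hypothetical helper name. It only reads the HTTP header, not a `<meta charset>` tag inside the page, and on .NET Core / .NET 5+ legacy code pages such as windows-1252 are only found by `Encoding.GetEncoding` after `Encoding.RegisterProvider(CodePagesEncodingProvider.Instance)`; otherwise the sketch falls back to UTF-8.

```csharp
using System;
using System.Text;

public static class HtmlDecoder
{
    // Decodes a response body using the charset named in a Content-Type value
    // such as "text/html; charset=iso-8859-1".
    public static string Decode(byte[] body, string contentType)
    {
        return GetEncoding(contentType).GetString(body);
    }

    // Returns the encoding named by the charset parameter, or UTF-8 when the
    // parameter is missing or the name is not recognised.
    public static Encoding GetEncoding(string contentType)
    {
        if (!string.IsNullOrEmpty(contentType))
        {
            foreach (var part in contentType.Split(';'))
            {
                var parameter = part.Trim();
                if (!parameter.StartsWith("charset=", StringComparison.OrdinalIgnoreCase))
                    continue;

                var name = parameter.Substring("charset=".Length).Trim().Trim('"');
                try
                {
                    return Encoding.GetEncoding(name);
                }
                catch (ArgumentException)
                {
                    // Unknown charset name: use the default below.
                }
            }
        }
        return new UTF8Encoding(false);
    }
}
```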
Lastly, I have the whole program in one class file of around 600 lines. The only methods I have are the BackgroundWorker handlers, the UI event handlers, one scraping method per site, and one method to save the images. Is it alright to have all the code in one class? When I wrote Java, I often split the code into multiple classes and used them as objects, which made changes to particular sections easier. Can I do the same in C#?
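C# supports this just as Java does: a project can contain any number of classes (conventionally one per file), and a Windows Forms form is itself a `partial class` split between its `.cs` and `.Designer.cs` files. As a sketch, the image-saving method could move into its own class that the form keeps as a field; `ImageSaver` and its file-naming rule are hypothetical.

```csharp
using System;
using System.IO;

// Hypothetical helper that owns everything about writing images to disk,
// so the form only wires UI events to it.
public sealed class ImageSaver
{
    private readonly string _targetDirectory;

    public ImageSaver(string targetDirectory)
    {
        _targetDirectory = targetDirectory;
        Directory.CreateDirectory(_targetDirectory); // no-op if it already exists
    }

    // Uses the last path segment of the URL (query string excluded) as the file
    // name, replacing characters that are invalid in file names on this OS.
    public static string GetFileName(Uri imageUrl)
    {
        var name = Path.GetFileName(imageUrl.LocalPath);
        if (string.IsNullOrEmpty(name))
            name = "image";
        foreach (var c in Path.GetInvalidFileNameChars())
            name = name.Replace(c, '_');
        return name;
    }

    // Writes the bytes and returns the full path that was used.
    public string Save(Uri imageUrl, byte[] data)
    {
        var path = Path.Combine(_targetDirectory, GetFileName(imageUrl));
        File.WriteAllBytes(path, data);
        return path;
    }
}
```

The form would create it once, e.g. `private readonly ImageSaver _saver = new ImageSaver(saveFolder);` where `saveFolder` is whatever path the UI provides, and the BackgroundWorker's DoWork handler would call `_saver.Save(url, bytes)`. Changing how files are named then no longer touches UI code.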
Is there a more efficient way to structure the software? I was thinking of making a class for each site, so that modifications could be made directly to the class in question, but that would duplicate many lines across the classes. Or is it okay to keep everything in one class file?
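A class per site does not have to mean repeated code: the shared steps can live once in an abstract base class (the template method pattern) while each site class overrides only what differs. The optional `FindNextPageUrl` hook also covers sites that paginate alongside sites that do not. A minimal sketch; `SiteScraper`, `DemoGalleryScraper` and their members are hypothetical, and the regex-based subclass exists only so the example runs without an HTML parser.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Shared logic lives once in the base class; subclasses supply only what differs.
public abstract class SiteScraper
{
    protected SiteScraper(Uri startUrl) => StartUrl = startUrl;

    public Uri StartUrl { get; }

    // Site-specific: raw (possibly relative) image URLs found in one page's HTML.
    protected abstract IEnumerable<string> ExtractImageUrls(string html);

    // Site-specific and optional: next page URL, or null when there is none.
    // Sites without pagination simply do not override this.
    protected virtual string FindNextPageUrl(string html) => null;

    // Shared: walks the pages, resolves relative URLs and skips duplicates.
    // fetchHtml is injected so the download code is written once, elsewhere.
    public List<Uri> CollectImageUrls(Func<Uri, string> fetchHtml, int maxPages = 50)
    {
        var result = new List<Uri>();
        var seen = new HashSet<Uri>();
        Uri page = StartUrl;
        for (int pageCount = 0; page != null && pageCount < maxPages; pageCount++)
        {
            string html = fetchHtml(page);
            foreach (var raw in ExtractImageUrls(html))
            {
                var absolute = new Uri(page, raw); // resolve against the current page
                if (seen.Add(absolute))
                    result.Add(absolute);
            }
            var next = FindNextPageUrl(html);
            page = next == null ? null : new Uri(page, next);
        }
        return result;
    }
}

// Demo subclass only: regexes keep the sketch dependency-free.
// A real site class would run its XPath through your HTML parser instead.
public sealed class DemoGalleryScraper : SiteScraper
{
    private static readonly Regex ImgSrc =
        new Regex(@"<img[^>]*\bsrc=""([^""]+)""", RegexOptions.IgnoreCase);
    private static readonly Regex NextLink =
        new Regex(@"<a[^>]*\brel=""next""[^>]*\bhref=""([^""]+)""", RegexOptions.IgnoreCase);

    public DemoGalleryScraper(Uri startUrl) : base(startUrl) { }

    protected override IEnumerable<string> ExtractImageUrls(string html)
    {
        foreach (Match m in ImgSrc.Matches(html))
            yield return m.Groups[1].Value;
    }

    protected override string FindNextPageUrl(string html)
    {
        var m = NextLink.Match(html);
        return m.Success ? m.Groups[1].Value : null;
    }
}
```

Adding a site then means writing one small subclass, and the form can hold a `List<SiteScraper>` and loop over it. The `maxPages` limit is a simple guard against "next" links that lead back to earlier pages.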
Thanks.
PS: This software is for personal use, but I think it is a good opportunity to learn and apply good programming practices.