Automated timetable optimization scraper?

Posted 2024-07-12 01:43:42

Overall Plan

Fetch my class information, then automatically optimize and select my uni class timetable.

Overall Algorithm

  1. Logon to the website using its Enterprise Sign On Engine login
  2. Find my current semester and its related subjects (pre setup)
  3. Navigate to the right page and get the data from each related subject (lecture, practical and workshop times)
  4. Strip the data of useless information
  5. Rank the classes which are closer to each other higher, the ones on random days lower
  6. Solve a best time table solution
  7. Output me a detailed list of the BEST CASE information
  8. Output me a detailed list of the possible class information (some might be full for example)
  9. Get the program to select the best classes automatically
  10. Keep checking to see if we can achieve 7.

6 in detail
Take all the classes and use the lectures as the focus point: they would be ranked highest (only one per subject), and the other classes would be arranged around them.
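
One way to read "arrange the classes around the lectures" is sketched below. It is only an illustration: the `Session` class, the `closeness` function, the subject codes and the choice of minutes since midnight are all assumptions, not something the post defines. Candidates on the same day as a lecture rank above candidates on other days, and smaller gaps rank above larger ones.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Session:
    """Hypothetical record for one scheduled class."""
    subject: str
    kind: str   # "lecture", "practical" or "workshop"
    day: str    # e.g. "Mon"
    start: int  # minutes since midnight
    end: int    # minutes since midnight


def closeness(candidate, lectures):
    """Higher is better: same-day sessions beat other days, small gaps beat large ones.

    Overlaps count as a gap of 0 here; clashes must be rejected separately.
    """
    same_day = [lec for lec in lectures if lec.day == candidate.day]
    if not same_day:
        return -24 * 60  # worse than any possible same-day gap
    gap = min(
        max(candidate.start - lec.end, lec.start - candidate.end, 0)
        for lec in same_day
    )
    return -gap


lectures = [
    Session("MATH101", "lecture", "Mon", 9 * 60, 10 * 60),
    Session("COMP102", "lecture", "Wed", 13 * 60, 14 * 60),
]
options = [
    Session("MATH101", "practical", "Fri", 9 * 60, 10 * 60),
    Session("MATH101", "practical", "Mon", 15 * 60, 16 * 60),
    Session("MATH101", "practical", "Mon", 10 * 60, 11 * 60),
]

# Best candidate first.
ranked = sorted(options, key=lambda s: closeness(s, lectures), reverse=True)
```

Here the Monday 10:00 practical comes first because it directly follows the Monday lecture, and the Friday option comes last because no lecture falls on that day.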

Questions

Can anyone point me to something similar to this, hopefully written in Python?
Regarding 6: what data structure would you recommend for storing this information? A linked list where each object is a uniclass?
Should I write all the information to a text file?

I am thinking of setting up uniclass with the following attributes:

  • Subject
  • Rank
  • Time
  • Type
  • Teacher
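
As a rough sketch of how those attributes could be stored, the block below uses a dataclass kept in an ordinary Python list, with a dict for grouping by subject and JSON for saving to a text file. The class name `UniClass`, the `"Mon 09:00-10:00"` time format and the sample values are assumptions made for illustration.

```python
import json
from collections import defaultdict
from dataclasses import asdict, dataclass


@dataclass
class UniClass:
    subject: str
    rank: int
    time: str     # assumed format, e.g. "Mon 09:00-10:00"
    type: str     # "lecture", "practical" or "workshop"
    teacher: str


# A plain list is enough to hold every class.
classes = [
    UniClass("MATH101", 1, "Mon 09:00-10:00", "lecture", "Dr. A"),
    UniClass("MATH101", 2, "Mon 10:00-11:00", "practical", "Dr. B"),
    UniClass("COMP102", 1, "Wed 13:00-14:00", "lecture", "Dr. C"),
]

# Group by subject for quick lookup.
by_subject = defaultdict(list)
for c in classes:
    by_subject[c.subject].append(c)

# Round-trip through JSON text, which could be written to a file instead.
serialized = json.dumps([asdict(c) for c in classes], indent=2)
restored = [UniClass(**d) for d in json.loads(serialized)]
```

JSON keeps the saved file readable and lets the scraping step and the optimization step run separately.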

I have hardly any experience with Python and thought this would be a good learning project to attempt.
Thanks for any help and links to get me started. I'm open to edits to tag this appropriately or whatever else is necessary (not sure what this falls under other than programming and Python?).

EDIT: I can't really get the formatting I want for this SO post ><

Comments (3)

油饼 2024-07-19 01:43:42

Depending on how far you plan on taking #6, and how big the dataset is, it may be non-trivial; it certainly smacks of NP-hard global optimisation to me...

Still, if you're talking about tens (rather than hundreds) of nodes, a fairly dumb algorithm should give good enough performance.

So, you have two constraints:

  1. A total ordering on the classes by score; this is flexible.
  2. Class clashes; this is not flexible.

What I mean by flexible is that you can go to more spaced out classes (with lower scores), but you cannot be in two classes at once. Interestingly, there's likely to be a positive correlation between score and clashes; higher scoring classes are more likely to clash.

My first pass at an algorithm:

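# Greedy pass: take the best-scoring class first, skip anything that clashes.
# Assumes a higher score is better and that each class has a clashes_with() method.
selected_classes = []
classes = sorted(classes, key=lambda c: c.score, reverse=True)
for clas in classes:
    if not clas.clashes_with(selected_classes):
        selected_classes.append(clas)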

Working out clashes might be awkward if classes are of uneven lengths, start at strange times and so on. Mapping start and end times into a simplified representation of "blocks" of time (every 15 minutes / 30 minutes or whatever you need) would make it easier to look for overlaps between the start and end of different classes.
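
The block idea can be sketched roughly as follows. Everything named here (`to_blocks`, `ScoredClass`, `pick`, the 30-minute block size, the sample classes) is an assumption for illustration, and a higher score is assumed to be better. Each class is reduced to a set of `(day, block_index)` pairs, so a clash is simply a non-empty set intersection.

```python
from dataclasses import dataclass

BLOCK_MINUTES = 30  # assumed granularity; use 15 if classes start on quarter hours


def to_blocks(day, start, end):
    """Return the set of (day, block_index) slots covered by a class.

    start and end are "HH:MM" strings; end is treated as exclusive.
    """
    def minutes(hhmm):
        hours, mins = hhmm.split(":")
        return int(hours) * 60 + int(mins)

    first = minutes(start) // BLOCK_MINUTES
    last = (minutes(end) - 1) // BLOCK_MINUTES
    return {(day, block) for block in range(first, last + 1)}


@dataclass
class ScoredClass:
    name: str
    score: int
    blocks: set

    def clashes_with(self, others):
        """True if this class shares any time block with one of the others."""
        return any(self.blocks & other.blocks for other in others)


def pick(classes):
    """Greedy selection: highest score first, skipping anything that clashes."""
    selected = []
    for c in sorted(classes, key=lambda c: c.score, reverse=True):
        if not c.clashes_with(selected):
            selected.append(c)
    return selected


a = ScoredClass("MATH101 lecture", 10, to_blocks("Mon", "09:00", "10:00"))
b = ScoredClass("COMP102 workshop", 8, to_blocks("Mon", "09:30", "11:00"))
c = ScoredClass("COMP102 workshop (alt)", 5, to_blocks("Mon", "10:00", "11:30"))

chosen = [cls.name for cls in pick([c, b, a])]
```

In this sample the 09:30 workshop is dropped because it overlaps the lecture, and the 10:00 alternative is picked instead. The sketch does not enforce "one class of each type per subject"; that rule would need an extra check inside `pick`.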

就此别过 2024-07-19 01:43:42

BeautifulSoup has been mentioned here a few times, e.g. get-list-of-xml-attribute-values-in-python.

Beautiful Soup is a Python HTML/XML parser designed for quick turnaround projects like screen-scraping. Three features make it powerful:

  1. Beautiful Soup won't choke if you give it bad markup. It yields a parse tree that makes approximately as much sense as your original document. This is usually good enough to collect the data you need and run away.
  2. Beautiful Soup provides a few simple methods and Pythonic idioms for navigating, searching, and modifying a parse tree: a toolkit for dissecting a document and extracting what you need. You don't have to create a custom parser for each application.
  3. Beautiful Soup automatically converts incoming documents to Unicode and outgoing documents to UTF-8. You don't have to think about encodings, unless the document doesn't specify an encoding and Beautiful Soup can't autodetect one. Then you just have to specify the original encoding.

Beautiful Soup parses anything you give it, and does the tree traversal stuff for you. You can tell it "Find all the links", or "Find all the links of class externalLink", or "Find all the links whose urls match 'foo.com'", or "Find the table heading that's got bold text, then give me that text."

Valuable data that was once locked up in poorly-designed websites is now within your reach. Projects that would have taken hours take only minutes with Beautiful Soup.
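
The four queries quoted above look roughly like this with the current `bs4` package (installed separately, for example with `pip install beautifulsoup4`). The HTML snippet is made up for illustration; a real timetable page will have different markup.

```python
import re

from bs4 import BeautifulSoup

# Invented sample markup, standing in for a downloaded page.
html = """
<table><tr><th><b>Lecture times</b></th></tr></table>
<a href="http://foo.com/timetable" class="externalLink">Timetable</a>
<a href="/local/page">Local</a>
"""

soup = BeautifulSoup(html, "html.parser")

# "Find all the links"
all_links = soup.find_all("a")

# "Find all the links of class externalLink"
external = soup.find_all("a", class_="externalLink")

# "Find all the links whose urls match foo.com"
foo_links = soup.find_all("a", href=re.compile(r"foo\.com"))

# "Find the table heading that's got bold text, then give me that text"
heading = soup.find("th").find("b").get_text()
```

`find_all` returns a list of tags, and attributes are read with dictionary syntax, e.g. `external[0]["href"]`.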

〗斷ホ乔殘χμё〖 2024-07-19 01:43:42

There are waaay too many questions here.

Please break this down into subject areas and ask specific questions on each subject. Please focus on one of these with specific questions. Please define your terms: "best" doesn't mean anything without some specific measurement to optimize.
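
To make "some specific measurement" concrete, here is one possible cost function, purely as an example. The function name, the `(day, start, end)` tuple format in minutes since midnight, and the `DAY_PENALTY` weight are all assumptions; the point is only that "best" becomes a number a program can compare.

```python
from collections import defaultdict

# Assumed weight: one extra day on campus costs as much as two idle hours.
DAY_PENALTY = 120


def timetable_cost(sessions):
    """Lower is better. sessions is a list of (day, start_minute, end_minute)."""
    by_day = defaultdict(list)
    for day, start, end in sessions:
        by_day[day].append((start, end))

    idle = 0
    for slots in by_day.values():
        slots.sort()
        # Sum the gaps between consecutive classes on the same day.
        for (_, prev_end), (next_start, _) in zip(slots, slots[1:]):
            idle += max(0, next_start - prev_end)

    return len(by_day) * DAY_PENALTY + idle


compact = [("Mon", 540, 600), ("Mon", 600, 660), ("Mon", 720, 780)]
spread = [("Mon", 540, 600), ("Wed", 600, 660), ("Fri", 720, 780)]

compact_cost = timetable_cost(compact)
spread_cost = timetable_cost(spread)
```

With this definition the single-day timetable scores lower (better) than the one spread over three days, and changing `DAY_PENALTY` changes what "best" means.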

Here's what I think I see in your list of topics.

  1. Scraping HTML

    1 Logon to the website using its Enterprise Sign On Engine login

    2 Find my current semester and its related subjects (pre setup)

    3 Navigate to the right page and get the data from each related subject (lecture, practical and workshop times)

    4 Strip the data of useless information

  2. Some algorithm to "rank" based on "closer to each other" looking for a "best time". Since these terms are undefined, it's nearly impossible to provide any help on this.

    5 Rank the classes which are closer to each other higher, the ones on random days lower

    6 Solve a best time table solution

  3. Output something.

    7 Output me a detailed list of the BEST CASE information

    8 Output me a detailed list of the possible class information (some might be full for example)

  4. Optimize something, looking for "best". Another undefinable term.

    9 Get the program to select the best classes automatically

    10 Keep checking to see if we can achieve 7.

BTW, Python has "lists". Whether or not they're "linked" doesn't really enter into it.
