Top k problem - finding a use for my academic work

Posted 2024-08-12 03:53:04

Top k problem - searching for the BEST k (e.g. 3 or 1000) elements in a DB

There is a fundamental problem with relational DBs: to find the top k elements under an arbitrary, user-defined ranking, ALL rows in the table have to be processed, which makes this useless on big data.
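To make that baseline concrete: with an ad-hoc scoring function and no usable index, a top-k query has to score every row. Below is a minimal Python sketch of that full scan, keeping a min-heap of at most k entries so memory stays O(k) and time O(n log k). The function name `top_k_full_scan` and its signature are illustrative assumptions, not taken from the post.

```python
import heapq
from typing import Callable, Iterable, List, Tuple, TypeVar

Row = TypeVar("Row")


def top_k_full_scan(rows: Iterable[Row], k: int,
                    score: Callable[[Row], float]) -> List[Tuple[float, Row]]:
    """Return the k highest-scoring rows as (score, row) pairs, best first.

    Every row is scored once: this is the full scan the question describes.
    A min-heap holding at most k entries keeps memory at O(k) and time at
    O(n log k) for n rows.
    """
    if k <= 0:
        return []
    heap: List[Tuple[float, int, Row]] = []  # (score, arrival index, row)
    for i, row in enumerate(rows):
        s = score(row)
        if len(heap) < k:
            heapq.heappush(heap, (s, i, row))
        elif s > heap[0][0]:                  # beats the current k-th best
            heapq.heapreplace(heap, (s, i, row))
    # The arrival index breaks score ties, so rows themselves are never compared.
    heap.sort(key=lambda t: (-t[0], t[1]))
    return [(s, row) for s, _, row in heap]
```

For example, `top_k_full_scan(cars, 3, lambda c: -c["price"])` would return the three cheapest cars from a hypothetical list of dicts, but only after touching every one of them.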

I'm building an application (for university research; it's not really my invention, I'm implementing and trying to improve the original idea) that lets you efficiently find the top k elements while visiting only 3-5% of the stored data, which makes it really fast.

It even supports user preferences: over a given domain, you can specify a function that defines the best value for the user, and an aggregation function that specifies the most significant attributes.


For example, take a DB of cars with attributes (price, mileage, age of car, ccm, fuel/mile, type of car, ...). A user might value them as, e.g., 10*price + 5*fuel/mile + 4*mileage + age of car, and not care about the type of car or the other attributes. This is the aggregation specification.
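A minimal sketch of that aggregation specification as a weighted sum, using the weights from the example above. It assumes (the post doesn't say) that the weights apply to per-attribute preference scores in [0, 1] produced by the value functions, not to raw prices or mileages; the names `WEIGHTS`, `aggregate` and `fuel_per_mile` are hypothetical.

```python
from typing import Dict

# Weights from the post's example. Attributes not listed here (e.g. car type,
# ccm) effectively get weight 0: the user doesn't care about them.
WEIGHTS: Dict[str, float] = {
    "price": 10.0,
    "fuel_per_mile": 5.0,
    "mileage": 4.0,
    "age": 1.0,
}


def aggregate(scores: Dict[str, float], weights: Dict[str, float] = WEIGHTS) -> float:
    """Weighted sum of per-attribute preference scores.

    Assumes each entry of `scores` is already a preference value in [0, 1]
    (higher = better), as produced by a per-attribute value function.
    Attributes missing from `scores` contribute 0; extra ones are ignored.
    """
    return sum(w * scores.get(attr, 0.0) for attr, w in weights.items())
```

Because all the weights are non-negative, the aggregate never decreases when one attribute's score increases. That monotonicity is the property top-k algorithms with early termination typically rely on.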

Then, for each attribute (price, mileage, ...), there can be a totally different "value function" that specifies the best value for the user. For example, price: the lower, the better, with the value going down as the price rises, up to $50k, where the value is 0 (the user doesn't want a car more expensive than $50k). Mileage: some other function based on his/her criteria, and so on...
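One simple way to express such value functions is as piecewise-linear curves over (attribute value, preference) breakpoints. This is a sketch under that assumption only; the post doesn't say how value functions are represented, and the mileage breakpoints below are invented for illustration.

```python
from bisect import bisect_right
from typing import Callable, List, Tuple


def piecewise_linear(points: List[Tuple[float, float]]) -> Callable[[float], float]:
    """Build a value function from (attribute_value, preference) breakpoints.

    `points` must be non-empty and sorted by strictly increasing attribute value.
    Between breakpoints the preference is interpolated linearly; outside the
    first/last breakpoint it is clamped to the end values.
    """
    xs = [x for x, _ in points]
    ys = [y for _, y in points]

    def value(x: float) -> float:
        if x <= xs[0]:
            return ys[0]
        if x >= xs[-1]:
            return ys[-1]
        i = bisect_right(xs, x)  # now xs[i - 1] <= x < xs[i]
        x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
        return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

    return value


# Price: the lower the better, reaching 0 at $50k and staying 0 above it.
price_value = piecewise_linear([(0, 1.0), (50_000, 0.0)])
# Mileage (hypothetical criterion): barely penalised up to 30k, worthless past 150k.
mileage_value = piecewise_linear([(0, 1.0), (30_000, 0.9), (150_000, 0.0)])
```

With these definitions `price_value(25_000)` is 0.5 and `price_value(60_000)` is 0.0, matching the "$50k and the value is 0" rule from the post.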


You can see that there is quite a lot of freedom in specifying your preferences, and according to them, the best k elements in the DB will be found quickly.
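The post doesn't say which algorithm achieves the 3-5% figure. One well-known technique for exactly this setting (per-attribute scores combined by a monotone aggregation function) is Fagin's Threshold Algorithm (TA). The sketch below shows the idea and is not the author's method. It assumes: each attribute has a list of (object_id, score) pairs sorted best-first, random access to any object's score per attribute, every object appearing in every list, and a monotone aggregation such as a weighted sum with non-negative weights. All names are hypothetical.

```python
import heapq
from typing import Callable, Dict, List, Sequence, Set, Tuple


def threshold_algorithm(
    sorted_lists: Sequence[List[Tuple[str, float]]],
    lookup: Sequence[Dict[str, float]],
    aggregate: Callable[[List[float]], float],
    k: int,
) -> Tuple[List[Tuple[float, str]], int]:
    """Sketch of Fagin's Threshold Algorithm (TA).

    sorted_lists[i]: (object_id, score) for attribute i, best score first.
    lookup[i]:       object_id -> score for attribute i (random access).
    aggregate:       must be monotone in every argument.
    Returns ((total, object_id) pairs best-first, rows read per list).
    """
    if k <= 0:
        return [], 0
    seen: Set[str] = set()
    top: List[Tuple[float, str]] = []   # min-heap of the best k so far
    n = len(sorted_lists[0])            # assumes all lists have length n
    for depth in range(n):
        frontier: List[float] = []      # last score read from each list
        for lst in sorted_lists:
            obj, s = lst[depth]
            frontier.append(s)
            if obj in seen:
                continue
            seen.add(obj)
            # Random access: fetch all of obj's attribute scores and combine them.
            total = aggregate([scores[obj] for scores in lookup])
            if len(top) < k:
                heapq.heappush(top, (total, obj))
            elif total > top[0][0]:
                heapq.heapreplace(top, (total, obj))
        # No unseen object can beat this bound, because aggregate is monotone.
        threshold = aggregate(frontier)
        if len(top) == k and top[0][0] >= threshold:
            return sorted(top, reverse=True), depth + 1
    return sorted(top, reverse=True), n
```

The returned depth is how many rows were read from each sorted list before the stopping test succeeded, so the fraction of data visited depends on the data. When attributes are positively correlated TA can stop very early; on anti-correlated data it may have to read most of each list.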

I've spent many sleepless nights thinking about real-life usability: who could benefit from such a query DB? But I failed to come up with anything, and I'm stuck with a purely academic, write-only stance. :-( I hope there can be some real use for it, but I don't see any....

.... do YOU have any idea how to use this in real life, on a real problem, etc.?


I'd love to hear from you.


Comments (3)

寒尘 2024-08-19 03:53:04

Have a database of people's CVs and establish hiring criteria for different jobs, allowing for a dynamic display of the top k candidates.

Also, given how fast your solution is, you could exploit it to render near-real-time graphs of highly dynamic data, like stock market quotes, or even in molecular or DNA-related studies.

New idea: perhaps your research might have applications in clustering, where you could use it to implement fast k-nearest-neighbor (k-NN) clustering by complex criteria without having to scan the whole data set each time. This would allow faster clustering of larger data sets, with more complex criteria for picking the k-NN of each data node.
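The link to top-k is that a k-NN lookup for one point is a top-k query whose score is the negated distance. The sketch below only illustrates that mapping: it answers each lookup with a brute-force scan (`heapq.nsmallest`), which is exactly the step a faster top-k engine would replace. The weighted Euclidean distance stands in for a "complex criterion" and, like the function names, is a hypothetical choice.

```python
import heapq
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def weighted_distance(weights: Vector) -> Callable[[Vector, Vector], float]:
    """Hypothetical 'complex criterion': a weighted Euclidean distance."""
    def dist(a: Vector, b: Vector) -> float:
        return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))
    return dist


def knn_graph(points: Dict[str, Vector], k: int,
              dist: Callable[[Vector, Vector], float]) -> Dict[str, List[str]]:
    """For every point, its k nearest other points under `dist`.

    Each lookup is a top-k query with score = -distance, answered here by a
    full scan over all other points.
    """
    return {
        p: heapq.nsmallest(k, (q for q in points if q != p),
                           key=lambda q: dist(points[p], points[q]))
        for p in points
    }
```

Changing the criterion only means passing a different `dist`. For instance, shrinking the weight of the second coordinate can change which neighbor a point picks.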

岁月静好 2024-08-19 03:53:04

There are unlimited possible real-use scenarios. Getting the top-n values is used all the time.

But I highly doubt that it's possible to get top-n objects without having an index. An index can only be built if the properties that will be searched are known ahead of searching. And if that's the case, a simple index in a relational database is able to provide the same functionality.
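To illustrate this answer's point: in many relational engines, a B-tree index on a column lets a query like `ORDER BY price DESC LIMIT n` read just the first n index entries instead of the whole table. The toy `SortedIndex` class below is a hypothetical stand-in for such an index, using a sorted Python list. It only works because the ranking attribute is fixed in advance, which is the limitation the answer describes.

```python
from bisect import insort
from itertools import islice
from typing import List, Tuple


class SortedIndex:
    """Toy stand-in for a B-tree index on one numeric column.

    Entries are kept ordered by (value, row_id), so the top-n by that column
    is just the last n entries: no full-table scan is needed at query time.
    """

    def __init__(self) -> None:
        self._entries: List[Tuple[float, int]] = []

    def insert(self, value: float, row_id: int) -> None:
        # O(n) list insertion here; a real B-tree does this in O(log n).
        insort(self._entries, (value, row_id))

    def top_n(self, n: int) -> List[Tuple[float, int]]:
        """Largest n (value, row_id) pairs, largest first."""
        return list(islice(reversed(self._entries), n))
```

An ad-hoc weighted combination of several columns, as in the question, has no such precomputed order, which is why the two approaches aren't directly interchangeable.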

涙—继续流 2024-08-19 03:53:04

It's used in financial organizations all the time: you need to see the most/least profitable assets, etc.
