存储“选票” 在数据库中

发布于 2024-07-10 08:41:02 字数 575 浏览 5 评论 0原文

我正在编写一个 Intranet 应用程序，其功能之一大致类似于内容投票 - 与 SO、Amazon 和许多其他网站的做法不同。

假设每个可投票的内容都有一个唯一的 ID，并且每个用户（他们经过身份验证）都有一个唯一的 ID，最简单的方法似乎是有一个“投票”表......

ContentID int
UserID int
VoteValue int

但这会为每个投票创建一行 -拥有数百万条内容和数以万计的用户，该表将变得非常巨大。这是最好的方法吗？我的意思是，如果 int 占用 4 个字节，则每行占用 12 个字节。如果一百万条内容获得一百票，那么存储空间就超过 400MB，是吗？看起来...很多:)。即使 VoteValue 是一个tinyint（这可能没问题）并且只有 1 个字节，表中仍然有几百兆字节。我的意思是谢什。

有更聪明的方法吗？我是否应该将此“投票”表存储在单独的数据库中（忽略潜在的数据完整性问题），以便在存储和性能方面将其与“主”数据分区？

（我确实意识到，在当今世界，400MB 并不是很多 - 但仅仅为了存储选票似乎就很多了，是吗？）

原文

I'm writing what will be an intranet application, and one of its features is roughly analogous to content voting - not unlike what SO, Amazon, and many other sites do.

Assuming each votable piece of content has a unique ID, and each user (they're authenticated) has a unique ID, the easiest way would seem to be to have a "votes" table...

ContentID int
UserID int
VoteValue int

But this creates one row per vote - with millions of pieces of content and tens of thousands of users, that table's gonna be huge huge huge. Is this the best way to do it? I mean, if an int takes 4 bytes, each row takes 12 bytes. If a million pieces of content get a hundred votes, that's 400MB+ in storage, yeah? Seems... like a lot :). Even if the VoteValue is a tinyint (which is probably fine) and only 1 byte, that's still a couple hundred megabytes in the table. I mean sheesh.

Is there a smarter way? Should I store this "votes" table in a separate database (ignoring potential data integrity issues) to partition it from the "main" data in terms of storage and performance?

(I do realize that in today's world 400MB ain't a ton - but it seems like a LOT just to store votes, yeah?)

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

扎心 2024-07-17 08:41:03

就个人而言，只要您拥有良好的索引，您就可以以正确的方式进行操作。根据您的使用情况，为了提高性能，您可能会尝试通过存储辅助计数信息来避免触及投票表，但总的来说，如果您必须跟踪谁投票了某些内容，则需要按照您列出的方式进行操作。

我不会费心移动到另一个数据库，如果你真的关心 SQL Server，你可以创建一个单独的文件组来保存它......但很可能没有必要。

回复收藏 0 原文

装纯掩盖桑 2024-07-17 08:41:03

如果您需要跟踪用户是否对特定项目投票，并且投票值不同（例如 1 星到 5 星），那么这就是最紧凑的。

不要忘记，为了获得合理的访问速度，您需要对数据建立索引（可能有两个索引 - 一个以 ContentID 作为前导列，一个以 userID 作为前导列）。

您需要确定是否有理由不将该表与其他表分开存储。这意味着什么取决于您使用的 DBMS - 对于 Informix，表将位于同一个数据库中，但存储在不同的数据库空间中，并且您可能将索引存储在另外两个不同的数据库空间中。

回复收藏 0 原文

深海夜未眠 2024-07-17 08:41:03

您可能还需要表中内容的作者 ID，以便更轻松地检测投票滥用。（是的，这可能是多余的信息。另一种方法是定期构建一个汇总表来查看谁在投票给谁。）

就其价值而言，perlmonks 投票表如下所示：

 `vote_id` int(11) NOT NULL default '0',
 `voter_user` int(11) NOT NULL default '0',
 `voted_user` int(11) default NULL,
 `weight` int(11) NOT NULL default '0',
 `votetime` datetime NOT NULL default '0000-00-00 00:00:00',
 `ip` varchar(16) default NULL,
 PRIMARY KEY  (`vote_id`,`voter_user`),
 KEY `voter_user_idx` (`voter_user`,`votetime`),
 KEY `voted_user_idx` (`voted_user`,`votetime`)

（vote_id 是内容 id，ip 是 IP地址。）

You will probably also want the ID of the author of the content in the table, for easier detection of voting abuse. (Yes, this is presumably redundant information. An alternative is regularly building a summary table to see who is voting on whom.)

For what it's worth, the perlmonks vote table looks like this:

 `vote_id` int(11) NOT NULL default '0',
 `voter_user` int(11) NOT NULL default '0',
 `voted_user` int(11) default NULL,
 `weight` int(11) NOT NULL default '0',
 `votetime` datetime NOT NULL default '0000-00-00 00:00:00',
 `ip` varchar(16) default NULL,
 PRIMARY KEY  (`vote_id`,`voter_user`),
 KEY `voter_user_idx` (`voter_user`,`votetime`),
 KEY `voted_user_idx` (`voted_user`,`votetime`)

(vote_id is the content id, ip is an IP address.)

回复收藏 0 原文

谁许谁一生繁华 2024-07-17 08:41:03

我想说，您需要弄清楚如何使用这些投票，并首先为您的数据模型设计特定的查询。这不一定是 SQL 模型。如果您来自 SQL 世界，那么阅读官方 MongoDB 教程有助于您一开始就理清思路。

例如，如果您只需要存储和显示单个问题页面的投票，则将投票存储在问题的单个字符串字段中可能会很方便，该字段类似于 id1:id2:id3:。假设所有 id 的长度相同，则有一些有趣的属性：

计算该问题的所有投票：
len(issue.votes)/len(id)
发现我对该问题进行了投票
issue.votes 中的 myid
查找您投票过的所有问题：
select issues.id from issues where issues.votes contains(myid)
查找投票最多的问题
select issues.id from issues order by len(issue.votes) desc limit 10

此架构可以避免在这些特定情况下读取时进行昂贵的计算，但在投票时更新 issue.votes 可能比在表中添加行更昂贵。在本例中，每个 id + 分隔符 4 个字节的 100 票是 500 个字节的字符串。在您提议的变体中，100 票为 800 字节。

免责声明：我从未实施过这样的事情，这只是一个想法。