存储“选票” 在数据库中

发布于 2024-07-10 08:41:02 字数 575 浏览 5 评论 0原文

我正在编写一个 Intranet 应用程序,其功能之一大致类似于内容投票 - 与 SO、Amazon 和许多其他网站的做法不同。

假设每个可投票的内容都有一个唯一的 ID,并且每个用户(他们经过身份验证)都有一个唯一的 ID,最简单的方法似乎是有一个“投票”表......

ContentID int
UserID int
VoteValue int

但这会为每个投票创建一行 -拥有数百万条内容和数以万计的用户,该表将变得非常巨大。 这是最好的方法吗? 我的意思是,如果 int 占用 4 个字节,则每行占用 12 个字节。 如果一百万条内容获得一百票,那么存储空间就超过 400MB,是吗? 看起来...很多:)。 即使 VoteValue 是一个tinyint(这可能没问题)并且只有 1 个字节,表中仍然有几百兆字节。 我的意思是谢什。

有更聪明的方法吗? 我是否应该将此“投票”表存储在单独的数据库中(忽略潜在的数据完整性问题),以便在存储和性能方面将其与“主”数据分区?

(我确实意识到,在当今世界,400MB 并不是很多 - 但仅仅为了存储选票似乎就很多了,是吗?)

I'm writing what will be an intranet application, and one of its features is roughly analogous to content voting - not unlike what SO, Amazon, and many other sites do.

Assuming each votable piece of content has a unique ID, and each user (they're authenticated) has a unique ID, the easiest way would seem to be to have a "votes" table...

ContentID int
UserID int
VoteValue int

But this creates one row per vote - with millions of pieces of content and tens of thousands of users, that table's gonna be huge huge huge. Is this the best way to do it? I mean, if an int takes 4 bytes, each row takes 12 bytes. If a million pieces of content get a hundred votes, that's 400MB+ in storage, yeah? Seems... like a lot :). Even if the VoteValue is a tinyint (which is probably fine) and only 1 byte, that's still a couple hundred megabytes in the table. I mean sheesh.

Is there a smarter way? Should I store this "votes" table in a separate database (ignoring potential data integrity issues) to partition it from the "main" data in terms of storage and performance?

(I do realize that in today's world 400MB ain't a ton - but it seems like a LOT just to store votes, yeah?)

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(5

扎心 2024-07-17 08:41:03

就个人而言,只要您拥有良好的索引,您就可以以正确的方式进行操作。 根据您的使用情况,为了提高性能,您可能会尝试通过存储辅助计数信息来避免触及投票表,但总的来说,如果您必须跟踪谁投票了某些内容,则需要按照您列出的方式进行操作。

我不会费心移动到另一个数据库,如果你真的关心 SQL Server,你可以创建一个单独的文件组来保存它......但很可能没有必要。

Personally as long as you have good indexes in place, you are going about it the right way. Depending on your usage, for performance you might try to avoid hitting the votes table by storing secondary count information, but overall if you must track WHO has voted something, you need to do it in the way you have listed.

I wouldn't bother moving to another database, if you are REALLY concerned in SQL Server you could create a separate filegroup to hold it.....but most likely not necessary.

装纯掩盖桑 2024-07-17 08:41:03

如果您需要跟踪用户是否对特定项目投票,并且投票值不同(例如 1 星到 5 星),那么这就是最紧凑的。

不要忘记,为了获得合理的访问速度,您需要对数据建立索引(可能有两个索引 - 一个以 ContentID 作为前导列,一个以 userID 作为前导列)。

您需要确定是否有理由不将该表与其他表分开存储。 这意味着什么取决于您使用的 DBMS - 对于 Informix,表将位于同一个数据库中,但存储在不同的数据库空间中,并且您可能将索引存储在另外两个不同的数据库空间中。

If you need to track whether a user has voted for a particular item, and if there are different values of vote (so 1 star to 5 stars, for example), then this is about as compact as it gets.

Don't forget that for sensible access speeds, you'll need to index the data (two indexes, probably - one with ContentID as the leading column, one with userID as the leading column).

You'll need to decide whether there is a reason not to store the table separately from other tables. What this means depends on the DBMS you use - with Informix, the table would be in the same database but stored in a different dbspace, and you might have the indexes stored in two other different dbspaces.

深海夜未眠 2024-07-17 08:41:03

您可能还需要表中内容的作者 ID,以便更轻松地检测投票滥用。 (是的,这可能是多余的信息。另一种方法是定期构建一个汇总表来查看谁在投票给谁。)

就其价值而言,perlmonks 投票表如下所示:

 `vote_id` int(11) NOT NULL default '0',
 `voter_user` int(11) NOT NULL default '0',
 `voted_user` int(11) default NULL,
 `weight` int(11) NOT NULL default '0',
 `votetime` datetime NOT NULL default '0000-00-00 00:00:00',
 `ip` varchar(16) default NULL,
 PRIMARY KEY  (`vote_id`,`voter_user`),
 KEY `voter_user_idx` (`voter_user`,`votetime`),
 KEY `voted_user_idx` (`voted_user`,`votetime`)

(vote_id 是内容 id,ip 是 IP地址。)

You will probably also want the ID of the author of the content in the table, for easier detection of voting abuse. (Yes, this is presumably redundant information. An alternative is regularly building a summary table to see who is voting on whom.)

For what it's worth, the perlmonks vote table looks like this:

 `vote_id` int(11) NOT NULL default '0',
 `voter_user` int(11) NOT NULL default '0',
 `voted_user` int(11) default NULL,
 `weight` int(11) NOT NULL default '0',
 `votetime` datetime NOT NULL default '0000-00-00 00:00:00',
 `ip` varchar(16) default NULL,
 PRIMARY KEY  (`vote_id`,`voter_user`),
 KEY `voter_user_idx` (`voter_user`,`votetime`),
 KEY `voted_user_idx` (`voted_user`,`votetime`)

(vote_id is the content id, ip is an IP address.)

谁许谁一生繁华 2024-07-17 08:41:03

我想说,您需要弄清楚如何使用这些投票,并首先为您的数据模型设计特定的查询。 这不一定是 SQL 模型。 如果您来自 SQL 世界,那么阅读官方 MongoDB 教程有助于您一开始就理清思路。

例如,如果您只需要存储和显示单个问题页面的投票,则将投票存储在问题的单个字符串字段中可能会很方便,该字段类似于 id1:id2:id3:。 假设所有 id 的长度相同,则有一些有趣的属性:

  1. 计算该问题的所有投票:

    len(issue.votes)/len(id)

  2. 发现我对该问题进行了投票

    issue.votes 中的 myid

  3. 查找您投票过的所有问题:

    select issues.id from issues where issues.votes contains(myid)

  4. 查找投票最多的问题

    select issues.id from issues order by len(issue.votes) desc limit 10

此架构可以避免在这些特定情况下读取时进行昂贵的计算,但在投票时更新 issue.votes 可能比在表中添加行更昂贵。 在本例中,每个 id + 分隔符 4 个字节的 100 票是 500 个字节的字符串。 在您提议的变体中,100 票为 800 字节。

免责声明:我从未实施过这样的事情,这只是一个想法。

I'd say you need to figure out how these votes will be used and design specific queries for your data model first. This is not necessarily SQL model. If you're coming from SQL world, passing through official MongoDB tutorial helps to clear the mind for the start.

For example, if you only need to store and display votes for a single issue page, it may be convenient to store votes in a single string field of the issue, which will look like id1:id2:id3:. Assuming all ids are of the same length, there are some interesting properties:

  1. Count all votes for the issue:

    len(issue.votes)/len(id)

  2. Find I voted on the issue

    myid in issue.votes

  3. Find all issues you voted on:

    select issue.id from issues where issue.votes contains(myid)

  4. Find the most voted issues

    select issue.id from issues order by len(issue.votes) desc limit 10

This architecture allows to avoid expensive calculations on read in these specific cases, but updating issue.votes on vote might be more expensive than adding a row in a table. In this case 100 votes with 4 bytes per id + separator is 500 bytes string. In your proposed variant 100 votes are 800 bytes.

Disclaimer: I never implemented anything like this, it is just an idea.

煮酒 2024-07-17 08:41:02

嗯,是的,但你需要着眼于更大的前景。 拥有一百万条内容:(

内容大小)>>> (投票大小):其中“>>” 意思是“更大”。

如果您有 100 万条内容,那么这可能是 1 TB 的数据,而投票则为 400 MB。 有什么大不了的吧?

我还要补充一点,如果您担心可扩展性,请查看此博客:

http://highscalability.com/

Well, yes but you need to look at the bigger picture. With a million pieces of CONTENT:

(Size of Content) >> (Size of Votes) : where ">>" means "much greater."

If you have a million pieces of content then that might be a terabyte of data where as the votes are 400MB. Big deal right?

I would also add, if you are worried about scalability, check out this blog:

http://highscalability.com/

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文