在数据库中存储性别(性别)

发布于 2024-10-02 12:12:10 字数 554 浏览 2 评论 0原文

我想以尽可能小的(大小/性能)成本将用户的性别存储在数据库中。

到目前为止,我想到了 3 个场景

  1. Int - 与代码中的 Enum 一致(1 = 男性,2 = 女性,3 = ...)
  2. char(1 ) - 存储 mf 或其他单个字符标识符
  3. (布尔值) - 这个选项有合适的字段名称吗?

我问的原因是因为这个answer 其中提到字符小于布尔

我应该澄清一下,我使用的是 MS SQL 2008,它实际上具有位数据类型。

I want to store a user's gender in a database with as little (size/performance) cost as possible.

So far, 3 scenarios come to mind

  1. Int - aligned with Enum in code (1 = Male, 2 = Female, 3 = ...)
  2. char(1) - Store m, f or another single character identifier
  3. Bit (boolean) - is there an appropriate field name for this option?

The reason I ask is because of this answer which mentions that chars are smaller than booleans.

I should clarify that I'm using MS SQL 2008, which DOES in fact have the bit datatype.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(8

﹎☆浅夏丿初晴 2024-10-09 12:12:10

对此已经有一个 ISO 标准;无需发明自己的方案:

http://en.wikipedia.org/wiki/ISO_5218

根据标准,该列应称为“性别”,“最接近”的数据类型将是带有 CHECK 约束或查找表的tinyint(视情况而定)。

There is already an ISO standard for this; no need to invent your own scheme:

http://en.wikipedia.org/wiki/ISO_5218

Per the standard, the column should be called "Sex" and the 'closest' data type would be tinyint with a CHECK constraint or lookup table as appropriate.

三生一梦 2024-10-09 12:12:10

我将这一栏称为“性别”。

Data Type   Bytes Taken          Number/Range of Values
------------------------------------------------
TinyINT     1                    255 (zero to 255)
INT         4            -       2,147,483,648 to 2,147,483,647
BIT         1 (2 if 9+ columns)  2 (0 and 1)
CHAR(1)     1                    26 if case insensitive, 52 otherwise

可以排除 BIT 数据类型,因为它仅支持两种可能的性别这是不够的。虽然 INT 支持两个以上的选项,但它需要 4 个字节——性能使用更小/更窄的数据类型会更好。

CHAR(1) 优于 TinyINT - 两者都采用相同的字节数,但 CHAR 提供的值数量更窄。使用 CHAR(1) 将使用“m”、“f”等自然键,而不是使用被称为代理/人工键的数字数据。如果需要移植,任何数据库都支持 CHAR(1)

结论

我会使用选项 2:CHAR(1)。

附录

性别列上的索引可能没有帮助,因为低基数列上的索引没有任何价值。这意味着,索引的值没有足够的多样性来提供任何值。

I'd call the column "gender".

Data Type   Bytes Taken          Number/Range of Values
------------------------------------------------
TinyINT     1                    255 (zero to 255)
INT         4            -       2,147,483,648 to 2,147,483,647
BIT         1 (2 if 9+ columns)  2 (0 and 1)
CHAR(1)     1                    26 if case insensitive, 52 otherwise

The BIT data type can be ruled out because it only supports two possible genders which is inadequate. While INT supports more than two options, it takes 4 bytes -- performance will be better with a smaller/more narrow data type.

CHAR(1) has the edge over TinyINT - both take the same number of bytes, but CHAR provides a more narrow number of values. Using CHAR(1) would make using "m", "f",etc natural keys, vs the use of numeric data which are referred to as surrogate/artificial keys. CHAR(1) is also supported on any database, should there be a need to port.

Conclusion

I would use Option 2: CHAR(1).

Addendum

An index on the gender column likely would not help because there's no value in an index on a low cardinality column. Meaning, there's not enough variety in the values for the index to provide any value.

垂暮老矣 2024-10-09 12:12:10

医学上有四种性别:男性、女性、不确定和未知。您可能不需要全部四个,但您肯定需要 1、2 和 4。为此数据类型设置默认值是不合适的。更不用说将其视为具有“是”和“不是”状态的布尔值了。

In medicine there are four genders: male, female, indeterminate, and unknown. You mightn't need all four but you certainly need 1, 2, and 4. It's not appropriate to have a default value for this datatype. Even less to treat it as a Boolean with 'is' and 'isn't' states.

五里雾 2024-10-09 12:12:10

Enum 字段对齐的 Int (或 TinyInt)将是我的方法。

首先,如果数据库中有一个 bit 字段,该行仍将使用一个完整字节,因此就节省空间而言,只有在有多个 bit 时才有意义代码>字段。

其次,字符串/字符有一种“神奇的价值”感觉,无论它们在设计时看起来多么明显。更不用说,它允许人们存储几乎任何他们不一定映射到任何明显的值。

第三,为了强制引用完整性,数值更容易(也是更好的实践)创建查找表,并且可以与枚举进行一对一关联,因此在将值存储在内存中时存在奇偶校验应用程序或数据库中。

An Int (or TinyInt) aligned to an Enum field would be my methodology.

First, if you have a single bit field in a database, the row will still use a full byte, so as far as space savings, it only pays off if you have multiple bit fields.

Second, strings/chars have a "magic value" feel to them, regardless of how obvious they may seem at design time. Not to mention, it lets people store just about any value they would not necessarily map to anything obvious.

Third, a numeric value is much easier (and better practice) to create a lookup table for, in order to enforce referential integrity, and can correlate 1-to-1 with an enum, so there is parity in storing the value in memory within the application or in the database.

你与清晨阳光 2024-10-09 12:12:10

选项 3 是最好的选择,但并非所有数据库引擎都有“位”类型。如果您没有一点,那么 TinyINT 将是您最好的选择。

Option 3 is your best bet, but not all DB engines have a "bit" type. If you don't have a bit, then TinyINT would be your best bet.

ゃ懵逼小萝莉 2024-10-09 12:12:10

我使用字符“f”、“m”和“u”,因为我从姓名、声音和对话中推测性别,有时不知道性别。最终的决定是他们的意见。

这实际上取决于您对这个人的了解程度以及您的标准是外貌还是个人身份。心理学家可能需要额外的选择——跨性别、跨性别、跨性别、跨性别、跨性别、雌雄同体和犹豫不决。有 9 个选项,没有由单个字符明确定义,我可能会接受 Hugo 的小整数建议。

I use char 'f', 'm' and 'u' because I surmise the gender from name, voice and conversation, and sometimes don't know the gender. The final determination is their opinion.

It really depends how well you know the person and whether your criteria is physical form or personal identity. A psychologist might need additional options - cross to female, cross to male, trans to female, trans to male, hermaphrodite and undecided. With 9 options, not clearly defined by a single character, I might go with Hugo's advice of tiny integer.

哽咽笑 2024-10-09 12:12:10
CREATE TABLE Admission (
    Rno INT PRIMARY KEY AUTO_INCREMENT,
    Name VARCHAR(25) NOT NULL,
    Gender ENUM('M','F'),
    Boolean_Valu boolean,
    Dob Date,
    Fees numeric(7,2) NOT NULL
);




insert into Admission (Name,Gender,Boolean_Valu,Dob,Fees)values('Raj','M',true,'1990-07-12',50000);
insert into Admission (Name,Gender,Boolean_Valu,Dob,Fees)values('Rani','F',false,'1994-05-10',15000);
select * from admission;

在此处输入链接说明

CREATE TABLE Admission (
    Rno INT PRIMARY KEY AUTO_INCREMENT,
    Name VARCHAR(25) NOT NULL,
    Gender ENUM('M','F'),
    Boolean_Valu boolean,
    Dob Date,
    Fees numeric(7,2) NOT NULL
);




insert into Admission (Name,Gender,Boolean_Valu,Dob,Fees)values('Raj','M',true,'1990-07-12',50000);
insert into Admission (Name,Gender,Boolean_Valu,Dob,Fees)values('Rani','F',false,'1994-05-10',15000);
select * from admission;

enter link description here

淤浪 2024-10-09 12:12:10

我会选择选项 3,但使用多个 NON NULLABLE 位列而不是一个。
是男性(1=是/0=否)
IsFemale(1=是/0=否)

(如果需要):
性别未知(1=是/0=否)
等等...

这使得定义易于阅读、易于扩展、易于编程、不可能使用域之外的值,并且不需要第二个查找表+FK或CHECK约束来锁定值。

编辑:更正,您确实需要至少一个约束来确保设置的标志有效。

I would go with Option 3 but multiple NON NULLABLE bit columns instead of one.
IsMale (1=Yes / 0=No)
IsFemale (1=Yes / 0=No)

if requried:
IsUnknownGender (1=Yes / 0=No)
and so on...

This makes for easy reading of the definitions, easy extensibility, easy programmability, no possibility of using values outside the domain and no requirement of a second lookup table+FK or CHECK constraints to lock down the values.

EDIT: Correction, you do need at least one constraint to ensure the set flags are valid.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文