How do I hash an integer into a very small string?

Posted on 2024-11-02 21:24:04

I need a function that, given a salt integer and a value integer, will return a small hash string. Calling the function with 1 and 56 might return "1AF3". Calling it with 2 and 56 might return "C2FA".

Background info:
I have a web app (written in C# if that matters) that stores employee Id values as integers. Users need to be able to see a consistent representation of that Id, but no user should see the actual Id, or the same representation of that Id as seen by another user.

For example, suppose there is an Employee with the Id of 56.

When User 1 logs in, wherever he sees that employee, he sees "1AF3" or something. He might see this employee on different pages in the app, and its Id should always be 1AF3 so he knows it's the same guy.

When User 2 logs in, should he encounter that same employee, he would always see "C2FA", or something. Same goes for User 2: wherever he is in the system, he would see that one employee represented by that same string.

Should User 2 look over the shoulder of User 1 while User 1 is logged in, User 2 should not be able to recognize any of his employees on User 1's screen, because this hash should be irreversible.

Does this make sense?

One additional requirement is that since the users will be discussing these employees in email, on the phone, and in faxes, the hash would need to be of a minimum size and not contain non-alphanumeric characters. 10 characters or fewer would be ideal.

Maybe there is a way to "collapse" a SHA-256 result into fewer characters since the whole alphabet could be used? I have no idea.

Update: Another walk-through
Thanks everyone for giving this a shot but it seems like I am doing a bad job explaining it or something.

Let's pretend you and I are both users of this system. You're Fred and I'm Chris. Your UserId is 2 and my UserId is 1. Let's also assume there are 5 Employees in the system. Employees are not users. You can think of them as products, or whatever you want. I'm just talking about 5 generic entities that you, Fred, and I, Chris, each deal with.

Fred, every time you log in, you need to be able to uniquely identify each employee. Every time I, Chris, log in, I also need to work with employees and I too will need to be able to identify them uniquely. But should I ever look over your shoulder while you are managing employees, I should not be able to figure out which ones you are managing.

So, while in the database the employee IDs are 1, 2, 3, 4, and 5, you and I do not see them that way in our interface. I might see A, B, C, D, and E, and you might see F, G, H, I, and J. So while E and J both represent the same employee, I can't look at your screen while you are working with your Employee "J" and know that you are working with Employee 5, because for me that employee is called Employee "E".

So, Fred and Chris can each work with the same set of employees, but if they were to see each other's work, or discussion in an email, they would not be able to know what employees the other guy was talking about.

I was thinking I could achieve this "real-time user-dependent EmployeeID" by taking the real employee ID and hashing it using the user ID as the salt.

Since Fred and Chris each need to discuss employees over email and the telephone with their clients and customers, I'd like the IDs that they use in these discussions to be as simple as we can get them.

Comments (6)

世界如花海般美丽 2024-11-09 21:24:04

Conceptually, here is what you want:

You have a set of employee IDs which you can represent as elements in a given space S. You also have some users, and you want each user to see a permutation of space S, which is specific to the user, and such that the details of that permutation cannot be guessed by any other user.

This calls for symmetric encryption. Namely, each employee ID is a numerical value (e.g. a 32-bit integer), and a user 'A' sees employee x as E_k(x), where k is a secret key which is specific to 'A' and that 'B' cannot guess. So you need two things:

  • a block cipher which can work with short values (e.g. 32-bit words);
  • a method which turns user ID into the user-specific key.

For the block cipher, the trouble is that short blocks are a security issue for the normal usage of a block cipher (i.e. to encrypt long messages). So all published, secure block ciphers use large blocks (64 bits or more). 64 bits can be represented over 11 characters by using uppercase and lowercase letters, and digits (62^11 is somewhat greater than 2^64). If that's good enough for you, then use 3DES. If you want something smaller, you will have to design your own cipher, something which is not recommended at all. You may want to try KeeLoq: see this paper for pointers (KeeLoq is cryptographically "broken" but not too much, given your context). There is a generic method for building block ciphers with arbitrary block sizes, given a seekable stream cipher, but this is mostly theoretical (implementation requires waddling through high-precision floating point values, which can be done but is very slow).

For the user-specific key: you want something that the Web application can compute, but not users. This means that the Web application knows a secret key K; then, the user-specific encryption key can be the result of HMAC (with a good hash function, such as SHA-256) applied over the user ID, and using key K. You then truncate the HMAC output to the length you need for the user-specific key (for instance, 3DES needs a 24-byte key).

C# has TripleDES and HMAC/SHA-256 implementations (in the System.Security.Cryptography namespace).

(There is no generally accepted secure standard for a block cipher with 32-bit blocks. This is still an open research area.)
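
The two pieces described above (an HMAC-derived per-user key and a 64-bit block cipher) could be sketched in C# roughly as follows. This is a minimal sketch under assumptions, not a vetted implementation: the class name EmployeeCode, the method names, the big-endian encoding of the ids, and the use of ECB mode on a single 8-byte block are illustrative choices that the answer does not specify. masterKey stands for the secret key K that only the Web application knows. The output is always 11 characters because 62^11 is larger than 2^64.

```csharp
using System;
using System.Buffers.Binary;
using System.Security.Cryptography;

// Hypothetical helper; names and encodings are assumptions, not from the answer.
public static class EmployeeCode
{
    const string Alphabet =
        "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"; // 62 symbols

    // User-specific 3DES key: HMAC-SHA256(masterKey, userId) truncated to 24 bytes.
    static byte[] DeriveUserKey(byte[] masterKey, int userId)
    {
        byte[] idBytes = new byte[4];
        BinaryPrimitives.WriteInt32BigEndian(idBytes, userId);
        using var hmac = new HMACSHA256(masterKey);
        byte[] mac = hmac.ComputeHash(idBytes);
        byte[] key = new byte[24];
        Array.Copy(mac, key, 24);
        return key;
    }

    static TripleDES CreateCipher(byte[] masterKey, int userId)
    {
        TripleDES tdes = TripleDES.Create();
        tdes.Mode = CipherMode.ECB;       // exactly one 8-byte block is processed
        tdes.Padding = PaddingMode.None;
        tdes.Key = DeriveUserKey(masterKey, userId);
        return tdes;
    }

    // Encrypts the employee id under the user's key and prints it in base 62.
    public static string Encode(byte[] masterKey, int userId, int employeeId)
    {
        byte[] block = new byte[8];
        BinaryPrimitives.WriteInt64BigEndian(block, employeeId);
        using TripleDES tdes = CreateCipher(masterKey, userId);
        using ICryptoTransform enc = tdes.CreateEncryptor();
        ulong value = BinaryPrimitives.ReadUInt64BigEndian(enc.TransformFinalBlock(block, 0, 8));

        char[] chars = new char[11];
        for (int i = 10; i >= 0; i--)
        {
            chars[i] = Alphabet[(int)(value % 62)];
            value /= 62;
        }
        return new string(chars);
    }

    // Reverses Encode; only the application (which holds masterKey) can do this.
    public static int Decode(byte[] masterKey, int userId, string code)
    {
        if (code == null || code.Length != 11) throw new FormatException("Expected 11 characters.");
        ulong value = 0;
        foreach (char c in code)
        {
            int digit = Alphabet.IndexOf(c);
            if (digit < 0) throw new FormatException("Unexpected character.");
            value = value * 62 + (ulong)digit;
        }
        byte[] block = new byte[8];
        BinaryPrimitives.WriteUInt64BigEndian(block, value);
        using TripleDES tdes = CreateCipher(masterKey, userId);
        using ICryptoTransform dec = tdes.CreateDecryptor();
        long plain = BinaryPrimitives.ReadInt64BigEndian(dec.TransformFinalBlock(block, 0, 8));
        if (plain < int.MinValue || plain > int.MaxValue)
            throw new FormatException("Code does not decrypt to a valid id.");
        return (int)plain;
    }
}
```

Calling EmployeeCode.Encode(masterKey, 1, 56) and EmployeeCode.Encode(masterKey, 2, 56) yields two unrelated-looking 11-character strings, and Decode with the matching user id recovers 56. Because encryption is a permutation, two employees never share a code for the same user.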

别忘他 2024-11-09 21:24:04

There might be problems with this approach but you could do it like this:

  • Make an array holding all your symbols (say a 25 element array)
  • Hash your string using whatever hash function
  • Pick a number of octets out of the resulting hash (4 octets if you want 4 symbols in your resulting string) from predefined positions
  • For each octet compute index = octet % array_size. The index gives the position for each of your symbols

Again, I have almost zero experience with cryptography, hash functions and the like so you may want to take this with a grain of salt.
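
The steps listed above could look something like this in C#. It is only a sketch under assumptions: SHA-256 is one possible choice for "whatever hash function", the 25 symbols are simply the letters A to Y, the "predefined positions" are taken to be the first octets of the digest, and the names ShortHash and Compute are hypothetical.

```csharp
using System;
using System.Buffers.Binary;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper illustrating "octet % array_size" symbol selection.
public static class ShortHash
{
    // 25 symbols, matching the "25 element array" in the list (the exact set is an assumption).
    const string Symbols = "ABCDEFGHIJKLMNOPQRSTUVWXY";

    public static string Compute(int salt, int value, int length = 4)
    {
        if (length < 1 || length > 32) throw new ArgumentOutOfRangeException(nameof(length));

        // Hash the salt and the value together (8 bytes, big-endian).
        byte[] input = new byte[8];
        BinaryPrimitives.WriteInt32BigEndian(input.AsSpan(0, 4), salt);
        BinaryPrimitives.WriteInt32BigEndian(input.AsSpan(4, 4), value);

        byte[] digest;
        using (SHA256 sha = SHA256.Create())
        {
            digest = sha.ComputeHash(input);
        }

        // One symbol per octet: index = octet % array_size.
        var sb = new StringBuilder(length);
        for (int i = 0; i < length; i++)
        {
            sb.Append(Symbols[digest[i] % Symbols.Length]);
        }
        return sb.ToString();
    }
}
```

With 25 symbols and 4 characters there are only 25^4 = 390,625 possible strings, so two employees can end up with the same string for the same user, and the modulo step is slightly biased because 256 is not a multiple of 25. An unkeyed hash can also be recomputed by anyone who knows the scheme and the user id.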

深爱成瘾 2024-11-09 21:24:04

There are many ways to "de-anonymize" information. It would help if you could be more specific about the context and what "assets" you are really trying to protect here, and against whom. See our faq.

E.g., might one user know the number of another user? They could probably find it out quickly if they discovered through other means the correspondence between 1AF3 and C2FA.

But specifically for your narrower question, a good hash will already be well-mixed, so I'd think you could just truncate, e.g., a SHA-256 hash value. But Thomas will probably know the definitive answer there.

爺獨霸怡葒院 2024-11-09 21:24:04

Here are my thoughts getting to the point of it (I figure if you talked out your question, I'll talk out my answer. I'm guessing you'll find that helpful):

  • All hail Thomas, because he has clearly established his dominance.
  • 0-9, A-F is a representation of the data. You can make it A-Z, 0-9, exclude some uncommon letters, and represent five bits per character (a 32-symbol alphabet).
  • You can basically say that all hashes have collisions. If you approach saturation, you'll end up with two people who have the same hash. Hashes are also one-way. You would need a mapping that allows reversal. If you have a reverse mapping, why not fill it with random strings which don't collide?
  • You are obfuscating a limited set of data. With a large and secret salt, you can prevent reversal. That said, you're trading one ID for another. The ID is still unique and constant, so I wonder how this enhances security.
    • I have some clients where if I were to see something like this, I'd put money that the employee ID was a SSN. I hope you're not doing that.

Employee ID and Employee Alternate ID are what you are coming up with. Since they have to be reversible to you but not the public, you need to store that in a two way pairing and keep it secret. Since there's risk of collision with a hash and you have to have a reverse map anyway, the alternate id might as well be a random string. An ID should be arbitrary anyway, and I would really like to know the perceived security benefit of your approach with two ids for one employee; it makes me think of Mission Impossible and the NOC list.
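
A two-way pairing filled with random, non-colliding strings could be sketched like this. It is an in-memory illustration only: the class name AliasTable, the 6-character default length, and the 32-symbol alphabet are assumptions, and a real application would presumably keep the pairing in a database table with a uniqueness constraint instead of dictionaries.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;

// Hypothetical in-memory two-way map between (user, employee) and a random code.
public sealed class AliasTable
{
    // 32 symbols: A-Z without I and O, plus 2-9 (the exact set is an assumption).
    const string Alphabet = "ABCDEFGHJKLMNPQRSTUVWXYZ23456789";

    readonly Dictionary<(int User, int Employee), string> _forward =
        new Dictionary<(int User, int Employee), string>();
    readonly Dictionary<(int User, string Code), int> _reverse =
        new Dictionary<(int User, string Code), int>();
    readonly int _length;

    public AliasTable(int length = 6)
    {
        _length = length;
    }

    // Returns the existing code for this pair, or assigns a fresh random one.
    public string GetCode(int userId, int employeeId)
    {
        if (_forward.TryGetValue((userId, employeeId), out string existing))
        {
            return existing;
        }

        string code;
        do
        {
            code = RandomCode();
        } while (_reverse.ContainsKey((userId, code))); // retry on collision for this user

        _forward[(userId, employeeId)] = code;
        _reverse[(userId, code)] = employeeId;
        return code;
    }

    // Reverse lookup: only works for codes that were issued to this user.
    public bool TryResolve(int userId, string code, out int employeeId)
    {
        return _reverse.TryGetValue((userId, code), out employeeId);
    }

    string RandomCode()
    {
        char[] chars = new char[_length];
        for (int i = 0; i < chars.Length; i++)
        {
            chars[i] = Alphabet[RandomNumberGenerator.GetInt32(Alphabet.Length)];
        }
        return new string(chars);
    }
}
```

Since the codes are random rather than derived from the ids, there is nothing to reverse without access to the table itself.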

七七 2024-11-09 21:24:04

Just an idea for an approach based on the extra information you have added. The security of this idea is very, very light and I would not recommend it if you think people are going to attempt to crack it, but it's worth throwing in the pot.

You could create a personal hash by bit-shifting the employee Id based on your own employee Id. Then add whatever extra obfuscation you need to the resulting number, such as converting it to hex. E.g.

string hashedEmployeeId = (employeeIdToHash << myEmployeeId).ToString("X");

This will generate hashed employee Ids based on your own Id, but you may run into problems when the employee Ids get large (especially your own!)

Just to reiterate, this on its own isn't really very secure but it might help you on your way.
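
To make the caveat about large Ids concrete, here is the one-liner above wrapped in a small helper (the names ShiftDemo and Hash are hypothetical). In C#, when the left operand is an int only the low five bits of the shift count are used, and bits shifted past bit 31 are discarded.

```csharp
using System;

// Hypothetical wrapper around the one-line shift from the answer.
public static class ShiftDemo
{
    public static string Hash(int employeeIdToHash, int myEmployeeId)
    {
        return (employeeIdToHash << myEmployeeId).ToString("X");
    }
}

// ShiftDemo.Hash(56, 1)  returns "70"  (56 << 1 = 112 = 0x70)
// ShiftDemo.Hash(56, 33) returns "70"  (shift count 33 is masked to 1)
// ShiftDemo.Hash(56, 0)  returns "38"  (no shift: the raw id in hex)
// ShiftDemo.Hash(56, 29) returns "0"   (all set bits are shifted out)
// ShiftDemo.Hash(8, 29)  returns "0"   (collides with the line above)
```

So users 1 and 33 would see identical strings, a user whose Id is a multiple of 32 would see the raw employee Id in hex, and large shifts make different employees collide. The mapping is also easy to undo: Convert.ToInt32("70", 16) >> 1 gives back 56.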

白衬杉格子梦 2024-11-09 21:24:04

Using 4 characters you would have a total of 36^4 = 1679616 possible keys.
You could permute all the possible combinations of employees together.
If you calculate the square root, you get 1296.

You could then generate an ordered table with all the possibilities in the first column and then randomly distribute ids from 1 to 1296 into the other columns. You would get something like this:

key    a    b
AAAA  386   67
AAAB   86  945
...

With this solution you would have a lookup table scalable up to 1296 employees. However, if you consider adding an extra character to your key, you would get a lot more possibilities: (36^5)^0.5 = 7776.

With this solution, guessing a key would give you one chance in 1296 or 7776 of seeing information about an employee.

Performance may be an issue, but I think you can manage this using a cache, or maybe even by keeping all the data loaded in memory and using a kind of tree map to find the corresponding key for two given ids.
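
One way to read the table above is that each 4-character key is assigned to exactly one (a, b) pair of ids. Under that reading, which is an interpretation and not something the answer spells out, the table could be built by shuffling all 36^4 key indexes once and keeping the result in memory. The class name KeyTable is hypothetical, and ids here run from 0 to 1295 instead of 1 to 1296 to keep the indexing simple.

```csharp
using System;
using System.Security.Cryptography;

// Hypothetical in-memory table: every (userId, employeeId) pair owns one 4-character key.
public sealed class KeyTable
{
    const string Alphabet = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"; // 36 symbols
    const int Side = 1296;                                            // 36^2 = sqrt(36^4)

    // _perm[userId * Side + employeeId] is the index of the key for that pair.
    readonly int[] _perm = new int[Side * Side];

    public KeyTable()
    {
        for (int i = 0; i < _perm.Length; i++)
        {
            _perm[i] = i;
        }

        // Fisher-Yates shuffle driven by a cryptographic random source.
        for (int i = _perm.Length - 1; i > 0; i--)
        {
            int j = RandomNumberGenerator.GetInt32(i + 1);
            int tmp = _perm[i];
            _perm[i] = _perm[j];
            _perm[j] = tmp;
        }
    }

    public string KeyFor(int userId, int employeeId)
    {
        if (userId < 0 || userId >= Side) throw new ArgumentOutOfRangeException(nameof(userId));
        if (employeeId < 0 || employeeId >= Side) throw new ArgumentOutOfRangeException(nameof(employeeId));

        // Write the key index as a fixed-width base-36 string.
        int index = _perm[userId * Side + employeeId];
        char[] chars = new char[4];
        for (int i = 3; i >= 0; i--)
        {
            chars[i] = Alphabet[index % 36];
            index /= 36;
        }
        return new string(chars);
    }
}
```

The array holds 1,679,616 integers (a few megabytes), so keeping it in memory is feasible, but it would need to be persisted somewhere for the keys to stay stable across restarts.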
