Erlang: Distributed work on an array

Posted 2025-01-01 08:14:28

I'm working on a project where we have an array of atoms which acts as a hash. Whenever a user connects to the server, a certain value is hashed, and that hash is used as an index to look up the element in the array and return it. "Outside forces" (which are handled by a long-running gen_server) are able to change this array, so I can't simply hardcode it. My problem is how to "host" this array.

My first implementation was a simple gen_server which kept a copy of the array around and sent it to whoever asked for it. The process asking for it could then traverse it and get the index it wanted. This implementation used an inordinate amount of memory, which I attributed to there being so many copies of the same array floating around.

My current implementation has a central gen_server which handles the state of this array, and children which handle the actual requests. When the state changes, the central gen_server updates the children. When a process wants to find its hash result, it sends its index number to the central gen_server, which forwards the request to one of the children. The child traverses its "local" list and sends the resulting atom back to the original process.

The problem with the current implementation is that it gets bogged down under high traffic. I've tried using more and more children, but I'm pretty sure the central gen_server is the bottleneck.

Does anyone have any ideas on a better solution to my problem?

EDIT: %s/array/list/g

Comments (2)

尛丟丟 2025-01-08 08:14:28

I suggest that you use ETS tables. I think that the array method is not efficient enough. With an ETS table created as public within the application backend, any process can look up an item as soon as it needs it. ETS tables in current versions of Erlang support concurrent access.

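    %% Let's create a record structure
    %% whereby the key will be a value
    %% in the array.
    %% For now, I do not know what to
    %% put in the field 'other'.
    -record(element, {key, other}).

    %% Create a public named table keyed on the record's key field.
    %% ets:new/2 raises badarg on failure (for example when the name is
    %% already taken), so the failure is caught rather than pattern matched.
    create_table(TableName) ->
        Options = [named_table, set, public,
                   {keypos, 2},   %% because we are using a record, NOT a plain tuple
                   {write_concurrency, true}],
        try ets:new(TableName, Options) of
            TableName -> {success, true}
        catch
            error:Reason -> {error, Reason}
        end.

    %% Return whatever is stored under HashValue (an empty list if nothing is).
    lookup_by_hash(TableName, HashValue) ->
        try ets:lookup(TableName, HashValue) of
            Value -> {value, Value}
        catch
            X:Y -> {error, {X, Y}}
        end.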

With this kind of arrangement, you avoid the single point of failure that comes from one gen_server holding the data. This data is needed by many processes and hence should not be held by a single process. That is where a table comes in: any process can access it at any time, as soon as it needs to make a lookup.

The values in the array should be converted to records of the form `element` and then inserted into the ETS table.
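The conversion described above could be sketched roughly as follows. This is a minimal illustration, not the answerer's code: the module name `atom_table`, the choice to store each atom in the `other` field under its 1-based list position, and the use of `erlang:phash2/2` to pick a slot are all assumptions made for the example.

```erlang
-module(atom_table).
-export([create/1, load/2, lookup/2]).

-record(element, {key, other}).

%% Create a public named table keyed on #element.key.
%% read_concurrency is an assumption: the workload described is read-heavy.
create(TableName) ->
    ets:new(TableName, [named_table, set, public,
                        {keypos, #element.key},
                        {read_concurrency, true}]).

%% Store each atom under its 1-based position in the list (assumed layout).
load(TableName, Atoms) ->
    Indexes = lists:seq(1, length(Atoms)),
    Records = [#element{key = I, other = A}
               || {I, A} <- lists:zip(Indexes, Atoms)],
    true = ets:insert(TableName, Records),
    ok.

%% Hash an arbitrary term to a slot and return the atom stored there.
%% The calling process reads the table directly; no server is involved.
lookup(TableName, Term) ->
    case ets:info(TableName, size) of
        0 ->
            not_found;
        Size ->
            Index = erlang:phash2(Term, Size) + 1,
            case ets:lookup(TableName, Index) of
                [#element{other = Atom}] -> {ok, Atom};
                [] -> not_found
            end
    end.
```

With this layout the long-running gen_server would only call `load/2` when the list changes, while connecting processes call `lookup/2` themselves. One caveat of the sketch: if the new list is shorter than the old one, the leftover higher-numbered keys would have to be deleted as well.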

Advantages of this approach

1. We can create as many ETS tables as we need.
2. An ETS table can handle many more elements than a data structure such as a list or an array, with comparably much lower memory consumption.
3. ETS tables can be accessed concurrently by any process within reach, so you will not need a central process or server to handle the data.
4. A single process or gen_server holding this data means that if it is compromised (for example, goes down due to a full mailbox), the data will be unavailable, and the processes which need the array will have to wait for this one server to restart.
5. Accessing the array data by sending request messages, plus making copies of the same array for each process that needs it, is not "Erlangic" design.
6. Finally, ownership of an ETS table can be transferred from process to process. When the owning process is going down (a gen_server, for instance, can notice this in its terminate/2 callback), it can transfer the ETS table to another process to take over. See ets:give_away/3.
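The ownership transfer in point 6 might look something like the sketch below. The module name `table_handoff`, the `keeper` process, and the message shapes `{kept, ...}` and `handoff` are hypothetical; only `ets:give_away/3` and the `'ETS-TRANSFER'` message it produces come from the ETS API.

```erlang
-module(table_handoff).
-export([demo/0, keeper/1]).

%% Start a keeper, let a short-lived owner hand its table over, and
%% return what the keeper finds in the table afterwards.
demo() ->
    Keeper = spawn(?MODULE, keeper, [self()]),
    spawn(fun() ->
        Tab = ets:new(atoms, [set, public]),
        true = ets:insert(Tab, {1, foo}),
        %% Hand the table to the keeper before going down.
        true = ets:give_away(Tab, Keeper, handoff),
        exit(shutting_down)
    end),
    receive
        {kept, Tab, Values} -> {Tab, Values}
    after 1000 ->
        timeout
    end.

%% The new owner receives an 'ETS-TRANSFER' message carrying the table.
keeper(ReportTo) ->
    receive
        {'ETS-TRANSFER', Tab, _FromPid, handoff} ->
            ReportTo ! {kept, Tab, ets:lookup(Tab, 1)},
            %% Stay alive, otherwise the table would be destroyed with us.
            receive stop -> ok end
    end.
```

An explicit `give_away` only helps when the owner gets a chance to run it. For an owner that dies abruptly, the `{heir, Pid, HeirData}` option of `ets:new/2` is probably the closer fit, since the table is then passed on automatically when the owner terminates.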

That's my thinking.

扶醉桌前 2025-01-08 08:14:28

Not sure if this helps, but could you manage the central hash value in a distributed hash table (independent from your hash business) just like any other value? That way multiple processes can take the load instead of one central process.

From what I read, the array does not seem to really need to be an array.
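As a rough illustration of spreading the load, the sketch below lets each caller pick a worker itself, so no central process sits in the request path. It is plain client-side sharding rather than a real distributed hash table, and the module name `shard_lookup`, the message formats, and the choice to hold the atoms in a tuple (so `element/2` replaces list traversal) are all assumptions.

```erlang
-module(shard_lookup).
-export([start/2, lookup/2]).

%% Start N workers, each holding the atoms as a tuple for constant-time
%% indexing. Returns a tuple of worker pids to hand to callers.
start(N, Atoms) when N > 0, Atoms =/= [] ->
    Table = list_to_tuple(Atoms),
    Pids = [spawn(fun() -> loop(Table) end) || _ <- lists:seq(1, N)],
    list_to_tuple(Pids).

%% The caller hashes the term to choose a worker, then asks it directly.
lookup(Workers, Term) ->
    Slot = erlang:phash2(Term, tuple_size(Workers)) + 1,
    Worker = element(Slot, Workers),
    Ref = make_ref(),
    Worker ! {lookup, self(), Ref, Term},
    receive
        {Ref, Atom} -> {ok, Atom}
    after 1000 ->
        timeout
    end.

%% Worker loop: answer lookups, and swap in a new list when told to.
loop(Table) ->
    receive
        {lookup, From, Ref, Term} ->
            Index = erlang:phash2(Term, tuple_size(Table)) + 1,
            From ! {Ref, element(Index, Table)},
            loop(Table);
        {update, NewAtoms} when NewAtoms =/= [] ->
            loop(list_to_tuple(NewAtoms))
    end.
```

The process that reacts to outside changes would send `{update, NewAtoms}` to every worker in the tuple. Each worker still keeps its own copy of the atoms, so this trades a little memory per worker for removing the central forwarding step.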
