What is the difference between Memcached and Hadoop?

Posted on 2024-12-10 19:29:19


What is the basic difference between Memcached and Hadoop? Microsoft seems to implement memcached-style caching with Windows Server AppFabric.

I know memcached is essentially a giant key-value hash spread across multiple servers. What is Hadoop, and how does it differ from memcached? Is it used to store data? Objects? I need to save giant in-memory objects, but it seems I need some way of splitting these giant objects into the "chunks" people talk about. When I look into splitting an object into bytes, Hadoop keeps popping up.

I have a giant object in memory, upwards of 100 MB. I need to replicate and cache this object in some fashion. When I look into caching this monster, it seems I need to split it the way Google does. How does Google do this, and how can Hadoop help me here? My objects are not simple structured data; they hold references up and down their internal classes, etc.
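
The "splitting into chunks" I keep reading about looks something like the sketch below: serialize the object, slice the bytes, and store each slice under its own key. This is only a minimal illustration; pymemcache, the local server address, the key scheme, and the chunk size are all assumptions for the example. Note that pickle preserves internal references between nested objects, which covers the "references up and down the classes" concern.

```python
# Minimal sketch: cache a large object in memcached by chunking its bytes.
# Assumes `pip install pymemcache` and a memcached server on localhost:11211.
import pickle

from pymemcache.client.base import Client

CHUNK_SIZE = 900_000  # stay safely under memcached's default 1 MB item limit

client = Client(("localhost", 11211))

def cache_object(key: str, obj) -> None:
    """Serialize obj, split the bytes into chunks, and store each chunk."""
    blob = pickle.dumps(obj)  # pickle keeps internal object references intact
    chunks = [blob[i:i + CHUNK_SIZE] for i in range(0, len(blob), CHUNK_SIZE)]
    for i, chunk in enumerate(chunks):
        client.set(f"{key}:chunk:{i}", chunk)
    client.set(f"{key}:count", str(len(chunks)))  # manifest: number of chunks

def load_object(key: str):
    """Fetch all chunks, reassemble the bytes, and unpickle."""
    count = int(client.get(f"{key}:count"))
    blob = b"".join(client.get(f"{key}:chunk:{i}") for i in range(count))
    return pickle.loads(blob)
```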

Any ideas, pointers, thoughts, or guesses are helpful.

Thanks.


Comments (4)

压抑⊿情绪 2024-12-17 19:29:19


Memcached [ http://en.wikipedia.org/wiki/Memcached ] is a single-purpose distributed caching technology.

Apache Hadoop [ http://hadoop.apache.org/ ] is a framework for distributed data processing, targeted at Google/Amazon scale: many terabytes of data. It includes sub-projects for the different areas of this problem: a distributed database (HBase), algorithms for distributed processing (MapReduce), reporting/querying (Hive), and a data-flow language (Pig).

The two technologies tackle different problems. One is for caching (small or large items) across a cluster. The second is for processing large amounts of data across a cluster. From your question, it sounds like memcached is more suited to your problem.
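
To make the "processing across a cluster" side concrete, the MapReduce model boils down to writing a map step and a reduce step. Below is the classic word-count sketch in Python for Hadoop Streaming; this is a generic illustration of the model, not something from this answer, and the script name is made up.

```python
#!/usr/bin/env python
# wordcount.py -- classic Hadoop Streaming word count; run with "mapper"
# or "reducer" as the first argument. Hadoop splits the input, runs many
# mapper copies in parallel, sorts by key, then runs the reducers.
import sys

def mapper():
    # Emit "word<TAB>1" for every word read from stdin.
    for line in sys.stdin:
        for word in line.split():
            print(f"{word}\t1")

def reducer():
    # Input arrives sorted by word, so counts can be summed in one pass.
    current_word, current_count = None, 0
    for line in sys.stdin:
        word, count = line.rstrip("\n").split("\t", 1)
        if word != current_word:
            if current_word is not None:
                print(f"{current_word}\t{current_count}")
            current_word, current_count = word, 0
        current_count += int(count)
    if current_word is not None:
        print(f"{current_word}\t{current_count}")

if __name__ == "__main__":
    mapper() if sys.argv[1] == "mapper" else reducer()
```

On a cluster this would be launched through the hadoop-streaming jar with something like -mapper "wordcount.py mapper" -reducer "wordcount.py reducer"; the exact jar path depends on the Hadoop version.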

白况 2024-12-17 19:29:19


Memcached won't work due to its limit on the size of stored values: 1 MB per item by default (see the memcached FAQ). I read somewhere that this limit can be increased to 10 MB, but I am unable to find the link.

For your use case I suggest giving MongoDB a try (see the MongoDB FAQ). MongoDB can be used as an alternative to memcached, and it provides GridFS for storing large files in the database.
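
For illustration, here is a minimal GridFS sketch with pymongo, assuming a local MongoDB server; the database and file names are made up. GridFS splits the blob into chunks for you (255 KB each by default) and stores them across two collections.

```python
# Minimal sketch of the GridFS suggestion.
# Assumes `pip install pymongo` and a MongoDB server on localhost:27017.
import pickle

import gridfs
from pymongo import MongoClient

db = MongoClient("localhost", 27017)["cache_db"]  # hypothetical db name
fs = gridfs.GridFS(db)

# Store: serialize the big object and let GridFS do the chunking.
big_object = {"payload": list(range(1_000_000))}  # stand-in for the 100 MB class
file_id = fs.put(pickle.dumps(big_object), filename="monster-object")

# Retrieve: read the chunks back and deserialize.
restored = pickle.loads(fs.get(file_id).read())
```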

黯然 2024-12-17 19:29:19


You could use pure Hadoop for what you need (no HBase, Hive, etc.). HDFS, Hadoop's file system, will split your object into many chunks (blocks) and store them across the cluster, and MapReduce can then process those chunks (see the official MapReduce tutorial). However, don't forget that Hadoop is, first and foremost, a solution for massive compute and storage. In your case I would also recommend checking out Membase, which is an implementation of the memcached protocol with additional persistence capabilities. You will not be able to map-reduce with memcached/Membase, but they are still distributed, and your object can be cached in a cloud-like fashion.
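
As a rough sketch of what "distributed, cached in a cloud-like fashion" can look like from client code: pymemcache's HashClient (an assumption here; any client that hashes keys across servers works, and Membase nodes speak the memcached protocol) maps each key to one of several nodes, so the chunks of one large object spread across the cluster. The host names below are placeholders.

```python
# Sketch: sharding cached chunks across several memcached/Membase nodes.
# Assumes `pip install pymemcache`; host names are hypothetical.
from pymemcache.client.hash import HashClient

# HashClient hashes each key to one server, so different chunks of the
# same object land on different nodes.
client = HashClient([
    ("cache-node-1", 11211),
    ("cache-node-2", 11211),
    ("cache-node-3", 11211),
])

client.set("monster:chunk:0", b"first ~900 KB of the pickled object")
client.set("monster:chunk:1", b"next chunk of bytes")
chunk0 = client.get("monster:chunk:0")
```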

错々过的事 2024-12-17 19:29:19


Picking a good solution depends on the requirements of the intended use; consider the difference between storing legal documents forever and running a free music service. For example, can the objects be recreated, or are they uniquely special? Will they require further processing steps (i.e., MapReduce)? How quickly does an object (or a slice of it) need to be retrieved? The answers to these questions narrow the solution set considerably.

If objects can be recreated quickly enough, a simple solution might be to use Memcached, as you mentioned, across many machines totaling sufficient RAM. For adding persistence later, Couchbase (formerly Membase) is worth a look; it is used in production for very large game platforms.

If objects CANNOT be recreated, first determine whether S3 or another cloud file provider meets the requirements. For high-throughput access, consider one of the several distributed, parallel, fault-tolerant filesystem solutions: DDN (which sells GPFS and Lustre gear) or Panasas (pNFS). I've used DDN gear, and it had a better price point than Panasas. Both provide good solutions that are much more supportable than a DIY BackBlaze.
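
If S3 does turn out to meet the requirements, storing the blob is only a few lines. A minimal boto3 sketch, assuming AWS credentials are configured and that the bucket and key names (placeholders here) exist:

```python
# Sketch: storing and retrieving a large blob in S3 with boto3.
# Assumes `pip install boto3` and configured AWS credentials.
import boto3

s3 = boto3.client("s3")

# upload_file/download_file use multipart transfers under the hood for
# large files, which handles the "chunking" transparently.
s3.upload_file("monster-object.bin", "my-example-bucket",
               "blobs/monster-object.bin")
s3.download_file("my-example-bucket", "blobs/monster-object.bin",
                 "monster-object.bin")
```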

There are some mostly free implementations of distributed, parallel filesystems, such as GlusterFS and Ceph, that are gaining traction. Ceph touts an S3-compatible gateway and can use BTRFS (a future replacement for Lustre; it is getting closer to production-ready); see the Ceph architecture docs and presentations. Gluster's advantage is the option of commercial support, although there may be vendors supporting Ceph deployments as well. Hadoop's HDFS may be comparable, but I have not evaluated it recently.
