Server specs for extensive data mining
I am running a data mining project that parses an RDF dataset of around 2 GB to generate graphs (around 100 MB), which are then saved as Python pickles.

Sadly, my current Dell PowerEdge with 4 GB of RAM can't save the graph because of limited memory (it raises a MemoryError). I have tried saving in other formats, such as GML, plain text, and adjacency lists, but it seems I simply need more RAM.

Should I just go ahead and buy a good server with around 12 GB of RAM, or would other factors speed up the parsing and searching (multiple cores? using multiple threads in the script?)?

If it comes down to hardware, could you please suggest some good server models to buy? I am not very adept at dealing with hardware specs. My budget is around $3,500.
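For context, a minimal sketch of the save step described above. The graph library is an assumption on my part (the mention of GML and adjacency formats suggests NetworkX, but the question doesn't say), and the graph contents are placeholders. The relevant detail is that pickle.dump streams into the open file as it serializes, whereas pickle.dumps builds the entire byte string in memory first, which matters when RAM is tight:

    import pickle

    import networkx as nx  # assumed library; the question does not name it

    # Placeholder graph standing in for the one built from the RDF data.
    graph = nx.Graph()
    graph.add_edge("http://example.org/subject",
                   "http://example.org/object",
                   predicate="http://example.org/predicate")

    # pickle.dump serializes directly into the file, avoiding the full
    # in-memory byte string that pickle.dumps would create; the highest
    # protocol is also more compact than the default.
    with open("graph.pickle", "wb") as f:
        pickle.dump(graph, f, protocol=pickle.HIGHEST_PROTOCOL)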
Comments (1)
A dataset that is 2 GB with output around 100 MB is not really huge. If you have 4 GB of physical RAM and swap enabled, you should not get an out-of-memory error due to physical hardware constraints.

What software are you using to process your data and render your result? What OS are you on? The out-of-memory condition on export could well be a limitation of, or a bug in, the software you are using.
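One quick thing worth ruling out along those lines: a 32-bit Python build can only address roughly 2-4 GB per process no matter how much physical RAM and swap the machine has, which is a classic cause of MemoryError on a box that appears to have free memory. A minimal check, assuming the processing happens in Python (the pickle output suggests it does):

    import struct
    import sys

    # Pointer size reveals whether the interpreter is a 32- or 64-bit
    # build; a 32-bit process is capped at roughly 2-4 GB of address
    # space regardless of installed RAM.
    print("Python build: %d-bit" % (struct.calcsize("P") * 8))
    print("Python version: %s" % sys.version)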