AWS Databricks pricing - do we have to pay for EC2 instances in addition to the DBU cost?
I am trying to do some cost comparison between AWS Glue and Databricks hosted on an AWS environment. For the comparison, I have chosen m4.xlarge, which is the equivalent of 1 DPU in AWS Glue (4 vCPUs/16 GB memory).
Assume I have a PySpark job that's expected to run for 1 hour daily for 30 days with 5 DPUs. My cost estimates as per AWS are as follows:
Glue cost estimator: 5 DPUs x 30.00 hours x 0.44 USD per DPU-Hour = 66.00 USD (Apache Spark ETL job cost)
Databricks cost estimator: this gives a monthly estimate of 74 USD.
I am concerned whether we have to pay AWS any EC2 cost for the 6 nodes in addition to this 74 USD. This is due to the note added in the estimate: "This Pricing Calculator provides only an estimate of your Databricks cost. Your actual cost depends on your actual usage. Also, the estimated cost doesn't include cost for any required AWS services (e.g. EC2 instances)."
That would be approximately an additional 36 USD for this instance type/count, on top of the Databricks cost. Can someone please clarify so we can decide whether to go with AWS Glue or Databricks? I know that in Databricks we can choose any instance type, but the question is whether I pay the EC2 cost separately. Thanks
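For reference, the two estimates in the question can be reproduced with a short script. This is only a sketch of the arithmetic above; the $74 calculator figure and the $36 EC2 figure are taken from the question, not computed from published rate tables:

```python
# Reproduce the monthly cost estimates from the question.

GLUE_DPU_HOUR_RATE = 0.44      # USD per DPU-hour (AWS Glue Spark ETL rate used above)
dpus = 5
hours = 30                     # 1 hour/day for 30 days

glue_cost = dpus * hours * GLUE_DPU_HOUR_RATE
print(f"Glue:       ${glue_cost:.2f}")                       # $66.00

databricks_dbu_cost = 74.0     # from the Databricks pricing calculator
ec2_cost = 36.0                # questioner's own estimate for the EC2 instances
print(f"Databricks: ${databricks_dbu_cost + ec2_cost:.2f}")  # $110.00
```

If the EC2 charge does apply (as the answers below the question confirm), the comparison is $66 versus roughly $110 for the All-purpose Compute estimate shown.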
3 Answers
The answer is yes.
You should pay for all the infrastructure used directly by Databricks.
As mentioned in the footnote you added:
This Pricing Calculator provides only an estimate of your Databricks cost. Your actual cost depends on your actual usage. Also, the estimated cost doesn't include the cost for any required AWS services (e.g. EC2 instances).
Think of it as a software license on top of hardware costs that you would pay anyway, whether you use the software or not.
This point was verified with a Databricks Solution Architect who accompanied our company while implementing the Databricks solution.
As others have said, the answer is yes. You should think of Databricks as a good "machine manager" and AWS as providing the actual machines. The general formula for determining how much you pay for a job run combines a DBU charge and an EC2 charge for both the workers and the driver.
What do the terms mean? total_worker_hours is the total number of instance hours you had for your workers. So if 8 workers were used for a 1-hour job, you'd have 8 worker hours. This calculation gets a bit trickier with things like auto-scaling, of course. Similarly, total_driver_hours is the total number of driver instance hours; since there's only one driver, it's just the number of hours your job ran. The variables in parentheses, like (driver_dbu_per_hour * cost_of_dbu + driver_ec2_instance_cost), just tell you your hourly rate for the driver (and similarly for your workers). Once you have these values, you know how much you'd pay.
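The formula described in this answer can be sketched in Python. The variable names follow the answer's own terminology; the numeric rates in the example are placeholders, not actual Databricks or AWS prices:

```python
def job_run_cost(total_worker_hours, total_driver_hours,
                 worker_dbu_per_hour, driver_dbu_per_hour,
                 cost_of_dbu,
                 worker_ec2_instance_cost, driver_ec2_instance_cost):
    """Total cost of a job run: DBU charges (paid to Databricks) plus
    EC2 instance charges (paid to AWS), for workers and driver alike."""
    worker_hourly_rate = worker_dbu_per_hour * cost_of_dbu + worker_ec2_instance_cost
    driver_hourly_rate = driver_dbu_per_hour * cost_of_dbu + driver_ec2_instance_cost
    return (total_worker_hours * worker_hourly_rate
            + total_driver_hours * driver_hourly_rate)

# Example: 8 workers on a 1-hour job -> 8 worker-hours, 1 driver-hour.
# All rates below are placeholder values for illustration only.
cost = job_run_cost(total_worker_hours=8, total_driver_hours=1,
                    worker_dbu_per_hour=0.75, driver_dbu_per_hour=0.75,
                    cost_of_dbu=0.20,
                    worker_ec2_instance_cost=0.20, driver_ec2_instance_cost=0.20)
print(f"${cost:.2f}")  # $3.15
```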
In the screenshot from the Databricks cost comparison you have chosen All-purpose Compute, which is a more expensive type used for ad hoc development. You will likely use Jobs Compute when running your Spark jobs on a schedule.
Jobs Compute: $0.20/DBU
All-purpose Compute: $0.65/DBU
Using Jobs Compute in the Databricks cost calculator:
$22.50 (Databricks DBUs) + $36 (AWS EC2 cost, which will differ a bit depending on spot prices etc.) = $58.50