Is it possible to compute results and add them to an existing dataframe when the calculation is based on a lookup table with a different format?
More precisely:
I have a comma-separated file that I read in as a pandas dataframe. The dataframe contains 38 data columns plus an index column (created during the read), over several thousand rows:
[Image: section of dataframe "data"]
My lookup table contains the values on which the calculation is based.
It is also a comma-separated file read in as a pandas dataframe. It contains 24 rows and 6 columns, plus an index column:
[Image: lookup table]
Here is the calculation I am trying to implement:
In a new column "M_A" I want to write the result of a calculation like this:
[Image: calculation formula]
where i runs over C00, C01, C02, ..., C22, C23.
SP, FR, C00, C01, C02 [...] are columns of the "data" dataframe,
while PV, W and RC_A are columns of the lookup table dataframe.
The "data" and "lookup" tables are linked through the channel names: the data columns C00, C01, C02, ... correspond to the entries in column "C" of the lookup table. The lookup values should be used where data column C00, C01, C02, ... matches lookup table row C00, C01, C02, ...
Since iteration is not recommended for datasets of this size, I tried to do it without, but I cannot find the right way because my lookup table does not have the same length as my data table.
df_data['A_calc'] = ((df_data.T / (df_data.SF * df_data.SP)) * ((df_data.C00 * df_lookup.PV * df_lookup.W * df_lookup.RC_A) + (df_data.C01 * df_lookup.PV * df_lookup.W * df_lookup.RC_A) + (df_data.C02 * df_lookup.PV * df_lookup.W * df_lookup.RC_A)+ ...)
This leads to the error message:
AttributeError: 'DataFrame' object has no attribute 'PTU_Airdensity_recalc'
Is there a way to do this in Python with pandas dataframes? Perhaps something more elegant than my attempt, which I chose mainly to illustrate my intention...
Any suggestions?
Thanks, Swawa
Comments (2)
So, as I understand the formula: for each column Ci we multiply it by the values PV[i], W[i] and RC_A[i], then sum over the results.
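The "multiply each Ci by PV[i], W[i], RC_A[i], then sum" reading can be sketched without iteration as a dot product. This is only a minimal sketch under assumptions: the miniature tables below are made-up (three channels instead of 24), the lookup column holding the channel names is assumed to be called "C" as described in the question, and the prefactor T / (SF * SP) is taken from the asker's attempted line, since the formula image itself is not available.

```python
import pandas as pd

# Hypothetical miniature "data" table (3 channels instead of 24).
df_data = pd.DataFrame({
    "T":   [10.0, 20.0],
    "SF":  [2.0, 4.0],
    "SP":  [5.0, 5.0],
    "C00": [1.0, 0.0],
    "C01": [2.0, 1.0],
    "C02": [3.0, 2.0],
})

# Hypothetical miniature lookup table; column "C" names the channel.
df_lookup = pd.DataFrame({
    "C":    ["C00", "C01", "C02"],
    "PV":   [1.0, 2.0, 3.0],
    "W":    [1.0, 0.5, 2.0],
    "RC_A": [2.0, 1.0, 0.5],
})

# One weight per channel, indexed by channel name (C00, C01, ...).
lookup = df_lookup.set_index("C")
weights = lookup["PV"] * lookup["W"] * lookup["RC_A"]

# DataFrame.dot matches the data columns against the index of `weights`,
# so each row gets sum_i(Ci * PV[i] * W[i] * RC_A[i]).
channel_cols = weights.index.tolist()
weighted_sum = df_data[channel_cols].dot(weights)

# Assumed prefactor, copied from the attempt in the question.
# Bracket access is used because df_data.T is the transpose, not a column.
df_data["M_A"] = df_data["T"] / (df_data["SF"] * df_data["SP"]) * weighted_sum
```

Because the weights are indexed by channel name, the lookup table only needs as many rows as there are channel columns; its length does not have to match the number of data rows.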
This version is working now. Thanks a lot to Ran A for the help!
The line then works with "." instead of "*".
But I am still working on the loop...
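If the "." remark refers to a dot product rather than element-wise "*", the difference can be illustrated as follows. This is a sketch with made-up values, not the asker's actual code: element-wise multiplication of two pandas Series aligns them on their index labels, so a long data column times a short lookup column yields NaN wherever the labels do not match, whereas a dot product contracts over matching labels.

```python
import pandas as pd

# A data column with the default 0..3 index (stands in for thousands of rows).
counts = pd.Series([1.0, 2.0, 3.0, 4.0])
# A lookup column with the default 0..1 index (stands in for 24 rows).
pv = pd.Series([10.0, 20.0])

# Element-wise "*": rows are paired by index label, not by meaning.
# Labels 2 and 3 exist only in `counts`, so the result there is NaN.
product = counts * pv
n_missing = int(product.isna().sum())

# Dot product: one data row and the weights share the channel labels,
# so the contraction sums Ci * weight[i] over all channels.
row = pd.Series([1.0, 2.0], index=["C00", "C01"])
w = pd.Series([10.0, 20.0], index=["C00", "C01"])
total = row.dot(w)
```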