Comparing two Astropy Tables based on conditions related to particular columns
My apologies if this is a duplicate but I couldn't find anything exactly like this myself.
I have two Astropy tables, let's say X and Y. Each has multiple columns but what I want to do is to compare them by setting various conditions on different columns.
For example, table X looks like this and has 1000 rows and 9 columns (let's say):
Name_X (str) | Date_X (float64) | Date (int32) | ... |
---|---|---|---|
GaiaX21-116383 | 59458.633888888886 | 59458 | ... |
GaiaX21-116382 | 59458.504375 | 59458 | ... |
and table Y looks like this and has 500 rows and 29 columns (let's say):
Name_Y (str14) | Date_Y (float64) | Date (int32) | ... |
---|---|---|---|
GaiaX21-117313 | 59461.911724537036 | 59461 | ... |
GaiaX21-118760 | 59466.905173611114 | 59466 | ... |
I want to compare the two tables: basically, check whether the same 'Name' exists in both tables. If it does, I treat that as a "match", put that entire row in a new table, and discard everything else (or store it in another temporary table).
So I wrote a function like this:
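    from astropy.table import Table

    def find_diff(table1, table2, param):
        # table1 is the bigger table; param is the column to compare,
        # assuming it has the same name in both tables.
        temp = Table(table1[0:0])    # empty table for non-matching rows (currently unused)
        table3 = Table(table1[0:0])  # empty table with table1's columns, for matches
        for i in range(len(table1)):
            for j in range(len(table2)):
                # A match: the same value appears in both tables.
                if table1[param][i] == table2[param][j]:
                    table3.add_row(table1[i])
                # else:
                #     temp.add_row(table2[j])
        return table3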
While this works in principle, it takes a huge amount of time to finish, so running the code this way simply isn't practical. I also want to apply other conditions to other columns (cross-matching the observation dates, for example).
Any suggestions would be greatly helpful, thank you!
Comments (2)
It sounds like you want to do a table join on the name columns. This can be done as documented at https://docs.astropy.org/en/stable/table/operations.html#join.
E.g.
As a full example with non-unique key values:
I believe this site would be your best friend for this problem: https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html
So in theory I believe you would want something like this:
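The snippet that belonged here is not present in this copy. A rough sketch of what a pandas merge could look like is given below; the DataFrames are hypothetical miniature stand-ins for X and Y, and the variable name `results` is borrowed from the explanation in this answer. Starting from Astropy tables, `Table.to_pandas()` should give equivalent DataFrames.

```python
import pandas as pd

# Hypothetical miniature tables (values are illustrative only).
df_x = pd.DataFrame({
    "Name_X": ["GaiaX21-116383", "GaiaX21-117313"],
    "Date_X": [59458.633888888886, 59461.9],
})
df_y = pd.DataFrame({
    "Name_Y": ["GaiaX21-117313", "GaiaX21-118760"],
    "Date_Y": [59461.911724537036, 59466.905173611114],
})

# Give the key column the same name in both frames, then merge on it.
df_x = df_x.rename(columns={"Name_X": "Name"})
df_y = df_y.rename(columns={"Name_Y": "Name"})

# how="inner" keeps only the names present in both frames.
results = pd.merge(df_x, df_y, on="Name", how="inner")
print(results)
```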
The key difference here is that you might need to adjust the column names so that the key column has the same name in both tables. The merge then matches on the "Name" column between the two tables, and rows whose names are the same end up in the results variable. From there you can do whatever you like with the DataFrame.