根据与特定列相关的条件比较两个 Astropy 表

发布于 2025-01-10 20:17:17 字数 1570 浏览 5 评论 0原文

如果这是重复的，我很抱歉，但我自己找不到任何完全相同的东西。

我有两个 Astropy 表，比如说 X 和 Y。每个表都有多个列，但我想要做的是通过在不同列上设置各种条件来比较它们。

例如，表 X 如下所示，有 1000 行和 9 列（假设）：

Name_X (str)	Date_X (float64)	Date (int32)	...
GaiaX21-116383	59458.633888888886	59458	...
GaiaX21-116382	59458.504375	59458	...

表 Y 看起来像这样，有 500 行和 29 列（比方说）：

Name_Y (str14)	Date_Y (float64)	Date (int32)	...
GaiaX21-117313	59461.911724537036	59461	...
GaiaX21-118760	59466.905173611114	59466	...

我想比较两个表 - 基本上，检查两个表中是否存在相同的“名称”。如果是，那么我将其视为“匹配”，并将整行放入一个新表中，并丢弃其他所有内容（或将它们存储在另一个临时表中）。

所以我写了一个这样的函数：

def find_diff(table1, table2, param): # table1 is bigger, param defines which column, assuming they have the same names;
    temp = Table(table1[0:0])
    table3 = Table(table1[0:0])
    for i in range(0, len(table1)):
        for j in range(0, len(table2)):
            if table1[param][i] != table2[param][j]:
#                 temp.add_row(table2[j])
#             else:
                table3.add_row(table1[i])
    return table3

虽然这在原则上是可行的，但它也需要大量的时间才能完成。因此以这种方式运行代码根本不切实际。同样，我想对其他列应用其他条件（例如，交叉匹配观察日期）。

任何建议都会非常有帮助，谢谢！

原文

My apologies if this is a duplicate but I couldn't find anything exactly like this myself.

I have two Astropy tables, let's say X and Y. Each has multiple columns but what I want to do is to compare them by setting various conditions on different columns.

For example, table X looks like this and has 1000 rows and 9 columns (let's say):

Name_X (str)	Date_X (float64)	Date (int32)	...
GaiaX21-116383	59458.633888888886	59458	...
GaiaX21-116382	59458.504375	59458	...

and table Y looks like this and has 500 rows and 29 columns (let's say):

Name_Y (str14)	Date_Y (float64)	Date (int32)	...
GaiaX21-117313	59461.911724537036	59461	...
GaiaX21-118760	59466.905173611114	59466	...

I want to compare the two tables- basically, check if the same 'Name' exists in both Tables. If it does, then I treat that as a "match" and take that entire row and put it in a new table and discard everything else (or store them in another temp Table).

So I wrote a function like this:

def find_diff(table1, table2, param): # table1 is bigger, param defines which column, assuming they have the same names;
    temp = Table(table1[0:0])
    table3 = Table(table1[0:0])
    for i in range(0, len(table1)):
        for j in range(0, len(table2)):
            if table1[param][i] != table2[param][j]:
#                 temp.add_row(table2[j])
#             else:
                table3.add_row(table1[i])
    return table3

While this in principle, works, it also takes a huge amount of time to finish. So it simply isn't practical to be running the code this way. Similarly, I want to apply other conditions for other columns (cross-matching the observation dates, for example).

Any suggestions would be greatly helpful, thank you!

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

等风也等你 2025-01-17 20:17:17

听起来您想对名称列进行表连接。这可以按照 https://docs.astropy.org 中的记录来完成/en/stable/table/operations.html#join。

例如

# Assume table_x and table_y
from astropy.table import join
table_xy = join(table_x, table_y, keys_left='Name_X', keys_right='Name_Y')

，作为具有非唯一键值的完整示例：

In [10]: t1 = Table([['x', 'x', 'y', 'z'], [1,2,3,4]], names=['a', 'b'])

In [11]: t2 = Table([['x', 'y', 'y', 'Q'], [10,20,30,40]], names=['a', 'c'])

In [12]: table.join(t1, t2, keys='a')
Out[12]: 
<Table length=4>
 a     b     c  
str1 int64 int64
---- ----- -----
   x     1    10
   x     2    10
   y     3    20
   y     3    30

It sounds like you want to do a table join on the name columns. This can be done as documented at https://docs.astropy.org/en/stable/table/operations.html#join.

E.g.

# Assume table_x and table_y
from astropy.table import join
table_xy = join(table_x, table_y, keys_left='Name_X', keys_right='Name_Y')

As a full example with non-unique key values:

In [10]: t1 = Table([['x', 'x', 'y', 'z'], [1,2,3,4]], names=['a', 'b'])

In [11]: t2 = Table([['x', 'y', 'y', 'Q'], [10,20,30,40]], names=['a', 'c'])

In [12]: table.join(t1, t2, keys='a')
Out[12]: 
<Table length=4>
 a     b     c  
str1 int64 int64
---- ----- -----
   x     1    10
   x     2    10
   y     3    20
   y     3    30

回复收藏 0 原文

遮了一弯 2025-01-17 20:17:17

我相信这个网站将是解决这个问题的最好朋友： https:// /pandas.pydata.org/pandas-docs/stable/user_guide/merging.html

所以理论上我相信你会想要这样的东西：

result = pd.merge(table_y, table_x, on="Name")

所以这里的关键区别是你可能需要使用该列表的名称，以便它们具有相同的名称。不过，这将在两个表之间的“名称”列上进行匹配，如果它们相同，则会将其放入结果变量中。从那里你可以对数据框做任何你想做的事情

I believe this site would be your best friend for this problem: https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html

So in theory I believe you would want something like this:

result = pd.merge(table_y, table_x, on="Name")

So the key difference here would be that you might need to play with the column names for the tables so that they have the same name. What this will do though is it will match on the "Name" column between the two tables and if they are the same then it will put it in the results variable. From there you can do whatever you would like with the dataframe

回复收藏 0 原文

~没有更多了~