Extract a column whose name contains one of two possible strings

Posted on 2025-01-17 18:52:30

I am looping through many txt files that have inconsistent data naming practices.
I would like to extract a column that contains specific data; however, it has a few different names depending on the file, and it is often at a different position within the header row.

So far I have:

if "Var_version1" in df1.columns or 'Var_version2' in df1.columns: 
    df2 = df1[["Other_var1","Other_var2","Var_version1"]].copy()

The if/or check is correct, but the extraction is a holdover from before I realized there were different naming conventions.
How do I extract the entire column if the header contains a particular string or an alternate string within its name? (Note: the header name may be xxxxVar_version1xxxx, not just Var_version1.)
Thank you!

风筝有风，海豚有海 2025-01-24 18:52:30

You can use a regex, or a list of possibilities joined into a regex, combined with DataFrame.filter:

sel = df1.filter(regex='Var_version[12]').columns.to_list()
df2 = df1[["Other_var1","Other_var2"]+sel]

or:

import re
possibilities = ['Var_version1', 'Var_version2']
sel = (df1.filter(regex='|'.join(re.escape(x) for x in possibilities))
          .columns.to_list())
df2 = df1[["Other_var1","Other_var2"]+sel]
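
For illustration, here is a minimal self-contained sketch of the second approach on made-up data. The sample DataFrame and the helper name extract_columns are hypothetical, not from the question; the sketch only assumes that filter(regex=...) keeps every column whose label matches the pattern anywhere in the name, which is what makes xxxxVar_version2xxxx match. It also returns None when a file has neither variant, which is one possible way to handle that case inside the loop.

```python
import re

import pandas as pd


def extract_columns(df, fixed, possibilities):
    """Hypothetical helper: return the fixed columns plus any column whose
    name contains one of the candidate substrings, or None if none match."""
    # Escape each candidate so it is matched literally, then join with "|".
    pattern = "|".join(re.escape(p) for p in possibilities)
    # filter(regex=...) selects columns whose label matches via re.search,
    # so the candidate may appear anywhere inside the column name.
    sel = df.filter(regex=pattern).columns.to_list()
    if not sel:
        return None
    return df[fixed + sel].copy()


# Made-up data standing in for one of the txt files.
df1 = pd.DataFrame({
    "Other_var1": [1, 2],
    "Other_var2": [3, 4],
    "xxxxVar_version2xxxx": [5, 6],
    "Unrelated": [7, 8],
})

df2 = extract_columns(
    df1,
    fixed=["Other_var1", "Other_var2"],
    possibilities=["Var_version1", "Var_version2"],
)
print(df2)
```

If a file could contain both variants, sel will hold both columns, so you may want to decide which one to keep before building df2.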
