Identifying pandas DataFrame columns that contain both numbers and strings

Posted on 2025-01-19 20:36:53

I have created the following dataframe (called df):

import pandas as pd

d = {'ltv': [1, 22, 45, 78], 'age': [33, 43, 54, 65], 'job': ['Salaried', 'Salaried', 'Salaried', 'Owner'], 'UniqueID': ['A1', 'A2', 'A3', 'A4']}
df = pd.DataFrame(data=d)

which looks like this:

print(df)

   ltv  age       job UniqueID
0    1   33  Salaried       A1
1   22   43  Salaried       A2
2   45   54  Salaried       A3
3   78   65     Owner       A4

I have checked its column types:

df.info()

 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   ltv       4 non-null      int64 
 1   age       4 non-null      int64 
 2   job       4 non-null      object
 3   UniqueID  4 non-null      object

I am only interested in the two object columns, job and UniqueID.
As you can see:

  • job contains strings made up of letters only
  • UniqueID contains strings that mix letters and digits

I want to be able to identify the column (in this case UniqueID) whose values contain both letters and digits.

If I use the following code for UniqueID:

print(df['UniqueID'].str.isalnum())

0    True
1    True
2    True
3    True

I see that it returns True for all records, which is great. Now, if I use the same code for job, I get the same results:

print(df['job'].str.isalnum())

0    True
1    True
2    True
3    True

So, how can I identify in pandas which column contains both strings and numbers (in this example: UniqueID)?


Comments (2)

兔小萌 2025-01-26 20:36:53

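You can define your own function:

def findchrandnum(x):
    try:
        # Alphanumeric, but neither all letters nor all digits
        return all(x.str.isalnum() & ~x.str.isalpha() & ~x.str.isdigit())
    except (AttributeError, TypeError):
        # Non-string columns (e.g. int64) cannot use the .str accessor
        return False

df.apply(findchrandnum)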
Out[66]: 
ltv         False
age         False
job         False
UniqueID     True
dtype: bool
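
The reason `isalnum()` alone cannot separate the two columns is that it is true for any purely alphanumeric value, so 'Salaried' and 'A1' both pass; excluding all-letter and all-digit values leaves only the mixed ones. Below is a minimal sketch of the same idea that skips numeric columns up front instead of relying on the exception. The helper name `has_letters_and_digits` is hypothetical, and the sketch assumes every non-numeric column holds plain strings:

```python
import pandas as pd

d = {'ltv': [1, 22, 45, 78], 'age': [33, 43, 54, 65],
     'job': ['Salaried', 'Salaried', 'Salaried', 'Owner'],
     'UniqueID': ['A1', 'A2', 'A3', 'A4']}
df = pd.DataFrame(data=d)

def has_letters_and_digits(col):
    # Hypothetical helper: True only if every value is alphanumeric
    # but is neither all letters nor all digits.
    s = col.astype(str)
    mixed = s.str.isalnum() & ~s.str.isalpha() & ~s.str.isdigit()
    return bool(mixed.all())

# Only look at the non-numeric columns (job and UniqueID here)
result = df.select_dtypes(exclude="number").apply(has_letters_and_digits)
print(result)
```

On the example frame this should mark UniqueID as True and job as False; replacing `mixed.all()` with `mixed.any()` would instead flag a column as soon as one value mixes letters and digits.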

天冷不及心凉 2025-01-26 20:36:53

You can use the apply method on the column you want to check to look for digits in each row. The sum gives you the number of values in that column that contain a digit:

col = 'UniqueID'
df[col].apply(
    lambda val: any(ch.isdigit() for ch in val)
).sum()

If you know that the values in a column are consistent, you can also check only the first value.
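
To find the column rather than check one you already picked, the same digit test can be run over every non-numeric column. This is a rough sketch, not part of the answer above: the variable names `text_cols`, `digit_counts`, and `cols_with_digits` are made up for illustration, and values are passed through `str()` on the assumption that a column might hold something other than strings:

```python
import pandas as pd

d = {'ltv': [1, 22, 45, 78], 'age': [33, 43, 54, 65],
     'job': ['Salaried', 'Salaried', 'Salaried', 'Owner'],
     'UniqueID': ['A1', 'A2', 'A3', 'A4']}
df = pd.DataFrame(data=d)

# Skip the numeric columns; only text columns are of interest
text_cols = df.select_dtypes(exclude="number")

# For each text column, count the values that contain at least one digit
digit_counts = text_cols.apply(
    lambda col: col.map(lambda val: any(ch.isdigit() for ch in str(val))).sum()
)
print(digit_counts)

# Names of the columns where at least one value contains a digit
cols_with_digits = digit_counts[digit_counts > 0].index.tolist()
print(cols_with_digits)
```

Comparing each count against `len(df)` instead of 0 would require every value in the column to contain a digit.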
