How do I drop columns whose rows contain "url" or "http" without knowing the column names?

Posted on 2025-01-16 08:50:29

How can I drop column A in Python because its values contain "https://"?

Back story: I have a 500-column DataFrame in which 250 columns hold "https://" links in their rows, each explaining what the preceding variable is.

The goal is to loop through the df and drop every column whose values contain "http://" or "https://", without knowing the column names in advance.

A	B
https://mywebsite	25
https://mywebsite	42

My desired output is:

B
25
42

一百个冬季 2025-01-23 08:50:29

The following snippet should work; it removes any column that contains URLs:

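to_drop = []
for column in df:
    try:
        # True if at least one value in the column starts with 'https://'
        has_url = df[column].str.startswith('https://').any()
    except AttributeError:
        # dtype is not string, so .str is unavailable; never drop this column
        has_url = False

    if has_url:
        to_drop.append(column)

df.drop(columns=to_drop, inplace=True)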

For each column, it checks whether any value starts with 'https://'. If at least one does, the column name is added to the 'to_drop' list. The columns in that list are then dropped in place.
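As a minimal sketch of the same idea on the sample data from the question: the helper name `has_url` and the regular expression `https?://` (which also covers plain "http://" links) are assumptions for illustration, not part of the answer above.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "A": ["https://mywebsite", "https://mywebsite"],
    "B": [25, 42],
})

def has_url(series):
    """Return True if any value starts with http:// or https://."""
    try:
        # str.match anchors the pattern at the start of each value;
        # na=False treats missing values as non-matches.
        return bool(series.str.match(r"https?://", na=False).any())
    except AttributeError:
        return False  # non-string column: the .str accessor is unavailable

# Collect the names first, then drop them all in one call.
url_columns = [name for name in df.columns if has_url(df[name])]
result = df.drop(columns=url_columns)
print(result)
```

Here `df.drop` returns a new DataFrame instead of modifying `df` in place, which keeps the original available for comparison; only column B remains in `result`.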

The version below will only drop a column if more than 50% of the values in it are URLs:

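to_drop = []
for column in df:
    try:
        # Fraction of values starting with 'https://' must exceed 0.5
        has_url = df[column].str.startswith('https://').mean() > 0.5
    except AttributeError:
        # dtype is not string, so .str is unavailable; never drop this column
        has_url = False

    if has_url:
        to_drop.append(column)

df.drop(columns=to_drop, inplace=True)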

You can change 0.5 to another number between 0 and 1 to set the fraction of values that must be URLs before the column is dropped.
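The threshold can also be exposed as a parameter. The sketch below is one possible way to do that; the function name `drop_url_columns`, the `threshold` argument, and the sample column names are hypothetical, and the pattern `https?://` is an assumption that treats "http://" links as URLs too.

```python
import pandas as pd

def drop_url_columns(df, threshold=0.5):
    """Return a copy of df without columns where more than
    `threshold` of the values start with http:// or https://."""
    to_drop = []
    for name in df.columns:
        try:
            # Mean of a boolean Series is the fraction of True values.
            share = df[name].str.match(r"https?://", na=False).mean()
        except AttributeError:
            continue  # non-string column: keep it
        if share > threshold:
            to_drop.append(name)
    return df.drop(columns=to_drop)

# Made-up data: 75% URLs, 25% URLs, and a numeric column.
df = pd.DataFrame({
    "mostly_urls": ["https://a", "http://b", "n/a", "https://c"],
    "few_urls": ["https://a", "x", "y", "z"],
    "numbers": [1, 2, 3, 4],
})

strict = drop_url_columns(df, threshold=0.5)  # drops only "mostly_urls"
loose = drop_url_columns(df, threshold=0.2)   # also drops "few_urls"
```

A lower threshold drops more aggressively, so it may be worth inspecting the list of dropped columns before applying this to the full 500-column frame.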
