Check for duplicates in a pandas DataFrame

Posted 2025-02-10 22:02:05

import pandas as pd
from io import StringIO
import requests
import matplotlib.pyplot as plt
import numpy as np
from scipy.interpolate import make_interp_spline

url = 'https://m-selig.ae.illinois.edu/ads/coord/b737a.dat'
response = requests.get(url).text

# Keep only lines whose tokens all look like numbers (this drops the title line).
lines = []
for line in response.split('\n'):
    if all(x.replace('.', '').replace('-', '').isdecimal() for x in line.split()):
        lines.append(line)

lines = [x.split() for x in lines]
df = pd.DataFrame(lines)
df = df.dropna(axis=0)           # blank lines split into [] and become all-None rows
df = df.astype(float)
df = df[~(df > 1).any(axis=1)]   # drop numeric rows that are not x/c, y/c coordinates
print(df)

output...

         0       1
2   0.0000  0.0177
3   0.0023  0.0309
4   0.0050  0.0372
5   0.0076  0.0415
6   0.0143  0.0499
7   0.0249  0.0582
8   0.0495  0.0730
9   0.0740  0.0814
10  0.0990  0.0866
11  0.1530  0.0907
12  0.1961  0.0905
13  0.2504  0.0887
14  0.3094  0.0858
15  0.3520  0.0833
16  0.3919  0.0804
17  0.4477  0.0756
18  0.5034  0.0696
19  0.5593  0.0626
20  0.5965  0.0575
21  0.6488  0.0498
22  0.8351  0.0224
23  0.9109  0.0132
24  1.0000  0.0003
26  0.0000  0.0177
27  0.0022  0.0038
28  0.0049 -0.0018
29  0.0072 -0.0053
30  0.0119 -0.0106
31  0.0243 -0.0204
32  0.0486 -0.0342
33  0.0716 -0.0457
34  0.0979 -0.0516
35  0.1488 -0.0607
36  0.1953 -0.0632
37  0.2501 -0.0632
38  0.2945 -0.0626
39  0.3579 -0.0610
40  0.3965 -0.0595
41  0.4543 -0.0563
42  0.5050 -0.0527
43  0.5556 -0.0482
44  0.6063 -0.0427
45  0.6485 -0.0375
46  0.8317 -0.0149
47  0.9410 -0.0053
48  1.0000 -0.0003

This is my code for scraping airfoil coordinates from a website. The problem is that the x values start at zero, go up, and then return to zero, so the plot draws a line across the middle that I don't want.
Notice that df[0] is 0 on both rows 2 and 26. How can I write code that detects these duplicates?
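pandas can flag repeated rows directly with `DataFrame.duplicated`, which returns a boolean Series that is True for every occurrence after the first. A minimal sketch, using a handful of rows copied from the output above in place of the scraped `df` (the small sample frame is an illustration, not the full data):

```python
import pandas as pd

# A few rows taken from the printed output above, keeping their original index labels.
df = pd.DataFrame(
    {0: [0.0000, 0.0023, 1.0000, 0.0000, 0.0022, 1.0000],
     1: [0.0177, 0.0309, 0.0003, 0.0177, 0.0038, -0.0003]},
    index=[2, 3, 24, 26, 27, 48],
)

# Whole-row check: True where the (x, y) pair already appeared earlier.
dup_rows = df.duplicated()
print(df.index[dup_rows].tolist())  # [26]

# x column only: also catches x == 1.0 repeating at the trailing edge.
dup_x = df.duplicated(subset=[0])
print(df.index[dup_x].tolist())  # [26, 48]

# keep=False marks every member of a duplicate group, including the first one.
print(df.index[df.duplicated(subset=[0], keep=False)].tolist())  # [2, 24, 26, 48]
```

Whether to compare whole rows or just column 0 depends on the goal: only row 26 repeats an entire point, while x alone repeats at both the leading and trailing edges.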

Comments (1)

看海 2025-02-17 22:02:06

Try one of the following:

After the loop, on the DataFrame:

df1 = df.drop_duplicates(keep='first', inplace=False, ignore_index=False)  # drops rows whose (x, y) pair already appeared
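`drop_duplicates` removes row 26, but the last upper-surface point (1.0000, 0.0003) is then followed directly by row 27 (0.0022, 0.0038), so a single line plot still draws a segment across the airfoil. The file lists the upper surface and then the lower surface, each starting at x = 0, so another option (a different technique from the answer above) is to split the data where x drops. A sketch assuming the same two-column `df`; `split_surfaces` is a hypothetical helper:

```python
import pandas as pd

def split_surfaces(df):
    """Hypothetical helper: split an upper-then-lower coordinate table at the x reset."""
    # diff() subtracts the previous x; a negative step marks the jump back to the leading edge.
    resets = (df[0].diff() < 0).to_numpy()
    if not resets.any():
        return df, df.iloc[0:0]
    cut = int(resets.argmax())  # position of the first reset
    return df.iloc[:cut], df.iloc[cut:]

# A few rows from the printed output, standing in for the scraped df.
df = pd.DataFrame(
    {0: [0.0000, 0.0990, 1.0000, 0.0000, 0.0979, 1.0000],
     1: [0.0177, 0.0866, 0.0003, 0.0177, -0.0516, -0.0003]},
    index=[2, 10, 24, 26, 34, 48],
)

upper, lower = split_surfaces(df)
print(upper.index.tolist(), lower.index.tolist())  # [2, 10, 24] [26, 34, 48]

# One continuous path: upper surface trailing edge -> leading edge, then lower surface
# leading edge -> trailing edge. lower.iloc[1:] skips the repeated leading-edge point,
# assuming both surfaces start at the same (x, y), as they do in this file.
outline = pd.concat([upper.iloc[::-1], lower.iloc[1:]])
print(outline.index.tolist())  # [24, 10, 2, 34, 48]
```

Passing `outline[0]` and `outline[1]` to a single `plt.plot` call, or plotting `upper` and `lower` separately, avoids the segment from the trailing edge back to the leading edge.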

Inside your loop:

lines = []
seen = set()  # every numeric line encountered so far
for line in response.split('\n'):
    if all(x.replace('.', '').replace('-', '').isdecimal() for x in line.split()):
        # keep a line only the first time it appears (exact text match)
        if line not in seen:
            lines.append(line)
        seen.add(line)
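The same filtering can be done without a separate tracking collection: `dict.fromkeys` keeps the first occurrence of each key in insertion order (guaranteed since Python 3.7), so it removes repeated lines in one pass. A minimal sketch; `numeric_lines` and the `sample` string are hypothetical stand-ins for the loop and for `response`, and unlike the loop this version also skips blank lines:

```python
def numeric_lines(text):
    """Keep lines whose tokens all look like numbers, dropping exact repeats."""
    def is_number(token):
        return token.replace('.', '').replace('-', '').isdecimal()

    candidates = [line for line in text.split('\n')
                  if line.strip() and all(is_number(tok) for tok in line.split())]
    # dict keys are unique and keep insertion order, so this is an ordered de-duplication.
    return list(dict.fromkeys(candidates))

# Hypothetical excerpt in the spirit of the .dat file (not the real file contents).
sample = "BOEING 737\n0.0000 0.0177\n0.0023 0.0309\n\n0.0000 0.0177\n0.0022 0.0038\n"
print(numeric_lines(sample))
# ['0.0000 0.0177', '0.0023 0.0309', '0.0022 0.0038']
```

Like the loop above, this only catches lines that match character for character; two lines with the same numbers but different spacing would both be kept.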