检查熊猫数据框中的重复项
import pandas as pd
from io import StringIO
import requests
import matplotlib.pyplot as plt
import numpy as np
from scipy.interpolate import make_interp_spline
url = 'https://m-selig.ae.illinois.edu/ads/coord/b737a.dat'
response = requests.get(url).text
lines = []
for idx, line in enumerate(response.split('\n'), start=1):
if all([x.replace('.','').replace('-','').isdecimal() for x in line.split()]):
lines.append(line)
lines = [x.split() for x in lines]
df = pd.DataFrame(lines)
df = df.dropna(axis=0)
df = df.astype(float)
df = df[~(df > 1).any(1)]
print(df)
输出...
0 1
2 0.0000 0.0177
3 0.0023 0.0309
4 0.0050 0.0372
5 0.0076 0.0415
6 0.0143 0.0499
7 0.0249 0.0582
8 0.0495 0.0730
9 0.0740 0.0814
10 0.0990 0.0866
11 0.1530 0.0907
12 0.1961 0.0905
13 0.2504 0.0887
14 0.3094 0.0858
15 0.3520 0.0833
16 0.3919 0.0804
17 0.4477 0.0756
18 0.5034 0.0696
19 0.5593 0.0626
20 0.5965 0.0575
21 0.6488 0.0498
22 0.8351 0.0224
23 0.9109 0.0132
24 1.0000 0.0003
26 0.0000 0.0177
27 0.0022 0.0038
28 0.0049 -0.0018
29 0.0072 -0.0053
30 0.0119 -0.0106
31 0.0243 -0.0204
32 0.0486 -0.0342
33 0.0716 -0.0457
34 0.0979 -0.0516
35 0.1488 -0.0607
36 0.1953 -0.0632
37 0.2501 -0.0632
38 0.2945 -0.0626
39 0.3579 -0.0610
40 0.3965 -0.0595
41 0.4543 -0.0563
42 0.5050 -0.0527
43 0.5556 -0.0482
44 0.6063 -0.0427
45 0.6485 -0.0375
46 0.8317 -0.0149
47 0.9410 -0.0053
48 1.0000 -0.0003
这是我要刮取数据的网站的代码。我遇到了一个问题,即x点从零开始,上升,然后回到零,在绘图中间创建一条线,我不需要。 请注意,在第2和26行上有两个df [0] = 0
,如何在检测重复的地方编写代码?
import pandas as pd
from io import StringIO
import requests
import matplotlib.pyplot as plt
import numpy as np
from scipy.interpolate import make_interp_spline
url = 'https://m-selig.ae.illinois.edu/ads/coord/b737a.dat'
response = requests.get(url).text
lines = []
for idx, line in enumerate(response.split('\n'), start=1):
if all([x.replace('.','').replace('-','').isdecimal() for x in line.split()]):
lines.append(line)
lines = [x.split() for x in lines]
df = pd.DataFrame(lines)
df = df.dropna(axis=0)
df = df.astype(float)
df = df[~(df > 1).any(1)]
print(df)
output...
0 1
2 0.0000 0.0177
3 0.0023 0.0309
4 0.0050 0.0372
5 0.0076 0.0415
6 0.0143 0.0499
7 0.0249 0.0582
8 0.0495 0.0730
9 0.0740 0.0814
10 0.0990 0.0866
11 0.1530 0.0907
12 0.1961 0.0905
13 0.2504 0.0887
14 0.3094 0.0858
15 0.3520 0.0833
16 0.3919 0.0804
17 0.4477 0.0756
18 0.5034 0.0696
19 0.5593 0.0626
20 0.5965 0.0575
21 0.6488 0.0498
22 0.8351 0.0224
23 0.9109 0.0132
24 1.0000 0.0003
26 0.0000 0.0177
27 0.0022 0.0038
28 0.0049 -0.0018
29 0.0072 -0.0053
30 0.0119 -0.0106
31 0.0243 -0.0204
32 0.0486 -0.0342
33 0.0716 -0.0457
34 0.0979 -0.0516
35 0.1488 -0.0607
36 0.1953 -0.0632
37 0.2501 -0.0632
38 0.2945 -0.0626
39 0.3579 -0.0610
40 0.3965 -0.0595
41 0.4543 -0.0563
42 0.5050 -0.0527
43 0.5556 -0.0482
44 0.6063 -0.0427
45 0.6485 -0.0375
46 0.8317 -0.0149
47 0.9410 -0.0053
48 1.0000 -0.0003
This is my code for a website I'm scraping data from. I'm running into a problem where the x points start from zero, go up, and come back down to zero creating a line in the middle of the plot which I don't need.
Notice how there is two df[0] = 0
on rows 2 and 26, How can I write a code where it detects duplicates?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
尝试以下一项?
从循环内的循环
中
Try one of the following?
Out of the loop
Inside your loop