Why do Pandas plots look different when using CSV or XLSX data?

Posted on 2025-01-21 03:20:58

I've got two datasets with exactly the same data, but they look different when plotted the same way. One is an .xlsx file and the other is a .csv file.

Here are the two scripts.

For the CSV:

import numpy as np
import pandas as pd
import statsmodels.api as sm
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()
from sklearn.cluster import KMeans

daten = pd.read_csv(r"Path\Übungsdaten.csv", header=0, sep=";")


print("Total rows: {0}".format(len(daten)))
print(daten.columns)

plt.scatter(daten['InsuredValue'], daten['Policy'])
plt.xlim(2500000)
plt.ylim(100100)
plt.show()
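A plausible cause of the difference, assuming the CSV was exported with German locale settings (the `sep=";"` argument hints at this, but the file contents are not shown): values such as `1.250.000,50` use `.` for thousands and `,` for decimals, which `read_csv` does not parse as numbers by default. The column then stays as text, and matplotlib plots strings as categories with one tick label per distinct value, which would produce a solid black band of overlapping labels. The sketch below uses invented sample values, not the asker's data, to show how to check the dtypes.

```python
import io

import pandas as pd

# Invented sample in a German-locale layout (NOT the asker's data):
# ';' separates fields, '.' groups thousands, ',' is the decimal mark.
sample = "Policy;InsuredValue\n100101;1.250.000,50\n100102;980.000,00\n"

# Read with sep=";" only, as in the question.
raw = pd.read_csv(io.StringIO(sample), sep=";")

# Policy parses as integers, but InsuredValue stays text because
# '1.250.000,50' is not a number in pandas' default number format.
print(raw.dtypes)
print(pd.api.types.is_numeric_dtype(raw["InsuredValue"]))
```

If the real file shows a non-numeric dtype for `InsuredValue` (or `Policy`), that would explain why the CSV plot differs from the .xlsx one, where Excel stores the cells as numbers.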

And for the .xlsx file:


import numpy as np
import pandas as pd
import statsmodels.api as sm
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()
from sklearn.cluster import KMeans

daten = pd.read_excel(r"Path\Übungsdaten.xlsx")


print("Total rows: {0}".format(len(daten)))

plt.scatter(daten['InsuredValue'], daten['Policy'])

plt.xlim(2500000)
plt.ylim(100100)
plt.show()
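A side note on the limits used in both scripts: `plt.xlim` and `plt.ylim` forward their arguments to `Axes.set_xlim` / `Axes.set_ylim`, and a single positional value sets only the lower bound (left or bottom), not the upper one. Assuming the intent was to cap the axes at 2,500,000 and 100,100, keyword arguments make that explicit. A minimal sketch with made-up points:

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
# Made-up points, for illustration only.
ax.scatter([1_000_000, 2_000_000], [100_020, 100_080])

# One positional value sets only the LEFT limit (same as plt.xlim(2500000)).
ax.set_xlim(2_500_000)
left_only = ax.get_xlim()[0]

# Keyword arguments state both ends explicitly.
ax.set_xlim(left=0, right=2_500_000)
ax.set_ylim(bottom=100_000, top=100_100)
x_limits, y_limits = ax.get_xlim(), ax.get_ylim()
print(left_only, x_limits, y_limits)
plt.close(fig)
```

The lower bounds `0` and `100_000` here are arbitrary choices for the example.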

Here are the plots:

[Plot: CSV with plt.xlim(2500000) and plt.ylim(100100)]

[Plot: CSV without axis limits]

[Plot: .xlsx]

My questions: first, why is there a black bar at the bottom of the first two plots? (I'm guessing it is every single value of "InsuredValue".) Second, how can I make the CSV plot look like the .xlsx plot, with the same axis scaling?

Thank you very much
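One way to get numeric columns from the CSV, assuming the cause is German number formatting (an assumption, since the file contents are not shown): pass `decimal=","` and `thousands="."` to `read_csv`, or clean an already-loaded text column and convert it with `pd.to_numeric`. With numeric columns, matplotlib draws a continuous axis, as in the .xlsx plot, instead of one category per distinct string. The sample values below are invented.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend for this sketch
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical German-style CSV (invented values): ';' separator,
# '.' as thousands separator, ',' as decimal mark.
sample = "Policy;InsuredValue\n100101;1.250.000,50\n100102;980.000,00\n"

# Option 1: tell read_csv how the numbers are formatted.
daten = pd.read_csv(io.StringIO(sample), sep=";", decimal=",", thousands=".")

# Option 2: repair an already-loaded text column, then convert it.
raw = pd.read_csv(io.StringIO(sample), sep=";")
repaired = pd.to_numeric(
    raw["InsuredValue"]
    .str.replace(".", "", regex=False)   # drop thousands separators
    .str.replace(",", ".", regex=False)  # decimal comma -> decimal point
)

print(daten.dtypes)
print(repaired.tolist())

# Both columns are numeric now, so the scatter uses continuous axes.
fig, ax = plt.subplots()
ax.scatter(daten["InsuredValue"], daten["Policy"])
plt.close(fig)
```

Option 1 is usually simpler when the whole file follows one locale; option 2 helps when only some columns are affected.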
