Paul needs a laptop that is fast enough. One of the main computer parameters he must focus on is the CPU. In this project we need to forecast the performance of a CPU, which is characterized in terms of cycle time, memory capacity, and so on.
It is a linear regression problem, and you should predict the Estimated Relative Performance column.
I am new to Python. Could anybody help me with the code for this task?
CSV file (on Google Drive)
This is what I have done, but I probably did not understand the task.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
data = pd.read_csv("Computer_Hardware.csv")
print(data.head())
print(data.describe())
# Target: the task asks us to predict Estimated Relative Performance
y = data["Estimated Relative Performance"]
# Single predictor for now: machine cycle time
x1 = data["Machine Cycle Time in nanoseconds"]
plt.scatter(x1, y)
plt.xlabel("Machine Cycle Time in nanoseconds", fontsize=20)
plt.ylabel("Estimated Relative Performance", fontsize=20)
plt.show()
# Add an intercept column (needed only once), then fit ordinary least squares
x = sm.add_constant(x1)
results = sm.OLS(y, x).fit()
print(results.summary())
In any fitted model from `statsmodels` you can extract the predicted values with the method `predict()` and then add them to your frame. Your model may need more work: for now it uses only one variable, and you will probably get a better prediction from a model that uses more variables.
According to the task text, the problem concerns a "... CPU which is characterized in terms of cycle time and memory capacity and so on", so several features are involved, not just one.
One option is to extend your model with the `statsmodels` formula API, which lets you write the model as a formula. In your case, I would first remove all spaces from the column names.
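A sketch of the formula approach, assuming `pandas` and `statsmodels` (whose formula interface lives in `statsmodels.formula.api`). The helpers `to_identifier`, `clean_column_names` and `fit_formula` are hypothetical names; spaces are replaced with underscores so that each column can appear as a plain term in the formula string.

```python
import re

import pandas as pd
import statsmodels.formula.api as smf


def to_identifier(name: str) -> str:
    """Replace runs of whitespace with '_' so the name can be used as a formula term."""
    return re.sub(r"\s+", "_", str(name).strip())


def clean_column_names(data: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `data` whose column names contain no spaces (hypothetical helper)."""
    out = data.copy()
    out.columns = [to_identifier(c) for c in out.columns]
    return out


def fit_formula(data: pd.DataFrame, target: str, features: list[str]):
    """Fit OLS of `target` on several `features` with the statsmodels formula API.

    `target` and `features` are passed with their original (spaced) column names.
    """
    clean = clean_column_names(data)
    formula = to_identifier(target) + " ~ " + " + ".join(to_identifier(f) for f in features)
    return smf.ols(formula, data=clean).fit()
```

Only whitespace is replaced here; a column name containing other characters, or starting with a digit, would need further cleaning before it works as a formula term.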
Keras is one of the easiest libraries to use if you don't have much experience in Python. This is a good tutorial: https://machinelearningmastery.com/tutorial-first-neural-network-python-keras/
I studied some blogs on this topic and came up with this code: