Linear regression - how to predict Estimated Relative Performance?

Posted on 2025-01-27 16:57:43

Paul needs a laptop that is fast enough. One of the main parameters of a computer that he must focus on is the CPU. In this project we need to forecast CPU performance, which is characterized in terms of cycle time, memory capacity, and so on.

It is a linear regression problem, and you should predict the Estimated Relative Performance column.

I am new to Python. Could anybody help me with the code for this task?

CSV file (on Google Drive)

This is what I have done, but I probably did not understand the task.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm

# Load the data and look at the summary statistics
data = pd.read_csv("Computer_Hardware.csv")
print(data)
print(data.describe())

# Scatter plot of cycle time against estimated relative performance
y = data["Machine Cycle Time in nanoseconds"]
x1 = data["Estimated Relative Performance"]
plt.scatter(x1, y)
plt.xlabel("Estimated Relative Performance", fontsize=20)
plt.ylabel("Machine Cycle Time in nanoseconds", fontsize=20)
plt.show()

# Ordinary least squares fit of y on x1, with an intercept term
x = sm.add_constant(x1)
results = sm.OLS(y, x).fit()
print(results.summary())

Comments (3)

雨的味道风的声音 2025-02-03 16:57:43

For any fitted statsmodels model, you can extract the predicted values with the predict() method and then add them to your DataFrame.

data['predicted'] = results.predict()

Your model may need more work: for now it uses only one variable, and you may get better predictions from a model that uses more variables.

y = b0 + b1 * x1

According to the task description, the problem concerns a "CPU which is characterized in terms of cycle time and memory capacity and so on", so more than one predictor is available.

One suggestion is to extend your model by writing a formula with the statsmodels formula API. For that, I prefer to remove all spaces from the column names first.

# Replace spaces in the column names with underscores so they can be used in a formula
data = data.rename(columns=lambda col: col.replace(' ', '_'))

# Fit a model using more variables
import statsmodels.formula.api as sm2

# Adjacent string literals are concatenated into a single formula string
formula = ('Estimated_Relative_Performance ~ '
           'Machine_Cycle_Time_in_nanoseconds + '
           'Maximum_Main_Memory_in_kilobytes + '
           'Cache_Memory_in_Kilobytes + '
           'Maximum_Channels_in_Units')
print(formula)

results2 = sm2.ols(formula, data).fit()
print(results2.summary())

# Add the fitted values of the extended model to the frame
data['predicted2'] = results2.predict()
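The fitted values above cover only the rows the model was trained on. To estimate the performance of a machine that is not in the data, a formula-based model can be given a new DataFrame with the predictor columns. The following is a rough sketch under stated assumptions: the table `toy` and the `new_machine` row contain made-up numbers, and only three of the predictors are used to keep the example short.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Made-up rows for illustration only; column names follow the underscore renaming.
toy = pd.DataFrame({
    "Estimated_Relative_Performance": [20, 35, 60, 110, 180, 260, 75, 140],
    "Machine_Cycle_Time_in_nanoseconds": [800, 400, 200, 100, 50, 25, 150, 75],
    "Maximum_Main_Memory_in_kilobytes": [2000, 4000, 8000, 16000, 32000, 64000, 8000, 24000],
    "Cache_Memory_in_Kilobytes": [0, 8, 16, 32, 64, 128, 8, 64],
})

formula = ("Estimated_Relative_Performance ~ "
           "Machine_Cycle_Time_in_nanoseconds + "
           "Maximum_Main_Memory_in_kilobytes + "
           "Cache_Memory_in_Kilobytes")
model = smf.ols(formula, data=toy).fit()

# A hypothetical machine that is not part of the training data.
new_machine = pd.DataFrame({
    "Machine_Cycle_Time_in_nanoseconds": [120],
    "Maximum_Main_Memory_in_kilobytes": [12000],
    "Cache_Memory_in_Kilobytes": [16],
})

# With the formula API, predict() takes a DataFrame holding the predictor columns;
# the intercept is added automatically.
new_prediction = model.predict(new_machine)
print(new_prediction)
```

With so few made-up rows the predicted number itself means nothing; the point is only the shape of the call.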


仙女山的月亮 2025-02-03 16:57:43

Keras is one of the easiest libraries to use if you don't have much experience with Python. This is a good tutorial: https://machinelearningmastery.com/tutorial-first-neural-network-python-keras/

深者入戏 2025-02-03 16:57:43

I studied some blogs on the topic and came up with this code:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import metrics
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

raw_data = pd.read_csv("Computer_Hardware.csv")

# Predictor columns
x = raw_data[['Machine Cycle Time in nanoseconds',
              'Minimum Main Memory in Kilobytes', 'Maximum Main Memory in kilobytes',
              'Cache Memory in Kilobytes', 'Minimum Channels in Units',
              'Maximum Channels in Units', 'Published Relative Performance']]

# Target column to predict
y = raw_data['Estimated Relative Performance']

# Hold out 30% of the rows for testing
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3)

model = LinearRegression()
model.fit(x_train, y_train)

# Fitted coefficients and intercept
print(model.coef_)
print(model.intercept_)
print(pd.DataFrame(model.coef_, x.columns, columns=['Coeff']))

predictions = model.predict(x_test)

# Histogram of the residuals on the test set
plt.hist(y_test - predictions)
plt.show()

# Error metrics on the test set: MAE, MSE, RMSE
print(metrics.mean_absolute_error(y_test, predictions))
print(metrics.mean_squared_error(y_test, predictions))
print(np.sqrt(metrics.mean_squared_error(y_test, predictions)))
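
Because train_test_split above is called without a seed, the split and the error metrics change from run to run. One possible refinement, sketched here on synthetic data, is to pass random_state and to report R^2 alongside RMSE. The column names `cycle_time` and `max_memory`, the coefficients used to generate `y`, and the seed values are all made up for illustration and are not taken from the CSV.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the CSV: made-up numbers, two hypothetical predictors.
rng = np.random.default_rng(0)
n = 200
x = pd.DataFrame({
    "cycle_time": rng.uniform(20, 1500, n),
    "max_memory": rng.uniform(64, 64000, n),
})
# Linear signal plus Gaussian noise, so a linear model is appropriate here.
y = 30 - 0.02 * x["cycle_time"] + 0.004 * x["max_memory"] + rng.normal(0, 5, n)

# random_state makes the split, and therefore the metrics, reproducible.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.3, random_state=42
)

model = LinearRegression().fit(x_train, y_train)
predictions = model.predict(x_test)

# RMSE is in the units of the target; R^2 is the share of variance explained.
rmse = np.sqrt(mean_squared_error(y_test, predictions))
r2 = r2_score(y_test, predictions)
print(f"RMSE: {rmse:.2f}, R^2: {r2:.3f}")
```

On the real data the same two lines computing `rmse` and `r2` can be applied to the `y_test` and `predictions` variables from the code above.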