Python multiprocessing code runs forever

Posted on 2025-01-09 05:32:08


I am trying to train 2 models concurrently using sklearn and python's built-in multiprocessing library.

def train_model(model, X, y):
    model.fit(X, y)
    return model

from multiprocessing import Process

p1 = Process(target = train_model, args = (dt, X_train, y_train))
p2 = Process(target = train_model, args = (lr, X_train, y_train))

p1.start()
p2.start()

p1.join()
p2.join()

However, when I run this code, it keeps running indefinitely. Training either model on its own takes no more than a few seconds.

If my approach is wrong, how can I train two models in parallel?

Edit: The Python version is 3.8.0. I am running this code in Jupyter Notebook on Windows 10.

Edit 2: The problem seems to lie with Jupyter Notebook. The same code runs without any problem on Google Colab.

Edit 3: I am now trying to run this code from my terminal:

dt = DecisionTreeClassifier(class_weight='balanced')
lr = LogisticRegression(class_weight='balanced')


def train_model(model, X, y):
    model.fit(X, y)
    return model


p1 = Process(target=train_model, args=(dt, X_train, y_train))
p2 = Process(target=train_model, args=(lr, X_train, y_train))

if __name__ == '__main__':
    p1.start()
    p2.start()
    p1.join()
    p2.join()

    dt_pred = dt.predict(X_test)
    lr_pred = lr.predict(X_test)

    print("Classification report for Decision Tree:",classification_report(y_test,dt_pred))
    print("Classification report for Logistic Regression", classification_report(y_test, lr_pred))

and get the following error:

Traceback (most recent call last):
  File "D:/Bennett/HPC/E19CSE058_Lab3/E19CSE058_Lab3_Pt2.py", line 33, in <module>
    dt_pred = dt.predict(X_test)
  File "E:\Anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 436, in predict
    check_is_fitted(self)
  File "E:\Anaconda3\lib\site-packages\sklearn\utils\validation.py", line 63, in inner_f
    return f(*args, **kwargs)
  File "E:\Anaconda3\lib\site-packages\sklearn\utils\validation.py", line 1041, in check_is_fitted
    raise NotFittedError(msg % {'name': type(estimator).__name__})
sklearn.exceptions.NotFittedError: This DecisionTreeClassifier instance is not fitted yet. Call 'fit' with appropriate arguments before using this estimator.

It seems that the training done inside the child processes isn't reflected in the parent process. How do I get around this?
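The effect described in Edit 3 can be reproduced without sklearn. The following is a minimal sketch that uses a hypothetical FakeModel class as a stand-in for an estimator: the child process receives its own copy of the arguments, so the in-place change made by fit() there never reaches the object in the parent.

```python
from multiprocessing import Process


class FakeModel:
    """Hypothetical stand-in for an estimator: fit() mutates the object in place."""

    def __init__(self):
        self.fitted = False

    def fit(self):
        self.fitted = True
        return self


def train(model):
    # Runs in the child process and mutates the child's copy only.
    model.fit()
    print("inside child, fitted =", model.fitted)


def run_demo():
    model = FakeModel()
    p = Process(target=train, args=(model,))
    p.start()
    p.join()
    # The parent's object was never touched by the child.
    return model.fitted


if __name__ == "__main__":
    print("in parent after join, fitted =", run_demo())
```

This mirrors the NotFittedError in the traceback: dt and lr were fitted, but only as copies inside the child processes.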


Answer by 眉目亦如画i, 2025-01-16 05:32:09

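Aaron has the right answer. On Windows, multiprocessing starts each child in a fresh interpreter that re-imports your script from the beginning. Any unguarded module-level code that creates and starts processes therefore runs again in every child, which tries to launch two more processes, each of which would launch two more, and so on (in practice, Python detects this and the child fails with a RuntimeError about starting a new process before the current process has finished its bootstrapping phase). Anything that must run ONLY in the main process needs to be protected by the "__main__" test: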

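from multiprocessing import Process

def train_model(model, X, y):
    model.fit(X, y)
    return model  # note: Process ignores the target's return value

def main():
    # dt, lr, X_train and y_train are assumed to be defined at module level, as in the question.
    p1 = Process(target=train_model, args=(dt, X_train, y_train))
    p2 = Process(target=train_model, args=(lr, X_train, y_train))

    p1.start()
    p2.start()

    p1.join()
    p2.join()

if __name__ == "__main__":
    # Only the parent runs this; spawned children re-import the module but skip this block.
    main()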
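The guard stops the runaway process creation, but it does not address Edit 3: Process discards the return value of its target, and each child fits its own pickled copy of dt or lr, so the parent's objects stay unfitted. One common way to get the fitted models back (an alternative to Process, not what the answer above uses) is concurrent.futures.ProcessPoolExecutor, which pickles each worker's return value and sends it to the parent. This is a sketch under assumptions: MajorityClassModel is a hypothetical stand-in with sklearn-style fit/predict so the example runs with the standard library alone, and train_in_parallel is an illustrative helper name.

```python
from collections import Counter
from concurrent.futures import ProcessPoolExecutor


class MajorityClassModel:
    """Hypothetical stand-in for an sklearn estimator: predicts the most common label."""

    def fit(self, X, y):
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


def train_model(model, X, y):
    # Runs in a worker process; the fitted copy is pickled and sent back to the parent.
    model.fit(X, y)
    return model


def train_in_parallel(models, X, y):
    # One worker per model; results come back in the same order as `models`.
    with ProcessPoolExecutor(max_workers=len(models)) as pool:
        futures = [pool.submit(train_model, m, X, y) for m in models]
        return [f.result() for f in futures]


if __name__ == "__main__":
    X_train = [[0], [1], [2], [3]]
    y_train = [1, 0, 1, 1]
    a, b = train_in_parallel([MajorityClassModel(), MajorityClassModel()], X_train, y_train)
    print(a.predict([[5], [6]]))  # [1, 1]
```

With the question's estimators the call would look like dt, lr = train_in_parallel([dt, lr], X_train, y_train); sklearn estimators are picklable, so the same pattern should apply. On Windows the if __name__ == "__main__": guard is still required, and in Jupyter the worker function generally has to live in an importable .py module rather than a notebook cell, because spawned workers cannot see functions defined interactively in the notebook.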
