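You can simply call `.fit()` again on the estimator with the new data. Be aware that this retrains the model from scratch: the previously fitted forest is discarded and a new one is built only from the data you pass in.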
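To see that a second `.fit()` really starts over, the sketch below checks the fitted `max_samples_` attribute. With the default `max_samples='auto'`, scikit-learn uses `min(256, n_samples)` rows per tree, so the value moves from 100 to 256 when the model is refit on the larger array. The second half shows `warm_start=True`, an alternative the original answer does not use: it keeps the existing trees and only fits the extra ones requested through a larger `n_estimators` (here, 50 new trees trained on the new data). The array shapes, `random_state=0` and `n_estimators=150` are assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
old_data = rng.integers(1, 100, size=(100, 10))
new_data = rng.integers(1, 500, size=(1000, 10))

# Plain refit: every call to .fit() builds a brand-new forest.
model = IsolationForest(random_state=0)
model.fit(old_data)
print(model.max_samples_, len(model.estimators_))  # 100 rows per tree, 100 trees

model.fit(new_data)  # the forest trained on old_data is discarded here
print(model.max_samples_, len(model.estimators_))  # 256 rows per tree, 100 trees

# Alternative (not in the original answer): warm_start keeps the existing trees
# and only fits the additional ones requested via a larger n_estimators.
ws_model = IsolationForest(random_state=0)
ws_model.fit(old_data)
first_trees = list(ws_model.estimators_)
ws_model.set_params(warm_start=True, n_estimators=150)
ws_model.fit(new_data)  # fits 50 new trees on new_data; the first 100 are reused
print(len(ws_model.estimators_))  # 150
print(all(a is b for a, b in zip(first_trees, ws_model.estimators_)))  # True
```

With `warm_start`, the old trees still reflect the old data, so this only makes sense when that data remains relevant.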

This is usually the preferred approach, especially for time-series data: the signal changes over time, and you do not want older, no-longer-representative data to influence what the model treats as normal (or anomalous).
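For a time series, one way to act on this is to refit only on a rolling window of the most recent observations. The sketch below keeps the newest `window_size` rows and retrains on just those whenever a new batch arrives. The helper name `refit_on_recent`, the window size and the batch sizes are hypothetical; choose the window to match how quickly your signal drifts.

```python
import numpy as np
from sklearn.ensemble import IsolationForest


def refit_on_recent(model, history, new_batch, window_size):
    """Hypothetical helper: append new_batch to history, keep only the
    newest window_size rows, and retrain the model from scratch on them."""
    history = np.concatenate((history, new_batch))[-window_size:]
    model.fit(history)
    return model, history


rng = np.random.default_rng(0)
model = IsolationForest(random_state=0)
history = rng.integers(1, 100, size=(100, 10))
model.fit(history)

# Three new batches arrive; the oldest rows fall out of the 500-row window.
for _ in range(3):
    batch = rng.integers(1, 500, size=(300, 10))
    model, history = refit_on_recent(model, history, batch, window_size=500)

print(history.shape)  # (500, 10)
```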

If the old data is still important, join the older training data and the newer input data together, then call `.fit()` again on the combined set.

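As a side note, the scikit-learn documentation on model persistence suggests using joblib rather than plain pickle, because joblib is more efficient for objects that carry large NumPy arrays internally, as fitted estimators often do.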
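A small joblib round trip, written to a temporary directory so it leaves no files behind. The check at the end confirms that the reloaded model scores data exactly like the original. The file name and the `compress=3` setting are arbitrary choices for illustration; compression trades a little save/load time for a smaller file.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
X = rng.integers(1, 100, size=(100, 10))
model = IsolationForest(random_state=0).fit(X)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "isf_model.joblib")
    joblib.dump(model, path, compress=3)  # compress: 0 (off) to 9 (max)
    restored = joblib.load(path)

# The reloaded model should produce identical anomaly scores.
same_scores = np.array_equal(model.score_samples(X), restored.score_samples(X))
print(same_scores)  # True
```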

A minimal reproducible example (MRE), with links to resources, is below:

```python
# Model
from sklearn.ensemble import IsolationForest

# Saving/loading the model file
import joblib

# Data
import numpy as np

# Create a new model
model = IsolationForest()

# Generate some "old" data: 100 samples, 10 features
df1 = np.random.randint(1, 100, (100, 10))
# Train the model
model.fit(df1)

# Save it off
joblib.dump(model, 'isf_model.joblib')

# Load the model
model = joblib.load('isf_model.joblib')

# Generate new data (it must have the same number of features, 10)
df2 = np.random.randint(1, 500, (1000, 10))

# The two options below are alternatives; in practice you would pick one.

# Option 1: the original data is no longer important, so just call .fit() again.
# This retrains from scratch on df2 only. For time-series data this is usually
# preferred, as older data may not be representative of the current state.
model.fit(df2)

# Option 2: the original data is still important, so join the old data to the
# new data and refit on both. There are multiple options for joining:
# Pandas: https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html
# NumPy: https://numpy.org/doc/stable/reference/generated/numpy.concatenate.html
combined_data = np.concatenate((df1, df2))
model.fit(combined_data)
```
