python python-3.x scikit-learn

How to clone a scikit-learn estimator including its data?

I am attempting to perform a partial fit on a naive Bayes estimator while also retaining a copy of the estimator as it was before the partial fit. sklearn.base.clone only clones an estimator's parameters, not its data, so it is not useful in this case. A partial fit on the clone only uses the data added during that partial fit, since the clone is effectively empty.

import numpy as np
from sklearn.naive_bayes import MultinomialNB
model = MultinomialNB()
fit_model = model.fit(np.array(X), np.array(y))
fit_model2 = model.partial_fit(np.array(Z), np.array(w), np.unique(y))

In the above example fit_model and fit_model2 will be the same since they both point to the same object. I would like to retain the original copy unaltered. My workaround is to pickle the original and load it into a new object to perform a partial fit on. Like this:

model = MultinomialNB()
fit_model = model.fit(np.array(X), np.array(y))
import pickle
with open('saved_model', 'wb') as f:
    pickle.dump([model], f)
with open('saved_model', 'rb') as f:
    [model2] = pickle.load(f)
fit_model2 = model2.partial_fit(np.array(Z), np.array(w), np.unique(y))

I could also refit from scratch with the new data each time, but since I need to do this thousands of times, I’m trying to find something more efficient.
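The two behaviours described in the question can be checked with a minimal sketch. The toy count matrix and labels below are hypothetical stand-ins for X and y, and the fitted state is probed through MultinomialNB's class_count_ attribute:

```python
import numpy as np
from sklearn.base import clone
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy count data standing in for X and y in the question.
X = np.array([[2, 1, 0], [0, 1, 3], [1, 0, 2], [3, 0, 1]])
y = np.array([0, 1, 1, 0])

model = MultinomialNB()
fit_model = model.fit(X, y)

# fit returns the estimator itself, so both names refer to one object.
same_object = fit_model is model

# clone copies only the constructor parameters; the learned counts are dropped.
empty = clone(model)
original_is_fitted = hasattr(model, "class_count_")
clone_is_fitted = hasattr(empty, "class_count_")
```

Here same_object and original_is_fitted should be True while clone_is_fitted should be False, which matches the observation that a clone starts out empty.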

  1. fit() returns the model itself (the same object), so assigning the result to a different variable only creates an alias.

  2. You can use deepcopy to copy the object in a similar way to what loading a pickled object does.

So if you do something like:

from copy import deepcopy
model = MultinomialNB().fit(np.array(X), np.array(y))
model2 = deepcopy(model)
model2.partial_fit(np.array(Z), np.array(w), np.unique(y))
# ...

model2 will be a distinct object, with the copied parameters of model, including the “trained” parameters.
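As a rough check of this, the sketch below uses hypothetical toy arrays in place of X, y, Z and w. It verifies that the original model's counts are untouched after the copy is updated, and that the updated copy has the same counts as a model refit on all the data:

```python
from copy import deepcopy

import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy count data standing in for X, y, Z and w.
X = np.array([[2, 1, 0], [0, 1, 3], [1, 0, 2], [3, 0, 1]])
y = np.array([0, 1, 1, 0])
Z = np.array([[1, 2, 0], [0, 0, 4]])
w = np.array([0, 1])

model = MultinomialNB().fit(X, y)
counts_before = model.feature_count_.copy()

# Copy the fitted estimator, then update only the copy.
model2 = deepcopy(model)
model2.partial_fit(Z, w, classes=model.classes_)

# The original estimator keeps its pre-update counts.
original_unchanged = np.array_equal(model.feature_count_, counts_before)

# The updated copy accumulates the same counts as a full refit on X and Z.
refit = MultinomialNB().fit(np.vstack([X, Z]), np.concatenate([y, w]))
copy_matches_refit = (
    np.allclose(model2.feature_count_, refit.feature_count_)
    and np.allclose(model2.class_count_, refit.class_count_)
)
```

Both flags should come out True. Because the copy happens in memory, this avoids the file round trip of the pickle workaround.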