The main goals are as follows:
StandardScalerto continuous variables
OnehotEncoderto categorical variables
The continuous variables need to be scaled, but at the same time, a couple of categorical variables are also of integer type. Applying
StandardScaler would result in undesired effects.
On the flip side, the
StandardScaler would scale the integer based categorical variables, which is also not what we want.
Since continuous variables and categorical ones are mixed in a single
Pandas DataFrame, what’s the recommended workflow to approach this kind of problem?
The best example to illustrate my point is the Kaggle Bike Sharing Demand dataset, where
weather are integer categorical variables