category_encoders
The category_encoders package provides a range of encoding techniques for converting categorical variables into numeric ones.
Install it from the Anaconda Prompt:

    pip install category_encoders

BinaryEncoder()
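Performs binary (0/1) encoding: each category is first mapped to an integer, and the binary digits of that integer are then spread across several 0/1 columns.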

    import pandas as pd
    import category_encoders as ce
    # load_boston was removed in scikit-learn 1.2; this example needs an older version
    from sklearn.datasets import load_boston

    bunch = load_boston()
    y = bunch.target
    X = pd.DataFrame(bunch.data, columns=bunch.feature_names)
    X.head()

    # Binary-encode the two categorical columns, CHAS and RAD
    encoder = ce.BinaryEncoder(cols=['CHAS', 'RAD'], handle_unknown='indicator').fit(X, y)
    numeric_dataset = encoder.transform(X)
    numeric_dataset.info()

    # Inspect the generated columns (exact names can vary with the category_encoders version)
    numeric_dataset[['CHAS_0', 'CHAS_1', 'CHAS_2']].head()
    numeric_dataset[['RAD_0', 'RAD_1', 'RAD_2', 'RAD_3', 'RAD_4']].head()

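To make the idea behind binary encoding concrete, here is a minimal pure-Python sketch. It is not the category_encoders implementation: the function name `binary_encode` and the toy color data are made up for illustration, and the library's exact ordinal assignment and column layout may differ.

```python
def binary_encode(values):
    """Toy binary encoder (illustrative only, not the library's code)."""
    # Assign each distinct category an integer, starting at 1,
    # in order of first appearance.
    categories = list(dict.fromkeys(values))
    ordinal = {cat: i for i, cat in enumerate(categories, start=1)}

    # Number of binary digits needed to represent the largest integer.
    width = len(categories).bit_length()

    # Write each integer as a fixed-width list of bits, most significant first.
    rows = []
    for v in values:
        code = ordinal[v]
        rows.append([(code >> shift) & 1 for shift in range(width - 1, -1, -1)])
    return ordinal, rows


colors = ["red", "green", "blue", "green", "yellow", "purple"]
ordinal, rows = binary_encode(colors)
for color, bits in zip(colors, rows):
    print(color, bits)
```

In this sketch five distinct categories fit into three 0/1 columns, whereas one-hot encoding would need five. That compactness is the main reason to prefer binary encoding for columns with many categories.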
OneHotEncoder()
Performs one-hot encoding: each category gets its own 0/1 column.
OneHotEncoder(verbose=0, cols=None, drop_invariant=False, return_df=True, handle_missing='value', handle_unknown='value', use_cat_names=False)

    import pandas as pd
    import category_encoders as ce
    # load_boston was removed in scikit-learn 1.2; this example needs an older version
    from sklearn.datasets import load_boston

    bunch = load_boston()
    y = bunch.target
    X = pd.DataFrame(bunch.data, columns=bunch.feature_names)
    X.head()

    # One-hot encode the two categorical columns, CHAS and RAD
    ohe = ce.OneHotEncoder(cols=['CHAS', 'RAD'], handle_unknown='indicator').fit(X, y)
    numeric_dataset = ohe.transform(X)
    numeric_dataset.info()

    numeric_dataset.head()

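The following pure-Python sketch shows what one-hot encoding does, including the idea of an extra indicator column for categories not seen during fitting. The helper names `fit_one_hot` and `transform_one_hot`, the `unknown_indicator` flag, and the sample values are hypothetical. They only loosely mirror the handle_unknown='indicator' option and are not the library's actual code or column layout.

```python
def fit_one_hot(values):
    """Learn the distinct categories, in order of first appearance."""
    return list(dict.fromkeys(values))


def transform_one_hot(values, categories, unknown_indicator=False):
    """Encode each value as a row with one 0/1 entry per known category.

    With unknown_indicator=True, a trailing column is appended that is 1
    for values not seen during fitting (illustrative assumption).
    """
    rows = []
    for v in values:
        row = [1 if v == cat else 0 for cat in categories]
        if unknown_indicator:
            row.append(0 if v in categories else 1)
        rows.append(row)
    return rows


# Fit on some training values, then encode new data containing an unseen value (5).
categories = fit_one_hot([4, 1, 4, 24])
encoded = transform_one_hot([1, 24, 5], categories, unknown_indicator=True)
for row in encoded:
    print(row)
```

Without the indicator column, the unseen value in this sketch would simply become a row of zeros, which cannot be told apart from missing information. The indicator column keeps that case explicit.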
References