Given a 1D array of indices:
a = array([1, 0, 3])
I want to one-hot encode this as a 2D array:
b = array([[0,1,0,0], [1,0,0,0], [0,0,0,1]])
Create a zeroed array b with enough columns, i.e. a.max() + 1. Then, for each row i, set the a[i]th column to 1.
>>> a = np.array([1, 0, 3])
>>> b = np.zeros((a.size, a.max() + 1))
>>> b[np.arange(a.size), a] = 1
>>> b
array([[ 0.,  1.,  0.,  0.],
       [ 1.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  1.]])
@JamesAtwood it depends on the application but I’d make the max a parameter and not calculate it from the data.
Feb 8, 2016 at 20:40
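Following the comment above, here is a minimal sketch that takes the number of classes as an argument instead of deriving it from the data. The function name `one_hot` and the parameter `num_classes` are illustrative and not part of the original answer; the sketch also uses an integer dtype, whereas the answer's `np.zeros` call defaults to floats.

```python
import numpy as np

def one_hot(indices, num_classes):
    """One-hot encode a 1D sequence of integer indices (hypothetical helper)."""
    indices = np.asarray(indices)
    out = np.zeros((indices.size, num_classes), dtype=int)
    # Row i gets a 1 in column indices[i].
    out[np.arange(indices.size), indices] = 1
    return out

# Five columns even though the largest index present is 3.
b = one_hot([1, 0, 3], num_classes=5)
```

With a fixed `num_classes`, two batches that happen to contain different maximum values still produce arrays with the same number of columns.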
Can anyone point to an explanation of why this works, but the slice with [:, a] does not?
– N. McA. Feb 16, 2018 at 19:40
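One way to see the difference, as a sketch: `b[:, a]` selects whole columns, so the assignment touches every row of each column listed in `a`. `b[np.arange(a.size), a]` pairs the two index arrays element by element and addresses only the positions `(0, a[0])`, `(1, a[1])`, `(2, a[2])`. The names `by_slice` and `by_pairs` are illustrative.

```python
import numpy as np

a = np.array([1, 0, 3])

# Slice plus index array: every row, columns 1, 0 and 3.
by_slice = np.zeros((a.size, a.max() + 1))
by_slice[:, a] = 1

# Two index arrays: only the pairs (0, 1), (1, 0) and (2, 3).
by_pairs = np.zeros((a.size, a.max() + 1))
by_pairs[np.arange(a.size), a] = 1

print(by_slice)
print(by_pairs)
```

Here `by_slice` ends up with ones in columns 0, 1 and 3 of every row, while `by_pairs` has exactly one 1 per row.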
@ A.D. Solution for the 2d > 3d case: stackoverflow.com/questions/36960320/…
Sep 29, 2018 at 2:37
>>> values = [1, 0, 3]
>>> n_values = np.max(values) + 1
>>> np.eye(n_values)[values]
array([[ 0.,  1.,  0.,  0.],
       [ 1.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  1.]])
This solution is the only one that handles an N-D input matrix, producing an (N+1)-D one-hot matrix. Example: input_matrix = np.asarray([[0, 1, 1], [1, 1, 2]]); np.eye(3)[input_matrix]  # output is a 3D tensor
– Isaías Mar 21, 2017 at 16:06
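The example from this comment, written out as a small runnable sketch. The variable name `encoded` is added here for illustration.

```python
import numpy as np

input_matrix = np.asarray([[0, 1, 1], [1, 1, 2]])

# Indexing the 3x3 identity matrix with a 2x3 integer array returns one
# identity row per element, so the result has shape (2, 3, 3).
encoded = np.eye(3)[input_matrix]

print(encoded.shape)
print(encoded[1, 2])  # the one-hot row for the value 2
```

Taking `encoded.argmax(axis=-1)` recovers the original `input_matrix`.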
+1 because this should be preferred over the accepted solution. For a more general solution, though, values should be a NumPy array rather than a Python list; then it works in all dimensions, not only in 1D.
– Alex Oct 21, 2017 at 20:32
Note that taking np.max(values) + 1 as the number of buckets might not be desirable if your data set is, say, randomly sampled and by chance does not contain the max value. The number of buckets should rather be a parameter, with an assertion or check that each value lies between 0 (inclusive) and the bucket count (exclusive).
Jan 19, 2018 at 3:46
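A minimal sketch of what this comment suggests: the bucket count is passed in, and values outside the range are rejected before indexing. The names `one_hot_checked` and `n_buckets` are hypothetical, and the sketch assumes a non-empty integer input.

```python
import numpy as np

def one_hot_checked(values, n_buckets):
    """One-hot encode values after a range check (hypothetical helper)."""
    values = np.asarray(values)
    # Each value must satisfy 0 <= value < n_buckets. Without this check,
    # negative values would silently index from the end of the identity matrix.
    if values.min() < 0 or values.max() >= n_buckets:
        raise ValueError("values must lie in [0, %d)" % n_buckets)
    return np.eye(n_buckets)[values]

encoded = one_hot_checked([1, 0, 3], n_buckets=6)
```

The explicit lower-bound check matters because NumPy accepts negative indices, so a stray `-1` would otherwise be encoded as the last bucket rather than raising an error.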
To me this solution is the best and can be easily generalized to any tensor: def one_hot(x, depth=10): return np.eye(depth)[x]. Note that using the tensor x as the index returns a tensor of shape x.shape + (depth,), i.e. one row of the identity matrix per element of x.
Mar 27, 2018 at 7:37
An easy way to “understand” this solution and why it works for N dimensions (without reading the numpy docs): at each location in the original matrix (values), we have an integer k, and we “put” the one-hot vector eye(n)[k] in that location. This adds a dimension because we’re “putting” a vector in the location of a scalar in the original matrix.
– avivr Sep 24, 2019 at 14:08
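The description in this comment can be spelled out as an explicit loop, purely as an illustration of what the indexing does (the one-liner `np.eye(n)[values]` remains the practical form). The names `eye` and `out` are introduced here.

```python
import numpy as np

values = np.array([[0, 1, 1], [1, 1, 2]])
n = 3
eye = np.eye(n)

# One extra trailing axis of length n holds the one-hot vector.
out = np.zeros(values.shape + (n,))
for idx in np.ndindex(values.shape):
    k = values[idx]      # the integer stored at this location
    out[idx] = eye[k]    # "put" the one-hot vector eye(n)[k] there

print(np.array_equal(out, eye[values]))
```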
If you are using Keras, there is a built-in utility for that:
from keras.utils import to_categorical  # the keras.utils.np_utils path is not available in newer Keras releases
categorical_labels = to_categorical(int_labels, num_classes=3)
It does pretty much the same as @YXD’s answer (see the source code).
