Categories
machine-learning numpy numpy-ndarray one-hot-encoding python

Convert array of indices to one-hot encoded array in NumPy

326

Given a 1D array of indices:

a = array([1, 0, 3])

I want to one-hot encode this as a 2D array:

b = array([[0,1,0,0], [1,0,0,0], [0,0,0,1]])

    509

    Create a zeroed array b with enough columns, i.e. a.max() + 1.
    Then, for each row i, set the a[i]th column to 1.

    >>> import numpy as np
    >>> a = np.array([1, 0, 3])
    >>> b = np.zeros((a.size, a.max() + 1))
    >>> b[np.arange(a.size), a] = 1
    
    >>> b
    array([[ 0.,  1.,  0.,  0.],
           [ 1.,  0.,  0.,  0.],
           [ 0.,  0.,  0.,  1.]])
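For anyone wondering why the paired index works while a plain slice does not, here is a minimal sketch of the same steps wrapped in a function. It assumes the number of classes is passed in rather than derived from the data. The names one_hot_rows and num_classes are illustrative only and are not part of NumPy.

```python
import numpy as np

def one_hot_rows(a, num_classes):
    """One-hot encode a 1D integer array (hypothetical helper, not a NumPy function)."""
    a = np.asarray(a)
    if a.ndim != 1:
        raise ValueError("expected a 1D array of indices")
    if a.size and (a.min() < 0 or a.max() >= num_classes):
        raise ValueError("indices must lie in [0, num_classes)")
    b = np.zeros((a.size, num_classes), dtype=int)
    # The two index arrays are matched elementwise: row i is paired with column a[i].
    b[np.arange(a.size), a] = 1
    return b

a = np.array([1, 0, 3])
b = one_hot_rows(a, num_classes=5)

# Contrast: a full-row slice combined with a selects columns 0, 1 and 3
# in every row, so whole columns are set instead of one cell per row.
c = np.zeros((a.size, 5), dtype=int)
c[:, a] = 1
```

With these inputs, b has one 1 per row and a trailing all-zero column for the unused class 4, while every row of c is [1, 1, 0, 1, 0].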
    

    6

    • 14

      @JamesAtwood it depends on the application but I’d make the max a parameter and not calculate it from the data.

      Feb 8, 2016 at 20:40

    • 8

      What if ‘a’ were 2D and you wanted a 3D one-hot matrix?

      – A.D

      Oct 18, 2017 at 22:39

    • 14

      Can anyone point to an explanation of why this works, but the slice with [:, a] does not?

      – N. McA.

      Feb 16, 2018 at 19:40

    • 4

      @A.D Solution for the 2D -> 3D case: stackoverflow.com/questions/36960320/…

      Sep 29, 2018 at 2:37

    • You can also use scipy.sparse.

      – mathtick

      Apr 8, 2019 at 20:17
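On the 2D -> 3D question raised in the comments, one possible way to extend the zero-then-assign idea to an index array of any shape is np.put_along_axis, which writes along a new last axis. This is a sketch under that assumption, not the method from the linked question. The name one_hot_nd is hypothetical, and no range validation is done here.

```python
import numpy as np

def one_hot_nd(a, num_classes):
    """One-hot encode an integer array of any shape (hypothetical helper)."""
    a = np.asarray(a)
    # Output gains one trailing axis of length num_classes.
    b = np.zeros(a.shape + (num_classes,), dtype=int)
    # For each position in a, write a 1 at index a[...] along the last axis.
    np.put_along_axis(b, a[..., np.newaxis], 1, axis=-1)
    return b

a2 = np.array([[0, 1, 1], [1, 1, 2]])
b2 = one_hot_nd(a2, num_classes=3)
```

Here b2 has shape (2, 3, 3), and taking b2.argmax(axis=-1) recovers a2.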

    252

    >>> import numpy as np
    >>> values = [1, 0, 3]
    >>> n_values = np.max(values) + 1
    >>> np.eye(n_values)[values]
    array([[ 0.,  1.,  0.,  0.],
           [ 1.,  0.,  0.,  0.],
           [ 0.,  0.,  0.,  1.]])
    

    6

    • 15

      This is the only solution here that handles an N-D input array and produces an (N+1)-D one-hot array. Example: input_matrix = np.asarray([[0, 1, 1], [1, 1, 2]]); np.eye(3)[input_matrix]  # output is a 3D tensor

      – Isaías

      Mar 21, 2017 at 16:06


    • 10

      +1 because this should be preferred over the accepted solution. For a more general solution, though, values should be a NumPy array rather than a Python list; then it works in all dimensions, not only in 1D.

      – Alex

      Oct 21, 2017 at 20:32

    • 14

      Note that taking np.max(values) + 1 as the number of buckets may be undesirable: if your data set is, say, randomly sampled, it may by chance not contain the maximum value. The number of buckets should instead be a parameter, with an assertion/check that each value lies between 0 (inclusive) and the bucket count (exclusive).

      Jan 19, 2018 at 3:46

    • 3

      To me this solution is the best, and it generalizes easily to any tensor: def one_hot(x, depth=10): return np.eye(depth)[x]. Note that indexing with the tensor x returns a tensor of shape x.shape + (depth,), built from rows of eye.

      Mar 27, 2018 at 7:37


    • 9

      Easy way to “understand” this solution and why it works for N-dims (without reading numpy docs): at each location in the original matrix (values), we have an integer k, and we “put” the 1-hot vector eye(n)[k] in that location. This adds a dimension because we’re “putting” a vector in the location of a scalar in the original matrix.

      – avivr

      Sep 24, 2019 at 14:08
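Pulling the comments above together, a minimal sketch of the np.eye approach with the number of classes as an explicit parameter and a range check might look like this. The names one_hot_eye and depth are illustrative and not part of NumPy. The check is an assumption about what you would want: without it, a negative value such as -1 would silently pick the last row of the identity matrix.

```python
import numpy as np

def one_hot_eye(x, depth):
    """One-hot encode an integer array of any shape via identity-matrix rows
    (hypothetical helper)."""
    x = np.asarray(x)
    if x.size and (x.min() < 0 or x.max() >= depth):
        raise ValueError("values must lie in [0, depth)")
    # Row k of the identity matrix is the one-hot vector for class k;
    # indexing with x picks one such row per element of x.
    return np.eye(depth, dtype=int)[x]

labels = np.array([[0, 1, 1], [1, 1, 2]])
encoded = one_hot_eye(labels, depth=4)
```

The result has shape labels.shape + (depth,), here (2, 3, 4), with an all-zero last column because class 3 never appears in labels.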

    51

    In case you are using Keras, there is a built-in utility for that:

    # Older releases also exposed this as keras.utils.np_utils.to_categorical
    from keras.utils import to_categorical
    
    categorical_labels = to_categorical(int_labels, num_classes=3)
    

    It does pretty much the same as @YXD’s answer (see the source code).