
Softmax

Contents

  • Python implementation
  • Short analysis

Softmax is an activation function for multi-layer perceptrons (MLPs). It maps a vector $\mathbf{x} \in \mathbb{R}^K$ to a vector in $[0, 1]^K$ whose elements sum to 1:

$$\varphi(\mathbf{x})_j = \frac{e^{x_j}}{\sum_{k=1}^K e^{x_k}} \quad\text{ for } j=1, \dots, K$$
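For example, for $\mathbf{x} = (0.1, 0.2)$ (the first doctest below), the two components work out to

$$\varphi(\mathbf{x})_1 = \frac{e^{0.1}}{e^{0.1} + e^{0.2}} \approx \frac{1.10517}{2.32657} \approx 0.47502, \qquad \varphi(\mathbf{x})_2 = \frac{e^{0.2}}{e^{0.1} + e^{0.2}} \approx 0.52498$$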

Python implementation

The implementation is straightforward:

#! /usr/bin/env python

import numpy


def softmax(w):
    """Calculate the softmax of a list of numbers w.

    Parameters
    ----------
    w : list of numbers

    Returns
    -------
    a numpy array of the same length as w, with non-negative entries that sum to 1

    Examples
    --------
    >>> softmax([0.1, 0.2])
    array([ 0.47502081,  0.52497919])
    >>> softmax([-0.1, 0.2])
    array([ 0.42555748,  0.57444252])
    >>> softmax([0.9, -10])
    array([  9.99981542e-01,   1.84578933e-05])
    >>> softmax([0, 10])
    array([  4.53978687e-05,   9.99954602e-01])
    """
    # Exponentiate each element, then normalize so the outputs sum to 1.
    e = numpy.exp(numpy.array(w))
    dist = e / numpy.sum(e)
    return dist


if __name__ == "__main__":
    import doctest

    doctest.testmod()
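One caveat worth noting: numpy.exp overflows for large inputs, so the naive version above produces nan for something like softmax([1000, 0]). Because softmax is invariant under shifting all inputs by the same constant (the shift cancels in numerator and denominator), a common remedy is to subtract the maximum first. A minimal sketch of that variant (softmax_stable is my name for it, not part of the original post):

import numpy


def softmax_stable(w):
    """Numerically stable softmax.

    Shifting all inputs by their maximum leaves the result unchanged,
    but the largest exponent becomes 0, so numpy.exp cannot overflow.
    """
    w = numpy.array(w)
    e = numpy.exp(w - numpy.max(w))
    return e / numpy.sum(e)

With this variant, softmax_stable([1000, 0]) returns array([ 1., 0.]) instead of nan.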

Short analysis

One obvious property of the softmax function is that its output values sum to one, due to the normalization in the denominator.
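A quick sanity check, reusing the softmax defined above (numpy.isclose instead of exact equality, since floating-point rounding can leave the sum a hair off 1.0):

import numpy

for w in [[0.1, 0.2], [-0.1, 0.2], [0.9, -10], [0, 10]]:
    assert numpy.isclose(softmax(w).sum(), 1.0)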

By printing the following, you can see that elements below 1 move closer together while elements above 1 move farther apart, measured as ratios relative to the smallest element.

def percentage(before):
    """Print each element relative to the smallest (last) element,
    before and after applying softmax."""
    before = numpy.array(before)
    after = softmax(before)
    print("Before: %s" % str(before / before[-1]))
    print("After:  %s" % str(after / after[-1]))
    print("-" * 60)


experiments = []
experiments.append([1, 1.1, 1.1])
experiments.append([1, 2, 3])
experiments.append([0.6, 0.7, 0.8])

for a in experiments:
    a = sorted(a, reverse=True)
    percentage(a)

gives

Before: [ 1.1  1.1  1. ]
After:  [ 1.10517092  1.10517092  1.        ]
------------------------------------------------------------
Before: [3 2 1]
After:  [ 7.3890561   2.71828183  1.        ]
------------------------------------------------------------
Before: [ 1.33333333  1.16666667  1.        ]
After:  [ 1.22140276  1.10517092  1.        ]
------------------------------------------------------------
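The "After" ratios have a simple closed form: the common denominator cancels when dividing two softmax outputs, so

$$\frac{\varphi(\mathbf{x})_i}{\varphi(\mathbf{x})_j} = \frac{e^{x_i}}{e^{x_j}} = e^{x_i - x_j}$$

This matches the output above: for [1, 2, 3] the differences to the smallest element are 2 and 1, giving $e^2 \approx 7.389$ and $e^1 \approx 2.718$; for [0.6, 0.7, 0.8] the differences 0.2 and 0.1 give $e^{0.2} \approx 1.221$ and $e^{0.1} \approx 1.105$.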

Published

Feb 9, 2016
by Martin Thoma

Category

Machine Learning

Tags

  • Activation Functions
  • Machine Learning
  • Neural Networks
