Softmax is an activation function for multi-layer perceptrons (MLPs). It is applied to a vector \(\mathbf{x} \in \mathbb{R}^K\) and returns a vector in \([0, 1]^K\) whose elements sum to 1:
$$\varphi(\mathbf{x})_j = \frac{e^{x_j}}{\sum_{k=1}^K e^{x_k}} \;\;\;\text{ for } j=1, \dots, K$$
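For example, for \(\mathbf{x} = (1, 2, 3)\) this gives (rounded to three decimals):
$$\varphi(\mathbf{x}) = \frac{(e^1,\, e^2,\, e^3)}{e^1 + e^2 + e^3} \approx \frac{(2.718,\, 7.389,\, 20.086)}{30.193} \approx (0.090,\, 0.245,\, 0.665)$$
The three outputs indeed sum to 1, and larger inputs get larger shares.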
Python implementation
The implementation is straightforward:
#! /usr/bin/env python

import numpy


def softmax(w):
    """Calculate the softmax of a list of numbers w.

    Parameters
    ----------
    w : list of numbers

    Returns
    -------
    a numpy array of the same length as w with non-negative entries that sum to 1

    Examples
    --------
    >>> softmax([0.1, 0.2])
    array([ 0.47502081, 0.52497919])
    >>> softmax([-0.1, 0.2])
    array([ 0.42555748, 0.57444252])
    >>> softmax([0.9, -10])
    array([ 9.99981542e-01, 1.84578933e-05])
    >>> softmax([0, 10])
    array([ 4.53978687e-05, 9.99954602e-01])
    """
    e = numpy.exp(numpy.array(w))  # element-wise exponentials
    dist = e / numpy.sum(e)  # normalize so the entries sum to 1
    return dist


if __name__ == "__main__":
    import doctest
    doctest.testmod()
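One caveat: for large inputs numpy.exp can overflow. A common remedy is to subtract the maximum input before exponentiating, which does not change the result because the softmax output only depends on differences between inputs. A minimal sketch of such a variant (the name softmax_stable is just for illustration):

import numpy


def softmax_stable(w):
    """Softmax that subtracts the maximum input first to avoid overflow in exp."""
    w = numpy.array(w, dtype=float)
    e = numpy.exp(w - numpy.max(w))  # shifting by the max leaves the result unchanged
    return e / numpy.sum(e)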
Short analysis
One obvious property of the softmax function is that the elements of its output sum to one, due to the normalization in the denominator.
By running the following snippet you can see that values below 1 move closer together while values above 1 move farther apart; each vector is printed relative to its smallest element.
def percentage(before):
    before = numpy.array(before)
    after = softmax(before)
    print("Before: %s" % str(before / before[-1]))
    print("After: %s" % str(after / after[-1]))
    print("-" * 60)


experiments = []
experiments.append([1, 1.1, 1.1])
experiments.append([1, 2, 3])
experiments.append([0.6, 0.7, 0.8])

for a in experiments:
    a = sorted(a, reverse=True)
    percentage(a)
gives
Before: [ 1.1 1.1 1. ]
After: [ 1.10517092 1.10517092 1. ]
------------------------------------------------------------
Before: [3 2 1]
After: [ 7.3890561 2.71828183 1. ]
------------------------------------------------------------
Before: [ 1.33333333 1.16666667 1. ]
After: [ 1.22140276 1.10517092 1. ]
------------------------------------------------------------
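This also shows why softmax only cares about differences between inputs: since \(\frac{e^{x_j + c}}{\sum_k e^{x_k + c}} = \frac{e^{x_j}}{\sum_k e^{x_k}}\), adding a constant to every input leaves the output unchanged. A quick check, reusing the softmax function from above:

print(softmax([1, 2, 3]))
print(softmax([11, 12, 13]))  # same output: adding 10 to every input changes nothing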