A standard sigmoid function used in machine learning is the logistic function

$\sigma(x) = \frac{1}{1 + e^{-x}}$

Part of the reason for its use is the simplicity of its first derivative:

$\sigma' = \frac{e^{-x}}{(1 + e^{-x})^2} = \frac{1 + e^{-x} - 1}{(1 + e^{-x})^2} = \sigma - \sigma^2 = \sigma(1 - \sigma)$
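This relation is easy to verify numerically; here is a minimal Python sketch (not part of the original presentation) using a central difference:

```python
import math

def sigma(x):
    """Logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-x))

# Central-difference check that sigma'(x) = sigma(x) * (1 - sigma(x))
h = 1e-6
for x in (-2.0, 0.0, 1.5):
    numeric = (sigma(x + h) - sigma(x - h)) / (2.0 * h)
    analytic = sigma(x) * (1.0 - sigma(x))
    assert abs(numeric - analytic) < 1e-8
```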

To evaluate higher-order derivatives, assume an expression of the form

$\sigma^{(n)} = \sum_{k=1}^{n+1} c_{n,k} \, \sigma^k$

with $c_{0,1} \equiv 1$ so that the expression reduces to the function itself when no derivative is taken. Since the expression always contains a linear term, the next derivative is

$\begin{aligned}
\sigma^{(n+1)} &= \sigma' \sum_{k=1}^{n+1} k \, c_{n,k} \, \sigma^{k-1} \\
&= \sum_{k=1}^{n+1} k \, c_{n,k} \left[ \sigma^k - \sigma^{k+1} \right] \\
&= \sum_{k=1}^{n+1} k \, c_{n,k} \, \sigma^k - \sum_{k=2}^{n+2} (k-1) \, c_{n,k-1} \, \sigma^k \\
&= c_{n,1} \, \sigma - (n+1) \, c_{n,n+1} \, \sigma^{n+2} + \sum_{k=2}^{n+1} \left[ k \, c_{n,k} - (k-1) \, c_{n,k-1} \right] \sigma^k \equiv \sum_{k=1}^{n+2} c_{n+1,k} \, \sigma^k
\end{aligned}$

where terms in each sum with indices not included in the other sum have been separated. Comparing these separated terms with the first and last terms on the right-hand side gives

$c_{n+1,1} = c_{n,1} \qquad\qquad c_{n+1,n+2} = -(n+1) \, c_{n,n+1}$
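The general recursion $c_{n+1,k} = k \, c_{n,k} - (k-1) \, c_{n,k-1}$, read off from the bracketed terms of the derivative evaluation (with out-of-range coefficients taken as zero so that it also covers these boundary values), can be iterated directly. A Python sketch, where the function name `sigmoid_coeffs` is chosen here for illustration:

```python
def sigmoid_coeffs(n):
    """Coefficients c[k] in sigma^(n) = sum over k of c[k] * sigma**k,
    built from c_{n+1,k} = k*c_{n,k} - (k-1)*c_{n,k-1} with c_{0,1} = 1."""
    c = {1: 1}  # n = 0: the function itself
    for m in range(n):
        c = {k: k * c.get(k, 0) - (k - 1) * c.get(k - 1, 0)
             for k in range(1, m + 3)}
    return c

# First derivative: sigma - sigma^2
print(sigmoid_coeffs(1))  # {1: 1, 2: -1}
# Second derivative: sigma - 3 sigma^2 + 2 sigma^3
print(sigmoid_coeffs(2))  # {1: 1, 2: -3, 3: 2}
```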

The left-hand expression here indicates that all coefficients for $k = 1$ are equal. With the initial value $c_{0,1} = 1$ already assumed for consistency with not taking a derivative, this means $c_{n,1} = 1$ for all *n*.

The remaining right-hand expression indicates that there is a change in sign and an additional numerical factor every time either *n* or *k* increases. Inspecting the pair of terms in brackets in the last line of the derivative evaluation shows that the index that changes there is not *n* but *k*. Assuming the latter is responsible for the behavior of the right-hand expression, one can take

$c_{n,k} = (-1)^{k+1} (k-1)! \; S(n+1, k)$

where the remaining functional behavior $S$ will be determined by comparing against the bracketed terms of the derivative evaluation. Canceling common factors, this gives

$S(n+2, k) = k \, S(n+1, k) + S(n+1, k-1)$

This is the recursion relation for Stirling numbers of the second kind, quantities well known in combinatorics and number theory. Explicit values are available online as OEIS A008277. The offset in the first index is necessary due to how Stirling numbers are defined.
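A short Python sketch of this recursion, checked against the first rows of OEIS A008277:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def stirling2(n, k):
    """Stirling numbers of the second kind via
    S(n,k) = k*S(n-1,k) + S(n-1,k-1), with S(0,0) = 1."""
    if n == 0 and k == 0:
        return 1
    if n == 0 or k == 0:
        return 0
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)

# Rows 4 and 5 of OEIS A008277
print([stirling2(4, k) for k in range(1, 5)])  # [1, 7, 6, 1]
print([stirling2(5, k) for k in range(1, 6)])  # [1, 15, 25, 10, 1]
```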

The final expression for the arbitrary multiple derivative of the sigmoid function is thus

$\sigma^{(n)} = \sum_{k=1}^{n+1} (-1)^{k+1} (k-1)! \; S(n+1, k) \, \sigma^k$

This result is consistent with the evaluation by Minai and Williams. Explicit values of the coefficients can also be found online as OEIS A163626.
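As a consistency check (a Python sketch, not part of the original derivation), the closed-form coefficients can be verified to satisfy the recursion $c_{n+1,k} = k \, c_{n,k} - (k-1) \, c_{n,k-1}$ obtained earlier:

```python
from functools import lru_cache
from math import factorial

@lru_cache(maxsize=None)
def stirling2(n, k):
    """Stirling numbers of the second kind."""
    if n == 0 and k == 0:
        return 1
    if n == 0 or k == 0:
        return 0
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)

def c(n, k):
    """Closed-form coefficient of sigma**k in sigma^(n); zero out of range."""
    if k < 1:
        return 0
    return (-1) ** (k + 1) * factorial(k - 1) * stirling2(n + 1, k)

# Closed form satisfies c_{n+1,k} = k*c_{n,k} - (k-1)*c_{n,k-1}
for n in range(6):
    for k in range(1, n + 3):
        assert c(n + 1, k) == k * c(n, k) - (k - 1) * c(n, k - 1)
```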

These derivatives find application in using neural networks to solve differential equations.

The results of this presentation are unchanged if the function is taken with either a positive or negative sign in the denominator,

$\sigma(x) = \frac{1}{1 \pm e^{-x}}$

because the first derivative remains the same:

$\sigma' = \pm \frac{e^{-x}}{(1 \pm e^{-x})^2} = \frac{1 \pm e^{-x} - 1}{(1 \pm e^{-x})^2} = \sigma - \sigma^2 = \sigma(1 - \sigma)$

One can in fact use any positive or negative constant as a multiplicative factor on the exponential in the denominator, since it arises as a constant of integration in solving the differential equation:

$\begin{aligned}
\frac{\sigma'}{\sigma(1 - \sigma)} &= \frac{\sigma'}{\sigma} + \frac{\sigma'}{1 - \sigma} = 1 \\
\ln\sigma - \ln(1 - \sigma) &= x - \ln c \\
\frac{\sigma}{1 - \sigma} = \frac{e^x}{c} \quad &\to \quad \sigma = \frac{1}{1 + c \, e^{-x}}
\end{aligned}$
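A brief numerical sketch (again in Python, for illustration) confirming that the first-derivative relation holds for an arbitrary constant $c$:

```python
import math

def sigma_c(x, c):
    """Generalized logistic with integration constant c."""
    return 1.0 / (1.0 + c * math.exp(-x))

# sigma' = sigma * (1 - sigma) holds for any nonzero constant c,
# checked by central difference away from any singularity
h = 1e-6
for c in (0.5, 1.0, -2.0, 3.0):
    for x in (-1.0, 0.0, 2.0):
        numeric = (sigma_c(x + h, c) - sigma_c(x - h, c)) / (2.0 * h)
        analytic = sigma_c(x, c) * (1.0 - sigma_c(x, c))
        assert abs(numeric - analytic) < 1e-6
```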

Curiouser and curiouser...

*Uploaded 2020.02.22 — Updated 2020.06.13*
analyticphysics.com