Searching for Activation Functions

Prajit Ramachandran, Barret Zoph, Quoc V. Le

2018 · DBLP: conf/iclr/RamachandranZL18
arXiv.org · 3,492 Citations

TLDR

The experiments show that the best discovered activation function, $f(x) = x \cdot \text{sigmoid}(\beta x)$, which is named Swish, tends to work better than ReLU on deeper models across a number of challenging datasets.
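The Swish function in the TLDR can be sketched in plain Python (a minimal illustration only; in the paper, β can be a constant or a trainable parameter, and the default name `beta=1.0` here is an assumption):

```python
import math

def swish(x, beta=1.0):
    # Swish as defined above: f(x) = x * sigmoid(beta * x)
    sigmoid = 1.0 / (1.0 + math.exp(-beta * x))
    return x * sigmoid

# At beta = 1, Swish coincides with the SiLU activation.
```

For example, `swish(0.0)` is exactly 0, and for large positive `x` the sigmoid saturates toward 1, so Swish approaches the identity, much like ReLU.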
