Searching for Activation Functions
Prajit Ramachandran, Barret Zoph, Quoc V. Le
2018 · DBLP: conf/iclr/RamachandranZL18
arXiv.org · 3,492 Citations
TLDR
The experiments show that the best discovered activation function, $f(x) = x \cdot \text{sigmoid}(\beta x)$, which is named Swish, tends to work better than ReLU on deeper models across a number of challenging datasets.
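The formula in the TLDR is straightforward to compute directly; the sketch below is a minimal scalar implementation of it (the function name `swish` and default `beta=1.0` follow the paper's naming, but the code itself is illustrative, not the authors' implementation).

```python
import math

def swish(x: float, beta: float = 1.0) -> float:
    # Swish: f(x) = x * sigmoid(beta * x); with beta = 1 this is also known as SiLU.
    return x * (1.0 / (1.0 + math.exp(-beta * x)))
```

For large positive inputs `sigmoid(beta * x)` approaches 1, so `swish(x)` approaches `x` like ReLU; unlike ReLU it is smooth and non-monotonic near zero.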
