An activation function is how non-linearity is introduced into the network.

ReLU is the default choice; sigmoid and tanh are mainly used only in the output layer.

<aside> Compared to sigmoid and tanh, ReLU avoids the vanishing-gradient problem, is cheap to compute, produces sparse activations that can improve generalization, and works well in practice. For these reasons, ReLU is the preferred default activation function in many deep networks.

</aside>
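A minimal NumPy sketch (assumed, not part of the original notes) illustrating the vanishing-gradient point: sigmoid's gradient shrinks toward 0 as |z| grows, while ReLU's gradient stays 1 wherever z > 0.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    a = sigmoid(z)
    return a * (1.0 - a)          # g'(z) = a(1 - a); never larger than 0.25

def relu_grad(z):
    return (z > 0).astype(float)  # 1 for z > 0, 0 otherwise

z = np.array([-10.0, -2.0, 0.0, 2.0, 10.0])
print(sigmoid_grad(z))  # ~[4.5e-05, 0.105, 0.25, 0.105, 4.5e-05] -> vanishes at saturation
print(relu_grad(z))     # [0. 0. 0. 1. 1.] -> stays 1 wherever z > 0
```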

Sigmoid

Drawbacks: saturated neurons kill (vanish) the gradients; the output is not zero-centered.


$$ a = g(z) = \frac{1}{1 + e^{-z}} $$

$$ g'(z) = \frac{d}{dz} g(z) = \text{slope of } g(z) \text{ at } z \\ = \frac{1}{1 + e^{-z}} \left(1 - \frac{1}{1 + e^{-z}}\right) \\ = g(z) \left(1 - g(z)\right) \\ = a \left(1 - a\right) $$
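A quick sanity check (a sketch assuming NumPy, not from the original notes): the analytic derivative a(1 - a) matches a numerical finite difference.

```python
import numpy as np

def sigmoid(z):
    """a = g(z) = 1 / (1 + exp(-z))"""
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    """g'(z) = a * (1 - a)"""
    a = sigmoid(z)
    return a * (1.0 - a)

# Compare the analytic derivative against a central finite difference.
z = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
eps = 1e-6
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)
print(np.allclose(sigmoid_grad(z), numeric))  # True
```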

Tanh

Drawbacks: saturated neurons kill (vanish) the gradients.


$$ a = g(z) = \tanh(z) \\ = \frac{e^z - e^{-z}}{e^z + e^{-z}} $$

$$ g'(z) = \frac{d}{dz} g(z) = \text{slope of } g(z) \text{ at } z \\ = 1 - \left(\tanh(z)\right)^2 \\ \quad g'(z) = 1 - a^2 $$
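A similar sketch for tanh (assuming NumPy): the output is zero-centered in (-1, 1), but the gradient 1 - a² still vanishes for large |z|.

```python
import numpy as np

def tanh_grad(z):
    """g'(z) = 1 - tanh(z)^2 = 1 - a^2"""
    a = np.tanh(z)
    return 1.0 - a ** 2

z = np.array([-3.0, 0.0, 3.0])
print(np.tanh(z))    # zero-centered output in (-1, 1)
print(tanh_grad(z))  # ~[0.0099, 1.0, 0.0099] -> still saturates for large |z|
```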

ReLU

Drawbacks: the output is not zero-centered; the dead ReLU problem (when the pre-activation z is negative, both the output and the gradient are 0, so the unit can get stuck and never update); not differentiable at z = 0.


$$ g(z) = \max(0, z) $$
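A minimal sketch of ReLU and its (sub)gradient, assuming NumPy; the gradient value at exactly z = 0 is a convention (here 0), since g is not differentiable there.

```python
import numpy as np

def relu(z):
    """g(z) = max(0, z)"""
    return np.maximum(0.0, z)

def relu_grad(z):
    """Subgradient: 1 for z > 0, 0 for z < 0; the value at z = 0 is a convention."""
    return (z > 0).astype(z.dtype)

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(z))       # [0.  0.  0.  0.5 2. ]
print(relu_grad(z))  # [0. 0. 0. 1. 1.] -> units with z < 0 pass no gradient ("dead" if stuck there)
```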

Leaky ReLU


$$ g(z) = \max(0.01z, z) $$
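A corresponding Leaky ReLU sketch (assuming NumPy and the 0.01 slope from the formula above): the small negative-side slope keeps a nonzero gradient for z < 0, which mitigates the dead ReLU problem.

```python
import numpy as np

def leaky_relu(z, alpha=0.01):
    """g(z) = max(alpha * z, z)"""
    return np.where(z > 0, z, alpha * z)

def leaky_relu_grad(z, alpha=0.01):
    """1 for z > 0, alpha otherwise -> no dead units."""
    return np.where(z > 0, 1.0, alpha)

z = np.array([-2.0, -0.5, 0.5, 2.0])
print(leaky_relu(z))       # [-0.02  -0.005  0.5    2.   ]
print(leaky_relu_grad(z))  # [0.01 0.01 1.   1.  ]
```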