Things I newly learned

torch.max
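A quick sketch of the `torch.max` behavior noted above (standard PyTorch API): called on a tensor alone it returns the overall maximum, and called with `dim` it returns both the max values and their indices, which is handy for turning logits into predicted classes.

```python
import torch

x = torch.tensor([[1.0, 5.0, 3.0],
                  [4.0, 2.0, 6.0]])

# Overall maximum: a 0-dim tensor
print(torch.max(x))  # tensor(6.)

# Along a dimension: returns (values, indices)
values, indices = torch.max(x, dim=1)
print(values)   # tensor([5., 6.])
print(indices)  # tensor([1, 2])
```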

Linear
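A minimal sketch of `nn.Linear`: it holds a weight matrix and bias and computes the affine map `y = x @ W.T + b`, so a layer with `in_features=3, out_features=2` turns a batch of 3-dimensional inputs into 2-dimensional outputs.

```python
import torch
import torch.nn as nn

layer = nn.Linear(in_features=3, out_features=2)

x = torch.randn(4, 3)  # batch of 4 samples, 3 features each
y = layer(x)           # affine transform: x @ W.T + b
print(y.shape)         # torch.Size([4, 2])
```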


Things I thought about

Why do we need code that moves the model to a specific device?

def set_device(self, device):
    # Move this layer's parameters to the given device (e.g., "cpu" or "cuda:0")
    self.weight = self.weight.to(device)
    self.bias = self.bias.to(device)

<aside> It assigns the model's parameters to a device. PyTorch tensors live on a specific device (CPU or GPU), and an operation requires all of its operands to be on the same device, so the weights and biases must be explicitly moved to wherever the input data resides before the forward pass.

</aside>
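As a minimal sketch of why this matters, using the standard `nn.Module.to` API: the forward pass only works once the layer's parameters and the input tensor share a device.

```python
import torch
import torch.nn as nn

# Pick a GPU if available, otherwise fall back to CPU
device = "cuda" if torch.cuda.is_available() else "cpu"

layer = nn.Linear(4, 2).to(device)    # moves weight and bias to `device`
x = torch.randn(3, 4, device=device)  # input created on the same device

y = layer(x)  # succeeds because parameters and input share a device
print(y.device)
```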

[From Linear] What is the relationship between a neural network's weight initialization and its training performance?

<aside> Weight initialization should be designed around the activation function in use. For example, ReLU only activates for positive inputs, so Kaiming initialization is commonly used: it scales the initial weights to account for the fact that roughly half of ReLU units are inactive, keeping the variance of activations stable across layers.

By contrast, when using saturating activations such as sigmoid or tanh, the vanishing-gradient problem occurs more readily, so Xavier initialization is used to set the weights to an appropriate range.

</aside>
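The two schemes above can be sketched with PyTorch's built-in initializers (the expected standard deviations follow from the formulas: Kaiming normal uses `sqrt(2 / fan_in)`, Xavier normal uses `gain * sqrt(2 / (fan_in + fan_out))`):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Kaiming (He) init for a ReLU layer: variance scaled by fan_in with gain sqrt(2)
relu_layer = nn.Linear(128, 64)
nn.init.kaiming_normal_(relu_layer.weight, nonlinearity="relu")
print(relu_layer.weight.std())  # ≈ sqrt(2 / 128) ≈ 0.125

# Xavier (Glorot) init for a tanh layer: variance scaled by fan_in + fan_out
tanh_layer = nn.Linear(128, 64)
nn.init.xavier_normal_(tanh_layer.weight, gain=nn.init.calculate_gain("tanh"))
print(tanh_layer.weight.std())  # ≈ (5/3) * sqrt(2 / (128 + 64)) ≈ 0.170
```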