Steepest descent Method 총정리

Steepest descent algorithm은 기울기(gradient)를 따라 함수 값을 최소화하는 최적화 알고리즘입니다. 주어진 함수에서 현재 위치의 기울기 반대 방향으로 이동하며 함수 값이 가장 빠르게 감소하게 됩니다. 각 단계에서 step size에 따라 이동하며, 기울기가 0에 가까워질수록 최적점(optimal point)에 수렴합니다.

2. Steepest-Descent algorithm의 Stability

Step size $μ <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>μ</mi></math>$ 는 너무 작으면 최적점에 느리게 수렴하고, 너무 크면 발산할 수 있어 적절한 값을 선택하는 것이 중요합니다. 위 예시에서는 $c (n) <math xmlns="http://www.w3.org/1998/Math/MathML"><mtext mathvariant="bold">c</mtext><mo stretchy="false">(</mo><mi>n</mi><mo stretchy="false">)</mo></math>$ 이 원점에 있을 때가 가장 optimal하다고 가정합니다.

$v k (n) <math xmlns="http://www.w3.org/1998/Math/MathML"><msub><mi>v</mi><mi>k</mi></msub><mo stretchy="false">(</mo><mi>n</mi><mo stretchy="false">)</mo></math>$ 식을 geometric series로 표현하여 각 time step에서 값이 exponential하게 감소함을 나타낼 수 있습니다. 여기서 $μ <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>μ</mi></math>$ 가 0과 $2λmax<math xmlns="http://www.w3.org/1998/Math/MathML"><mfrac><mn>2</mn><msub><mi>λ</mi><mtext>max</mtext></msub></mfrac></math>$ 사이에 있어야 $v k (n) <math xmlns="http://www.w3.org/1998/Math/MathML"><msub><mi>v</mi><mi>k</mi></msub><mo stretchy="false">(</mo><mi>n</mi><mo stretchy="false">)</mo></math>$ 이 수렴할 수 있습니다. $v k (n) <math xmlns="http://www.w3.org/1998/Math/MathML"><msub><mi>v</mi><mi>k</mi></msub><mo stretchy="false">(</mo><mi>n</mi><mo stretchy="false">)</mo></math>$ 의 convergence와 관련된 시간 $τ k <math xmlns="http://www.w3.org/1998/Math/MathML"><msub><mi>τ</mi><mi>k</mi></msub></math>$ 를 정의하고, 이 식을 $J (n) <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>J</mi><mo stretchy="false">(</mo><mi>n</mi><mo stretchy="false">)</mo></math>$ 에 적용하면 MSE가 최소화될 때까지 걸리는 시간을 대략적으로 계산할 수 있습니다.

3. (Example) 2nd order real-valued AR process

$2 <math xmlns="http://www.w3.org/1998/Math/MathML"><mn>2</mn></math>$ 차 auto-regressive (AR) process이 위 식과 같을 때, 예시를 살펴보겠습니다. $ν (n) <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>ν</mi><mo stretchy="false">(</mo><mi>n</mi><mo stretchy="false">)</mo></math>$ 은 white gaussian noise이기 때문에, 현재 시점에만 영향을 주고 과거에는 영향을 미치지 않습니다.

2차 AR process에서 $a 1, a 2 <math xmlns="http://www.w3.org/1998/Math/MathML"><msub><mi>a</mi><mn>1</mn></msub><mo>,</mo><msub><mi>a</mi><mn>2</mn></msub></math>$ coefficient를 알고 싶을 때, 과거의 입력 신호를 순서대로 곱한 후 expectation을 취함으로써, 각 delay에 따른 autocorrelation을 구할 수 있습니다. 이를 통해 일련의 계산을 거쳐 $r (0) <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>r</mi><mo stretchy="false">(</mo><mn>0</mn><mo stretchy="false">)</mo></math>$ 을 AR coefficient인 $a 1 <math xmlns="http://www.w3.org/1998/Math/MathML"><msub><mi>a</mi><mn>1</mn></msub></math>$ 과 $a 2 <math xmlns="http://www.w3.org/1998/Math/MathML"><msub><mi>a</mi><mn>2</mn></msub></math>$ 에 대한 식으로 나타낼 수 있게 됩니다.

$R <math xmlns="http://www.w3.org/1998/Math/MathML"><mtext mathvariant="bold">R</mtext></math>$ 은 autocorrelation function $r (0) <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>r</mi><mo stretchy="false">(</mo><mn>0</mn><mo stretchy="false">)</mo></math>$ 과 $r (1) <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>r</mi><mo stretchy="false">(</mo><mn>1</mn><mo stretchy="false">)</mo></math>$ 을 기반으로 계산되며 $σ 2 u <math xmlns="http://www.w3.org/1998/Math/MathML"><msubsup><mi>σ</mi><mi>u</mi><mn>2</mn></msubsup></math>$ 는 input signal의 variance를 나타냅니다. autocorrelation matrix의 eigenvalue는 $R <math xmlns="http://www.w3.org/1998/Math/MathML"><mtext mathvariant="bold">R</mtext></math>$ 에서 determinant $det (R - λ I) <math xmlns="http://www.w3.org/1998/Math/MathML"><mtext>det</mtext><mo stretchy="false">(</mo><mtext mathvariant="bold">R</mtext><mo>-</mo><mi>λ</mi><mtext mathvariant="bold">I</mtext><mo stretchy="false">)</mo></math>$ 식을 통해 구합니다. eigenvalue spread는 두 eigenvalue 간 크기 비율을 나타내는 지표입니다. Error function $J (n) <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>J</mi><mo stretchy="false">(</mo><mi>n</mi><mo stretchy="false">)</mo></math>$ 은 타원 형태의 error 곡선으로 나타낼 수 있는데 여기서 $v 1 <math xmlns="http://www.w3.org/1998/Math/MathML"><msub><mi>v</mi><mn>1</mn></msub></math>$ 과 $v 2 <math xmlns="http://www.w3.org/1998/Math/MathML"><msub><mi>v</mi><mn>2</mn></msub></math>$ 는 각각 eigenvector에 따른 좌표 성분을 나타냅니다. 각 축은 eigenvalue에 비례하기 때문에 그에 따라 error가 다른 속도로 감소합니다.

desired signal에 optimal filter output을 뺐을 때 white noise power만 남게 됩니다. AR coefficient $a 1 <math xmlns="http://www.w3.org/1998/Math/MathML"><msub><mi>a</mi><mn>1</mn></msub></math>$ 과 $a 2 <math xmlns="http://www.w3.org/1998/Math/MathML"><msub><mi>a</mi><mn>2</mn></msub></math>$ 를 이용해 $J m i n <math xmlns="http://www.w3.org/1998/Math/MathML"><msub><mi>J</mi><mrow data-mjx-texclass="ORD"><mi>m</mi><mi>i</mi><mi>n</mi></mrow></msub></math>$ 값을 나타낼 수 있습니다. $J m i n <math xmlns="http://www.w3.org/1998/Math/MathML"><msub><mi>J</mi><mrow data-mjx-texclass="ORD"><mi>m</mi><mi>i</mi><mi>n</mi></mrow></msub></math>$ 은 eigenvalue spread $χ <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>χ</mi></math>$ 값이 증가할수록 감소합니다. $v k (n) <math xmlns="http://www.w3.org/1998/Math/MathML"><msub><mi>v</mi><mi>k</mi></msub><mo stretchy="false">(</mo><mi>n</mi><mo stretchy="false">)</mo></math>$ 은 시간에 따라 filter weight가 어떻게 shift되고 rotate 되는지 보여주는 업데이트 식으로, $n = 0 <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>n</mi><mo>=</mo><mn>0</mn></math>$ 을 대입하면 초기 상태 값을 알 수 있습니다.

$J (n) <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>J</mi><mo stretchy="false">(</mo><mi>n</mi><mo stretchy="false">)</mo></math>$ 은 고정된 상태로, 모든 $J (n) <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>J</mi><mo stretchy="false">(</mo><mi>n</mi><mo stretchy="false">)</mo></math>$ 값을 average하여 최적값에 수렴하는 선 (convergence trajectory)을 그려주었습니다. eigenvalue spread 값이 작을수록 원에 가까운 모양을 가지고 여기에 더해 initial 값이 장축이나 단축에 위치한다면 거의 직선 경로로 빠르게 수렴하는 것을 볼 수 있습니다. 반대로, eigenvalue spread 값이 클 때, 찌그러진 타원 모양에 $v n <math xmlns="http://www.w3.org/1998/Math/MathML"><msub><mi>v</mi><mi>n</mi></msub></math>$ 도 거의 움직이지 않아 수렴 속도가 매우 느려진 것을 관찰할 수 있습니다.

시간에 따른 $J (n) <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>J</mi><mo stretchy="false">(</mo><mi>n</mi><mo stretchy="false">)</mo></math>$ 값을 나타낸 것인데, eigenvalue spread가 클수록 input process가 특정 성분에 치우져져 있어 상대적으로 correlation이 더 커지는 경향이 있으며, 예측은 잘되지만 수렴 속도가 느린 걸 확인할 수 있습니다.

overdamped란 최적점에 도달하는데 시간이 오래 걸리지만 안정적으로 수렴하는 상태를 말합니다. underdamped는 상대적으로 weight 변화량이 더 커서 수렴 속도가 빠르지만, learning rate가 필요 이상으로 커지면 발산합니다. 어느 것이 더 빠른지는 그림만 보고는 알 수 없는 문제로, 수렴 속도와 예측 정확도를 고려하여 적절한 learning rate 값을 설정해주는 것이 중요합니다.

GIST 신종원 교수님 '적응신호처리' 수업 자료를 바탕으로 쓴 글입니다.

'연구 노트 > 적응신호처리' 카테고리의 다른 글

Least-Mean-Square Adaptive Filter (LMS) 알아보기 (1)	2024.10.24
Gradient Descent를 위한 최적화 알고리즘 (0)	2024.10.23
Linear Prediction 바로 알기 (0)	2024.10.21
Wiener filter 총정리 (0)	2024.10.19
NLMS filter의 Stability (0)	2024.10.18

내 블로그 - 관리자 홈 전환	`Q` `Q`
새 글 쓰기	`W` `W`

글 수정 (권한 있는 경우)	`E` `E`
댓글 영역으로 이동	`C` `C`

이 페이지의 URL 복사	`S` `S`
맨 위로 이동	`T` `T`
티스토리 홈 이동	`H` `H`
단축키 안내	`Shift` + `/` `⇧` + `/`

Sunny Archive ☀️

Steepest descent Method 총정리

목차

1. Steepest descent algorithm이란?

2. Steepest-Descent algorithm의 Stability

3. (Example) 2nd order real-valued AR process

'연구 노트 > 적응신호처리' 카테고리의 다른 글

티스토리툴바

단축키

내 블로그

블로그 게시글

모든 영역

Steepest descent Method 총정리

목차

1. Steepest descent algorithm이란?

2. Steepest-Descent algorithm의 Stability

3. (Example) 2nd order real-valued AR process

'연구 노트 > 적응신호처리' 카테고리의 다른 글

관련글

티스토리툴바

단축키

내 블로그

블로그 게시글

모든 영역