算法
A t = argmax i ( μ ^ i ( t − 1 ) + 2 log f ( t ) T i ( t − 1 ) ) A_t=\operatorname{argmax}_i\left(\hat{\mu}_i(t-1)+\sqrt{\frac{2 \log f(t)}{T_i(t-1)}}\right)
A t = a r g m a x i ( μ ^ i ( t − 1 ) + T i ( t − 1 ) 2 log f ( t ) )
即把UCB算法中的 1 / δ 1/\delta 1 / δ 改写成 f ( t ) = 1 + t log 2 ( t ) f(t)=1+t \log ^{2}(t) f ( t ) = 1 + t log 2 ( t )
引理一
对于 X i − μ X_{i}-\mu X i − μ 服从 1 − subgaussian 1-\text{subgaussian} 1 − subgaussian 分布,μ ^ t = 1 t ∑ s = 1 t X s \hat{\mu}_{t}=\frac{1}{t} \sum_{s=1}^{t} X_{s} μ ^ t = t 1 ∑ s = 1 t X s ,我们引入
κ = ∑ t = 1 n I { μ ^ t + 2 a t ⩾ ε } \kappa=\sum_{t=1}^n \mathbb{I}\left\{\hat{\mu}_t+\sqrt{\frac{2 a}{t}} \geqslant \varepsilon\right\}
κ = t = 1 ∑ n I { μ ^ t + t 2 a ⩾ ε }
以及
κ ′ = u + ∑ t = ⌈ u ⌉ n I { μ ^ t + 2 a t ⩾ ε } , u = 2 a ε − 2 \kappa^{\prime}=u+\sum_{t=\lceil u\rceil}^n \mathbb{I}\left\{\hat{\mu}_t+\sqrt{\frac{2 a}{t}} \geqslant \varepsilon\right\}, \quad u=2 a \varepsilon^{-2}
κ ′ = u + t = ⌈ u ⌉ ∑ n I { μ ^ t + t 2 a ⩾ ε } , u = 2 a ε − 2
有
E [ κ ] ⩽ E [ κ ′ ] ⩽ 1 + 2 ε 2 ( a + π a + 1 ) \mathbb{E}[\kappa] \leqslant \mathbb{E}\left[\kappa^{\prime}\right] \leqslant 1+\frac{2}{\varepsilon^2}(a+\sqrt{\pi a}+1)
E [ κ ] ⩽ E [ κ ′ ] ⩽ 1 + ε 2 2 ( a + π a + 1 )
一方面,
假设 t ⩽ u t\leqslant u t ⩽ u 时,I { μ ^ t + 2 a t ⩾ ε } \mathbb{I}\left\{\hat{\mu}_{t}+\sqrt{\frac{2 a}{t}} \geqslant \varepsilon\right\} I { μ ^ t + t 2 a ⩾ ε } 所代表的事件发生,当 t > u t>u t > u 时,要满足 μ ^ t + 2 a t < ε \hat{\mu}_{t}+\sqrt{\frac{2 a}{t}} < \varepsilon μ ^ t + t 2 a < ε
,故 u = 2 a ε − 2 u=2a\varepsilon^{-2} u = 2 a ε − 2 ,可以得出 E [ κ ] ⩽ E [ κ ′ ] \mathbb{E}[\kappa] \leqslant \mathbb{E}\left[\kappa^{\prime}\right] E [ κ ] ⩽ E [ κ ′ ] .
另一方面,
E [ κ ] ⩽ E [ κ ′ ] = u + ∑ t = ⌈ u ⌉ n P ( μ ^ t + 2 a t ⩾ ε ) ⩽ u + ∑ t = ⌈ u ⌉ n exp ( − t ( ε − 2 a t ) 2 2 ) ⩽ 1 + u + ∫ u ∞ exp ( − t ( ε − 2 a t ) 2 2 ) d t = 1 + 2 ε 2 ( a + π a + 1 ) \begin{aligned}
\mathbb{E}[\kappa] & \leqslant \mathbb{E}\left[\kappa^{\prime}\right]=u+\sum_{t=\lceil u\rceil}^n \mathbb{P}\left(\hat{\mu}_t+\sqrt{\frac{2 a}{t}} \geqslant \varepsilon\right) \\
& \leqslant u+\sum_{t=\lceil u\rceil}^n \exp \left(-\frac{t\left(\varepsilon-\sqrt{\frac{2 a}{t}}\right)^2}{2}\right) \\
& \leqslant 1+u+\int_u^{\infty} \exp \left(-\frac{t\left(\varepsilon-\sqrt{\frac{2 a}{t}}\right)^2}{2}\right) d t=1+\frac{2}{\varepsilon^2}(a+\sqrt{\pi a}+1)
\end{aligned}
E [ κ ] ⩽ E [ κ ′ ] = u + t = ⌈ u ⌉ ∑ n P ( μ ^ t + t 2 a ⩾ ε ) ⩽ u + t = ⌈ u ⌉ ∑ n exp ⎝ ⎜ ⎛ − 2 t ( ε − t 2 a ) 2 ⎠ ⎟ ⎞ ⩽ 1 + u + ∫ u ∞ exp ⎝ ⎜ ⎛ − 2 t ( ε − t 2 a ) 2 ⎠ ⎟ ⎞ d t = 1 + ε 2 2 ( a + π a + 1 )
倒数第二步利用ETC算法引理三,最后一步转化成积分令 s = ε t − 2 a s=\varepsilon \sqrt{t}-\sqrt{2 a} s = ε t − 2 a 可以计算得出
Regret分析
R n ⩽ ∑ i : Δ i > 0 inf ε ∈ ( 0 , Δ i ) Δ i ( 1 + 5 ε 2 + 2 ( log f ( n ) + π log f ( n ) + 1 ) ( Δ i − ε ) 2 ) R_n \leqslant \sum_{i: \Delta_i>0} \inf _{\varepsilon \in\left(0, \Delta_i\right)} \Delta_i\left(1+\frac{5}{\varepsilon^2}+\frac{2(\log f(n)+\sqrt{\pi \log f(n)}+1)}{\left(\Delta_i-\varepsilon\right)^2}\right)
R n ⩽ i : Δ i > 0 ∑ ε ∈ ( 0 , Δ i ) inf Δ i ( 1 + ε 2 5 + ( Δ i − ε ) 2 2 ( log f ( n ) + π log f ( n ) + 1 ) )
根据 A = ( A ∩ B ) ∪ ( A ∩ B ˉ ) A=(A\cap B)\cup (A\cap\bar{B}) A = ( A ∩ B ) ∪ ( A ∩ B ˉ )
T i ( n ) = ∑ t = 1 n I { A t = i } ⩽ ∑ t = 1 n I { μ ^ 1 ( t − 1 ) + 2 log f ( t ) T 1 ( t − 1 ) ⩽ μ 1 − ε } + ∑ t = 1 n I { μ ^ i ( t − 1 ) + 2 log f ( t ) T i ( t − 1 ) ⩾ μ 1 − ε and A t = i } \begin{aligned}
T_i(n)=\sum_{t=1}^n \mathbb{I}\left\{A_t=\right. & i\} \leqslant \sum_{t=1}^n \mathbb{I}\left\{\hat{\mu}_1(t-1)+\sqrt{\frac{2 \log f(t)}{T_1(t-1)}} \leqslant \mu_1-\varepsilon\right\} \\
& +\sum_{t=1}^n \mathbb{I}\left\{\hat{\mu}_i(t-1)+\sqrt{\frac{2 \log f(t)}{T_i(t-1)}} \geqslant \mu_1-\varepsilon \text { and } A_t=i\right\}
\end{aligned}
T i ( n ) = t = 1 ∑ n I { A t = i } ⩽ t = 1 ∑ n I { μ ^ 1 ( t − 1 ) + T 1 ( t − 1 ) 2 log f ( t ) ⩽ μ 1 − ε } + t = 1 ∑ n I { μ ^ i ( t − 1 ) + T i ( t − 1 ) 2 log f ( t ) ⩾ μ 1 − ε and A t = i }
首先考虑第一项
E [ ∑ t = 1 n I { μ ^ 1 ( t − 1 ) + 2 log f ( t ) T 1 ( t − 1 ) ⩽ μ 1 − ε } ] ⩽ ∑ t = 1 n ∑ s = 1 n P ( μ ^ 1 s + 2 log f ( t ) s ⩽ μ 1 − ε ) ( s = T 1 ( t − 1 ) ) ⩽ ∑ t = 1 n ∑ s = 1 n exp ( − s ( 2 log f ( t ) s + ε ) 2 2 ) ⩽ ∑ t = 1 n 1 f ( t ) ∑ s = 1 n exp ( − s ε 2 2 ) ⩽ 5 ε 2 \begin{aligned}
& \mathbb{E}\left[\sum_{t=1}^n \mathbb{I}\left\{\hat{\mu}_1(t-1)+\sqrt{\frac{2 \log f(t)}{T_1(t-1)}} \leqslant \mu_1-\varepsilon\right\}\right] \\
& \leqslant \sum_{t=1}^n \sum_{s=1}^n \mathbb{P}\left(\hat{\mu}_{1 s}+\sqrt{\frac{2 \log f(t)}{s}} \leqslant \mu_1-\varepsilon\right) \quad\left(s=T_1(t-1)\right) \\
& \leqslant \sum_{t=1}^n \sum_{s=1}^n \exp \left(-\frac{s\left(\sqrt{\frac{2 \log f(t)}{s}}+\varepsilon\right)^2}{2}\right) \\
& \leqslant \sum_{t=1}^n \frac{1}{f(t)} \sum_{s=1}^n \exp \left(-\frac{s \varepsilon^2}{2}\right) \\
& \leqslant \frac{5}{\varepsilon^2}
\end{aligned}
E [ t = 1 ∑ n I { μ ^ 1 ( t − 1 ) + T 1 ( t − 1 ) 2 log f ( t ) ⩽ μ 1 − ε } ] ⩽ t = 1 ∑ n s = 1 ∑ n P ( μ ^ 1 s + s 2 log f ( t ) ⩽ μ 1 − ε ) ( s = T 1 ( t − 1 ) ) ⩽ t = 1 ∑ n s = 1 ∑ n exp ⎝ ⎜ ⎜ ⎜ ⎛ − 2 s ( s 2 log f ( t ) + ε ) 2 ⎠ ⎟ ⎟ ⎟ ⎞ ⩽ t = 1 ∑ n f ( t ) 1 s = 1 ∑ n exp ( − 2 s ε 2 ) ⩽ ε 2 5
倒数第三步利用ETC算法引理三,倒数第二步可以拆出一个 1 / f ( t ) 1/f(t) 1 / f ( t ) 和 exp ( − s ε 2 2 ) \exp \left(-\frac{s \varepsilon^{2}}{2}\right) exp ( − 2 s ε 2 )
对于最后一步
∑ t = 1 n 1 f ( t ) ∑ s = 1 n exp ( − s ε 2 2 ) ⩽ ∑ t = 1 n 1 f ( t ) 2 ε 2 = 2 ε 2 ∑ t = 1 n 1 t log 2 t + 1 ⩽ 2 ε 2 ( 1 + ∑ t = 1 n 1 t log 2 t ) ⩽ 2 ε 2 ( 1 + 1 + ∫ 2 n 1 t log 2 t d t ) ⩽ 5 ε 2 \begin{aligned}
& \sum_{t=1}^n \frac{1}{f(t)} \sum_{s=1}^n \exp \left(-\frac{s \varepsilon^2}{2}\right) \\
& \leqslant \sum_{t=1}^n \frac{1}{f(t)} \frac{2}{\varepsilon^2}=\frac{2}{\varepsilon^2} \sum_{t=1}^n \frac{1}{t \log ^2 t+1} \\
& \leqslant \frac{2}{\varepsilon^2}\left(1+\sum_{t=1}^n \frac{1}{t \log ^2 t}\right) \\
& \leqslant \frac{2}{\varepsilon^2}\left(1+1+\int_2^n \frac{1}{t \log ^2 t} d t\right) \\
& \leqslant \frac{5}{\varepsilon^2}
\end{aligned}
t = 1 ∑ n f ( t ) 1 s = 1 ∑ n exp ( − 2 s ε 2 ) ⩽ t = 1 ∑ n f ( t ) 1 ε 2 2 = ε 2 2 t = 1 ∑ n t log 2 t + 1 1 ⩽ ε 2 2 ( 1 + t = 1 ∑ n t log 2 t 1 ) ⩽ ε 2 2 ( 1 + 1 + ∫ 2 n t log 2 t 1 d t ) ⩽ ε 2 5
再考虑第二项
E [ ∑ t = 1 n I { μ ^ i ( t − 1 ) + 2 log f ( t ) T i ( t − 1 ) ⩾ μ 1 − ε and A t = i } ] ⩽ E [ ∑ t = 1 n I { μ ^ i ( t − 1 ) + 2 log f ( n ) T i ( t − 1 ) ⩾ μ 1 − ε and A t = i } ] ⩽ E [ ∑ s = 1 n I { μ ^ i s + 2 log f ( n ) s ⩾ μ 1 − ε } ] = E [ ∑ s = 1 n I { μ ^ i s − μ i + 2 log f ( n ) s ⩾ Δ i − ε } ] ⩽ 1 + 2 ( Δ i − ε ) 2 ( log f ( n ) + π log f ( n ) + 1 ) \begin{aligned}
& \mathbb{E}\left[\sum_{t=1}^n \mathbb{I}\left\{\hat{\mu}_i(t-1)+\sqrt{\frac{2 \log f(t)}{T_i(t-1)}} \geqslant \mu_1-\varepsilon \text { and } A_t=i\right\}\right] \\
& \leqslant \mathbb{E}\left[\sum_{t=1}^n \mathbb{I}\left\{\hat{\mu}_i(t-1)+\sqrt{\frac{2 \log f(n)}{T_i(t-1)}} \geqslant \mu_1-\varepsilon \text { and } A_t=i\right\}\right] \\
& \leqslant \mathbb{E}\left[\sum_{s=1}^n \mathbb{I}\left\{\hat{\mu}_{i s}+\sqrt{\frac{2 \log f(n)}{s}} \geqslant \mu_1-\varepsilon\right\}\right] \\
& =\mathbb{E}\left[\sum_{s=1}^n \mathbb{I}\left\{\hat{\mu}_{i s}-\mu_i+\sqrt{\frac{2 \log f(n)}{s}} \geqslant \Delta_i-\varepsilon\right\}\right] \\
& \leqslant 1+\frac{2}{\left(\Delta_i-\varepsilon\right)^2}(\log f(n)+\sqrt{\pi \log f(n)}+1)
\end{aligned}
E [ t = 1 ∑ n I { μ ^ i ( t − 1 ) + T i ( t − 1 ) 2 log f ( t ) ⩾ μ 1 − ε and A t = i } ] ⩽ E [ t = 1 ∑ n I { μ ^ i ( t − 1 ) + T i ( t − 1 ) 2 log f ( n ) ⩾ μ 1 − ε and A t = i } ] ⩽ E [ s = 1 ∑ n I { μ ^ i s + s 2 log f ( n ) ⩾ μ 1 − ε } ] = E [ s = 1 ∑ n I { μ ^ i s − μ i + s 2 log f ( n ) ⩾ Δ i − ε } ] ⩽ 1 + ( Δ i − ε ) 2 2 ( log f ( n ) + π log f ( n ) + 1 )
将两项结合在一起便可以得出结论
如果对Regret上界取极限可得
lim sup n → ∞ R n log ( n ) ⩽ ∑ i : Δ i > 0 2 Δ i \limsup _{n \rightarrow \infty} \frac{R_{n}}{\log (n)} \leqslant \sum_{i: \Delta_{i}>0} \frac{2}{\Delta_{i}}
n → ∞ l i m s u p log ( n ) R n ⩽ i : Δ i > 0 ∑ Δ i 2
运行结果
与ETC和UCB算法的对比