AsyOpt UCB

算法

At=argmaxi(μ^i(t1)+2logf(t)Ti(t1))A_t=\operatorname{argmax}_i\left(\hat{\mu}_i(t-1)+\sqrt{\frac{2 \log f(t)}{T_i(t-1)}}\right)

即把UCB算法中的 1/δ1/\delta改写成 f(t)=1+tlog2(t)f(t)=1+t \log ^{2}(t)

引理一

对于 XiμX_{i}-\mu服从 1subgaussian1-\text{subgaussian}分布,μ^t=1ts=1tXs\hat{\mu}_{t}=\frac{1}{t} \sum_{s=1}^{t} X_{s},我们引入

κ=t=1nI{μ^t+2atε}\kappa=\sum_{t=1}^n \mathbb{I}\left\{\hat{\mu}_t+\sqrt{\frac{2 a}{t}} \geqslant \varepsilon\right\}

以及

κ=u+t=unI{μ^t+2atε},u=2aε2\kappa^{\prime}=u+\sum_{t=\lceil u\rceil}^n \mathbb{I}\left\{\hat{\mu}_t+\sqrt{\frac{2 a}{t}} \geqslant \varepsilon\right\}, \quad u=2 a \varepsilon^{-2}

E[κ]E[κ]1+2ε2(a+πa+1)\mathbb{E}[\kappa] \leqslant \mathbb{E}\left[\kappa^{\prime}\right] \leqslant 1+\frac{2}{\varepsilon^2}(a+\sqrt{\pi a}+1)

一方面,

假设 tut\leqslant u时,I{μ^t+2atε}\mathbb{I}\left\{\hat{\mu}_{t}+\sqrt{\frac{2 a}{t}} \geqslant \varepsilon\right\}所代表的事件发生,当 t>ut>u时,要满足 μ^t+2at<ε\hat{\mu}_{t}+\sqrt{\frac{2 a}{t}} < \varepsilon

,故 u=2aε2u=2a\varepsilon^{-2},可以得出 E[κ]E[κ]\mathbb{E}[\kappa] \leqslant \mathbb{E}\left[\kappa^{\prime}\right].

另一方面,

E[κ]E[κ]=u+t=unP(μ^t+2atε)u+t=unexp(t(ε2at)22)1+u+uexp(t(ε2at)22)dt=1+2ε2(a+πa+1)\begin{aligned} \mathbb{E}[\kappa] & \leqslant \mathbb{E}\left[\kappa^{\prime}\right]=u+\sum_{t=\lceil u\rceil}^n \mathbb{P}\left(\hat{\mu}_t+\sqrt{\frac{2 a}{t}} \geqslant \varepsilon\right) \\ & \leqslant u+\sum_{t=\lceil u\rceil}^n \exp \left(-\frac{t\left(\varepsilon-\sqrt{\frac{2 a}{t}}\right)^2}{2}\right) \\ & \leqslant 1+u+\int_u^{\infty} \exp \left(-\frac{t\left(\varepsilon-\sqrt{\frac{2 a}{t}}\right)^2}{2}\right) d t=1+\frac{2}{\varepsilon^2}(a+\sqrt{\pi a}+1) \end{aligned}

倒数第二步利用ETC算法引理三,最后一步转化成积分令 s=εt2as=\varepsilon \sqrt{t}-\sqrt{2 a}可以计算得出

Regret分析

Rni:Δi>0infε(0,Δi)Δi(1+5ε2+2(logf(n)+πlogf(n)+1)(Δiε)2)R_n \leqslant \sum_{i: \Delta_i>0} \inf _{\varepsilon \in\left(0, \Delta_i\right)} \Delta_i\left(1+\frac{5}{\varepsilon^2}+\frac{2(\log f(n)+\sqrt{\pi \log f(n)}+1)}{\left(\Delta_i-\varepsilon\right)^2}\right)

根据 A=(AB)(ABˉ)A=(A\cap B)\cup (A\cap\bar{B})

Ti(n)=t=1nI{At=i}t=1nI{μ^1(t1)+2logf(t)T1(t1)μ1ε}+t=1nI{μ^i(t1)+2logf(t)Ti(t1)μ1ε and At=i}\begin{aligned} T_i(n)=\sum_{t=1}^n \mathbb{I}\left\{A_t=\right. & i\} \leqslant \sum_{t=1}^n \mathbb{I}\left\{\hat{\mu}_1(t-1)+\sqrt{\frac{2 \log f(t)}{T_1(t-1)}} \leqslant \mu_1-\varepsilon\right\} \\ & +\sum_{t=1}^n \mathbb{I}\left\{\hat{\mu}_i(t-1)+\sqrt{\frac{2 \log f(t)}{T_i(t-1)}} \geqslant \mu_1-\varepsilon \text { and } A_t=i\right\} \end{aligned}

首先考虑第一项

E[t=1nI{μ^1(t1)+2logf(t)T1(t1)μ1ε}]t=1ns=1nP(μ^1s+2logf(t)sμ1ε)(s=T1(t1))t=1ns=1nexp(s(2logf(t)s+ε)22)t=1n1f(t)s=1nexp(sε22)5ε2\begin{aligned} & \mathbb{E}\left[\sum_{t=1}^n \mathbb{I}\left\{\hat{\mu}_1(t-1)+\sqrt{\frac{2 \log f(t)}{T_1(t-1)}} \leqslant \mu_1-\varepsilon\right\}\right] \\ & \leqslant \sum_{t=1}^n \sum_{s=1}^n \mathbb{P}\left(\hat{\mu}_{1 s}+\sqrt{\frac{2 \log f(t)}{s}} \leqslant \mu_1-\varepsilon\right) \quad\left(s=T_1(t-1)\right) \\ & \leqslant \sum_{t=1}^n \sum_{s=1}^n \exp \left(-\frac{s\left(\sqrt{\frac{2 \log f(t)}{s}}+\varepsilon\right)^2}{2}\right) \\ & \leqslant \sum_{t=1}^n \frac{1}{f(t)} \sum_{s=1}^n \exp \left(-\frac{s \varepsilon^2}{2}\right) \\ & \leqslant \frac{5}{\varepsilon^2} \end{aligned}

倒数第三步利用ETC算法引理三,倒数第二步可以拆出一个 1/f(t)1/f(t)exp(sε22)\exp \left(-\frac{s \varepsilon^{2}}{2}\right)

对于最后一步

t=1n1f(t)s=1nexp(sε22)t=1n1f(t)2ε2=2ε2t=1n1tlog2t+12ε2(1+t=1n1tlog2t)2ε2(1+1+2n1tlog2tdt)5ε2\begin{aligned} & \sum_{t=1}^n \frac{1}{f(t)} \sum_{s=1}^n \exp \left(-\frac{s \varepsilon^2}{2}\right) \\ & \leqslant \sum_{t=1}^n \frac{1}{f(t)} \frac{2}{\varepsilon^2}=\frac{2}{\varepsilon^2} \sum_{t=1}^n \frac{1}{t \log ^2 t+1} \\ & \leqslant \frac{2}{\varepsilon^2}\left(1+\sum_{t=1}^n \frac{1}{t \log ^2 t}\right) \\ & \leqslant \frac{2}{\varepsilon^2}\left(1+1+\int_2^n \frac{1}{t \log ^2 t} d t\right) \\ & \leqslant \frac{5}{\varepsilon^2} \end{aligned}

再考虑第二项

E[t=1nI{μ^i(t1)+2logf(t)Ti(t1)μ1ε and At=i}]E[t=1nI{μ^i(t1)+2logf(n)Ti(t1)μ1ε and At=i}]E[s=1nI{μ^is+2logf(n)sμ1ε}]=E[s=1nI{μ^isμi+2logf(n)sΔiε}]1+2(Δiε)2(logf(n)+πlogf(n)+1)\begin{aligned} & \mathbb{E}\left[\sum_{t=1}^n \mathbb{I}\left\{\hat{\mu}_i(t-1)+\sqrt{\frac{2 \log f(t)}{T_i(t-1)}} \geqslant \mu_1-\varepsilon \text { and } A_t=i\right\}\right] \\ & \leqslant \mathbb{E}\left[\sum_{t=1}^n \mathbb{I}\left\{\hat{\mu}_i(t-1)+\sqrt{\frac{2 \log f(n)}{T_i(t-1)}} \geqslant \mu_1-\varepsilon \text { and } A_t=i\right\}\right] \\ & \leqslant \mathbb{E}\left[\sum_{s=1}^n \mathbb{I}\left\{\hat{\mu}_{i s}+\sqrt{\frac{2 \log f(n)}{s}} \geqslant \mu_1-\varepsilon\right\}\right] \\ & =\mathbb{E}\left[\sum_{s=1}^n \mathbb{I}\left\{\hat{\mu}_{i s}-\mu_i+\sqrt{\frac{2 \log f(n)}{s}} \geqslant \Delta_i-\varepsilon\right\}\right] \\ & \leqslant 1+\frac{2}{\left(\Delta_i-\varepsilon\right)^2}(\log f(n)+\sqrt{\pi \log f(n)}+1) \end{aligned}

将两项结合在一起便可以得出结论

如果对Regret上界取极限可得

lim supnRnlog(n)i:Δi>02Δi\limsup _{n \rightarrow \infty} \frac{R_{n}}{\log (n)} \leqslant \sum_{i: \Delta_{i}>0} \frac{2}{\Delta_{i}}

运行结果

与ETC和UCB算法的对比