Yudai Yaguchi Games



🎰 K-armed Bandit

Each arm hides a payout probability. How do you balance exploring against exploiting? This is the classic problem behind Thompson Sampling, UCB1, and ε-greedy.

Arms (K) / Pull limit (T)


Keys: 1–K pull the corresponding arm / R starts a new game

Progress

pulls left: 50 / 50
total reward: 0
optimal (theory): 0.0
cumulative regret: 0.00
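One plausible reading of the last two counters, assuming arm $a_s$ is pulled at step $s$ and $p^*$ is the best arm's win probability (this is the standard textbook definition, not taken from the page's source): "optimal (theory)" is the expected reward of always pulling the best arm, and "cumulative regret" is the expected shortfall of the arms actually pulled.

```latex
p^* = \max_{1 \le i \le K} p_i, \qquad
\mathrm{optimal}(t) = t \, p^*, \qquad
\mathrm{regret}(t) = \sum_{s=1}^{t} \bigl( p^* - p_{a_s} \bigr)
```

For example, with $p^* = 0.7$, pulling an arm with $p = 0.4$ ten times adds $10 \times (0.7 - 0.4) = 3.0$ to the regret. Under this reading regret depends only on which arms were chosen, not on the random wins, which is why it can differ from optimal minus total reward.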

K-armed Bandit: each arm i has a hidden win probability p_i, and every turn you choose one arm to pull. Exploring gathers information (which arm is best?); exploiting pulls whichever arm currently looks best. Balancing the two is the whole game. The Simulation tab compares ε-greedy, UCB1, and Thompson Sampling.
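A comparison like the Simulation tab's can be approximated offline. The sketch below is not the page's code; it assumes Bernoulli rewards, ε = 0.1 for ε-greedy (with untried arms treated as +∞ so each gets tried), the standard UCB1 bonus sqrt(2 ln n / n_i), a uniform Beta(1, 1) prior for Thompson Sampling, and hypothetical arm probabilities. The helper names (`run`, `argmax`, `epsilon_greedy`, `ucb1`, `thompson`) are illustrative, and `run` reports expected regret, the sum of p* − p_{a_t}.

```python
import math
import random


def run(policy, probs, T, rng):
    """Play T pulls on Bernoulli arms and return cumulative expected regret."""
    best = max(probs)
    pulls = [0] * len(probs)  # n_i: times arm i has been pulled
    wins = [0] * len(probs)   # s_i: total reward collected from arm i
    regret = 0.0
    for _ in range(T):
        arm = policy(pulls, wins, rng)
        reward = 1 if rng.random() < probs[arm] else 0  # Bernoulli(p_arm) draw
        pulls[arm] += 1
        wins[arm] += reward
        regret += best - probs[arm]  # gap to the best arm, not the realized reward
    return regret


def argmax(values):
    """Index of the largest value (ties go to the lowest index)."""
    return max(range(len(values)), key=values.__getitem__)


def epsilon_greedy(eps=0.1):
    def choose(pulls, wins, rng):
        if rng.random() < eps:
            return rng.randrange(len(pulls))  # explore: uniform random arm
        # exploit: highest empirical mean; untried arms score +inf so each is tried
        return argmax([w / n if n else math.inf for w, n in zip(wins, pulls)])
    return choose


def ucb1(pulls, wins, rng):
    for i, n in enumerate(pulls):
        if n == 0:
            return i  # initialization: pull every arm once
    total = sum(pulls)  # plays so far
    return argmax([w / n + math.sqrt(2 * math.log(total) / n)
                   for w, n in zip(wins, pulls)])


def thompson(pulls, wins, rng):
    # draw from each arm's Beta(1 + successes, 1 + failures) posterior; pull the top draw
    return argmax([rng.betavariate(1 + w, 1 + n - w) for w, n in zip(wins, pulls)])


if __name__ == "__main__":
    probs = [0.2, 0.5, 0.7]  # hypothetical hidden p_i
    T, runs = 50, 200
    policies = [("eps-greedy", epsilon_greedy(0.1)), ("UCB1", ucb1), ("Thompson", thompson)]
    for name, policy in policies:
        avg = sum(run(policy, probs, T, random.Random(s)) for s in range(runs)) / runs
        print(f"{name:10s} mean regret over {runs} runs: {avg:.2f}")
```

Each policy sees only the pull counts and win counts, the same information a player has in the game; swapping in other probabilities, T, or ε shows how the explore/exploit balance shifts.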