LowHigh
0
Steps
1.0
Uncertainty
0
Reward
No
Goal
🧠 Belief State (Goal Location)
📜 Reasoning Trace
⚡ Policy Decomposition
π(z, a | history) = πreason(z | history) × πact(a | history, z)
π_reason (Thinking)
Analyzing observations...
π_act (Action)
Waiting...