Components

| Component | Architecture |
|---|---|
| Input Processing | Standard Embedding Layer |
| Low-Level Module | Encoder-Only Transformer Block |
| High-Level Module | Encoder-Only Transformer Block |
| Output Processing | Standard Linear Layer + Softmax |
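The table above can be wired up as a tiny pure-Python pipeline — a hedged sketch only, with toy stand-ins (the function names, the 2-d embedding table, and the `* 0.5` "block" are all illustrative, not from the paper):

```python
def embed(tokens):                      # Input Processing: embedding lookup
    table = {0: [0.0, 1.0], 1: [1.0, 0.0]}
    return [table[t] for t in tokens]

def transformer_block(states):          # stand-in for an encoder-only block
    # a real block applies self-attention + MLP; here we just scale
    return [[x * 0.5 for x in s] for s in states]

def output_head(states):                # Output Processing: linear + argmax
    return [max(range(len(s)), key=lambda i: s[i]) for s in states]

def forward(tokens):
    z = embed(tokens)
    z = transformer_block(z)            # Low-Level Module
    z = transformer_block(z)            # High-Level Module
    return output_head(z)

print(forward([0, 1]))                  # → [1, 0]
```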

Responsibility

  1. H-Module → directs the overall problem-solving strategy
  2. L-Module → executes the intensive search or refinement

  • forward residual? full self-attention?
  • Adaptive Computation Time mechanism → control over the H-Module and the L-Module
  • 'recurrent' = a cycle in the computation graph

Deep Supervision

  • break one long forward pass into smaller segments ← how?
    • take an optimization step per segment?
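One plausible reading of "smaller passes" — a hedged sketch, not the paper's implementation: run the recurrence as segments and attach a loss after each one, cutting gradients at segment boundaries. All names and the toy update rule here are hypothetical; with autograd you would call `state.detach()` where the comment marks it.

```python
def run_segment(state, steps=3):
    for _ in range(steps):              # one short chunk of the recurrence
        state = state * 0.9 + 0.1       # toy update, converges toward 1.0
    return state

def loss(state, target=1.0):
    return (state - target) ** 2

def deeply_supervised(state, num_segments=4):
    losses = []
    for _ in range(num_segments):
        state = run_segment(state)
        losses.append(loss(state))      # per-segment supervision signal
        # state = state.detach()  <- gradient cut between segments goes here
    return state, losses
```

Running `deeply_supervised(0.0)` yields four losses that shrink as the state approaches the target, so every segment gets its own training signal instead of one loss at the very end.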

Adaptive Computation Time (ACT)

  • How many segments should a module run for?
  • acts like an RL agent?
    • Q-Head
      1. Q-Halt: expected reward if we stop thinking now and give the current answer
      2. Q-Continue: expected reward if we spend more computation and run for another segment
    • Decision Rule: halt once Q-Halt > Q-Continue; otherwise run another segment
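The halting rule can be sketched as a loop over segments — a minimal illustration with stubbed Q-values (a real Q-head would be a learned layer over the hidden state; the numbers below are made up):

```python
def q_values(segment):
    q_halt = 0.2 * segment             # pretend confidence grows with compute
    q_continue = 0.5                   # flat stub for illustration
    return q_halt, q_continue

def run_with_act(max_segments=10):
    for segment in range(1, max_segments + 1):
        q_halt, q_continue = q_values(segment)
        if q_halt > q_continue:        # Decision Rule: Q-Halt > Q-Continue
            return segment             # stop thinking, emit current answer
    return max_segments                # hard cap on computation
```

With these stubs the model halts after 3 segments, the first point where `0.2 * segment` exceeds `0.5`.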

The Recurrent Operations

  1. Low-Level Module takes T steps (T is definitely learnable)
  2. High-Level Module takes 1 step
  3. Repeat
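The three steps above as a nested loop — a hedged sketch where `l_step`/`h_step` are toy scalar stand-ins for the two transformer blocks, not the actual updates:

```python
def l_step(z_l, z_h):
    return 0.5 * z_l + 0.5 * z_h       # L refines, conditioned on H's state

def h_step(z_h, z_l):
    return z_h + z_l                   # H integrates L's final result

def recurrent_forward(z_h=1.0, z_l=0.0, cycles=2, T=4):
    for _ in range(cycles):            # 3. repeat
        for _ in range(T):             # 1. L-module takes T steps
            z_l = l_step(z_l, z_h)
        z_h = h_step(z_h, z_l)         # 2. H-module takes 1 step
    return z_h
```

The key structural point: L iterates many times against a frozen H-state, and H updates only once per cycle — the "cycle in the graph" from the notes above.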

To Read

  1. Universal Transformer