Components
Component | Architecture |
---|---|
Input Processing | Standard Embedding Layer |
Low-Level Module | Encoder-Only Transformer Block |
High-Level Module | Encoder-Only Transformer Block |
Output Processing | Standard Linear Layer + Softmax |
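A minimal pure-Python sketch of the Input Processing and Output Processing rows, assuming a toy vocabulary size and hidden dimension. The names `make_embedding`, `embed`, `linear` and `softmax` are hypothetical, and the L-/H-Module stack in the middle is replaced by an identity so only the embedding lookup and the linear + softmax head are shown:

```python
import math
import random
from typing import List

Vector = List[float]

def make_embedding(vocab_size: int, dim: int, seed: int = 0) -> List[Vector]:
    # Hypothetical toy embedding table: one small random vector per token id
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(vocab_size)]

def embed(table: List[Vector], token_ids: List[int]) -> List[Vector]:
    # Input processing: a plain table lookup, one vector per input token
    return [table[t] for t in token_ids]

def linear(x: Vector, weight: List[Vector], bias: Vector) -> Vector:
    # Output processing, step 1: logits = W x + b, with weight shaped [out][in]
    return [sum(w_i * x_i for w_i, x_i in zip(row, x)) + b for row, b in zip(weight, bias)]

def softmax(logits: Vector) -> Vector:
    # Output processing, step 2: subtract the max for numerical stability
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

vocab_size, dim = 5, 4
table = make_embedding(vocab_size, dim)
# The L-/H-Module blocks would transform these vectors; identity here (assumption)
hidden = embed(table, [1, 3])
rng = random.Random(1)
W = [[rng.gauss(0.0, 0.5) for _ in range(dim)] for _ in range(vocab_size)]
b = [0.0] * vocab_size
probs = [softmax(linear(h, W, b)) for h in hidden]  # one distribution per position
```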
Responsibility
- H-Module → directs the overall problem-solving strategy
- L-Module → executes the intensive search or refinement
- Forward pass: residual connections? full self-attention
- Adaptive Computation Time (ACT) mechanism → control over the H-Module and L-Module
- “recurrent” = a cycle in the computation graph
Deep Supervision
- Break the full forward pass into smaller passes (segments) → how?
- Take an optimizer step per segment?
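A minimal sketch of one reading of deep supervision, assuming each segment takes the previous hidden state as input, gets its own loss and optimizer step, and passes its state forward detached (no gradient flows into earlier segments). The scalar model `forward_segment` and its parameter `w` are hypothetical toys standing in for a full HRM forward pass; only the outer segment loop is illustrated:

```python
from typing import List, Tuple

def forward_segment(z_prev: float, x: float, w: float) -> float:
    # Hypothetical toy "segment": new state from the previous state and the input
    return 0.5 * z_prev + w * x

def deep_supervision(
    x: float, y: float, w: float = 0.0, lr: float = 0.1, num_segments: int = 4
) -> Tuple[float, List[float]]:
    z = 0.0  # initial hidden state
    losses: List[float] = []
    for _ in range(num_segments):
        z_new = forward_segment(z, x, w)
        loss = (z_new - y) ** 2
        # Gradient w.r.t. w treats z as a constant ("detached"), so nothing is
        # backpropagated into earlier segments: d/dw (0.5*z + w*x - y)^2
        grad_w = 2.0 * (z_new - y) * x
        w -= lr * grad_w  # one optimizer step per segment
        losses.append(loss)
        z = z_new  # carry the state forward, without its gradient history
    return w, losses

final_w, losses = deep_supervision(x=1.0, y=1.0)
```

Each segment starts from a better state and a slightly updated `w`, so the per-segment loss shrinks even though no single backward pass spans more than one segment.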
Adaptive Computation Time (ACT)
- How many segments to run the module for?
- Acts as an RL agent?
- Q-Head
- Q-Halt: expected reward if we stop thinking now and give the current answer
- Q-Continue: expected reward if we spend more computation and run for another segment
- Decision rule: halt when Q-Halt > Q-Continue
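A sketch of the halting rule above plus generic one-step Q-learning targets for training the Q-Head. `run_with_act`, `q_head`, `max_segments`, `q_targets` and `toy_q_head` are hypothetical names; the reward (e.g. 1 for a correct answer at halt) and the bootstrap from the next segment's Q-values are standard Q-learning assumptions, not necessarily the exact targets used for HRM:

```python
from typing import Callable, Tuple

def run_with_act(q_head: Callable[[int], Tuple[float, float]], max_segments: int) -> int:
    """Return how many segments run before halting.

    q_head(m) is a hypothetical stand-in returning (q_halt, q_continue)
    computed from the model state after segment m.
    """
    for m in range(1, max_segments + 1):
        # (run segment m of the model here)
        q_halt, q_continue = q_head(m)
        if q_halt > q_continue:  # decision rule: stop thinking and answer now
            return m
    return max_segments  # assumed hard cap on the number of segments

def q_targets(
    halt_reward: float, next_q: Tuple[float, float], next_is_last: bool
) -> Tuple[float, float]:
    """Generic one-step Q-learning targets (assumption, not HRM's exact recipe).

    halt_reward: reward for answering now, e.g. 1.0 if the answer is correct.
    next_q: (q_halt, q_continue) predicted after the next segment.
    next_is_last: if the next segment hits the cap, it can only halt there.
    """
    next_halt, next_continue = next_q
    halt_target = halt_reward
    continue_target = next_halt if next_is_last else max(next_halt, next_continue)
    return halt_target, continue_target

def toy_q_head(m: int) -> Tuple[float, float]:
    # Hypothetical: value of halting grows as the answer improves each segment
    return 0.2 * m, 0.5

segments_used = run_with_act(toy_q_head, max_segments=8)
```

This is where the "RL agent" framing comes from: the Q-Head is a two-action value function (halt or continue) trained toward targets like these.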
The Recurrent Operations
- Low-Level Module takes T steps (T looks like a fixed hyperparameter, not learned)
- High-Level Module then takes 1 step
- Repeat for N cycles
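The recurrence above as a nested loop: a sketch assuming `n_cycles` cycles, each with `t_steps` L-Module steps followed by one H-Module step. `hrm_forward`, `f_l` and `f_h` are hypothetical; the two update functions are scalar stand-ins for the Transformer blocks, whereas real modules would update sequences of vectors:

```python
from typing import Callable, Tuple

def hrm_forward(
    x: float,
    z_h: float,
    z_l: float,
    f_l: Callable[[float, float, float], float],
    f_h: Callable[[float, float], float],
    n_cycles: int,
    t_steps: int,
) -> Tuple[float, float, int, int]:
    l_updates = 0
    h_updates = 0
    for _ in range(n_cycles):
        # Fast loop: the L-Module refines its state T times under a fixed H state
        for _ in range(t_steps):
            z_l = f_l(z_l, z_h, x)
            l_updates += 1
        # Slow update: the H-Module reads the L-Module's final state once per cycle
        z_h = f_h(z_h, z_l)
        h_updates += 1
    return z_h, z_l, l_updates, h_updates

def f_l(z_l: float, z_h: float, x: float) -> float:
    # Hypothetical toy L update: average of own state, H context and input
    return (z_l + z_h + x) / 3.0

def f_h(z_h: float, z_l: float) -> float:
    # Hypothetical toy H update: average of own state and the L result
    return (z_h + z_l) / 2.0

z_h, z_l, l_updates, h_updates = hrm_forward(
    x=1.0, z_h=0.0, z_l=0.0, f_l=f_l, f_h=f_h, n_cycles=2, t_steps=3
)
```

With `n_cycles=2` and `t_steps=3` the L-Module updates 6 times and the H-Module 2 times; the H state only changes between cycles, which is what makes it the slow, strategy-level state.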
To Read
- Universal Transformer