🪴 Berwin Gan
Tag: transformer
3 items with this tag.
Aug 26, 2025
FlexOlmo: Open Language Models for Flexible Data Use
MoE
transformer
ScaleML
Aug 08, 2025
How Attention Sinks Keep Language Models Stable
streaming
transformer
Jun 15, 2025
REOrdering Patches Improves Vision Models
vision
transformer