🪴 Berwin Gan


    Reinforcement Learning

    Folder: Study-Room/Machine-Learning-🤖/Research/Reinforcement-Learning

    2 items under this folder.

    • Jul 01, 2025
      Asymmetric REINFORCE for off-Policy Reinforcement Learning: Balancing positive and negative rewards

    • Jul 01, 2025
      PPO vs EPO


