# Reinforcement Learning Graphical Representations

This repository contains a full set of 230 visualizations representing foundational concepts, algorithms, and advanced topics in Reinforcement Learning.

| Category | Component | Illustration | Details | Context |
|---|---|---|---|---|
| MDP & Environment | Agent-Environment Interaction Loop | Illustration | Core cycle: observation of state → selection of action → environment transition → receipt of reward + next state | All RL algorithms |
| MDP & Environment | Markov Decision Process (MDP) Tuple | Illustration | (S, A, P, R, γ) with transition dynamics P(s′\|s,a) and reward function R(s,a,s′) | |
| MDP & Environment | State Transition Graph | Illustration | Full probabilistic transitions between discrete states | Gridworld, Taxi, Cliff Walking |
| MDP & Environment | Trajectory / Episode Sequence | Illustration | Sequence of (s₀, a₀, r₁, s₁, …, s_T) | Monte Carlo, episodic tasks |
| MDP & Environment | Continuous State/Action Space Visualization | Illustration | High-dimensional spaces (e.g., robot joints, pixel inputs) | Continuous-control tasks (MuJoCo, PyBullet) |
| MDP & Environment | Reward Function / Landscape | Illustration | Scalar reward as a function of state/action | All algorithms; especially reward shaping |
| MDP & Environment | Discount Factor (γ) Effect | Illustration | How future rewards are weighted (see the return sketch after the table) | All discounted MDPs |
| Value & Policy | State-Value Function V(s) | Illustration | Expected return from state s under policy π | Value-based methods |
| Value & Policy | Action-Value Function Q(s,a) | Illustration | Expected return from a state-action pair | Q-learning family |
| Value & Policy | Policy π(s) or π(a\|s) | Illustration | Arrow overlays on grid (optimal policy), probability bar charts, or softmax heatmaps | |
| Value & Policy | Advantage Function A(s,a) | Illustration | Q(s,a) − V(s) | A2C, PPO, SAC, TD3 |
| Value & Policy | Optimal Value Function V\* / Q\* | Illustration | Solution to Bellman optimality | Value iteration, Q-learning |
| Dynamic Programming | Policy Evaluation Backup | Illustration | Iterative update of V using Bellman expectation | Policy iteration |
| Dynamic Programming | Policy Improvement | Illustration | Greedy policy update over Q | Policy iteration |
| Dynamic Programming | Value Iteration Backup | Illustration | Update using Bellman optimality | Value iteration |
| Dynamic Programming | Policy Iteration Full Cycle | Illustration | Evaluation → Improvement loop | Classic DP methods |
| Monte Carlo | Monte Carlo Backup | Illustration | Update using the full episode return G_t | First-visit / every-visit MC |
| Monte Carlo | Monte Carlo Tree Search (MCTS) | Illustration | Search tree with selection, expansion, simulation, backprop | AlphaGo, AlphaZero |
| Monte Carlo | Importance Sampling Ratio | Illustration | Off-policy correction ρ = π(a\|s) / b(a\|s) | |
| Temporal Difference | TD(0) Backup | Illustration | Bootstrapped update using R + γV(s′) | TD learning |
| Temporal Difference | Bootstrapping (general) | Illustration | Using estimated future value instead of the full return | All TD methods |
| Temporal Difference | n-step TD Backup | Illustration | Multi-step return G_t^{(n)} | n-step TD, TD(λ) |
| Temporal Difference | TD(λ) & Eligibility Traces | Illustration | Decaying trace z_t for credit assignment | TD(λ), SARSA(λ), Q(λ) |
| Temporal Difference | SARSA Update | Illustration | On-policy TD control | SARSA |
| Temporal Difference | Q-Learning Update | Illustration | Off-policy TD control (contrasted with SARSA in a sketch after the table) | Q-learning, Deep Q-Network |
| Temporal Difference | Expected SARSA | Illustration | Expectation over the next action under the policy | Expected SARSA |
| Temporal Difference | Double Q-Learning / Double DQN | Illustration | Two separate Q estimators to reduce overestimation | Double DQN, TD3 |
| Temporal Difference | Dueling DQN Architecture | Illustration | Separate streams for state value V(s) and advantage A(s,a) | Dueling DQN |
| Temporal Difference | Prioritized Experience Replay | Illustration | Importance sampling of transitions by TD error | Prioritized DQN, Rainbow |
| Temporal Difference | Rainbow DQN Components | Illustration | All extensions combined (Double, Dueling, PER, etc.) | Rainbow DQN |
| Function Approximation | Linear Function Approximation | Illustration | Feature vector φ(s) → wᵀφ(s) | Tabular → linear FA |
| Function Approximation | Neural Network Layers (MLP, CNN, RNN, Transformer) | Illustration | Full deep network for value/policy | DQN, A3C, PPO, Decision Transformer |
| Function Approximation | Computation Graph / Backpropagation Flow | Illustration | Gradient flow through the network | All deep RL |
| Function Approximation | Target Network | Illustration | Frozen copy of the Q-network for stability | DQN, DDQN, SAC, TD3 |
| Policy Gradients | Policy Gradient Theorem | Illustration | ∇_θ J(θ) = E[∇_θ log π(a\|s) · Q^π(s,a)]; flow diagram from reward → log-prob → gradient | |
| Policy Gradients | REINFORCE Update | Illustration | Monte-Carlo policy gradient (sketch after the table) | REINFORCE |
| Policy Gradients | Baseline / Advantage Subtraction | Illustration | Subtract b(s) to reduce variance | All modern PG |
| Policy Gradients | Trust Region (TRPO) | Illustration | KL-divergence constraint on the policy update | TRPO |
| Policy Gradients | Proximal Policy Optimization (PPO) | Illustration | Clipped surrogate objective (sketch after the table) | PPO, PPO-Clip |
| Actor-Critic | Actor-Critic Architecture | Illustration | Separate or shared actor (policy) + critic (value) networks | A2C, A3C, SAC, TD3 |
| Actor-Critic | Advantage Actor-Critic (A2C/A3C) | Illustration | Synchronous/asynchronous multi-worker training | A2C/A3C |
| Actor-Critic | Soft Actor-Critic (SAC) | Illustration | Entropy-regularized policy + twin critics | SAC |
| Actor-Critic | Twin Delayed DDPG (TD3) | Illustration | Twin critics + delayed policy + target smoothing | TD3 |
| Exploration | ε-Greedy Strategy | Illustration | Probability ε of a random action (sketch after the table) | DQN family |
| Exploration | Softmax / Boltzmann Exploration | Illustration | Temperature τ in softmax | Softmax policies |
| Exploration | Upper Confidence Bound (UCB) | Illustration | Optimism in the face of uncertainty | UCB1, bandits |
| Exploration | Intrinsic Motivation / Curiosity | Illustration | Prediction error as intrinsic reward | ICM, RND, Curiosity-driven RL |
| Exploration | Entropy Regularization | Illustration | Bonus term αH(π) | SAC, maximum-entropy RL |
| Hierarchical RL | Options Framework | Illustration | High-level policy over options (temporally extended actions) | Option-Critic |
| Hierarchical RL | Feudal Networks / Hierarchical Actor-Critic | Illustration | Manager-worker hierarchy | Feudal RL |
| Hierarchical RL | Skill Discovery | Illustration | Unsupervised emergence of reusable skills | DIAYN, VALOR |
| Model-Based RL | Learned Dynamics Model | Illustration | P̂(s′\|s,a); separate model network diagram (often an RNN or transformer) | |
| Model-Based RL | Model-Based Planning | Illustration | Rollouts inside the learned model | MuZero, DreamerV3 |
| Model-Based RL | Imagination-Augmented Agents (I2A) | Illustration | Imagination module + policy | I2A |
| Offline RL | Offline Dataset | Illustration | Fixed batch of trajectories | BC, CQL, IQL |
| Offline RL | Conservative Q-Learning (CQL) | Illustration | Penalty on out-of-distribution actions | CQL |
| Multi-Agent RL | Multi-Agent Interaction Graph | Illustration | Agents communicating or competing | MARL, MADDPG |
| Multi-Agent RL | Centralized Training, Decentralized Execution (CTDE) | Illustration | Shared critic during training | QMIX, VDN, MADDPG |
| Multi-Agent RL | Cooperative / Competitive Payoff Matrix | Illustration | Joint reward for multiple agents | Prisoner's Dilemma, multi-agent gridworlds |
| Inverse RL / IRL | Reward Inference | Illustration | Infer the reward from expert demonstrations | IRL, GAIL |
| Inverse RL / IRL | Generative Adversarial Imitation Learning (GAIL) | Illustration | Discriminator vs. policy generator | GAIL, AIRL |
| Meta-RL | Meta-RL Architecture | Illustration | Outer loop (meta-policy) + inner loop (task adaptation) | MAML for RL, RL² |
| Meta-RL | Task Distribution Visualization | Illustration | Multiple MDPs sampled from a meta-distribution | Meta-RL benchmarks |
| Advanced / Misc | Experience Replay Buffer | Illustration | Stored (s,a,r,s′,done) tuples (sketch after the table) | DQN and all off-policy deep RL |
| Advanced / Misc | State Visitation / Occupancy Measure | Illustration | Frequency of visiting each state | All algorithms (analysis) |
| Advanced / Misc | Learning Curve | Illustration | Average episodic return vs. episodes / steps | Standard performance reporting |
| Advanced / Misc | Regret / Cumulative Regret | Illustration | Accumulated sub-optimality | Bandits and online RL |
| Advanced / Misc | Attention Mechanisms (Transformers in RL) | Illustration | Attention weights | Decision Transformer, Trajectory Transformer |
| Advanced / Misc | Diffusion Policy | Illustration | Denoising diffusion process for action generation | Diffusion-RL policies |
| Advanced / Misc | Graph Neural Networks for RL | Illustration | Node/edge message passing | Graph RL, relational RL |
| Advanced / Misc | World Model / Latent Space | Illustration | Encoder-decoder dynamics in latent space | Dreamer, PlaNet |
| Advanced / Misc | Convergence Analysis Plots | Illustration | Error / value change over iterations | DP, TD, value iteration |
| Advanced / Misc | RL Algorithm Taxonomy | Illustration | Comprehensive classification of algorithms | All RL |
| Advanced / Misc | Probabilistic Graphical Model (RL as Inference) | Illustration | Formalizing RL as probabilistic inference | Control as Inference, MaxEnt RL |
| Value & Policy | Distributional RL (C51 / Categorical) | Illustration | Representing the return as a probability distribution | C51, QR-DQN, IQN |
| Exploration | Hindsight Experience Replay (HER) | Illustration | Learning from failures by relabeling goals | Sparse-reward robotics, HER |
| Model-Based RL | Dyna-Q Architecture | Illustration | Integration of real experience and model-based planning | Dyna-Q, Dyna-2 |
| Function Approximation | Noisy Networks (Parameter Noise) | Illustration | Stochastic weights for exploration | Noisy DQN, Rainbow |
| Exploration | Intrinsic Curiosity Module (ICM) | Illustration | Reward based on prediction error | Curiosity-driven exploration, ICM |
| Temporal Difference | V-trace (IMPALA) | Illustration | Asynchronous off-policy importance sampling | IMPALA, V-trace |
| Multi-Agent RL | QMIX Mixing Network | Illustration | Monotonic value-function factorization | QMIX, VDN |
| Advanced / Misc | Saliency Maps / Attention on State | Illustration | Visualizing what the agent "sees" or prioritizes | Interpretability, Atari RL |
| Exploration | Action Selection Noise (OU vs Gaussian) | Illustration | Temporal correlation in exploration noise | DDPG, TD3 |
| Advanced / Misc | t-SNE / UMAP State Embeddings | Illustration | Dimensionality reduction of high-dimensional neural states | Interpretability, SRL |
| Advanced / Misc | Loss Landscape Visualization | Illustration | Optimization surface geometry | Training stability analysis |
| Advanced / Misc | Success Rate vs Steps | Illustration | Percentage of successful episodes | Goal-conditioned RL, Robotics |
| Advanced / Misc | Hyperparameter Sensitivity Heatmap | Illustration | Performance across parameter grids | Hyperparameter tuning |
| Dynamics | Action Persistence (Frame Skipping) | Illustration | Temporal abstraction by repeating actions | Atari RL, Robotics |
| Model-Based RL | MuZero Dynamics Search Tree | Illustration | Planning with learned transition and value functions | MuZero, Gumbel MuZero |
| Deep RL | Policy Distillation | Illustration | Compressing knowledge from teacher to student | Kickstarting, multitask learning |
| Transformers | Decision Transformer Token Sequence | Illustration | Casting RL as a sequence-modeling task | Decision Transformer, TT |
| Advanced / Misc | Performance Profiles (rliable) | Illustration | Robust aggregate performance metrics | Reliable RL evaluation |
| Safety RL | Safety Shielding / Barrier Functions | Illustration | Hard constraints on the action space | Constrained MDPs, Safe RL |
| Training | Automated Curriculum Learning | Illustration | Progressively increasing task difficulty | Curriculum RL, ALP-GMM |
| Sim-to-Real | Domain Randomization | Illustration | Generalizing across environment variations | Robotics, Sim-to-Real |
| Alignment | RL with Human Feedback (RLHF) | Illustration | Aligning agents with human preferences | ChatGPT, InstructGPT |
| Neuro-inspired RL | Successor Representation (SR) | Illustration | Predictive state representations | SR-Dyna, Neuro-RL |
| Inverse RL / IRL | Maximum Entropy IRL | Illustration | Probability distribution over trajectories | MaxEnt IRL, Ziebart |
| Theory | Information Bottleneck | Illustration | Balancing mutual information $I(S;Z)$ and $I(Z;A)$ | VIB-RL, Information Theory |
| Evolutionary RL | Evolutionary Strategies Population | Illustration | Population-based parameter search | OpenAI-ES, Salimans |
| Safety RL | Control Barrier Functions (CBF) | Illustration | Set-theoretic safety guarantees | CBF-RL, Control Theory |
| Exploration | Count-based Exploration Heatmap | Illustration | Visitation frequency and intrinsic bonus | MBIE-EB, RND |
| Exploration | Thompson Sampling Posteriors | Illustration | Direct uncertainty-based action selection | Bandits, Bayesian RL |
| Multi-Agent RL | Adversarial RL Interaction | Illustration | Competition between protagonist and antagonist | Robust RL, RARL |
| Hierarchical RL | Hierarchical Subgoal Trajectory | Illustration | Decomposing long-horizon tasks | Subgoal RL, HIRO |
| Offline RL | Offline Action Distribution Shift | Illustration | Mismatch between dataset and current policy | CQL, IQL, D4RL |
| Exploration | Random Network Distillation (RND) | Illustration | Prediction error as intrinsic reward | RND, OpenAI |
| Offline RL | Batch-Constrained Q-learning (BCQ) | Illustration | Constraining actions to the behavior dataset | BCQ, Fujimoto |
| Training | Population-Based Training (PBT) | Illustration | Evolutionary hyperparameter optimization | PBT, DeepMind |
| Deep RL | Recurrent State Flow (DRQN/R2D2) | Illustration | Temporal dependency in the state-action value | DRQN, R2D2 |
| Theory | Belief State in POMDPs | Illustration | Probability distribution over hidden states | POMDPs, Belief Space |
| Multi-Objective RL | Multi-Objective Pareto Front | Illustration | Balancing conflicting reward signals | MORL, Pareto Optimal |
| Theory | Differential Value (Average-Reward RL) | Illustration | Values relative to the average gain | Average-Reward RL, Mahadevan |
| Infrastructure | Distributed RL Cluster (Ray/RLlib) | Illustration | Parallelizing experience collection | Ray, RLlib, Ape-X |
| Evolutionary RL | Neuroevolution Topology Evolution | Illustration | Evolving neural network architectures | NEAT, HyperNEAT |
| Continual RL | Elastic Weight Consolidation (EWC) | Illustration | Preventing catastrophic forgetting | EWC, Kirkpatrick |
| Theory | Successor Features (SF) | Illustration | Generalizing predictive representations | SF-Dyna, Barreto |
| Safety | Adversarial State Noise (Perception) | Illustration | Attacks on the agent's observation space | Adversarial RL, Huang |
| Imitation Learning | Behavioral Cloning (Imitation) | Illustration | Direct supervised learning from experts | BC, DAgger |
| Relational RL | Relational Graph State Representation | Illustration | Modeling objects and their relations | Relational MDPs, BoxWorld |
| Quantum RL | Quantum RL Circuit (PQC) | Illustration | Gate-based quantum policy networks | Quantum RL, PQC |
| Symbolic RL | Symbolic Policy Tree | Illustration | Policies as mathematical expressions | Symbolic RL, GP |
| Control | Differentiable Physics Gradient Flow | Illustration | Gradient-based planning through simulators | Brax, Isaac Gym |
| Multi-Agent RL | MARL Communication Channel | Illustration | Information exchange between agents | CommNet, DIAL |
| Safety | Lagrangian Constraint Landscape | Illustration | Constrained optimization boundaries | Constrained RL, CPO |
| Hierarchical RL | MAXQ Task Hierarchy | Illustration | Recursive task decomposition | MAXQ, Dietterich |
| Agentic AI | ReAct Agentic Cycle | Illustration | Reasoning-action loops for LLMs | ReAct, Agentic LLM |
| Bio-inspired RL | Synaptic Plasticity RL | Illustration | Hebbian-style synaptic weight updates | Hebbian RL, STDP |
| Control | Guided Policy Search (GPS) | Illustration | Distilling trajectories into a policy | GPS, Levine |
| Robotics | Sim-to-Real Jitter & Latency | Illustration | Temporal robustness in transfer | Sim-to-Real, Robustness |
| Policy Gradients | Deterministic Policy Gradient (DDPG) Flow | Illustration | Gradient flow for deterministic policies | DDPG |
| Model-Based RL | Dreamer Latent Imagination | Illustration | Learning and planning in latent space | Dreamer (V1-V3) |
| Deep RL | UNREAL Auxiliary Tasks | Illustration | Learning from non-reward signals | UNREAL, A3C extension |
| Offline RL | Implicit Q-Learning (IQL) Expectile | Illustration | In-sample learning via expectile regression | IQL |
| Model-Based RL | Prioritized Sweeping | Illustration | Planning prioritized by TD error | Sutton & Barto classic MBRL |
| Imitation Learning | DAgger Expert Loop | Illustration | Training on expert labels in agent-visited states | DAgger |
| Representation | Self-Predictive Representations (SPR) | Illustration | Consistency between predicted and target latents | SPR, sample-efficient RL |
| Multi-Agent RL | Joint Action Space | Illustration | Cartesian product of individual actions | MARL theory, Game Theory |
| Multi-Agent RL | Dec-POMDP Formal Model | Illustration | Decentralized partially observable MDP | Multi-agent coordination |
| Theory | Bisimulation Metric | Illustration | State equivalence based on transitions/rewards | State abstraction, bisimulation theory |
| Theory | Potential-Based Reward Shaping | Illustration | Reward transformation preserving the optimal policy | Sutton & Barto, Ng et al. |
| Training | Transfer RL: Source to Target | Illustration | Reusing knowledge across different MDPs | Transfer Learning, Distillation |
| Deep RL | Multi-Task Backbone Architecture | Illustration | Single agent learning multiple tasks | Multi-task RL, IMPALA |
| Bandits | Contextual Bandit Pipeline | Illustration | Decision making given context but no transitions | Personalization, Ad-tech |
| Theory | Theoretical Regret Bounds | Illustration | Analytical performance guarantees | Online Learning, Bandits |
| Value-based | Soft Q Boltzmann Probabilities | Illustration | Probabilistic action selection from Q-values: $\pi(a \mid s) \propto \exp(Q/\tau)$ | |
| Robotics | Autonomous Driving RL Pipeline | Illustration | End-to-end or modular driving stack | Wayve, Tesla, Comma.ai |
| Policy | Policy Action Gradient Comparison | Illustration | Comparison of gradient derivation types | PG Theorem vs DPG Theorem |
| Inverse RL / IRL | IRL: Feature Expectation Matching | Illustration | Comparing expert vs. learner feature visitation frequency: $\mu(\pi^*) - \mu(\pi)$ | |
| Imitation Learning | Apprenticeship Learning Loop | Illustration | Training to match expert performance via reward inference | Apprenticeship Learning |
| Theory | Active Inference Loop | Illustration | Agents minimizing surprise (free energy) | Free Energy Principle, Friston |
| Theory | Bellman Residual Landscape | Illustration | Training surface of the Bellman error | TD learning, fitted Q-iteration |
| Model-Based RL | Plan-to-Explore Uncertainty Map | Illustration | Systematic exploration in learned world models | Plan-to-Explore, Sekar et al. |
| Safety RL | Robust RL Uncertainty Set | Illustration | Optimizing for the worst-case environment transition | Robust MDPs, minimax RL |
| Training | HPO Bayesian Optimization Cycle | Illustration | Automating hyperparameter selection with a Gaussian process | Hyperparameter Optimization |
| Applied RL | Slate RL Recommendation | Illustration | Optimizing a list/slate of items for users | Recommender Systems, Ie et al. |
| Multi-Agent RL | Fictitious Play Interaction | Illustration | Belief-based learning in games | Game Theory, Brown (1951) |
| Conceptual | Universal RL Framework Diagram | Illustration | High-level summary of RL components | All RL |
| Offline RL | Offline Density Ratio Estimator | Illustration | Estimating $w(s,a)$ for off-policy data | Importance Sampling, Offline RL |
| Continual RL | Continual Task Interference Heatmap | Illustration | Measuring negative transfer between tasks | Lifelong Learning, EWC |
| Safety RL | Lyapunov Stability Safe Set | Illustration | Invariant sets for safe control | Lyapunov RL, Chow et al. |
| Applied RL | Molecular RL (Atom Coordinates) | Illustration | RL for molecular design/protein folding | Chemistry RL, AlphaFold-style |
| Architecture | MoE Multi-Task Architecture | Illustration | Scaling models with mixtures of experts | MoE-RL, Sparsity |
| Direct Policy Search | CMA-ES Policy Search | Illustration | Evolutionary strategy for policy weights | ES for RL, Salimans |
| Alignment | Elo Rating Preference Plot | Illustration | Measuring agent strength over time | AlphaZero, League training |
| Explainable RL | Explainable RL (SHAP Attribution) | Illustration | Local attribution of features to agent actions | Interpretability, SHAP/LIME |
| Meta-RL | PEARL Context Encoder | Illustration | Learning latent task representations | PEARL, Rakelly et al. |
| Applied RL | Medical RL Therapy Pipeline | Illustration | Personalized medicine and dosing | Healthcare RL, ICU Sepsis |
| Applied RL | Supply Chain RL Pipeline | Illustration | Optimizing stock levels and orders | Logistics, Inventory Management |
| Robotics | Sim-to-Real SysID Loop | Illustration | Closing the reality gap via parameter estimation | System Identification, Robotics |
| Architecture | Transformer World Model | Illustration | Sequence-to-sequence dynamics modeling | DreamerV3, Transframer |
| Applied RL | Network Traffic RL | Illustration | Optimizing data-packet routing in graphs | Networking, Traffic Engineering |
| Training | RLHF: PPO with Reference Policy | Illustration | Ensuring RL fine-tuning doesn't drift too far from the reference policy | InstructGPT, Llama 2/3 |
| Multi-Agent RL | PSRO Meta-Game Update | Illustration | Reaching Nash equilibrium in large games | PSRO, Lanctot et al. |
| Multi-Agent RL | DIAL: Differentiable Communication | Illustration | End-to-end learning of communication protocols | DIAL, Foerster et al. |
| Batch RL | Fitted Q-Iteration Loop | Illustration | Data-driven iteration with a supervised regressor | Ernst et al. (2005) |
| Safety RL | CMDP Feasible Region | Illustration | Constrained optimization within a safety budget | Constrained MDPs, Altman |
| Control | MPC vs RL Planning | Illustration | Comparison of control paradigms | Control Theory vs RL |
| AutoML | Learning to Optimize (L2O) | Illustration | Using RL to learn an optimization update rule | L2O, Li & Malik |
| Applied RL | Smart Grid RL Management | Illustration | Optimizing energy supply and demand | Energy RL, Smart Grids |
| Applied RL | Quantum State Tomography RL | Illustration | RL for quantum state estimation | Quantum RL, Neural Tomography |
| Applied RL | RL for Chip Placement | Illustration | Placing components on silicon grids | Google Chip Placement |
| Applied RL | RL Compiler Optimization (MLGO) | Illustration | Inlining and sizing decisions in compilers | MLGO, LLVM |
| Applied RL | RL for Theorem Proving | Illustration | Automated reasoning and proof search | LeanRL, AlphaProof |
| Modern RL | Diffusion-QL Offline RL | Illustration | Policy as a reverse diffusion process conditioned on $(s, k)$ with noise injection | |
| Principles | Fairness-Reward Pareto Frontier | Illustration | Balancing equity and returns | Fair RL, Jabbari et al. |
| Principles | Differentially Private RL | Illustration | Privacy-preserving training | DP-RL, Agarwal et al. |
| Applied RL | Smart Agriculture RL | Illustration | Optimizing crop yield and resources | Precision Agriculture |
| Applied RL | Climate Mitigation RL (Grid) | Illustration | Environmental control policies | ClimateRL, Carbon Control |
| Applied RL | AI Education (Knowledge Tracing) | Illustration | Personalized learning paths | ITS, Bayesian Knowledge Tracing |
| Modern RL | Decision SDE Flow | Illustration | RL in continuous stochastic systems | Neural SDEs, Control |
| Control | Differentiable Physics (Brax) | Illustration | Gradients through simulators | Brax, PhysX, MuJoCo |
| Applied RL | Wireless Beamforming RL | Illustration | Optimizing antenna signal directions | 5G/6G Networking |
| Applied RL | Quantum Error Correction RL | Illustration | Correcting noise in quantum circuits | Quantum Computing RL |
| Multi-Agent RL | Mean Field RL Interaction | Illustration | Large-population agent dynamics | MF-RL, Yang et al. |
| Hierarchical RL | Goal-GAN Curriculum | Illustration | Automatic goal generation | Goal-GAN, Florensa et al. |
| Modern RL | JEPA: Predictive Architecture | Illustration | LeCun's world-model framework | JEPA, I-JEPA |
| Offline RL | CQL Value Penalty Landscape | Illustration | Conservatism in value functions | CQL, Kumar et al. |
| Applied RL | Causal RL | Illustration | Causal inverse RL graph: DAG with $S, A, R$ and latent $U$ | |
| Quantum RL | VQE-RL Optimization | Illustration | Quantum circuit parameter tuning | VQE, Quantum RL |
| Applied RL | De-novo Drug Discovery RL | Illustration | Generating optimized lead molecules | Drug Discovery, Molecule RL |
| Applied RL | Traffic Signal Coordination RL | Illustration | Multi-intersection coordination | IntelliLight, PressLight |
| Applied RL | Mars Rover Pathfinding RL | Illustration | Navigation on rough terrain | Space RL, Mars Rover |
| Applied RL | Sports Player Movement RL | Illustration | Predicting/optimizing player actions | Sports Analytics, Ghosting |
| Applied RL | Cryptography Attack RL | Illustration | Searching for keys/vulnerabilities | Crypto-RL, Learning to Attack |
| Applied RL | Humanitarian Resource RL | Illustration | Disaster-response allocation | AI for Good, Resource RL |
| Applied RL | Video Compression RL (RD) | Illustration | Optimizing bit-rate vs. distortion | Learned Video Compression |
| Applied RL | Kubernetes Auto-scaling RL | Illustration | Cloud resource management | Cloud RL, K8s Scaling |
| Applied RL | Fluid Dynamics Flow Control RL | Illustration | Airfoil/turbulence control | Aero-RL, Flow Control |
| Applied RL | Structural Optimization RL | Illustration | Topology/material design | Structural RL, Topology Opt |
| Applied RL | Human Decision Modeling | Illustration | Prospect Theory in RL | Behavioral RL, Prospect Theory |
| Applied RL | Semantic Parsing RL | Illustration | Language-to-logic transformation | Semantic Parsing, Seq2Seq-RL |
| Applied RL | Music Melody RL | Illustration | Reward-based melody generation | Music-RL, Magenta |
| Applied RL | Plasma Fusion Control RL | Illustration | Magnetic control of tokamaks | DeepMind Fusion, Tokamak RL |
| Applied RL | Carbon Capture RL Cycle | Illustration | Adsorption/desorption optimization | Carbon Capture, Green RL |
| Applied RL | Swarm Robotics RL | Illustration | Decentralized swarm coordination | Swarm-RL, Multi-Robot |
| Applied RL | Legal Compliance RL Game | Illustration | Regulatory games | Legal-RL, RegTech |
| Physics RL | Physics-Informed RL (PINN) | Illustration | Constraint-based RL loss | PINN-RL, SciML |
| Modern RL | Neuro-Symbolic RL | Illustration | Combining logic and neural nets | Neuro-Symbolic, Logic RL |
| Applied RL | DeFi Liquidity Pool RL | Illustration | Yield farming / liquidity balancing | DeFi-RL, AMM Optimization |
| Neuro RL | Dopamine Reward Prediction Error | Illustration | Biological RL signal curves | Neuroscience-RL, Wolfram Schultz |
| Robotics | Proprioceptive Sensory-Motor RL | Illustration | Low-level joint control | Proprioceptive RL, Unitree |
| Applied RL | AR Object Placement RL | Illustration | AR visual overlay optimization | AR-RL, Visual Overlay |
| Recommender RL | Sequential Bundle RL | Illustration | Recommendation item grouping | Bundle-RL, E-commerce |
| Theory | Online Gradient Descent vs RL | Illustration | Gradient-based learning comparison | Online Learning, Regret |
| Modern RL | Active Learning: Query RL | Illustration | Query-based sample selection | Active-RL, Query Opt |
| Modern RL | Federated RL Global Aggregator | Illustration | Privacy-preserving distributed RL | Federated-RL, FedAvg-RL |
| Conceptual | Ultimate Universal RL Mastery Diagram | Illustration | Final summary of all 230 items | Absolute Mastery Milestone |
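
## Example Sketches

The table above only names each concept; the short sketches below make a few of the core update rules concrete. They are minimal, hedged illustrations in plain Python/NumPy, not reference implementations, and all function and variable names are our own. First, the discounted return G_t = r_{t+1} + γ·G_{t+1} behind the Trajectory and Discount Factor (γ) rows:

```python
import numpy as np

def discounted_returns(rewards, gamma=0.99):
    """Backward pass computing G_t = r_{t+1} + gamma * G_{t+1} for one episode."""
    returns = np.zeros(len(rewards))
    g = 0.0
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g
        returns[t] = g
    return returns

# A smaller gamma discounts future reward more heavily:
print(discounted_returns([0.0, 0.0, 1.0], gamma=0.5))   # [0.25 0.5  1.  ]
print(discounted_returns([0.0, 0.0, 1.0], gamma=0.99))  # [0.9801 0.99 1.]
```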
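For the ε-Greedy Strategy row: with probability ε the agent explores uniformly at random, otherwise it exploits the current Q estimates. A minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

def epsilon_greedy(q_values, epsilon):
    """Pick a uniformly random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))  # explore
    return int(np.argmax(q_values))              # exploit

action = epsilon_greedy(np.array([0.1, 0.5, 0.2]), epsilon=0.1)  # usually action 1
```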
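The SARSA and Q-Learning Update rows differ only in the bootstrap target: SARSA uses the action actually taken next (on-policy), while Q-learning maximizes over next actions (off-policy). A tabular sketch, assuming integer-indexed states and actions:

```python
import numpy as np

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """Off-policy TD control: the target bootstraps with max_a' Q(s', a')."""
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.99):
    """On-policy TD control: the target bootstraps with Q(s', a') for the chosen a'."""
    td_target = r + gamma * Q[s_next, a_next]
    Q[s, a] += alpha * (td_target - Q[s, a])

Q = np.zeros((5, 2))  # toy table: 5 states, 2 actions
q_learning_update(Q, s=0, a=1, r=1.0, s_next=3)
```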
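For the REINFORCE Update row, a tabular softmax policy makes the Monte-Carlo policy gradient explicit: for preferences θ[s], the gradient of log π(a|s) with respect to θ[s] is onehot(a) − π(·|s). A sketch using our own minimal parameterization:

```python
import numpy as np

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def reinforce_update(theta, episode, lr=0.01, gamma=0.99):
    """Monte-Carlo policy gradient ascent on E[G_t * grad log pi(a_t|s_t)].
    theta: (n_states, n_actions) preferences; episode: list of (s, a, r)."""
    g = 0.0
    for s, a, r in reversed(episode):
        g = r + gamma * g                  # return G_t from this step onward
        grad_log_pi = -softmax(theta[s])   # d/d theta[s] of log pi(a|s) ...
        grad_log_pi[a] += 1.0              # ... equals onehot(a) - pi(.|s)
        theta[s] += lr * g * grad_log_pi

theta = np.zeros((3, 2))
reinforce_update(theta, [(0, 1, 0.0), (2, 0, 1.0)])
```

Subtracting a baseline b(s) from G_t (the Baseline / Advantage Subtraction row) leaves this gradient unbiased while reducing its variance.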
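The PPO row's clipped surrogate objective keeps the new policy near the old one by removing any incentive to push the probability ratio r = π_new(a|s)/π_old(a|s) outside [1−ε, 1+ε]. A sketch of the objective itself:

```python
import numpy as np

def ppo_clip_objective(ratio, advantage, clip_eps=0.2):
    """Mean over a batch of min(r * A, clip(r, 1-eps, 1+eps) * A)."""
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantage
    return float(np.minimum(unclipped, clipped).mean())

# Ratios beyond the clip range earn no additional objective:
print(ppo_clip_objective(np.array([1.5, 0.7]), np.array([2.0, -1.0])))  # 0.8
```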
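Finally, the Experience Replay Buffer row: off-policy deep RL stores transitions and samples them uniformly (or by priority, per the Prioritized Experience Replay row) to decorrelate updates. A minimal uniform version:

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity store of (s, a, r, s_next, done) tuples with uniform sampling."""

    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)  # old transitions fall off the left

    def push(self, s, a, r, s_next, done):
        self.buffer.append((s, a, r, s_next, done))

    def sample(self, batch_size):
        return random.sample(self.buffer, batch_size)

buf = ReplayBuffer(capacity=1000)
buf.push(0, 1, 1.0, 3, False)
batch = buf.sample(1)
```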