PPO Agent playing LunarLander-v3

This is a PPO agent trained on the LunarLander-v3 environment.

Usage

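import torch
import gymnasium as gym

# Load the checkpoint (map to CPU so it also loads on machines without a GPU)
checkpoint = torch.load("model.pth", map_location="cpu")
# Network and config are not defined in this card; they must match the training code
network = Network(config)
network.load_state_dict(checkpoint['model_state_dict'])
network.eval()

# Test the agent for one episode
env = gym.make("LunarLander-v3")
state, _ = env.reset()
done = False
total_reward = 0.0

while not done:
    # The environment returns NumPy arrays; the network expects a float tensor
    # (add .unsqueeze(0) if your Network requires a batch dimension)
    state_tensor = torch.as_tensor(state, dtype=torch.float32)
    with torch.no_grad():
        action, _, _, _ = network.get_action_and_value(state_tensor)
    # env.step expects a plain integer action, not a tensor
    state, reward, terminated, truncated, _ = env.step(action.item())
    total_reward += reward
    done = terminated or truncated

env.close()
print(f"Total reward: {total_reward}")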
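
The usage snippet above relies on a `Network` class and a `config` object that this card does not include. The following is a minimal sketch of what such a class could look like, assuming an actor-critic with a shared feature extractor and a `get_action_and_value` method that returns `(action, log_prob, entropy, value)`. The `Config` dataclass, the layer names, the hidden width of 64, and the Tanh activations are all assumptions for illustration. To load `model.pth`, the layer names and sizes must match whatever the training code actually used.

```python
from dataclasses import dataclass

import torch
import torch.nn as nn
from torch.distributions import Categorical


@dataclass
class Config:
    # Hypothetical config; the real field names and sizes come from the training code.
    obs_dim: int = 8      # LunarLander observation vector length
    n_actions: int = 4    # LunarLander discrete action count
    hidden_dim: int = 64  # assumed hidden width


class Network(nn.Module):
    """Actor-critic with a shared feature extractor (illustrative layout)."""

    def __init__(self, config: Config):
        super().__init__()
        # Shared layers feed both the policy head and the value head
        self.features = nn.Sequential(
            nn.Linear(config.obs_dim, config.hidden_dim), nn.Tanh(),
            nn.Linear(config.hidden_dim, config.hidden_dim), nn.Tanh(),
        )
        self.actor = nn.Linear(config.hidden_dim, config.n_actions)  # action logits
        self.critic = nn.Linear(config.hidden_dim, 1)                # state value

    def get_action_and_value(self, state, action=None):
        hidden = self.features(state)
        dist = Categorical(logits=self.actor(hidden))
        if action is None:
            action = dist.sample()  # sample when no action is supplied
        value = self.critic(hidden).squeeze(-1)
        return action, dist.log_prob(action), dist.entropy(), value


config = Config()
network = Network(config)
action, log_prob, entropy, value = network.get_action_and_value(torch.zeros(config.obs_dim))
```

With an unbatched observation of shape `(8,)`, the returned action is a scalar tensor, so `action.item()` yields the integer that `env.step` expects.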

Training Results

Environment: LunarLander-v3
Training Episodes: 3000
Final Performance: 212.4 ± 113.1
Best Episode: 332.4
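
The card does not say how the "Final Performance" figure was computed. One plausible reading is the mean and standard deviation of per-episode total reward over a set of evaluation episodes. The sketch below shows that calculation under this assumption. The function names are hypothetical, the population standard deviation is an arbitrary choice, and the returns list is placeholder data, not this agent's results.

```python
from statistics import mean, pstdev


def summarize_returns(episode_returns):
    """Return (mean, population std) of a list of episode returns."""
    return mean(episode_returns), pstdev(episode_returns)


def format_performance(episode_returns):
    """Format returns in the same 'mean ± std' style used in the card."""
    avg, std = summarize_returns(episode_returns)
    return f"{avg:.1f} ± {std:.1f}"


# Placeholder returns for illustration only (not measured from this model)
example_returns = [200.0, 250.0, 150.0, 300.0, 100.0]
summary = format_performance(example_returns)
print(summary)
```

In practice, `episode_returns` would be collected by running the evaluation loop from the Usage section several times and appending each `total_reward`.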

Algorithm Details

Algorithm: Proximal Policy Optimization (PPO)
Network Architecture: Actor-Critic with shared features
Learning Rate: 0.0003
Clip Epsilon: 0.2
Training Episodes: 3000
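
To show where the clip epsilon listed above enters the algorithm, here is a minimal sketch of the PPO clipped surrogate policy loss. It is a generic formulation, not necessarily the exact loss used to train this model; the function name and the toy batch are hypothetical, and the value loss, entropy bonus, and advantage estimation are omitted.

```python
import torch

CLIP_EPSILON = 0.2  # value listed in this card


def ppo_clip_loss(new_log_prob, old_log_prob, advantage, clip_eps=CLIP_EPSILON):
    """Clipped surrogate policy loss, negated so it can be minimized."""
    # Probability ratio between the updated policy and the data-collecting policy
    ratio = torch.exp(new_log_prob - old_log_prob)
    unclipped = ratio * advantage
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantage
    # Taking the elementwise minimum removes the incentive to move the ratio outside the clip range
    return -torch.min(unclipped, clipped).mean()


# Toy batch: ratios of 1.5 and 0.5, paired with a positive and a negative advantage
old_log_prob = torch.zeros(2)
new_log_prob = torch.log(torch.tensor([1.5, 0.5]))
advantage = torch.tensor([1.0, -1.0])
loss = ppo_clip_loss(new_log_prob, old_log_prob, advantage)
```

In the toy batch, the first ratio is clipped from 1.5 to 1.2, while the second keeps its unclipped term because that is the smaller of the two candidates. The learning rate from the list above would be passed to whichever optimizer the training code uses.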
