PPO Agent playing LunarLander-v3

This is a PPO agent trained on the LunarLander-v3 environment.

Usage

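import torch
import gymnasium as gym

# Load the checkpoint on CPU so it also works on machines without a GPU
checkpoint = torch.load("model.pth", map_location="cpu")
network = Network(config)  # You need to define the Network class and its config
network.load_state_dict(checkpoint['model_state_dict'])
network.eval()  # inference mode

# Test the agent for one episode
env = gym.make("LunarLander-v3")
state, _ = env.reset()
done = False
total_reward = 0.0

while not done:
    # env returns NumPy arrays; this assumes the network takes a batched float tensor
    state_tensor = torch.as_tensor(state, dtype=torch.float32).unsqueeze(0)
    with torch.no_grad():
        action, _, _, _ = network.get_action_and_value(state_tensor)
    # The discrete action space expects a plain int, not a tensor
    state, reward, terminated, truncated, _ = env.step(int(action.item()))
    total_reward += reward
    done = terminated or truncated

env.close()
print(f"Total reward: {total_reward}")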
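
The usage code relies on a Network class and a config object that this card does not include. Below is a minimal sketch of what such a class might look like, assuming an actor-critic with a shared feature trunk as described under Algorithm Details. The layer sizes, the config keys (obs_dim, n_actions, hidden_dim), and the return order of get_action_and_value are all assumptions, not the author's code. The checkpoint will only load if the layer names and shapes match the network that was actually trained.

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical


class Network(nn.Module):
    """Hypothetical actor-critic with a shared feature trunk (not the author's code)."""

    def __init__(self, config):
        super().__init__()
        # Assumed config keys. LunarLander-v3 has 8 observations and 4 discrete actions.
        obs_dim = config.get("obs_dim", 8)
        n_actions = config.get("n_actions", 4)
        hidden = config.get("hidden_dim", 64)
        self.shared = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
        )
        self.actor = nn.Linear(hidden, n_actions)  # action logits
        self.critic = nn.Linear(hidden, 1)         # state-value estimate

    def get_action_and_value(self, state, action=None):
        features = self.shared(state)
        dist = Categorical(logits=self.actor(features))
        if action is None:
            action = dist.sample()
        value = self.critic(features).squeeze(-1)
        # Assumed return order: action, log-probability, entropy, value
        return action, dist.log_prob(action), dist.entropy(), value


# Quick shape check on a dummy batch of one observation
config = {"obs_dim": 8, "n_actions": 4, "hidden_dim": 64}
network = Network(config)
action, log_prob, entropy, value = network.get_action_and_value(torch.zeros(1, 8))
```

With a class like this in scope, the loading and test loop above can run, provided the saved state dict was produced by a network with the same structure.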

Training Results

  • Environment: LunarLander-v3
  • Training Episodes: 3000
  • Final Performance: 212.4 ± 113.1
  • Best Episode: 332.43
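
The card does not say how these figures were computed. One plausible reading is that Final Performance is the mean ± standard deviation of episode returns and Best Episode is the maximum return. The sketch below shows that calculation under this assumption. The helper name summarize_returns and the sample returns are made up for illustration and are not the model's results.

```python
from statistics import mean, pstdev


def summarize_returns(returns):
    """Return (mean, population standard deviation, best) for episode returns."""
    return mean(returns), pstdev(returns), max(returns)


# Made-up returns for illustration only; these are not the model's results.
example_returns = [100.0, 200.0, 300.0]
avg, std, best = summarize_returns(example_returns)
print(f"Final Performance: {avg:.1f} ± {std:.1f}")
print(f"Best Episode: {best:.2f}")
```

In practice the list would be filled with the total_reward values collected over a set of evaluation episodes.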

Algorithm Details

  • Algorithm: Proximal Policy Optimization (PPO)
  • Network Architecture: Actor-Critic with shared features
  • Learning Rate: 0.0003
  • Clip Epsilon: 0.2
  • Training Episodes: 3000
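
To illustrate what a Clip Epsilon of 0.2 means, here is a small sketch of the standard PPO clipped surrogate objective for a single sample. It shows only the clipping arithmetic. The function name clipped_surrogate is hypothetical, and the card does not state how the full training loss (value term, entropy bonus) was built.

```python
def clipped_surrogate(ratio, advantage, clip_epsilon=0.2):
    """PPO clipped objective for one sample (a quantity to maximize).

    ratio is pi_new(a|s) / pi_old(a|s); advantage is the advantage estimate.
    """
    clipped_ratio = max(1.0 - clip_epsilon, min(ratio, 1.0 + clip_epsilon))
    # Take the more pessimistic of the unclipped and clipped terms
    return min(ratio * advantage, clipped_ratio * advantage)


# Illustrative values only
high = clipped_surrogate(1.5, 2.0)    # positive advantage: ratio is capped at 1.2
low = clipped_surrogate(0.5, -1.0)    # negative advantage: ratio is floored at 0.8
inside = clipped_surrogate(1.1, 2.0)  # ratio within [0.8, 1.2]: unchanged
print(high, low, inside)
```

Because the objective stops improving once the ratio leaves the range [0.8, 1.2] in the favorable direction, a single update has little incentive to move the policy far from the one that collected the data.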