PPO Agent playing LunarLander-v3
This is a PPO agent trained on the LunarLander-v3 environment.
Usage
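import torch
import gymnasium as gym

# Load the checkpoint on the CPU so it also works on machines without a GPU
checkpoint = torch.load("model.pth", map_location="cpu")
network = Network(config)  # You need to define the Network class and its config
network.load_state_dict(checkpoint['model_state_dict'])
network.eval()

# Test the agent for one episode
env = gym.make("LunarLander-v3")
state, _ = env.reset()
done = False
total_reward = 0.0
while not done:
    # The environment returns a NumPy array; this assumes the network expects a float tensor
    state_tensor = torch.as_tensor(state, dtype=torch.float32)
    with torch.no_grad():
        action, _, _, _ = network.get_action_and_value(state_tensor)
    # env.step expects a plain integer for a discrete action
    state, reward, terminated, truncated, _ = env.step(int(action))
    total_reward += reward
    done = terminated or truncated
env.close()
print(f"Total reward: {total_reward}")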
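The usage snippet above relies on a Network class and a config object that this card does not include. The following is only a minimal sketch of what such a class might look like, assuming an actor-critic with a shared feature trunk as listed under Algorithm Details. The layer names, the hidden size of 64, the Tanh activations, the dictionary-style config keys, and the return order (action, log-probability, entropy, value) are all assumptions, not the author's actual definition. The checkpoint will only load if the real architecture and parameter names match.

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical


class Network(nn.Module):
    """Hypothetical actor-critic with a shared feature trunk (not the author's code)."""

    def __init__(self, config):
        super().__init__()
        obs_dim = config["obs_dim"]            # 8 for LunarLander-v3
        n_actions = config["n_actions"]        # 4 discrete actions for LunarLander-v3
        hidden = config.get("hidden_dim", 64)  # assumed hidden size
        # Shared layers feed both the policy head and the value head
        self.shared = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
        )
        self.actor = nn.Linear(hidden, n_actions)  # action logits
        self.critic = nn.Linear(hidden, 1)         # state-value estimate

    def get_action_and_value(self, state, action=None):
        features = self.shared(state)
        dist = Categorical(logits=self.actor(features))
        if action is None:
            action = dist.sample()  # sample from the policy when no action is given
        value = self.critic(features).squeeze(-1)
        # Assumed return order: action, log-probability, entropy, value
        return action, dist.log_prob(action), dist.entropy(), value


# Hypothetical config; the real keys and values depend on the training code
config = {"obs_dim": 8, "n_actions": 4, "hidden_dim": 64}
network = Network(config)
```

Passing an existing action makes the same method usable during training, where the log-probability of a previously taken action is needed.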
Training Results
- Environment: LunarLander-v3
- Training Episodes: 3000
- Final Performance: 212.4 ± 113.1
- Best Episode: 332.4
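The card does not say how the "mean ± spread" figure was computed. One plausible reading is the mean and standard deviation of per-episode returns over some evaluation window. The sketch below shows that calculation under this assumption; summarize_returns and the example values are hypothetical, and the 200-point threshold is the score Gymnasium documents as a solution for LunarLander.

```python
from statistics import mean, pstdev

SOLVED_THRESHOLD = 200.0  # Gymnasium's documented "solution" score for LunarLander


def summarize_returns(returns):
    """Return (mean, population std, fraction of episodes at or above the threshold)."""
    avg = mean(returns)
    spread = pstdev(returns)
    solved = sum(r >= SOLVED_THRESHOLD for r in returns) / len(returns)
    return avg, spread, solved


# Placeholder returns for illustration only; these are not results from this model
example_returns = [100.0, 200.0, 300.0]
avg, spread, solved = summarize_returns(example_returns)
print(f"{avg:.1f} ± {spread:.1f} ({solved:.0%} of episodes >= {SOLVED_THRESHOLD:.0f})")
```

A large spread relative to the mean, as reported above, suggests that some episodes fall well below the threshold even when the average is above it.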
Algorithm Details
- Algorithm: Proximal Policy Optimization (PPO)
- Network Architecture: Actor-Critic with shared features
- Learning Rate: 0.0003
- Clip Epsilon: 0.2
- Training Episodes: 3000
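For readers unfamiliar with the Clip Epsilon setting, the following sketch shows the standard PPO clipped surrogate loss with the value 0.2 listed above. It is a generic illustration of the technique, not the training code behind this model; the function name ppo_clip_loss and the example tensors are hypothetical.

```python
import torch

CLIP_EPSILON = 0.2  # value listed above


def ppo_clip_loss(new_log_probs, old_log_probs, advantages, clip_epsilon=CLIP_EPSILON):
    """Clipped surrogate policy loss (negated so it can be minimized)."""
    # Probability ratio between the updated policy and the policy that collected the data
    ratio = torch.exp(new_log_probs - old_log_probs)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_epsilon, 1.0 + clip_epsilon) * advantages
    # Taking the elementwise minimum removes the incentive to move the ratio outside the clip range
    return -torch.min(unclipped, clipped).mean()


# Illustrative inputs only
old_log_probs = torch.zeros(3)
new_log_probs = torch.log(torch.tensor([1.5, 0.5, 1.0]))
advantages = torch.tensor([1.0, -1.0, 2.0])
loss = ppo_clip_loss(new_log_probs, old_log_probs, advantages)
```

In a full training loop this term is typically combined with a value-function loss and an entropy bonus, whose coefficients this card does not specify.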