# PPO Agent for Boeing 747 Pitch Angle Control

Proximal Policy Optimization (PPO) for Longitudinal Aircraft Control

## Model Description
This model is a Proximal Policy Optimization (PPO) agent trained to control the pitch angle (θ) of a Boeing 747 aircraft in a longitudinal flight dynamics simulation. The agent receives normalized state observations and outputs continuous elevator deflection commands to track reference pitch angle signals.


## Intended Uses
- Primary Use: Automatic pitch angle tracking and stabilization for Boeing 747 aircraft simulation
- Research Applications: Benchmarking RL algorithms for aerospace control systems
- Educational: Learning reinforcement learning concepts in aerospace applications
- Hybrid Control: Can be combined with PID/MPC controllers for robust flight control
## Model Architecture

The PPO agent consists of separate Actor and Critic neural networks:

### Actor Network (Policy)
| Layer | Configuration |
|---|---|
| Input | 4 (observation dim) |
| Hidden 1 | Linear(4, 256) + ReLU |
| Hidden 2 | Linear(256, 256) + ReLU |
| Output (μ) | Linear(256, 1) + Tanh |
| Output (log σ) | Linear(256, 1), clamped to [-5.0, -1.5] |
### Critic Network (Value Function)

| Layer | Configuration |
|---|---|
| Input | 4 (observation dim) |
| Hidden 1 | Linear(4, 256) + ReLU |
| Hidden 2 | Linear(256, 256) + ReLU |
| Output | Linear(256, 1) |
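
The tables above correspond to the following minimal PyTorch sketch. This is an illustrative reconstruction only; the actual `tensoraerospace` class names and internals may differ.

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Gaussian policy: tanh-squashed mean plus clamped state-dependent log std."""
    def __init__(self, obs_dim: int = 4, act_dim: int = 1):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
        )
        self.mu = nn.Linear(256, act_dim)       # mean head, squashed by tanh
        self.log_std = nn.Linear(256, act_dim)  # log std head, clamped below

    def forward(self, obs: torch.Tensor):
        h = self.body(obs)
        mu = torch.tanh(self.mu(h))
        log_std = torch.clamp(self.log_std(h), -5.0, -1.5)
        return mu, log_std

class Critic(nn.Module):
    """State-value function V(s)."""
    def __init__(self, obs_dim: int = 4):
        super().__init__()
        self.v = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.v(obs)
```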
## State Space
The observation vector consists of 4 normalized states representing the longitudinal dynamics:
| Index | State | Description | Units |
|---|---|---|---|
| 0 | u | Forward velocity perturbation | normalized |
| 1 | w | Vertical velocity perturbation | normalized |
| 2 | q | Pitch rate | normalized |
| 3 | θ | Pitch angle (tracking target) | normalized |
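
A hypothetical snippet showing how these indices map to named states (variable names are illustrative, not part of the library's API):

```python
import numpy as np

obs = np.zeros(4, dtype=np.float32)  # placeholder observation from the environment
u, w, q, theta = obs                 # indices 0-3, as in the table above
```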
## Action Space

| Dimension | Description | Range |
|---|---|---|
| 1 | Elevator deflection | [-1.0, 1.0] (normalized) |
The normalized action is scaled to physical elevator deflection in degrees by the environment.
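
The scaling factor lives in the environment. As a minimal sketch, assuming a hypothetical ±20° elevator limit (the real limit comes from the environment's configuration):

```python
MAX_ELEVATOR_DEG = 20.0  # hypothetical limit; the actual value is set by the environment

def scale_action(a_norm: float) -> float:
    """Map a normalized action in [-1, 1] to elevator deflection in degrees."""
    a_norm = max(-1.0, min(1.0, a_norm))  # clip to the valid normalized range
    return a_norm * MAX_ELEVATOR_DEG
```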
## Training Details

### Training Configuration
| Hyperparameter | Value |
|---|---|
| Algorithm | PPO (Clip) |
| Max Episodes | 90,000 |
| Rollout Length | 256 steps |
| Batch Size | 16,384 |
| Epochs per Update | 2 |
| Clip Parameter (ε) | 0.15 |
| Discount Factor (γ) | 0.995 |
| GAE Lambda (λ) | 0.95 |
| Actor Learning Rate | 1e-4 |
| Critic Learning Rate | 2e-4 |
| Entropy Coefficient | 0.01 |
| Max Gradient Norm | 0.5 |
| Target KL | 0.01 |
| Normalize Observations | False |
| Normalize Rewards | True |
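
For context, PPO (Clip) maximizes the standard clipped surrogate objective over probability ratios $r_t(\theta) = \pi_\theta(a_t \mid s_t) / \pi_{\theta_{\text{old}}}(a_t \mid s_t)$ with GAE-estimated advantages $\hat{A}_t$:

$$
L^{\mathrm{CLIP}}(\theta) = \hat{\mathbb{E}}_t\left[\min\left(r_t(\theta)\,\hat{A}_t,\ \operatorname{clip}\big(r_t(\theta),\,1-\epsilon,\,1+\epsilon\big)\,\hat{A}_t\right)\right], \qquad \epsilon = 0.15 .
$$

The entropy bonus (coefficient 0.01) encourages exploration, and updates can stop early when the approximate KL divergence exceeds the 0.01 target.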
### Environment Configuration

| Parameter | Value |
|---|---|
| Environment | ImprovedB747VecEnvTorch |
| Number of Parallel Envs | 64 |
| Time Step (dt) | 0.1 s |
| Episode Duration | 20 s |
| Initial State | [0, 0, 0, 0] |
| Reference Signal | Step function |
| Step Amplitude | 1.0° |
| Step Time | 5.0 s |
### Training Infrastructure
- Hardware: NVIDIA GPU with CUDA support
- Framework: PyTorch 2.0+
- Best Checkpoint: reached after ~7,510 training episodes (episode 7,510)
## Evaluation Results

### Performance Metrics
| Metric | Value |
|---|---|
| Best Evaluation Reward | 0.9137 |
| Overshoot | 0.49% |
| Settling Time | 0.60 s |
| Rise Time | 0.30 s |
| Peak Time | 0.80 s |
| Static Error | -0.0046 |
| Oscillation Count | 1 |
| Performance Index | 3.06 |
### Integral Criteria

| Criterion | Value |
|---|---|
| IAE (Integral of Absolute Error) | 4.08 |
| ISE (Integral of Squared Error) | 2.64 |
| ITAE (Integral of Time-weighted Absolute Error) | 4.77 |
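
These criteria can be recomputed from a logged tracking-error trace. A minimal sketch, assuming the error signal `e` (reference minus measured pitch angle) is sampled on the simulation's 0.1 s grid:

```python
import numpy as np

dt = 0.1                       # simulation time step, s
t = np.arange(0.0, 20.0, dt)   # 20 s episode time grid
e = np.zeros_like(t)           # placeholder: reference minus measured pitch angle

iae = np.sum(np.abs(e)) * dt       # Integral of Absolute Error
ise = np.sum(e ** 2) * dt          # Integral of Squared Error
itae = np.sum(t * np.abs(e)) * dt  # Integral of Time-weighted Absolute Error
```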
### Step Response Characteristics
The agent demonstrates excellent step tracking performance with:
- ✅ Minimal overshoot (<1%)
- ✅ Fast settling time (0.60 s)
- ✅ Quick rise time (0.30 s)
- ✅ Near-zero static error
- ✅ Minimal oscillations (1 cycle)
## Usage

### Installation

```bash
pip install tensoraerospace
```

### Quick Start
```python
import numpy as np
import torch
from tensoraerospace.agent.ppo.model import PPO
from tensoraerospace.envs.b747 import ImprovedB747Env
from tensoraerospace.signals.standart import unit_step
from tensoraerospace.utils import generate_time_period, convert_tp_to_sec_tp

# Load the pretrained agent from the Hugging Face Hub
agent = PPO.from_pretrained("TensorAeroSpace/ppo-b747-pitch-control")

# Build the simulation time grid and a 1° step reference at t = 5 s
dt = 0.1
tp = generate_time_period(tn=20, dt=dt)
tps = convert_tp_to_sec_tp(tp, dt=dt)
reference = unit_step(tp=tps, degree=1.0, time_step=5.0, output_rad=True).reshape(1, -1)

env = ImprovedB747Env(
    initial_state=np.array([0.0, 0.0, 0.0, 0.0], dtype=np.float32),
    reference_signal=reference,
    number_time_steps=len(tp),
    dt=dt,
)

# Roll out one episode with the deterministic (mean) policy
obs, _ = env.reset()
done = False
while not done:
    action, mean_action, _ = agent.act(obs, deterministic=True)
    action_scalar = float(np.asarray(mean_action).flatten()[0])
    obs, reward, terminated, truncated, info = env.step(action_scalar)
    done = terminated or truncated
```
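
Note that `deterministic=True` requests the policy's mean action rather than a stochastic sample, which is the usual choice when evaluating a trained agent.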
### Load from Local Checkpoint

```python
from tensoraerospace.agent.ppo.model import PPO

agent = PPO.from_pretrained("./path/to/checkpoint")
```
## Limitations
- Fixed Aircraft Model: Trained specifically on Boeing 747 longitudinal dynamics; may not generalize to other aircraft
- Step Reference Only: Optimized for step reference tracking; performance on other signal types (sine, ramp) may vary
- Simulation Gap: Trained in simulation; real-world deployment would require additional validation
- State Observability: Assumes all 4 longitudinal states are observable
- Linear Dynamics: Based on a linearized aircraft model around trim conditions
## Ethical Considerations
- Not for Real Flight Control: This model is for research and educational purposes only. It should NOT be used for actual aircraft control systems without extensive testing, certification, and regulatory approval.
- Simulation Only: All training and evaluation performed in simulation environments.
## Citation
If you use this model in your research, please cite:
```bibtex
@software{tensoraerospace2024,
  title   = {TensorAeroSpace: Advanced Aerospace Control Systems \& Reinforcement Learning Framework},
  author  = {TensorAeroSpace Team},
  year    = {2024},
  url     = {https://github.com/TensorAeroSpace/TensorAeroSpace},
  license = {MIT}
}
```
## Model Card Authors
TensorAeroSpace Team
## Model Card Contact