Commit ab0d62d (verified) by ranjan56cse · 1 parent: 9f209ca

Upload training_summary.json with huggingface_hub

Files changed (1)
  1. training_summary.json +3273 -0
training_summary.json ADDED
@@ -0,0 +1,3273 @@
1
+ {
2
+ "status": "completed",
3
+ "total_time_seconds": 19658.6311378479,
4
+ "total_time_human": "5h 27m",
5
+ "total_steps": 19128,
6
+ "total_epochs": 2.999764761232651,
7
+ "best_eval_loss": 0.5003042221069336,
8
+ "checkpoints_saved": 19,
9
+ "final_metrics": {
10
+ "train_runtime": 19658.632,
11
+ "train_samples_per_second": 31.138,
12
+ "train_steps_per_second": 0.973,
13
+ "total_flos": 3.7606424506269696e+17,
14
+ "train_loss": 0.9305070004967754,
15
+ "epoch": 3.0,
16
+ "step": 19128
17
+ },
18
+ "all_metrics_history": [
19
+ {
20
+ "step": 50,
21
+ "epoch": 0.01,
22
+ "loss": 12.5022,
23
+ "learning_rate": 2.88e-05,
24
+ "gpu_memory_gb": 0.8661794662475586,
25
+ "system_memory_percent": 4.3
26
+ },
27
+ {
28
+ "step": 100,
29
+ "epoch": 0.02,
30
+ "loss": 10.3469,
31
+ "learning_rate": 5.82e-05,
32
+ "gpu_memory_gb": 0.8661794662475586,
33
+ "system_memory_percent": 4.3
34
+ },
35
+ {
36
+ "step": 150,
37
+ "epoch": 0.02,
38
+ "loss": 4.02,
39
+ "learning_rate": 8.819999999999999e-05,
40
+ "gpu_memory_gb": 0.8661794662475586,
41
+ "system_memory_percent": 4.5
42
+ },
43
+ {
44
+ "step": 200,
45
+ "epoch": 0.03,
46
+ "loss": 0.9201,
47
+ "learning_rate": 0.0001182,
48
+ "gpu_memory_gb": 0.8661794662475586,
49
+ "system_memory_percent": 4.5
50
+ },
51
+ {
52
+ "step": 250,
53
+ "epoch": 0.04,
54
+ "loss": 0.7357,
55
+ "learning_rate": 0.0001482,
56
+ "gpu_memory_gb": 0.8661794662475586,
57
+ "system_memory_percent": 4.5
58
+ },
59
+ {
60
+ "step": 300,
61
+ "epoch": 0.05,
62
+ "loss": 0.6602,
63
+ "learning_rate": 0.00017699999999999997,
64
+ "gpu_memory_gb": 0.8661794662475586,
65
+ "system_memory_percent": 4.5
66
+ },
67
+ {
68
+ "step": 350,
69
+ "epoch": 0.05,
70
+ "loss": 0.6121,
71
+ "learning_rate": 0.00020699999999999996,
72
+ "gpu_memory_gb": 0.8661794662475586,
73
+ "system_memory_percent": 4.5
74
+ },
75
+ {
76
+ "step": 400,
77
+ "epoch": 0.06,
78
+ "loss": 0.5817,
79
+ "learning_rate": 0.000237,
80
+ "gpu_memory_gb": 0.8661794662475586,
81
+ "system_memory_percent": 4.5
82
+ },
83
+ {
84
+ "step": 450,
85
+ "epoch": 0.07,
86
+ "loss": 0.5916,
87
+ "learning_rate": 0.000267,
88
+ "gpu_memory_gb": 0.8661794662475586,
89
+ "system_memory_percent": 4.5
90
+ },
91
+ {
92
+ "step": 500,
93
+ "epoch": 0.08,
94
+ "loss": 0.5675,
95
+ "learning_rate": 0.00029699999999999996,
96
+ "gpu_memory_gb": 0.8661794662475586,
97
+ "system_memory_percent": 4.5
98
+ },
99
+ {
100
+ "step": 550,
101
+ "epoch": 0.09,
102
+ "loss": 0.57,
103
+ "learning_rate": 0.0002992913893064204,
104
+ "gpu_memory_gb": 0.8661794662475586,
105
+ "system_memory_percent": 4.6
106
+ },
107
+ {
108
+ "step": 600,
109
+ "epoch": 0.09,
110
+ "loss": 0.561,
111
+ "learning_rate": 0.0002984861498818982,
112
+ "gpu_memory_gb": 0.8661794662475586,
113
+ "system_memory_percent": 4.6
114
+ },
115
+ {
116
+ "step": 650,
117
+ "epoch": 0.1,
118
+ "loss": 0.5669,
119
+ "learning_rate": 0.000297680910457376,
120
+ "gpu_memory_gb": 0.8661794662475586,
121
+ "system_memory_percent": 4.5
122
+ },
123
+ {
124
+ "step": 700,
125
+ "epoch": 0.11,
126
+ "loss": 0.5659,
127
+ "learning_rate": 0.00029687567103285373,
128
+ "gpu_memory_gb": 0.8661794662475586,
129
+ "system_memory_percent": 4.8
130
+ },
131
+ {
132
+ "step": 750,
133
+ "epoch": 0.12,
134
+ "loss": 0.5673,
135
+ "learning_rate": 0.0002960704316083315,
136
+ "gpu_memory_gb": 0.8661794662475586,
137
+ "system_memory_percent": 7.3
138
+ },
139
+ {
140
+ "step": 800,
141
+ "epoch": 0.13,
142
+ "loss": 0.5619,
143
+ "learning_rate": 0.0002952651921838093,
144
+ "gpu_memory_gb": 0.8661794662475586,
145
+ "system_memory_percent": 5.4
146
+ },
147
+ {
148
+ "step": 850,
149
+ "epoch": 0.13,
150
+ "loss": 0.5719,
151
+ "learning_rate": 0.00029449216233626795,
152
+ "gpu_memory_gb": 0.8661794662475586,
153
+ "system_memory_percent": 5.7
154
+ },
155
+ {
156
+ "step": 900,
157
+ "epoch": 0.14,
158
+ "loss": 0.5576,
159
+ "learning_rate": 0.00029368692291174576,
160
+ "gpu_memory_gb": 0.8661794662475586,
161
+ "system_memory_percent": 6.0
162
+ },
163
+ {
164
+ "step": 950,
165
+ "epoch": 0.15,
166
+ "loss": 0.5567,
167
+ "learning_rate": 0.0002928816834872235,
168
+ "gpu_memory_gb": 0.8661794662475586,
169
+ "system_memory_percent": 6.1
170
+ },
171
+ {
172
+ "step": 1000,
173
+ "epoch": 0.16,
174
+ "loss": 0.5597,
175
+ "learning_rate": 0.00029210865363968216,
176
+ "gpu_memory_gb": 0.8661794662475586,
177
+ "system_memory_percent": 6.4
178
+ },
179
+ {
180
+ "step": 1000,
181
+ "epoch": 0.16,
182
+ "eval_loss": 0.5003042221069336,
183
+ "eval_runtime": 95.3235,
184
+ "eval_samples_per_second": 118.879,
185
+ "eval_steps_per_second": 7.438,
186
+ "gpu_memory_gb": 0.8661794662475586,
187
+ "system_memory_percent": 6.9
188
+ },
189
+ {
190
+ "step": 1050,
191
+ "epoch": 0.16,
192
+ "loss": 0.5565,
193
+ "learning_rate": 0.0002913195190036504,
194
+ "gpu_memory_gb": 0.8661794662475586,
195
+ "system_memory_percent": 6.8
196
+ },
197
+ {
198
+ "step": 1100,
199
+ "epoch": 0.17,
200
+ "loss": 0.5767,
201
+ "learning_rate": 0.00029057869873308994,
202
+ "gpu_memory_gb": 0.8661794662475586,
203
+ "system_memory_percent": 6.7
204
+ },
205
+ {
206
+ "step": 1150,
207
+ "epoch": 0.18,
208
+ "loss": 0.562,
209
+ "learning_rate": 0.00028978956409705817,
210
+ "gpu_memory_gb": 0.8661794662475586,
211
+ "system_memory_percent": 6.7
212
+ },
213
+ {
214
+ "step": 1200,
215
+ "epoch": 0.19,
216
+ "loss": 0.5864,
217
+ "learning_rate": 0.0002890004294610264,
218
+ "gpu_memory_gb": 0.8661794662475586,
219
+ "system_memory_percent": 6.7
220
+ },
221
+ {
222
+ "step": 1250,
223
+ "epoch": 0.2,
224
+ "loss": 0.626,
225
+ "learning_rate": 0.00028819519003650415,
226
+ "gpu_memory_gb": 0.8661794662475586,
227
+ "system_memory_percent": 6.7
228
+ },
229
+ {
230
+ "step": 1300,
231
+ "epoch": 0.2,
232
+ "loss": 0.7742,
233
+ "learning_rate": 0.0002873899506119819,
234
+ "gpu_memory_gb": 0.8661794662475586,
235
+ "system_memory_percent": 6.7
236
+ },
237
+ {
238
+ "step": 1350,
239
+ "epoch": 0.21,
240
+ "loss": 1.1101,
241
+ "learning_rate": 0.0002866169207644406,
242
+ "gpu_memory_gb": 0.8661794662475586,
243
+ "system_memory_percent": 6.8
244
+ },
245
+ {
246
+ "step": 1400,
247
+ "epoch": 0.22,
248
+ "loss": 1.3211,
249
+ "learning_rate": 0.00028581168133991837,
250
+ "gpu_memory_gb": 0.8661794662475586,
251
+ "system_memory_percent": 6.9
252
+ },
253
+ {
254
+ "step": 1450,
255
+ "epoch": 0.23,
256
+ "loss": 1.413,
257
+ "learning_rate": 0.0002850064419153961,
258
+ "gpu_memory_gb": 0.8661794662475586,
259
+ "system_memory_percent": 7.0
260
+ },
261
+ {
262
+ "step": 1500,
263
+ "epoch": 0.24,
264
+ "loss": 1.4265,
265
+ "learning_rate": 0.00028420120249087393,
266
+ "gpu_memory_gb": 0.8661794662475586,
267
+ "system_memory_percent": 7.0
268
+ },
269
+ {
270
+ "step": 1550,
271
+ "epoch": 0.24,
272
+ "loss": 1.47,
273
+ "learning_rate": 0.0002834281726433326,
274
+ "gpu_memory_gb": 0.8661794662475586,
275
+ "system_memory_percent": 7.2
276
+ },
277
+ {
278
+ "step": 1600,
279
+ "epoch": 0.25,
280
+ "loss": 1.4561,
281
+ "learning_rate": 0.00028262293321881034,
282
+ "gpu_memory_gb": 0.8661794662475586,
283
+ "system_memory_percent": 8.8
284
+ },
285
+ {
286
+ "step": 1650,
287
+ "epoch": 0.26,
288
+ "loss": 1.4693,
289
+ "learning_rate": 0.0002818337985827786,
290
+ "gpu_memory_gb": 0.8661794662475586,
291
+ "system_memory_percent": 11.4
292
+ },
293
+ {
294
+ "step": 1700,
295
+ "epoch": 0.27,
296
+ "loss": 1.4729,
297
+ "learning_rate": 0.0002810285591582563,
298
+ "gpu_memory_gb": 0.8661794662475586,
299
+ "system_memory_percent": 13.3
300
+ },
301
+ {
302
+ "step": 1750,
303
+ "epoch": 0.27,
304
+ "loss": 1.4599,
305
+ "learning_rate": 0.00028022331973373413,
306
+ "gpu_memory_gb": 0.8661794662475586,
307
+ "system_memory_percent": 13.3
308
+ },
309
+ {
310
+ "step": 1800,
311
+ "epoch": 0.28,
312
+ "loss": 1.4725,
313
+ "learning_rate": 0.0002794180803092119,
314
+ "gpu_memory_gb": 0.8661794662475586,
315
+ "system_memory_percent": 13.2
316
+ },
317
+ {
318
+ "step": 1850,
319
+ "epoch": 0.29,
320
+ "loss": 1.4503,
321
+ "learning_rate": 0.0002786128408846897,
322
+ "gpu_memory_gb": 0.8661794662475586,
323
+ "system_memory_percent": 13.2
324
+ },
325
+ {
326
+ "step": 1900,
327
+ "epoch": 0.3,
328
+ "loss": 1.4812,
329
+ "learning_rate": 0.0002778237062486579,
330
+ "gpu_memory_gb": 0.8661794662475586,
331
+ "system_memory_percent": 13.2
332
+ },
333
+ {
334
+ "step": 1950,
335
+ "epoch": 0.31,
336
+ "loss": 1.4761,
337
+ "learning_rate": 0.0002770184668241357,
338
+ "gpu_memory_gb": 0.8661794662475586,
339
+ "system_memory_percent": 13.2
340
+ },
341
+ {
342
+ "step": 2000,
343
+ "epoch": 0.31,
344
+ "loss": 1.496,
345
+ "learning_rate": 0.00027621322739961344,
346
+ "gpu_memory_gb": 0.8661794662475586,
347
+ "system_memory_percent": 13.3
348
+ },
349
+ {
350
+ "step": 2000,
351
+ "epoch": 0.31,
352
+ "eval_loss": 1.251204252243042,
353
+ "eval_runtime": 94.8348,
354
+ "eval_samples_per_second": 119.492,
355
+ "eval_steps_per_second": 7.476,
356
+ "gpu_memory_gb": 0.8661794662475586,
357
+ "system_memory_percent": 13.2
358
+ },
359
+ {
360
+ "step": 2050,
361
+ "epoch": 0.32,
362
+ "loss": 1.4488,
363
+ "learning_rate": 0.00027542409276358167,
364
+ "gpu_memory_gb": 0.8661794662475586,
365
+ "system_memory_percent": 13.2
366
+ },
367
+ {
368
+ "step": 2100,
369
+ "epoch": 0.33,
370
+ "loss": 1.455,
371
+ "learning_rate": 0.00027461885333905943,
372
+ "gpu_memory_gb": 0.8661794662475586,
373
+ "system_memory_percent": 13.3
374
+ },
375
+ {
376
+ "step": 2150,
377
+ "epoch": 0.34,
378
+ "loss": 1.4353,
379
+ "learning_rate": 0.00027381361391453724,
380
+ "gpu_memory_gb": 0.8661794662475586,
381
+ "system_memory_percent": 13.2
382
+ },
383
+ {
384
+ "step": 2200,
385
+ "epoch": 0.35,
386
+ "loss": 1.4524,
387
+ "learning_rate": 0.000273008374490015,
388
+ "gpu_memory_gb": 0.8661794662475586,
389
+ "system_memory_percent": 13.2
390
+ },
391
+ {
392
+ "step": 2250,
393
+ "epoch": 0.35,
394
+ "loss": 1.4701,
395
+ "learning_rate": 0.0002722192398539832,
396
+ "gpu_memory_gb": 0.8661794662475586,
397
+ "system_memory_percent": 13.2
398
+ },
399
+ {
400
+ "step": 2300,
401
+ "epoch": 0.36,
402
+ "loss": 1.4734,
403
+ "learning_rate": 0.000271414000429461,
404
+ "gpu_memory_gb": 0.8661794662475586,
405
+ "system_memory_percent": 13.2
406
+ },
407
+ {
408
+ "step": 2350,
409
+ "epoch": 0.37,
410
+ "loss": 1.5035,
411
+ "learning_rate": 0.0002706409705819197,
412
+ "gpu_memory_gb": 0.8661794662475586,
413
+ "system_memory_percent": 13.6
414
+ },
415
+ {
416
+ "step": 2400,
417
+ "epoch": 0.38,
418
+ "loss": 1.4513,
419
+ "learning_rate": 0.00026983573115739744,
420
+ "gpu_memory_gb": 0.8661794662475586,
421
+ "system_memory_percent": 6.7
422
+ },
423
+ {
424
+ "step": 2450,
425
+ "epoch": 0.38,
426
+ "loss": 1.4641,
427
+ "learning_rate": 0.00026904659652136567,
428
+ "gpu_memory_gb": 0.8661794662475586,
429
+ "system_memory_percent": 6.7
430
+ },
431
+ {
432
+ "step": 2500,
433
+ "epoch": 0.39,
434
+ "loss": 1.4585,
435
+ "learning_rate": 0.0002682413570968434,
436
+ "gpu_memory_gb": 0.8661794662475586,
437
+ "system_memory_percent": 6.7
438
+ },
439
+ {
440
+ "step": 2550,
441
+ "epoch": 0.4,
442
+ "loss": 1.4673,
443
+ "learning_rate": 0.00026743611767232123,
444
+ "gpu_memory_gb": 0.8661794662475586,
445
+ "system_memory_percent": 6.7
446
+ },
447
+ {
448
+ "step": 2600,
449
+ "epoch": 0.41,
450
+ "loss": 1.4671,
451
+ "learning_rate": 0.0002666469830362894,
452
+ "gpu_memory_gb": 0.8661794662475586,
453
+ "system_memory_percent": 6.7
454
+ },
455
+ {
456
+ "step": 2650,
457
+ "epoch": 0.42,
458
+ "loss": 1.4702,
459
+ "learning_rate": 0.0002658578484002577,
460
+ "gpu_memory_gb": 0.8661794662475586,
461
+ "system_memory_percent": 6.7
462
+ },
463
+ {
464
+ "step": 2700,
465
+ "epoch": 0.42,
466
+ "loss": 1.4612,
467
+ "learning_rate": 0.00026508481855271634,
468
+ "gpu_memory_gb": 0.8661794662475586,
469
+ "system_memory_percent": 6.8
470
+ },
471
+ {
472
+ "step": 2750,
473
+ "epoch": 0.43,
474
+ "loss": 1.4713,
475
+ "learning_rate": 0.0002642956839166845,
476
+ "gpu_memory_gb": 0.8661794662475586,
477
+ "system_memory_percent": 6.8
478
+ },
479
+ {
480
+ "step": 2800,
481
+ "epoch": 0.44,
482
+ "loss": 1.4573,
483
+ "learning_rate": 0.00026350654928065275,
484
+ "gpu_memory_gb": 0.8661794662475586,
485
+ "system_memory_percent": 6.8
486
+ },
487
+ {
488
+ "step": 2850,
489
+ "epoch": 0.45,
490
+ "loss": 1.4586,
491
+ "learning_rate": 0.000262717414644621,
492
+ "gpu_memory_gb": 0.8661794662475586,
493
+ "system_memory_percent": 6.8
494
+ },
495
+ {
496
+ "step": 2900,
497
+ "epoch": 0.45,
498
+ "loss": 1.4674,
499
+ "learning_rate": 0.00026191217522009873,
500
+ "gpu_memory_gb": 0.8661794662475586,
501
+ "system_memory_percent": 6.8
502
+ },
503
+ {
504
+ "step": 2950,
505
+ "epoch": 0.46,
506
+ "loss": 1.4466,
507
+ "learning_rate": 0.00026110693579557654,
508
+ "gpu_memory_gb": 0.8661794662475586,
509
+ "system_memory_percent": 6.8
510
+ },
511
+ {
512
+ "step": 3000,
513
+ "epoch": 0.47,
514
+ "loss": 1.4897,
515
+ "learning_rate": 0.0002603339059480352,
516
+ "gpu_memory_gb": 0.8661794662475586,
517
+ "system_memory_percent": 6.8
518
+ },
519
+ {
520
+ "step": 3000,
521
+ "epoch": 0.47,
522
+ "eval_loss": 1.2417596578598022,
523
+ "eval_runtime": 94.8105,
524
+ "eval_samples_per_second": 119.523,
525
+ "eval_steps_per_second": 7.478,
526
+ "gpu_memory_gb": 0.8661794662475586,
527
+ "system_memory_percent": 6.7
528
+ },
529
+ {
530
+ "step": 3050,
531
+ "epoch": 0.48,
532
+ "loss": 1.4621,
533
+ "learning_rate": 0.0002595447713120034,
534
+ "gpu_memory_gb": 0.8661794662475586,
535
+ "system_memory_percent": 6.7
536
+ },
537
+ {
538
+ "step": 3100,
539
+ "epoch": 0.49,
540
+ "loss": 1.4443,
541
+ "learning_rate": 0.0002587395318874812,
542
+ "gpu_memory_gb": 0.8661794662475586,
543
+ "system_memory_percent": 6.7
544
+ },
545
+ {
546
+ "step": 3150,
547
+ "epoch": 0.49,
548
+ "loss": 1.4314,
549
+ "learning_rate": 0.0002579503972514494,
550
+ "gpu_memory_gb": 0.8661794662475586,
551
+ "system_memory_percent": 6.7
552
+ },
553
+ {
554
+ "step": 3200,
555
+ "epoch": 0.5,
556
+ "loss": 1.4172,
557
+ "learning_rate": 0.0002571451578269272,
558
+ "gpu_memory_gb": 0.8661794662475586,
559
+ "system_memory_percent": 6.7
560
+ },
561
+ {
562
+ "step": 3250,
563
+ "epoch": 0.51,
564
+ "loss": 1.4878,
565
+ "learning_rate": 0.00025637212797938587,
566
+ "gpu_memory_gb": 0.8661794662475586,
567
+ "system_memory_percent": 6.6
568
+ },
569
+ {
570
+ "step": 3300,
571
+ "epoch": 0.52,
572
+ "loss": 1.4344,
573
+ "learning_rate": 0.00025558299334335404,
574
+ "gpu_memory_gb": 0.8661794662475586,
575
+ "system_memory_percent": 6.6
576
+ },
577
+ {
578
+ "step": 3350,
579
+ "epoch": 0.53,
580
+ "loss": 1.4634,
581
+ "learning_rate": 0.00025477775391883185,
582
+ "gpu_memory_gb": 0.8661794662475586,
583
+ "system_memory_percent": 6.6
584
+ },
585
+ {
586
+ "step": 3400,
587
+ "epoch": 0.53,
588
+ "loss": 1.4679,
589
+ "learning_rate": 0.0002539725144943096,
590
+ "gpu_memory_gb": 0.8661794662475586,
591
+ "system_memory_percent": 6.6
592
+ },
593
+ {
594
+ "step": 3450,
595
+ "epoch": 0.54,
596
+ "loss": 1.4641,
597
+ "learning_rate": 0.00025318337985827784,
598
+ "gpu_memory_gb": 0.8661794662475586,
599
+ "system_memory_percent": 6.7
600
+ },
601
+ {
602
+ "step": 3500,
603
+ "epoch": 0.55,
604
+ "loss": 1.4396,
605
+ "learning_rate": 0.00025239424522224607,
606
+ "gpu_memory_gb": 0.8661794662475586,
607
+ "system_memory_percent": 6.8
608
+ },
609
+ {
610
+ "step": 3550,
611
+ "epoch": 0.56,
612
+ "loss": 1.485,
613
+ "learning_rate": 0.00025160511058621425,
614
+ "gpu_memory_gb": 0.8661794662475586,
615
+ "system_memory_percent": 6.8
616
+ },
617
+ {
618
+ "step": 3600,
619
+ "epoch": 0.56,
620
+ "loss": 1.4355,
621
+ "learning_rate": 0.0002508159759501825,
622
+ "gpu_memory_gb": 0.8661794662475586,
623
+ "system_memory_percent": 6.8
624
+ },
625
+ {
626
+ "step": 3650,
627
+ "epoch": 0.57,
628
+ "loss": 1.4419,
629
+ "learning_rate": 0.0002500107365256603,
630
+ "gpu_memory_gb": 0.8661794662475586,
631
+ "system_memory_percent": 6.8
632
+ },
633
+ {
634
+ "step": 3700,
635
+ "epoch": 0.58,
636
+ "loss": 1.4224,
637
+ "learning_rate": 0.00024920549710113804,
638
+ "gpu_memory_gb": 0.8661794662475586,
639
+ "system_memory_percent": 6.8
640
+ },
641
+ {
642
+ "step": 3750,
643
+ "epoch": 0.59,
644
+ "loss": 1.4473,
645
+ "learning_rate": 0.0002484002576766158,
646
+ "gpu_memory_gb": 0.8661794662475586,
647
+ "system_memory_percent": 6.8
648
+ },
649
+ {
650
+ "step": 3800,
651
+ "epoch": 0.6,
652
+ "loss": 1.4341,
653
+ "learning_rate": 0.0002475950182520936,
654
+ "gpu_memory_gb": 0.8661794662475586,
655
+ "system_memory_percent": 6.8
656
+ },
657
+ {
658
+ "step": 3850,
659
+ "epoch": 0.6,
660
+ "loss": 1.4463,
661
+ "learning_rate": 0.00024678977882757136,
662
+ "gpu_memory_gb": 0.8661794662475586,
663
+ "system_memory_percent": 6.8
664
+ },
665
+ {
666
+ "step": 3900,
667
+ "epoch": 0.61,
668
+ "loss": 1.4348,
669
+ "learning_rate": 0.00024598453940304917,
670
+ "gpu_memory_gb": 0.8661794662475586,
671
+ "system_memory_percent": 6.8
672
+ },
673
+ {
674
+ "step": 3950,
675
+ "epoch": 0.62,
676
+ "loss": 1.4326,
677
+ "learning_rate": 0.00024517929997852693,
678
+ "gpu_memory_gb": 0.8661794662475586,
679
+ "system_memory_percent": 6.8
680
+ },
681
+ {
682
+ "step": 4000,
683
+ "epoch": 0.63,
684
+ "loss": 1.4586,
685
+ "learning_rate": 0.00024439016534249516,
686
+ "gpu_memory_gb": 0.8661794662475586,
687
+ "system_memory_percent": 6.8
688
+ },
689
+ {
690
+ "step": 4000,
691
+ "epoch": 0.63,
692
+ "eval_loss": 1.2329678535461426,
693
+ "eval_runtime": 94.8153,
694
+ "eval_samples_per_second": 119.517,
695
+ "eval_steps_per_second": 7.478,
696
+ "gpu_memory_gb": 0.8661794662475586,
697
+ "system_memory_percent": 6.9
698
+ },
699
+ {
700
+ "step": 4050,
701
+ "epoch": 0.64,
702
+ "loss": 1.4624,
703
+ "learning_rate": 0.00024360103070646336,
704
+ "gpu_memory_gb": 0.8661794662475586,
705
+ "system_memory_percent": 6.9
706
+ },
707
+ {
708
+ "step": 4100,
709
+ "epoch": 0.64,
710
+ "loss": 1.455,
711
+ "learning_rate": 0.00024282800085892204,
712
+ "gpu_memory_gb": 0.8661794662475586,
713
+ "system_memory_percent": 6.9
714
+ },
715
+ {
716
+ "step": 4150,
717
+ "epoch": 0.65,
718
+ "loss": 1.4294,
719
+ "learning_rate": 0.00024202276143439982,
720
+ "gpu_memory_gb": 0.8661794662475586,
721
+ "system_memory_percent": 6.9
722
+ },
723
+ {
724
+ "step": 4200,
725
+ "epoch": 0.66,
726
+ "loss": 1.4675,
727
+ "learning_rate": 0.00024124973158685847,
728
+ "gpu_memory_gb": 0.8661794662475586,
729
+ "system_memory_percent": 6.9
730
+ },
731
+ {
732
+ "step": 4250,
733
+ "epoch": 0.67,
734
+ "loss": 1.432,
735
+ "learning_rate": 0.00024044449216233625,
736
+ "gpu_memory_gb": 0.8661794662475586,
737
+ "system_memory_percent": 6.8
738
+ },
739
+ {
740
+ "step": 4300,
741
+ "epoch": 0.67,
742
+ "loss": 1.4357,
743
+ "learning_rate": 0.00023963925273781404,
744
+ "gpu_memory_gb": 0.8661794662475586,
745
+ "system_memory_percent": 6.8
746
+ },
747
+ {
748
+ "step": 4350,
749
+ "epoch": 0.68,
750
+ "loss": 1.4419,
751
+ "learning_rate": 0.0002388340133132918,
752
+ "gpu_memory_gb": 0.8661794662475586,
753
+ "system_memory_percent": 6.8
754
+ },
755
+ {
756
+ "step": 4400,
757
+ "epoch": 0.69,
758
+ "loss": 1.4272,
759
+ "learning_rate": 0.00023802877388876957,
760
+ "gpu_memory_gb": 0.8661794662475586,
761
+ "system_memory_percent": 6.8
762
+ },
763
+ {
764
+ "step": 4450,
765
+ "epoch": 0.7,
766
+ "loss": 1.4133,
767
+ "learning_rate": 0.00023722353446424736,
768
+ "gpu_memory_gb": 0.8661794662475586,
769
+ "system_memory_percent": 6.9
770
+ },
771
+ {
772
+ "step": 4500,
773
+ "epoch": 0.71,
774
+ "loss": 1.434,
775
+ "learning_rate": 0.0002364182950397251,
776
+ "gpu_memory_gb": 0.8661794662475586,
777
+ "system_memory_percent": 6.9
778
+ },
779
+ {
780
+ "step": 4550,
781
+ "epoch": 0.71,
782
+ "loss": 1.4218,
783
+ "learning_rate": 0.0002356130556152029,
784
+ "gpu_memory_gb": 0.8661794662475586,
785
+ "system_memory_percent": 6.9
786
+ },
787
+ {
788
+ "step": 4600,
789
+ "epoch": 0.72,
790
+ "loss": 1.4682,
791
+ "learning_rate": 0.0002348239209791711,
792
+ "gpu_memory_gb": 0.8661794662475586,
793
+ "system_memory_percent": 6.9
794
+ },
795
+ {
796
+ "step": 4650,
797
+ "epoch": 0.73,
798
+ "loss": 1.4333,
799
+ "learning_rate": 0.00023401868155464888,
800
+ "gpu_memory_gb": 0.8661794662475586,
801
+ "system_memory_percent": 6.9
802
+ },
803
+ {
804
+ "step": 4700,
805
+ "epoch": 0.74,
806
+ "loss": 1.4359,
807
+ "learning_rate": 0.00023321344213012666,
808
+ "gpu_memory_gb": 0.8661794662475586,
809
+ "system_memory_percent": 6.9
810
+ },
811
+ {
812
+ "step": 4750,
813
+ "epoch": 0.74,
814
+ "loss": 1.4054,
815
+ "learning_rate": 0.00023240820270560445,
816
+ "gpu_memory_gb": 0.8661794662475586,
817
+ "system_memory_percent": 6.9
818
+ },
819
+ {
820
+ "step": 4800,
821
+ "epoch": 0.75,
822
+ "loss": 1.4215,
823
+ "learning_rate": 0.00023160296328108223,
824
+ "gpu_memory_gb": 0.8661794662475586,
825
+ "system_memory_percent": 6.9
826
+ },
827
+ {
828
+ "step": 4850,
829
+ "epoch": 0.76,
830
+ "loss": 1.4471,
831
+ "learning_rate": 0.00023081382864505043,
832
+ "gpu_memory_gb": 0.8661794662475586,
833
+ "system_memory_percent": 6.9
834
+ },
835
+ {
836
+ "step": 4900,
837
+ "epoch": 0.77,
838
+ "loss": 1.4238,
839
+ "learning_rate": 0.00023000858922052821,
840
+ "gpu_memory_gb": 0.8661794662475586,
841
+ "system_memory_percent": 6.9
842
+ },
843
+ {
844
+ "step": 4950,
845
+ "epoch": 0.78,
846
+ "loss": 1.4218,
847
+ "learning_rate": 0.000229203349796006,
848
+ "gpu_memory_gb": 0.8661794662475586,
849
+ "system_memory_percent": 6.9
850
+ },
851
+ {
852
+ "step": 5000,
853
+ "epoch": 0.78,
854
+ "loss": 1.4419,
855
+ "learning_rate": 0.00022839811037148378,
856
+ "gpu_memory_gb": 0.8661794662475586,
857
+ "system_memory_percent": 6.9
858
+ },
859
+ {
860
+ "step": 5000,
861
+ "epoch": 0.78,
862
+ "eval_loss": 1.22481107711792,
863
+ "eval_runtime": 95.042,
864
+ "eval_samples_per_second": 119.231,
865
+ "eval_steps_per_second": 7.46,
866
+ "gpu_memory_gb": 0.8661794662475586,
867
+ "system_memory_percent": 6.8
868
+ },
869
+ {
870
+ "step": 5050,
871
+ "epoch": 0.79,
872
+ "loss": 1.4405,
873
+ "learning_rate": 0.00022760897573545198,
874
+ "gpu_memory_gb": 0.8661794662475586,
875
+ "system_memory_percent": 7.0
876
+ },
877
+ {
878
+ "step": 5100,
879
+ "epoch": 0.8,
880
+ "loss": 1.449,
881
+ "learning_rate": 0.00022683594588791063,
882
+ "gpu_memory_gb": 0.8661794662475586,
883
+ "system_memory_percent": 6.9
884
+ },
885
+ {
886
+ "step": 5150,
887
+ "epoch": 0.81,
888
+ "loss": 1.4233,
889
+ "learning_rate": 0.0002260468112518789,
890
+ "gpu_memory_gb": 0.8661794662475586,
891
+ "system_memory_percent": 6.9
892
+ },
893
+ {
894
+ "step": 5200,
895
+ "epoch": 0.82,
896
+ "loss": 1.423,
897
+ "learning_rate": 0.00022524157182735667,
898
+ "gpu_memory_gb": 0.8661794662475586,
899
+ "system_memory_percent": 6.9
900
+ },
901
+ {
902
+ "step": 5250,
903
+ "epoch": 0.82,
904
+ "loss": 1.4315,
905
+ "learning_rate": 0.0002244363324028344,
906
+ "gpu_memory_gb": 0.8661794662475586,
907
+ "system_memory_percent": 6.9
908
+ },
909
+ {
910
+ "step": 5300,
911
+ "epoch": 0.83,
912
+ "loss": 1.418,
913
+ "learning_rate": 0.00022363109297831218,
914
+ "gpu_memory_gb": 0.8661794662475586,
915
+ "system_memory_percent": 6.9
916
+ },
917
+ {
918
+ "step": 5350,
919
+ "epoch": 0.84,
920
+ "loss": 1.4056,
921
+ "learning_rate": 0.00022282585355378997,
922
+ "gpu_memory_gb": 0.8661794662475586,
923
+ "system_memory_percent": 6.9
924
+ },
925
+ {
926
+ "step": 5400,
927
+ "epoch": 0.85,
928
+ "loss": 1.4351,
929
+ "learning_rate": 0.00022202061412926775,
930
+ "gpu_memory_gb": 0.8661794662475586,
931
+ "system_memory_percent": 6.9
932
+ },
933
+ {
934
+ "step": 5450,
935
+ "epoch": 0.85,
936
+ "loss": 1.4377,
937
+ "learning_rate": 0.00022123147949323598,
938
+ "gpu_memory_gb": 0.8661794662475586,
939
+ "system_memory_percent": 6.9
940
+ },
941
+ {
942
+ "step": 5500,
943
+ "epoch": 0.86,
944
+ "loss": 1.4065,
945
+ "learning_rate": 0.00022042624006871374,
946
+ "gpu_memory_gb": 0.8661794662475586,
947
+ "system_memory_percent": 6.9
948
+ },
949
+ {
950
+ "step": 5550,
951
+ "epoch": 0.87,
952
+ "loss": 1.4246,
953
+ "learning_rate": 0.00021963710543268196,
954
+ "gpu_memory_gb": 0.8661794662475586,
955
+ "system_memory_percent": 6.9
956
+ },
957
+ {
958
+ "step": 5600,
959
+ "epoch": 0.88,
960
+ "loss": 1.4607,
961
+ "learning_rate": 0.00021886407558514064,
962
+ "gpu_memory_gb": 0.8661794662475586,
963
+ "system_memory_percent": 6.9
964
+ },
965
+ {
966
+ "step": 5650,
967
+ "epoch": 0.89,
968
+ "loss": 1.4211,
969
+ "learning_rate": 0.0002180588361606184,
970
+ "gpu_memory_gb": 0.8661794662475586,
971
+ "system_memory_percent": 6.9
972
+ },
973
+ {
974
+ "step": 5700,
975
+ "epoch": 0.89,
976
+ "loss": 1.4475,
977
+ "learning_rate": 0.00021728580631307707,
978
+ "gpu_memory_gb": 0.8661794662475586,
979
+ "system_memory_percent": 6.9
980
+ },
981
+ {
982
+ "step": 5750,
983
+ "epoch": 0.9,
984
+ "loss": 1.3977,
985
+ "learning_rate": 0.00021648056688855486,
986
+ "gpu_memory_gb": 0.8661794662475586,
987
+ "system_memory_percent": 6.9
988
+ },
989
+ {
990
+ "step": 5800,
991
+ "epoch": 0.91,
992
+ "loss": 1.4034,
993
+ "learning_rate": 0.00021567532746403264,
994
+ "gpu_memory_gb": 0.8661794662475586,
995
+ "system_memory_percent": 6.9
996
+ },
997
+ {
998
+ "step": 5850,
999
+ "epoch": 0.92,
1000
+ "loss": 1.4237,
1001
+ "learning_rate": 0.00021487008803951037,
1002
+ "gpu_memory_gb": 0.8661794662475586,
1003
+ "system_memory_percent": 6.9
1004
+ },
1005
+ {
1006
+ "step": 5900,
1007
+ "epoch": 0.93,
1008
+ "loss": 1.4371,
1009
+ "learning_rate": 0.00021406484861498815,
1010
+ "gpu_memory_gb": 0.8661794662475586,
1011
+ "system_memory_percent": 6.9
1012
+ },
1013
+ {
1014
+ "step": 5950,
1015
+ "epoch": 0.93,
1016
+ "loss": 1.4416,
1017
+ "learning_rate": 0.00021325960919046593,
1018
+ "gpu_memory_gb": 0.8661794662475586,
1019
+ "system_memory_percent": 6.9
1020
+ },
1021
+ {
1022
+ "step": 6000,
1023
+ "epoch": 0.94,
1024
+ "loss": 1.4164,
1025
+ "learning_rate": 0.0002124704745544342,
1026
+ "gpu_memory_gb": 0.8661794662475586,
1027
+ "system_memory_percent": 6.9
1028
+ },
1029
+ {
1030
+ "step": 6000,
1031
+ "epoch": 0.94,
1032
+ "eval_loss": 1.2173230648040771,
1033
+ "eval_runtime": 95.0111,
1034
+ "eval_samples_per_second": 119.27,
1035
+ "eval_steps_per_second": 7.462,
1036
+ "gpu_memory_gb": 0.8661794662475586,
1037
+ "system_memory_percent": 6.9
1038
+ },
1039
+ {
1040
+ "step": 6050,
1041
+ "epoch": 0.95,
1042
+ "loss": 1.397,
1043
+ "learning_rate": 0.00021166523512991192,
1044
+ "gpu_memory_gb": 0.8661794662475586,
1045
+ "system_memory_percent": 6.9
1046
+ },
1047
+ {
1048
+ "step": 6100,
1049
+ "epoch": 0.96,
1050
+ "loss": 1.4268,
1051
+ "learning_rate": 0.00021089220528237062,
1052
+ "gpu_memory_gb": 0.8661794662475586,
1053
+ "system_memory_percent": 6.9
1054
+ },
1055
+ {
1056
+ "step": 6150,
1057
+ "epoch": 0.96,
1058
+ "loss": 1.4388,
1059
+ "learning_rate": 0.00021010307064633883,
1060
+ "gpu_memory_gb": 0.8661794662475586,
1061
+ "system_memory_percent": 7.0
1062
+ },
1063
+ {
1064
+ "step": 6200,
1065
+ "epoch": 0.97,
1066
+ "loss": 1.4208,
1067
+ "learning_rate": 0.00020931393601030706,
1068
+ "gpu_memory_gb": 0.8661794662475586,
1069
+ "system_memory_percent": 6.9
1070
+ },
1071
+ {
1072
+ "step": 6250,
1073
+ "epoch": 0.98,
1074
+ "loss": 1.4352,
1075
+ "learning_rate": 0.00020852480137427526,
1076
+ "gpu_memory_gb": 0.8661794662475586,
1077
+ "system_memory_percent": 6.9
1078
+ },
1079
+ {
1080
+ "step": 6300,
1081
+ "epoch": 0.99,
1082
+ "loss": 1.4053,
1083
+ "learning_rate": 0.00020771956194975304,
1084
+ "gpu_memory_gb": 0.8661794662475586,
1085
+ "system_memory_percent": 7.0
1086
+ },
1087
+ {
1088
+ "step": 6350,
1089
+ "epoch": 1.0,
1090
+ "loss": 1.4242,
1091
+ "learning_rate": 0.00020691432252523082,
1092
+ "gpu_memory_gb": 0.8661794662475586,
1093
+ "system_memory_percent": 6.9
1094
+ },
1095
+ {
1096
+ "step": 6400,
1097
+ "epoch": 1.0,
1098
+ "loss": 1.4197,
1099
+ "learning_rate": 0.00020612518788919903,
1100
+ "gpu_memory_gb": 0.8661794662475586,
1101
+ "system_memory_percent": 6.9
1102
+ },
1103
+ {
1104
+ "step": 6450,
1105
+ "epoch": 1.01,
1106
+ "loss": 1.4225,
1107
+ "learning_rate": 0.00020533605325316726,
1108
+ "gpu_memory_gb": 0.8661794662475586,
1109
+ "system_memory_percent": 6.8
1110
+ },
1111
+ {
1112
+ "step": 6500,
1113
+ "epoch": 1.02,
1114
+ "loss": 1.4188,
1115
+ "learning_rate": 0.00020453081382864504,
1116
+ "gpu_memory_gb": 0.8661794662475586,
1117
+ "system_memory_percent": 6.9
1118
+ },
1119
+ {
1120
+ "step": 6550,
1121
+ "epoch": 1.03,
1122
+ "loss": 1.4347,
1123
+ "learning_rate": 0.00020375778398110372,
1124
+ "gpu_memory_gb": 0.8661794662475586,
1125
+ "system_memory_percent": 6.9
1126
+ },
1127
+ {
1128
+ "step": 6600,
1129
+ "epoch": 1.04,
1130
+ "loss": 1.4371,
1131
+ "learning_rate": 0.00020296864934507192,
1132
+ "gpu_memory_gb": 0.8661794662475586,
1133
+ "system_memory_percent": 6.8
1134
+ },
1135
+ {
1136
+ "step": 6650,
1137
+ "epoch": 1.04,
1138
+ "loss": 1.4228,
1139
+ "learning_rate": 0.0002021634099205497,
1140
+ "gpu_memory_gb": 0.8661794662475586,
1141
+ "system_memory_percent": 6.8
1142
+ },
1143
+ {
1144
+ "step": 6700,
1145
+ "epoch": 1.05,
1146
+ "loss": 1.4289,
1147
+ "learning_rate": 0.00020137427528451793,
1148
+ "gpu_memory_gb": 0.8661794662475586,
1149
+ "system_memory_percent": 6.9
1150
+ },
1151
+ {
1152
+ "step": 6750,
1153
+ "epoch": 1.06,
1154
+ "loss": 1.4224,
1155
+ "learning_rate": 0.0002005690358599957,
1156
+ "gpu_memory_gb": 0.8661794662475586,
1157
+ "system_memory_percent": 6.9
1158
+ },
1159
+ {
1160
+ "step": 6800,
1161
+ "epoch": 1.07,
1162
+ "loss": 1.4783,
1163
+ "learning_rate": 0.00019981211080094478,
1164
+ "gpu_memory_gb": 0.8661794662475586,
1165
+ "system_memory_percent": 6.9
1166
+ },
1167
+ {
1168
+ "step": 6850,
1169
+ "epoch": 1.07,
1170
+ "loss": 1.4469,
1171
+ "learning_rate": 0.00019902297616491304,
1172
+ "gpu_memory_gb": 0.8661794662475586,
1173
+ "system_memory_percent": 6.9
1174
+ },
1175
+ {
1176
+ "step": 6900,
1177
+ "epoch": 1.08,
1178
+ "loss": 1.4335,
1179
+ "learning_rate": 0.00019823384152888124,
1180
+ "gpu_memory_gb": 0.8661794662475586,
1181
+ "system_memory_percent": 6.9
1182
+ },
1183
+ {
1184
+ "step": 6950,
1185
+ "epoch": 1.09,
1186
+ "loss": 1.3973,
1187
+ "learning_rate": 0.000197428602104359,
1188
+ "gpu_memory_gb": 0.8661794662475586,
1189
+ "system_memory_percent": 6.9
1190
+ },
1191
+ {
1192
+ "step": 7000,
1193
+ "epoch": 1.1,
1194
+ "loss": 1.4493,
1195
+ "learning_rate": 0.0001966394674683272,
1196
+ "gpu_memory_gb": 0.8661794662475586,
1197
+ "system_memory_percent": 6.9
1198
+ },
1199
+ {
1200
+ "step": 7000,
1201
+ "epoch": 1.1,
1202
+ "eval_loss": 1.210498571395874,
1203
+ "eval_runtime": 95.0044,
1204
+ "eval_samples_per_second": 119.279,
1205
+ "eval_steps_per_second": 7.463,
1206
+ "gpu_memory_gb": 0.8661794662475586,
1207
+ "system_memory_percent": 6.9
1208
+ },
1209
+ {
1210
+ "step": 7050,
1211
+ "epoch": 1.11,
1212
+ "loss": 1.4059,
1213
+ "learning_rate": 0.00019583422804380499,
1214
+ "gpu_memory_gb": 0.8661794662475586,
1215
+ "system_memory_percent": 6.9
1216
+ },
1217
+ {
1218
+ "step": 7100,
1219
+ "epoch": 1.11,
1220
+ "loss": 1.3989,
1221
+ "learning_rate": 0.00019502898861928277,
1222
+ "gpu_memory_gb": 0.8661794662475586,
1223
+ "system_memory_percent": 6.9
1224
+ },
1225
+ {
1226
+ "step": 7150,
1227
+ "epoch": 1.12,
1228
+ "loss": 1.3932,
1229
+ "learning_rate": 0.00019422374919476055,
1230
+ "gpu_memory_gb": 0.8661794662475586,
1231
+ "system_memory_percent": 7.1
1232
+ },
1233
+ {
1234
+ "step": 7200,
1235
+ "epoch": 1.13,
1236
+ "loss": 1.4181,
1237
+ "learning_rate": 0.00019341850977023833,
1238
+ "gpu_memory_gb": 0.8661794662475586,
1239
+ "system_memory_percent": 6.9
1240
+ },
1241
+ {
1242
+ "step": 7250,
1243
+ "epoch": 1.14,
1244
+ "loss": 1.4567,
1245
+ "learning_rate": 0.000192645479922697,
1246
+ "gpu_memory_gb": 0.8661794662475586,
1247
+ "system_memory_percent": 6.9
1248
+ },
1249
+ {
1250
+ "step": 7300,
1251
+ "epoch": 1.14,
1252
+ "loss": 1.4207,
1253
+ "learning_rate": 0.00019184024049817477,
1254
+ "gpu_memory_gb": 0.8661794662475586,
1255
+ "system_memory_percent": 6.9
1256
+ },
1257
+ {
1258
+ "step": 7350,
1259
+ "epoch": 1.15,
1260
+ "loss": 1.402,
1261
+ "learning_rate": 0.00019103500107365255,
1262
+ "gpu_memory_gb": 0.8661794662475586,
1263
+ "system_memory_percent": 6.9
1264
+ },
1265
+ {
1266
+ "step": 7400,
1267
+ "epoch": 1.16,
1268
+ "loss": 1.4161,
1269
+ "learning_rate": 0.00019022976164913033,
1270
+ "gpu_memory_gb": 0.8661794662475586,
1271
+ "system_memory_percent": 6.9
1272
+ },
1273
+ {
1274
+ "step": 7450,
1275
+ "epoch": 1.17,
1276
+ "loss": 1.4214,
1277
+ "learning_rate": 0.00018944062701309854,
1278
+ "gpu_memory_gb": 0.8661794662475586,
1279
+ "system_memory_percent": 6.9
1280
+ },
1281
+ {
1282
+ "step": 7500,
1283
+ "epoch": 1.18,
1284
+ "loss": 1.4018,
1285
+ "learning_rate": 0.00018863538758857632,
1286
+ "gpu_memory_gb": 0.8661794662475586,
1287
+ "system_memory_percent": 6.9
1288
+ },
1289
+ {
1290
+ "step": 7550,
1291
+ "epoch": 1.18,
1292
+ "loss": 1.3888,
1293
+ "learning_rate": 0.0001878301481640541,
1294
+ "gpu_memory_gb": 0.8661794662475586,
1295
+ "system_memory_percent": 6.9
1296
+ },
1297
+ {
1298
+ "step": 7600,
1299
+ "epoch": 1.19,
1300
+ "loss": 1.4376,
1301
+ "learning_rate": 0.0001870732231050032,
1302
+ "gpu_memory_gb": 0.8661794662475586,
1303
+ "system_memory_percent": 6.9
1304
+ },
1305
+ {
1306
+ "step": 7650,
1307
+ "epoch": 1.2,
1308
+ "loss": 1.4172,
1309
+ "learning_rate": 0.00018628408846897143,
1310
+ "gpu_memory_gb": 0.8661794662475586,
1311
+ "system_memory_percent": 7.0
1312
+ },
1313
+ {
1314
+ "step": 7700,
1315
+ "epoch": 1.21,
1316
+ "loss": 1.4116,
1317
+ "learning_rate": 0.0001854788490444492,
1318
+ "gpu_memory_gb": 0.8661794662475586,
1319
+ "system_memory_percent": 7.0
1320
+ },
1321
+ {
1322
+ "step": 7750,
1323
+ "epoch": 1.22,
1324
+ "loss": 1.4148,
1325
+ "learning_rate": 0.0001846897144084174,
1326
+ "gpu_memory_gb": 0.8661794662475586,
1327
+ "system_memory_percent": 7.0
1328
+ },
1329
+ {
1330
+ "step": 7800,
1331
+ "epoch": 1.22,
1332
+ "loss": 1.4197,
1333
+ "learning_rate": 0.00018390057977238564,
1334
+ "gpu_memory_gb": 0.8661794662475586,
1335
+ "system_memory_percent": 7.0
1336
+ },
1337
+ {
1338
+ "step": 7850,
1339
+ "epoch": 1.23,
1340
+ "loss": 1.4202,
1341
+ "learning_rate": 0.00018311144513635385,
1342
+ "gpu_memory_gb": 0.8661794662475586,
1343
+ "system_memory_percent": 7.0
1344
+ },
1345
+ {
1346
+ "step": 7900,
1347
+ "epoch": 1.24,
1348
+ "loss": 1.4046,
1349
+ "learning_rate": 0.00018230620571183163,
1350
+ "gpu_memory_gb": 0.8661794662475586,
1351
+ "system_memory_percent": 7.0
1352
+ },
1353
+ {
1354
+ "step": 7950,
1355
+ "epoch": 1.25,
1356
+ "loss": 1.3885,
1357
+ "learning_rate": 0.0001815009662873094,
1358
+ "gpu_memory_gb": 0.8661794662475586,
1359
+ "system_memory_percent": 7.0
1360
+ },
1361
+ {
1362
+ "step": 8000,
1363
+ "epoch": 1.25,
1364
+ "loss": 1.4116,
1365
+ "learning_rate": 0.00018071183165127761,
1366
+ "gpu_memory_gb": 0.8661794662475586,
1367
+ "system_memory_percent": 6.9
1368
+ },
1369
+ {
1370
+ "step": 8000,
1371
+ "epoch": 1.25,
1372
+ "eval_loss": 1.2042211294174194,
1373
+ "eval_runtime": 95.1992,
1374
+ "eval_samples_per_second": 119.035,
1375
+ "eval_steps_per_second": 7.448,
1376
+ "gpu_memory_gb": 0.8661794662475586,
1377
+ "system_memory_percent": 7.0
1378
+ },
1379
+ {
1380
+ "step": 8050,
1381
+ "epoch": 1.26,
1382
+ "loss": 1.4231,
1383
+ "learning_rate": 0.00017993880180373632,
1384
+ "gpu_memory_gb": 0.8661794662475586,
1385
+ "system_memory_percent": 7.0
1386
+ },
1387
+ {
1388
+ "step": 8100,
1389
+ "epoch": 1.27,
1390
+ "loss": 1.3973,
1391
+ "learning_rate": 0.00017913356237921405,
1392
+ "gpu_memory_gb": 0.8661794662475586,
1393
+ "system_memory_percent": 7.0
1394
+ },
1395
+ {
1396
+ "step": 8150,
1397
+ "epoch": 1.28,
1398
+ "loss": 1.4082,
1399
+ "learning_rate": 0.00017832832295469183,
1400
+ "gpu_memory_gb": 0.8661794662475586,
1401
+ "system_memory_percent": 7.1
1402
+ },
1403
+ {
1404
+ "step": 8200,
1405
+ "epoch": 1.29,
1406
+ "loss": 1.4026,
1407
+ "learning_rate": 0.0001775230835301696,
1408
+ "gpu_memory_gb": 0.8661794662475586,
1409
+ "system_memory_percent": 7.1
1410
+ },
1411
+ {
1412
+ "step": 8250,
1413
+ "epoch": 1.29,
1414
+ "loss": 1.4261,
1415
+ "learning_rate": 0.0001767500536826283,
1416
+ "gpu_memory_gb": 0.8661794662475586,
1417
+ "system_memory_percent": 7.1
1418
+ },
1419
+ {
1420
+ "step": 8300,
1421
+ "epoch": 1.3,
1422
+ "loss": 1.4162,
1423
+ "learning_rate": 0.00017594481425810607,
1424
+ "gpu_memory_gb": 0.8661794662475586,
1425
+ "system_memory_percent": 7.1
1426
+ },
1427
+ {
1428
+ "step": 8350,
1429
+ "epoch": 1.31,
1430
+ "loss": 1.4007,
1431
+ "learning_rate": 0.00017513957483358383,
1432
+ "gpu_memory_gb": 0.8661794662475586,
1433
+ "system_memory_percent": 7.1
1434
+ },
1435
+ {
1436
+ "step": 8400,
1437
+ "epoch": 1.32,
1438
+ "loss": 1.4075,
1439
+ "learning_rate": 0.0001743343354090616,
1440
+ "gpu_memory_gb": 0.8661794662475586,
1441
+ "system_memory_percent": 7.1
1442
+ },
1443
+ {
1444
+ "step": 8450,
1445
+ "epoch": 1.33,
1446
+ "loss": 1.3911,
1447
+ "learning_rate": 0.0001735290959845394,
1448
+ "gpu_memory_gb": 0.8661794662475586,
1449
+ "system_memory_percent": 7.1
1450
+ },
1451
+ {
1452
+ "step": 8500,
1453
+ "epoch": 1.33,
1454
+ "loss": 1.3977,
1455
+ "learning_rate": 0.00017272385656001715,
1456
+ "gpu_memory_gb": 0.8661794662475586,
1457
+ "system_memory_percent": 7.1
1458
+ },
1459
+ {
1460
+ "step": 8550,
1461
+ "epoch": 1.34,
1462
+ "loss": 1.4008,
1463
+ "learning_rate": 0.00017193472192398538,
1464
+ "gpu_memory_gb": 0.8661794662475586,
1465
+ "system_memory_percent": 7.0
1466
+ },
1467
+ {
1468
+ "step": 8600,
1469
+ "epoch": 1.35,
1470
+ "loss": 1.3947,
1471
+ "learning_rate": 0.00017112948249946316,
1472
+ "gpu_memory_gb": 0.8661794662475586,
1473
+ "system_memory_percent": 6.9
1474
+ },
1475
+ {
1476
+ "step": 8650,
1477
+ "epoch": 1.36,
1478
+ "loss": 1.3825,
1479
+ "learning_rate": 0.00017032424307494094,
1480
+ "gpu_memory_gb": 0.8661794662475586,
1481
+ "system_memory_percent": 6.9
1482
+ },
1483
+ {
1484
+ "step": 8700,
1485
+ "epoch": 1.36,
1486
+ "loss": 1.4022,
1487
+ "learning_rate": 0.00016953510843890915,
1488
+ "gpu_memory_gb": 0.8661794662475586,
1489
+ "system_memory_percent": 6.9
1490
+ },
1491
+ {
1492
+ "step": 8750,
1493
+ "epoch": 1.37,
1494
+ "loss": 1.3865,
1495
+ "learning_rate": 0.00016872986901438693,
1496
+ "gpu_memory_gb": 0.8661794662475586,
1497
+ "system_memory_percent": 6.9
1498
+ },
1499
+ {
1500
+ "step": 8800,
1501
+ "epoch": 1.38,
1502
+ "loss": 1.435,
1503
+ "learning_rate": 0.00016795683916684558,
1504
+ "gpu_memory_gb": 0.8661794662475586,
1505
+ "system_memory_percent": 6.9
1506
+ },
1507
+ {
1508
+ "step": 8850,
1509
+ "epoch": 1.39,
1510
+ "loss": 1.41,
1511
+ "learning_rate": 0.00016716770453081384,
1512
+ "gpu_memory_gb": 0.8661794662475586,
1513
+ "system_memory_percent": 6.9
1514
+ },
1515
+ {
1516
+ "step": 8900,
1517
+ "epoch": 1.4,
1518
+ "loss": 1.3888,
1519
+ "learning_rate": 0.00016636246510629157,
1520
+ "gpu_memory_gb": 0.8661794662475586,
1521
+ "system_memory_percent": 6.9
1522
+ },
1523
+ {
1524
+ "step": 8950,
1525
+ "epoch": 1.4,
1526
+ "loss": 1.4151,
1527
+ "learning_rate": 0.00016557333047025982,
1528
+ "gpu_memory_gb": 0.8661794662475586,
1529
+ "system_memory_percent": 7.0
1530
+ },
1531
+ {
1532
+ "step": 9000,
1533
+ "epoch": 1.41,
1534
+ "loss": 1.3786,
1535
+ "learning_rate": 0.00016478419583422802,
1536
+ "gpu_memory_gb": 0.8661794662475586,
1537
+ "system_memory_percent": 7.0
1538
+ },
1539
+ {
1540
+ "step": 9000,
1541
+ "epoch": 1.41,
1542
+ "eval_loss": 1.1985399723052979,
1543
+ "eval_runtime": 94.774,
1544
+ "eval_samples_per_second": 119.569,
1545
+ "eval_steps_per_second": 7.481,
1546
+ "gpu_memory_gb": 0.8661794662475586,
1547
+ "system_memory_percent": 7.0
1548
+ },
1549
+ {
1550
+ "step": 9050,
1551
+ "epoch": 1.42,
1552
+ "loss": 1.3709,
1553
+ "learning_rate": 0.00016399506119819625,
1554
+ "gpu_memory_gb": 0.8661794662475586,
1555
+ "system_memory_percent": 7.0
1556
+ },
1557
+ {
1558
+ "step": 9100,
1559
+ "epoch": 1.43,
1560
+ "loss": 1.4021,
1561
+ "learning_rate": 0.00016318982177367404,
1562
+ "gpu_memory_gb": 0.8661794662475586,
1563
+ "system_memory_percent": 7.0
1564
+ },
1565
+ {
1566
+ "step": 9150,
1567
+ "epoch": 1.43,
1568
+ "loss": 1.4054,
1569
+ "learning_rate": 0.00016238458234915182,
1570
+ "gpu_memory_gb": 0.8661794662475586,
1571
+ "system_memory_percent": 7.0
1572
+ },
1573
+ {
1574
+ "step": 9200,
1575
+ "epoch": 1.44,
1576
+ "loss": 1.3909,
1577
+ "learning_rate": 0.00016157934292462955,
1578
+ "gpu_memory_gb": 0.8661794662475586,
1579
+ "system_memory_percent": 7.5
1580
+ },
1581
+ {
1582
+ "step": 9250,
1583
+ "epoch": 1.45,
1584
+ "loss": 1.3949,
1585
+ "learning_rate": 0.0001607902082885978,
1586
+ "gpu_memory_gb": 0.8661794662475586,
1587
+ "system_memory_percent": 7.0
1588
+ },
1589
+ {
1590
+ "step": 9300,
1591
+ "epoch": 1.46,
1592
+ "loss": 1.3927,
1593
+ "learning_rate": 0.000160001073652566,
1594
+ "gpu_memory_gb": 0.8661794662475586,
1595
+ "system_memory_percent": 7.0
1596
+ },
1597
+ {
1598
+ "step": 9350,
1599
+ "epoch": 1.47,
1600
+ "loss": 1.4053,
1601
+ "learning_rate": 0.0001591958342280438,
1602
+ "gpu_memory_gb": 0.8661794662475586,
1603
+ "system_memory_percent": 7.0
1604
+ },
1605
+ {
1606
+ "step": 9400,
1607
+ "epoch": 1.47,
1608
+ "loss": 1.4217,
1609
+ "learning_rate": 0.00015842280438050244,
1610
+ "gpu_memory_gb": 0.8661794662475586,
1611
+ "system_memory_percent": 7.3
1612
+ },
1613
+ {
1614
+ "step": 9450,
1615
+ "epoch": 1.48,
1616
+ "loss": 1.4174,
1617
+ "learning_rate": 0.0001576497745329611,
1618
+ "gpu_memory_gb": 0.8661794662475586,
1619
+ "system_memory_percent": 7.0
1620
+ },
1621
+ {
1622
+ "step": 9500,
1623
+ "epoch": 1.49,
1624
+ "loss": 1.4396,
1625
+ "learning_rate": 0.00015686063989692935,
1626
+ "gpu_memory_gb": 0.8661794662475586,
1627
+ "system_memory_percent": 6.9
1628
+ },
1629
+ {
1630
+ "step": 9550,
1631
+ "epoch": 1.5,
1632
+ "loss": 1.3969,
1633
+ "learning_rate": 0.00015605540047240713,
1634
+ "gpu_memory_gb": 0.8661794662475586,
1635
+ "system_memory_percent": 6.9
1636
+ },
1637
+ {
1638
+ "step": 9600,
1639
+ "epoch": 1.51,
1640
+ "loss": 1.3905,
1641
+ "learning_rate": 0.0001552501610478849,
1642
+ "gpu_memory_gb": 0.8661794662475586,
1643
+ "system_memory_percent": 6.9
1644
+ },
1645
+ {
1646
+ "step": 9650,
1647
+ "epoch": 1.51,
1648
+ "loss": 1.3712,
1649
+ "learning_rate": 0.0001544449216233627,
1650
+ "gpu_memory_gb": 0.8661794662475586,
1651
+ "system_memory_percent": 6.9
1652
+ },
1653
+ {
1654
+ "step": 9700,
1655
+ "epoch": 1.52,
1656
+ "loss": 1.3787,
1657
+ "learning_rate": 0.00015363968219884042,
1658
+ "gpu_memory_gb": 0.8661794662475586,
1659
+ "system_memory_percent": 6.9
1660
+ },
1661
+ {
1662
+ "step": 9750,
1663
+ "epoch": 1.53,
1664
+ "loss": 1.3795,
1665
+ "learning_rate": 0.0001528344427743182,
1666
+ "gpu_memory_gb": 0.8661794662475586,
1667
+ "system_memory_percent": 6.9
1668
+ },
1669
+ {
1670
+ "step": 9800,
1671
+ "epoch": 1.54,
1672
+ "loss": 1.4115,
1673
+ "learning_rate": 0.00015207751771526733,
1674
+ "gpu_memory_gb": 0.8661794662475586,
1675
+ "system_memory_percent": 6.9
1676
+ },
1677
+ {
1678
+ "step": 9850,
1679
+ "epoch": 1.54,
1680
+ "loss": 1.3968,
1681
+ "learning_rate": 0.00015128838307923553,
1682
+ "gpu_memory_gb": 0.8661794662475586,
1683
+ "system_memory_percent": 6.9
1684
+ },
1685
+ {
1686
+ "step": 9900,
1687
+ "epoch": 1.55,
1688
+ "loss": 1.4296,
1689
+ "learning_rate": 0.0001505153532316942,
1690
+ "gpu_memory_gb": 0.8661794662475586,
1691
+ "system_memory_percent": 6.9
1692
+ },
1693
+ {
1694
+ "step": 9950,
1695
+ "epoch": 1.56,
1696
+ "loss": 1.3983,
1697
+ "learning_rate": 0.00014975842817264334,
1698
+ "gpu_memory_gb": 0.8661794662475586,
1699
+ "system_memory_percent": 6.9
1700
+ },
1701
+ {
1702
+ "step": 10000,
1703
+ "epoch": 1.57,
1704
+ "loss": 1.3813,
1705
+ "learning_rate": 0.0001489531887481211,
1706
+ "gpu_memory_gb": 0.8661794662475586,
1707
+ "system_memory_percent": 6.9
1708
+ },
1709
+ {
1710
+ "step": 10000,
1711
+ "epoch": 1.57,
1712
+ "eval_loss": 1.1934435367584229,
1713
+ "eval_runtime": 95.1956,
1714
+ "eval_samples_per_second": 119.039,
1715
+ "eval_steps_per_second": 7.448,
1716
+ "gpu_memory_gb": 0.8661794662475586,
1717
+ "system_memory_percent": 6.9
1718
+ },
1719
+ {
1720
+ "step": 10050,
1721
+ "epoch": 1.58,
1722
+ "loss": 1.3868,
1723
+ "learning_rate": 0.00014814794932359887,
1724
+ "gpu_memory_gb": 0.8661794662475586,
1725
+ "system_memory_percent": 6.9
1726
+ },
1727
+ {
1728
+ "step": 10100,
1729
+ "epoch": 1.58,
1730
+ "loss": 1.4145,
1731
+ "learning_rate": 0.00014737491947605752,
1732
+ "gpu_memory_gb": 0.8661794662475586,
1733
+ "system_memory_percent": 6.9
1734
+ },
1735
+ {
1736
+ "step": 10150,
1737
+ "epoch": 1.59,
1738
+ "loss": 1.3916,
1739
+ "learning_rate": 0.00014658578484002575,
1740
+ "gpu_memory_gb": 0.8661794662475586,
1741
+ "system_memory_percent": 6.9
1742
+ },
1743
+ {
1744
+ "step": 10200,
1745
+ "epoch": 1.6,
1746
+ "loss": 1.3796,
1747
+ "learning_rate": 0.00014578054541550354,
1748
+ "gpu_memory_gb": 0.8661794662475586,
1749
+ "system_memory_percent": 6.9
1750
+ },
1751
+ {
1752
+ "step": 10250,
1753
+ "epoch": 1.61,
1754
+ "loss": 1.4049,
1755
+ "learning_rate": 0.00014499141077947174,
1756
+ "gpu_memory_gb": 0.8661794662475586,
1757
+ "system_memory_percent": 6.9
1758
+ },
1759
+ {
1760
+ "step": 10300,
1761
+ "epoch": 1.62,
1762
+ "loss": 1.3931,
1763
+ "learning_rate": 0.00014420227614343997,
1764
+ "gpu_memory_gb": 0.8661794662475586,
1765
+ "system_memory_percent": 6.9
1766
+ },
1767
+ {
1768
+ "step": 10350,
1769
+ "epoch": 1.62,
1770
+ "loss": 1.3685,
1771
+ "learning_rate": 0.00014339703671891775,
1772
+ "gpu_memory_gb": 0.8661794662475586,
1773
+ "system_memory_percent": 6.9
1774
+ },
1775
+ {
1776
+ "step": 10400,
1777
+ "epoch": 1.63,
1778
+ "loss": 1.3856,
1779
+ "learning_rate": 0.0001425917972943955,
1780
+ "gpu_memory_gb": 0.8661794662475586,
1781
+ "system_memory_percent": 6.9
1782
+ },
1783
+ {
1784
+ "step": 10450,
1785
+ "epoch": 1.64,
1786
+ "loss": 1.3871,
1787
+ "learning_rate": 0.00014180266265836374,
1788
+ "gpu_memory_gb": 0.8661794662475586,
1789
+ "system_memory_percent": 6.9
1790
+ },
1791
+ {
1792
+ "step": 10500,
1793
+ "epoch": 1.65,
1794
+ "loss": 1.3822,
1795
+ "learning_rate": 0.00014099742323384152,
1796
+ "gpu_memory_gb": 0.8661794662475586,
1797
+ "system_memory_percent": 6.9
1798
+ },
1799
+ {
1800
+ "step": 10550,
1801
+ "epoch": 1.65,
1802
+ "loss": 1.3909,
1803
+ "learning_rate": 0.0001401921838093193,
1804
+ "gpu_memory_gb": 0.8661794662475586,
1805
+ "system_memory_percent": 6.9
1806
+ },
1807
+ {
1808
+ "step": 10600,
1809
+ "epoch": 1.66,
1810
+ "loss": 1.3876,
1811
+ "learning_rate": 0.0001394030491732875,
1812
+ "gpu_memory_gb": 0.8661794662475586,
1813
+ "system_memory_percent": 6.9
1814
+ },
1815
+ {
1816
+ "step": 10650,
1817
+ "epoch": 1.67,
1818
+ "loss": 1.3611,
1819
+ "learning_rate": 0.0001385978097487653,
1820
+ "gpu_memory_gb": 0.8661794662475586,
1821
+ "system_memory_percent": 6.9
1822
+ },
1823
+ {
1824
+ "step": 10700,
1825
+ "epoch": 1.68,
1826
+ "loss": 1.3871,
1827
+ "learning_rate": 0.0001378086751127335,
1828
+ "gpu_memory_gb": 0.8661794662475586,
1829
+ "system_memory_percent": 6.9
1830
+ },
1831
+ {
1832
+ "step": 10750,
1833
+ "epoch": 1.69,
1834
+ "loss": 1.3808,
1835
+ "learning_rate": 0.00013700343568821127,
1836
+ "gpu_memory_gb": 0.8661794662475586,
1837
+ "system_memory_percent": 6.9
1838
+ },
1839
+ {
1840
+ "step": 10800,
1841
+ "epoch": 1.69,
1842
+ "loss": 1.3733,
1843
+ "learning_rate": 0.00013619819626368906,
1844
+ "gpu_memory_gb": 0.8661794662475586,
1845
+ "system_memory_percent": 6.9
1846
+ },
1847
+ {
1848
+ "step": 10850,
1849
+ "epoch": 1.7,
1850
+ "loss": 1.3835,
1851
+ "learning_rate": 0.00013539295683916684,
1852
+ "gpu_memory_gb": 0.8661794662475586,
1853
+ "system_memory_percent": 6.9
1854
+ },
1855
+ {
1856
+ "step": 10900,
1857
+ "epoch": 1.71,
1858
+ "loss": 1.3768,
1859
+ "learning_rate": 0.00013460382220313504,
1860
+ "gpu_memory_gb": 0.8661794662475586,
1861
+ "system_memory_percent": 6.9
1862
+ },
1863
+ {
1864
+ "step": 10950,
1865
+ "epoch": 1.72,
1866
+ "loss": 1.3826,
1867
+ "learning_rate": 0.00013379858277861282,
1868
+ "gpu_memory_gb": 0.8661794662475586,
1869
+ "system_memory_percent": 6.9
1870
+ },
1871
+ {
1872
+ "step": 11000,
1873
+ "epoch": 1.73,
1874
+ "loss": 1.3699,
1875
+ "learning_rate": 0.0001329933433540906,
1876
+ "gpu_memory_gb": 0.8661794662475586,
1877
+ "system_memory_percent": 6.9
1878
+ },
1879
+ {
1880
+ "step": 11000,
1881
+ "epoch": 1.73,
1882
+ "eval_loss": 1.1888667345046997,
1883
+ "eval_runtime": 94.9678,
1884
+ "eval_samples_per_second": 119.325,
1885
+ "eval_steps_per_second": 7.466,
1886
+ "gpu_memory_gb": 0.8661794662475586,
1887
+ "system_memory_percent": 6.9
1888
+ },
1889
+ {
1890
+ "step": 11050,
1891
+ "epoch": 1.73,
1892
+ "loss": 1.3872,
1893
+ "learning_rate": 0.00013218810392956836,
1894
+ "gpu_memory_gb": 0.8661794662475586,
1895
+ "system_memory_percent": 6.9
1896
+ },
1897
+ {
1898
+ "step": 11100,
1899
+ "epoch": 1.74,
1900
+ "loss": 1.3747,
1901
+ "learning_rate": 0.0001313989692935366,
1902
+ "gpu_memory_gb": 0.8661794662475586,
1903
+ "system_memory_percent": 6.9
1904
+ },
1905
+ {
1906
+ "step": 11150,
1907
+ "epoch": 1.75,
1908
+ "loss": 1.3827,
1909
+ "learning_rate": 0.00013059372986901438,
1910
+ "gpu_memory_gb": 0.8661794662475586,
1911
+ "system_memory_percent": 6.9
1912
+ },
1913
+ {
1914
+ "step": 11200,
1915
+ "epoch": 1.76,
1916
+ "loss": 1.4229,
1917
+ "learning_rate": 0.0001298368048099635,
1918
+ "gpu_memory_gb": 0.8661794662475586,
1919
+ "system_memory_percent": 6.9
1920
+ },
1921
+ {
1922
+ "step": 11250,
1923
+ "epoch": 1.76,
1924
+ "loss": 1.3915,
1925
+ "learning_rate": 0.00012903156538544126,
1926
+ "gpu_memory_gb": 0.8661794662475586,
1927
+ "system_memory_percent": 7.1
1928
+ },
1929
+ {
1930
+ "step": 11300,
1931
+ "epoch": 1.77,
1932
+ "loss": 1.388,
1933
+ "learning_rate": 0.00012822632596091904,
1934
+ "gpu_memory_gb": 0.8661794662475586,
1935
+ "system_memory_percent": 6.9
1936
+ },
1937
+ {
1938
+ "step": 11350,
1939
+ "epoch": 1.78,
1940
+ "loss": 1.3952,
1941
+ "learning_rate": 0.00012743719132488727,
1942
+ "gpu_memory_gb": 0.8661794662475586,
1943
+ "system_memory_percent": 6.9
1944
+ },
1945
+ {
1946
+ "step": 11400,
1947
+ "epoch": 1.79,
1948
+ "loss": 1.3712,
1949
+ "learning_rate": 0.00012664805668885547,
1950
+ "gpu_memory_gb": 0.8661794662475586,
1951
+ "system_memory_percent": 6.9
1952
+ },
1953
+ {
1954
+ "step": 11450,
1955
+ "epoch": 1.8,
1956
+ "loss": 1.3949,
1957
+ "learning_rate": 0.0001258589220528237,
1958
+ "gpu_memory_gb": 0.8661794662475586,
1959
+ "system_memory_percent": 6.9
1960
+ },
1961
+ {
1962
+ "step": 11500,
1963
+ "epoch": 1.8,
1964
+ "loss": 1.3744,
1965
+ "learning_rate": 0.00012505368262830148,
1966
+ "gpu_memory_gb": 0.8661794662475586,
1967
+ "system_memory_percent": 6.9
1968
+ },
1969
+ {
1970
+ "step": 11550,
1971
+ "epoch": 1.81,
1972
+ "loss": 1.3609,
1973
+ "learning_rate": 0.00012424844320377924,
1974
+ "gpu_memory_gb": 0.8661794662475586,
1975
+ "system_memory_percent": 6.9
1976
+ },
1977
+ {
1978
+ "step": 11600,
1979
+ "epoch": 1.82,
1980
+ "loss": 1.3655,
1981
+ "learning_rate": 0.00012344320377925702,
1982
+ "gpu_memory_gb": 0.8661794662475586,
1983
+ "system_memory_percent": 6.9
1984
+ },
1985
+ {
1986
+ "step": 11650,
1987
+ "epoch": 1.83,
1988
+ "loss": 1.3702,
1989
+ "learning_rate": 0.0001226379643547348,
1990
+ "gpu_memory_gb": 0.8661794662475586,
1991
+ "system_memory_percent": 6.9
1992
+ },
1993
+ {
1994
+ "step": 11700,
1995
+ "epoch": 1.83,
1996
+ "loss": 1.3929,
1997
+ "learning_rate": 0.00012184882971870302,
1998
+ "gpu_memory_gb": 0.8661794662475586,
1999
+ "system_memory_percent": 6.9
2000
+ },
2001
+ {
2002
+ "step": 11750,
2003
+ "epoch": 1.84,
2004
+ "loss": 1.3611,
2005
+ "learning_rate": 0.00012104359029418079,
2006
+ "gpu_memory_gb": 0.8661794662475586,
2007
+ "system_memory_percent": 6.9
2008
+ },
2009
+ {
2010
+ "step": 11800,
2011
+ "epoch": 1.85,
2012
+ "loss": 1.37,
2013
+ "learning_rate": 0.00012023835086965857,
2014
+ "gpu_memory_gb": 0.8661794662475586,
2015
+ "system_memory_percent": 6.9
2016
+ },
2017
+ {
2018
+ "step": 11850,
2019
+ "epoch": 1.86,
2020
+ "loss": 1.4018,
2021
+ "learning_rate": 0.00011946532102211722,
2022
+ "gpu_memory_gb": 0.8661794662475586,
2023
+ "system_memory_percent": 6.9
2024
+ },
2025
+ {
2026
+ "step": 11900,
2027
+ "epoch": 1.87,
2028
+ "loss": 1.3757,
2029
+ "learning_rate": 0.000118660081597595,
2030
+ "gpu_memory_gb": 0.8661794662475586,
2031
+ "system_memory_percent": 6.9
2032
+ },
2033
+ {
2034
+ "step": 11950,
2035
+ "epoch": 1.87,
2036
+ "loss": 1.3949,
2037
+ "learning_rate": 0.00011788705175005367,
2038
+ "gpu_memory_gb": 0.8661794662475586,
2039
+ "system_memory_percent": 6.9
2040
+ },
2041
+ {
2042
+ "step": 12000,
2043
+ "epoch": 1.88,
2044
+ "loss": 1.3671,
2045
+ "learning_rate": 0.00011708181232553145,
2046
+ "gpu_memory_gb": 0.8661794662475586,
2047
+ "system_memory_percent": 6.9
2048
+ },
2049
+ {
2050
+ "step": 12000,
2051
+ "epoch": 1.88,
2052
+ "eval_loss": 1.1848528385162354,
2053
+ "eval_runtime": 94.7376,
2054
+ "eval_samples_per_second": 119.615,
2055
+ "eval_steps_per_second": 7.484,
2056
+ "gpu_memory_gb": 0.8661794662475586,
2057
+ "system_memory_percent": 6.9
2058
+ },
2059
+ {
2060
+ "step": 12050,
2061
+ "epoch": 1.89,
2062
+ "loss": 1.3686,
2063
+ "learning_rate": 0.00011627657290100923,
2064
+ "gpu_memory_gb": 0.8661794662475586,
2065
+ "system_memory_percent": 6.9
2066
+ },
2067
+ {
2068
+ "step": 12100,
2069
+ "epoch": 1.9,
2070
+ "loss": 1.3721,
2071
+ "learning_rate": 0.00011547133347648699,
2072
+ "gpu_memory_gb": 0.8661794662475586,
2073
+ "system_memory_percent": 6.9
2074
+ },
2075
+ {
2076
+ "step": 12150,
2077
+ "epoch": 1.91,
2078
+ "loss": 1.3638,
2079
+ "learning_rate": 0.00011466609405196477,
2080
+ "gpu_memory_gb": 0.8661794662475586,
2081
+ "system_memory_percent": 6.9
2082
+ },
2083
+ {
2084
+ "step": 12200,
2085
+ "epoch": 1.91,
2086
+ "loss": 1.375,
2087
+ "learning_rate": 0.00011386085462744256,
2088
+ "gpu_memory_gb": 0.8661794662475586,
2089
+ "system_memory_percent": 6.9
2090
+ },
2091
+ {
2092
+ "step": 12250,
2093
+ "epoch": 1.92,
2094
+ "loss": 1.3774,
2095
+ "learning_rate": 0.00011305561520292033,
2096
+ "gpu_memory_gb": 0.8661794662475586,
2097
+ "system_memory_percent": 6.9
2098
+ },
2099
+ {
2100
+ "step": 12300,
2101
+ "epoch": 1.93,
2102
+ "loss": 1.3897,
2103
+ "learning_rate": 0.00011226648056688854,
2104
+ "gpu_memory_gb": 0.8661794662475586,
2105
+ "system_memory_percent": 6.9
2106
+ },
2107
+ {
2108
+ "step": 12350,
2109
+ "epoch": 1.94,
2110
+ "loss": 1.369,
2111
+ "learning_rate": 0.00011146124114236632,
2112
+ "gpu_memory_gb": 0.8661794662475586,
2113
+ "system_memory_percent": 6.9
2114
+ },
2115
+ {
2116
+ "step": 12400,
2117
+ "epoch": 1.94,
2118
+ "loss": 1.8621,
2119
+ "learning_rate": 0.00011089757354520076,
2120
+ "gpu_memory_gb": 0.8661794662475586,
2121
+ "system_memory_percent": 6.9
2122
+ },
2123
+ {
2124
+ "step": 12450,
2125
+ "epoch": 1.95,
2126
+ "loss": 0.0,
2127
+ "learning_rate": 0.00011089757354520076,
2128
+ "gpu_memory_gb": 0.8661794662475586,
2129
+ "system_memory_percent": 6.9
2130
+ },
2131
+ {
2132
+ "step": 12500,
2133
+ "epoch": 1.96,
2134
+ "loss": 0.0,
2135
+ "learning_rate": 0.00011089757354520076,
2136
+ "gpu_memory_gb": 0.8661794662475586,
2137
+ "system_memory_percent": 6.9
2138
+ },
2139
+ {
2140
+ "step": 12550,
2141
+ "epoch": 1.97,
2142
+ "loss": 0.0,
2143
+ "learning_rate": 0.00011089757354520076,
2144
+ "gpu_memory_gb": 0.8661794662475586,
2145
+ "system_memory_percent": 6.9
2146
+ },
2147
+ {
2148
+ "step": 12600,
2149
+ "epoch": 1.98,
2150
+ "loss": 0.0,
2151
+ "learning_rate": 0.00011089757354520076,
2152
+ "gpu_memory_gb": 0.8661794662475586,
2153
+ "system_memory_percent": 6.9
2154
+ },
2155
+ {
2156
+ "step": 12650,
2157
+ "epoch": 1.98,
2158
+ "loss": 0.0,
2159
+ "learning_rate": 0.00011089757354520076,
2160
+ "gpu_memory_gb": 0.8661794662475586,
2161
+ "system_memory_percent": 6.9
2162
+ },
2163
+ {
2164
+ "step": 12700,
2165
+ "epoch": 1.99,
2166
+ "loss": 0.0,
2167
+ "learning_rate": 0.00011089757354520076,
2168
+ "gpu_memory_gb": 0.8661794662475586,
2169
+ "system_memory_percent": 6.9
2170
+ },
2171
+ {
2172
+ "step": 12750,
2173
+ "epoch": 2.0,
2174
+ "loss": 0.0,
2175
+ "learning_rate": 0.00011089757354520076,
2176
+ "gpu_memory_gb": 0.8661794662475586,
2177
+ "system_memory_percent": 6.9
2178
+ },
2179
+ {
2180
+ "step": 12800,
2181
+ "epoch": 2.01,
2182
+ "loss": 0.0,
2183
+ "learning_rate": 0.00011089757354520076,
2184
+ "gpu_memory_gb": 0.8661794662475586,
2185
+ "system_memory_percent": 7.1
2186
+ },
2187
+ {
2188
+ "step": 12850,
2189
+ "epoch": 2.02,
2190
+ "loss": 0.0,
2191
+ "learning_rate": 0.00011089757354520076,
2192
+ "gpu_memory_gb": 0.8661794662475586,
2193
+ "system_memory_percent": 7.0
2194
+ },
2195
+ {
2196
+ "step": 12900,
2197
+ "epoch": 2.02,
2198
+ "loss": 0.0,
2199
+ "learning_rate": 0.00011089757354520076,
2200
+ "gpu_memory_gb": 0.8661794662475586,
2201
+ "system_memory_percent": 6.9
2202
+ },
2203
+ {
2204
+ "step": 12950,
2205
+ "epoch": 2.03,
2206
+ "loss": 0.0,
2207
+ "learning_rate": 0.00011089757354520076,
2208
+ "gpu_memory_gb": 0.8661794662475586,
2209
+ "system_memory_percent": 7.0
2210
+ },
2211
+ {
2212
+ "step": 13000,
2213
+ "epoch": 2.04,
2214
+ "loss": 0.0,
2215
+ "learning_rate": 0.00011089757354520076,
2216
+ "gpu_memory_gb": 0.8661794662475586,
2217
+ "system_memory_percent": 7.0
2218
+ },
2219
+ {
2220
+ "step": 13000,
2221
+ "epoch": 2.04,
2222
+ "eval_loss": NaN,
2223
+ "eval_runtime": 93.4731,
2224
+ "eval_samples_per_second": 121.233,
2225
+ "eval_steps_per_second": 7.585,
2226
+ "gpu_memory_gb": 0.8661794662475586,
2227
+ "system_memory_percent": 7.0
2228
+ },
2229
+ {
2230
+ "step": 13050,
2231
+ "epoch": 2.05,
2232
+ "loss": 0.0,
2233
+ "learning_rate": 0.00011089757354520076,
2234
+ "gpu_memory_gb": 0.8661794662475586,
2235
+ "system_memory_percent": 7.0
2236
+ },
2237
+ {
2238
+ "step": 13100,
2239
+ "epoch": 2.05,
2240
+ "loss": 0.0,
2241
+ "learning_rate": 0.00011089757354520076,
2242
+ "gpu_memory_gb": 0.8661794662475586,
2243
+ "system_memory_percent": 7.0
2244
+ },
2245
+ {
2246
+ "step": 13150,
2247
+ "epoch": 2.06,
2248
+ "loss": 0.0,
2249
+ "learning_rate": 0.00011089757354520076,
2250
+ "gpu_memory_gb": 0.8661794662475586,
2251
+ "system_memory_percent": 7.0
2252
+ },
2253
+ {
2254
+ "step": 13200,
2255
+ "epoch": 2.07,
2256
+ "loss": 0.0,
2257
+ "learning_rate": 0.00011089757354520076,
2258
+ "gpu_memory_gb": 0.8661794662475586,
2259
+ "system_memory_percent": 7.0
2260
+ },
2261
+ {
2262
+ "step": 13250,
2263
+ "epoch": 2.08,
2264
+ "loss": 0.0,
2265
+ "learning_rate": 0.00011089757354520076,
2266
+ "gpu_memory_gb": 0.8661794662475586,
2267
+ "system_memory_percent": 7.0
2268
+ },
2269
+ {
2270
+ "step": 13300,
2271
+ "epoch": 2.09,
2272
+ "loss": 0.0,
2273
+ "learning_rate": 0.00011089757354520076,
2274
+ "gpu_memory_gb": 0.8661794662475586,
2275
+ "system_memory_percent": 7.2
2276
+ },
2277
+ {
2278
+ "step": 13350,
2279
+ "epoch": 2.09,
2280
+ "loss": 0.0,
2281
+ "learning_rate": 0.00011089757354520076,
2282
+ "gpu_memory_gb": 0.8661794662475586,
2283
+ "system_memory_percent": 7.0
2284
+ },
2285
+ {
2286
+ "step": 13400,
2287
+ "epoch": 2.1,
2288
+ "loss": 0.0,
2289
+ "learning_rate": 0.00011089757354520076,
2290
+ "gpu_memory_gb": 0.8661794662475586,
2291
+ "system_memory_percent": 7.0
2292
+ },
2293
+ {
2294
+ "step": 13450,
2295
+ "epoch": 2.11,
2296
+ "loss": 0.0,
2297
+ "learning_rate": 0.00011089757354520076,
2298
+ "gpu_memory_gb": 0.8661794662475586,
2299
+ "system_memory_percent": 7.0
2300
+ },
2301
+ {
2302
+ "step": 13500,
2303
+ "epoch": 2.12,
2304
+ "loss": 0.0,
2305
+ "learning_rate": 0.00011089757354520076,
2306
+ "gpu_memory_gb": 0.8661794662475586,
2307
+ "system_memory_percent": 7.0
2308
+ },
2309
+ {
2310
+ "step": 13550,
2311
+ "epoch": 2.12,
2312
+ "loss": 0.0,
2313
+ "learning_rate": 0.00011089757354520076,
2314
+ "gpu_memory_gb": 0.8661794662475586,
2315
+ "system_memory_percent": 7.0
2316
+ },
2317
+ {
2318
+ "step": 13600,
2319
+ "epoch": 2.13,
2320
+ "loss": 0.0,
2321
+ "learning_rate": 0.00011089757354520076,
2322
+ "gpu_memory_gb": 0.8661794662475586,
2323
+ "system_memory_percent": 4.6
2324
+ },
2325
+ {
2326
+ "step": 13650,
2327
+ "epoch": 2.14,
2328
+ "loss": 0.0,
2329
+ "learning_rate": 0.00011089757354520076,
2330
+ "gpu_memory_gb": 0.8661794662475586,
2331
+ "system_memory_percent": 4.6
2332
+ },
2333
+ {
2334
+ "step": 13700,
2335
+ "epoch": 2.15,
2336
+ "loss": 0.0,
2337
+ "learning_rate": 0.00011089757354520076,
2338
+ "gpu_memory_gb": 0.8661794662475586,
2339
+ "system_memory_percent": 4.6
2340
+ },
2341
+ {
2342
+ "step": 13750,
2343
+ "epoch": 2.16,
2344
+ "loss": 0.0,
2345
+ "learning_rate": 0.00011089757354520076,
2346
+ "gpu_memory_gb": 0.8661794662475586,
2347
+ "system_memory_percent": 4.6
2348
+ },
2349
+ {
2350
+ "step": 13800,
2351
+ "epoch": 2.16,
2352
+ "loss": 0.0,
2353
+ "learning_rate": 0.00011089757354520076,
2354
+ "gpu_memory_gb": 0.8661794662475586,
2355
+ "system_memory_percent": 4.5
2356
+ },
2357
+ {
2358
+ "step": 13850,
2359
+ "epoch": 2.17,
2360
+ "loss": 0.0,
2361
+ "learning_rate": 0.00011089757354520076,
2362
+ "gpu_memory_gb": 0.8661794662475586,
2363
+ "system_memory_percent": 4.5
2364
+ },
2365
+ {
2366
+ "step": 13900,
2367
+ "epoch": 2.18,
2368
+ "loss": 0.0,
2369
+ "learning_rate": 0.00011089757354520076,
2370
+ "gpu_memory_gb": 0.8661794662475586,
2371
+ "system_memory_percent": 4.5
2372
+ },
2373
+ {
2374
+ "step": 13950,
2375
+ "epoch": 2.19,
2376
+ "loss": 0.0,
2377
+ "learning_rate": 0.00011089757354520076,
2378
+ "gpu_memory_gb": 0.8661794662475586,
2379
+ "system_memory_percent": 5.0
2380
+ },
2381
+ {
2382
+ "step": 14000,
2383
+ "epoch": 2.2,
2384
+ "loss": 0.0,
2385
+ "learning_rate": 0.00011089757354520076,
2386
+ "gpu_memory_gb": 0.8661794662475586,
2387
+ "system_memory_percent": 4.5
2388
+ },
2389
+ {
2390
+ "step": 14000,
2391
+ "epoch": 2.2,
2392
+ "eval_loss": NaN,
2393
+ "eval_runtime": 93.4623,
2394
+ "eval_samples_per_second": 121.247,
2395
+ "eval_steps_per_second": 7.586,
2396
+ "gpu_memory_gb": 0.8661794662475586,
2397
+ "system_memory_percent": 4.5
2398
+ },
2399
+ {
2400
+ "step": 14050,
2401
+ "epoch": 2.2,
2402
+ "loss": 0.0,
2403
+ "learning_rate": 0.00011089757354520076,
2404
+ "gpu_memory_gb": 0.8661794662475586,
2405
+ "system_memory_percent": 4.5
2406
+ },
2407
+ {
2408
+ "step": 14100,
2409
+ "epoch": 2.21,
2410
+ "loss": 0.0,
2411
+ "learning_rate": 0.00011089757354520076,
2412
+ "gpu_memory_gb": 0.8661794662475586,
2413
+ "system_memory_percent": 4.5
2414
+ },
2415
+ {
2416
+ "step": 14150,
2417
+ "epoch": 2.22,
2418
+ "loss": 0.0,
2419
+ "learning_rate": 0.00011089757354520076,
2420
+ "gpu_memory_gb": 0.8661794662475586,
2421
+ "system_memory_percent": 4.5
2422
+ },
2423
+ {
2424
+ "step": 14200,
2425
+ "epoch": 2.23,
2426
+ "loss": 0.0,
2427
+ "learning_rate": 0.00011089757354520076,
2428
+ "gpu_memory_gb": 0.8661794662475586,
2429
+ "system_memory_percent": 4.5
2430
+ },
2431
+ {
2432
+ "step": 14250,
2433
+ "epoch": 2.23,
2434
+ "loss": 0.0,
2435
+ "learning_rate": 0.00011089757354520076,
2436
+ "gpu_memory_gb": 0.8661794662475586,
2437
+ "system_memory_percent": 4.5
2438
+ },
2439
+ {
2440
+ "step": 14300,
2441
+ "epoch": 2.24,
2442
+ "loss": 0.0,
2443
+ "learning_rate": 0.00011089757354520076,
2444
+ "gpu_memory_gb": 0.8661794662475586,
2445
+ "system_memory_percent": 4.5
2446
+ },
2447
+ {
2448
+ "step": 14350,
2449
+ "epoch": 2.25,
2450
+ "loss": 0.0,
2451
+ "learning_rate": 0.00011089757354520076,
2452
+ "gpu_memory_gb": 0.8661794662475586,
2453
+ "system_memory_percent": 4.5
2454
+ },
2455
+ {
2456
+ "step": 14400,
2457
+ "epoch": 2.26,
2458
+ "loss": 0.0,
2459
+ "learning_rate": 0.00011089757354520076,
2460
+ "gpu_memory_gb": 0.8661794662475586,
2461
+ "system_memory_percent": 4.5
2462
+ },
2463
+ {
2464
+ "step": 14450,
2465
+ "epoch": 2.27,
2466
+ "loss": 0.0,
2467
+ "learning_rate": 0.00011089757354520076,
2468
+ "gpu_memory_gb": 0.8661794662475586,
2469
+ "system_memory_percent": 4.5
2470
+ },
2471
+ {
2472
+ "step": 14500,
2473
+ "epoch": 2.27,
2474
+ "loss": 0.0,
2475
+ "learning_rate": 0.00011089757354520076,
2476
+ "gpu_memory_gb": 0.8661794662475586,
2477
+ "system_memory_percent": 4.6
2478
+ },
2479
+ {
2480
+ "step": 14550,
2481
+ "epoch": 2.28,
2482
+ "loss": 0.0,
2483
+ "learning_rate": 0.00011089757354520076,
2484
+ "gpu_memory_gb": 0.8661794662475586,
2485
+ "system_memory_percent": 4.5
2486
+ },
2487
+ {
2488
+ "step": 14600,
2489
+ "epoch": 2.29,
2490
+ "loss": 0.0,
2491
+ "learning_rate": 0.00011089757354520076,
2492
+ "gpu_memory_gb": 0.8661794662475586,
2493
+ "system_memory_percent": 4.5
2494
+ },
2495
+ {
2496
+ "step": 14650,
2497
+ "epoch": 2.3,
2498
+ "loss": 0.0,
2499
+ "learning_rate": 0.00011089757354520076,
2500
+ "gpu_memory_gb": 0.8661794662475586,
2501
+ "system_memory_percent": 4.5
2502
+ },
2503
+ {
2504
+ "step": 14700,
2505
+ "epoch": 2.31,
2506
+ "loss": 0.0,
2507
+ "learning_rate": 0.00011089757354520076,
2508
+ "gpu_memory_gb": 0.8661794662475586,
2509
+ "system_memory_percent": 4.5
2510
+ },
2511
+ {
2512
+ "step": 14750,
2513
+ "epoch": 2.31,
2514
+ "loss": 0.0,
2515
+ "learning_rate": 0.00011089757354520076,
2516
+ "gpu_memory_gb": 0.8661794662475586,
2517
+ "system_memory_percent": 4.5
2518
+ },
2519
+ {
2520
+ "step": 14800,
2521
+ "epoch": 2.32,
2522
+ "loss": 0.0,
2523
+ "learning_rate": 0.00011089757354520076,
2524
+ "gpu_memory_gb": 0.8661794662475586,
2525
+ "system_memory_percent": 4.5
2526
+ },
2527
+ {
2528
+ "step": 14850,
2529
+ "epoch": 2.33,
2530
+ "loss": 0.0,
2531
+ "learning_rate": 0.00011089757354520076,
2532
+ "gpu_memory_gb": 0.8661794662475586,
2533
+ "system_memory_percent": 4.5
2534
+ },
2535
+ {
2536
+ "step": 14900,
2537
+ "epoch": 2.34,
2538
+ "loss": 0.0,
2539
+ "learning_rate": 0.00011089757354520076,
2540
+ "gpu_memory_gb": 0.8661794662475586,
2541
+ "system_memory_percent": 4.5
2542
+ },
2543
+ {
2544
+ "step": 14950,
2545
+ "epoch": 2.34,
2546
+ "loss": 0.0,
2547
+ "learning_rate": 0.00011089757354520076,
2548
+ "gpu_memory_gb": 0.8661794662475586,
2549
+ "system_memory_percent": 4.5
2550
+ },
2551
+ {
2552
+ "step": 15000,
2553
+ "epoch": 2.35,
2554
+ "loss": 0.0,
2555
+ "learning_rate": 0.00011089757354520076,
2556
+ "gpu_memory_gb": 0.8661794662475586,
2557
+ "system_memory_percent": 4.5
2558
+ },
2559
+ {
2560
+ "step": 15000,
2561
+ "epoch": 2.35,
2562
+ "eval_loss": NaN,
2563
+ "eval_runtime": 93.4289,
2564
+ "eval_samples_per_second": 121.29,
2565
+ "eval_steps_per_second": 7.589,
2566
+ "gpu_memory_gb": 0.8661794662475586,
2567
+ "system_memory_percent": 4.5
2568
+ },
2569
+ {
2570
+ "step": 15050,
2571
+ "epoch": 2.36,
2572
+ "loss": 0.0,
2573
+ "learning_rate": 0.00011089757354520076,
2574
+ "gpu_memory_gb": 0.8661794662475586,
2575
+ "system_memory_percent": 4.6
2576
+ },
2577
+ {
2578
+ "step": 15100,
2579
+ "epoch": 2.37,
2580
+ "loss": 0.0,
2581
+ "learning_rate": 0.00011089757354520076,
2582
+ "gpu_memory_gb": 0.8661794662475586,
2583
+ "system_memory_percent": 4.5
2584
+ },
2585
+ {
2586
+ "step": 15150,
2587
+ "epoch": 2.38,
2588
+ "loss": 0.0,
2589
+ "learning_rate": 0.00011089757354520076,
2590
+ "gpu_memory_gb": 0.8661794662475586,
2591
+ "system_memory_percent": 4.5
2592
+ },
2593
+ {
2594
+ "step": 15200,
2595
+ "epoch": 2.38,
2596
+ "loss": 0.0,
2597
+ "learning_rate": 0.00011089757354520076,
2598
+ "gpu_memory_gb": 0.8661794662475586,
2599
+ "system_memory_percent": 4.5
2600
+ },
2601
+ {
2602
+ "step": 15250,
2603
+ "epoch": 2.39,
2604
+ "loss": 0.0,
2605
+ "learning_rate": 0.00011089757354520076,
2606
+ "gpu_memory_gb": 0.8661794662475586,
2607
+ "system_memory_percent": 4.5
2608
+ },
2609
+ {
2610
+ "step": 15300,
2611
+ "epoch": 2.4,
2612
+ "loss": 0.0,
2613
+ "learning_rate": 0.00011089757354520076,
2614
+ "gpu_memory_gb": 0.8661794662475586,
2615
+ "system_memory_percent": 4.5
2616
+ },
2617
+ {
2618
+ "step": 15350,
2619
+ "epoch": 2.41,
2620
+ "loss": 0.0,
2621
+ "learning_rate": 0.00011089757354520076,
2622
+ "gpu_memory_gb": 0.8661794662475586,
2623
+ "system_memory_percent": 4.5
2624
+ },
2625
+ {
2626
+ "step": 15400,
2627
+ "epoch": 2.42,
2628
+ "loss": 0.0,
2629
+ "learning_rate": 0.00011089757354520076,
2630
+ "gpu_memory_gb": 0.8661794662475586,
2631
+ "system_memory_percent": 4.6
2632
+ },
2633
+ {
2634
+ "step": 15450,
2635
+ "epoch": 2.42,
2636
+ "loss": 0.0,
2637
+ "learning_rate": 0.00011089757354520076,
2638
+ "gpu_memory_gb": 0.8661794662475586,
2639
+ "system_memory_percent": 4.5
2640
+ },
2641
+ {
2642
+ "step": 15500,
2643
+ "epoch": 2.43,
2644
+ "loss": 0.0,
2645
+ "learning_rate": 0.00011089757354520076,
2646
+ "gpu_memory_gb": 0.8661794662475586,
2647
+ "system_memory_percent": 4.5
2648
+ },
2649
+ {
2650
+ "step": 15550,
2651
+ "epoch": 2.44,
2652
+ "loss": 0.0,
2653
+ "learning_rate": 0.00011089757354520076,
2654
+ "gpu_memory_gb": 0.8661794662475586,
2655
+ "system_memory_percent": 4.5
2656
+ },
2657
+ {
2658
+ "step": 15600,
2659
+ "epoch": 2.45,
2660
+ "loss": 0.0,
2661
+ "learning_rate": 0.00011089757354520076,
2662
+ "gpu_memory_gb": 0.8661794662475586,
2663
+ "system_memory_percent": 4.5
2664
+ },
2665
+ {
2666
+ "step": 15650,
2667
+ "epoch": 2.45,
2668
+ "loss": 0.0,
2669
+ "learning_rate": 0.00011089757354520076,
2670
+ "gpu_memory_gb": 0.8661794662475586,
2671
+ "system_memory_percent": 4.6
2672
+ },
2673
+ {
2674
+ "step": 15700,
2675
+ "epoch": 2.46,
2676
+ "loss": 0.0,
2677
+ "learning_rate": 0.00011089757354520076,
2678
+ "gpu_memory_gb": 0.8661794662475586,
2679
+ "system_memory_percent": 4.5
2680
+ },
2681
+ {
2682
+ "step": 15750,
2683
+ "epoch": 2.47,
2684
+ "loss": 0.0,
2685
+ "learning_rate": 0.00011089757354520076,
2686
+ "gpu_memory_gb": 0.8661794662475586,
2687
+ "system_memory_percent": 4.5
2688
+ },
2689
+ {
2690
+ "step": 15800,
2691
+ "epoch": 2.48,
2692
+ "loss": 0.0,
2693
+ "learning_rate": 0.00011089757354520076,
2694
+ "gpu_memory_gb": 0.8661794662475586,
2695
+ "system_memory_percent": 4.5
2696
+ },
2697
+ {
2698
+ "step": 15850,
2699
+ "epoch": 2.49,
2700
+ "loss": 0.0,
2701
+ "learning_rate": 0.00011089757354520076,
2702
+ "gpu_memory_gb": 0.8661794662475586,
2703
+ "system_memory_percent": 4.5
2704
+ },
2705
+ {
2706
+ "step": 15900,
2707
+ "epoch": 2.49,
2708
+ "loss": 0.0,
2709
+ "learning_rate": 0.00011089757354520076,
2710
+ "gpu_memory_gb": 0.8661794662475586,
2711
+ "system_memory_percent": 4.5
2712
+ },
2713
+ {
2714
+ "step": 15950,
2715
+ "epoch": 2.5,
2716
+ "loss": 0.0,
2717
+ "learning_rate": 0.00011089757354520076,
2718
+ "gpu_memory_gb": 0.8661794662475586,
2719
+ "system_memory_percent": 4.5
2720
+ },
2721
+ {
2722
+ "step": 16000,
2723
+ "epoch": 2.51,
2724
+ "loss": 0.0,
2725
+ "learning_rate": 0.00011089757354520076,
2726
+ "gpu_memory_gb": 0.8661794662475586,
2727
+ "system_memory_percent": 4.5
2728
+ },
2729
+ {
2730
+ "step": 16000,
2731
+ "epoch": 2.51,
2732
+ "eval_loss": NaN,
2733
+ "eval_runtime": 93.441,
2734
+ "eval_samples_per_second": 121.274,
2735
+ "eval_steps_per_second": 7.588,
2736
+ "gpu_memory_gb": 0.8661794662475586,
2737
+ "system_memory_percent": 4.5
2738
+ },
2739
+ {
2740
+ "step": 16050,
2741
+ "epoch": 2.52,
2742
+ "loss": 0.0,
2743
+ "learning_rate": 0.00011089757354520076,
2744
+ "gpu_memory_gb": 0.8661794662475586,
2745
+ "system_memory_percent": 4.6
2746
+ },
2747
+ {
2748
+ "step": 16100,
2749
+ "epoch": 2.52,
2750
+ "loss": 0.0,
2751
+ "learning_rate": 0.00011089757354520076,
2752
+ "gpu_memory_gb": 0.8661794662475586,
2753
+ "system_memory_percent": 4.5
2754
+ },
2755
+ {
2756
+ "step": 16150,
2757
+ "epoch": 2.53,
2758
+ "loss": 0.0,
2759
+ "learning_rate": 0.00011089757354520076,
2760
+ "gpu_memory_gb": 0.8661794662475586,
2761
+ "system_memory_percent": 4.5
2762
+ },
2763
+ {
2764
+ "step": 16200,
2765
+ "epoch": 2.54,
2766
+ "loss": 0.0,
2767
+ "learning_rate": 0.00011089757354520076,
2768
+ "gpu_memory_gb": 0.8661794662475586,
2769
+ "system_memory_percent": 4.5
2770
+ },
2771
+ {
2772
+ "step": 16250,
2773
+ "epoch": 2.55,
2774
+ "loss": 0.0,
2775
+ "learning_rate": 0.00011089757354520076,
2776
+ "gpu_memory_gb": 0.8661794662475586,
2777
+ "system_memory_percent": 4.5
2778
+ },
2779
+ {
2780
+ "step": 16300,
2781
+ "epoch": 2.56,
2782
+ "loss": 0.0,
2783
+ "learning_rate": 0.00011089757354520076,
2784
+ "gpu_memory_gb": 0.8661794662475586,
2785
+ "system_memory_percent": 4.5
2786
+ },
2787
+ {
2788
+ "step": 16350,
2789
+ "epoch": 2.56,
2790
+ "loss": 0.0,
2791
+ "learning_rate": 0.00011089757354520076,
2792
+ "gpu_memory_gb": 0.8661794662475586,
2793
+ "system_memory_percent": 4.5
2794
+ },
2795
+ {
2796
+ "step": 16400,
2797
+ "epoch": 2.57,
2798
+ "loss": 0.0,
2799
+ "learning_rate": 0.00011089757354520076,
2800
+ "gpu_memory_gb": 0.8661794662475586,
2801
+ "system_memory_percent": 4.5
2802
+ },
2803
+ {
2804
+ "step": 16450,
2805
+ "epoch": 2.58,
2806
+ "loss": 0.0,
2807
+ "learning_rate": 0.00011089757354520076,
2808
+ "gpu_memory_gb": 0.8661794662475586,
2809
+ "system_memory_percent": 4.6
2810
+ },
2811
+ {
2812
+ "step": 16500,
2813
+ "epoch": 2.59,
2814
+ "loss": 0.0,
2815
+ "learning_rate": 0.00011089757354520076,
2816
+ "gpu_memory_gb": 0.8661794662475586,
2817
+ "system_memory_percent": 4.5
2818
+ },
2819
+ {
2820
+ "step": 16550,
2821
+ "epoch": 2.6,
2822
+ "loss": 0.0,
2823
+ "learning_rate": 0.00011089757354520076,
2824
+ "gpu_memory_gb": 0.8661794662475586,
2825
+ "system_memory_percent": 4.5
2826
+ },
2827
+ {
2828
+ "step": 16600,
2829
+ "epoch": 2.6,
2830
+ "loss": 0.0,
2831
+ "learning_rate": 0.00011089757354520076,
2832
+ "gpu_memory_gb": 0.8661794662475586,
2833
+ "system_memory_percent": 4.5
2834
+ },
2835
+ {
2836
+ "step": 16650,
2837
+ "epoch": 2.61,
2838
+ "loss": 0.0,
2839
+ "learning_rate": 0.00011089757354520076,
2840
+ "gpu_memory_gb": 0.8661794662475586,
2841
+ "system_memory_percent": 4.5
2842
+ },
2843
+ {
2844
+ "step": 16700,
2845
+ "epoch": 2.62,
2846
+ "loss": 0.0,
2847
+ "learning_rate": 0.00011089757354520076,
2848
+ "gpu_memory_gb": 0.8661794662475586,
2849
+ "system_memory_percent": 4.6
2850
+ },
2851
+ {
2852
+ "step": 16750,
2853
+ "epoch": 2.63,
2854
+ "loss": 0.0,
2855
+ "learning_rate": 0.00011089757354520076,
2856
+ "gpu_memory_gb": 0.8661794662475586,
2857
+ "system_memory_percent": 4.5
2858
+ },
2859
+ {
2860
+ "step": 16800,
2861
+ "epoch": 2.63,
2862
+ "loss": 0.0,
2863
+ "learning_rate": 0.00011089757354520076,
2864
+ "gpu_memory_gb": 0.8661794662475586,
2865
+ "system_memory_percent": 4.5
2866
+ },
2867
+ {
2868
+ "step": 16850,
2869
+ "epoch": 2.64,
2870
+ "loss": 0.0,
2871
+ "learning_rate": 0.00011089757354520076,
2872
+ "gpu_memory_gb": 0.8661794662475586,
2873
+ "system_memory_percent": 4.5
2874
+ },
2875
+ {
2876
+ "step": 16900,
2877
+ "epoch": 2.65,
2878
+ "loss": 0.0,
2879
+ "learning_rate": 0.00011089757354520076,
2880
+ "gpu_memory_gb": 0.8661794662475586,
2881
+ "system_memory_percent": 4.5
2882
+ },
2883
+ {
2884
+ "step": 16950,
2885
+ "epoch": 2.66,
2886
+ "loss": 0.0,
2887
+ "learning_rate": 0.00011089757354520076,
2888
+ "gpu_memory_gb": 0.8661794662475586,
2889
+ "system_memory_percent": 4.5
2890
+ },
2891
+ {
2892
+ "step": 17000,
2893
+ "epoch": 2.67,
2894
+ "loss": 0.0,
2895
+ "learning_rate": 0.00011089757354520076,
2896
+ "gpu_memory_gb": 0.8661794662475586,
2897
+ "system_memory_percent": 4.5
2898
+ },
2899
+ {
2900
+ "step": 17000,
2901
+ "epoch": 2.67,
2902
+ "eval_loss": NaN,
2903
+ "eval_runtime": 93.4207,
2904
+ "eval_samples_per_second": 121.301,
2905
+ "eval_steps_per_second": 7.589,
2906
+ "gpu_memory_gb": 0.8661794662475586,
2907
+ "system_memory_percent": 4.5
2908
+ },
2909
+ {
2910
+ "step": 17050,
2911
+ "epoch": 2.67,
2912
+ "loss": 0.0,
2913
+ "learning_rate": 0.00011089757354520076,
2914
+ "gpu_memory_gb": 0.8661794662475586,
2915
+ "system_memory_percent": 4.5
2916
+ },
2917
+ {
2918
+ "step": 17100,
2919
+ "epoch": 2.68,
2920
+ "loss": 0.0,
2921
+ "learning_rate": 0.00011089757354520076,
2922
+ "gpu_memory_gb": 0.8661794662475586,
2923
+ "system_memory_percent": 4.5
2924
+ },
2925
+ {
2926
+ "step": 17150,
2927
+ "epoch": 2.69,
2928
+ "loss": 0.0,
2929
+ "learning_rate": 0.00011089757354520076,
2930
+ "gpu_memory_gb": 0.8661794662475586,
2931
+ "system_memory_percent": 4.5
2932
+ },
2933
+ {
2934
+ "step": 17200,
2935
+ "epoch": 2.7,
2936
+ "loss": 0.0,
2937
+ "learning_rate": 0.00011089757354520076,
2938
+ "gpu_memory_gb": 0.8661794662475586,
2939
+ "system_memory_percent": 4.6
2940
+ },
2941
+ {
2942
+ "step": 17250,
2943
+ "epoch": 2.71,
2944
+ "loss": 0.0,
2945
+ "learning_rate": 0.00011089757354520076,
2946
+ "gpu_memory_gb": 0.8661794662475586,
2947
+ "system_memory_percent": 4.5
2948
+ },
2949
+ {
2950
+ "step": 17300,
2951
+ "epoch": 2.71,
2952
+ "loss": 0.0,
2953
+ "learning_rate": 0.00011089757354520076,
2954
+ "gpu_memory_gb": 0.8661794662475586,
2955
+ "system_memory_percent": 4.5
2956
+ },
2957
+ {
2958
+ "step": 17350,
2959
+ "epoch": 2.72,
2960
+ "loss": 0.0,
2961
+ "learning_rate": 0.00011089757354520076,
2962
+ "gpu_memory_gb": 0.8661794662475586,
2963
+ "system_memory_percent": 4.6
2964
+ },
2965
+ {
2966
+ "step": 17400,
2967
+ "epoch": 2.73,
2968
+ "loss": 0.0,
2969
+ "learning_rate": 0.00011089757354520076,
2970
+ "gpu_memory_gb": 0.8661794662475586,
2971
+ "system_memory_percent": 4.5
2972
+ },
2973
+ {
2974
+ "step": 17450,
2975
+ "epoch": 2.74,
2976
+ "loss": 0.0,
2977
+ "learning_rate": 0.00011089757354520076,
2978
+ "gpu_memory_gb": 0.8661794662475586,
2979
+ "system_memory_percent": 4.5
2980
+ },
2981
+ {
2982
+ "step": 17500,
2983
+ "epoch": 2.74,
2984
+ "loss": 0.0,
2985
+ "learning_rate": 0.00011089757354520076,
2986
+ "gpu_memory_gb": 0.8661794662475586,
2987
+ "system_memory_percent": 4.6
2988
+ },
2989
+ {
2990
+ "step": 17550,
2991
+ "epoch": 2.75,
2992
+ "loss": 0.0,
2993
+ "learning_rate": 0.00011089757354520076,
2994
+ "gpu_memory_gb": 0.8661794662475586,
2995
+ "system_memory_percent": 4.5
2996
+ },
2997
+ {
2998
+ "step": 17600,
2999
+ "epoch": 2.76,
3000
+ "loss": 0.0,
3001
+ "learning_rate": 0.00011089757354520076,
3002
+ "gpu_memory_gb": 0.8661794662475586,
3003
+ "system_memory_percent": 4.5
3004
+ },
3005
+ {
3006
+ "step": 17650,
3007
+ "epoch": 2.77,
3008
+ "loss": 0.0,
3009
+ "learning_rate": 0.00011089757354520076,
3010
+ "gpu_memory_gb": 0.8661794662475586,
3011
+ "system_memory_percent": 4.5
3012
+ },
3013
+ {
3014
+ "step": 17700,
3015
+ "epoch": 2.78,
3016
+ "loss": 0.0,
3017
+ "learning_rate": 0.00011089757354520076,
3018
+ "gpu_memory_gb": 0.8661794662475586,
3019
+ "system_memory_percent": 4.6
3020
+ },
3021
+ {
3022
+ "step": 17750,
3023
+ "epoch": 2.78,
3024
+ "loss": 0.0,
3025
+ "learning_rate": 0.00011089757354520076,
3026
+ "gpu_memory_gb": 0.8661794662475586,
3027
+ "system_memory_percent": 4.5
3028
+ },
3029
+ {
3030
+ "step": 17800,
3031
+ "epoch": 2.79,
3032
+ "loss": 0.0,
3033
+ "learning_rate": 0.00011089757354520076,
3034
+ "gpu_memory_gb": 0.8661794662475586,
3035
+ "system_memory_percent": 4.5
3036
+ },
3037
+ {
3038
+ "step": 17850,
3039
+ "epoch": 2.8,
3040
+ "loss": 0.0,
3041
+ "learning_rate": 0.00011089757354520076,
3042
+ "gpu_memory_gb": 0.8661794662475586,
3043
+ "system_memory_percent": 4.5
3044
+ },
3045
+ {
3046
+ "step": 17900,
3047
+ "epoch": 2.81,
3048
+ "loss": 0.0,
3049
+ "learning_rate": 0.00011089757354520076,
3050
+ "gpu_memory_gb": 0.8661794662475586,
3051
+ "system_memory_percent": 4.5
3052
+ },
3053
+ {
3054
+ "step": 17950,
3055
+ "epoch": 2.82,
3056
+ "loss": 0.0,
3057
+ "learning_rate": 0.00011089757354520076,
3058
+ "gpu_memory_gb": 0.8661794662475586,
3059
+ "system_memory_percent": 4.5
3060
+ },
3061
+ {
3062
+ "step": 18000,
3063
+ "epoch": 2.82,
3064
+ "loss": 0.0,
3065
+ "learning_rate": 0.00011089757354520076,
3066
+ "gpu_memory_gb": 0.8661794662475586,
3067
+ "system_memory_percent": 5.0
3068
+ },
3069
+ {
3070
+ "step": 18000,
3071
+ "epoch": 2.82,
3072
+ "eval_loss": NaN,
3073
+ "eval_runtime": 93.3927,
3074
+ "eval_samples_per_second": 121.337,
3075
+ "eval_steps_per_second": 7.592,
3076
+ "gpu_memory_gb": 0.8661794662475586,
3077
+ "system_memory_percent": 4.5
3078
+ },
3079
+ {
3080
+ "step": 18050,
3081
+ "epoch": 2.83,
3082
+ "loss": 0.0,
3083
+ "learning_rate": 0.00011089757354520076,
3084
+ "gpu_memory_gb": 0.8661794662475586,
3085
+ "system_memory_percent": 4.5
3086
+ },
3087
+ {
3088
+ "step": 18100,
3089
+ "epoch": 2.84,
3090
+ "loss": 0.0,
3091
+ "learning_rate": 0.00011089757354520076,
3092
+ "gpu_memory_gb": 0.8661794662475586,
3093
+ "system_memory_percent": 4.5
3094
+ },
3095
+ {
3096
+ "step": 18150,
3097
+ "epoch": 2.85,
3098
+ "loss": 0.0,
3099
+ "learning_rate": 0.00011089757354520076,
3100
+ "gpu_memory_gb": 0.8661794662475586,
3101
+ "system_memory_percent": 4.5
3102
+ },
3103
+ {
3104
+ "step": 18200,
3105
+ "epoch": 2.85,
3106
+ "loss": 0.0,
3107
+ "learning_rate": 0.00011089757354520076,
3108
+ "gpu_memory_gb": 0.8661794662475586,
3109
+ "system_memory_percent": 4.5
3110
+ },
3111
+ {
3112
+ "step": 18250,
3113
+ "epoch": 2.86,
3114
+ "loss": 0.0,
3115
+ "learning_rate": 0.00011089757354520076,
3116
+ "gpu_memory_gb": 0.8661794662475586,
3117
+ "system_memory_percent": 4.5
3118
+ },
3119
+ {
3120
+ "step": 18300,
3121
+ "epoch": 2.87,
3122
+ "loss": 0.0,
3123
+ "learning_rate": 0.00011089757354520076,
3124
+ "gpu_memory_gb": 0.8661794662475586,
3125
+ "system_memory_percent": 4.5
3126
+ },
3127
+ {
3128
+ "step": 18350,
3129
+ "epoch": 2.88,
3130
+ "loss": 0.0,
3131
+ "learning_rate": 0.00011089757354520076,
3132
+ "gpu_memory_gb": 0.8661794662475586,
3133
+ "system_memory_percent": 4.5
3134
+ },
3135
+ {
3136
+ "step": 18400,
3137
+ "epoch": 2.89,
3138
+ "loss": 0.0,
3139
+ "learning_rate": 0.00011089757354520076,
3140
+ "gpu_memory_gb": 0.8661794662475586,
3141
+ "system_memory_percent": 4.5
3142
+ },
3143
+ {
3144
+ "step": 18450,
3145
+ "epoch": 2.89,
3146
+ "loss": 0.0,
3147
+ "learning_rate": 0.00011089757354520076,
3148
+ "gpu_memory_gb": 0.8661794662475586,
3149
+ "system_memory_percent": 4.5
3150
+ },
3151
+ {
3152
+ "step": 18500,
3153
+ "epoch": 2.9,
3154
+ "loss": 0.0,
3155
+ "learning_rate": 0.00011089757354520076,
3156
+ "gpu_memory_gb": 0.8661794662475586,
3157
+ "system_memory_percent": 4.5
3158
+ },
3159
+ {
3160
+ "step": 18550,
3161
+ "epoch": 2.91,
3162
+ "loss": 0.0,
3163
+ "learning_rate": 0.00011089757354520076,
3164
+ "gpu_memory_gb": 0.8661794662475586,
3165
+ "system_memory_percent": 4.5
3166
+ },
3167
+ {
3168
+ "step": 18600,
3169
+ "epoch": 2.92,
3170
+ "loss": 0.0,
3171
+ "learning_rate": 0.00011089757354520076,
3172
+ "gpu_memory_gb": 0.8661794662475586,
3173
+ "system_memory_percent": 4.5
3174
+ },
3175
+ {
3176
+ "step": 18650,
3177
+ "epoch": 2.92,
3178
+ "loss": 0.0,
3179
+ "learning_rate": 0.00011089757354520076,
3180
+ "gpu_memory_gb": 0.8661794662475586,
3181
+ "system_memory_percent": 4.5
3182
+ },
3183
+ {
3184
+ "step": 18700,
3185
+ "epoch": 2.93,
3186
+ "loss": 0.0,
3187
+ "learning_rate": 0.00011089757354520076,
3188
+ "gpu_memory_gb": 0.8661794662475586,
3189
+ "system_memory_percent": 4.5
3190
+ },
3191
+ {
3192
+ "step": 18750,
3193
+ "epoch": 2.94,
3194
+ "loss": 0.0,
3195
+ "learning_rate": 0.00011089757354520076,
3196
+ "gpu_memory_gb": 0.8661794662475586,
3197
+ "system_memory_percent": 4.5
3198
+ },
3199
+ {
3200
+ "step": 18800,
3201
+ "epoch": 2.95,
3202
+ "loss": 0.0,
3203
+ "learning_rate": 0.00011089757354520076,
3204
+ "gpu_memory_gb": 0.8661794662475586,
3205
+ "system_memory_percent": 4.5
3206
+ },
3207
+ {
3208
+ "step": 18850,
3209
+ "epoch": 2.96,
3210
+ "loss": 0.0,
3211
+ "learning_rate": 0.00011089757354520076,
3212
+ "gpu_memory_gb": 0.8661794662475586,
3213
+ "system_memory_percent": 4.5
3214
+ },
3215
+ {
3216
+ "step": 18900,
3217
+ "epoch": 2.96,
3218
+ "loss": 0.0,
3219
+ "learning_rate": 0.00011089757354520076,
3220
+ "gpu_memory_gb": 0.8661794662475586,
3221
+ "system_memory_percent": 4.5
3222
+ },
3223
+ {
3224
+ "step": 18950,
3225
+ "epoch": 2.97,
3226
+ "loss": 0.0,
3227
+ "learning_rate": 0.00011089757354520076,
3228
+ "gpu_memory_gb": 0.8661794662475586,
3229
+ "system_memory_percent": 5.0
3230
+ },
3231
+ {
3232
+ "step": 19000,
3233
+ "epoch": 2.98,
3234
+ "loss": 0.0,
3235
+ "learning_rate": 0.00011089757354520076,
3236
+ "gpu_memory_gb": 0.8661794662475586,
3237
+ "system_memory_percent": 4.5
3238
+ },
3239
+ {
3240
+ "step": 19000,
3241
+ "epoch": 2.98,
3242
+ "eval_loss": NaN,
3243
+ "eval_runtime": 93.384,
3244
+ "eval_samples_per_second": 121.348,
3245
+ "eval_steps_per_second": 7.592,
3246
+ "gpu_memory_gb": 0.8661794662475586,
3247
+ "system_memory_percent": 4.5
3248
+ },
3249
+ {
3250
+ "step": 19050,
3251
+ "epoch": 2.99,
3252
+ "loss": 0.0,
3253
+ "learning_rate": 0.00011089757354520076,
3254
+ "gpu_memory_gb": 0.8661794662475586,
3255
+ "system_memory_percent": 4.5
3256
+ },
3257
+ {
3258
+ "step": 19100,
3259
+ "epoch": 3.0,
3260
+ "loss": 0.0,
3261
+ "learning_rate": 0.00011089757354520076,
3262
+ "gpu_memory_gb": 0.8661794662475586,
3263
+ "system_memory_percent": 4.5
3264
+ }
3265
+ ],
3266
+ "config": {
3267
+ "mode": "full",
3268
+ "batch_size": 16,
3269
+ "gradient_accumulation": 2,
3270
+ "learning_rate": 0.0003,
3271
+ "num_epochs": 3
3272
+ }
3273
+ }
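
Reading note on the records above: from step 12450 onward every logged `loss` is exactly 0.0, `learning_rate` stays at 0.00011089757354520076, and every `eval_loss` from step 13000 onward is `NaN`. The sketch below is one possible way to locate such a transition programmatically; it is not part of the uploaded file. The helper names `first_collapse_step` and `nan_eval_steps` are hypothetical, and the sample is a shortened excerpt of three records copied from the diff. Python's `json` module accepts the bare `NaN` token by default, while strict JSON parsers may reject it.

```python
import json
import math


def first_collapse_step(records):
    """Return the step of the first training record whose loss is
    non-finite or exactly 0.0, or None if no such record exists.
    (Hypothetical helper, not part of the uploaded file.)"""
    for rec in records:
        loss = rec.get("loss")  # eval records carry no "loss" key
        if loss is not None and (not math.isfinite(loss) or loss == 0.0):
            return rec["step"]
    return None


def nan_eval_steps(records):
    """Return the steps of all eval records whose eval_loss is NaN or infinite."""
    return [
        rec["step"]
        for rec in records
        if "eval_loss" in rec and not math.isfinite(rec["eval_loss"])
    ]


# Shortened excerpt of three records from the diff above (memory fields omitted).
sample = json.loads("""
[
  {"step": 12400, "epoch": 1.94, "loss": 1.8621, "learning_rate": 0.00011089757354520076},
  {"step": 12450, "epoch": 1.95, "loss": 0.0, "learning_rate": 0.00011089757354520076},
  {"step": 13000, "epoch": 2.04, "eval_loss": NaN, "eval_runtime": 93.4731}
]
""")

print(first_collapse_step(sample))  # 12450
print(nan_eval_steps(sample))       # [13000]
```

Treating a loss of exactly 0.0 as suspicious is an assumption made for this sketch. It matches the pattern in this log, where the value appears abruptly after a loss of 1.8621 at step 12400, but it is not a general rule for every training setup.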