Flame Graph: memory by call stack, stream_0 (23,437,770,752 bytes total)

Each row is one node of the graph, listed by size. Shares are relative to the total. Some frames, such as module.py:1501:_call_impl, appear more than once because they occur at several places in the call tree.

Frame                                      Bytes    Share
all                               23,437,770,752  100.00%
stream_0                          23,437,770,752  100.00%
inactive                          21,373,367,296   91.19%
train.py:331:<module>             12,971,204,608   55.34%
train.py:253:model_step           10,498,662,400   44.79%
module.py:1501:_call_impl         10,498,662,400   44.79%
model.py:195:forward              10,498,555,904   44.79%
module.py:1501:_call_impl         10,498,555,904   44.79%
model.py:115:forward               9,533,652,992   40.68%
module.py:1501:_call_impl          9,533,652,992   40.68%
<non-python>                       7,156,639,328   30.53%
model.py:80:forward                6,039,797,760   25.77%
functional.py:1843:softmax         6,039,797,760   25.77%
model.py:82:forward                3,057,647,616   13.05%
train.py:254:model_step            2,472,542,208   10.55%
functional.py:3029:cross_entropy   2,472,542,208   10.55%
active_allocated                   2,064,403,456    8.81%
<gaps>                             1,094,643,248    4.67%
train.py:345:<module>                994,990,380    4.25%
grad_scaler.py:358:step              994,990,380    4.25%
optimizer.py:280:wrapper             994,990,380    4.25%
optimizer.py:33:_use_grad            994,990,380    4.25%
adamw.py:160:step                    994,990,380    4.25%
model.py:117:forward                 962,592,768    4.11%
model.py:98:forward                  773,849,088    3.30%
module.py:1501:_call_impl            773,849,088    3.30%
module.py:1501:_call_impl            773,849,088    3.30%
linear.py:114:forward                773,849,088    3.30%
train.py:209:<module>                547,826,688    2.34%
module.py:1145:to                    547,826,688    2.34%
module.py:797:_apply                 547,826,688    2.34%
module.py:797:_apply                 547,826,688    2.34%
<non-python>                         506,014,720    2.16%
adamw.py:114:_init_group             497,495,040    2.12%
adamw.py:118:_init_group             497,495,040    2.12%
module.py:797:_apply                 390,144,000    1.66%
module.py:797:_apply                 390,144,000    1.66%
module.py:797:_apply                 339,738,624    1.45%
module.py:820:_apply                 339,738,624    1.45%
module.py:1143:convert               339,738,624    1.45%
module.py:820:_apply                 157,682,688    0.67%
module.py:1143:convert               157,682,688    0.67%
model.py:78:forward                  150,994,944    0.64%
model.py:83:forward                  150,994,944    0.64%
model.py:67:forward                  116,785,152    0.50%
module.py:1501:_call_impl            116,785,152    0.50%
linear.py:114:forward                116,785,152    0.50%
module.py:844:_apply                  50,331,648    0.21%
module.py:1143:convert                50,331,648    0.21%
model.py:79:forward                   11,534,336    0.05%
train.py:145:<module>                  8,519,680    0.04%
train.py:140:dummy_step                8,519,680    0.04%
<gaps>                                 6,855,368    0.03%
model.py:86:forward                    5,898,240    0.03%
module.py:1501:_call_impl              5,898,240    0.03%
linear.py:114:forward                  5,898,240    0.03%
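The shares in the graph are consistent with each frame's byte count divided by the 23,437,770,752-byte stream_0 total. The minimal Python sketch below assumes that reading. It recomputes a few shares from values copied out of the graph and checks that the inactive and active_allocated entries sum exactly to the total. TOTAL_BYTES, FRAMES and share are hypothetical names chosen for illustration, not part of any profiler's API.

```python
# Sketch (assumption): each share = frame bytes / stream_0 total, rounded to 2 decimals.
# Byte counts are copied from the flame graph; the names here are illustrative only.
TOTAL_BYTES = 23_437_770_752  # the "all" / "stream_0" node

FRAMES = {
    "inactive": 21_373_367_296,
    "active_allocated": 2_064_403_456,
    "train.py:331:<module>": 12_971_204_608,
    "<non-python>": 7_156_639_328,  # the larger of the two <non-python> nodes
    "functional.py:1843:softmax": 6_039_797_760,
}


def share(num_bytes: int, total: int = TOTAL_BYTES) -> str:
    """Return num_bytes as a percentage of total, formatted like the graph labels."""
    return f"{100 * num_bytes / total:.2f}%"


for name, num_bytes in FRAMES.items():
    print(f"{name:<28} {num_bytes:>16,} bytes  {share(num_bytes)}")

# inactive and active_allocated together account for the entire stream_0 total.
assert FRAMES["inactive"] + FRAMES["active_allocated"] == TOTAL_BYTES
```

Running it reports 91.19% for inactive and 8.81% for active_allocated, which match the labels in the graph.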