Attention computation
#4 opened by Serpient
While trying it out, I noticed the model supports three attention implementations: flex attention, SDPA, and eager. It seems eager is supposed to be the default option? But when I generate using the demo, the default is actually SDPA, and when I set `config._attn_implementation` to `"eager"`, the generation output becomes gibberish.
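For reference, here is a minimal sketch of how I'm switching implementations (the model name is a placeholder for the demo checkpoint; in recent `transformers` versions, passing `attn_implementation` to `from_pretrained` is the documented way to select one, rather than mutating the private `config._attn_implementation` attribute):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "..."  # placeholder for the demo checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    attn_implementation="eager",  # also tried "sdpa" (the observed default)
)

inputs = tokenizer("Hello, my name is", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

With `"sdpa"` the output looks fine; with `"eager"` the same prompt produces gibberish.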
Thanks for the feedback. We'll look into it.