Attention computation

#4, opened by Serpient

While trying the model out, I noticed it supports three attention implementations: flex attention, sdpa, and eager. It seems eager is supposed to be the default option? But when I generate using the demo, the default is actually sdpa, and when I set config._attn_implementation to "eager", the generation output becomes gibberish.

inclusionAI org

Thanks for the feedback. We'll look into it.
