UI-TARS
Predict click location on a UI screenshot
Generate realistic audio from text
Generate AI-powered text responses from your prompts
Generate a talking face video from an image and audio
Generate animated face images using a driving video
Generate multilingual talking-face videos from your text