Bytedance · Paid

UI-TARS 7B

UI-TARS-1.5 is a multimodal vision-language agent optimized for GUI-based environments, including desktop interfaces, web browsers, mobile systems, and games. Developed by ByteDance, it builds upon the UI-TARS framework with reinforcement...

bytedance/ui-tars-1.5-7b

Capabilities

  • 👁️ Vision
  • 🧩 Structured

Specifications

Context window: 128K tokens
Input price: $0.10/M
Output price: $0.20/M
Provider: Bytedance
Input modalities: image, text
Output modalities: text
Pricing: Pay-per-token
Model ID: bytedance/ui-tars-1.5-7b
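
As a rough illustration of the pay-per-token pricing listed above, the following sketch estimates the cost of a single request. The function name `estimate_cost` and the constants are hypothetical and introduced only for this example. It assumes that "/M" means per one million tokens and that "128K" means 128,000 tokens. How image inputs are converted to tokens is not stated here, so the caller is assumed to already know the token counts.

```python
# Prices copied from the specifications above (USD per million tokens).
INPUT_PRICE_PER_M = 0.10
OUTPUT_PRICE_PER_M = 0.20
# Assumption: "128K tokens" is read as 128,000 tokens.
CONTEXT_WINDOW = 128_000


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for one request (hypothetical helper)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


def fits_context(input_tokens: int, output_tokens: int) -> bool:
    """Check whether a request stays within the assumed context window."""
    return input_tokens + output_tokens <= CONTEXT_WINDOW


if __name__ == "__main__":
    # Example: a 50,000-token prompt with a 2,000-token reply.
    cost = estimate_cost(50_000, 2_000)
    print(f"Estimated cost: ${cost:.4f}")
    print("Fits context window:", fits_context(50_000, 2_000))
```

Under these assumptions, a request with 50,000 input tokens and 2,000 output tokens costs roughly half a cent, so output tokens contribute little unless replies are long.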

Strengths

  • Understands images (vision input)
  • Low cost per token

Considerations

  • Limited or no tool-calling support

Alternatives