NVIDIA Releases LocateAnything, an Object Detection AI Model

LocateAnything extends object detection beyond natural images to UI elements and text, making it a single model for tasks that previously required separate systems.

Reporting from 1 source: GIGAZINE.

NVIDIA has released LocateAnything, an AI model designed for fast and high-quality object detection. The model can identify objects in photographs, application screenshots, and documents. It is intended for use in robotics and automated PC operations. LocateAnything uses parallel box decoding to process images quickly. In benchmarks provided by NVIDIA, LocateAnything outperformed Qwen3-VL and Rex-Omni on tasks such as recognizing individual windows in a building or individual pieces of wood. It also showed higher accuracy in text recognition. A demonstration application is available on Hugging Face, where users can upload an image and specify objects to detect. In one demo, the model correctly identified all video game packages in a photo. It can also detect multiple UI elements simultaneously, such as recognizing "File," "Edit," and "View" in a Notepad screenshot. The model is open and available for download on Hugging Face under the name nvidia/LocateAnything-3B. NVIDIA published the model on May 29, 2026.

LocateAnything is a vision-language grounding model. NVIDIA says its parallel box decoding enables faster inference than sequential approaches. The training data includes photographs, application screenshots, and documents, which lets the model detect UI elements and text in addition to everyday objects. In NVIDIA's published comparisons, it handled fine-grained recognition tasks that rival models struggled with, such as identifying every window on a building facade or each piece of stacked lumber, and it also outperformed Qwen3-VL and Rex-Omni on the text recognition benchmark. In the Hugging Face demo, users upload an image and type a query describing what to detect. The model can be downloaded as nvidia/LocateAnything-3B.
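
For readers who want to fetch the weights, the following is a minimal sketch rather than an official recipe. It assumes the huggingface_hub Python package is installed and that the nvidia/LocateAnything-3B repository can be downloaded without extra access steps; neither point is confirmed by the source. The helper names download_model and model_page_url are hypothetical, chosen only for this illustration, and the sketch does not cover loading or running the model.

```python
# Sketch only: fetch the published weights from Hugging Face.
# Assumption: `pip install huggingface_hub` has been run and the repo is
# publicly downloadable. The helper names below are illustrative.

MODEL_ID = "nvidia/LocateAnything-3B"  # repository name given in the article


def model_page_url(model_id: str = MODEL_ID) -> str:
    """Return the Hugging Face model page URL for a repository id."""
    return f"https://huggingface.co/{model_id}"


def download_model(local_dir=None) -> str:
    """Download every file in the repository and return the local path.

    The import is placed inside the function so that this file can be
    imported even when huggingface_hub is not installed.
    """
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=MODEL_ID, local_dir=local_dir)


# Example (downloads several gigabytes, so it is left commented out):
# path = download_model()
# print("Model files stored in", path)
```

Calling download_model() with no arguments stores the files in the default Hugging Face cache; passing local_dir places them in a directory of the caller's choice.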

Synthesized by Yomimono from the 1 cited source below, including Japanese-language reporting where cited, then editorially reviewed before publishing.

Sources