The way AI visually understands images has evolved tremendously. Initially, AI could tell us "where" an object was using bounding boxes. Then, segmentation models arrived, precisely outlining an object's shape. More recently, open-vocabulary models emerged, allowing us to segment objects using less common labels like "blue ski boot" or "xylophone" without needing a predefined list of categories. P
