COCO-Stuff: Thing and Stuff Classes in Context


Semantic classes can be either things (objects with a well-defined shape, e.g. car, person) or stuff (amorphous background regions, e.g. grass, sky). While many classification and detection works focus on thing classes, less attention has been given to stuff classes. Nonetheless, stuff classes are important because they help explain key aspects of an image, including (1) the scene type; (2) which thing classes are likely to be present and their location (through contextual reasoning); and (3) physical attributes, material types, and geometric properties of the scene. To understand stuff and things in context, we introduce COCO-Stuff, which augments 120,000 images of the COCO dataset with pixel-wise annotations for 91 stuff classes. We also propose an efficient stuff annotation protocol based on superpixels, which leverages the original thing annotations. We quantify the speed versus quality trade-off of our protocol and explore the relation between annotation time and boundary complexity. Furthermore, we use COCO-Stuff to analyze: (a) the importance of stuff and thing classes in terms of their surface cover and how frequently they are mentioned in image captions; (b) the spatial relations between stuff and things, highlighting the rich contextual relations that make our dataset unique; and (c) the performance of a modern semantic segmentation method on stuff and thing classes, and whether stuff is easier to segment than things.
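The abstract does not spell out how the superpixel protocol combines with the existing thing annotations, or how surface cover is measured. The sketch below illustrates one plausible reading, under stated assumptions: an annotator assigns a stuff class to each superpixel, pixels already covered by a COCO thing annotation keep their thing label, and surface cover is the fraction of pixels assigned to stuff versus thing classes. The function names, the `UNLABELED = 0` convention, and the class ids used in the example are all hypothetical and are not taken from the COCO-Stuff release.

```python
from collections import Counter

# Hypothetical id for pixels that carry no label (an assumption, not the dataset's convention).
UNLABELED = 0


def propagate_superpixel_labels(superpixels, sp_labels, thing_labels):
    """Build a pixel-wise label map from superpixel-level stuff labels.

    superpixels:  2D list giving the superpixel id of each pixel.
    sp_labels:    dict mapping superpixel id -> stuff class id chosen by an annotator.
    thing_labels: 2D list of thing class ids, UNLABELED where no thing is annotated.

    Pixels covered by a thing annotation keep that thing label; every other
    pixel inherits the stuff label of its superpixel (UNLABELED if none was given).
    """
    label_map = []
    for sp_row, thing_row in zip(superpixels, thing_labels):
        row = []
        for sp_id, thing_id in zip(sp_row, thing_row):
            if thing_id != UNLABELED:
                row.append(thing_id)
            else:
                row.append(sp_labels.get(sp_id, UNLABELED))
        label_map.append(row)
    return label_map


def surface_cover(label_map, stuff_ids, thing_ids):
    """Return (stuff_fraction, thing_fraction) of all pixels in label_map."""
    counts = Counter(v for row in label_map for v in row)
    total = sum(counts.values())
    stuff = sum(counts[c] for c in stuff_ids) / total
    thing = sum(counts[c] for c in thing_ids) / total
    return stuff, thing


if __name__ == "__main__":
    # Toy 2x4 image with two superpixels; ids 101/102 stand in for stuff
    # classes and 5 for a thing class (all hypothetical).
    superpixels = [[1, 1, 2, 2], [1, 1, 2, 2]]
    sp_labels = {1: 101, 2: 102}
    thing_labels = [[0, 0, 0, 5], [0, 0, 0, 5]]
    labels = propagate_superpixel_labels(superpixels, sp_labels, thing_labels)
    print(labels)  # [[101, 101, 102, 5], [101, 101, 102, 5]]
    print(surface_cover(labels, {101, 102}, {5}))  # (0.75, 0.25)
```

Letting thing annotations take precedence means annotators only need to label superpixels coarsely near object boundaries, since the existing thing masks already fix those pixels. This is one reason a superpixel protocol of this kind could be fast, though the actual speed and quality trade-off is what the paper measures.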