FunTHOR
Paper | Project Page
FunTHOR is a synthetic dataset for functional 3D scene understanding, built on top of the AI2-THOR simulator. It provides part-level ground-truth geometry and dense, rule-based functional-relation annotations (e.g. knife slices apple, handle pulls to open door, stove knob turns on/off burner) for 12 indoor scenes, together with posed RGB-D sequences for each scene.
The dataset was introduced as a benchmark in our work on constructing probabilistic, open-vocabulary functional 3D scene graphs from posed RGB-D images. Compared to existing manually annotated datasets, FunTHOR is designed to provide dense annotations covering both part-object relations (a part operating its parent object) and object-object relations (one object acting on another).
Dataset at a glance
- 12 scenes (kitchens, living rooms, bedrooms, bathrooms) selected from AI2-THOR.
- 621 ground-truth nodes total (objects + functional parts), 92 of which are functional parts.
- 164 functional-relation edges total, annotated by transparent, inspectable rules.
- 60 posed RGB-D frames per scene (1200×680), randomly sampled from reachable viewpoints.
- Object- and part-centric point clouds and an object -> part hierarchy per scene.
- A visible subset per scene that retains only nodes/edges observable from the sampled RGB-D frames.
| Scene | nodes | parts | visible nodes | edges | frames |
|---|---|---|---|---|---|
| FloorPlan1_physics | 117 | 32 | 113 | 45 | 60 |
| FloorPlan5_physics | 111 | 29 | 107 | 45 | 60 |
| FloorPlan12_physics | 76 | 6 | 73 | 12 | 60 |
| FloorPlan202_physics | 26 | 1 | 25 | 4 | 60 |
| FloorPlan205_physics | 39 | 1 | 39 | 5 | 60 |
| FloorPlan206_physics | 34 | 1 | 34 | 3 | 60 |
| FloorPlan311_physics | 41 | 3 | 41 | 8 | 60 |
| FloorPlan313_physics | 31 | 1 | 31 | 3 | 60 |
| FloorPlan321_physics | 28 | 1 | 28 | 3 | 60 |
| FloorPlan401_physics | 34 | 1 | 34 | 8 | 60 |
| FloorPlan405_physics | 40 | 6 | 37 | 12 | 60 |
| FloorPlan422_physics | 44 | 10 | 43 | 16 | 60 |
| Total | 621 | 92 | 605 | 164 | 720 |
Dataset structure
.
├── dataset_unique_labels.json        # all distinct object/part labels across the dataset
├── dataset_unique_relations.json     # all distinct functional-relation strings across the dataset
├── dataset_functional_labels.json    # labels that are categorized as functional elements for evaluation
├── annotation_rules/                 # the rules used to auto-generate the functional edges (see below)
│   ├── functional_relations_config.json
│   └── manual_annotations/
│       └── FloorPlan*_physics.json
└── FloorPlan<ID>_physics/            # one folder per scene
    ├── node_list.pkl                 # list of all ground-truth nodes (objects + parts)
    ├── object_metadata.json          # per-object metadata + object -> parts hierarchy
    ├── annotations/                  # one JSON per node: maps node -> point indices in pointcloud.ply
    │   └── node_XXXX_<Label>.json
    ├── annotations_aggregated.json   # all per-node annotations aggregated into one file
    ├── functional_relations.json     # ground-truth functional edges for this scene
    ├── pointcloud.ply                # dense scene point cloud (annotation indices reference this)
    ├── visible/                      # the visibility-filtered subset (see below)
    │   ├── node_list.pkl             # only nodes observable from the sampled frames
    │   └── visibility_stats.json     # per-node visible-point counts and visibility ratios
    └── dataset/                      # posed RGB-D capture for this scene
        ├── intrinsics.npy            # 3×3 pinhole camera intrinsics (shared by all frames)
        ├── color/000000.png ... 000059.png   # RGB images, 1200×680, uint8
        ├── depth/000000.png ... 000059.png   # depth images, 1200×680, uint16 (millimeters)
        └── pose/000000.npy ... 000059.npy    # 4×4 camera-to-world matrices
Nodes (node_list.pkl)
Each scene's node_list.pkl is a pickled Python list of node dictionaries. Each node is either a whole
object or a functional part. Fields:
| key | type | description |
|---|---|---|
| node_id | int | unique node index within the scene |
| object_id | int | id of the owning object (a part shares its parent's object_id) |
| label | str | AI2-THOR label in UpperCamelCase, e.g. StoveKnob, LightSwitch |
| is_part | bool | True for functional parts (handles, knobs, buttons, ...), False for objects |
| pcd | (N, 3) float64 | points sampled on the node's mesh surface, in world coordinates (meters) |
| colors | (N, 3) uint8 | per-point RGB |
| center | (3,) float64 | centroid of pcd |
Nodes with label Undefined are placeholders and are typically skipped by loaders.
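As a minimal sketch of how a loader might consume this file, the snippet below reads a node list, drops the Undefined placeholders, and groups parts under their parent object_id. The helper names (load_nodes, parts_by_object, make_node) are illustrative and not part of the release, and the demo pickles a small synthetic list in place of a real node_list.pkl. Since pickle can execute arbitrary code on load, only unpickle files from a source you trust.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np


def load_nodes(pkl_path, skip_undefined=True):
    """Load a node list and optionally drop 'Undefined' placeholder nodes."""
    with open(pkl_path, "rb") as f:
        nodes = pickle.load(f)
    if skip_undefined:
        nodes = [n for n in nodes if n["label"] != "Undefined"]
    return nodes


def parts_by_object(nodes):
    """Group part nodes by the object_id they share with their parent object."""
    grouped = {}
    for n in nodes:
        if n["is_part"]:
            grouped.setdefault(n["object_id"], []).append(n)
    return grouped


def make_node(node_id, object_id, label, is_part):
    """Build a synthetic node with the documented fields (demo data only)."""
    pcd = np.zeros((4, 3), dtype=np.float64)
    return {
        "node_id": node_id,
        "object_id": object_id,
        "label": label,
        "is_part": is_part,
        "pcd": pcd,
        "colors": np.zeros((4, 3), dtype=np.uint8),
        "center": pcd.mean(axis=0),
    }


# Synthetic stand-in for FloorPlan<ID>_physics/node_list.pkl.
demo_nodes = [
    make_node(0, 0, "Toaster", False),
    make_node(1, 0, "Lever", True),
    make_node(2, 1, "Undefined", False),
]
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "node_list.pkl"
    path.write_bytes(pickle.dumps(demo_nodes))
    nodes = load_nodes(path)

labels = [n["label"] for n in nodes]
part_labels = {oid: [p["label"] for p in ps] for oid, ps in parts_by_object(nodes).items()}
print(labels, part_labels)
```

With a real scene, pass the path to that scene's node_list.pkl (or visible/node_list.pkl) to load_nodes instead of the temporary file.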
Object metadata & part hierarchy (object_metadata.json)
A list of object records (one per object_id). Fields are object_id, label, has_parts_annotation,
and parts (the list of part labels belonging to the object). This encodes the object -> part hierarchy
referenced by the node list.
Per-node point annotations (annotations/)
Each annotations/node_XXXX_<Label>.json maps a node to its supporting points as indices into the scene's pointcloud.ply:
{ "node_id": 0, "object_id": 0, "label": "Tomato", "is_part": false, "point_indices": [50000, 50001, ...] }
annotations_aggregated.json contains the same information for all nodes in a single file.
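The index lookup can be sketched as follows, assuming the vertices of pointcloud.ply have already been read into an (M, 3) NumPy array with a PLY reader of your choice. The function name node_points and the tiny synthetic cloud are illustrative only; the annotation record follows the field layout shown above.

```python
import json

import numpy as np


def node_points(scene_points, annotation):
    """Return the rows of the scene point cloud that belong to one node."""
    idx = np.asarray(annotation["point_indices"], dtype=np.int64)
    if idx.size and (idx.min() < 0 or idx.max() >= len(scene_points)):
        raise IndexError("point_indices fall outside the scene point cloud")
    return scene_points[idx]


# Synthetic stand-in for the (M, 3) vertex array of pointcloud.ply.
scene_points = np.arange(30, dtype=np.float64).reshape(10, 3)

# Same field layout as annotations/node_XXXX_<Label>.json, with demo indices.
annotation = json.loads(
    '{"node_id": 0, "object_id": 0, "label": "Tomato", '
    '"is_part": false, "point_indices": [2, 3, 7]}'
)

pts = node_points(scene_points, annotation)
print(annotation["label"], pts.shape)
```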
Functional relations (functional_relations.json)
A list of directed functional edges. Each edge connects a subject node (first_*) to an object node
(second_*):
{
"relation_type": "exact_match",
"first_node_id": 40, "first_object_id": 31, "first_label": "Faucet",
"relation": "fill with water",
"second_node_id": 96, "second_object_id": 68, "second_label": "Kettle"
}
relation_type is one of exact_match, proximity_based, part_based, or manual_annotation
(see Annotation rules below). Node ids reference node_list.pkl. The dataset-level
dataset_unique_relations.json lists all distinct relation strings.
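One possible way to work with the edge list is to index it by subject node and tally edges per relation_type, as sketched below. The first edge is the example above; the second edge and its ids are made up for the demo, and index_edges is an illustrative helper rather than part of the release.

```python
from collections import Counter, defaultdict


def index_edges(edges):
    """Map each subject node id to its outgoing (relation, object node id) pairs."""
    outgoing = defaultdict(list)
    for e in edges:
        outgoing[e["first_node_id"]].append((e["relation"], e["second_node_id"]))
    return dict(outgoing)


edges = [
    {
        "relation_type": "exact_match",
        "first_node_id": 40, "first_object_id": 31, "first_label": "Faucet",
        "relation": "fill with water",
        "second_node_id": 96, "second_object_id": 68, "second_label": "Kettle",
    },
    {
        # Hypothetical ids, for illustration only.
        "relation_type": "part_based",
        "first_node_id": 7, "first_object_id": 5, "first_label": "Lever",
        "relation": "push down to start toasting",
        "second_node_id": 6, "second_object_id": 5, "second_label": "Toaster",
    },
]

outgoing = index_edges(edges)
type_counts = Counter(e["relation_type"] for e in edges)
print(outgoing[40], dict(type_counts))
```

In practice, edges would come from json.load on a scene's functional_relations.json.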
Visible subset (visible/)
Because some objects are never observed from the sampled viewpoints (e.g. items inside closed cabinets),
each scene also ships a visibility-filtered version intended for evaluation. visible/node_list.pkl holds the visible nodes and visible/visibility_stats.json records the
per-node visible-point counts and visibility ratios.
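If you need the functional edges restricted to the visible subset, one reasonable approach is to keep an edge only when both of its endpoints appear in visible/node_list.pkl. This is a sketch under that assumption, not necessarily the exact filtering used for the release; the node ids below are demo values.

```python
def visible_edges(edges, visible_nodes):
    """Keep edges whose subject and object nodes are both in the visible node list."""
    visible_ids = {n["node_id"] for n in visible_nodes}
    return [
        e for e in edges
        if e["first_node_id"] in visible_ids and e["second_node_id"] in visible_ids
    ]


# Demo data: node 77 is assumed to be hidden (e.g. inside a closed cabinet).
all_edges = [
    {"first_node_id": 40, "relation": "fill with water", "second_node_id": 96},
    {"first_node_id": 12, "relation": "can slice or cut", "second_node_id": 77},
]
visible_nodes = [{"node_id": 40}, {"node_id": 96}, {"node_id": 12}]

kept = visible_edges(all_edges, visible_nodes)
print(kept)
```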
Coordinate systems
World / scene frame. All node point clouds, centers, and pointcloud.ply are expressed in a single,
metric, right-handed, Z-up world frame (units: meters). Note this differs from AI2-THOR's native Unity convention
(left-handed, Y-up); the released data has already been converted to the Z-up right-handed frame above.
Camera frame. dataset/pose/NNNNNN.npy is a 4×4 camera-to-world transform T_wc (rotation has
determinant +1). The camera uses the OpenCV convention: +x right, +y down, +z forward (into the
scene).
Intrinsics. Shared 3×3 pinhole matrix for all frames. Depth PNGs are 16-bit and stored in millimeters
(divide by 1000 for meters); a depth of 0 denotes a missing/invalid measurement.
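Taken together, these conventions are enough to lift a depth frame into the world frame. The sketch below assumes the intrinsics have the usual layout [[fx, 0, cx], [0, fy, cy], [0, 0, 1]] with zero skew and that pixel (u, v) refers to column u and row v; neither detail is stated above. The toy intrinsics, pose, and 2×2 depth image are made up so the snippet runs without the dataset.

```python
import numpy as np


def backproject_to_world(depth_mm, K, T_wc):
    """Back-project a uint16 depth image (millimeters) to world-frame points (meters)."""
    depth_m = depth_mm.astype(np.float64) / 1000.0
    h, w = depth_m.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # u: column index, v: row index
    valid = depth_m > 0                             # a depth of 0 marks invalid pixels
    z = depth_m[valid]
    x = (u[valid] - K[0, 2]) * z / K[0, 0]          # OpenCV convention: +x right
    y = (v[valid] - K[1, 2]) * z / K[1, 1]          # +y down, +z forward
    pts_cam = np.stack([x, y, z, np.ones_like(z)], axis=1)  # homogeneous, shape (N, 4)
    return (T_wc @ pts_cam.T).T[:, :3]


# Toy inputs (not dataset values): unit focal length, principal point at the origin,
# identity rotation, camera centered at (1, 2, 3) in the world frame.
K = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
T_wc = np.eye(4)
T_wc[:3, 3] = [1.0, 2.0, 3.0]
depth_mm = np.array([[1000, 0], [2000, 2000]], dtype=np.uint16)

pts_world = backproject_to_world(depth_mm, K, T_wc)
print(pts_world)
```

For real frames, K and T_wc can be read with np.load from intrinsics.npy and pose/NNNNNN.npy, and the depth PNG should be read with an image loader that preserves the 16-bit values.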
Annotation rules (annotation_rules/)
The functional edges are produced automatically and transparently from a small set of inspectable rules,
rather than hand-labeled per scene. We include the exact rule configuration used to generate this release in
annotation_rules/ so that the annotation logic is fully reproducible and auditable.
Each rule is a functional triplet (first_label, relation, second_label). Rules are grouped by matching
strategy (annotation_rules/functional_relations_config.json):
- exact_match_relations: annotate an edge whenever a node's label exactly matches first_label and another node's label exactly matches second_label (e.g. Knife -> can slice or cut -> Apple; Faucet -> fill with water -> Kettle).
- proximity_based_relations: for each subject node, connect it to the nearest node matching second_label, provided the distance between centers is below a threshold (default 1 m, with optional per-rule distance_threshold overrides). Matching is greedy and globally distance-ordered so the result does not depend on rule ordering (e.g. Faucet -> run water into -> Sink; Faucet -> run water into -> Bathtub).
- part_based_relations: for objects with toggleable/openable AI2-THOR properties that expose explicit functional parts, connect the part to its parent object (e.g. Lever -> push down to start toasting -> Toaster; Handle -> pull to open -> Door).
- manual annotations: a few semantically ambiguous associations are recorded by hand in annotation_rules/manual_annotations/<scene>.json (currently only stove-knob -> stove-burner pairings, with the relation turn on/off).
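The first two strategies can be sketched roughly as follows. This is one plausible reading of the description, not the released implementation: rules are written as plain tuples because the JSON schema of functional_relations_config.json is not shown here, and the proximity matcher assumes each subject node receives at most one proximity edge. All function names and node values are illustrative.

```python
import math


def exact_match_edges(nodes, rules):
    """Emit an edge for every (first_label, second_label) pair of distinct nodes."""
    edges = []
    for first_label, relation, second_label in rules:
        for a in nodes:
            if a["label"] != first_label:
                continue
            for b in nodes:
                if b["label"] == second_label and b["node_id"] != a["node_id"]:
                    edges.append((a["node_id"], relation, b["node_id"]))
    return edges


def proximity_edges(nodes, rules, default_threshold=1.0):
    """Greedy matching over all candidate pairs, sorted globally by center distance."""
    candidates = []
    for first_label, relation, second_label, *override in rules:
        threshold = override[0] if override else default_threshold
        for a in nodes:
            if a["label"] != first_label:
                continue
            for b in nodes:
                if b["label"] != second_label or b["node_id"] == a["node_id"]:
                    continue
                d = math.dist(tuple(a["center"]), tuple(b["center"]))
                if d < threshold:
                    candidates.append((d, a["node_id"], b["node_id"], relation))
    candidates.sort()  # global distance order, so rule order does not matter
    edges, matched_subjects = [], set()
    for _, a_id, b_id, relation in candidates:
        if a_id in matched_subjects:  # assumption: one proximity edge per subject
            continue
        matched_subjects.add(a_id)
        edges.append((a_id, relation, b_id))
    return edges


# Demo nodes with made-up centers (meters).
nodes = [
    {"node_id": 0, "label": "Faucet", "center": (0.0, 0.0, 0.0)},
    {"node_id": 1, "label": "Sink", "center": (0.3, 0.0, 0.0)},
    {"node_id": 2, "label": "Bathtub", "center": (0.8, 0.0, 0.0)},
    {"node_id": 3, "label": "Faucet", "center": (5.0, 0.0, 0.0)},
    {"node_id": 4, "label": "Knife", "center": (2.0, 0.0, 1.0)},
    {"node_id": 5, "label": "Apple", "center": (2.5, 0.0, 1.0)},
]
prox_rules = [
    ("Faucet", "run water into", "Bathtub"),
    ("Faucet", "run water into", "Sink"),
]

exact = exact_match_edges(nodes, [("Knife", "can slice or cut", "Apple")])
prox = proximity_edges(nodes, prox_rules)
prox_reversed = proximity_edges(nodes, list(reversed(prox_rules)))
print(exact, prox)
```

In the demo, the faucet at the origin is linked to the sink rather than the bathtub because the sink is closer, while the second faucet gets no edge since nothing lies within 1 m of it. Reversing the rule list leaves the result unchanged.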
The functional-relation rule set was constructed by referencing the AI2-THOR object type documentation, in particular each object type's Actionable Properties (e.g. sliceable, toggleable, openable, fillable), to decide which functional triplets are physically plausible.
Credits and acknowledgements
- Ground-truth meshes and scenes. The object CAD models and AI2-THOR scenes used to generate FunTHOR's object- and part-centric ground-truth annotations come from the hssd/ai2thor-hab dataset (AI2-THOR-Habitat). We decomposed and re-annotated relevant assets into semantically meaningful parts to build the part-aware geometry. We gratefully acknowledge the HSSD / AI2-THOR-Habitat authors.
- AI2-THOR. Scenes and the simulation infrastructure are based on AI2-THOR (Kolve et al., 2017).
Citation
If you use FunTHOR, please cite our paper:
@inproceedings{Fu_2026_funfact,
title = {FunFact: Building Probabilistic Functional 3D Scene Graphs via Factor-Graph Reasoning},
author = {Fu, Zhengyu and Zurbr\"ugg, Ren\'e and Qu, Kaixian and Pollefeys, Marc and Hutter, Marco and
Blum, Hermann and Bauer, Zuria},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2026}
}