Task Detection with ViT
Uses a self-supervised Vision Transformer (DINO ViT) to detect robot manipulation tasks independently of the environment they are performed in. Tested in the ManiSkill2 simulator.
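The description does not say how ViT features become a task label. The sketch below shows one possible approach and is not necessarily the method used here. It assumes each camera frame has already been encoded into a fixed-length embedding, for example a DINO ViT CLS token. Under that assumption, a nearest-prototype classifier averages the embeddings of each known task and assigns a new frame to the task whose prototype has the highest cosine similarity. The function names, the task labels `pick_cube` and `open_drawer`, and the toy 3-dimensional vectors are all hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_prototypes(labeled: Dict[str, List[Sequence[float]]]) -> Dict[str, List[float]]:
    """Hypothetical helper: average each task's embeddings into one prototype vector."""
    prototypes: Dict[str, List[float]] = {}
    for task, embeddings in labeled.items():
        dim = len(embeddings[0])
        prototypes[task] = [
            sum(e[i] for e in embeddings) / len(embeddings) for i in range(dim)
        ]
    return prototypes


def detect_task(embedding: Sequence[float], prototypes: Dict[str, List[float]]) -> str:
    """Hypothetical helper: return the task whose prototype is most similar to the embedding."""
    return max(prototypes, key=lambda task: cosine_similarity(embedding, prototypes[task]))


# Toy embeddings standing in for ViT features of frames from two hypothetical tasks.
labeled = {
    "pick_cube": [[1.0, 0.1, 0.0], [0.9, 0.0, 0.1]],
    "open_drawer": [[0.0, 1.0, 0.2], [0.1, 0.9, 0.0]],
}
prototypes = build_prototypes(labeled)
print(detect_task([0.95, 0.05, 0.05], prototypes))
```

Cosine similarity compares only the direction of the embeddings, not their magnitude. The hope is that frames of the same task in different environments point in similar directions, but this sketch makes no claim that DINO features behave that way in ManiSkill2.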