Keynote Speakers
Dr. Mac Schwager
Talk Title: How general are generalist robot policies? Data scaling, diagnostic tools, and memorization in VLAs
Abstract
Vision-Language-Action (VLA) policies have recently emerged as a promising paradigm for generalist robot autonomy. However, VLAs have several challenges that must be overcome before they can achieve their potential. Firstly, these models require fine-tuning with human-teleoperation demonstrations, which can be tedious, expensive, and time consuming to collect. Secondly, policy performance is limited by the quality of the teleop demonstrations, which can vary widely with the human teleoperator's skill and the dexterity barrier of the teleop interface. Lastly, VLA models, with the current state of practice, appear to suffer from strong overfitting to the fine-tuning data. All of these issues lead to "generalist" policies that do not generalize very well. In this talk I will describe recent work in my lab to address each of these problems. I will describe techniques we have developed to scale up demonstration data by leveraging 3D Gaussian Splatting models and optimization-based planning experts to generate arbitrary volumes of high-quality visual demonstrations to augment or replace human teleop data. I will describe our work on multi-task progress models that can track, based on visual inputs and text prompts, the progress of a demonstration. This can be used to filter human teleop data for high-quality training data, and can serve as an online performance monitor during policy execution for fault detection, recovery guidance, and diagnostics. Finally, I will describe our work on memorization vs. generalization in visuo-motor policies, where we find that current fine-tuning practices cause overfitting to the training data, limiting a VLA's generalization capabilities. I will explore some remedies for this problem. The talk will include experimental results for drone navigation policies, drone aerial manipulation policies, and table-top manipulation policies.
Bio
Dr. Schwager is an Associate Professor of Aeronautics and Astronautics at Stanford University, with a courtesy appointment in Computer Science. He directs the Multi-robot Systems Lab (MSL) where he studies robot autonomy. He is interested in learning-based autonomy for UAVs, manipulators, and robotic vehicles, 3D mapping and SLAM, analytical and statistical tools for verifiable safety in learning-based autonomy, and collaborative intelligence in groups of robots and human-robot teams. He obtained his BS degree from Stanford, his MS and PhD degrees from MIT, and he was a postdoctoral researcher at the University of Pennsylvania and MIT. He received the NSF CAREER award in 2014, the DARPA YFA in 2018, and has received numerous best paper awards including the IEEE Transactions on Robotics Best Paper Award (2016), Best Paper Award in Robot Manipulation (ICRA 2018), and Best Paper Award in Multi-Robot Systems (ICRA).
Dr. Ian Stavness
Talk Title: Advances in 3D capture and feedforward modeling for visual perception
Abstract
Breakthroughs in 3D radiance-field rendering and feedforward models that directly infer 3D structure are rapidly reshaping what's possible in 3D visual perception for metrology, robotics, and beyond. In this talk, I will survey the fast-moving frontier of 3D Gaussian splatting, 3D tokenization for transformer architectures, and emerging perceptual pipelines that lift 2D semantic understanding into fully reconstructed 3D worlds with semantic labels. These new paradigms deliver striking gains precisely where traditional photogrammetry struggles most: in highly cluttered environments, scenes rich in fine-scale structure, and objects that are slender, flat, or otherwise difficult to capture with conventional geometry pipelines. I will ground these advances in the demanding real-world challenge of measuring agronomic plants that are densely packed, self-occluded, and often highly self-similar. Accurate 3D plant capture unlocks new opportunities for plant breeding, digital agriculture, and large-scale plant phenotyping. I will conclude with a discussion of the human-factor considerations that shape how people perceive, interact with, and interpret the large-scale 3D information enabled by these new 3D capture and modeling methods.
Bio
Dr. Stavness is a Professor and the Head of the Department of Computer Science at the University of Saskatchewan. He holds a Research Chair at the Global Institute for Food Security and is the Director of the CREATE in Computational Agriculture training program. He obtained his PhD from the University of British Columbia and was a Postdoctoral Fellow in Bio-Engineering at Stanford University prior to joining the University of Saskatchewan in 2012. His research focuses on machine learning, computer vision, and computer graphics with applications in biology, agriculture, and medicine.
Symposium Speakers
Dr. Risto Ojala
Talk Title: Perception solutions for enabling automated driving in winter conditions
Abstract
This talk presents methods and findings from research on perception solutions for automated driving in winter conditions, carried out at the Autonomy & Mobility Laboratory, Aalto University. Winter conditions pose several challenges for automated vehicle perception pipelines, which currently limit the applicability of the technology in adverse weather conditions. The talk focuses on two main research directions: denoising snowflakes from LiDAR data and road segmentation in snowy conditions. Airborne snowflakes introduce significant noise into LiDAR scans, which can hinder downstream perception tasks. To address this challenge, the talk presents deep learning approaches for point cloud denoising based on both supervised and self-supervised learning. In addition, snowy conditions drastically alter the visual appearance of the environment and the road, rendering road segmentation methods trained on traditional datasets unreliable. To overcome this, trajectory-based approaches leveraging vision foundation models are presented for learning varied road appearance without requiring manual labeling.
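To make the snow-denoising problem concrete, here is a minimal sketch of a classical geometric baseline, radius outlier removal, that the learning-based approaches in the talk aim to improve on. It is not the speakers' method; the point cloud, radius, and neighbor threshold are illustrative assumptions. Airborne snowflakes tend to appear as sparse, isolated returns, so points with few nearby neighbors are discarded.

```python
import numpy as np

def radius_outlier_filter(points, radius=0.5, min_neighbors=3):
    """Keep a point only if at least `min_neighbors` other points lie
    within `radius` metres of it. Isolated snowflake returns fail this
    test; dense returns from solid structures pass. Brute-force O(n^2)
    for clarity; a real pipeline would use a KD-tree."""
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    neighbor_counts = (dists < radius).sum(axis=1) - 1  # exclude self
    return points[neighbor_counts >= min_neighbors]

# Synthetic scene: a dense "wall" cluster plus isolated snowflake-like points.
rng = np.random.default_rng(0)
wall = rng.normal(loc=[5.0, 0.0, 1.0], scale=0.1, size=(50, 3))
snow = np.array([[1.0, 2.0, 3.0], [-4.0, 1.0, 0.5], [2.0, -3.0, 2.0]])
cloud = np.vstack([wall, snow])

filtered = radius_outlier_filter(cloud)
print(len(cloud), "->", len(filtered))  # isolated points are removed
```

Fixed-radius filters like this struggle with LiDAR's distance-dependent point density, which is part of why learned and self-supervised denoisers are attractive.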
Bio
Risto Ojala, DSc (Tech) is an Assistant Professor at Aalto University, Finland, where he leads the Autonomy & Mobility Laboratory within the Mechatronics research group. His research focuses on intelligent vehicles and mobile robotics, with particular emphasis on perception, sensor fusion, and applied machine learning for autonomous systems. He is also currently a Visiting Scholar at Simon Fraser University, Canada, collaborating with the Multi-Agent Robotic Systems Laboratory on research in semantic understanding for mobile robotics. His work develops perception solutions that enable robust autonomous operation in challenging environments. A central application of his research is automated driving in winter conditions, addressing problems such as road understanding, situational awareness, and perception reliability. He has published extensively in leading robotics and intelligent transportation venues and collaborates closely with both academic and industrial partners.
Dr. Nils Wilde
Talk Title: User Preferences and Trade-offs in Robot Planning
Abstract
Real-world robot deployment requires adaptation to end-user needs. This often involves finding trade-offs between opposing criteria to align with user preferences. We explore two sides of the problem. First, we study how human-in-the-loop learning, i.e., repeated, simple interactions such as choosing between two presented robot trajectories, enables inexperienced users to quickly refine planning algorithms to their needs. Second, we study the problem of designing planning algorithms that attain all relevant trade-offs. When treated directly as multi-objective optimization, such problems are converted into single-objective formulations with tunable parameters, e.g., a cost function balancing trajectory length and risk with adjustable weights. We derive fundamental methods for exploring relevant weights based on error approximations, as well as novel formulations for scalar objectives with provable theoretical advantages. The presented methods are showcased in the context of path and motion planning and multi-robot coordination.
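The conversion the abstract describes can be illustrated with a weighted-sum scalarization: a single weight turns the two-objective problem (length vs. risk) into one tunable cost. This is a generic textbook sketch, not the speaker's specific formulation; the candidate trajectories and their scores are hypothetical.

```python
# Hypothetical candidate trajectories, each scored on two objectives
# to be minimized: path length and risk. Values are illustrative.
trajectories = {
    "shortcut": {"length": 10.0, "risk": 0.8},
    "detour":   {"length": 14.0, "risk": 0.2},
    "corridor": {"length": 12.0, "risk": 0.4},
}

def scalarized_cost(traj, w):
    """Weighted-sum scalarization: one weight w in [0, 1] trades off
    length against risk (risk rescaled so the terms are comparable)."""
    return w * traj["length"] + (1.0 - w) * 10.0 * traj["risk"]

def best_for_weight(w):
    """Return the trajectory minimizing the scalarized cost for weight w."""
    return min(trajectories, key=lambda name: scalarized_cost(trajectories[name], w))

# Sweeping the weight exposes the trade-off: risk-dominated settings
# (small w) prefer the safe detour; length-dominated settings the shortcut.
for w in (0.1, 0.5, 0.9):
    print(f"w = {w}: {best_for_weight(w)}")
```

Each choice of weight selects one point on the trade-off curve, which is why methods for systematically exploring the space of weights, as in the talk, matter in practice.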
Bio
Dr. Nils Wilde is an Assistant Professor in Computer Science at Dalhousie University in Halifax, Canada, where he leads the Laboratory for Interactive Systems and Adaptive Robotics. Previously, he was a Postdoctoral Fellow at TU Delft and at the University of Waterloo, where he also completed his PhD in Electrical and Computer Engineering. Dr. Wilde's research focuses on cognitive robotics, in particular human-robot interaction, preference learning and learning from human feedback, motion planning and multi-objective planning, as well as multi-robot coordination and task assignment. He is a member of the Atlantic AI Institute and an IEEE member. Dr. Wilde's research is currently supported by NSERC and has been published in top-tier robotics journals and conferences, including T-RO, RA-L, IJRR, CoRL, WAFR, ICRA, IROS, and CDC. Further, he co-organized workshops on Human Multi-Robot Interaction at IROS 2023 and on Multi-Objective Optimization and Planning in Robotics at RSS 2025.
Dr. Mahdis Bisheban
Bio
Dr. Mahdis Bisheban is an Assistant Professor in the Department of Mechanical and Manufacturing Engineering at the University of Calgary and the Founder and Director of the Intelligent Dynamics and Control Lab (IDCL). She earned her Ph.D. in Mechanical and Aerospace Engineering from The George Washington University and completed postdoctoral research at Queen’s University. At IDCL, her research focuses on the intersection of advanced robotics for aerospace applications, machine learning, and intelligent control systems, with an emphasis on developing autonomous aerial and ground robots that can think, adapt, and collaborate. Beyond research, Dr. Bisheban is actively engaged in the professional community as the AIAA V/STOL Technical Committee Education Chair, an Associate Editor for the Transactions of the Canadian Society for Mechanical Engineering, and a member of the Canadian Society for Mechanical Engineering Mechatronics, Robotics, and Controls Technical Committee. She is committed to training the next generation of engineers and researchers, mentoring postdoctoral fellows, graduate and undergraduate students, and hosting high school students each summer. IDCL is distinguished by its collaborative, multi-level mentoring culture, where learners at all stages teach, learn, and contribute meaningfully to research.
Dr. Yani Ioannou
Bio
Yani Ioannou is an Assistant Professor and Schulich Research Chair in the Department of Electrical and Software Engineering of the Schulich School of Engineering at the University of Calgary in Alberta, Canada. Yani was previously a Visiting Researcher at Google Brain Toronto (DeepMind) with Dr. Geoffrey Hinton, and a Postdoctoral Fellow at the Vector Institute with Dr. Graham Taylor. Yani completed his PhD at the University of Cambridge in 2018 supported by a Microsoft Research Ph.D. Scholarship, where he was supervised by Dr. Roberto Cipolla and Dr. Antonio Criminisi.
Dr. Xinxin Zuo
Bio
Dr. Xinxin Zuo is an Assistant Professor in the Department of Electrical and Computer Engineering at Concordia University, where she leads the X-Lab. Before joining Concordia, she was a Staff Researcher at Huawei Canada and a Postdoctoral Fellow at the University of Alberta. Her research interests span machine learning and computer vision, with a recent focus on 3D AI-generated content (AIGC), embodied AI, human motion generation, and 3D reconstruction. She currently serves as an Associate Editor for IEEE Transactions on Multimedia. She has published over 50 papers and has received more than 3,000 citations.