| Day | Time | Number | Paper ID | Session Topic | Title | Authors |
| --- | --- | --- | --- | --- | --- | --- |
| Monday | 11:45-12:00 | 1 | 871 | Architectures and Techniques | On the Lipschitz Constant of Deep Networks and Double Descent | Matteo Gamba (KTH)*; Hossein Azizpour (KTH (Royal Institute of Technology)); Marten Bjorkman (KTH) |
| | 12:00-12:15 | 2 | 265 | Architectures and Techniques | A Multi-step Fusion Network Based on Environmental Knowledge Graph for Camouflaged Object Detection | Zheng Wang (Tianjin University)*; Wenjun Huang (Tianjin University); Ruoxun Su (Tianjin University); Xinyu Yan (Tianjin University); Meijun Sun (Tianjin University) |
| | 12:15-12:30 | 3 | 437 | Architectures and Techniques | Maturity-Aware Active Learning for Semantic Segmentation with Hierarchically-Adaptive Sample Assessment | Amirsaeed Yazdani (Pennsylvania State University)*; Xuelu Li (Amazon); Vishal Monga (The Pennsylvania State University) |
| | 12:30-12:45 | 4 | 912 | Architectures and Techniques | Group Orthogonalization Regularization for Vision Models Adaptation and Robustness | Yoav Kurtz (Tel Aviv University); Noga Bar (Tel Aviv University); Raja Giryes (Tel Aviv University)* |
| | 12:45-13:00 | 5 | 276 | Architectures and Techniques | Attentive Contractive Flow with Lipschitz Constrained Self-Attention | Avideep Mukherjee (Indian Institute of Technology Kanpur)*; Badri N Patro (KU Leuven); Vinay Namboodiri (University of Bath) |
| | 14:00-14:15 | 6 | 497 | Architectures and Techniques | EDeNN: Event Decay Neural Networks for low latency vision | Celyn Walters (University of Surrey); Simon Hadfield (University of Surrey)* |
| | 14:15-14:30 | 7 | 669 | Architectures and Techniques | SRBGCN: Tangent space-Free Lorentz Transformations for Graph Feature Learning | Abdelrahman Mostafa (University of Oulu)*; Wei Peng (Stanford University); Guoying Zhao (University of Oulu) |
| | 14:30-14:45 | 8 | 601 | Architectures and Techniques | Overcoming Degradation Imbalance for Consistent Image Dehazing | Pranjay Shyam (Faurecia IRYStec)*; Hyunjin Yoo (Faurecia IRYStec) |
| | 14:45-15:00 | 9 | 829 | Architectures and Techniques | PseudoCal: Towards Initialisation-Free Deep Learning-Based Camera-LiDAR Self-Calibration | Mathieu Cocheteux (Université de Technologie de Compiègne)*; Franck Davoine (Heudiasyc - CNRS - Université de technologie de Compiègne); Julien Moreau (UTC, Heudiasyc-SyRI) |
| | 15:00-15:15 | 10 | 721 | Architectures and Techniques | Convolution kernel adaptation to calibrated fisheye | Bruno Berenguel-Baeta (Universidad de Zaragoza)*; Maria Santos-Villafranca (Universidad de Zaragoza); Jesus Bermudez-Cameo (Universidad de Zaragoza); Alejandro Perez Yus (Universidad de Zaragoza); Josechu Guerrero (Universidad de Zaragoza) |
| | 15:45-16:00 | 11 | 901 | 3D Analysis | On-Site Adaptation for Monocular Depth Estimation with a Static Camera | Huan Li (Bologna University)*; Matteo Poggi (University of Bologna); Fabio Tosi (University of Bologna); Stefano Mattoccia (University of Bologna) |
| | 16:00-16:15 | 12 | 823 | 3D Analysis | Improved Photometric Stereo through Efficient and Differentiable Shadow Estimation | Po-Hung Yeh (National Taiwan University); Pei-Yuan Wu (National Taiwan University); Jun-Cheng Chen (Academia Sinica)* |
| | 16:15-16:30 | 13 | 405 | 3D Analysis | Breathing New Life into 3D Assets with Generative Repainting | Tianfu Wang (ETH Zurich); Menelaos Kanakis (ETH Zurich); Konrad Schindler (ETH Zurich); Luc Van Gool (ETH Zurich); Anton Obukhov (ETH Zurich)* |
| | 16:30-16:45 | 14 | 14 | 3D Analysis | A-Scan2BIM: Assistive Scan to Building Information Modeling | Weilian Song (Simon Fraser University)*; Jieliang Luo (Autodesk Research); Dale Zhao (Autodesk Research); Yan Fu (Autodesk Research); Chin-Yi Cheng (Google Research); Yasutaka Furukawa (Simon Fraser University) |
| | 17:00-17:15 | 15 | 571 | 3D Analysis | Exploiting Multiple Priors for Neural 3D Indoor Reconstruction | Federico Lincetto (University of Padova)*; Gianluca Agresti (Sony Europe B.V.); Mattia Rossi (SONY Europe B.V.); Pietro Zanuttigh (University of Padova) |
| | 17:15-17:30 | 16 | 339 | 3D Analysis | Structured Knowledge Distillation Towards Efficient Multi-View 3D Object Detection | Linfeng Zhang (Tsinghua University)*; Yukang Shi (Xi’an Jiaotong University); Ke Wang (UNC Chapel Hill); Zhipeng Zhang (DiDi); Hung-Shuo Tai (Didi Autonomous Drive); Yuan He (KargoBot); Kaisheng Ma (Tsinghua University) |
| | 17:30-17:45 | 17 | 305 | 3D Analysis | Learnable Geometry and Connectivity Modelling of BIM Objects | Haritha Jayasinghe (University of Cambridge)*; Ioannis Brilakis (University of Cambridge) |
| | 17:45-18:00 | 18 | 438 | 3D Analysis | Score-PA: Score-based 3D Part Assembly | Junfeng Cheng (Imperial College London); Mingdong Wu (Peking University); Ruiyuan Zhang (Zhejiang University); Guanqi Zhan (University of Oxford); Chao Wu (Zhejiang University); Hao Dong (Peking University)* |
| Tuesday | 10:00-10:15 | 19 | 187 | Efficient and scalable vision | Can Deep Networks be Highly Performant, Efficient and Robust simultaneously? | Madan Ravi Ganesh (BCAI)*; Salimeh Yasaei Sekeh (University of Maine); Jason J Corso (University of Michigan) |
| | 10:15-10:30 | 20 | 290 | Efficient and scalable vision | Highly Efficient SNNs for High-speed Object Detection | Nemin Qiu (Beijing University of Posts and Telecommunications)*; Zhiguo Li (Peking University); Yuan Li (Peking University); Chuang Zhu (Beijing University of Posts and Telecommunications) |
| | 10:30-10:45 | 21 | 832 | Efficient and scalable vision | Feather: An Elegant Solution to Effective DNN Sparsification | Athanasios Glentis Georgoulakis (National Technical University of Athens)*; George Retsinas (National Technical University of Athens); Petros Maragos (National Technical University of Athens) |
| | 10:45-11:00 | 22 | 311 | Efficient and scalable vision | RepQ: Generalizing Quantization-Aware Training for Re-Parametrized Architectures | Anastasiia Prutianova (Huawei)*; Alexey Zaytsev (Skoltech); Chung-Kuei Lee (Huawei); Fengyu Sun (Huawei); Ivan Koryakovskiy (Huawei Technologies Co., Ltd.) |
| | 11:45-12:00 | 23 | 53 | Explainable AI & Representation Learning | Unsupervised Hashing with Similarity Distribution Calibration | Kam Woh Ng (University of Surrey)*; Xiatian Zhu (University of Surrey); Jiun Tian Hoe (Nanyang Technological University); Chee Seng Chan (University of Malaya); Tianyu Zhang (Geek Plus); Yi-Zhe Song (University of Surrey); Tao Xiang (University of Surrey) |
| | 12:00-12:15 | 24 | 70 | Explainable AI & Representation Learning | Diversifying the High-level Features for better Adversarial Transferability | Zhiyuan Wang (Huazhong University of Science and Technology); Zeliang Zhang (University of Rochester); Siyuan Liang (Chinese Academy of Sciences); Xiaosen Wang (Huazhong University of Science and Technology)* |
| | 12:15-12:30 | 25 | 685 | Explainable AI & Representation Learning | Protecting Publicly Available Data With Machine Learning Shortcuts | Nicolas M Müller (Fraunhofer AISEC)*; Maximilian Burgert (TU Munich); Pascal Debus (Fraunhofer AISEC); Jennifer Williams (University of Southampton); Philip Sperl (Fraunhofer AISEC); Konstantin Böttinger (Fraunhofer AISEC) |
| | 12:30-12:45 | 26 | 771 | Explainable AI & Representation Learning | Vision Transformers are Inherently Saliency Learners | Yasser Abdelaziz DAHOU DJILALI (Dublin City University)*; Kevin McGuinness (DCU); Noel O Connor (Home) |
| | 12:45-13:00 | 27 | 578 | Explainable AI & Representation Learning | H-NeXt: The next step towards roto-translation invariant networks | Tomáš Karella (Institute of Information Theory and Automation, Czech Academy of Sciences)*; Filip Šroubek (Institute of Information Theory and Automation, Czech Academy of Sciences); Jan Blažek (Institute of Information Theory and Automation, Czech Academy of Sciences); Jan Flusser (UTIA, Czech Academy of Sciences); Václav Košík (UTIA, Czech Academy of Sciences) |
| | 15:45-16:00 | 28 | 45 | Vision and language | A Critical Robustness Evaluation for Referring Expression Comprehension Methods | Zhipeng Zhang (Northwestern Polytechnical University); Zhimin Wei (Northwestern Polytechnical University); Peng Wang (Northwestern Polytechnical University)* |
| | 16:00-16:15 | 29 | 377 | Vision and language | Describe Your Facial Expressions by Linking Image Encoders and Large Language Models | Yujian Yuan (Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences); Jiabei Zeng (Institute of Computing Technology, Chinese Academy of Sciences)*; Shiguang Shan (Institute of Computing Technology, Chinese Academy of Sciences) |
| | 16:15-16:30 | 30 | 722 | Vision and language | Spatio-Temporal Graph Diffusion for Text-Driven Human Motion Generation | Chang Liu (University of Trento)*; Mengyi Zhao (Beihang University); Bin Ren (University of Trento); Mengyuan Liu (Peking University, Shenzhen Graduate School); Nicu Sebe (University of Trento) |
| | 16:30-16:45 | 31 | 366 | Vision and language | Divide & Bind Your Attention for Improved Generative Semantic Nursing | Yumeng Li (Bosch Center for Artificial Intelligence)*; Margret Keuper (University of Siegen, Max Planck Institute for Informatics); Dan Zhang (Bosch Center for Artificial Intelligence); Anna Khoreva (Bosch Center for Artificial Intelligence) |
| | 17:00-17:15 | 32 | 748 | Vision and language | Multi-CLIP: Contrastive Vision-Language Pre-training for Question Answering tasks in 3D Scenes | Alexandros Delitzas (ETH Zurich)*; Maria Parelli (ETH Zurich); Nikolas Hars (ETH Zurich); Georgios Vlassis (ETH Zurich); Sotirios-Konstantinos Anagnostidis (ETH Zurich); Gregor Bachmann (ETH Zurich); Thomas Hofmann (ETH Zurich) |
| | 17:15-17:30 | 33 | 670 | Vision and language | DisCLIP: Open-Vocabulary Referring Expression Generation | Lior Bracha (Bar Ilan University)*; Eitan Shaar (Bar Ilan University); Aviv Shamsian (Bar Ilan University); Ethan Fetaya (Bar Ilan University); Gal Chechik (NVIDIA) |
| | 17:30-17:45 | 34 | 581 | Vision and language | Video-adverb retrieval with compositional adverb-action embeddings | Thomas Hummel (University of Tübingen)*; A. Sophia Koepke (University of Tübingen); Otniel-Bogdan Mercea (University of Tübingen); Zeynep Akata (University of Tübingen) |
| | 17:45-18:00 | 35 | 429 | Vision and language | Zero-Shot Video Captioning by Evolving Pseudo-tokens | Yoad Tewel (Tel-Aviv University)*; Yoav Shalev (Tel Aviv University); Roy Nadler (Tel Aviv University); Idan Schwartz (Technion); Lior Wolf (Tel Aviv University, Israel) |
| Wednesday | 10:00-10:15 | 36 | 114 | Action and event understanding | Attributes-Aware Network for Temporal Action Detection | Rui Dai (INRIA)*; Srijan Das (University of North Carolina at Charlotte); Michael S Ryoo (Stony Brook/Google); Francois Bremond (Inria Sophia Antipolis, France) |
| | 10:15-10:30 | 37 | 179 | Action and event understanding | Boost Video Frame Interpolation via Motion Adaptation | Haoning Wu (Shanghai Jiao Tong University); Xiaoyun Zhang (Shanghai Jiao Tong University)*; Weidi Xie (Shanghai Jiao Tong University); Ya Zhang (Cooperative Medianet Innovation Center, Shanghai Jiao Tong University); Yan-Feng Wang (Cooperative medianet innovation center of Shanghai Jiao Tong University) |
| | 10:30-10:45 | 38 | 589 | Action and event understanding | Staged Contact-Aware Global Human Motion Forecasting | Luca Scofano (Sapienza University of Rome); Alessio Sampieri (Sapienza University)*; Elisabeth Schiele (Technische Universität München); Edoardo De Matteis (Sapienza University of Rome); Laura Leal-Taixé (NVIDIA); Fabio Galasso (Sapienza University) |
| | 10:45-11:00 | 39 | 317 | Action and event understanding | Spherical Vision Transformer for 360° Video Saliency Prediction | Mert Cokelek (Koç University)*; Nevrez Imamoglu (AIST); Cagri Ozcinar (Samsung); Erkut Erdem (Hacettepe University); Aykut Erdem (Koc University) |
| | 11:45-12:00 | 40 | 402 | Recognition/Identification/Detection | Revisiting the Encoding of Satellite Image Time Series | Xin Cai (Ulster University)*; Yaxin Bi (Ulster University); Peter Nicholl (Ulster University); Roy Sterritt (Ulster University) |
| | 12:00-12:15 | 41 | 10 | Recognition/Identification/Detection | Improving Out-of-Distribution Detection Performance using Synthetic Outlier Exposure Generated by Visual Foundation Models | Gitaek Kwon (VUNO Inc.); Jaeyoung Kim (VUNO Inc.)*; Hong-Jun Choi (VUNO Inc.); Byung-Moo Yoon (Gachon University); Sungchul Choi (Pukyong National University); Kyu-Hwan Jung (Sungkyunkwan University) |
| | 12:15-12:30 | 42 | 82 | Recognition/Identification/Detection | Object-Centric Multi-Task Learning for Human Instances | Hyeongseok Son (Samsung Advanced Institute of Technology)*; Sangil Jung (Samsung); Solae Lee (Samsung Advanced Institute of Technology); Seongeun Kim (Samsung); Seung-In Park (SAIT); ByungIn Yoo (Samsung Advanced Institute of Technology) |
| | 12:30-12:45 | 43 | 197 | Recognition/Identification/Detection | Domain-Sum Feature Transformation For Multi-Target Domain Adaptation | Takumi Kobayashi (National Institute of Advanced Industrial Science and Technology)*; Lincon Souza (National Institute of Advanced Industrial Science and Technology (AIST)); Kazuhiro Fukui (University of Tsukuba) |
| | 12:45-13:00 | 44 | 192 | Recognition/Identification/Detection | MILA: Memory-Based Instance-Level Adaptation for Cross-Domain Object Detection | Onkar Krishna (Hitachi Ltd.)*; Hiroki Ohashi (Hitachi Ltd); Saptarshi Sinha (University of Bristol) |
| | 15:30-15:45 | 45 | 617 | Medical and biological vision | Learning Anatomically Consistent Embedding for Chest Radiography | Ziyu Zhou (Shanghai Jiao Tong University); Haozhe Luo (Arizona State University, USA); Jiaxuan Pang (Arizona State University); Xiaowei Ding (Shanghai Jiao Tong University); Michael Gotway (Mayo Clinic); Jianming Liang (Arizona State University, USA)* |
| | 15:45-16:00 | 46 | 754 | Medical and biological vision | Single-Landmark vs. Multi-Landmark Deep Learning Approaches to Brain MRI Landmarking: a Case Study with Healthy Controls and Down Syndrome Individuals | Jordi Malé (La Salle - Ramon Llull University)*; Yann Heuzé (CNRS, Univ. Bordeaux, MC, PACEA, UMR5199); Juan Fortea (Hospital of Sant Pau); Neus Martinez Abadias (Universitat de Barcelona); Xavier Sevillano (La Salle - Universitat Ramon Llull) |
| | 16:00-16:15 | 47 | 482 | Medical and biological vision | BiUNet: Towards More Effective UNet with Bi-Level Routing Attention | Kun Dong (University of Chinese Academy of Sciences); Jian Xue (University of Chinese Academy of Sciences); Xing Lan (University of Chinese Academy of Sciences); Ke Lu (University of Chinese Academy of Sciences)* |
| | 16:15-16:30 | 48 | 575 | Medical and biological vision | Dual-Query Multiple Instance Learning for Dynamic Meta-Embedding based Tumor Classification | Simon Holdenried-Krafft (University of Tübingen)*; Peter Somers (University of Tübingen); Ivonne Montes-Mojarro (University Hospital of Tübingen); Diana Silimon (University Hospital of Tübingen); Cristina Tarín (University of Stuttgart); Falko Fend (University Hospital of Tübingen); Hendrik P. A. Lensch (University of Tübingen) |
| | 16:30-16:45 | 49 | 806 | Medical and biological vision | Adaptation of Distinct Semantics for Uncertain Areas in Polyp Segmentation | Quang Vinh Nguyen (Chonnam National University)*; Van Thong Huynh (Chonnam National University); Soo-Hyung Kim (Chonnam National University) |
| | 16:45-17:00 | 50 | 152 | Medical and biological vision | Primitive Geometry Segment Pre-training for 3D Medical Image Segmentation | Ryu Tadokoro (Tohoku University)*; Ryosuke Yamada (University of Tsukuba, National Institute of Advanced Industrial Science and Technology (AIST)); Kodai Nakashima (CyberAgent, Univ. of Tsukuba, AIST); Ryo Nakamura (Fukuoka University, National Institute of Advanced Industrial Science and Technology (AIST)); Hirokatsu Kataoka (National Institute of Advanced Industrial Science and Technology (AIST)) |
| | 17:00-17:15 | 51 | 881 | Medical and biological vision | Rethinking Transfer Learning for Medical Image Classification | Le Peng (University of Minnesota)*; Hengyue Liang (University of Minnesota); Gaoxiang Luo (University of Minnesota); Taihui Li (University of Minnesota); Ju Sun (University of Minnesota) |
| | 17:15-17:30 | 52 | 730 | Medical and biological vision | SA2-Net: Scale-aware Attention Network for Cell Segmentation and Beyond | Mustansar Fiaz (MBZUAI)*; Moein Heidari (Iran University of Science and Technology); Rao Muhammad Anwer (MBZUAI/AALTO); Hisham Cholakkal (MBZUAI) |
| Thursday | 10:00-10:15 | 53 | 193 | Human/Object Pose Estimation | Functional Hand Type Prior for 3D Hand Pose Estimation and Action Recognition from Egocentric View Monocular Videos | WONSEOK ROH (Korea University); Seung Hyun Lee (Korea University); Won Jeong Ryoo (Korea University); Gyeongrok Oh (Korea University); Jakyung Lee (Korea University); Sooyeon Hwang (Korea University Sejong); Hyung-gun Chi (Purdue University); Sangpil Kim (Korea University)* |
| | 10:15-10:30 | 54 | 609 | Human/Object Pose Estimation | Cross-attention Masked Auto-Encoder for Human 3D Motion Infilling and Denoising | David Björkstrand (KTH Royal Institute of Technology / Tracab)*; Josephine Sullivan (KTH Royal Institute of Technology); Lars M C Bretzner (Tracab AB); Gareth Loy (TRACAB); Tiesheng Wang (Tracab) |
| | 10:30-10:45 | 55 | 167 | Human/Object Pose Estimation | Efficient Vision Transformer for Human Pose Estimation via Patch Selection | Kaleab A Kinfu (Johns Hopkins University)*; Rene Vidal (Johns Hopkins University, USA) |
| | 10:45-11:00 | 56 | 543 | Human/Object Pose Estimation | Robust and Efficient Edge-guided Pose Estimation with Resolution-conditioned NeRF | Liesbeth Claessens (ETH Zurich)*; Fabian Manhardt (Google); Ricardo Martin-Brualla (Google); Roland Siegwart (ETH Zürich, Autonomous Systems Lab); Cesar Cadena Lerma (ETH Zurich); Federico Tombari (Google) |
| | 11:45-12:00 | 57 | 329 | Transfer, low-shot learning & Segmentation | Maskomaly: Zero-Shot Mask Anomaly Segmentation | Jan Ackermann (ETH Zurich)*; Christos Sakaridis (ETH Zurich); Fisher Yu (ETH Zurich) |
| | 12:00-12:15 | 58 | 606 | Transfer, low-shot learning & Segmentation | STARS: Zero-shot Sim-to-Real Transfer for Segmentation of Shipwrecks in Sonar Imagery | Advaith V Sethuraman (University of Michigan)*; Katherine A Skinner (University of Michigan) |
| | 12:15-12:30 | 59 | 544 | Transfer, low-shot learning & Segmentation | Re-Degradation and Contrastive Learning for Zero-shot Underwater Image Restoration | Nisha Varghese (IIT Madras)*; Rajagopalan N Ambasamudram (Indian Institute of Technology Madras) |
| | 12:30-12:45 | 60 | 566 | Transfer, low-shot learning & Segmentation | Polarimetric Imaging for Perception | Michael Baltaxe (General Motors)*; Tomer Pe'er (General Motors); Dan Levi (General Motors) |
| | 12:45-13:00 | 61 | 95 | Transfer, low-shot learning & Segmentation | SketchDreamer: Interactive Text-Augmented Creative Sketch Ideation | Zhiyu Qu (University of Surrey)*; Tao Xiang (University of Surrey); Yi-Zhe Song (University of Surrey) |
| | 15:45-16:00 | 62 | 535 | Faces and gestures | Security Analysis on Locality-Sensitive Hashing-based Biometric Template Protection Schemes | Seunghun Paik (Hanyang University); Sunpill Kim (Hanyang University); Jae Hong Seo (Hanyang University)* |
| | 16:00-16:15 | 63 | 506 | Faces and gestures | GestSync: Determining who is speaking without a talking head | Sindhu B Hegde (University of Oxford)*; Andrew Zisserman (University of Oxford) |
| | 16:15-16:30 | 64 | 282 | Faces and gestures | SlackedFace: Learning a Slacked Margin for Low-Resolution Face Recognition | Cheng Yaw Low (Institute for Basic Science)*; Jacky Chen Long Chai (Yonsei University); Jaewoo Park (Yonsei University); KYEONGJIN ANN (KAIST); Meeyoung Cha (KAIST & IBS) |
| | 16:30-16:45 | 65 | 598 | Faces and gestures | Unsupervised Landmark Discovery Using Consistency-Guided Bottleneck | Mamona Awan (MBZU); Muhammad Haris Khan (Mohamed Bin Zayed University of Artificial Intelligence)*; Sanoojan Baliah (Mohamed Bin Zayed University of Artificial Intelligence); Muhammad Ahmad Waseem (Information Technology University); Salman Khan (MBZUAI); Fahad Shahbaz Khan (MBZUAI); Arif Mahmood (Information Technology University) |
| | 16:45-17:00 | 66 | 105 | Faces and gestures | High-Fidelity Eye Animatable Neural Radiance Fields for Human Face | Hengfei Wang (University of Birmingham); Zhongqun Zhang (University of Birmingham); Yihua Cheng (University of Birmingham)*; Hyung Jin Chang (University of Birmingham) |
| | 17:00-17:15 | 67 | 216 | Faces and gestures | READ Avatars: Realistic Emotion-controllable Audio Driven Avatars | Jack Saunders (University of Bath)*; Vinay Namboodiri (University of Bath) |