
Talking Papers Podcast

🎙️ Welcome to the Talking Papers Podcast: Where Research Meets Conversation 🌟

Are you ready to explore the fascinating world of cutting-edge research in computer vision, machine learning, artificial intelligence, graphics, and beyond? Join us on this podcast by researchers, for researchers, as we venture into the heart of groundbreaking academic papers. At Talking Papers, we've reimagined the way research is shared. In each episode, we engage in insightful discussions with the main authors of academic papers, offering you a unique opportunity to dive deep into the minds behind the innovation.

📚 Structure That Resembles a Paper 📝
Just like a well-structured research paper, each episode takes you on a journey through the academic landscape. We provide a concise TL;DR (abstract) to set the stage, followed by a thorough exploration of related work, approach, results, conclusions, and a peek into future work.

🔍 Peer Review Unveiled: "What Did Reviewer 2 Say?" 📢
But that's not all! We bring you an exclusive bonus section where authors candidly share their experiences in the peer review process. Discover the insights, challenges, and triumphs behind the scenes of academic publishing.

🚀 Join the Conversation 💬
Whether you're a seasoned researcher or an enthusiast eager to explore the frontiers of knowledge, Talking Papers Podcast is your gateway to in-depth, engaging discussions with the experts shaping the future of technology and science.

🎧 Tune In and Stay Informed 🌐
Don't miss out on the latest in research and innovation. Subscribe and stay tuned for our enlightening episodes. Welcome to the future of research dissemination – welcome to Talking Papers Podcast! Enjoy the journey! 🌠

#TalkingPapersPodcast #ResearchDissemination #AcademicInsights

Tracks

3DInAction - Yizhak Ben-Shabat
🎙️ **Unveiling 3DInAction with Yizhak Ben-Shabat | Talking Papers Podcast** 🎙️

📚 *Title:* 3DInAction: Understanding Human Actions in 3D Point Clouds
📅 *Published In:* CVPR 2024
👤 *Guest:* Yizhak (Itzik) Ben-Shabat

Welcome back to another exciting episode of the Talking Papers Podcast, where we bring you the latest breakthroughs in academic research directly from early career academics and PhD students! This week, we have the pleasure of hosting Itzik Ben-Shabat to discuss his groundbreaking paper *3DInAction: Understanding Human Actions in 3D Point Clouds*, published in CVPR 2024 as a highlight.

In this episode, we delve into a novel method for 3D point cloud action recognition. Itzik explains how this innovative pipeline addresses the major limitations of point cloud data, such as lack of structure, permutation invariance, and varying number of points. With patches moving in time (t-patches) and a hierarchical architecture, 3DInAction significantly enhances spatio-temporal representation learning, achieving superior performance on datasets like DFAUST and IKEA ASM.

**Main Contributions:**
1. Introduction of the 3DInAction pipeline for 3D point cloud action recognition.
2. Detailed explanation of t-patches as a key building block.
3. Presentation of a hierarchical architecture for improved spatio-temporal representations.
4. Demonstration of enhanced performance on existing benchmarks.

**Host Insights:** Given my involvement in the project, I can share that when I embarked on this journey, there were only a handful of studies tackling the intricate task of 3D action recognition from point cloud data. Today, this has burgeoned into an active and evolving field of research, showing just how pivotal and timely this work is.

**Anecdotes and Behind the Scenes:** The title "3DInAction" signifies the culmination of three years of passionate research coinciding with my fellowship's theme. This episode is unique as it's hosted by an AI avatar created by Synthesia; Itzik was looking for an exciting way to share this story using the latest technology. While there is no sponsorship, the use of AI avatars adds an innovative twist to our discussion.

Don't miss this intellectually stimulating conversation with Itzik Ben-Shabat. Be sure to leave your thoughts and questions in the comments section below; we'd love to hear from you! And if you haven't already, hit that subscribe button to stay updated with our latest episodes.

🔗 **Links and References:**
- Watch the full episode: [Podcast Link]
- Read the full paper: [Paper Link]

📢 **Engage with Us:**
- What are your thoughts on 3D point cloud action recognition? Drop a comment below!
- Don't forget to like, subscribe, and hit the notification bell for more insightful episodes!

Join us in pushing the boundaries of what's possible in research and technology! Ready to be part of this journey? Click play and let's dive deep into the world of 3D action recognition! 🚀

All links and resources are available in the blogpost: https://www.itzikbs.com/3dinaction

Note that the host of this episode is not a real person. It is an AI-generated avatar and everything she said in the episode was fully scripted.

🎧Subscribe on your favourite podcast app: https://talking.papers.podcast.itzikbs.com
📧Subscribe to our mailing list: http://eepurl.com/hRznqb
🐦Follow us on Twitter: https://twitter.com/talking_papers
🎥YouTube Channel: https://bit.ly/3eQOgwP
30:55 6/3/24
Cameras as Rays - Jason Y. Zhang
Talking Papers Podcast Episode: "Cameras as Rays: Pose Estimation via Ray Diffusion" with Jason Zhang

Welcome to the latest episode of the Talking Papers Podcast! This week's guest is Jason Zhang, a PhD student at the Robotics Institute at Carnegie Mellon University, who joined us to discuss his paper "Cameras as Rays: Pose Estimation via Ray Diffusion", published at the highly respected ICLR 2024 conference.

Jason's research homes in on the pivotal task of estimating camera poses for 3D reconstruction, a challenge made more complex with sparse views. His paper proposes an inventive, out-of-the-box representation that treats a camera pose as a bundle of rays. This innovative perspective makes a substantial impact on the problem, demonstrating promising results even in the context of sparse views. What's particularly exciting is that his work, in both its regression-based and diffusion-based forms, showcases top-notch performance on camera pose estimation on CO3D and effectively generalizes to unseen object categories as well as captures in the wild. Throughout our conversation, Jason explained his insightful approach and how the denoising diffusion model and set-level transformers come into play to yield these impressive results. I found his technique a breath of fresh air in the field of camera pose estimation, notably in the formulation of both regression and diffusion models.

On a more personal note, Jason and I didn't know each other before this podcast, so it was fantastic learning about his journey from the Bay Area to Pittsburgh. His experiences truly enriched our discussion and made this one of our most memorable episodes yet.

We hope you find this podcast as enlightening as we did creating it. If you enjoyed our chat, don't forget to subscribe for more thought-provoking discussions with early career academics and PhD students. Leave a comment below sharing your thoughts on Jason's paper! Until next time, keep following your curiosity and questioning the status quo.

#TalkingPapersPodcast #ICLR2024 #CameraPoseEstimation #3DReconstruction #RayDiffusion #PhDResearchers #AcademicResearch #CarnegieMellonUniversity #BayArea #Pittsburgh

All links and resources are available in the blogpost: https://www.itzikbs.com/cameras-as-rays

🎧Subscribe on your favourite podcast app: https://talking.papers.podcast.itzikbs.com
📧Subscribe to our mailing list: http://eepurl.com/hRznqb
🐦Follow us on Twitter: https://twitter.com/talking_papers
🎥YouTube Channel: https://bit.ly/3eQOgwP
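For listeners who want to get a feel for the ray-bundle idea, here is a minimal NumPy sketch (my own illustration, not the paper's code) of the standard pinhole relation between a camera pose and the bundle of rays through its pixels; the function name `pose_to_rays` and the toy intrinsics are made up for the example, and the paper's actual ray parameterization is discussed in the episode.

```python
import numpy as np

def pose_to_rays(K, R, t, pixels):
    """Map a pinhole camera (intrinsics K, world-to-camera extrinsics [R|t])
    and pixel coordinates (N, 2) to world-space ray origins and unit directions."""
    cam_center = -R.T @ t                          # all rays share the camera center
    uv1 = np.concatenate([pixels, np.ones((len(pixels), 1))], axis=1)
    dirs = (R.T @ np.linalg.inv(K) @ uv1.T).T      # back-project pixels, rotate to world
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    return np.broadcast_to(cam_center, dirs.shape), dirs

# toy usage: four corner pixels of a 100x100 image with focal length 50
K = np.array([[50.0, 0.0, 50.0], [0.0, 50.0, 50.0], [0.0, 0.0, 1.0]])
pixels = np.array([[0.0, 0.0], [99.0, 0.0], [0.0, 99.0], [99.0, 99.0]])
origins, directions = pose_to_rays(K, np.eye(3), np.zeros(3), pixels)
```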
42:47 3/13/24
Instant3D - Jiahao Li
Welcome to another exciting episode of the Talking Papers Podcast! In this episode, I had the pleasure of hosting Jiahao Li, a talented PhD student at the Toyota Technological Institute at Chicago (TTIC), who discussed his groundbreaking research paper "Instant3D: Fast Text-to-3D with Sparse-View Generation and Large Reconstruction Model". This paper, published in ICLR 2024, introduces a novel method that revolutionizes text-to-3D generation.

Instant3D addresses the limitations of existing methods with a two-stage approach. First, a fine-tuned 2D text-to-image diffusion model generates a set of four structured and consistent views from the given text prompt. Then, a transformer-based sparse-view reconstructor directly regresses the NeRF from the generated images. The results are stunning: high-quality and diverse 3D assets are produced within a mere 20 seconds, making it a hundred times faster than previous optimization-based methods.

As a 3D enthusiast myself, I found the outcomes of Instant3D truly captivating, especially considering the short amount of time it takes to generate them. While it's unusual for a 3D person like me to experience these creations through a 2D projection, the astonishing results make it impossible to ignore the potential of this approach. This paper underscores the importance of obtaining more and better 3D data, paving the way for exciting advancements in the field.

Let me share a little anecdote about our guest, Jiahao Li. We were initially introduced through Yicong Hong, another brilliant guest on our podcast. Yicong, who was a PhD student at ANU during my postdoc there, interned at Adobe together with Jiahao while working on this very paper. Coincidentally, Yicong also happens to be a coauthor of Instant3D. It's incredible to see such brilliant minds coming together on groundbreaking research projects.

Unfortunately, the model developed in this paper is not publicly available. However, given the computational resources required to train these advanced models and the obvious copyright issues, it's understandable that Adobe has chosen to keep it proprietary. Not all of us have a hundred GPUs lying around, right?

Remember to hit that subscribe button and join the conversation in the comments section. Let's delve into the exciting world of Instant3D with Jiahao Li on this episode of the Talking Papers Podcast!

#TalkingPapersPodcast #ICLR2024 #Instant3D #TextTo3D #ResearchPapers #PhDStudents #AcademicResearch

All links and resources are available in the blogpost: https://www.itzikbs.com/instant3d

🎧Subscribe on your favourite podcast app: https://talking.papers.podcast.itzikbs.com
📧Subscribe to our mailing list: http://eepurl.com/hRznqb
🐦Follow us on Twitter: https://twitter.com/talking_papers
🎥YouTube Channel: https://bit.ly/3eQOgwP
52:41 2/16/24
Variational Barycentric Coordinates - Ana Dodik
In this exciting episode of #TalkingPapersPodcast, we have the pleasure of hosting Ana Dodik, a second-year PhD student at MIT. We delve into her research paper "Variational Barycentric Coordinates", published at SIGGRAPH Asia 2023. The paper significantly contributes to our understanding of the optimization of generalized barycentric coordinates, introducing a robust variational technique that offers more control than existing models.

Traditional approaches are restrictive because they represent barycentric coordinates with meshes or closed-form formulae. Dodik's research lifts these limits by using a neural field to directly parameterize the continuous function that maps any coordinate in a polytope's interior to its barycentric coordinates. A thorough theoretical characterization of barycentric coordinates is the backbone of this innovation. The paper demonstrates the versatility of the model by deploying a variety of objective functions and also suggests a practical acceleration strategy.

My take on this: the tool can be very useful for artists, and I am eagerly anticipating their feedback on how it performs. Melding classical geometry processing methods with newer, neural-X methods, this research stands as a testament to the significant advances in today's technology landscape.

My talk with Ana was delightfully enriching. In a unique online setting, we discussed how the current times serve as the perfect opportunity to pursue a PhD, largely thanks to improvements in technology.

Remember to hit the subscribe button and leave a comment about your thoughts on Ana's research. We'd love to hear your insights and engage in discussions to further this fascinating discourse in academia.

All links and resources are available in the blogpost: https://www.itzikbs.com/variational-barycentric-coordinates

🎧Subscribe on your favourite podcast app: https://talking.papers.podcast.itzikbs.com
📧Subscribe to our mailing list: http://eepurl.com/hRznqb
🐦Follow us on Twitter: https://twitter.com/talking_papers
🎥YouTube Channel: https://bit.ly/3eQOgwP
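To make the "neural field that outputs barycentric coordinates" idea a bit more tangible, here is a toy PyTorch sketch (my own illustration, not the paper's parameterization): a softmax over per-vertex logits guarantees non-negative weights that sum to one, and a simple loss encourages the reproduction property. The network size, vertex set and loss are all placeholder choices.

```python
import torch
import torch.nn as nn

verts = torch.tensor([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # toy polygon vertices

# map a 2D query point to one logit per vertex
net = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, len(verts)))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for _ in range(500):
    x = torch.rand(256, 2)                        # points inside the unit square
    w = torch.softmax(net(x), dim=-1)             # non-negative, sums to one by construction
    recon = w @ verts                             # weighted combination of the vertices
    loss = ((recon - x) ** 2).sum(dim=-1).mean()  # encourage the reproduction property
    opt.zero_grad()
    loss.backward()
    opt.step()
```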
41:03 12/14/23
Reverse Engineering SSL - Ravid Shwartz-Ziv
Welcome to another exciting episode of the Talking Papers Podcast! In this episode, we delve into the fascinating world of self-supervised learning with our special guest, Ravid Shwartz-Ziv. Together, we explore and dissect their research paper "Reverse Engineering Self-Supervised Learning", published in NeurIPS 2023.

Self-supervised learning (SSL) has emerged as a game-changing technique in the field of machine learning. However, understanding the learned representations and their underlying mechanisms has remained a challenge - until now. Ravid Shwartz-Ziv's paper provides an in-depth empirical analysis of SSL-trained representations, encompassing various models, architectures, and hyperparameters.

The study uncovers a captivating aspect of the SSL training process: its inherent ability to facilitate the clustering of samples based on semantic labels. Surprisingly, this clustering is driven by the regularization term in the SSL objective. Not only does this process enhance downstream classification performance, but it also exhibits a remarkable power of data compression. The paper further establishes that SSL-trained representations align more closely with semantic classes than random classes, even across different hierarchical levels. What's more, this alignment strengthens during training and as we venture deeper into the network.

Join us as we discuss the insights gained from this exceptional research. One remarkable aspect of the paper is its departure from the trend of focusing solely on outperforming competitors. Instead, it dives deep into understanding the semantic clustering effect of SSL techniques, shedding light on the underlying capabilities of the tools we commonly use. It is truly a genre of research that holds immense value.

During our conversation, Ravid Shwartz-Ziv - a CDS Faculty Fellow at the NYU Center for Data Science - shares their perspectives and insights, providing an enriching layer to our exploration. Interestingly, despite both of us being in Israel at the time of recording, we had never met in person, highlighting the interconnectedness and collaborative nature of the academic world.

Don't miss this thought-provoking episode that promises to expand your understanding of self-supervised learning and its impact on representation learning mechanisms. Subscribe to our channel now, join the discussion, and let us know your thoughts in the comments below!

All links and resources are available in the blogpost: https://www.itzikbs.com/revenge_ssl

🎧Subscribe on your favourite podcast app: https://talking.papers.podcast.itzikbs.com
📧Subscribe to our mailing list: http://eepurl.com/hRznqb
🐦Follow us on Twitter: https://twitter.com/talking_papers
🎥YouTube Channel: https://bit.ly/3eQOgwP
68:59 11/22/23
CSG on Neural SDFs - Zoë Marschner
Welcome to another exciting episode of the Talking Papers Podcast! In this installment, I had the pleasure of hosting the brilliant Zoë Marschner as we delved into the fascinating world of Constructive Solid Geometry on Neural Signed Distance Fields. This exceptional research paper, published in SIGGRAPH Asia 2023, explores the cutting-edge potential of neural networks in shaping geometric representations.

In our conversation, Zoë enlightened us on the challenges surrounding the editing of shapes encoded by neural Signed Distance Fields (SDFs). While common geometric operators seem like a promising solution, they often result in incorrect outputs known as Pseudo-SDFs, rendering them unusable for downstream tasks. However, fear not! Zoë and her team have galvanized this field with groundbreaking insights. They characterize the space of Pseudo-SDFs and proffer a novel regularizer called the closest point loss. This ingenious technique encourages the output to be an exact SDF, ensuring accurate shape representation. Their findings have profound implications for operations like CSG (Constructive Solid Geometry) and swept volumes, revolutionizing their applications in fields such as computer-aided design (CAD).

As a former mechanical engineer, I find the concept of combining CSG with neural Signed Distance Fields to be immensely empowering. The potential for creating intricate and precise designs is mind-boggling!

On a personal note, I couldn't be more thrilled about this episode. Not only were two of the co-authors, Derek and Silvia, previous guests on the podcast, but I also had the pleasure of virtually meeting Zoë for the first time. Recording this episode with her was an absolute blast, and I must say, her enthusiasm and expertise shine through, despite being in the early stages of her career. It's worth mentioning that she has even collaborated with some of the most senior figures in the field!

Join us on this captivating journey into the world of Neural Signed Distance Fields. Don't forget to subscribe and leave your thoughts in the comments section below. We would love to hear your take on this groundbreaking research!

All links and resources are available in the blogpost: https://www.itzikbs.com/CSG_on_NSDF

#TalkingPapersPodcast #SIGGRAPHAsia2023 #SDFs #CSG #shapeediting #neuralnetworks #CAD #research

🎧Subscribe on your favourite podcast app: https://talking.papers.podcast.itzikbs.com
📧Subscribe to our mailing list: http://eepurl.com/hRznqb
🐦Follow us on Twitter: https://twitter.com/talking_papers
🎥YouTube Channel: https://bit.ly/3eQOgwP
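As background for the Pseudo-SDF discussion, here is a tiny NumPy sketch (my own illustration, not the paper's code) of the classic CSG combinations of two exact sphere SDFs. The min/max combinations keep the right zero level set, but away from the surface they generally stop being true distance fields, which is exactly the kind of Pseudo-SDF behaviour the closest point loss is designed to correct.

```python
import numpy as np

def sphere_sdf(p, center, radius):
    # exact signed distance to a sphere
    return np.linalg.norm(p - center, axis=-1) - radius

def csg_union(d1, d2):        return np.minimum(d1, d2)
def csg_intersection(d1, d2): return np.maximum(d1, d2)
def csg_difference(d1, d2):   return np.maximum(d1, -d2)

p = np.random.rand(1000, 3) * 2.0 - 1.0   # random query points in [-1, 1]^3
d1 = sphere_sdf(p, np.array([-0.3, 0.0, 0.0]), 0.5)
d2 = sphere_sdf(p, np.array([0.3, 0.0, 0.0]), 0.5)
d_union = csg_union(d1, d2)
# d_union vanishes exactly on the union's surface, but off the surface its values
# are in general not true distances anymore: a "Pseudo-SDF".
```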
59:06 11/9/23
HMD-NeMo - Sadegh Aliakbarian
🎙️ Join us on this exciting episode of the Talking Papers Podcast as we sit down with the talented Sadegh Aliakbarian to explore his groundbreaking ICCV 2023 paper "HMD-NeMo: Online 3D Avatar Motion Generation From Sparse Observations". Our guest takes us on a journey through this pivotal research, which addresses a crucial aspect of immersive mixed reality experiences.

🌟 The quality of these experiences hinges on generating plausible and precise full-body avatar motion, a challenge given the limited input signals provided by Head-Mounted Devices (HMDs), typically head and hands 6-DoF. While recent approaches have made strides in generating full-body motion from such inputs, they assume full hand visibility. This assumption, however, doesn't hold in scenarios without motion controllers, which rely instead on egocentric hand tracking and can therefore suffer from partial hand visibility due to the HMD's field of view.

🧠 HMD-NeMo presents a solution, offering a unified approach to generating realistic full-body motion even when hands are only partially visible. This lightweight neural network operates in real time, incorporating a spatio-temporal encoder with adaptable mask tokens, ensuring plausible motion in the absence of complete hand observations.

👤 Sadegh is currently a senior research scientist at the Microsoft Mixed Reality and AI Lab in Cambridge (UK), where he's at the forefront of Microsoft Mesh and avatar motion generation. He holds a PhD from the Australian National University, where he specialized in generative modeling of human motion. His research journey includes internships at Amazon AI, Five AI, and Qualcomm AI Research, focusing on generative models, representation learning, and adversarial examples.

🤝 We first crossed paths during our time at the Australian Centre for Robotic Vision (ACRV), where Sadegh was pursuing his PhD and I was embarking on my postdoctoral journey. During this time, I had the privilege of collaborating with another co-author of the paper, Fatemeh Saleh, who also happens to be Sadegh's life partner. It's been incredible to witness their continued growth.

🚀 Join us as we uncover the critical advancements brought by HMD-NeMo and their implications for the future of mixed reality experiences. Stay tuned for the episode release!

All links and resources are available in the blogpost: https://www.itzikbs.com/hmdnemo

🎧Subscribe on your favourite podcast app: https://talking.papers.podcast.itzikbs.com
📧Subscribe to our mailing list: http://eepurl.com/hRznqb
🐦Follow us on Twitter: https://twitter.com/talking_papers
🎥YouTube Channel: https://bit.ly/3eQOgwP
35:53 9/29/23
CC3D - Jeong Joon Park
Join us on this exciting episode of the Talking Papers Podcast as we sit down with the brilliant Jeong Joon Park to explore his paper "CC3D: Layout-Conditioned Generation of Compositional 3D Scenes", just published at ICCV 2023.

Discover CC3D, a conditional generative model redefining 3D scene synthesis. Unlike traditional 3D GANs, CC3D crafts complex scenes with multiple objects, guided by 2D semantic layouts. With a novel 3D field representation, CC3D delivers efficiency and superior scene quality. Get ready for a deep dive into the future of 3D scene generation.

My journey with Jeong Joon Park began with his influential SDF paper at CVPR 2019. We met in person at CVPR 2022, thanks to Despoina, who was also a guest on our podcast. Now an Assistant Professor at the University of Michigan CSE, JJ leads research in realistic 3D content generation, offering opportunities for students to contribute to the frontiers of computer vision and AI.

Don't miss this insightful exploration of this ICCV 2023 paper and the future of 3D scene synthesis.

CC3D: Layout-Conditioned Generation of Compositional 3D Scenes

Authors
Sherwin Bahmani, Jeong Joon Park, Despoina Paschalidou, Xingguang Yan, Gordon Wetzstein, Leonidas Guibas, Andrea Tagliasacchi

Abstract
In this work, we introduce CC3D, a conditional generative model that synthesizes complex 3D scenes conditioned on 2D semantic scene layouts, trained using single-view images. Different from most existing 3D GANs that limit their applicability to aligned single objects, we focus on generating complex scenes with multiple objects, by modeling the compositional nature of 3D scenes. By devising a 2D layout-based approach for 3D synthesis and implementing a new 3D field representation with a stronger geometric inductive bias, we have created a 3D GAN that is both efficient and of high quality, while allowing for a more controllable generation process. Our evaluations on synthetic 3D-FRONT and real-world KITTI-360 datasets demonstrate that our model generates scenes of improved visual and geometric quality in comparison to previous works.

All links and resources are available on the blog post: https://www.itzikbs.com/cc3d

Subscribe and stay tuned! 🚀🔍

🎧Subscribe on your favourite podcast app: https://talking.papers.podcast.itzikbs.com
📧Subscribe to our mailing list: http://eepurl.com/hRznqb
🐦Follow us on Twitter: https://twitter.com/talking_papers
🎥YouTube Channel: https://bit.ly/3eQOgwP
56:34 9/28/23
NeRF-Det - Chenfeng Xu
Welcome to another exciting episode of the Talking Papers Podcast! In this installment, I had the pleasure of hosting Chenfeng Xu to discuss his paper "NeRF-Det: Learning Geometry-Aware Volumetric Representation for Multi-View 3D Object Detection", which was published at ICCV 2023.

In recent times, NeRF has gained widespread prominence, while the field of 3D detection faces well-recognized challenges. The principal contribution of this study lies in its ability to address the detection task while simultaneously training a NeRF model and enabling it to generalize to previously unobserved scenes. Although the computer vision community has been actively addressing various tasks related to images and point clouds for an extended period, it is particularly invigorating to witness the application of the NeRF representation in tackling this specific challenge.

Chenfeng is currently a PhD candidate at UC Berkeley, collaborating with Prof. Masayoshi Tomizuka and Prof. Kurt Keutzer. His affiliations include Berkeley DeepDrive (BDD) and Berkeley AI Research (BAIR), along with the MSC lab and PALLAS. His research endeavors revolve around enhancing computational and data efficiency in machine perception, with a primary focus on temporal-3D scenes and their downstream applications. He brings together traditionally separate approaches from geometric computing and deep learning to establish both theoretical frameworks and practical algorithms for temporal-3D representations. His work spans a wide range of applications, including autonomous driving, robotics, and AR/VR, and consistently demonstrates remarkable efficiency through extensive experimentation. I am eagerly looking forward to his upcoming research papers.

PAPER
NeRF-Det: Learning Geometry-Aware Volumetric Representation for Multi-View 3D Object Detection

AUTHORS
Chenfeng Xu, Bichen Wu, Ji Hou, Sam Tsai, Ruilong Li, Jialiang Wang, Wei Zhan, Zijian He, Peter Vajda, Kurt Keutzer, Masayoshi Tomizuka

ABSTRACT
NeRF-Det is a novel method for 3D detection with posed RGB images as input. Our method makes novel use of NeRF in an end-to-end manner to explicitly estimate 3D geometry, thereby improving 3D detection performance. Specifically, to avoid the significant extra latency associated with per-scene optimization of NeRF, we introduce sufficient geometry priors to enhance the generalizability of NeRF-MLP. We subtly connect the detection and NeRF branches through a shared MLP, enabling an efficient adaptation of NeRF to detection and yielding geometry-aware volumetric representations for 3D detection. As a result of our joint-training design, NeRF-Det is able to generalize well to unseen scenes for object detection, view synthesis, and depth estimation tasks without per-scene optimization.

All links and resources are available on the blog post: https://www.itzikbs.com/nerf-det

🎧Subscribe on your favourite podcast app: https://talking.papers.podcast.itzikbs.com
📧Subscribe to our mailing list: http://eepurl.com/hRznqb
🐦Follow us on Twitter: https://twitter.com/talking_papers
🎥YouTube Channel: https://bit.ly/3eQOgwP
29:47 9/6/23
MagicPony - Tomas Jakab
Welcome to another exciting episode of the Talking Papers Podcast! In this installment, I had the pleasure of hosting Tomas Jakab to discuss his paper "MagicPony: Learning Articulated 3D Animals in the Wild", which was published at CVPR 2023.

The motivation behind the MagicPony methodology stems from the challenge posed by the scarcity of labeled data, particularly when dealing with real-world scenarios involving freely moving articulated 3D animals. In response, the authors propose an innovative solution that addresses this issue. This novel approach takes an ordinary RGB image as input and produces a sophisticated 3D model with detailed shape, texture, and lighting characteristics. The method's uniqueness lies in its ability to learn from diverse images captured in natural settings, effectively deciphering the inherent differences between them. This enables the system to establish a foundational average shape while accounting for specific deformations that vary from instance to instance. To achieve this, the researchers blend the strengths of two techniques, radiance fields and meshes, which together contribute to the comprehensive representation of the object's attributes. Additionally, the method employs a strategic viewpoint sampling technique to enhance computational speed. While the current results may not be suitable for practical applications just yet, this endeavor constitutes a substantial advancement in the field, as demonstrated by the tangible improvements showcased both quantitatively and qualitatively.

AUTHORS
Shangzhe Wu*, Ruining Li*, Tomas Jakab*, Christian Rupprecht, Andrea Vedaldi

ABSTRACT
We consider the problem of learning a function that can estimate the 3D shape, articulation, viewpoint, texture, and lighting of an articulated animal like a horse, given a single test image. We present a new method, dubbed MagicPony, that learns this function purely from in-the-wild single-view images of the object category, with minimal assumptions about the topology of deformation. At its core is an implicit-explicit representation of articulated shape and appearance, combining the strengths of neural fields and meshes. In order to help the model understand an object's shape and pose, we distil the knowledge captured by an off-the-shelf self-supervised vision transformer and fuse it into the 3D model. To overcome common local optima in viewpoint estimation, we further introduce a new viewpoint sampling scheme that comes at no added training cost. Compared to prior works, we show significant quantitative and qualitative improvements on this challenging task. The model also demonstrates excellent generalisation in reconstructing abstract drawings and artefacts, despite the fact that it is only trained on real images.

RELATED PAPERS
📚CMR
📚Deep Marching Tetrahedra
📚DINO-ViT

LINKS AND RESOURCES
📚 Paper
💻 Project page
💻 Code

CONTACT
If you would like to be a guest, sponsor or just share your thoughts, feel free to reach out via email: talking.papers.podcast@gmail.com

All links are available in the blog post: https://www.itzikbs.com/magicpony

🎧Subscribe on your favourite podcast app: https://talking.papers.podcast.itzikbs.com
📧Subscribe to our mailing list: http://eepurl.com/hRznqb
🐦Follow us on Twitter: https://twitter.com/talking_papers
🎥YouTube Channel: https://bit.ly/3eQOgwP
50:18 8/9/23
Word-As-Image - Shir Iluz
All links are available in this blog post.

Welcome to another exciting episode of the Talking Papers Podcast! In this installment, I had the pleasure of hosting Shir Iluz to discuss her groundbreaking paper "Word-As-Image for Semantic Typography", which won the SIGGRAPH 2023 Honorable Mention award.

This paper introduces an innovative approach for text morphing based on semantic context. Using Bézier curves with control points, a rasterizer, and a vector diffusion model, the authors transform words like "bunny" into captivating bunny-shaped letters. Their optimization-based method accurately conveys the word's meaning. They address the readability-semantics balance with multiple loss functions, serving as "control knobs" for users to fine-tune results. The paper's compelling results are showcased in an impressive demo. Don't miss it!

Their work carries immense potential, promising to revolutionize the creative processes of artists and designers. Rather than commencing from a traditional blank canvas or plain font, this innovative approach enables individuals to initiate their logo design journey by transforming a word into a captivating image. The implications of this novel technique hold the power to reshape the very workflow of artistic expression, opening up exciting new possibilities for visual communication and design aesthetics.

I am eagerly anticipating the next set of papers she will sketch out (pun intended).

AUTHORS
Shir Iluz, Yael Vinker, Amir Hertz, Daniel Berio, Daniel Cohen-Or, Ariel Shamir

ABSTRACT
A word-as-image is a semantic typography technique where a word illustration presents a visualization of the meaning of the word, while also preserving its readability. We present a method to create word-as-image illustrations automatically. This task is highly challenging as it requires semantic understanding of the word and a creative idea of where and how to depict these semantics in a visually pleasing and legible manner. We rely on the remarkable ability of recent large pretrained language-vision models to distill textual concepts visually. We target simple, concise, black-and-white designs that convey the semantics clearly. We deliberately do not change the color or texture of the letters and do not use embellishments. Our method optimizes the outline of each letter to convey the desired concept, guided by a pretrained Stable Diffusion model. We incorporate additional loss terms to ensure the legibility of the text and the preservation of the style of the font. We show high quality and engaging results on numerous examples and compare to alternative techniques.

RELATED PAPERS
📚VectorFusion

LINKS AND RESOURCES
📚 Paper
💻 Project page
💻 Code
💻 Demo

CONTACT
If you would like to be a guest, sponsor or just share your thoughts, feel free to reach out via email: talking.papers.podcast@gmail.com

🎧Subscribe on your favourite podcast app: https://talking.papers.podcast.itzikbs.com
📧Subscribe to our mailing list: http://eepurl.com/hRznqb
🐦Follow us on Twitter: https://twitter.com/talking_papers
🎥YouTube Channel: https://bit.ly/3eQOgwP
26:06 7/20/23
Panoptic Lifting - Yawar Siddiqui
In this episode of the Talking Papers Podcast, I hosted Yawar Siddiqui to chat about his CVPR 2023 paper "Panoptic Lifting for 3D Scene Understanding with Neural Fields".

All links are available in the blog post.

In this paper, they propose a new method for "lifting" 2D panoptic segmentation into a 3D volume represented as a neural field, using in-the-wild scene images. While the semantic segmentation part is simply represented as an MLP, the instance indices are difficult to keep track of across the different frames. This is solved using the Hungarian algorithm and a set of custom losses.

Yawar is currently a PhD student at the Technical University of Munich (TUM) under the supervision of Prof. Matthias Niessner. This work was done as part of his latest internship with Meta Zurich. It was a pleasure chatting with him and I can't wait to see what he cooks up next.

AUTHORS
Yawar Siddiqui, Lorenzo Porzi, Samuel Rota Bulò, Norman Müller, Matthias Nießner, Angela Dai, Peter Kontschieder

ABSTRACT
We propose Panoptic Lifting, a novel approach for learning panoptic 3D volumetric representations from images of in-the-wild scenes. Once trained, our model can render color images together with 3D-consistent panoptic segmentation from novel viewpoints. Unlike existing approaches which use 3D input directly or indirectly, our method requires only machine-generated 2D panoptic segmentation masks inferred from a pre-trained network. Our core contribution is a panoptic lifting scheme based on a neural field representation that generates a unified and multi-view consistent, 3D panoptic representation of the scene. To account for inconsistencies of 2D instance identifiers across views, we solve a linear assignment with a cost based on the model's current predictions and the machine-generated segmentation masks, thus enabling us to lift 2D instances to 3D in a consistent way. We further propose and ablate contributions that make our method more robust to noisy, machine-generated labels, including test-time augmentations for confidence estimates, segment consistency loss, bounded segmentation fields, and gradient stopping. Experimental results validate our approach on the challenging Hypersim, Replica, and ScanNet datasets, improving by 8.4, 13.8, and 10.6% in scene-level PQ over state of the art.

SPONSOR
This episode was sponsored by YOOM. YOOM is an Israeli startup dedicated to volumetric video creation. They were voted as the 2022 best start-up to work for by Dun's 100. Join their team that works on geometric deep learning research, implicit representations of 3D humans, NeRFs, and 3D/4D generative models. Visit YOOM. For job opportunities with YOOM visit https://www.yoom.com/careers/

CONTACT
If you would like to be a guest, sponsor or just share your thoughts, feel free to reach out via email: talking.papers.podcast@gmail.com

This episode was recorded on July 6th, 2023.

#talkingpapers #CVPR2023 #PanopticLifting #NeRF #TensoRF #AI #Segmentation #DeepLearning #MachineLearning #research #artificialintelligence #podcasts

🎧Subscribe on your favourite podcast app: https://talking.papers.podcast.itzikbs.com
📧Subscribe to our mailing list: http://eepurl.com/hRznqb
🐦Follow us on Twitter: https://twitter.com/talking_papers
🎥YouTube Channel: https://bit.ly/3eQOgwP
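For a concrete picture of the linear-assignment step mentioned above, here is a small Python sketch (my own illustration, not the authors' code): it matches instance masks rendered from the 3D field to the machine-generated 2D masks of one view with an IoU-based cost. The function and variable names are hypothetical, and the paper's actual cost is based on the model's current predictions rather than a plain IoU.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_instances(pred_masks, gt_masks):
    """pred_masks: (P, H, W) boolean masks rendered from the 3D instance field.
    gt_masks: (G, H, W) boolean machine-generated 2D instance masks of one view.
    Returns a mapping from 2D instance id to 3D instance id."""
    iou = np.zeros((pred_masks.shape[0], gt_masks.shape[0]))
    for i, p in enumerate(pred_masks):
        for j, g in enumerate(gt_masks):
            union = np.logical_or(p, g).sum()
            iou[i, j] = np.logical_and(p, g).sum() / union if union > 0 else 0.0
    rows, cols = linear_sum_assignment(-iou)       # Hungarian algorithm, maximize IoU
    return {int(c): int(r) for r, c in zip(rows, cols)}

# toy usage with random masks
pred = np.random.rand(3, 8, 8) > 0.5
gt = np.random.rand(2, 8, 8) > 0.5
print(match_instances(pred, gt))
```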
45:17 7/10/23
MobileBrick - Kejie Li
In this episode of the Talking Papers Podcast, I hosted Kejie Li to chat about his CVPR 2023 paper "MobileBrick: Building LEGO for 3D Reconstruction on Mobile Devices".

All links are available in the blog post.

In this paper, they propose a new dataset and paradigm for evaluating 3D object reconstruction. It is very difficult to create a digital twin of a 3D object, even with expensive sensors. They introduce a new RGBD dataset captured with a mobile device. The nice trick for obtaining the ground truth is that they used LEGO bricks, which have exact CAD models.

Kejie is currently a research scientist at ByteDance/TikTok. When writing this paper he was a postdoc at Oxford. Prior to this, he obtained his PhD from the University of Adelaide. Although we hadn't crossed paths until this episode, we have some common ground in our CVs, having been affiliated with different nodes of the ACRV (Adelaide for him and ANU for me). I'm excited to see what he comes up with next and eagerly await his future endeavours.

AUTHORS
Kejie Li, Jia-Wang Bian, Robert Castle, Philip H.S. Torr, Victor Adrian Prisacariu

ABSTRACT
High-quality 3D ground-truth shapes are critical for 3D object reconstruction evaluation. However, it is difficult to create a replica of an object in reality, and even 3D reconstructions generated by 3D scanners have artefacts that cause biases in evaluation. To address this issue, we introduce a novel multi-view RGBD dataset captured using a mobile device, which includes highly precise 3D ground-truth annotations for 153 object models featuring a diverse set of 3D structures. We obtain precise 3D ground-truth shape without relying on high-end 3D scanners by utilising LEGO models with known geometry as the 3D structures for image capture. The distinct data modality offered by high-resolution RGB images and low-resolution depth maps captured on a mobile device, when combined with precise 3D geometry annotations, presents a unique opportunity for future research on high-fidelity 3D reconstruction. Furthermore, we evaluate a range of 3D reconstruction algorithms on the proposed dataset.

RELATED PAPERS
📚COLMAP
📚NeRF
📚NeuS
📚CO3D

LINKS AND RESOURCES
📚 Paper
💻 Project page
💻 Code

SPONSOR
This episode was sponsored by YOOM. YOOM is an Israeli startup dedicated to volumetric video creation. They were voted as the 2022 best start-up to work for by Dun's 100. Join their team that works on geometric deep learning research, implicit representations of 3D humans, NeRFs, and 3D/4D generative models. Visit YOOM. For job opportunities with YOOM visit https://www.yoom.com/careers/

CONTACT
If you would like to be a guest, sponsor or just share your thoughts, feel free to reach out via email: talking.papers.podcast@gmail.com

This episode was recorded on May 8th, 2023.

#talkingpapers #CVPR2023 #NeRF #Dataset #mobilebrick #ComputerVision #AI #NeuS #DeepLearning #MachineLearning #research #artificialintelligence #podcasts

🎧Subscribe on your favourite podcast app: https://talking.papers.podcast.itzikbs.com
📧Subscribe to our mailing list: http://eepurl.com/hRznqb
🐦Follow us on Twitter: https://twitter.com/talking_papers
🎥YouTube Channel: https://bit.ly/3eQOgwP
46:15 6/14/23
IAW Dataset - Jiahao Zhang
All links are available in the blog post.

In this episode of the Talking Papers Podcast, I hosted Jiahao Zhang to chat about our CVPR 2023 paper "Aligning Step-by-Step Instructional Diagrams to Video Demonstrations".

In this paper, we take on the task of aligning video demonstrations with the steps of a furniture assembly diagram. To do that, we collected and annotated a brand new dataset, "IKEA Assembly in the Wild" (IAW), where we aligned YouTube videos with IKEA's instruction manuals. Our approach to addressing this task proposes several supervised contrastive losses that contrast between video and diagram, video and manual, and internal manual images.

Jiahao is currently a PhD student at the Australian National University. His research focus is on human action recognition and multi-modal representation alignment. We first met (virtually) when Jiahao did his Honours project, where he developed an amazing (and super useful) video annotation tool, ViDaT. His strong software engineering and web development background gives him a strong advantage when working on his research projects. Even though we have never met in person (yet), we are actively collaborating and I already know what he is cooking up next. I hope to share it with the world soon.

AUTHORS
Jiahao Zhang, Anoop Cherian, Yanbin Liu, Yizhak Ben-Shabat, Cristian Rodriguez, Stephen Gould

RELATED PAPERS
📚IKEA ASM Dataset
📚CLIP
📚SlowFast

LINKS AND RESOURCES
📚 Paper
💻 Project page
💻 Dataset page
💻 Code

SPONSOR
This episode was sponsored by YOOM. YOOM is an Israeli startup dedicated to volumetric video creation. They were voted as the 2022 best start-up to work for by Dun's 100. Join their team that works on geometric deep learning research, implicit representations of 3D humans, NeRFs, and 3D/4D generative models. Visit YOOM. For job opportunities with YOOM visit https://www.yoom.com/careers/

CONTACT
If you would like to be a guest, sponsor or just share your thoughts, feel free to reach out via email: talking.papers.podcast@gmail.com

This episode was recorded on May 1st, 2023.

#talkingpapers #CVPR2023 #IAWDataset #ComputerVision #AI #ActionRecognition #DeepLearning #MachineLearning #research #artificialintelligence #podcasts

🎧Subscribe on your favourite podcast app: https://talking.papers.podcast.itzikbs.com
📧Subscribe to our mailing list: http://eepurl.com/hRznqb
🐦Follow us on Twitter: https://twitter.com/talking_papers
🎥YouTube Channel: https://bit.ly/3eQOgwP
34:32 5/17/23
INR2Vec - Luca De Luigi
All links are available in the blog post: https://www.itzikbs.com/inr2vec/

In this episode of the Talking Papers Podcast, I hosted Luca De Luigi. We had a great chat about his paper "Deep Learning on Implicit Neural Representations of Shapes", AKA INR2Vec, published in ICLR 2023.

In this paper, they take implicit neural representations to the next level and use them as input signals for neural networks that solve multiple downstream tasks. The core idea was captured by one of the authors in a very catchy and concise tweet: "Signals are networks so networks are data and so networks can process other networks to understand and generate signals".

Luca recently received his PhD from the University of Bologna and is currently working at eyecan.ai, a startup based in Bologna. His research focus is on neural representations of signals, especially for 3D geometry. To be honest, I knew I wanted to get Luca on the podcast the second I saw the paper on arXiv, because I was working on a related topic but had to shelve it due to time management issues. This paper got me excited about that topic again. I didn't know Luca before recording the episode and it was a delight to get to know him and his work.

AUTHORS
Luca De Luigi, Adriano Cardace, Riccardo Spezialetti, Pierluigi Zama Ramirez, Samuele Salti, Luigi Di Stefano

ABSTRACT
When applied to 3D shapes, INRs allow to overcome the fragmentation and shortcomings of the popular discrete representations used so far. Yet, considering that INRs consist in neural networks, it is not clear whether and how it may be possible to feed them into deep learning pipelines aimed at solving a downstream task. In this paper, we put forward this research problem and propose inr2vec, a framework that can compute a compact latent representation for an input INR in a single inference pass. We verify that inr2vec can embed effectively the 3D shapes represented by the input INRs and show how the produced embeddings can be fed into deep learning pipelines to solve several tasks by processing exclusively INRs.

RELATED PAPERS
📚SIREN
📚DeepSDF
📚PointNet

LINKS AND RESOURCES
📚 Paper
💻 Project page

SPONSOR
This episode was sponsored by YOOM. YOOM is an Israeli startup dedicated to volumetric video creation. They were voted as the 2022 best start-up to work for by Dun's 100. Join their team that works on geometric deep learning research, implicit representations of 3D humans, NeRFs, and 3D/4D generative models. Visit https://www.yoom.com/ For job opportunities with YOOM visit https://www.yoom.com/careers/

CONTACT
If you would like to be a guest, sponsor or just share your thoughts, feel free to reach out via email: talking.papers.podcast@gmail.com

This episode was recorded on March 22, 2023.

#talkingpapers #ICLR2023 #INR2Vec #ComputerVision #AI #DeepLearning #MachineLearning #INR #ImplicitNeuralRepresentation #research #artificialintelligence #podcasts

🎧Subscribe on your favourite podcast app: https://talking.papers.podcast.itzikbs.com
📧Subscribe to our mailing list: http://eepurl.com/hRznqb
🐦Follow us on Twitter: https://twitter.com/talking_papers
🎥YouTube Channel: https://bit.ly/3eQOgwP
49:15 3/29/23
CLIPasso - Yael Vinker
In this episode of the Talking Papers Podcast, I hosted Yael Vinker. We had a great chat about her paper "CLIPasso: Semantically-Aware Object Sketching", winner of the SIGGRAPH 2022 best paper award.

In this paper, they convert images into sketches with different levels of abstraction. They avoid the need for sketch datasets by using the well-known CLIP model to distil the semantic concepts from sketches and images. There is no network training here, just optimizing the control points of Bézier curves to model the sketch strokes (initialized by a saliency map). How is this differentiable? They use a differentiable rasterizer. The degree of abstraction is controlled by the number of strokes. Don't miss the amazing demo they created.

Yael is currently a PhD student at Tel Aviv University. Her research focus is on computer vision, machine learning, and computer graphics with a unique twist of combining art and technology. This work was done as part of her internship at EPFL.

AUTHORS
Yael Vinker, Ehsan Pajouheshgar, Jessica Y. Bo, Roman Bachmann, Amit Haim Bermano, Daniel Cohen-Or, Amir Zamir, Ariel Shamir

ABSTRACT
Abstraction is at the heart of sketching due to the simple and minimal nature of line drawings. Abstraction entails identifying the essential visual properties of an object or scene, which requires semantic understanding and prior knowledge of high-level concepts. Abstract depictions are therefore challenging for artists, and even more so for machines. We present an object sketching method that can achieve different levels of abstraction, guided by geometric and semantic simplifications. While sketch generation methods often rely on explicit sketch datasets for training, we utilize the remarkable ability of CLIP (Contrastive-Language-Image-Pretraining) to distil semantic concepts from sketches and images alike. We define a sketch as a set of Bézier curves and use a differentiable rasterizer to optimize the parameters of the curves directly with respect to a CLIP-based perceptual loss. The abstraction degree is controlled by varying the number of strokes. The generated sketches demonstrate multiple levels of abstraction while maintaining recognizability, underlying structure, and essential visual components of the subject drawn.

RELATED PAPERS
📚CLIP: Connecting Text and Images
📚Differentiable Vector Graphics Rasterization for Editing and Learning

LINKS AND RESOURCES
📚 Paper
💻 Project page

SPONSOR
This episode was sponsored by YOOM. YOOM is an Israeli startup dedicated to volumetric video creation. They were voted as the 2022 best start-up to work for by Dun's 100. Join their team that works on geometric deep learning research, implicit representations of 3D humans, NeRFs, and 3D/4D generative models. Visit YOOM.com.

CONTACT
If you would like to be a guest, sponsor or share your thoughts, feel free to reach out via email: talking.papers.podcast@gmail.com

🎧Subscribe on your favourite podcast app: https://talking.papers.podcast.itzikbs.com
📧Subscribe to our mailing list: http://eepurl.com/hRznqb
🐦Follow us on Twitter: https://twitter.com/talking_papers
🎥YouTube Channel: https://bit.ly/3eQOgwP
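Here is a toy PyTorch sketch of the core optimization idea (my own illustration, not CLIPasso itself): one stroke is a cubic Bézier curve whose control points are updated by gradient descent. In the real method the loss comes from a CLIP-based perceptual comparison through a differentiable rasterizer; here it is replaced by a simple nearest-point distance to a few target points just to show the mechanics.

```python
import torch

def cubic_bezier(ctrl, t):
    # ctrl: (4, 2) control points, t: (T,) parameters in [0, 1] -> (T, 2) curve samples
    basis = torch.stack([(1 - t) ** 3,
                         3 * (1 - t) ** 2 * t,
                         3 * (1 - t) * t ** 2,
                         t ** 3], dim=1)
    return basis @ ctrl

ctrl = torch.rand(4, 2, requires_grad=True)                    # one stroke's control points
target = torch.tensor([[0.0, 0.0], [0.5, 1.0], [1.0, 0.0]])    # toy points to trace
opt = torch.optim.Adam([ctrl], lr=0.05)
t = torch.linspace(0.0, 1.0, 64)

for _ in range(200):
    pts = cubic_bezier(ctrl, t)
    loss = torch.cdist(target, pts).min(dim=1).values.mean()   # stand-in for the CLIP loss
    opt.zero_grad()
    loss.backward()
    opt.step()
```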
45:43 3/13/23
Random Walks for Adversarial Meshes - Amir Belder
All links are available in the blog post.

In this episode of the Talking Papers Podcast, we hosted Amir Belder. We had a great chat about his paper "Random Walks for Adversarial Meshes", published in SIGGRAPH 2022.

In this paper, they take on the task of creating adversarial attacks for triangle meshes. This is a non-trivial task since meshes are irregular. To handle the irregularity they use random walks instead of the raw mesh. On top of that, they trained an imitating network that mimics the predictions of the attacked network and used its gradients to perturb the input points.

Amir is currently a PhD student at the Computer Graphics and Multimedia Lab at the Technion, Israel Institute of Technology. His research focus is on computer graphics, geometric processing and machine learning. We spend a lot of time together at the lab and chat often about science, papers and where the field is headed. Having this paper published was a great opportunity to share one of these conversations with you.

AUTHORS
Amir Belder, Gal Yefet, Ran Ben-Itzhak, Ayellet Tal

ABSTRACT
A polygonal mesh is the most-commonly used representation of surfaces in computer graphics. Therefore, it is not surprising that a number of mesh classification networks have recently been proposed. However, while adversarial attacks are widely researched in 2D, the field of adversarial meshes is under-explored. This paper proposes a novel, unified, and general adversarial attack, which leads to misclassification of several state-of-the-art mesh classification neural networks. Our attack approach is black-box, i.e. it has access only to the network's predictions, but not to the network's full architecture or gradients. The key idea is to train a network to imitate a given classification network. This is done by utilizing random walks along the mesh surface, which gather geometric information. These walks provide insight onto the regions of the mesh that are important for the correct prediction of the given classification network. These mesh regions are then modified more than other regions in order to attack the network in a manner that is barely visible to the naked eye.

RELATED PAPERS
📚Explaining and Harnessing Adversarial Examples
📚MeshWalker: Deep Mesh Understanding by Random Walks

LINKS AND RESOURCES
📚 Paper
💻 Code

To stay up to date with Amir's latest research, follow him on:
🐦Twitter
👨🏻‍🎓Google Scholar
👨🏻‍🎓LinkedIn

CONTACT
If you would like to be a guest, sponsor or just share your thoughts, feel free to reach out via email: talking.papers.podcast@gmail.com

This episode was recorded on November 23rd, 2022.

#talkingpapers #SIGGRAPH2022 #RandomWalks #MeshWalker #AdversarialAttacks #Mesh #ComputerVision #AI #DeepLearning #MachineLearning #ComputerGraphics #research #artificialintelligence #podcasts

🎧Subscribe on your favourite podcast app: https://talking.papers.podcast.itzikbs.com
📧Subscribe to our mailing list: http://eepurl.com/hRznqb
🐦Follow us on Twitter: https://twitter.com/talking_papers
🎥YouTube Channel: https://bit.ly/3eQOgwP
26:45 12/14/22
SPSR - Silvia Sellán
In this episode of the Talking Papers Podcast, I hosted Silvia Sellán. We had a great chat about her paper "Stochastic Poisson Surface Reconstruction", published in SIGGRAPH Asia 2022.

In this paper, they take on the task of surface reconstruction with a probabilistic twist. They take the well-known Poisson Surface Reconstruction algorithm and generalize it to give it a full statistical formalism. Essentially, their method quantifies the uncertainty of surface reconstruction from a point cloud. Instead of outputting an implicit function, they represent the shape as a modified Gaussian process. This unique perspective and interpretation enables statistical queries, for example: given a point, is it on the surface? Is it inside the shape?

Silvia is currently a PhD student at the University of Toronto. Her research focus is on computer graphics and geometry processing. She is a Vanier Doctoral Scholar, an Adobe Research Fellow and the winner of the 2021 UofT FAS Dean's Doctoral Excellence Scholarship. I have been following Silvia's work for a while, and since I have some work on surface reconstruction, when SPSR came out I knew I wanted to host her on the podcast (and gladly she agreed). Silvia is currently looking for postdoc and faculty positions to start in the fall of 2024. I am really looking forward to seeing which institute snatches her.

In our conversation, I particularly liked her explanation of Gaussian Processes with the example "How long does it take my supervisor to answer an email as a function of the time of day the email was sent". You can't read that in any book. We also took an unexpected pause from the usual episode structure to discuss the question of "papers" as a medium for disseminating research. Don't miss it.

AUTHORS
Silvia Sellán, Alec Jacobson

ABSTRACT
We introduce a statistical extension of the classic Poisson Surface Reconstruction algorithm for recovering shapes from 3D point clouds. Instead of outputting an implicit function, we represent the reconstructed shape as a modified Gaussian Process, which allows us to conduct statistical queries (e.g., the likelihood of a point in space being on the surface or inside a solid). We show that this perspective: improves PSR's integration into the online scanning process, broadens its application realm, and opens the door to other lines of research such as applying task-specific priors.

RELATED PAPERS
📚Poisson Surface Reconstruction
📚Geometric Priors for Gaussian Process Implicit Surfaces
📚Gaussian Processes for Machine Learning

LINKS AND RESOURCES
📚 Paper
💻 Project page

To stay up to date with Silvia's latest research, follow her on:
🐦Twitter
👨🏻‍🎓Google Scholar

🎧Subscribe on your favourite podcast app: https://talking.papers.podcast.itzikbs.com
📧Subscribe to our mailing list: http://eepurl.com/hRznqb
🐦Follow us on Twitter: https://twitter.com/talking_papers
🎥YouTube Channel: https://bit.ly/3eQOgwP
40:51 12/6/22
Beyond Periodicity - Sameera Ramasinghe
In this episode of the Talking Papers Podcast, I hosted Sameera Ramasinghe. We had a great chat about his paper "Beyond Periodicity: Towards a Unifying Framework for Activations in Coordinate-MLPs", published in ECCV 2022 as an oral presentation.

In this paper, they propose a new family of activation functions for coordinate MLPs and provide a theoretical analysis of their effectiveness. Their main proposition is that the stable rank is a good measure and design tool for such activation functions. They show that the proposed activations outperform the traditional ReLU and sine activations for image parametrization and novel view synthesis. They further show that while the proposed family of activations does not require positional encoding, it can benefit from using it by reducing the number of layers significantly.

Sameera is currently an applied scientist at Amazon and the CTO and co-founder of ConscientAI. His research focus is theoretical machine learning and computer vision. This work was done when he was a postdoc at the Australian Institute for Machine Learning (AIML). He completed his PhD at the Australian National University (ANU). We first met back in 2019 when I was a research fellow at ANU and he was still doing his PhD. I immediately noticed we share research interests and, after a short period of time, I flagged him as a rising star in the field. It was a pleasure to chat with Sameera and I am looking forward to reading his future papers.

AUTHORS
Sameera Ramasinghe, Simon Lucey

RELATED PAPERS
📚NeRF
📚SIREN
📚Fourier Features Let Networks Learn High-Frequency Functions in Low Dimensional Domains
📚On the Spectral Bias of Neural Networks

LINKS AND RESOURCES
📚 Paper
💻 Code

To stay up to date with Sameera's latest research, follow him on:
🐦Twitter
👨🏻‍🎓Google Scholar
👨🏻‍🎓LinkedIn

Recorded on November 14th, 2022.

🎧Subscribe on your favourite podcast app: https://talking.papers.podcast.itzikbs.com
📧Subscribe to our mailing list: http://eepurl.com/hRznqb
🐦Follow us on Twitter: https://twitter.com/talking_papers
🎥YouTube Channel: https://bit.ly/3eQOgwP
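To give a flavour of a non-periodic coordinate-MLP activation, here is a minimal PyTorch sketch (my own illustration, not the paper's code) using a Gaussian activation, one member of the family the paper studies; the width sigma and the layer sizes are placeholder values.

```python
import torch
import torch.nn as nn

class Gaussian(nn.Module):
    """Non-periodic activation phi(x) = exp(-x^2 / (2 * sigma^2));
    sigma acts as a knob on the effective bandwidth of the coordinate MLP."""
    def __init__(self, sigma=0.1):
        super().__init__()
        self.sigma = sigma

    def forward(self, x):
        return torch.exp(-x ** 2 / (2.0 * self.sigma ** 2))

# coordinate MLP without positional encoding: pixel coordinates -> RGB
coord_mlp = nn.Sequential(
    nn.Linear(2, 256), Gaussian(),
    nn.Linear(256, 256), Gaussian(),
    nn.Linear(256, 3),
)
rgb = coord_mlp(torch.rand(1024, 2))
```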
32:46 11/15/22
KeypointNeRF - Marko Mihajlovic
In this episode of the Talking Papers Podcast, I hosted Marko Mihajlovic. We had a great chat about his paper "KeypointNeRF: Generalizing Image-based Volumetric Avatars using Relative Spatial Encoding of Keypoints", published in ECCV 2022.

In this paper, they create a generalizable NeRF for virtual avatars. To get a high-fidelity reconstruction of humans from sparse observations, they leverage an off-the-shelf keypoint detector and condition the NeRF on relative spatial encodings of the detected keypoints.

Marko is a second-year PhD student at ETH Zurich, supervised by Siyu Tang. His research focuses on photorealistic reconstruction of static and dynamic scenes and on modeling of parametric human bodies. This work was done mainly during his internship at Meta Reality Labs. Marko and I met at CVPR 2022.

AUTHORS
Marko Mihajlovic, Aayush Bansal, Michael Zollhoefer, Siyu Tang, Shunsuke Saito

ABSTRACT
Image-based volumetric humans using pixel-aligned features promise generalization to unseen poses and identities. Prior work leverages global spatial encodings and multi-view geometric consistency to reduce spatial ambiguity. However, global encodings often suffer from overfitting to the distribution of the training data, and it is difficult to learn multi-view consistent reconstruction from sparse views. In this work, we investigate common issues with existing spatial encodings and propose a simple yet highly effective approach to modeling high-fidelity volumetric humans from sparse views. One of the key ideas is to encode relative spatial 3D information via sparse 3D keypoints. This approach is robust to the sparsity of viewpoints and cross-dataset domain gap. Our approach outperforms state-of-the-art methods for head reconstruction. On human body reconstruction for unseen subjects, we also achieve performance comparable to prior work that uses a parametric human body model and temporal feature aggregation. Our experiments show that a majority of errors in prior work stem from an inappropriate choice of spatial encoding and thus we suggest a new direction for high-fidelity image-based human modeling.

RELATED PAPERS
📚NeRF
📚IBRNet
📚PIFu

LINKS AND RESOURCES
💻 Project website
📚 Paper
💻 Code
🎥 Video

To stay up to date with Marko's latest research, follow him on:
👨🏻‍🎓Personal Page
🐦Twitter
👨🏻‍🎓Google Scholar

CONTACT
If you would like to be a guest, sponsor or just share your thoughts, feel free to reach out via email: talking.papers.podcast@gmail.com

🎧Subscribe on your favourite podcast app: https://talking.papers.podcast.itzikbs.com
📧Subscribe to our mailing list: http://eepurl.com/hRznqb
🐦Follow us on Twitter: https://twitter.com/talking_papers
🎥YouTube Channel: https://bit.ly/3eQOgwP
23:57 10/19/22
BACON - David Lindell
In this episode of the Talking Papers Podcast, I hosted David B. Lindell to chat about his paper "BACON: Band-Limited Coordinate Networks for Multiscale Scene Representation”, published in CVPR 2022. In this paper, they introduce a new type of coordinate network architecture that has an analytical Fourier spectrum. This allows them to do things like multi-scale signal representation, and it gives an interpretable architecture with an explicitly controllable bandwidth. David recently completed his Postdoc at Stanford and will join the University of Toronto as an Assistant Professor. During our chat, I got to know a stellar academic with a unique view of the field and where it is going. We even got to meet in person at CVPR. I am looking forward to seeing what he comes up with next. It was a pleasure having him on the podcast. AUTHORSDavid B. Lindell, Dave Van Veen, Jeong Joon Park, Gordon WetzsteinABSTRACT Coordinate-based networks have emerged as a powerful tool for 3D representation and scene reconstruction. These networks are trained to map continuous input coordinates to the value of a signal at each point. Still, current architectures are black boxes: their spectral characteristics cannot be easily analyzed, and their behavior at unsupervised points is difficult to predict. Moreover, these networks are typically trained to represent a signal at a single scale, so naive downsampling or upsampling results in artifacts. We introduce band-limited coordinate networks (BACON), a network architecture with an analytical Fourier spectrum. BACON has constrained behavior at unsupervised points, can be designed based on the spectral characteristics of the represented signal, and can represent signals at multiple scales without per-scale supervision. We demonstrate BACON for multiscale neural representation of images, radiance fields, and 3D scenes using signed distance functions and show that it outperforms conventional single-scale coordinate networks in terms of interpretability and quality. RELATED PAPERS📚SIREN📚Multiplicative Filter Networks (MFN)📚Mip-Nerf📚Followup work: Residual MFNLINKS AND RESOURCES💻Project website📚 Paper💻Code🎥VideoTo stay up to date with David's latest research, follow him on:👨🏻‍🎓Personal Page🐦Twitter👨🏻‍🎓Google Scholar👨🏻‍🎓LinkedInRecorded on June 15th 2022.CONTACTIf you would like to be a guest, sponsor or just share your thoughts, feel free to reach out via email: talking.papers.podcast@gmail.com🎧Subscribe on your favourite podcast app: https://talking.papers.podcast.itzikbs.com 📧Subscribe to our mailing list: http://eepurl.com/hRznqb 🐦Follow us on Twitter: https://twitter.com/talking_papers 🎥YouTube Channel: https://bit.ly/3eQOgwP
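BACON builds on multiplicative filter networks: hidden features are repeatedly multiplied by sine filters whose frequencies are bounded per layer, so the overall spectrum stays analyzable, and intermediate linear read-outs give one output per scale. The toy sketch below follows that recipe under those assumptions; layer widths, frequency limits, and the read-out choices are illustrative and not the released BACON architecture.

```python
import torch
import torch.nn as nn

class BandLimitedSine(nn.Module):
    """Sine filter with frequencies sampled up to max_freq, so its spectrum is bounded."""
    def __init__(self, in_dim, hidden, max_freq):
        super().__init__()
        self.omega = nn.Parameter(torch.empty(hidden, in_dim).uniform_(-max_freq, max_freq))
        self.phi = nn.Parameter(torch.rand(hidden) * 2 * torch.pi)
    def forward(self, x):
        return torch.sin(x @ self.omega.t() + self.phi)

class TinyBACON(nn.Module):
    """Multiplicative filter network with per-layer frequency limits and
    intermediate read-outs, giving one output per scale (coarse to fine)."""
    def __init__(self, in_dim=2, hidden=128, out_dim=3, max_freqs=(8, 32, 128)):
        super().__init__()
        self.filters = nn.ModuleList(BandLimitedSine(in_dim, hidden, f) for f in max_freqs)
        self.linears = nn.ModuleList(nn.Linear(hidden, hidden) for _ in max_freqs[:-1])
        self.outputs = nn.ModuleList(nn.Linear(hidden, out_dim) for _ in max_freqs)
    def forward(self, x):
        z = self.filters[0](x)
        outs = [self.outputs[0](z)]
        for lin, filt, head in zip(self.linears, self.filters[1:], self.outputs[1:]):
            z = lin(z) * filt(x)   # multiplying sines sums frequencies, keeping the spectrum explicit
            outs.append(head(z))
        return outs

scales = TinyBACON()(torch.rand(256, 2))   # list of 3 outputs at increasing bandwidth
```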
41:13 8/9/22
Lipschitz MLP - Hsueh-Ti Derek Liu
In this episode of the Talking Papers Podcast, I hosted Hsueh-Ti Derek Liu to chat about his paper "Learning Smooth Neural Functions via Lipschitz Regularization”, published in SIGGRAPH 2022. In this paper, they took on the unique task of enforcing smoothness on Neural Fields (modelled as a neural network). They do this by introducing a regularization term that forces the Lipschitz constant of the network to be very small. They show the performance of their method on shape interpolation, extrapolation and partial shape reconstruction from 3D point clouds. What I like most is that it is implemented in only four lines of code. Derek will soon complete his PhD at the University of Toronto and will start a research scientist position at Roblox Research. This work was done when he was interning at NVIDIA. During our chat, I had the pleasure to discover that Derek is one of the few humans on the planet who has the ability to take a complicated idea and explain it in a simple and easy-to-follow way. His strong background in geometry processing makes this paper, which is well within the learning domain, very unique in the current paper landscape. It was a pleasure recording this episode with him. AUTHORSHsueh-Ti Derek Liu, Francis Williams, Alec Jacobson, Sanja Fidler, Or LitanyABSTRACTNeural implicit fields have recently emerged as a useful representation for 3D shapes. These fields are commonly represented as neural networks which map latent descriptors and 3D coordinates to implicit function values. The latent descriptor of a neural field acts as a deformation handle for the 3D shape it represents. Thus, smoothness with respect to this descriptor is paramount for performing shape-editing operations. In this work, we introduce a novel regularization designed to encourage smooth latent spaces in neural fields by penalizing the upper bound on the field's Lipschitz constant. Compared with prior Lipschitz regularized networks, ours is computationally fast, can be implemented in four lines of code, and requires minimal hyperparameter tuning for geometric applications. We demonstrate the effectiveness of our approach on shape interpolation and extrapolation as well as partial shape reconstruction from 3D point clouds, showing both qualitative and quantitative improvements over existing state-of-the-art and non-regularized baselines.RELATED PAPERS📚DeepSDF📚Neural Fields (collection of works)📚Sorting Out Lipschitz Function ApproximationLINKS AND RESOURCES💻Project website📚 Paper💻CodeTo stay up to date with Derek's latest research, follow him on:👨🏻‍🎓Personal Page🐦Twitter👨🏻‍🎓Google ScholarRecorded on May 30th 2022.CONTACTIf you would like to be a guest, sponsor or just share your thoughts, feel free to reach out via email: talking.papers.podcast@gmail.com🎧Subscribe on your favourite podcast app: https://talking.papers.podcast.itzikbs.com 📧Subscribe to our mailing list: http://eepurl.com/hRznqb 🐦Follow us on Twitter: https://twitter.com/talking_papers 🎥YouTube Channel: https://bit.ly/3eQOgwP
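The core trick is to give every linear layer a learnable Lipschitz bound, rescale its weights so the bound actually holds, and add the product of the per-layer bounds to the loss. Below is a sketch in the spirit of the paper; the exact normalization, the initial bound value, and the weighting constant alpha are assumptions of this illustration rather than the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LipschitzLinear(nn.Module):
    """Linear layer with a learnable per-layer Lipschitz bound softplus(c).
    Weight rows are rescaled so the layer never exceeds that bound."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)
        self.c = nn.Parameter(torch.tensor(4.0))      # initial bound; illustrative value

    def forward(self, x):
        w = self.linear.weight
        bound = F.softplus(self.c)
        row_norm = w.abs().sum(dim=1, keepdim=True)   # per-row inf-norm
        scale = torch.clamp(bound / row_norm, max=1.0)  # only shrink, never grow
        return F.linear(x, w * scale, self.linear.bias)

    def lipschitz_bound(self):
        return F.softplus(self.c)

# The regularizer is the product of the per-layer bounds, added to the task loss:
layers = nn.ModuleList([LipschitzLinear(3, 64), LipschitzLinear(64, 1)])
reg = torch.stack([l.lipschitz_bound() for l in layers]).prod()
# total_loss = task_loss + alpha * reg
```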
35:59 7/19/22
DiGS - Chamin Hewa Koneputugodage
In this episode of the Talking Papers Podcast, I hosted Chamin Hewa Koneputugodage to chat about OUR paper "DiGS: Divergence guided shape implicit neural representation for unoriented point clouds”, published in CVPR 2022. In this paper, we took on the task of surface reconstruction using a novel divergence-guided approach. Unlike previous methods, we do not use normal vectors for supervision. To compensate for that, we add a divergence minimization loss as a regularizer to get a coarse shape and then anneal it as training progresses to get finer detail. Additionally, we propose two new geometric initializations for SIREN-based networks that enable learning shape spaces. PAPER TITLE "DiGS: Divergence guided shape implicit neural representation for unoriented point clouds" AUTHORSYizhak Ben-Shabat, Chamin Hewa Koneputugodage, Stephen GouldABSTRACTShape implicit neural representations (INRs) have recently been shown to be effective in shape analysis and reconstruction tasks. Existing INRs require point coordinates to learn the implicit level sets of the shape. When a normal vector is available for each point, a higher fidelity representation can be learned; however, normal vectors are often not provided as raw data. Furthermore, the method's initialization has been shown to play a crucial role for surface reconstruction. In this paper, we propose a divergence guided shape representation learning approach that does not require normal vectors as input. We show that incorporating a soft constraint on the divergence of the distance function favours smooth solutions that reliably orient gradients to match the unknown normal at each point, in some cases even better than approaches that use ground truth normal vectors directly. Additionally, we introduce a novel geometric initialization method for sinusoidal INRs that further improves convergence to the desired solution. We evaluate the effectiveness of our approach on the task of surface reconstruction and shape space learning and show SOTA performance compared to other unoriented methods.RELATED PAPERS📚 DeepSDF  📚 SIREN LINKS AND RESOURCES💻 Project Page 💻 Code  🎥 5 min videoTo stay up to date with Chamin's latest research, follow him on:🐦 Twitter 👨🏻‍🎓LinkedInRecorded on April 1st 2022.CONTACTIf you would like to be a guest, sponsor or just share your thoughts, feel free to reach out via email: talking.papers.podcast@gmail.comSUBSCRIBE AND FOLLOW🎧Subscribe on your favourite podcast app📧Subscribe to our mailing list🐦Follow us on Twitter🎥YouTube Channel#talkingpapers #CVPR2022 #DiGS #NeuralImplicitRepresentation #SurfaceReconstruction #ShapeSpace #3DVision #ComputerVision #AI #DeepLearning #MachineLearning #deeplearning🎧Subscribe on your favourite podcast app: https://talking.papers.podcast.itzikbs.com 📧Subscribe to our mailing list: http://eepurl.com/hRznqb 🐦Follow us on Twitter: https://twitter.com/talking_papers 🎥YouTube Channel: https://bit.ly/3eQOgwP
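Both the standard eikonal constraint and the divergence (Laplacian) penalty discussed in the episode can be computed with automatic differentiation. The minimal sketch below shows how; the loss weighting and annealing schedule are deliberately left out because they are training details, and the function names here are ours, not from the released code.

```python
import torch

def gradient(y, x):
    """dy/dx via autograd, keeping the graph so second derivatives are available."""
    return torch.autograd.grad(y, x, grad_outputs=torch.ones_like(y),
                               create_graph=True)[0]

def digs_style_terms(sdf_net, points):
    """Eikonal term plus a divergence penalty on the predicted signed distance field.
    Minimizing the divergence encourages smooth solutions whose gradients align
    with the (unknown) surface normals, even without normal supervision."""
    points = points.clone().requires_grad_(True)
    sdf = sdf_net(points)                               # (N, 1)
    grad = gradient(sdf, points)                        # (N, 3), approximate normals
    div = sum(gradient(grad[:, i], points)[:, i] for i in range(points.shape[1]))
    eikonal = ((grad.norm(dim=-1) - 1.0) ** 2).mean()   # standard SDF constraint
    divergence = div.abs().mean()                       # weighted heavily early, annealed later
    return eikonal, divergence
```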
40:32 6/14/22
Dejan Azinović - Neural RGBD Surface Reconstruction
In this episode of the Talking Papers Podcast, I hosted Dejan Azinović to chat about his paper "Neural RGB-D Surface Reconstruction”, published in CVPR 2022. In this paper, they take on the task of RGBD surface reconstruction by using novel view synthesis. They incorporate depth measurements into the radiance field formulation by learning a neural network that stores a truncated signed distance field. This formulation is particularly useful in regions where depth is missing and the color information can help fill in the gaps. PAPER TITLE "Neural RGB-D Surface Reconstruction"  AUTHORSDejan Azinović, Ricardo Martin-Brualla, Dan B Goldman, Matthias Nießner, Justus ThiesABSTRACTIn this work, we explore how to leverage the success of implicit novel view synthesis methods for surface reconstruction. Methods which learn a neural radiance field have shown amazing image synthesis results, but the underlying geometry representation is only a coarse approximation of the real geometry. We demonstrate how depth measurements can be incorporated into the radiance field formulation to produce more detailed and complete reconstruction results than using methods based on either color or depth data alone. In contrast to a density field as the underlying geometry representation, we propose to learn a deep neural network which stores a truncated signed distance field. Using this representation, we show that one can still leverage differentiable volume rendering to estimate color values of the observed images during training to compute a reconstruction loss. This is beneficial for learning the signed distance field in regions with missing depth measurements. Furthermore, we correct for misalignment errors of the camera, improving the overall reconstruction quality. In several experiments, we showcase our method and compare to existing works on classical RGB-D fusion and learned representations.RELATED PAPERS📚 NeRF 📚 BundleFusion LINKS AND RESOURCES💻 Project Page  💻 Code  To stay up to date with Dejan's latest research, follow him on:👨🏻‍🎓 Dejan's personal page🎓 Google Scholar🐦 Twitter👨🏻‍🎓LinkedInRecorded on April 4th 2022.CONTACTIf you would like to be a guest, sponsor or just share your thoughts, feel free to reach out via email: talking.papers.podcast@gmail.comSUBSCRIBE AND FOLLOW🎧Subscribe on your favourite podcast app📧Subscribe to our mailing list 🐦Follow us on Twitter 🎥YouTube Channel#talkingpapers #CVPR2022 #NeuralRGBDSurfaceReconstruction #SurfaceReconstruction #NeRF  #3DVision #ComputerVision #AI #DeepLearning #MachineLearning  #deeplearning #AI #neuralnetworks #research  #artificialintelligence🎧Subscribe on your favourite podcast app: https://talking.papers.podcast.itzikbs.com 📧Subscribe to our mailing list: http://eepurl.com/hRznqb 🐦Follow us on Twitter: https://twitter.com/talking_papers 🎥YouTube Channel: https://bit.ly/3eQOgwP
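To see how a truncated SDF can still be used with differentiable volume rendering, one common construction turns the predicted SDF along a ray into compositing weights that peak at the zero crossing, for example with a product of two opposing sigmoids. The sketch below uses that construction; the exact weighting, truncation handling, and loss terms in the paper may differ, so treat this purely as an illustration.

```python
import torch

def render_ray(sdf_vals, colors, trunc=0.05):
    """
    sdf_vals: (S,) predicted truncated signed distance at samples along one ray
    colors:   (S, 3) predicted colors at the same samples
    Converts SDF values into weights that peak where the SDF crosses zero,
    then composites color along the ray (a TSDF analogue of volume rendering).
    """
    bell = torch.sigmoid(sdf_vals / trunc) * torch.sigmoid(-sdf_vals / trunc)
    weights = bell / (bell.sum() + 1e-8)             # normalize along the ray
    color = (weights[:, None] * colors).sum(dim=0)   # (3,)
    return color, weights

# Training would compare the composited color against the observed pixel (and a
# similarly composited depth against the measured depth), which is what lets color
# supervise the SDF in regions where depth measurements are missing.
```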
31:15 5/6/22
Yuliang Xiu - ICON
In this episode of the Talking Papers Podcast, I hosted Yuliang Xiu to chat about his paper "ICON: Implicit Clothed humans Obtained from Normals”, published in CVPR 2022. In this paper, they propose a method whose two main modules exploit the SMPL(-X) body model to infer clothed humans (conditioned on the body normals). Additionally, they propose an inference-time feedback loop that alternates between refining the body's normals and the shape. PAPER TITLE "ICON: Implicit Clothed humans Obtained from Normals"  https://bit.ly/3uXe6YwAUTHORSYuliang Xiu, Jinlong Yang, Dimitrios Tzionas, Michael J. BlackABSTRACTCurrent methods for learning realistic and animatable 3D clothed avatars need either posed 3D scans or 2D images with carefully controlled user poses. In contrast, our goal is to learn an avatar from only 2D images of people in unconstrained poses. Given a set of images, our method estimates a detailed 3D surface from each image and then combines these into an animatable avatar. Implicit functions are well suited to the first task, as they can capture details like hair and clothes. Current methods, however, are not robust to varied human poses and often produce 3D surfaces with broken or disembodied limbs, missing details, or non-human shapes. The problem is that these methods use global feature encoders that are sensitive to global pose. To address this, we propose ICON ("Implicit Clothed humans Obtained from Normals"), which, instead, uses local features. ICON has two main modules, both of which exploit the SMPL(-X) body model. First, ICON infers detailed clothed-human normals (front/back) conditioned on the SMPL(-X) normals. Second, a visibility-aware implicit surface regressor produces an iso-surface of a human occupancy field. Importantly, at inference time, a feedback loop alternates between refining the SMPL(-X) mesh using the inferred clothed normals and then refining the normals. Given multiple reconstructed frames of a subject in varied poses, we use SCANimate to produce an animatable avatar from them. Evaluation on the AGORA and CAPE datasets shows that ICON outperforms the state of the art in reconstruction, even with heavily limited training data. Additionally, it is much more robust to out-of-distribution samples, e.g., in-the-wild poses/images and out-of-frame cropping. ICON takes a step towards robust 3D clothed human reconstruction from in-the-wild images. 
This enables creating avatars directly from video with personalized and natural pose-dependent cloth deformation.RELATED PAPERS📚 Monocular Real-Time Volumetric Performance Capture https://bit.ly/3L2S4JF📚 PIFu https://bit.ly/3rBsrYN📚 PIFuHD https://bit.ly/3rymDiELINKS AND RESOURCES💻 Project Page https://icon.is.tue.mpg.de/💻 Code  https://github.com/yuliangxiu/ICONTo stay up to date with Yuliang's latest research, follow him on:👨🏻‍🎓 Yuliang's personal page:  https://bit.ly/3jQb16n🎓 Google Scholar:  https://bit.ly/3JW25ae🐦 Twitter:  https://twitter.com/yuliangxiu👨🏻‍🎓LinkedIn: https://www.linkedin.com/in/yuliangxiu/Recorded on March 11th 2022.CONTACTIf you would like to be a guest, sponsor or just share your thoughts, feel free to reach out via email: talking.papers.podcast@gmail.comSUBSCRIBE AND FOLLOW🎧Subscribe on your favourite podcast app: https://talking.papers.podcast.itzikb...📧Subscribe to our mailing list: http://eepurl.com/hRznqb🐦Follow us on Twitter: https://twitter.com/talking_papers🎧Subscribe on your favourite podcast app: https://talking.papers.podcast.itzikbs.com 📧Subscribe to our mailing list: http://eepurl.com/hRznqb 🐦Follow us on Twitter: https://twitter.com/talking_papers 🎥YouTube Channel: https://bit.ly/3eQOgwP
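The inference-time feedback loop from the abstract is easy to picture as a few lines of Python. The sketch below is schematic: `normal_net` and `refine_smpl` are hypothetical stand-ins for ICON's learned normal predictor and body-refinement step, not the authors' API, and the final occupancy iso-surface would still come from the visibility-aware implicit regressor afterwards.

```python
def icon_feedback_loop(image, smpl_mesh, normal_net, refine_smpl, n_iters=3):
    """Alternates between (1) inferring detailed clothed-human normals conditioned
    on the current SMPL(-X) body estimate and (2) refining the SMPL(-X) mesh so its
    rendered normals better match those inferred clothed normals."""
    clothed_normals = None
    for _ in range(n_iters):
        clothed_normals = normal_net(image, smpl_mesh)        # front/back normal maps
        smpl_mesh = refine_smpl(smpl_mesh, clothed_normals)   # update the body estimate
    return smpl_mesh, clothed_normals
```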
36:01 4/19/22
Itai Lang - SampleNet
In this episode of the Talking Papers Podcast, I hosted Itai Lang to chat about his paper "SampleNet: Differentiable Point Cloud Sampling”, published in CVPR 2020. In this paper, they propose a point soft-projection to allow differentiating through the sampling operation and enable learning task-specific point sampling. Combined with their regularization and task-specific losses, they can reduce the number of points to 3% of the original samples with a very low impact on task performance. I met Itai for the first time at CVPR 2019.  Being a point-cloud guy myself, I have been following his research work ever since. It is amazing how much progress he has made and I can't wait to see what he comes up with next. It was a pleasure hosting him on the podcast. PAPER TITLE "SampleNet: Differentiable Point Cloud Sampling"  https://bit.ly/3wMFwllAUTHORSItai Lang, Asaf Manor, Shai AvidanABSTRACTWe introduce a novel differentiable relaxation for point cloud sampling that approximates sampled points as a mixture of points in the primary input cloud. Our approximation scheme leads to consistently good results on classification and geometry reconstruction applications. We also show that the proposed sampling method can be used as a front to a point cloud registration network. This is a challenging task since sampling must be consistent across two different point clouds for a shared downstream task. In all cases, our approach outperforms existing non-learned and learned sampling alternatives. Our code is publicly available.RELATED PAPERS📚 Learning to Sample https://bit.ly/3vd1FZd📚 Farthest Point Sampling (FPS)  https://bit.ly/3Lkcyx9LINKS AND RESOURCES💻 Code  https://bit.ly/3NoS0pbTo stay up to date with Itai's latest research, follow him on:🎓 Google Scholar: https://bit.ly/3wCMY2u🐦 Twitter: https://twitter.com/ItaiLangRecorded on February 15th 2022.CONTACTIf you would like to be a guest, sponsor or just share your thoughts, feel free to reach out via email: talking.papers.podcast@gmail.comSUBSCRIBE AND FOLLOW🎧Subscribe on your favourite podcast app: https://talking.papers.podcast.itzikb...📧Subscribe to our mailing list: http://eepurl.com/hRznqb🐦Follow us on Twitter: https://twitter.com/talking_papers🎥YouTube Channel: https://bit.ly/3eQOgwP#talkingpapers #SampleNet #LearnToSample #CVPR2020 #3DVision #ComputerVision #AI #DeepLearning #MachineLearning  #deeplearning #AI #neuralnetworks #research  #artificialintelligence🎧Subscribe on your favourite podcast app: https://talking.papers.podcast.itzikbs.com 📧Subscribe to our mailing list: http://eepurl.com/hRznqb 🐦Follow us on Twitter: https://twitter.com/talking_papers 🎥YouTube Channel: https://bit.ly/3eQOgwP
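The soft projection discussed in the episode replaces each generated point by a convex combination of its nearest neighbors in the input cloud, which keeps the whole sampling pipeline differentiable. Here is a minimal sketch of that idea; the neighborhood size k and the temperature value are illustrative assumptions, not the paper's chosen hyperparameters.

```python
import torch

def soft_project(generated, cloud, k=8, t=0.1):
    """
    generated: (M, 3) points produced by the sampling network
    cloud:     (N, 3) original input point cloud
    Replaces every generated point with a softmax-weighted average of its k nearest
    input points; as the temperature t shrinks, this approaches hard nearest-neighbor
    selection, so the 'sampling' stays differentiable during training.
    """
    d2 = torch.cdist(generated, cloud) ** 2            # (M, N) squared distances
    knn_d2, idx = d2.topk(k, dim=1, largest=False)     # (M, k) nearest neighbors
    w = torch.softmax(-knn_d2 / (t ** 2), dim=1)        # (M, k) projection weights
    neighbors = cloud[idx]                               # (M, k, 3)
    return (w[..., None] * neighbors).sum(dim=1)         # (M, 3) soft-projected points

pts = torch.rand(1024, 3)
sampled = soft_project(torch.rand(32, 3), pts)           # 32 "soft" samples (~3% of 1024)
```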
37:51 3/28/22
Manuel Dahnert - Panoptic 3D Scene Reconstruction
In this episode of the Talking Papers Podcast, I hosted Manuel Dahnert to chat about his paper “Panoptic 3D Scene Reconstruction From a Single RGB Image”, published in NeurIPS 2021. In this paper, they unify the tasks of reconstruction, semantic segmentation and instance segmentation in 3D from a single RGB image. They propose a holistic approach to lift the 2D features into a 3D grid. Manuel is a good friend and colleague. We first met during my research visit to TUM while I was doing my PhD, and we spent some long evenings together at the office. We have both come a long way since then and I am really looking forward to seeing what he will cook up next. I have a feeling it is not his last visit to the podcast.PAPER TITLE "Panoptic 3D Scene Reconstruction From a Single RGB Image": https://bit.ly/3phnLGpAUTHORSManuel Dahnert, Ji Hou, Matthias Niessner, Angela DaiABSTRACTRichly segmented 3D scene reconstructions are an integral basis for many high-level scene understanding tasks, such as for robotics, motion planning, or augmented reality. Existing works in 3D perception from a single RGB image tend to focus on geometric reconstruction only, or geometric reconstruction with semantic segmentation or instance segmentation. Inspired by 2D panoptic segmentation, we propose to unify the tasks of geometric reconstruction, 3D semantic segmentation, and 3D instance segmentation into the task of panoptic 3D scene reconstruction -- from a single RGB image, predicting the complete geometric reconstruction of the scene in the camera frustum of the image, along with semantic and instance segmentations. We propose a new approach for holistic 3D scene understanding from a single RGB image which learns to lift and propagate 2D features from an input image to a 3D volumetric scene representation. Our panoptic 3D reconstruction metric evaluates both geometric reconstruction quality as well as panoptic segmentation. Our experiments demonstrate that our approach for panoptic 3D scene reconstruction outperforms alternative approaches for this task.RELATED PAPERS📚 Panoptic Segmentation: https://bit.ly/3vd1FZd📚MeshCNN: https://bit.ly/3M2lWH6📚Total3DUnderstanding: https://bit.ly/36yH9bfLINKS AND RESOURCES💻 Project Page: https://bit.ly/3JT2Dy1💻 CODE: https://github.com/xheon/panoptic-reconstruction🤐Paper's peer review: https://bit.ly/3Cij44tTo stay up to date with Manuel's latest research, check out his personal page and follow him on: 👨‍🎓Google Scholar: https://scholar.google.com/citations?user=eNypkO0AAAAJ🐦Twitter: https://twitter.com/manuel_dahnertCONTACTIf you would like to be a guest, sponsor or just share your thoughts, feel free to reach out via email: talking.papers.podcast@gmail.comSUBSCRIBE AND FOLLOW🎧Subscribe on your favourite podcast app: https://talking.papers.podcast.itzikb...📧Subscribe to our mailing list: http://eepurl.com/hRznqb🐦Follow us on Twitter: https://twitter.com/talking_papers🎥YouTube Channel: https://bit.ly/3eQOgwP🎧Subscribe on your favourite podcast app: https://talking.papers.podcast.itzikbs.com 📧Subscribe to our mailing list: http://eepurl.com/hRznqb 🐦Follow us on Twitter: https://twitter.com/talking_papers 🎥YouTube Channel: https://bit.ly/3eQOgwP
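"Lifting 2D features into a 3D grid" usually means copying each pixel's feature along its viewing ray into a frustum-aligned volume. The sketch below shows one common form of such a lifting layer; the paper's exact projection and propagation scheme may differ, and the grid resolution here is an arbitrary choice for illustration.

```python
import torch
import torch.nn.functional as F

def lift_features_to_frustum(feat2d, grid_res=(64, 64, 64)):
    """
    feat2d: (C, H, W) 2D feature map from an image encoder.
    Returns a (C, D, Hv, Wv) frustum-aligned voxel grid where every voxel along a
    viewing ray receives the 2D feature of the pixel that ray passes through,
    i.e. image features are propagated into the 3D volume.
    """
    C, H, W = feat2d.shape
    D, Hv, Wv = grid_res
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, Hv),
                            torch.linspace(-1, 1, Wv), indexing="ij")
    uv = torch.stack([xs, ys], dim=-1)                 # (Hv, Wv, 2), normalized pixel coords
    grid = uv[None].expand(D, Hv, Wv, 2)               # the same pixel for every depth slice
    feats = feat2d[None].expand(D, C, H, W)
    voxels = F.grid_sample(feats, grid, align_corners=True)   # (D, C, Hv, Wv)
    return voxels.permute(1, 0, 2, 3)                  # (C, D, Hv, Wv)

vox = lift_features_to_frustum(torch.rand(32, 120, 160))
```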
26:17 3/7/22
Songyou Peng - Shape As Points
In this episode of the Talking Papers Podcast, I hosted Songyou Peng to chat about his paper “Shape As Points: A Differentiable Poisson Solver”, published in NeurIPS 2021. In this paper, they take on the task of surface reconstruction and propose a hybrid representation that unifies explicit and implicit representations, in addition to a differentiable solver for the classic Poisson surface reconstruction. I have been following Songyou's work for a while and was very surprised to discover that he is just about midway through his PhD (with so many good papers, I thought he was about to finish!). We first met online at the ICCV 2021 workshop on "Learning 3D Representations for Shape and Appearance" and I immediately flagged him as one of the next guests on the podcast.It was a pleasure recording this episode with him.AUTHORSSongyou Peng, Chiyu Jiang, Yiyi Liao, Michael Niemeyer, Marc Pollefeys, Andreas GeigerABSTRACTIn recent years, neural implicit representations gained popularity in 3D reconstruction due to their expressiveness and flexibility. However, the implicit nature of neural implicit representations results in slow inference time and requires careful initialization. In this paper, we revisit the classic yet ubiquitous point cloud representation and introduce a differentiable point-to-mesh layer using a differentiable formulation of Poisson Surface Reconstruction (PSR) that allows for a GPU-accelerated fast solution of the indicator function given an oriented point cloud. The differentiable PSR layer allows us to efficiently and differentiably bridge the explicit 3D point representation with the 3D mesh via the implicit indicator field, enabling end-to-end optimization of surface reconstruction metrics such as Chamfer distance. This duality between points and meshes hence allows us to represent shapes as oriented point clouds, which are explicit, lightweight and expressive. Compared to neural implicit representations, our Shape-As-Points (SAP) model is more interpretable, lightweight, and accelerates inference time by one order of magnitude. Compared to other explicit representations such as points, patches, and meshes, SAP produces topology-agnostic, watertight manifold surfaces. We demonstrate the effectiveness of SAP on the task of surface reconstruction from unoriented point clouds and learning-based reconstruction. RELATED PAPERS📚 Poisson Surface Reconstruction📚 Occupancy Networks📚 Convolutional Occupancy Networks LINKS AND RESOURCES💻 Project Page: https://pengsongyou.github.io/sap💻 CODE: https://github.com/autonomousvision/shape_as_points📚 Paper🤐Paper's peer reviewTo stay up to date with Songyou's latest research, check out his personal page and follow him on: 👨‍🎓 Google Scholar 🐦Twitter👨‍🎓LinkedInCONTACTIf you would like to be a guest, sponsor or just share your thoughts, feel free to reach out via email: talking.papers.podcast@gmail.com🎧Subscribe on your favourite podcast app: https://talking.papers.podcast.itzikbs.com 📧Subscribe to our mailing list: http://eepurl.com/hRznqb 🐦Follow us on Twitter: https://twitter.com/talking_papers 🎥YouTube Channel: https://bit.ly/3eQOgwP
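The mathematical heart of a differentiable Poisson solver is solving the Poisson equation laplacian(chi) = div(V) on a grid, which can be done in closed form with FFTs. The simplified sketch below shows that spectral solve; the actual SAP layer adds point rasterization, Gaussian smoothing, and normalization steps that are omitted here, and the grid resolution is an arbitrary illustrative choice.

```python
import torch

def spectral_poisson_solve(V):
    """
    V: (3, R, R, R) grid of rasterized, smoothed point normals.
    Solves  laplacian(chi) = div(V)  in the Fourier domain and returns an
    (R, R, R) indicator-like field chi whose level set approximates the surface.
    """
    R = V.shape[-1]
    freqs = torch.fft.fftfreq(R)                        # cycles per voxel
    kx, ky, kz = torch.meshgrid(freqs, freqs, freqs, indexing="ij")
    k = torch.stack([kx, ky, kz])                       # (3, R, R, R)
    V_hat = torch.fft.fftn(V, dim=(1, 2, 3))
    div_hat = ((2j * torch.pi) * k * V_hat).sum(dim=0)  # Fourier transform of div(V)
    k2 = (k ** 2).sum(dim=0) * (2 * torch.pi) ** 2      # |2*pi*k|^2 (Laplacian symbol)
    k2[0, 0, 0] = 1.0                                   # avoid division by zero at DC
    chi_hat = -div_hat / k2                             # inverse Laplacian in Fourier space
    chi_hat[0, 0, 0] = 0.0                              # fix the free additive constant
    return torch.fft.ifftn(chi_hat).real

chi = spectral_poisson_solve(torch.rand(3, 32, 32, 32))
```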
34:34 2/24/22
Yicong Hong - VLN BERT
PAPER TITLE:"VLN BERT:  A Recurrent Vision-and-Language BERT for Navigation" AUTHORS:  Yicong Hong, Qi Wu, Yuankai Qi, Cristian Rodriguez-Opazo, Stephen GouldABSTRACT:Accuracy of many visiolinguistic tasks has benefited significantly from the application of vision-and-language (V&L) BERT. However, its application for the task of vision and-language navigation (VLN) remains limited. One reason for this is the difficulty adapting the BERT architecture to the partially observable Markov decision process present in VLN, requiring history-dependent attention and decision making. In this paper we propose a recurrent BERT model that is time-aware for use in VLN. Specifically, we equip the BERT model with a recurrent function that maintains cross-modal state information for the agent. Through extensive experiments on R2R and REVERIE we demonstrate that our model can replace more complex encoder-decoder models to achieve state-of-the-art results. Moreover, our approach can be generalised to other transformer-based architectures, supports pre-training, and is capable of solving navigation and referring expression tasks simultaneously.CODE: 💻  https://github.com/YicongHong/Recurrent-VLN-BERTLINKS AND RESOURCES👱Yicong's pageRELATED PAPERS:📚 Attention is All You Need📚 Towards learning a generic agent for vision-and-language navigation via pre-trainingCONTACT:-----------------If you would like to be a guest, sponsor or just share your thoughts, feel free to reach out via email: talking.papers.podcast@gmail.comThis episode was recorded on April, 16th 2021.SUBSCRIBE AND FOLLOW:🎧Subscribe on your favourite podcast app:  https://talking.papers.podcast.itzikbs.com📧Subscribe to our mailing list: http://eepurl.com/hRznqb🐦Follow us on Twitter: https://twitter.com/talking_papers🎥YouTube Channel: https://bit.ly/3eQOgwP#talkingpapers #CVPR2021 #VLNBERT#VLN #VisionAndLanguageNavigation #VisionAndLanguage #machinelearning #deeplearning #AI #neuralnetworks #research #computervision #artificialintelligence 🎧Subscribe on your favourite podcast app: https://talking.papers.podcast.itzikbs.com 📧Subscribe to our mailing list: http://eepurl.com/hRznqb 🐦Follow us on Twitter: https://twitter.com/talking_papers 🎥YouTube Channel: https://bit.ly/3eQOgwP
22:57 2/17/22
Despoina Paschalidou - Neural Parts
PAPER TITLE Neural Parts: Learning Expressive 3D Shape Abstractions with Invertible Neural NetworksAUTHORSDespoina Paschalidou, Angelos Katharopoulos, Andreas Geiger, Sanja FidlerABSTRACTImpressive progress in 3D shape extraction led to representations that can capture object geometries with high fidelity. In parallel, primitive-based methods seek to represent objects as semantically consistent part arrangements. However, due to the simplicity of existing primitive representations, these methods fail to accurately reconstruct 3D shapes using a small number of primitives/parts. We address the trade-off between reconstruction quality and number of parts with Neural Parts, a novel 3D primitive representation that defines primitives using an Invertible Neural Network (INN) which implements homeomorphic mappings between a sphere and the target object. The INN allows us to compute the inverse mapping of the homeomorphism, which, in turn, enables the efficient computation of both the implicit surface function of a primitive and its mesh, without any additional post-processing. Our model learns to parse 3D objects into semantically consistent part arrangements without any part-level supervision. Evaluations on ShapeNet, D-FAUST and FreiHAND demonstrate that our primitives can capture complex geometries and thus simultaneously achieve geometrically accurate as well as interpretable reconstructions using an order of magnitude fewer primitives than state-of-the-art shape abstraction methods.RELATED PAPERS📚 "KeypointDeformer: Unsupervised 3D Keypoint Discovery for Shape Control"📚 "Learning Shape Abstractions by Assembling Volumetric Primitives"📚 "Superquadrics Revisited: Learning 3D Shape Parsing beyond Cuboids"📚 "CvxNet: Learnable Convex Decomposition"📚 "Neural Star Domain as Primitive Representation"LINKS AND RESOURCES💻 Project Page: https://paschalidoud.github.io/neural_parts💻 CODE: https://github.com/paschalidoud/neural_parts💻Blog Post CONTACTIf you would like to be a guest, sponsor or just share your thoughts, feel free to reach out via email: talking.papers.podcast@gmail.comSUBSCRIBE AND FOLLOW 🎧Subscribe on your favourite podcast app: https://talking.papers.podcast.itzikbs.com🎧Subscribe on your favourite podcast app: https://talking.papers.podcast.itzikbs.com 📧Subscribe to our mailing list: http://eepurl.com/hRznqb 🐦Follow us on Twitter: https://twitter.com/talking_papers 🎥YouTube Channel: https://bit.ly/3eQOgwP
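The reason an invertible network is so convenient here is that the same model gives you both directions: the forward map deforms sphere samples into a primitive's surface (a mesh, for free), and the inverse map sends any query point back to the sphere, where the implicit inside/outside test is trivial. The sketch below shows a single additive coupling layer to make that concrete; the real model stacks several couplings and conditions them on per-primitive shape features, and the dimensions and layer sizes here are illustrative assumptions.

```python
import torch
import torch.nn as nn

class AdditiveCoupling(nn.Module):
    """A simple invertible layer: shifts two coordinates by a function of the third,
    so the inverse is available in closed form."""
    def __init__(self, hidden=64):
        super().__init__()
        self.shift = nn.Sequential(nn.Linear(1, hidden), nn.ReLU(), nn.Linear(hidden, 2))

    def forward(self, x):                    # x: (N, 3), split as (x1 | x23)
        x1, rest = x[:, :1], x[:, 1:]
        return torch.cat([x1, rest + self.shift(x1)], dim=1)

    def inverse(self, y):
        y1, rest = y[:, :1], y[:, 1:]
        return torch.cat([y1, rest - self.shift(y1)], dim=1)

# Forward: deform points sampled on a unit sphere into a primitive's surface.
# Inverse: map a query point back to the sphere; ||inverse(q)|| - 1 then acts as an
# implicit surface test, with no extra post-processing.
coupling = AdditiveCoupling()
sphere_pts = torch.nn.functional.normalize(torch.randn(100, 3), dim=1)
surface_pts = coupling(sphere_pts)
recovered = coupling.inverse(surface_pts)    # recovers sphere_pts exactly
```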
40:43 2/10/22