Clarkson Computer Science PhD Student Wins Best Paper Award at the 19th International Conference on Signal Processing and Multimedia Applications for Research on the Use of Video Motion Vectors for 3D Reconstruction of Structure from Motion

Clarkson Computer Science PhD student Richard Turner’s research on the use of video motion vectors for 3D reconstruction of structure from motion received the Best Paper Award at the 19th International Conference on Signal Processing and Multimedia Applications (SIGMAP 2022), held in Lisbon, Portugal. Richard is jointly advised by Dr. Sean Banerjee and Dr. Natasha Banerjee, Associate Professors of Computer Science. SIGMAP brings together international scholars working on the theory and practice of representing, storing, authenticating, and communicating multimedia information from images, videos, and audio, as well as emerging multimodal sources such as text, social media, and healthcare data. Richard’s work plays a vital role in reducing the computational resources needed to reconstruct 3D scenes using structure from motion (SfM). 3D scene reconstruction uses advanced computer vision techniques to create three-dimensional models from a series of two-dimensional images, and it is an important pipeline for giving robots and autonomous vehicles the ability to navigate their environments. SfM, however, is computationally intensive and is usually performed offline on extremely powerful computing devices, so the research community is increasingly interested in performing SfM on the low-power devices typically found in unmanned aerial and ground vehicles as well as autonomous vehicles.
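
For readers curious what that pipeline looks like in practice, the sketch below shows the conventional SfM front end: detecting and matching features between two frames, here with OpenCV’s ORB detector. This per-frame detection and descriptor matching is the computationally expensive step that Richard’s approach sidesteps. The code is an illustrative baseline, not taken from the paper, and the filenames are placeholders.

```python
import cv2

# Conventional SfM front end: detect and match features between two frames.
# (Filenames are placeholders for consecutive video frames.)
img1 = cv2.imread("frame_000.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("frame_001.png", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create(nfeatures=2000)          # feature detector + descriptor
kp1, des1 = orb.detectAndCompute(img1, None)  # keypoints and binary descriptors
kp2, des2 = orb.detectAndCompute(img2, None)

# Brute-force Hamming matching with cross-checking for reliable pairs.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)

# These 2D-2D correspondences feed pose estimation and triangulation
# in a full SfM pipeline; computing them anew for every frame is what
# makes SfM costly on low-power hardware.
```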

Richard’s approach exploits the motion vectors that video encoders already compute for compression. H.264 video compression has become the predominant choice for devices that require live video streaming, including cell phones, laptops, and Micro Air Vehicles (MAVs). H.264 uses motion estimation to predict the displacement of pixels, grouped together as macroblocks, between two or more video frames. H.264 is well suited to live video compression because each frame contains much of the information found in previous and future frames, so estimating a motion vector for each macroblock of each frame achieves significant compression. Richard’s approach provides a near real-time feature detection and matching algorithm for SfM reconstruction using the motion estimation already performed by H.264 video encoders. His approach was validated on video taken from a flying MAV in an urban environment.
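
To make the idea concrete, the sketch below pulls H.264 motion vectors out of a decoded stream with PyAV (via FFmpeg’s export_mvs flag) and treats them as sparse correspondences for standard two-view geometry. This illustrates the general technique rather than the paper’s implementation; the filename and camera intrinsics are assumptions, and the side-data field names follow FFmpeg’s AVMotionVector struct.

```python
import av          # PyAV: Python bindings for FFmpeg
import cv2
import numpy as np

container = av.open("mav_flight.mp4")  # placeholder input video
stream = container.streams.video[0]
# Ask the H.264 decoder to export its motion vectors as frame side data.
stream.codec_context.options = {"flags2": "+export_mvs"}

# Assumed pinhole intrinsics for the MAV camera (illustrative values only).
K = np.array([[700.0,   0.0, 640.0],
              [  0.0, 700.0, 360.0],
              [  0.0,   0.0,   1.0]])

for frame in container.decode(stream):
    sd = frame.side_data.get("MOTION_VECTORS")
    if sd is None:
        continue  # e.g., I-frames carry no motion vectors
    mvs = sd.to_ndarray()  # structured array, one record per macroblock
    # Each record maps a macroblock in a reference frame (src) to this
    # frame (dst), yielding correspondences without a feature detector.
    src = np.stack([mvs["src_x"], mvs["src_y"]], axis=1).astype(np.float64)
    dst = np.stack([mvs["dst_x"], mvs["dst_y"]], axis=1).astype(np.float64)
    if len(src) < 8:
        continue
    # Standard two-view geometry on the encoder-supplied matches.
    E, inlier_mask = cv2.findEssentialMat(
        src, dst, K, method=cv2.RANSAC, prob=0.999, threshold=1.0)
```

Because the encoder has already paid the cost of motion estimation, gathering these correspondences adds almost nothing beyond decoding the stream, which is what makes the approach attractive for low-power MAV hardware.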

As a Principal Software Engineer at Northrop Grumman, Richard works on next-generation challenges in space. His research focuses on real-time 3D mapping of environments for autonomous navigation. Richard’s work has wider impact in advancing technologies that perform scene reconstruction on the low-resource devices found in unmanned aerial and ground vehicles, for use in challenging scenarios such as disasters or mass casualty events. Richard is currently leading an interdisciplinary team of graduate and undergraduate students to deploy his algorithm on unmanned aerial vehicles designed by the team.

Richard is a member of the Terascale All-sensing Research Studio (TARS) at Clarkson University. TARS supports the research of 15 graduate students and nearly 20 undergraduate students each semester. TARS has one of the largest high-performance computing facilities at Clarkson, with over 275,000 CUDA cores and over 4,800 Tensor cores spread across more than 50 GPUs, and 1 petabyte of storage. TARS is home to the Gazebo, a massively dense multi-modal, multi-viewpoint motion capture facility for imaging multi-person interactions, containing 192 high-speed 226 FPS cameras, 16 Microsoft Azure Kinect RGB-D sensors, 12 Sierra-Olympic Viento-G thermal cameras, and 16 surface electromyography (sEMG) sensors, and the Cube, a one- and two-person 3D imaging facility containing 4 high-speed cameras, 4 RGB-D sensors, and 5 thermal cameras. TARS researches the use of deep learning to glean an understanding of natural multi-person interactions from massive datasets, enabling next-generation technologies such as intelligent agents and robots to integrate seamlessly into future human environments.
