Temporal Modeling and Data Synthesis for Visual Understanding
Speaker: Yang Xiaodong (QCraft)
Abstract: In this talk, I will present recent work on leveraging temporal information and synthetic data to facilitate video and image understanding. In the first part, I will introduce a progressive learning framework, the Spatio-TEmporal Progressive (STEP) detector, for action detection in videos. STEP makes more effective use of longer temporal information and performs detection from only a handful of initial proposals, whereas other methods rely on thousands of densely sampled anchors or an extra person detector. In the second part, I will talk about a joint discriminative and generative learning framework for person re-identification (re-id) that couples re-id learning and image synthesis end to end in a unified network called DG-Net. An online interactive loop between the discriminative and generative modules lets the two tasks benefit each other. I will then present an extension of DG-Net that generalizes re-id models to new domains under the unsupervised cross-domain setting. By jointly disentangling id-related and id-unrelated factors and selectively performing adaptation on the id-related feature space, our approach consistently brings substantial performance gains.
Biography: Xiaodong Yang (https://xiaodongyang.org) recently joined QCraft, an early-stage self-driving startup, to build and lead a perception and learning team for autonomous driving. Before that, he was a Senior Research Scientist at NVIDIA Research. His research interests are computer vision and machine learning. He has worked on large-scale image and video understanding, human activity and hand gesture recognition, dynamic facial analytics, target re-identification, deep generative models, multimedia search, and 3D perception, among other topics. He received the B.Eng. degree from Huazhong University of Science and Technology in 2009 and the Ph.D. degree from City University of New York in 2015. He is a recipient of the best paper award from the Journal of Visual Communication and Image Representation in 2015. He and his collaborators won first place in the optical flow competition of the Robust Vision Challenge at CVPR 2018. He co-organized tutorials and workshops at GTC 2019, CVPR 2019, and CVPR 2020.