Research

Event-based Dual Photography for Transparent Scene Reconstruction

Light transport captures all of the light that travels between the light source and the image sensor. As an important application of light transport, dual photography has been a popular research topic, but it is challenged by long acquisition times, a low signal-to-noise ratio, and the storage and computation costs of a large number of measurements. In this paper, we propose a novel hardware setup that combines a flying-spot MEMS-modulated projector with an event camera to implement dual photography for 3D scanning in both line-of-sight (LoS) and non-line-of-sight (NLoS) scenes containing a transparent object. In particular, using the event-based light transport, we achieve depth extraction in the LoS scenes and 3D reconstruction of the object in an NLoS scene.
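
As a minimal sketch of the classical dual-photography relation this setup builds on (the discretized transport matrix, array shapes, and random stand-in values below are illustrative assumptions), a camera image is c = T p for a projector pattern p, and by Helmholtz reciprocity the dual, projector-viewpoint image is obtained from the transpose of T:

```python
import numpy as np

# T maps projector pixels to camera pixels, so a primal image is c = T @ p for a
# projector pattern p.  By Helmholtz reciprocity the dual (projector-viewpoint)
# image under a virtual camera-side illumination l is d = T.T @ l.
# Shapes and the random stand-in matrix are illustrative only.

num_proj, num_cam = 32 * 32, 64 * 64             # projector / camera pixel counts
T = np.random.rand(num_cam, num_proj) * 1e-3     # stand-in for a measured transport matrix

p = np.ones(num_proj)                            # flood-lit projector pattern
primal = T @ p                                   # image seen by the camera

l = np.ones(num_cam)                             # uniform virtual camera-side illumination
dual = T.T @ l                                   # image as seen from the projector's viewpoint
```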

SaccadeCam: Adaptive Visual Attention for Monocular Depth Sensing

Most monocular depth sensing methods use conventionally captured images that are created without considering scene content. In contrast, animal eyes have fast mechanical motions, called saccades, that control how the scene is imaged by the fovea, where resolution is highest. In this paper, we present the SaccadeCam framework for adaptively distributing resolution onto regions of interest in the scene. Our algorithm for adaptive resolution is a self-supervised network, and we demonstrate end-to-end learning results for monocular depth estimation. We also show preliminary results with a real SaccadeCam hardware prototype.

Dense Lissajous Sampling and Interpolation for Dynamic Light-Transport

Light-transport represents the complex interactions of light in a scene. Fast, compressed, and accurate light-transport capture for dynamic scenes is an open challenge in vision and graphics. In this paper, we integrate the classical idea of Lissajous sampling with novel control strategies for dynamic light-transport applications such as relighting water drops and seeing around corners. In particular, this paper introduces an improved Lissajous projector hardware design and discusses calibration and capture for a microelectromechanical (MEMS) mirror-based projector. Further, we show progress towards speeding up the hardware-based Lissajous subsampling for dual light transport frames, and investigate interpolation algorithms for recovering the missing data. Our captured dynamic light transport results show complex light scattering effects under dense angular sampling, and we also show dual non-line-of-sight (NLoS) capture of dynamic scenes. This work is the first step towards adaptive Lissajous control for dynamic light-transport. Please see the accompanying video for all results.
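
As an illustrative sketch (the drive frequencies, frame rate, and grid size below are assumptions, not our hardware values), a Lissajous trajectory for a biaxial MEMS mirror and the set of projector positions it visits within one camera frame can be generated as follows; the unvisited positions are what the interpolation step must recover:

```python
import numpy as np

# Each mirror axis is driven sinusoidally near resonance, so the projected spot
# traces x(t) = sin(2*pi*fx*t), y(t) = sin(2*pi*fy*t + phi).  The frequency ratio
# fx:fy and the phase phi control how densely the pattern covers the projector grid.

fx, fy, phi = 2003.0, 2400.0, np.pi / 2          # axis drive frequencies (Hz) and phase
fps, grid = 1000, 128                            # camera frame rate and projector resolution

t = np.arange(0.0, 1.0 / fps, 1e-6)              # spot positions during one camera frame
x = (np.sin(2 * np.pi * fx * t) * 0.5 + 0.5) * (grid - 1)
y = (np.sin(2 * np.pi * fy * t + phi) * 0.5 + 0.5) * (grid - 1)

mask = np.zeros((grid, grid), dtype=bool)        # which transport columns this frame samples
mask[np.round(y).astype(int), np.round(x).astype(int)] = True
# The missing (mask == False) columns are what the interpolation algorithms fill in.
```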

Foveating Cameras

Most cameras today photograph their entire visual field. In contrast, decades of active vision research have proposed foveating camera designs, which allow for selective scene viewing. However, active vision's impact is limited by slow options for mechanical camera movement. We propose a new design, called FoveaCam, which works by capturing reflections off a tiny, fast-moving mirror. FoveaCams can obtain high resolution imagery on multiple regions of interest, even if these are at different depths and viewing directions. We first discuss our prototype and optical calibration strategies. We then outline a control algorithm for the mirror to track target pairs. Finally, we demonstrate a practical application of the full system to enable eye tracking at a distance for frontal faces.

Towards a MEMS-based Adaptive LIDAR

Most active depth sensors sample their visual field using a fixed pattern, decided by accuracy, speed and cost trade-offs, rather than scene content. However, a number of recent works have demonstrated that adapting measurement patterns to scene content can offer significantly better trade-offs. We propose a hardware LIDAR design that allows flexible real-time measurements according to dynamically specified measurement patterns. Our flexible depth sensor design consists of a controllable scanning LIDAR that can foveate, or increase resolution in regions of interest, and that can fully leverage the power of adaptive depth sensing. We describe our optical setup and calibration, which enables fast sparse depth measurements using a scanning MEMS (micro-electro-mechanical) mirror. We validate the efficacy of our prototype LIDAR design by testing on over 75 static and dynamic scenes spanning a range of environments. We also show CNN-based depth-map completion from measurements obtained by our sensor. Our experiments show that our sensor can enable adaptive depth sensing systems.
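
One simple, illustrative allocation rule (not the specific controller used in this work) for turning a scene-dependent importance map into a measurement pattern under a fixed sample budget is to draw scan directions with probability proportional to importance:

```python
import numpy as np

# Allocate a fixed budget of LIDAR samples according to an importance map, e.g.
# derived from edges or motion in an RGB image.  This is only a sketch of one
# possible allocation rule for a foveating scanner.

def adaptive_scan_pattern(importance, budget, seed=0):
    """Pick `budget` scan locations with probability proportional to importance."""
    rng = np.random.default_rng(seed)
    prob = importance.ravel() / importance.sum()
    idx = rng.choice(importance.size, size=budget, replace=False, p=prob)
    return np.unravel_index(idx, importance.shape)   # (rows, cols) for the MEMS scanner

importance = np.ones((240, 320))                 # uniform map -> uniform scan
importance[100:140, 150:210] += 10.0             # boost a region of interest (foveation)
rows, cols = adaptive_scan_pattern(importance, budget=2000)
```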

Flying-Dot Photography

The light transport captures a scene’s visual complexity. Acquiring light transport for dynamic scenes is difficult, since any change in viewpoint, materials, illumination or geometry also varies the transport. One strategy to capture dynamic light transport is to use a fast “flying-dot” projector; i.e., where an impulse light-probe is quickly scanned across the scene. We have built a novel fast flying-dot projector prototype using a high-speed camera and a scanning MEMS (micro-electro-mechanical system) mirror. Our contributions are calibration strategies that enable dynamic light transport acquisition at near video rates with such a system. We develop new methods for overcoming the effects of MEMS mirror resonance. We utilize new algorithms for denoising impulse scanning at high frame rates and compare the trade-offs in visual quality between frame rate and illumination power. Finally, we show the utility of our calibrated setup by demonstrating graphics applications such as video relighting, direct/global separation, and dual videography for dynamic scenes such as fog, water, and glass.
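
As a rough illustration of why impulse scanning is convenient for such applications (approximating the direct component by each camera pixel's strongest transport entry is an assumption here, not the exact procedure), direct/global separation falls out of the captured transport matrix almost for free:

```python
import numpy as np

# With an impulse ("flying-dot") scan the transport matrix T is measured one column
# per probe position.  For each camera pixel the direct component is its response to
# the single probe position that illuminates it directly (approximated below by the
# strongest entry in its row), and the global component is the rest of its row.

def separate_direct_global(T):
    """T: (num_camera_pixels, num_probe_positions) transport matrix."""
    direct = T.max(axis=1)                       # peak response ~ direct illumination
    total = T.sum(axis=1)                        # image under full flood lighting
    global_ = total - direct                     # light that bounced more than once
    return direct, global_
```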

Adaptive Depth Sensing

Recovering scene geometry is an important research problem in robotics. More recently, time-of-flight (TOF) depth sensors have transformed robot perception, as these sensors modulate scene illumination and extract depth from time-related features in the reflected radiance, such as phase changes or temporal delays. Commercially available TOF sensors, such as the Microsoft Kinect and the Velodyne Puck, have influenced fields such as autonomous cars, drone surveillance and wearable devices. Creating TOF sensors for personal drones, AR/VR glasses, IoT nodes and other miniature platforms would require transcending the energy constraints imposed by limited battery capacity. We demonstrate new efficiencies that are possible with angular control of a TOF sensor, using a single LIDAR beam reflected off a microelectromechanical (MEMS) mirror. Our designs provide a new framework to exploit directional control for depth sensing in an adaptive manner for applications relevant to small robotic platforms. We also present an optimized MEMS mirror design and use it to demonstrate applications in extreme wide-angle structured light.
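
For reference, the standard continuous-wave TOF relation alluded to above converts a measured phase change into depth; the modulation frequency and phase values below are only a worked example:

```python
import math

# Continuous-wave time-of-flight: a phase shift dphi of the modulated illumination
# encodes depth as d = c * dphi / (4 * pi * f), with an unambiguous range of c / (2 * f).

C = 3.0e8                                        # speed of light (m/s)

def depth_from_phase(phase_shift_rad, mod_freq_hz):
    return C * phase_shift_rad / (4.0 * math.pi * mod_freq_hz)

print(depth_from_phase(math.pi / 2, 30e6))       # 30 MHz modulation, pi/2 shift -> 1.25 m
print(C / (2 * 30e6))                            # unambiguous range at 30 MHz -> 5.0 m
```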

Privacy Preserving Computational Cameras

Major advances in computer vision and the mobile revolution have set the stage for widespread deployment of connected devices with "always-on" cameras and powerful vision capabilities. While these advances have the potential to enable a wide range of novel applications and interfaces, privacy and security concerns surrounding the use of these technologies threaten to limit the range of places and devices where they are adopted. In this context, we see a resurgence in privacy-preserving computer vision research, intended to help mitigate both societal and legal concerns. On this page, we highlight some of our recent work in this area, including: (a) building novel computational cameras that remove sensitive data "prior to capture" via optical filtering and/or sensor-level electronics, (b) developing a framework to learn privacy-preserving image encodings through adversarial optimization, and (c) performing the first systematic analysis of the privacy risks associated with working with 3D point clouds.

3D Vision and Radiological Sensor Fusion

Nuclear material trafficking is a threat to national security, and being able to detect and track people carrying nuclear material is vital to protecting our country's interests and our people. However, inexpensive radiation detectors have limitations: they are isotropic, unable to detect the direction of the incident radiation; and they are additive, unable to distinguish between multiple sources of radiation. These limitations leave inexpensive radiation detectors unfit for many security applications. We explore new ways to enhance the functionality of these inexpensive detectors by fusing them with various 3D vision sensors. We have developed new methods for simultaneous vision-radiation sensor calibration, single-source localization and tracking, multi-source localization and tracking, and tracking a source behind visual occlusions.
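
As an illustrative sketch of this kind of fusion (a simplified inverse-square model with hypothetical variable names, not the full calibration and tracking pipeline), a single source can be localized from the detector positions reported by the vision sensor and the corresponding count rates:

```python
import numpy as np
from scipy.optimize import least_squares

# The 3D vision sensor provides the detector's position over time, the radiation
# detector provides count rates, and a single source location is fit to the
# inverse-square model counts ~ A / ||x - p||^2.  This is a simplified stand-in
# for the localization methods developed in this work.

def localize_source(det_positions, counts):
    """det_positions: (N, 3) detector path from vision; counts: (N,) count rates."""
    def residual(params):
        src, amp = params[:3], params[3]
        d2 = np.sum((det_positions - src) ** 2, axis=1)
        return amp / d2 - counts
    x0 = np.append(det_positions.mean(axis=0) + 0.5, counts.max())
    return least_squares(residual, x0).x[:3]     # estimated 3D source position
```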

Wide-Angle MEMS Mirrors and Micro Vision Sensors

Achieving computer vision on micro-scale devices is a challenge. On such devices the mass and power constraints are so severe that even the most common computations are difficult. We introduce and analyze a class of micro-vision sensors and MEMS mirrors that enable a wide field-of-view within a small form. We utilize the “Snell window” effect to enlarge the scan angle of MEMS mirrors by submerging them in a liquid whose refractive index is greater than that of air, and we show micro-vision sensors that reduce power requirements through template-based optical convolution.
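
A short worked example of the refraction geometry involved (the liquid index of 1.5 and the angles below are assumptions for illustration): a beam deflected to angle θ inside the liquid exits into air at asin(n sin θ), which enlarges the scan angle up to the total-internal-reflection limit:

```python
import math

# A beam leaving the submerged MEMS mirror at angle theta (from the surface normal)
# inside a liquid of refractive index n refracts into air at asin(n * sin(theta)),
# enlarging the optical scan angle until the total-internal-reflection limit asin(1/n).

def exit_angle_deg(theta_deg, n_liquid):
    s = n_liquid * math.sin(math.radians(theta_deg))
    return math.degrees(math.asin(s)) if s < 1.0 else float("inf")   # inf -> TIR

print(exit_angle_deg(20.0, 1.5))                 # ~30.9 degrees in air from a 20 degree deflection
print(math.degrees(math.asin(1 / 1.5)))          # ~41.8 degrees: maximum usable mirror angle
```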

Wide-angle Micro Vision Sensors

Achieving computer vision on micro-scale devices is a challenge. On such devices the mass and power constraints are so severe that even the most common computations are difficult. We introduce and analyze a class of micro-vision sensors that reduce power requirements through template-based optical convolution, and enable a wide field-of-view (FOV) within a small form. In this paper we describe the trade-offs between the FOV, volume, and mass of these sensors and provide tools to navigate the design space. We also demonstrate milli-scale prototypes for computer vision tasks such as locating edges, tracking targets, and detecting faces.
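
A digital stand-in for what these sensors compute optically (the template and the threshold below are placeholders): the scene is correlated against a fixed template before readout, so only a few strong responses need to be digitized:

```python
import numpy as np
from scipy.signal import correlate2d

# Digital emulation of template-based optical convolution: the image is correlated
# against a fixed template, and only the sparse strong responses (e.g. edge or face
# locations) are read out.  The real devices perform this step with optics.

image = np.random.rand(64, 64)
template = np.array([[-1.0, 0.0, 1.0]] * 3)      # simple vertical-edge template
response = correlate2d(image, template, mode="same")
detections = np.argwhere(response > 0.8 * response.max())   # sparse readout
```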

Low-Power Structured Light

We develop miniature low-power structured light devices using commercially available MEMS mirrors and lasers. Without complex re-engineering, we show how to exploit the high-speed MEMS mirror motion and laser light-sources to solve a range of visual tasks including reconstruction of outdoor scenes in bright sunlight and extreme wide-angle scene reconstruction. Additionally, for each device, we explore design and fabrication trade-offs in terms of power, size, speed and stability.

Structured light for scattering media

Virtually all structured light methods assume that the scene and the sources are immersed in pure air and that light is neither scattered nor absorbed. Recently, however, structured lighting has found growing application in underwater and aerial imaging, where scattering effects cannot be ignored. In this project, we conduct a comprehensive analysis of two representative methods - light stripe range scanning and photometric stereo - in the presence of scattering. For both methods, we derive physical models for the appearances of a surface immersed in a scattering medium. Based on these models, we present results on (a) the condition for object detectability in light striping and (b) the number of sources required for photometric stereo. In both cases, we demonstrate that while traditional methods fail when scattering is significant, our methods accurately recover the scene (depths, normals, albedos) as well as the properties of the medium. These results are in turn used to restore the appearances of scenes as if they were captured in clear air. Although we have focused on light striping and photometric stereo, our approach can also be extended to other methods such as grid coding, gated and active polarization imaging.
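
For context, the clear-air Lambertian photometric-stereo baseline that the scattering analysis extends can be sketched as follows (variable names and shapes are illustrative); the physical models derived in this project modify this relation when the medium scatters:

```python
import numpy as np

# Classical photometric stereo in clear air: with k >= 3 distant sources of known
# directions L (k x 3), a pixel's intensities satisfy I = L @ (albedo * normal),
# so the scaled normal is recovered by least squares.

def lambertian_normals(I, L):
    """I: (k, num_pixels) intensities; L: (k, 3) light directions."""
    G, *_ = np.linalg.lstsq(L, I, rcond=None)    # (3, num_pixels) scaled normals
    albedo = np.linalg.norm(G, axis=0)
    normals = G / np.maximum(albedo, 1e-12)
    return normals, albedo
```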

Beyond Perspective Dual Photography with Illumination Masks

Scene appearance from the point of view of a light source is called a reciprocal or dual view. Since there exists a large diversity in illumination, these virtual views may be non-perspective and multi-viewpoint in nature. In this paper, we demonstrate the use of occluding masks to recover these dual views, which we term shadow cameras. We first show how to render a single reciprocal scene view by swapping the camera and light source positions. We then extend this technique for multiple views and build a virtual shadow camera array. We also capture non-perspective views such as orthographic, cross-slit and a pushbroom variant, while introducing novel applications such as converting between camera projections and removing catadioptric distortions. Finally, since a shadow camera is artificial, we can manipulate any of its intrinsic parameters, such as camera skew, to create perspective distortions.

Editing Stereoscopic content

A digital editor provides the timeline control necessary to tell a story through film. Current technology, although sophisticated, does not easily extend to 3D cinema because stereoscopy is a fundamentally different medium for expression and requires new tools. We formulated a mathematical framework for use in a viewer-centric digital editor for stereoscopic cinema, driven by the audience's perception of the scene. Our editing tool implements this framework and allows both shot planning and after-the-fact digital manipulation of the perceived scene shape. The mathematical framework abstracts away the mechanics of converting this interaction into stereo parameters, such as interocular distance, field of view, and camera location. We demonstrate cut editing techniques to direct audience attention and ease scene transitions. User studies were performed to examine these effects.
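
As a hedged illustration of the viewer-centric geometry such a framework manipulates (this is textbook stereoscopic viewing geometry, not the paper's specific formulation), the depth at which a point is perceived follows from its on-screen disparity, the viewing distance, and the interocular:

```python
# For a viewer at distance V from the screen with interocular e, a point drawn with
# on-screen disparity d (positive = uncrossed, i.e. behind the screen) is perceived
# at depth Z = V * e / (e - d); as d approaches e, the depth goes to infinity.

def perceived_depth(disparity_m, viewing_distance_m=2.0, interocular_m=0.065):
    return viewing_distance_m * interocular_m / (interocular_m - disparity_m)

print(perceived_depth(0.0))      # zero disparity -> on the screen plane (2.0 m)
print(perceived_depth(0.03))     # uncrossed disparity -> behind the screen (~3.7 m)
print(perceived_depth(-0.03))    # crossed disparity -> in front of the screen (~1.4 m)
```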

Temporal dithering of illumination for fast active vision

Active vision techniques use programmable light sources, such as projectors, whose intensities can be controlled over space and time. We present a broad framework for fast active vision using Digital Light Processing (DLP) projectors. The digital micromirror device (DMD) in a DLP projector is capable of switching mirrors “on” and “off” at high speeds (on the order of 10^6 times per second). An off-the-shelf DLP projector, however, effectively operates at much lower rates (30-60 Hz) by emitting smaller intensities that are integrated over time by a sensor (eye or camera) to produce the desired brightness value. Our key idea is to exploit this “temporal dithering” of illumination, as observed by a high-speed camera. The dithering encodes each brightness value uniquely and may be used in conjunction with virtually any active vision technique. We apply our approach to five well-known problems: (a) structured light-based range finding, (b) photometric stereo, (c) illumination de-multiplexing, (d) high frequency preserving motion blur and (e) separation of direct and global scene components, achieving significant speedups in performance. In all our methods, the projector receives a single image as input whereas the camera acquires a sequence of frames.
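
A minimal sketch of the decoding idea (the codebook, signal length, and correlation-based matching below are illustrative assumptions): because each brightness value is emitted as a unique high-speed dithering sequence, the temporal signal observed at a high-speed camera pixel can be matched against a pre-calibrated codebook to recover which intensity illuminated it:

```python
import numpy as np

# Match a camera pixel's temporal signal against calibrated dithering patterns,
# one per projector brightness value, by normalized correlation.
# The codebook and observed signal here are synthetic placeholders.

def decode_pixel(observed, codebook):
    """observed: (T,) temporal signal; codebook: (256, T) calibrated dither patterns."""
    obs = (observed - observed.mean()) / (observed.std() + 1e-12)
    cb = (codebook - codebook.mean(axis=1, keepdims=True)) / (
        codebook.std(axis=1, keepdims=True) + 1e-12)
    return int(np.argmax(cb @ obs))              # best-correlated brightness value

rng = np.random.default_rng(0)
codebook = rng.integers(0, 2, size=(256, 200)).astype(float)
observed = codebook[137] + 0.1 * rng.standard_normal(200)
print(decode_pixel(observed, codebook))          # recovers 137
```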

Illustrating motion through DLP photography

Strobe-light photography creates beautiful high-frequency effects by capturing multiple object copies. Single-chip DLP projectors produce a similar effect, with two important distinctions. Firstly, strobing occurs at different frequencies: at 10,000 Hz due to the DMD chip, and at 120 Hz due to the color wheel. Secondly, DLP illumination lacks the perception of “on-off” flashing that characterizes a strobe-light, since these frequencies are beyond human perception. Deblurring images taken under such strobe-like illumination is difficult, especially for articulated and deformable objects, since the deconvolution kernel can be different at each pixel. Instead, we process DLP photographs to create new images that either summarize a dynamic scene or illustrate its motion. We conclude by discussing the frequencies present in DLP photographs, comparing them to images taken under skylight and fluorescent light.

Novel depth cues from uncalibrated near-field lighting

Distant lighting is widely assumed in computer vision. However, many scenes are illuminated by near light sources. An advantage of near lighting is that the intensity fall-off from the light source encodes scene depth. A drawback is that exact estimation of this depth requires the 3D position of the light source. In this paper, we analyze what kinds of depth cues are possible under uncalibrated near point lighting. A stationary scene is illuminated by a point source that is moved approximately along a line or in a plane. We observe the brightness profile at each pixel and demonstrate how to obtain three novel cues: plane-scene intersections, depth ordering and mirror symmetries. These cues are defined with respect to the line/plane in which the light source moves, and not the camera viewpoint. Plane-Scene Intersections are detected by finding those scene points that are closest to the light source path at some time instance. Depth Ordering for scenes with homogeneous BRDF is obtained by sorting pixels according to their shortest distances from a plane containing the light source. Mirror Symmetry pairs for scenes with homogeneous BRDFs are detected by reflecting scene points across a plane in which the light source moves. We show analytic results for Lambertian objects and demonstrate empirical evidence for a variety of other BRDFs.
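
A rough sketch of the fall-off cue behind the depth-ordering result (assuming a homogeneous Lambertian scene and ignoring foreshortening): a pixel's peak brightness over the sequence scales roughly as the inverse square of its shortest distance to the plane in which the source moves, so sorting by peak brightness raised to the power -1/2 yields the ordering:

```python
import numpy as np

# For a scene with homogeneous BRDF lit by a point source moving in a plane, a
# pixel's peak brightness over time scales roughly with 1 / d_min^2, where d_min is
# its shortest distance to that plane.  Sorting by peak ** -0.5 therefore orders
# pixels by their distance from the source plane (up to an unknown scale).

def depth_order(brightness_profiles):
    """brightness_profiles: (num_pixels, T) intensity of each pixel over the sequence."""
    peak = brightness_profiles.max(axis=1)
    relative_distance = peak ** -0.5             # proportional to d_min, up to scale
    return np.argsort(relative_distance)         # pixel indices, nearest to the plane first
```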

Clustering appearance for scene analysis

We propose a new approach called “appearance clustering” for scene analysis. The key idea in this approach is that scene points can be clustered according to their surface normals, even when the geometry, material and lighting are all unknown. We achieve this by analyzing a continuous image sequence of a scene as it is illuminated by a smoothly moving distant source. Each pixel thus gives rise to a “continuous appearance profile” that yields information about derivatives of the BRDF with respect to source direction. This information is directly related to the surface normal of the scene point when the source path follows an unstructured trajectory (obtained, say, by “hand-waving”). Based on this observation, we transform the appearance profiles and propose a metric that can be used with any unsupervised clustering algorithm to obtain iso-normal clusters. We successfully demonstrate appearance clustering for complex indoor and outdoor scenes. In addition, iso-normal clusters serve as excellent priors for scene geometry and can strongly impact any vision algorithm that attempts to estimate material, geometry and/or lighting properties in a scene from images. We demonstrate this impact for applications such as diffuse and specular separation, both calibrated and uncalibrated photometric stereo of non-Lambertian scenes, light source estimation and texture transfer.
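
An illustrative outline of the pipeline (the profile transform and metric in the actual work are more involved, and k-means is only one possible choice of unsupervised clustering): normalize each pixel's appearance profile to discount albedo, then cluster the normalized profiles so that pixels with similar profiles, and hence similar normals, fall into the same cluster:

```python
import numpy as np
from sklearn.cluster import KMeans

# Cluster pixels by their continuous appearance profiles.  Normalizing each profile
# removes albedo differences, so profiles of pixels with the same normal become
# (approximately) identical and group together under any unsupervised clustering.

def iso_normal_clusters(profiles, num_clusters=8):
    """profiles: (num_pixels, T) brightness of each pixel as the source moves."""
    norm = np.linalg.norm(profiles, axis=1, keepdims=True)
    normalized = profiles / np.maximum(norm, 1e-12)
    return KMeans(n_clusters=num_clusters, n_init=10).fit_predict(normalized)
```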