Recovering scene geometry is an important research problem in robotics. More recently, time-of-flight (TOF) depth sensors have transformed robot perception; these sensors modulate scene illumination and extract depth from time-related features in the reflected radiance, such as phase change or temporal delays. Commercially available TOF sensors, such as the Microsoft Kinect and the Velodyne Puck, have influenced fields such as autonomous cars, drone surveillance and wearable devices. Creating TOF sensors for personal drones, AR/VR glasses, IoT nodes and other miniature platforms would require overcoming the energy constraints imposed by limited battery capacity. We demonstrate new efficiencies that are possible with angular control of a TOF sensor, using a single LIDAR beam reflected off a microelectromechanical (MEMS) mirror. Our designs provide a new framework to exploit directional control for depth sensing in an adaptive manner for applications relevant to small robotic platforms. We also present an optimized MEMS mirror design and use it to demonstrate applications in extreme wide-angle structured light.
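As a rough illustration of how depth can be extracted from the time-related features mentioned above, the following minimal sketch uses the standard textbook relations for pulsed and continuous-wave TOF. The function names are hypothetical and are not taken from the project's code.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def depth_from_delay(round_trip_seconds):
    """Pulsed TOF: light travels to the scene and back, so halve the path."""
    return C * round_trip_seconds / 2.0


def depth_from_phase(phase_rad, mod_freq_hz):
    """Continuous-wave TOF: depth from the phase shift of the modulation."""
    return C * phase_rad / (4.0 * math.pi * mod_freq_hz)


def unambiguous_range(mod_freq_hz):
    """Largest depth before the measured phase wraps around at 2*pi."""
    return C / (2.0 * mod_freq_hz)
```

For example, depth_from_phase(math.pi, 10e6) gives half of unambiguous_range(10e6), since a phase shift of pi is half a modulation cycle.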
Major advances in computer vision and the mobile revolution have set the stage for widespread deployment of connected devices with "always-on" cameras and powerful vision capabilities. While these advances have the potential to enable a wide range of novel applications and interfaces, privacy and security concerns surrounding the use of these technologies threaten to limit the range of places and devices where they are adopted. In this context, we see a resurgence in privacy-preserving computer vision research, intended to help mitigate both societal and legal concerns. On this page, we highlight some of our recent work in this area, including: (a) building novel computational cameras that remove sensitive data "prior to capture" via optical filtering and/or sensor-level electronics, (b) developing a framework to learn privacy-preserving image encodings through adversarial optimization, and (c) performing the first systematic analysis of the privacy risks associated with working with 3D point clouds.
Nuclear material trafficking is a threat to national security, and being able to detect and track people carrying nuclear material is vital to protecting our country's interests and our people. However, inexpensive radiation detectors have limitations: they are isotropic, unable to detect the direction of the incident radiation; and they are additive, unable to detect a difference between multiple sources of radiation. These limitations leave the inexpensive radiation detectors unfit for many security applications. We explore new ways to enhance the functionality of these inexpensive radiation detectors by fusing them with various 3D vision sensors. We have developed new methods of simultaneous vision-radiation sensor calibration, single source localization and tracking, multi-source localization and tracking, and tracking a source behind visual occlusions.
Achieving computer vision on micro-scale devices is a challenge. On such devices the mass and power constraints are so severe that even the most common computations are difficult. We introduce and analyze a class of micro-vision sensors and MEMS mirrors that enable a wide field-of-view within a small form factor. We utilize the “Snell window” effect to enlarge the scan angle of MEMS mirrors by submerging them in a liquid whose refractive index is greater than that of air, and show micro-vision sensors that reduce power requirements through template-based optical convolution.
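The scan-angle enlargement can be illustrated with Snell's law. The sketch below assumes a flat liquid-air interface and ignores any container wall; both are simplifications made here and are not details of the actual device. The function names are hypothetical.

```python
import math


def angle_in_air(theta_liquid_deg, n_liquid, n_air=1.0):
    """Beam angle after refraction from the liquid into air (Snell's law).

    Angles are measured from the interface normal. Returns None when the
    beam exceeds the critical angle and is totally internally reflected.
    """
    s = n_liquid * math.sin(math.radians(theta_liquid_deg)) / n_air
    if abs(s) > 1.0:
        return None
    return math.degrees(math.asin(s))


def critical_angle_deg(n_liquid, n_air=1.0):
    """Largest in-liquid angle that still exits; it maps to 90 degrees in air."""
    return math.degrees(math.asin(n_air / n_liquid))
```

With a water-like index of 1.33, a beam at 30 degrees inside the liquid exits at roughly 42 degrees, and the whole hemisphere in air is reachable from in-liquid angles below about 49 degrees.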
We develop miniature, low-power structured light devices using commercially available MEMS mirrors and lasers. Without complex re-engineering, we show how to exploit the high-speed MEMS mirror motion and laser light sources to solve a range of visual tasks, including reconstruction of outdoor scenes in bright sunlight and extreme wide-angle scene reconstruction. Additionally, for each device, we explore design and fabrication trade-offs in terms of power, size, speed and stability.
Virtually all structured light methods assume that the scene and the sources are immersed in pure air and that light is neither scattered nor absorbed. Recently, however, structured lighting has found growing application in underwater and aerial imaging, where scattering effects cannot be ignored. In this project, we conduct a comprehensive analysis of two representative methods - light stripe range scanning and photometric stereo - in the presence of scattering. For both methods, we derive physical models for the appearances of a surface immersed in a scattering medium. Based on these models, we present results on (a) the condition for object detectability in light striping and (b) the number of sources required for photometric stereo. In both cases, we demonstrate that while traditional methods fail when scattering is significant, our methods accurately recover the scene (depths, normals, albedos) as well as the properties of the medium. These results are in turn used to restore the appearances of scenes as if they were captured in clear air. Although we have focused on light striping and photometric stereo, our approach can also be extended to other methods such as grid coding, gated and active polarization imaging.
Scene appearance from the point of view of a light source is called a reciprocal or dual view. Since there exists a large diversity in illumination, these virtual views may be non-perspective and multi-viewpoint in nature. In this paper, we demonstrate the use of occluding masks to recover these dual views, which we term shadow cameras. We first show how to render a single reciprocal scene view by swapping the camera and light source positions. We then extend this technique for multiple views and build a virtual shadow camera array. We also capture non-perspective views such as orthographic, cross-slit and a pushbroom variant, while introducing novel applications such as converting between camera projections and removing catadioptric distortions. Finally, since a shadow camera is artificial, we can manipulate any of its intrinsic parameters, such as camera skew, to create perspective distortions.
A digital editor provides the timeline control necessary to tell a story through film. Current technology, although sophisticated, does not easily extend to 3D cinema because stereoscopy is a fundamentally different medium for expression and requires new tools. We formulated a mathematical framework for use in a viewer-centric digital editor for stereoscopic cinema driven by the audience's perception of the scene. Our editing tool implements this framework and allows both shot planning and after-the-fact digital manipulation of the perceived scene shape. The mathematical framework abstracts away the mechanics of converting this interaction into stereo parameters, such as interocular, field of view, and location. We demonstrate cut editing techniques to direct audience attention and ease scene transitions. User studies were performed to examine these effects.
Active vision techniques use programmable light sources, such as projectors, whose intensities can be controlled over space and time. We present a broad framework for fast active vision using Digital Light Processing (DLP) projectors. The digital micromirror device (DMD) in a DLP projector is capable of switching mirrors “on” and “off” at high speeds (10^6/s). An off-the-shelf DLP projector, however, effectively operates at much lower rates (30-60 Hz) by emitting smaller intensities that are integrated over time by a sensor (eye or camera) to produce the desired brightness value. Our key idea is to exploit this “temporal dithering” of illumination, as observed by a high-speed camera. The dithering encodes each brightness value uniquely and may be used in conjunction with virtually any active vision technique. We apply our approach to five well-known problems: (a) structured light-based range finding, (b) photometric stereo, (c) illumination de-multiplexing, (d) high-frequency-preserving motion blur and (e) separation of direct and global scene components, achieving significant speedups in performance. In all our methods, the projector receives a single image as input whereas the camera acquires a sequence of frames.
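Because each brightness value is encoded by a unique temporal pattern, a pixel's projected brightness could in principle be recovered by matching its observed high-speed profile against a table of known patterns. The sketch below is only an illustration of that idea using nearest-neighbour matching. The table TOY_CODES is invented for the example, whereas a real table would have to be measured from the projector, and observed profiles are assumed to be already normalized to the range 0 to 1.

```python
# Hypothetical table: projector brightness level -> on/off pattern seen over
# four high-speed camera frames. These values are made up for illustration.
TOY_CODES = {
    0: (0, 0, 0, 0),
    64: (1, 0, 0, 0),
    128: (1, 0, 1, 0),
    192: (1, 1, 1, 0),
    255: (1, 1, 1, 1),
}


def decode_brightness(observed, code_table):
    """Return the brightness level whose temporal code is closest to observed."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(code_table, key=lambda level: sq_dist(observed, code_table[level]))


# A noisy observation of the pattern (1, 0, 1, 0).
level = decode_brightness((0.9, 0.1, 0.8, 0.0), TOY_CODES)
```

Squared Euclidean distance is the simplest choice here. Handling unknown surface albedo would need a scale-invariant comparison, which this sketch does not attempt.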
Strobe-light photography creates beautiful high-frequency effects by capturing multiple object copies. Single-chip DLP projectors produce a similar effect, with two important distinctions. Firstly, strobing occurs at different frequencies: at 10,000 Hz, due to the DMD chip, and at 120 Hz, due to the color wheel. Secondly, DLP illumination lacks the perception of “on-off” flashing that characterizes a strobe light, since these frequencies are beyond human perception. Deblurring images taken under such strobe-like illumination is difficult, especially for articulated and deformable objects, since the deconvolution kernel can be different at each pixel. Instead, we process DLP photographs to create new images that either summarize a dynamic scene or illustrate its motion. We conclude by discussing the frequencies present in DLP photographs, comparing them to images taken under skylight and fluorescent light.
Distant lighting is widely assumed in computer vision. However, many scenes are illuminated by near light sources. An advantage of near lighting is that the intensity fall-off from the light source encodes scene depth. A drawback is that exact estimation of this depth requires the 3D position of the light source. In this paper, we analyze what kinds of depth cues are possible under uncalibrated near point lighting. A stationary scene is illuminated by a point source that is moved approximately along a line or in a plane. We observe the brightness profile at each pixel and demonstrate how to obtain three novel cues: plane-scene intersections, depth ordering and mirror symmetries. These cues are defined with respect to the line/plane in which the light source moves, and not the camera viewpoint. Plane-scene intersections are detected by finding those scene points that are closest to the light source path at some time instant. Depth ordering for scenes with homogeneous BRDFs is obtained by sorting pixels according to their shortest distances from a plane containing the light source. Mirror symmetry pairs for scenes with homogeneous BRDFs are detected by reflecting scene points across a plane in which the light source moves. We show analytic results for Lambertian objects and demonstrate empirical evidence for a variety of other BRDFs.
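To make the fall-off cue concrete, the following minimal sketch simulates the brightness profile of a Lambertian point under a near point source using the standard cosine over squared-distance model. It illustrates only the image formation, not the uncalibrated cue extraction described above, and the function names and the toy scene are assumptions made for the example.

```python
import math


def near_light_brightness(point, normal, albedo, source):
    """Lambertian brightness under a near point source: albedo * cos(theta) / r^2."""
    lx, ly, lz = (s - p for s, p in zip(source, point))
    r2 = lx * lx + ly * ly + lz * lz
    if r2 == 0.0:
        raise ValueError("source coincides with the scene point")
    r = math.sqrt(r2)
    # Cosine between the unit normal and the direction to the source,
    # clamped at zero for points facing away from the light.
    cos_theta = max(0.0, (normal[0] * lx + normal[1] * ly + normal[2] * lz) / r)
    return albedo * cos_theta / r2


def brightness_profile(point, normal, albedo, source_path):
    """Brightness observed at one scene point as the source moves along a path."""
    return [near_light_brightness(point, normal, albedo, s) for s in source_path]


# Toy scene: the source slides along the x-axis. Two points with identical
# albedo and normal sit at distances 1 and 2 from that line.
path = [(x / 10.0, 0.0, 0.0) for x in range(-20, 21)]
near = brightness_profile((0.0, 0.0, 1.0), (0.0, 0.0, -1.0), 1.0, path)
far = brightness_profile((0.0, 0.0, 2.0), (0.0, 0.0, -1.0), 1.0, path)
```

In this toy setup both profiles peak when the source passes closest to the point, and the nearer point has the larger peak, which is the kind of ordering information the fall-off carries when the reflectance is homogeneous.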
We propose a new approach called “appearance clustering” for scene analysis. The key idea in this approach is that the scene points can be clustered according to their surface normals, even when the geometry, material and lighting are all unknown. We achieve this by analyzing a continuous image sequence of a scene as it is illuminated by a smoothly moving distant source. Each pixel thus gives rise to a “continuous appearance profile” that yields information about derivatives of the BRDF with respect to source direction. This information is directly related to the surface normal of the scene point when the source path follows an unstructured trajectory (obtained, say, by “hand-waving”). Based on this observation, we transform the appearance profiles and propose a metric that can be used with any unsupervised clustering algorithm to obtain iso-normal clusters. We successfully demonstrate appearance clustering for complex indoor and outdoor scenes. In addition, iso-normal clusters serve as excellent priors for scene geometry and can strongly impact any vision algorithm that attempts to estimate material, geometry and/or lighting properties in a scene from images. We demonstrate this impact for applications such as diffuse and specular separation, both calibrated and uncalibrated photometric stereo of non-Lambertian scenes, light source estimation and texture transfer.