Nuclear material trafficking is a threat to national security, and the ability to detect and track people carrying nuclear material is vital to protecting national interests and public safety. However, inexpensive radiation detectors have two key limitations: they are isotropic, unable to sense the direction of incident radiation, and they are additive, unable to distinguish between multiple radiation sources. These limitations leave them unfit for many security applications. We explore new ways to enhance the functionality of these inexpensive radiation detectors by fusing them with various 3D vision sensors. We have developed new methods for simultaneous vision-radiation sensor calibration, single-source localization and tracking, multi-source localization and tracking, and tracking a source behind visual occlusions.
The ongoing transformation of computer vision research is driven by two important trends. The mobile revolution has made available billions of networked cameras, which have brought computer vision to the Internet of Things (IoT). In addition, the advent of deep learning has enabled accurate inference through training on large datasets, improving existing vision techniques and creating novel applications. These advances have the potential to positively impact a wide range of fields including security, search and rescue, agriculture, environmental monitoring, exploration, health, and energy. However, the privacy implications of releasing millions of networked vision sensors into the world would likely lead to significant societal pushback and legal restrictions. We aim to expand the range of places and personal devices where connected cameras can be deployed by developing privacy-preserving computational cameras that perform efficient and robust privacy processing at the camera level. To this end, we show novel computational cameras that perform privacy processing via optical filtering of the incident light field and via sensor-level application-specific integrated circuits (ASICs). Further, we show a novel learning framework which, through adversarial training, successfully yields an encoder that permanently limits inference of a chosen private attribute, while preserving a generic notion of information or the estimation of a different desired attribute.
Microelectromechanical systems (MEMS) mirrors have extended imaging and vision capabilities onto mobile platforms such as hand-held projectors. However, the field-of-view (FOV) of these MEMS mirrors is usually less than 90 deg, and any increase in the MEMS mirror scanning angle involves design and fabrication trade-offs in power, size, speed and stability. Therefore, we need techniques that increase the scanning range while maintaining a small form factor. In this paper, we exploit our recent breakthrough that has enabled the immersion of MEMS mirrors in liquid. While allowing the MEMS to move, the liquid additionally provides a “Snell's window” effect and enables an enlarged FOV (150 deg). We present an optimized MEMS mirror design and use it to demonstrate applications in extreme wide-angle structured light.
MEMS mirrors have been used in a wide range of applications such as telecommunication, optical imaging, displays, and laser ranging. Large mirror size, large scan range, high speed and low drive voltage are always desired but remain conflicting aims. In this work, we utilize the “Snell's window” effect to enlarge the scan angle by submerging MEMS mirrors in a liquid whose refractive index is greater than that of air.
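The FOV gain follows directly from Snell's law at the flat liquid-air interface. A minimal sketch, assuming an illustrative refractive index of n = 1.52 (the actual immersion liquid used in the papers may differ):

```python
import math

def enlarged_half_angle(theta_liquid_deg, n_liquid, n_air=1.0):
    """Refract a scanned ray from liquid into air at a flat interface
    (Snell's law). Returns the exit half-angle in degrees, or None if
    the ray is beyond the critical angle (total internal reflection)."""
    s = n_liquid * math.sin(math.radians(theta_liquid_deg)) / n_air
    if s > 1.0:
        return None  # total internal reflection: no ray exits
    return math.degrees(math.asin(s))

# With an assumed index n = 1.52, a roughly +/-39.4 deg optical scan inside
# the liquid refracts to roughly +/-75 deg in air, i.e. an overall FOV of
# about 150 deg, matching the enlargement reported above.
print(enlarged_half_angle(39.4, 1.52))
```

The same function also shows the limit of the effect: near the critical angle the exit ray grazes the interface, so in practice the usable scan stays safely inside it.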
We introduce a compact structured light device that utilizes a commercially available MEMS mirror-enabled hand-held laser projector. Without complex re-engineering, we show how to exploit the projector's high-speed MEMS mirror motion and laser light sources to suppress ambient illumination, enabling low-cost and low-power reconstruction of outdoor scenes in bright sunlight. We discuss how the line striping acts as a kind of “light probe”, creating distinctive patterns of light scattered by different types of materials. We investigate visual features that can be computed from these patterns and can reliably identify the dominant material characteristic of a scene, i.e., scenes where most of the objects consist of either diffuse (wood), translucent (wax), reflective (metal) or transparent (glass) materials.
Scene appearance from the point of view of a light source is called a reciprocal or dual view. Since illumination sources are highly diverse, these virtual views may be non-perspective and multi-viewpoint in nature. In this paper, we demonstrate the use of occluding masks to recover these dual views, which we term shadow cameras. We first show how to render a single reciprocal scene view by swapping the camera and light source positions. We then extend this technique to multiple views and build a virtual shadow camera array. We also capture non-perspective views such as orthographic, cross-slit and a pushbroom variant, while introducing novel applications such as converting between camera projections and removing catadioptric distortions. Finally, since a shadow camera is artificial, we can manipulate any of its intrinsic parameters, such as camera skew, to create perspective distortions.
Strobe-light photography creates beautiful high-frequency effects by capturing multiple object copies. Single-chip DLP projectors produce a similar effect, with two important distinctions. Firstly, strobing occurs at two different frequencies: at 10,000 Hz, due to the DMD chip, and at 120 Hz, due to the color wheel. Secondly, DLP illumination lacks the perception of “on-off” flashing that characterizes a strobe light, since these frequencies are beyond human perception. Deblurring images taken under such strobe-like illumination is difficult, especially for articulated and deformable objects, since the deconvolution kernel can differ at each pixel. Instead, we process DLP photographs to create new images that either summarize a dynamic scene or illustrate its motion. We conclude by discussing the frequencies present in DLP photographs, comparing them to images taken under skylight and fluorescent light.
Active vision techniques use programmable light sources, such as projectors, whose intensities can be controlled over space and time. We present a broad framework for fast active vision using Digital Light Processing (DLP) projectors. The digital micromirror array (DMD) in a DLP projector is capable of switching mirrors “on” and “off” at high speeds (about 10⁶ times per second). An off-the-shelf DLP projector, however, effectively operates at much lower rates (30-60 Hz) by emitting smaller intensities that are integrated over time by a sensor (eye or camera) to produce the desired brightness value. Our key idea is to exploit this “temporal dithering” of illumination, as observed by a high-speed camera. The dithering encodes each brightness value uniquely and may be used in conjunction with virtually any active vision technique. We apply our approach to five well-known problems: (a) structured light-based range finding, (b) photometric stereo, (c) illumination de-multiplexing, (d) high frequency preserving motion-blur and (e) separation of direct and global scene components, achieving significant speedups in performance. In all our methods, the projector receives a single image as input whereas the camera acquires a sequence of frames.
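The decoding idea behind "each brightness value is encoded uniquely" can be sketched with a toy codebook. The real DMD dithering sequence is a pulse-width-modulated pattern of bit-planes with varying durations; here we simply assume each 8-bit brightness level is emitted as the binary code of its value, which preserves the key property that every level has a unique on/off temporal sequence a high-speed camera can match against:

```python
import numpy as np

BITS = 8                 # illustrative: 8-bit brightness, one on/off slot per bit
LEVELS = 2 ** BITS

# Toy codebook: level b is emitted as the bits of b, so every brightness
# value has a distinct temporal on/off code (the real DMD sequence is more
# intricate, but the decoding principle is the same).
codebook = ((np.arange(LEVELS)[:, None] >> np.arange(BITS)) & 1).astype(float)

def decode(observed):
    """Match an observed per-pixel on/off frame sequence to the nearest codeword."""
    return int(np.argmin(np.linalg.norm(codebook - observed, axis=1)))

# A high-speed camera observes the code for level 180, corrupted by sensor noise:
rng = np.random.default_rng(0)
obs = codebook[180] + rng.normal(0.0, 0.1, BITS)
print(decode(obs))
```

Nearest-codeword matching tolerates moderate sensor noise because distinct codewords differ by at least one full on/off transition.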
Virtually all structured light methods assume that the scene and the sources are immersed in pure air and that light is neither scattered nor absorbed. Recently, however, structured lighting has found growing application in underwater and aerial imaging, where scattering effects cannot be ignored. In this project, we conduct a comprehensive analysis of two representative methods, light stripe range scanning and photometric stereo, in the presence of scattering. For both methods, we derive physical models for the appearances of a surface immersed in a scattering medium. Based on these models, we present results on (a) the condition for object detectability in light striping and (b) the number of sources required for photometric stereo. In both cases, we demonstrate that while traditional methods fail when scattering is significant, our methods accurately recover the scene (depths, normals, albedos) as well as the properties of the medium. These results are in turn used to restore the appearances of scenes as if they were captured in clear air. Although we have focused on light striping and photometric stereo, our approach can also be extended to other methods such as grid coding, gated and active polarization imaging.
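For reference, the clear-air Lambertian baseline that the scattering-aware models generalize fits in a few lines: with k calibrated distant lights, each pixel's intensities are linear in the albedo-scaled normal, which a least-squares solve recovers. The function name and synthetic data below are illustrative; the paper's scattering terms are deliberately omitted:

```python
import numpy as np

def photometric_stereo(I, L):
    """Classical clear-air Lambertian photometric stereo.
    I: (k, p) image intensities for k light directions and p pixels.
    L: (k, 3) unit lighting directions.
    Returns per-pixel unit normals (p, 3) and albedos (p,)."""
    G, *_ = np.linalg.lstsq(L, I, rcond=None)   # G = albedo * normal, shape (3, p)
    albedo = np.linalg.norm(G, axis=0)
    normals = (G / np.clip(albedo, 1e-9, None)).T
    return normals, albedo

# Synthetic check: one pixel with normal (0, 0, 1) and albedo 0.8,
# imaged under three unit-length light directions.
L = np.array([[0.0, 0.0, 1.0], [0.6, 0.0, 0.8], [0.0, 0.6, 0.8]])
n_true = np.array([0.0, 0.0, 1.0])
I = 0.8 * np.clip(L @ n_true, 0, None)[:, None]
n_est, rho_est = photometric_stereo(I, L)
print(n_est[0], rho_est[0])
```

In a scattering medium the measured intensities gain additive backscatter and attenuation terms, which is exactly why this clean linear solve breaks down and more sources and a medium model become necessary.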
Distant lighting is widely assumed in computer vision. However, many scenes are illuminated by near light sources. An advantage of near lighting is that the intensity fall-off from the light source encodes scene depth. A drawback is that exact estimation of this depth requires the 3D position of the light source. In this paper, we analyze what kinds of depth cues are possible under uncalibrated near point lighting. A stationary scene is illuminated by a point source that is moved approximately along a line or in a plane. We observe the brightness profile at each pixel and demonstrate how to obtain three novel cues: plane-scene intersections, depth ordering and mirror symmetries. These cues are defined with respect to the line/plane in which the light source moves, and not the camera viewpoint. Plane-Scene Intersections are detected by finding those scene points that are closest to the light source path at some time instant. Depth Ordering for scenes with homogeneous BRDFs is obtained by sorting pixels according to their shortest distances from a plane containing the light source. Mirror Symmetry pairs for scenes with homogeneous BRDFs are detected by reflecting scene points across a plane in which the light source moves. We show analytic results for Lambertian objects and demonstrate empirical evidence for a variety of other BRDFs.
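The inverse-square falloff cue behind these intersections can be illustrated with a toy simulation (ignoring the shading term, which the paper's full analysis accounts for): a scene point's brightness profile peaks when the moving source passes its point of closest approach, so the peak time localizes the point relative to the source path.

```python
import numpy as np

# A near point source moves along a line (the x-axis); with inverse-square
# falloff, each scene point's brightness profile peaks when the source is
# at its closest approach. Illustrative sketch; the n.l shading term and
# BRDF effects from the paper are omitted.
T = np.linspace(-1.0, 1.0, 201)
source = np.stack([T, np.zeros_like(T), np.zeros_like(T)], axis=1)

scene_pts = np.array([[-0.5, 0.3, 0.4],
                      [ 0.2, 0.1, 0.8]])

for p in scene_pts:
    r2 = np.sum((source - p) ** 2, axis=1)      # squared source-point distance
    t_peak = T[np.argmin(r2)]                   # brightness peak <-> closest approach
    print(f"point {p} peaks when source x = {t_peak:.2f} (true x = {p[0]:.2f})")
```

Note the recovered coordinate is defined along the source path, not relative to the camera, matching the viewpoint-independence of the cues described above.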
A digital editor provides the timeline control necessary to tell a story through film. Current technology, although sophisticated, does not easily extend to 3D cinema because stereoscopy is a fundamentally different medium for expression and requires new tools. We formulated a mathematical framework for use in a viewer-centric digital editor for stereoscopic cinema driven by the audience's perception of the scene. Our editing tool implements this framework and allows both shot planning and after-the-fact digital manipulation of the perceived scene shape. The mathematical framework abstracts away the mechanics of converting this interaction into stereo parameters, such as interocular distance, field of view, and camera location. We demonstrate cut editing techniques to direct audience attention and ease scene transitions. User studies were performed to examine these effects.
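One standard viewer-centric relation such a framework builds on maps on-screen parallax to perceived depth via similar triangles between the two eyes and the fused point. A sketch, with illustrative constants (65 mm interocular, 2 m viewing distance) that are not taken from the paper:

```python
def perceived_depth(parallax_m, eye_sep_m=0.065, view_dist_m=2.0):
    """Perceived depth (meters from the viewer) of a fused stereo point,
    from similar triangles in standard stereoscopic viewing geometry.
    parallax_m: signed on-screen separation of the left/right image points
    (positive = uncrossed, behind the screen; negative = crossed, in front)."""
    if parallax_m >= eye_sep_m:
        raise ValueError("parallax at or beyond eye separation: eyes diverge")
    return eye_sep_m * view_dist_m / (eye_sep_m - parallax_m)

print(perceived_depth(0.0))     # zero parallax: point lies on the screen plane
print(perceived_depth(-0.065))  # crossed parallax of one interocular: halfway
```

Inverting relations like this one is what lets an editor specify perceived scene shape directly and have the tool solve for the underlying stereo camera parameters.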
We propose a new approach called “appearance clustering” for scene analysis. The key idea in this approach is that the scene points can be clustered according to their surface normals, even when the geometry, material and lighting are all unknown. We achieve this by analyzing a continuous image sequence of a scene as it is illuminated by a smoothly moving distant source. Each pixel thus gives rise to a “continuous appearance profile” that yields information about derivatives of the BRDF with respect to source direction. This information is directly related to the surface normal of the scene point when the source path follows an unstructured trajectory (obtained, say, by “hand-waving”). Based on this observation, we transform the appearance profiles and propose a metric that can be used with any unsupervised clustering algorithm to obtain iso-normal clusters. We successfully demonstrate appearance clustering for complex indoor and outdoor scenes. In addition, iso-normal clusters serve as excellent priors for scene geometry and can strongly impact any vision algorithm that attempts to estimate material, geometry and/or lighting properties in a scene from images. We demonstrate this impact for applications such as diffuse and specular separation, both calibrated and uncalibrated photometric stereo of non-Lambertian scenes, light source estimation and texture transfer.
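The core idea can be sketched for the Lambertian case: two points sharing a normal produce appearance profiles that are equal up to an albedo scale, so unit-normalizing each profile and clustering by cosine similarity yields iso-normal clusters. A toy two-normal sketch with synthetic data (the profile transform and metric here are simplifications of the paper's, which handle general BRDFs):

```python
import numpy as np

rng = np.random.default_rng(1)

def unit(v):
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

# Simulate Lambertian pixels under a smoothly "hand-waved" distant source:
# pixels sharing a normal give profiles identical up to an albedo scale.
lights = unit(rng.normal(size=(100, 3)) + [0.0, 0.0, 3.0])   # source directions
normals = unit(np.array([[0.0, 0.0, 1.0], [0.7, 0.0, 0.7]]))
labels_true = rng.integers(0, 2, size=40)                    # true cluster ids
albedo = rng.uniform(0.2, 1.0, size=40)
profiles = albedo[:, None] * np.clip(lights @ normals[labels_true].T, 0, None).T

# Transform: unit-normalize each profile so albedo cancels, then cluster
# by cosine similarity against two maximally dissimilar seed profiles.
X = profiles / np.linalg.norm(profiles, axis=1, keepdims=True)
seed = [0, int(np.argmax(1.0 - X @ X[0]))]
labels = np.argmax(X @ X[seed].T, axis=1)
print(labels)
```

Because normalization removes the per-pixel albedo, the recovered clusters depend only on surface normals, which is exactly the geometry prior the applications above exploit.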