Visual Sensing Part 1: What If the Camera Lies?
Most robustness discussions in machine learning start from a convenient fiction:
The camera is a faithful observer of the world.
In practice, that assumption is wrong. In embodied systems, it’s often catastrophically wrong.
Consider a mobile robot navigating a cluttered hallway. As the robot accelerates, the camera experiences slight motion blur. The lighting shifts as it passes beneath fluorescent lights, and the autofocus system briefly hunts for focus when a person crosses in front of the camera. To a human observer, the scene is still obvious: a hallway, a person walking, obstacles nearby. But to a vision model trained primarily on clean datasets, these small optical distortions can significantly degrade predictions.
Before a single pixel reaches your model, before quantization, compression, or normalization, the image has already passed through optics, sensor physics, and time. Lenses blur. Sensors smear. Motion leaks across frames. Light scatters. None of this is adversarial. This is simply how cameras work.
This post examines optical and sensor-level perturbations: distortions that arise before digitization, before preprocessing, before your model ever sees the data. These are among the most physically plausible perturbations you’ll encounter, and some of the most under-tested.
Why optical perturbations deserve their own category
Unlike digital corruptions or color adjustments, optical effects preserve scene semantics while introducing no artificial patterns. They arise naturally from camera motion, focus mechanics, and hardware constraints, and they’re ubiquitous in real deployments.
The severity values used for these perturbations follow the standard corruption scaling found [here], which defines multiple levels of corruption strength to approximate progressively worse real-world sensing conditions. This benchmark established a widely adopted protocol for evaluating robustness under common corruptions such as blur, noise, and weather effects, and we adopt similar severity ranges to maintain comparability with prior robustness evaluations.
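As a rough illustration of what a severity schedule looks like in code, the sketch below maps an integer severity level to a perturbation parameter. The class name `SeveritySchedule`, the dictionary `SCHEDULES`, and every numeric value are assumptions made for this example; they are placeholders, not the ranges used in this post or in the benchmark referenced above.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SeveritySchedule:
    """Maps an integer severity level (1 = mildest) to a perturbation parameter."""
    name: str
    levels: Tuple[float, ...]

    def param(self, severity: int) -> float:
        if not 1 <= severity <= len(self.levels):
            raise ValueError(f"severity must be in 1..{len(self.levels)}, got {severity}")
        return self.levels[severity - 1]


# Placeholder values for illustration only (hypothetical, not from the post).
# Substitute the ranges from the benchmark you want to stay comparable with.
SCHEDULES = {
    "gaussian_blur": SeveritySchedule("sigma_px", (1.0, 2.0, 3.0, 4.0, 6.0)),
    "vignetting": SeveritySchedule("corner_darkening", (0.2, 0.35, 0.5, 0.65, 0.8)),
}
```

Keeping the schedule in one place makes it harder for two experiments to silently disagree about what "severity 3" means.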
In this post, we’ll explore optical perturbations in two groups: blur effects (Gaussian, motion, defocus, zoom, and glass) and lens artifacts (bloom/glare, vignetting, and rolling shutter). Each reveals a different violated assumption about how images form.
Not all blur is created equal
Blur might seem like a single phenomenon, but cameras blur images in fundamentally different ways depending on what caused the loss of sharpness. A stationary out-of-focus object blurs differently than a moving in-focus one. Fast camera motion creates different artifacts than optical imperfections in the lens itself. Before we examine what these differences reveal about model assumptions, compare them directly in Figure 1.


Gaussian blur is often dismissed as a toy corruption. In reality, it reveals a critical dependency: how much high-frequency structure does your model actually need?
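One way to make the high-frequency question concrete is to blur an image and measure how much spectral energy survives above a chosen spatial frequency. The following is a minimal NumPy sketch under stated assumptions: the function names, the 3-sigma kernel truncation, the reflect padding, and the `cutoff` of 0.25 cycles per pixel are all illustrative choices, not the post's implementation.

```python
import numpy as np


def gaussian_kernel_1d(sigma: float) -> np.ndarray:
    """Normalized 1-D Gaussian, truncated at roughly 3 sigma."""
    radius = max(1, int(np.ceil(3 * sigma)))
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    k = np.exp(-(x ** 2) / (2 * sigma ** 2))
    return k / k.sum()


def gaussian_blur(image: np.ndarray, sigma: float) -> np.ndarray:
    """Separable Gaussian blur of a 2-D grayscale image (reflect padding)."""
    k = gaussian_kernel_1d(sigma)
    r = len(k) // 2
    padded = np.pad(image.astype(np.float64), r, mode="reflect")
    # Filter along rows, then along columns; "valid" trims the padding again.
    rows = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 0, rows)


def high_freq_energy(image: np.ndarray, cutoff: float = 0.25) -> float:
    """Spectral energy above `cutoff` cycles/pixel (the Nyquist limit is 0.5)."""
    spectrum = np.abs(np.fft.fft2(image - image.mean())) ** 2
    fy = np.fft.fftfreq(image.shape[0])[:, None]
    fx = np.fft.fftfreq(image.shape[1])[None, :]
    return float(spectrum[np.sqrt(fx ** 2 + fy ** 2) > cutoff].sum())
```

Dividing `high_freq_energy(gaussian_blur(img, sigma))` by `high_freq_energy(img)` gives the fraction of fine detail that survives at each sigma. Plotting task performance against that fraction, rather than against sigma itself, is one way to see how much high-frequency structure a model actually uses.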
Figure 1: Visualize the different optical perturbations.
Each type of blur attacks spatial information differently, and models respond to these attacks in revealing ways.
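The difference between blur types comes down to the shape of the blur kernel. As a hedged sketch, the code below builds a horizontal line kernel (a simple stand-in for motion blur) and a uniform disk kernel (a simple stand-in for defocus), then measures how sharp vertical and horizontal edges remain. All function names are hypothetical, and the wrap-around boundary handling is a simplification chosen to keep the example short.

```python
import numpy as np


def motion_kernel(length: int) -> np.ndarray:
    """Horizontal line kernel: averages `length` pixels along the motion direction."""
    k = np.zeros((length, length))
    k[length // 2, :] = 1.0
    return k / k.sum()


def defocus_kernel(radius: int) -> np.ndarray:
    """Uniform disk kernel, a common approximation of an out-of-focus lens."""
    y, x = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    k = (x ** 2 + y ** 2 <= radius ** 2).astype(np.float64)
    return k / k.sum()


def convolve2d_wrap(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """2-D convolution with wrap-around borders, built from shifted copies."""
    out = np.zeros(image.shape, dtype=np.float64)
    kh, kw = kernel.shape
    for i in range(kh):
        for j in range(kw):
            if kernel[i, j] != 0.0:
                shift = (i - kh // 2, j - kw // 2)
                out += kernel[i, j] * np.roll(image, shift, axis=(0, 1))
    return out


def edge_sharpness(image: np.ndarray) -> tuple:
    """(steepest horizontal step, steepest vertical step) between neighboring pixels."""
    gx = np.abs(np.diff(image, axis=1)).max()  # sharpness of vertical edges
    gy = np.abs(np.diff(image, axis=0)).max()  # sharpness of horizontal edges
    return float(gx), float(gy)
```

On a bright square against a dark background, the horizontal motion kernel flattens the vertical edges while leaving the horizontal edges at full contrast, whereas the disk kernel softens both equally. A model that leans on one edge orientation more than the other would be expected to respond very differently to the two.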
Complex lens effects expose hidden assumptions
Beyond blur, real camera optics introduce artifacts that most models never encounter during training. Bright lights don’t politely stay within object boundaries; they bloom, flare, and scatter across the frame. Lens vignetting darkens image corners. Rolling shutters capture different parts of the scene at different moments in time. These aren’t edge cases; they’re the ordinary behavior of real cameras under real conditions. Figure 2 lets you compare these effects directly.


Bright light doesn't respect object boundaries. Bloom and glare wash out edges, create false highlights, and introduce non-local effects. These perturbations are revealing precisely because they preserve object presence while destroying object separability. In practice, models often latch onto glare artifacts as features, until those artifacts move or disappear.
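A simple way to approximate bloom, offered here as an assumption rather than as the method used in this post, is a bright-pass followed by a blur: isolate the pixels above a brightness threshold, spread them with a Gaussian, and add the resulting halo back onto the image. The names `add_bloom` and `_gaussian_blur_same`, and the defaults for `threshold`, `sigma`, and `strength`, are illustrative.

```python
import numpy as np


def _gaussian_blur_same(image: np.ndarray, sigma: float) -> np.ndarray:
    """Separable Gaussian blur with zero padding; image must be larger than the kernel."""
    radius = int(np.ceil(3 * sigma))
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    k = np.exp(-(x ** 2) / (2 * sigma ** 2))
    k /= k.sum()
    rows = np.apply_along_axis(np.convolve, 1, image, k, mode="same")
    return np.apply_along_axis(np.convolve, 0, rows, k, mode="same")


def add_bloom(image: np.ndarray, threshold: float = 0.8,
              sigma: float = 3.0, strength: float = 1.0) -> np.ndarray:
    """Add a halo around bright regions of a 2-D image with values in [0, 1]."""
    # Bright-pass: keep only the pixels that exceed the threshold.
    bright = np.where(image > threshold, image, 0.0)
    # Spread the bright pixels into their neighborhood.
    halo = _gaussian_blur_same(bright, sigma)
    # Light is additive; clip to the sensor's representable range.
    return np.clip(image + strength * halo, 0.0, 1.0)
```

The halo raises pixel values near a light source even though nothing in the scene changed there, while pixels far from the source are untouched. That non-local brightening is what erodes the contrast between an object and its background near a bright light.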
Figure 2: Visualize the different perturbations based on complex lens effects.
What makes these perturbations particularly revealing is that they break assumptions models make silently, about spatial uniformity, temporal consistency, and the relationship between brightness and geometry.
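The spatial-uniformity and temporal-consistency assumptions can each be violated in a few lines. The sketch below uses a quadratic radial falloff for vignetting and a per-row horizontal shift for rolling shutter under sideways camera motion. Both models, the function names, and the parameters `strength` and `shift_per_row` are simplifying assumptions for illustration, and both functions expect a 2-D grayscale array.

```python
import numpy as np


def apply_vignette(image: np.ndarray, strength: float = 0.5) -> np.ndarray:
    """Darken pixels toward the corners: gain = 1 - strength * (r / r_corner)^2."""
    h, w = image.shape
    y, x = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    # Squared distance from the center, normalized so the corners equal 1.
    r2 = ((y - cy) ** 2 + (x - cx) ** 2) / (cy ** 2 + cx ** 2)
    return image * (1.0 - strength * r2)


def apply_rolling_shutter(image: np.ndarray, shift_per_row: float = 0.25) -> np.ndarray:
    """Shift row i sideways by round(i * shift_per_row) pixels (wraps at the border).

    Each row is read out slightly later than the one above it, so under
    horizontal motion, lower rows see the scene displaced further.
    """
    out = np.empty_like(image)
    for i in range(image.shape[0]):
        out[i] = np.roll(image[i], int(round(i * shift_per_row)))
    return out
```

With vignetting, the same surface produces different pixel values depending on where it falls in the frame. With rolling shutter, a vertical line becomes a slanted one, so geometry that is correct within any single row is inconsistent across the image.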
A recurring pattern: cliffs, not slopes
Across most optical perturbations, we observe a similar pattern: gradual degradation at low magnitudes, sharp collapse beyond a threshold, and binary failure modes in control policies.
Figure 3: Performance collapses abruptly beyond a perturbation threshold.
The chart shows GR00T’s performance across optical perturbations at different severity levels. Rather than uniform degradation, each perturbation reveals a different vulnerability.
Motion blur and Gaussian blur follow similar trajectories: gradual degradation through low and mid severity, then a sharper decline at high intensity. This suggests the policy can tolerate losing some high-frequency detail (edges, fine textures) but crosses a threshold where critical spatial information becomes unrecoverable. These are controlled failures: performance degrades smoothly until a breaking point.
Glass blur behaves differently. It collapses sharply at mid-severity, exhibiting one of the steepest drops. Unlike uniform blur, glass blur introduces spatially varying distortion that disrupts the local spatial relationships the policy relies on for predicting contact points and affordances. When these relationships break, the policy cannot compensate.
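To see why glass blur is not just another blur, it helps to look at its core operation. One common way to simulate it, assumed here because the post does not specify its implementation, is to swap each pixel with a randomly chosen neighbor within a small window. The sketch below shows only that shuffling step (the smoothing passes that usually surround it are omitted); `local_shuffle`, `max_shift`, and `seed` are hypothetical names, and the function expects a 2-D grayscale array.

```python
import numpy as np


def local_shuffle(image: np.ndarray, max_shift: int = 2,
                  iterations: int = 1, seed: int = 0) -> np.ndarray:
    """Swap each interior pixel with a random neighbor at most `max_shift` away."""
    rng = np.random.default_rng(seed)
    out = image.copy()
    h, w = out.shape
    for _ in range(iterations):
        for y in range(max_shift, h - max_shift):
            for x in range(max_shift, w - max_shift):
                dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
                # Swap two scalar pixel values; no pixel is created or destroyed.
                out[y, x], out[y + dy, x + dx] = out[y + dy, x + dx], out[y, x]
    return out
```

Because the operation is a permutation, the image histogram is unchanged: every pixel value is still present, just not where it used to be. Global statistics survive while local neighborhoods are scrambled, which is consistent with the idea that the policy depends on local spatial relationships rather than on overall appearance.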
Vignetting is the outlier: it shows almost no degradation. Even at high intensity, performance remains near baseline. This counterintuitive result suggests an accidental alignment: darkening the periphery reinforces the model’s existing center bias, helping it focus on the central region where task-relevant objects are typically located. Rather than disrupting the model’s attention, the perturbation appears to concentrate it where it was already directed.
Each perturbation attacks a different assumption, and the degradation patterns reveal which dependencies are critical versus incidental for this particular model and task.
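The distinction between a cliff, a slope, and a flat curve can be made explicit with a small heuristic: if a single severity step accounts for most of the total performance loss, call it a cliff. The sketch below is one possible formalization, not the analysis used for the chart above. The function names, the `cliff_fraction` and `flat_tolerance` thresholds, and the example curves are all assumptions, and the curves are made-up numbers, not measurements from this post.

```python
from typing import Sequence, Tuple


def largest_drop(scores: Sequence[float]) -> Tuple[int, float]:
    """Return (severity, drop) for the largest single-step loss.

    scores[0] is the clean baseline and scores[k] is performance at severity k.
    The returned severity is the level at which the drop lands.
    """
    drops = [scores[k] - scores[k + 1] for k in range(len(scores) - 1)]
    step = max(range(len(drops)), key=lambda k: drops[k])
    return step + 1, drops[step]


def classify_curve(scores: Sequence[float], cliff_fraction: float = 0.5,
                   flat_tolerance: float = 0.05) -> str:
    """Label a severity curve as 'flat', 'cliff', or 'slope'."""
    total_loss = scores[0] - scores[-1]
    if total_loss <= flat_tolerance:
        return "flat"
    _, drop = largest_drop(scores)
    # A cliff: one step is responsible for most of the total loss.
    return "cliff" if drop / total_loss >= cliff_fraction else "slope"


# Synthetic curves for illustration only (NOT results from the post).
SYNTHETIC_SLOPE = [0.90, 0.80, 0.70, 0.60, 0.50, 0.40]
SYNTHETIC_CLIFF = [0.90, 0.88, 0.85, 0.30, 0.10, 0.05]
SYNTHETIC_FLAT = [0.90, 0.90, 0.89, 0.90, 0.88, 0.89]
```

Reporting the severity at which the largest drop lands, alongside the usual average over severities, keeps a sharp threshold from being hidden inside a mean.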
Why embodied systems can’t ignore this
Optical perturbations test whether a model relies on fine spatial detail, assumes instantaneous capture, assumes uniform optics across the frame, confuses brightness with structure, or has overfit to clean or synthetic imagery.
In robotics and embodied AI, perception isn’t the goal; action is. Optical failures propagate downstream into incorrect affordances, mislocalized targets, unstable control, and unsafe behavior. If your policy fails under mild optical distortion, it’s not robust, regardless of benchmark performance.
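To illustrate how an optical distortion turns into a mislocalized target, the sketch below sweeps a toy "policy" over two perturbations and records the localization error at each severity. Everything here is a hypothetical stand-in: `brightest_pixel_policy` is a deliberately trivial substitute for a real perception-driven policy, and the two perturbation functions and their severity scalings are invented for the example.

```python
import numpy as np
from typing import Callable, Dict, List, Sequence, Tuple

Image = np.ndarray
Perturbation = Callable[[Image, int], Image]  # (image, severity) -> perturbed image
Policy = Callable[[Image], Tuple[int, int]]   # image -> predicted (row, col) target


def rolling_shutter(image: Image, severity: int) -> Image:
    """Shift row i sideways by round(0.1 * severity * i) pixels (wraps at the border)."""
    out = np.empty_like(image)
    for i in range(image.shape[0]):
        out[i] = np.roll(image[i], int(round(0.1 * severity * i)))
    return out


def uniform_dimming(image: Image, severity: int) -> Image:
    """Scale every pixel by the same gain; geometry is untouched."""
    return image * (1.0 - 0.15 * severity)


def brightest_pixel_policy(image: Image) -> Tuple[int, int]:
    """Toy stand-in for a policy: aim at the brightest pixel."""
    row, col = np.unravel_index(np.argmax(image), image.shape)
    return int(row), int(col)


def sweep(policy: Policy, perturbations: Dict[str, Perturbation],
          severities: Sequence[int], image: Image,
          target: Tuple[int, int]) -> Dict[str, List[float]]:
    """Localization error in pixels for every perturbation at every severity."""
    errors: Dict[str, List[float]] = {}
    for name, perturb in perturbations.items():
        errors[name] = []
        for s in severities:
            row, col = policy(perturb(image, s))
            errors[name].append(float(np.hypot(row - target[0], col - target[1])))
    return errors
```

For a single bright target, uniform dimming leaves the predicted location unchanged at every severity, while the rolling-shutter shift moves it further from the true target as severity increases. The same harness shape applies to a real policy: swap in the actual model and replace pixel error with whatever downstream quantity matters, such as grasp success or distance to the true contact point.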
The lesson is clear: a model that hasn’t seen how cameras actually work hasn’t learned to see the world; it’s learned to see the dataset.
What comes next
Part 2 shifts focus from optics to appearance. We’ll examine photometric and colorimetric perturbations, changes that preserve geometry but alter intensity, color, and illumination, and demonstrate how models often depend on these signals far more than they should.