Introduction
While IMU integration addresses the temporal gaps in GNSS - bridging outages and providing high-rate updates - camera and LiDAR sensors address a different dimension of the problem: they can provide positioning information in environments where GNSS simply does not work at all. Vision and LiDAR represent the environmental sensing layer of a complete autonomous navigation system.
Visual Odometry
Visual odometry (VO) is the process of estimating a camera's ego-motion - how it has moved - by tracking features across successive image frames. A camera captures a scene, and the algorithm identifies distinctive points (corners, edges, blobs) and tracks their movement between frames. By analysing how these features shift in the image, the algorithm can compute the camera's translation and rotation between frames.
Stereo visual odometry uses two cameras separated by a known baseline, enabling absolute scale estimation - monocular VO suffers from scale ambiguity and cannot determine absolute distances. Visual-Inertial Odometry (VIO) fuses the camera with an IMU, using the IMU to provide scale and handle fast rotational motion that would blur image features.
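The scale advantage of a stereo baseline follows from the standard disparity-to-depth relation for a rectified stereo pair. A minimal sketch (the focal length and baseline values below are illustrative, not from any particular camera):

```python
# Depth from stereo disparity: Z = f * B / d, where f is the focal
# length in pixels, B the baseline in metres, and d the disparity
# (horizontal pixel shift of a feature between the two cameras).
# Because B is known in metres, the recovered depth has absolute scale -
# exactly what monocular VO cannot provide.

def stereo_depth(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point in front of the rig")
    return focal_px * baseline_m / disparity_px

# A feature shifted 40 px between cameras on a rig with a 0.12 m
# baseline and an 800 px focal length lies 2.4 m away.
print(stereo_depth(800.0, 0.12, 40.0))  # 2.4
```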
Visual odometry works well in textured environments with consistent lighting. It degrades or fails in: featureless environments (smooth white walls, open snow fields), rapid illumination changes, motion blur from high-speed movement, and complete darkness. Despite these limitations, VO provides continuous relative positioning in many indoor and urban environments where GNSS is absent.
LiDAR and SLAM
LiDAR (Light Detection and Ranging) emits laser pulses and measures their return times to build dense, accurate 3D point clouds of the surrounding environment. A spinning LiDAR sensor such as those used in autonomous vehicles can produce hundreds of thousands of 3D measurements per second, building a detailed map of nearby obstacles, road surfaces, lane markings, and infrastructure features.
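Each of those measurements is a range plus the beam's pointing angles, which the driver converts into Cartesian points in the sensor frame. A sketch of that conversion for a single return (the angle conventions and values are illustrative; real units define channel elevations in their datasheets):

```python
import math

# Convert one LiDAR return (range, azimuth, elevation) to a 3D point in
# the sensor frame. A spinning unit sweeps azimuth continuously, while
# each laser channel has a fixed elevation angle.

def lidar_return_to_point(range_m: float, azimuth_rad: float, elevation_rad: float):
    horizontal = range_m * math.cos(elevation_rad)  # range projected onto the x-y plane
    x = horizontal * math.cos(azimuth_rad)
    y = horizontal * math.sin(azimuth_rad)
    z = range_m * math.sin(elevation_rad)
    return (x, y, z)

# A 10 m return straight ahead at zero elevation maps to (10, 0, 0).
print(lidar_return_to_point(10.0, 0.0, 0.0))
```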
Simultaneous Localisation and Mapping (SLAM) uses this point cloud data to simultaneously build a map of the environment and localise the vehicle within it. By matching the current scan against a previously built map (scan matching, using algorithms such as Normal Distributions Transform or Iterative Closest Point), the system can determine precise relative position - independent of GNSS.
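The core step inside ICP-style scan matching can be sketched in 2D: given scan points and the same points after the vehicle has moved, recover the rigid rotation and translation that aligns them. This is a simplified illustration - correspondences are assumed known here, whereas real ICP re-estimates nearest-neighbour matches on every iteration:

```python
import math

def align_2d(src, dst):
    """Least-squares rigid alignment of matched 2D point sets (one ICP step)."""
    n = len(src)
    csx = sum(p[0] for p in src) / n; csy = sum(p[1] for p in src) / n
    cdx = sum(p[0] for p in dst) / n; cdy = sum(p[1] for p in dst) / n
    # Accumulate cross- and dot-products of the centred point pairs.
    s_cross = s_dot = 0.0
    for (sx, sy), (dx, dy) in zip(src, dst):
        ax, ay = sx - csx, sy - csy
        bx, by = dx - cdx, dy - cdy
        s_cross += ax * by - ay * bx
        s_dot += ax * bx + ay * by
    theta = math.atan2(s_cross, s_dot)      # optimal rotation angle
    c, s = math.cos(theta), math.sin(theta)
    tx = cdx - (c * csx - s * csy)          # translation = dst centroid - R * src centroid
    ty = cdy - (s * csx + c * csy)
    return theta, (tx, ty)

# Rotate a toy "scan" by 30 degrees and shift it; the solver recovers both.
src = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (2.0, 2.0)]
th = math.radians(30.0)
dst = [(math.cos(th) * x - math.sin(th) * y + 0.5,
        math.sin(th) * x + math.cos(th) * y - 0.2) for x, y in src]
theta, t = align_2d(src, dst)
print(math.degrees(theta), t)
```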
LiDAR SLAM is highly accurate (centimetre-level in good conditions), works in darkness, and is robust to illumination changes. Its limitations are significant cost (automotive-grade LiDAR units have historically cost tens of thousands of dollars, though prices are falling), substantial weight and size for rotating units, and degraded performance in heavy rain, snow, or fog where laser pulses scatter.
How Vision and LiDAR Complement GNSS
The complementary relationship between GNSS and vision/LiDAR sensors is defined by their respective failure modes, which overlap minimally:
| Environment | GNSS | Visual Odometry | LiDAR SLAM |
|---|---|---|---|
| Open sky, rural | Excellent | Good | Good |
| Urban canyon | Degraded (multipath) | Good | Excellent |
| Tunnel / underground | None | Good (if lit) | Excellent |
| Indoor car park | None | Good (if textured) | Excellent |
| Night, no lighting | Excellent | Poor/None | Excellent |
| Heavy rain/fog | Excellent | Degraded | Degraded |
| Featureless snow field | Excellent | Poor | Poor |
Autonomous Vehicle Approaches
Leading autonomous vehicle developers have adopted different sensor fusion philosophies. Waymo's approach uses a combination of LiDAR, radar, cameras, and GNSS/IMU, relying on high-definition (HD) pre-built maps to provide prior knowledge of the environment. The vehicle localises itself within this HD map using LiDAR scan matching, with GNSS providing the global anchor and cameras providing lane-level semantic information. This approach delivers very high accuracy but requires extensive prior mapping of all operational areas.
A camera-centric approach, as pursued by some manufacturers, relies on neural network-based perception from camera arrays to reconstruct 3D geometry in real time. This approach is more scalable - cameras are cheap and ubiquitous - but demands enormous computational resources and robust machine learning models. GNSS remains an important input for global localisation even in this approach.
HD Map Integration
High-definition maps store geometric information about roads - lane boundaries, kerb edges, traffic signs, road markings - with centimetre-level accuracy. When a GNSS/IMU/LiDAR system localises the vehicle within an HD map, it gains access to rich prior knowledge about its environment without needing to perceive and interpret every feature in real time. HD maps effectively extend the performance of the sensor suite, but they require continuous maintenance as road networks change.
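Once localised, the vehicle can query the map geometrically rather than perceptually - for example, asking how far it sits from a lane centreline. A minimal sketch, where the centreline is a hypothetical polyline of map waypoints:

```python
import math

def lateral_offset(position, lane_centreline):
    """Distance from a 2D position to the nearest point on a lane polyline."""
    best = float("inf")
    px, py = position
    for (x1, y1), (x2, y2) in zip(lane_centreline, lane_centreline[1:]):
        dx, dy = x2 - x1, y2 - y1
        seg_len2 = dx * dx + dy * dy
        # Project the position onto the segment, clamped to its endpoints.
        t = max(0.0, min(1.0, ((px - x1) * dx + (py - y1) * dy) / seg_len2))
        cx, cy = x1 + t * dx, y1 + t * dy
        best = min(best, math.hypot(px - cx, py - cy))
    return best

lane = [(0.0, 0.0), (10.0, 0.0), (20.0, 0.5)]   # illustrative waypoints
print(lateral_offset((5.0, 1.8), lane))          # 1.8 m from the straight segment
```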
Practical Trade-offs
Building a robust multi-sensor fusion system involves significant engineering trade-offs. Adding sensors increases cost, weight, power consumption, and software complexity. Each additional sensor requires careful calibration - both internal (intrinsic calibration) and relative to other sensors (extrinsic calibration). Time synchronisation between sensors operating at different rates is a non-trivial challenge. Despite these challenges, for applications where continuous positioning is safety-critical, multi-sensor fusion is not optional - it is the only viable architecture.
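One common answer to the rate-mismatch part of the synchronisation problem is to interpolate high-rate sensor samples to the timestamps of a low-rate sensor. A sketch using linear interpolation of IMU yaw rate to a camera frame time (all timestamps and values are illustrative; real systems must also estimate fixed clock offsets between sensors):

```python
def interpolate_to(timestamps, values, query_t):
    """Linearly interpolate a sampled signal to an arbitrary query time."""
    for (t0, v0), (t1, v1) in zip(zip(timestamps, values),
                                  zip(timestamps[1:], values[1:])):
        if t0 <= query_t <= t1:
            w = (query_t - t0) / (t1 - t0)
            return v0 + w * (v1 - v0)
    raise ValueError("query time outside the sampled interval")

imu_t = [0.000, 0.005, 0.010, 0.015]     # 200 Hz IMU timestamps (s)
imu_gyro_z = [0.10, 0.14, 0.18, 0.22]    # yaw rate samples (rad/s)
camera_t = 0.0125                        # a camera frame timestamp (s)
print(interpolate_to(imu_t, imu_gyro_z, camera_t))  # 0.20 rad/s
```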