5.2 Optical see-through and video see-through AR displays

3 min read · August 7, 2024

AR displays come in two main flavors: optical see-through and video see-through. Optical displays let you see the real world directly, with virtual content overlaid on top. They're more natural but can struggle with alignment and occlusion.

Video displays show you a camera feed of the world instead. They give more control over the final image, making it easier to line things up and handle occlusion. But they can feel less natural and have more lag than optical displays.

Optical See-Through Displays

How Optical See-Through Displays Work

  • Allow the user to see the real world directly through a transparent or semi-transparent display while virtual content is overlaid on top
  • Employ optical combiners (half-silvered mirrors or holographic optical elements) to merge the real and virtual views
  • Light from the real world passes through the combiner while light from the display is reflected into the user's eyes, creating a composite view
  • Typically have a limited field of view (FOV) due to the size constraints of the optical combiner and display elements (Microsoft HoloLens, Meta 2); a rough estimate of this limit is sketched below
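
The FOV limit can be estimated from the geometry of the display engine and its optics. Below is a minimal, idealized sketch (not the design of any particular headset): the collimating optics place the virtual image at infinity, so the horizontal FOV depends only on the display width and the focal length, both of which are assumed, illustrative numbers here.

```python
import math

def horizontal_fov_deg(display_width_mm: float, focal_length_mm: float) -> float:
    """Approximate horizontal FOV for a simple collimated (image-at-infinity)
    eyepiece: FOV = 2 * atan(display_width / (2 * focal_length))."""
    return math.degrees(2.0 * math.atan(display_width_mm / (2.0 * focal_length_mm)))

# Assumed, illustrative numbers -- not the specs of any real headset.
print(horizontal_fov_deg(display_width_mm=12.0, focal_length_mm=20.0))  # ~33.4 degrees
```

Making the display or combiner physically larger widens the FOV, which is exactly the size constraint mentioned above.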

Waveguide Displays in Optical See-Through Systems

  • Waveguide displays are a type of optical see-through display that use a thin, transparent waveguide to guide light from a small display engine to the user's eyes
  • Light is injected into the waveguide at one end and propagates along it by total internal reflection until it reaches the other end, where it is coupled out towards the user's eyes (the critical-angle sketch after this list shows the condition for this)
  • Allow for a more compact and lightweight design compared to traditional optical combiners
  • Can achieve a wider FOV by using multiple waveguides stacked together or by employing a single waveguide with multiple input and output regions (Vuzix Blade, Google Glass)
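
Waveguides keep light trapped between their surfaces by total internal reflection (defined in the key terms below). A minimal sketch of the condition, assuming a glass core (n ≈ 1.5) surrounded by air (n ≈ 1.0), both illustrative values:

```python
import math

def critical_angle_deg(n_core: float, n_cladding: float) -> float:
    """Critical angle for total internal reflection: theta_c = asin(n_cladding / n_core).
    Rays hitting the core boundary at an angle steeper than theta_c (measured
    from the surface normal) are fully reflected back into the core."""
    if n_cladding >= n_core:
        raise ValueError("TIR requires the core to be denser than the cladding")
    return math.degrees(math.asin(n_cladding / n_core))

# Assumed indices: typical glass waveguide in air.
print(critical_angle_deg(n_core=1.5, n_cladding=1.0))  # ~41.8 degrees
```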

Advantages and Challenges of Optical See-Through Displays

  • Provide a direct view of the real world, resulting in a more natural and immersive experience
  • Eliminate the need for the complex camera systems and image processing required for video see-through displays
  • Face challenges in achieving accurate registration between virtual content and the real world due to the lack of control over the real-world view
  • Struggle with occlusion handling, as virtual content always appears overlaid on top of the real world, regardless of depth (virtual objects cannot be occluded by real objects); the additive-light sketch below shows why
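
The occlusion limitation follows directly from how an optical combiner works: the display can only add light to what already arrives from the scene, it cannot subtract any. A minimal per-pixel sketch, assuming linear light values in [0, 1] and an arbitrary 50/50 split between transmitted and reflected light:

```python
import numpy as np

def optical_combine(real: np.ndarray, virtual: np.ndarray,
                    transmittance: float = 0.5, reflectance: float = 0.5) -> np.ndarray:
    """What the eye sees through an optical combiner: attenuated real-world light
    plus reflected display light. The display can only ADD light, so a black
    virtual pixel (0.0) cannot hide whatever real object is behind it."""
    return np.clip(transmittance * real + reflectance * virtual, 0.0, 1.0)

real = np.array([0.8, 0.8])      # bright real-world pixels
virtual = np.array([0.0, 1.0])   # "black" virtual pixel vs. bright virtual pixel
print(optical_combine(real, virtual))  # [0.4, 0.9] -- the black pixel occludes nothing
```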

Video See-Through Displays

How Video See-Through Displays Work

  • Video see-through displays capture the real world using one or more cameras and display the video feed on a screen in front of the user's eyes
  • Virtual content is rendered and composited with the video feed, creating an augmented view of the real world (see the compositing sketch after this list)
  • Allow for complete control over the final displayed image, enabling more accurate registration and occlusion handling compared to optical see-through displays
  • Commonly used in smartphone-based AR applications and some head-mounted displays (Samsung Gear VR, Oculus Rift AR mode)
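
Because the real world arrives as pixels, a video see-through system can blend freely. The sketch below shows the standard alpha-over compositing step, assuming the renderer produces an RGBA virtual layer already aligned with the camera frame (the array names are illustrative):

```python
import numpy as np

def composite_over(camera_rgb: np.ndarray, virtual_rgba: np.ndarray) -> np.ndarray:
    """Alpha-over compositing of a rendered virtual layer onto a camera frame.
    Where alpha == 1 the virtual pixel fully replaces the camera pixel, so
    virtual content can hide (or even cut away) parts of the real world."""
    rgb = virtual_rgba[..., :3]
    alpha = virtual_rgba[..., 3:4]  # keep the last axis for broadcasting
    return alpha * rgb + (1.0 - alpha) * camera_rgb

camera = np.ones((2, 2, 3)) * 0.6             # dummy camera frame
virtual = np.zeros((2, 2, 4))
virtual[0, 0] = [1.0, 0.2, 0.2, 1.0]          # one opaque virtual pixel
print(composite_over(camera, virtual)[0, 0])  # [1.0, 0.2, 0.2] -- camera pixel replaced
```

Unlike the additive combiner above, an opaque virtual pixel completely replaces the camera pixel behind it, which is what makes occlusion handling straightforward here.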

Registration in Video See-Through Displays

  • Registration refers to the accurate alignment of virtual content with the real-world video feed
  • Achieved through computer vision techniques such as feature detection, pose estimation, and tracking
  • Fiducial markers or natural features in the environment can be used as reference points for registration (a marker-based pose estimation sketch follows this list)
  • Accurate registration is crucial for creating a seamless and convincing AR experience, as misalignment can break the illusion of virtual objects being part of the real world
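
As a concrete example of marker-based registration, the sketch below recovers the camera-to-marker pose from the four corners of a square fiducial marker with OpenCV's solvePnP. The corner pixel coordinates and camera intrinsics are placeholder values; a real system would get them from a marker detector and a camera calibration step.

```python
import numpy as np
import cv2

# 3D corners of a 10 cm square marker in its own coordinate frame (metres).
marker_size = 0.10
object_points = np.array([
    [-marker_size / 2,  marker_size / 2, 0.0],
    [ marker_size / 2,  marker_size / 2, 0.0],
    [ marker_size / 2, -marker_size / 2, 0.0],
    [-marker_size / 2, -marker_size / 2, 0.0],
], dtype=np.float32)

# Placeholder 2D corner detections (pixels) and intrinsics -- illustrative only.
image_points = np.array([[310, 220], [410, 225], [405, 330], [305, 325]], dtype=np.float32)
camera_matrix = np.array([[800.0, 0.0, 320.0],
                          [0.0, 800.0, 240.0],
                          [0.0, 0.0, 1.0]])
dist_coeffs = np.zeros(5)  # assume an undistorted image

ok, rvec, tvec = cv2.solvePnP(object_points, image_points, camera_matrix, dist_coeffs)
if ok:
    # rvec/tvec give the marker's pose in camera coordinates; rendering virtual
    # content with this transform registers it to the marker in the video feed.
    print("rotation (Rodrigues):", rvec.ravel(), "translation (m):", tvec.ravel())
```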

Occlusion Handling in Video See-Through Displays

  • Occlusion handling involves correctly displaying the depth relationships between virtual and real objects
  • In video see-through displays, occlusion can be handled by using depth information from cameras (stereo cameras, depth sensors) or by manually creating occlusion masks; a per-pixel depth-test sketch follows this list
  • Depth information allows the system to determine which objects should be in front of or behind others, enabling proper occlusion rendering
  • Occlusion masks define the regions where virtual objects should be visible or hidden based on the real-world geometry
  • Accurate occlusion handling enhances the realism and immersion of the AR experience, as virtual objects appear to seamlessly integrate with the real environment
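
A minimal sketch of depth-based occlusion for a video see-through pipeline, assuming a real-world depth map (from stereo cameras or a depth sensor) and the virtual layer's depth buffer are available at the same resolution and in the same units (the array names are illustrative):

```python
import numpy as np

def occlusion_composite(camera_rgb, real_depth, virtual_rgb, virtual_depth):
    """Show a virtual pixel only where it is closer to the camera than the real
    surface at that pixel; otherwise keep the camera pixel. The boolean test is
    the per-pixel form of an occlusion mask."""
    visible = virtual_depth < real_depth        # occlusion mask
    mask = visible[..., np.newaxis]             # broadcast over the RGB channels
    return np.where(mask, virtual_rgb, camera_rgb)

camera = np.full((2, 2, 3), 0.5)
real_depth = np.array([[1.0, 1.0], [0.5, 0.5]])   # metres; bottom row is a close real object
virtual = np.full((2, 2, 3), [1.0, 0.0, 0.0])     # red virtual object
virtual_depth = np.full((2, 2), 0.8)              # virtual object at 0.8 m
print(occlusion_composite(camera, real_depth, virtual, virtual_depth))
# Top row shows the virtual object (0.8 < 1.0); bottom row keeps the camera pixels (0.8 >= 0.5).
```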

Key Terms to Review (31)

Camera Input: Camera input refers to the data captured by a camera sensor, which is crucial for augmented and virtual reality systems to interpret and interact with the real world. This input can include video streams, images, or depth information that help create a seamless blend between digital content and the physical environment. In augmented reality, camera input enables real-time tracking of the user's surroundings, allowing for the accurate placement of virtual objects in relation to real-world coordinates.
Depth Sensors: Depth sensors are devices that measure the distance from the sensor to objects in the environment, creating a depth map that helps in understanding the spatial relationships within a scene. They play a crucial role in enhancing the realism and interactivity of augmented and virtual reality experiences by accurately determining how far away objects are, which allows for better rendering and interaction with digital elements.
Display optics: Display optics refers to the optical components and systems that enable the visualization of images in augmented and virtual reality environments. These optics are crucial for the manipulation of light to ensure that digital content is seamlessly integrated with the real world or presented in a fully immersive virtual space. Effective display optics enhance the user experience by providing clarity, depth perception, and a wide field of view.
Feature detection: Feature detection refers to the process of identifying and locating key elements or points within a visual input, often used in computer vision and augmented reality systems. This process is essential for understanding the environment and accurately overlaying digital content in both optical and video see-through displays. Effective feature detection helps in recognizing spatial relationships, enabling systems to understand and interact with their surroundings more intelligently.
Fiducial Markers: Fiducial markers are reference points or objects used in augmented reality and computer vision systems to provide accurate location and orientation information in a scene. They help the system recognize and track the position of real-world objects, making it easier to overlay digital content accurately. These markers can be physical objects or specially designed images that enhance the capability of tracking systems, ensuring that virtual elements align properly with the user's view of the physical world.
Field of View (FOV): Field of View (FOV) refers to the extent of the observable world that can be seen at any given moment through a display or optical device. In augmented and virtual reality, FOV is crucial as it influences user immersion, peripheral awareness, and interaction with both digital and real-world elements. A wider FOV enhances the sense of presence in virtual environments, while the type of display can significantly affect how FOV is perceived and utilized.
Gesture recognition: Gesture recognition is a technology that enables the identification and interpretation of human gestures using mathematical algorithms. It allows users to interact with devices and applications in a more intuitive manner, enhancing the user experience by translating physical movements into commands. This capability is essential in various fields, especially in virtual reality (VR) and augmented reality (AR), as it supports natural user interfaces and improves interaction with digital environments.
Head-mounted display (HMD): A head-mounted display (HMD) is a device worn on the head that incorporates a display screen, allowing users to immerse themselves in virtual or augmented environments. HMDs can present digital information overlaid on the real world or create entirely virtual experiences, depending on their design as either optical see-through or video see-through. They are essential in the field of augmented and virtual reality for providing an engaging user experience.
Hirokazu Kato: Hirokazu Kato is a prominent researcher in the field of augmented reality (AR), particularly known for his work on developing optical see-through and video see-through AR displays. His contributions have advanced the understanding and implementation of AR technologies, making them more accessible and effective in various applications, including education, gaming, and industry. Kato's research emphasizes the integration of real and virtual worlds, enhancing user interaction with augmented environments.
Holographic optical elements: Holographic optical elements (HOEs) are advanced optical components that manipulate light using holography to create specific optical effects, such as diffraction or focusing. They are crucial in augmented reality (AR) systems, allowing for the overlay of digital information onto the real world by controlling how light interacts with both virtual and real images. HOEs can be designed to enhance the visibility of virtual objects and improve the overall user experience in AR applications.
Image Registration: Image registration is the process of aligning two or more images of the same scene taken at different times, from different viewpoints, or by different sensors. This technique is crucial for accurately overlaying virtual content onto the real world in augmented reality, enabling a seamless integration of digital information with the user's view of the environment.
Industrial training: Industrial training is a structured program designed to provide practical experience and skill development in a specific industry or profession. This type of training focuses on bridging the gap between theoretical knowledge and real-world applications, allowing individuals to gain hands-on experience that enhances their employability and expertise in their field. By engaging with current technologies and practices, industrial training prepares participants for the workforce and fosters their ability to adapt to evolving industry standards.
Ivan Sutherland: Ivan Sutherland is a pioneering computer scientist known as the father of computer graphics, credited with developing the first head-mounted display system for virtual reality in the 1960s. His work laid the foundation for modern AR and VR technologies, influencing the design of both hardware and software that are essential in these fields.
Latency: Latency refers to the time delay between an action and the corresponding response in a system, which is especially critical in augmented and virtual reality applications. High latency can lead to noticeable delays between user input and system output, causing a disconnect that may disrupt the immersive experience.
Marker-based augmentation: Marker-based augmentation is a technique in augmented reality that uses visual markers to identify and track the position of virtual content in relation to the real world. This method relies on a camera to detect specific patterns or images, which serve as reference points for overlaying digital information, ensuring accurate placement and interaction with physical environments.
Markerless Augmentation: Markerless augmentation is a type of augmented reality that overlays digital content onto the real world without the use of physical markers. Instead of relying on specific images or objects to trigger the augmentation, markerless systems utilize environmental features, GPS data, or spatial mapping to identify surfaces and contexts for digital interaction. This method enhances user experience by allowing for more fluid interactions with augmented elements, as users can engage with virtual objects directly in their environment.
Medical visualization: Medical visualization is the process of creating visual representations of medical data to enhance understanding, diagnosis, and treatment planning. It combines techniques from imaging and computer graphics to help medical professionals visualize complex anatomical structures and physiological processes in a more intuitive way. This approach is particularly valuable in augmenting reality technologies, where real-time overlays of digital information can be presented to support surgical procedures or patient education.
Occlusion Handling: Occlusion handling refers to the techniques and methods used in augmented and virtual reality to manage the visibility of virtual objects in relation to real-world elements. This process ensures that virtual objects appear realistically integrated into their physical environment, which involves determining when and how these objects should be hidden or obscured by real-world objects based on their spatial relationships.
Occlusion Masks: Occlusion masks are graphical elements used in augmented reality to manage the visibility of virtual objects in relation to real-world elements, effectively blocking or obscuring parts of the virtual content based on the environment. They ensure that virtual objects appear more realistic by preventing them from being displayed in front of physical objects that should obscure them. This technique enhances immersion and provides a more believable experience when interacting with augmented content.
Optical combiners: Optical combiners are devices that merge real-world imagery with virtual elements, allowing users to see augmented information seamlessly integrated into their environment. These components play a crucial role in both optical see-through and video see-through augmented reality displays, enabling the overlay of digital content onto the user's view of the physical world. By manipulating light, optical combiners facilitate immersive experiences that enhance user interaction with both real and virtual objects.
Optical see-through displays: Optical see-through displays are augmented reality devices that overlay digital information onto the real world, allowing users to see both the actual environment and computer-generated images simultaneously. These displays use transparent screens or lenses, which enable the user to maintain visibility of their surroundings while viewing additional visual content. This technology is essential for enhancing user interaction in augmented reality applications, making it a key component in the broader context of AR/VR systems.
Pose Estimation: Pose estimation is the process of determining the position and orientation of an object or person in a given space, often in 3D coordinates. It plays a crucial role in various applications such as augmented reality, robotics, and computer vision, helping to accurately overlay virtual objects onto the real world or understand movement dynamics. Through advanced algorithms and sensor data, pose estimation allows systems to track and interpret the spatial relationships between objects and their environments.
Projector-based systems: Projector-based systems refer to augmented reality setups that utilize projectors to overlay digital content onto physical surfaces, creating interactive experiences. These systems enhance user interaction by projecting visuals that can be manipulated in real-time, making them suitable for various applications such as art installations, education, and collaborative workspaces.
Registration: In the context of augmented reality, registration refers to the process of accurately aligning virtual content with the real world so that it appears to be integrated seamlessly into a user's environment. This involves determining the spatial relationship between the real objects and the digital overlays, which is crucial for creating an immersive experience. Proper registration is essential for ensuring that virtual elements behave as if they are part of the physical world, enhancing user interaction and realism.
Resolution: Resolution refers to the amount of detail an image holds, commonly expressed in pixels, which impacts visual clarity and quality in displays. In augmented reality (AR) and virtual reality (VR), high resolution is crucial for creating lifelike experiences, as it influences how accurately virtual elements blend with the real world or how immersive a virtual environment feels. Resolution can affect user comfort, perception of depth, and overall engagement.
Stereo Cameras: Stereo cameras are devices that capture images from two distinct viewpoints, mimicking human binocular vision. By using two or more lenses spaced apart, these cameras create depth perception in images and video, which is essential for applications like augmented reality and optical tracking systems. This technology enhances the user's experience by providing a sense of three-dimensionality, making it easier to interact with virtual objects placed in real-world environments.
Total Internal Reflection: Total internal reflection is the phenomenon that occurs when a light wave traveling through a denser medium hits a boundary with a less dense medium at an angle greater than the critical angle, resulting in the light being completely reflected back into the denser medium. This principle is vital in optical technologies, including augmented reality systems that utilize optical see-through and video see-through displays to enhance visual experiences by managing how light interacts with different materials.
Tracking: Tracking refers to the technology and methods used to determine the position and orientation of objects or users in augmented and virtual reality environments. This is crucial for ensuring that digital elements are accurately aligned with the physical world or virtual spaces, enhancing immersion and interaction. Effective tracking systems can significantly improve user experience by providing realistic feedback and seamless integration of virtual content with the real world.
User interface design: User interface design is the process of creating interfaces in software or computerized devices focusing on looks and style, aiming to enhance user experience by making interactions intuitive and efficient. It involves designing all the points of interaction between the user and the system, ensuring that these interactions are seamless, accessible, and enjoyable. Good user interface design takes into account the needs of users, ensuring that information is presented clearly and that controls are easy to navigate.
Video see-through displays: Video see-through displays are augmented reality systems that use cameras to capture real-world images and then overlay computer-generated graphics onto those images in real-time. This type of display allows users to interact with virtual content while viewing the real environment, often enhancing the user's perception and understanding of both the physical and digital worlds.
Waveguide displays: Waveguide displays are optical devices that direct light from a source to a viewer's eyes, typically used in augmented reality (AR) systems. They work by using the principles of total internal reflection, allowing images to be projected while maintaining transparency for the real world. This technology enables the blending of virtual images with the real environment, enhancing user experiences in both optical see-through and video see-through AR applications.