Game Audio - Sound Spatialization
Miguel Negrão, 2024

Summary
This document provides an overview of sound spatialization techniques, particularly for game audio and multimedia applications. It discusses how sound is perceived spatially and temporally, and explores approaches to simulating real-world sound behaviour.
Full Transcript
9 - Game Audio - Sound Spatialization
Sound Design - Games and Multimedia
©2024 Miguel Negrão CC BY-NC-ND 4.0

Sound spatialization is the placement of sound in specific locations or areas in space by means of loudspeakers and other technologies. In the games and film industry this is usually called "surround" or "spatial audio". All sound is spatial, since it must propagate from source to listener through space. The perception of sound is also inherently temporal, given that it takes time to listen to a sound (unlike an image).

Before we look at sound spatialization techniques, let us take a look at how sound behaves spatially with real sources. Sound sources tend to be idealized as point sources, which radiate sound equally in all directions from a very small area (a point). In reality it is more complicated:
- Large objects, e.g. the sea.
- Many small objects, e.g. leaves on the trees of a forest.
- Objects which radiate differently in different directions.
- Reflections: reverberation.
- Refraction and interference, standing waves.
Acoustics is the branch of physics that studies mechanical waves.

In most everyday situations an individual is surrounded by many sound sources, some small and others large, positioned all around. Due to evolutionary pressure, humans have the ability to detect the direction of (some) sound sources. How?

Detecting the position of a single sound source
We have two ears, and our brain is trained to detect cues in the signals arriving at the two ears in order to determine the position of a sound source. It uses the differences between those signals:
- Amplitude difference between the two ears (high frequencies): sound coming from the left or right becomes softer as it travels around the head to the opposite ear.
- Time difference between the two ears (low frequencies): sound coming from the left or right takes extra time to reach the opposite ear.
The brain also uses:
- Spectral cues caused by the head, outer ear and torso, which help distinguish sounds above and below the head.
- Small movements of the head.

Localization of a single sound source
We can also determine, to some extent, the type of room we are in by listening to the reverberation. Human echolocation: some individuals can even gain the ability to navigate a space by listening to the interaction of a sound with the environment. We use our capability of locating sounds continuously, every day! Although we can only see what is in front of us, we can hear all around us!

If we want realistic sound reproduction using loudspeakers or headphones, we need to somehow simulate the position and direction of sources. It is also important to simulate the spatial properties of room acoustics. Ambient sound is usually composed of many individual sound sources which fuse into one immersive soundscape that surrounds the listener from all directions; it is important to be able to simulate this as well.

Let's look at how sound reproduction systems attempt to simulate the real-world spatial behaviour of sound. The sound files used in sound design are usually mono, sometimes also stereo or surround (5.1, 7.1). If we record a sound with a single-capsule microphone and play it back over a single loudspeaker, almost all the spatial information is lost.
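To make the two interaural cues above concrete, here is a minimal, self-contained C++ sketch (not from the original slides) that estimates the interaural time difference using Woodworth's spherical-head approximation, ITD = (r/c)(θ + sin θ); the head-radius value and the program itself are illustrative assumptions.

// Illustrative sketch: interaural time difference (ITD) for a spherical head,
// using Woodworth's approximation: ITD = (r / c) * (theta + sin(theta)).
#include <cmath>
#include <cstdio>

int main() {
    const double r = 0.0875;             // assumed average head radius, metres
    const double c = 343.0;              // speed of sound in air, m/s
    const double kPi = 3.14159265358979;

    const double azimuthsDeg[] = {0.0, 30.0, 60.0, 90.0};  // 0 = straight ahead
    for (double azDeg : azimuthsDeg) {
        double theta = azDeg * kPi / 180.0;                // degrees -> radians
        double itd = (r / c) * (theta + std::sin(theta));  // seconds
        std::printf("azimuth %4.0f deg -> ITD %3.0f us\n", azDeg, itd * 1e6);
    }
    // At 90 deg this yields roughly 650 microseconds. The time cue is most
    // useful at low frequencies, where the period is long enough for the
    // inter-ear delay to be unambiguous.
    return 0;
}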
With sound spatialization techniques it is possible (to some extent) to create the perception in the listener that:
- a sound is located at a given position (or angle).
- a sound is inside a room (reverb).
- a sound is very far away (high-frequency absorption).
- a sound source is directional.
Spatial audio techniques attempt to faithfully reproduce spatial attributes of sound such as direction, distance, size or room acoustics using loudspeakers, and often rely on mathematical, physical and psychoacoustic knowledge.

Approaches to spatialization:
1. Spatializing a mono signal: sometimes called panning.
2. Stereo and, more generally, multi-channel recording with multiple capsules or microphones.

Let's look at spatializing a single mono sound file. In practice this means: given a mono sound file, a sound card, and many loudspeakers (or headphones), how do we spatialize this sound?

Spatial audio techniques can be divided between Sound Field Synthesis, which attempts to recreate an accurate physical sound field which in turn will create the correct perceptual cues (WFS, Ambisonics), and perception-based methods, where the original sound field is not reconstructed and equivalent perceptual cues are created instead (stereophony, VBAP). In short, Sound Field Synthesis tries to make it real; perception-based methods fake it.

Stereophony
The most widely used technique, and a perception-based method. Two loudspeakers, usually 60° apart. Only the amplitude of the sound at each loudspeaker is manipulated.

Stereo panning
Objective: spatialize a mono signal. Both loudspeakers play the same signal but with different levels, with the head equidistant from both loudspeakers. It can only simulate sound sources on an arc spanning 30 degrees to each side of center. If only the left or right loudspeaker is playing, the sound is localized at that loudspeaker. If both loudspeakers play at the same level then, due to symmetry, the same signal arrives at the left and right ears, and as a consequence the sound is localized at the center position.

Stereo linear panning
Problem: sound intensity will be 3 dB lower at the center position. This creates a "hole in the middle", since the signal will appear louder at the endpoints than at the center position. Solution: constant power panning.

Stereo constant power panning
Sound intensity remains constant for all panning positions.

Multi-channel systems in the horizontal plane
Find the two loudspeakers which are closest in angle to the source and calculate the panning fraction between them. This is the panning method used by Unreal Engine for surround systems:

// AudioMixerDevice.cpp
float Fraction = (Azimuth - PrevChannelAzimuth) / (NextChannelAzimuth - PrevChannelAzimuth);
AUDIO_MIXER_CHECK(Fraction >= 0.0f && Fraction <= 1.0f);

The fraction is then mapped to gains for the two loudspeakers using the constant power law.

Channel-based spatialization sends each channel of the mixed sound file directly to a specific loudspeaker; this does not scale to systems with very large numbers of loudspeakers (> 800-channel sound files).

Object based spatialization
A sound stream plus position metadata is sent to a hardware or software spatial sound system, which pans each source directly to the appropriate format (5.1, 7.1.4, 22.2, binaural via headphones). The software/game/film doesn't need to know what the output system is (e.g. headphones or 7.1). This doesn't work for ambient sound, which is not one mono source but multiple decorrelated sources, for instance captured by a surround microphone. Ambient sound must still use a channel bed (e.g. 5.1 or 7.1.4).
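Since the slides above contrast linear and constant power panning, here is a minimal, self-contained C++ sketch (mine, not from the slides) comparing the two laws; the pan-parameter convention (0 = hard left, 1 = hard right) and the function names are assumptions, and the constant-power version uses the same cos/sin idea as the Unreal Engine fragment above.

// Illustrative sketch: stereo panning laws.
// pan in [0, 1]: 0 = hard left, 0.5 = centre, 1 = hard right.
#include <cmath>
#include <cstdio>

struct StereoGains { float left; float right; };

// Linear panning: amplitudes sum to 1, but the radiated power
// (L^2 + R^2) drops to 0.5 (-3 dB) at the centre: the "hole in the middle".
StereoGains linearPan(float pan) {
    return { 1.0f - pan, pan };
}

// Constant power panning: the cos/sin law keeps L^2 + R^2 == 1 everywhere.
StereoGains constantPowerPan(float pan) {
    const float kHalfPi = 1.5707963f;
    return { std::cos(pan * kHalfPi), std::sin(pan * kHalfPi) };
}

int main() {
    const float positions[] = {0.0f, 0.25f, 0.5f, 0.75f, 1.0f};
    for (float pan : positions) {
        StereoGains a = linearPan(pan);
        StereoGains b = constantPowerPan(pan);
        std::printf("pan %.2f  linear power %.3f  constant power %.3f\n", pan,
                    a.left * a.left + a.right * a.right,
                    b.left * b.left + b.right * b.right);
    }
    return 0;
}

Pairwise multi-channel panning works the same way: the Fraction computed in the Unreal code plays the role of pan between the two loudspeakers adjacent to the source azimuth.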
Ambisonics
A sound field reproduction technique based on the spherical harmonic decomposition of the sound field, capable of spatial sound encoding for reproduction on 2D (plane) and 3D multi-loudspeaker systems.

(diagram: mono source → encode → W/X/Y/Z channels → decode → loudspeakers)

When encoding a single mono sound signal, an N-channel encoded signal is produced, where N depends on the spherical-harmonic order used:

order | channels (2D) | channels (3D)
  1   |       3       |       4
  2   |       5       |       9

The number of channels of the encoded signal is independent of the number of loudspeakers. The theoretical minimum number of speakers for horizontal (2D) playback is 2M + 1, where M is the order. Performance is better if the layout is symmetric.

Number of loudspeakers, 2D:

order | minimum | optimal
  1   |  3 (4)  |    6
  2   |    5    |    8
  3   |    7    |    8

3D systems: cube, icosahedron; usually 8 or more loudspeakers. Example: Sonic Lab, SARC, Belfast.

Ambisonics - B-format
Panning a mono source with first-order ambisonics produces 4 encoded channels; this first-order, 4-channel encoded signal is usually called B-format.

Ambisonics microphones
(source: http://www.core-sound.com/TetraMic/1.php)
An ambisonics microphone captures the full directivity information for every sound wave that hits the microphone, from every direction. The position and movement of every individual sound source is recorded. Perfect for the reproduction of ambient sound in 5.1 surround systems. The signals from the 4 capsules are usually called A-format and need to be converted to B-format.

Ambisonics - manipulating the entire sound field
Rotate, "zoom", select a zone, invert. Example: http://www.ambisonictoolkit.net/

Ambisonics - use in digital games and VR
Initially used mostly by recording enthusiasts and for electroacoustic music; there is renewed interest due to VR and digital games. Very interesting for VR because head rotation can be simulated by rotating the encoded sound field. It doesn't require an object-based approach: the whole sound field can be saved in the encoded signals.
Digital games: used in games by Codemasters (e.g. Colin McRae: DiRT). Game engines: Unreal Engine, Unity3D. Middleware: Wwise, Steam Audio, Google Resonance Audio, Oculus Audio SDK.
VR: DAW (Reaper) + plugins (e.g. Noise Makers ambi bundle).

Binaural
For headphones. Plays at each ear the same signal that would be created by an equivalent real sound source.

Binaural microphones
Small microphones placed inside the ears, or a binaural (dummy) head. (image sources: wikimedia commons)

Binaural synthesis
Capture a Head Related Transfer Function (HRTF) for each position of the virtual source. The HRTF captures how much each frequency is affected, in terms of gain and delay, at each ear. (source: wikimedia commons) Research centers have created publicly available HRTF sets (IRCAM Listen, KEMAR, etc). To pan a mono sound source, convolve it with the left and right HRTF filters.

Binaural synthesis - interpolation
What if the virtual source is not at one of the recorded positions of the HRTFs? Approach 1: find the nearest position, or interpolate the HRTFs of the closest positions.

Binaural synthesis - virtual ambisonics
Approach 2: virtual ambisonics: encode to ambisonics, decode to a virtual loudspeaker setup with each virtual speaker at a position with a measured HRTF, and apply convolution to each loudspeaker signal.

Binaural synthesis is used in VR experiences and digital games.
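As a concrete companion to the encode stage in the diagram above, here is a minimal C++ sketch (mine, not from the slides) of first-order B-format encoding of one mono sample; it assumes the traditional FuMa channel convention, in which W carries a 1/sqrt(2) gain, and the function name is illustrative.

// Illustrative sketch: first-order Ambisonics (B-format) encoding.
// FuMa convention: W (omni) is scaled by 1/sqrt(2).
#include <cmath>
#include <cstdio>

struct BFormat { float w, x, y, z; };

// azimuth: counter-clockwise from the front; elevation: up from the
// horizontal plane; both in radians.
BFormat encodeFirstOrder(float sample, float azimuth, float elevation) {
    BFormat b;
    b.w = sample * 0.7071068f;                               // omnidirectional
    b.x = sample * std::cos(azimuth) * std::cos(elevation);  // front-back
    b.y = sample * std::sin(azimuth) * std::cos(elevation);  // left-right
    b.z = sample * std::sin(elevation);                      // up-down
    return b;
}

int main() {
    const float kPi = 3.1415927f;
    // A unit sample arriving from 90 deg to the left, on the horizontal plane.
    BFormat b = encodeFirstOrder(1.0f, kPi / 2.0f, 0.0f);
    std::printf("W=%.3f X=%.3f Y=%.3f Z=%.3f\n", b.w, b.x, b.y, b.z);
    return 0;
}

Note that the four encoded channels say nothing about loudspeakers: a separate decode stage projects them onto whatever layout is available, which is why the encoded signal is independent of the playback system.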
Because VR requires wearing a headset, there are usually headphones built in. The VR headset usually has head-tracking, which can be used to move the sound image when the head rotates, adding realism. For VR experiences, usually some form of ambisonics-to-binaural conversion is used.

Simulating enclosed spaces - Reverberation
When using spatialization, different reverberation signals are usually generated for each loudspeaker. Different patterns of early reflections can be simulated for each loudspeaker or direction. The actual geometry of the space being simulated can be used to generate the reflection patterns. The ratio of dry signal to reverb signal can be changed taking into account the positions of the source and the listener (see the sketch below).

Steam Audio - Physics-Based Reverb
"Reflections and reverb can add a lot to spatial audio. Steam Audio uses the actual scene geometry to simulate reverb. This lets users sense the scene around them through subtle sound cues, an important addition to VR audio. ... Steam Audio can apply binaural rendering to occlusion, reverb, and sound propagation effects, so you can get a strong sense of space and direction, even from reflected sounds, reverb entering a room through a doorway, and more." (source)

Microsoft - Project Acoustics - Physics-Based Reverb
"Ray-based acoustics methods can check for occlusion using a single source-to-listener ray cast, or drive reverb by estimating local scene volume with a few rays. But these techniques can be unreliable because a pebble occludes as much as a boulder. Rays don't account for the way sound bends around objects, a phenomenon known as diffraction. Project Acoustics' simulation captures these effects using a wave-based simulation. The acoustics are more predictable, accurate and seamless." (source, demo)

Surround standards and formats
Spatialization is used in film (cinema theaters, home), television/streaming, computer games, music, theme parks, museums, etc. Different commercial standards have evolved. In the commercial context spatialization is usually called "surround" sound.

5.1 surround sound
(diagram: 5.1 layout - Center at 0°, Front-L/R at -30°/30°, Surround-L/R at -110°/110° around the listener)
An industry standard for multi-channel audio reproduction which specifies:
- 5 full-bandwidth channels: center at 0°, front L/R at -30° and 30°, back L/R at -110° and 110°.
- 1 low-frequency effects channel (for subwoofers).
Defined in Recommendation ITU-R BS.775-3 (08/2012).

How the 5 channels are created is up to the content creator; it is not specified by the standard. The 5 channels can be created with amplitude panning, microphones, etc. The low-frequency effects (LFE) channel should be used just for loud sounds with low-frequency content, such as explosions, and should contain frequencies only up to 120 Hz.

5.1 - Normal full-bandwidth system
Each of the 6 signals is sent directly to one of the 6 loudspeakers.

5.1 - Small loudspeakers - bass management
Systems where the main 5 loudspeakers cannot reproduce frequencies down to 20 Hz need bass management: the 5 main signals are split with crossovers, and the low-frequency parts are summed and sent to the subwoofer. This is required in order to reproduce the low frequencies present in the 5 main channels. (image source)

5.1 in cinema theaters
In cinemas the left and right surround channels are each sent to 6 loudspeakers surrounding the audience on each side (12 in total). Each listener will hear the surround from the loudspeaker pair which is closest.
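As promised above, here is a minimal C++ sketch (mine, not from the slides) of one way to derive the dry/reverb ratio from the source-listener distance; the inverse-distance law for the direct path is standard, but the simple crossfade and the minDistance parameter are illustrative assumptions, one of many possible mappings.

// Illustrative sketch: distance-dependent dry/wet reverb mix.
// The direct sound falls off roughly as 1/distance, while the diffuse
// reverberant field stays comparatively constant across the room.
#include <algorithm>
#include <cstdio>

struct Mix { float dry; float wet; };

// minDistance: below this the source is effectively at the listener.
Mix distanceMix(float distance, float minDistance) {
    float d = std::max(distance, minDistance);
    float dry = minDistance / d;  // inverse-distance law for the direct path
    float wet = 1.0f - dry;       // simple crossfade into the reverb return
    return { dry, wet };
}

int main() {
    const float distances[] = {1.0f, 2.0f, 5.0f, 20.0f};
    for (float d : distances) {
        Mix m = distanceMix(d, 1.0f);
        std::printf("distance %5.1f m -> dry %.2f  wet %.2f\n", d, m.dry, m.wet);
    }
    return 0;
}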
Other common speaker arrangements
7.1, 10.2, 22.2. Most films in 2018 are mixed in 7.1 (some also use an object-based approach: Dolby Atmos). The first film in 7.1 was 2010's Toy Story 3. Most games support 7.1.

Consumer and cinema theater surround formats

Dolby Digital
A format for encoding 5.1 audio channels with lossy compression. Used in film projection, DVDs, Blu-rays and game consoles. The compression technology is called AC3. In game consoles it can be sent through the optical TOSLINK port. Many film theaters still use Dolby Digital.

DTS
Another surround format, with a different lossy compression technology. Steven Spielberg was an investor; first used with Jurassic Park (1993).

Dolby TrueHD
Lossless. Up to 16 discrete audio channels, 24 bits, 192 kHz. Blu-ray Disc players and A/V receivers.

DTS-HD Master Audio (DTS-HD MA)
Lossless (lossy if the device doesn't support lossless). Up to 8 discrete audio channels, 24 bits, 192 kHz. Blu-ray Disc players and A/V receivers.

HDMI (v1.3)
A transmission/cable standard. Can transmit: up to 8 channels of uncompressed PCM audio; Dolby Digital; DTS; Dolby TrueHD; DTS-HD Master Audio; up to 8 channels of one-bit DSD audio. Game consoles send uncompressed audio directly to the A/V receiver. Typical setup with game consoles: console -> HDMI -> A/V receiver -> 7 loudspeakers + subwoofer.

Dolby Atmos (2012)
Object based. Up to 128 simultaneous independent audio objects, each with associated spatial audio description metadata (pan automation). Each audio track can be assigned directly to a loudspeaker or to an audio "object". By default there is a 10-channel 7.1.2 bed for ambience stems or center dialogue, leaving 118 tracks for objects. (The x in a.b.x refers to ceiling loudspeakers.)

Dolby Atmos for Headphones
Renders Dolby Atmos streams into 2 channels for headphones using binaural rendering. Support: Xbox or Windows PC with the Dolby Access app ($14.99). No PlayStation support.

Dolby Atmos + Unreal Engine (PC / Xbox) (source)

Windows Sonic (2017)
An audio platform for integrated spatial sound on Windows and Xbox; Dolby Atmos runs on top of Windows Sonic on PC / Xbox. It can abstract audio objects from the audio output format, so the developer doesn't need to care whether the user is on 5.1, 7.1 or headphones. Windows Sonic for Headphones provides binaural rendering for free.

Support for object-based spatialization in game engines and audio middleware
Dolby Atmos: Unity3D, Unreal Engine, FMOD, Wwise. Windows Sonic: FMOD.

Games using Dolby Atmos
Shadow of the Tomb Raider, Assassin's Creed Origins, Gears of War 4, Overwatch (full list).

Games using Windows Sonic
This article suggests the API being used is the same as for Dolby Atmos, so the list is the same as on the previous slide.

Systems for VR audio
Binaural rendering, possibly also complex room simulation, occlusion, etc.:
- Steam Audio (binaural, occlusion, physics-based reverb)
- Resonance Audio - Google VR (binaural, reverb, occlusion, directivity)
- Oculus (binaural, near-field rendering, shoebox model reverb)
- Microsoft Project Acoustics
- RealSpace 3D
- Auro3D

Study resources
Portuguese: "Introdução à Engenharia de Som"; Nuno Fonseca; FCA; 2012 - Chapter 7 (Espacialização e Surround)
English: "Modern Recording Techniques"; David Miles Huber, Robert E. Runstein; Focal Press; 8th edition; 2013 - Chapter 18 (Surround Sound)
Digital Sound and Music - Chapter 7 - Audio Processing (free, online)
Head-Related Transfer Functions and Virtual Auditory Display