A perceived loudness normalization pipeline that mimics nature, specialized for audio assets used between the pre-production and production stages, perfected for media authoring.
Loudness normalization is one of the most important technical considerations for delivering content.
At the production stage, the standards are still archaic, holding back any producer who wants to combine sounds from many sources.
AuthorNorm™ combines logic that mimics nature, taxonomic principles from acoustic ecology, and nature-to-machine dynamic range mapping with one mission: to provide audio assets that work right from the start and need only subtractive mixing to tweak to taste. This is the fastest and safest way to work with sound.
By using audio assets conformed with AuthorNorm™, you get:
While normalizing an audio sample seems an easy task, there’s more to it than meets the eye.
If you normalize by peak level, your final samples will not be perceived as equally loud, because of how human hearing works.
And when you normalize based on perceived loudness, the peak levels might become too high and get clipped, producing non-harmonic distortion (the bad kind that you don’t want).
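To make the trade-off concrete, here is a minimal Python sketch. It assumes floating-point samples between -1.0 and 1.0 and uses plain RMS level as a crude stand-in for perceived loudness; it is not AuthorNorm™, and the function names and target levels are illustrative only.

```python
import math

def peak_dbfs(samples):
    """Highest absolute sample value, in dB relative to full scale (1.0)."""
    peak = max(abs(s) for s in samples)
    return 20 * math.log10(peak) if peak > 0 else float("-inf")

def rms_dbfs(samples):
    """Root-mean-square level in dBFS (a crude loudness proxy)."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10 * math.log10(mean_square) if mean_square > 0 else float("-inf")

def apply_gain_db(samples, gain_db):
    factor = 10 ** (gain_db / 20)
    return [s * factor for s in samples]

def peak_normalize(samples, target_dbfs=-1.0):
    """Scale so the highest peak lands exactly on target_dbfs."""
    return apply_gain_db(samples, target_dbfs - peak_dbfs(samples))

def rms_normalize(samples, target_dbfs=-12.0):
    """Scale to a target RMS level and count samples that would exceed full scale."""
    out = apply_gain_db(samples, target_dbfs - rms_dbfs(samples))
    clipped = sum(1 for s in out if abs(s) > 1.0)
    return out, clipped

SAMPLE_RATE = 48000
# One second of a sustained 440 Hz tone at half scale.
tone = [0.5 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(SAMPLE_RATE)]
# A 1 ms full-scale click followed by silence (a very short one-shot).
click = [1.0 if n < 48 else 0.0 for n in range(SAMPLE_RATE)]
```

After `peak_normalize`, both signals peak at -1 dBFS, yet the tone’s RMS level is more than 20 dB above the click’s. With `rms_normalize` at -12 dBFS the tone stays below full scale, while every one of the click’s 48 non-zero samples would exceed it and clip.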
Furthermore, the production stage calls for specialized perceived loudness normalization, and algorithms made for finalized deliverables are not tuned for that kind of content. As a result, short samples that are too brief to be measured either remain unprocessed or get clipped, while long samples are overcompensated and sound quieter than the rest.
In both visuals and audio, what you want from proper normalization is a balance tuned to how our senses perceive the world around us. Here we see both a waveform and a landscape picture.
In the waveform we observe healthy peaks that preserve all the detail of the highest and lowest levels of sound vibration. The peaks do not reach the maximum level and there is still headroom, because the levels are based on human perception and not just on sample measurements.
In the landscape picture below the waveform we observe the same characteristics. The levels are balanced for human visual perception, and we can see every detail, from the grass near the rock shadows to the distant clouds in the sky.
These assets are good for production; they can be used in any composition and manipulated further to achieve the intended aesthetic.
In the same example, using normalization that is not properly tuned for production assets, we can observe what happens to both audio and visuals.
The sound samples are clipped because the perceived loudness normalization that was used is not made for isolated, short-duration one-shots. The resulting digital clipping destroys many details in the upper levels of the signal, and they cannot be recovered. This might sound impactful when you are searching for sounds and listening to demos, but using such samples in your production is a mistake.
In the landscape picture we see very vivid colors at first glance, but on closer inspection all the details near the stone shadows and in the clouds of the distant sky are lost. There are also unnatural color alterations, like the excessive blue of the nearby stone.
These assets are bad for production and introduce multiple issues when put in a composition, while their destructive styling prevents any further aesthetic tuning to fit your project.
After we developed AuthorNorm™, we conducted an online experiment in the form of a double-blind listening survey. In that survey we asked the subjects to listen to a playlist of six tracks. Each track contained a series of sound samples spanning the full frequency spectrum of human hearing, processed with a different normalization algorithm for each track. We used the most common normalization algorithms that sound library vendors use, and we also included the original unprocessed track and a track processed with our own AuthorNorm™.
After a free listening session with no time restrictions, the subjects voted for the track that featured the most equal loudness throughout the frequency spectrum. Here is the track list we used for the experiment:
The results confirmed our hypothesis that a perceived loudness normalization algorithm specialized for assets used in the media authoring stage would provide better results than any non-specialized algorithm currently available.
Take a look for yourself:
As you can see above, AuthorNorm™ gathered the most votes for being perceived as having the most equal loudness across the entire frequency spectrum of human hearing. In psychoacoustics, a result above 50% in this kind of experiment is considered valid, and by achieving 54.2% we validated our hypothesis.
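With six tracks in the playlist, a listener voting at random would pick any given track about 1/6 (16.7%) of the time, so one generic way to quantify a vote share is an exact one-sided binomial test against that chance level. The Python sketch below is an illustration, not the analysis used in the study. The number of respondents is not stated here, so the counts in the example (13 of 24 votes, which is 54.2%) are hypothetical.

```python
from math import comb

def binomial_tail_p(votes, respondents, chance):
    """One-sided exact binomial test: probability of at least `votes` picks
    out of `respondents` if every respondent chose a track at random."""
    return sum(
        comb(respondents, k) * chance**k * (1 - chance) ** (respondents - k)
        for k in range(votes, respondents + 1)
    )

CHANCE_SIX_TRACKS = 1 / 6  # six tracks in the playlist

# Hypothetical counts: the real number of respondents is not reported above.
p_value = binomial_tail_p(13, 24, CHANCE_SIX_TRACKS)
```

A small `p_value` means the vote share is unlikely under random voting; the same function can be rerun with the study’s actual respondent count.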
For the scientists out there: our research is under internal review before being submitted for peer review, but the results are so beneficial to our production that we already use AuthorNorm™ to normalize our sound libraries, and offer it as part of our mastering service for clients, with very positive results.
To remain true to the way humans perceive loudness, we needed to fine-tune for different types of timbre, so we conducted further listening experiments to tweak our algorithm to perfection for any type of sound sample.
After testing with more sound effects and musical instrument tones, we arrived at a normalization technique that works across a variety of timbres, with a statistical difference of less than one decibel across the complete frequency spectrum of human hearing.
One decibel is roughly the smallest level difference that humans need in order to perceive a change in loudness between two sounds. Keeping the statistical trendline under 1 dB means that AuthorNorm™ can normalize the perceived loudness of any kind of sound effect or musical sample in the way most compatible with human hearing.
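Because decibels are logarithmic, a 1 dB step corresponds to a modest change in the underlying signal. This short Python sketch of the standard conversions (20·log10 for amplitude ratios, 10·log10 for power ratios) shows the scale involved; the function names are illustrative.

```python
import math

def db_to_amplitude_ratio(db):
    """Linear amplitude (sample value) ratio for a level change in dB."""
    return 10 ** (db / 20)

def db_to_power_ratio(db):
    """Linear power (energy) ratio for a level change in dB."""
    return 10 ** (db / 10)

def amplitude_ratio_to_db(ratio):
    return 20 * math.log10(ratio)

for change_db in (1, 6, 20, -24):
    print(f"{change_db:+d} dB -> amplitude x{db_to_amplitude_ratio(change_db):.3f}, "
          f"power x{db_to_power_ratio(change_db):.3f}")
```

So 1 dB is roughly a 12% change in amplitude, or about 26% in power, while the +20 dB and -24 dB gain changes discussed in the ambience section correspond to amplitude factors of 10 and about 0.063.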
See below for a comparison of the trendlines of all the algorithms we used in our experiments:
In the diagram above you can see that the AuthorNorm™ trendline never goes above 1 dB, which statistically means that it produces a stable output with no perceived inequalities across the human hearing spectrum.
While the loudness contour RMS seems to perform very well in the lower frequency range, after the middle frequencies it starts rising quickly, and the differences become audible (more than 1 dB of perceived difference) from the mid-high range upward.
The legacy loudness and total RMS algorithms perform similarly, the only difference being that total RMS performs somewhat better for production assets in the low range, as it takes into account the RMS measurement of the whole file rather than one time window at a time. Nevertheless, in our listening experiments both algorithms showed perceived differences between samples in the low-mid, low, and sub ranges, which was expected, as RMS is a simple energy calculation that doesn’t take into account how human perception works.
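The difference between measuring the whole file and measuring one time window at a time is easy to see in code. In this Python sketch, total RMS is the RMS of every sample in the file, and the windowed variant reports the loudest fixed-length window. The 50 ms window and the loudest-window rule are assumptions for illustration, not the exact definitions behind the algorithms compared in our experiments.

```python
import math

def rms_db(samples):
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10 * math.log10(mean_square) if mean_square > 0 else float("-inf")

def total_rms_db(samples):
    """RMS over the entire file."""
    return rms_db(samples)

def max_window_rms_db(samples, sample_rate, window_s=0.05):
    """RMS of the loudest non-overlapping window (assumed window length)."""
    n = max(1, round(window_s * sample_rate))
    if len(samples) <= n:
        return rms_db(samples)
    levels = [rms_db(samples[i:i + n]) for i in range(0, len(samples) - n + 1, n)]
    return max(levels)

SR = 48000
# One second of a steady 1 kHz tone at half scale.
sine = [0.5 * math.sin(2 * math.pi * 1000 * t / SR) for t in range(SR)]
# A 100 ms hit followed by 1.9 s of silence.
hit = sine[: SR // 10] + [0.0] * (SR * 19 // 10)
```

For the steady tone the two readings agree, but for the 100 ms hit inside a two-second file the whole-file RMS reads about 13 dB lower than the loudest window, because the silence is averaged in.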
Finally, ITU-R BS.1770, a popular choice among sound library developers and sampled instrument makers, showed inconsistencies throughout the complete frequency spectrum, as you can see from the diagram above. That is not to say that the algorithm is bad; on the contrary, it is one of the best algorithms we have for measuring perceived loudness. The problem is that it was created for long-duration final material for broadcast, and when it is used for short to medium samples like the ones found in sound libraries and sampled instruments, it fails to provide consistent results. Its use by production asset developers and sound designers can only be attributed to the popularity the algorithm has gained over the years due to over-promotion from the industry.
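To illustrate why very short samples are a problem for ITU-R BS.1770, here is a simplified, pure-Python sketch of its gated integrated loudness for a single channel at 48 kHz. It applies K-weighting (a high-shelf and a high-pass biquad using the 48 kHz coefficients tabulated in the recommendation), measures 400 ms blocks with 75% overlap, then applies an absolute gate at -70 LKFS and a relative gate 10 LU below the mean of the remaining blocks. It omits multichannel weighting and resampling, so treat it as a teaching aid rather than a compliant meter.

```python
import math

# K-weighting coefficients for fs = 48 kHz, as tabulated in ITU-R BS.1770.
SHELF_B = (1.53512485958697, -2.69169618940638, 1.19839281085285)
SHELF_A = (1.0, -1.69065929318241, 0.73248077421585)
HIGHPASS_B = (1.0, -2.0, 1.0)
HIGHPASS_A = (1.0, -1.99004745483398, 0.99007225036621)

def biquad(x, b, a):
    """Direct-form I second-order IIR filter (a[0] assumed to be 1)."""
    out = []
    x1 = x2 = y1 = y2 = 0.0
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        out.append(yn)
    return out

def block_loudness(mean_square):
    return -0.691 + 10 * math.log10(mean_square)

def integrated_loudness(samples, sample_rate=48000):
    """Gated loudness in LKFS for one channel, or None if it cannot be measured."""
    if sample_rate != 48000:
        raise ValueError("this sketch only carries the 48 kHz coefficients")
    k = biquad(biquad(samples, SHELF_B, SHELF_A), HIGHPASS_B, HIGHPASS_A)
    size, step = 19200, 4800  # 400 ms blocks, 100 ms hop (75% overlap)
    powers = [sum(s * s for s in k[i:i + size]) / size
              for i in range(0, len(k) - size + 1, step)]
    # Absolute gate at -70 LKFS (silent blocks are dropped before taking the log).
    gated = [p for p in powers if p > 0 and block_loudness(p) > -70.0]
    if not gated:
        return None  # shorter than one 400 ms block, or silent
    relative_gate = block_loudness(sum(gated) / len(gated)) - 10.0
    gated = [p for p in gated if block_loudness(p) > relative_gate]
    return block_loudness(sum(gated) / len(gated))
```

Under this sketch, a two-second full-scale 997 Hz sine reads approximately -3.01 LKFS, while a 200 ms one-shot returns `None` because it never fills a single gating block.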
Below you can see a one-on-one comparison of AuthorNorm™ with the rest of the algorithms used in our experiments.
AuthorNorm™ uses a different pipeline for music. Because music differs from sound effects, voices, and ambience noise, it needs a different approach.
By using a custom loudness curve together with a time window tailored to any tempo and material, we achieve consistent loudness between different genres of music, which is important because many videogames and films use a variety of music styles to complement their storytelling.
Furthermore, the music tracks are loud enough to support a wide range of dynamics, from ambient and classical to heavy metal and EDM.
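The exact music curve and window are not described here, but the general idea of a tempo-aware window can be sketched. In this hypothetical Python example the analysis window spans a fixed number of bars at the track’s tempo (`bars` and `beats_per_bar` are assumed parameters), and the track’s level is summarized as the median RMS across those windows, so that a fast track and a slow track are measured over the same musical length rather than the same number of seconds.

```python
import math
from statistics import median

def tempo_window_samples(bpm, sample_rate, bars=4, beats_per_bar=4):
    """Hypothetical window: `bars` bars of music at the given tempo, in samples."""
    seconds = bars * beats_per_bar * 60.0 / bpm
    return round(seconds * sample_rate)

def median_window_rms_db(samples, sample_rate, bpm, bars=4, beats_per_bar=4):
    """Median RMS level (dBFS) over consecutive tempo-sized windows."""
    if not samples:
        raise ValueError("no samples to measure")
    n = tempo_window_samples(bpm, sample_rate, bars, beats_per_bar)
    # Fall back to the whole track if it is shorter than one window.
    windows = [samples[i:i + n] for i in range(0, len(samples) - n + 1, n)] or [samples]
    levels = []
    for w in windows:
        mean_square = sum(s * s for s in w) / len(w)
        levels.append(10 * math.log10(mean_square) if mean_square > 0 else float("-inf"))
    return median(levels)
```

The median is one possible design choice: it keeps a single quiet intro or loud drop from dominating the summary of a whole track.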
We also offer our AuthorNorm™ loudness mastering as a service, starting from $15 USD per track. Our process can be applied to any style of music in any immersive multichannel format, and ensures that your song will sound loud and energetic on any platform.
One of the most interesting parts of our investigation in loudness normalization was the approach that we had to take in normalizing ambience soundscapes and noise prints (also known as roomtones).
While we were analyzing hundreds of soundscape recordings and their extracted noise prints from our Ambience Kits production line, we observed that with the noisiest material, changing the level changes the character so much that a listener might confuse one roomtone for another, which is perfectly natural.
With just 20 dB of added gain, the ambience noise of a quiet forest can sound very similar to the power supply unit of industrial machinery. That, of course, can go either way: a recording of a laboratory’s cryogenics room can be mistaken for an everyday living room if you subtract 24 dB of gain.
To make a long story short, we decided that for this part of our production we needed to make AuthorNorm™ a human-assisted tool.
So we created a map of the real-world sound levels of various events and environments, and matched those values to values we obtained by analyzing the loudness of the same events, isolated from hundreds of hit movies and videogames.
By combining natural sound levels with the natural selection of media that the human ear prefers and enjoys, analyzed through a semi-automated, human-assisted process, we obtain a precise match between each type of environment and its equivalent virtual representation in loudness units. See the image below for a high-level idea of how our mapping works.
The actual internal map that we use is far more complex, and it also includes levels above safe hearing thresholds and how to simulate them in a virtual environment.
We use that map together with our dataset to normalize the loudness of the soundscapes and noise prints from our Ambience Kits, so that every loop you put in your project will sound natural right from the beginning. Of course you can tweak to taste, but having a good starting point always helps.
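The internal map itself is not published, but the mechanism described above (look up the real-world level of an environment, translate it into a target loudness, then apply the gain needed to reach it) can be sketched with piecewise-linear interpolation. In this Python sketch the anchor points are placeholders supplied by the caller, not AuthorNorm™ values; levels outside the mapped range are simply clamped, and `map_spl_to_target` and `gain_to_target` are hypothetical names.

```python
from bisect import bisect_right

def map_spl_to_target(spl_db, anchors):
    """Interpolate a target loudness from (real-world dB SPL, target LUFS) anchor pairs."""
    points = sorted(anchors)
    xs = [x for x, _ in points]
    if spl_db <= xs[0]:
        return points[0][1]  # clamp below the lowest mapped level
    if spl_db >= xs[-1]:
        return points[-1][1]  # clamp above the highest mapped level
    i = bisect_right(xs, spl_db)
    (x0, y0), (x1, y1) = points[i - 1], points[i]
    return y0 + (spl_db - x0) / (x1 - x0) * (y1 - y0)

def gain_to_target(measured_lufs, spl_db, anchors):
    """Gain in dB that moves a loop from its measured loudness to the mapped target."""
    return map_spl_to_target(spl_db, anchors) - measured_lufs

# Placeholder anchors for illustration only: (real-world dB SPL, target LUFS).
EXAMPLE_ANCHORS = [(0.0, -50.0), (100.0, -10.0)]
```

In a human-assisted workflow, the anchor list is where curated measurements would go; the interpolation only fills in the levels between them.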
Feedback from our beta-testing team of filmmakers and game producers has been very positive, with many of them mentioning that they didn’t even need to change the level of the soundscapes or noise prints when they composed the acoustic environments of their scenes.
All our audio libraries are normalized using AuthorNorm™, conforming them to the perfect loudness level for media production environments. You get audio assets that sound natural right from the start, making it easier for you to compose your project.
By adhering to the practices of a modern media production workflow, you can rely on subtractive mixing, which is the safest and fastest way to mix audio for both linear and interactive media.
Game producers, filmmakers, virtual reality experience creators, and all other media producers can spend their time creating beautiful things instead of trying to guess levels.
Every audio asset you choose from our libraries is mastered with AuthorNorm™, and if you have already found all the sounds for your project, we can help you achieve better sound and an easier mix by mastering them for you before you start compositing your scenes. Our mastering service, which includes the AuthorNorm™ process, starts from $15 USD per song or soundscape loop, and $15 USD per 50 sound effects. The prices are even better for larger volumes, so for specific pricing please contact us directly with your needs.