Exploring the Depths of Diegesis: A Filmmaker’s Guide to the Different Types of Sound

Learn what diegetic and non-diegetic sound are, along with the rest of the virtual soundscape’s diegesis layers, and how to classify your sound sources according to the narrative (or diegesis) to create vivid transitions and your own style of storytelling.

In this article, I will try to present a way of organizing your project’s virtual soundscape, classifying your sound sources according to their relation with the narrative, or as it is also called, the diegesis.

This classification of your sound sources in the different layers of the virtual soundscape helps you:

  • Establish your style of audiovisual storytelling.
  • Organize your project’s sound sources.
  • Differentiate them through different filters and reverb, to deliver your storytelling better.
  • Create vivid transitions between the various layers, to highlight key events in your story.

First some philosophy

Two terms that we commonly find in articles about film sound are “diegetic” and “non-diegetic”. They separate the sound content according to whether or not it belongs to the narrative.

The theory dates as far back as 373 BC, when Plato examined the style of poetry in Book III of the “Republic”; Aristotle later argued for his own classification in the “Poetics”. Of course, we are not here to argue philosophically, although I am Greek and this is a classic stereotype, so I could act the part easily. 😛

We could also go full-on Wittgenstein on their tushies and start randomizing track names and mixing without listening to sound, just to make sure that our work is completely free of human error, but this is for another article.

About organization

It doesn’t matter if you are making a game or a film. To create a good mix, one usually groups sources together to create sub-groups that are easily managed and faster to manipulate.

Even someone who dislikes creating groups and prefers a flat hierarchy (which might be the best option for small productions) is still grouping: in that case, each sound is its own group.

Deep inside we like organizing. It allows us to keep things simple during fine-tuning and to create systems of logic that give our production our signature style. Design theory, based on cognitive science, argues that even unorganized content that seems right for some reason either follows basic rules of organization embedded in our brain or deliberately breaks those rules as a form of artistic self-expression.

Basic terms

First, let’s examine two special terms: “virtual soundscape” and “diegesis”.


If you are not familiar with what a soundscape is, you can read the article about soundscapes on Wikipedia. Of course, the term “soundscape” has many uses, depending on the industry field or science. At SoundFellas we use the term for the complete auditory picture that the listener receives, or as Pauline Oliveros states, “All of the waveforms faithfully transmitted to our audio cortex by the ear and its mechanisms”. In simple words, we use the term to describe everything that is received by the listener’s ear.

According to the definition above, the sound that your film or game is making is only one part of the soundscape, as the complete soundscape also includes the acoustic environment of the listener’s living room together with the sound coming in from outside their home.

Virtual soundscape

To make this distinction, we propose the term “virtual soundscape” for the sounds that come only from your film or game. So the soundscape you experience when watching a movie in your living room is a combination of the virtual soundscape produced by the media you are consuming and the real soundscape produced by the real environment in which you are actually present. Juxtaposing, layering, or interpolating between those two soundscapes is of great importance in the field of augmented reality and a core part of the audio narrative of the entertainment layer of the Magicverse, as proposed by Magic Leap.


“Diegesis” is a style of fiction storytelling that presents an interior view of a world, and actually a very cool way to say “narrative”. In films and games we always have a “narrator”, at least in the form of the camera, which shows us the world from a specific and guided point of view (deterministic in films, stochastic in games).

This is the level that we will use as a pivot to classify our sound elements. You can read more in the article about diegesis on Wikipedia.
In the next paragraphs, we are going to present the classic “diegetic” and “non-diegetic” layers, but also expand on some more specific layers.

Organizing your sounds according to the soundscape layers and your narrative is an important part of creating your own style and giving your audiences the best sound experience possible.

Figuring out how you are going to organize your project’s sound sources, from the perspective of the narrative (or diegesis), tells you which sounds you should gather on set and how you should gather them. Based on this knowledge, you can also make an early list of the sounds that you will need to find in stock libraries.

Virtual soundscape layers


Diegetic sounds are the sounds that come from within the scene in which the story’s action is happening. The term comes from the Greek word “diegesis”, which means “narration” or “narrative”. Those sounds also include sources that don’t appear on screen but that the audience understands as coming from the world of the story.


  • Character dialog.
  • Footsteps of the characters.
  • Sounds from the environment that our characters are presented in.

Aliases: Intra-diegetic.


Non-diegetic sounds are any sounds that originate outside of the story’s scene. This category is easy to understand if you think of it as the sounds that the story’s characters are unable to hear.


  • Narrator’s voice.
  • Film score.
  • God’s voice.

Aliases: non-literal, commentary.


Trans-diegetic sounds are any sounds that move, in either direction, between the diegetic and non-diegetic layers. Those sounds are of great assistance when we need a cool way to link a transition, like the transition between two different scenes.


  • A character whistles a tune (diegetic) and gradually that tune is taken over by an orchestra (non-diegetic) as part of the music score of the movie.
  • A narrator is heard reading a poem (non-diegetic) at the beginning of the movie. Then the voice fades away and a song with the same lyrics is heard, performed by a band, as the picture slowly fades into the bar where that band is playing (diegetic).
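
A hand-over like the whistled tune giving way to the orchestra is often mixed as a crossfade between the two layers. The following is a minimal sketch, assuming an equal-power crossfade (a common mixing technique, not something the examples above prescribe). The function name `equal_power_gains` and the `progress` parameter are hypothetical names chosen for illustration.

```python
import math


def equal_power_gains(progress):
    """Return (outgoing_gain, incoming_gain) for a crossfade.

    progress runs from 0.0 (only the outgoing sound) to 1.0 (only the
    incoming sound). The squares of the two gains always sum to 1, which
    keeps the combined power roughly constant during the transition.
    """
    clamped = min(max(progress, 0.0), 1.0)  # keep progress inside [0, 1]
    angle = clamped * math.pi / 2
    return math.cos(angle), math.sin(angle)


# Walk through the transition in five steps: the diegetic whistle fades
# out while the non-diegetic orchestra fades in.
for step in range(5):
    progress = step / 4
    whistle_gain, orchestra_gain = equal_power_gains(progress)
    print(f"{progress:.2f}  whistle={whistle_gain:.3f}  orchestra={orchestra_gain:.3f}")
```

In a real project the same curve would drive the faders of the two layer groups in your mixing tool; the loop here only prints the gain pairs so the shape of the transition is visible.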


Acousmatic sounds are the sounds that are heard without any cause presented in the story’s world. Wikipedia also says that “the word acousmatic, from the French acousmatique, is derived from the Greek word akousmatikoi (ακουσματικοί), which referred to probationary pupils of the philosopher Pythagoras who were required to sit in absolute silence while they listened to him deliver his lecture from behind a veil or screen to make them better concentrate on his teachings”.

At SoundFellas we use this term as described by the French writer and composer Michel Chion in reference to the use of off-screen sound in film. Of course, we also use the term to describe the same set of sounds in games and other media. Primarily, it describes sounds that come from the speakers located next to the visual display(s), are not linked with any visible cause in the story’s world, and are nevertheless not outside the story’s world.

Visual-to-acousmatic and acousmatic-to-visual

These are two very useful definitions that we use at SoundFellas. The theory comes directly from Michel Chion; the terms are our own, but you can read about the source of the theory in the article about acousmatic sound on Wikipedia. According to Michel Chion, as summarized there: “the acousmatic situation can arise in two different ways: the source of a sound is seen first and is then “acousmatized”, or the sound is initially acousmatic with the source being revealed subsequently. The first scenario allows the association of a sound with a specific image from the outset, Chion calls this visualized sound (what Schaeffer referred to as direct sound). In this case, it becomes an “embodied” sound, “identified with an image, demythologized, classified”. In the second instance, the sound source remains veiled for some time, to heighten tension, and is only later revealed, a dramatic feature that is commonly used in mystery and suspense based cinema; this has the effect of “de-acousmatizing” the initially hidden source of the sound (Chion 1994, 72). Chion states that “the opposition between visualized and acousmatic provides a basis for the fundamental audiovisual notion of offscreen space” (Chion 1994, 73)”.


  • Acousmatic: You see a character’s face in front of a wall background and nothing else. You hear cars passing by and people walking and chatting nearby. You never see a busy town setting but you assume that the scene is located in one.
  • Visual-to-acousmatic: A car is shown starting its engine and driving out of the picture. The sound continues even after the car has left the screen, mixed more and more with the background and reverberation to denote that the car is moving away from the current location.
  • Acousmatic-to-visual: Our character is searching for a strange vocal sound heard in the dark night, holding only a candle as a source of light, as the house lights don’t seem to turn on for some strange reason. The strange vocalizations seem to come from all over the place. Slowly the vocals get clearer and a word starts to form. The candle suddenly burns bright and a little girl appears in front of our character’s eyes saying “help me” with the same voice we heard all around. The little girl then disappears as all the house lights burn bright and their bulbs break; the candle goes out in sync with the lights, plunging the scene into complete darkness. I’m good at this! 😀


Extra-diegetic sound is actually a sub-category of the “diegetic” type.

We encounter this type very often in video games, but it can also be used in films, especially in comedies that break the fourth wall. It describes the sounds that belong to visual elements that are part of the game’s world but that the game characters are not aware of. All the sounds that come from the usual GUI elements of a game are considered extra-diegetic.


  • Score number counting and tier-passed achievement cues.
  • Health status bar filling or critical alerts.
  • Inventory list changes or alerts like weapons running out of bullets.


As we discussed above, the diegetic type can be divided further into two types. The first is the extra-diegetic, and that leaves room for the second type, the intra-diegetic. This is what we simply call the diegetic in films, which usually don’t include extra visual interface elements that are part of the action but that the characters of the story are not aware of. That kind of meta-data presentation is not used in films. So if you are not making games or some kind of gamified film production, this term is probably not of much interest to you. If you are making games, you can just use the term interchangeably with “diegetic”.

Special mention must be made here of GUI designs in games that are intra-diegetic. For example, as described on Wikipedia, in the video game series “Dead Space” the player-character is equipped with an advanced survival suit that projects holographic images to the character within the game’s rendering engine. Those images also serve as the game’s user interface, showing the player weapon selection, inventory management, and special actions that can be taken. That creates a very nice link between the character’s world and the player’s world, and it can also be found in video games featuring vehicle simulations.


  • See the examples of the “diegetic” layer at the beginning of the article.

Aliases: Diegetic.


Meta-diegetic is more of a philosophical term, used when a narrative happens inside another narrative. The term is not used often but can be useful in special cases.


  • A character in our movie regularly watches a series on her television. The show features a very specific musical cue that happens outside of the series’ world but is part of the series experience that our character is watching. This can be characterized as a “meta-diegetic – non-diegetic music cue”. Crazy, right?
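
The layers described above can be written down as a small taxonomy that tags each sound source in a project. The following is a minimal sketch in Python, under the assumption that each source belongs to exactly one layer at a time; sounds that move between layers are transitions rather than fixed tags, so they are not modelled here. The names `Layer`, `SoundSource`, and `group_by_layer` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Layer(Enum):
    DIEGETIC = "diegetic"              # heard by the characters (alias: intra-diegetic)
    NON_DIEGETIC = "non-diegetic"      # score, narrator: characters cannot hear it
    ACOUSMATIC = "acousmatic"          # in the story's world, but its cause is not shown
    EXTRA_DIEGETIC = "extra-diegetic"  # GUI sounds the characters are not aware of
    META_DIEGETIC = "meta-diegetic"    # belongs to a narrative inside the narrative


@dataclass(frozen=True)
class SoundSource:
    name: str
    layer: Layer


def group_by_layer(sources):
    """Return a dict mapping each used layer to the names of its sources."""
    groups = defaultdict(list)
    for source in sources:
        groups[source.layer].append(source.name)
    return dict(groups)


# Example sources taken from the lists in this article.
sources = [
    SoundSource("character dialog", Layer.DIEGETIC),
    SoundSource("footsteps", Layer.DIEGETIC),
    SoundSource("film score", Layer.NON_DIEGETIC),
    SoundSource("narrator's voice", Layer.NON_DIEGETIC),
    SoundSource("offscreen traffic", Layer.ACOUSMATIC),
    SoundSource("health bar alert", Layer.EXTRA_DIEGETIC),
]

groups = group_by_layer(sources)
for layer, names in groups.items():
    print(f"{layer.value}: {', '.join(names)}")
```

A list like this can double as the early shopping list of sounds to record on set or to find in stock libraries, grouped by the layer they will live in.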


So, what are the advantages of using those layers to organize your project’s sound sources? Here are some good reasons and of course I expect that your creative nature will invent even more.

  • Grouping your sound sources according to their relationship with the narrative action is a great way to introduce new ways of telling your story through sound.
    Interesting transitions from one layer to another can be used together with the picture transitions to highlight key points in your story.
  • Grouping can also be made with the use of different reverberation and sound positioning in the soundscape, to create a clear distinction between the sound sources within the story’s world and those outside of it.
  • Figuring out, early in your production, the layering that will best serve your storytelling makes it easier to communicate vital information about the sound style of your project to your whole team and to external stakeholders.
  • Organizing your sounds according to the soundscape layers and your narrative is an important part of creating your own style and giving your audiences the best sound experience possible.
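
To make the point about reverberation and positioning concrete, here is a minimal sketch of a per-layer settings table, assuming one mix group (bus) per layer. Every value below is a hypothetical starting point rather than a recommendation, and the names `BusSettings`, `LAYER_BUSES`, and `describe` are invented for this illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BusSettings:
    reverb_send_db: Optional[float]  # None means no reverb send at all
    lowpass_hz: Optional[float]      # None means no low-pass filter
    placement: str                   # free-form positioning note for the mixer


# Hypothetical starting values; tune them by ear for each project.
LAYER_BUSES = {
    "diegetic": BusSettings(-12.0, None, "panned to on-screen position"),
    "acousmatic": BusSettings(-6.0, 8000.0, "wide, behind the picture"),
    "non-diegetic": BusSettings(None, None, "front, outside the scene's room"),
}


def describe(layer):
    """Return a one-line, human-readable summary of a layer's bus settings."""
    settings = LAYER_BUSES[layer]
    if settings.reverb_send_db is None:
        reverb = "dry"
    else:
        reverb = f"reverb send {settings.reverb_send_db:.1f} dB"
    if settings.lowpass_hz is None:
        lowpass = "full band"
    else:
        lowpass = f"low-pass at {settings.lowpass_hz:.0f} Hz"
    return f"{layer}: {reverb}, {lowpass}, {settings.placement}"


for layer in LAYER_BUSES:
    print(describe(layer))
```

Keeping the story-world layers in the scene's reverb and the outside layers dry is only one possible convention; the point is that the distinction is decided once per layer instead of once per sound.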

So, what are you waiting for? From diegetic to acousmatic and all the layers in between, start designing the choreography and levels of your own project, and give your audiences the best sound experience possible.

At SoundFellas we can help you with that. Apart from hiring us to design and create the audio of your project, you will also find that our products are specialized for contemporary production, because:

  • Our Ambience Kits are broken down into loops, isolated sounds, and noiseprints and contain stereo, surround, and 3D binaural formats to match your preference.
  • Our Sound Effect libraries contain many variations for each type of sound and are organized in an intuitive way to help you incorporate them fast in your authoring environment.
  • Our Production Music libraries are already mastered to leave space for the rest of the sounds, like sound fx, dialog, and ambience, and are offered in stereo, surround, and 3D binaural so you can have truly immersive music in your projects.

By organizing your audio material using the taxonomy proposed in this article, you can keep value-adding consistency in your work, but also, very importantly, establish your own audiovisual storytelling style.
