The term cocktail party effect was coined by the British cognitive scientist Colin Cherry in the 1950s. He wanted to understand how people listen, and ran a few experiments to find out. In his first experiment, he played two different overlapping messages, recorded in the voice of the same person, through headphones. The participants were asked to listen carefully and try to write one of the messages on paper. If they concentrated hard enough, they usually succeeded.
Now, suppose someone asks you to describe the cocktail party effect. The formal definition is as follows:
Cocktail Party Effect Definition:
The cocktail party effect is the phenomenon of being able to focus one’s auditory attention on a particular stimulus while filtering out a range of other stimuli, much the same way that a partygoer can focus on a single conversation in a noisy room.
Cocktail Party Effect Psychology
Imagine yourself at a party with tens of people around, all trying to talk to each other. There are overlapping voices, music playing, glasses clinking and whatnot. Among that cacophony of sounds is a friend speaking in front of you, not much louder than the background noise itself. You can still make out what he is saying.
There’s something about human speech, the auditory system and the high-level language processing system that lets you direct highly selective attention towards your friend, so that you can listen to him talk as if everything in the background were muted. It happens so naturally and so subtly that you might never appreciate the out-of-this-world processing your brain is doing to make sense of your friend’s speech at such events.
This effect, known as the cocktail party effect, has been known for a long time, and exactly how the human brain manages it has baffled scientists for years. However, years of research and the rise in computing power have enabled some amazing breakthroughs in this area. Take the following experiment, for example.
How does the cocktail party effect work for computers?
Let’s say you and another person are talking at the same time at a cocktail party, and two microphones are placed some distance apart. Both microphones will record both of your voices. Picking out just one voice, or at least making a computer do that, may sound like an extremely tough job. But here’s the thing. Your voice is recorded slightly louder by the microphone closer to you and slightly fainter by the other one. If both recordings are passed through a single, very intelligent line of code, that code can output two nearly clean files: your voice in one and the other person’s in the second. This single line of code is the cocktail party algorithm; its generic name is Independent Component Analysis (ICA). ICA is a special case of something called Blind Source Separation (BSS), also known as Blind Signal Separation. It involves a fair amount of linear algebra, and the version described here uses something called the singular value decomposition (SVD).
Watch a demonstration that plays the microphone inputs and the output from the code. A cocktail party effect example is below:
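To make the idea a bit more concrete, here is a minimal sketch in Python. It is not the one-line algorithm from the demonstration. It is a simplified, brute-force variant of ICA for exactly two microphones: it whitens the recordings and then tries rotations until the outputs look as non-Gaussian as possible, measured by kurtosis. The two "voices" are synthetic signals (a sine and a sawtooth), and the mixing weights and all function names are made up for illustration.

```python
import math

def center(x):
    """Subtract the mean so the signal averages to zero."""
    m = sum(x) / len(x)
    return [v - m for v in x]

def whiten(x1, x2):
    """Rotate and rescale two centered recordings so that they are
    uncorrelated and each has unit variance."""
    n = len(x1)
    a = sum(u * u for u in x1) / n
    b = sum(u * v for u, v in zip(x1, x2)) / n
    c = sum(v * v for v in x2) / n
    # Angle that diagonalizes the 2x2 covariance matrix [[a, b], [b, c]].
    theta = 0.5 * math.atan2(2 * b, a - c)
    ct, st = math.cos(theta), math.sin(theta)
    p1 = [ct * u + st * v for u, v in zip(x1, x2)]
    p2 = [-st * u + ct * v for u, v in zip(x1, x2)]
    s1 = math.sqrt(sum(u * u for u in p1) / n)
    s2 = math.sqrt(sum(v * v for v in p2) / n)
    return [u / s1 for u in p1], [v / s2 for v in p2]

def excess_kurtosis(y):
    """Zero for a Gaussian signal; nonzero means non-Gaussian."""
    n = len(y)
    m2 = sum(v * v for v in y) / n
    m4 = sum(v ** 4 for v in y) / n
    return m4 / (m2 * m2) - 3.0

def separate(x1, x2, steps=90):
    """Toy two-channel ICA: whiten, then grid-search the rotation
    that maximizes the total absolute excess kurtosis."""
    z1, z2 = whiten(center(x1), center(x2))
    best_score, best_pair = -1.0, None
    for k in range(steps):
        phi = (math.pi / 2) * k / steps
        c, s = math.cos(phi), math.sin(phi)
        y1 = [c * u + s * v for u, v in zip(z1, z2)]
        y2 = [-s * u + c * v for u, v in zip(z1, z2)]
        score = abs(excess_kurtosis(y1)) + abs(excess_kurtosis(y2))
        if score > best_score:
            best_score, best_pair = score, (y1, y2)
    return best_pair

def correlation(a, b):
    """Pearson correlation, used here only to check the result."""
    a, b = center(a), center(b)
    num = sum(u * v for u, v in zip(a, b))
    return num / math.sqrt(sum(u * u for u in a) * sum(v * v for v in b))

# Two synthetic "voices" (assumed stand-ins for real speech).
n = 2000
t = [i / n for i in range(n)]
voice_a = [math.sin(2 * math.pi * 5 * ti) for ti in t]   # smooth tone
voice_b = [2 * ((7 * ti) % 1.0) - 1 for ti in t]         # sawtooth

# Each microphone hears both voices; the nearer voice is louder.
# The weights 1.0, 0.6, 0.5, 1.0 are arbitrary illustration values.
mic1 = [1.0 * a + 0.6 * b for a, b in zip(voice_a, voice_b)]
mic2 = [0.5 * a + 1.0 * b for a, b in zip(voice_a, voice_b)]

est1, est2 = separate(mic1, mic2)
for name, voice in (("voice_a", voice_a), ("voice_b", voice_b)):
    match = max(abs(correlation(est1, voice)), abs(correlation(est2, voice)))
    print(name, "best match with a recovered signal:", round(match, 3))
```

The recovered signals can come out in either order, and with a flipped sign or a different volume; ICA has no way of telling those apart, which is why the check above uses the absolute correlation against both outputs.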
This demonstration was a part of the machine learning beginner’s course I was talking about yesterday.
Now, this exact same thing can be extended to a larger number of sources, and all of them can be separated too. But from what I understand, it takes at least n microphones to separate n voices (yes, that’s correct). Here is a demonstration with three microphones and three mixed voices.
You can try out an extended version of a simple ICA yourself here. (Link)
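A rough way to see why the microphone count matters, assuming the simplest textbook model in which each microphone records an instantaneous weighted sum of the voices (no echoes or delays; the symbols below are my own and not taken from the demonstration):

```latex
\begin{aligned}
x(t) &= A\,s(t), \qquad s(t)\in\mathbb{R}^{n},\quad x(t)\in\mathbb{R}^{m},\quad A\in\mathbb{R}^{m\times n} \\
y(t) &= W\,x(t) = W A\,s(t)
\end{aligned}
```

Here s holds the n voices, x holds the m microphone recordings, and A holds the unknown mixing weights. Recovering every voice means finding an unmixing matrix W for which WA is the identity, up to reordering and rescaling of its rows. That is only possible when A has n linearly independent columns, which in turn needs m ≥ n microphones.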
Other versions of the Cocktail Party Effect
A similar approach can be used on pictures too. If you have ever tried to take a beautiful sunset picture from a hotel window, or a picture of a nice dress inside a shop’s glass showcase, and ended up with annoying reflections in the photo, this is worth considering. The paper from MIT which describes this method in detail is linked here. A pictorial example taken from the paper is shown below.
The input is taken from the left image, specifically just the part where the painting is displayed. The input was actually two such images, one with increased reflection and one with decreased reflection, just like the two audio recordings. In the case of image capture, this is accomplished by using a polarizing filter. The output is two images, one of the reflection and one of the underlying painting.
In a similar application, it can be used to remove noise from pictures or from an audio recording. In a very different application, a similar approach can be used to detect hidden factors in financial data.
Featured image credit: Mark Probst, Flickr (Link)