June 1, 2007

Another Take on Blind Listening Tests

Unless you’ve spent the last 20 years listening to your headphones with the lights out, you’re aware that there’s a huge controversy over the use of blind listening by reviewers in determining whether one piece of audio gear sounds better than another.

In some areas of the audio kingdom, such as product development, blind testing rules. Two years ago, for example, Wilson Audio Specialties ran an advertisement in an audiophile magazine with the header "Blind Test." The picture showed one of their mechanical engineers setting up a blind listening test of "three seemingly identical WATTs" with cabinets constructed from different materials. All three speakers were painted identically so that the listeners’ expectations about each material could not sway them, allowing an unbiased, blinded assessment of the sound of each cabinet.

On the other hand, I don’t know of any audio reviewers, including myself, who use blind testing exclusively, or even frequently, in their listening sessions. Are we wrong? Ignorant? Lazy? Skeptics are right to be suspicious. Are we kowtowing to commercial interests and using review techniques that allow us to delude ourselves into hearing and reporting differences that don’t actually exist? Or is it just that, like any tool, blind listening is useful for some purposes and not for others?

Strangely enough, some reviewers are convinced that blind testing mysteriously makes actual sonic differences disappear. Forgive me for pulling rank on this one (I’m a psychologist by profession): It doesn’t. A host of psychological factors can do it, though. Among the suspects are stress, the expectation that the differences heard will disappear, performance anxiety or fear (imagine doing blind testing in a group of self-conscious audiophiles trying not to look naïve in front of their peers), physical fatigue, lack of sleep, excessive repetitive listening, alcohol, distractions internal and external, lack of listening skill, lack of experience with the blind-testing process, and, last but not least, a "challenge" in which the blind tester is asked to put up or shut up. Except for an innate inability to hear differences at all, those factors are potentially controllable and reversible. So if you want to do blind testing of audio gear, relax, practice, and be persistent. The audible differences are still there; you just have to learn how to hear them.

That said, although blind listening tests work if you do them right, whether or not blind testing is a good idea depends on what you’re trying to find out. Again, a blind listening test is a tool. Like any tool, it’s more useful for some things than others. In my experience, given the fragile and brief nature of auditory memory, blind testing is best at eliminating bias in relatively brief A/B comparisons in which you’re listening for particular, well-defined characteristics of the sound. If you decide that a particular aspect or quality of the sound you hear is important to you (for me, the level of detail and the natural quality of acoustic timbres are at the top of the list), blinded, brief A/B comparisons with your reference equipment are a good way, maybe the best way, to determine whether that characteristic is present.

A good way, but not the only way. There are subjective, unblinded ways to reduce bias as well. For instance, developing broad-based experience by testing many products is invaluable. When you’re no longer blown away by price, a good story, or other product features that might tilt your judgment one way or another, you have a better chance of actually hearing what the product sounds like. Deliberately demythologizing the gear you’re listening to and cultivating a "call ’em as you see ’em" approach can also go a long way toward minimizing bias. This attitude is particularly useful when your brief unblinded comparisons draw on a variety of exhaustively familiar music selections, which make variations in what you hear easier to recognize. In addition, if you can practice imagining yourself taking the editorial positions that are most difficult for you (for instance, imagining that a prestigious company may have made a product that performs poorly), you’re more likely to be able to report what you actually hear.

These types of subjective strategies, I believe, are what most responsible reviewers use in lieu of dealing with the procedural and personnel hassles of blind testing. By subjectively minimizing bias when making A/B comparisons, we have a better chance of understanding particular aspects of a product’s sound, without the hassle. Would blind listening accomplish this? Most of the time, it probably would. But blind listening tests are hard to do well in real life, much harder than you can imagine if you’ve never tried it. Part of the difficulty is that you need a knowledgeable and skilled person to help make the changes in your gear, and you must also train yourself to deal with the psychological issues I mentioned earlier.

The other reason that audio reviewers aren’t totally enamored of blind listening tests is that brief A/B comparisons are only part of the process of evaluating audio gear. In doing our best to "get it right," most serious reviewers and audiophiles not only listen to particular details in the sound of gear that can be subjectively or objectively compared, but try to evaluate the subjective impact of the equipment over time. Is it easy to live with, day in and day out? Do I find myself avoiding certain types of music and seeking out others? Does music still excite me after I’ve been listening to the system for three months? Am I compelled to take out a third mortgage and buy the thing?

When the purpose of the evaluation is to understand the long-term experience of listening to a piece of audio gear, brief A/B comparisons of any sort are the wrong tool for the job. A reviewer listening to the sound of equipment over time is listening for something fundamentally different from the simple characteristics of the sound. In this case, it’s not about the presence or absence of bias, but about what is being communicated in a deep way.

Think about what a music teacher listens for. Teaching someone to play an instrument begins with a focus on the basic skills needed to make the instrument sound the way it’s supposed to sound. The teacher makes simple comparisons in his or her mind between what the student sounds like and what the student is supposed to sound like, and gives advice on technique (hold your elbow higher, tighten your embouchure) to move the student toward the technically desirable sound. This is analogous to making simple A/B comparisons: the teacher is attending to certain details that are essential to the quality of sound produced. But this is just the beginning.

The next step comes when the teacher sits back and says, "Let’s hear you play the music." What the teacher is listening for now is something far more meaningful than evidence of correct technique: an understanding of a musical phrase, a meaningful connection between phrases, a realization of the passion intended by the composer, a personal imprint that makes the music the musician’s own. It is a process of listening beyond the details for how the music touches us. What music communicates, how it impacts us, is not something that can be evaluated by the quality of a single note. Nor can it be tested by brief A/B comparisons, blinded or not.

Long-term listening is essential for coming to understand how a piece of equipment influences the music expressed through it. But (you may ask) isn’t music just music? How can audio equipment affect how music is expressed? Well, unless you’ve heard this for yourself, it’s hard to imagine how that might be -- but it is a key part of the experience of listening to different audio equipment. In a way analogous to how a musician might play with more or less emotion, dynamic emphasis, or clarity, audio equipment can flatten the music, blur its clarity, change its timbres, subtly alter its timing, and in any number of other ways distort the musical expression that a musician might have worked a lifetime to be able to convey.

What’s at stake here is the whole impact of the sound on the listener. Will we be engaged, enthralled, exhilarated by what we hear -- or find that, after a few minutes of listening, we’re wandering off to do something more important, like clean out the garage? Even though a particular piece of audio equipment may make a strong impression in the first few moments we hear it, it takes time to understand deeply how it influences music. It takes even more time to develop one’s own personal taste in these matters and to begin distinguishing the equipment that presents music the way we love it from the equipment that doesn’t.

Here’s how important personal taste is. A few months after their blind-listening advertisement, Wilson Audio ran another advertisement, headed "Of Tweeters and Truth," to discuss the process they went through in deciding on a tweeter for their WATT/Puppy 8 system. The question was whether or not they should switch, as some of their competitors have, to diamond or beryllium tweeters in their loudspeakers. They tested "a lot of tweeters" (as the ad put it), presumably using a combination of bench testing to generate objective data and blind listening to hear the tweeters with as little bias as possible. Their decision was that their own tweeter from the MAXX Series 2 delivered the least grain and distortion, and would thus be used in the WATT/Puppy 8.

Was Wilson’s own tweeter the most linear of those they tested? The ad implies that perhaps it was not. Was it the most hyperdetailed? Let me go out on a limb and say, given the measured performance of the best high-end ribbon tweeters, possibly not. Wilson avoided making a narrow decision based on a few particular sonic characteristics that some purchasers might notice in a brief A/B comparison. Instead, they focused on the qualities of the overall sound that they wanted to hear from their speaker -- in other words, on the sound they felt their customers would most enjoy, and most want to live with over time.

What Wilson Audio did was make a product-development decision based on personal taste -- a highly evolved sensibility based on exact knowledge of specific characteristics obtained through unbiased A/B comparisons and analyses, combined with the company’s thoughtful awareness of its own philosophy and of the listening preferences of the audiophiles who buy its products.

That combination of unbiased information and judgment based on personal taste and broad experience is what I believe a good audio review should offer as well. Using blind listening tests and other strategies to minimize bias when comparing simple characteristics of the sound is only part of the process. When you evaluate how a particular audio component presents what you care most about in music, you need to listen not just for what you hear, but for what touches you in a deep and lasting way.

...Albert Bellg

