The Schism Around Voice: Multicasting vs. Broadcasting

Imagine that you had an awesome technology that allowed you to create a universe you have just pictured in your mind, to whatever level of detail you wish, and that you could get realistic characters walking around your universe, so perfect in its minutiae that their behaviour and looks would be completely impossible to distinguish from those of real human beings. Now imagine that this technology were available to everybody in the world, and that anyone, anywhere, could have access to it.

Actually, that technology does, indeed, exist. It’s even quite old, having first been developed over 6,000 years ago. It’s called a book.

Nevertheless, a “book” is not perfect — there is an author, and there is an audience. The author can do with their book whatever they wish; but you, the reader, cannot. All you can do is read and imagine with the author, but not contribute to the book.

Enter the Internet, and its many social environments: from the old bulletin boards, through FidoNet, later the USENET, finally to IRC, and to webchats, we come to things like, well, Second Life. Here a new paradigm has emerged: the notion of a collaborative environment, where readers and authors alternate roles, and both contribute, at the same time, to a collective work. Early analysts of the “Internet revolution” have touted this as the primal change in the way we think about the ancient roles of author-editor-publisher-audience; the old “broadcasting” paradigm (one sender, many receivers) has been replaced, on the Web, by a new model: multicasting (all are senders and receivers) and the notion of collaborative environments, where all are readers and authors at the same time.

From the roots of that paradigm, buzzwords like “Web 2.0” have emerged: social, interactive environments, where people collaborate to provide content. So far, these were mostly asynchronous — in the sense that on a blog, a forum, or any of the social sites like MySpace or Flickr, the author(s) publish something (a text, a picture, a movie), and others add to it afterwards (with comments, or, in the case of Wikipedia, with additional articles). It’s not done in real time; there are delays. Still, those delays are tiny when compared to those of an author who writes a book and then receives fan/hate mail which might influence a future revision of their own work.

Real-time interactivity comes, however, with chatrooms — from IRC to webchat. Here the issue is slightly different: you don’t have persistence of context. An IRC session can take hours, with hundreds of participants, but after it finishes, the text “disappears” (obviously, you can grab a copy and read it later; but IRC, as a medium, is not persistent on its own). Thus, although it’s definitely real-time, it lacks the persistence effect of something like, say, Wikipedia — which grows and grows as more people interact with it.

There is, to a degree, a system that allows both — real-time interaction in a “multicasting” model, with many authors and readers who interchange roles — and persistence of content. The best-known examples are Richard Bartle’s MUD and its many successors. In that environment, people build rooms — snippets of text that, like a book, appeal to the users’ imagination to evoke images in their mind when they read what others have placed as a description of the room. The rooms are also interactive — they are not “merely” nice, textual descriptions, but they allow, through commands and other special options, agents in the environment to change the rooms. Whether it’s a “game” (in the sense of a role-playing game with missions, quests, and goals), or simply “interactive art” (a narrative that unfolds, depending on the users’ choice), the issue is the same: MUDs allow collaboration, interaction, multicasting, and they have persistence of content.

Second Life and the Collaborative Environment

Now enter Second Life. Bartle and others tend to scorn the “visual” aspect of Second Life, in the sense that it “distracts” the user — in the same sense that a movie based on a book will short-circuit the viewer’s imagination, by presenting a “ready-made” universe that is “consumed” without interaction. If I say that my hair is red, people will imagine different shades of red, and probably some will imagine my hair is curly, or perhaps spiky, or clipped short. In Second Life, however, they’ll get a 3D representation of my red hair — and there will be no room for “imagining” what it looks like. An image, being worth more than a thousand words, will “spare” us a thousand words of patient description of each flexiprim in my hair — but it will also constrain how people imagine my hair looks.

This has been an old argument about 3D virtual worlds: that they stifle the users’ imagination by presenting a “canned” visual environment. The issue, of course, does not apply to Second Life, for a very good reason: everybody can change the environment. So, although one might agree that a “ready-made” 3D environment that does not allow much change stifles imagination (which is the case of the majority of virtual worlds these days — a company produces the highly detailed environment, and the users are simple consumers of that content), Second Life is very different. When “red hair” is described, it can apply to any different conception of what red hair looks like; you can build it the way you wish; if someone disagrees and says “that’s not what red hair looks like”, that someone can create their own representation of red hair. Pure imagination is replaced by the absoluteness of redefining the whole environment: that is, with an adequate set of tools (the in-world building and scripting tools) anyone can precisely pin down their own representation of the environment, and share it with everybody else.

But that’s not all — in Second Life you can also do more: change (or, better, contribute to) existing content. So, back to the red hair example, you aren’t limited to saying “I can do better red hair than you”; you can, effectively, change someone else’s red hair — interactively, and in real time.

What has all the above to do with “voice”? 🙂

Multicasting vs. Broadcasting in a Collaborative Environment

Most people take technology and “the way things are” for granted. When you turn a TV on, you expect it to display an image after a second or less, and that it shows you a movie that “someone” has prepared (the broadcasting station and their producers), and that it will be shown in decent quality with 25 frames per second (or more). You don’t “expect” to be able to talk to someone in a studio while watching the news report. You don’t “expect” to interactively change the patterns of the clouds on the weather report. You don’t expect that suddenly someone else pops in and starts broadcasting their own popular show in the middle of an opera concert. We assume that TV works like it does, because we’re used to it: it’s a broadcasting medium, not a multicasting one, and it has worked well that way for over 70 years or so. Naturally, you might have post-modern art trying to break the rules (“conceptual television”) — or a degree of interactivity on some live events where people can call a phone number and talk to the hosts — but the nature of TV as a broadcasting medium hasn’t changed much. Its biggest change was perhaps the VCR, allowing people to watch things out of order, i.e., still being consumers of video, but not limited to a predefined sequence of events as established by the broadcasting company. But that’s about it. If you want an interactive video medium, you’re far better off with something like YouTube (which is not a real-time interactive medium, but at least it allows some feedback and change, through comments, and, most important, video comments).

Second Life also has a few assumptions, which have worked well to give us a collective opinion of what it actually is. It’s a “medium” as well — no question about it — and it’s real-time. It’s definitely collaborative: Linden Lab has added little content, and practically everything (except the sky, the colour of the sea, the Sun and the Moon, and the Earthlike gravity) can be changed by the users. But it’s not an asynchronous medium like, say, YouTube, or a blog or a forum: very similar to IRC or, better, a MUD, it allows synchronous, real-time change to occur all over the world. At any given moment, almost 40,000 people are changing the environment around themselves, and this change is collaborative and real-time. You can watch things being changed in front of your eyes; and, although people might be physically apart, they work on the same virtual environment. Better still: you can build things on a different, virtual location — like someone building a tower on their private island — and then join it with a common build elsewhere (placing the content at its final destination). Buying some outfit at, say, Pixel Dolls, but then going to a club and displaying your new dress is the simplest form of “location disruption” — the notion that content does not need to stay at a single location, but can move across the grid to different locations. Better still: a single piece of content can be used simultaneously at different locations (i.e. that same dress can be in use by several different avatars in different clubs).

The best example of a similar concept on the 2D Web is having, say, someone copying the content of a blog article, and simultaneously placing it on several other sites. In a sense, “feed aggregators” and “blog portals”, or simply placing links from MySpace to YouTube, are similar concepts for the Web 2.0 that replicate (perhaps a bit artificially) what we expect to be “normal” in Second Life.

The Environment Shapes Communication and Behaviour

We take all the above for granted: the notion that content is not stuck to locations; content is created dynamically and in real-time; content is (or at least can be) created collaboratively, synchronously, and without “delays”. We’re so used to it that if for some reason this changed, we would be as surprised as if the news report on TV were suddenly “taken over” by a pirate broadcaster who pops in the middle of the report and starts showing pictures of surfers in Hawaii.

Thus, when using Second Life (in the sense of being an active participant in a multicasting, real-time, collaborative medium), we adopt — naturally, one would say — a model of social interaction that is indeed appropriate for that environment. Since content can be at different places, we use IMs to talk to people who we cannot “see”. Since there is not a one-to-one relationship between location and content, we tend to view our friends and contacts as being “detached” from the location (or even from their appearance). They are “somewhere on the grid” and probably “wearing some outfit or other” and “being surrounded by some kind of content”. We don’t care; we know that an IM will always reach them, no matter what their location or appearance is.

Since we’re used to collaborative content in Second Life (and also, because of older paradigms like MUDs or even IRC), we also tend to look upon social happenings in SL as being highly interactive and collaborative as well. A discussion in Second Life can have 25-50 participants — all talking (writing!) at the same time. You don’t really need “moderators” like in RL — the dynamics of IRC work well in SL, and fit its collaborative and multicasting nature like a glove. In an environment with multiple agents that are producers and consumers at the same time, a social communication form (a discussion) will adopt exactly the same criterion: it’ll be multicasting as well, everybody participating, and everybody reading at the same time — in real time. Thus, the nature of the environment shapes the social interactions. And while this is not surprising (the same happens in RL, obviously — we don’t shout in a library or walk around naked in a church), it defines a unique quality of Second Life. While many might find the way social interactivity occurs in SL a bit strange at the beginning, they get used to it very quickly. No wonder; all the content in the world is generated the same way — why should social interactions be different?

This obviously isn’t restricted to discussion events — which are not very popular anyway. In fact, you can see the same behaviour at all social events in SL. When you go to a club, unlike in real life, the focus is mostly “talking to other people” — flirting, telling jokes, communicating — while, of course, listening to music (and commenting on it). RL clubs obviously also offer an environment, but mostly music, and a place for physical self-expression (dancing). What they don’t offer is a place for communication — the music is too loud for it. A jazz club might allow small groups of people at a table to whisper among themselves, but the communication will be mostly limited to single tables and small groups. There are no environments in RL where, say, a hundred people can be in the same place, listening to loud music, and talk among themselves at the same time. RL does not allow that; but SL does! More than that — besides a public chat, where everybody can contribute and read at the same time — you can also flirt with half a dozen people, in private, at the same time, while still listening to the music and participating in the public chat.

(The old Thinker’s group once did a philosophical discussion in one of the then-popular clubs, while dancing and listening to the music. It works! It’s possible in SL. Now try to do the same in RL…)

We don’t find this kind of behaviour “strange” for SL. In fact, it follows from the whole multicasting environment: you expect, when going to a club in SL, to be able to be an active participator of the social happening. And this is done simply by contributing to the ongoing, real-time chat. A good event hoster will not be “simply a DJ”; rather, they will tell jokes in public chat, accept orders for the next music, flirt with the new arrivals, encourage them to tip the dancers, and talk about trivial things — all at the same time! (“Why, Jeanne, you have the most lovely dress tonight — everybody, give it up for Jeanne!” [the audience cheers, types some witty comments, or howls]).

So many similar examples exist. On “Who Wants to Be a Lindennaire?”, a popular contest named after a similar TV contest, the participants can talk all the time among themselves. Trivia contests are not different. How would any of those events be possible in RL, with the contestants talking all the time? However, in SL, those events attract a social audience, who, besides winning some Linden dollars, expect to be able to chat.

Even AOL Pointe, AOL’s virtual presence, uses the social interaction in SL as a selling point: in SL, people can, like in RL, watch videos “together”. But the difference is that you can talk about it, while you’re watching the video together. That’s a kind of experience that you simply don’t have in RL; and it makes sense to explore SL as a new medium to provide that experience.

Again, most people who are familiar with all the above — “it’s just the way SL works, there is nothing special about it” — simply overlook the following issue: all is going to change in June. Dramatically. For ever. And there is no going back 🙂

The Old Arguments Against Voice in SL

The usual naysayers about the introduction of voice in SL have one of three arguments. The easiest, and most well understood, is the issue about anonymity. I will not consider this aspect, since it has been discussed ad nauseam for ever and ever, and at some point, the sheer pressure of numbers will always finish off that argument: as SL grows exponentially, if 1-2 million people suddenly have to choose to leave SL because of their unwillingness to disclose RL data (and voice is a “signature” which you can’t escape from easily), they will not make a difference. Sure, if they left SL now, this would mean that SL would lose one third of its users. But in June it’ll be only one sixth. In December, less than one tenth. Next year, it won’t even register on the statistics. By 2010, it will be a side note on the SL History Wiki: “Before June 2007, some people did not like to reveal their real life identities, and they had sadly to leave when voice was introduced; however, they were a tiny minority that never really affected the growth of Second Life”. Exponential growth kills the voice of minorities (pun intended!) and is ruthless: a million today is nothing tomorrow.
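The dilution arithmetic behind that argument can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the 2 million and 6 million figures are hypothetical round numbers chosen to match the "one third" above, and the idea that the population simply doubles at regular intervals is a simplification, not a measured growth rate.

```python
from fractions import Fraction


def minority_share(minority, total_now, doublings):
    """Share held by a fixed-size group once the total has doubled `doublings` times."""
    return Fraction(minority, total_now * 2 ** doublings)


# Hypothetical figures (assumptions, not statistics): 2 million objectors
# among 6 million residents, i.e. the "one third" of the text.
MINORITY = 2_000_000
TOTAL_NOW = 6_000_000

for doublings in range(4):
    share = minority_share(MINORITY, TOTAL_NOW, doublings)
    print(f"after {doublings} doubling(s): {share} of all users")
```

Each doubling of the total halves the share of a group whose size stays fixed, so one third becomes one sixth after a single doubling and drops below one tenth after the second.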

So, overall, SL will not “disappear” and the sky is not going to fall down if a tiny minority of a couple million users leave SL overnight. A few, of course, will survive in their ghettos and still have some fun; they will be freaks, outcasts, but still enjoy SL, of course, in their tiny text-based communities. They will be ignored and disregarded, and nobody will have any interest left in them; in the meantime, the mainstream will go on, enjoying a renewed growth from all the users that never came to SL because it didn’t support in-world voice chat. Overall, SL will be a “bigger” place — and, since the new requirements for joining SL will include revealing your RL identity from day one, the hordes of dozens of millions coming in after June 2007 will never understand why “anonymity” was such an issue in the past. It will be something to talk about as part of an SL history trivia event, but nothing more. SL, after June 2007, will be the community of augmentists that have, once and for all, thrown out those pesky immersionists — and when the augmentists are the majority (if not in June, then certainly in December), the issue about anonymity will, once and for all, be buried in the past. Linden Lab can next try to introduce that cool feature of mapping your RL face onto your avatar, and bring less anonymity and more “real data” into SL. That will come some day.

Exponential growth is the mother of all arguments. “There will always be more happy users than unhappy ones”. For every unhappy one that leaves by June, three new happy ones will come. So, while the discussion around anonymity will continue to foster flame wars on blogs, forums, and even in-world events, it is a “lost cause”. The New Second Life has no more space for anonymity — like the current SL did not have space for gaming the ratings, or telehubs, or taxes on prims. The past is the past, and while one can always speculate how things could have been, we should aim for the future.

The second argument is a bit more complex, and it relies on published research which shows that co-habitation of voice-enabled and voice-impaired users of a virtual world platform is never easy. The results of all those published studies show that the end result is about the same: it will never work 🙂 During the first weeks, we’ll see people gravitating towards their own ghettos, the voice-enabled and the voice-impaired limiting themselves to their own friends with similar attitudes. You’ll lose all your friends that moved towards the “other field”, but at the beginning, this will get compensated by getting new friends with a similar alignment. But remember that exponential growth will make things change rather quickly; all new users will be voice-enabled. At some point in time, event hosters will simply give up doing voice-impaired events, since the majority will be voice-enabled anyway; shops will have shop attendants that will have no patience with the text-chatty customers; Mentors will never bother to answer questions in text; and so on. There are no examples where you can have a single world with voice-enabled and voice-impaired users at the same time, both enjoying the world in the same way. Under exponential growth, the world will align itself towards the majority, and after June, the majority will all be voice-enabled. There is no “room” left for the voice-impaired; they will either leave for other platforms, or content themselves living in their ghettos, uncared for.

The third argument is called “Bartle’s Argument”, based on an old essay he wrote in 2003, and which I basically addressed last year. His argument is very simple: in a world where you can tweak everything, being unable to tweak your own voice in the same way doesn’t make any sense — it’s a rupture of the “suspension of disbelief” which is at the core of all literary and movie production in the world. The argument is stretched a bit thin, since it’s not clear that Second Life, as a social platform, attempts to convey any suspension of disbelief to anyone beyond the role-players. But the partial argument is still true: why can’t I have a set of sliders to change my voice, like I can do for changing my avatar? (or any other aspect of SL really)

In 2003, when Bartle wrote, that technology was not very advanced, so he recommended caution and patience. In 2007, however, this technology, even for amateurs, is becoming quite mature. The very friendly Screaming Bee corporation, for instance, has a delightful voice morphing software called MorphVOX Pro, which I recommend (if you’re a Windows user; I’ve been patiently trying to encourage Screaming Bee to release a Mac version of it). But the best part of their technology is that they have a server-based version of their software. Philip even talked to them about this, with some gentle nudging by yours truly; however, at the end of the day, Linden Lab thought it would take too much time to do that integration. They would go with un-morphed voice first, and perhaps by the end of 2008, they’d use some sort of morphing technology which might be available.

18 months of waiting is too much. By the end of 2008, SL will have 35-50 million users. Among these, only about a million will have been patiently waiting for voice morphing and brandishing Bartle’s Argument. Will it really, really be worth it? I seriously suspect it won’t; people will be far better off buying their own voice morphing tool (assuming they have Windows, that is) and relying upon that, if all they wish is to eliminate their real life “signature” from their voice (and believe me, it works rather better than most people might imagine).

But all the above arguments are totally missing the point; the issue is not anonymity. It’s about disrupting multicasting.

The End of a Paradigm — Bye bye, Multicasting

We now come to the end of Second Life as we know it. The key to understanding the change is not to focus on the above three arguments against voice, which are the kind of arguments that always have two sides to them, and can be argued and argued while never convincing the other side. You can read pro and con sites and blogs dedicating huge amounts of argumentation to each viewpoint, and it’s quite clear that the two factions are not talking to each other — they’re just praising their own faction and ignoring the other. Ultimately, as we have seen, the voice-impaired community will always lose, which will be the winning argument for the voice-enabled one. One can build the same kind of arguments for immersionism vs. augmentism; the augmentists, growing exponentially in number, will also always “win” every argument by sheer pressure of numbers. In any case, it’s not the quality of the reasoning that will win arguments — in these cases, like in so many others, the majority will always win with weak arguments, and it’s pointless to debate things like that. The battle is not fought on equal footing. At the very least, the old argument that “Second Life needs voice, because it’s the only virtual world that doesn’t support it natively” will win votes at Linden Lab: they know they have to keep abreast of their competition and can’t lag much behind (another intended pun 😉 ). So, in the end, even Linden Lab will need to respond to market pressures. Voice will be a reality in June; there is no turning back.

However, let’s see what we’re going to really lose in the process. Not users; the new users will come to a world that “always” had voice, and will never think twice about it. Not anonymity; the new users, again, will not care about that, they’ll have other uses for voice, and will rather insist on people not “hiding behind their avatars”. After all, the people posting “anonymous” pictures on MySpace are a tiny minority, aren’t they? They do exist, but they’re certainly not a significant percentage of the 150 million users of MySpace — instead, like in SL, most users are very willing to give out as much RL data as possible.

The difference between the Old Second Life and the New Second Life is much more drastic. Again, this does not mean that SL will “disappear” or that people will “leave it in despair” or that “Linden Lab is doomed”. Rather the contrary — it will continue to grow in spite of everything, but it’ll grow into something different. Just as the abolition of the telehubs and the introduction of point-to-point teleport drastically changed the landscape on the mainland (almost “overnight”) and SL did not “disappear”, but kept on growing, voice in SL will transform it so much as to make it a completely different product, but it won’t make it disappear. One might even argue that it will lead to faster growth.

There is a common fallacy brought up by the voice-enabled group. They argue that communication is difficult when typing, because even a fast typist types 3-4 times slower than they can talk. The pro-voice group is notoriously lazy, and thinks that “talking” is a much more efficient method of conveying information and communicating.

Actually, this is a very old fallacy. It’s true that we type 3-4 times slower; however, we read ten times faster than we listen. This means that during the same period of time that we’re focusing on a single conversation by listening to it, we could have ten simultaneous conversations in chat — obviously, writing less on each one, but receiving ten times as much information.
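As a rough illustration of that trade-off, here is a minimal Python sketch. The ratios are simply the claims made above (typing 3-4 times slower than talking, reading ten times faster than listening), not measured values; the function name, the 3.5 midpoint and the "speech = 1 unit per minute" normalisation are assumptions made only for this example.

```python
def compare_channels(minutes, type_slowdown=3.5, read_speedup=10.0):
    """Compare voice and text over `minutes`, with speech normalised to 1 unit/minute.

    type_slowdown: how many times slower typing is than talking (assumed midpoint of 3-4).
    read_speedup:  how many times faster reading is than listening (the text's claim).
    """
    voice_out = 1.0 * minutes              # what you can say
    voice_in = 1.0 * minutes               # what you can hear (one conversation)
    text_out = minutes / type_slowdown     # what you can type in the same time
    text_in = read_speedup * minutes       # what you can read in the same time
    return {
        "voice_out": voice_out,
        "voice_in": voice_in,
        "text_out": text_out,
        "text_in": text_in,
        # How many speech-rate conversations reading could keep up with.
        "text_conversations": text_in / voice_in,
    }


result = compare_channels(60)
for name, value in result.items():
    print(f"{name}: {value:.1f}")
```

Under these assumed ratios, an hour of text chat lets you send roughly a third of what you could say aloud, while taking in ten times what you could hear, which is the asymmetry the argument rests on.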

There is also another issue, which has simply to do with the way our brains are wired, and although some people can definitely train their abilities, not everyone can (or is willing to do it): listening is a “high-bandwidth” activity that requires much more processing power from our brains than reading. Writing/reading, however, is “low-bandwidth”. And, more than that, it is also time-independent. Why is this important? Well, you can easily pick up something you were reading, and resume from the point where you left off. We do that all the time when reading letters or email, getting interrupted by a phone call or a colleague at work (notice: voice communications!), and then getting back to the exact spot where we were.

Why is that so? Well, neurologists and cognitive scientists think that “reading” (unlike “hearing”) has evolved from the uncanny pattern-matching abilities that are built into the vision processing modules in our brains. They’re highly parallelised (our brain is very slow, but due to its huge interconnection of neurons, it can parallelise information processing at an astonishing rate), and even the optic nerve “helps” the brain with that by pre-processing information. So your eye is not really sending high-definition frames, 25 per second, to your brain for processing; rather, our eyes and optic nerve “filter” unnecessary data and just send relevant bits to the several areas of the brain that deal with the final processing. We are very good at doing this; that’s why, compared with other animals, we have exceptionally good vision, but very limited hearing (as well as smell and taste, of course).

Hearing, by contrast, is a complex process, which is not so evolved as vision, although it uses similar processes. However, it uses a lot of processing power from our brain. Most human beings cannot follow more than one audio conversation at the same time: in real life, the need for that has never evolved.

So think a bit about how our “voice communications” work in real life. In business meetings, we politely wait for the speaker to stop, and then add our comments. Discussions are moderated; there is a panel of speakers, and the moderator will circulate questions among them, and then with the audience: one at a time. We don’t do “perfect recollections” on audio; moderators, if they get two questions at the same time, will serialise them to the panel, sometimes asking the person to rephrase that second question (it’s easier to remember a face — visuals again! — than to remember what people said). But even ten years after having read The Lord of the Rings, almost everybody will remember the names of the major characters and the overall plot; quick question — what was the name of the little girl in Edward Scissorhands? (But you’ll definitely remember that Edward was played by Johnny Depp, obviously — due to his visual appeal on screen 😉 )

A study from the late 1980s or so also showed that people listening to a 15-minute seminar or lesson will, at most, remember 5% of what was said. This is the reason why teachers and speakers at conferences and seminars, these days, also use slideshows and hand out notes; or why teachers use blackboards and books to complement their lessons. Have you ever noticed that during a 15-minute podcast you could be reading ten transcripts of podcasts at the same time, while answering your email and replying to your friends on IM chat?

Why do all TV networks put a ticker with the headlines during the news? Well, in a minute, everyone will get the relevant headlines; if they have to listen to the whole news, they’ll need an hour to absorb all major news. Voice is slow, not fast, when you’re at the receiving end. The written text is way faster.

Nothing should surprise us when we see how our real world works. We have painfully adapted to our limitations in dealing with voice; it works great for one-to-one communications without interruptions. But try to speak on the phone with someone while giving a lecture on modern art at the same time; have you ever seen a speaker do that in public?

Of course not.

Our environment, again, shapes the way we communicate. In the real world, since we’re so limited in “multitasking” when using voice, we have evolved a means to deal with that limitation. We organise all our environment in a way that we only need to talk to one person at a time, to the exclusion of everything else. We pick up the phone and interrupt our meetings or leave that email unfinished (or “go afk” in Second Life while talking to our mom on the phone). When talking to an audience of dozens (conferences) or millions (TV and radio broadcasting), one person only is talking, while the others only listen. We don’t allow people to “interact back” on broadcast events. People are not allowed to interrupt others in a library. And even when talking in a small group of friends, only one will be talking, and the others listening — we are taught by our parents that it is “impolite” to interrupt others. Why do you think that these social norms have evolved? It’s not mere stubbornness or tradition; it’s simply the way we’re “wired” for the high-bandwidth demands of voice communication.

I’m pretty sure that someone will comment sooner or later that they are perfectly able to listen to music, talk to their friends, and be on the phone at the same time, and that my picture of the way the world deals with audio communication is very pessimistic. Well, there are always exceptions. I’m pretty sure that people who consider themselves to be “natural multicasters” think that others can do the same if they only try. The reality is not so simple. There is even a gender difference — women can usually multicast more, and can also change subjects much faster (and never miss a beat when having several conversations!) than men — although obviously there is a lot of overlap between genders. Also, musicians can train their hearing to listen to several instruments in an orchestra at the same time (and this really works to an extent, although one tends to focus on one instrument for short periods of time, and find it again when focusing on it once more). So, yes, exceptional individuals can “train” to listen to different conversations at the same time — but not the average person. In contrast, everybody can follow a 5-person text chat on separate IMs easily, and read a discussion with 20 people chatting at the same time. It only needs a slight adjustment, and we can handle that with little training.

Back to the New Second Life of late 2007. All the multicasting events will slowly fade out and die, as people will experiment with doing them using voice — since they’re used to doing them using text-based chat — and will find out, quickly enough, that “it doesn’t work”. If people start yelling in a club, you’ll tell them to hush, because you want to dance and listen to the music; if they don’t behave, you’ll mute them. But today you’re expected to do text chat in clubs; that’s what makes them socially appealing! So why will people come to clubs after June 2007? Basically, to find a partner to do an “intimate” private voice chat 🙂 … and you’ll have to limit your flirting to one partner at a time. Remember that you won’t have many visual cues, only “disembodied voices”, and hopefully you won’t mix up the blonde with the husky voice with the redhead with the tinny, squeaking one… or you’re in for trouble 🙂

This will mean a dramatic change overall. You’ll be hanging around with a couple of friends, talking about everything, and then a Voice IM pops up — you’ll excuse yourself because you have to handle that Voice IM, and mute everybody while you listen to it. When you “hang up” the call, you’ll apologise again, and couldn’t they repeat what they were just saying? Obviously, there won’t be any transcripts to scroll back…

You’ll also go and attend a conference from a major speaker; suddenly, your cat pukes on the carpet, and you need to clean up the mess — only to return to an empty conference room. Ah well. With luck, someone saved a podcast for you. Today, you’re able to quickly scroll back through the history, read about what you’ve missed, and still participate in the Q&A session.

There are going to be numerous examples of dramatic changes like that. In a way, we’ll lose a special quality of Second Life — de-localisation, unstuckness from the place you are, multicasting, synchronous communication with anyone, anywhere in SL, and the notion that you don’t need your full attention dedicated to SL, but can leave your computer and come back and scroll back through the history. People will be angry if you don’t pick up your Voice IM; but you’ll only be able to take one at a time. After some months, people will be used to it — they’ll know they can’t expect everybody to answer all their Voice IMs simultaneously, and since they will require total concentration, if you log in for an hour, you might be able to talk to a dozen people, 5 minutes at a time. Right now, in an hour, you’re able to communicate with a hundred people. What a difference that will make! Naturally, we’ll adapt, and we have an advantage — this will be just like real life, where you can’t talk to a hundred people in an hour anyway.

Second Life, just like real life, will lose the extraordinary breakthrough of creating one of the first true real-time multicasting environments. At least, historically, we will be able to tell our children how fantastically amazing this once was, during a period of three years. All this will be lost, forever; and a New Second Life will emerge which will be much more similar to real life in terms of social interaction.

The “Many Countries” Paradigm

Philip “Linden” Rosedale’s famous quote, which I still keep in my email signature — “I’m not building a game. I’m building a new country.” [interview to Wired, 2004-05-08] — received a serious blow when Linden Lab removed the telehubs. It was clear that what would follow next was a jump to private islands, where you can devise your own urban planning, and that there wouldn’t be “one country”, but “several communities” inside the same virtual environment.

Up to now, these communities have (mostly) had a major advantage: you would not really care how people would come together — they would share a common goal (or lifestyle!) and aggregate around that concept. Physical location would not be an issue, since most of those communities would have people writing in a common language (mostly English, but recently more and more in other national languages as well), more or less well enough for everybody to understand. A non-native speaker would only take a bit longer to type, but would generally manage reasonably well.

Contrast that with “voice”. In the “early days” of SL, you had mostly people with a certain degree of education and open-mindedness, mostly from travelling around the world, either for business or pleasure, and they would be rather tolerant of different accents, bad grammar, and a certain amount of “uhs” and “ahs” while people figure out how to say what they wish to say.

But SL is a mainstream application these days. People will be surprised to “hear” that their friends do not talk like TV or radio anchors, or movie actors. The “best” event host they knew might have a rasping voice, an unidentifiable accent, and such strange grammar that suddenly their events have no appeal whatsoever; you won’t be able to understand half (or all) of what they’re saying. People will naturally gravitate towards those that have clear voices, speak flawless English (or your own language), have some training as a speaker (i.e. a journalist, DJ, or actor), and have good audio equipment delivering high quality sound in SL. This will naturally leave out many people.

Or will it? Perhaps not. They will have their own, small communities, where their own voice will be “understood” and accepted without problems. In a sense, voice communications in an international medium will push people to the communities whose pattern of speech is closest to their own; this is only natural, since, in real life, your circle of friends and acquaintances will come, very likely, from a geographically limited area (since geography plays a role in real life). In a sense, Second Life will, again, resemble more and more the “limitations” that we encounter in real life as well: to be understood, you’ll need to find the type of people you can understand.

Perhaps this is a major difficulty because of the number of users in SL. Obviously enough, when you have a billion users, this doesn’t make a big difference — almost every tiny city will have a large part of the population in Second Life. But they will be the same people you see every day! With a hundred million users, you’ll still be able to find a reasonably large number of friends from your home town to communicate — but, again, it would seem pointless, since you see them every day as well. With just ten million users, it’s very likely that you’ll only find “strangers” — perhaps many from your own city, if it’s large enough.

So will Second Life be “fragmented” again, but this time, across physical boundaries? It’s the most likely outcome, since the same happens, in fact, with almost all “social” sites when they grow. There is obviously a difference in “education”, which is not apparent these days, but which will very soon be visible: people with a different, open-minded mindset will move across all boundaries and fragmented communities, but they will be a tiny minority — just like in RL. Contrast that with what happens today — even someone that hardly speaks any English, and writes little more than “hi” and “my name is John”, is perfectly able to drop into an art lecture or a popular club somewhere and have some fun. They might even be bold enough to join the conversation; nobody knows if they’re using a dictionary or the like to pick up some words. In fact, the best example is the many “automatic translators” used by Mentors to try to help non-native English speakers. While these translators are very, very weak, at least they enable rudimentary, basic communication — a word here, a word there, and at least something can be conveyed, at least enough to point people to a place where they can find out more (in their own language).

But there are no real-time voice translators. If you don’t understand what people are talking about — because you simply aren’t fluent enough — you are out of luck. Your only choice is to stumble upon a community that understands you and can help you out; and remember that SL does, indeed, have a steep learning curve, compared to, say, YouTube or even MySpace.

Also, if you are able to multitask even with voice, be prepared — as soon as you meet someone with a bad audio setup, a very strong accent, or not enough fluency in the language, you’ll need your full attention to be able to communicate the minimum — to the exclusion of everything else. Very few are that patient. In fact, the biggest argument for voice is that it doesn’t require you to painfully type everything, and that communication is thus much faster. That’s the case if all you want is to be able to talk to your best friend, lover, or latest sexual conquest (assuming they are very fluent as well); but you’re forfeiting the ability to communicate with, say, perhaps 80% of all users in Second Life, for the privilege of talking to a handful of people — which you could do anyway in Skype, if you really wanted.

Is it really worth it?

Conclusions

The voice-enabled SL will be a Tower of Babel of people not understanding each other, and speaking to each of them will be a nightmare. Ultimately, we’ll forfeit the benefits of a multicasting environment and revert to what Humankind has had for the past millennia: one-to-one conversation (one at a time) and broadcasting (one speaker, many listeners). Second Life, already fragmented once under the “many countries” paradigm, will fragment further — creating tiny communities of people that are able to understand each other, to the exclusion of everybody else.

There will be new opportunities, of course. Conferences, seminars, classes, even business meetings, will benefit a lot from voice chat, especially if you don’t need to take notes. And there will be a new job opportunity for actors or journalists that have clear, pleasing voices — like in RL, they will be in high demand for special events, and like in RL, there are not many people with those qualities, so they will be able to charge a lot for that.

The world is not “coming to an end”. Second Life, resembling real life more closely, might even be easier for the new users coming after June. In real life, most of us are not used to taking 12 phone calls at the same time and writing emails while talking to all these people. We’re used to one-to-one conversations with a restricted number of friends. Newcomers will never understand the appeal of a multicasting environment; they will expect, instead, good broadcasting events, like they have when they turn on the TV or the radio, or go out for a movie. New forms of social interaction will develop; people will excuse themselves to “pick up a voice IM” while in meetings or in a club. When talking to others, you’ll exclude everybody else in the world, and people will understand that (and hopefully respect it!).

One might also speculate what this means for businesses using Second Life. You can’t simply have an open-space office with everybody yelling at the screen; in fact, in the future, office environments actively using SL with voice will resemble call centres. On the other hand, when at home, it’ll be hard to reconcile your family duties, your neighbours (who certainly don’t want to listen to you flirting with your latest friend at 2 AM), and a relatively noise-free environment. Voice has been popular in games for teenagers because they are under less social pressure; you expect teenagers to be rowdy and loud while they’re “playing”. They are used to playing games with voice chat, as Hiro Pendragon reports in his blog. But all this requires a large degree of adaptation for the old users of Second Life.

The new ones will be the lucky ones; they will never have experienced a “different” Second Life.

About Gwyneth Llewelyn

I'm just a virtual girl in a virtual world...

  • Extropia DaSilva

    ‘The author can do with their book whatever it wishes; but you, the reader, cannot. All you can do is read and imagine with the author, but not contribute to the book.’

    …’But when Gandalf offered Frodo the Ring, the Hobbit took a step backwards. ‘Bugger that for a game of soldiers’, he cried. ‘Sam still owes me a pint at our local, and it closes in half an hour!’ With that, Frodo stepped out of Bag End, leaving the poor wizard to ponder his next move…’

    With any writing medium, I could quite easily turn a book into a participatory interactive medium. Ok, admittedly, probably not with the panache of the original author (Tolkien in this example), and I would probably not circulate my own ‘Lord Of The Rings: Frodo Goes To The Pub’ quite as widely as ‘The Fellowship Of The Ring’…

    ‘with an adequate set of tools (the in-world building and scripting tools) anyone can precisely pin down their own representation of the environment, and share it with everybody else.’

    That is not really true, is it? You did, after all, leave out one very crucial component. Namely, skill. I mean, if you were to give me a big block of granite, and place a hammer and chisel in my hands, I would have everything I need to create a sculpture I might call ‘Extropia In The Style Of Rodin’s The Thinker’. But I could not actually sculpt it because I am no Rodin. I am no Aimee Weber for that matter, so the idea that anyone can build anything is entirely false.

    As for the rest of the article, I do find this concept of ‘immersionist’ versus ‘augmentist’ difficult to reconcile with my own existence within SL. To immerse YOURSELF in SL seems to imply a 1st-person perspective. To augment yourself implies enhancing your natural abilities. Well, my ‘primary’ (aka my RL self) is not ‘in’ SL, if she was I would not ‘exist’ and some stupid, ugly person would instead. Luckily for me, she has many ways of hiding the fact she is a dimwit, so people perceive me as being quite clever. Clearly, I am ‘augmented’ but at the same time, obviously my ‘primary’ is roleplaying a beautiful intelligent woman. So is she an immersionist or an augmentist?

    Anyway, the game will soon be up. I do not think she could roleplay me if voice becomes the only valid choice. She will probably kill me off. Am I sad? Partly, but then, it is quite fitting that the march of technology should have driven me to obsolescence. I do not really see the point of making SL just like RL; it seems to me the easiest way to experience a VR just like RL is to simply walk out of your RL front door and let your mind model a very convincing reality, based on signals it receives from ‘RL’. But that’s just me:)

    So, assuming Gwyn’s assessment is correct, by June it may well be Au revoir from me, Extropia DaSilva. BTW, should you subsequently encounter a person who strikes you as a bit of a moron, you might pause to consider that, in another life, she was actually quite the genius;)

  • There is indeed a big difference in stating “anyone can do anything” in Second Life and saying that “anyone has the chance (or opportunity) to do anything, if they have the required skills”. Very good point. Speaking for myself again, I’m also completely useless as a builder, and a very lousy (and lazy!) programmer anyway.

    For a discussion between augmentism vs. immersionism, see SL Creativity, which is, in my opinion, the best explanation so far of the two viewpoints on the issue. I’m just following the lead here 🙂 Lys Muse, the author of SL Creativity, has explored the issue much further.

    As for myself, Extropia, I’m going to be a bit more patient. I want to watch what is going to happen — you may call me a masochist perhaps, but I really do have a cat’s curiosity about seeing a new thing unfold and what it leads to. Slowly, I’m picking up groups of people that will live in the voice-impaired ghetto, and we’ll see how many there are going to be, and how long we’ll survive in this new world. I might start wearing a yellow star on my chest, use “Second Class Resident” as my title, and post some provocative thoughts on my Profile. “Voice-impaired people also have rights — stop the discrimination NOW”, that sort of thing.

    It’ll be an interesting exercise to see how long that will last. I expect that at some point there will be strong peer pressure to let the matter drop, and well, after that, it might really be time to leave as well. But I’m obviously going to stay around as long as I’m allowed.

  • An interesting side-effect of the introduction of voice in SL is that it will be freely available on the mainland (I expect Linden Lab to upgrade their own servers!) but only available on private islands if people pay the extra cost of upgrading them to the latest hardware which can support voice. One can only wonder what that might mean for the overall land business: “Buy this voice-enabled plot!” will probably become a new marketing strategy…

  • Ashcroft Burnham

    This is an intriguing and thoughtful analysis of the potential impact of the coming of voice to SecondLife, on the assumption that voice gains overwhelming popularity.

    However, I think that there are some reasons to doubt that assumption that Gwyneth has perhaps not considered as much as she might. SecondLife is by no means the first internet-based interactive platform to acquire voice capabilities: some chatrooms have had voice options for years, and some 3d games (Battlefield 2/2142 being notable examples) have had voice built-in since their release.

    The experience of those platforms is that voice is rarely used. In Battlefield 2, for example, at first sight, this is somewhat puzzling: using voice communications within a fireteam gives one a distinct tactical advantage (i.e., one is more likely to win), and it makes the game more enjoyable (instead of running around shooting at anyone with a little red name above their heads, one can actually try to co-ordinate one’s actions). Although text chat is possible in Battlefield 2, it is generally impractical to stop running and shooting to type “look out!”, because one will be shot, bombed, knifed, shot again, and run over by several tanks and a jeep in the time that it takes to stop and type. If only one could say “look out!”, one could be running and ducking and shooting at the same time, and all would be well.

    But people don’t. What is popular is a system whereby people press certain shortcut keys (in Battlefield 2, holding SHIFT and selecting from a quick menu with the mouse) to get at certain pre-recorded voice commands, such as “Enemy vehicle spotted” or “follow me”. The ingenious thing is that users in different languages will hear the command in their own local language, no matter what language the person who sent the message sent it in: to everybody, it will appear as if the whole team speaks the same language. Of course, these pre-recorded samples are recorded on high quality audio equipment by professional actors, and all sound impeccable.

    So why don’t people use the real-time voice? There is, after all, only a limited range of pre-programmed commands: one can say “enemy tank spotted” or “fire in the hole!”, but one can’t say “all right team, let’s regroup near the side entrance; Johnny can distract them with a grenade while Jimmy and Mike go in with the anti-tank weapons, but mind the sniper on the roof”. When people play in serious tournaments, they use voice (although often Teamspeak, rather than the built-in voice capability) because of the tactical advantage, but, on general public servers, it’s very rare indeed.

    I don’t have detailed data for the reasons, so the answers may be somewhat speculative, but there are a number of theories. First of all, people are often shy about talking to strangers. It may be one thing for people to read what one writes: that is quite controllable, after all, but if strangers hear one’s voice they might make all kinds of judgments from one’s intonation or accent or enunciation. Secondly, talking is just plain hard work. People play games such as Battlefield 2 to relax, most of the time (serious tournaments aside). Holding down shift and moving the mouse around, or typing in text, takes rather little concentration, and very little physical effort at all. To talk, one has to sit upright, get the timbre of one’s voice right, and actually take the effort to speak. Many people, who have spent the day talking to other people, want to come home and relax from having to do that. Thirdly, it’s a fiddle: getting a headset microphone sorted out can be quite cumbersome. Many people have hand-held microphones, but one can’t hold them and move around. Also, they are often poor quality, to the extent that others find it hard to hear what is being said. The poorer the quality of the microphone, the more effort that one must put into speaking to be understood, and the more effort that everyone else must put into listening to understand (that is why drivers talking on hands free mobile telephones in their cars are dangerously distracted in a way that drivers who are talking to passengers physically present in the car are usually not). Microphones need configuration, and need to be at just the right distance from the mouth, etc. All of that is far harder to set up than a keyboard and a monitor.

    All of those factors apply far more to SecondLife than they do to Battlefield 2. Playing Battlefield 2, people are naturally concentrating and alert: they don’t have time to multicast to any significant extent, or even type (except when in a lonely spot or “dead”), and, since they have little social interest in their fellow players except as people to shoot or to help shoot the enemy, the risk of feeling embarrassed is somewhat abated. There is also a tactical advantage to be had by using voice chat, and most players are playing in the first place because they enjoy winning.

    In SecondLife, the pace is far slower: people use it more often, and for longer periods, than they would play a fast-paced shooting game, and have SecondLife open in the background often while doing other things, just as one has multiple tabs open on a browser (indeed, one day, might not somebody find a way of making tabbed virtual worlds, or a browser with web pages in some tabs and different virtual worlds in other tabs?). One builds up friends, often many of them, and there is a far greater potential to be worried about what those friends might think of one’s real voice, especially one’s voice after one has come back from a long day at work. Voice chat requires concentration to a far higher degree than text chat: for most light users (that is, most users), that is likely to be far too much effort to have to expend on what is for most a relaxing pastime, a secondary social world that takes far less effort to engage in than the primary one.

    We cannot be sure just how popular that voice will be when it is introduced in June (the beta grid is not much to go by, since nearly everybody there is there specifically to test the voice system – I might have a go myself at some point), but, if the indications from multiplayer games and chatrooms are anything to go by, its use will be the exception, rather than the rule. (Incidentally, the lack of use of voice in Battlefield 2 was somewhat unexpected to the publishers: it was one of the first games to have built-in voice capabilities, and it was a feature heavily advertised).

    There may, of course, be circumstances in which voice in SecondLife is popular, just like the serious tournaments in Battlefield 2 or the occasional use on the public servers: the more serious uses to which SecondLife is increasingly put, such as business meetings, distance learning, and the like; and no doubt it would give a whole new reality to virtual concerts. However, don’t expect voice to take off and change the face of the grid come June unless people react to voice in SecondLife in a wholly unprecedented way.

  • Ashcroft Burnham

    One other thought: Gwyneth predicts that, in a world of text, voice will immediately take over and displace text when the technology exists to displace it because people will largely perceive voice as superior.

    Of course, there has been an arena, far more popular and important than SecondLife, in which very nearly the reverse has happened: mobile telephony. In the 1980s and early/mid 1990s, mobile telephones, like landline telephones, were voice-only devices. One could talk to one person at a time, and that was it. Then, in the late 1990s, digital mobile telephones became popular, and with them a new and initially somewhat obscure technology called the “short message service”. By the beginning of this century, the text message, as it had become known, vastly outnumbered the voice call over mobile telephone networks, evidence that, when faced with a choice of text or voice, people strongly tend to prefer text.

    It is true that cost is a partial driving factor in the preference for text messages on mobile telephones, or at least was initially, but cost is at least partly relevant in SecondLife, too, since there is a suggestion that private island owners not paying the latest greatly increased fees will have to start paying them in order to be able to use voice.

    More than cost, however, people text because it is possible to text in many sets of circumstances where voice conversations would be impossible, undesirable or inconvenient. One can text at work or in the library or when talking (in real life) with friends; one can be texting multiple people at once; people can receive text messages, read them, and respond at their leisure (meaning that text messages are often more polite and less intrusive ways of social organisation amongst busy people).

    Bear in mind also that with mobile telephony there are more incentives and fewer disincentives to use voice (apart, perhaps, from cost, which is of little relevance to those with healthy incomes, who text almost as much as students and those with low incomes) than in SecondLife: people generally know personally the people whom they talk to on mobile telephones, and have met them in real life, so there is no question of being embarrassed at the sound of one’s voice. Telephony has conventionally been considered a voice communication medium, so people’s assumptions are that telephony will involve voice communication; the converse is true of the internet and virtual worlds.

    Although one can never really be sure what the takeup of voice in SecondLife will be in the end, contrary to Gwyn’s confident predictions, Linden Lab might well have spent months designing something that, like the voice system in Battlefield 2, is only used sporadically or for specialist purposes.

  • Gwyn: I, too, am a bit confused as to why, given your lengthy and comprehensive examination of the advantages of text over voice in data transfer rates, persistence, optional asynchronicity, language and so on, you begin by saying that text chat will basically be dead in a few months and we will all be steamrollered by a flood of people using voice.

    After all, these are not points of which people are not aware at some level. Before all of these new users come on, they will have already been exposed to the option of voice vs that of text, in the form of telephone and Skype vs IMs and email and IRC (not many people will be using a computer for the first time to enter SL) and will be aware that there is a time and a place for different media. As Ashcroft says, even outside of the internet people have made these decisions out of practicality and convenience; they text rather than phone, they IM rather than call.

    In fact, I remember the miserable days before email and IM were common in the workplace and the telephone was the standard medium (since it is so much faster than a letter), having little contact with anyone except workmates socially (unless there was nobody else in the office to hear personal calls) and having to take huge reams of notes on paper simply in order to remember what had been said. I would not like to go back to the days of scrabbling for a biro whenever a call came through, or taking messages for other people, and I don’t know many who would either; it has gotten to the stage where I am actually suspicious of people in RL who insist on making telephone calls rather than emailing (or at least emailing _as well_). I suspect that anyone who has _grown up_ using email would find such a situation absurd.

    I am simply not sure who all of these people are who will join SL and chat away happily for long enough and in such great numbers, insisting on only communicating via voice, that text becomes obsolete.

  • Let me try to summarise both Ashcroft’s and Ordinal’s points (with apologies to both of you for being so brief 🙂 ):

    Ashcroft: “SMS is more important for mobile phone operators than voice” (fact, not wishful thinking)

    Ordinal: “There are so many good reasons for text over voice, why will voice wipe out text?”

    Ashcroft’s fact is unarguable; SMS messages are, these days, the core revenue stream of mobile phone operators, a fact I once discovered when I ran up a huge bill on my account: my boss had to point out to me that all phone calls internal to our business subscription plan were basically “for free”, but SMS wasn’t, which simply hadn’t made any sense to me. SMS is a side-technology that was never meant to be used in ‘mainstream’ mobile phones, and its impact was totally underestimated by the designers of mobile communications.

    It’s true that SMS is indeed the core revenue of mobile operators. It is also used for asynchronous communications, like Ashcroft mentioned — calling someone up while they’re in a meeting is rude, but sending an SMS is not (just like email vs. a regular phone call). So it’s used by high school students in the classroom as well as by business managers for their work. Also, no sane mobile operator will ever think of “disabling” text on their phones — quite the contrary, they keep using it for more and more revenue. Germans pay their supermarket bills by using SMS. Others download ringtones and subscribe to information on sports events, weather, or stock exchange markets. SMS is here to stay.

    However, typing on a phone keypad is cumbersome — way worse than on a keyboard. So, while SMS will definitely continue to be used (and more and more), as a means of “pleasant communication”, it has only marginal success. Sure, young lovers will send romantic messages several times during the day; but are they engaging in cyber-romance, like they do in SL? Not really. There are no equivalents of “SMS Chat Rooms”, although several have (even successfully) played with that concept — using, say, SMS to send messages to a videotext-enabled TV channel, for example. But the scope is limited as a means for “almost-real-time” communication.

    If we compare SL with ‘games’, well, the focus is very different. There is no question that people socialise on WoW or on any other MMORPG; in fact, since the overall number of users of all MMORPGs outpaces SL’s own userbase, it’s fair to say that more people socialise on MMORPGs than on SL — and many are obviously using voice. However, “games” have a limited advantage when using voice (although I assume that coordinating teams in a real-time battle is far easier using voice; my younger brother, a die-hard fan of America Online, does that all the time using voice. Strangely enough, I spend more time with him on MSN chat than on the phone 🙂 ). For most non-socialising uses of MMORPGs, voice is not a huge advantage, and it might even become distracting when you are fighting that Ogre and your sweetheart wishes to talk with you; it simply breaks your concentration too much.

    However, when the focus is on socialising events, the reverse is true. A 15-minute event using voice becomes an hour if you just use text. Obviously, during that hour, the whole audience is doing 4 things at the same time (i.e. chatting with friends — on text), so, in effect, you’re not “wasting” time, but effectively multitasking. For the “speaker”, though, the time required is much higher! They’ll take four times as long to write the whole thing. Never mind that at the end everybody will have a text transcript — which might take you an hour to write and summarise afterwards. The “immediacy” of voice is unquestionably easier to use — especially when we look at Generation Y, which has uncannily short attention spans and no patience to type and type.

    Similarly, although you can, in an hour, spend your time flirting with 4 different people in private IMs, you can only talk to one person in the same period — but what is the more fulfilling experience? Never mind the difficulty of having “wrong” body language when talking to an avatar — we’re used to flirting (or having phone sex) over the phone, and can cope very well.

    I’ve been a few times over to the Voice Beta Grid, trying to see the changes in interaction that are already occurring. The Voice Beta Grid is, of course, an extreme example: people only go there to try out voice chat, so that’s all they do. So, they stand around in a circle, often not even looking at each other, and they’re just yelling “can you hear me?” or “hi, who are you?” and “that’s me, you moron” 🙂 There is no “avatar body language” at all.

    Now contrast that to a regular chat in SL using text. People have emotes (“gestures”) and use them all the time. You can look at someone and you know they’re not paying attention — either because they’re IMing others, or you can watch their head moving around, and know that they’re activating menus on their screens. Picking up “avatar body language” is an art that all of us learn after a few months. People invite others to sit down, facing each other, because you can then look at what they’re typing (and when they’re typing). Eye-to-eye avatar contact is important — as a study has shown — which sounds quite weird when you think about it. But — we’re humans, and we need to pick up clues. A smiling avatar shows that you’re not angry when typing something blunt.

    Bartle, however, is very blunt when he says that all this is irrelevant for newbies. They couldn’t care less about that. With voice, you don’t need body language. All you really need is a headset, and then you talk. I agree that it’s harder to set up for someone who never played with one, and is just used to phone calls with “perfect” sound and no problems (we’ve evolved phone technology for over 130 years 🙂 — VoIP has little more than a decade!). But — harder compared to what? The SL interface is so clumsy that, if you have mastered it, figuring out how to use a headset is child’s play by comparison 🙂

    I still think that the issue here is to understand that things will not happen “overnight”. I can imagine that good old friends who have shared something in SL for a year or three will still use text chat for most of their interaction. And one day one of them, in the middle of an interesting discussion, will just say: “oh, let’s forget about it, this is a fantastic discussion we’re having, I’m tired of typing, let’s use voice instead” and the other one will answer “sorry, I don’t do voice”. Why not? There might be a billion reasons, but at that moment a huge barrier has suddenly separated those two friends forever — just because for one, using voice is as natural as making a phone call, but the other one is not willing to do the same. This will break their relationship — or, at least, change it. “If you’re in SL, why don’t you use voice?” The doubt will arise — are they hiding anything? Do they lisp? Do they have an ugly voice? Are they of the opposite gender or a different age? These questions will remain unanswered — but there will always remain that doubt: “this person doesn’t really want to give me their RL identity — what are they hiding?”

    Voice breaks anonymity to a degree that it would be considered a ToS violation – or at least a violation of the community standards — but it’ll become “normal” to invade other people’s privacy and “forget” about the silly “anonymity” in SL. And this will happen over time. Not by June 2nd. But over time, gradually, and at an accelerated pace.

    Think about the impact that Sony Home will have, when everybody who bought it (and it is, of course, voice-enabled) leaves it, bored that they can’t create anything, and tries SL next. The “Sony Home Generation” will naturally use voice in SL. And they will come mentally prepared to do all their communications using voice; text chat will simply be an alien environment for them. The more successful Sony Home (and others…) become, the more easily people will use voice-enabled SL and forget about text chat — except, of course, when using it like they use SMS: “Hi, are you busy, or can we talk over voice?”

    Ordinal, you and I are deviant aberrations of corporate culture 🙂 I fought for about 2-3 years, at a brand-new Internet company that I had founded, to eradicate “voice” communications completely from the business. Voice is the worst nightmare in a small company — it only has disadvantages for business, and no advantages at all. When a customer calls you up, you need to give answers immediately — which you might not be prepared to give. You can just call one person at a time; I tended to tell the phone operator to send me a message via ICQ (it’s all that existed at that time) to ask me first if I wasn’t on the mobile phone before she transferred the call. Worse than that, phone communications are expensive — think about call centres, and what it costs to staff them. After these 2-3 years of “customer education” they started to learn a few things: with a phone call, you don’t get a transcript, so you never know if someone got your instructions right — so it was better to send an email. An email got an answer at odd hours during the night; a phone call would normally be useless (if you didn’t have the answer ready — “I’ll send you an email with the answers”) during office hours, and unthinkable outside them. After a period of time, it was corporate policy to answer all emails (or ICQ IMs, or forum posts…) for free, but charge for phone support — because one operator could easily handle 10 or 12 users on ICQ, while just one on the phone, and — since you got a transcript — you only needed to explain things once. Most clients would forget what we’d told them over the phone, and they would call back later, again and again, to go through the same routine.

    I have a huge number of anecdotes on the use of the “voiceless” office. In 1994, I hired system analysts, programmers, technicians, and even secretaries using IRC — and a few emails. If they didn’t use IRC or email, they would be worthless anyway — but at least they got some basic training on the “text-based” tools. I still hire some programmers that way; we have very close relationships on complex projects, and I have no idea how they sound on the phone or what they look like, or if their picture on MSN is their RL picture. Who cares! I never did… I just needed them to work, not to smell their bodily odours 🙂

    So, customers started to use ICQ because they would be “always in touch” with the company’s representatives — while if they used phone calls, they had to fight through the incoming call queue, and would very likely never get through. At meetings, people were forbidden to pick up phone calls (that’s rude to the other attendees, who lose their time while they patiently wait for their colleague to hang up the phone). They could, however, bring their notebooks or laptops and answer emails and IMs in meetings. After a while it was pretty worthless to do the meetings anyway; a chatroom was enough; and to socialise, you would simply take a break, have lunch together, or something much more pleasant than “staying in a meeting”.

    There are many more silly cases like that. I’ve supervised a trainee for 6 months and we never met physically; all work was done on MSN and email. Prospective customers who insisted on physical meetings or phone calls would very likely never complete a purchase; they were far more likely to have the meeting as an excuse to leave their offices for a couple of hours 🙂 Even with my soulmate, I often sent IMs to tell that I would soon be back home — each one of us would be at their respective offices, and if we couldn’t get an IM through, we’d send SMS instead. I still keep in touch with former colleagues and friends — and even family! — after years and years, and chat with them every day, if they’re on an IM system. For me, it’s like they have never left my sphere of acquaintances — even if we haven’t physically seen each other for a while. As said, I talk to my brother and cousin more over MSN or Gtalk than on the phone or physically; I think I haven’t even seen my cousin in two or three years. Or was it four? But we talk every day!

    So, every time I start a new company, the first thing to do is get rid of the nasty “voice” habit, and start putting productivity back on the priorities. Customers, for instance, are forbidden to phone me. When they complain, we tell them how much they have to pay for the “privilege”. Instead, they have my email, MSN, Gtalk, Yahoo, whatever they prefer — as well as web-based forms to contact sales and obviously support. They grumble and complain a lot, but they quickly understand how much more efficient the system is — instead of trying to call over hours and hours, or even days, and finally getting through, they just send an informal SMS or email: “help, we need the proposal for tomorrow!!”. And they’ll have it in a few hours.

    Sadly, this is anecdotal evidence. The vast majority of all companies in the world get “scared” if they only have a “virtual” contact with someone, especially if it’s not even voice, but just text. It’s a question of mentality. You “trust” people more if they wear nice, expensive dresses with quality fabrics and a known brand and have a formal hairdo (or a suit and a tie for the males, and having shaved in the morning). Old habits die hard, and we were educated — by tradition — that a “sloppy” person is “untrustworthy”. In this age of nerds and geeks, however, the reverse is true — the more sloppy your geek, the more likely he’s a genius that doesn’t care about personal appearance. So, for people like me, all that is really moot — I don’t need to know what people look like to trust them, but… how many of the 6 billion people on the planet think the same way?

    Just a handful.

    Of course, naturally enough, when I started to use SL, all this started to make a lot of sense to me. So people need to trust a nice visual image? Well, you just need a well-groomed avatar 🙂 It’s far cheaper! No need to use eye concealers or face scrubbers to look “fresh and clean” every day, and outfits in SL never get dirty or wrinkled, your office always has the latest style in furniture, and the view is always nice. Just like the first businesses on the Web tried to get the best-looking site to impress customers (if you care about your site, you care about your customers!), in SL you can do exactly the same. It’s the same thing!

    So, voice breaks all that. For friends, this means that I’ll be yawning late at night (or too early in the morning) and I have an ugly voice, so they’ll have a hard time figuring out what I’m saying; for business it’s even worse: although when “at my best” I can be pretty convincing in my argumentation, I can’t do that when I’m sleepy, groggy, distracted, or worried about something else. But in SL I’m “always at my best” — as on emails, IMs, SMS, or any other text-based medium. I can appear “untiring and nice” all the time — while definitely my tired or bored voice will immediately betray me 🙂 In RL, at least I can “return a call” when I need to prepare myself — get some documents ready, finish a proposal, or do a short slideshow presentation. But it won’t be the same in SL — this being part of the fast-paced Internet, people will demand the same in-world as well. I can say “sorry, I only saw your IM now” but I can’t say “sorry, I can’t talk right now, I’m busy”. Why? People pick up the phone at all moments, don’t they? That’s why they have mobile phones, right?

    As you see, this can go on and on, and is worth a whole book on business practices. “How Phone Calls Destroy The Business Environment”. It’ll be a nice book to write on the subject!

    Will anyone read it? Naaw. Nobody believes in the voiceless office.

    But people would probably download the podcast…

  • Ashcroft Burnham

    Gwyn, there are no doubt some advantages to using voice in SecondLife: certain sorts of conversations might be more fulfilling, or take a shorter time for the speaker. But isn’t it no more than guesswork that these advantages will be considered by nearly everyone (even in a year’s time) to outweigh all the disadvantages for every situation? For example, even assuming that a speaker at an event finds it easier to use voice (which, for reasons that I have already given, I doubt: a speaker may not want to focus all of her/his energy on just the event, for example, or may not feel like doing more talking after having been talking at work all day, or may not feel like sitting up straight and using an intelligible voice for half an hour), those who might attend such an event might find it so cumbersome to have to listen (and not drop concentration even for a moment if the whole thing is to be heard) that they just don’t bother. If people can attend an event and IM with their friends and surf the web and write e-mails at the same time, then many people might be more likely to attend. If a speaker wants a popular event, he or she might have to use text.

    There may be some events that people host in voice, either because they are the kind of events that inherently benefit from voice (a concert, something theatrical, etc.), or for the sake of novelty, but that doesn’t mean that all speakers and all attendees will always prefer voice, even after voice has had time to bed down.

    As to personal relationships, flirting was mentioned. The usual course of things in online dating sites, for example, is for people to start out writing each other short e-mails, progress to slightly longer e-mails, then progress to calling each other on the telephone (perhaps having exchanged telephone text messages first), and then meet in person. In romantic relationships, voice is likely to be seen as an escalation of the relationship. Some people might not really want their “virtual” romances escalated: after all, people often choose to have a virtual, rather than a real, romance for a reason. People tend to like the idea of being able to create an avatar that appears very different from how that person appears in the first life. Will people be suspicious of those who do not use voice? Not if they are disinclined to use voice for the same reasons themselves.

    The old friends who decide to use voice in the middle of an interesting text chat will probably go back to using text the next time that they talk (realising that voice may not be convenient for the other at that particular time), and only resume voice if they are having another interesting conversation that would take far longer to type. Some people may reply “Sorry, it’s not convenient at the moment” when a friend suggests taking a conversation into voice, and any genuine friend ought to respect that, rather than becoming suspicious about the person’s motives.

    The experience of technology so far is that, when people can choose between text and voice, they tend to choose text most of the time and voice some of the time. The advantages that Gwyn has listed for voice in SecondLife generally apply to most other comparable media (and, for certain semi-comparable media, such as computer games, some of the stated advantages do not apply, but other powerful ones, such as tactical advantage and having a more immersive game, do). It seems rather difficult to say in the circumstances that, in SecondLife, unlike any other mixed voice/text medium that has ever been invented, voice will eventually almost entirely displace text for all or almost all functions.

  • A more customer-oriented company than Linden Lab would probably have done a survey on whether residents would use voice in Second Life or not.

    I’ve added a poll on my blog… let’s see what the results would look like 🙂

  • Ashcroft Burnham

    I shall be very interested to see the outcome of the survey…

  • There are not enough readers here, Ashcroft 🙂 Linden Lab would need to email all users and send them a link to SurveyMonkey or so…

  • Gwyn: good to see that you clearly understand what I’m talking about when it comes to practical use of voice 🙂 I’m just not sure that you and I are such aberrations.

    The sort of people in my experience who insist on using voice, and only voice, in a business context are:

    – Human Resources, who want to be able to judge your mood, and also don’t want to make any recordable promises (sorry, I have a bad opinion of HR based on past experience);
    – Sales people, who want to be able to bully you verbally into an immediate decision rather than you being able to review their offers at your leisure;
    – Incompetent middle-managers, who don’t actually have an opinion or anything to say and can’t understand what it is that other people are trying to tell them, so need to have lengthy voice conversations to (a) “prove” that they’re involved and (b) have people pander to their stupidity and explain, at length, over and over again, what is going on.

    I’ve talked about this outside of any sort of internet context with a lot of people and I find that those who don’t fall into those categories are quite aware of the issues of voice (and face-to-face meetings) across all demographics. The admin staff say “don’t waste my time in meetings, send me an email with what you want done, I’m busy”. The members of the board say “don’t waste my time with phone calls, summarise the issue into proper proposals and reports, I’m busy”. In my earlier days I was explicitly advised by the good managers that I had to always request a follow-up email after every conversation and, more importantly, not do anything until this was received. It’s a common perception. People realise when voice is appropriate over text, for discussion and “brainstorming” usually, but they’re not wedded to it at all, and for some things they prefer to use text.

    In fact, even when voice would be more efficient as a mechanism, a lot of people would prefer just to receive emails, because voice chat dominates your entire sensorium as you say, and you can’t be talking to more than one person at a time. Getting a phone call while talking to someone else is the obvious example, but it’s also difficult to reply to one’s emails while on the phone or speaking face-to-face – I simply can’t do it to my usual standard, and other people agree.

    The problem is that so many companies have their internal process plan dominated by the third type of person that I mention that it’s very hard to make progress. But my point is that for everyone else, as far as I can see, it’s obvious that sometimes voice is convenient, and sometimes it isn’t. You say that “companies get scared” – it’s the people who are wedded to voice who get scared, and unfortunately they are often the decision-makers when it comes to protocol; the majority of employees aren’t bothered.

  • Just a short comment to second all you’ve said, Ordinal 🙂 Pity, however, that it’s the “wrong” people who impose corporate policy in the business environment — but this is very unlikely to change soon!

    Ashcroft, as I hope you understand, I’m on “your side” in the sense that your arguments are the ones that are rational, logical, and correct. Nevertheless, the European Union, Japan or Korea are plagued with more mobile phones than inhabitants — and I’m pretty sure that while the mobile operators’ profits come from SMS, the use given to all those phones is still for voice calls. This is a mentality very hard to change — after all, after 130 years using telephones, we were all “brainwashed” that the best way to communicate is to do it face-to-face, and the second best one is by using a voice call. It’ll be hard to prove, through rational arguments, that this is, in fact, simply not true. The ones making the decisions about voice will not listen (pun intended) to rational and logical argumentation. You don’t find people in decision areas that will take those arguments and go “aaah, yes, you’re so right, let’s get rid of phones”. Instead, the first thing a company does is to get a phone number; only afterwards might they get an email address, just because it is fashionable to do so.

    At my home, I have a telephone just because recently I got an ADSL line (I used a cable connection before). Nobody knows the phone number except my parents; I have no use for the phone, and can’t even remember the phone number. But then again, I’m an odd one, and don’t register on the statistics 🙂 Imagine my difficulty in opening a bank account a few years ago without giving a phone number. Managers were outraged and shocked; I sometimes gave my parents’ phone number instead just to be able to open the account…

    No, the notion that “voice is supreme” is too ingrained into our society, and “mass voice communications” is still in the minds of people as being an achievement of the 20th century. People would be surprised to find out that “voice traffic” over the world-wide networks these days pales in comparison to data communications in the 21st century. Of course, much of that traffic is already VoIP…

  • The thing is, though, that these “wrong” people are not going to be any influence on actual SL users, who are not in general part of a company. They will get home and think “right, finally I have some time of my own, I think I’ll go onto SL, do I want people bothering me with phone calls like they have been all day? Hell no! I run my own time now! And also I have to deal with the cat jumping on me and kid X asking me to tell kid Y to stop doing Z, which wouldn’t fit in with my av’s role as an ice-cool dungeon mistress.” I don’t think this is an unusual situation, and I don’t think that the majority of people will think that “voice is supreme” under all circumstances.

  • Extropia DaSilva

    If I go down to the sandbox areas and ask someone who is working on something to explain what they are doing, that person has to stop working in order to answer my question because you can either type a reply or work the ‘build’ tools. You cannot do both at once.

    But if they can TELL me what they are doing, their hands are free to continue working and I am not interrupting them with my curiosity. Furthermore, won’t the more effective combination of using the build tools and explaining their functionality while doing so (as opposed to doing, stopping to explain, doing…) help create better ‘universities’ than currently exist?

  • I am not really concerned about voice on SL from a sociological point of view. And I doubt the drastic changes you foresee will happen at all actually.
    I chatted on Yahoo! Messenger for about 6 years. Started long before they made voice an option on there. And whereas it was a fun tool to begin with, and people started using it for playing music in the chatrooms and talking with each other in rooms as well, it slowly kind of faded away as a commonly used feature.
    The reason for this, of course, is exactly what you mentioned in your blog here. Voice is limited as a social interaction. Only one person can speak at a time. One speaks, the rest listen. It is like trying to have a conversation using walkie talkies. You cannot have the same type of group “banter” where everyone can voice an opinion and answer each other criss crossing the chat. The people I know who tried voice in the beta client say it turns into chaos as soon as more people speak. It works fine as a conference tool, but as group communication it’s just noise.
    This will become very evident when it becomes a feature on SL and people will gradually stop using it much.
    I worry more about it as an annoyance. You can easily disregard chatters, who type stupid things, by just overlooking what they type. You filter it out when you read the chat.
    It is more difficult to disregard the noise of many people speaking in a club where you want to chat and listen to the music. Or some guy trying to chat you up in his native tongue etc. I remember on Yahoo when Arab chatters would come in and try voicing with you in their own language or in incomprehensible English. Voice turned into a nuisance and something you muted more often than not.

    I think the average SL’er will find that the voice feature will confuse things more than it will benefit them. And they will return to text chat quite fast, as chatters did on Yahoo.

    If LL are in any way similar to Yahoo in the way they “upgrade” the interaction features, the next thing will be webcams on Second Life. I see it coming and trust me, that will change the face of everything. Because by then you will be pestered by other SL users over and over again with “Can I see your cam?”. This fundamentally undermined the quality of chat on Yahoo and will do the same for SL.

    The horror.. the horror…

  • This is an extremely detailed article. I greatly enjoyed reading it. I own a conferencing and collaboration company and am always interested in reading articles like this. I think that LL will have difficulty incorporating it, but you never know.

    Reno Provine
    http://www.ganconference.com
