On Google Wave, and how we made Skype Chat

November 27, 2009

Google Wave looked interesting when it was first presented. I was looking forward to getting onto it, and when I got the invite a few months ago, the actual interface intrigued me.

It intrigued me because it gave me déjà vu from five years ago. That was when I was tasked with leading the effort to design Skype chat to support multiple people. The five-year anniversary seems a fitting occasion to document this part of Internet history, and enough time has passed that this shouldn’t contain any proprietary information or hurt anybody. A lot has changed with Skype chat and conversations since then, and I mostly wasn’t a part of the later changes, so I can’t really explain the more recent work. I can only discuss how it came to be.

So, it was sometime in 2004, and Skype realized it needed to revamp its text chat/conversation system. Skype had actually had one from the very beginning, but it was one-to-one and not very nice. It did support the basic concept of queueing messages that is still used extensively today: you could write a message to someone regardless of whether they were online or not, and it would be delivered the next time you were both online. True to Skype’s serverless architecture, the message was simply queued on your computer.
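To make the store-and-forward idea concrete, here is a minimal Python sketch. It is not Skype’s code: the `Peer` class, its `outbox` and `flush()` are hypothetical, and the sketch assumes something calls `flush()` when the recipient comes back online. What it illustrates is that an undelivered message lives on the sender’s machine until both ends are online at the same time.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Peer:
    """Hypothetical client that keeps undelivered messages in a local outbox."""
    name: str
    online: bool = False
    inbox: list = field(default_factory=list)
    outbox: deque = field(default_factory=deque)

    def send(self, recipient: "Peer", text: str) -> None:
        # The message is queued on the sender's own machine first.
        self.outbox.append((recipient, text))
        self.flush()

    def flush(self) -> None:
        # Deliver only if both ends are online; everything else stays queued.
        still_waiting = deque()
        while self.outbox:
            recipient, text = self.outbox.popleft()
            if self.online and recipient.online:
                recipient.inbox.append((self.name, text))
            else:
                still_waiting.append((recipient, text))
        self.outbox = still_waiting


alice = Peer("alice", online=True)
bob = Peer("bob")          # bob is offline
alice.send(bob, "hi")      # stays in alice's outbox
bob.online = True          # later, both are online...
alice.flush()              # ...so the queued message goes through
print(bob.inbox)           # [('alice', 'hi')]
```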

That was nice, but not enough. We needed to support multiple people, as evidenced by our own itch: until chat came along, we used Jabber for internal work conversations. It did an OK, but not great, job, and it set a bar for us to clear.

I was tasked with leading the project, and I take credit for the most important interaction design decisions, right or wrong. I actually consider Skype Chat the most important interaction design work I have done to date, though back then I didn’t know that what I was doing was called interaction design. We just called it “project management.” I had great help, though, from Kristjan, Priidu, Indrek and other Skype staff, with whom I spent a lot of time at the whiteboard, arguing through things.

Here are some of the most important questions we faced and the decisions we made. This is reconstructed from memory and is all years old, so forgive any occasional misrepresentation, and correct me in the comments.

What is the goal?

It was somewhat similar to Wave. We set out to “fix the email problem” and provide a convenient group collaboration tool. First and foremost, it had to support our own organization: many ad hoc groups coming and going, each fairly limited in size (the first chat size limit was 50 people, I think, later boosted to 150). We looked at SMS, IRC, email, Jabber groupchat, Internet chatrooms and other modes of such communication available at the time.

We were more limited in scope than Wave, though, since we didn’t try to solve the communication and document problems together. Wave is trying to say that the same “room” can serve as both discussion and document. The jury is still out on whether this works. At Skype, we limited our approach to communication only and didn’t really look at the document/wiki aspect. We figured we’d address documents by providing a convenient file transfer capability integrated with the chat.

What’s the content model vis-a-vis people coming and going? What happens when you close the window?

This was the most important question to me. Is it synchronous or asynchronous? Is it like email, where you get everything delivered at the time of your choosing? Or is it like IRC, where you only get what happened while your client was connected and miss out on the rest? (Never mind that there are now IRC log-bots that capture and share everything; that is a different experience from actually participating.)

I chose to make chats persistent because that is how teams work in an organization. A team tends to be consistent over time, and it is in the interest of the team that everybody is constantly filled in and knows the same information. So, closing the chat window in the GUI should not disconnect you from the chat. Neither should turning off your computer and later turning it on again; you should still get all the messages. More on this below in “multiple devices.”

If closing a window keeps you in a chat, what does that mean for alerts when you get a new message? What if you minimized the window? What notifications will you get? I don’t remember all the details, but we spent some time working out quite an elaborate alert policy. Today, if you look at the alerts/notifications section of your Skype preferences, you’ll find all the settings there.

But if closing the window does not take you away from the chat, how do you actually exit one? Or are you stuck in all your chats forever? No, you are not. We just made “leave chat” an explicit user action. There’s a button for it.
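A minimal sketch of that separation, assuming a hypothetical `ChatClient` class: membership and the open window are tracked independently, so closing the window leaves you in the chat and only `leave_chat()` takes you out. The alert policy mentioned above is deliberately left out, since its details aren’t reconstructed here.

```python
class ChatClient:
    """Hypothetical model: membership is independent of whether a window is open."""

    def __init__(self):
        self.memberships = set()   # chats you belong to; persists across restarts
        self.open_windows = set()  # purely a UI concern

    def added_to(self, chat_id):
        # Chats are invite-only: membership starts when someone adds you.
        self.memberships.add(chat_id)
        self.open_windows.add(chat_id)

    def close_window(self, chat_id):
        # Hides the chat; you stay a member and keep receiving messages.
        self.open_windows.discard(chat_id)

    def leave_chat(self, chat_id):
        # The explicit "leave chat" action is the only way membership ends.
        self.memberships.discard(chat_id)
        self.open_windows.discard(chat_id)

    def receives_messages(self, chat_id):
        return chat_id in self.memberships


client = ChatClient()
client.added_to("team")
client.close_window("team")
print(client.receives_messages("team"))  # True
client.leave_chat("team")
print(client.receives_messages("team"))  # False
```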

Are chats public or invite-only? How do you get into a chat and what do you see then?

I spoke about leaving a chat, but how do you get into a chat in the first place? We made chats private and invite-only. The only way into a chat was for someone to add you to it. That could happen in two ways: either you started a 1:1 chat and expanded it to multiple people later, or you explicitly started a chat with multiple people. I don’t have data, but I’m quite sure that “1:1 first, expand later” accounts for 99.9% of the use.

If you are added to a chat, do you see the whole history or only what comes after that point? I decided that each member sees only what’s posted while they are a member of the chat. Again, this was to support our own use, but it also just feels more natural to eliminate historic cruft. You may be added to a team chat that is two years old. What they discussed a while back may be irrelevant to you, and they may actively not want you to see it. If it’s important, someone will copy and paste the relevant parts of the earlier discussion for you. I made it this way because it resembles how human groups naturally work.
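The visibility rule can be expressed as a filter over the chat log. This is a hypothetical sketch, not Skype’s implementation: it assumes each message carries a timestamp and that the client knows the (joined, left) intervals of a member, which also covers someone who leaves and is later re-added.

```python
from dataclasses import dataclass


@dataclass
class Message:
    timestamp: int
    author: str
    text: str


def visible_history(messages, membership_intervals):
    """Return only the messages posted while this person was a member.

    membership_intervals is a list of (joined_at, left_at) pairs, with
    left_at = None while the person is still in the chat.
    """
    def was_member(ts):
        return any(joined <= ts and (left is None or ts < left)
                   for joined, left in membership_intervals)

    return [m for m in messages if was_member(m.timestamp)]


log = [Message(1, "alice", "old planning notes"),
       Message(5, "bob", "welcome aboard"),
       Message(9, "carol", "build is green")]
# Added to the chat at t=4 and still a member:
print([m.text for m in visible_history(log, [(4, None)])])
# ['welcome aboard', 'build is green']
```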

Note that this is a major difference from Wave. If you are added to a Wave, you see everything from the beginning of time, and can play it back bit by bit, seeing all the edits. I’m not saying that’s wrong, and it definitely has its advantages in their wiki/document model. Time will tell how it works out.

Aside: later, Skype added public chats. And even in private chats, there’s a permission scheme so that some people can kick others out under certain conditions. I won’t write about those, though, because I’m not sure what the status or future of public chats is, and the permissions are complex. We borrowed some functions from IRC: type “/help” in any Skype chat to see the commands available to you, including the permission-related ones.

How will multiple devices/computers work?

This was a very interesting question, and we diverged from what most of the popular IM systems did at the time. In client-server systems like MSN, AIM, Yahoo Messenger and ICQ, only one client per username could be connected at a time. If you connected to AIM from computer A and then from computer B, computer A got disconnected the moment B connected. I don’t know why that was common practice in the industry. Maybe it was about message routing complexity, or security, or who knows what.

Skype’s architecture, though, was more complex than simple client-server. Everybody was a client and a server at the same time. What would “one client at a time” even mean? It would have created more problems than it solved. So we realized that what we were dealing with was not a messaging problem, but more of a distributed database problem. Through some brilliant engineering by Indrek, we created a system based on “synchronizing”. That is, each device signed in with a particular username has a local database of all chats. When it (re)connects to the Skype network, it synchronizes the chats with the “Skype cloud”, posting messages created locally and retrieving new ones from the cloud.
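A rough sketch of the synchronization idea in Python. It simplifies heavily and assumes a single shared `cloud` store, whereas Skype’s network was peer-to-peer; the `Store`, `Msg` and `sync()` names are hypothetical. Each device keeps its own copy, every message gets a globally unique id, and syncing is a push-then-pull union of message sets, so a device that was offline catches up when it reconnects.

```python
import uuid
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Msg:
    msg_id: str
    chat_id: str
    timestamp: float
    text: str


@dataclass
class Store:
    """Messages keyed by id; used both for a device and for the shared 'cloud'."""
    messages: dict = field(default_factory=dict)

    def post(self, chat_id, timestamp, text):
        # A globally unique id lets devices merge without clashes.
        msg = Msg(str(uuid.uuid4()), chat_id, timestamp, text)
        self.messages[msg.msg_id] = msg
        return msg

    def chat(self, chat_id):
        # Display order: by timestamp, ties broken by id for determinism.
        return sorted((m for m in self.messages.values() if m.chat_id == chat_id),
                      key=lambda m: (m.timestamp, m.msg_id))


def sync(device, cloud):
    """Push locally created messages, then pull anything this device lacks."""
    for msg_id, msg in device.messages.items():
        cloud.messages.setdefault(msg_id, msg)
    for msg_id, msg in cloud.messages.items():
        device.messages.setdefault(msg_id, msg)


cloud, laptop, phone = Store(), Store(), Store()
laptop.post("team", 1.0, "written on the laptop")
sync(laptop, cloud)
phone.post("team", 2.0, "written on the phone while the laptop was off")
sync(phone, cloud)
sync(laptop, cloud)   # the laptop reconnects and catches up
print([m.text for m in laptop.chat("team")])
# ['written on the laptop', 'written on the phone while the laptop was off']
```

Because merging is a set union keyed by id, repeating a sync is harmless, and the result doesn’t depend on the order in which messages arrive.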

It sounds complex, and it is. Initially it was quite buggy, but over time I think it has worked remarkably well. I myself use Skype across multiple devices, and overall it works really well. I absolutely think it was the right decision, because I could foresee Skype being used across multiple computers and other devices. Extrapolate this approach to the iPhone version and you see what I mean.

I have not looked at the Wave protocol guts, but it seems that our design and implementation was/is fairly similar to the gist of Wave federation, except that Wave is more complex, spanning multiple servers, networks and who knows what else. All in all, though, you have a collection of waves; in those, you post messages locally and retrieve remote ones, and it already works quite nicely today, e.g. across multiple browsers.

One important thing that Skype and Wave do right, but that’s completely broken in Twitter, is the distribution of metadata, in particular read/unread state. In Skype, we synchronize not only message content but also read/unread state, and Wave does the same. In Twitter, though, it greatly annoys me that there is no facility to synchronize read/unread state, so I have to read the same tweets ten times across different clients and devices.
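One simple way to synchronize read state, sketched here as an assumption rather than a description of what Skype or Wave actually do: keep a per-chat “read up to” timestamp on each device and merge by taking the maximum. The function names are hypothetical.

```python
def merge_read_markers(local, remote):
    """Merge per-chat "read up to" markers from two devices.

    Hypothetical scheme: each device records, per chat, the timestamp of the
    newest message the user has seen. Taking the maximum means reading on any
    device marks the message read everywhere, and a device that syncs late can
    never make an already-read message unread again.
    """
    merged = dict(local)
    for chat_id, read_up_to in remote.items():
        merged[chat_id] = max(read_up_to, merged.get(chat_id, read_up_to))
    return merged


def unread(timestamps, read_up_to):
    """Timestamps of messages newer than the read marker."""
    return [ts for ts in timestamps if ts > read_up_to]


laptop = {"team": 5}
phone = {"team": 9, "family": 2}
markers = merge_read_markers(laptop, phone)
print(markers)                                  # {'team': 9, 'family': 2}
print(unread([1, 5, 9, 12], markers["team"]))   # [12]
```

A single marker per chat is compact but can’t express “read this message, skipped that one”; a set of read message ids would, at the cost of more metadata to synchronize.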

Interestingly, email can work using both models. The “old MSN/AIM etc.” model is POP3, where content is downloaded to the client and deleted from the cloud, and the new Skype/Wave model is IMAP. (Though these days, probably very few people use POP3 or realize the difference.)
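The difference that matters for multiple devices fits in a few lines. These functions are hypothetical and only mimic the classic POP3 download-and-delete configuration versus IMAP-style syncing; they are not the actual protocol commands (real POP3 clients can also be set to leave mail on the server).

```python
def pop3_style_fetch(server_mailbox):
    """Classic download-and-delete: the client takes everything, the server keeps nothing."""
    downloaded = list(server_mailbox)
    server_mailbox.clear()
    return downloaded


def imap_style_fetch(server_mailbox, already_on_device):
    """Sync: copy what this device lacks; the server keeps the full mailbox."""
    return [m for m in server_mailbox if m not in already_on_device]


mailbox = ["m1", "m2"]
desktop = pop3_style_fetch(mailbox)
laptop = pop3_style_fetch(mailbox)
print(desktop, laptop)    # ['m1', 'm2'] []

mailbox = ["m1", "m2"]
desktop = imap_style_fetch(mailbox, set())
laptop = imap_style_fetch(mailbox, {"m1"})
print(desktop, laptop)    # ['m1', 'm2'] ['m2']
```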

So there: that’s some of the thinking that went into Skype Chat, which continues to serve the global Skype community. I’m extremely proud of it and, as I said, it’s one of my most important works to date. I don’t know what’s next for it. I do know that the “conversations” pitch Skype made around version 4 of its Windows client has much the same goals that Wave is trying to achieve, with voice, video and perhaps other added goodies. We’ll see what comes next.