Skype Chat Stats

September 29, 2008

I made a little Skype Chat Stats script. You can run it on a Skype chat (most useful in group chats, although it works in 1:1 too) and it gives you a result like the one below. (The period is customizable.)

Chat statistics for 1.1.2007-31.12.2008

By message count:
  8235   Al Bino
  4925   Al Fresco
  4111   Amanda Lynn
  2947   Barb Dwyer
  2904   Barry Cade
  2881   Bea Minor
  2852   Bill Board

By text length:
404819   Bill Loney
328649   Billy Rubin
216267   Bud Light
175678   Dan D. Lyons
165080   Dick Bush
155606   Dick Tator
154309   Dilbert Pickles
134121   Don Key

By posted links:
  1063   Doug Graves
  1038   Dr. Butcher
   525   Dr. Kauff
   403   Earl E. Bird
   385   Fanny O'Rear
   365   Gene Poole
   341   Helen Back
   316   Herb Rice

Total traffic: 210 KB

Get it here. Following is a little discussion of bits and pieces of its architecture and composition.

Generally I am pretty happy with this script, as I took a much more organized approach than I often do. Instead of just throwing things together, making sure it sort of works and leaving it at that, I tried to optimize and scrutinize each line until there was nothing left to take away.

Why do this in the first place? First, I wanted to learn about Skype4Py, and second, I find it’s a good idea to throw a script like this together once in a while to “stay sharp”. But the specific reason was simply that someone said “this would be a good idea” in a group chat I’m in :)

The ‘add new dictionary element’ pattern

This is a Python pattern that I have been using a lot recently:

        try:
            topChatterIndexes[m.FromHandle]
        except KeyError:
            # ... initialize the new chatter record

I don’t know if this is the most correct way to add a new key->value pair to a dictionary, but it works great for me.
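For comparison, here is a minimal sketch of the same idea written three ways: the try/except form above, an explicit membership test, and `dict.get` with a default. The function names and the sample handles are made up for illustration, and a plain counter stands in for the chatter record the script actually initializes.

```python
def count_with_try(handles):
    """The try/except form: probe the key, initialize on KeyError."""
    counts = {}
    for handle in handles:
        try:
            counts[handle]
        except KeyError:
            counts[handle] = 0  # first time this handle is seen
        counts[handle] += 1
    return counts


def count_with_in(handles):
    """The same logic with an explicit membership test."""
    counts = {}
    for handle in handles:
        if handle not in counts:
            counts[handle] = 0
        counts[handle] += 1
    return counts


def count_with_get(handles):
    """dict.get supplies the default, so no separate initialization step."""
    counts = {}
    for handle in handles:
        counts[handle] = counts.get(handle, 0) + 1
    return counts


# Hypothetical Skype Names, one entry per message.
sample_handles = ["al.bino", "bea.minor", "al.bino"]
```

All three return the same dictionary for the same input, so the choice is mostly a matter of taste. The try/except form has the advantage that the initialization branch can do more than set a single value.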

Template power

One might wonder what this whole Skype Name index hash table business is about. Why is there a dictionary of name->index mappings, and then a list of dictionaries in which the Skype Name is a seemingly redundant attribute? Couldn’t there simply be a dictionary with Skype Names as keys and the other attributes as values in a nested dictionary?

I actually had it originally this way, but then I found that doing it like I do now lets me more easily re-sort things on the fly in the Django template that I use for output. This also answers why I use Django templating: it gives me the most power here. I simply pass the template one set of values, and the template system can re-sort it on the fly, without me needing to change my code.
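To make the layout concrete, here is a minimal sketch of the two structures side by side. Only `charCount` and `fromDisplayName` appear in the template; the names `chatter_indexes`, `fromHandle` and `messageCount`, and all sample values, are assumptions made for illustration, so the script itself may differ in detail.

```python
# Skype Name -> position in the list below (the "index hash table").
chatter_indexes = {"al.bino": 0, "bea.minor": 1}

# One record per chatter. The Skype Name is repeated inside each record so
# the record still identifies its owner after the list has been re-sorted.
chatters = [
    {"fromHandle": "al.bino", "fromDisplayName": "Al Bino",
     "messageCount": 5, "charCount": 40},
    {"fromHandle": "bea.minor", "fromDisplayName": "Bea Minor",
     "messageCount": 2, "charCount": 95},
]

# Updating a chatter goes through the index instead of searching the list.
chatters[chatter_indexes["bea.minor"]]["messageCount"] += 1

# The same list can be ordered by any attribute without rebuilding anything.
by_messages = sorted(chatters, key=lambda c: c["messageCount"], reverse=True)
by_chars = sorted(chatters, key=lambda c: c["charCount"], reverse=True)
```

A list of flat records is also the shape that list-sorting template filters expect, which is what makes re-sorting in the template straightforward.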

I am using Django templating because it’s both elegant and powerful. Consider, for example, this code:

    {% for chatter in chatters|dictsortreversed:"charCount" %}
    {{ chatter.charCount|rjust:7 }}   {{ chatter.fromDisplayName }}
    {% endfor %}

In English, this means:

take the “chatters” list of dictionaries
sort it by the “charCount” attribute of each dictionary, in reversed order
for each member of the sorted list, output these two attributes, right-justifying the numeric column to the given width
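For readers who don't use Django, the three steps above correspond roughly to the following plain Python. This is a sketch of the equivalent logic, not of how Django implements its filters; the function name `render_by_char_count` is hypothetical and the sample records borrow two rows from the output shown earlier.

```python
def render_by_char_count(chatters, width=7):
    """Return one output line per chatter, largest charCount first."""
    # Step 2: sort by the "charCount" attribute, in reversed order.
    ordered = sorted(chatters, key=lambda c: c["charCount"], reverse=True)
    # Step 3: right-justify the number, then append the display name.
    return [
        "%s   %s" % (str(c["charCount"]).rjust(width), c["fromDisplayName"])
        for c in ordered
    ]


# Step 1: the "chatters" list of dictionaries.
sample_chatters = [
    {"fromDisplayName": "Don Key", "charCount": 134121},
    {"fromDisplayName": "Bill Loney", "charCount": 404819},
]

for line in render_by_char_count(sample_chatters):
    print(line)
```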

Sure, I could do this with my homebrew code, but templating is in my view the correct place for this, and if all these filters have been already implemented for me in Django, I just use them.

Feedback to user

If a script runs beyond a critical time, it is necessary to give feedback to the user, to let them know that the system hasn’t frozen or died. This is why there is a basic progress meter in this script: a number is output every 100 messages processed. (I found that the initial enumeration of chats didn’t take long enough on my machine to warrant similar progress reporting.) I initially tried printing something for each message, but the printing itself took too much time and slowed down the actual processing. Once per 100 messages seems to be good enough.

This is an example of a feedback mechanism where there is no exact science for determining the right number. You just have to do what feels right and is easy to implement.
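A progress meter of this kind can be sketched in a few lines. The function name, its parameters and the exact output format are assumptions for illustration; only the "one number per 100 messages" idea comes from the script.

```python
import sys


def process_with_progress(messages, handle_message, every=100, out=sys.stdout):
    """Process each message and report the running count every `every` items."""
    processed = 0
    for message in messages:
        handle_message(message)
        processed += 1
        if processed % every == 0:
            # Write without a newline and flush so the number shows up at once.
            out.write("%d " % processed)
            out.flush()
    out.write("\n")
    return processed
```

Making the interval a parameter keeps the "what feels right" decision in one place, so it is easy to try 10 or 1000 instead.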

Shortcomings

Some limitations are inherent to the Skype API and/or Skype4Py. As far as I can tell, there is no operation like “return me a chat with this ID” or “return me messages from this chat which fall in this date range”. This is why I currently enumerate all chats, and then all messages in the target chat. This isn’t too big of a concern for this application, as I imagine this script gets run infrequently and in a non-time-sensitive manner.
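The enumerate-and-filter approach looks roughly like the sketch below. It runs against plain stand-in objects, not against Skype4Py: the `Chat` and `Message` tuples, their attribute names and the two helper functions are assumptions for illustration, and the real objects may expose different names.

```python
from collections import namedtuple
from datetime import datetime

# Stand-ins for illustration only; the real objects come from Skype4Py.
Message = namedtuple("Message", "Datetime FromHandle Body")
Chat = namedtuple("Chat", "Name Messages")


def find_chat(chats, chat_name):
    """Scan every chat, since no lookup-by-ID operation was found."""
    for chat in chats:
        if chat.Name == chat_name:
            return chat
    return None


def messages_in_range(chat, start, end):
    """Scan every message and keep those dated within [start, end]."""
    return [m for m in chat.Messages if start <= m.Datetime <= end]


# Hypothetical data: one chat with a message inside and one outside the period.
sample_chats = [
    Chat("#group/$abc", [
        Message(datetime(2006, 5, 1), "al.bino", "too early"),
        Message(datetime(2008, 3, 9), "bea.minor", "in range"),
    ]),
]
```

Both helpers are linear scans, so the cost grows with the total number of chats and with the length of the target chat's history.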

Known bug

There is a known bug, apparently due to Skype4Py, where non-ASCII display names are printed with characters chopped off, and some counts may be a few characters off. See this forum thread for more info.