Skype Chat Stats
September 29, 2008
I made a little Skype Chat Stats script. You can run it on a Skype chat (it is most useful in group chats, although it works in 1:1 chats too) and it produces output like the following. (The reporting period is customizable.)
Chat statistics for 1.1.2007-31.12.2008
By message count:
8235 Al Bino
4925 Al Fresco
4111 Amanda Lynn
2947 Barb Dwyer
2904 Barry Cade
2881 Bea Minor
2852 Bill Board
By text length:
404819 Bill Loney
328649 Billy Rubin
216267 Bud Light
175678 Dan D. Lyons
165080 Dick Bush
155606 Dick Tator
154309 Dilbert Pickles
134121 Don Key
By posted links:
1063 Doug Graves
1038 Dr. Butcher
525 Dr. Kauff
403 Earl E. Bird
385 Fanny O'Rear
365 Gene Poole
341 Helen Back
316 Herb Rice
Total traffic: 210 KB
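The three rankings above come down to simple per-message arithmetic. Below is a minimal sketch, not the script’s actual code: it assumes messages arrive as hypothetical (handle, body) pairs, counts text length in characters, and treats a “posted link” as any http:// or https:// URL, since the post does not say how the script detects links. The names `tally`, `ranking` and `LINK_RE` are illustrative.

```python
import re

# Hypothetical rule: a "link" is any http:// or https:// URL up to the next whitespace.
LINK_RE = re.compile(r"https?://\S+")


def tally(messages):
    """Sum (message count, text length, link count) per sender.

    messages: iterable of (handle, body) pairs, a hypothetical input shape.
    """
    totals = {}
    for handle, body in messages:
        count, chars, links = totals.get(handle, (0, 0, 0))
        totals[handle] = (count + 1, chars + len(body), links + len(LINK_RE.findall(body)))
    return totals


def ranking(totals, column):
    """Return (value, handle) pairs sorted by one column (0, 1 or 2), highest first."""
    return sorted(((t[column], h) for h, t in totals.items()), reverse=True)
```

For example, `tally([("al", "hi"), ("al", "see http://example.com"), ("bea", "ok")])` gives `{"al": (2, 24, 1), "bea": (1, 2, 0)}`, and `ranking(totals, 0)` lists the senders by message count.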
Get it here. What follows is a short discussion of some parts of its architecture and design.
Generally I am pretty happy with this script, because I took a much more organized approach than I usually do. Instead of randomly throwing things together, making sure it sort of works and leaving it at that, I tried to optimize and scrutinize each line until there was nothing left to take away.
Why do this in the first place? First, I wanted to learn about Skype4Py; second, I find that throwing a script like this together once in a while is a good way to “stay sharp”. But the specific reason was simply that someone in a group chat I’m in said “this would be a good idea” :)
The ‘add new dictionary element’ pattern
Here is a Python pattern I have been using a lot recently:
try:
    topChatterIndexes[m.FromHandle]
except KeyError:
    # ... initialize the new chatter record
    pass
I don’t know whether this is the most idiomatic way to add a new key->value pair to a dictionary, but it works well for me.
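For comparison, here are the common ways of doing “insert if missing” in Python, applied to a hypothetical list of (handle, body) pairs. The first loop is the try/except KeyError pattern above; the other three are standard-library alternatives, and all four build the same dictionary.

```python
from collections import defaultdict

# Hypothetical (handle, body) pairs.
messages = [("al", "hi"), ("bea", "hello"), ("al", "bye")]

# 1. try/except KeyError ("easier to ask forgiveness than permission").
by_try = {}
for handle, body in messages:
    try:
        by_try[handle]
    except KeyError:
        by_try[handle] = []  # initialize the new record
    by_try[handle].append(body)

# 2. Explicit membership test ("look before you leap").
by_in = {}
for handle, body in messages:
    if handle not in by_in:
        by_in[handle] = []
    by_in[handle].append(body)

# 3. dict.setdefault inserts the default only if the key is missing, then returns the value.
by_setdefault = {}
for handle, body in messages:
    by_setdefault.setdefault(handle, []).append(body)

# 4. collections.defaultdict creates a missing entry on first access.
by_defaultdict = defaultdict(list)
for handle, body in messages:
    by_defaultdict[handle].append(body)

assert by_try == by_in == by_setdefault == dict(by_defaultdict)
```

The try/except form is cheap when the key is usually present, because an exception is raised only for new keys. `setdefault` evaluates its default argument on every call, which is harmless for an empty list but worth knowing when the default is expensive to build.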
Template power
One might wonder what this whole Skype Name index hash table business is about. Why is there a dictionary of name->index mappings and then a list of dictionaries, in which the Skype Name is a seemingly redundant attribute? Couldn’t there simply be a dictionary with Skype Names as keys and the other attributes as nested dictionaries?
I actually had it that way originally, but then I found that the current layout makes it easier to re-sort things on the fly in the Django template I use for output: Django’s dictsort and dictsortreversed filters work on a list of dictionaries, not on a dictionary of dictionaries. This also explains why I use Django templating: it gives me the most power here. I pass the template one set of values, and the template system can re-sort it on the fly without any change to my code.
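A minimal sketch of this layout, assuming messages arrive as hypothetical (handle, display name, body) tuples. `charCount` and `fromDisplayName` match the template below; `fromHandle`, `messageCount` and `build_chatters` are illustrative names, not necessarily the script’s. The index dictionary only locates records; the list of dictionaries is what gets handed to the output layer, and it can be sorted by any key without restructuring.

```python
from operator import itemgetter


def build_chatters(messages):
    """Return one stats dict per chatter, in order of first appearance.

    messages: iterable of (handle, display_name, body) tuples (hypothetical shape).
    """
    index = {}     # Skype handle -> position of that chatter's dict in `chatters`
    chatters = []  # list of dicts; the handle is repeated inside each dict
    for handle, display_name, body in messages:
        try:
            i = index[handle]
        except KeyError:
            i = index[handle] = len(chatters)
            chatters.append({"fromHandle": handle, "fromDisplayName": display_name,
                             "messageCount": 0, "charCount": 0})
        chatters[i]["messageCount"] += 1
        chatters[i]["charCount"] += len(body)
    return chatters


chatters = build_chatters([("al", "Al", "hello there"), ("bea", "Bea", "hi"), ("bea", "Bea", "hey")])
# The same list, re-sorted two ways without touching the data structure.
by_messages = sorted(chatters, key=itemgetter("messageCount"), reverse=True)
by_chars = sorted(chatters, key=itemgetter("charCount"), reverse=True)
```

Here “bea” leads by message count (2 vs. 1) while “al” leads by text length (11 vs. 5 characters), from the same list.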
Django templating is both elegant and powerful. Consider, for example, this template code:
{% for chatter in chatters|dictsortreversed:"charCount" %}
{{ chatter.charCount|rjust:7 }} {{ chatter.fromDisplayName }}
{% endfor %}
In English, this means:
- take the “chatters” list of dictionaries
- sort it by the “charCount” key of each dictionary, in descending order
- for each dictionary in the sorted list, output these two attributes, right-justifying the numeric column to the given width (7 characters)
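To make the filter semantics concrete, here is a plain-Python equivalent of that template loop. It is an illustration only (the script uses the Django template, not this function), and it ignores the extra whitespace the template’s own line breaks would add; `render_by_chars` is a hypothetical name.

```python
from operator import itemgetter


def render_by_chars(chatters):
    """Plain-Python analogue of the dictsortreversed/rjust template loop."""
    # dictsortreversed:"charCount" -> sort the list of dicts by that key, descending.
    rows = sorted(chatters, key=itemgetter("charCount"), reverse=True)
    # rjust:7 -> right-align the number in a field 7 characters wide.
    return "\n".join(f"{str(c['charCount']).rjust(7)} {c['fromDisplayName']}" for c in rows)
```

For example, `[{"charCount": 5, "fromDisplayName": "Al"}, {"charCount": 12345, "fromDisplayName": "Bea"}]` renders as `"  12345 Bea"` followed by `"      5 Al"`.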
Sure, I could do this with my own homebrew code, but in my view the template is the correct place for it, and since all these filters are already implemented in Django, I just use them.
Feedback to user
If a script runs for more than a few moments, it should give the user feedback so they know the system hasn’t frozen or died. This is why there is a basic progress meter in this script: it prints a number every 100 processed messages. (The initial enumeration of chats didn’t take long enough on my machine to warrant similar progress reporting.) I initially tried printing something for each message, but the printing itself took so much time that it slowed down the actual processing. Once per 100 messages seems to be good enough.
This is an example of a feedback mechanism where there is no exact science for choosing the right number; you just have to do what feels right and is easy to implement.
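A minimal sketch of a throttled progress meter along these lines. `process_all`, `handle_message` and `print_progress` are hypothetical names; the only detail taken from the script is the interval of one report per 100 messages. Passing the reporter in as a callable keeps the counting logic easy to test.

```python
def print_progress(count):
    # flush=True makes the number appear at once, even when output is buffered.
    print(count, flush=True)


def process_all(messages, handle_message, every=100, report=print_progress):
    """Call handle_message on each message, reporting the running count every `every` messages.

    Returns the number of messages processed.
    """
    processed = 0
    for processed, message in enumerate(messages, start=1):
        handle_message(message)
        if processed % every == 0:
            report(processed)
    return processed
```

With 250 messages this reports 100 and 200; raising `every` trades responsiveness for less time spent printing.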
Shortcomings
Some limitations are inherent to the Skype API and/or Skype4Py. As far as I can tell, there is no operation like “return the chat with this ID” or “return the messages from this chat that fall within this date range”. This is why the script currently enumerates all chats, and all messages in the target chat. This isn’t much of a concern for this application, since I imagine the script is run infrequently and is not time-sensitive.
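Without a server-side query, the selection has to happen on the client: walk every chat, find the target one, and filter its messages by date. A minimal sketch follows. The attribute names `Name`, `Messages` and `Datetime` are assumed to mirror the Skype4Py chat and message objects; the namedtuples are hypothetical stand-ins so the sketch runs without Skype, and the range is treated as half-open (start inclusive, end exclusive).

```python
from collections import namedtuple
from datetime import datetime

# Hypothetical stand-ins for Skype4Py's chat and message objects (demo only).
Chat = namedtuple("Chat", "Name Messages")
Message = namedtuple("Message", "Datetime Body")

# The sample output's period 1.1.2007-31.12.2008, with an exclusive end.
PERIOD = (datetime(2007, 1, 1), datetime(2009, 1, 1))


def messages_in_range(chats, chat_name, start, end):
    """Return the messages of chat `chat_name` with start <= Datetime < end.

    There is no "get chat by ID" call, so every chat is scanned; likewise there is
    no date-range query, so every message in the target chat is examined.
    """
    for chat in chats:
        if chat.Name == chat_name:
            return [m for m in chat.Messages if start <= m.Datetime < end]
    return []  # no chat with that name
```

Both scans are linear, which matches the observation above: acceptable for an occasional, non-interactive run.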
Known bug
There is a known bug, apparently in Skype4Py, where non-ASCII display names are printed with characters chopped off, and some character counts may be slightly off. See this forum thread for more information.