Skype Chat Stats
September 29, 2008
I made a little Skype Chat Stats script. You can run it on a Skype chat (most useful in group chats, although it works in 1:1 too) and it gives you this result. (Period is customizable.)
Chat statistics for 1.1.2007-31.12.2008

By message count:
8235 Al Bino
4925 Al Fresco
4111 Amanda Lynn
2947 Barb Dwyer
2904 Barry Cade
2881 Bea Minor
2852 Bill Board

By text length:
404819 Bill Loney
328649 Billy Rubin
216267 Bud Light
175678 Dan D. Lyons
165080 Dick Bush
155606 Dick Tator
154309 Dilbert Pickles
134121 Don Key

By posted links:
1063 Doug Graves
1038 Dr. Butcher
525 Dr. Kauff
403 Earl E. Bird
385 Fanny O'Rear
365 Gene Poole
341 Helen Back
316 Herb Rice

Total traffic: 210 KB
Get it here. Following is a little discussion of bits and pieces of its architecture and composition.
Generally I am pretty happy with this script, as I took a much more organized approach than I usually do. Instead of throwing things together, checking that it sort of works and leaving it at that, I tried to optimize and scrutinize each line until there was nothing left to take away.
Why do this in the first place? First, I wanted to learn about Skype4Py, and second, I find that throwing a script like this together once in a while is a good way to “stay sharp”. But the specific reason was simply that someone said “this would be a good idea” in a group chat I’m in :)
The ‘add new dictionary element’ pattern
I’ve found this is a pattern in Python that I use a lot recently:
I don’t know if this is the most correct way to add a new key->value pair to a dictionary, but it works great for me.
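For readers who want something concrete, here is a minimal sketch of one common form of this pattern. It is not necessarily the script's actual code: the function name `record_message`, the variable `index_by_name` and the keys `skypeName` and `msgCount` are assumptions made up for illustration (only `chatters` and `charCount` appear elsewhere in this post).

```python
# Hypothetical names throughout; a sketch, not the script's actual code.
index_by_name = {}   # Skype Name -> position of that person's record in `chatters`
chatters = []        # one dictionary of counters per participant


def record_message(skype_name, body):
    """Update the author's counters, creating the record on first sight."""
    if skype_name not in index_by_name:
        # First message from this author: append a fresh record
        # and remember where it lives in the list.
        index_by_name[skype_name] = len(chatters)
        chatters.append({"skypeName": skype_name, "msgCount": 0, "charCount": 0})
    entry = chatters[index_by_name[skype_name]]
    entry["msgCount"] += 1
    entry["charCount"] += len(body)
    return entry
```

The membership test keeps the "create if missing" step explicit; `dict.setdefault` or `collections.defaultdict` would be alternatives if the list of records were not needed.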
One might wonder what this whole Skype Name index hash table business is about. Why is there a dictionary of name->index mappings, and then a list of dictionaries in which the Skype Name is a seemingly redundant attribute? Couldn’t there simply be a dictionary with Skype Names as keys and the other attributes as values in a nested dictionary?
I actually had it originally this way, but then I found that doing it like I do now lets me more easily re-sort things on the fly in the Django template that I use for output. This also answers why I use Django templating: it gives me the most power here. I simply pass the template one set of values, and the template system can re-sort it on the fly, without me needing to change my code.
I am using Django templating because it’s both elegant and powerful. Consider, for example, this code:
In English, this means:
- take the “chatters” list of dictionaries
- sort based on the “charCount” attribute in each dictionary, in reversed order
- for each member of the sorted list, output these two attributes, right-justifying the numeric column to the given width
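To make those three steps concrete, here is roughly what they amount to in plain Python. This is only an illustration of the work the template takes over (Django's built-in `dictsortreversed` and `rjust` filters cover the sorting and the justification); the function name `render_by_text_length`, the `displayName` key and the default width of 8 are assumptions, not taken from the script.

```python
def render_by_text_length(chatters, width=8):
    """Sort by charCount, largest first, and right-justify the numeric column."""
    ranked = sorted(chatters, key=lambda c: c["charCount"], reverse=True)
    lines = []
    for c in ranked:
        # str.rjust pads on the left so the numbers line up in a column.
        lines.append(f"{str(c['charCount']).rjust(width)} {c['displayName']}")
    return "\n".join(lines)


sample = [
    {"displayName": "Billy Rubin", "charCount": 328649},
    {"displayName": "Bill Loney", "charCount": 404819},
]
report = render_by_text_length(sample)
```

Changing the sort key or the column width here means editing code, which is exactly what moving this into the template avoids.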
Sure, I could do this with my homebrew code, but templating is in my view the correct place for this, and if all these filters have been already implemented for me in Django, I just use them.
Feedback to user
If a script runs beyond a critical time, it is necessary to give feedback to the user, to let them know that the system hasn’t frozen or died. This is why there is a basic progress meter in this script: a number is output every 100 messages processed. (I found that the initial enumeration of chats didn’t take long enough on my machine to warrant similar progress reporting.) I initially tried printing something for each message, but this took too much time for just printing, decreasing the actual processing performance. Once per 100 messages seems to be good enough.
This is an example of a feedback mechanism where there is no exact science for determining the right number; you just have to do what feels right and is easy to implement.
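A progress meter of this kind can be sketched in a few lines. The names `process_with_progress`, `handle`, `every` and `report` are hypothetical, and the real script may well structure this differently; the point is only that output happens once per batch rather than once per message.

```python
import sys


def _print_progress(count):
    # Write the running count on one line and flush so it appears immediately.
    sys.stdout.write(f"{count} ")
    sys.stdout.flush()


def process_with_progress(messages, handle, every=100, report=_print_progress):
    """Call handle() on each message and report after every `every` messages."""
    count = 0
    for count, message in enumerate(messages, start=1):
        handle(message)
        if count % every == 0:
            report(count)
    return count
```

Passing the reporting function as a parameter keeps the interval and the output destination easy to change while experimenting with what feels right.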
There are some things that are inherent to Skype API and/or Skype4Py. As far as I can tell, there is no operation like “return me a chat with this ID” or “return me messages from this chat which fall in this date range”. This is why I currently enumerate all chats, and all messages in the target chat. This isn’t too big of a concern for this application, as I imagine this script gets run infrequently and in a non-time-sensitive manner.
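The enumerate-and-filter approach could look something like the sketch below. It assumes that chat objects expose `Name` and `Messages` and that message objects expose `Datetime` and `Body`, which is how I understand Skype4Py's objects to look; treat those attribute names as assumptions. The function name `messages_in_period` is hypothetical, and stand-in objects are used so the sketch runs without Skype.

```python
from datetime import datetime
from types import SimpleNamespace


def messages_in_period(chats, chat_name, start, end):
    """Scan every chat for the target one, then keep only messages in [start, end]."""
    for chat in chats:
        if chat.Name != chat_name:
            continue
        # No server-side date filter is assumed: every message is inspected.
        return [m for m in chat.Messages if start <= m.Datetime <= end]
    return []


# Stand-in data; with Skype4Py, `chats` would presumably be the attached
# client's chat collection instead.
fake_chats = [
    SimpleNamespace(Name="#other", Messages=[]),
    SimpleNamespace(Name="#team", Messages=[
        SimpleNamespace(Datetime=datetime(2006, 12, 31, 23, 59), Body="too early"),
        SimpleNamespace(Datetime=datetime(2007, 6, 1, 12, 0), Body="in range"),
    ]),
]
selected = messages_in_period(
    fake_chats, "#team", datetime(2007, 1, 1), datetime(2008, 12, 31, 23, 59, 59)
)
```

The cost grows with the total number of chats and with the length of the target chat, which is acceptable for a script that is run only occasionally.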
There is a known bug, apparently due to Skype4Py, where the non-ASCII display names are printed with characters chopped off, and some counts may be a few characters off. See this forum thread for more info.