Documentation

Running the script gives you an instance of the GroupChat class. Its attributes are:

| Name | Description |
| --- | --- |
| master | List of all messages in the thread, each stored as a dict with three keys: 'sndr' (string: message sender), 'text' (string: message body), 'time' (datetime object: message send time) |
| size | Int: total number of messages |
| users | List of the names of all unique users of the chat, as strings |
| users_initials | The initials of the above, useful for more crowded plots |
| max_len | The maximum length in characters of any participant's name, plus 1 |
| times | An np.datetime64 array of all message times, much faster than using lists of datetimes |
| sorted_master | The master list sorted by time |
| totals | The total number of messages for each participant |
| convos | List of conversations within the thread, each entry being a list of the message dicts belonging to that conversation |
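
To make the layout above concrete, here is a minimal sketch that builds a tiny hand-made message list and derives a few of the listed attributes from it. The names and texts are invented, and the ordering of users and the use of a Counter for totals are assumptions rather than a description of how GroupChat does it.

```python
from collections import Counter
from datetime import datetime

# Hypothetical messages in the documented format: 'sndr', 'text', 'time'.
master = [
    {"sndr": "Alice Smith", "text": "Anyone about later?", "time": datetime(2017, 3, 4, 18, 30)},
    {"sndr": "Bob Jones", "text": "Yep", "time": datetime(2017, 3, 4, 18, 25)},
    {"sndr": "Alice Smith", "text": "Great", "time": datetime(2017, 3, 4, 18, 31)},
]

size = len(master)                                    # total number of messages
users = sorted({m["sndr"] for m in master})           # unique names (sorted here by assumption)
users_initials = ["".join(part[0] for part in name.split()) for name in users]
max_len = max(len(name) for name in users) + 1        # longest name plus 1
sorted_master = sorted(master, key=lambda m: m["time"])
totals = Counter(m["sndr"] for m in master)           # messages per participant
```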

Its methods are:

Calculating/Analysing

GroupChat.user_sort()

Divides the master list up into lists for each individual user.

returns A list of dicts, one for each user

GroupChat.cluster_find(threshold=30.0)

Clusters messages into conversations based on gaps. A cluster boundary is placed wherever the difference in time between two sequential messages is larger than the chosen threshold (in minutes). 30 minutes seems to work pretty well for me but it will vary a lot across different group chats.

returns List of clusters, each cluster being a list of message dicts belonging to that cluster
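
The gap rule described above can be sketched in a few lines. This is an illustration under assumptions, not the class's actual code: cluster_by_gap is a hypothetical helper name, and it sorts the messages by time before splitting.

```python
from datetime import datetime, timedelta


def cluster_by_gap(messages, threshold=30.0):
    """Split messages into clusters wherever the gap exceeds `threshold` minutes."""
    ordered = sorted(messages, key=lambda m: m["time"])
    limit = timedelta(minutes=threshold)
    clusters = []
    for msg in ordered:
        # Start a new cluster on the first message or after a long silence.
        if not clusters or msg["time"] - clusters[-1][-1]["time"] > limit:
            clusters.append([msg])
        else:
            clusters[-1].append(msg)
    return clusters


start = datetime(2017, 3, 4, 18, 0)
demo = [
    {"sndr": "A", "text": "hi", "time": start},
    {"sndr": "B", "text": "hello", "time": start + timedelta(minutes=5)},
    {"sndr": "A", "text": "morning", "time": start + timedelta(hours=14)},
]
print([len(c) for c in cluster_by_gap(demo)])  # [2, 1]
```

A smaller threshold splits the same messages into more, shorter conversations, which is why the right value depends on how busy the chat is.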

GroupChat.conversation_matrix()

Calculates the conversation matrix for the whole group chat as follows: when a user takes part in a conversation with another user, the matrix entry between those users receives a point. When all conversations have been added, each row is divided by that user’s total messages and normalised by the sum of all that user’s points.

returns N by N np.array of values, N is number of participants
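
As a rough sketch of the point-counting step only, the snippet below awards one point per conversation to every pair of users who both sent a message in it. The helper name conversation_points, the use of nested lists instead of an np.array, and the choice to leave the diagonal at zero are all assumptions; the per-row scaling described above is deliberately left out.

```python
from itertools import permutations


def conversation_points(convos, users):
    """Count, per ordered pair of users, the conversations both took part in."""
    index = {name: i for i, name in enumerate(users)}
    n = len(users)
    points = [[0] * n for _ in range(n)]
    for convo in convos:
        participants = {msg["sndr"] for msg in convo}
        # Every pair of distinct participants earns one point in both directions.
        for a, b in permutations(participants, 2):
            points[index[a]][index[b]] += 1
    return points


# Hypothetical conversations; only the 'sndr' key matters for this step.
convos = [
    [{"sndr": "A"}, {"sndr": "B"}, {"sndr": "A"}],
    [{"sndr": "A"}, {"sndr": "C"}],
    [{"sndr": "A"}, {"sndr": "B"}, {"sndr": "C"}],
]
points = conversation_points(convos, ["A", "B", "C"])
print(points)  # [[0, 2, 2], [2, 0, 1], [2, 1, 0]]
```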

GroupChat.time_bins(times_list, bin_size=1)

Counts the messages per bin of width bin_size (in days) for the message times given in times_list.

returns np.array of datetime coordinates for the bins, and the number of messages in each bin
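
For illustration, binning by days can be sketched as below. This is a standard-library approximation under assumptions: bin_times is a hypothetical helper, it returns plain lists rather than NumPy arrays, and it starts the first bin at the earliest message rather than at midnight.

```python
from datetime import datetime, timedelta


def bin_times(times_list, bin_size=1):
    """Count messages in consecutive bins of `bin_size` days."""
    if not times_list:
        return [], []
    start = min(times_list)
    width = timedelta(days=bin_size)
    n_bins = int((max(times_list) - start) / width) + 1
    counts = [0] * n_bins
    for t in times_list:
        # Index of the bin this message falls into.
        counts[int((t - start) / width)] += 1
    edges = [start + i * width for i in range(n_bins)]
    return edges, counts


times = [
    datetime(2017, 3, 4, 9, 0),
    datetime(2017, 3, 4, 21, 0),
    datetime(2017, 3, 6, 10, 0),
]
edges, counts = bin_times(times)
print(counts)  # [2, 0, 1]
```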

GroupChat.word_find(words)

Finds all of the occurrences of at least one of the strings listed in words.

returns np.array of datetimes where the word(s) occurred
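
A minimal sketch of this kind of search is shown below. Whether the real method matches case-insensitively or on whole words is not stated here, so the case-insensitive substring match, the helper name find_words and the plain-list return value are all assumptions.

```python
from datetime import datetime


def find_words(messages, words):
    """Return the send times of messages containing at least one of `words`."""
    lowered = [w.lower() for w in words]
    return [
        m["time"]
        for m in messages
        if any(w in m["text"].lower() for w in lowered)
    ]


demo = [
    {"sndr": "A", "text": "Footy tonight?", "time": datetime(2017, 3, 4, 18, 0)},
    {"sndr": "B", "text": "Can't, working", "time": datetime(2017, 3, 4, 18, 5)},
    {"sndr": "A", "text": "football tomorrow then", "time": datetime(2017, 3, 4, 18, 6)},
]
hits = find_words(demo, ["football", "footy"])
print(len(hits))  # 2
```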

GroupChat.all()

Performs all of the built-in analysis in one go.

Printing

GroupChat.random(n=5)

Prints n random messages from the chat.

GroupChat.message_string(msg)

Prints the msg dict in a readable way.

GroupChat.message_rank()

Prints a table of participants ordered by their total number of messages.
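
A ranking table of this kind can be sketched with a Counter. The helper name rank_table and the column layout are assumptions; padding the name column to the longest name plus one mirrors the max_len attribute described earlier.

```python
from collections import Counter


def rank_table(messages):
    """Build a text table of senders ordered by message count, highest first."""
    totals = Counter(m["sndr"] for m in messages)
    if not totals:
        return ""
    width = max(len(name) for name in totals) + 1  # longest name plus 1
    rows = [f"{name:<{width}}{count:>6}" for name, count in totals.most_common()]
    return "\n".join(rows)


demo = [
    {"sndr": "Bob", "text": "hi"},
    {"sndr": "Alice", "text": "hello"},
    {"sndr": "Alice", "text": "how are you?"},
]
print(rank_table(demo))
```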

Plotting

Running any of these methods will save the resulting plot in plots/

GroupChat.time_plot(bin_size=1, window=30)

Plots the whole group’s usage over time, with bin_size in days and a moving average of size window.
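
The smoothing step can be pictured as a simple moving average over the per-bin message counts. The sketch below is one common variant (each output value is the mean of window consecutive bins, so the output is shorter than the input); the helper name and the edge handling are assumptions, and the plotting method may treat the edges differently.

```python
def moving_average(counts, window=30):
    """Mean of every run of `window` consecutive values in `counts`."""
    if window <= 0 or window > len(counts):
        return []
    total = sum(counts[:window])
    averages = [total / window]
    for i in range(window, len(counts)):
        # Slide the window: add the new value, drop the oldest one.
        total += counts[i] - counts[i - window]
        averages.append(total / window)
    return averages


print(moving_average([1, 2, 3, 4, 5], window=3))  # [2.0, 3.0, 4.0]
```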

GroupChat.time_plot_user(names, bin_size=1, window=30)

Same as above but plots the activity of the individual users whose names are given in the list names.

GroupChat.matrix_plot()

Plots the conversation matrix.

GroupChat.word_plot(words, bin_size=30)

Plots the usage of individual words by the whole group over time. words should be a list of lists, each sublist holding the words or phrases that count towards one topic. For example, to see the usage of things to do with football compared to things to do with nights out you might do:

>>> words = [['football', 'footy'], ['pub', 'bar', 'drinks']]
>>> groupchat.word_plot(words)

GroupChat.daily_plot(names=None)

Plots the activity of the group over the day. Passing names as None will plot the whole group. Passing names as a list of names will plot the activity of those users only.
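
The data behind such a plot can be approximated by counting messages per hour of the day, with the same names=None convention. This is only a sketch: hourly_counts is a hypothetical helper, and the one-hour resolution is an assumption about the plot.

```python
from collections import Counter
from datetime import datetime


def hourly_counts(messages, names=None):
    """Messages per hour of day (0 to 23), optionally restricted to `names`."""
    counts = Counter(
        m["time"].hour
        for m in messages
        if names is None or m["sndr"] in names
    )
    return [counts.get(hour, 0) for hour in range(24)]


demo = [
    {"sndr": "A", "text": "morning", "time": datetime(2017, 3, 4, 8, 15)},
    {"sndr": "B", "text": "morning!", "time": datetime(2017, 3, 5, 8, 40)},
    {"sndr": "A", "text": "night", "time": datetime(2017, 3, 5, 23, 5)},
]
whole_group = hourly_counts(demo)
only_a = hourly_counts(demo, names=["A"])
print(whole_group[8], only_a[8])  # 2 1
```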

GroupChat.weekly_plot(window=60)

Plots a sort of heatmap for the group’s activity over a whole week. Smoothed with a moving average of size window (in minutes).

GroupChat.message_length_plot()

Simply plots the distribution of message lengths over the whole chat.

GroupChat.word_length_plot()

Plots the distribution (just mean and std. dev.) of word length for each user.
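
The per-user statistics behind this plot can be sketched with the standard library. Splitting words on whitespace and using the population standard deviation are assumptions, as is the helper name word_length_stats.

```python
from statistics import mean, pstdev


def word_length_stats(messages):
    """Map each sender to (mean, std. dev.) of the lengths of their words."""
    lengths = {}
    for m in messages:
        # Whitespace tokenisation; punctuation stays attached to words.
        lengths.setdefault(m["sndr"], []).extend(len(w) for w in m["text"].split())
    return {user: (mean(v), pstdev(v)) for user, v in lengths.items() if v}


demo = [
    {"sndr": "A", "text": "hi there"},
    {"sndr": "B", "text": "ok"},
]
stats = word_length_stats(demo)
print(stats["A"])  # (3.5, 1.5)
```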