With minimal use of the command line you can use all of the features of this code. This page will briefly explain how to get the data from Facebook, parse it, analyse it, and produce some of the example plots on the previous page.
For more information on any part, please check out the documentation.
1. Getting Your Data from Facebook
On Facebook, go to your settings and you should see an option to “Download a copy of your Facebook data”. Follow this and click “Download Archive”.
Facebook will put together a .zip archive of all your data and email it to you. This can take a little while, so check back after 20-30 minutes.
In the meantime, go back to the Facebook front page and click Messenger on the left-hand side.
Click on the group chat you’re interested in and copy the number at the end of the URL.
You’ll need this later to find the group chat within your Facebook data download.
2. Getting the Code
If you have Git installed you can run
$ git clone https://github.com/conor-or/fb-analysis
or, if you don’t have Git, you can click this link to download everything you need.
3. Parsing the HTML
The first time you run the code you need to pass it the HTML file to parse. The file you want will be in the /messages folder within your Facebook archive. It’s named xxxxxxxxxxxx.html, where xxxxxxxxxxxx is the ID number you found earlier. From within the fb-analysis directory run
$ python main.py /path/to/html/file.html
Once this is done, a group_chat.pkl file is saved in the input/ directory. This contains only the thread you’re interested in and is formatted as a list of dicts, each dict representing a single message. For example, it might look like this:
[
    {
        'text': u'What about Saturday night?',
        'sndr': u'Linwood Zook',
        'time': datetime.datetime(2013, 12, 29, 21, 48)
    },
    {
        'text': u"Can't make it lad",
        'sndr': u'Harold Culligan',
        'time': datetime.datetime(2013, 12, 29, 21, 50)
    }
]
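If you want to inspect this file outside the program, the sketch below shows one way to read such a list of dicts back with the standard pickle module. It is an illustration, not the project's own loader: the helper name load_messages is hypothetical, it assumes the .pkl file is a plain pickle of the list shown above, and it uses a temporary file with stand-in data so that it runs on its own.

```python
import datetime
import pickle
import tempfile
from pathlib import Path


def load_messages(path):
    """Load a pickled list of message dicts from `path` (hypothetical helper)."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Stand-in data in the same shape as the example above.
sample = [
    {"text": "What about Saturday night?", "sndr": "Linwood Zook",
     "time": datetime.datetime(2013, 12, 29, 21, 48)},
    {"text": "Can't make it lad", "sndr": "Harold Culligan",
     "time": datetime.datetime(2013, 12, 29, 21, 50)},
]

# Write the stand-in data to a temporary file, then read it back.
with tempfile.TemporaryDirectory() as tmp:
    pkl_path = Path(tmp) / "group_chat.pkl"
    with open(pkl_path, "wb") as handle:
        pickle.dump(sample, handle)
    messages = load_messages(pkl_path)

# Each message is an ordinary dict with 'text', 'sndr' and 'time' keys.
for message in messages:
    print(message["time"].isoformat(), message["sndr"], message["text"])
```

With a real archive you would pass the path of the saved file in input/ instead of the temporary one. A pickle written by an older Python version may need extra care when opened in a newer one.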
4. Running the Analysis
After this, it’s really up to you how you want to analyse your group chat. Once the .pkl file is created, the program will always read from it by default, so to interact with the group chat you can run:
$ python -i main.py
To return the list of dicts of all messages:
>>> groupchat.master
To print $n$ random messages from the chat:
>>> groupchat.random(n)
12/06/16 15:49 Genaro Eggleston What a shambles
09/11/16 04:15 Diedra Devane Needs to chill.
26/08/15 11:54 Melissia Dubiel Is it Euston?
11/12/16 00:39 Mitchell Ashworth Thought it would be better
04/04/16 21:10 Genaro Eggleston Yeah in a suit
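To make the idea concrete, here is a minimal sketch of how random messages could be drawn from the list of dicts and printed in a similar date, sender, text layout. It is not the code behind groupchat.random: the function names format_message and random_messages, the seed parameter and the stand-in data are all assumptions made for illustration.

```python
import datetime
import random


def format_message(message):
    """Render one message dict as 'DD/MM/YY HH:MM sender text'."""
    stamp = message["time"].strftime("%d/%m/%y %H:%M")
    return f"{stamp} {message['sndr']} {message['text']}"


def random_messages(messages, n, seed=None):
    """Return up to n messages chosen at random, without repetition."""
    rng = random.Random(seed)
    return rng.sample(messages, min(n, len(messages)))


# Stand-in data in the same shape as the parsed group chat.
sample = [
    {"text": "What about Saturday night?", "sndr": "Linwood Zook",
     "time": datetime.datetime(2013, 12, 29, 21, 48)},
    {"text": "Can't make it lad", "sndr": "Harold Culligan",
     "time": datetime.datetime(2013, 12, 29, 21, 50)},
]

for message in random_messages(sample, 1):
    print(format_message(message))
```

Passing a seed makes the selection repeatable, which is handy when you want to share the same sample with someone else.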
To print the ranking of all participants by message count:
>>> groupchat.message_rank()
Linwood Zook 21610 26.00%
Mitchell Ashworth 17929 21.57%
Diedra Devane 13538 16.29%
Genaro Eggleston 10650 12.81%
Erna Claypool 6162 7.41%
Harold Culligan 3914 4.71%
Lester Sneed 3739 4.50%
Melissia Dubiel 2478 2.98%
Wilburn Malbon 1213 1.46%
Maryann Peguero 942 1.13%
Ashanti Sankey 486 0.58%
Evelin Boden 420 0.51%
Hai Cruzan 24 0.03%
Irwin Alm 8 0.01%
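The percentages in this listing are consistent with each participant's count divided by the total number of messages. A ranking like this can be sketched from the list of dicts with collections.Counter. The function below is a hypothetical stand-alone version, not the project's implementation, and the data is made up for the example.

```python
from collections import Counter


def rank_senders(messages):
    """Return (sender, count, percent) tuples, most active sender first."""
    counts = Counter(message["sndr"] for message in messages)
    total = sum(counts.values())
    return [(sender, count, 100.0 * count / total)
            for sender, count in counts.most_common()]


# Stand-in data: only the 'sndr' key matters for the ranking.
sample = [
    {"sndr": "Linwood Zook"},
    {"sndr": "Harold Culligan"},
    {"sndr": "Linwood Zook"},
    {"sndr": "Linwood Zook"},
]

ranking = rank_senders(sample)
for sender, count, percent in ranking:
    print(f"{sender} {count} {percent:.2f}%")
```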
To run all available plotting features simply do
>>> groupchat.all()
or run the script initially with
$ python -i main.py -all
All outputs will be saved in /plots.
If you prefer working in Jupyter/IPython, you can run the included notebook to access everything interactively:
$ jupyter notebook notebook.ipynb
For the full list of all features please check out the documentation. Enjoy!