Recently I noticed that my website is beginning to attract spam, some of it quite funny. I started to wonder – what do spam bots think will work on me? What topics do they think will trick me into giving them a chunk of my (dwindling) savings?
Content analysis is a great way to analyse qualitative data. But first, what’s qualitative data?
Qualitative data is non-numerical information, such as transcripts from interviewing users, or the results of a usability review. Or spam messages.
Generally speaking, people know roughly what to do with numerical data. Count it, compare it, look at means, variance, etc. But what do you do with the responses to an open-ended question? Well, content analysis is one answer.
So how does it work?
First, researchers chop the data into small ‘units of meaning’, which could be as large as an entire comment or as small as a single word. Next, those fragments are labelled with different descriptions of their content.
For example, let’s say you interviewed 100 people about their experience of eating at McDonald’s. You would then look at the data, create ‘codes’ (such as ‘liked the fries’ or ‘speed of service’), and assign those codes to each response. At the end, you have an overview of how frequently certain topics come up.
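The tallying step can be sketched in a few lines of Python. This is only an illustration: the coded responses are made up, and `code_frequencies` is a hypothetical helper name, not part of any library.

```python
from collections import Counter

# Made-up coded responses: each response has been assigned one or more codes.
coded_responses = [
    ["liked the fries"],
    ["speed of service", "liked the fries"],
    ["speed of service"],
    ["liked the fries"],
]

def code_frequencies(responses):
    """Count how many responses each code was assigned to."""
    counts = Counter()
    for codes in responses:
        # set() makes sure a code counts at most once per response.
        counts.update(set(codes))
    return counts

frequencies = code_frequencies(coded_responses)
for code, n in frequencies.most_common():
    print(f"{code}: {n}")
```

Counting each code once per response answers "how many people mentioned this?", which is usually the question you care about. If you would rather count every mention, drop the `set()` call.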
Content analysis is glorified counting, but still a great technique! For more information, see the following paper: Cairns et al., 2008.
Analyzing the data
I started out by creating a new set of codes, which you can find below as I discuss each topic. I then created drop-downs in Google Sheets and coded the message body of each piece of spam.
Here’s what coding the data looks like…
Type of spam
Sex and Dating
Unsurprisingly, the #1 thing the internet wants to sell me is… sex and dating sites. Well, at least the promise of sex and dating websites. Sometimes I wonder if any of these are legitimate services or just straight-up scams. Either way, that’ll be a hard pass from me.
If there’s one thing spammers are good at, it’s spamming people. Well, at least I hope they’re good at it, because they’re liberally offering their services to people like me.
Not a bad marketing strategy if I say so myself. How can I know they can spam people for me? Well, they managed to get their message into my inbox, so there you go…
My favourite message is one that informed me: “Did you know that it is possible to send letter completely lawfully?”.
What a revelation.
The next most popular topic to spam me about is financial services, like crazy investment opportunities. I guess I really am an idiot, because what fool would turn down returns of “uр to 8335%”?
Sigh, better just wait for Bitcoin 2.0 or whatever the next great opportunity is.
Essay writing services
This one is quite interesting. I guess it must be a pretty lucrative business, pumping out essays for students.
It’s pretty well targeted that they managed to send this to me while I’m a student. Maybe they just blanket spam websites with this, but I don’t remember ever getting this type of spam in the last couple of years while I was working.
Maybe the spamming services are smart enough to have a bot reading my website for mentions of university? If so, those ‘marketing offers’ above are looking better and better…
Well this one is pretty boring. Give us $20 and we’ll give you a pair of totally authentic Ray-Bans. Taobao exists, why would I answer this?
Technically it’s my interpretation that one of the messages was religious in nature, but it seemed to fit the bill. The entire message was just:
“These are indeed end times, but most are in the falling away”.
I really have no idea what’s going on here. It’s a bit too gloomy to get me in a spending kind of mood. Maybe the scam is that if you reply they start offering you guaranteed passages to heaven? Indulgences via Gmail?
Anyways, I hope it’s not end times because that means writing this post will have been a waste of time.
Some thoughts about content analysis
The sample size is so small!
Yes, my entire sample was just 17 emails, which is a pretty limited sample size.
But content analysis is more about familiarizing yourself with the data, rather than drawing statistically significant conclusions from it. Being able to work with smaller sample sizes is one of the benefits when using content analysis to investigate a topic.
You came up with the topics, isn’t that subjective?
Yes, it is. Content analysis will always have a certain amount of subjectivity in it. There are ways of trying to mitigate this, such as inter-coder reliability, where two people code the same data independently and you then compare their results, or using agreed-upon codes for common topics.
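One common way to put a number on that comparison is Cohen's kappa, which measures agreement between two coders after correcting for the agreement you would expect by chance. I didn't do this for my spam, so treat the following as a rough sketch: the two lists of codes are invented, and `cohens_kappa` is a hypothetical helper written for this example.

```python
from collections import Counter

def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders who each gave one code per item."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("Both coders must code the same non-empty set of items.")
    n = len(coder_a)
    # Proportion of items where the two coders picked the same code.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Agreement expected by chance, from how often each coder used each code.
    freq_a = Counter(coder_a)
    freq_b = Counter(coder_b)
    expected = sum(freq_a[code] * freq_b[code] for code in freq_a) / (n * n)
    if expected == 1.0:
        # Both coders used a single identical code for everything.
        return 1.0
    return (observed - expected) / (1 - expected)

# Invented example: two people coding the same six messages.
coder_a = ["sex", "sex", "finance", "essay", "finance", "sex"]
coder_b = ["sex", "finance", "finance", "essay", "finance", "sex"]
kappa = cohens_kappa(coder_a, coder_b)
print(f"kappa = {kappa:.2f}")
```

A kappa of 1 means perfect agreement, and 0 means no better than chance. Where exactly "good enough" sits in between is, fittingly, a bit subjective.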
Philosophically, I’m not sure if these really solve the problem. I think it’s best to not think of it as a problem at all, but rather part of the analysis.
Coding robotically, with zero subjectivity, would probably lead to worse results than with a certain margin for making reasonable decisions. A bit of flexibility in a bridge keeps it from shattering.
Couldn’t a program do this more efficiently than a human?
Depends on the context! For some data, I imagine it would be much more efficient to have a program chew through huge amounts of data.
But you have to consider that the meaning of a sentence is not always immediately clear from the vocabulary within it. Sarcasm, meaning ‘between the lines’, and references are incredibly common in participant responses.
I’m sure you could write programs that understand all of those, but would it be worth the effort? Again, it depends. I suspect the majority of the time it would be an automation trap.
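To make the trade-off concrete, here is a minimal sketch of the simplest kind of automated coder: one that matches keywords. The keyword lists are my own guesses for illustration, and `auto_code` is a hypothetical function, not something I actually used.

```python
# Hypothetical keyword lists; a real codebook would need far more care.
KEYWORDS = {
    "Sex and dating": ["dating", "singles"],
    "Financial services": ["invest", "returns", "bitcoin"],
    "Essay writing services": ["essay", "assignment"],
}

def auto_code(message, keywords=KEYWORDS, default="Uncoded"):
    """Return the first code whose keywords appear in the message."""
    text = message.lower()
    for code, words in keywords.items():
        if any(word in text for word in words):
            return code
    return default

print(auto_code("Earn returns of up to 8335% on your investment"))
print(auto_code("These are indeed end times, but most are in the falling away"))
```

The first message is easy because the vocabulary gives it away. The second falls through to "Uncoded", because nothing in its wording says what it is about. That takes a human reading it, or a much more sophisticated program.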
I think coding analyses will be more and more automated. But there’s no better way to get to know the data, and understand what’s going on, than doing them by hand. So for now, it’s drop-downs in a Google Sheet for me.