I was recently invited with Thought-Wired to give a talk at Hobsonville Point Secondary School, an institution leading the way in modern approaches to learning and education in Auckland. The facilities and staff are so mind-bogglingly awesome, I always spend most of my time there lamenting my own educational experience. Case in point, I was there to talk about machine learning, and its relevance to artificial intelligence. This was a subject that I did not learn about until the final year of my Bachelor’s and these students were about to learn about it before having sat NCEA Level 1 [so so jealous].
For my talk, I ran through a real-life example and demo of how machine learning works. This post aims to supplement that by going through the example in a bit more depth and covering some points I missed in my talk. Hopefully, it should be understandable to high school students, but if there are sections that aren’t clear, or if I’ve fallen into the abyss of academic vernacular, leave a comment so I can fix it up.
I initially had trouble coming up with an example that explains exactly what machine learning is and, furthermore, demonstrates it engagingly. I found a post that explained it quite well using shopping for mangoes as an example, but I didn’t feel this was relevant or interesting to a high school student (check it out though, it does provide a pretty good explanation). Somehow, I arrived at the conclusion that a live demo showing an actual machine being trained using real data was the best way. In hindsight, this could have been a terrible idea, but I believed even if the results didn’t turn out the way I expected, something could still be learned from them.
The use of social media is pretty much ubiquitous in our lives, whether you’re an adult or a high school student. However, are we using the same apps? How easy is it to differentiate a student from an adult based on the social apps they use? And if it is difficult, could a machine be taught to tell whether a person is a high school student or an adult based on the social apps they use? Such a machine might be able to identify underlying patterns in the data that are not observable to a human.
As I mentioned earlier, I foolishly wanted to use real data, so to create my data set for adults I quickly whipped up a Google form and sent it out through the social apps I use asking for responses. Specifically, I wanted people aged over 15 (criteria for an “adult” in this case), and a list of what social apps they used. I also provided a text field for participants to proffer other apps that I might have missed.
With limited time, I managed to collect 40 responses with an average age of 30.1 (s.d. 8.6) years. The eventual list of apps I arrived at was as follows:
While I was setting up for my talk, the students were sent a link to a new form to fill out. This form consisted of the same list of apps, with no field to suggest new apps. The responses to this form became the student data set, and luckily, there were exactly 40 responses with an average age of 14 (s.d. 0.8) years.
The following bar chart shows a summary of everyone’s responses. Note that, as humans, we can see that TapTalk is used by neither students nor adults and is therefore useless as a feature. LinkedIn, Periscope, Ello and Telegram are only used by adults (but not all adults) and would be good indicators of adults. Similarly, Skype seems to be much more popular among the students and could also be a good indicator, but for students.
The type of machine I used was a binomial logistic regression classifier. This means there are two kinds of output: student or adult (binomial), it uses curve-fitting techniques to learn (logistic regression), and is able to sort samples of data into categories or classes (classifier). The machine will learn from data fed to it by adjusting certain coefficients or properties that affect how much influence particular features have.
A feature is a single property that can be measured, e.g. does a person use Facebook: Yes or No. In this case, the features are the usages of each of the social apps. Combining all the features of a person creates a feature set and the corresponding type of person (adult or student) is the class. Every feature set will have a corresponding class. An example of a feature set would be a person who uses Facebook, Instagram, Snapchat and Skype, and since this person is a student, their class is student.
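As a rough sketch of what a feature set looks like to the machine, the example person above can be written as a list of 0s and 1s, one per app. The app list below is only a handful of the apps named in this post, not the full survey list, and the names `APPS` and `encode` are made up for illustration.

```python
# Illustrative subset of apps mentioned in this post (not the full survey list).
APPS = ["Facebook", "Instagram", "Snapchat", "Skype", "LinkedIn", "Twitter"]

def encode(apps_used):
    """Turn a collection of app names into a feature set of 0s and 1s."""
    used = set(apps_used)
    return [1 if app in used else 0 for app in APPS]

# The example person from the text: uses Facebook, Instagram, Snapchat and Skype.
feature_set = encode(["Facebook", "Instagram", "Snapchat", "Skype"])
label = 1  # assumed coding: 1 = student, 0 = adult

print(feature_set, label)  # [1, 1, 1, 1, 0, 0] 1
```

Each position in the list always refers to the same app, which is what lets the machine compare one person with another.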
I used the feature sets of 25 adults and 25 students to train the machine, and the remaining feature sets were used to test the machine’s ability to predict the type of person based on the apps they used. The purpose of testing is to see how well the machine has learned by presenting it with feature sets it has not seen before and seeing how well its predictions or outputs compare to the actual classes.
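The split described above could be sketched roughly as follows. Whether the responses were shuffled before splitting is an assumption on my part, the function name `split_by_class` is hypothetical, and the data here are empty placeholders standing in for the real survey responses.

```python
import random

def split_by_class(samples, n_train_per_class, seed=0):
    """Shuffle each class separately, then take the first n for training."""
    rng = random.Random(seed)
    train, test = [], []
    for label in (0, 1):  # assumed coding: 0 = adult, 1 = student
        group = [s for s in samples if s[1] == label]
        rng.shuffle(group)
        train.extend(group[:n_train_per_class])
        test.extend(group[n_train_per_class:])
    return train, test

# Placeholder data: 40 adults and 40 students with empty feature sets,
# standing in for the real survey responses.
samples = [([], 0) for _ in range(40)] + [([], 1) for _ in range(40)]
train, test = split_by_class(samples, n_train_per_class=25)

print(len(train), len(test))  # 50 30
```

Keeping the test samples completely out of training is what makes the later accuracy figure a fair measure of what the machine has learned.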
Training the machine took less than a second and a total of 5 iterations. This means it took 5 adjustments of the machine’s coefficients to get its outputs to match the actual classes with acceptable accuracy. In this case, the machine was able to learn to the point where its output was 96% accurate, i.e. of the 50 people in the training data, the machine only got 2 people’s classes wrong.
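To give a feel for what "adjusting coefficients" means, here is a minimal sketch of logistic regression trained with plain gradient descent. This is not necessarily the method used in the demo (which needed only 5 iterations, so it likely used something more efficient), and the tiny data set, the learning rate and the number of steps are all made up for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(weights, bias, x):
    """Probability that feature set x belongs to class 1 (student)."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))

def log_loss(weights, bias, X, y):
    """Average penalty for confident wrong answers; lower is better."""
    total = 0.0
    for x, target in zip(X, y):
        p = predict_proba(weights, bias, x)
        total -= target * math.log(p) + (1 - target) * math.log(1 - p)
    return total / len(X)

def train(X, y, steps=200, learning_rate=0.5):
    weights, bias = [0.0] * len(X[0]), 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for x, target in zip(X, y):
            error = predict_proba(weights, bias, x) - target
            grad_b += error
            for j, xj in enumerate(x):
                grad_w[j] += error * xj
        # Nudge each coefficient a little in the direction that reduces the error.
        bias -= learning_rate * grad_b / len(X)
        weights = [w - learning_rate * g / len(X) for w, g in zip(weights, grad_w)]
    return weights, bias

# Made-up toy data, columns = [uses LinkedIn, uses Skype]; 0 = adult, 1 = student.
X = [[1, 0], [1, 0], [1, 1], [0, 1], [0, 1], [0, 0]]
y = [0, 0, 0, 1, 1, 1]

weights, bias = train(X, y)
predictions = [1 if predict_proba(weights, bias, x) >= 0.5 else 0 for x in X]
accuracy = sum(p == t for p, t in zip(predictions, y)) / len(y)
print(weights, bias, accuracy)
```

In this toy data only adults use LinkedIn, so the LinkedIn coefficient ends up negative, pushing anyone who uses it towards the adult class.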
The real test comes from the new data – the 30 feature sets that the machine has never seen before. For this, the machine was 87% accurate in predicting a student or adult. A confusion matrix can be used to help visualise the results and identify where the machine lost some of its accuracy. The confusion matrix for the data is shown below:
This is a very useful tool for comparing and analysing machine learning ability. Output Class is what the machine predicted, Target Class is the actual class, and 0 and 1 represent adults and students respectively. The grey squares represent row and column totals (e.g. machine outputs of adult were correct 73% of the time, while student target classes were identified 79% of the time) and the blue square is the total accuracy. If we look at the top left green square, here the machine predicted adult and the actual class was adult (the two 0’s). This happened 11 times.
In the red square to the right of it, the target class was 1 or student, but the output class remains as adult. This means the machine got it wrong, and this happened 4 times and is the main contribution to the loss in accuracy. In terms of predicting students (the bottom right green square), the machine did this correctly for all of them, resulting in 100% accuracy as shown in the grey square.
It may take a while to get your head around the matrix, but it’s a very useful tool for quickly visualising results.
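If it helps, the percentages in the matrix can be recomputed from the four counts. The 11 and the 4 are quoted in the text above; the 0 and the 15 are my inference from there being 30 test samples and the 100% figure for student predictions.

```python
# Counts as [output class][target class], 0 = adult, 1 = student.
# 11 and 4 are quoted in the text; 0 and 15 are inferred from the
# 30 test samples and the 100% figure for student predictions.
matrix = [[11, 4],
          [0, 15]]

total = sum(sum(row) for row in matrix)
accuracy = (matrix[0][0] + matrix[1][1]) / total

# Of everything the machine called "adult", how much really was adult?
adult_output_correct = matrix[0][0] / sum(matrix[0])
# Of everything the machine called "student", how much really was student?
student_output_correct = matrix[1][1] / sum(matrix[1])
# Of all actual students, how many did the machine find?
student_target_found = matrix[1][1] / (matrix[0][1] + matrix[1][1])

print(f"{accuracy:.0%} {adult_output_correct:.0%} "
      f"{student_output_correct:.0%} {student_target_found:.0%}")
# 87% 73% 100% 79%
```

The green squares sit on the diagonal of `matrix` (correct answers) and the red squares sit off it (mistakes).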
The other component of the machine we can look at is the actual values of the coefficients that were adjusted within it. How these are applied will be difficult to explain (I barely understand it!) but put simply, the higher their magnitude the more important the corresponding feature is to the machine in determining its output.
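As a very rough illustration of why magnitude matters: in logistic regression, each feature is multiplied by its coefficient, the results are added up, and the sum is squashed into a value between 0 and 1. The coefficient values and their signs below are hypothetical, chosen only to show the idea, and are not the values from the demo.

```python
import math

def student_probability(coefficients, bias, uses):
    """Weighted sum of the features, squashed into the range 0..1."""
    z = bias + sum(coefficients[app] * uses.get(app, 0) for app in coefficients)
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients, chosen only to illustrate the idea.
coefficients = {"LinkedIn": -3.0, "Skype": 2.0, "Facebook": 0.2, "TapTalk": 0.0}
bias = 0.0

base = student_probability(coefficients, bias, {})
with_linkedin = student_probability(coefficients, bias, {"LinkedIn": 1})
with_facebook = student_probability(coefficients, bias, {"Facebook": 1})
with_taptalk = student_probability(coefficients, bias, {"TapTalk": 1})

print(base, with_linkedin, with_facebook, with_taptalk)
```

With these made-up numbers, using LinkedIn moves the output a long way from the starting point, Facebook moves it only slightly, and TapTalk (coefficient 0) does not move it at all.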
In this example, all coefficients were 0 (meaning not used at all), except for four features. These features and their coefficients were:
What these values mean is that the machine relied heavily on the LinkedIn and Skype features to tell whether a person was a student or an adult. It also used Whatsapp to a lesser extent and Facebook a little bit. These pretty much support what we thought at the beginning by visually inspecting the data, meaning that in this case a machine does not have much of an advantage over a human. These features make the identification quite easy because there is such a big difference in usage when it comes to these apps.
So what happens if we get rid of these features and train the machine without them? As a human, can you tell which features will be important now, and how much?
Taking out LinkedIn, Whatsapp, Skype and TapTalk (since no one uses it anyway) resulted in 66% training accuracy and 60% test accuracy. Whoa, that’s a pretty big drop! Here’s the confusion matrix:
And coefficient values:
It’s immediately evident from the confusion matrix where the machine is going wrong. It’s getting students wrong more often than right (only right 40% of the time); however, it’s still doing pretty well at identifying adults. From the coefficients, we see that there’s no really strong feature any more. Instead, the machine has to look at patterns mostly within Twitter, Pinterest, Instagram and Vine usage to figure it out. Did you see these coming? You might have been able to name some apps, but figuring out their relative importance might have been trickier. In this case, the machine might have the upper hand over a human.
An overall accuracy of 60% is still not too bad because it’s greater than chance. The machine is still performing better than if we were to try to figure out the type of person based on a coin toss, which at least demonstrates it has learned something. There are also ways we can look at improving the results of the machine:
- Probably the easiest is to add the highly differentiating features like Whatsapp, LinkedIn and Skype back into the mix
- We could also collect more data from people to increase the size of the training data set
- Expand the number of features available by including more in the survey, such as gaming or fitness apps (however, sometimes too many features can be a problem and the machine can be overwhelmed)
- We could improve the learning algorithm
- We could use a different learning algorithm
This was a very simple example of how machines can learn from real data, and I was very happy the live demo worked well enough to show some actual learning. I used social apps to try and be relevant but consider this: if we change that around, and look instead at certain websites people visit, or the types of videos they watch, couldn’t we also train machines to deliver specific content or suggest similar websites or videos to check out?
This is the essence of targeted advertising and content. Have you ever gone to a website and found it conveniently showing an ad for something you actually do need? It’s not luck. Something, somewhere, has tracked what you’ve been doing and based on what it knows about similar behaviours has classified you as a likely target for this particular ad.
Scary? Maybe. However, the potential convenience it offers could be worth it. After all, similar ideas are used for auto-complete, predictive text, suggestions on YouTube and similar short-cut behaviours. Machine learning is everywhere, whether it’s obvious to you or not. Have a think about where you might have missed it lurking, and whether something like Judgement Day is a possible threat from machine learning, or is there something else we need to be worried about? 😉