Saturday 3 February 2018

Book Review: Machine Learning and Security

Quick caveat for this review: I read this on Safari and the book hasn’t been published in paper yet (due late Feb 2018), so the content may be subject to change.

What differentiates this book from others I’ve read on applying machine learning to security is that it goes much deeper into the why, as opposed to the how, of machine learning. Where other books outline the high-level principles of classification and show some basic examples with approaches like naive Bayes and k-means clustering, this book goes beyond that and covers other approaches, explaining why each would be more or less effective in a given scenario.

Throughout the book the authors use the pretty standard Python scikit-learn package, which will be familiar to anyone who’s played around with machine learning to any extent.

The authors take the first two chapters to introduce the concepts around classification, clustering and anomaly detection, with various worked examples. While I often skip these kinds of intro chapters in an area I’m familiar with, I’d recommend spending more time here. The explanations go deeper than most and set the tone for the rest of the book, where the authors try to build an understanding of why different approaches should be used, as opposed to just how to use them.

The authors then move on from the concepts to applied examples, with Python code, covering malware analysis, network traffic analysis and fraud detection. My only criticism of these chapters is that the authors spend a bit too much time explaining malware, network attacks and fraud. That is good for someone new to the topic, of course, but I suspect most people coming to this book will be from the security space looking to understand machine learning, rather than the reverse.

The next chapter is another one that sets the book apart. It looks at the challenges of taking machine learning from a fun and interesting exercise to production systems running on real data, and at potential solutions, including issues like data and model quality, performance and maintainability, monitoring and privacy.

The book finishes off with a really interesting intro to how attackers can target machine learning and manipulate it, an area I hadn’t spent much time considering previously. It covers obvious examples like manipulating the baseline of normal over time to avoid detection, but also some interesting ideas around manipulating how the model performs its analysis: adding features to your attack that cause the model to lower the rating it gives to a potential attack, and so bypass detection. This chapter alone, and the ideas in it, should really be part of every pen tester’s toolkit, and should be a really fun area to dig into for any system using machine learning.
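To make the "adding features" idea concrete, here is a minimal sketch of my own (not taken from the book). It assumes a toy detector that scores a sample with a logistic function over per-token weights. The tokens, weights, bias and threshold are all made up for illustration, and a real model would learn them from data.

```python
import math

# Hypothetical per-token weights: positive pushes towards "attack",
# negative pushes towards "benign". Invented values, not learned ones.
WEIGHTS = {
    "powershell": 2.0,
    "base64": 1.5,
    "download": 1.0,
    "invoice": -1.0,
    "meeting": -1.5,
    "regards": -1.2,
}
BIAS = -1.0       # assumed prior leaning towards "benign"
THRESHOLD = 0.5   # assumed alerting threshold


def attack_probability(tokens):
    """Logistic score over the distinct tokens present in the sample."""
    score = BIAS + sum(WEIGHTS.get(token, 0.0) for token in set(tokens))
    return 1.0 / (1.0 + math.exp(-score))


def is_flagged(tokens):
    """True when the toy detector would raise an alert."""
    return attack_probability(tokens) >= THRESHOLD


# The attack payload on its own...
original = ["powershell", "base64", "download"]
# ...and the same payload padded with tokens the model treats as benign.
padded = original + ["invoice", "meeting", "regards"]

print(is_flagged(original), is_flagged(padded))
```

The payload itself is unchanged in the padded sample. Only the extra benign-looking tokens drag the score below the threshold, which is why this kind of test seems worth adding when assessing any system that relies on a model like this.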

Overall, I’d thoroughly recommend this book to anyone interested in applying machine learning to infosec data, especially if you’ve already read some basic introductions to machine learning or come across more basic worked security examples.

Links:
Amazon: https://www.amazon.co.uk/Machine-Learning-Security-Clarence-Chio/dp/1491979909/
Safari: https://www.safaribooksonline.com/library/view/machine-learning-and/9781491979891/