Monday, 17 March 2014

Book Review: Data-Driven Security - Analysis, Visualization and Dashboards

Ever since I read The New School of Information Security back in 2008, I've been interested in how data can drive better decision making in information security.

This started with security metrics, then progressed into the economics of information security and quantitative risk assessment, touching on statistics and machine learning along the way and leading towards the whole concept of data science.

During these years I've come across both authors of this book (Jay Jacobs and Bob Rudis) through forums such as SIRA and Security Metrics, so when I heard back in January that they were releasing it, I knew it was going to be worth a read!

By pure chance I ended up attending RSA this year and had the pleasure of meeting both Jay and Bob while they were signing copies of Data-Driven Security, so I am the very happy owner of a signed copy. Thanks guys!

The book covers the concepts, tools and techniques that can be used to analyze different types of information security data sets, and explains many of the common pitfalls in both the approach to analysis and the interpretation of its results. It's effectively a perfect introduction to data science/analysis for information security!

The book starts off by introducing the reader to what data analysis is, covering historical concepts and how to frame a good question to answer with analysis, rather than simply analyzing data for the sake of it.

It then moves on to introduce R, a free statistical programming language, and shows how the authors use Python in conjunction with R to analyze data.

The book is very practically oriented, encouraging the reader to start playing around with both Python and R by providing full code examples for all the analysis performed in each chapter. To make life easier, all the code examples can be downloaded from the book's website, and any data sets used for analysis are either publicly available already or can be downloaded with the source code.

Once you get your head around the basics of using the tools, the book walks through examples of the different types of analysis that information security data sets may require, covering things like exploring data sets of malware infections, performing regression analysis on malware data and applying machine learning to breach data. Throughout the examples, the book puts a strong emphasis on visualization, covering the common mistakes made when presenting data analysis and looking at both static and interactive visualization.

The book also briefly touches on NoSQL databases but this is very much just to show that they exist and where they may be used. I'd highly recommend Seven Databases in Seven Weeks if you're looking for a bit more info on this side of things.

The book finishes off with a look at what a data-driven approach means for information security, what core skill sets are needed and how a team can be built. It ends on a very interesting example of how Bob's team started off by focusing on just one question: "Have we seen this IP before in our external perimeter logs?" This is a perfect illustration of framing a single question to answer through analysis, rather than trying to boil the ocean on your first attempt.
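As a rough illustration of how small that kind of question can be in code, here is a minimal Python sketch. It is not taken from the book: the log format (source IP as the first whitespace-separated field), the sample lines and the function names are all assumptions made up for this example.

```python
import ipaddress


def build_seen_set(log_lines):
    """Collect every valid IP address found in the first field of each line.

    Assumes (hypothetically) that the source IP is the first
    whitespace-separated field; real perimeter logs will differ.
    """
    seen = set()
    for line in log_lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        try:
            seen.add(ipaddress.ip_address(fields[0]))
        except ValueError:
            continue  # first field is not an IP address
    return seen


def seen_before(ip, seen):
    """Return True if `ip` is in the set built from earlier logs."""
    return ipaddress.ip_address(ip) in seen


# Made-up sample data using documentation-only address ranges.
sample_logs = [
    "203.0.113.7 - - GET /index.html 200",
    "198.51.100.23 - - GET /login 401",
    "",
    "not-an-ip garbage line",
]

seen = build_seen_set(sample_logs)
print(seen_before("203.0.113.7", seen))
print(seen_before("192.0.2.1", seen))
```

Parsing with the standard library's ipaddress module rather than comparing raw strings means malformed entries are skipped instead of silently polluting the set.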

One other excellent aspect of the book is that at the end of each chapter, a number of other books are highlighted as further reading, each with a brief summary of why it is interesting in relation to the chapter. What was very interesting for me was that I'd actually read many of the books referenced, but hadn't put it all together in the context of information security.

Overall I thoroughly enjoyed reading this book, and while I haven't had the time to start applying its ideas to my own data sets, it's opened up a whole world of analysis tools and techniques and has dramatically shortened my learning curve in the area.

The biggest benefit I see from this book is its highly practical approach, which allows anyone with an interest in information security data analysis to quickly get up to speed on the basics. At the very least they come away with the tools and knowledge to start asking interesting questions and getting results, without having to re-invent the wheel.

If you've ever been sitting in front of a huge set of firewall or webserver logs during an incident, trying to figure things out by grepping, cutting and counting results, you're going to get a lot from this book!
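For a flavour of what moving from grep, cut and counting to a scripting language looks like, here is a small Python sketch. It is not from the book, and the log layout (source IP first, then an action such as DENY), the sample lines and the function name are assumptions invented for the illustration.

```python
from collections import Counter


def top_sources(log_lines, pattern, n=3):
    """Count the first field of every line that contains `pattern`.

    Roughly what a filter, extract-field, count and sort pipeline does
    in the shell. Assumes the source IP is the first
    whitespace-separated field, which is a made-up layout.
    """
    counts = Counter(
        line.split()[0]
        for line in log_lines
        if pattern in line and line.split()
    )
    return counts.most_common(n)


# Made-up firewall-style lines using documentation-only address ranges.
sample = [
    "203.0.113.7 DENY tcp/22",
    "198.51.100.23 ALLOW tcp/443",
    "203.0.113.7 DENY tcp/23",
    "198.51.100.23 DENY tcp/3389",
    "203.0.113.7 DENY tcp/22",
]

for source, hits in top_sources(sample, "DENY", n=2):
    print(source, hits)
```

Once the counts live in a Python object rather than a terminal scrollback, they can be fed straight into the kind of plotting and statistical tools the book walks through.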

