Máirtín O Sullivan

Wednesday, 10 February 2016

Book Review: Hadoop Security

As a follow up to my basic intro to Hadoop with Hadoop 2 Quick Start Guide, I wanted to get more detail on the security features available in the Hadoop ecosystem. This sounded like it fitted the bill and was recently published (June 2015), so I figured it would be pretty up to date.

One thing that I immediately liked about the book is that apart from a very brief few pages of an intro to security concepts, it gets straight into things, which for me is always a good indication that there won't be much padding in the book.

The book first starts off with a section on security architecture, starting with a basic look at threat modelling for distributed systems. This is a nice touch as threat modelling really should be part of any security architecture discussion, and even touching on it at a high level is great as it puts the whole book in context.

The next chapter moves onto general security architectures in a Hadoop environment, covering network level segregation, OS level security and an overview of the different types of Hadoop node roles. This was a great start to the book as immediately it starts working through the different nodes, what user roles need access to them, what nodes can be segregated from direct access and how at a high level they interact for data loads and job submission.

The final chapter of the architecture section finishes up with an overview of Kerberos, which initially seemed a bit strange, but it becomes obvious why later on as Kerberos plays such a key role in Hadoop security. If you need to get up to speed quickly on Kerberos, I’d highly recommend Kerberos: A Network Authentication System… it’s a quick and easy read that I read over ten years ago and it’s still as good now as it was then.

The next section deep dives more into authentication and at this point the book gets straight into the hands on configuration guide, covering detailed configuration steps required to map Kerberos principals into the Hadoop world, how to map to local users, how user groups work in Hadoop and mapping to LDAP groups. The chapter then moves on to cover the various authentication protocols in use across the Hadoop ecosystem, before explaining the differences between simple and Kerberos authn and then a nice dive into token auth, including the flows of how delegation tokens are created to allow various systems to impersonate users. The chapter finishes off with a fully worked Kerberos authn configuration guide, which to be fair I skimmed over as I don’t need that level of detail at the moment.
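To give a feel for what mapping a Kerberos principal to a local user involves, here is a minimal Python sketch. It is my own simplified illustration, not Hadoop's actual rule syntax or anything taken from the book: the function name, the regular expression and the EXAMPLE.COM realm are all assumptions.

```python
import re

# Hypothetical, simplified mapping. Hadoop uses its own configurable
# mapping rules; this only illustrates the general idea.
# A principal looks like primary[/instance]@REALM.
PRINCIPAL_RE = re.compile(
    r"^(?P<primary>[^/@]+)(?:/(?P<instance>[^@]+))?@(?P<realm>.+)$"
)


def to_local_user(principal, trusted_realm="EXAMPLE.COM"):
    """Return the short local user name for a Kerberos principal.

    Principals from any realm other than trusted_realm are rejected.
    """
    match = PRINCIPAL_RE.match(principal)
    if match is None:
        raise ValueError("not a valid principal: %r" % principal)
    if match.group("realm") != trusted_realm:
        raise ValueError("untrusted realm: %r" % match.group("realm"))
    # Drop the instance (often a host name) and the realm.
    return match.group("primary")
```

With this sketch, both a user principal such as alice@EXAMPLE.COM and a service principal such as hdfs/nn01.example.com@EXAMPLE.COM collapse to a short name (alice and hdfs) that the OS and group lookups can work with.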

The next chapter moves onto authz covering HDFS ACLs and extended ACLs and various service level authorisations before moving on to MapReduce (1 and 2) and YARN, and Zookeeper ACLs, HBase, and Oozie. There are a few nice worked examples here of the effects of authz restrictions and what errors users will see when their access is restricted.

The book then moves on to cover Sentry, which is Apache’s attempt to centralise authz within the Hadoop ecosystem, and after reading through the previous few chapters it’s obvious it’s needed! The chapter covers the basic architecture on which Sentry works and how it integrates with the various applications, and then walks through how to configure each application to use Sentry. Again a very practically oriented approach is taken here with a lot of detail on the configuration steps.

The last chapter in this section covers the logging available by default in each of the various applications and their basic config. This is a quick chapter and really just goes to show the configuration aspect, rather than any analysis approaches to the logs.

The third section of the book moves onto data security, specifically to cover encryption of data in transit and at-rest, starting with great coverage of how HDFS file encryption works. What was particularly good in this chapter was the strong emphasis it places on the key management and also making the reader conscious of potential lack of encryption on temporary data such as logs. The second half of the chapter covers encryption of data in transit, mainly focusing on the configuration of SSL/TLS in the various applications in the ecosystem.

The next chapter is a short one and looks at security of data as it is loaded into the Hadoop ecosystem, covering both the confidentiality and integrity of the data, but mainly focusing on confidentiality/encryption. The following chapter then covers how client access of data in the Hadoop environment can be performed securely, focusing of course on the edge nodes and how users interact with them, through command line RPC or APIs. From an architecture perspective, I found this chapter particularly helpful as it does a good job of describing the trust boundary that will exist in most deployments and how this should be architected securely.

The last chapter in this section covers Cloudera Hue and to be honest I just skimmed this one as it wasn’t relevant to me.

The final section of the book covers some use cases nicely, outlining scenarios with business and security requirements, before walking through how to architect and configure the right mix of controls to meet the requirements. For me I would have loved more examples here as this is more at the level I’m working at, rather than the technical configuration. But still, great to see it presented in this way.

Overall, this was a great book that to be fair goes into a lot more depth in terms of technical configuration settings than I needed. This can make it a tough read if you’re just looking for the high level, however, if you’re setting up a Hadoop cluster then this should be your go-to book.

However, it also works great at the level I was looking for as it's got a strong focus on architecture considerations and puts the security functionality into context rather than just explaining the feature sets available. You just may need to skim some of the more detailed sections like I did!

Links:
Amazon: http://www.amazon.com/Hadoop-Security-Protecting-Your-Platform/dp/1491900989/
Safari: https://www.safaribooksonline.com/library/view/hadoop-security/9781491900970/

Monday, 25 January 2016

AWS Certified Solution Architect Associate Exam

Been an age since I've done any exams so decided before Christmas that I may as well try and use some of the AWS exams to formalise my knowledge a bit.

First one I picked was the AWS Solution Architect Associate and passed it this morning so figured I'd give some feedback on how I found the exam.

In terms of material I used, apart from obviously just using AWS practically and reading quite a few books over the last year on AWS, the main material I used for exam prep was the ACloudGuru training courses.

I went through all the ACloudGuru AWS associate level training material (Architect, SysOps, Developer) and the AWS Solution Architect Professional material before doing the Architect Associate exam. That's probably not necessary but I listen to a lot of material in the car so managed to get through all of them over the last two months. At the price point the ACloudGuru exams are at (very inexpensive), I would recommend that you buy all of them. The quality of the material is very high, they're interesting to listen to, they highlight a lot of exam specific questions and they are also updated very frequently, which is essential for these exams given the pace at which new products and services are released on AWS.

I also used the official practice exam that costs $20, and this was actually worthwhile as it was reasonably close to the real exam.

So, onto the exam itself..

If you're not familiar with it, it's a 60 question, 80 minute proctored computer based exam. Similar to all the ones you've probably done before in IT in that sense.

Coverage of topics was pretty much in line with what the ACloudGuru courses stated, with a strong emphasis on EC2, S3, the security responsibilities of users vs AWS and a lot on which combinations of AWS services to use in different scenarios. There were a few questions on RDS, DynamoDB and Route 53, and some billing/management questions.

About 80% were scenario questions, with around three or four lines to read in each scenario. Nearly all were choose two/three of the following four answers, rather than choose one. These are more like the scenario questions you get in the practice exams, but more awkward and complex.

The remaining 20% were straightforward questions like you’d see in the ACloudGuru examples.

Most questions were effectively based on application of the knowledge you get in the Udemy/ACloudGuru course. For example, if you know private subnets don’t include routes to the Internet, you can rule out certain answer options that include systems on the Internet accessing instances on these subnets.
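That kind of elimination can be sketched in a few lines of Python. This is purely illustrative: the route dictionaries and the function name are my own assumptions rather than any AWS API, and it treats a subnet as public only if its route table sends the default route to an internet gateway (a target starting with igw-).

```python
def is_public_subnet(routes):
    """Return True if any route sends 0.0.0.0/0 to an internet gateway.

    routes is a hypothetical list of dicts with 'destination' and
    'target' keys, loosely modelled on a VPC route table.
    """
    return any(
        route["destination"] == "0.0.0.0/0"
        and route["target"].startswith("igw-")
        for route in routes
    )


# Hypothetical route tables for two subnets in the same VPC.
public_routes = [
    {"destination": "10.0.0.0/16", "target": "local"},
    {"destination": "0.0.0.0/0", "target": "igw-11111111"},
]
private_routes = [
    {"destination": "10.0.0.0/16", "target": "local"},
]
```

Any answer option that has a system on the Internet connecting directly to an instance on the subnet described by private_routes can be ruled out straight away.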

The wording I thought was very poor and appears designed to catch you out, rather than test your knowledge, which always annoys me. You spend more time reading the scenarios to try and figure out what they’re actually asking, instead of clearly understanding what the requirements are and answering the question based on this. Of course, it could be said that this is probably close to having to actually interpret people’s attempts at giving you requirements in real life… :-)

Also, I noticed the fonts change in the exam questions (even when it isn’t designed to indicate a different context), which indicates pretty poor quality control on the user experience and creation of the questions (guessing copy/paste between materials).

There was at least one question where all the answers were incorrect, but two of the four answers were so completely wrong that it could only have been the other two (incorrect) answers.

In terms of timing, I got through all 60 questions in 55 minutes but had marked around 17 questions for review. I took another 10 mins for reviewing those and finished up with around 15 minutes to spare. Normally I get through these types of exams in around 60% of the time allocated so this was about right based on previous exams.

I would say though that marking questions for review is very strange for me and marking almost 30% is really high as I normally don’t ever bother reviewing any. However, some of them were so strange that I needed to re-read them at least three/four times and even then it still wasn’t entirely clear what they were asking in a few of them.

In terms of confidence in my answers.. I really couldn’t have said at the end if I passed or not, or even if I got 40% or 90%.. In the end I got 81%, which wasn’t too bad as I realized after that I’d gotten two obvious questions wrong.

To be fair, as an Architecture related exam, it actually does focus on application of knowledge of AWS, rather than just regurgitating lists of answers learned off. So in that sense I think it's actually a really good approach to take for this kind of exam.

However, I think it could definitely do with tightening up on the quality of the question wording as that’s key if you want the exam to be focused on application of knowledge, rather than just pick the right list of answers.

Overall, I'd definitely recommend doing this exam, especially as it gets you to learn bits of AWS that you may not use on a day to day basis yourself, and with how quickly new services pop up, this can't be a bad thing..

I'll definitely drive on with these and just need to figure out if I want to do all of the associate level exams now or just go straight to the Solution Architect Professional exam...

Wednesday, 30 December 2015

Book Review: Hadoop 2 Quick-Start Guide: Learn the Essentials of Big Data Computing in the Apache Hadoop 2 Ecosystem

Having never installed or played around with a Hadoop environment myself, I was on the look out for an intro style book that would give me the basics and enough info to start me off.

When browsing this one caught my eye as I didn’t even realise there was a Hadoop 2 and the title was pretty much spot on for what I was looking for so decided to give it a shot.

Overall, I enjoyed the book and it was spot on for what I was looking for. It’s a traditional tutorial/walk through type of book on how to get a Hadoop cluster up and running and how to admin/interact with it, but it also covers enough theory that you don’t need to have any prior experience with Hadoop to follow along.

However, I would say that I think it’s overpriced in the paper edition and retail price ebook so if you’re interested in this book, try and read it on Safari or get a Kindle edition to make it affordable. Other than that definitely recommended.

The book starts off with a really good overview of what Hadoop is, the MapReduce pattern and the changes in Hadoop 2. Good intro material.

The next chapter is a more traditional walk through on how to install Hadoop using both the Hortonworks distribution and the Apache sources. It also covers use of Ambari for a simple web based admin console for your cluster. Nothing too detailed is explained here as it’s covered off later, but it’s a straightforward walk through so is spot on for that.

The third chapter gives a really good intro to how HDFS works, covering the nodes involved, their roles and the approach taken to replication and then some basic file system commands. I particularly enjoyed this chapter as I hadn’t used HDFS before and so some of the concepts around the different nodes, compute following data, append only files and block sizes were spot on for what I needed to understand.

The fourth chapter covers running jobs and monitoring them in the web GUI, along with some examples for baselining the performance of the cluster.

The fifth and sixth chapters walk through the MapReduce approach to data analysis, using word counting in text files as the main example, and then move on to the basics of writing code to create MapReduce jobs, covering the basics in Java and Python. Simple and straightforward, but again spot on in terms of depth.
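For anyone who hasn't seen the word counting example before, the pattern looks roughly like the sketch below. This is my own plain Python illustration of the map, shuffle and reduce steps running in a single process, not code from the book and not something you'd submit to a cluster; all the function names are assumptions.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in a line of text."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(word, counts):
    """Reduce: collapse the list of counts for one word into a total."""
    return word, sum(counts)


def word_count(lines):
    """Run the three steps in-process over an iterable of text lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(
        reduce_phase(word, counts)
        for word, counts in shuffle(pairs).items()
    )
```

Calling word_count(["the cat", "The dog the"]) returns a dictionary with "the" counted three times and "cat" and "dog" once each. On a real cluster the map and reduce functions run on different nodes and the framework handles the shuffle for you.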

The seventh chapter runs through some of the other Apache tools within the Hadoop ecosystem, covering Pig, Hive, Sqoop, Flume, Oozie and HBase. These are just quick overviews but interesting as I wasn’t aware of some of these.

The eighth chapter is really nice in that it focuses exclusively on YARN (Yet Another Resource Negotiator), which is new to Hadoop 2 and is one of the big differences in the new version. It walks through how to use YARN for things other than the traditional MapReduce pattern, using the YARN distributed shell as an example, before touching briefly on how some of the other Apache tools can be used with YARN.

The last two chapters focus on admining Hadoop through the commands required and the Ambari interface. I skimmed these as I’m only doing a very basic setup to get my head around Hadoop but would look back to these as needed.

In summary, the author notes initially that this book is written to a "hello world" level in terms of depth and that’s spot on across the book. It gives you enough info to get you to a working example, and then it’s up to you. I really liked this analogy and it’s exactly the level I was looking for. I also liked the author’s style of writing so will also be going looking for more of his books to find some more advanced material on Hadoop.

If you’re looking for an intro to Hadoop that’s a nice combination of both theory and high level tech implementation, then this is definitely worth a read.

One thing I would say is that I got through the book very quickly (3 hours roughly), and was surprised to see when I checked Amazon that the paper version is just over 300 pages as it really didn’t feel like that. It reads more like a book of around 150 pages, which in my head makes sense for a quick start book.

Why I highlight this is that while I really enjoyed the book, as I mentioned earlier, I don’t think it’s worth the price of $27 that the paper version is currently retailing for. For me it’s more in the $15 - $18 bracket and so if you’re going to read this then definitely try and go for the Kindle edition, which is worth it at $17.

Links:
Amazon: http://www.amazon.com/Hadoop-Quick-Start-Guide-Essentials-Addison-Wesley/dp/0134049942
Safari: https://www.safaribooksonline.com/library/view/hadoop-2-quick-start/9780134050119/

Monday, 28 December 2015

Book Review: Creating A Data-Driven Organization

I stumbled across this book while browsing and its title obviously jumped out to me as I'm always interested in anything to help quantify analysis or build data driven approaches to what I do.

I wasn't entirely sure what to expect but in summary, it's a really enjoyable, easy read on how to build data-driven teams and the culture to support them in an organisation.

The book starts off by establishing what the author really means by data-driven, touching on some of the fundamentals of data quality, collection and analysis.

After these initial chapters the book really got interesting for me as it starts to look at the organisational and cultural considerations of building a data-driven program.

The author first outlines the different skill sets required for a rounded data-driven analysis team, covering skill sets like business skills, programming, devops, stats, visualisation, machine learning and big data analysis. I really liked how the author shows these as complementary skills across the team, but highlights that your team don't need to be experts at all of them.

One really nice aspect is that the need for strong visualisation is highlighted immediately, specifically in relation to its role in not just performing data analysis, but selling it to the rest of the organisation. This is reinforced later on in the book through a whole chapter on visualisation, including how it can/should be used effectively, covering a lot of the ideas from Tufte, etc in a really nicely summarised form.

The author then moves on to describe the different types of data-analysis, how they are used and then works through some discussion around metrics and A/B testing as core examples of how data analysis can be applied to business contexts.

The next three chapters cover what I think to be the most important aspect of the whole book: the approach to decision making and its effect on data-driven approaches, the key components of a data-driven culture within an organisation and the role of the C-suite in establishing this culture. These chapters outline many of the key cultural challenges to moving towards a more data-driven approach and are great reads for anyone who may be pushing for more data analysis within their organisation, but is struggling to get traction.

The book finishes out with a chapter on privacy, ethics and risk, which obviously as a security guy I love to see. I particularly like the “ick” factor approach that the author outlines to dealing with data analysis and privacy.

Overall I think this book is a great introduction to a lot of topics relating to data analysis and data driven decision making, and incorporates some really good lessons on organisational structure, culture, skill sets and challenges with adopting data-driven approaches within organisations.

The author highlights throughout that this book doesn’t touch on the tools or technology used for data analysis, or details on data analysis approaches, as these are covered in many other books, which are referenced as needed. So if you're looking for this type of material, definitely go elsewhere.

However, if you’re new to applying data-driven approaches to your field (IT, business or otherwise) or if you’re a manager or leader looking to understand how you can effect change within your organisation towards a data driven approach, I'd highly recommend this.

Links:
Amazon: http://www.amazon.com/Creating-Data-Driven-Organization-Carl-Anderson/dp/1491916915
Safari: https://www.safaribooksonline.com/library/view/creating-a-data-driven/9781491916902/

Monday, 23 November 2015

Book Review: Building Microservices

Continuing my up-skilling on cloud security, I wanted to get a better handle on application architectures that map into cloud computing patterns. While micro services aren’t a cloud specific architecture, the key goals of loose coupling, high scalability, etc align well to a cloud environment so I figured this would be a good book to have a read through.

The first two chapters are very easy reads, covering an introduction to micro services and their benefits, mapping strategic goals to principles and practices.

The next chapter introduces the fictional MusicCorp organisation and application that is used throughout the remainder of the book, demonstrating the concept of bounded contexts and how to apply it to a monolithic application. At this point the book really gets into more detailed discussions on the topics, with each of the further few chapters being pretty meaty in comparison to the earlier chapters. The rest of the chapter covers some of the key technologies that can be used to facilitate micro services (RPC, SOAP, REST, XML, JSON, message queues, etc), touching on both the positives and negatives of each, and also covers areas like versioning, choreography/orchestration and integration with COTS.

The author then expands on the MusicCorp example and uses it to demonstrate how to split out the application into multiple micro services, before moving on to CI topics like deployment and testing and a further short chapter on monitoring. For me the chapter on breaking a monolithic application into micro services wasn’t as relevant for what I was looking for, but some of the high level approaches were interesting to understand how it may be tackled.

Security is touched on next but as the author mentions early on in the book, he’s not a security specialist so this is a fairly light chapter covering authn/authz and SSO, touching on OpenID Connect and SAML. One nice thing to see in this chapter was a call to be frugal with data storage in light of potential data loss events, particularly where personally identifiable information may be in play. Nice touch!

The last main two chapters cover system/organisational design and micro services at scale, both of which I thought were great introductions to the topics. Too many organisations think concepts such as devops or micro services can simply be tacked on to their existing structure, but this chapter does a nice job of dispelling this myth. Christian Posta wrote a really good blog post on this specifically related to micro services and I'd also recommend Mike Cohn's chapter on team structure from his Succeeding with Agile book.

One aspect of the book that I really liked was the liberal use of links to other material and books when further more detailed explanations are merited. This avoids the author going off on tangents, which I often find many authors doing (sometimes as a necessity to explain a concept… and sometimes just to pad the book).

I’m not a developer or application architect so at times the book goes slightly into too much detail for what I needed, but to be fair this only rarely happens and so doesn’t detract from the overall flow. Of course that probably means for someone who is a developer or application architect that it won’t go into enough detail, which is something that is borne out by other reviewers.

From a security perspective, I’d highly recommend this book as a great way to get up to speed on how applications should/will be deployed in the cloud and microservices in general. Additionally, if you're still working in a very specialised/siloed organisation, this should be up there to read to understand how things may change. Ultimately, if your organisation isn’t doing something in this space now, then they will soon and you may as well be up to speed!

Links:
Amazon: http://www.amazon.com/Building-Microservices-Sam-Newman/dp/1491950358
Safari: https://www.safaribooksonline.com/library/view/building-microservices/9781491950340/

Wednesday, 4 November 2015

Book Review: Python And AWS Cookbook

I’ve been playing around with AWS properly for the last couple of months and had mainly been getting myself up to speed with the key security considerations, the console, the various services AWS provides, while playing around with setting up an ELK stack in AWS.

I’ve also been interested in playing around more with Python so figured this would be a good opportunity to combine both and get a better understanding of AWS and Python at the same time.

This book introduces you to the Boto Python interface to AWS and walks you through a series of very simple examples of how to use it.

The book is primarily split into two sections; one covering EC2 and the other S3. Both sections cover all the basics that you could look for, including how to enumerate the EC2 instances/S3 buckets in your account, how to loop through regions (if needed), how to create new instances or buckets and how to edit tags, metadata and such. The book also covers some basics of ELBs, security groups and S3 permissions so basically, most of what you’d need to do some basic scripting of EC2 and S3.
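To give a flavour of the kind of script you end up with, here's a minimal sketch using the original Boto (version 2) library that the book covers. It's my own illustration rather than a recipe from the book: the helper names are assumptions, and it assumes Boto is installed and AWS credentials are already configured in your environment. The helpers take a connection object so they can be tried out without touching a real account.

```python
def bucket_names(s3_conn):
    """Return the names of all S3 buckets visible to this connection."""
    return [bucket.name for bucket in s3_conn.get_all_buckets()]


def instance_summaries(ec2_conn):
    """Return (id, state, tags) for every EC2 instance in one region.

    get_all_instances() returns reservations, each of which holds one
    or more instances, hence the nested loop.
    """
    summaries = []
    for reservation in ec2_conn.get_all_instances():
        for instance in reservation.instances:
            summaries.append(
                (instance.id, instance.state, dict(instance.tags))
            )
    return summaries


def main():
    """Enumerate buckets, then loop through regions listing instances.

    Not called automatically: it needs Boto installed and credentials set.
    Some regions may reject standard credentials, so expect to handle errors.
    """
    import boto
    import boto.ec2

    print(bucket_names(boto.connect_s3()))
    for region in boto.ec2.regions():
        conn = region.connect()
        for summary in instance_summaries(conn):
            print(region.name, summary)
```

Running main() from an interactive session with credentials in place should print your bucket names followed by one line per instance in each region.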

The book briefly touches on CloudWatch and SNS but nothing too in-depth, nor does it cover any of the other AWS services Boto currently supports (See here for the current list).

A lot of people have criticized this book as only touching the surface of AWS, and that's definitely true.. you can know nothing about Boto at the start and get through this book in around eight hours (probably much less if you know Python beforehand.. which I didn’t). However, at the end you’ll know enough to get up and running and can then loop back to either the Boto or AWS documentation to fill in any more gaps.

If you’ve just used the AWS console and haven’t tried your hand at the API, then this is a perfect intro to the nuances that exist with the APIs and ultimately you’ll learn way more about AWS because you’ll start seeing options or constraints in the API calls that you may not even realize exist (or at least I know I did!).

While there’s nothing much here that you can’t get directly from the Boto documentation, I always like following a book along as opposed to jumping around read-me docs so if you’re similar, and looking for a book to kickstart your understanding of Boto and to help you put together some basic scripts for AWS, I really recommend this.

I'd love to see an updated edition of this book. It was released in 2011 and with things moving so quickly on AWS, it would benefit from a refresh and also some more examples added.

Links:
Amazon: http://www.amazon.com/Python-AWS-Cookbook-Mitch-Garnaat/dp/144930544X/
Safari: https://www.safaribooksonline.com/library/view/python-and-aws/9781449308100/

Saturday, 21 March 2015

edX Economics of Cybersecurity Course Review

I've been keeping an eye on the area of economics and information security for around eight years so when I saw this course pop up back in November I signed up immediately for January, despite not really knowing what to expect.

If you work in information security and are a fan of the Freakonomics series of books/podcasts, then the ideas used to analyze info sec in this course will be right up your street so just go and sign up now for the next session!

If you're not sure what on earth economics of information/cyber security is, then have a quick read of this paper and it'll give you a much better intro than I could ever give.

The course itself is the usual style of MOOC with recorded video sessions along with discussion forums, live webinars and some multiple choice questions at the end of each section.

The course material is split over six sections covering the following topics, with each section around an hour in length and with an accompanying webinar of a further hour:

An introduction to economics in the context of security;
Measurement of security;
Security investment and management;
Market failures and
Human factors in security.

In terms of content, I thought that the material was a fantastic introduction to a wide range of aspects of economics of security and pretty much spot on for the level of detail I was expecting. I would have loved to see more detail but have to appreciate that it's an introductory course!

I found that based on my existing reading in the area, I was very familiar with the majority of the content in the course, in particular the areas around the fundamentals of applying economics in security, measuring security, investment and risk management and behavioral heuristics/biases. However, the section on policy interventions and privacy definitely gave me some new insights.

In terms of prerequisites for the course, I feel that if you had never done micro economics or had any exposure to the area of economics before, then it'll probably be a bit of a shock to the system on the first week as they very much dive straight in! Because I'd read a lot on the topic, albeit in a completely unstructured way, I was pretty familiar with almost all of the topics covered and with some basic background in economics I was able to keep pace with no problems.

I also felt that some of the sections could have had more context set initially to lead people from a traditional, technical information security background in. For example, the human factor section jumps straight into explaining the reasons behind poor decision making by individuals, but doesn't really explain where in information security you'd normally see these kinds of poor decisions being made. For more experienced info sec professionals, they'll immediately understand the context in relation to either risk management decisions or end user opinions, however for more junior people, outlining the examples up front in simple terms would greatly benefit the course.

I was a bit disappointed with the multiple choice questions at the end because when you got answers wrong, there was no way to get prompts as to what the right answers were and you only get the description of what the right answer was when you get the answer right... So in the end I found myself attempting to brute force the answers for a number of questions, just to understand why I got the question wrong!

I really enjoyed it personally as a refresher in the area and also learned some new aspects that I hadn't come across in the areas of market failures and policy intervention and privacy. Also, I'm always a big fan of inter-disciplinary approaches to information security as I find if you stick with just learning from people who come from the same educational/professional background as yourself, it's very easy to become siloed in the way you look at a problem.

Overall, I think that this entire course should be considered mandatory content for any security management type certifications (CISM, etc) as it provides a fantastically unique view on security that if you're working in info sec management, you really need to understand.

I'd love to see a follow on, more in-depth course from the same lecturers to go into more detail on the topics covered in this course, look at some practical examples of analysis and review and compare/contrast the different research that has been published in the area of economics of info sec over the past few years. Hopefully that won't be long coming!

edX: https://www.edx.org/course/economics-cybersecurity-delftx-econsec101x