Machine Learning for Pen-Testers and Security Researchers

Anto Joseph & Clarence Chio

Trainer Name: Anto Joseph & Clarence Chio
Title: Machine Learning for Pen-Testers and Security Researchers
Duration: 2 Days
Dates: 28th Feb - 1st March 2018


Making & Breaking Machine Learning Systems is a fast-paced session on machine learning from the InfoSec professional’s point of view. In this training, students will get hands-on experience developing intelligent, learning security applications, and will learn techniques for training, tuning, and evaluating such systems. The course is aimed at security professionals who are interested in machine learning but may not have any practical experience with it. Machine learning is becoming ubiquitous in a variety of fields, and security professionals educated in the subject are better positioned to assess the (often lacking) security postures of machine learning algorithms and systems. This class does not promise that students will immediately become machine learning experts, but it does ensure that all applications and techniques learned are directly and immediately applicable to the work done by security engineers, penetration testers, application developers, and InfoSec enthusiasts alike.


The class is designed to give students a hands-on introduction to machine learning concepts and systems, as well as to making and breaking security applications powered by machine learning.

The lab session is designed with security use-cases in mind, since using machine learning in security is very different from using it in other situations. Students will get first-hand experience at cleaning data, implementing machine learning security programs, and performing penetration tests of these systems.

Each attendee will be provided with a comprehensive virtual machine programming environment that is preconfigured for the tasks in the class, as well as for any future machine learning experimentation and development they may do. This environment includes all of the essential machine learning libraries and programming tools, and is friendly even to machine learning novices.

At the end of the class, students will be put through a CTF challenge that will test the machine learning development and exploitation skills that they have learned over the course in a realistic environment.

Course Outline (Day-wise)

Day 1

  • Introduction to machine learning
    • Hands-on guided exploration of Python machine learning libraries:
      • Data-wrangling using Numpy and Pandas
      • Scikit-learn’s functions and capabilities
      • Data visualization using Matplotlib / Seaborn
  • Walkthrough of the most commonly used machine learning algorithms (with quick hands-on examples / visualizations for select algorithms)
    • Supervised learning algorithms
      • Linear / logistic regression
      • Support Vector Machines
      • Decision trees / Random forests
    • Unsupervised learning algorithms
      • Clustering
    • Semi-supervised learning
  • 2-hour example: Building (and bypassing) an email spam filter with scikit-learn
    • Loading data efficiently
    • Using a labeled email / spam corpus training and test set, extract salient features to build a word model of spam
    • Model tuning, cross-validation, and evaluation process
    • With complete knowledge of the system, manually craft a piece of spam to bypass the filter
  • Lecture on the application of machine learning in the security / abuse space
    • Spam, fraud, malware, phishing, and intrusion detection short examples
    • Principles behind selecting the best machine learning models for different use-cases
    • Considerations when using machine learning in an adversarial / malicious environment
  • Solving practical problems in real-world machine learning deployments
    • How to explain the predictions made by your model (using LIME)

Day 2

  • Solving practical problems in real-world machine learning deployments (continued)
    • How to approach the problem of class imbalance (using imbalanced-learn)
    • How to approach model / result evaluation in an unbiased way
    • How to efficiently approach model hyperparameter tuning (grid search etc.)
  • Deep learning for security
    • Using Keras / TensorFlow for anomaly detection with convolutional neural networks
    • Choosing the appropriate model for implementing different types of problems - efficacy comparison of different machine learning techniques for solving the anomaly detection problem, and what other considerations to have
  • 2-hour example: Building a simple network intrusion detection system with 2 different machine learning models
    • Importance of understanding the data and the threat model before designing a solution for the problem
    • Model tuning, cross-validation, and evaluation process
    • Guided comparisons of the performance characteristics for each implementation
    • Visualizing and presenting the data for ease of analysis by security operations professionals
  • Streaming pipelines for machine learning using Apache Spark MLlib (PySpark)
    • Overview of Apache Spark
      • General architecture
      • Distributed, scalable machine learning deployments with Spark
    • Guided example of a streaming architecture for network anomaly detection using reinforcement learning on Spark
  • Evaluating the security of machine learning systems
    • Techniques and guided example of fuzzing a classifier and regressor to find blind spots in the model
    • Evaluation of intelligent learning system architecture that is resilient to model poisoning by an adversary
  • Machine Learning CTF challenge - "next-gen" WAF bypass challenge
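
As a rough illustration of the Day 1 spam filter exercise (build a word model of spam, then craft spam that slips past it), here is a minimal sketch. It is not the course material: it swaps scikit-learn for a hand-written multinomial naive Bayes in plain Python so that it runs without dependencies, and the six-message corpus, the `train` and `spam_score` helpers, and the example messages are all made up for illustration.

```python
import math
from collections import Counter

# Toy labeled corpus (hypothetical; not the course data set).
TRAIN = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("cheap pills free offer", "spam"),
    ("meeting agenda for monday", "ham"),
    ("please review the project report", "ham"),
    ("lunch on monday with the team", "ham"),
]


def train(corpus):
    """Count words per class for a multinomial naive Bayes word model."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    doc_counts = Counter()
    for text, label in corpus:
        doc_counts[label] += 1
        word_counts[label].update(text.split())
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, doc_counts, vocab


def spam_score(text, model):
    """Return the log-odds of spam vs. ham; a positive score means spam."""
    word_counts, doc_counts, vocab = model
    spam_total = sum(word_counts["spam"].values()) + len(vocab)
    ham_total = sum(word_counts["ham"].values()) + len(vocab)
    score = math.log(doc_counts["spam"] / doc_counts["ham"])
    for word in text.split():
        if word not in vocab:
            continue  # words never seen in training carry no evidence
        # Laplace (add-one) smoothing avoids zero probabilities.
        p_spam = (word_counts["spam"][word] + 1) / spam_total
        p_ham = (word_counts["ham"][word] + 1) / ham_total
        score += math.log(p_spam / p_ham)
    return score


model = train(TRAIN)

# With full knowledge of the model, pad the spam with words the filter
# associates with legitimate mail until the score turns negative.
original = "free money offer"
evasive = original + " meeting agenda monday project report review team"

print("original score:", round(spam_score(original, model), 3))
print("evasive score: ", round(spam_score(evasive, model), 3))
```

The padded message keeps its spam payload but is scored as legitimate, because each appended word shifts the log-odds toward ham. Padding of this kind is one simple way a word-count filter can be bypassed when the attacker knows the model.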

What to bring?

  • Latest version of VirtualBox Installed
  • Administrative access on your laptop with external USB allowed
  • At least 20 GB free hard disk space
  • At least 8 GB RAM (the more the better)

Prerequisites


  • Basic familiarity with Linux
  • Python scripting knowledge is a plus, but not essential

Who Should Attend?

  • Security Professionals
  • Web Application Pentesters
  • Software / application developers
  • People interested in starting to use machine learning for security

What to expect?

  • Familiarizing yourself with popular machine learning algorithms and how to adapt these to different problems
  • How to clean and sanitize data using powerful data processing libraries in Python
  • How to build a spam classifier and online anomaly detection system in Python
  • How to do performance evaluations of machine learning classifiers
  • Examples of using machine learning in intrusion detection, botnet detection, phishing detection, web vulnerability analysis, malware classification, and behavioral analysis
  • Perform tuning of machine learning systems to improve classification / detection results
  • Perform security evaluations and penetration tests on machine learning systems
    • Fuzzing machine learning classifiers
  • How to avoid vulnerabilities in machine learning system and algorithm design
  • How to use Apache Spark to design scalable and distributed real-time machine learning systems
  • Write your own machine learning malware classifier
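
To give a feel for what "fuzzing machine learning classifiers" in the list above can mean, the sketch below mutates a known-bad input at random and keeps every variant that a detector scores as benign. Everything in it is an assumption made for illustration: the detector is a tiny linear model with hand-set token weights (`WEIGHTS`, `BIAS`) standing in for a trained classifier, and the two mutations (`flip_case`, `comment_for_space`) are examples of transformations that a typical SQL parser would still accept. Case-flipping identifiers may matter on some databases; the toy ignores that.

```python
import random

# Hypothetical hand-set weights standing in for a trained linear model.
WEIGHTS = {"union": 2.0, "select": 2.0, "password": 1.0, "from": 0.5}
BIAS = -3.0


def is_malicious(payload):
    """Score whitespace-separated, lowercased tokens; positive means malicious."""
    tokens = payload.lower().split()
    score = BIAS + sum(WEIGHTS.get(token, 0.0) for token in tokens)
    return score > 0


def flip_case(text, rng):
    """Swap the case of one randomly chosen character."""
    i = rng.randrange(len(text))
    return text[:i] + text[i].swapcase() + text[i + 1:]


def comment_for_space(text, rng):
    """Replace one randomly chosen space with an inline SQL comment."""
    spaces = [i for i, char in enumerate(text) if char == " "]
    if not spaces:
        return text
    i = rng.choice(spaces)
    return text[:i] + "/**/" + text[i + 1:]


MUTATIONS = [flip_case, comment_for_space]


def fuzz(seed_payload, rounds=300, seed=0):
    """Apply 1 to 3 random mutations per round; collect variants scored benign."""
    rng = random.Random(seed)
    bypasses = set()
    for _ in range(rounds):
        candidate = seed_payload
        for _ in range(rng.randint(1, 3)):
            candidate = rng.choice(MUTATIONS)(candidate, rng)
        if not is_malicious(candidate):
            bypasses.add(candidate)
    return sorted(bypasses)


SEED_PAYLOAD = "1 UNION SELECT password FROM users"
found = fuzz(SEED_PAYLOAD)
print(len(found), "evasive variants found; one example:", found[:1])
```

In this toy, every surviving variant relies on the comment mutation: joining two tokens with `/**/` produces a token the model has no weight for, so the score drops below the threshold. The blind spot comes from how the input is tokenized rather than from the weights themselves.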

What not to expect?

To be a machine learning expert in just two days. This training will give you the skills needed to start building security software using machine learning and will teach lesser-known ways of exploiting such systems. To continue exploring machine learning and keep up with the latest developments in this fast-evolving field, students will need to put in further work and build on the skills learned in the class.

About the trainers

Clarence Chio (@cchio)

Clarence Chio is an engineer and entrepreneur who has given talks, workshops, and trainings on machine learning and security at DEF CON, BLACK HAT, and other security / software engineering conferences / meetups across more than a dozen countries. He is a co-author of the new O'Reilly Book "Machine Learning & Security: Protecting Systems with Data and Algorithms". He was previously a member of the security research team at Shape Security, a community speaker with Intel, and a security consultant for Oracle. Clarence advises a handful of startups on security data science and is the founder and organizer of the "Data Mining for Cyber Security" meetup group, the largest gathering of security data scientists in the San Francisco Bay Area. He holds a B.S. and M.S. in Computer Science from Stanford University, specializing in data mining and artificial intelligence.

Anto Joseph (@antojosep007)

Anto Joseph is a Security Engineer for Intel. He has been involved in developing and advocating security in machine learning and systems, as well as in mobile and web security research. He is passionate about exploring new ideas in these areas and has been a presenter and trainer at various security conferences including BH USA, DEF CON, BruCon, HackInParis, HITB Amsterdam, HackLu, Hacktivity, PHdays, X33fCon, NullCon, c0c0n and more. He is an active contributor to many open-source projects and some of his work is available at


Copyright © 2019-20 | Nullcon India | International Security Conference | All Rights Reserved