Malicious traffic detection using traffic fingerprints and machine learning

Over the past year, we’ve worked on a machine learning project at Ben Gurion University of the Negev.
The project set out to determine whether malicious traffic (viruses, botnets, command-and-control channels) hidden among ‘normal’ network traffic can be identified without advanced heuristics or deep packet inspection, relying instead on the statistical breakdown of the packets, supervised machine learning algorithms and clustering.
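To make “statistical breakdown of the packets” more concrete, here is a minimal sketch. It is not the code from our repository. Each flow is reduced to a few summary statistics (mean and standard deviation of the inter-packet gaps and of the packet sizes, plus the packet count), and a nearest-centroid classifier stands in for the supervised algorithms. The feature set, the names `flow_features` and `NearestCentroid`, and the toy “c2”/“normal” flows are all hypothetical.

```python
import math
from statistics import mean, pstdev

def flow_features(timestamps, sizes):
    """Hypothetical per-flow feature vector built only from timing and sizes.

    timestamps: capture times in seconds, ascending; sizes: packet lengths in bytes.
    """
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return [
        mean(gaps) if gaps else 0.0,    # average inter-packet gap
        pstdev(gaps) if gaps else 0.0,  # timing regularity
        mean(sizes),                    # average packet size
        pstdev(sizes),                  # size variability
        float(len(sizes)),              # packet count
    ]

class NearestCentroid:
    """Stand-in supervised classifier: z-score the features, then pick the
    label whose training centroid is closest in Euclidean distance."""

    def fit(self, X, y):
        cols = list(zip(*X))
        self.mu = [mean(c) for c in cols]
        self.sigma = [pstdev(c) or 1.0 for c in cols]  # avoid dividing by zero
        Z = [self._scale(x) for x in X]
        self.centroids = {}
        for label in set(y):
            rows = [z for z, lab in zip(Z, y) if lab == label]
            self.centroids[label] = [mean(c) for c in zip(*rows)]
        return self

    def _scale(self, x):
        return [(v - m) / s for v, m, s in zip(x, self.mu, self.sigma)]

    def predict(self, x):
        z = self._scale(x)
        return min(self.centroids, key=lambda lab: math.dist(z, self.centroids[lab]))

if __name__ == "__main__":
    X = [
        flow_features([0.0, 1.0, 2.0, 3.0, 4.0], [60] * 5),  # regular beacon
        flow_features([0.0, 2.0, 4.0, 6.0, 8.0], [60] * 5),  # regular beacon
        flow_features([0.0, 0.25, 0.5, 2.0, 2.25], [1500, 100, 1500, 100, 1500]),
        flow_features([0.0, 0.5, 0.75, 3.0, 3.5], [1500, 1500, 100, 100, 1500]),
    ]
    y = ["c2", "c2", "normal", "normal"]
    clf = NearestCentroid().fit(X, y)
    print(clf.predict(flow_features([0.0, 1.5, 3.0, 4.5, 6.0], [60] * 5)))  # c2
```

Scaling matters here: without it, byte-sized features would swamp sub-second timing features in the distance computation.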

By combining the inter-arrival and departure times of the packets seen on a network connection with the Lempel-Ziv 78 (LZ78) compression algorithm, which we used to assign probabilities, we obtained some interesting results, detailed in the paper.
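The general idea of using LZ78 to assign probabilities can be sketched as follows. This is an illustrative version under our own assumptions, not necessarily the exact procedure in the paper. Gaps are discretized into symbols with hypothetical bin edges (`EDGES`), an LZ78 phrase tree is built from a training sequence, and the tree is then walked as a Laplace-smoothed predictor to score new sequences. Training one model per class and picking the most likely one (`classify`) is an assumed decision rule.

```python
import bisect
import math

# Hypothetical bin edges (seconds) used to turn gaps into a small alphabet.
EDGES = [0.01, 0.1, 1.0]

def quantize(gaps, edges=EDGES):
    """Map each inter-packet gap to a bin index in 0..len(edges)."""
    return [bisect.bisect_right(edges, g) for g in gaps]

class _Node:
    __slots__ = ("n", "children")

    def __init__(self):
        self.n = 0          # times the LZ78 parse entered this node
        self.children = {}  # symbol -> _Node

class LZ78Model:
    """LZ78 phrase tree used as a Laplace-smoothed sequence predictor."""

    def __init__(self, alphabet_size):
        self.k = alphabet_size
        self.root = _Node()

    def train(self, symbols):
        node = self.root
        for s in symbols:
            child = node.children.get(s)
            if child is None:
                # The current phrase ends here: add a new leaf, restart at the root.
                leaf = _Node()
                leaf.n = 1
                node.children[s] = leaf
                node = self.root
            else:
                child.n += 1
                node = child
        return self

    def log_prob(self, symbols):
        """Natural-log probability of `symbols` under the trained tree."""
        node, total = self.root, 0.0
        for s in symbols:
            kids = node.children
            seen = sum(c.n for c in kids.values())
            hits = kids[s].n if s in kids else 0
            total += math.log((hits + 1) / (seen + self.k))
            # Follow the tree while it still offers context; otherwise reset.
            node = kids[s] if s in kids and kids[s].children else self.root
        return total

def classify(models, symbols):
    """Return the label whose model gives the sequence the highest probability."""
    return max(models, key=lambda label: models[label].log_prob(symbols))

if __name__ == "__main__":
    k = len(EDGES) + 1
    models = {
        "c2": LZ78Model(k).train(quantize([5.0] * 50)),  # regular beaconing
        "normal": LZ78Model(k).train(quantize([0.005, 0.05, 0.5, 2.0] * 12)),
    }
    print(classify(models, quantize([4.8] * 20)))  # c2
```

At every node the smoothed probabilities over the `k` symbols sum to one, so `log_prob` is a proper log-likelihood that can be compared across models.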

Because the method relies on packet timing and statistics rather than on payload contents, even malware that transports its data over TLS-encrypted flows can be identified without decrypting the data first.
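To show why encryption does not get in the way, here is a minimal sketch (not our repository’s code) that derives the two timing sequences from packet metadata alone. It assumes a hypothetical `Packet` record holding a timestamp, a direction flag, a length and an opaque payload. It also reads ‘arrival’ as packets received by the monitored host and ‘departure’ as packets it sends. The payload, which under TLS would be ciphertext, is never read.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass(frozen=True)
class Packet:
    ts: float       # capture timestamp in seconds
    outbound: bool  # True if sent by the monitored host (a "departure")
    length: int     # on-the-wire length in bytes
    payload: bytes  # opaque, e.g. TLS ciphertext; never inspected below

def _gaps(times: List[float]) -> List[float]:
    return [b - a for a, b in zip(times, times[1:])]

def timing_sequences(packets: List[Packet]) -> Tuple[List[float], List[float]]:
    """Return (inter-arrival gaps, inter-departure gaps) for one connection,
    using only timestamps and direction, never the payload."""
    arrivals = sorted(p.ts for p in packets if not p.outbound)
    departures = sorted(p.ts for p in packets if p.outbound)
    return _gaps(arrivals), _gaps(departures)

if __name__ == "__main__":
    flow = [
        Packet(0.0, True, 517, b"\x16\x03\x01"),  # placeholder bytes
        Packet(0.25, False, 1400, b"\x17\x03\x03"),
        Packet(1.0, True, 90, b"\x17\x03\x03"),
        Packet(1.5, False, 1400, b"\x17\x03\x03"),
        Packet(2.0, True, 90, b"\x17\x03\x03"),
    ]
    print(timing_sequences(flow))  # ([1.25], [1.0, 1.0])
```

Sequences like these are what a quantizer and a probability model would consume, so the same pipeline applies whether the payload is plaintext or encrypted.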

The article was originally meant to be published in November 2014, but we missed several deadlines. Rather than let it sit buried in my files, I’ve attached it here for anyone interested.

The research paper

Malicious traffic detection using traffic fingerprint (PDF, 6MB)

Our Python3 source code

https://github.com/arnons1/trafficfingerprint

Comments

  1. sam ed

    Really interesting, thank you. What traffic fingerprinting application did you use?

    1. arnon.shimoni@gmail.com

      We did our own fingerprinting, using Wireshark pcap files.
      The algorithm is explained in the paper.
