Supporting repository for the paper "ECH-Resilient Malware Detection via Flow-Level Statistical Features" by Márton Pál Lipcsey-Magyar, Attila Ármin Madarász, and Adrian Pekar.
The deployment of Encrypted Client Hello (ECH) challenges TLS fingerprinting, the dominant approach for encrypted malware detection. This paper presents a comprehensive evaluation of flow-based statistical features as an ECH-resilient alternative. Through rigorous validation against the official JA4+ implementation, we demonstrate that only 64.9% of malware families possess unique signatures, fundamentally limiting fingerprinting recall.
Our results show that Random Forest classifiers using combined flow statistics achieve a 98.11% F1-score for binary malware detection with 97.22% recall, substantially exceeding the 64.9% theoretical maximum recall of fingerprinting. These findings establish flow-based classification as a practical approach for maintaining network security visibility as encryption technologies advance.
├── reproduce-research/ # Validation pipelines
│ ├── paper-pipeline/ # Reproduce using the original authors' data
│ ├── nfstream-pipeline/ # Reproduce using NFStream extraction
│ └── verify-ja4-calculation/ # JA4+ conformance validation
│
└── paper-code/ # Main classification system (Python)
See paper-code/README.md for detailed usage instructions.

Binary malware detection results:

| Model | Feature Set | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|---|
| Random Forest | Combined | 97.07% | 99.02% | 97.22% | 98.11% |
| Random Forest | Core | 96.55% | 98.66% | 96.91% | 97.78% |
| Random Forest | SPLT | 96.65% | 98.74% | 96.95% | 97.84% |
| Neural Network | Combined | 90.03% | 94.39% | 92.78% | 93.58% |
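
The F1-scores in the table above follow from the precision and recall columns, since F1 is the harmonic mean of the two. The following minimal sketch recomputes them as a sanity check; the function name `f1_score` is chosen here for illustration and is not taken from the repository code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision %, recall %, reported F1 %) copied from the binary detection table.
rows = {
    ("Random Forest", "Combined"): (99.02, 97.22, 98.11),
    ("Random Forest", "Core"): (98.66, 96.91, 97.78),
    ("Random Forest", "SPLT"): (98.74, 96.95, 97.84),
    ("Neural Network", "Combined"): (94.39, 92.78, 93.58),
}

for (model, features), (p, r, reported) in rows.items():
    computed = round(100 * f1_score(p / 100, r / 100), 2)
    print(f"{model:14s} {features:9s} computed={computed:.2f}% reported={reported:.2f}%")
```

For every row, the recomputed value agrees with the reported F1-score to two decimal places.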

Multi-class (family-level) classification results:

| Model | Feature Set | Accuracy | Macro F1 |
|---|---|---|---|
| Random Forest | Combined | 61.62% | 54.81% |
| Random Forest | Core | 59.66% | 52.39% |
| FAISS k-NN | Combined | 43.97% | 34.30% |
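
Macro F1 is the unweighted mean of the per-class F1-scores, so small families count as much as large ones. The sketch below computes it in plain Python on a toy example; the function name `macro_f1` and the class labels are hypothetical, and the repository's own evaluation code may compute the metric differently (for instance through a library).

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in y_true or y_pred."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy data: one large family classified perfectly, two rare ones confused.
y_true = ["family_a"] * 4 + ["family_b", "family_c"]
y_pred = ["family_a"] * 4 + ["family_c", "family_b"]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"accuracy={accuracy:.3f} macro_f1={macro_f1(y_true, y_pred):.3f}")
```

In this toy case accuracy is 4/6 while macro F1 is only 1/3, which illustrates one way macro F1 can fall below accuracy when rare classes are misclassified.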

Comparison with TLS fingerprinting:

| Metric | TLS Fingerprinting (JA4+JA4S+SNI) | Flow-Based ML (RF+Combined) |
|---|---|---|
| Recall | ≤64.9% (theoretical max) | 97.22% |
| F1-Score | ≤78.6% | 98.11% |
| ECH-Resilient | No | Yes |
| Malware Coverage | 64.9% | 100% |
Open paper-code/notebooks/malware_classification_experiments.ipynb for an interactive notebook with all experiments, visualizations, and analysis.
The experiments use the malware traffic dataset from:
Matoušek, P., Přívora, J., & Ryšavý, O. (2024). "TLS Traffic Analysis: Malware Classification with JA4+ Fingerprints"
Dataset characteristics:
- 16,542 flows across 101 families (59 malware, 42 benign)
- Sources: Desktop malware, mobile malware, desktop apps, mobile apps
- Authenticated and labeled network traces
Note: The full dataset is not included in this repository. Please refer to the original paper for access.

Core feature set (33 features):
- Volumetric: Packet counts, byte volumes (bidirectional, src→dst, dst→src)
- Temporal: Flow duration per direction
- Statistical: Packet size distributions (min, mean, stddev, max)
- Timing: Packet inter-arrival times (PIAT) distributions
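
The four feature groups listed above can be illustrated with a small standalone sketch that derives them from a list of packets. It makes several assumptions: the packet representation `(timestamp_s, size_bytes, direction)`, the feature names, the use of the population standard deviation, and zero-filling for empty directions are all chosen here for illustration and are not taken from the repository's extraction code.

```python
from statistics import mean, pstdev


def summarize(values):
    """Return (min, mean, stddev, max); all zeros for an empty sequence."""
    if not values:
        return 0.0, 0.0, 0.0, 0.0
    return min(values), mean(values), pstdev(values), max(values)


def flow_features(packets):
    """packets: list of (timestamp_s, size_bytes, direction) tuples,
    where direction is 'src2dst' or 'dst2src' (hypothetical format)."""
    views = {
        "bidirectional": packets,
        "src2dst": [p for p in packets if p[2] == "src2dst"],
        "dst2src": [p for p in packets if p[2] == "dst2src"],
    }
    feats = {}
    for name, pkts in views.items():
        times = sorted(t for t, _, _ in pkts)
        sizes = [s for _, s, _ in pkts]
        # Packet inter-arrival times: gaps between consecutive packets.
        piats = [later - earlier for earlier, later in zip(times, times[1:])]
        feats[f"{name}_packets"] = len(pkts)                                   # volumetric
        feats[f"{name}_bytes"] = sum(sizes)                                    # volumetric
        feats[f"{name}_duration_s"] = times[-1] - times[0] if times else 0.0  # temporal
        for label, values in (("ps", sizes), ("piat_s", piats)):              # statistical, timing
            lo, avg, sd, hi = summarize(values)
            feats[f"{name}_min_{label}"] = lo
            feats[f"{name}_mean_{label}"] = avg
            feats[f"{name}_stddev_{label}"] = sd
            feats[f"{name}_max_{label}"] = hi
    return feats


example = [(0.0, 100, "src2dst"), (0.5, 200, "dst2src"), (1.5, 300, "src2dst")]
features = flow_features(example)
print(len(features), features["bidirectional_bytes"])
```

This sketch yields 11 values per direction view, 33 in total, which matches the size of the Core set stated in this README; the exact feature definitions used in the paper may still differ.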

SPLT feature set (25 features):
- First 25 packet sizes in arrival order
- Captures protocol-specific patterns
- Early detection capability
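
A fixed-length vector of the first 25 packet sizes can be sketched as follows. Padding short flows with zeros is an assumption made here for illustration, as is the function name `first_n_packet_sizes`; the repository code may handle short flows differently.

```python
def first_n_packet_sizes(sizes, n=25, pad_value=0):
    """Return the first n packet sizes in arrival order, padded to length n."""
    head = list(sizes[:n])
    return head + [pad_value] * (n - len(head))


short_flow = [517, 1460, 1460, 93]    # fewer than 25 packets: padded
long_flow = list(range(100, 140))     # 40 packets: truncated to the first 25

short_vec = first_n_packet_sizes(short_flow)
long_vec = first_n_packet_sizes(long_flow)
print(len(short_vec), len(long_vec))
```

Because only the first 25 packets are needed, a vector like this is available before the flow ends, which is what makes early detection possible.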

Combined feature set (58 features):
- Synergy between macro-level (flow statistics) and micro-level (SPLT) patterns
- Best performance across all tasks
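
Given the sizes stated in this README (Core: 33, SPLT: 25, Combined: 58), the combined vector can be read as the concatenation of the two. The sketch below shows that reading; the helper name `combine_features` and the placeholder values are hypothetical, and the ordering of features in the actual code is not specified here.

```python
CORE_SIZE = 33   # flow-level statistics
SPLT_SIZE = 25   # first 25 packet sizes


def combine_features(core, splt):
    """Concatenate a core vector and an SPLT vector into one 58-value vector."""
    if len(core) != CORE_SIZE or len(splt) != SPLT_SIZE:
        raise ValueError("expected 33 core values and 25 SPLT values")
    return list(core) + list(splt)


core_vec = [0.0] * CORE_SIZE   # placeholder values for illustration
splt_vec = [0] * SPLT_SIZE
combined = combine_features(core_vec, splt_vec)
print(len(combined))
```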

Experimental design:
- 3 Classification Tasks: Binary, Full Multiclass (101 classes), Malware-only (59 classes)
- 3 Feature Sets: Core (33), SPLT (25), Combined (58)
- 3 ML Models: Neural Network, Random Forest, FAISS k-NN
- Total: 27 experimental configurations
- Reproducibility: Fixed random seeds (42), stratified 80/20 splits
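
The experimental grid and the split protocol listed above can be sketched in plain Python. The configuration names are hypothetical identifiers, and `stratified_split` is a simplified stand-in written for this example: it rounds the test size per class, which may differ slightly from a library routine such as scikit-learn's `train_test_split(..., test_size=0.2, stratify=y, random_state=42)`.

```python
import random
from collections import defaultdict
from itertools import product

TASKS = ["binary", "full_multiclass", "malware_only"]
FEATURE_SETS = ["core", "splt", "combined"]
MODELS = ["neural_network", "random_forest", "faiss_knn"]
SEED = 42

# 3 tasks x 3 feature sets x 3 models = 27 configurations.
configs = list(product(TASKS, FEATURE_SETS, MODELS))


def stratified_split(labels, test_fraction=0.2, seed=SEED):
    """Return (train_indices, test_indices), keeping class proportions."""
    rng = random.Random(seed)  # fixed seed makes the split repeatable
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train, test = [], []
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_fraction)
        test.extend(idxs[:n_test])
        train.extend(idxs[n_test:])
    return sorted(train), sorted(test)


labels = ["malware"] * 80 + ["benign"] * 20   # toy labels, not the real dataset
train_idx, test_idx = stratified_split(labels)
print(len(configs), len(train_idx), len(test_idx))
```

Calling `stratified_split` twice with the same seed returns the same indices, which is the property the fixed seed is meant to provide.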

Authors:
- Márton Pál Lipcsey-Magyar - Budapest University of Technology and Economics
- Attila Ármin Madarász - Budapest University of Technology and Economics
- Adrian Pekar - Budapest University of Technology and Economics & CUJO LLC
For questions about the paper or code:
- Adrian Pekar: apekar@hit.bme.hu
Supported by the János Bolyai Research Scholarship of the Hungarian Academy of Sciences and Celtic-Next project RAI-6Green: Robust and AI Native 6G for Green Networks (C2023/1-9, funded by 2024-1.2.6-EUREKA-2024-00009).
Note: This repository contains the complete implementation and validation pipelines supporting the paper. All experimental results are reproducible using the provided code and methodology.