Web Application Attacks Detection Using Deep Learning

Published on Nov 03, 2022

Page 1 (0s)

Web Application Attacks Detection Using Deep Learning.

Page 2 (11s)

Nicolas Montes. Speaker's bio: Data Scientist with a bachelor's degree in Statistics. Currently finishing an MSc in Machine Learning and Data Science at the Engineering School of Universidad de la República (UDELAR).

Page 3 (25s)

Authors & affiliations. Nicolas Montes, Gustavo Betarte (1), Rodrigo Martínez (1), Alvaro Pardo (2). (1) Instituto de Computación, Facultad de Ingeniería, Universidad de la República, Uruguay. (2) Departamento de Ingeniería, Facultad de Ingeniería y Tecnologías, Universidad Católica del Uruguay, Uruguay.

Page 4 (39s)

Outline. Problem definition: attacks on web applications; web application firewalls (WAF). Application of machine learning to improve WAFs. Advanced NLP approaches to text encoding and a review of the RoBERTa architecture. Proposed solution based on two-step learning: use of a RoBERTa language model as a feature extractor; one-class classification using OCSVM. Results and conclusions.

Page 5 (1m 13s)

Web application attacks.

Page 6 (2m 49s)

ML pipeline. One of the most challenging problems when applying machine learning based anomaly detection to web application security is how to extract features from the raw network data. We treat an HTTP request as raw text and investigate advanced NLP approaches to text encoding, transforming the HTTP request into a vector of numbers (feature extraction). We then use that vector as input to a one-class classifier.
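As an illustration of what "treating an HTTP request as raw text" could look like, the sketch below serializes a parsed request back into a single string that a text encoder could consume. The `HttpRequest` class, its field names, and the sample requests are hypothetical and are not taken from the talk; the exact serialization used by the authors is not stated.

```python
from dataclasses import dataclass, field


@dataclass
class HttpRequest:
    """Hypothetical container for a parsed request (not from the talk)."""
    method: str
    target: str
    headers: dict = field(default_factory=dict)
    body: str = ""


def request_to_text(req: HttpRequest) -> str:
    """Serialize the request into one raw-text string."""
    lines = [f"{req.method} {req.target} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in req.headers.items()]
    text = "\n".join(lines)
    if req.body:
        # A blank line separates the headers from the body.
        text += "\n\n" + req.body
    return text


# Illustrative requests: one benign, one carrying an injection-like payload.
benign = HttpRequest("GET", "/node/1?page=2", {"Host": "example.org"})
suspicious = HttpRequest("GET", "/node/1?id=1' OR '1'='1", {"Host": "example.org"})

for req in (benign, suspicious):
    print(repr(request_to_text(req)))
```

The resulting string is what a tokenizer and language model would then turn into a numeric vector.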

Page 7 (3m 39s)

Classic text encoding. Words are represented as atomic units (one-hot).
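A minimal sketch of one-hot encoding, using an illustrative four-word vocabulary that is not from the talk. It shows why words are "atomic" under this scheme: any two different words have a dot product of zero, so the encoding carries no notion of similarity between them.

```python
def one_hot(word, vocab):
    """Return a vector with a single 1 at the word's vocabulary index."""
    vec = [0] * len(vocab)
    vec[vocab.index(word)] = 1
    return vec


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


vocab = ["select", "union", "insert", "index"]  # illustrative vocabulary
select_vec = one_hot("select", vocab)
union_vec = one_hot("union", vocab)

print(select_vec)                   # [1, 0, 0, 0]
print(dot(select_vec, union_vec))   # 0: different words are orthogonal
print(dot(select_vec, select_vec))  # 1
```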

Page 8 (4m 34s)

Text encoding with embeddings.

Page 9 (6m 20s)

RoBERTa high-level architecture. Self-supervised masked language model. Stack of L identical encoder layers. Each encoder layer contains two sub-layers: multi-head self-attention and a feedforward network. Input representation.
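To make the self-attention sub-layer concrete, here is a sketch of scaled dot-product attention for a single head in plain Python. It is a simplification: the learned query, key, and value projections, the multiple heads, and the feedforward sub-layer of a real encoder layer are omitted, and the toy token vectors are made up for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(queries, keys, values):
    """Scaled dot-product attention for one head.

    Each argument is a list of vectors, one per token.
    """
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this token's query to every key, scaled by sqrt(d_k).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # The output is a weighted average of the value vectors.
        out = [sum(w * v[i] for w, v in zip(weights, values))
               for i in range(len(values[0]))]
        outputs.append(out)
    return outputs


# Toy input: 3 tokens of dimension 2. For brevity Q = K = V = tokens;
# a real layer would first apply learned linear projections.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
contextual = self_attention(tokens, tokens, tokens)
for vec in contextual:
    print([round(x, 3) for x in vec])
```

Each output vector mixes information from all tokens in the request, which is what makes the resulting token representations context-dependent.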

Page 10 (7m 16s)

Our proposed two-step learning framework. HTTP requests are treated as raw text. 1st step: training a RoBERTa language model. 2nd step: extract features from RoBERTa and perform one-class SVM classification.
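The second step can be sketched with scikit-learn's `OneClassSVM`. In this sketch the feature vectors are random stand-ins for the RoBERTa features of step 1, and the dimensionality, sample sizes, and the `nu` and `gamma` values are assumptions chosen for illustration, not the settings used in the talk.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

# Stand-in for step 1: in the talk these vectors come from RoBERTa.
normal_features = rng.normal(loc=0.0, scale=1.0, size=(200, 8))
anomalous_features = rng.normal(loc=10.0, scale=1.0, size=(5, 8))

# Step 2: fit the one-class SVM on normal traffic only.
detector = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05)
detector.fit(normal_features)

# predict returns +1 for inliers and -1 for outliers.
train_labels = detector.predict(normal_features)
anomaly_labels = detector.predict(anomalous_features)

print("fraction of normal requests accepted:", (train_labels == 1).mean())
print("labels for anomalous requests:", anomaly_labels)
```

Because the classifier is fitted on normal requests alone, no labeled attacks are needed at training time.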

Page 11 (7m 58s)

Design decisions. 1) Convert each request into a numeric feature vector with the RoBERTa model, using the centroid of the token representations.
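The centroid of the token representations is their component-wise mean. The sketch below uses three made-up 3-dimensional token vectors; in practice the vectors would be the language model's per-token outputs, with a much larger dimension.

```python
def centroid(token_vectors):
    """Average token vectors component-wise into one request vector."""
    if not token_vectors:
        raise ValueError("need at least one token vector")
    n = len(token_vectors)
    dim = len(token_vectors[0])
    return [sum(vec[i] for vec in token_vectors) / n for i in range(dim)]


# Illustrative token representations for a request with three tokens.
token_vectors = [
    [1.0, 0.0, 2.0],
    [3.0, 2.0, 0.0],
    [2.0, 4.0, 1.0],
]
request_vector = centroid(token_vectors)
print(request_vector)  # [2.0, 2.0, 1.0]
```

Averaging yields a fixed-length vector regardless of how many tokens the request contains, which is what the one-class classifier needs as input.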

Page 12 (9m 1s)

Results. [Figure: ROC curves.]

Page 13 (11m 44s)

Conclusion. The experimental results show that the proposed approach outperforms the classic rule-based MODSECURITY configured with a vanilla CRS, without requiring the participation of a security expert to define the features. We used a performance metric proposed by Wee Sun Lee and collaborators to automatically obtain the operational point of the OCSVM. As further work, we intend to re-train the pre-trained language model with more HTTP requests for an extended DRUPAL dataset. We also plan to use a set of attacks and explore the fine-tuning approach for RoBERTa.
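The slide does not spell out the metric. Assuming it is the criterion from Lee and Liu's work on learning from positive and unlabeled examples, r^2 / Pr[f(X) = 1] (squared recall on known positives divided by the fraction of examples predicted positive), choosing an operational point could be sketched as below. The candidate settings and their measurements are hypothetical numbers for illustration only, not results from the talk.

```python
def lee_liu_score(recall, positive_rate):
    """r^2 / Pr[f(X)=1]; higher is better.

    Returns 0.0 when nothing is predicted positive.
    """
    if positive_rate == 0:
        return 0.0
    return recall ** 2 / positive_rate


# Hypothetical validation measurements for three candidate settings.
candidates = {
    "setting_a": {"recall": 0.99, "positive_rate": 0.90},
    "setting_b": {"recall": 0.95, "positive_rate": 0.60},
    "setting_c": {"recall": 0.70, "positive_rate": 0.50},
}

scores = {
    name: lee_liu_score(m["recall"], m["positive_rate"])
    for name, m in candidates.items()
}
best = max(scores, key=scores.get)
print(best)
```

The appeal of a criterion of this kind is that it can be estimated without labeled examples of the other class, which fits a setting where only normal traffic is available for training.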

Page 14 (12m 36s)

Thank you! Website: www.ciarp25.org. Email: info@codan-consulting.com. Twitter: @CiarpCongress. Facebook: @CIARPcongress.