Privacy and security on the TOR network

Balancing Privacy and Security on the TOR network: Cicada Learning’s Insight

How can we use AI to address the problem of balancing the need for privacy and the need for security on the TOR network?

Understanding the TOR Network

The TOR (The Onion Router) network is a widely used tool for maintaining online anonymity. It routes internet traffic through a series of relays, wrapping it in multiple layers of encryption so that no single relay knows both the original source and the final destination. Here’s a breakdown of how it works:

Layered Encryption

The client generates a path through three nodes (entry, middle, and exit) and collects their public keys.

Asymmetric Key Encryption

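The client uses each node’s public key to establish a temporary symmetric session key with that node. It then encrypts the data once per session key in reverse order (exit, middle, entry), so the entry node’s layer ends up outermost.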

Progressive Decryption

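Each node removes exactly one layer of encryption as the data travels through the network. Only the entry node sees the user’s IP address and only the exit node sees the traffic bound for the destination, which maintains the user’s anonymity.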
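The wrap-and-peel idea can be sketched in a few lines of Python. This is a toy illustration only, not TOR's actual cryptography: the keystream construction, the function names, and the randomly generated per-hop keys are all assumptions made for the example, and the key exchange with each node is skipped entirely.

```python
import hashlib
import os

def keystream(key: bytes, length: int) -> bytes:
    """Derive `length` pseudo-random bytes from `key` (SHA-256 over a counter)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])

def xor_layer(key: bytes, data: bytes) -> bytes:
    """Add or remove one layer. XOR with the same keystream is its own inverse."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))

def wrap(payload: bytes, keys_entry_to_exit: list) -> bytes:
    """Client side: apply the exit layer first and the entry layer last."""
    for key in reversed(keys_entry_to_exit):
        payload = xor_layer(key, payload)
    return payload

# Hypothetical session keys for the entry, middle, and exit nodes.
keys = [os.urandom(32) for _ in range(3)]
message = b"GET / HTTP/1.1"
onion = wrap(message, keys)

# Each node on the path peels one layer, entry node first.
cell = onion
for key in keys:
    cell = xor_layer(key, cell)

print(cell == message)
```

Because this toy uses plain XOR layers, the result does not actually depend on the order in which layers are removed. The ordering is kept only to mirror the entry, middle, exit path described above.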

Detecting and Identifying TOR Activity

Detecting TOR activity is relatively straightforward because the list of relays, including exit nodes, is public. Websites can block traffic arriving from exit node IPs, and ISPs and governments can block connections to known relays. However, identifying different types of TOR activity requires more sophisticated techniques. We can use Deep Neural Network (DNN) models, as proposed by Sarkar et al. (2020), trained on captured traffic datasets to distinguish various TOR traffic types.
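The simple detection case, checking an address against a published exit list, might look like the sketch below. The list format (one IP address per line, with "#" comments) and the function names are assumptions for illustration, and the sample addresses are reserved documentation addresses, not real exit nodes.

```python
import ipaddress

def load_exit_list(lines):
    """Parse one IP address per line, skipping blank lines and '#' comments."""
    exits = set()
    for line in lines:
        line = line.strip()
        if line and not line.startswith("#"):
            exits.add(ipaddress.ip_address(line))
    return exits

def is_known_exit(ip, exits):
    """Return True if `ip` appears in the parsed exit list."""
    return ipaddress.ip_address(ip) in exits

# Placeholder list using documentation-only addresses (RFC 5737).
sample_list = ["# hypothetical exit list", "192.0.2.10", "198.51.100.7", ""]
exits = load_exit_list(sample_list)

print(is_known_exit("192.0.2.10", exits))   # True
print(is_known_exit("203.0.113.1", exits))  # False
```

A lookup like this only reveals that a connection came through TOR. Telling email from streaming or file transfer is the harder classification problem that the DNN approach targets.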

Types of TOR Traffic

TOR traffic can be categorized based on its application, each having distinct characteristics:

Email: Delivered through SMTPS and received using POP3S and IMAPS.
Chat: Generated using applications like Facebook, Hangouts, Skype, ICQ, and AIM.
Streaming: Continuous streams of data from platforms like YouTube and Vimeo.
File Transfer: Using SFTP, FTPS, and Skype.
VoIP: Voice traffic from Facebook, Skype, and Hangouts.
P2P: Torrent traffic from BitTorrent and μTorrent.
Web Browsing: HTTPS and HTTP traffic.

Network Flow Attributes

To effectively analyze TOR traffic, certain network flow attributes are crucial:

Source and Destination Ports: Ports used by the source and destination.
Flow Duration: Length of the connection in seconds.
Flow Bytes and Packets: Number of bytes and packets sent.
Flow IAT (Inter-Arrival Time): Time between consecutive packets in the flow (max, min, mean, std).
Fwd IAT and Bwd IAT: Inter-arrival time between consecutive packets in the forward and backward directions (max, min, mean, std).
Active and Idle Time: Seconds the flow has been active or idle (max, min, mean, std).
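
To make these attributes concrete, here is a minimal sketch that derives several of them from a list of packets. The packet representation (timestamp in seconds, size in bytes, direction) and the function names are assumptions for the example rather than the format of any particular capture tool. Active and idle time are left out because they depend on a chosen idle threshold.

```python
from statistics import mean, pstdev

def iat_stats(timestamps):
    """Return max/min/mean/std of the gaps between consecutive timestamps."""
    ts = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ts, ts[1:])]
    if not gaps:  # fewer than two packets: no inter-arrival time exists
        return {"max": 0.0, "min": 0.0, "mean": 0.0, "std": 0.0}
    return {"max": max(gaps), "min": min(gaps),
            "mean": mean(gaps), "std": pstdev(gaps)}

def flow_features(packets):
    """packets: non-empty list of (timestamp_seconds, size_bytes, direction),
    where direction is 'fwd' or 'bwd'."""
    times = [t for t, _, _ in packets]
    return {
        "duration": max(times) - min(times),
        "bytes": sum(size for _, size, _ in packets),
        "packets": len(packets),
        "flow_iat": iat_stats(times),
        "fwd_iat": iat_stats([t for t, _, d in packets if d == "fwd"]),
        "bwd_iat": iat_stats([t for t, _, d in packets if d == "bwd"]),
    }

# Hypothetical four-packet flow.
packets = [(0.0, 100, "fwd"), (0.5, 1500, "bwd"),
           (1.0, 100, "fwd"), (2.0, 1500, "bwd")]
features = flow_features(packets)
print(features["duration"], features["bytes"], features["flow_iat"]["max"])
```

Feature vectors like this one, computed per flow, are the kind of input a traffic classifier would be trained on.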

Fine-Tuning for Malicious Traffic

Malicious TOR traffic can include activities such as:

Communication between malware and command-and-control centers
Denial of Service attacks
Spam traffic
Connections to servers hosting illegal content
Policy violation P2P alerts
Trojan activity
Suspicious port scanning
NOOP string traffic

Framework for Malicious Activity Detection and Blocking

So, how do we detect and prevent malicious activity? We can start by implementing the following strategies:

Middle and Exit Node Analysis: Focus monitoring on these nodes, which do not see the user’s IP address, while maintaining privacy at the entry node level.
AI and GAN Integration: Train a deep learning model on artificially produced malicious traffic, and use Generative Adversarial Networks (GANs) to expand the dataset.
Metadata and Traffic Patterns: Prioritize metadata and traffic patterns over content analysis.
Intrusion Detection Systems: Deploy these systems on both middle and exit nodes for enhanced security.
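
As one example of a metadata-only check, the sketch below flags possible port scanning by counting how many distinct destination ports a single circuit contacts on one host. The record layout (circuit ID, destination IP, destination port), the function name, and the threshold of 20 ports are all hypothetical choices for illustration. A real intrusion detection system would combine many such signals.

```python
from collections import defaultdict

def flag_port_scans(flows, max_ports=20):
    """flows: iterable of (circuit_id, dst_ip, dst_port) records.
    Returns the (circuit_id, dst_ip) pairs that touched more than
    `max_ports` distinct ports. No packet content is inspected."""
    ports_seen = defaultdict(set)
    for circuit_id, dst_ip, dst_port in flows:
        ports_seen[(circuit_id, dst_ip)].add(dst_port)
    return sorted(pair for pair, ports in ports_seen.items()
                  if len(ports) > max_ports)

# Hypothetical records: circuit "c1" probes 50 ports, "c2" browses normally.
flows = [("c1", "203.0.113.5", port) for port in range(1, 51)]
flows += [("c2", "203.0.113.5", 443), ("c2", "203.0.113.5", 80)]

print(flag_port_scans(flows))
```

The check works on connection metadata alone, which is in line with prioritizing traffic patterns over content analysis.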

AI-enhanced security measures can help TOR maintain anonymity while improving security by reducing human intervention. However, developing a sufficiently large and comprehensive dataset for training these models requires substantial human-conducted malware and vulnerability analysis, which may lag behind evolving threats. Additionally, this method does not apply to all TOR-based criminal activities, particularly those without discernible traffic patterns, such as the sale of illegal goods and services.