From Behavior to Natural Language: Generative Approach for UAV Intent Recognition
UAV Behavior Intent Recognition Based on Generative Models: A Cross-Modal Study From Behavior to Natural Language
Background and Research Objectives
In recent years, Unmanned Aerial Vehicle (UAV) technology has advanced rapidly and is now widely used in civilian and military domains, including search and rescue, precision agriculture, and communication relays. However, as UAV swarms grow in scale and autonomy, aerial command and control systems must become correspondingly more intelligent. In complex confrontation environments, improving “situation awareness” has become a critical issue, and recognizing the operational intent of UAVs is a central part of it. Intent recognition helps reveal the relationship between the opponent’s intent and tactical deception, optimizes information flow in the command hierarchy, and provides guidance for decision-making.
Traditional classification-based intent recognition methods suffer from imbalanced dataset distributions and poor robustness, which make high classification accuracy hard to achieve in complex real-world scenarios. This study proposes a generative model for UAV behavior intent recognition that maps long-term UAV behavior sequences to natural language for intent detection. The model mitigates dataset imbalance by combining behavior sequence compression, a standard Transformer architecture, and hybrid pretraining strategies.
Paper Source and Publication Details
This study was jointly conducted by researchers from the College of Air Traffic Control and Navigation at the Air Force Engineering University (Leyan Li, Rennong Yang, Maolong Lv, Ao Wu) and the TUM School of Social Sciences and Technology (Zilong Zhao). It was published in the IEEE Transactions on Artificial Intelligence, December 2024 issue, under the title “From Behavior to Natural Language: Generative Approach for Unmanned Aerial Vehicle Intent Recognition” (DOI: 10.1109/TAI.2024.3376510).
Research Process and Core Methods
Overall Framework
This research adopts a generative, cross-modal approach to transform UAV long-term behavior data into natural language tags for intent recognition. The method includes the following core modules:
- Behavior Data Compression Module: Reduces input sequence length of time-series data to significantly lower the computational complexity of the Transformer model.
- Standard Transformer Architecture: Uses an unmodified Transformer encoder to extract features from the compressed UAV behavior sequences.
- Generative Decoding: Uses a word-by-word generative model to map behavioral features into natural language label space, with the intent recognized based on similarity with entries in a tag library.
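The final matching step, where generated text is compared against a tag library, can be illustrated with a minimal sketch. It assumes the generated text and each library tag have already been turned into fixed-length embedding vectors and that cosine similarity is the similarity measure; the function names, the three-dimensional toy embeddings, and the two-entry library are hypothetical and are not taken from the paper.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)

def match_tag(generated_embedding, tag_library):
    """Return (tag, score) for the library entry most similar to the generated text."""
    best_tag, best_score = None, -2.0  # cosine similarity is always >= -1
    for tag, embedding in tag_library.items():
        score = cosine_similarity(generated_embedding, embedding)
        if score > best_score:
            best_tag, best_score = tag, score
    return best_tag, best_score

# Toy embeddings, made up for illustration only.
tag_library = {
    "air combat patrol": [0.9, 0.1, 0.0],
    "suppression of enemy air defense": [0.1, 0.9, 0.2],
}
generated = [0.8, 0.2, 0.1]  # stand-in for the embedding of the decoder output

best_tag, best_score = match_tag(generated, tag_library)
print(best_tag, round(best_score, 3))
```

Because the decision is a nearest-neighbor lookup over the library, adding a new intent under this sketch only requires adding a new entry to `tag_library`.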
Data Compression Techniques
Since the computational complexity of the Transformer model grows quadratically with sequence length (O(n²)), this paper proposes two types of time-series compression methods to handle UAV long-term behavior data. The first is block-based statistical and neural network compression (e.g., equal interval sampling, convolution layers, LSTM layers). The second utilizes positional encoding to reduce sequence length and maintain information fidelity while ensuring computational efficiency.
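To make the effect of compression concrete, the following sketch shows two of the simplest block-based options, equal-interval sampling and per-block averaging, along with the resulting reduction in the quadratic attention cost. The function names, the block size of 10, and the toy two-attribute sequence are assumptions chosen for illustration; the paper's convolutional and LSTM compressors are not reproduced here.

```python
def equal_interval_sample(sequence, stride):
    """Keep every `stride`-th time step of the sequence."""
    return sequence[::stride]

def block_mean_compress(sequence, block_size):
    """Replace each non-overlapping block of time steps by its per-attribute mean."""
    compressed = []
    for start in range(0, len(sequence), block_size):
        block = sequence[start:start + block_size]
        n_attrs = len(block[0])
        compressed.append(
            [sum(step[j] for step in block) / len(block) for j in range(n_attrs)]
        )
    return compressed

# Toy behavior sequence: 1200 time steps with 2 attributes each.
sequence = [[float(t), float(2 * t)] for t in range(1200)]

sampled = equal_interval_sample(sequence, stride=10)
pooled = block_mean_compress(sequence, block_size=10)

# Self-attention cost scales with n^2, so shrinking n by 10x cuts it by 100x.
attention_cost_ratio = (len(sequence) ** 2) / (len(pooled) ** 2)
print(len(sampled), len(pooled), attention_cost_ratio)
```

Sampling discards the intermediate steps outright, while block averaging keeps a summary of every step, which is one reason statistical or learned compressors may preserve more information at the same output length.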
Hybrid Pretraining Tasks
To improve network initialization and training efficiency, three pretraining tasks were designed for behavior sequence data:
- Time-Series Smoothing Task: Randomly masks segments of time-series data and trains the model to reconstruct them, enhancing resilience to missing data.
- Contrastive Learning Classification Task: Uses triplet loss to improve feature discrimination between similar and different behavior categories.
- Cross-Modal Matching Task: Aligns behavior features with the natural language space by computing similarity matrices between generated tags and label embeddings.
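The first two pretraining objectives can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation: masked steps are filled with zeros, the reconstruction loss is mean squared error over masked steps only, and the triplet loss uses Euclidean distance with a margin of 1.0. All function names and parameter values are hypothetical.

```python
import math
import random

def mask_random_span(sequence, span_length, rng, mask_value=0.0):
    """Return a masked copy of the sequence plus the indices that were masked."""
    start = rng.randrange(0, len(sequence) - span_length + 1)
    masked_idx = list(range(start, start + span_length))
    masked = [list(step) for step in sequence]  # copy so the input stays intact
    for i in masked_idx:
        masked[i] = [mask_value] * len(masked[i])
    return masked, masked_idx

def reconstruction_mse(prediction, target, masked_idx):
    """Mean squared error computed only over the masked time steps."""
    errors = [
        (p - t) ** 2
        for i in masked_idx
        for p, t in zip(prediction[i], target[i])
    ]
    return sum(errors) / len(errors)

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge loss that pushes the negative at least `margin` farther than the positive."""
    d_pos = math.dist(anchor, positive)
    d_neg = math.dist(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)

rng = random.Random(0)
sequence = [[float(t), 1.0] for t in range(20)]
masked, masked_idx = mask_random_span(sequence, span_length=5, rng=rng)

# A perfect reconstruction gives zero loss; the masked input itself does not.
perfect_loss = reconstruction_mse(sequence, sequence, masked_idx)
masked_loss = reconstruction_mse(masked, sequence, masked_idx)

# Loss is zero when the negative is already far away, positive otherwise.
easy_triplet = triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0])
hard_triplet = triplet_loss([0.0, 0.0], [3.0, 4.0], [0.0, 1.0])
print(perfect_loss, masked_loss, easy_triplet, hard_triplet)
```

In a real training loop, `prediction` would come from the encoder's output on the masked input, and the anchor, positive, and negative vectors would be encoder features of behavior sequences from the same or different intent categories.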
Experiments and Results
Dataset Characteristics and Processing
The study constructed a UAV behavior dataset generated via a wargame simulation platform, employing predefined tactical intent labels. The dataset consists of 7 attributes (e.g., longitude, speed, altitude) and 10 tactical intent types (e.g., air combat patrol [ACP], suppression of enemy air defense [SEAD]). However, the dataset exhibits significant class imbalance, with certain intents heavily dominating. This imbalance negatively impacts the training and performance of conventional classification models.
Comparison Between Generative and Classification Models
Experiments reveal that the generative model significantly outperforms traditional classification models in intent recognition accuracy. The generative model achieves an accuracy of 78.2%, surpassing classification models like PCLSTM (62.1%) and GRU-FCN (65%). Importantly, using similarity between embeddings to match tags proved more effective than using BLEU score matching.
Traditional classification models struggle with dataset imbalance, leading to misclassifications predominantly in the most overrepresented class (e.g., ACP). In contrast, the generative model mitigates such skewness by leveraging natural language mapping, demonstrating better robustness.
Robustness and Temporal Sensitivity
Robustness to Missing Data
The generative model retains 74.9% recognition accuracy despite 50% data loss. This highlights the model’s robustness and resilience in disrupted environments or when UAV sensor data is missing.
Real-Time Prediction Capability
The generative model achieves 73.1% accuracy using only 1 minute of UAV flight data, showcasing its ability to provide timely tactical guidance. Longer sequences further improve accuracy.
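The two stress tests described in this section, missing data and short observation windows, can be reproduced on any behavior sequence with a small amount of preprocessing. The sketch below is one plausible setup and not the paper's protocol: it assumes lost time steps are chosen uniformly at random and filled with zeros, and it assumes a sampling rate of 1 Hz, which the source does not state. The function names are hypothetical.

```python
import random

def drop_time_steps(sequence, loss_rate, rng, fill_value=0.0):
    """Zero out a random fraction of time steps; return the copy and the dropped indices."""
    n_drop = int(round(loss_rate * len(sequence)))
    dropped = set(rng.sample(range(len(sequence)), n_drop))
    degraded = [
        [fill_value] * len(step) if i in dropped else list(step)
        for i, step in enumerate(sequence)
    ]
    return degraded, dropped

def truncate_to_duration(sequence, seconds, sample_rate_hz):
    """Keep only the first `seconds` of data at the given sampling rate."""
    n_keep = int(seconds * sample_rate_hz)
    return sequence[:n_keep]

# Toy 10-minute track at an assumed 1 Hz, with 2 nonzero attributes per step.
sequence = [[float(t + 1), 100.0] for t in range(600)]

degraded, dropped = drop_time_steps(sequence, loss_rate=0.5, rng=random.Random(0))
first_minute = truncate_to_duration(sequence, seconds=60, sample_rate_hz=1.0)
print(len(dropped), len(first_minute))
```

Feeding `degraded` or `first_minute` to a trained model and comparing accuracy against the full sequence gives the kind of robustness and timeliness figures reported above.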
Effects of Pretraining Tasks
Integrating hybrid pretraining tasks accelerates convergence by 22.2% and improves final accuracy by 2.8%. Pretraining tasks help networks bypass local optima and extract features efficiently.
Research Contributions
This study’s generative UAV intent recognition model demonstrates multiple advantages over traditional methods:
- Addressing Dataset Imbalance: By mapping behavior features to natural language, the approach significantly reduces the impact of data distribution imbalance.
- Modular Design: The method does not require modifying Transformer structures, enabling direct compatibility with various long-sequence data types for multimodal tasks.
- Robustness and Adaptability: The model performs well under adverse conditions such as data loss or noise, and it can recognize intent from short observation windows.
- Reduced Retraining Costs: Fine-tuning allows seamless tag library updates without needing to retrain the model from scratch, significantly lowering long-term costs.
This approach enhances UAV situational awareness and provides a theoretical and practical foundation for intelligent UAV command and confrontation technologies.