Pattern-Oriented Data Mining and Constraint-Based Reasoning
Date
2025-11-06
Authors
Douad Mohamed El Amine
Publisher
Université Oran1
Abstract
With the explosion of digital data, pattern mining has become a fundamental pillar of knowledge discovery. This thesis contributes to the body of work aimed at optimizing the discovery of significant structures in large databases, addressing three major challenges: the combinatorial explosion of candidate patterns, informational redundancy, and the limited expressiveness of traditional specialized approaches. We propose a framework based on constraint programming (CP) that reconciles declarative expressiveness with algorithmic efficiency. The first major contribution introduces a new global constraint, based on a formally supported relaxation of the Overlap measure, for extracting k-sets of diverse closed patterns. Unlike previous approaches that use the Jaccard measure, our ClosedOverlap model exploits the (anti-)monotonicity properties of the relaxed Overlap measure, offering superior discriminating power. To solve the critical "first pattern" problem that affects diversification approaches, we integrate an entropy optimization step that minimizes the false positives induced by the relaxation. This modular architecture, implemented in the Choco solver in both sequential and parallel versions, effectively separates the specific sub-problems of pattern mining, allowing increased flexibility while maintaining competitive computational performance.
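As a rough illustration of why the choice of similarity measure matters for diversity, the sketch below compares the Jaccard index with the standard overlap coefficient on the covers (sets of supporting transaction ids) of two patterns. It assumes the textbook definitions of both measures; the relaxed Overlap measure and the ClosedOverlap constraint themselves are not specified in this abstract, and the covers used here are made-up values.

```python
def jaccard(cover_a, cover_b):
    """Jaccard index: |A ∩ B| / |A ∪ B| (0.0 if both covers are empty)."""
    union = cover_a | cover_b
    return len(cover_a & cover_b) / len(union) if union else 0.0


def overlap(cover_a, cover_b):
    """Overlap coefficient: |A ∩ B| / min(|A|, |B|) (0.0 if a cover is empty)."""
    smaller = min(len(cover_a), len(cover_b))
    return len(cover_a & cover_b) / smaller if smaller else 0.0


# Hypothetical covers: the second pattern is supported only by
# transactions that already support the first one.
cover_p = {1, 2, 3, 4, 5, 6, 7, 8}
cover_q = {1, 2}

print("Jaccard:", jaccard(cover_p, cover_q))
print("Overlap:", overlap(cover_p, cover_q))
```

In this toy case the Jaccard index is low because the covers differ greatly in size, whereas the overlap coefficient reaches its maximum because one cover is nested in the other, so a diversity threshold on the overlap coefficient would reject the second pattern as redundant.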
Our second contribution extends this declarative approach to user preference learning through a multi-criteria ensemble learning framework. This model combines the Analytic Hierarchy Process (AHP) with linear optimization to dynamically calibrate the weighting of heterogeneous classifiers (SVM, RF, KNN). The approach has been validated on complex multispectral satellite image classification problems, demonstrating its robustness against noise and data heterogeneity. Experimental validations conducted on reference datasets, both real and synthetic, demonstrate not only the theoretical superiority of our approaches but also their practical applicability to high-dimensional problems. Our Overlap-based approach significantly outperforms Jaccard-based methods in terms of the quality and diversity of the extracted patterns, while our multi-criteria aggregation model substantially improves the classification accuracy of multispectral images. This thesis thus establishes a fruitful bridge between theoretical advances in constraint programming and concrete applications in data mining and remote sensing. It opens promising perspectives for the dynamic customization of mining constraints and for extension to other types of structured data, contributing to the emergence of more interactive knowledge extraction systems adapted to user needs.
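The AHP-based weighting of classifiers can be pictured with a minimal sketch. It assumes the common geometric-mean approximation of the AHP priority vector and a plain weighted vote; the pairwise comparison values and class labels are hypothetical, and the linear optimization step mentioned in the abstract is not reproduced here.

```python
import math


def ahp_weights(pairwise):
    """Approximate AHP priorities as normalized row geometric means."""
    n = len(pairwise)
    geo_means = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def weighted_vote(predictions, weights):
    """Return the label with the largest summed classifier weight."""
    scores = {}
    for label, weight in zip(predictions, weights):
        scores[label] = scores.get(label, 0.0) + weight
    return max(scores, key=scores.get)


# Hypothetical pairwise preferences over (SVM, RF, KNN):
# entry [i][j] states how strongly classifier i is preferred to j.
pairwise = [
    [1.0, 2.0, 4.0],
    [1 / 2, 1.0, 2.0],
    [1 / 4, 1 / 2, 1.0],
]

weights = ahp_weights(pairwise)
label = weighted_vote(["water", "forest", "forest"], weights)
print(weights, label)
```

With these illustrative preferences the first classifier carries more weight than the other two combined, so its prediction prevails even against a two-to-one majority, which is the kind of behavior a calibrated weighting is meant to control.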
Keywords
Pattern Mining; Constraint Programming; Optimization; Relaxation; Entropy; Pattern Diversity; Linear Programming; Ensemble Learning; Remote Sensing