Customer Segmentation
Predictive Analytics for Arvato Financial Services
A data-driven customer acquisition strategy combining unsupervised segmentation with supervised learning to identify high-value prospects from 891K+ demographic records.

The Challenge
Arvato Financial Services needed a smarter approach to customer acquisition for a German mail-order company. Traditional broad marketing was inefficient and costly. The project provided four datasets: a general population of 891,211 individuals with 366 features, 191,652 existing customers, and two campaign datasets for training and evaluation.
Data Preprocessing & Feature Engineering
Preprocessing involved handling missing values, encoding categorical variables, and transforming mixed-type columns. Principal Component Analysis (PCA) then reduced the 366 original features to 200 components that together preserve 95% of the total variance, cutting dimensionality while keeping most of the information in the data.
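The 95% variance criterion amounts to keeping the smallest number of leading components whose cumulative explained-variance ratio reaches the target. The sketch below illustrates that selection rule in plain Python. The function name and the toy ratios are hypothetical and are not taken from the project; scikit-learn's PCA can make the same selection when given a fractional n_components such as 0.95.

```python
from itertools import accumulate


def components_for_variance(explained_ratios, target=0.95):
    """Return the smallest number of leading components whose cumulative
    explained-variance ratio reaches `target`."""
    # PCA orders components by decreasing variance; sort defensively.
    ordered = sorted(explained_ratios, reverse=True)
    for k, cumulative in enumerate(accumulate(ordered), start=1):
        if cumulative >= target - 1e-12:  # small tolerance for float rounding
            return k
    raise ValueError("target variance is not reachable with these ratios")


# Hypothetical toy ratios for a 6-feature problem (they sum to 1.0).
toy_ratios = [0.5, 0.25, 0.125, 0.0625, 0.03125, 0.03125]
n_keep = components_for_variance(toy_ratios, target=0.95)
print(f"keep {n_keep} of {len(toy_ratios)} components")
```

Applied to the real explained-variance ratios, the same rule would be expected to return the 200 components reported above.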
Unsupervised Learning — Segmentation
KMeans clustering partitioned both the general population and the existing customers into six clusters, the number selected as optimal. A critical finding emerged: 99.9% of existing customers fell into a single cluster (Cluster 2), which accounts for only 31% of the general population. This gives a clear, actionable profile of the ideal customer and points to the members of that cluster who are not yet customers as untapped prospects.
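One way to quantify this finding is a representation ratio: each cluster's share of customers divided by its share of the general population. With the figures reported above, Cluster 2 would score roughly 0.999 / 0.31 ≈ 3.2, meaning customers are about three times as concentrated there as the population at large. The sketch below computes the ratio from cluster labels. The function names and the toy labels are hypothetical, not the project's data.

```python
from collections import Counter


def cluster_shares(labels):
    """Map each cluster id to the fraction of records assigned to it."""
    counts = Counter(labels)
    total = len(labels)
    return {cluster: n / total for cluster, n in counts.items()}


def representation_ratio(customer_labels, population_labels):
    """Customer share divided by population share, per cluster.
    Values above 1 mean customers are over-represented in that cluster."""
    cust = cluster_shares(customer_labels)
    pop = cluster_shares(population_labels)
    return {c: cust.get(c, 0.0) / share for c, share in pop.items()}


# Hypothetical toy labels: 100 population records, 10 customer records.
population_labels = [0] * 30 + [1] * 20 + [2] * 50
customer_labels = [2] * 9 + [0] * 1

ratios = representation_ratio(customer_labels, population_labels)
for cluster in sorted(ratios):
    print(f"cluster {cluster}: ratio {ratios[cluster]:.2f}")
```

Clusters with a ratio well above 1 are candidates for targeting, while clusters near 0 can be deprioritized.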
Supervised Learning & Results
Multiple algorithms were evaluated, including Logistic Regression, Random Forest, AdaBoost, and Gradient Boosting. Gradient Boosting performed best, with a training accuracy of 0.935 and a final test ROC-AUC of 0.79. The result is an end-to-end customer acquisition pipeline that enables precision targeting, with the aim of reducing acquisition costs while improving conversion rates.
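The two reported metrics measure different things. Accuracy depends on a decision threshold, whereas ROC-AUC can be read as the probability that a randomly chosen responder is scored above a randomly chosen non-responder. The sketch below computes ROC-AUC directly from that pairwise definition. The labels and scores are hypothetical toy values, and the class imbalance in them is an assumption made for illustration; the class balance of the campaign data is not stated above.

```python
def roc_auc(y_true, scores):
    """ROC-AUC via its pairwise definition: the fraction of
    (positive, negative) pairs in which the positive is scored higher,
    counting ties as half. Quadratic in the worst case; fine for a sketch."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 8 non-responders (0) and 2 responders (1).
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.9, 0.8, 0.95]

auc = roc_auc(y_true, scores)

# A model that always predicts "no response" still looks accurate here,
# even though it ranks nobody and is useless for targeting.
always_negative_accuracy = sum(1 for y in y_true if y == 0) / len(y_true)

print(f"ROC-AUC: {auc}")
print(f"accuracy of always predicting 0: {always_negative_accuracy}")
```

Because a targeting campaign mails the highest-scored prospects first, a ranking metric such as ROC-AUC tends to be the more informative of the two.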