Econometrics / Machine Learning Case Study

Credit Card Clients in Taiwan
Risk & Behavioral Analysis

A consulting-style redesign of the original notebook and report, translating exploratory analysis, supervised learning, and clustering results into an executive narrative focused on credit risk, customer behavior, and decision relevance for financial institutions.
Credit risk
Default prediction
Feature engineering
Behavioral segmentation
Financial analytics
Headline finding
Payment behavior is the strongest signal of default risk
The project shows that recent repayment status and related behavioral features consistently outperform isolated demographic or spending variables when identifying customers likely to default.
Best AUC: 0.81
Accuracy: 0.72
Best regression R²: 0.96
K-Means clusters: 3

Executive Summary

The decision-maker view

Business question

Which customer behaviors best explain credit default risk, how predictive is the dataset across different ML tasks, and can the same data also support meaningful customer segmentation for risk management?

Main conclusion

The strongest practical value of the dataset lies in classification and behavioral segmentation. Regression is useful for understanding nonlinear dynamics, but default prediction and customer clustering produce the clearest operational insights.

Strategic takeaway

Banks should prioritize repayment behavior and utilization stress indicators in early-warning systems. These signals are more actionable than static demographic variables and better aligned with operational credit-risk decisions.

Project Context

Dataset scope and analytical ambition

Dataset overview

The project uses the UCI “Default of Credit Card Clients” dataset, covering 30,000 customers in Taiwan and combining customer characteristics, credit line allocation, monthly payment status, billing amounts, payment amounts, and next-month default labels.

Analytical objective

The analysis intentionally approaches the data from three angles: regression for financial behavior understanding, supervised classification for default prediction, and unsupervised clustering for customer segmentation and anomaly-oriented diagnostics.

Core Behavioral Drivers

What the data says matters most

1. Payment history

Repayment status variables emerge as the clearest risk signal. Even small delays materially increase default probability, and later-stage delinquency sharply shifts customers into the high-risk segment.
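
One simple way to check the repayment-status signal is to group customers by their most recent status and compare next-month default rates. The sketch below is minimal and assumes a pandas DataFrame with the UCI column `PAY_0`, the most recent repayment status, where positive values count months of delay. It also assumes the label column has been renamed to `default`. The ten rows are invented for illustration and are not drawn from the dataset. The helper `default_rate_by_status` is hypothetical.

```python
import pandas as pd

# Invented toy rows (not UCI data). PAY_0 = most recent repayment status;
# the UCI label "default payment next month" is assumed renamed to "default".
df = pd.DataFrame({
    "PAY_0":   [-1, -1, 0, 0, 0, 1, 1, 2, 2, 3],
    "default": [0, 0, 0, 1, 0, 0, 1, 1, 1, 1],
})


def default_rate_by_status(frame: pd.DataFrame, status_col: str = "PAY_0") -> pd.Series:
    """Share of customers who default next month, for each repayment-status value."""
    return frame.groupby(status_col)["default"].mean()


rates = default_rate_by_status(df)
print(rates)
```

On real data, a default rate that rises with the delay count is the pattern the paragraph above describes.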

2. Utilization as stress signal

Credit utilization functions as a strong secondary indicator of financial pressure. High utilization does not act alone, but it becomes highly informative when combined with repayment patterns.

3. Spending is informative, not decisive

Average billing and payment amounts add value, but on their own they do not separate risk classes sharply. Their strength comes from interaction effects, not isolated linear relationships.
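
The report does not state how utilization and payment power were computed, so the definitions below are assumptions that show one plausible approach. Utilization is taken as the six-month average bill (`BILL_AMT1` to `BILL_AMT6`) divided by `LIMIT_BAL`. Payment power is taken as the average payment (`PAY_AMT1` to `PAY_AMT6`) divided by the average bill. The column names follow the UCI file, but the function `add_behavior_features` and the toy rows are hypothetical.

```python
import pandas as pd

BILL_COLS = [f"BILL_AMT{i}" for i in range(1, 7)]
PAY_COLS = [f"PAY_AMT{i}" for i in range(1, 7)]


def add_behavior_features(frame: pd.DataFrame) -> pd.DataFrame:
    """Add average bill/payment plus assumed utilization and payment-power ratios."""
    out = frame.copy()
    out["avg_bill"] = out[BILL_COLS].mean(axis=1)
    out["avg_payment"] = out[PAY_COLS].mean(axis=1)
    # Assumed definition: average bill relative to the granted credit limit.
    out["utilization"] = out["avg_bill"] / out["LIMIT_BAL"]
    # Assumed definition: share of the average bill that is repaid.
    # Non-positive average bills become NaN instead of dividing by zero.
    out["payment_power"] = out["avg_payment"] / out["avg_bill"].where(out["avg_bill"] > 0)
    return out


# Hypothetical customers: one carrying a balance, one with no activity.
toy = pd.DataFrame({
    "LIMIT_BAL": [100_000, 20_000],
    **{c: [50_000, 0] for c in BILL_COLS},
    **{c: [5_000, 0] for c in PAY_COLS},
})
features = add_behavior_features(toy)
print(features[["avg_bill", "avg_payment", "utilization", "payment_power"]])
```

Ratios like these are what allow spending amounts to interact with the credit limit, which is the kind of effect the paragraph above says matters more than raw amounts.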

Regression Findings

Useful for understanding behavior, but not the strongest final use case

Why regression was tested

The project explored whether behavioral features such as average bill amount, average payment amount, payment power, and utilization could predict credit limit allocation (LIMIT_BAL). This was meant to test how far behavior-only variables can explain financial capacity.

Main lesson

Linear regression delivered moderate performance (R² ≈ 0.59–0.61), indicating that the relationship between behavior and credit limit is only partially linear. Nonlinear ensemble methods captured the structure much better, suggesting that customer financial behavior relates to credit limits through more complex, nonlinear dynamics.

Model | Observed performance | Interpretation | Assessment
--- | --- | --- | ---
Linear Regression | R² ≈ 0.59–0.61 | Provides interpretable but limited explanatory power. Generalizes reasonably, yet misses nonlinear dynamics and leaves high error in monetary terms. | Baseline only
Extra Trees Regressor | R² ≈ 0.96 | Strongest regression performance in the report comparison. Captures nonlinear interactions effectively and aligns closely with actual credit limit values. | Best regression model
Strategic implication | Behavior-only prediction remains incomplete | Even with a strong nonlinear fit, real-world credit limit decisions still require external variables such as income, employment, debt-to-income ratio, and broader financial history. | Needs richer data
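
The gap between linear and ensemble fits can be illustrated on synthetic data. This sketch uses scikit-learn's `make_friedman1` generator, a standard nonlinear regression benchmark, as a stand-in for the behavioral features and `LIMIT_BAL`. It will not reproduce the project's R² values, and the sample size, noise level and seeds are assumptions. Scores are computed on a held-out split because the in-sample R² of fully grown tree ensembles can be close to 1 regardless of how well they generalize.

```python
from sklearn.datasets import make_friedman1
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic nonlinear target standing in for credit limit (not the UCI data).
X, y = make_friedman1(n_samples=2000, n_features=10, noise=1.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

models = {
    "Linear Regression": LinearRegression(),
    "Extra Trees Regressor": ExtraTreesRegressor(n_estimators=200, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # Held-out R²: comparable across models, unlike training-set R².
    scores[name] = r2_score(y_test, model.predict(X_test))
    print(f"{name}: held-out R² = {scores[name]:.3f}")
```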

Classification Findings

The strongest operational value of the project

Best model

Gradient Boosting Classifier emerged as the best-performing classifier in the comparison workflow, balancing discrimination ability with manageable false positives.

What drove performance

The model benefited from combining recent payment status variables with engineered financial behavior features such as payment power and utilization rate.

Why it matters

This part of the project demonstrates real potential for credit-risk scoring, preemptive intervention, and threshold-based decision strategies in banking.

Metric | Value | What it means
--- | --- | ---
AUC | ~0.81 | Good ability to distinguish defaulters from non-defaulters.
Accuracy | ~0.72 | Moderate overall correctness across both classes.
Precision | ~0.79 | Positive flags are fairly reliable, which limits unnecessary interventions.
Recall | ~0.61 | The model captures roughly 60% of actual defaulters, leaving meaningful false-negative risk.
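
The sketch below is a minimal version of the classification workflow and shows how each metric in the table is computed. It assumes scikit-learn and synthetic data with roughly 22% positives, which is about the default share in the UCI data. The model uses default `GradientBoostingClassifier` hyperparameters, and every setting here is illustrative. AUC is computed from predicted probabilities and does not depend on a cutoff. Accuracy, precision and recall all depend on the 0.5 threshold used here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for the default-prediction task.
X, y = make_classification(
    n_samples=5000, n_features=12, n_informative=6,
    weights=[0.78, 0.22], flip_y=0.05, random_state=42,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

clf = GradientBoostingClassifier(random_state=42)
clf.fit(X_train, y_train)

proba = clf.predict_proba(X_test)[:, 1]  # estimated probability of default
pred = (proba >= 0.5).astype(int)        # hard flags at a 0.5 cutoff

metrics = {
    "AUC": roc_auc_score(y_test, proba),  # threshold-free ranking quality
    "Accuracy": accuracy_score(y_test, pred),
    "Precision": precision_score(y_test, pred),
    "Recall": recall_score(y_test, pred),
}
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```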

Confusion matrix message

  • Most non-defaulters are correctly identified.
  • A substantial number of actual defaulters are also captured.
  • The most costly error remains the false negative: a customer who will default but is not flagged.
  • False positives do occur, but they are materially fewer than missed-risk cases (false negatives).
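
Once the share of actual defaulters in the evaluation set is fixed, accuracy, precision and recall constrain one another. The hypothetical helper below backs out expected confusion-matrix counts per 1,000 customers from precision, recall and an assumed prevalence. With a 50/50 evaluation set, precision 0.79 and recall 0.61 imply an accuracy of about 0.72, with roughly 81 false positives against 195 false negatives. That matches both the reported accuracy and the pattern described above. At a 22% default share, the same precision and recall would imply an accuracy of about 0.88. This suggests, but does not confirm, that the reported metrics were computed on a rebalanced test set.

```python
def confusion_from_rates(precision: float, recall: float, prevalence: float, n: int = 1000) -> dict:
    """Expected confusion-matrix counts for n customers, given precision, recall,
    and the assumed share of actual defaulters (prevalence)."""
    positives = prevalence * n
    tp = recall * positives                 # defaulters correctly flagged
    fn = positives - tp                     # defaulters missed
    fp = tp * (1 - precision) / precision   # from precision = tp / (tp + fp)
    tn = (n - positives) - fp               # non-defaulters correctly cleared
    return {"TP": tp, "FN": fn, "FP": fp, "TN": tn, "accuracy": (tp + tn) / n}


for prevalence in (0.5, 0.22):
    counts = confusion_from_rates(precision=0.79, recall=0.61, prevalence=prevalence)
    print(prevalence, {k: round(v, 2) for k, v in counts.items()})
```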

Executive interpretation

This is a usable risk model, not a perfect one. It is well suited for probability-based screening and threshold tuning, especially in settings where the business wants to trade off missed defaulters against customer friction.
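
Threshold tuning can be framed as choosing the cutoff that minimizes total misclassification cost. The sketch below is minimal and uses illustrative costs in which a missed defaulter is five times as costly as an unnecessary flag. The function `pick_threshold`, the cost ratio and the threshold grid are all hypothetical. Real costs would come from the institution's own loss and intervention estimates.

```python
import numpy as np


def pick_threshold(y_true, proba, cost_fn=5.0, cost_fp=1.0, grid=None):
    """Return (threshold, cost) minimising cost_fn * FN + cost_fp * FP.

    cost_fn: assumed cost of missing a customer who defaults.
    cost_fp: assumed cost of flagging a customer who would have paid.
    """
    y_true = np.asarray(y_true)
    proba = np.asarray(proba)
    grid = np.linspace(0.05, 0.95, 19) if grid is None else np.asarray(grid)
    costs = []
    for t in grid:
        flagged = proba >= t
        fn = np.sum(~flagged & (y_true == 1))  # missed defaulters
        fp = np.sum(flagged & (y_true == 0))   # unnecessary flags
        costs.append(cost_fn * fn + cost_fp * fp)
    best = int(np.argmin(costs))  # first (lowest) cutoff among ties
    return float(grid[best]), float(costs[best])


# Toy scores: four non-defaulters, two defaulters.
y_true = [0, 0, 0, 0, 1, 1]
proba = [0.12, 0.22, 0.33, 0.62, 0.43, 0.81]
threshold, cost = pick_threshold(y_true, proba)
print(threshold, cost)
```

In this toy case, the cost-minimizing cutoff is about 0.35, with a total cost of 1. The default 0.5 cutoff misses one defaulter and costs 6. Lowering the threshold trades extra customer friction for fewer missed defaulters.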

Segmentation Findings

Where clustering adds business value

K-Means outcome

The elbow analysis and PCA projection support a 3-cluster structure. These clusters map well to interpretable behavior-based profiles: low-risk, medium-risk, and higher-volatility customer groups.

  • Cluster 0: low utilization, low payment stress, stable borrowers
  • Cluster 1: moderate utilization and mixed risk patterns
  • Cluster 2: high capacity, high activity, higher payment stress
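
The elbow-plus-PCA workflow can be sketched with scikit-learn on synthetic data. The four generated features stand in for scaled behavioral variables such as utilization and average bill. The data, seeds and the range of `k` are assumptions. Features are standardized first because K-Means relies on Euclidean distance, so unscaled monetary amounts would otherwise dominate ratios like utilization.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for four behavioural features (not the UCI data).
X, _ = make_blobs(n_samples=1500, n_features=4, centers=3, random_state=7)
X_scaled = StandardScaler().fit_transform(X)

# Elbow analysis: within-cluster sum of squares (inertia) for k = 1..6.
inertias = [
    KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_scaled).inertia_
    for k in range(1, 7)
]

# Final 3-cluster model and a 2-D PCA projection for plotting.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_scaled)
X_2d = PCA(n_components=2).fit_transform(X_scaled)

# Per-cluster feature means in original units: the basis for profile labels
# such as "low utilization, stable borrowers".
profiles = np.vstack([X[labels == c].mean(axis=0) for c in range(3)])

for k, inertia in zip(range(1, 7), inertias):
    print(k, round(inertia, 1))
print(profiles.round(2))
```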

DBSCAN diagnostic

DBSCAN showed that the dataset does not contain sharply separated density-based clusters. Instead, customer behavior follows broad, continuous gradients with overlapping regions and a meaningful amount of noise.

  • Two dominant DBSCAN clusters cover most observations
  • ~17% of observations behave like noise or irregular transition cases
  • Small micro-clusters reflect niche financial behaviors rather than stable market segments
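
DBSCAN's noise share can be read directly from its labels, because points that fall in no dense region receive the label -1. The sketch below runs on synthetic data made of two overlapping blobs plus uniform background points. The `eps` and `min_samples` values are illustrative. On real data, these are often chosen by inspecting k-nearest-neighbor distances.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Two overlapping groups plus scattered "transition" points (synthetic).
core, _ = make_blobs(n_samples=1200, n_features=2, centers=2, cluster_std=1.2, random_state=3)
background = rng.uniform(core.min(axis=0), core.max(axis=0), size=(250, 2))
X = StandardScaler().fit_transform(np.vstack([core, background]))

labels = DBSCAN(eps=0.15, min_samples=10).fit_predict(X)

# Label -1 marks noise; it is not counted as a cluster.
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
noise_share = float(np.mean(labels == -1))
print(f"clusters: {n_clusters}, noise share: {noise_share:.1%}")
```

</test>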

Consulting interpretation: DBSCAN depends on dense regions separated by sparse gaps, which this customer base largely lacks because behavior varies continuously. K-Means assigns every customer to a profile, which makes it the more useful segmentation tool here and the better foundation for credit strategy and monitoring.

Business Implications

Where these results could be applied

Risk scoring

Recent payment behavior and utilization can strengthen early-warning models for delinquency and credit-risk monitoring.

Credit line strategy

Behavioral clusters can help distinguish clients who may qualify for limit increases from those who require tighter risk controls.

Targeted action

Segmentation supports differentiated communication, loyalty incentives, refinancing offers, and intervention strategies by customer profile.

Limitations & Next Steps

What would make the system stronger

Current limitations

  • The dataset lacks external variables such as income, employment, and bureau-based credit history.
  • Behavior-only models can be operationally useful, but they are not sufficient on their own for production-grade lending decisions.
  • Classification still leaves a meaningful false-negative cost that would matter in real financial settings.

Next analytical steps

  • Add richer socioeconomic and credit-history features.
  • Tune decision thresholds according to business risk tolerance.
  • Extend explainability for executive use and governance.
  • Test deployment use cases such as early-warning dashboards or risk tiering workflows.
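
For the explainability step, one model-agnostic option is permutation importance. It measures how much a held-out score drops when a single feature is shuffled. The sketch below uses scikit-learn's `permutation_importance` on synthetic data. The choice of ROC AUC as the score and all data settings are assumptions. On the project data, the columns would be the engineered behavioral features rather than the generic `f0` to `f7`.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# shuffle=False keeps the 3 informative columns first (f0-f2); f3-f7 are noise.
X, y = make_classification(
    n_samples=3000, n_features=8, n_informative=3, n_redundant=0,
    shuffle=False, random_state=1,
)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

model = GradientBoostingClassifier(random_state=1).fit(X_train, y_train)

# Mean drop in held-out ROC AUC when each feature is randomly permuted.
result = permutation_importance(
    model, X_test, y_test, scoring="roc_auc", n_repeats=10, random_state=1
)

names = [f"f{i}" for i in range(X.shape[1])]
ranking = sorted(zip(names, result.importances_mean), key=lambda item: item[1], reverse=True)
for name, score in ranking:
    print(f"{name}: {score:.4f}")
```

Because permutation importance is computed on held-out data and works with any fitted model, it suits governance reviews in which the scoring model may later change.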