Sample Projects

Procurement data science & AI, in practice.

A look under the hood at the kinds of analytics and AI solutions my team and I build — the problems, the models, and the tools that turn procurement data into savings, scale, and intelligence.

Representative projects illustrating methodology and technical approach. Details are generalized to protect proprietary data and results.

Project 01 · Machine Learning

Spend Classification Engine

Supervised classification · Random Forest
Challenge

Millions of transactions arrive each year with inconsistent, free-text descriptions and missing category codes. Manual tagging is slow, subjective, and breaks down at scale — leaving the spend taxonomy too noisy for reliable sourcing analytics.

Approach

Trained a Random Forest classifier on labeled historical spend, combining TF-IDF features from supplier names and line-item text with categorical signals (GL account, cost center, supplier). Class imbalance handled with balanced class weights; performance validated with stratified k-fold cross-validation and a held-out test set.
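For intuition on the balanced class weights mentioned above, here is a small dependency-free sketch of the heuristic scikit-learn documents for `class_weight="balanced"`: each class is weighted by n_samples / (n_classes * class_count), so rare categories count for more during training. The category labels are made up for illustration.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}

# Hypothetical, heavily imbalanced category labels
labels = ["MRO"] * 8 + ["IT Hardware"] * 2
weights = balanced_class_weights(labels)
print(weights)  # {'MRO': 0.625, 'IT Hardware': 2.5}
```

Here the rare category ends up with four times the weight of the common one, which keeps the majority class from dominating the trees.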

Outcome

High-confidence predictions auto-classify the majority of spend straight to the taxonomy, with low-confidence cases routed for human review — turning weeks of manual tagging into a near-real-time, analytics-ready feed.
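The routing step described above can be sketched in plain Python. This is a minimal illustration, not the production logic: the function name, the 0.8 threshold, and the sample probabilities are all assumptions. Random Forest probabilities are averaged tree votes and are not guaranteed to be well calibrated, so a real cut-off would need to be chosen empirically.

```python
def route_predictions(probabilities, classes, threshold=0.8):
    """Split rows into auto-classified and human-review buckets.

    probabilities: one list of per-class probabilities per row
    classes: class labels, in the same order as the probability columns
    threshold: hypothetical cut-off; tune it against review capacity
    """
    auto, review = [], []
    for row_id, probs in enumerate(probabilities):
        best = max(range(len(probs)), key=probs.__getitem__)
        record = (row_id, classes[best], probs[best])
        (auto if probs[best] >= threshold else review).append(record)
    return auto, review

classes = ["IT Hardware", "MRO", "Travel"]
probabilities = [
    [0.05, 0.90, 0.05],  # confident -> auto-classify as MRO
    [0.40, 0.35, 0.25],  # ambiguous -> human review
    [0.10, 0.05, 0.85],  # confident -> auto-classify as Travel
]
auto, review = route_predictions(probabilities, classes)
print(len(auto), "auto-classified,", len(review), "sent to review")
```

Raising the threshold trades automation rate for accuracy, so it is usually set against how much review capacity is available.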

Tech stack

Python · scikit-learn · Random Forest · TF-IDF · pandas · Databricks
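# Spend → category classifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split

# split first so the TF-IDF vocabulary is learned from training rows only
# "category" = label column (name assumed for illustration)
train, test = train_test_split(
  df, test_size=0.2, stratify=df["category"], random_state=42)
tfidf = TfidfVectorizer()
X_train = tfidf.fit_transform(train["line_text"])
X_test = tfidf.transform(test["line_text"])
clf = RandomForestClassifier(
  n_estimators=400, class_weight="balanced")
clf.fit(X_train, train["category"])
# route low-confidence rows to review
conf = clf.predict_proba(X_test).max(axis=1)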

Consistent, scalable spend taxonomy powering sourcing & savings analytics.

Project 02 · Time-Series Forecasting

Category Spend Forecasting

Forecasting · Meta Prophet
Challenge

Finance and category teams need a reliable forward view of spend for budgeting and savings planning — but category spend carries trend, seasonality, and fiscal-calendar effects that simple run-rate extrapolation misses.

Approach

Built forecasts with Meta's Prophet, decomposing each category into trend, yearly and quarterly seasonality, and fiscal-period effects. Backtested against naïve and moving-average baselines (and benchmarked vs. SARIMA), with uncertainty intervals to support best/expected/worst-case scenario planning.
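To make the baseline comparison concrete, here is a minimal, dependency-free sketch of a one-step-ahead rolling-origin backtest for the naive and moving-average baselines. The function names, the 3-month window, the 6-month minimum training length, and the spend series are illustrative assumptions rather than the actual setup.

```python
def naive_forecast(history):
    """Repeat the last observed value."""
    return history[-1]

def moving_average_forecast(history, window=3):
    """Average of the most recent `window` observations."""
    recent = history[-window:]
    return sum(recent) / len(recent)

def rolling_backtest(series, forecaster, min_train=6):
    """One-step-ahead rolling-origin backtest; returns mean absolute error."""
    errors = []
    for t in range(min_train, len(series)):
        prediction = forecaster(series[:t])  # only data before t is visible
        errors.append(abs(series[t] - prediction))
    return sum(errors) / len(errors)

# Made-up monthly spend (in $k) with a steady upward trend, for illustration only
spend = [100, 104, 108, 112, 116, 120, 124, 128, 132, 136, 140, 144]

mae_naive = rolling_backtest(spend, naive_forecast)
mae_ma = rolling_backtest(spend, moving_average_forecast)
print(f"naive MAE: {mae_naive:.1f}, moving-average MAE: {mae_ma:.1f}")
# naive MAE: 4.0, moving-average MAE: 8.0
```

Both baselines lag a trending series, which is the gap a trend-aware model is meant to close. A candidate model can be wrapped as a `forecaster` and run through the same harness, so it is judged on the same folds as the baselines.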

Outcome

Sharper budget forecasts and earlier visibility into spend inflection points — giving sourcing teams a head start on savings actions and Finance more credible numbers to plan against.

Tech stack

Python · Prophet · pandas · statsmodels · Power BI
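# Category spend forecast
from prophet import Prophet

# spend_df: one row per month, columns "ds" (date) and "y" (spend)
m = Prophet(
  yearly_seasonality=True,
  seasonality_mode="multiplicative")
m.add_seasonality(name="quarter",
  period=91.25, fourier_order=5)
m.fit(spend_df)
# 12 months ahead; "MS" assumes month-start dates in ds
future = m.make_future_dataframe(periods=12,
  freq="MS")
fcst = m.predict(future)
# yhat_lower / yhat_upper hold the uncertainty interval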

Forward-looking spend forecasts for budgeting and savings capture.

Project 03 · Unsupervised ML

Spend Anomaly & Leakage Detection

Outlier detection · Isolation Forest
Challenge

Maverick spend, off-contract purchases, duplicate invoices, and price outliers hide inside millions of transactions. Rules-based checks catch the obvious cases and miss the rest — letting value leak quietly out of the P&L.

Approach

Applied an Isolation Forest to engineered transaction features — amount, purchase frequency, supplier concentration, and unit-price variance vs. category norms — to score every record for anomaly likelihood. The highest-scoring transactions feed a prioritized review queue for sourcing and compliance.
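As one plausible reading of "unit-price variance vs. category norms", the sketch below computes each transaction's relative deviation from its category's median unit price. It uses only the standard library; the field names, the choice of median as the norm, and the sample rows are assumptions for illustration. It also assumes unit prices are positive, so the median is never zero.

```python
from collections import defaultdict
from statistics import median

def unit_price_deviation(transactions):
    """Add each row's relative deviation from its category's median unit price."""
    prices_by_category = defaultdict(list)
    for tx in transactions:
        prices_by_category[tx["category"]].append(tx["unit_price"])
    medians = {cat: median(prices) for cat, prices in prices_by_category.items()}

    enriched = []
    for tx in transactions:
        norm = medians[tx["category"]]
        deviation = (tx["unit_price"] - norm) / norm
        enriched.append({**tx, "price_dev": deviation})
    return enriched

# Made-up rows for illustration
transactions = [
    {"id": 1, "category": "Fasteners", "unit_price": 2.0},
    {"id": 2, "category": "Fasteners", "unit_price": 2.0},
    {"id": 3, "category": "Fasteners", "unit_price": 5.0},
    {"id": 4, "category": "Laptops", "unit_price": 1000.0},
    {"id": 5, "category": "Laptops", "unit_price": 1100.0},
]
enriched_rows = unit_price_deviation(transactions)
for row in enriched_rows:
    print(row["id"], round(row["price_dev"], 3))
```

Normalizing within the category keeps a $5 fastener and a $1,100 laptop on a comparable scale, so the model sees "150% above the norm" rather than raw dollar amounts.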

Outcome

Surfaced compliance gaps and savings leakage that rules-based monitoring missed, focusing scarce analyst and audit time on the transactions most likely to matter.

Tech stack

Python · scikit-learn · Isolation Forest · SQL · pandas
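# Flag anomalous transactions
from sklearn.ensemble import IsolationForest

iso = IsolationForest(
  n_estimators=300,
  contamination=0.02, random_state=42)
# fit_predict returns labels, not scores: -1 = anomaly, 1 = normal
df["label"] = iso.fit_predict(features)
# score_samples: lower = more anomalous, so negate for ranking
df["score"] = -iso.score_samples(features)
# anomalies → review queue, most anomalous first
review = df[df["label"] == -1].sort_values(
  "score", ascending=False)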

Prioritized review queue that targets real leakage and compliance risk.

Project 04 · Generative AI

Procurement Intelligence Assistant

LLM · Retrieval-augmented generation · AI-assisted build
Challenge

Stakeholders across Finance, Procurement, and the business need answers from spend data fast — but most can't write SQL and shouldn't have to queue behind the analytics team for every question.

Approach

Built a conversational assistant powered by Claude that turns natural-language questions into governed insight over the spend cube using retrieval-augmented generation (RAG). The application itself — data connectors, query layer, and web UI — was developed rapidly with AI-assisted coding (Claude Code), with answers grounded in trusted data and validated before release.
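The retrieve-then-prompt pattern behind RAG can be sketched without any external services. This is a toy stand-in, not the assistant's actual implementation: it ranks documents by bag-of-words cosine similarity where a production system would typically use embedding-based vector search. The `retrieve` and `prompt` helpers, the document snippets, and the figures in them are all hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(question, documents, k=2):
    """Return the k documents most similar to the question."""
    q = Counter(tokenize(question))
    ranked = sorted(
        documents,
        key=lambda d: cosine(q, Counter(tokenize(d))),
        reverse=True,
    )
    return ranked[:k]

def prompt(question, context):
    """Assemble a prompt that confines the model to the retrieved context."""
    bullet_list = "\n".join(f"- {c}" for c in context)
    return f"Context:\n{bullet_list}\n\nQuestion: {question}"

# Made-up spend summaries standing in for the indexed data
documents = [
    "FY24 travel spend was 1.2M across 40 suppliers",
    "FY24 IT hardware spend was 3.4M, led by laptops",
    "Contract compliance for MRO spend reached 82 percent",
]
question = "How much did we spend on travel in FY24?"
context = retrieve(question, documents, k=1)
print(prompt(question, context))
```

Only the retrieved snippets reach the model, which is what keeps answers grounded in governed data instead of the model's general knowledge.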

Outcome

Self-serve procurement intelligence: faster decisions for stakeholders and freed analyst capacity for higher-value work — a concrete step from a reporting function toward an intelligence and automation office.

Tech stack

Claude · Claude Code · RAG · Python · Vector search · Web app
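# NL question → grounded answer
import anthropic

claude = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY
# retrieve() and prompt() are app-level helpers (not shown)
ctx = retrieve(question, spend_index)
answer = claude.messages.create(
  model="claude-opus-4-8",
  max_tokens=1024,  # required by the Messages API
  system="Answer only from context.",
  messages=[{"role":"user",
    "content": prompt(question, ctx)}])
text = answer.content[0].text
# validate before surfacing to users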

Conversational, governed access to spend insight — no SQL required.

Let's talk shop.

Want to compare notes on procurement analytics, AI strategy, or any of these approaches? I'd love to connect.