Tomislav Suhina

Projects

A selection of what I've built and delivered, described at a high level. The specifics of client engagements are kept in confidence, so these focus on the problem, the approach and the outcome rather than naming names.

Featured

Bayesian Media Mix Modelling

Technical lead of a 15+ person interdisciplinary team (data scientists, data and MLOps engineers, front end) delivering a Bayesian marketing-mix model for budget allocation and ROI optimisation. It captured value in the tens of millions in its first year and is still used for strategic marketing decisions today.

Stack: PyMC, Bayesian modelling, causal inference, marketing analytics

Agentic Management-Reporting System

An autonomous agent that lets managers interrogate live project status: surfacing blockers and at-risk workstreams, and flagging team members who are over- or under-utilised. The hard part was trust. I instrumented it with MLflow execution tracing to see exactly how it retrieves context and reasons, and built an LLM-as-a-judge evaluation harness to test its answers against baselines before relying on them. It is designed around small, granular, verifiable tasks so its behaviour stays reliable in production.

Stack: LangGraph / PydanticAI, RAG, MLflow Tracing, LLM-as-a-judge

Data-Science Excellence Standard & Maturity Assessment

Authored the methodology an organisation uses to score how mature a data product is and to benchmark projects against each other on a common scale. The assessment is structured and unambiguous, spanning problem scoping (is the goal clear, which KPIs are we moving, what change counts as success), engineering maturity (version control, CI/CD, dev/test/acceptance/production hygiene, monitoring, drift detection, data-quality reporting, MLOps automation), modelling rigour (baselines, reconstruction of the KPIs a project claims to move) and stakeholder management. It turns "how good is this, really?" into something measurable and repeatable, and lets very different projects be compared fairly.

Focus: standards, benchmarking, data-science governance

GenAI & Agents

Production RAG for Decision Support

A retrieval system that gives front-line decision-makers fast, current answers drawn from a large and complex body of internal policy and external regulatory guidance, built to speed up a high-volume approval workflow. The hard parts were document parsing and the retrieval quality that depends on it. Working with the business, I assembled a validated "golden set" of question-answer pairs and ran it through an LLM-as-a-judge harness, which let the team keep improving the system against a fixed, trusted benchmark. I delivered the MVP that became the basis for the full product launch after my assignment ended.

Stack: LangChain, ChromaDB / FAISS, document parsing, LLM-as-a-judge, custom evals
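
To make the golden-set idea concrete, here is a minimal sketch of such a harness, not the production code. Every name in it (GoldenExample, token_overlap_judge, evaluate, the 0.7 pass threshold) is hypothetical, and the judge is a simple token-overlap stand-in so the example runs offline; a real harness would put an LLM call behind the same function signature.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence


@dataclass(frozen=True)
class GoldenExample:
    """One validated question with its business-approved reference answer."""
    question: str
    reference_answer: str


# A judge maps (question, reference, candidate) to a score in [0, 1].
Judge = Callable[[str, str, str], float]


def token_overlap_judge(question: str, reference: str, candidate: str) -> float:
    """Stand-in judge: share of reference tokens that appear in the candidate.

    Assumption for illustration only; an LLM-as-a-judge would replace this body.
    """
    ref_tokens = set(reference.lower().split())
    cand_tokens = set(candidate.lower().split())
    if not ref_tokens:
        return 0.0
    return len(ref_tokens & cand_tokens) / len(ref_tokens)


def evaluate(
    system: Callable[[str], str],
    golden_set: Sequence[GoldenExample],
    judge: Judge,
    pass_threshold: float = 0.7,  # hypothetical cut-off
) -> Dict[str, float]:
    """Run every golden question through the system and aggregate judge scores."""
    if not golden_set:
        raise ValueError("golden_set must not be empty")
    scores = [
        judge(ex.question, ex.reference_answer, system(ex.question))
        for ex in golden_set
    ]
    passed = sum(score >= pass_threshold for score in scores)
    return {
        "mean_score": sum(scores) / len(scores),
        "pass_rate": passed / len(scores),
        "n": float(len(scores)),
    }
```

Because the golden set stays fixed, two versions of the retrieval pipeline can be passed to evaluate in turn and compared on the same numbers.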

Predictive & Causal ML

Causal & Classification Models for Industrial Equipment Monitoring

Two models working together on a production line. A classifier automatically assigns each production-loss event to a standard taxonomy (breakdown, minor stop, slow-down, changeover and so on), removing manual tagging by operators. It also standardises how losses are reported across sites that previously labelled the same events differently, an inconsistency that had undermined the credibility of reporting. A causal model then identifies which machine is the true root cause of a given loss, for example tracing a detected minor stop back to the upstream machine actually responsible for it. Together they make production-loss reporting both automatic and trustworthy. Both were deployed to production with drift detection.

Stack: Scikit-learn, XGBoost, causal inference, Evidently, MLflow
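
As an illustration of what drift detection checks, the sketch below computes a population stability index (PSI) between a reference sample of a feature and a current one. This is a swapped-in, standard-library stand-in and not Evidently's API or the deployed monitoring; the function name, the ten equal-width bins and the epsilon floor are all assumptions.

```python
import math
from typing import List, Sequence


def population_stability_index(
    reference: Sequence[float],
    current: Sequence[float],
    n_bins: int = 10,   # assumed bin count
    eps: float = 1e-6,  # floor so empty bins do not produce log(0)
) -> float:
    """PSI over equal-width bins defined by the reference sample's range."""
    if not reference or not current:
        raise ValueError("both samples must be non-empty")
    lo, hi = min(reference), max(reference)
    if hi == lo:
        raise ValueError("reference sample has no spread to bin")
    width = (hi - lo) / n_bins

    def bin_shares(values: Sequence[float]) -> List[float]:
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            # Values outside the reference range fall into the edge bins.
            idx = min(max(idx, 0), n_bins - 1)
            counts[idx] += 1
        return [max(c / len(values), eps) for c in counts]

    ref_shares = bin_shares(reference)
    cur_shares = bin_shares(current)
    return sum(
        (cur - ref) * math.log(cur / ref)
        for ref, cur in zip(ref_shares, cur_shares)
    )
```

A PSI of zero means the binned distributions match; a commonly quoted rule of thumb treats values above roughly 0.25 as a shift worth investigating, though the right threshold depends on the feature.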

Customer Retention Modelling with Experimentation

Designed the model and the A/B testing methodology end to end, to flag customers at risk of leaving or of not paying so the business could intervene proactively. The pilot showed a statistically significant uplift versus control; I estimated the monetary value of that uplift and set the whole thing up as a reproducible pipeline that ran live throughout the pilot, rather than as a one-off analysis.

Stack: XGBoost, A/B testing, Databricks, MLflow
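
For readers curious how an uplift versus control can be checked and priced, here is a minimal sketch using a two-proportion z-test. It is one common way to do this and not necessarily the methodology used in the pilot; the function names, and the idea of valuing uplift as extra retained customers times a per-customer value, are assumptions for illustration.

```python
import math
from typing import Tuple


def two_proportion_z_test(
    retained_control: int,
    n_control: int,
    retained_treated: int,
    n_treated: int,
) -> Tuple[float, float]:
    """Return (z, two-sided p-value) for H0: both groups share one retention rate."""
    rate_control = retained_control / n_control
    rate_treated = retained_treated / n_treated
    pooled = (retained_control + retained_treated) / (n_control + n_treated)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_control + 1 / n_treated))
    if std_err == 0:
        raise ValueError("pooled rate is 0 or 1; the test is undefined")
    z = (rate_treated - rate_control) / std_err
    # Two-sided tail probability of a standard normal, via the error function.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


def uplift_value(
    rate_control: float,
    rate_treated: float,
    customers_targeted: int,
    value_per_retained_customer: float,  # hypothetical business input
) -> float:
    """Extra customers retained because of the intervention, times their value."""
    return (rate_treated - rate_control) * customers_targeted * value_per_retained_customer
```

A positive z with a p-value below the chosen significance level supports a real uplift; the monetary estimate is only as good as the per-customer value fed into it.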

Financial KPI Forecasting & Time-Series Standards

Built a financial KPI forecasting system end to end with full ownership, and established the time-series practices that became the organisation's standard approach. The system lets a central team deep-dive into the forecasts of individual subsidiaries, compare them against the model's, and use the gap to drive a better planning dialogue across the group. It is still in use today for global financial planning.

Stack: Prophet, PyMC, time-series modelling, statistical forecasting

Optimisation & Decision Science

Spare-Parts Allocation & Master-Data Quality

An optimisation system that helps sites share spare parts and position them at strategic locations for rapid transport in an emergency, cutting the amount of expensive spare stock that has to be held globally. I later returned to extend it with entity deduplication and master-data quality work: cleaning and matching records so that even more parts could be reliably shared and pooled, pushing global stock down further.

Stack: optimisation, RecordLinkage / Dedupe, PySpark, master data management

Transportation Optimiser (Advisory)

Brought in to shape a transportation-optimiser initiative. I defined and protected the scope, produced documentation that all stakeholders signed off on, and explained the limitations, possibilities and trade-offs the business had to weigh to deliver within its budget and resources. The value here was as much about discipline as algorithms: without a clear, agreed scope the project risked drifting into endless modification and dying when the funding ran out.

Focus: problem framing, scoping, stakeholder alignment, delivery strategy

Exploration

Alongside the work above I've explored a wide range of applied ML: real-time anomaly detection for manufacturing equipment, automated data-quality validation, a product-design recommender, warehouse bin allocation, delivery-route optimisation and churn prediction. Some became tools that are still in use; all of it sharpened a sense of what actually survives contact with production.