Hâtvalues

Not All Skew Is Suspicious - How to Avoid Mistaking Signal for Outliers

May 02, 2025

You’re exploring your data. You see a long tail or a cluster of extreme values. Your instinct? Flag them as outliers. Trim the noise. Clean the dataset. But here’s the thing: not all skewed data is dirty. Skew can carry meaningful structure about the real-world process you’re modeling.

Optimize Like a Pro With LSD

Apr 10, 2025

Start-ups often need to move faster than traditional A/B testing best practices allow. Typically, A/B tests need a couple of weeks to gather enough data, sometimes more. When multiple improvements are ready to ship, waiting to test them one at a time can mean lost momentum or missed opportunities. Enter the Latin Square Design (LSD), a brilliant example of working smarter instead of harder. Using LSD removes significant sources of noise from your estimate of the treatment effect, which means:

To A/B or not to A/B

Feb 16, 2025

This is the story of a project that began as a straightforward A/B test but quickly revealed more than expected, offering fresh insights and expanding the scope of analysis. It’s been a while since I worked as an independent data and analytics consultant. I went freelance after many years in data systems, BI, and MIS at a large multinational education company. During that time, I led projects using applied statistics, data mining, and algorithmic forecasting, and discovered a real passion for data science. But the chance to deepen those skills long-term wasn’t there, so I made the leap into freelancing, motivated by clarity about my goals and a desire for more hands-on, impactful work.

Tallinn Ride Hailing App

Jan 27, 2025

I acquired a fun little data set for a well-known ride-hailing app recently. I performed a pretty detailed analysis at the request of my source, including some clustering of the ride start locations. The idea was to help drivers plan ahead to get into position before times of peak demand. There’s no NDA and this data is no longer very fresh, so I thought it would be nice to show the results in an interactive Tableau viz.

Analysing SaaS Trial to Subscriber Conversions - Part 3 - Time Dependent Variables

Nov 13, 2023

You can read the Series Introduction here. In the previous post, we saw how survival curves can be created for different strata, or factors/categories of the independent variables, giving us a way to determine whether there are significant differences in the median survival time (or another quantile) between groups. The groups are fixed throughout the trial period. In a clinical or randomized controlled trial, these would be set as part of the experimental protocol. In this observational setting, these are often customer segments, such as industry vertical, and other attributes that may not be under our control.