Data & Analysis built-by-bobby.com


Five Years Under the Bar

Every set since 2021 - two countries, two apps, pounds and kilos, a lot of messy spreadsheets - cleaned up and plotted.

I've logged nearly every session since July 2021 - first a Google Form in Mexico, then a dedicated logging app in Europe. That's 4,751 working sets across 397 training days, in a file that was frankly a mess. So this page is two things at once: how my training actually went, and a case study in cleaning real-world data.

The data was a disaster (that's the interesting part)

Hand-typed logs produce wonderful chaos. Lifts appeared under 212 different names: Bench, Flat Barbell Bench Press and Paused Bench are all the same movement. Weights ran in pounds in Mexico and kilos in Europe, plus free-text noise: level 7, bodyweight, 7,5, 35/17.5, even a stray 8:51 where a rest timer landed in the weight column.
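To make that concrete, here is a minimal sketch of a weight parser for cells like the ones above. It is an illustration, not the pipeline's actual code: the function name, the `default_unit` parameter and the choice to return `None` for anything ambiguous are all assumptions. It also assumes a comma is always a decimal separator, which would be wrong for a value like 1,000.

```python
import re

LB_TO_KG = 0.45359237  # exact definition of the international pound


def parse_weight_kg(raw: str, default_unit: str = "kg"):
    """Return the weight in kg, or None when the cell is not a plain weight.

    Hypothetical helper: default_unit stands in for "which country was I in".
    """
    # Assumption: a comma is a decimal separator ("7,5" -> "7.5").
    text = raw.strip().lower().replace(",", ".")
    match = re.fullmatch(r"(\d+(?:\.\d+)?)\s*(kg|lbs?)?", text)
    if match is None:
        # "level 7", "bodyweight", "35/17.5", "8:51": flag for manual review.
        return None
    value = float(match.group(1))
    unit = match.group(2) or default_unit
    return value * LB_TO_KG if unit.startswith("lb") else value


for cell in ["7,5", "100 lb", "level 7", "bodyweight", "35/17.5", "8:51"]:
    print(repr(cell), parse_weight_kg(cell))
```

Returning `None` rather than guessing keeps the odd cells visible, so each one can be resolved by hand and the decision recorded.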

The fix: a hand-built, auditable mapping collapsed those 212 spellings into 87 real movements (0 unmatched), every weight was normalised to kilograms, and the estimated-1RM formula was capped at 12 reps - otherwise a mistyped 90 kg × 60 set claimed a fictional 270 kg squat.
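A rough sketch of those two steps, under stated assumptions. The canonical name "Bench Press" is made up for illustration (only the three aliases come from the text), and "capped at 12 reps" is read here as "skip sets above 12 reps"; clamping the rep count at 12 would be another reasonable reading.

```python
# Hypothetical canonical name; the three aliases are the ones quoted above.
ALIASES = {
    "bench": "Bench Press",
    "flat barbell bench press": "Bench Press",
    "paused bench": "Bench Press",
}

MAX_REPS = 12  # the cap quoted in the text


def canonical_name(raw: str):
    """Look up a hand-typed name; None means unmatched, so extend the map."""
    return ALIASES.get(" ".join(raw.lower().split()))


def epley_1rm(weight_kg: float, reps: int):
    """Epley estimate w * (1 + reps / 30); None for sets outside 1..MAX_REPS."""
    if reps < 1 or reps > MAX_REPS:
        return None  # assumption: over-cap sets are skipped, not clamped
    if reps == 1:
        return weight_kg  # common convention: a single is its own 1RM
    return weight_kg * (1 + reps / 30)


# The mistyped set from the text: uncapped Epley gives 90 * (1 + 60/30) = 270.
uncapped = 90 * (1 + 60 / 30)
print(uncapped, epley_1rm(90, 60))
```

Counting the names for which `canonical_name` returns `None` is one way to arrive at an "unmatched" figure like the one quoted.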

Consistency: every day I trained

One filled square per training day. The empty stretches are the story: moving countries, changing gyms, the spells where life got in the way.

Getting stronger: estimated 1RM over time

Estimated one-rep max (Epley) for the main barbell lifts, taking each month's best set so the trend reads cleanly through session-to-session noise. Lifts move at very different scales, so toggle to % change from each lift's start to compare rates fairly.
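The monthly-best and % change views can be sketched in a few lines. This is an illustration with made-up numbers, not the page's real data or code; the function names and the tuple layout are assumptions.

```python
from datetime import date


def monthly_best(sets):
    """sets: iterable of (date, est_1rm_kg). Returns {(year, month): best}."""
    best = {}
    for day, est in sets:
        key = (day.year, day.month)
        best[key] = max(est, best.get(key, est))
    return dict(sorted(best.items()))


def pct_change_from_start(series):
    """Rebase a monthly series so its first month reads 0 %."""
    start = next(iter(series.values()))
    return {month: (value / start - 1) * 100 for month, value in series.items()}


# Made-up sample values, for illustration only.
sample = [
    (date(2022, 1, 3), 100.0),
    (date(2022, 1, 20), 104.0),
    (date(2022, 2, 5), 110.0),
]
best = monthly_best(sample)
print(best)
print(pct_change_from_start(best))
```

Rebasing each lift to its own starting month is what lets a small lift such as overhead press share an axis with a large one such as squat.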

Where the work went: volume by muscle group

Monthly tonnage (weight × reps, summed). Pick the muscle groups you want to compare.
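As a sketch, tonnage is a grouped sum of weight × reps. The row layout and the muscle-group labels below are assumptions, and the numbers are made up.

```python
from collections import defaultdict


def monthly_tonnage(rows):
    """rows: iterable of (month, muscle_group, weight_kg, reps)."""
    totals = defaultdict(float)
    for month, group, weight_kg, reps in rows:
        totals[(month, group)] += weight_kg * reps
    return dict(totals)


# Made-up rows, for illustration only.
rows = [
    ("2023-01", "chest", 80, 5),
    ("2023-01", "chest", 80, 5),
    ("2023-01", "back", 60, 10),
]
tonnage = monthly_tonnage(rows)
print(tonnage)
```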

Bodyweight

Personal records

Two numbers, kept separate on purpose. Tested 1RM is a weight I actually lifted for a single. Est. 1RM is a model's guess from a higher-rep set - useful, but not a real lift (which model to trust is just below).

Some basic modelling

The charts so far just plot the data. These five ask small quantitative questions of it: rate of progress, which 1RM formula to trust, dumbbells vs barbell, whether my lifts are balanced, and how strong I am for my bodyweight.

1 · How fast did I get stronger?

A least-squares line through the best estimated 1RM each month: the slope is my rate of progress in kg/month, and R² shows how linear that progress was. Bench was the steadiest climber; squat and overhead press are flatter and noisier (varied rep ranges, long plateaus).
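For reference, an ordinary least-squares fit and its R² need nothing beyond the standard library. The sketch below assumes months are numbered 0, 1, 2, ... and uses made-up values; it is not the page's own code.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x, plus R squared."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1 - ss_res / ss_tot  # undefined for a perfectly flat series
    return slope, intercept, r_squared


# Made-up series: month index vs best estimated 1RM in kg.
months = [0, 1, 2, 3]
best_e1rm = [100.0, 102.0, 104.0, 106.0]
slope, intercept, r_squared = fit_line(months, best_e1rm)
print(slope, intercept, r_squared)
```

A slope in kg/month only makes sense if x is measured in months, so gaps in training should leave gaps in the month index rather than being squeezed out.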

2 · Which 1RM formula should I trust?

I have real tested singles, so I can back-test the estimators instead of trusting one blindly: predict each known 1RM from a heavy multi-rep set near the same date, then measure the error.
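A back-test of this kind can be sketched as a mean absolute error per formula. The text does not say which estimators were compared, so pairing Epley with Brzycki here is an assumption, and the sample pairs are made up.

```python
def epley(weight_kg, reps):
    return weight_kg * (1 + reps / 30)


def brzycki(weight_kg, reps):
    return weight_kg * 36 / (37 - reps)


def mean_abs_error(formula, pairs):
    """pairs: (weight_kg, reps, tested_1rm_kg), set taken near the tested single."""
    errors = [abs(formula(w, r) - tested) for w, r, tested in pairs]
    return sum(errors) / len(errors)


# Made-up pairs, for illustration only.
pairs = [(100.0, 5, 115.0), (80.0, 8, 100.0)]
for name, formula in [("Epley", epley), ("Brzycki", brzycki)]:
    print(name, round(mean_abs_error(formula, pairs), 2))
```

With only a handful of tested singles the ranking can flip on one data point, so the error figures are best read as a rough guide.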

3 · Dumbbells vs barbell

Barbell bench against the combined load of both dumbbells (2× the per-hand weight), as estimated 1RMs month by month. The barbell wins by a fairly steady margin: the stability cost of pressing two independent weights.

4 · Is my strength balanced?

Each main lift as a ratio to my bench press, against a typical "balanced" target. Bars short of their marker are lagging lifts.

5 · Relative strength

Strength only means so much without bodyweight. Each lift is a multiple of bodyweight (using the scale reading nearest the lift), placed on the usual Beginner→Elite scale. Bodyweight history is patchy, so this covers only lifts with a weigh-in nearby: mostly the recent, heaviest ones.
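Matching a lift to the nearest weigh-in is a sorted lookup. In this sketch the 30-day cut-off (`max_gap_days`) is a hypothetical threshold, not the one the page uses, and the weigh-ins are made up.

```python
from bisect import bisect_left
from datetime import date


def nearest_weigh_in(weigh_ins, lift_day, max_gap_days=30):
    """weigh_ins: list of (date, kg) sorted by date. Returns kg or None."""
    days = [day for day, _ in weigh_ins]
    i = bisect_left(days, lift_day)
    # Only the neighbours on either side of the insertion point can be nearest.
    candidates = weigh_ins[max(i - 1, 0): i + 1]
    if not candidates:
        return None
    day, kg = min(candidates, key=lambda c: abs((c[0] - lift_day).days))
    return kg if abs((day - lift_day).days) <= max_gap_days else None


# Made-up weigh-ins, for illustration only.
weigh_ins = [(date(2024, 1, 1), 80.0), (date(2024, 3, 1), 82.0)]
bodyweight = nearest_weigh_in(weigh_ins, date(2024, 1, 10))
if bodyweight is not None:
    print(120.0 / bodyweight)  # a 120 kg lift as a multiple of bodyweight
```

Returning `None` when no weigh-in is close enough is what restricts the chart to lifts with a reading nearby.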

Consistency - did showing up pay off?

The economics of my training

Borrowing from economics: a Lorenz curve and Gini coefficient of how my working sets spread across muscle groups. In income terms, Gini 0 = perfect equality, 1 = one earner takes everything. For training it measures imbalance - though here, unlike income, perfectly equal isn't the goal (calves don't need a chest's volume). A lens on how lopsided my training economy is.
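For a list of set counts per muscle group, the Gini coefficient can be computed from the sorted values. This is a minimal sketch using the standard rank-weighted formula, not necessarily the exact variant behind the chart.

```python
def gini(counts):
    """Gini coefficient of non-negative values; 0 means perfectly equal."""
    values = sorted(counts)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0
    # G = 2 * sum(rank * x) / (n * sum(x)) - (n + 1) / n, ranks starting at 1
    weighted = sum(rank * value for rank, value in enumerate(values, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


print(gini([10, 10, 10, 10]))  # every group gets the same number of sets
print(gini([0, 0, 0, 40]))     # one group takes everything
```

With a finite number of groups the maximum is (n - 1) / n rather than exactly 1, so four groups top out at 0.75.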

What I train - and what I neglect

Each muscle group's working sets by month: when I started, stopped, and how steadily I trained it. The heatmap counts each set once, under its primary mover. Borrowing from text analysis, I also measured the average gap between sessions for each group (its "inter-arrival time"): low means clockwork, high means long droughts. For that gap I credit a muscle whenever a lift recruits it, primary or secondary, so squats and RDLs count toward glutes and hamstrings, not just the occasional hip thrust. Tag glutes by primary mover only and the data claims I hadn't trained them in over a year, which is a tagging artefact, not the truth.
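The tagging effect is easy to reproduce on a toy log. In this sketch the log layout, the muscle tags and the dates are all made up; the point is only how the mean gap changes once secondary movers count.

```python
from datetime import date


def days_for_muscle(log, muscle, include_secondary=True):
    """log: iterable of (date, primary_muscle, secondary_muscles)."""
    return [
        day
        for day, primary, secondary in log
        if primary == muscle or (include_secondary and muscle in secondary)
    ]


def mean_gap_days(session_days):
    """Average number of days between consecutive distinct training days."""
    days = sorted(set(session_days))
    if len(days) < 2:
        return None
    gaps = [(later - earlier).days for earlier, later in zip(days, days[1:])]
    return sum(gaps) / len(gaps)


# Made-up log: hip thrusts twice, squats and RDLs in between.
log = [
    (date(2024, 1, 1), "glutes", ()),
    (date(2024, 1, 8), "quads", ("glutes",)),
    (date(2024, 1, 15), "hamstrings", ("glutes",)),
    (date(2024, 3, 1), "glutes", ()),
]
print(mean_gap_days(days_for_muscle(log, "glutes", include_secondary=False)))
print(mean_gap_days(days_for_muscle(log, "glutes")))
```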

Full numbers

Where this sits

I deliberately didn't benchmark these against competition databases like OpenPowerlifting. Comparing a casual lifter to people who compete is the Strava-vs-Olympian trap: it tells you nothing useful and flatters no one. The honest peer group is other people who quantify their own training.

What this one adds is messy-data honesty: five years, two countries, two apps, pounds and kilos, 200-plus exercise spellings, lost bodyweight months, cleaned up without pretending the gaps aren't there.

A caveat: cable machines lie

I changed gyms often around Europe, and cable/machine loads depend entirely on the hardware: pulley ratios and stack weights vary wildly. So a sudden jump on a cable exercise usually means a new gym, not new strength. Those movements are tagged venue_sensitive in the pipeline and kept out of the strength and PR charts above.

How this works. I log my sessions, export a CSV, and upload it from my phone. A Polars pipeline reconciles it against the full history, and this page reads the result, so it updates itself as I train. My own data, end to end.

I've since built my own logger to capture all this: Workout Logger, a local-first, installable app that's fast to log into and exports the same CSV this page reads. The front end of the same story.