About Ballpark

Better Baseball Decisions Without Replacing Baseball Experience

Data over impressions

Player value is measured by contribution to winning

Decisions rely less on eye-catching individual moments and more on what makes runs and wins more likely.

Avoid outs

Players who get on base create more chances to score

Outs are baseball's limited offensive resource. Fewer wasted outs mean more scoring opportunities.

Use existing strengths better

Do not look for stars; make the current team more effective

The same players can win more games when roles and lineup decisions are based on actual performance contribution.

CONTEXT DECIDES

Slot 1 or Slot 8 – same player, completely different value

Batting first means about 20% more plate appearances than batting last. Ballpark shows you which player creates real runs in which lineup spot, and why the wrong lineup costs you runs before you ever notice.

NUMBERS DON'T LIE, AVERAGES DO

Two players, same batting average – completely different problems

Miller hits .292 but loses 24% of his offensive value in context. Carter has zero extra-base hits – no matter where you put him, he won't move runners. Ballpark shows you not just who is better, but what each player actually needs to become more valuable.

SKILL OR LUCK

3 for 5 – does that already mean anything?

A player who goes 3 for 5 looks good. But five at-bats against the same pitcher in the same game situation tell you almost nothing. Ballpark's reliability score measures not just how many at-bats there were, but how varied they were, because a small sample against different pitchers can tell you more than a large one that all looks the same.

Why Statistics Help Coaches

Coaches and managers understand the game, the players, timing, pressure and game situations better than any number. Statistics do not replace this judgment. They add a second layer.

The goal is not to tell coaches how to coach. The goal is to make visible what is difficult to separate by observation alone: which play really improved the situation, which player created value, and how reliable the observed performance actually is.

Why the Same Play Does Not Always Have the Same Value

In baseball, a play does not always have the same value. A walk, a bunt, an out or a stolen-base attempt can have a very different effect depending on the runners on base, the number of outs and the current game situation.

What matters is not only what happened, but when it happened. This is exactly where Run Values help.

Run Values show, similar to professional baseball analysis, how much an event changed the expected runs in its specific base-out situation. The decisive question is whether the play increased the expected runs or reduced them.

With around 250 plays per game, it is difficult to separate this cleanly by impression alone. Who really contributed? Who improved the situation? Who did not? Without measurable context, these answers often remain imprecise.

Baseball knowledge stays with the coach. The statistic provides the measurable context. Together, they make the evaluation of plays, players and decisions more precise.
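The Run Value idea described above can be sketched in a few lines of Python. This is a minimal illustration, not Ballpark's implementation: the run-expectancy numbers, the way base-out states are written, and the function name run_value are all assumptions made for the example.

```python
# Hypothetical run-expectancy table: (runners, outs) -> expected runs for the
# rest of the inning. "1--" means a runner on first only. Values are illustrative.
RUN_EXPECTANCY = {
    ("---", 0): 0.50, ("---", 1): 0.27, ("---", 2): 0.10,
    ("1--", 0): 0.88, ("1--", 1): 0.52, ("1--", 2): 0.22,
    ("-2-", 0): 1.12, ("-2-", 1): 0.68, ("-2-", 2): 0.32,
}


def run_value(before, after, runs_scored=0):
    """Change in expected runs caused by one play, plus any runs scored on it.

    `after` is None when the play ends the inning (expected runs drop to 0).
    """
    expected_after = 0.0 if after is None else RUN_EXPECTANCY[after]
    return expected_after - RUN_EXPECTANCY[before] + runs_scored


# The same event, a walk with the bases empty, in two different situations.
walk_leading_off = run_value(("---", 0), ("1--", 0))
walk_two_outs = run_value(("---", 2), ("1--", 2))

# A sacrifice bunt that moves the runner to second but costs an out.
sac_bunt = run_value(("1--", 0), ("-2-", 1))

print(round(walk_leading_off, 2), round(walk_two_outs, 2), round(sac_bunt, 2))
```

With these illustrative numbers, the leadoff walk is worth about three times as much as the two-out walk, and the bunt lowers the expected runs even though it "worked". The real values depend on the league and level of play the table is built from.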

From Run Values to wOBA

wOBA is a weighted offensive performance measure per plate appearance. Weighted means that not every event counts the same.

A home run has more value than a single, a single has more value than a walk, and an out affects the game differently than a successful hit.

wOBA evaluates how valuable the individual offensive events of a player were, such as walks, singles, doubles, triples or home runs. The event weights are derived from the league as a whole, and the result describes the player's own offensive performance, isolated from what his teammates did.

wOBA is built from Run Values.
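As a rough sketch of how such a weighted measure works, the following Python example follows the general shape of the wOBA formula published by FanGraphs. The weights shown are illustrative assumptions (real weights are re-derived from Run Values for each league and season), and the function name and the two sample hitters are hypothetical.

```python
# Illustrative event weights. Real wOBA weights are recalculated from
# Run Values for each league and season, so treat these as placeholders.
WEIGHTS = {"ubb": 0.69, "hbp": 0.72, "1b": 0.89, "2b": 1.27, "3b": 1.62, "hr": 2.10}


def woba(ab, ubb, hbp, singles, doubles, triples, hr, sf=0):
    """Weighted value of offensive events per plate appearance.

    `ubb` counts unintentional walks only; intentional walks are left out
    of both the numerator and the denominator.
    """
    denominator = ab + ubb + sf + hbp
    if denominator == 0:
        return 0.0
    numerator = (
        WEIGHTS["ubb"] * ubb
        + WEIGHTS["hbp"] * hbp
        + WEIGHTS["1b"] * singles
        + WEIGHTS["2b"] * doubles
        + WEIGHTS["3b"] * triples
        + WEIGHTS["hr"] * hr
    )
    return numerator / denominator


# Two hypothetical hitters, both batting .300 (30 hits in 100 at-bats).
singles_only = woba(ab=100, ubb=5, hbp=0, singles=30, doubles=0, triples=0, hr=0)
with_power = woba(ab=100, ubb=5, hbp=0, singles=18, doubles=8, triples=0, hr=4)

print(round(singles_only, 3), round(with_power, 3))
```

Both hitters have the same batting average, but the second one has the clearly higher wOBA because his hits include doubles and home runs.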

Where These Metrics Come From

wOBA, in the form used today, goes back to Tom Tango: according to FanGraphs, he created Weighted On-Base Average and used it in The Book.

Run Values come from the Linear Weights tradition. Early foundations were developed by F. C. Lane and George Lindsey. Pete Palmer later developed this into a systematic sabermetric evaluation system. According to SABR, his Linear Weights were a core part of the analysis in The Hidden Game of Baseball, which he published with John Thorn in 1984.

wRC+ puts a player's league-based, team-isolated offensive performance in relation to league average. Runs Created originally goes back to Bill James. wRC and wRC+ are modern, wOBA-based developments that became especially known through FanGraphs. FanGraphs describes wRC+ as a wOBA-based version of OPS+, adjusted for league and park context.

What a Statistic Can and Cannot Tell You

It must remain clear what a statistic can actually say. First, it only describes what happened.

A player may have many hits in a tournament. That is the observed result, but it is not yet proof of his true ability.

The next step is estimation: How reliable is this number? A small number of plate appearances is not automatically worthless. What matters is how these plate appearances came about.

A result based on fewer but different game situations can be more reliable than many repetitions against the same pitcher, the same defense and the same game context.

Only after that comes prediction: How likely is it that the player will repeat this performance? The raw observed value is not enough for that. You need to know how stable the underlying data is.

Context is essential. Was the player truly superior? Or did he benefit from favorable matchups, weaker pitchers, a specific role or only a few pressure situations?

Without this context, a number can quickly become a wrong explanation.

Measuring Data Reliability

This is where the Data Reliability Score (QDC) comes in. QDC is based on Quantity, Diversity and Consistency.

It does not evaluate the performance itself. It evaluates how reliable the data basis behind that performance is.

QDC checks three points: Is there a sufficient number of plate appearances? Did the performance occur against different pitchers, in different games and on different days? Does the player value remain stable across repeated samples?

This makes visible whether a result is already reliable or whether it still has to be treated carefully.

A larger number of plate appearances helps, but it is not enough by itself. If a player appears 100 times against the same pitcher and the same defense, the quantity is high, but the context is narrow. QDC makes exactly this difference visible.
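To make the three checks tangible, here is a toy score in Python. It is explicitly not the QDC formula used by Ballpark, which is not given here. The target of 100 plate appearances, the entropy-based diversity measure, the tolerance of .100 and the geometric mean are all assumptions chosen only to show how quantity, diversity and consistency could interact.

```python
from collections import Counter
from math import log
from statistics import pstdev


def quantity(n_pa, target_pa=100):
    """Share of an assumed target number of plate appearances, capped at 1."""
    return min(1.0, n_pa / target_pa)


def diversity(pitcher_ids):
    """Entropy of the pitchers faced, scaled so that 0 means one pitcher only
    and 1 means every plate appearance came against a different pitcher."""
    counts = Counter(pitcher_ids)
    n = len(pitcher_ids)
    if len(counts) < 2:
        return 0.0
    entropy = -sum(c / n * log(c / n) for c in counts.values())
    return entropy / log(n)


def consistency(sample_values, tolerance=0.100):
    """1 when repeated sub-samples agree exactly, falling to 0 as their
    spread approaches the assumed tolerance."""
    if len(sample_values) < 2:
        return 0.0
    return max(0.0, 1.0 - pstdev(sample_values) / tolerance)


def toy_reliability(pitcher_ids, sample_values, target_pa=100):
    """Geometric mean of the three components: one weak part drags the score down."""
    q = quantity(len(pitcher_ids), target_pa)
    d = diversity(pitcher_ids)
    c = consistency(sample_values)
    return (q * d * c) ** (1 / 3)


# 100 plate appearances, all against the same pitcher: high quantity, no diversity.
narrow_score = toy_reliability(["P1"] * 100, [0.400, 0.400])

# 40 plate appearances spread evenly over 20 pitchers, with stable sub-samples.
varied_score = toy_reliability([f"P{i % 20}" for i in range(40)], [0.350, 0.370, 0.360])

print(round(narrow_score, 2), round(varied_score, 2))
```

In this toy version the smaller but more varied sample scores higher than the larger, narrow one, which is the distinction the paragraph above describes.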

Better Evaluation, Not Automatic Answers

The most common mistake is treating observed performance immediately as true ability or as a safe prediction. This is how overrating, underrating and wrong conclusions happen.

Statistics do not replace baseball judgment. They help separate observation, estimation, prediction and context. QDC additionally shows how reliable the basis of that evaluation is.

A batting order is a continuous loop in which the earlier spots carry more weight – weak hitters cannot be hidden, only moved further back. The best position therefore depends not only on the frequency of plate appearances, but on which player types fit which situations and where their strengths produce the highest return.

The traditional spots create typical situations: some spots come to bat more often with bases empty, others more often with runners on base. Spots with bases empty favor OBP-strong hitters, because a walk and a single are equal in value there. Spots with more frequent runners-on-base situations favor power hitters, because their home run produces the most runs there.

A lineup is therefore an interconnected system. Every batter influences the situations and opportunities of the batters that follow: a walk changes the state for the next batter, and this new state changes which events are valuable. The best hitter therefore does not fit everywhere – but the right type fits the right spot. The optimal batting order results from the interplay of sequence, game situation and downstream effect – not from tradition or a simple ranking.
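The loop structure of the batting order can be shown with a short calculation. The sketch below assumes a nine-player lineup and a hypothetical three-game series; the function name and the plate-appearance totals are made up for illustration. It only covers how often each spot comes to bat, not which player type fits which spot.

```python
def plate_appearances_by_slot(team_pa, slots=9):
    """Split a team's plate appearances in one game over the batting order.

    The order cycles 1, 2, ..., 9, 1, 2, ... so every slot gets the number of
    full turns through the lineup, and the first `extra` slots get one more.
    """
    full_turns, extra = divmod(team_pa, slots)
    return [full_turns + (1 if slot < extra else 0) for slot in range(slots)]


# Hypothetical series: the team came to bat 38, 41 and 36 times.
games = [38, 41, 36]
per_game = [plate_appearances_by_slot(team_pa) for team_pa in games]

# Sum each lineup slot across the three games.
totals = [sum(slot_counts) for slot_counts in zip(*per_game)]

print(totals)
```

A later spot never gets more plate appearances than an earlier one, and the gap widens with every game that does not end exactly at the bottom of the order.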

What the AI Output Looks Like

PlayerNo Joe Doe — offensive evaluation (all metrics in context)

PlayerNo Joe Doe’s profile is built on hit-driven production with real swing-and-miss cost. The AVG (.292) is slightly higher than the league AVG (.287), and the wOBA (.385) is also higher than league wOBA (.376), which indicates that when contact happens it has produced reasonable overall batting value rather than empty singles-only output.

However, that contact value is counterweighted by two strong suppressors of run creation:

Strikeouts are a defining feature: K% (.333) is far higher than the league K% (.182). That level of swing-and-miss reduces the number of balls put in play and limits the offense’s ability to string events together, which helps explain why a decent AVG/wOBA pairing doesn’t translate into strong overall scoring impact.
On-base conversion beyond hits is limited: despite the solid AVG, OBP (.333) is well below league OBP (.398). The small separation between OBP and AVG here means his on-base game is not being meaningfully supplemented by additional non-hit reach, so the batting line relies heavily on hits to generate baserunners—and the high K rate makes those hit opportunities less frequent.

Those constraints show up in the outcome-level production: wRC+ (31) reflects very low run creation relative to what the contact-value metrics might suggest, and wOBA+ (90.9) positions the overall batting value as below typical league context even with a slightly above-league raw wOBA—an internal tension that aligns with a hitter whose events are not arriving in a volume/shape that converts efficiently into runs (high Ks + depressed OBP).

Reliability Score: 0.4288 — the directional story (good hit value, heavy K drag, low OBP limiting conversion) is moderately supported, but not at the “structurally reliable” threshold (≥ 0.5).