His thirst for battle, and his desire to pulverize his opponents, fuel so much of what this team does. There is this "How dare you enter the field of battle with us? Who do you think you are?" attitude. Remember the fit the media had when those Marines whizzed on dead Taliban? That is what Lynch's crotch grab was in football terms. The NFL fines him, and the football paparazzi let the league collect by asking him questions he has already made clear he won't answer. They aren't worthy of his answers and should respectfully step aside and go talk to his teammates. But they don't, and so, when he does speak, it's to fellow warriors turned journalists whom he respects and who he knows understand. When I watch this guy NOT take the easy step out of bounds in favor of staying in and hitting a few more football Orcs... well, let's just say I watched The Hobbit this morning and thought often of Marshawn and the men he fights with and for. May he always remain a Seahawk. We cannot spare this man. He fights.

Since the 1960's, the idea of the "Elo Rating System" has dominated tournament chess. It has percolated from there into soccer, into college football, into video games, and so on.

It's a rating system that is both (1) mathematically as strong as a python and (2) intuitively simple. The average player has a rating of around 1400 ("class C"), within a range of about 800-2400. If he sits down on Friday night across from a player rated 1601 ("class B"), he has about a 25% chance: 3-to-1 odds against.

The same odds also set how many "rating points" the players have at stake. If the class C player wins, his rating goes up 12, to 1412; if the class B player wins, his rating goes up only 4, to 1605.
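Those two numbers fall out of the standard logistic Elo formulas. Here is a minimal Python sketch, assuming the usual 400-point logistic expectation curve and a K-factor of 16; the post doesn't state K, but 16 is the value that reproduces the +12/+4 split. The function names are illustrative.

```python
def expected_score(rating: float, opponent: float) -> float:
    """Logistic Elo expectation: the share of a point `rating` should score vs `opponent`."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


def rating_change(rating: float, opponent: float, score: float, k: float = 16.0) -> float:
    """Points gained (negative = lost) after one game; score is 1 win, 0.5 draw, 0 loss."""
    return k * (score - expected_score(rating, opponent))


class_c, class_b = 1400, 1601
print(round(expected_score(class_c, class_b), 2))  # underdog's chances (about 1 in 4)
print(round(rating_change(class_c, class_b, 1.0)))  # class C player wins
print(round(rating_change(class_b, class_c, 1.0)))  # class B player wins
```

The update is zero-sum: whatever the winner gains, the loser gives up, which is why the underdog risks only about 4 points to win about 12.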

Each "class" interval is about 200 points and, *as it turns out, corresponds neatly with the idea of a "standard deviation."* And, interestingly, by the time you get to 2 SDs (a 400-point rating difference), victory for the lower player becomes a long shot: the standard formula gives him an expected score of only about 9%.
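Elo's original model is usually described as treating each player's single-game performance as normally distributed with a standard deviation of one class (200 points); most federations later switched to a logistic curve. A small Python comparison, assuming those two textbook forms, shows why 200 points reads as "one SD" and what a 400-point gap actually implies:

```python
import math


def normal_expectation(diff: float, sigma: float = 200.0) -> float:
    """Underdog's expected score if each player's per-game performance is normal
    with SD `sigma`; the difference of two performances then has SD sigma*sqrt(2)."""
    z = diff / (sigma * math.sqrt(2.0))
    # Phi(-z): chance the weaker player's performance beats the stronger one's
    return 0.5 * (1.0 + math.erf(-z / math.sqrt(2.0)))


def logistic_expectation(diff: float) -> float:
    """Underdog's expected score under the 400-point logistic curve."""
    return 1.0 / (1.0 + 10 ** (diff / 400.0))


for gap in (0, 200, 400):
    print(gap, round(normal_expectation(gap), 3), round(logistic_expectation(gap), 3))
```

Both curves put the 200-point underdog at about 24%, and the 400-point underdog at roughly 8% (normal) to 9% (logistic).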

(Christopher Langan, with an IQ between 195 and 210, once said that at a 2 SD difference in IQ, people begin to have trouble exchanging information. Although Langan himself enjoys being hard to understand...

Now imagine the communication between two beings with (say) a 10 SD difference. Or imagine chess games between computers rated 3300 and human chessmasters rated 2300. We literally do not understand what computers are doing when winning certain endgames, like Rook and Bishop against Rook. *Nobody* understands the computers' play, not with a year's study. Which is a scary thought if you've ever seen The Matrix.)

This system was so simple and powerful that it exceeded Arpad Elo's wildest dreams. Elo had said, "rating systems will be accepted according to the players' confidence in their intuitive accuracy." How right he was. When a 1500 player now sits down across from a 1700, he has a near-perfect idea of where that 1700 player is ... in tactical combinations, in positional play, in Rook endings, etc.

"1700" describes a chessplayer better than his name does.

**Seattle Seahawks**

Nate Silver has an article out, musing about how sky-high the Seahawks' Elo rating is. Exec Sum:

- The Seahawks' 35-6 crushing of Arizona put them #1 again ...
- (... similar to how the Spurs' crushing of the Heat would make them "scariest" team)
- The Seahawks' current 1755 rating -- different scale than chess -- has them #21 all time ...
- Adjacent to the Montana 49ers, Aikman Cowboys, and Brady Patriots
- If the Seahawks win the Super Bowl, they'll be #3 in history, behind two Brady teams
- The Seahawks' record against high-rated teams is unbelievable
- Russell Wilson has never, in 52 games, been "out of a game" late in the 4th quarter

There y'go Rick. That's the HBDI way of capturing the Seahawks' roll. ;- )

**The World's #1 Starting Pitcher**

Bill James has his own "Elo system" for rating Opening Day aces. It's actually an improvement in one way: it subtly penalizes inactivity. (In 1977, Fischer hadn't played in five years, but his rating was mathematically the same as in 1972.)
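The post doesn't say how James's penalty works, so what follows is only a hypothetical illustration of the idea: each idle season pulls the rating part of the way back toward a baseline, instead of leaving it frozen the way plain Elo does. The baseline and the 10% rate are made-up parameters, not James's.

```python
def decay_for_inactivity(rating: float, idle_seasons: int,
                         baseline: float, rate: float = 0.10) -> float:
    """HYPOTHETICAL inactivity penalty (not Bill James's formula): each idle
    season moves the rating `rate` of the way back toward `baseline`."""
    for _ in range(idle_seasons):
        rating -= rate * (rating - baseline)
    return rating


# Plain Elo leaves a hypothetical 2785-rated player untouched after five idle years;
# this sketch shrinks the rating toward a chosen (hypothetical) baseline instead.
frozen = 2785.0
print(decay_for_inactivity(frozen, 0, baseline=1500.0))
print(round(decay_for_inactivity(frozen, 5, baseline=1500.0), 1))
```

A multiplicative pull like this never drops a rating below the baseline, and it bites hardest on ratings far above it.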

At the end of the 2014 season, here was the list of top Aces:

**World's #1 Starting Pitcher
September 28, 2014**

- Clayton Kershaw - 626.01
- Felix Hernandez - 578.42
- Max Scherzer - 564.43
- Chris Sale - 562.54
- David Price - 560.85
- Johnny Cueto - 550.86
- Cole Hamels - 550.77
- Jon Lester - 550.58
- Adam Wainwright - 546.99
- Madison Bumgarner - 545.31

**Hisashi Iwakuma** or a **Corey Kluber**. It doesn't matter that Kluber had 7.2 WAR to Felix's 6.1 last year. Kluber does not go into 2015 with the same standing or "fear factor." Hisashi Iwakuma is beautiful to watch, a sight to behold, when he's healthy. It doesn't *exactly* make him Cole Hamels.

**incentive** to pitch well going forward; he got it as a **natural consequence** of what he did.

*There is a life lesson here.* No Central Agency of Fairness exists in the real world; somebody notify the NEA of this, please. Mariners blogs aren't bigger or smaller based on what the NEA thinks will build our self-esteem.

**Cole Hamels** or **David Price**, I'm in. I do not know if the radar gun agrees, and I don't much care.

**Felix Hernandez** is a "supergrandmaster," a guy who (in these terms) is a full standard deviation, maybe two SDs, ahead of some other Opening Day starters. Here's SABRMatt's idea of "supermarginal" players. Chessplayers understand the concept of a "supergrandmaster." One of them is worth three lesser stars. One Lionel Messi is worth any number of Danny Welbecks.

**Kyle Seager's** rating is Master Class and, though most people don't realize it, his rating is still rising nicely.

**Justin Ruggiano's** "rating" would be bottom of the barrel, whatever his platoon splits.

**Logan Morrison's** would be rather low, but climbing in an eyebrow-raising way, like that of a talented 13-year-old chessplayer.

**Brad Miller's** would be similar.

## Comments

...I have long considered developing an Elo rating system that measures player performance in baseball by matchup-to-matchup results. Would there be interest here in seeing that? And in following it?

but I suspect it would be quite the undertaking. I remember seeing some strength-of-schedule stuff for hitters once, a few years ago, but it was pretty simplistic.

Basically, I would compute a seasonal Elo and a career Elo. For the seasonal Elo, everyone starts at 1000 and goes up or down as the plate appearances take place; the career Elo is calibrated with very old play-by-play data from, say, 1946, and then calculated going forward for the rest of the play-by-play era.
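A rough Python sketch of that two-track bookkeeping. The K-factor of 16, the class and method names, and starting career ratings at 1000 are hypothetical choices; the comment only specifies that seasonal ratings start at 1000. Calibrating the career track on old play-by-play would mean feeding those older matchups through `record` first.

```python
from collections import defaultdict

K = 16.0        # hypothetical K-factor
START = 1000.0  # seasonal ratings start here, per the comment


def expected(a: float, b: float) -> float:
    """Logistic Elo expectation for a rating `a` facing a rating `b`."""
    return 1.0 / (1.0 + 10 ** ((b - a) / 400.0))


class EloTracker:
    """Keeps a seasonal rating (reset every season) and a career rating (never reset)."""

    def __init__(self) -> None:
        self.career = defaultdict(lambda: START)
        self.seasonal = defaultdict(lambda: START)

    def new_season(self) -> None:
        self.seasonal.clear()  # everyone starts the new season back at START

    def record(self, batter: str, pitcher: str, batter_score: float) -> None:
        """batter_score is in [0, 1]; the pitcher is credited 1 - batter_score."""
        for table in (self.career, self.seasonal):
            delta = K * (batter_score - expected(table[batter], table[pitcher]))
            table[batter] += delta
            table[pitcher] -= delta  # zero-sum: the pitcher gives up what the batter gains
```

For example, `EloTracker().record("hitter", "ace", 1.0)` moves two 1000-rated players to 1008 and 992 on both tracks; after `new_season()` only the career numbers remember it.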

But the math wouldn't be so hard.

If I read you right, you are talking about a "matchup-adjusted" rating that, e.g., for hitters factors in the quality of pitchers faced (and vice versa). I have been wanting to see something like that for a while now. Quantify who is getting it done vs. the top opponents (i.e., playoff conditions) vs. feasting on scrubs. I think it would fill in some blind spots in the one-size-fits-all WAR-type stats.

So...yes please!

For me that would be one of the Prime Attractions of a SABR Shtick subdomain. Those would be the first two things I thought of: Inside Scoop and SABR Elo ratings.

Not sure how you'd apply them to things other than Starting Pitchers and Teams, though.

Hey Benihana. What's my next step on the premium space station? :- )

Rather than using SP game results, I'd be doing it PA by PA. Take the average run value of a PA, find the run value of each PA outcome, calculate a single-event W% for each outcome, and call that the win-score part of the Elo.
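A minimal sketch of the single-event W% step, assuming a Pythagorean-style conversion with exponent 2 (the form usually associated with James's offensive winning percentage). The function name and the exponent default are assumptions; the .35 average and 1.4 homer values come from the thread.

```python
def event_win_pct(event_run_value: float, avg_pa_run_value: float,
                  exponent: float = 2.0) -> float:
    """Pythagorean-style W% for one PA outcome, measured against the league-average PA.
    Assumes non-negative run values; negative ones would need shifting first."""
    if event_run_value < 0 or avg_pa_run_value <= 0:
        raise ValueError("run values must be non-negative (average PA strictly positive)")
    num = event_run_value ** exponent
    return num / (num + avg_pa_run_value ** exponent)


avg_pa = 0.35  # average AL PA run value used in the thread
print(event_win_pct(avg_pa, avg_pa))           # an exactly average PA
print(round(event_win_pct(1.4, avg_pa), 3))    # a home run
```

With exponent 2 an average PA lands at exactly .500 and the homer at roughly .94, in the neighborhood of the ".900" ballpark used later in the thread; a different exponent would move it somewhat.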

Would this be factoring in the quality of the opponent? I was envisioning something that basically replicates a team "strength of schedule" type formula but for each player at the PA/individual opponent level (aggregating PA by PA results). So a HR vs Felix is worth way more to a hitter's rating than a HR vs whoever. Not sure if that is what you were describing.

What Elo ratings do is figure out how good each player is from the sequence of matches. If two players are far apart in their current Elo ratings when they face each other, the much worse player is rewarded a lot more for a "win" against the much better player than the better player would be for beating him. Let's say, for example, that a 1200-Elo batter is facing Felix, whose Elo is, say, 2000. If the average run value of a PA in the AL is .35 and the batter homers (run value of 1.4), the single-event W% is around .900, so the batter is given .9 of a win in that matchup. And because Felix is so much better, making this result an upset win, the number of points the batter takes from Felix is larger than if he had been facing a pitcher with an Elo similar to his own.
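The upset arithmetic can be checked with the ordinary Elo update, which accepts fractional scores like .9 as readily as 1 or 0. A sketch in Python, assuming the standard logistic curve and a hypothetical K-factor of 16:

```python
def expected_score(rating: float, opponent: float) -> float:
    """Logistic Elo expectation for `rating` facing `opponent`."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


def points_taken(batter: float, pitcher: float, batter_score: float, k: float = 16.0) -> float:
    """Rating points the batter takes from the pitcher (K = 16 is an assumption)."""
    return k * (batter_score - expected_score(batter, pitcher))


upset = points_taken(1200, 2000, 0.9)    # 1200 batter homers off a 2000 "Felix"
routine = points_taken(1200, 1200, 0.9)  # same homer off an equally rated pitcher
print(round(upset, 1), round(routine, 1))
```

Against the 2000-rated pitcher the batter was expected to score about .01, so his .9 earns roughly 14 points; against an equal pitcher the expectation is .5 and the same homer earns 6.4.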

When last I considered live-tracking strength of schedule, I ran into the problem of needing daily play-by-play data as a season unfolded. I can, fairly readily, calculate Elo ratings for players in all past completed seasons, but I have never been able to figure out how to get play-by-play data into a data frame for each day's games during a current season. So to track Elo ratings live, I will need to find some way to do that.

If a player is a standard deviation worse, he faces 3:1 odds, both in reality and in the Elo points the players exchange. No doubt the math would be fairly cumbersome as applied to individual pitcher-batter matchups, but now that you put it in these terms, I see what you're getting at with Elo ratings for batters and pitchers. Good stuff.

It never occurred to me to use the Elo system in "handicap" contests. By definition, in chess that 3:1 odds situation applies to a simple, fair, 50-50 game. Now that you introduce the idea of a 70% game in favor of the pitcher (similar to Pawn odds in chess), I have no idea how you would adapt the system.
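One common way Elo systems handle a built-in edge (the way football Elo models treat home field) is to add a fixed rating bonus to the favored side before computing the expectation. This is an alternative technique, not the approach proposed in this thread; the sketch assumes the 70% pitcher edge mentioned above and uses hypothetical function names.

```python
import math


def edge_offset(p_favorite: float) -> float:
    """Rating bonus that makes the logistic Elo curve give the favorite an
    expected score of p_favorite when the two ratings are otherwise equal."""
    return 400.0 * math.log10(p_favorite / (1.0 - p_favorite))


def batter_expectation(batter: float, pitcher: float, pitcher_bonus: float) -> float:
    """Batter's expected score once the pitcher's structural edge is added as a bonus."""
    return 1.0 / (1.0 + 10 ** ((pitcher + pitcher_bonus - batter) / 400.0))


bonus = edge_offset(0.70)  # the 70% edge in favor of the pitcher
print(round(bonus))
print(round(batter_expectation(1500, 1500, bonus), 3))
```

A 70% edge works out to a bonus of roughly 147 points, after which equal ratings again mean "expected" rather than "even," and the usual update applies unchanged.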

I'm curious. Where did you become familiar with the Elo system?

I did have an unofficial Elo rating when I was in club chess in high school (it was around 1500 or 1600).

The way to fix the problem in baseball of the matchup favoring the pitcher is to convert event run probabilities into winning percentage. The average at bat carries a standard run value... you're familiar with the concept of offensive winning percentage (James invented that), I assume. I propose to do that to the outcome of each PA. Once you're on the W% scale, the matchup is fair again.

The one problem with my idea is that a negative event carries a negative run value. So the run values of all events will have to be shifted so that the worst events for the batter carry a zero W%. So rather than using a basic RC estimate for each event, I'd be using a marginal RC over an out. Not a big deal to calculate that, though.
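The shift can be sketched in Python: subtract the out's run value from every event (and from the league-average PA) so the out sits at zero, then apply a Pythagorean-style conversion with an assumed exponent of 2. The event names and run values in the usage line are illustrative placeholders, not measured linear weights.

```python
def marginal_win_pcts(run_values, avg_pa_value, worst_event="out", exponent=2.0):
    """Single-event W% from run values measured as margins over `worst_event`,
    so the worst outcome maps to exactly 0 and an average PA to exactly .500."""
    base = run_values[worst_event]
    avg_marginal = avg_pa_value - base
    if avg_marginal <= 0:
        raise ValueError("the average PA must be worth more than the worst event")
    result = {}
    for event, rv in run_values.items():
        marginal = rv - base
        if marginal < 0:
            raise ValueError(f"{event} is worth less than {worst_event}")
        num = marginal ** exponent
        result[event] = num / (num + avg_marginal ** exponent)
    return result


# Illustrative numbers only: raw values centered so an average PA is worth 0.
print(marginal_win_pcts({"out": -0.25, "walk": 0.30, "single": 0.45, "home_run": 1.40},
                        avg_pa_value=0.0))
```

Because every margin is non-negative after the shift, the conversion no longer mishandles negative run values, which would otherwise come out positive once squared.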