Normalizing Ratings

29 Symmetry 26 5/2/2025, 12:39:45 AM hopefullyintersting.blogspot.com ↗

Comments (26)

homeonthemtn · 9m ago
I'd rather we just used a 3-point rating: 1. Bad, 2. Fine, 3. Great.

2 and 4 are irrelevant and/or a wild guess or user defined/specific.

Most of the time our rating systems devolve into roughly this state anyways.

E.g.

5 is excellent, 4.x is fine, <4 is problematic.

And then there's a sub domain of the area between 4 and 5 where a 4.1 is questionable, 4.5 is fine and 4.7+ is excellent

In the end, it's just 3 parts nested within 3 parts nested within 3 parts nested within....

Let's just do 3 stars (no decimal) and call it a day

nlh · 1h ago
Similarly - one of my biggest complaints about almost every rating system in production is how just absolutely lazy they are. And by that, I mean everyone seems to think "the object's collective rating is an average of all the individual ratings" is good enough. It's not.

Take any given Yelp / Google / Amazon page and you'll see some distribution like this:

User 1: "5 stars. Everything was great!"

User 2: "5 stars. I'd go here again!"

User 3: "1 star. The food was delicious but the waiter was so rude!!!one11!! They forgot it was my cousin's sister's mother's birthday and they didn't kiss my hand when I sat down!! I love the food here but they need to fire that one waiter!!"

Yelp: 3.7 stars average rating.

One thing I always liked about FourSquare was that they did NOT use this lazy method. Their score was actually intelligent - it checked things like how often someone would return, how much time they spent there, etc. and weighted a review accordingly.

Hizonner · 39m ago
I buy a lot of "technical things", and you constantly see one or two star ratings from people who either don't know what the thing is actually supposed to do, or don't know how to use it.

My favorites: A power supply got one star for not simultaneously delivering the selected limit voltage and the selected limit current into the person's random load. In other words, literally for not violating the laws of physics. An eccentric-cone flare tool got one star for the cone being off center. "Eccentric" is in the name, chum....

esperent · 16m ago
> for not violating the laws of physics.

I would personally frame that as a review for poor documentation. A device shouldn't expect users to know the laws of physics to understand its limitations.

Hizonner · 56s ago
If you don't know that particular law of physics, you have no business messing with electricity. You'll very likely damage something, and quite possibly damage someone.

We're talking about a general-purpose device meant to drive a circuit you create yourself. I'm not sure what a good analogy would be. Expecting the documentation for a saw to tell you you have to cut all four table legs the same length?

stevage · 23m ago
Or worse, a 1 star rating for a product they loved but there was a problem with delivery.
derefr · 10m ago
I take this not as people being dumb, but as a clear conflict of interest: people want to be able to rate the logistics provider separately from the product, but marketplaces don't want to give people the option to do that — as that would reveal that the marketplace will sometimes decide to use "the known-shitty provider" for some orders. (And make no mistake, the marketplace knows that that provider is awful!)
theendisney · 1h ago
With averages: to have 5 stars you need a hundred 5-star ratings for each 1-star rating.

If you normalized the ratings, your score could change without you doing anything. A former customer may start giving good ratings elsewhere, making yours worse, or give poor ones, improving yours.

Maybe the relevance of old ratings should decline.
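
To make the arithmetic and the decay idea concrete, here is a minimal Python sketch. The exponential half-life weighting, the `half_life_days` parameter, and both function names are assumptions for illustration; the comment only suggests that old ratings should count for less, not how.

```python
def plain_average(stars):
    """Unweighted mean of a list of star ratings."""
    return sum(stars) / len(stars)


def decayed_average(ratings, now_day, half_life_days=180.0):
    """Weighted mean where a rating's weight halves every half_life_days.

    ratings: list of (stars, day_given) pairs; days are plain numbers.
    The half-life scheme is an assumed choice, not something from the thread.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for stars, day_given in ratings:
        age = now_day - day_given
        weight = 0.5 ** (age / half_life_days)
        weighted_sum += weight * stars
        weight_total += weight
    if weight_total == 0:
        raise ValueError("no ratings to average")
    return weighted_sum / weight_total


# One 1-star rating among a hundred 5-star ratings still rounds to 5.0.
print(round(plain_average([5] * 100 + [1]), 2))  # 4.96

# An old 1-star rating counts half as much as a fresh 5-star rating.
print(decayed_average([(1, 0), (5, 180)], now_day=180))
```

With decay, a bad rating from long ago fades on its own schedule instead of depending on what the reviewer does elsewhere.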

ajmurmann · 31m ago
Is that actually bad? What happened is that we learned more about the customer's rating system. I might never have had Cuban food and love it the first time I try it in Miami, but then I keep eating it and it turns out the first restaurant was actually not as good as I thought; I just really like Cuban food.

This actually somewhat goes into another pet peeve of mine with rating systems. I'd like to see ratings for how much I will like it. An extreme but simple example might be that the ratings of a vegan customer of a steak house might be very relevant to other vegans but very irrelevant to non-vegans. More subtle versions are simply about shared preferences. I'd love to see ratings normalized and correlated to other users to create a personalized rating. I think Netflix used to do stuff like this back in the day and you could request your personal predicted score via API, but now that's all hidden and I'm instead shown different covers of the same shows over and over.
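
One way to read "normalized and correlated to other users" is a small user-based collaborative filter: subtract each user's own average, then weight other users' opinions by how well their tastes correlate with yours. The sketch below assumes that reading. The function names and the sample data are hypothetical, and it is not a description of what Netflix actually did.

```python
from math import sqrt


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def similarity(a, b):
    """Pearson correlation over items both users rated; 0.0 if undefined."""
    shared = set(a) & set(b)
    if len(shared) < 2:
        return 0.0
    mean_a = mean(a[i] for i in shared)
    mean_b = mean(b[i] for i in shared)
    num = sum((a[i] - mean_a) * (b[i] - mean_b) for i in shared)
    den = sqrt(sum((a[i] - mean_a) ** 2 for i in shared)) * sqrt(
        sum((b[i] - mean_b) ** 2 for i in shared)
    )
    return num / den if den else 0.0


def predict(ratings, user, item):
    """Predict user's rating for item from similar and dissimilar users."""
    mine = ratings[user]
    my_mean = mean(mine.values())
    num = den = 0.0
    for other, theirs in ratings.items():
        if other == user or item not in theirs:
            continue
        sim = similarity(mine, theirs)
        # Use the other user's deviation from their own average (normalization).
        num += sim * (theirs[item] - mean(theirs.values()))
        den += abs(sim)
    return my_mean if den == 0 else my_mean + num / den


# Hypothetical data: two vegans and one steak lover.
ratings = {
    "vegan_a": {"salad_bar": 5, "steakhouse": 1, "tofu_place": 5},
    "vegan_b": {"salad_bar": 5, "steakhouse": 1, "tofu_place": 4, "bbq_pit": 1},
    "carnivore": {"salad_bar": 2, "steakhouse": 5, "tofu_place": 1, "bbq_pit": 5},
}
# The plain average for bbq_pit is 3.0; the personalized guess for vegan_a is lower.
print(predict(ratings, "vegan_a", "bbq_pit"))
```

The steak lover's 5-star rating pushes the vegan's prediction down rather than up, because their past ratings are negatively correlated.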

kayson · 1h ago
The normalization doesn't have to be "live". You could apply the factor at time of rating and then not change it.
stevage · 25m ago
I like rating systems from -2 to +2 for this reason.

The big rating problem I have is with sites like boardgamegeek where ratings are treated by different people as either an objective rating of how good the game is within its category, or subjectively how much they like (or approve of) the game. They're two very different things and it makes the ratings much less useful than they could be.

They also suffer a similar problem in that most games score 7 out of 10. 8 is exceptional, 6 is bad, and 5 is disastrous.

tibbar · 1h ago
One of my favorite algorithms for this is Expectation Maximization [0].

You would start by estimating each driver's rating as the average of their ratings, and then estimate the bias of each rider by comparing the average rating they give to the estimated score of their drivers. Then you repeat the process iteratively until you see both scores (driver rating and rider bias) converge.

[0] https://en.wikipedia.org/wiki/Expectation%E2%80%93maximizati...
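
The loop described above can be sketched in a few lines of Python. This is a plain alternating-averages version under an assumed additive model (observed stars = driver score + rider bias), not a full EM implementation with a probabilistic model. The function name, the `iterations` and `tol` parameters, and the sample data are all hypothetical.

```python
def estimate_scores(ratings, iterations=100, tol=1e-9):
    """Alternate between driver scores and rider biases until they settle.

    ratings: list of (rider, driver, stars) tuples.
    Returns (score_by_driver, bias_by_rider).
    """
    riders = {rider for rider, _, _ in ratings}
    drivers = {driver for _, driver, _ in ratings}
    bias = {rider: 0.0 for rider in riders}
    score = {driver: 0.0 for driver in drivers}

    for _ in range(iterations):
        # Step 1: driver score = mean of ratings with rider bias removed.
        new_score = {}
        for d in drivers:
            vals = [s - bias[r] for r, drv, s in ratings if drv == d]
            new_score[d] = sum(vals) / len(vals)

        # Step 2: rider bias = mean gap between their ratings and driver scores.
        new_bias = {}
        for r in riders:
            vals = [s - new_score[d] for rdr, d, s in ratings if rdr == r]
            new_bias[r] = sum(vals) / len(vals)

        change = max(abs(new_score[d] - score[d]) for d in drivers) + max(
            abs(new_bias[r] - bias[r]) for r in riders
        )
        score, bias = new_score, new_bias
        if change < tol:
            break
    return score, bias


# Hypothetical data: d1 was also rated by a harsh rider, d2 was not.
data = [("fair", "d1", 5), ("fair", "d2", 5), ("harsh", "d1", 3)]
score, bias = estimate_scores(data)
print(score, bias)
```

A plain average would put d1 at 4.0 and d2 at 5.0. Here the two scores end up essentially equal, and the gap is attributed to the harsh rider's bias instead. Only differences between scores are meaningful in this model, since adding a constant to every score and subtracting it from every bias fits the data equally well.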

mzmzmzm · 22m ago
A problem with accounting for "above average" service is sometimes I don't want it. If a driver goes above and beyond, offering a water bottle or something else exceptional, occasionally I would rather be left alone during a quiet, impersonal ride.
rossdavidh · 39m ago
I have often had the same thought, and I have to believe the reason is that the companies' bottom line is not impacted the tiniest bit by their rating systems. It wouldn't be that hard to do better, but anything that takes a non-zero amount of attention and effort to improve has to compete with all of those other priorities. As far as I can tell, they just don't care at all about how useful their rating system is.

Alternatively, there might be some hidden reason why a broken rating system is better than a good one, but if so I don't know it.

parrit · 31m ago
For Uber you don't need a rating at all. The tracking system knows if they were late, if they took a good route and if they dropped you off at the wrong location.

Anything really bad can be dealt with via a complaint system.

Anything exceptional could be captured in a free-text field when giving a tip.

Who is going to read all those text fields and classify them? AI!

healsdata · 20m ago
Counterpoint -- Lyft attempted to charge me a late fee when a driver went to the wrong spot in a parking garage.
Retr0id · 1h ago
> I'm genuinely mystified why it's not applied anywhere I can see.

I wonder if companies are afraid of being accused of "cooking the books", especially in contexts where the individual ratings are visible.

If I saw a product with 3x 5-star reviews and 1x 3-star review, I'd be suspicious if the overall rating was still a perfect 5 stars.

pbronez · 15m ago
One formal measure of this is Inter-Rater Reliability

https://en.wikipedia.org/wiki/Inter-rater_reliability
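
As one concrete instance, Cohen's kappa measures agreement between two raters beyond what chance alone would produce. The sketch below is a minimal version for two raters who scored the same items. The function name and the sample ratings are hypothetical, and kappa is only one of several inter-rater reliability statistics.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length lists of categorical ratings."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty lists of equal length")
    n = len(rater_a)

    # Fraction of items where the two raters gave the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Agreement expected by chance, from each rater's label frequencies.
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)

    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical star ratings of the same four items by two reviewers.
print(cohens_kappa([5, 5, 1, 1], [5, 5, 1, 1]))  # 1.0: perfect agreement
print(cohens_kappa([5, 5, 1, 1], [5, 1, 5, 1]))  # 0.0: no better than chance
```

A value near 0 means the raters agree no more often than random labeling with the same frequencies would.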

JSR_FDED · 27m ago
A++++ article!
stevage · 20m ago
Wow you remind me of eBay.
xnx · 1h ago
I don't understand why letter grades aren't more popular for rating things in the US.

"A+" "B" "C-" "F", etc. feel a lot more intuitive than how stars are used.

technetist · 58m ago
I think that ultimately you run into the same issue.

In US education you are taught that you need to get an A. Anything below a C gets you on the equivalent of a “Performance Improvement Plan” in the corporate world. And B is… well… B.

So with that rating engrained, people would probably feel bad about rating their ride-share driver a C when they did what was expected. And it wouldn’t stop companies from pushing for A ratings.

Even elsewhere like the food industry where they do have letter ratings, A is the norm with anything lower being an outlier.

Perhaps for this to work, it would need a complete systemic shift where C truly is the average and A and F are the outliers. In school C would need to be “did the student do the assignment.” And A would need to be “the student did the assignment, and then some.”

jsnell · 40m ago
There's nothing intrinsically intuitive about letter grades. It's just that you've been taught those specific arbitrary mappings.

Consider for example the "S" as a better grade than "A", originating from Japan but widely applied in gaming.

xboxnolifes · 37m ago
Even worse, is S actually good, or does the scale go SSS+, SSS, SS, S, A, B?
NegativeK · 1h ago
We'd still get the same pressure to give an A+ to every interaction unless things were fucked.

I used to rate three stars for what "performs as expected" until I realized that it's punishing good products. Switching to A-F would result in the same behavior, except it'd be Uber drivers trying to make a living instead of noxious parents declaring that their kid deserves an A.