September 25, 2025

Debunking Objectivity in Security Scoring Models

Jeremiah Grossman


Cybersecurity has a scoring addiction. Risk scores, risk ratings, prioritization systems. If it can be scored, ranked, shaded red, or slapped with a letter grade, it has been done. Multiple times. Usually in slightly different fonts.

Some models size up the security posture of an entire company. Others zoom in on a single alert, asset, or vulnerability. Public examples like CVSS, FAIR, and EPSS are familiar. Most others are proprietary black boxes cooked up by vendors, analysts, or that one developer who insists their machine learning model is “basically sentient.”

The promise is simple: turn messy technical details into something clean, digestible, and actionable. In theory, this makes decisions easier, communication smoother, and response faster. And because the word “algorithm” is involved, people assume the model must be intelligent, impartial, and scientific. More accurate than asking a room full of experienced security professionals for their opinions.

Here is the uncomfortable truth: Most scoring models are distilled expert judgment dressed up in equations. This is especially true for the proprietary models offered by vendors. The math, if you’re even allowed to see the algorithm, makes the conclusion look inevitable, but in reality, the model is just automating the intuition of whoever built it.

The Subjectivity Problem

Security scoring models may look objective, but they are usually built on subjective ideas of what “right” should be, often without defining what “right” means in terms that matter, such as whether the model actually reduces breaches or prevents financial loss. Having built and analyzed security scoring models for years, I always come back to the same question: how do we know a model is “right”? And if we tweak it to be “more right,” how do we know it truly improved rather than worsened?

Anyone who has tuned a scoring model knows the drill. Run it. Look at the output. Adjust the weights. Rerun it. Repeat until the results line up with what you expected all along. If they do not, it is not called a discovery. It is called a bug.

Let’s apply this to any vulnerability scanner ranking model. If the model ranks Vulnerability A above Vulnerability B and the builder thinks the result is “wrong,” the weights get changed. Maybe a new variable is added. Maybe a dataset is swapped. Eventually, the model agrees with the builder. At that point, the math is not discovering truth; it is mirroring opinion.
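To make the circularity concrete, here is a minimal sketch of what most of these models boil down to: a weighted sum over a handful of features. The features, weights, and numbers below are invented for illustration and do not reflect any vendor’s actual algorithm. Notice how a small tweak to the weights is all it takes for the “math” to change its answer and agree with the builder.

```python
# Hypothetical linear scoring model. Features, weights, and values are made up.

def score(vuln: dict, weights: dict) -> float:
    """Weighted sum of feature values -- the shape most risk scores take."""
    return sum(w * vuln.get(feature, 0.0) for feature, w in weights.items())

vuln_a = {"cvss": 9.8, "exploit_public": 0.0, "asset_exposed": 0.0}
vuln_b = {"cvss": 7.5, "exploit_public": 1.0, "asset_exposed": 1.0}

# First pass: equal weights rank A above B (9.8 vs 9.5).
weights_v1 = {"cvss": 1.0, "exploit_public": 1.0, "asset_exposed": 1.0}
print(score(vuln_a, weights_v1), score(vuln_b, weights_v1))

# The builder "knows" B should win (public exploit, exposed asset),
# so the weights get bumped until the output matches that belief.
weights_v2 = {"cvss": 1.0, "exploit_public": 2.5, "asset_exposed": 2.5}
print(score(vuln_a, weights_v2), score(vuln_b, weights_v2))  # now B > A (9.8 vs 12.5)
```

Same vulnerabilities, same data, opposite ranking. The only thing that changed was the builder’s opinion, expressed as weights.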

The real point of a vulnerability scanner ranking model should be to surface, at the top of the list, the vulnerabilities that actually lead to breaches and result in financial loss. Fix those, and the likelihood of breach measurably goes down. Yet most models are not validated against those types of real-world outcomes. Instead, they are judged on how well they match what their builder already believed.
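Validating against outcomes does not require exotic math. Here is a hedged sketch of one way it could work: check how many of a model’s top-ranked findings later show up in an exploited-in-the-wild list, such as a KEV-style feed or your own incident history. The IDs and rankings below are placeholders, not real data.

```python
def precision_at_k(model_ranking: list[str], exploited_ids: set[str], k: int) -> float:
    """Fraction of the model's top-k findings that were actually exploited.

    model_ranking: vulnerability IDs, highest score first, from whatever model.
    exploited_ids: IDs confirmed exploited in the wild (e.g., a KEV-style feed
                   or your own incident history) -- the real-world outcome.
    """
    if k <= 0:
        return 0.0
    top_k = model_ranking[:k]
    return sum(1 for vid in top_k if vid in exploited_ids) / k

# Placeholder data: of the model's top five picks, only two were ever exploited.
ranking = ["CVE-A", "CVE-B", "CVE-C", "CVE-D", "CVE-E", "CVE-F"]
exploited = {"CVE-B", "CVE-E", "CVE-F"}
print(precision_at_k(ranking, exploited, k=5))  # 0.4
```

A model that keeps scoring well against data like this has earned some trust. A model that has never been checked against it is just an opinion with decimal places.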

Consensus is not accuracy. Math stemming from made-up variable weights is not proof. And here is the kicker. Half the time, no one even knows whose expert opinion the model represents. Who are we supposed to trust? Do we know their names? Do they still work at VendorX? Can we ask them why they thought their algorithm was right? They may be long gone, and no one really knows how the model works. 

Expert Models Aren’t Useless

Let’s also be fair. This does not mean expert-driven models are bad. Experts can be right, and encoding their instincts into repeatable systems can be useful at scale. But let’s be honest about what these models really are: formalized versions of someone’s judgment. They are not immutable laws of physics.

The Real Test

The best scoring models are not the ones that align with intuition or win arguments on Reddit. The endgame should not be a perfect bell curve that looks great on a PowerPoint slide. An accurate security scoring model reflects real-world risk as it plays out. Whether or not an expert agrees is beside the point. In vulnerability management, that means prioritizing the vulnerabilities most likely to be exploited in the wild and cost your organization real money. That is when a model stops being a guess and starts being useful.
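One modest way to express that in practice, assuming you have an exploitation likelihood (EPSS-style) and a rough loss estimate per finding, is to rank by expected loss: likelihood times impact. The probabilities and dollar figures below are invented placeholders, just to show the arithmetic.

```python
# Hypothetical prioritization by expected loss = P(exploitation) * estimated impact.
# The probabilities and dollar figures are placeholders, not real EPSS or loss data.
findings = [
    {"id": "CVE-A", "p_exploit": 0.02, "est_loss_usd": 5_000_000},
    {"id": "CVE-B", "p_exploit": 0.45, "est_loss_usd": 250_000},
    {"id": "CVE-C", "p_exploit": 0.10, "est_loss_usd": 2_000_000},
]

for f in findings:
    f["expected_loss"] = f["p_exploit"] * f["est_loss_usd"]

# Highest expected loss first: CVE-C ($200,000), CVE-B ($112,500), CVE-A ($100,000).
for f in sorted(findings, key=lambda f: f["expected_loss"], reverse=True):
    print(f["id"], f"${f['expected_loss']:,.0f}")
```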

After all, no color scheme in existence can make a bad model good.
