I own a supplement store. I also built a scoring system that rates every product in that store on a 0–100 scale. You should probably be skeptical of that. I would be. This article explains exactly how the scoring works, why I built it instead of just picking my favorites, and — most importantly — what it can’t tell you that you still need to know.
By Trenton Garza · April 2026 · See also: Methodology summary →
The Short Version
- The SuppVault Score is a 0–100 formulation quality rating based on six dimensions.
- It’s computed algorithmically from label data — same label in, same score out.
- Prop blends are NOT penalized. They score lower on transparency naturally, but there’s no extra deduction.
- Scores are category-relative: a pre-workout is compared to other pre-workouts.
- We sell products scoring from 35 to 97. The range is the proof the score is honest.
- It does not measure taste, individual response, or whether you’ll actually enjoy taking it.
Why We Built a Scoring System
The supplement industry has a transparency problem. Here’s what a typical buyer walks into:
- Proprietary blends that hide individual doses. A “Performance Matrix 5,000mg” might contain 4,900mg of cheap filler ingredients and 100mg of the expensive ones.
- “Clinically studied ingredients” in marketing copy that does not mean “clinically studied doses.” A product can contain 50mg of an ingredient that requires 3,000mg to be effective and legally advertise it as clinically studied.
- Influencer endorsements that tell you nothing about the formula. The size of a marketing budget says nothing about formulation quality.
- Review sites that use subjective criteria (“gave me great energy”) or hide their affiliate relationships.
I spent 14 years buying supplements before I owned a store. I watched the category evolve from the prop-blend DMAA era to today’s fully disclosed labels, and I noticed the same thing every year: a buyer had no way to quickly tell whether a product was actually dosed right or just marketed well. I built the SuppVault Score to answer that question with data.
The Six Dimensions of the Score
Every product goes through the same six-dimension evaluation. Each dimension scores 0–10. The overall 0–100 score is computed from these dimensions with category-appropriate weighting.
1. Clinical Dosing
For every key active ingredient on the label, we compare the dose per serving against the clinical standard from our ingredient knowledge base — 2,492+ entries with documented effective dose ranges sourced from peer-reviewed research and Examine.com grades.
Example: L-Citrulline has a clinical standard of 6,000–8,000mg for blood flow and performance, backed by over 46 clinical trials and 5 meta-analyses (Examine.com Grade A for plasma arginine elevation). A pre-workout with 6,000mg gets near-full marks. A pre-workout with 3,000mg gets partial credit (above the minimum effective threshold). A pre-workout with 500mg gets minimal credit — that’s label decoration, not dosing.
The system is graduated, not binary. A dose below the minimum effective dose scores near 1/10, a dose at the clinical low end scores 8–9/10, and a dose above the clinical midpoint scores 10/10. This matters because rigid thresholds create weird results: a product with 5,900mg citrulline shouldn’t score catastrophically worse than one with 6,000mg.
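As a rough illustration of the graduated curve, here is a minimal Python sketch. It is not the published SuppVault code. The function name, the straight-line ramps between anchors, the exact anchor values (1.0 and 8.5), and the 2,000mg minimum effective dose used for citrulline in the usage lines are all assumptions made for the example; only the 6,000–8,000mg clinical range comes from the text above.

```python
def dose_score(dose_mg, min_effective_mg, clinical_low_mg, clinical_high_mg):
    """Graduated 0-10 score for a single ingredient dose.

    Assumed anchors (illustrative only):
      below the minimum effective dose -> 1.0
      at the clinical low end          -> 8.5
      at or above the clinical midpoint -> 10.0
    Doses between anchors are interpolated linearly.
    """
    midpoint = (clinical_low_mg + clinical_high_mg) / 2
    if dose_mg <= 0:
        return 0.0
    if dose_mg < min_effective_mg:
        return 1.0
    if dose_mg >= midpoint:
        return 10.0
    if dose_mg < clinical_low_mg:
        # Ramp from 1.0 up to 8.5 between minimum effective and clinical low.
        frac = (dose_mg - min_effective_mg) / (clinical_low_mg - min_effective_mg)
        return 1.0 + frac * 7.5
    # Ramp from 8.5 up to 10.0 between clinical low and the midpoint.
    frac = (dose_mg - clinical_low_mg) / (midpoint - clinical_low_mg)
    return 8.5 + frac * 1.5


# L-Citrulline, clinical range 6,000-8,000mg; 2,000mg minimum is hypothetical.
for dose in (500, 3000, 5900, 6000, 7000):
    print(dose, round(dose_score(dose, 2000, 6000, 8000), 2))
```

With these assumed anchors, 5,900mg lands a fraction of a point under 6,000mg instead of falling off a cliff, which is the property the graduated design is meant to guarantee.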
2. Label Transparency
Products with fully disclosed labels — every ingredient and exact dose listed individually — score highest. Products with proprietary blends score lower because we can’t verify individual doses.
But — and this is important — prop blends are NOT penalized beyond the natural lower transparency score. We don’t subtract extra points for the format. Some brands use prop blends to protect novel formulations. Some use them to hide underdosing. The score doesn’t assume bad intent; it just scores what’s observable.
This is a genuine design choice and I want to be clear about it: a good prop blend product can score in the 70s or 80s. It can’t reach 90+ without full transparency, because transparency is itself a quality signal. But we’re not treating proprietary blends as a defect.
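One plausible way to get a transparency score that is lower for blends without any extra deduction is to score only the share of actives whose dose is disclosed. The sketch below assumes that approach; the function name and the simple proportional formula are hypothetical and are not taken from the actual algorithm.

```python
def transparency_score(ingredients):
    """Score label transparency on a 0-10 scale.

    `ingredients` is a list of (name, dose_mg) pairs, where dose_mg is None
    when the ingredient sits inside a proprietary blend with no individual dose.
    There is no separate "blend penalty" term: the score drops only because
    fewer doses are observable.
    """
    if not ingredients:
        return 0.0
    disclosed = sum(1 for _, dose in ingredients if dose is not None)
    return round(10 * disclosed / len(ingredients), 1)


fully_disclosed = [("L-Citrulline", 6000), ("Beta-Alanine", 3200)]
partial_blend = [("L-Citrulline", 6000), ("Beta-Alanine", None)]
print(transparency_score(fully_disclosed))  # 10.0
print(transparency_score(partial_blend))    # 5.0
```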
3. Manufacturing
Here’s where most scoring systems go wrong: they give points for GMP compliance. But GMP is the legal minimum for every US supplement — it’s required by FDA 21 CFR Part 111. If a brand is selling legally, they’re GMP-compliant. Giving points for it would inflate every score equally and tell you nothing.
What actually earns points in this dimension: brand-owned manufacturing facilities (full process control instead of contract manufacturing), additional voluntary certifications beyond baseline GMP, and documented quality control systems. Hi-Tech Pharmaceuticals, for example, owns their entire manufacturing plant — that’s a meaningful signal beyond legal compliance.
4. Third-Party Testing
Independent verification by NSF Certified for Sport, Informed Sport, BSCG Certified Drug Free, USP Verified, ConsumerLab, or Informed Choice earns points here. The different programs test for different things — see the certifications guide for the full breakdown.
Important caveat: most products carry no third-party testing at all. This isn’t unique to “bad” brands — certification programs are expensive and favor established brands with institutional customers. Many excellent formulas from smaller brands score lower in this dimension simply because the brand can’t justify the per-batch cost of certification. That’s a real cost to consider, but it’s not a quality judgment on the formula itself.
5. Inactive Ingredients
This dimension rewards formulas that don’t dilute the active ingredients with unnecessary fillers. Heavy use of FD&C artificial dyes (Red 40, Yellow 5, Blue 1), titanium dioxide, excessive maltodextrin, or long lists of sweeteners scores lower.
This isn’t about being a purist — a few common excipients are fine. It’s about rewarding the products that put the budget into the actives rather than filling the tub with cheap carriers.
6. Value
Price per serving compared to the category median. A product costing significantly less than category average while hitting clinical doses scores well. A product costing significantly more than the median needs to justify it with premium ingredients, certifications, or other differentiators.
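The price side of this comparison can be sketched in a few lines of Python. This is an illustration only: the linear mapping (median price scores 5, half the median scores 7.5, double the median scores 0) is an assumption, and the sketch leaves out the second half of the rule, where a high price can be justified by premium ingredients or certifications.

```python
from statistics import median


def price_per_serving(price, servings):
    """Container price divided by the number of servings."""
    return price / servings


def value_score(product_pps, category_pps):
    """0-10 price score relative to the category median (assumed linear scale)."""
    ratio = product_pps / median(category_pps)
    # Clamp so very cheap or very expensive products stay inside 0-10.
    return max(0.0, min(10.0, 10.0 - 5.0 * ratio))


category = [1.0, 1.5, 2.0]  # hypothetical per-serving prices in one category
print(value_score(price_per_serving(45.0, 30), category))  # 5.0, priced at the median
print(value_score(0.75, category))                          # 7.5, half the median
```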
Value is the lowest-weighted dimension because price alone doesn’t determine quality — expensive can be worth it and cheap can be excellent. It earns a slice of the score, not the majority of it.
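To make the roll-up from six 0–10 dimensions to one 0–100 number concrete, here is a minimal weighted-sum sketch. The weight values are invented for the example; the only constraints taken from the article are that weights vary by category and that value carries the smallest weight.

```python
import math

# Hypothetical weights for one category. They must sum to 1.0.
EXAMPLE_WEIGHTS = {
    "clinical_dosing": 0.35,
    "transparency": 0.20,
    "manufacturing": 0.10,
    "third_party_testing": 0.15,
    "inactive_ingredients": 0.12,
    "value": 0.08,  # lowest-weighted dimension
}


def overall_score(dimension_scores, weights=EXAMPLE_WEIGHTS):
    """Combine six 0-10 dimension scores into a 0-100 overall score."""
    if set(dimension_scores) != set(weights):
        raise ValueError("dimension names must match the weight table")
    if not math.isclose(sum(weights.values()), 1.0):
        raise ValueError("weights must sum to 1.0")
    for name, score in dimension_scores.items():
        if not 0 <= score <= 10:
            raise ValueError(f"{name} must be between 0 and 10")
    weighted = sum(weights[name] * dimension_scores[name] for name in weights)
    return round(weighted * 10)


example = {
    "clinical_dosing": 9,
    "transparency": 10,
    "manufacturing": 5,
    "third_party_testing": 0,
    "inactive_ingredients": 8,
    "value": 6,
}
print(overall_score(example))  # 71
```

In this example a well-dosed, fully transparent product with no third-party testing lands in the low 70s, which shows how a single weak dimension can pull down an otherwise strong formula.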
What the Score Distribution Looks Like
Across the ~1,675 products we’ve scored, the distribution is not evenly spread. Most products cluster in the 70–85 range because that’s where competent-but-unoptimized formulas sit. The tails are meaningful:
| Range | Rating | What It Takes to Score Here |
|---|---|---|
| 90+ | Exceptional | Clinical doses on every key active, fully disclosed label, premium ingredient forms, clean inactive list. Only the top ~5–10% of products reach here. |
| 80–89 | Strong | Most actives hit clinical doses. Transparent label. Minor gaps (one ingredient slightly underdosed, no third-party testing, average value). |
| 70–79 | Average | Some actives at clinical dose, some underdosed. Functional product but not optimized. Where most of the catalog sits. |
| 60–69 | Below Average | Meaningful underdosing or transparency issues. The label promises more than the formula delivers. |
| <60 | Poor | Heavy proprietary blending that prevents dose verification, pervasive underdosing, or missing key ingredients entirely. I still carry these products because transparent scoring means showing the full picture. |
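The bands in the table translate directly into a lookup. The function name below is hypothetical; the cutoffs and labels are the ones in the table.

```python
def rating_band(score):
    """Map a 0-100 SuppVault Score to the rating label from the table."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score >= 90:
        return "Exceptional"
    if score >= 80:
        return "Strong"
    if score >= 70:
        return "Average"
    if score >= 60:
        return "Below Average"
    return "Poor"


print(rating_band(97))  # Exceptional
print(rating_band(72))  # Average
print(rating_band(35))  # Poor
```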
Where the Data Actually Comes From
The score is only as good as its inputs. Here’s where every data point comes from:
Label data is extracted directly from manufacturer Supplement Facts panels using an AI vision pipeline (three different model types cross-check each other to catch extraction errors). Ingredient clinical doses come from our ingredient knowledge base, which synthesizes peer-reviewed research summaries, Examine.com’s evidence grading system, and published meta-analyses.
Certification claims are cross-referenced against the public databases maintained by NSF, Informed Sport, BSCG, USP, and ConsumerLab. If a product claims a certification but doesn’t appear in the certifying body’s database, the testing dimension reflects that.
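The cross-referencing step can be pictured as a simple membership check. In the sketch below the certifier databases are stood in for by an in-memory dictionary of product IDs; how the real listings are fetched, and the product IDs themselves, are assumptions made for the example.

```python
def verify_claims(product_id, claimed_certs, registries):
    """Split a product's claimed certifications into verified and unverified.

    `registries` maps a certification name to the set of product IDs that
    appear in that certifier's public listing (stand-in data here).
    """
    verified, unverified = [], []
    for cert in claimed_certs:
        listed = registries.get(cert, set())
        if product_id in listed:
            verified.append(cert)
        else:
            unverified.append(cert)
    return verified, unverified


# Hypothetical registry snapshot and product.
registries = {
    "NSF Certified for Sport": {"sku-101", "sku-202"},
    "Informed Sport": {"sku-202"},
}
verified, unverified = verify_claims(
    "sku-101", ["NSF Certified for Sport", "Informed Sport"], registries
)
print(verified)    # ['NSF Certified for Sport']
print(unverified)  # ['Informed Sport']
```

Only the verified list would feed the testing dimension; an unverified claim earns nothing.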
The Conflict of Interest — And Where the Algorithm Gets It Wrong
I own the store. I score the products in the store. That’s a conflict of interest. And here’s the part most review sites won’t tell you: even when the scoring is honest, the algorithm has real blind spots. Both things are true, and I’d rather tell you about both than pretend either one isn’t there.
How I handle the conflict
- The score is algorithmic. No human editor decides what a product scores. Label data goes in, dimensional scores come out. If I wanted to boost a product, I’d have to change the algorithm itself — which would affect every product in the category, visible to anyone who compares.
- I sell products scoring below 50. Many of them. If I only sold high-scoring products, the score would be meaningless — it would just be a marker for “products we chose to carry.” Browse any category and you’ll find products in the 40s next to products in the 90s. The distribution is the proof.
- No paid placements. No featured tier. No sponsored position on “best of” rankings. No brand can pay to boost their score or jump to the top of a collection sort.
- The full algorithm will be published. Not a marketing summary. The actual code, the dose thresholds, the weighting logic, the category router — open for anyone to audit. I’m working on a dedicated page that publishes the full implementation so a competitor, a researcher, or a skeptical customer can check my work and tell me where I got it wrong.
Where the algorithm is genuinely flawed
A strict algorithm can’t capture everything that makes a product good, and I didn’t override the scores to “fix” the cases where it falls short. I kept them as the algorithm computed them. But you should know where the blind spots are, because some of my favorite products score lower than they deserve for reasons that have nothing to do with the formula being bad.
Blind spot #1: Proprietary blends done right
I said earlier that prop blends aren’t penalized — that’s true. But the algorithm also can’t give a great prop blend product the full credit it deserves. Some of the best formulas I carry have prop blends because the brand is genuinely protecting a unique formulation, and the dose distribution inside that blend is actually dialed in. The algorithm can’t see that. It scores the visible doses and gives a neutral mark to the hidden ones. That means a prop blend with excellent dose distribution gets essentially the same score as a prop blend hiding 90% pixie dust — because from the outside, the algorithm can’t tell them apart.
Practical translation: If you see a prop blend product scoring in the mid-70s from a brand you trust, don’t write it off as average. It might be one of the best products in the category that just can’t prove it in a scoring system.
Blind spot #2: Category edge cases
The algorithm is category-specific, which is usually the right call — you want a pre-workout compared to other pre-workouts, not to protein powder. But some products don’t fit cleanly into one category. A hybrid stack that’s part pre-workout, part fat burner, part pump formula — which rubric applies? The algorithm picks one, which means the product gets scored against things it’s not really comparable to. Those scores are less reliable. When in doubt, read the ingredient list and the dose breakdown directly instead of leaning on the score.
Blind spot #3: Certifications favor established brands
A $40 pre-workout from a small transparent brand with clinical doses but no NSF/Informed Sport seal scores lower on the testing dimension than a $50 pre-workout from a mainstream brand with the cert but weaker doses. That’s accurate to what you can verify about each product, but it sometimes produces results where the objectively better formula scores lower overall.
Practical translation: Tested athletes need the cert regardless — a positive drug test ends careers. Everyone else should weight clinical dosing more heavily and treat certification as a tiebreaker, not a requirement.
Blind spot #4: Novel ingredients with thin research
A few genuinely effective compounds don’t have enough published research yet for the knowledge base to assign clinical doses. The algorithm gives them a neutral score contribution — neither credit nor penalty — which means a product using cutting-edge ingredients can look average on paper while being genuinely innovative. The score catches up as research accumulates, but it’s always behind the frontier. If you’re chasing something new, the score isn’t the right tool.
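One way to read “neutral score contribution” is that ingredients without a known clinical range are simply left out of the average, so they neither raise nor lower it. The sketch below assumes that reading; the real algorithm might instead assign a fixed midpoint value, and the ingredient names and function name are hypothetical.

```python
def dosing_dimension(ingredient_scores):
    """Average the per-ingredient dose scores, skipping unknown ingredients.

    `ingredient_scores` maps ingredient name to a 0-10 score, or to None when
    the knowledge base has no clinical dose range for it.
    Returns None if no ingredient can be scored.
    """
    known = [s for s in ingredient_scores.values() if s is not None]
    if not known:
        return None
    return sum(known) / len(known)


with_novel = {"L-Citrulline": 8.0, "Beta-Alanine": 10.0, "NovelCompoundX": None}
without_novel = {"L-Citrulline": 8.0, "Beta-Alanine": 10.0}
print(dosing_dimension(with_novel))     # 9.0
print(dosing_dimension(without_novel))  # 9.0
```

Both labels score the same, which is exactly the blind spot: the novel ingredient adds nothing on paper even if it adds something in practice.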
What I do about all this
I don’t override scores. I don’t adjust them to match my preferences. I publish them as the algorithm computes them, flaws included. What I do is write articles like this one to explain where the algorithm is and isn’t reliable, so you can use the score as one input into a decision — not the only input.
And I’m publishing the full algorithm. Not a marketing summary of it. The actual implementation — the code, the dose thresholds, the weighting logic, the category router. If you find a flaw I didn’t list here, tell me and I’ll add it. If you can build a better version, I want to see it. Transparency about the method is the only way I can ask you to trust scores that come from inside a store.
That’s the best I can do from this position. The conflict of interest is named. The algorithm’s blind spots are named. The code is going to be public. If something still doesn’t feel right, that’s a signal worth trusting — and I’d rather you find the flaw than pretend it isn’t there.
What the Score Cannot Tell You
Being honest about limitations is part of the score’s credibility. Here’s what it does not measure:
- Taste and mixability. A product scoring 92 can taste terrible. The score evaluates the formula, not the drinking experience. If you already know you hate a flavor, no amount of clinical dosing will make you enjoy taking it.
- Individual response. Caffeine tolerance, genetics, training experience, diet, sleep quality, and stress levels all affect how you respond. The score can’t predict whether a specific product will work for you specifically.
- Actual powder-in-the-tub verification. We score what the label declares. Third-party certifications (NSF, Informed Sport, BSCG) verify that the powder matches the label. That’s why certified products earn credit in the testing dimension.
- Safety for your specific situation. Drug interactions, medical conditions, pregnancy, age, and medication schedules aren’t factored in. Talk to a doctor before starting a new supplement, especially if you have health conditions or take prescription medications.
- Long-term effects. The score reflects current evidence. Research evolves. An ingredient that scores highly today might be reclassified as new evidence emerges and matures.
- Cross-category comparison. A pre-workout scoring 89 and a protein powder scoring 89 went through the same dimensional evaluation but with category-specific weighting. You can compare scores within a category cleanly; across categories it’s more approximate.
Frequently Asked Questions
How is this different from other supplement rating sites?
Most rating sites use editorial judgment (“our team picked their favorites”), undisclosed affiliate relationships, or consumer satisfaction reviews. The SuppVault Score is a data pipeline: label extraction → ingredient knowledge base lookup → dimensional scoring → output. Two analysts running the same product through the same system get the same score. That’s not better in every way (editorial rating has its place), but it’s a different thing.
A product I use has a low score. Should I stop using it?
Not necessarily. A low score means the formula doesn’t align well with clinical dose recommendations or has transparency issues; it doesn’t mean the product can’t work for you. Many popular products score below average because their marketing budgets exceed their ingredient budgets. If you feel a product is working and you’re getting results you care about, that’s valid data from your own experience. The score is additional information, not a directive. Use it to make more informed future choices, not to second-guess results you’re already getting.
Can brands pay for a higher score?
No. The scoring algorithm runs automatically from label data. There’s no human editor who approves scores, no way to “feature” a product, and no paid placement program. A brand that wants a higher score has to reformulate the product. If that changes in the future, I’ll disclose it directly on this page, but as of April 2026, no brand has ever paid anything toward their score.
How often do scores change?
Scores change in three cases: (1) the manufacturer reformulates the product and we re-extract the new Supplement Facts panel, (2) our ingredient knowledge base is updated with newer research that shifts dose recommendations, or (3) we improve the scoring algorithm itself (rare, but happens when we find gaps or weighting issues). Most scores are stable for months at a time.
Should I only buy 85+ products?
No, and I’d argue against treating the score that way. A 72-scoring product that fits your budget, tastes great, and hits the ingredient you actually care about is a better choice for you than a 91-scoring product that costs twice as much and you won’t enjoy taking. Use the score to compare within a category (“which pre-workout should I pick?”) rather than as an absolute threshold (“only buy 85+”). The comparative use is where the score is most reliable.
How does the score handle novel or cutting-edge ingredients?
Ingredients without established clinical dose ranges (e.g., very new compounds with limited human trial data) get a neutral score contribution — they don’t boost or penalize the product. As research accumulates and dose ranges are established, those ingredients get added to the knowledge base and products containing them are re-scored.
What happens when a product’s label is unclear or unreadable?
The three-model OCR pipeline (Gemini 3.1 Pro + Claude + GPT) cross-checks label extractions. When models disagree on a dose, the product is flagged for manual review. If the label genuinely can’t be read (damaged photo, non-English only, etc.), the product doesn’t get scored until we can get a clean label source. You’ll occasionally see products on the site without a SuppVault Score — that’s why.
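The cross-check itself is easy to sketch once each model’s extraction is reduced to a mapping of ingredient to dose. The code below assumes that shape and an exact-match rule; it makes no calls to any model API, and the function name and sample readings are hypothetical.

```python
def cross_check(extractions):
    """Compare per-model label extractions and flag disagreements.

    `extractions` is a list of dicts, one per model, mapping ingredient name
    to the dose in mg that model read from the label.
    Returns (agreed, flagged): doses every model agrees on, and the names of
    ingredients that need manual review.
    """
    names = set().union(*extractions)
    agreed, flagged = {}, []
    for name in sorted(names):
        # A model that missed the ingredient contributes None, which also
        # counts as a disagreement.
        readings = {extraction.get(name) for extraction in extractions}
        if len(readings) == 1:
            agreed[name] = readings.pop()
        else:
            flagged.append(name)
    return agreed, flagged


model_a = {"L-Citrulline": 6000, "Caffeine": 300}
model_b = {"L-Citrulline": 6000, "Caffeine": 300}
model_c = {"L-Citrulline": 6000, "Caffeine": 30}  # misread digit
agreed, flagged = cross_check([model_a, model_b, model_c])
print(agreed)   # {'L-Citrulline': 6000}
print(flagged)  # ['Caffeine']
```

Requiring unanimous agreement is the conservative choice here: a two-out-of-three vote would score more products automatically, but it would also let a shared misreading through.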
See the score in action: Browse the pre-workout collection sorted by SuppVault Score, read the Best Pre-Workout 2026 guide, or check the methodology summary for the short version.
I built SuppVault because I spent 14 years reading supplement labels and seeing the same thing over and over: marketing claims that don’t match what’s actually in the tub. Every score on this site comes from data, not sponsorships. If a product from a brand I like scores low, it scores low.