The SuppVault Scoring Algorithm (Full Implementation)

Last updated: April 2026 · By Trenton Garza, Founder

Why This Page Exists

Most supplement review sites won’t tell you how their rankings are made. They’ll give you a paragraph about “expert analysis” and leave it there. I think that’s the wrong way to ask for trust.

This page publishes the actual scoring logic used by SuppVault. Not a marketing summary. The real functions, the real thresholds, the real dimensional weights. A competitor could read this page, rebuild the entire system, and point out every flaw. I’d rather that happen than hide behind vague claims of objectivity.

What the Score Is

The SuppVault Score is a 0–100 integer assigned to every scorable product in the catalog. It’s computed from six sub-dimensions, each scored 0–10. Here are the dimensions and what they measure:

Dimension	What It Measures	Range
clinical_dosing	% of key active ingredients with disclosed doses at or above clinical standard	0–10
label_transparency	Ratio of ingredients with visible individual doses vs hidden in prop blends	0–10
manufacturing	GMP status + third-party testing mentions beyond baseline regulatory requirements	0–10
testing	Third-party certification level (NSF, Informed Sport, BSCG, etc.)	0–10
inactive_ingredients	Absence of artificial colors, sweeteners, and unnecessary fillers	0–10
value	Cost per serving vs the category median (pre-workout vs pre-workout, protein vs protein, etc.)	0–10

The final 0–100 SuppVault Score is a category-weighted combination of these six. The weighting varies slightly by category because a pre-workout needs to be evaluated differently than a multivitamin — dose adequacy matters more for a pre-workout, while inactive ingredient quality matters more for a daily multi. Category-specific weighting is documented in each category’s rubric.

The Actual Scoring Functions

Here are the actual Python functions that compute each dimension. This is the code that runs on every product. You can verify it against any product’s displayed score.

1. Clinical Dosing

Counts how many key active ingredients have disclosed doses. Each key active with a disclosed dose counts toward the ratio. The score is that ratio multiplied by 10.

def score_clinical_dosing(brain_b: dict) -> tuple:
    """Score based on % of key ingredients at or above clinical dose."""
    ingredients = brain_b.get("normalized_ingredients", [])
    key_actives = [i for i in ingredients if i.get("is_key_active")]

    if not key_actives:
        return 5, "No key actives identified"

    # Count ingredients with doses (not in prop blends with NP dose)
    dosed = [i for i in key_actives 
             if i.get("dose_standardized_mg") 
             and i.get("dose_standardized_mg") > 0]
    total = len(key_actives)

    # Score based on ratio of dosed key actives
    ratio = len(dosed) / total
    score = min(10, max(1, round(ratio * 10)))

    return score, f"{len(dosed)}/{total} key ingredients with disclosed doses"

Translation: A product with all 10 key active ingredients disclosed at measurable doses gets 10/10. A product with 5 of 10 gets 5/10. A product with none (everything in a prop blend) gets 1/10.

2. Label Transparency

Distinguishes between "grouped under a header with doses still shown" and "actually hidden inside a prop blend." The key variable is hidden_ratio — how many ingredients have no visible dose vs the total.

def score_label_transparency(brain_a: dict, brain_b: dict) -> tuple:
    """Full = 10, partial = 5-7, full prop blend = 3.
    Key distinction: in_blend just means grouped under a header. If 
    individual doses are still shown, it's still transparent."""
    ingredients = brain_b.get("normalized_ingredients", [])
    if not ingredients:
        return 5, "No ingredient data"

    total = len(ingredients)

    # Count TRULY hidden doses: in a blend AND no individual dose shown
    hidden = [i for i in ingredients 
              if i.get("in_blend")
              and not (i.get("dose_standardized_mg") 
                       and i["dose_standardized_mg"] > 0)]
    disclosed = total - len(hidden)

    if len(hidden) == 0:
        return 10, "Full label disclosure"

    hidden_ratio = len(hidden) / total
    if hidden_ratio < 0.2:
        return 8, f"Mostly disclosed ({disclosed}/{total})"
    elif hidden_ratio < 0.5:
        return 6, f"Partial disclosure ({disclosed}/{total})"
    else:
        return 3, f"Proprietary blend ({len(hidden)}/{total} hidden)"

Translation: Full disclosure = 10. Less than 20% hidden = 8. Less than 50% hidden = 6. More than 50% hidden = 3. Note the "blend header" exception: a product listing ingredients under a named sub-blend (like "Pump Matrix") with the individual doses still shown gets full credit, because the transparency is still there.

3. Manufacturing

Evaluates manufacturing quality signals. GMP compliance is the legal minimum for any US supplement, so it earns partial credit; additional third-party testing mentions earn more.

def score_manufacturing(handle: str, cert_data: dict) -> tuple:
    """Score based on GMP and manufacturing quality signals."""
    certs = cert_data.get(handle, {})
    mentions_testing = certs.get("mentions_third_party_testing", False)
    gmp = certs.get("gmp_certified", False)

    if gmp and mentions_testing:
        return 10, "GMP certified + third-party tested"
    elif gmp:
        return 7, "GMP certified"
    elif mentions_testing:
        return 6, "Third-party testing mentioned"
    else:
        return 4, "No manufacturing certifications found"

Translation: Base score is 4 (the neutral default). GMP + testing = 10. GMP only = 7. Testing claim only = 6.

4. Third-Party Testing

Ranks certification level. Informed Sport (per-batch testing) scores highest, then NSF and BSCG (equivalent), then Informed Choice and Informed Protein (sample-based), then COAs and miscellaneous signals.

def score_testing(handle: str, cert_data: dict) -> tuple:
    """Score based on third-party certification level."""
    certs = cert_data.get(handle, {})

    if certs.get("informed_sport"):
        return 10, "Informed Sport certified"
    elif certs.get("nsf_certified"):
        return 9, "NSF Certified for Sport"
    elif certs.get("informed_choice"):
        return 8, "Informed Choice certified"
    elif certs.get("informed_protein"):
        return 8, "Informed Protein certified"
    elif certs.get("bscg_certified"):
        return 9, "BSCG certified"
    elif certs.get("coa_url"):
        return 6, "COA available"
    elif certs.get("mentions_third_party_testing"):
        return 5, "Third-party testing mentioned (no cert)"
    else:
        return 3, "No third-party certification found"

Translation: No certification at all = 3 (base). A COA available = 6. Informed Choice or Informed Protein = 8. NSF Certified for Sport or BSCG = 9. Informed Sport (every-batch testing) = 10. Note: NSF and BSCG are equivalent in this scoring; they test for different things but both represent rigorous third-party verification.

5. Inactive Ingredients

Looks for artificial colors, sweeteners, and unnecessary fillers. A minimal or natural ingredient profile scores highest; artificial colors and sweeteners score lower.

def score_inactive_ingredients(brain_b: dict) -> tuple:
    """Score based on other/inactive ingredient quality."""
    ingredients = brain_b.get("normalized_ingredients", [])
    non_key = [i for i in ingredients if not i.get("is_key_active")]

    if not non_key:
        return 9, "Minimal inactive ingredients"

    artificial_markers = ["artificial", "fd&c", "red 40", "blue 1", 
                          "yellow 5", "yellow 6", "titanium dioxide", 
                          "carrageenan"]
    sweetener_markers = ["sucralose", "acesulfame", "ace-k", "aspartame"]
    natural_sweeteners = ["stevia", "monk fruit", "erythritol", "thaumatin"]

    text_blob = " ".join((i.get("raw_name") or "").lower() for i in non_key)

    has_artificial = any(m in text_blob for m in artificial_markers)
    has_artificial_sweetener = any(m in text_blob for m in sweetener_markers)
    has_natural_sweetener = any(m in text_blob for m in natural_sweeteners)

    if not has_artificial and not has_artificial_sweetener:
        return 10, "No artificial colors or sweeteners"
    elif not has_artificial and has_natural_sweetener:
        return 9, "Natural sweeteners, no artificial colors"
    elif has_artificial_sweetener and not has_artificial:
        return 7, "Artificial sweeteners, no artificial colors"
    elif has_artificial and has_artificial_sweetener:
        return 5, "Contains artificial colors and sweeteners"
    else:
        return 6, "Some artificial ingredients"

Translation: Minimal inactives = 9. No artificial anything = 10. Natural sweeteners (stevia, monk fruit, erythritol) = 9. Artificial sweeteners only (sucralose, Ace-K) = 7. Artificial colors + sweeteners = 5. Note: this dimension is particularly subjective — some people don’t care about sucralose at all, and many excellent formulas use it because it’s cheap and effective. Treat this dimension as a preference signal, not a quality defect.

6. Value

Compares the product’s cost per serving against the category median. Well below median = high score; well above = lower score.

def score_value(handle: str, servings_data: dict, 
                category_medians: dict) -> tuple:
    """Score based on cost per serving vs category median."""
    product = servings_data.get(handle, {})
    cps = product.get("cost_per_serving")
    category = product.get("category", "unknown")
    cat_median = category_medians.get(category)

    if cps is None or cat_median is None:
        return 5, "Insufficient price/serving data"

    ratio = cps / cat_median if cat_median > 0 else 1.0

    if ratio <= 0.6:
        return 10, f"${cps:.2f}/srv — well below ${cat_median:.2f} median"
    elif ratio <= 0.8:
        return 8, f"${cps:.2f}/srv — below ${cat_median:.2f} median"
    elif ratio <= 1.0:
        return 7, f"${cps:.2f}/srv — near ${cat_median:.2f} median"
    elif ratio <= 1.3:
        return 5, f"${cps:.2f}/srv — above ${cat_median:.2f} median"
    else:
        return 3, f"${cps:.2f}/srv — premium vs ${cat_median:.2f} median"

Translation: 60% or less of category median = 10. Near the median = 7. 30%+ above median = 3 (premium). Value is computed category-relative: a pre-workout is compared against other pre-workouts, not against protein powder.

Clinical Dose Thresholds (Sample)

The clinical dosing dimension depends on a knowledge base of 2,492+ ingredient entries, each with documented dose ranges sourced from peer-reviewed research and Examine.com evidence grading. Here’s a sample of the key pre-workout ingredients and their thresholds:

Ingredient	Minimum Effective	Clinical Standard	Upper Range	Evidence
L-Citrulline	3,000mg	6,000–8,000mg	10,000mg	Strong (46+ trials)
Beta-Alanine	3,200mg	3,200–6,400mg	6,400mg	Strong
Caffeine Anhydrous	100mg	200–400mg	400mg	Strong (47+ trials)
Betaine Anhydrous	1.5g	2.5g	20g	Strong
Creatine Monohydrate	2.5g	3–5g	25g (loading)	Strong (170+ trials)
L-Theanine	50mg	100–400mg	800mg	Strong (26+ trials)
L-Tyrosine	500mg	100–150mg/kg	13,500mg	Moderate
Nitrosigine®	750mg	1,500mg	2,000mg	Strong
Alpha-GPC	300mg	300–600mg	1,200mg	Moderate (26+ trials)
Taurine	1,000mg	1,000–3,000mg	6,000mg	Strong
Agmatine Sulfate	250mg	1,000mg	2,670mg	Moderate

This is a sample of 11 ingredients from 2,492+ in the knowledge base. Each entry also includes mechanism descriptions, synergy information, what-you-feel guidance, known label tricks, and Examine.com Grade ratings where available. The full database will be published in a future version of this page.

The Data Pipeline

The scoring functions above need accurate input data to work. That comes from a multi-stage pipeline:

Brain A (Vision OCR): Three AI vision models (Gemini 3.1 Pro, Claude, GPT) independently extract the Supplement Facts panel from each product’s label images. Disagreements are flagged for manual review.
Brain B (Normalization): Raw ingredient names are canonicalized against the ingredient knowledge base. “Citrulline Malate 2:1” and “L-Citrulline Malate (2:1 ratio)” both map to the same canonical entry with automatic yield conversion (1g L-Citrulline = 1.76g 2:1 Citrulline Malate).
Brain C (Research): Multi-tier web research verifies each product against manufacturer sources, PricePlow, Examine.com, and published research.
Brain D (Cross-check): Multi-AI formula verification catches disagreements between sources.
Brain E (Compliance): DSHEA/FTC compliance screening plus banned substance flagging for athlete compliance.
Score computation: The six dimension functions above run on the verified product data.
Category weighting: Sub-scores are combined using category-specific weights appropriate to the product type.

Each stage writes JSON files that can be audited independently. A product’s final score can be traced back through every stage to the original label image.

Known Limitations

Publishing the algorithm means publishing its flaws. Here are the ones I know about:

Prop blends done right are undervalued. The algorithm can’t distinguish a well-dosed prop blend from one hiding pixie dust. Both get penalized equally on transparency. Great products with real-world prop blends may score lower than they deserve. See the blind spots section of the methodology article for the full breakdown.
Hybrid products fit poorly. A product that’s part pre-workout, part fat burner, part pump formula gets routed to one category template and scored against products it’s not really comparable to.
The inactive ingredients dimension is subjective. Many excellent formulas use sucralose. The algorithm treats it as mildly negative. That’s a preference, not a universal fact.
Novel ingredients get neutral scores until research catches up. If a compound doesn’t have enough published trials yet, it’s scored as neutral rather than penalized or rewarded. This lags the frontier of the supplement industry by 1–3 years for cutting-edge ingredients.
Label OCR is not perfect. The three-model cross-check catches most errors, but some label extractions are wrong. If you spot a score that doesn’t match what’s actually on the label, report it and I’ll re-run the extraction.

How to Audit a Specific Score

Want to check whether a product’s score is accurate? Here’s how I would do it:

Pull up the Supplement Facts panel. Either on the product page or directly from the manufacturer.
Identify the key active ingredients. These are the ingredients with dose targets in the clinical literature — the ones the algorithm is grading.
Check each dose against the thresholds above. Is L-Citrulline at 6g or 3g? Is caffeine at 200mg or 50mg?
Run the functions mentally. If 9 out of 10 key ingredients are disclosed with doses, clinical_dosing should score around 9/10. If 3 out of 10 are disclosed, around 3/10.
Compare to the displayed score. If it doesn’t match what you calculated, either the algorithm has a bug, the label OCR extracted something wrong, or you’re missing category weighting context. Any of those are reportable.

Frequently Asked Questions

How is the SuppVault Score calculated?

The SuppVault Score is a 0–100 composite of 6 dimensions: clinical dosing, label transparency, manufacturing, testing, inactive ingredients, and value. Each dimension is scored from raw product data extracted by our 7-stage Brain pipeline. Per-dimension code is published above.

Can I see the actual code behind the score?

Yes. This page publishes all six Python scoring functions verbatim from compute_score_dimensions.py. The algorithm is fully open — no black box. Every threshold, weight, and conditional is documented and auditable.

Are proprietary blends penalized?

No. Proprietary blends are not penalized as a category — they naturally score lower on the label transparency dimension because per-ingredient doses cannot be verified. A well-formulated prop blend with a transparent total dose, clean inactive ingredient list, and third-party testing can still score in the 70s.

What is the difference between an Excellent score and a Good score?

Excellent (90–100) means every clinical-dose threshold is met, the label is fully transparent, the product is third-party tested, and the brand carries NSF or Informed Sport-tier certifications. Good (75–89) means most thresholds are met, the label is transparent, and manufacturing is GMP-compliant. Below 75 indicates at least one dimension has structural issues.

How often are SuppVault Scores updated?

Scores recompute when product label data, formulation, or dosing changes. We re-pull and re-score the full catalog quarterly, and recompute individual products immediately when reformulations are detected via the Brain D verification pass.

Can a brand pay to improve its SuppVault Score?

No. SuppVault takes zero sponsorship dollars, no affiliate revenue from scored brands, and no paid placements. The algorithm runs identically across every product whether the brand has a relationship with the store or not. The full code is published on this page; tampering would be auditable.

What does the SuppVault Score not measure?

The score does not measure subjective experience (taste, mixability), personal tolerance to stimulants, or whether the product is right for your specific goal. It measures whether the product is well-formulated and honestly labeled — not whether it is right for you specifically.

Found a Bug? Built a Better Version?

The fastest way to reach me is Instagram DMs at @trentongarza. I read everything. If you’ve found a product whose score is genuinely wrong, or a scoring edge case I haven’t thought about, send it over.

The goal is to keep making this more accurate over time. The fastest way to do that is to get better criticism.

Trenton Garza

Founder, SuppVault · 14 Years in Supplements

I built SuppVault because I spent 14 years reading supplement labels and seeing the same thing over and over: marketing claims that don’t match what’s actually in the tub. Every score on this site comes from data, not sponsorships.

Full bio → · @trentongarza

Frequently asked questions

How does the SuppVault Score work?

The SuppVault Score is a 0–100 rating built from six independent dimensions: clinical dosing, label transparency, manufacturing quality, third-party testing, inactive ingredients, and value. Each dimension is scored separately, weighted by category (a pre-workout is weighted differently than a multivitamin), and combined into the final score.

How is the SuppVault Score calculated?

We use a Python pipeline that parses every supplement label, normalizes ingredient names against a 2,492-entry knowledge base, looks up clinical-dose thresholds for each active ingredient, and runs six dimension scorers. The full source code for the dimension scorers is published on this page so anyone can audit the math.

Why don't proprietary blends get a lower SuppVault Score?

Proprietary blends are not penalized by the algorithm. They naturally score lower on the label-transparency dimension because the dose of each ingredient is hidden, but a well-formulated prop blend at clinical doses can still earn a high overall score. We don't punish prop blends — we reward transparency.

What does a SuppVault Score of 90+ mean?

A score of 90 or higher means the product hits clinical doses on every key active ingredient, discloses every dose on the label, is manufactured in a third-party-certified facility, uses clean inactive ingredients, and is priced reasonably for what it delivers. Fewer than 5% of supplements in our database score this high.

Can I see the source code for the SuppVault Score?

Yes. The full Python source for all six dimension scorers — clinical_dosing, label_transparency, manufacturing, testing, inactive_ingredients, and value — is published on this page with the actual code blocks, threshold tables, and weighting logic. We believe scoring transparency is the only honest way to rate supplements.

Are SuppVault Scores audited by humans?

Yes. Every score goes through a Brain D cross-check stage that compares the algorithm's verdict against the original label OCR, a Brain E compliance review for athlete-safety claims, and a final spot-check by Trenton Garza for any product that scores above 85 or below 30. Algorithms are fast; humans catch edge cases.

What are the limitations of the SuppVault Score?

The score has four known blind spots: it can underrate well-formulated proprietary blends, it can't distinguish between hybrid products that span two categories, certifications favor larger brands with bigger compliance budgets, and novel ingredients without published clinical trials default to the lowest evidence tier until research catches up. We publish these limitations openly.

Does the SuppVault Score replace medical advice?

No. The SuppVault Score rates how well a supplement is formulated and labelled — it does not tell you whether you personally should take it. For dosing decisions, drug interactions, or any health condition, ask a registered dietitian or physician. We are not a substitute for personalized medical advice.