Interest — Reference

The Interest model identifies the most relevant interests for each customer by analysing labelled events in your data warehouse. It is a descriptive model and supports three interest types:

  • Product Interest — the interest label is already present in the event row (e.g. a product category). No classification step needed.
  • IAB Interest — requires a Topic Classification step that scrapes and classifies page content into standard IAB categories.
  • Custom Interest — also uses Topic Classification, but against a custom taxonomy defined for your business.

Input data: see AI Model Data Requirements → Interest for the event-table fields each interest type needs.


How it works

  1. Input data — reads from an event table. Each row must include at least:

    • a user identifier (user_id_column, typically bpp_user_id),
    • an event timestamp (date_column),
    • the interest field (interest_column) — either the interest label (Product Interest) or the page URL (IAB / Custom Interest).
  2. Interest column

    • Product Interest — the interest is read directly from interest_column, e.g. "Shoes||Sportswear||Running" (hierarchical labels joined with ||).
    • IAB / Custom Interest — interest_column holds the page URL. The Topic Classification pipeline scrapes and classifies the page, writing the label before analysis.
  3. Aggregation — interests are grouped and counted per user over a configurable lookback window (lookback_days).

  4. Statistical scoring — for each interest, the population mean and standard deviation of its frequency are computed. Each user–interest pair gets a z-score:

    z = (count - mean) / std
  5. Threshold filtering — only user–interest pairs with a z-score above threshold are kept. A higher threshold keeps only interests that a user engages with significantly more often than the population average for that interest.

  6. Top-N selection — the top interests per user are retained.

  7. Output — results are written to a dedicated BigQuery table, one row per user with their selected interests as a JSON list.
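
Steps 3 to 6 can be sketched in plain Python. This is an illustrative sketch, not the platform's implementation. The function name score_interests, the top_n parameter and its value, the inclusive window boundary, the use of the population standard deviation, computing the mean only over users who have at least one event for the interest, and skipping interests with zero variance are all assumptions the reference does not specify.

```python
from collections import Counter, defaultdict
from datetime import date, timedelta
from statistics import mean, pstdev


def score_interests(events, today, lookback_days=7, threshold=0.7, top_n=3):
    """Return {user_id: [interest, ...]} from (user_id, event_date, interest) tuples.

    top_n is a hypothetical parameter; the reference does not say how many
    interests are retained per user.
    """
    cutoff = today - timedelta(days=lookback_days)

    # Step 3: count events per (user, interest) inside the lookback window.
    counts = Counter(
        (user, interest)
        for user, day, interest in events
        if day >= cutoff  # assumption: the window boundary is inclusive
    )

    # Step 4: mean and standard deviation of the counts for each interest.
    # Assumption: only users with at least one event for the interest count.
    per_interest = defaultdict(list)
    for (_user, interest), n in counts.items():
        per_interest[interest].append(n)
    stats = {i: (mean(v), pstdev(v)) for i, v in per_interest.items()}

    # Step 5: keep user-interest pairs whose z-score exceeds the threshold.
    scored = defaultdict(list)
    for (user, interest), n in counts.items():
        mu, sigma = stats[interest]
        if sigma == 0:
            continue  # assumption: z is undefined here, so the pair is skipped
        z = (n - mu) / sigma
        if z > threshold:
            scored[user].append((z, interest))

    # Step 6: retain the top_n interests per user, highest z-score first.
    return {
        user: [interest for _z, interest in sorted(pairs, reverse=True)[:top_n]]
        for user, pairs in scored.items()
    }
```

Because the z-score compares a user's count with other users' counts for the same interest, an interest that every user views equally often never passes the filter, however frequent it is.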


JSON configuration reference

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| data_src.region | STRING | | | Cloud region of the dataset (auto-populated by the platform) |
| data_src.project_id | STRING | | | GCP project ID (auto-populated) |
| data_src.dataset_id | STRING | | | BigQuery dataset (auto-populated) |
| data_src.source_table_id | STRING | | | Input event table ID (a reconciled *_bpp table) |
| data_src.date_column | STRING | | "event_date" | Column with the event timestamp |
| data_src.user_id_column | STRING | | | User identifier column (typically bpp_user_id) |
| interest_column | STRING | | | Field containing the interest label (Product) or page URL (IAB/Custom) |
| lookback_days | INT | | 7 | Number of days of event history considered |
| threshold | FLOAT | | 0.7 | z-score threshold for selecting interests |

Example configuration

{
  "data_src": {
    "region": "europe-west8",
    "dataset_id": "bpp_tables",
    "project_id": "example-project",
    "date_column": "event_date",
    "user_id_column": "bpp_user_id",
    "source_table_id": "event_pageview_bpp"
  },
  "threshold": -0.5,
  "lookback_days": 60,
  "interest_column": "page_title"
}
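
Before submitting a configuration like this, it can help to check it locally. The sketch below fills in the documented defaults (date_column "event_date", lookback_days 7, threshold 0.7) and reports missing fields. The helper name load_interest_config is hypothetical, and treating every field without a documented default as required is an assumption, since the platform may auto-populate some of them.

```python
import json

# Defaults taken from the configuration reference table.
TOP_LEVEL_DEFAULTS = {"lookback_days": 7, "threshold": 0.7}
DATA_SRC_DEFAULTS = {"date_column": "event_date"}

# Assumption: every data_src field listed in the reference must be present
# after the defaults are applied.
DATA_SRC_FIELDS = (
    "region",
    "project_id",
    "dataset_id",
    "source_table_id",
    "date_column",
    "user_id_column",
)


def load_interest_config(raw):
    """Parse a JSON config string, apply defaults, and run basic checks."""
    cfg = json.loads(raw)
    data_src = {**DATA_SRC_DEFAULTS, **cfg.get("data_src", {})}
    merged = {**TOP_LEVEL_DEFAULTS, **cfg, "data_src": data_src}

    missing = [f"data_src.{k}" for k in DATA_SRC_FIELDS if k not in data_src]
    if "interest_column" not in merged:
        missing.append("interest_column")
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))

    if merged["lookback_days"] <= 0:
        raise ValueError("lookback_days must be a positive number of days")
    return merged
```

A negative threshold, as in the example above, passes these checks: a z-score below zero only means the user's count is below the population mean for that interest.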

Output structure

| Field | Type | Description |
|---|---|---|
| bpp_user_id | INT64 | Unified Bytek ID of the user |
| taxonomy_name | STRING | Interest classification type: product, iab, or custom |
| value | JSON | List of the user's selected interests |
| created_at | DATETIME | Processing timestamp |

These fields are available in the audience builder once the model completes — see Predictions → Interest.
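
Outside the audience builder, rows exported from the output table can be filtered in a few lines of Python. This is a minimal sketch that assumes each row is a dict keyed by the field names above and that value is a JSON array of label strings. The exact element shape is not specified in this reference, and users_with_interest is a hypothetical helper.

```python
import json


def users_with_interest(rows, interest, taxonomy_name=None):
    """Return the bpp_user_id of every row whose interest list contains `interest`."""
    matched = []
    for row in rows:
        if taxonomy_name is not None and row["taxonomy_name"] != taxonomy_name:
            continue
        value = row["value"]
        # Some clients return JSON columns as text; decode only when needed.
        labels = json.loads(value) if isinstance(value, str) else value
        if interest in labels:
            matched.append(row["bpp_user_id"])
    return matched
```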


Use cases

  • Segmentation — build audiences of users with strong affinity for certain products, categories, or topics.
  • Recommendations — personalise product or content recommendations from dominant interests.
  • Enrichment — feed interest profiles into your CRM, CDP, or ad platforms for better targeting.

Best practices

  • Product Interest — ensure the event table has a clean, hierarchical interest_column (use || for multi-level categories).
  • IAB / Custom Interest — confirm Topic Classification has run before Interest analysis, and that interest_column holds absolute, crawlable URLs (including https://).
  • Threshold — lower values are more inclusive (broader interest sets); higher values are stricter (only the strongest interests).
  • Lookback — use longer windows (60–90 days) for long consideration cycles, shorter (7–14 days) for fast-moving products.
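
The two input checks above can be sketched as small helpers to run over a sample of interest_column values before the model is scheduled. Both function names are hypothetical. The URL check only confirms that a value is absolute (http or https scheme plus a host); whether the page is actually crawlable can only be established by fetching it.

```python
from urllib.parse import urlsplit


def is_absolute_url(value):
    """True when the value has an http(s) scheme and a host part."""
    parts = urlsplit(value.strip())
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def split_label(label, sep="||"):
    """Split a hierarchical Product Interest label into its non-empty levels."""
    return [level.strip() for level in label.split(sep) if level.strip()]
```

For Product Interest, a label that yields an empty list or a single unexpected level usually points at a missing or inconsistent separator in the source data.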