Identifiers & Hashing

Identifiers are the foundation of BPP. They let the platform resolve user identities across multiple data sources and unify events and attributes into a single customer profile — the Bytek ID.

This page is the practical reference for which identifiers to provide, how BPP normalizes and hashes PII, and how identifiers are mapped. For the conceptual walkthrough of how matching and stitching work, see User Reconciliation.


How identifiers map to the Bytek ID

When you map your dataset in the Data Source Manager, each column that represents an identifier is declared with:

  • Identifier type — a canonical name (e.g. email, cookie_id, crm_contact_id).
  • PII flag — whether the column holds raw personal data.
  • Requires hashing — whether BPP should hash it on ingestion.

BPP matches identifiers by their type and value, not by column name. If the same identifier type appears in multiple tables, rows that carry the same value for that type are linked to the same Bytek ID, even when the underlying columns are named differently.

Example: a column named hashed_email in users_main and a column named hem in events_web can both be declared as identifier type email. BPP treats them as equivalent and joins users across the two tables.

| table_name | field_name | identifier_type | is_pii |
| --- | --- | --- | --- |
| users_main | hashed_email | email | no |
| events_web | hem | email | no |
| events_web | fp_cookie_id | cookie_id | no |

If a single event row contains both hem and fp_cookie_id, both are linked to the same Bytek ID.
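To make the linking rule concrete, here is a minimal sketch that groups identifiers with a union-find structure. It is not BPP's actual implementation: the sample rows, the short hash placeholders, and the helper names `find` and `union` are all hypothetical. It only illustrates that identifiers are keyed by (type, value) and that identifiers seen in the same row end up in one profile.

```python
from collections import defaultdict

# Hypothetical rows: each row lists the (identifier_type, value) pairs it carries.
rows = [
    [("email", "a1b2"), ("cookie_id", "ck-1")],  # events_web row: hem + fp_cookie_id
    [("email", "a1b2")],                         # users_main row: hashed_email
    [("cookie_id", "ck-2")],                     # unrelated visitor
]

parent = {}

def find(node):
    """Return the representative of the group that contains node."""
    parent.setdefault(node, node)
    while parent[node] != node:
        parent[node] = parent[parent[node]]  # path halving keeps lookups short
        node = parent[node]
    return node

def union(a, b):
    """Merge the groups that contain a and b."""
    root_a, root_b = find(a), find(b)
    if root_a != root_b:
        parent[root_b] = root_a

# Identifiers that appear in the same row belong to the same profile.
for row in rows:
    find(row[0])
    for other in row[1:]:
        union(row[0], other)

groups = defaultdict(set)
for node in list(parent):
    groups[find(node)].add(node)

# One sorted list of identifiers per resolved profile.
profiles = sorted(sorted(group) for group in groups.values())
```

The column names never enter the grouping: `hashed_email` and `hem` collapse into one node because both are declared as type `email` and hold the same value.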

:::warning User-level identifiers only
Map only user-level identifiers (email, phone, cookie, CRM contact ID). Entity-level IDs (subscription_id, account_id, order_id, crm_account_id) must not be declared as user identifiers.
:::


Automatic PII normalization and hashing

For identifiers flagged as PII that requires hashing, BPP normalizes and hashes the value during ingestion, so you can provide raw PII safely. BPP never stores plain-text PII.

If a value arrives already hashed at source, BPP detects this and skips re-hashing.
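This page does not describe how BPP recognizes a pre-hashed value. One common heuristic, shown here purely as an assumption, is to check whether the value is exactly 64 hexadecimal characters, which is the length of a SHA-256 hex digest. The function name `looks_sha256_hex` is hypothetical.

```python
import re

# A SHA-256 digest rendered as hex is exactly 64 characters from 0-9 and a-f.
SHA256_HEX = re.compile(r"[0-9a-f]{64}")

def looks_sha256_hex(value: str) -> bool:
    """Heuristic check: does the value look like a SHA-256 hex digest?"""
    return SHA256_HEX.fullmatch(value.strip().lower()) is not None
```

A check like this can help you audit your own columns before mapping them, for example to confirm that a column named hem really contains only hashed values.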

General process

  1. Trim leading and trailing whitespace.
  2. Lowercase all text.
  3. Normalize provider-specific quirks (e.g. Gmail rules).
  4. Hash with SHA-256 (hex, lowercase).
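The generic steps can be sketched in a few lines of Python using the standard library. The helper names are hypothetical and UTF-8 encoding is an assumption, since this page does not state the byte encoding. Step 3 (provider-specific rules) is left out here because it depends on the identifier type.

```python
import hashlib

def normalize_basic(value: str) -> str:
    """Steps 1 and 2: trim surrounding whitespace, then lowercase."""
    return value.strip().lower()

def sha256_hex(value: str) -> str:
    """Step 4: SHA-256 as lowercase hex (UTF-8 encoding is assumed)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()

digest = sha256_hex(normalize_basic("  User@Example.COM "))
```

Because normalization runs before hashing, differently formatted copies of the same value produce the same digest.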

Per-type rules

Email

  • Trim, lowercase.
  • For @gmail.com / @googlemail.com: remove dots (.) from the local part and drop +tag suffixes (e.g. John.Doe+promo@gmail.com → johndoe@gmail.com).
  • SHA-256 after normalization.
  • You may provide raw emails or pre-hashed (SHA-256) emails. BPP normalizes and hashes raw values automatically and skips re-hashing for values that arrive already hashed.
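If you hash emails yourself, the rules above could be reproduced roughly as follows. This is a sketch under assumptions: the function names are hypothetical, UTF-8 encoding is assumed, and the domain is left unchanged because this page does not say whether googlemail.com is rewritten to gmail.com.

```python
import hashlib

GMAIL_DOMAINS = {"gmail.com", "googlemail.com"}

def normalize_email(raw: str) -> str:
    """Trim, lowercase, and apply the Gmail-specific local-part rules."""
    email = raw.strip().lower()
    local, sep, domain = email.rpartition("@")
    if not sep:
        return email  # no "@" found: returned unchanged, validation is out of scope
    if domain in GMAIL_DOMAINS:
        # Drop the +tag suffix first, then remove dots from the local part.
        local = local.split("+", 1)[0].replace(".", "")
    return f"{local}@{domain}"

def hash_email(raw: str) -> str:
    """SHA-256 hex digest of the normalized email (UTF-8 assumed)."""
    return hashlib.sha256(normalize_email(raw).encode("utf-8")).hexdigest()
```

For example, `normalize_email("John.Doe+promo@gmail.com")` returns `johndoe@gmail.com`, while dots and tags in non-Gmail addresses are preserved.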

Phone number

  • Remove spaces, parentheses, and dashes.
  • Convert to E.164 format (e.g. +14155552671).
  • SHA-256 after normalization.
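A simplified sketch of the phone rules is shown below. Real E.164 conversion depends on national numbering plans; here the `default_country_code` parameter is a hypothetical stand-in that is prepended whenever the input has no leading +. The function names and UTF-8 encoding are likewise assumptions.

```python
import hashlib
import re

def normalize_phone(raw: str, default_country_code: str = "1") -> str:
    """Strip spaces, parentheses, and dashes, then return an E.164-style string."""
    cleaned = re.sub(r"[\s()\-]", "", raw)
    if cleaned.startswith("+"):
        digits = cleaned[1:]
    else:
        # Assumption: numbers without "+" belong to the default country.
        digits = default_country_code + cleaned
    # E.164 allows at most 15 digits and the country code never starts with 0.
    if not re.fullmatch(r"[1-9]\d{1,14}", digits):
        raise ValueError(f"cannot convert {raw!r} to E.164")
    return "+" + digits

def hash_phone(raw: str, default_country_code: str = "1") -> str:
    """SHA-256 hex digest of the normalized phone number (UTF-8 assumed)."""
    normalized = normalize_phone(raw, default_country_code)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()
```

With the default country code, `normalize_phone("(415) 555-2671")` returns `+14155552671`. For production use, a dedicated phone-parsing library is a safer choice than this heuristic.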

Name + surname

  • Lowercase, trim, remove diacritics.
  • Concatenate fields (e.g. john + doe) and apply SHA-256.
  • Typically used only for offline match or identity enrichment.
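The name rules could be sketched as follows. The page does not specify whether a separator is placed between the two fields, so this example assumes plain concatenation with none. The function names are hypothetical and UTF-8 encoding is assumed.

```python
import hashlib
import unicodedata

def strip_diacritics(text: str) -> str:
    """Decompose accented characters and drop the combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def normalize_name(first: str, last: str) -> str:
    """Trim, lowercase, remove diacritics, and concatenate (no separator assumed)."""
    return "".join(strip_diacritics(part.strip().lower()) for part in (first, last))

def hash_name(first: str, last: str) -> str:
    """SHA-256 hex digest of the normalized full name (UTF-8 assumed)."""
    return hashlib.sha256(normalize_name(first, last).encode("utf-8")).hexdigest()
```

For example, `normalize_name(" José ", "Núñez")` returns `josenunez`.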

Postal address

  • Lowercase, remove punctuation, standardize abbreviations (St → Street).
  • Concatenate into a single string and apply SHA-256.
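A rough sketch of the address rules follows. The abbreviation table is illustrative only, not BPP's actual list, and the single-space join between fields is an assumption, as are the function names and UTF-8 encoding.

```python
import hashlib
import re

# Illustrative subset only; the abbreviation list BPP applies is not documented here.
ABBREVIATIONS = {"st": "street", "ave": "avenue", "rd": "road"}

def normalize_address(*parts: str) -> str:
    """Lowercase, strip punctuation, expand abbreviations, join with single spaces."""
    text = " ".join(parts).lower()
    text = re.sub(r"[^\w\s]", "", text)  # drop punctuation such as "." and ","
    words = [ABBREVIATIONS.get(word, word) for word in text.split()]
    return " ".join(words)

def hash_address(*parts: str) -> str:
    """SHA-256 hex digest of the normalized address (UTF-8 assumed)."""
    return hashlib.sha256(normalize_address(*parts).encode("utf-8")).hexdigest()
```

For example, `normalize_address("12 Main St.", "Springfield, IL")` returns `12 main street springfield il`. A word-by-word lookup like this cannot tell "St" (Street) from "St" (Saint), so real address standardization usually needs more context.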

Device identifiers (non-PII) — GA Client ID, GA4 user pseudo-ID, first-party cookie, mobile advertising ID (IDFA, GAID)

  • Already anonymized; no hashing required. Stored as-is for behavioural analysis and cross-device linking.

Common identifier types

| Identifier type | Example field names | Description | PII | Hashed |
| --- | --- | --- | --- | --- |
| hashed_email | hem, hashed_email, email_hash | SHA-256 hashed user email | No | Yes |
| email | email, user_email | Raw user email (BPP will hash) | Yes | No |
| hashed_phone | hphone, phone_hash | SHA-256 hashed phone number | No | Yes |
| phone | phone, mobile_number | Raw phone number (BPP will hash) | Yes | No |
| cookie_id | fp_cookie_id, ga_client_id | First-party cookie / GA identifier | No | No |
| device_id | idfa, gaid, device_id | Mobile device / app identifier | No | No |
| crm_contact_id | crm_contact_id, hubspot_vid | CRM contact-level identifier | No | No |
| domain_id | domain_id | Domain-level ID for web identity | No | No |

The Bytek ID

The Bytek ID is the system-generated, anonymized user key created during identity resolution.

  • Each unique person receives one Bytek ID.
  • All their sub-identifiers (email, phone, cookie, CRM contact ID, …) link to it.
  • When new identifiers appear, the identity graph is updated and merged on the next daily run.
  • BPP writes the resolved key back to your warehouse as a bpp_user_id column on the enriched copy of each table.

This unified key enables consistent joins, aggregations, and model training across time, channels, and systems. See User Reconciliation for merge rules and coverage metrics.
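As a small illustration of joining on the resolved key, the sketch below counts events per user across two enriched tables. The rows, the `plan` and `event` columns, and the ID values are made up for the example. Only the bpp_user_id column name comes from this page.

```python
from collections import Counter

# Hypothetical enriched rows; in practice these would be read from your warehouse.
users_main = [
    {"bpp_user_id": "u1", "plan": "pro"},
    {"bpp_user_id": "u2", "plan": "free"},
]
events_web = [
    {"bpp_user_id": "u1", "event": "page_view"},
    {"bpp_user_id": "u1", "event": "purchase"},
    {"bpp_user_id": "u2", "event": "page_view"},
]

# Aggregate events by the shared key, then attach the counts to each user row.
events_per_user = Counter(row["bpp_user_id"] for row in events_web)
report = [
    {**user, "event_count": events_per_user.get(user["bpp_user_id"], 0)}
    for user in users_main
]
```

Because both tables carry the same key, no per-source identifier logic is needed at query time.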


Best practices

  • Include at least one stable user identifier in every user and event table.
  • Use consistent identifier types across datasets — the same concept must share the same identifier type everywhere.
  • Ensure user identifiers form the primary key of the user table: at least one always populated, no duplicates.
  • Provide multiple identifiers where possible to maximize match rates.
  • Never tag entity-level IDs (subscription_id, order_id, account_id) as user identifiers.
  • If you hash PII yourself, normalize it first (lowercase, trimmed whitespace, Gmail rules, E.164 phones) so your hashes match the ones Google and Meta expect. Alternatively, flag the column as PII that requires hashing and let BPP do it.

Summary

  • BPP performs identity resolution to unify users across sources.
  • Each user receives a unique Bytek ID, surfaced as a bpp_user_id column.
  • Normalization and hashing keep matching privacy-safe and deterministic.
  • Identifiers are mapped in the Data Source Manager UI — by type, PII flag, and hashing — not by column name.