Identifiers & Hashing

Identifiers are the foundation of BPP. They let the platform resolve user identities across multiple data sources and unify events and attributes into a single customer profile — the Bytek ID.

This page is the practical reference for which identifiers to provide, how BPP normalizes and hashes PII, and how identifiers are mapped. For the conceptual walkthrough of how matching and stitching work, see User Reconciliation.

How identifiers map to the Bytek ID

When you map your dataset in the Data Source Manager, each column that represents an identifier is declared with:

Identifier type — a canonical name (e.g. email, cookie_id, crm_contact_id).
PII flag — whether the column holds raw personal data.
Requires hashing — whether BPP should hash it on ingestion.

BPP matches identifiers by their type and value, not by column name. If the same identifier type appears in multiple tables, all of them are linked to the same Bytek ID — even when the underlying columns are named differently.

Example: a column named hashed_email in users_main and a column named hem in events_web can both be declared as identifier type email. BPP treats them as equivalent and joins users across the two tables.

table_name	field_name	identifier_type	is_pii
users_main	`hashed_email`	`email`	no
events_web	`hem`	`email`	no
events_web	`fp_cookie_id`	`cookie_id`	no

If a single event row contains both hem and fp_cookie_id, both are linked to the same Bytek ID.
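To make "by type and value, not by column name" concrete, here is a minimal sketch that models stitching as a union-find over (identifier type, value) pairs. The `COLUMN_TYPES` mapping, the sample rows (with shortened placeholder hashes), and the `stitch()` helper are hypothetical. They illustrate the idea and are not BPP's actual identity-resolution algorithm, which is described in User Reconciliation.

```python
# Hypothetical declarations mirroring the table above: (table, column) -> identifier type.
COLUMN_TYPES = {
    ("users_main", "hashed_email"): "email",
    ("events_web", "hem"): "email",
    ("events_web", "fp_cookie_id"): "cookie_id",
}


class UnionFind:
    """Disjoint-set forest over (identifier_type, value) nodes."""

    def __init__(self):
        self.parent = {}

    def find(self, node):
        self.parent.setdefault(node, node)
        while self.parent[node] != node:
            self.parent[node] = self.parent[self.parent[node]]  # path halving
            node = self.parent[node]
        return node

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def stitch(rows_by_table):
    """Link all identifiers that co-occur in a row; return node -> cluster root."""
    uf = UnionFind()
    for table, rows in rows_by_table.items():
        for row in rows:
            # Nodes are keyed by (type, value), so column names drop out here.
            nodes = [
                (COLUMN_TYPES[(table, col)], value)
                for col, value in row.items()
                if (table, col) in COLUMN_TYPES and value
            ]
            for node in nodes:
                uf.find(node)  # register identifiers that appear on their own
            for other in nodes[1:]:
                uf.union(nodes[0], other)
    return {node: uf.find(node) for node in list(uf.parent)}


# Usage: the email from users_main and the cookie from events_web share one cluster.
rows = {
    "users_main": [{"hashed_email": "a1b2"}],
    "events_web": [
        {"hem": "a1b2", "fp_cookie_id": "c9", "page": "/home"},
        {"fp_cookie_id": "zz", "page": "/pricing"},
    ],
}
clusters = stitch(rows)
print(clusters[("email", "a1b2")] == clusters[("cookie_id", "c9")])  # True
```

In this sketch the cluster root plays the role of the Bytek ID; the unmapped `page` column is ignored, and the cookie `zz`, which never co-occurs with another identifier, stays in its own cluster.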

:::warning User-level identifiers only
Map only user-level identifiers (email, phone, cookie, CRM contact ID). Entity-level IDs (subscription_id, account_id, order_id, crm_account_id) must not be declared as user identifiers.
:::
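If you keep a copy of your identifier declarations outside the UI (for example in version control), a simple lint can catch entity-level IDs before they are mapped. This is a hypothetical sketch: the `IdentifierColumn` record and `entity_level_mistakes()` are illustrative names, not part of BPP, and the blocklist only contains the field names listed in the warning above.

```python
from dataclasses import dataclass

# Entity-level fields named in the warning above; extend for your own schema.
ENTITY_LEVEL_FIELDS = {"subscription_id", "account_id", "order_id", "crm_account_id"}


@dataclass(frozen=True)
class IdentifierColumn:
    """Hypothetical record of one identifier declaration made in the Data Source Manager."""
    table_name: str
    field_name: str
    identifier_type: str
    is_pii: bool
    requires_hashing: bool


def entity_level_mistakes(columns):
    """Return 'table.field' for declarations whose column or type is an entity-level ID."""
    return [
        f"{c.table_name}.{c.field_name}"
        for c in columns
        if c.field_name in ENTITY_LEVEL_FIELDS or c.identifier_type in ENTITY_LEVEL_FIELDS
    ]


cols = [
    IdentifierColumn("users_main", "hashed_email", "email", False, False),
    IdentifierColumn("billing", "subscription_id", "crm_contact_id", False, False),
]
print(entity_level_mistakes(cols))  # ['billing.subscription_id']
```

A name-based check is only a safety net: a column can hold an entity-level ID under any name, so the declarations themselves still need review.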

Automatic PII normalization and hashing

For identifiers flagged as PII and marked Requires hashing, BPP normalizes and hashes the value during ingestion, so you can safely provide raw PII. BPP never stores plain-text PII.

If a value arrives already hashed at source, BPP detects this and skips re-hashing.

General process

Trim leading and trailing whitespace.
Lowercase all text.
Normalize provider-specific quirks (e.g. Gmail rules).
Hash with SHA-256 (hex, lowercase).
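The general process can be sketched as a single function that trims, lowercases, applies an optional type-specific normalizer, and hashes. The rule used to recognize values that are already hashed (a 64-character hex string, the shape of a SHA-256 digest) is an assumption for illustration; this page does not document how BPP actually detects pre-hashed values.

```python
import hashlib
import re

# Assumption: a value "looks hashed" if it is exactly 64 hex characters (SHA-256 hex).
_SHA256_HEX = re.compile(r"[0-9a-fA-F]{64}")


def looks_hashed(value: str) -> bool:
    return _SHA256_HEX.fullmatch(value.strip()) is not None


def sha256_hex(text: str) -> str:
    """SHA-256 of the UTF-8 bytes, as a lowercase hex string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def hash_identifier(value: str, normalize=lambda s: s) -> str:
    """Trim, lowercase, apply a type-specific normalizer, then hash.

    Values that already look like SHA-256 digests are lowercased and
    returned as-is instead of being hashed a second time.
    """
    cleaned = value.strip()
    if looks_hashed(cleaned):
        return cleaned.lower()
    return sha256_hex(normalize(cleaned.lower()))


digest = hash_identifier("  Foo@Example.com ")
print(hash_identifier(digest) == digest)  # True: already-hashed input is not re-hashed
```

The pass-through branch is what makes hashing idempotent. Its weakness is that a raw identifier which happens to be 64 hex characters would also be treated as hashed.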

Per-type rules

Email

Trim, lowercase.
For @gmail.com / @googlemail.com: remove dots (.) from the local part and drop +tag suffixes (e.g. John.Doe+promo@gmail.com → johndoe@gmail.com).
SHA-256 after normalization.
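You may provide either raw emails, which BPP normalizes and hashes automatically, or emails already hashed with SHA-256. Pre-hashed values cannot be normalized after the fact, so apply the rules above before hashing them yourself.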
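If you hash emails yourself, the rules above translate into a short normalizer. This is a minimal sketch assuming only the steps listed here; `normalize_email()` and `hash_email()` are illustrative names. Because the rules do not say the domain is rewritten, `googlemail.com` addresses keep their domain.

```python
import hashlib

GMAIL_DOMAINS = {"gmail.com", "googlemail.com"}


def normalize_email(raw: str) -> str:
    """Trim and lowercase; for Gmail addresses, drop dots and any +tag from the local part."""
    email = raw.strip().lower()
    local, sep, domain = email.rpartition("@")
    if not sep:
        raise ValueError(f"not an email address: {raw!r}")
    if domain in GMAIL_DOMAINS:
        local = local.split("+", 1)[0].replace(".", "")
    return f"{local}@{domain}"


def hash_email(raw: str) -> str:
    return hashlib.sha256(normalize_email(raw).encode("utf-8")).hexdigest()


print(normalize_email("  John.Doe+promo@gmail.com "))  # johndoe@gmail.com
print(normalize_email("John.Doe+promo@example.com"))   # john.doe+promo@example.com
```

Non-Gmail addresses keep their dots and `+tag`, since other providers may treat them as significant.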

Phone number

Remove spaces, parentheses, and dashes.
Convert to E.164 format (e.g. +14155552671).
SHA-256 after normalization.
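A minimal sketch of the phone rules, assuming input that already carries its country code with a leading `+`. Converting national numbers (without a country code) to E.164 requires knowing the country; this sketch rejects them rather than guessing, and a dedicated library such as `phonenumbers` is the usual way to resolve them. Including the leading `+` in the hashed string is also an assumption.

```python
import hashlib
import re

# E.164: '+', then a non-zero leading digit, at most 15 digits in total.
_E164 = re.compile(r"\+[1-9]\d{1,14}")


def normalize_phone(raw: str) -> str:
    """Remove spaces, parentheses and dashes, then check the result is E.164."""
    compact = re.sub(r"[\s()\-]", "", raw)
    if not _E164.fullmatch(compact):
        # Assumption: numbers without a country code are rejected, not guessed.
        raise ValueError(f"not an E.164 number after cleanup: {raw!r}")
    return compact


def hash_phone(raw: str) -> str:
    # Assumption: the leading '+' is part of the hashed string.
    return hashlib.sha256(normalize_phone(raw).encode("utf-8")).hexdigest()


print(normalize_phone("+1 (415) 555-2671"))  # +14155552671
```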

Name + surname

Lowercase, trim, remove diacritics.
Concatenate fields (e.g. john + doe) and apply SHA-256.
Typically used only for offline matching or identity enrichment.
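Diacritic removal is the least obvious step here. One common approach, sketched below, decomposes characters with Unicode NFKD normalization and drops the combining marks. Joining the two parts with no separator (`john` + `doe` → `johndoe`) is an assumption, since the rule does not specify a separator; the function names are illustrative.

```python
import hashlib
import unicodedata


def strip_diacritics(text: str) -> str:
    """Decompose (NFKD) and drop combining marks, e.g. 'josé' -> 'jose'."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def normalize_name_part(part: str) -> str:
    return strip_diacritics(part.strip().lower())


def hash_full_name(first: str, last: str) -> str:
    # Assumption: parts are concatenated with no separator.
    combined = normalize_name_part(first) + normalize_name_part(last)
    return hashlib.sha256(combined.encode("utf-8")).hexdigest()


print(normalize_name_part("  JÖSÉ "))  # jose
```

NFKD only removes marks that decompose; letters such as `ø` or `ß` pass through unchanged.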

Postal address

Lowercase, remove punctuation, standardize abbreviations (St → Street).
Concatenate into a single string and apply SHA-256.
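A minimal sketch of the address rules. The abbreviation table is hypothetical and deliberately tiny (real ones are country-specific and much larger), punctuation is taken to mean ASCII punctuation, and joining the normalized words with single spaces is an assumption.

```python
import hashlib
import string

# Hypothetical abbreviation table for illustration only.
ABBREVIATIONS = {"st": "street", "ave": "avenue", "rd": "road"}

# Translation table that deletes ASCII punctuation characters.
_DROP_PUNCTUATION = str.maketrans("", "", string.punctuation)


def normalize_address(*parts: str) -> str:
    """Lowercase, remove punctuation, expand abbreviations, join into one string."""
    words = []
    for part in parts:
        for word in part.lower().translate(_DROP_PUNCTUATION).split():
            words.append(ABBREVIATIONS.get(word, word))
    # Assumption: words are joined with single spaces before hashing.
    return " ".join(words)


def hash_address(*parts: str) -> str:
    return hashlib.sha256(normalize_address(*parts).encode("utf-8")).hexdigest()


print(normalize_address("123 Main St.", "Springfield"))  # 123 main street springfield
```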

Device identifiers (non-PII) — GA Client ID, GA4 user pseudo-ID, first-party cookie, mobile advertising ID (IDFA, GAID)

Already anonymized; no hashing required. Stored as-is for behavioural analysis and cross-device linking.

Common identifier types

Identifier type	Example field names	Description	PII	Pre-hashed
`hashed_email`	`hem`, `hashed_email`, `email_hash`	SHA-256 hashed user email	No	Yes
`email`	`email`, `user_email`	Raw user email (BPP will hash)	Yes	No
`hashed_phone`	`hphone`, `phone_hash`	SHA-256 hashed phone number	No	Yes
`phone`	`phone`, `mobile_number`	Raw phone number (BPP will hash)	Yes	No
`cookie_id`	`fp_cookie_id`, `ga_client_id`	First-party cookie / GA identifier	No	No
`device_id`	`idfa`, `gaid`, `device_id`	Mobile device / app identifier	No	No
`crm_contact_id`	`crm_contact_id`, `hubspot_vid`	CRM contact-level identifier	No	No
`domain_id`	`domain_id`	Domain-level ID for web identity	No	No

The Bytek ID

The Bytek ID is the system-generated, anonymized user key created during identity resolution.

Each unique person receives one Bytek ID.
All their sub-identifiers (email, phone, cookie, CRM contact ID, …) link to it.
When new identifiers appear, the identity graph is updated and the affected profiles are merged on the next daily run.
BPP writes the resolved key back to your warehouse as a bpp_user_id column on the enriched copy of each table.

This unified key enables consistent joins, aggregations, and model training across time, channels, and systems. See User Reconciliation for merge rules and coverage metrics.

Best practices

Include at least one stable user identifier in every user and event table.
Use consistent identifier types across datasets — the same concept must share the same identifier type everywhere.
Ensure user identifiers form the primary key of the user table: at least one always populated, no duplicates.
Provide multiple identifiers where possible to maximize match rates.
Never tag entity-level IDs (subscription_id, order_id, account_id) as user identifiers.
Hash PII consistently (correct casing, whitespace, Gmail normalization, E.164 phones) so your hashes match those computed by Google and Meta, or let BPP hash it by flagging the column as PII with Requires hashing enabled.
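The primary-key rule for the user table can be checked before you connect a dataset. A minimal sketch over rows represented as plain dicts; `check_user_table()` and the column names in the usage are hypothetical. Empty strings and `None` count as missing.

```python
from collections import Counter


def check_user_table(rows, id_columns):
    """Flag rows with no identifier populated and identifier values shared by several rows."""
    problems = []
    seen = Counter()
    for i, row in enumerate(rows):
        present = [(col, row[col]) for col in id_columns if row.get(col)]
        if not present:
            problems.append(f"row {i}: no user identifier populated")
        seen.update(present)
    for (col, value), count in seen.items():
        if count > 1:
            problems.append(f"{col}={value!r} appears in {count} rows")
    return problems


rows = [
    {"email": "a", "crm_contact_id": "1"},
    {"email": "a", "crm_contact_id": None},
    {"email": None, "crm_contact_id": ""},
]
for problem in check_user_table(rows, ["email", "crm_contact_id"]):
    print(problem)
```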

Summary

BPP performs identity resolution to unify users across sources.
Each user receives a unique Bytek ID, surfaced as a bpp_user_id column.
Normalization and hashing keep matching privacy-safe and deterministic.
Identifiers are mapped in the Data Source Manager UI — by type, PII flag, and hashing — not by column name.
