Sabina’s Stats Corner: Improving Statistical Inference of a Self-Selected Sample

Special Notes:

Listen to our L&L lectures online: WHRI Lunch & Learn Series – Women’s Health Research Institute

Visit our Stats Corner in the e-blast for previously published tips on data management and analysis: E-Blast Archive – Women’s Health Research Institute


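When a sample is not drawn by probability sampling and individuals instead choose whether to join it, self-selection bias can arise. This is a known limitation of self-selection sampling, a non-probability sampling method in which individuals choose to take part (for example, by volunteering or answering an open invitation) rather than being selected by the researcher. The main issues with non-probability sampling are: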

  1. The sample does not necessarily represent the target population.
  2. The mechanism that determines who is included in, or chooses to participate in, the sample is unknown.
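To make the problem concrete, here is a small Python simulation. It is a toy sketch: the outcome (a hypothetical wellbeing score) and the participation rule (higher scorers are more likely to opt in) are invented for illustration. In a real study we would not know this participation rule, which is exactly issue 2 above.

```python
import math
import random


def simulate_self_selection(n=20_000, seed=1):
    """Toy simulation: compare a population mean with the mean of a self-selected sample."""
    rng = random.Random(seed)
    # Hypothetical outcome (e.g., a wellbeing score) for everyone in the target population.
    population = [rng.gauss(50.0, 10.0) for _ in range(n)]
    # Hypothetical participation mechanism: higher scorers are more likely to opt in.
    sample = [
        y for y in population
        if rng.random() < 1.0 / (1.0 + math.exp(-(y - 55.0) / 5.0))
    ]
    pop_mean = sum(population) / len(population)
    sample_mean = sum(sample) / len(sample)
    return pop_mean, sample_mean, len(sample)


pop_mean, sample_mean, n_sample = simulate_self_selection()
print(f"population mean {pop_mean:.1f}, self-selected sample mean {sample_mean:.1f} (n = {n_sample})")
```

The self-selected mean lands well above the population mean, and collecting a larger volunteer sample would not fix this: the bias comes from who opts in, not from how many.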

Adjustment Approaches

Several adjustment approaches can reduce self-selection bias in nonrandom samples:

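  • Inverse Propensity-Score Weighting (IPSW): Uses a probability sample from the target population as a reference. The two samples are combined, each unit's propensity of belonging to the non-probability sample is modelled from covariates measured in both, and pseudo-weights for the non-probability sample are derived from the inverse of that estimated propensity.
  • Propensity-Score Adjustment by Sub-Classification (PSAS): Uses the same propensity scores as IPSW, but groups units into subclasses (e.g., quintiles) of the propensity score and assigns pseudo-weights within each subclass rather than unit by unit.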
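The two approaches can be sketched in Python under some simplifying assumptions: a single covariate x observed in both samples, a reference survey with known design weights, a logistic propensity model fitted by plain gradient ascent, the inverse-odds pseudo-weight (1 − p)/p (one common IPSW formulation; others use 1/p), and five PSAS subclasses. The data, the population size, and all function names are hypothetical; in practice you would fit the propensity model with a proper survey-weighted regression routine.

```python
import math
import random


def fit_weighted_logistic(xs, ys, ws, lr=1.0, n_iter=2000):
    """Weighted logistic regression P(y = 1 | x) = sigmoid(b0 + b1 * x),
    fitted by plain gradient ascent on the weighted log-likelihood (toy-sized data only)."""
    total_w = sum(ws)
    w1 = sum(w for y, w in zip(ys, ws) if y == 1)
    b0, b1 = math.log(w1 / (total_w - w1)), 0.0  # start at the weighted base rate
    for _ in range(n_iter):
        g0 = g1 = 0.0
        for x, y, w in zip(xs, ys, ws):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            g0 += w * (y - p)
            g1 += w * (y - p) * x
        b0 += lr * g0 / total_w
        b1 += lr * g1 / total_w
    return b0, b1


def ipsw_and_psas(cohort_x, survey_x, survey_w, n_strata=5):
    # Stack the samples: R = 1 for the self-selected cohort (weight 1),
    # R = 0 for the reference survey (weighted by its design weights).
    b0, b1 = fit_weighted_logistic(
        cohort_x + survey_x,
        [1] * len(cohort_x) + [0] * len(survey_x),
        [1.0] * len(cohort_x) + list(survey_w),
    )

    def propensity(x):
        return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

    # IPSW: each cohort unit's pseudo-weight is the inverse odds of cohort membership.
    ipsw = [(1.0 - propensity(x)) / propensity(x) for x in cohort_x]

    # PSAS: subclasses cut at cohort propensity quantiles; within a subclass every
    # cohort unit gets (weighted survey count) / (cohort count).
    ranked = sorted(propensity(x) for x in cohort_x)
    cuts = [ranked[len(ranked) * k // n_strata] for k in range(1, n_strata)]

    def subclass(x):
        return sum(propensity(x) >= c for c in cuts)

    survey_total = [0.0] * n_strata
    cohort_count = [0] * n_strata
    for x, d in zip(survey_x, survey_w):
        survey_total[subclass(x)] += d
    for x in cohort_x:
        cohort_count[subclass(x)] += 1
    psas = [survey_total[subclass(x)] / cohort_count[subclass(x)] for x in cohort_x]
    return ipsw, psas


def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


rng = random.Random(2024)
N_POP = 10_000  # hypothetical population size represented by the reference survey
cohort_x = [rng.gauss(0.8, 1.0) for _ in range(200)]  # self-selected: shifted upward
survey_x = [rng.gauss(0.0, 1.0) for _ in range(300)]  # probability sample, true mean 0
survey_w = [N_POP / 300] * 300
ipsw, psas = ipsw_and_psas(cohort_x, survey_x, survey_w)

naive_mean = sum(cohort_x) / len(cohort_x)
print(f"naive {naive_mean:.2f}, IPSW {weighted_mean(cohort_x, ipsw):.2f}, "
      f"PSAS {weighted_mean(cohort_x, psas):.2f}")
```

Both weighted means should move from the naive value (around 0.8) toward the population value of 0. By construction, the PSAS pseudo-weights add up to the survey's weighted total; with only five subclasses, PSAS typically leaves a little more residual bias than IPSW but produces less variable weights.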

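Newer nonparametric and machine-learning approaches, such as kernel weighting (KW), have recently been proposed to improve on IPSW and PSAS. KW distributes each reference-sample unit's weight across the non-probability sample units in proportion to how similar their propensity scores are.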
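The weight-sharing idea behind kernel weighting can be sketched in a few lines of Python. This is an illustration under assumptions, not a reference implementation: it takes already-estimated propensity scores (e.g., on the logit scale) as input, uses a Gaussian kernel, and picks the bandwidth with a normal-reference rule of thumb. The kernel, the bandwidth rule, the scores, and the weights are all assumed here for illustration.

```python
import math
from statistics import stdev


def gaussian_kernel(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kernel_weights(cohort_scores, survey_scores, survey_weights, bandwidth=None):
    """Kernel-weighting pseudo-weights for a non-probability (cohort) sample.

    Each reference-survey unit j spreads its design weight d_j over the cohort
    units in proportion to a kernel of the distance between their propensity
    scores, so the pseudo-weights sum to sum(d_j).
    """
    if bandwidth is None:
        # Assumption: normal-reference rule of thumb on the pooled scores.
        pooled = list(cohort_scores) + list(survey_scores)
        bandwidth = 1.06 * stdev(pooled) * len(pooled) ** (-1 / 5)
    pseudo = [0.0] * len(cohort_scores)
    for s_j, d_j in zip(survey_scores, survey_weights):
        k = [gaussian_kernel((s_i - s_j) / bandwidth) for s_i in cohort_scores]
        total = sum(k)
        if total == 0.0:
            continue  # no cohort unit is close enough; this survey weight is dropped
        for i, k_ij in enumerate(k):
            pseudo[i] += d_j * k_ij / total
    return pseudo


# Hypothetical logit-propensity scores and survey design weights.
cohort_scores = [-2.0, -1.0, 0.0, 1.0, 2.0]
survey_scores = [-1.5, -0.5, 0.5]
survey_weights = [100.0, 200.0, 300.0]
kw = kernel_weights(cohort_scores, survey_scores, survey_weights)
print([round(w, 1) for w in kw], round(sum(kw), 1))
```

Cohort members whose scores sit close to many survey units receive more weight, while those in sparsely represented regions receive less. Unlike the inverse-odds weights of IPSW, no single unit's weight can explode because of a propensity near zero.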

Application in the HER-BC Study

In the HER-BC study, a real-world setting, we plan to evaluate the methods described above as sensitivity analyses and to re-weight the self-selected sample appropriately to improve inference.

Data Sources and Re-Weighting

We propose using the following data sources for re-weighting:

  • Statistics Canada Census 2021: For re-weighting selected demographic characteristics.
  • Canadian Labour Force Survey: For re-weighting labour-related characteristics.
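One standard way to re-weight a sample to published population totals such as census margins is raking (iterative proportional fitting). The Python sketch below assumes two hypothetical variables (age group and region) and uses invented target totals that are NOT real Census 2021 figures. The actual variables and categories used for HER-BC would come from the census and Labour Force Survey tables.

```python
def rake(records, margins, base_weights=None, max_iter=1000, tol=1e-10):
    """Iterative proportional fitting (raking) of weights to known marginal totals.

    records: list of dicts holding each participant's categories.
    margins: {variable: {category: population_total}}; every category seen in
             the sample needs a target (categories with no sample members cannot be matched).
    """
    weights = list(base_weights) if base_weights is not None else [1.0] * len(records)
    for _ in range(max_iter):
        largest_change = 0.0
        for var, targets in margins.items():
            # Current weighted total in each category of this variable.
            current = {}
            for rec, w in zip(records, weights):
                current[rec[var]] = current.get(rec[var], 0.0) + w
            # Scale weights so this variable's weighted totals match its targets.
            for i, rec in enumerate(records):
                factor = targets[rec[var]] / current[rec[var]]
                weights[i] *= factor
                largest_change = max(largest_change, abs(factor - 1.0))
        if largest_change < tol:
            break
    return weights


# Hypothetical sample and invented margins (NOT real Census 2021 figures).
sample = (
    [{"age": "18-44", "region": "urban"}] * 4 + [{"age": "18-44", "region": "rural"}] * 1
    + [{"age": "45-64", "region": "urban"}] * 3 + [{"age": "45-64", "region": "rural"}] * 2
    + [{"age": "65+", "region": "urban"}] * 3 + [{"age": "65+", "region": "rural"}] * 2
)
margins = {
    "age": {"18-44": 500, "45-64": 300, "65+": 200},
    "region": {"urban": 700, "rural": 300},
}
weights = rake(sample, margins)
print([round(w, 1) for w in weights])
```

Raking matches each margin one variable at a time and cycles until the weights stop changing. Because it needs only marginal totals, not a full cross-tabulation, it suits published census tables. The pseudo-weights from IPSW or kernel weighting could also be passed in as `base_weights` and then raked.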


Because individual-level data linkage is not possible, the smallest available geographic (GIS) units will be used for matching if issues arise. In addition, the British Columbia Index of Multiple Deprivation (BCIMD) will be used for model adjustment. The BCIMD provides area-based measures of multiple deprivation for examining social inequities across the 218 Community Health Service Areas (CHSAs) in British Columbia, and it comprises four dimensions:

  1. Residential instability
  2. Economic dependency
  3. Ethno-cultural composition
  4. Situational vulnerability
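Because the BCIMD is area-based, it enters the analysis by linking each participant's CHSA to that area's deprivation scores, which can then serve as covariates in a propensity or outcome model. The Python sketch below assumes each dimension is available as a quintile per CHSA. The column names, CHSA codes, and values are all invented, and the real BCIMD file layout may differ.

```python
# Hypothetical column names for the four BCIMD dimensions.
DIMENSIONS = (
    "residential_instability",
    "economic_dependency",
    "ethnocultural_composition",
    "situational_vulnerability",
)


def attach_bcimd(participants, bcimd_by_chsa):
    """Attach area-level BCIMD scores to each participant via their CHSA code.

    Returns the linked records plus the IDs of participants whose CHSA could not
    be found, so unmatched cases can be reviewed rather than dropped silently.
    """
    linked, unmatched = [], []
    for person in participants:
        area = bcimd_by_chsa.get(person["chsa"])
        if area is None:
            unmatched.append(person["id"])
            continue
        linked.append({**person, **{dim: area[dim] for dim in DIMENSIONS}})
    return linked, unmatched


# Invented CHSA codes and quintile values, for illustration only.
bcimd_by_chsa = {
    "CHSA_001": dict(zip(DIMENSIONS, (2, 4, 1, 3))),
    "CHSA_002": dict(zip(DIMENSIONS, (5, 3, 2, 4))),
}
participants = [
    {"id": "P1", "chsa": "CHSA_001", "age": 34},
    {"id": "P2", "chsa": "CHSA_002", "age": 51},
    {"id": "P3", "chsa": "CHSA_999", "age": 47},
]
linked, unmatched = attach_bcimd(participants, bcimd_by_chsa)
# Covariate rows for a propensity or outcome model: age plus the four BCIMD dimensions.
design = [[p["age"], *(p[dim] for dim in DIMENSIONS)] for p in linked]
print(design, unmatched)
```

Everyone in the same CHSA shares the same BCIMD values, so these covariates describe the area rather than the individual. It is worth keeping that distinction in mind when interpreting their coefficients.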

Good luck with your Statistics adventure!

Contact Sabina for statistics help or questions here: