Part-Time Research Data Scientist
Description
We’re hiring a part-time Research Data Scientist to lead end-to-end preparation of complex, large-scale health datasets for peer-reviewed publication. This role centers on cleaning, harmonizing, and structuring messy, multi-source datasets, followed by advanced statistical analysis and machine learning to generate publishable insights.
You’ll work with survey, observational, and real-world health data, building reproducible analytical workflows that meet academic research standards. This role is best suited for a PhD-trained data scientist or quantitative researcher with deep experience in machine learning, advanced statistics, and real-world data analysis.
Key Responsibilities
Data Cleaning & Harmonization
Clean, normalize, and integrate messy datasets from multiple sources (e.g., survey data from longitudinal studies)
Resolve inconsistencies and schema mismatches across datasets
Design scalable approaches to dataset harmonization for cross-study comparability
Data Pipeline Development
Build and maintain reproducible data processing workflows for large-scale datasets
Structure datasets for downstream statistical modeling and publication-ready outputs
Implement version-controlled workflows for data processing and analysis
Statistical Analysis & Machine Learning
Apply advanced statistical methods (e.g., mixed-effects models, causal inference, longitudinal modeling)
Develop, validate, and interpret machine learning models for large-scale observational data as needed
Ensure methodological rigor aligned with peer-reviewed research standards
Research Collaboration
Partner with researchers to refine hypotheses, define analytic strategies, and interpret findings
Translate complex analyses into clear, defensible results for academic publication
Reproducibility & Publication Support
Develop reproducible codebases and documentation (e.g., notebooks, pipelines)
Prepare datasets, figures, and statistical outputs for manuscripts, abstracts, and reports
Contribute to methodological transparency and auditability of analyses
Write publication-ready technical text (e.g., Methods and Results sections of manuscripts); strong technical writing ability is required
Requirements
Qualifications
PhD (preferred) in Data Science, Statistics, Biostatistics, Epidemiology, Computer Science, Experimental Psychology, or a related quantitative field
3–5+ years of experience working with large, complex datasets in research, healthcare, or applied data science
Strong expertise in data cleaning, preprocessing, and dataset harmonization at scale
Advanced proficiency in Python or R (e.g., pandas, tidyverse, scikit-learn, statsmodels), or comparable experience with related software and programming tools
Deep experience with machine learning and advanced statistical methods
Strong foundation in reproducible research practices
Ability to communicate technical findings clearly to interdisciplinary teams and to collaborate with team members on high-quality publications
Preferred
Prior experience preparing analyses for peer-reviewed publication
Familiarity with survey data (Qualtrics, REDCap) and/or healthcare data standards (FHIR)
Background in public health, epidemiology, or biostatistics
Experience with causal inference, longitudinal analysis, or real-world evidence studies
Experience working with messy, real-world observational datasets across multiple sources
Familiarity with cloud or distributed data tools (AWS, GCP, or Spark)
Background in or familiarity with cannabinoid research