Social Scientist
Knowli Data Science, Tallahassee, FL
Florida State University, Tallahassee, FL
Code and Script Samples
Below are scripts I've developed that include data cleaning procedures and parsimonious methodologies that address different types of research questions.

Quasi-Experimental Analyses of the Florida Temporary Housing Pilot
The script linked below uses coarsened exact matching (CEM) to conduct quasi-experimental analyses on the effects of a housing pilot on Medicaid recipients experiencing housing insecurity.
Using administrative claims data and death records, the R script identifies potential confounders, measures their association with selection into the housing pilot, and determines ideal matches between the treatment and reference population. Most selection variables are matched exactly, but continuous variables are binned to identify similar levels of utilization across populations.
The matching procedure yields a match rate of 77%, and subsequent analyses assess treatment effects on outcomes such as ED visits, hospitalizations, and mortality.
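
As a rough illustration of the matching logic described above (not the linked R script itself), the Python sketch below shows the core idea of coarsened exact matching: continuous covariates are binned, each record gets a stratum key built from its exact and binned covariates, and only strata containing both treated and reference records are kept. The variable names (`sex`, `ed_visits`), cutpoints, and records are hypothetical and exist only for the example.

```python
from collections import defaultdict


def coarsen(value, cutpoints):
    """Return the index of the bin that `value` falls into."""
    return sum(value >= c for c in cutpoints)


def cem_match(records, exact_vars, binned_vars):
    """Keep only strata that contain both treated and reference records."""
    strata = defaultdict(list)
    for rec in records:
        # Stratum key = exact covariates + coarsened continuous covariates.
        key = tuple(rec[v] for v in exact_vars) + tuple(
            coarsen(rec[v], cuts) for v, cuts in binned_vars.items()
        )
        strata[key].append(rec)
    return {
        key: members
        for key, members in strata.items()
        if {m["treated"] for m in members} == {True, False}
    }


def match_rate(records, matched):
    """Share of treated records that landed in a matched stratum."""
    n_treated = sum(r["treated"] for r in records)
    n_matched = sum(r["treated"] for ms in matched.values() for r in ms)
    return n_matched / n_treated


# Hypothetical records; not real pilot data.
records = [
    {"id": 1, "treated": True, "sex": "F", "ed_visits": 2},
    {"id": 2, "treated": False, "sex": "F", "ed_visits": 3},
    {"id": 3, "treated": True, "sex": "M", "ed_visits": 7},
    {"id": 4, "treated": False, "sex": "M", "ed_visits": 1},
    {"id": 5, "treated": True, "sex": "F", "ed_visits": 0},
    {"id": 6, "treated": False, "sex": "F", "ed_visits": 0},
]
binned = {"ed_visits": [1, 4]}  # assumed bins: 0, 1-3, 4+
matched = cem_match(records, ["sex"], binned)
rate = match_rate(records, matched)
```

In this toy data, two of the three treated records find a reference record in the same stratum. Wider bins raise the match rate at the cost of less similar matches.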


Measuring Age-Adjusted Death and Causes of Death among Florida's Medicaid and Statewide Populations
Left-hand images are illustrative and do not reflect real data estimates.
The scripts below use Medicaid eligibility data, death records, and state and county population estimates to compare causes of death and age-adjusted death rates between the general population and Florida Medicaid recipients from 2012 to 2023.
These analyses require several steps. Script 1 (T-SQL) uses Social Security numbers to identify deceased Florida residents who had 3+ months of Medicaid eligibility within 12 months of their deaths. These death records are also joined with cause-of-death category codes to examine causes of death for the general population and Medicaid recipients by county, year, age, sex, race/ethnicity, and educational attainment. Script 2 (T-SQL) develops annual Medicaid population estimates at the state and county levels.
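
To make the eligibility rule concrete, here is a minimal Python sketch of the "3+ months within 12 months of death" check. It is not the T-SQL from Script 1. It assumes records are already linked to a person, that eligibility is stored as (year, month) pairs, and that the window is the 12 calendar months ending with the month of death. The function names are hypothetical.

```python
from datetime import date


def months_before(death, n=12):
    """Return the n (year, month) pairs ending with the month of death."""
    months = []
    year, month = death.year, death.month
    for _ in range(n):
        months.append((year, month))
        month -= 1
        if month == 0:
            year, month = year - 1, 12
    return months


def is_medicaid_decedent(eligible_months, death, min_months=3):
    """True if at least `min_months` eligible months fall in the window."""
    window = set(months_before(death))
    return len(window & set(eligible_months)) >= min_months


# Hypothetical example values.
death = date(2020, 2, 15)
elig_a = [(2019, 11), (2019, 12), (2020, 1)]           # 3 months in window
elig_b = [(2018, 1), (2018, 2), (2018, 3), (2020, 1)]  # 1 month in window
flag_a = is_medicaid_decedent(elig_a, death)
flag_b = is_medicaid_decedent(elig_b, death)
```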
Script 3 (R) uses death records, results from Script 2, and population estimates to calculate age-adjusted death rates for Medicaid recipients and for the general population. The general population is measured using American Community Survey 5-year estimates, and age adjustments use 2000 Census population estimates.
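
The age adjustment can be sketched as follows, assuming direct standardization (the description above does not name the method, so this is an assumption). Each age group's crude death rate is weighted by that group's share of a standard population, and the weighted rates are summed. The age groups, counts, and standard weights below are invented for illustration and are not real Census 2000 or Florida figures.

```python
def age_adjusted_rate(deaths, population, standard, per=100_000):
    """Directly standardized death rate per `per` people."""
    total_std = sum(standard.values())
    rate = 0.0
    for age_group, std_pop in standard.items():
        crude = deaths[age_group] / population[age_group]  # age-specific rate
        rate += crude * (std_pop / total_std)              # weight by standard share
    return rate * per


# Illustrative numbers only.
deaths = {"0-44": 10, "45-64": 40, "65+": 150}
population = {"0-44": 100_000, "45-64": 50_000, "65+": 25_000}
standard = {"0-44": 600, "45-64": 250, "65+": 150}

aar = age_adjusted_rate(deaths, population, standard)
```

Applying the same standard weights to both the Medicaid and general populations is what makes the two resulting rates comparable despite different age structures.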


Wrangling Qualitative Data to Assess Key Barriers to the Florida Temporary Housing Pilot
This script locates, cleans, and combines case management files to prepare them for content analysis. The script first installs and loads required packages. Next, it locates and stores folder and file names in order to iteratively download each case management update file and append it to a single data frame. After renaming columns, the script reformats date fields and then aggregates the data of interest. This final step consolidates repetitive comments to speed up analysis of the qualitative data. Distinct comments and their counts are exported to an Excel file.
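
The consolidation step can be sketched in Python as below. This is only an approximation of the idea, not the linked script. The normalization rule (lowercasing and collapsing whitespace) is an assumption, the comments are made up, and the output is written as CSV to an in-memory buffer as a stand-in for the Excel export.

```python
import csv
import io
from collections import Counter


def normalize(comment):
    """Lowercase and collapse whitespace so near-identical comments match."""
    return " ".join(comment.lower().split())


def count_distinct(comments):
    """Return (comment, count) pairs, most frequent first, skipping blanks."""
    counts = Counter(normalize(c) for c in comments if c and c.strip())
    return counts.most_common()


# Hypothetical case management comments.
comments = [
    "Client could not be reached",
    "client could not be reached ",
    "Awaiting landlord approval",
    "",
    "Client  could not be reached",
]
rows = count_distinct(comments)

# Write the distinct comments and their counts as CSV text.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["comment", "n"])
writer.writerows(rows)
```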


Wrangling Florida State and County Population Estimates
This script uses tidycensus to pull state and county population estimates from the Census Bureau. The script linked below selects American Community Survey estimates for Florida and for specific counties within it. In it, I show how to identify variables of interest, consolidate desired variables into vectors that can be fed into tidycensus snippets, and wrangle results into accessible data frames for applied analyses.
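
The final wrangling step can be illustrated with a small Python sketch. It does not call tidycensus. It assumes the results arrive in the long layout that tidycensus's `get_acs()` returns by default (one row per geography and variable, with `GEOID`, `NAME`, `variable`, and `estimate` columns) and pivots them to one row per geography. The readable column labels are hypothetical and the estimates are placeholders, not real ACS values.

```python
from collections import defaultdict


def pivot_wide(rows, labels):
    """Turn long (geography, variable) rows into one dict per geography."""
    wide = defaultdict(dict)
    for row in rows:
        label = labels.get(row["variable"])
        if label is None:
            continue  # skip variables we did not ask for
        wide[(row["GEOID"], row["NAME"])][label] = row["estimate"]
    return [
        {"GEOID": geoid, "NAME": name, **values}
        for (geoid, name), values in sorted(wide.items())
    ]


# ACS variable codes mapped to hypothetical readable column names.
labels = {"B01001_001": "total_pop", "B01001_002": "male_pop"}

# Placeholder estimates; not real ACS figures.
long_rows = [
    {"GEOID": "12", "NAME": "Florida", "variable": "B01001_001", "estimate": 1000},
    {"GEOID": "12", "NAME": "Florida", "variable": "B01001_002", "estimate": 490},
    {"GEOID": "12073", "NAME": "Leon County, Florida", "variable": "B01001_001", "estimate": 100},
    {"GEOID": "12073", "NAME": "Leon County, Florida", "variable": "B01001_002", "estimate": 48},
]
wide_rows = pivot_wide(long_rows, labels)
```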