SNFsim: A Discrete Event Simulator for Decision Support in Skilled Nursing Facilities

  1. Caroline Strickland  Is a corresponding author
  2. Brittin Wagner
  3. Stanley Wang
  4. Daniel J Lizotte
  1. Department of Computer Science, Canada
  2. PointClickCare, Canada
  3. Department of Epidemiology and Biostatistics, Canada
Research article
Cite this article as: C. Strickland, B. Wagner, S. Wang, D. J Lizotte; 2026; SNFsim: A Discrete Event Simulator for Decision Support in Skilled Nursing Facilities; International Journal of Microsimulation; 19(1); 79-112. doi: 10.34196/ijm.00337

Abstract

We introduce SNFsim, an open-source discrete-event simulator for developing and evaluating reinforcement learning (RL) methods for multi-dimensional sequential decision support in Skilled Nursing Facilities (SNFs). SNFs play a vital role in the United States healthcare system, delivering specialized care to individuals with ongoing medical needs. Decision-making within SNFs is often complex due to their fast-paced and stochastic nature. SNFsim provides a modular and extendable simulation of major decision-making processes within SNFs, capturing many of the complexities and uncertainties existing in healthcare environments while still being flexible enough to allow for easy customization. Its potential uses are two-fold: First, as a test bed for the development and comparison of RL algorithms, and second, as the basis of a decision-support system that can be tailored to individual SNFs.

1 Introduction

Reinforcement learning (RL) methods have been central to the development of data-driven decision-making in domains ranging from autonomous vehicle navigation (Stafylopatis and Blekas, 1998) to finance (Hambly et al., 2023) to video games (Mnih et al., 2013). In many domains, simulation environments that replicate real-world dynamics have been indispensable in allowing agents to learn policies without the risks often associated with direct real-world interaction. Widely used examples range from simpler systems like the cart-pole environment (Nagendra et al., 2017) to more complex platforms such as chess and shogi (Silver et al., 2017). However, there remains a notable gap in simulators designed specifically to support the exploration and development of RL methods tailored for multi-dimensional and multi-objective decision support, especially within the healthcare domain.

To help bridge this gap, we introduce the Skilled Nursing Facility Simulator (SNFsim), an agent-based discrete-event simulator that combines real-world health data with subject-area expertise to simulate decision-making dynamics in post-acute care facilities. Like microsimulation models, which are used to inform policy by predicting population-level outcomes under different scenarios (Rutter et al., 2019; Spielauer, 2011), SNFsim models individual patients with heterogeneous characteristics to predict facility-level outcomes under different operational policies. However, it incorporates elements more commonly found in discrete-event and agent-based models (Karnon et al., 2012; Marshall et al., 2015) such as resource constraints and feedback loops where facility-level decisions directly influence individual patient outcomes. This approach captures the dynamics and reward structures necessary to develop and test RL methodology capable of navigating multi-dimensional and multi-objective decision-making in resource-constrained healthcare settings, thus expanding the applicability of RL to real-world health services.

1. 1 Research Objectives and Contributions

Our primary goal is to provide a simulator and associated environment designed to support the development of RL methods for creating multi-objective decision support tools within SNFs. This environment, which is effectively an interface layer over the underlying simulator, should contain a sufficiently detailed state and action space, and capture multiple real-world outcomes of decision-making within SNFs.

Our main objectives for this work are to:

  1. Provide a scalable open-source simulator calibrated using real-world datasets that is both modular and easily modified to fit user needs.

  2. Develop a Gymnasium-compatible RL environment that exposes the simulator's underlying dynamics through a structured interface.

  3. Demonstrate the use of RL agents in learning policies for decision support within SNFs using the aforementioned environment.

By sharing our experiences in tackling the challenges of mathematizing decision-making within a system as intricate as SNFs, we aim to provide valuable insights for future efforts in creating simulation-based environments, particularly those designed for complex, multi-objective, and multi-dimensional decision-making processes in healthcare. Through the quantification of SNF operations, our work displays the complexities of healthcare management from a computational perspective.

1. 2 Related Work

Healthcare simulators have long played a crucial role in various aspects of healthcare delivery, training, and decision support. By the late 1960s and early 1970s, computational simulation modeling had emerged as a widely used technique for addressing hospital scheduling issues in the United States (Robinson et al., 1968; Kolesar, 1970; Rising et al., 1973). Following refinement of modeling techniques in the 1990s, the number of published papers describing simulation modeling in healthcare has grown substantially, extending to a wider range of settings and problems (Günal and Pidd, 2010). Using simulators reduces the need for direct interaction with the corresponding real-world systems, which allows for faster, safer, and more cost-effective solutions to complex problems. In their 1978 survey of computer simulation in healthcare, England and Roberts examine reports of 92 simulation models (England and Roberts, 1978), offering insight into the extensive history of simulation within the healthcare domain.

Developing a simulator that models every aspect of a healthcare system is often impractical and infeasible (Pidd, 1997). Thus, many researchers assume a certain scope and level of abstraction when modeling these complex systems. Many simulators focus on a patient-level system within healthcare settings; one example is Ahangar's work on the computation, modeling, and simulation of the HIV-AIDS epidemic (Ahangar, 2022). In this work, a general mathematical model for the HIV-AIDS epidemic is introduced, specifically integrating vaccination strategies to understand the virus's transmission dynamics in the population. Similarly, the work of Man et al. introduces a Type 1 diabetes simulator based on insights into insulin and glucose kinetics during hypoglycemia, accounting for nonlinear insulin-dependent utilization. This simulator provides a reliable framework for in silico trials, testing glucose sensors, and validating closed-loop control systems (Man et al., 2014).

Simulators have also been used to develop and evaluate RL algorithms for health by providing a controlled environment to train, test, and refine decision-making strategies. The work of Jalalimanesh et al. introduces an agent-based simulator of vascular tumor growth based on collected biological data (Jalalimanesh et al., 2017). This research demonstrates the power of a simulation-based approach combined with RL for simulating and optimizing radiotherapy. Petersen et al. use biological simulation and deep RL to develop and evaluate personalized multi-cytokine therapies for the treatment of sepsis (Petersen et al., 2018).

Simulators that only consider facility-level decisions have also been developed. Sun et al. introduce a simulator that considers length-of-stay and discharge outcomes among nursing home residents with the goal of identifying promising staffing strategies (Sun et al., 2023). Butler et al. use discrete-event simulation for healthcare asset allocation through patient misplacement, where patients are scheduled and assigned to the best alternative unit within a hospital due to a shortage of beds in the preferred area (Butler et al., 1992).

To the best of our knowledge, there has been no identified or documented simulator that models the consequences of accepting or denying referrals within SNFs while considering their relationship to facility-level decisions and their impact on multiple interrelated outcomes. Moreover, we are not aware of any simulators in the post-acute care domain that explore the interactions between multiple concurrently existing subprocesses and their reciprocal effects. SNFsim not only concentrates on the repercussions of referral decisions on various outcomes, but also examines the interdependence between referral decisions and staffing decisions, shedding light on their mutual influence and supporting the development of RL algorithms with multiple objectives in multi-dimensional action spaces.

2 Decision-Making in Skilled Nursing Facilities

In the following section, we provide an overview of decisions, states, and outcomes relevant to decision-making in SNFs, and we identify the components of these that are modeled by SNFsim.

2. 1 Decisions

Decision-making in SNFs includes both patient-level decisions as well as facility-level decisions. Within these categories, SNFsim models patient referrals and staffing decisions, respectively. Figure 1 presents an overview of the referral intake flow and staffing decisions process within SNFs.

Skilled nursing intake flow and staffing process.

The red circled numbers denote the temporal ordering of the referral process, starting at the hospital and ending with a patient accepting an offer from a nursing facility. The green circled numbers denote the temporal ordering of a scheduling coordinator making staffing decisions. It is important to note that the intake flow and staffing process occur at different times.

2. 2 Patient-Level Decisions

The day-to-day activities in SNFs include many patient-level decisions, including the formulation of individualized treatment and nutrition plans based on patient status, discharge decisions, and advanced care planning. Patient-level decisions require a nuanced understanding of the patient's health status and goals to provide comprehensive care.

SNFsim models the daily processing of individual patient referrals within SNFs. The referral intake process most often begins at the hospital via a pre-discharge referral, wherein the patient's medical and financial details are communicated to a discharge planner within the hospital, either electronically or through traditional methods like phone or fax. This information is forwarded by the discharge planner to admissions coordinators at receiving SNFs, who are responsible for evaluating whether or not they can accommodate the care needs of the patient. Should the admissions coordinator determine that the facility is unable to offer appropriate care, the referral is declined and no offer is made. Otherwise, if the facility believes that it is able to meet the patient's care requirements, the admissions coordinator extends an offer to the patient. Given that hospitals often dispatch multiple referrals for a single patient to various SNFs either concurrently or sequentially, the efficiency and promptness of the admissions coordinator's response with an offer are crucial.

2. 3 Facility-Level Decisions

Facility-level decisions within SNFs often include financial decisions and policy implementation decisions. Facility-level decisions are responsible for shaping the strategic direction and overall quality of care in SNFs. These decisions require a broader perspective that balances immediate needs with long-term objectives, ensuring that the facility remains a sustainable environment for long-term patient care.

SNFsim models staffing decisions within SNFs, which involves a careful evaluation of the needs of the current case-mix, budget constraints, and available nursing resources to ensure the delivery of high-quality care.1 This process begins with assessing the acuity levels and specific care requirements of the resident case-mix, which dictate the necessary staff mix of nurses, certified nursing assistants (CNAs), and therapists. Staff scheduling is then optimized by managers to cover all shifts, taking into account fluctuations in the case-mix.

2. 4 State

We conceptualize the state of a SNF as all of the information required to make the patient- and facility-level decisions described above. This includes information about the current case-mix, like demographics and care needs, as well as staff resources, financial metrics, and regulatory standing. Although each aspect of state holds its own significance, they are interconnected and together significantly influence the facility's overall effectiveness at making optimal patient- and facility-level decisions.

SNFsim maintains state information about the current case-mix (including detailed health information about each resident), facility-level financial information, and about current staffing levels. It also models transitions (moving from one state to another) in response to referral and staffing decisions.

2. 5 Outcomes

In the context of SNFs, an outcome is a measure of success or failure that reflects the effectiveness of the decisions being made within a facility. Outcomes in SNFs are important for understanding the overall performance of a facility, including clinical outcomes, financial outcomes, and operational outcomes. SNFsim supports the evaluation of four outcomes corresponding to each of these domains: rehospitalization, occupancy, patient reimbursement, and nursing costs.2

2. 5. 1 Rehospitalization

Rehospitalization is the process wherein patients, initially discharged from a hospital to a care facility for recovery, must return to the hospital due to complications or insufficient recovery progress. Rehospitalization events in a SNF can disrupt the stability of patients' lives and elevate the risk of medical errors associated with care coordination (Mor et al., 2010). Treatment in SNFs with historically low rehospitalization rates was found to causally reduce a [future] patient's likelihood of rehospitalization (Rahman et al., 2016); thus, by minimizing rehospitalization, SNFs can improve patient outcomes and provide patient-centered care. Patient rehospitalization also represents a significant cost burden for SNFs, as it not only entails direct medical expenses but also impacts the performance metrics of the SNF. Thus, minimizing patient rehospitalizations ultimately leads to better patient care and improved operational functioning.

2. 5. 2 Occupancy

The occupancy of a SNF refers to the percentage of available beds that are currently occupied by patients, indicating the facility's usage rate and capacity utilization. In recent years, occupancy rates within SNFs in the United States have reached record lows (Laes-Kushner, 2018). Reduced occupancy introduces a variety of issues for SNFs, one of the most notable being that families seeking care for their loved ones often consider the occupancy rate of a facility as an indicator of its desirability and quality of care; low occupancy rates can raise concerns about the viability of a facility. Conversely, SNFs that are at or near maximum capacity face repercussions such as strained resources and potential impact on quality of care. As a general guideline, SNFs should strive to maintain an occupancy rate above 95.8% if possible, as facilities that do so are 2.09 times as likely to have greater efficiency scores (Ozcan et al., 1998). Higher (but not maximal) occupancies generally allow a facility to accommodate slight fluctuations in demand, while still ensuring that it remains financially viable and able to deliver consistent high-quality care to its patients.

2. 5. 3 Patient Reimbursement

Each day, there exists some number of inpatients within any given SNF. These patients, typically through insurance, are responsible for reimbursing the facility for their care. The amount per day that a patient pays to the facility is directly linked to the complexity of their care needs, meaning that higher acuity implies a higher daily reimbursement amount from patient to facility.

In the context of SNFs, net revenue encompasses the total income earned from services provided to patients, including accommodation, medical care, and rehabilitation services. A portion of this net revenue is allocated towards operational expenses, including salaries for employees who deliver care, ensuring the facility can maintain a high standard of service. As the healthcare landscape continues to evolve, adequate net revenue is crucial for the long-term viability of SNFs. Simply put, the more patients with high acuity (and therefore higher reimbursement rates) that a SNF accepts, the higher its net revenue becomes.

2. 5. 4 Nursing Costs

As mentioned previously, a portion of the net revenue of a SNF is allocated towards operational expenses. A major component of these expenses is compensating nursing staff, who are central to delivering around-the-clock patient care. The fewer nurses that a SNF employs, the less it is required to pay out, thus increasing its net revenue. It is also true, however, that this would likely decrease the quality of care to patients within the facility, potentially increasing adverse outcomes like patient rehospitalizations.

3 The SNFsim Simulator

Developing a simulation model requires formalizing both what the model must encapsulate and what can reasonably be excluded (Tracy et al., 2018). In the following, we describe how SNFsim is designed to capture the essential dynamics of patient flow, staffing patterns, and the effects of varying levels of patient care within a SNF setting. We begin by describing the data resources used to calibrate SNFsim and a simplified description of its transition dynamics, followed by a detailed description of how referrals are created and how the SNF's case-mix, staffing, and net revenue are updated as the simulation progresses.

3. 1 Data Sources

We used two distinct primary datasets for calibrating SNFsim, each serving a unique purpose. The first dataset, which we will refer to as the referral dataset, is used to allow SNFsim to generate a realistic distribution of referrals. The second dataset, referred to as the classifier dataset, is used to construct classifiers to decide whether or not a patient should be readmitted to the hospital on any given day during their stay.

Our referral dataset contains 643,770 referral records from 10,856 unique SNFs within 41 states in the United States between January 2020 and November 2022. Each data point captures detailed referral information including Length of Stay (LoS), primary diagnosis information, patient care needs, and insurance information.

Our classifier dataset contains 7948 observations from 1225 unique facilities between January 2020 and November 2022. Each data point captures information regarding the details of a patient's stay within a SNF. This includes LoS, mean number of certified nursing aide (CNA) hours per day during stay, patient care needs, primary diagnosis information, rehospitalization status, gender, and age.

3. 2 Design Philosophy

To accommodate a wide range of use cases, both customizability and simplicity were prioritized to ensure that the open-source codebase is easy to understand for as many users as possible. To achieve this, the Python programming language was chosen, and the RL environment API from OpenAI Gymnasium was used to implement the wrapper, reflecting Gymnasium's popularity within RL communities. Figure 2 provides a simple example of how to use the Gymnasium environment.

Code snippet demonstrating an example setup for training an RL agent using SNFsim.
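As a rough illustration of the interaction loop that a Gymnasium-compatible environment supports, the sketch below uses a pure-Python stand-in. The class `ToySNFEnv`, its observation fields, its dynamics, and its reward are hypothetical and are not the SNFsim API; only the shapes of the `reset` and `step` return values follow the Gymnasium convention.

```python
import random


class ToySNFEnv:
    """Hypothetical stand-in that mimics the Gymnasium reset/step signatures."""

    def __init__(self, beds=10, horizon=30):
        self.beds = beds
        self.horizon = horizon

    def reset(self, seed=None, options=None):
        self.rng = random.Random(seed)
        self.day = 0
        self.occupied = 0
        return self._obs(), {}

    def step(self, action):
        # Toy action space: 1 = accept today's referral, 0 = decline it.
        if action == 1 and self.occupied < self.beds:
            self.occupied += 1
        # Toy dynamics: each resident is discharged with probability 0.1 per day.
        discharged = sum(self.rng.random() < 0.1 for _ in range(self.occupied))
        self.occupied -= discharged
        self.day += 1
        reward = self.occupied / self.beds   # toy reward: occupancy fraction
        terminated = False                   # this toy has no terminal state
        truncated = self.day >= self.horizon
        return self._obs(), reward, terminated, truncated, {}

    def _obs(self):
        return {"day": self.day, "occupied": self.occupied}


def run_episode(env, policy, seed=0):
    """Roll out one episode and return the total reward and last observation."""
    obs, info = env.reset(seed=seed)
    total, done = 0.0, False
    while not done:
        obs, reward, terminated, truncated, info = env.step(policy(obs))
        total += reward
        done = terminated or truncated
    return total, obs


# An "always accept" policy, used only to exercise the loop.
total_reward, final_obs = run_episode(ToySNFEnv(), policy=lambda obs: 1)
```

Any agent that consumes the five-tuple returned by `step` could be dropped in place of the lambda policy.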

3. 3 Overview of SNFsim Functionality

The transitions of SNFsim, broadly outlined in Figure 3, capture core operational dynamics and decision-making processes intrinsic to managing a SNF. This figure illustrates how various elements of action and state interact to produce the simulated dynamics of a SNF.

Simplified SNFsim simulation step.

We assume preset configuration values. Algorithm 1 presents an equivalent but more granular presentation of a simulation step.

In the remainder of this section, we provide a closer look at the individual components, including the models used for predicting patient rehospitalizations, the generation of referrals at each timestep, and the formulation of outcomes. With these details, we aim to give RL researchers a comprehensive understanding of the intricacies involved in simulating the operations of a SNF and the potential of RL approaches to enhance decision-making within these environments. At the same time, we aim to give health services researchers and providers a clear understanding of what data and expert knowledge were used to specify the simulator.

3. 4 Referral Representation and Sampling

When generating a new referral, we are interested in sampling relevant patient features available at the point of referral. Typically, admissions coordinators receive the referral details outlined in Figure 4 when a referral is being presented to a SNF. When generating referrals within SNFsim, we generate a subset of these referral features that actively contribute to the decision-making process for determining the acceptance or denial of a referral. Namely, we focus on a collection of clinical information and functional status, which serve as strong indicators of both the current health status and the financial implications associated with a referral.

Patient information typically available to the SNF at the point of referral.

We developed a synthetic dataset to be used within SNFsim. This approach allows us to generate realistic referral profiles by analyzing the statistical relationships within real-world data while ensuring complete patient privacy. By preserving the statistical structure of the original conditional distributions between features, we were able to generate representative synthetic patients that mirror the complexity and diversity of actual SNF populations without exposing personal health information.

To generate the synthetic dataset, we made use of the Synthetic Data Vault (SDV) framework (Patki et al., 2016). Our original dataset, the referral dataset discussed in Section 3.1, consisted of two datasets without patient-level identifiers. The first captured admission records containing ICD-10-CM codes, LoS, and insurance type. The second captured patient demographics containing age, gender, ICD-10-CM codes, and Patient Driven Payment Model (PDPM) classification codes.

To construct a unified dataset while preserving conditional distributions, we merged these datasets through diagnosis-based conditional sampling. For each admission record with a given ICD-10-CM code, we randomly sampled a patient from the demographics dataset with the same diagnosis. This approach preserves the empirical conditional distribution P(demographics|diagnosis) observed in the original data, ensuring that patient characteristics associated with specific diagnoses remain realistic.
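The merging step can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the field names, the toy records, and the PDPM code strings are placeholders, and skipping admissions that have no matching diagnosis is an assumption made only for this sketch.

```python
import random
from collections import defaultdict


def merge_by_diagnosis(admissions, demographics, seed=0):
    """Attach demographics to each admission by sampling a donor with the same diagnosis."""
    rng = random.Random(seed)

    # Index the demographics records by ICD-10-CM code.
    by_dx = defaultdict(list)
    for person in demographics:
        by_dx[person["icd10"]].append(person)

    merged = []
    for adm in admissions:
        candidates = by_dx.get(adm["icd10"])
        if not candidates:
            continue  # assumption: admissions with no matching diagnosis are skipped
        donor = rng.choice(candidates)  # uniform draw, i.e. P(demographics | diagnosis)
        merged.append({**adm, "age": donor["age"],
                       "gender": donor["gender"], "pdpm": donor["pdpm"]})
    return merged


# Toy records; the PDPM strings are placeholders, not real classifications.
admissions = [
    {"icd10": "I50.9", "los": 21, "insurance": "Medicare"},
    {"icd10": "J18.9", "los": 12, "insurance": "Managed Care"},
    {"icd10": "I50.9", "los": 30, "insurance": "Medicaid"},
]
demographics = [
    {"icd10": "I50.9", "age": 82, "gender": "F", "pdpm": "KAQD1"},
    {"icd10": "I50.9", "age": 77, "gender": "M", "pdpm": "NBXE1"},
    {"icd10": "J18.9", "age": 69, "gender": "F", "pdpm": "CDLB1"},
]
merged = merge_by_diagnosis(admissions, demographics)
```

Because each donor is drawn uniformly from the records that share the admission's diagnosis, the sampled demographics follow the empirical conditional distribution for that diagnosis.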

We then applied a Gaussian Copula Synthesizer to the merged dataset. This method separates marginal distributions from dependence structure (Sklar, 1959), allowing for the independent modeling of each variable's distribution while preserving multivariate relationships. The copula first transforms each variable to its empirical cumulative distribution function, applies inverse normal transformations, estimates the correlation matrix in the resulting multivariate normal space, and finally generates synthetic data by sampling from this multivariate normal distribution and applying inverse transformations. We configured the synthesizer with gamma distributions for LoS to capture right-skewed patterns, and beta distributions for age to respect bounded ranges from 18-120 years. Figure 5 shows age distribution by gender in the original and synthetic dataset (a) and LoS distribution by age group in the original and synthetic dataset (b).
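The four copula steps can be illustrated for two variables using only the Python standard library. This sketch does not use SDV and is not the configuration described above: it inverts through empirical quantiles rather than fitted gamma or beta marginals, and the toy age and LoS data are invented for the example.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def to_normal_scores(values):
    """Steps 1-2: empirical CDF ranks, then the inverse standard-normal transform."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0.0] * n
    for rank, i in enumerate(order, start=1):
        scores[i] = STD_NORMAL.inv_cdf(rank / (n + 1))  # u stays strictly inside (0, 1)
    return scores


def pearson(a, b):
    """Step 3: correlation estimated in the normal-score space."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def empirical_quantile(sorted_values, u):
    """Inverse empirical CDF: map u in [0, 1] back to an observed value."""
    idx = min(int(u * len(sorted_values)), len(sorted_values) - 1)
    return sorted_values[idx]


def fit_and_sample(xs, ys, n_samples, seed=0):
    """Step 4: sample correlated normals and invert them to the original scales."""
    rng = random.Random(seed)
    rho = pearson(to_normal_scores(xs), to_normal_scores(ys))
    sx, sy = sorted(xs), sorted(ys)
    samples = []
    for _ in range(n_samples):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(max(0.0, 1.0 - rho ** 2)) * rng.gauss(0.0, 1.0)
        samples.append((empirical_quantile(sx, STD_NORMAL.cdf(z1)),
                        empirical_quantile(sy, STD_NORMAL.cdf(z2))))
    return rho, samples


# Toy data: LoS loosely increasing with age (invented for illustration).
data_rng = random.Random(1)
ages = [data_rng.uniform(50, 95) for _ in range(200)]
los = [0.3 * a + data_rng.gauss(0, 3) for a in ages]
rho, synthetic = fit_and_sample(ages, los, n_samples=500)
```

Replacing `empirical_quantile` with the inverse CDF of a fitted parametric distribution would correspond to the gamma and beta marginals mentioned above.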

Bivariate relationships between a subset of columns in the generated dataset versus the original dataset.

(a) Age distribution by gender. (b) LoS distribution by age group.

We also implemented post-generation constraints. Since PDPM codes are diagnosis-dependent, we verified that each synthetic ICD-10-CM and PDPM code pairing had been observed in the original dataset. Invalid pairs (those never observed together) were corrected by resampling PDPM codes from the empirically valid set for that diagnosis.
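A minimal sketch of this repair step is shown below. The field names and toy codes are placeholders, and two choices are assumptions of the sketch rather than details given in the text: replacement codes are drawn in proportion to their observed frequency, and records whose diagnosis never appears in the original data are left unchanged.

```python
import random
from collections import Counter, defaultdict


def repair_invalid_pairs(synthetic, original, seed=0):
    """Resample the PDPM code of any (diagnosis, PDPM) pair never seen in the original data."""
    rng = random.Random(seed)

    # Count the PDPM codes observed for each diagnosis in the original data.
    valid = defaultdict(Counter)
    for rec in original:
        valid[rec["icd10"]][rec["pdpm"]] += 1

    repaired = []
    for rec in synthetic:
        observed = valid.get(rec["icd10"])
        if observed is None or rec["pdpm"] in observed:
            # Valid pair, or unseen diagnosis (left untouched in this sketch).
            repaired.append(dict(rec))
            continue
        codes, weights = zip(*observed.items())
        new_code = rng.choices(codes, weights=weights, k=1)[0]  # frequency-weighted draw
        repaired.append({**rec, "pdpm": new_code})
    return repaired


# Toy records; the PDPM strings are placeholders.
original = [
    {"icd10": "I50.9", "pdpm": "KAQD1"},
    {"icd10": "I50.9", "pdpm": "KAQD1"},
    {"icd10": "I50.9", "pdpm": "NBXE1"},
    {"icd10": "J18.9", "pdpm": "CDLB1"},
]
synthetic = [
    {"icd10": "I50.9", "pdpm": "CDLB1"},  # invalid: never observed together
    {"icd10": "J18.9", "pdpm": "CDLB1"},  # valid
    {"icd10": "J18.9", "pdpm": "KAQD1"},  # invalid: only CDLB1 is valid here
]
repaired = repair_invalid_pairs(synthetic, original)
```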

Individual variable distributions were well-preserved, with an SDV column shapes score of 87.7%, indicating strong marginal distribution fidelity (detailed metrics in Table 1). Additionally, bivariate relationships across all 15 variable pairs showed good preservation with an SDV column pair trends score of 73.6%. Empirical validation of the trivariate distribution [age, LoS, gender] confirmed good multivariate fidelity with a variation distance of 0.173. However, complex non-linear interactions and categorical-specific patterns beyond pairwise correlations may not be fully captured. Further, clinical validity constraints were perfectly satisfied, with all 82,413 unique ICD-10-CM and PDPM code combinations in the synthetic dataset verified as observed in the original dataset, ensuring that all generated patient profiles represent clinically plausible (diagnosis, PDPM) pairings.

Table 1
Marginal similarity between original and synthetic data including the metrics Mean Diff. (percentage difference in means), KDE Sim. (kernel density overlap), TV Dist. (total variation distance), JS Sim. (Jensen-Shannon similarity), and Cov. (category coverage). A dash marks a metric that does not apply to the variable type.

Variable          Mean Diff.   KDE Sim.   TV Dist.   JS Sim.   Cov.
Continuous:
Age               0.02%        88.1%      -          -         -
Length of Stay    13.8%        92.4%      -          -         -
Categorical:
ICD-10-CM         -            -          0.194      79.1%     78.7%
Insurance Type    -            -          0.150      83.6%     100%
PDPM Code         -            -          0.138      83.4%     98.2%
Gender            -            -          0.080      87.2%     100%
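For readers unfamiliar with the categorical metrics, the sketch below computes total variation distance and category coverage from their textbook definitions. The exact formulas behind the reported values may differ in detail, and the toy gender samples are invented for illustration.

```python
from collections import Counter


def total_variation_distance(original, synthetic):
    """Half the summed absolute difference between the two category frequency vectors."""
    p, q = Counter(original), Counter(synthetic)
    n_p, n_q = sum(p.values()), sum(q.values())
    # Counter returns 0 for categories missing from one of the samples.
    return 0.5 * sum(abs(p[k] / n_p - q[k] / n_q) for k in set(p) | set(q))


def category_coverage(original, synthetic):
    """Fraction of original categories that also appear in the synthetic sample."""
    original_categories = set(original)
    return len(original_categories & set(synthetic)) / len(original_categories)


# Toy samples (invented for illustration).
original_gender = ["M"] * 6 + ["F"] * 4
synthetic_gender = ["M"] * 5 + ["F"] * 5
tv = total_variation_distance(original_gender, synthetic_gender)
coverage = category_coverage(original_gender, synthetic_gender)
```

A total variation distance of 0 means identical category frequencies and 1 means disjoint support, so lower values indicate a closer match.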

The synthetic dataset retained 92% coverage of original (diagnosis, PDPM) combinations while preserving 79% of diagnostic diversity (4,504 of 5,726 unique ICD-10-CM codes). The 1,222 excluded codes were overwhelmingly rare diagnoses: 99.4% (1,215 codes) occurred fewer than 10 times in the original data, collectively representing only 1.8% of total admissions. Notably, all of the 500 most common diagnoses were fully retained, and these codes account for 85.7% of all admissions. This improves privacy protection by further preventing any reidentification through rare diagnosis combinations while still maintaining good coverage of clinically important scenarios.

3. 4. 1 Patient Driven Payment Model

A core component in generating referral information lies in PDPM codes, which serve as a strong indicator of patient care complexity within SNFsim. We use PDPM codes in our synthetic data generation because they effectively encapsulate multiple dimensions of patient care needs and directly impact facility reimbursement, making them essential for realistic decision-making simulation. This choice is grounded in the PDPM's design to reflect the acuity and resource needs of the patient more accurately than previous models like the Resource Utilization Group (RUG-IV), which was therapy-driven. The PDPM adjusts payments based on the patient's condition and care needs, rather than the volume of services provided (Centers for Medicare and Medicaid Services), making it a more patient-centered approach to reimbursement. A Health Insurance Prospective Payment System (HIPPS) PDPM code comprises five primary components: physical and occupational therapy (PT/OT), speech-language pathology (SLP), nursing, non-therapy ancillary (NTA) services, and an assessment type. Each of these five components has a corresponding Case-Mix Group (CMG) required for deciphering the corresponding character in the final code, as well as a Case-Mix Index (CMI) which is responsible for determining the daily reimbursement amount. Figure A1 demonstrates the process of constructing a HIPPS PDPM code based on calculated patient care needs from each of the aforementioned components. The CMG of each of the five primary components is converted to a code value using Tables A1 and A3. The CMI and CMG can be calculated from Table A4 (PT/OT component), Table A5 (SLP component), Table A6 (nursing component), Table A11 (NTA component), and Table A3 (5-day PPS MDS component). Once the CMG and CMI for each component are calculated, the 5-character HIPPS PDPM code can be constructed.
The reimbursement amount can then be calculated using Equation 1, which follows the official PDPM payment structure published by the Centers for Medicare and Medicaid Services (CMS), as defined in the PDPM Technical Report (Acumen, 2018).

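\[
\begin{aligned}
\text{Daily Payment} = \bigl[\, & (\text{PT Base Rate} \times \text{PT/OT CMI}) + (\text{SLP Base Rate} \times \text{SLP CMI}) \\
& + (\text{Nursing Base Rate} \times \text{Nursing CMI}) + (\text{NTA Base Rate} \times \text{NTA CMI}) \\
& + \text{Non-Case Mix Component} \,\bigr] \times \text{Adjustment Factor}
\end{aligned}
\tag{1}
\]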

Note that the adjustment factor is obtained from Table A10 and decreases with length of stay: the longer a patient is an inpatient within the facility, the less reimbursement the facility receives. Additionally, the base rate for each of PT/OT, SLP, nursing, and NTA is taken from either Table A8 or Table A9, depending on whether the facility is located in a rural or urban area. The non-case mix component is obtained from the same table.
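As a worked illustration of Equation 1, the sketch below evaluates the daily payment for one patient. It is not PyPDPM: the function name, the dictionary keys, and every numeric value (base rates, CMIs, non-case mix component, adjustment factors) are made-up placeholders rather than entries from the appendix tables.

```python
COMPONENTS = ("pt_ot", "slp", "nursing", "nta")


def daily_payment(base_rates, cmis, non_case_mix, adjustment_factor):
    """Equation 1: sum of base rate x CMI per component, plus the non-case mix
    component, all scaled by the adjustment factor."""
    case_mix_total = sum(base_rates[c] * cmis[c] for c in COMPONENTS)
    return (case_mix_total + non_case_mix) * adjustment_factor


# Placeholder values for illustration only (not real CMS rates).
base_rates = {"pt_ot": 60.0, "slp": 25.0, "nursing": 110.0, "nta": 80.0}
cmis = {"pt_ot": 1.5, "slp": 2.0, "nursing": 1.2, "nta": 1.0}

early_stay = daily_payment(base_rates, cmis, non_case_mix=95.0, adjustment_factor=1.0)
later_stay = daily_payment(base_rates, cmis, non_case_mix=95.0, adjustment_factor=0.98)
```

With these placeholders the case-mix terms total 352, so the payment is roughly 447 at an adjustment factor of 1.0 and drops to about 438 at 0.98, mirroring the decline in reimbursement over a longer stay.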

We developed an open-source module, PyPDPM, for calculating the precise reimbursement value based on a patient's PDPM code and the number of days that patient has been a resident. For each timestep that a patient resides in the SNF, their calculated reimbursement amount is added to the facility's overall net revenue. While real-world payment generally occurs monthly through a consolidated billing process (Williamson, 2024), SNF administrators know the daily per-diem rates and incorporate this information into admission and discharge decisions. Our per-timestep reward structure therefore captures the information available to decision-makers while also providing the immediate feedback necessary for effective RL training, avoiding the sparse rewards and difficult credit assignment that would result from monthly billing cycles.

3. 5 Facility Representation and Dynamics

SNFsim has a designated bed capacity, a limit on the number of staff (for each CNA option) available to attend to inpatients, and a list of n inpatients together with their relevant information at any given timestep.

The referral intake flow and staffing decision process within SNFsim are shown in Figure 6. The figure maintains the same sequential layout as Figure 1, but abstracts away some of the real-world complexities and detailed procedural steps. In particular, SNFsim simplifies the process by assuming that all generated referrals have valid insurance eligibility and authorization; assuming that if a referral is accepted then the patient accepts the offer; and removing the need for a referral portal by not keeping a log of unique patient IDs and their intake history.

SNF intake flow and joint staffing decision process.

The red circled numbers denote the temporal ordering of a referral, beginning at the hospital and concluding with a patient accepting an offer from a nursing facility. This is abstracted, and is assumed rather than implemented within SNFsim. The blue circled numbers denote the temporal ordering of a referral within SNFsim, beginning with an empirically sampled referral and concluding with the SNF accepting the referral. The green circled numbers represent the temporal ordering of a simplified staffing decision process within SNFsim, beginning with assessment of the care needs of current inpatients and concluding with increasing, decreasing, or not altering current staffing hours.

The steps presented in Algorithm 1 comprise, at a high level, a single timestep of SNFsim, which we call a day.

Algorithm 1 Temporal Day in SNFsim (broad visual representation in Figure 3).
1: Input: State $S_t = \{P_t, O_t, H_t, C_t, R_t\}$, Action $a_t = \{A_t, \Delta_t\}$
2: Output: Next state $S_{t+1} = \{P_{t+1}, O_{t+1}, H_{t+1}, C_{t+1}, R_{t+1}\}$
3: where: $P_t$: patients, $O_t$: occupancy, $H_t$: nursing hours, $C_t$: finances, $R_t$: referrals
4: Initialize daily metrics: $\text{Revenue}_t \leftarrow 0$, $\text{Rehospitalizations}_t \leftarrow 0$, $\text{Cost}_t \leftarrow 0$
5: // Update staffing levels based on action
6: for each staff type $s \in \{\text{FullTime}, \text{PRN}, \text{Agency}\}$ do
7:    $\text{Staff}_t^s \leftarrow \min(\text{Staff}_{t-1}^s + \Delta_t^s, \text{MaxStaff}^s)$ {Enforce staffing constraints}
8:    $\text{Cost}_t \leftarrow \text{Cost}_t + \sum_{i=1}^{3} \text{Staff}_t^s[i] \cdot \text{Hours}^s \cdot \text{Rate}^s$
9: end for
10: // Process referrals
11: $P_{\text{new}} \leftarrow$ Accept referrals based on $A_t$ (0 or 1 for each referral in $R_t$) and bed availability $B_{\max}$
12: // Calculate nursing hours per patient
13: $h_t \leftarrow$ Nursing hours per patient per shift
14: $\alpha_t \leftarrow$ Staffing modifier based on ratio of actual to expected care
15: // Update patients and compute rehospitalizations
16: $P_{\text{remove}} \leftarrow \emptyset$
17: for each patient $p \in P_t$ do
18:    Update patient day count, accumulate revenue, and record nursing hours received
19:    Calculate shift-specific understaffing penalties $\epsilon$
20:    $p.r \leftarrow$ Rehospitalization risk based on patient data, staffing, and care quality
21:    if $p.r > \text{threshold}$ then
22:      Mark patient for rehospitalization and update count
23:    end if
24: end for
25: Update patient population: $P_{t+1} \leftarrow (P_t \setminus P_{\text{remove}}) \cup P_{\text{new}}$
26: Update occupancy:
27:    $O_{t+1} \leftarrow |P_{t+1}| / B_{\max}$ {Occupancy as proportion of available beds}
28: Update finances:
29:    $C_{t+1} \leftarrow C_t + \text{Revenue}_t - \text{Cost}_t$
30: Set next-day nursing hours:
31:    $H_{t+1} \leftarrow H_t$ {Staffing decision at time t determines hours for day t+1}
32: Generate new referrals for next day: $R_{t+1}$
33: Return $S_{t+1} = \{P_{t+1}, O_{t+1}, H_{t+1}, C_{t+1}, R_{t+1}\}$
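The end-of-day bookkeeping in lines 25 to 29 of Algorithm 1 can be sketched in a few lines of Python. This is a minimal illustration rather than the SNFsim implementation: the `FacilityState` class, the `end_of_day_update` function, and the representation of patients as a set of identifiers are all hypothetical names and choices introduced here.

```python
from dataclasses import dataclass


@dataclass
class FacilityState:
    """Hypothetical container for the parts of S_t used in this sketch."""
    patients: set     # P_t, represented here as a set of patient identifiers
    balance: float    # C_t, cumulative finances
    max_beds: int     # B_max


def end_of_day_update(state, removed, admitted, revenue, cost):
    """Apply the population, occupancy, and finance updates of Algorithm 1."""
    # P_{t+1} = (P_t \ P_remove) U P_new
    next_patients = (state.patients - removed) | admitted
    if len(next_patients) > state.max_beds:
        raise ValueError("admissions exceed bed capacity")
    # O_{t+1} = |P_{t+1}| / B_max
    occupancy = len(next_patients) / state.max_beds
    # C_{t+1} = C_t + Revenue_t - Cost_t
    balance = state.balance + revenue - cost
    return FacilityState(next_patients, balance, state.max_beds), occupancy
```

Raising an error when admissions exceed capacity is one way to surface a violated bed constraint; a simulator could equally reject the surplus referrals earlier, at the acceptance step.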

3. 5. 1 Rehospitalization model

SNFsim is designed in part to reflect the impact of multiple different factors on patient rehospitalization outcomes. To ensure optimal patient care, SNFs must consider both staffing levels and patient characteristics that influence rehospitalization risk. Based on consultation with healthcare experts and discoveries in our previous work (Strickland et al., 2023), we identified several key variables that impact patient rehospitalization.

To quantify how individual factors influence rehospitalization risk, we conducted multivariate logistic regression analyses using patient data from 7,948 observations (588 hospitalizations, 7.4% of the dataset). Table 2 presents four model specifications. Model A isolates patient demographic factors (age and reimbursement) to demonstrate reimbursement's expected positive association with readmission (as one would assume that cases with more complex care needs would have a higher likelihood of rehospitalization). As assumed, higher case complexity increases rehospitalization risk (coefficient = 0.188, p < 0.001). Model B includes LoS and key therapy indicators and shows minimal multicollinearity, with a Variance Inflation Factor (VIF) < 3. Model C incorporates eight core clinical features, although multicollinearity is present (VIF > 10 for several features). Notably, in this model, the coefficient sign for patient reimbursement (a proxy for care needs) switches from positive (as in Model A) to negative. Model D extends to 13 features in total by adding categorical operational factors (diagnosis ICD-10-CM chapter, facility, payer type) as well as other relevant variables. These categorical features contribute to prediction but are excluded from the table due to space constraints. Notably, adding these operational features to logistic regression slightly reduces performance (AUC falls from 0.873 in Model C to 0.867 in Model D), showing that linear models do not benefit from these added complex interactions.

Table 2
Multivariate Logistic Regression Results

| Feature | Model A | Model B | Model C | Model D |
|---|---|---|---|---|
| LoS | | -3.173*** | -3.180*** | -3.193*** |
| meanHours | | | -0.216*** | -0.215*** |
| reimbursement | +0.188*** | | -0.344* | -0.329* |
| age | -0.028 | | -0.005 | -0.005 |
| NPG | | -0.230*** | -0.274*** | -0.293*** |
| SLP | | +0.077 | +0.100 | +0.083 |
| PT_OT | | | +0.033 | +0.035 |
| NTA | | | _0.448** | _0.428** |
| days_since_start | | | | +0.052 |
| gender | | | | +0.017 |
| Number of features | 2 | 3 | 8 | 13 |
| AUC-ROC | 0.550 | 0.867 | 0.873 | 0.867 |
| Max VIF | 14.7 | 2.1 | 33.3 | 36.6 |

Entries are coefficients followed by their significance markers. Additional categorical features in Model D (not included): ICD-10-CM Chapter (***), facility, insurance type.

Note: Model D contains the full feature set used in the Random Forest (RF AUC = 0.921 vs. logistic regression AUC = 0.867). ***p < 0.001, **p < 0.01, *p < 0.05. N = 7,948; 588 (7.4%) readmissions.

For optional use in SNFsim, we implemented a Random Forest classifier using all 13 features from Model D. Random Forest is well-suited for this prediction task as it naturally captures non-linear interactions between facility and patient characteristics. Ultimately, while 13-feature logistic regression achieves an AUC of 0.867, Random Forest achieves an AUC-ROC of 0.921 and an AUC-PRC of 0.592 (Figure 7). The higher AUC-ROC in this case indicates better overall ranking of patients by risk, while the AUC-PRC (though modest) represents an 8.0-fold improvement over the baseline rate of 0.074, demonstrating strong predictive performance given the substantial class imbalance typical in rehospitalization datasets and comparable in performance with similar classification models (Pauly et al., 2019; Lou et al., 2025; Chandra et al., 2019).

Random Forest model performance for patient rehospitalization prediction.

ROC curve (AUC = 0.921) shows excellent discrimination between rehospitalized and non-rehospitalized patients. Precision-Recall curve (average precision = 0.592) demonstrates strong performance relative to baseline prevalence of 7.4%. The model substantially outperforms logistic regression (AUC = 0.867) using the same feature set.

Note that in SNFsim, whether or not a patient is rehospitalized in a given timestep is not a deterministic function of the model output. The model's predicted probability is adjusted by configurable staffing-based multipliers and by noise, and is then compared to a threshold (readmission_threshold in Table 3). If this threshold is exceeded, the patient is rehospitalized. This adds stochasticity and provides an incentive to balance nursing hours across shifts.
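A minimal sketch of this thresholding step is given below. The text does not specify how the multiplier and the noise enter the calculation, so the multiplicative staffing adjustment, the additive Gaussian noise, the `noise_sd` parameter, and the function name `is_rehospitalized` are all assumptions made for illustration; only the default threshold of 0.7 comes from Table 3.

```python
import random


def is_rehospitalized(base_prob, staffing_multiplier, threshold=0.7,
                      noise_sd=0.05, rng=None):
    """Return True if the adjusted risk exceeds the readmission threshold.

    Assumed form: adjusted = base_prob * multiplier + Gaussian noise,
    clamped to [0, 1]. The actual SNFsim adjustment may differ.
    """
    rng = rng or random.Random()
    adjusted = base_prob * staffing_multiplier + rng.gauss(0.0, noise_sd)
    adjusted = min(max(adjusted, 0.0), 1.0)   # keep the value a valid probability
    return adjusted > threshold
```

With the noise switched off (`noise_sd=0.0`), a patient with a predicted probability of 0.5 stays below the threshold under adequate staffing (multiplier 1.0) but crosses it when understaffing raises the multiplier to 1.6, which illustrates how staffing decisions feed back into outcomes.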

Table 3
Required SNF Configuration Parameters

| Parameter | Description | Default Value |
|---|---|---|
| total_beds | Number of beds in the facility | 100 |
| occupancy_bounds | Target min/max occupancy range | 75%/90% |
| nursing_hours_target | Target min/max nursing hours per patient | 2.5/3.5 |
| full_time_cna | Total available full-time nursing assistants | 20 |
| prn_cna | Total available pro re nata (as-needed) staff | 10 |
| agency_cna | Total available agency-contracted staff | 15 |
| referral_rate | Mean number of daily referrals | 5 |
| min_cna_hours | Mandated nursing hours per patient day | 2.8 |
| fac_state | State where the SNF is located | New York |
| readmission_threshold | Threshold for readmission prediction | 0.7 |

3. 5. 2 Tailoring SNFsim

Prior to simulation, users must configure key parameters of the SNF to set up their preferred environment. Table 3 presents the required configuration parameters along with their default values. These parameters include facility characteristics such as the total number of beds and optimal occupancy targets, staffing resources including the maximum available numbers of full-time, PRN (pro re nata, or as-needed), and agency CNAs, as well as operational factors like daily referral rates and minimum required nursing hours per patient day.

The selection of the facility's state location is important for accurately simulating the financial aspects of operations, as each state within the United States has different hourly wage rates for nurses (of each subcategory). This selection impacts staffing costs and, consequently, affects the overall budget management and profitability of the SNF. The current state-specific wage data used in the simulation can be found in Table A12.
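To make the configuration step concrete, the parameters of Table 3 could be collected in a small container such as the one sketched below. The field names and default values are taken from Table 3, but the `SNFConfig` class itself, its validation rules, and the way it would be passed to the simulator are hypothetical and are not claimed to match the SNFsim API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SNFConfig:
    """Hypothetical container for the Table 3 parameters and their defaults."""
    total_beds: int = 100
    occupancy_bounds: tuple = (0.75, 0.90)      # target min/max occupancy
    nursing_hours_target: tuple = (2.5, 3.5)    # target min/max hours per patient
    full_time_cna: int = 20
    prn_cna: int = 10
    agency_cna: int = 15
    referral_rate: int = 5                      # mean number of daily referrals
    min_cna_hours: float = 2.8                  # mandated hours per patient day
    fac_state: str = "New York"
    readmission_threshold: float = 0.7

    def __post_init__(self):
        # Basic sanity checks (assumed rules, added for illustration only).
        lo, hi = self.occupancy_bounds
        if not 0.0 <= lo <= hi <= 1.0:
            raise ValueError("occupancy_bounds must satisfy 0 <= min <= max <= 1")
        if self.total_beds <= 0:
            raise ValueError("total_beds must be positive")
        if not 0.0 <= self.readmission_threshold <= 1.0:
            raise ValueError("readmission_threshold must lie in [0, 1]")
```

For example, the facility used in Section 4.1 (35 full-time CNAs and 10 referrals per day, with the remaining values at their defaults) would correspond to `SNFConfig(full_time_cna=35, referral_rate=10)`.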

3. 6 OpenAI Gymnasium Environment

In this section, we document a custom OpenAI Gymnasium environment, snf_v0, which serves as a practical example of how users can create custom-tailored simulation environments based on SNFsim that can be easily used as testbeds for RL method development and application.

The environment described in this section is a direct operationalization of the facility representation introduced in Section 3.5. Specifically, the patient population, staffing configuration, occupancy levels, financial metrics, and referral generation defined in Section 3.5 make up the elements of the state space in snf_v0, while the referral acceptance and staffing adjustment steps correspond to the environment's action space. Thus, this section translates the conceptual facility dynamics into a formal multi-objective Markov Decision Process (MO-MDP) compatible with RL.

3. 6. 1 State Space

The underlying simulator maintains a full Markovian state representation, $S_t = \{P_t, O_t, H_t, C_t, R_t\}$, which includes the complete set of inpatients and their associated attributes, facility occupancy, available nursing hours, cumulative financial metrics, and the full set of generated referrals. This state corresponds directly to the facility representation described in detail in Section 3.5 and is used internally by SNFsim to compute transitions and patient outcomes.

Because many patient-level features are high-dimensional, the environment does not expose the entire state to the agent. Instead, the agent receives a condensed observation, described in Section 3.6.2, that captures only the operational information required for making decisions. Put simply, snf_v0 is a partially observable MO-MDP in which $O = f(S_t)$. The observation space provides a structured, lower-dimensional subset of the full simulator state, $S_t$.

3. 6. 2 Observation Space

While the full internal state, $S_t$, contains all patient-level and facility-level information, the observation space provides a structured subset of this information that is relevant for decision-making and omits high-dimensional variables that may not be necessary for effective decisions. Thus, the observation space is not identical to the state space, and the environment is partially observable. Our composite observation space includes three components: a condensed facility state vector, a staffing vector, and a fixed-size matrix of referral features. Let $O = (f_t, s_t, r_t)$, where $f_t$ summarizes the facility's operational status and is formalized as:

(2) $f_t = [O_t, \text{Rev}_t, \text{Cost}_t, \bar{r}, N_t],$

with $O_t$ denoting the occupancy rate, $\text{Rev}_t$ the total daily reimbursement, $\text{Cost}_t$ the daily staffing costs, $\bar{r}$ the mean rehospitalization risk across all current inpatients, and $N_t$ the number of residents currently admitted.

The staffing vector $s_t$, the second element of $O$, encodes the total CNA hours scheduled across the three shifts (day, evening, and night). This can be written as:

(3) $s_t = [H_t^{\text{day}}, H_t^{\text{evening}}, H_t^{\text{night}}].$

Finally, the referral matrix $r_t$ provides a compact description of $n$ candidate referrals at time $t$, with each row containing the expected daily reimbursement, expected LoS, and age of a prospective admission.
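The three-part observation can be illustrated with plain Python lists, as sketched below. The function name `build_observation`, the dictionary keys, and the zero-padding of unused referral rows are assumptions made for this example; the text specifies only that the referral matrix has a fixed size, and in snf_v0 these components would be wrapped in Gymnasium spaces rather than returned as lists.

```python
def build_observation(occupancy, revenue, cost, mean_risk, n_residents,
                      shift_hours, referrals, n_slots):
    """Assemble O = (f_t, s_t, r_t) as plain Python lists."""
    # f_t: condensed facility state vector, Equation (2)
    facility = [occupancy, revenue, cost, mean_risk, float(n_residents)]
    # s_t: total CNA hours per shift, Equation (3)
    staffing = [shift_hours[k] for k in ("day", "evening", "night")]
    # r_t: one row per candidate referral (reimbursement, expected LoS, age)
    rows = [[r["reimbursement"], r["los"], r["age"]]
            for r in referrals[:n_slots]]
    # Zero-pad so the matrix always has n_slots rows (assumed padding scheme).
    rows += [[0.0, 0.0, 0.0] for _ in range(n_slots - len(rows))]
    return facility, staffing, rows
```

Keeping the referral matrix at a fixed number of rows lets the observation shape stay constant even on days when fewer referrals arrive than there are slots.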

3. 6. 3 Action Space

In the snf_v0 environment, the range of actions an agent can perform at a timestep $t$ is defined by its action space. We use a multi-component action space consisting of both referral acceptance decisions and staffing adjustments. First, for referral decisions, we implement a MultiBinary space of size referral_rate, allowing the agent to make binary accept/reject decisions for up to referral_rate potential admissions simultaneously. Second, for staffing management, we employ nested Box spaces that reflect different operational realities for each staff type. Full-time staffing adjustments lie in [-2, 2] CNAs per shift, representing incremental changes to a persistent staff. In contrast, PRN and agency staffing use absolute values in [0, max] for each shift, where max is the total available staff of that type. This staffing model mirrors real-world operations, in which PRN and agency staff are scheduled daily as needed, with staffing levels resetting to zero at the beginning of each day, while full-time staff schedules persist with incremental adjustments between days.
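The difference between incremental full-time adjustments and absolute PRN and agency levels can be sketched as follows. The function `apply_staffing_action`, the rounding of continuous Box values to whole CNAs, and the lower bound of zero on full-time staff are assumptions for illustration; the upper bounds follow the per-type staffing limits described above.

```python
def clip(x, lo, hi):
    """Clamp x to the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


def apply_staffing_action(full_time, ft_delta, prn, agency, max_staff):
    """Turn one staffing action into per-shift CNA counts.

    full_time: current full-time CNAs per shift (persist between days)
    ft_delta:  requested change per shift, limited to [-2, 2]
    prn, agency: requested absolute counts per shift (reset daily)
    max_staff: dict with the total available staff of each type
    """
    new_full_time = [
        clip(cur + round(clip(d, -2, 2)), 0, max_staff["full_time"])
        for cur, d in zip(full_time, ft_delta)
    ]
    new_prn = [clip(round(x), 0, max_staff["prn"]) for x in prn]
    new_agency = [clip(round(x), 0, max_staff["agency"]) for x in agency]
    return new_full_time, new_prn, new_agency
```

Because only the full-time vector is carried over to the next call, a policy can build up or wind down permanent staff gradually while treating PRN and agency CNAs as a fresh daily decision.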

3. 6. 4 Reward Signal

The reward signal defines the objective measure to be optimized by the learning process. We construct a vector-based reward function that captures key operational objectives.

Occupancy Reward

The occupancy reward, $R_{\text{occ}}$, evaluates how well the facility maintains an optimal census level, normalized to a range of [-1, 1]:

(4) $$R_{\text{occ}}(O_t, T_l, T_u) = \begin{cases} 1 - 2\,\dfrac{O_t - T_u}{1 - T_u}, & \text{if } O_t > T_u \\ -1 + 2\,\dfrac{O_t}{T_l}, & \text{if } O_t < T_l \\ 1, & \text{if } T_l \le O_t \le T_u, \end{cases}$$

where $O_t$ represents the occupancy rate at time $t$, and $[T_l, T_u]$ defines the target range (the default setting is [0.75, 0.9]). This formulation provides maximum reward when occupancy falls within the optimal range, while creating a linear penalty gradient for both underutilization of space and overcrowding. When occupancy is zero, the reward equals -1, scaling linearly towards 1 as the occupancy approaches the lower threshold. Similarly, exceeding the upper threshold results in a linear decline from 1 toward -1 as occupancy approaches maximum capacity.
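The piecewise definition of the occupancy reward translates directly into code. The sketch below uses the default target range of [0.75, 0.9]; the function name `occupancy_reward` is chosen here for illustration, and the code assumes an upper threshold strictly below 1 so that the overcrowding branch is well defined.

```python
def occupancy_reward(occ, t_lo=0.75, t_hi=0.90):
    """Occupancy reward in [-1, 1] for an occupancy rate occ in [0, 1]."""
    if occ > t_hi:
        # Linear decline from 1 at t_hi to -1 at full capacity.
        return 1.0 - 2.0 * (occ - t_hi) / (1.0 - t_hi)
    if occ < t_lo:
        # Linear rise from -1 at an empty facility to 1 at t_lo.
        return -1.0 + 2.0 * occ / t_lo
    return 1.0   # inside the target range
```

For instance, an occupancy of 0.375 (halfway to the lower threshold) yields a reward of 0, while any occupancy between 0.75 and 0.9 yields the maximum reward of 1.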

Reimbursement Reward

The reimbursement reward, $R_{\text{reimb}}$, introduces a distribution-aware approach by evaluating the facility's total daily revenue against statistical benchmarks derived from the referral dataset. This method captures both patient selection quality and volume:

(5) $$R_{\text{reimb}}(R_{\text{total}}) = \begin{cases} \dfrac{R_{\text{total}}}{E[R]} - 1, & \text{if } R_{\text{total}} \le E[R] \\ \dfrac{R_{\text{total}} - E[R]}{T[R] - E[R]}, & \text{if } R_{\text{total}} > E[R], \end{cases}$$

where $R_{\text{total}}$ represents the facility's total daily reimbursement revenue, $E[R] = C_{\max} \cdot \alpha \cdot \mu_r$ denotes the expected revenue benchmark ($\alpha$ occupancy at the mean reimbursement rate $\mu_r$), and $T[R] = C_{\max} \cdot \beta \cdot Q_{90}(r)$ represents the target revenue benchmark ($\beta$ occupancy at the 90th percentile reimbursement rate). Here, $C_{\max}$ is the maximum bed capacity, $\mu_r$ is the mean reimbursement rate from the referral distribution, and $Q_{90}(r)$ is the 90th percentile of the reimbursement rate distribution. In our implementation, we set $\alpha = 0.5$ and $\beta = T_l$ to represent realistic industry benchmarks. This formulation creates a continuous reward signal scaled to [-1, 1] that accounts for the statistical properties of the referral pool, incentivizing optimal selection while maintaining appropriate census levels.
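A short sketch of the reimbursement reward follows. The function name and the example reimbursement rates in the usage note are hypothetical. The cap at 1 for revenue above the target benchmark is also an assumption: it is added here only so that the output respects the stated [-1, 1] range, and it is not part of the piecewise definition itself.

```python
def reimbursement_reward(total, mean_rate, q90_rate,
                         capacity=100, alpha=0.5, beta=0.75):
    """Reimbursement reward relative to expected and target benchmarks."""
    expected = capacity * alpha * mean_rate    # E[R]
    target = capacity * beta * q90_rate        # T[R]
    if total <= expected:
        # Ranges from -1 (no revenue) to 0 (revenue equals E[R]).
        return total / expected - 1.0
    # Ranges from 0 at E[R] to 1 at T[R]; capped at 1 (assumed) beyond T[R].
    return min((total - expected) / (target - expected), 1.0)
```

With a hypothetical mean rate of 400 and 90th-percentile rate of 600 per patient day, the benchmarks are 20,000 and 45,000, so a daily revenue of 32,500 sits halfway between them and earns a reward of 0.5.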

Nursing Cost Reward

The nursing cost reward, $R_{\text{nurse}}$, normalizes staffing costs relative to the theoretical maximum possible cost, with an additional bonus for maintaining optimal care hours per patient:

(6) $$R_{\text{nurse}}(C_t, C_{\max}, p_t) = \text{clip}\left(-2\,\dfrac{C_t}{C_{\max}} + 1 + B(p_t),\; -1,\; 1\right)$$

where $C_t$ is the current staffing cost, $C_{\max}$ is the maximum possible cost when all staff are scheduled, and $B(p_t)$ is an efficiency bonus that is defined as:

(7) $$B(p_t) = \begin{cases} 0.2\left(1 - \dfrac{|p_t - \bar{p}|}{\Delta p}\right), & \text{if } H_l \le p_t \le H_u \\ 0, & \text{otherwise} \end{cases}$$

where $p_t$ represents the mean nursing hours per patient at time $t$. The target nursing hours range $[H_l, H_u]$ defines the acceptable care-intensity range, with $\bar{p} = \frac{H_l + H_u}{2}$ and $\Delta p = \frac{H_u - H_l}{2}$.

The reward combines two components. The base term $-2\frac{C_t}{C_{\max}} + 1$ penalizes staffing expenditure linearly, ranging from 1 (at zero cost) to -1 (at maximum cost). The efficiency bonus $B(p_t)$ adds up to 0.2 when hours-per-patient fall within $[H_l, H_u]$. These terms encourage both cost efficiency and appropriate care intensity. The final value is clipped to [-1, 1], and defaults to 0 when $C_{\max} = 0$. It should be noted that $B(p_t)$ can be removed from the calculation of $R_{\text{nurse}}(C_t, C_{\max}, p_t)$ with no other adjustments if the user does not wish to consider efficiency.
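The two components of the nursing cost reward can be sketched as a pair of small functions. The names `efficiency_bonus` and `nursing_cost_reward` are illustrative, the default range of 2.5 to 3.5 hours comes from Table 3, and the sketch assumes the lower and upper targets differ so that the half-width is nonzero.

```python
def efficiency_bonus(p, h_lo=2.5, h_hi=3.5):
    """Bonus of up to 0.2, peaking at the midpoint of the target range."""
    if not h_lo <= p <= h_hi:
        return 0.0
    mid = (h_lo + h_hi) / 2.0       # p-bar
    half = (h_hi - h_lo) / 2.0      # Delta p
    return 0.2 * (1.0 - abs(p - mid) / half)


def nursing_cost_reward(cost, max_cost, p, h_lo=2.5, h_hi=3.5):
    """Nursing cost reward clipped to [-1, 1]; 0 when max_cost is 0."""
    if max_cost == 0:
        return 0.0
    raw = -2.0 * cost / max_cost + 1.0 + efficiency_bonus(p, h_lo, h_hi)
    return max(-1.0, min(1.0, raw))
```

At zero cost the base term already equals 1, so the bonus is absorbed by the clip; the bonus matters most at high spending, where hitting the midpoint of the target range lifts the reward at maximum cost from -1 to -0.8.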

Rehospitalization Reward

The rehospitalization reward, $R_{\text{rehosp}}$, is designed to penalize patient rehospitalizations while accounting for facility occupancy:

(8) $$R_{\text{rehosp}}(h, n, t) = \begin{cases} 1, & \text{if } h = 0 \\ 1 - 2\left(\dfrac{h}{n}\right)^2, & \text{if } n > 0 \text{ and } h > 0 \\ -1 \cdot \min\left(\dfrac{t}{30}, 1\right), & \text{if } n = 0 \end{cases}$$

where $h$ is the number of rehospitalizations, $n$ is the current number of patients, and $t$ is the current time step. The reward function maximizes positive reinforcement when no patients are rehospitalized, creating a strong incentive to provide quality care and effectively manage the case mix. As the rate of rehospitalizations increases, the reward decreases quadratically, with the potential to reach a minimum value of -1 for extremely high rehospitalization rates. For facilities with zero patients, the penalty gradually increases over time, reaching a maximum negative reward of -1 after some number of time steps (defaulted to 30). This is put in place to allow an agent to amass patients at the beginning of new episodes. This approach balances the need to minimize rehospitalizations with maintaining an active patient population.
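One way to implement the rehospitalization reward is sketched below. Two choices in this sketch are interpretations rather than statements from the text: the empty-facility case is checked first (an empty facility necessarily has zero rehospitalizations, so the cases would otherwise overlap), and the empty-facility penalty is read as ramping from 0 down to -1 over the first 30 time steps, in line with the prose description. The function and parameter names are illustrative.

```python
def rehospitalization_reward(h, n, t, ramp_steps=30):
    """Rehospitalization reward in [-1, 1].

    h: rehospitalizations this step, n: current patients, t: time step.
    """
    if n == 0:
        # Empty facility: penalty grows linearly to -1 after ramp_steps steps.
        return -min(t / ramp_steps, 1.0)
    if h == 0:
        return 1.0
    # Quadratic penalty in the rehospitalization rate h / n.
    return 1.0 - 2.0 * (h / n) ** 2
```

Under this reading, a facility that rehospitalizes half of its patients in one step receives 0.5, and one that rehospitalizes all of them receives -1.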

4 Applications and Baseline Results

In Section 4.1, we illustrate how SNFsim may be used to evaluate and compare hand-constructed policies in terms of their relative performance on different reward components. In Section 4.3, we evaluate and compare standard implementations of Proximal Policy Optimization (PPO) in SNFsim. We use a simplified rehospitalization classifier in this section, which is a lightweight alternative to the random forest model that preserves the directional relationships of the original model while guaranteeing monotonicity. While the random forest achieves high predictive fidelity (AUC of 0.921), its non-monotonic decision surface can introduce unpredictable risk signals during training, potentially impacting the agent's ability to learn stable associations between actions and outcomes. The abstraction maintains the essential risk patterns and allows for controlled policy learning experiments. The simplified model considers LoS, mean CNA hours, reimbursement rate, and age, each contributing to the rehospitalization risk at varying degrees of influence consistent with the relationships identified in Table 2. Based on domain knowledge and the empirical findings of the regression analysis, we posit that longer stays and higher CNA hours per patient are protective against rehospitalization, while higher reimbursement rates (a proxy for patient acuity) and advanced age are associated with elevated risk. The relative influence of each feature on rehospitalization risk is set to fixed defaults in our simplified model but is fully configurable. Note that researchers employing SNFsim still have the option to use the original rehospitalization model discussed previously, or their own model, rather than the simplified version.

4. 1 Policy Analysis

Rolling out and comparing manually specified policies can provide insights into how different strategies balance or prioritize each of the competing objectives. This can help clarify the relationship among environment dynamics and primary objectives.

We ran five episodes for each policy with an episode length of 365 timesteps (equivalent to one year in the environment). Each facility had at its disposal 35 full-time CNAs, 10 PRN CNAs, and 15 agency CNAs. Each facility also had an optimal occupancy between 75% and 90%, was located in New York, received 10 referrals per day, and had 100 available beds.

We evaluate two pairs of simple policies: P1 and P2, and P3 and P4. Policy P1 focuses on optimizing occupancy and minimizing rehospitalizations by accepting every referral each day and keeping nursing hours per patient as close to six hours as possible. Policy P2 is more conservative, accepting just one referral per day (the one with the highest reimbursement rate) and scheduling just one PRN CNA and one agency CNA for each daily shift. Figure 8 displays the results from five rollouts of policies P1 and P2 within the snf_v0 environment.

Performance comparison of five rollouts between P1 and P2 in the snf_v0 environment.

Policy P3 emphasizes cost reduction (but not necessarily in an efficient manner) by accepting five random referrals each day and keeping nursing hours per patient as close to a single hour as possible. Policy P4 prioritizes high-quality patient care by accepting the five highest-reimbursement (i.e., most intense) referrals each day. Additionally, P4 maximizes CNA availability by increasing facility staffing and fully utilizing PRN and agency CNAs. Figure 9 displays the results from five episodes of rolling out policies P3 and P4 within the snf_v0 environment.

Performance comparison of five rollouts between P3 and P4 in the snf_v0 environment.

It is clear that by accepting every referral while maintaining a high level of nursing hours per patient, P1 consistently achieves high occupancy as patients are admitted frequently and very few patients are rehospitalized due to the excellent levels of care provided. However, since the facility has 100 available beds and only ten referrals per day (all of which are accepted), there is no selectivity in acceptance. As a result, the reimbursement reward is lower than it could be if the facility received a larger number of referrals and selectively accepted those with higher reimbursement rates. Additionally, because P1 ensures a high level of care by scheduling the CNAs required for high levels of patient care, the nursing cost management reward is also lower than its potential. Conversely, P2 struggles to retain patients due to its low level of referral acceptance (just one per day), resulting in low occupancy and reimbursement rewards. Regardless, it manages to keep a low level of rehospitalizations despite spending very little on CNAs as there are fewer patients in the facility each day to care for.

Policy P3 accepts five random referrals per day and allocates only a single nursing hour per patient, which is significantly below the level required for effective inpatient care. As a consequence, P3 suffers from a high rate of rehospitalizations, low reimbursement (due to the frequent rehospitalizations), and consistently low occupancy. In contrast, policy P4 selects the five highest-paying patients per day and maximizes CNA staffing. This approach results in substantial rewards in terms of reimbursement and rehospitalization prevention; however, it comes at the cost of a very low nursing cost management reward. Additionally, because the facility operates near full capacity due to the continual acceptance of referrals combined with high levels of care, the occupancy reward remains low, since the optimal occupancy range is set to between 75% and 90%, as defined in Equation 4.

4. 2 Model Validation and Real-World Engagement

To build confidence in SNFsim, we conducted validation activities at multiple levels during its development. For patient-level and facility-level validation, notable measures included cross-validation of the rehospitalization risk model, calibration of staffing costs to Bureau of Labor Statistics wage data, and validation that simulated patient demographics closely match empirical distributions from CMS claims data. System-level validation ensured that capacity constraints are enforced and boundary conditions are handled properly. As seen in Section 4.3, policies learned in a standard environment exhibit consistent behavior across multiple evaluation episodes, demonstrating stable facility dynamics.

To ensure the practical relevance of our framework, we have consulted with healthcare experts and frontline nursing staff through our industry partnership with PointClickCare, the leading provider of healthcare technology solutions for post-acute care. Our goal was to obtain an understanding of their decision-making priorities. These discussions revealed that facility operators generally face three primary challenges: 1) balancing quality metrics mandated by CMS with financial sustainability, 2) making rapid staffing decisions despite labor shortages, and 3) managing the tradeoff between accepting high-acuity patients and maintaining low rehospitalization rates, so as to preserve a positive reputation and quality care for patients.

Both the simulator itself and the weight configurations examined in Section 4.3 were designed to reflect these real-world priorities. However, it should be noted that in its current state, this work represents a computational framework rather than a deployment-ready tool.

Moving from this testbed to practical adoption will require addressing several implementation challenges. First, building trust in recommendations driven by machine learning requires transparent explainability tools that show why specific referrals are recommended for acceptance or rejection, or why certain staffing decisions are suggested. Next, validation through pilot deployments (in which recommendations are made alongside human decision-makers without influencing operations) will be essential for ensuring user confidence. Finally, to maximize usefulness, individual facilities should calibrate the simulator to best reflect their respective transition probabilities. Although SNFsim's modular design accommodates such calibration, it would require large-scale historical data and increased operational granularity built into the simulation.

4. 3 RL Methods

In this section, we train multiple RL agents within SNFsim and examine how the resulting policies differ in terms of decision-making and objective balancing.

We evaluate the effectiveness of PPO within SNFsim due to its stability in training under noisy reward signals and effectiveness in high-dimensional state spaces like ours. PPO aims to maximize a cumulative reward signal by learning a policy function while ensuring stable learning through constrained policy updates (Schulman et al., 2017). The algorithm maximizes the clipped surrogate objective:

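(9) $$L^{\text{CLIP}}(\theta) = \hat{\mathbb{E}}_t\left[\min\left(r_t(\theta)\hat{A}_t,\; \text{clip}(r_t(\theta), 1 - \epsilon, 1 + \epsilon)\,\hat{A}_t\right)\right],$$

where $r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\text{old}}}(a_t \mid s_t)}$ is the probability ratio between the new and old policies ($\pi_\theta$ and $\pi_{\theta_{\text{old}}}$ respectively), $\hat{A}_t$ is the estimated advantage function, and $\epsilon$ is a hyperparameter that constrains policy updates.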
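To show how the clipping limits the size of a policy update, the sketch below evaluates the clipped surrogate objective on a batch of precomputed probability ratios and advantage estimates. It is a plain-Python illustration, not the PPO implementation used in our experiments; the function name is hypothetical, and the default of 0.2 for the clipping parameter is a commonly used value assumed here rather than the setting reported in Table A14.

```python
def clipped_surrogate(ratios, advantages, eps=0.2):
    """Mean clipped surrogate objective over a batch of samples.

    ratios:     r_t(theta), new-policy probability over old-policy probability
    advantages: estimated advantages A_t for the same samples
    """
    terms = []
    for r, a in zip(ratios, advantages):
        clipped = max(1.0 - eps, min(1.0 + eps, r))
        # Taking the minimum removes any gain from moving r outside the clip range.
        terms.append(min(r * a, clipped * a))
    return sum(terms) / len(terms)
```

For a positive advantage, a ratio of 1.5 contributes only 1.2 times the advantage, so the policy gains nothing by increasing the probability of that action beyond the clip range.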

For simplicity within our multi-objective problem, we employ a fixed-weight scalarization approach that converts the vector of rewards into a single scalar value through weighted summation. With m objectives (where m=4 in our case), the scalar reward at each timestep is calculated as:

(10) $$R(s, a) = \sum_{i=1}^{m} w_i R_i(s, a)$$

where $w_i$ represents the weight assigned to the $i$th objective and $R_i(s, a) \in [-1, 1]$ is the normalized reward for that objective. This approach allows us to prioritize different objectives by adjusting their respective weights.
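The fixed-weight scalarization is a weighted sum and can be sketched in a few lines. The function name `scalarize` is illustrative; the reward ordering used in the usage note, [reimbursement, nursing costs, rehospitalization, occupancy], and the example weight vector follow the configurations listed with Figure 10.

```python
def scalarize(rewards, weights):
    """Weighted sum of per-objective rewards, each assumed to lie in [-1, 1]."""
    if len(rewards) != len(weights):
        raise ValueError("rewards and weights must have the same length")
    return sum(w * r for w, r in zip(weights, rewards))
```

With the care-focused weights [0.4, 0.4, 1.6, 1.6], a step in which every objective attains its maximum of 1 yields a scalar reward of 4, which is the upper end of the theoretical summed range.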

4. 3. 1 Method Performance

We evaluated three PPO agents (hyperparameters in Table A14) with distinct reward weight configurations to explore different policy priorities within SNFsim. Table 4 summarizes the key performance metrics and Figure 10 shows training performance under varied weight configurations.

PPO agent training performance under varied weight configurations across 200,000 timesteps.

Objective weights follow the order [reimbursement, nursing costs, rehospitalization, occupancy]. (a) Balanced approach with uniform weights [1.0, 1.0, 1.0, 1.0]. (b) Care optimization with weights [0.4, 0.4, 1.6, 1.6], emphasizing rehospitalization minimization and occupancy management while de-emphasizing financial metrics. (c) Financial optimization with weights [1.6, 1.6, 0.4, 0.4], prioritizing reimbursement maximization and nursing cost minimization over patient care and facility occupancy. The theoretical summed reward range is [-4, 4].

Table 4
Summary of training performance across different PPO weight configurations.

| Metric | Balanced | Financial Focus | Care Focus |
|---|---|---|---|
| Weights | [1.0, 1.0, 1.0, 1.0] | [1.6, 1.6, 0.4, 0.4] | [0.4, 0.4, 1.6, 1.6] |
| Average Reward | 1.41 | 0.40 | 2.54 |
| First Quarter Avg | 1.07 | 0.13 | 1.88 |
| Last Quarter Avg | 1.53 | 0.53 | 2.94 |
| Improvement (%) | 42.74% | 312.67% | 56.00% |
| Trend Slope | 0.5982 | 0.5307 | 1.3929 |

The care-focused agent achieved the highest absolute performance, with final-quarter rewards approaching 75% of the theoretical maximum (2.94 out of 4.0). This agent also exhibited the steepest learning curve and attained the highest average reward (2.54).

In contrast, the financially-focused agent (Figure 10c) delivered lower absolute rewards but demonstrated the most dramatic relative improvement (312.67%). This indicates that financial optimization requires more sophisticated strategies. Despite this improvement, the agent reached only approximately 13% of the theoretical maximum reward of 4, which hints at difficulty in simultaneously optimizing financial components without considering the importance of patient wellness and facility occupancy.

The balanced agent (Figure 10a) achieved moderate performance with a 42.74% improvement rate and a trend slope of approximately 0.60, reflecting the trade-offs between competing objectives. These results suggest that policy optimization in multi-dimensional SNF management should consider the difficulty of improving different objectives and prioritize accordingly based on clear individual goals.

Table 5 shows clear tradeoffs between policies trained with different objective weightings. The care-focused policy achieved superior performance across both quality and overall financial metrics: a 13.6% rehospitalization rate (a little over half the real-world SNF average of 23.5%; Minges et al., 2019), 73.0% occupancy (the highest), and $28,534 daily profit (16% higher than Balanced, 13% higher than Financial Focus). This policy invested in staffing ($9,933 daily cost, approximately 43% above the other policies) and selectively accepted referrals (60.1% acceptance rate), allowing it to maintain higher occupancy with patients that better fit the care potential of the SNF.

Table 5
Outcomes by policy configuration over 10 different 365-day episodes. Weight vectors indicate a priori selection of importance for [reimbursement, nursing cost, rehospitalization, occupancy] objectives.

| Metric | Balanced | Financial Focus | Care Focus |
|---|---|---|---|
| Weights | [1.0, 1.0, 1.0, 1.0] | [1.6, 1.6, 0.4, 0.4] | [0.4, 0.4, 1.6, 1.6] |
| Occupancy Rate | 55.62% ± 0.24% | 53.89% ± 0.20% | 72.97% ± 0.39% |
| Rehospitalization Rate | 37.3% ± 4.1% | 42.8% ± 3.6% | 13.6% ± 3.4% |
| Referral Acceptance Rate | 80.0% | 100.0% | 60.1% |
| Daily Revenue | $31,618 ± $131 | $32,199 ± $136 | $38,467 ± $141 |
| Daily Cost | $6,938 ± $1.81 | $6,922 ± $1.75 | $9,933 ± $1.44 |
| Daily Profit | $24,680 | $25,277 | $28,534 |
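The profit row of Table 5 is daily revenue minus daily cost, and the relative differences between policies follow from those values. The short check below reproduces this arithmetic from the table's mean revenue and cost figures; the variable and function names are illustrative.

```python
# Mean daily revenue and cost per policy, taken from Table 5.
table5 = {
    "Balanced": (31618, 6938),
    "Financial Focus": (32199, 6922),
    "Care Focus": (38467, 9933),
}

# Daily profit = daily revenue - daily cost.
profit = {name: rev - cost for name, (rev, cost) in table5.items()}


def pct_higher(a, b):
    """Percentage by which a exceeds b."""
    return 100.0 * (a - b) / b


care_vs_balanced = pct_higher(profit["Care Focus"], profit["Balanced"])
care_vs_financial = pct_higher(profit["Care Focus"], profit["Financial Focus"])
```

The resulting profits match the table, and the care-focused policy's profit exceeds that of the balanced and financially focused policies by roughly 16% and 13%, respectively.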

Conversely, the financial-focused policy learned to accept all referrals (100% acceptance rate) and minimized staffing costs ($6,922), but suffered from the highest rehospitalization rate (42.8%) and, as a result, the lowest occupancy (53.9%). The high patient turnover from inadequate care prevented sustained growth, ultimately resulting in lower profit despite lower costs.

The Balanced policy achieved intermediate outcomes (37.3% rehospitalization, approximately 1.6 times the national average; 80% acceptance rate; $24,680 profit), but was dominated by the care-focused approach on both quality and financial dimensions.

These results suggest that prioritizing quality through adequate staffing and selective admissions generates superior financial outcomes compared to cost minimization, as preventing rehospitalizations results in more overall value than reducing care expenses.

5 Discussion

SNFsim provides a flexible open-source platform calibrated using real-world datasets for the development and testing of complex RL algorithms and policies that tackle long-horizon, multi-dimensional, and multi-objective optimization challenges. By simplifying many of the low-level details, SNFsim focuses instead on the high-level interactions occurring within SNFs, allowing researchers and healthcare professionals to explore the potential outcomes of various policy implementations and management strategies in a controlled, risk-free environment. By simulating real-world conditions and constraints, the platform provides a sandbox for identifying and addressing the limitations of current algorithms, encouraging the advancement of RL strategies that are applicable in real-world SNF settings.

In stochastic real-world environments, where algorithms face the risk of non-convergence or of settling in local optima, SNFsim provides a valuable framework for constructing custom environments rooted in real-world healthcare data. We demonstrate an example of this process in Section 3.6, in which we build a custom Gymnasium environment, snf_v0, on top of SNFsim and later train RL agents within that custom environment.

It is important to note that although SNFsim incorporates predictive components (such as the rehospitalization risk model), it is not intended to function as a validated forecasting tool, nor do we claim complete empirical predictive accuracy in this work. Rather, the simulator provides a controlled environment in which the consequences of different staffing and referral policies can be explored through forward simulation. As such, SNFsim enables users to explore how alternative decisions may influence revenue, occupancy, staffing costs, and rehospitalizations under consistent assumptions. Suggestions on how SNFsim or a similar system might be implemented to aid in real-world decision-making are discussed in Section 4.2.

5.1 Relevance to RL Research

RL applications are commonly developed within single-objective environments, focusing on optimizing a scalar reward signal. However, healthcare inherently involves balancing multiple objectives, which can be a significant challenge. Discrete-event simulators have long been a tool for multi-objective optimization in healthcare (Wang et al., 2015; Al-Hawari et al., 2022), providing a framework for decision-making within complex systems with competing priorities. By taking into account the interrelated outcomes of occupancy, net revenue, and rehospitalization, and modeling their nonlinear relationships, SNFsim allows for the development of more robust and adaptable multi-objective RL solutions.

Additionally, the infinite-horizon nature of decision-making in SNFs distinguishes it from many conventional RL applications, which are often based on finite horizons with clearly defined start and end points tied to pre-defined goals (e.g., reaching the end of a maze or landing a helicopter directly in the center of a helipad). In SNFs, there is no fixed endpoint; the facility must continuously operate and evolve, making it imperative for RL policies to focus on long-term sustainability and adaptability. Using simulators such as SNFsim for infinite-horizon problems offers significant benefits. Simulators provide an interactive environment where algorithms can be continuously trained, tested, and refined under an assortment of conditions that may not be fully represented or available in offline datasets. This is particularly beneficial for infinite-horizon problems, where the decision-making process extends indefinitely into the future, and the system must adapt to evolving states and objectives over time. SNFsim allows for the exploration of long-term strategies and the evaluation of their impacts in a controlled setting, enabling the development of more robust and effective RL solutions that are better equipped to handle the complexities and uncertainties of real-world applications.

Finally, the interdisciplinary nature of RL in SNFs, involving collaboration across healthcare, operations management, and AI, not only enhances the applicability of RL solutions but also fosters a broader understanding of real-world complexities and allows for the exploration of more pragmatic RL applications.

The exploration of RL within the context of SNFs is not just a step towards improving operational efficiency and patient care in SNFs; it is a significant contribution to the field of RL as a whole. It challenges existing ideas, introduces new problems, and could help bridge the gap between methodological research and practical applications.

5.2 Relevance to Healthcare Decision-making

The use of SNFsim in conjunction with RL has the potential to enhance operational efficiency and resource management. RL models, trained and tested within a simulated SNF environment that can be tailored to an individual SNF, can analyze complex patterns and predict future trends. This predictive capability could support more informed decision-making regarding critical aspects like staffing levels, resource allocation, and patient admissions, and thereby help to achieve operational efficiency and high-quality patient care.

Our simulator provides a safe and controlled environment for testing and refining RL-based strategies. In the sensitive realm of healthcare, direct experimentation can be risky and ethically problematic. A simulated environment using de-identified healthcare data allows for extensive testing of different policies and decision-making scenarios without risk to patients or facilities. This feature is invaluable for validating RL models and ensuring their reliability and safety before real-world implementation.

Finally, the predictive analytics capability of a discrete-event simulator like SNFsim is particularly relevant for proactive healthcare management. By identifying potential risks and anticipating future scenarios, RL models can advise on preventive measures and strategic adjustments. For instance, identifying patients at high risk of rehospitalization and planning appropriate interventions can significantly improve patient outcomes and reduce the financial and reputational risk associated with rehospitalizations.

6 Future Work

The goal of SNFsim and the custom snf_v0 environment is to provide a testbed for the future development of RL methodology and decision support tools, which we discuss below; however, there are avenues for future research and enhancements of the simulator and environment themselves.

In the future, we aim to expand upon the multi-objective nature of snf_v0 by incorporating additional objectives that reflect the competing goals inherent in healthcare facility management. For instance, we plan to incorporate outcomes such as patient satisfaction and staff well-being, providing a more comprehensive approach to decision-making. Additionally, adapting the environment to dynamically adjust these objectives based on changing circumstances or policies could offer a more realistic simulation of healthcare operations.

Future iterations could also explore more nuanced representations of the state and action spaces, capturing additional complexities of real-world SNFs. This could include more detailed patient profiles and a broader range of operational decisions. Simulating a multi-agent ecosystem, representing a group of facilities such as those in a county or state, could also prove advantageous. In this setup, each agent would manage the decision-making for one facility. These facilities might each have distinct priorities and should learn to make strategic choices based on their understanding of the likely actions of other facilities within their ecosystem.

In the current implementation, SNFsim models referral arrivals as a Poisson process with a constant rate parameter. This is effective for facilities operating in stable markets with established hospital referral relationships, but referral patterns in the real world often vary over time with facility-level features and hospital partnerships (McHugh et al., 2021; Kim et al., 2019). Future work should therefore capture shifts in referral demographics and volume over time, including the effects of geographical location, facility-level features, and hospital relationships on sampled referrals.
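To make the contrast concrete, the following is a minimal sketch of constant-rate Poisson arrivals next to one possible time-varying alternative. It is not taken from SNFsim: the function names, the rate of three referrals per day, and the seasonal rate function are all hypothetical, and the time-varying case uses the standard thinning technique rather than any method described in this article.

```python
import math
import random


def constant_rate_arrivals(rate_per_day, horizon_days, rng):
    """Arrival times of a homogeneous Poisson process on [0, horizon_days).

    Inter-arrival gaps are exponential with mean 1 / rate_per_day.
    """
    times, t = [], 0.0
    while True:
        t += rng.expovariate(rate_per_day)
        if t >= horizon_days:
            return times
        times.append(t)


def time_varying_arrivals(rate_fn, max_rate, horizon_days, rng):
    """Non-homogeneous Poisson arrivals via thinning.

    Candidates are drawn at the constant rate max_rate, and each one is kept
    with probability rate_fn(t) / max_rate. Requires rate_fn(t) <= max_rate.
    """
    candidates = constant_rate_arrivals(max_rate, horizon_days, rng)
    return [t for t in candidates if rng.random() < rate_fn(t) / max_rate]


def seasonal_rate(t):
    # Hypothetical rate: 3 referrals/day on average with a +/- 1 annual swing.
    return 3.0 + math.sin(2.0 * math.pi * t / 365.0)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
constant = constant_rate_arrivals(3.0, 365, rng)
varying = time_varying_arrivals(seasonal_rate, 4.0, 365, rng)
```

In a fuller model, the rate function could depend on facility-level features or hospital partnerships instead of time alone; the thinning step would stay the same.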

While SNFsim uses real-world data collected from SNFs, access to additional anonymized data could improve the realism and applicability of the simulator by enabling more accurate modeling and policy development. For instance, gaining access to staff data could enable us to integrate employee experience into the simulator, thereby facilitating more refined and informed staffing decisions. Similarly, acquiring comprehensive facility surveys filled out by patients or their relatives would facilitate the integration of a scoring system within the simulator. This addition could allow actions to impact a perceived facility score, potentially influencing both the quality and volume of generated referrals.

Ultimately, SNFsim is a valuable resource in healthcare decision-making, providing a modular platform calibrated using real-world medical data and offering researchers and practitioners an opportunity to explore and develop techniques for effectively balancing multiple objectives in SNFs.

Footnotes

1.

We use the term case-mix to describe a “mix of cases (patients)” receiving care.

2.

This is often referred to as ‘readmission’; in this work, we use the term ‘rehospitalization’ to make it more obvious that we are referring to the event where a patient leaves the SNF.

Appendices

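Table A1
PDPM payment groups to code value.

| PT/OT | SLP | NURS | NPG | Code Value |
|---|---|---|---|---|
| TA | SA | ES3 | NA | A |
| TB | SB | ES2 | NB | B |
| TC | SC | ES1 | NC | C |
| TD | SD | HDE2 | ND | D |
| TE | SE | HDE1 | NE | E |
| TF | SF | HBC2 | NF | F |
| TG | SG | CBC2 | | G |
| TH | SH | CA2 | | H |
| TI | SI | CBC1 | | I |
| TJ | SJ | CA1 | | J |
| TK | SK | BAB2 | | K |
| TL | SL | BAB1 | | L |
| TM | | HBC1 | | M |
| TN | | LDE2 | | N |
| TO | | LDE1 | | O |
| TP | | LBC2 | | P |
| | | LBC1 | | Q |
| | | CDE2 | | R |
| | | CDE1 | | S |
| | | PDE2 | | T |
| | | PDE1 | | U |
| | | PBC2 | | V |
| | | PA2 | | W |
| | | PBC1 | | X |
| | | PA1 | | Y |
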
Table A2
PT and OT case mix groups and PT and OT CMIs based on clinical category and PT and OT function score.

| Clinical Category | PT & OT Function Score | PT & OT Case Mix Group | PT CMI | OT CMI |
|---|---|---|---|---|
| Major Joint Replacement or Spinal Surgery | 0-5 | TA | 1.53 | 1.49 |
| Major Joint Replacement or Spinal Surgery | 6-9 | TB | 1.69 | 1.63 |
| Major Joint Replacement or Spinal Surgery | 10-23 | TC | 1.88 | 1.68 |
| Major Joint Replacement or Spinal Surgery | 24 | TD | 1.92 | 1.53 |
| Other Orthopedic | 0-5 | TE | 1.42 | 1.41 |
| Other Orthopedic | 6-9 | TF | 1.61 | 1.59 |
| Other Orthopedic | 10-23 | TG | 1.67 | 1.64 |
| Other Orthopedic | 24 | TH | 1.16 | 1.15 |
| Medical Management | 0-5 | TI | 1.13 | 1.17 |
| Medical Management | 6-9 | TJ | 1.42 | 1.44 |
| Medical Management | 10-23 | TK | 1.52 | 1.54 |
| Medical Management | 24 | TL | 1.09 | 1.11 |
| Non-Orthopedic Surgery and Acute Neurologic | 0-5 | TM | 1.27 | 1.30 |
| Non-Orthopedic Surgery and Acute Neurologic | 6-9 | TN | 1.48 | 1.49 |
| Non-Orthopedic Surgery and Acute Neurologic | 10-23 | TO | 1.55 | 1.55 |
| Non-Orthopedic Surgery and Acute Neurologic | 24 | TP | 1.08 | 1.09 |

Table A3
PDPM assessment type to code value.

| Assessment Type | Code Value |
|---|---|
| Initial Patient Assessment | 0 |
| PPS 5-Day Assessment | 1 |

Table A4
PT and OT case mix groups and PT and OT CMIs based on clinical category and PT and OT function score.

| Clinical Category | PT & OT Function Score | PT & OT Case Mix Group | PT CMI | OT CMI |
|---|---|---|---|---|
| Major Joint Replacement or Spinal Surgery | 0-5 | TA | 1.53 | 1.49 |
| Major Joint Replacement or Spinal Surgery | 6-9 | TB | 1.69 | 1.63 |
| Major Joint Replacement or Spinal Surgery | 10-23 | TC | 1.88 | 1.68 |
| Major Joint Replacement or Spinal Surgery | 24 | TD | 1.92 | 1.53 |
| Other Orthopedic | 0-5 | TE | 1.42 | 1.41 |
| Other Orthopedic | 6-9 | TF | 1.61 | 1.59 |
| Other Orthopedic | 10-23 | TG | 1.67 | 1.64 |
| Other Orthopedic | 24 | TH | 1.16 | 1.15 |
| Medical Management | 0-5 | TI | 1.13 | 1.17 |
| Medical Management | 6-9 | TJ | 1.42 | 1.44 |
| Medical Management | 10-23 | TK | 1.52 | 1.54 |
| Medical Management | 24 | TL | 1.09 | 1.11 |
| Non-Orthopedic Surgery and Acute Neurologic | 0-5 | TM | 1.27 | 1.30 |
| Non-Orthopedic Surgery and Acute Neurologic | 6-9 | TN | 1.48 | 1.49 |
| Non-Orthopedic Surgery and Acute Neurologic | 10-23 | TO | 1.55 | 1.55 |
| Non-Orthopedic Surgery and Acute Neurologic | 24 | TP | 1.08 | 1.09 |

Table A5
SLP case mix groups and SLP CMIs based on whether patient has a mechanically altered diet or swallowing disorder and the presence of acute neurological conditions, SLP-related comorbidity, or cognitive impairment.

| Condition* | Mechanically Altered Diet or Swallowing Disorder | SLP Case Mix Group | SLP CMI |
|---|---|---|---|
| None | Neither | SA | 0.68 |
| None | Either | SB | 1.82 |
| None | Both | SC | 2.66 |
| Any One | Neither | SD | 1.46 |
| Any One | Either | SE | 2.33 |
| Any One | Both | SF | 2.97 |
| Any Two | Neither | SG | 2.04 |
| Any Two | Either | SH | 2.85 |
| Any Two | Both | SI | 3.51 |
| All Three | Neither | SJ | 2.98 |
| All Three | Either | SK | 3.69 |
| All Three | Both | SL | 4.19 |

* Presence of Acute Neurological Condition, SLP-Related Comorbidity, or Cognitive Impairment.

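Table A6
Nursing payment group (CMG) and corresponding CMIs based on RUG-IV Nursing RUG, extensive services status, clinical conditions, depression status, and restorative nursing services (RNS).

| RUG-IV Nursing RUG | Extensive Services | Clinical Conditions | Depression | RNS | Function Score | CMG | CMI |
|---|---|---|---|---|---|---|---|
| ES3 | Trach & Ventilator | | | | 0-14 | ES3 | 4.04 |
| ES2 | Trach or Ventilator | | | | 0-14 | ES2 | 3.06 |
| ES1 | Infection Isolation | | | | 0-14 | ES1 | 2.91 |
| HE2/HD2 | | SMC | Yes | | 0-5 | HDE2 | 2.39 |
| HE1/HD1 | | SMC | No | | 0-5 | HDE1 | 1.99 |
| HC2/HB2 | | SMC | Yes | | 6-14 | HBC2 | 2.23 |
| HC1/HB1 | | SMC | No | | 6-14 | HBC1 | 1.85 |
| LE2/LD2 | | RMC | Yes | | 0-5 | LDE2 | 2.07 |
| LE1/LD1 | | RMC | No | | 0-5 | LDE1 | 1.72 |
| LC2/LB2 | | RMC | Yes | | 6-14 | LBC2 | 1.71 |
| LC1/LB1 | | RMC | No | | 6-14 | LBC1 | 1.43 |
| CE2/CD2 | | CRC | Yes | | 0-5 | CDE2 | 1.86 |
| CE1/CD1 | | CRC | No | | 0-5 | CDE1 | 1.62 |
| CC2/CB2 | | CRC | Yes | | 6-14 | CBC2 | 1.54 |
| CA2 | | CRC | Yes | | 15-16 | CA2 | 1.08 |
| CC1/CB1 | | CRC | No | | 6-14 | CBC1 | 1.34 |
| CA1 | | CRC | No | | 15-16 | CA1 | 0.94 |
| BB2/BA2 | | BCS | | 2+ | 11-16 | BAB2 | 1.04 |
| BB1/BA1 | | BCS | | 0-1 | 11-16 | BAB1 | 0.99 |
| PE2/PD2 | | ADL | | 2+ | 0-5 | PDE2 | 1.57 |
| PE1/PD1 | | ADL | | 0-1 | 0-5 | PDE1 | 1.47 |
| PC2/PB2 | | ADL | | 2+ | 6-14 | PBC2 | 1.21 |
| PA2 | | ADL | | 2+ | 15-16 | PA2 | 0.70 |
| PC1/PB1 | | ADL | | 0-1 | 6-14 | PBC1 | 1.13 |
| PA1 | | ADL | | 0-1 | 15-16 | PA1 | 0.66 |
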
Table A7
Non-Therapy Ancillaries (NTA) CMIs based on NTA case mix groups and NTA score ranges.

| NTA Score Range | NTA Case Mix Group | CMI |
|---|---|---|
| 12+ | NA | 3.25 |
| 9-11 | NB | 2.53 |
| 6-8 | NC | 1.85 |
| 3-5 | ND | 1.34 |
| 1-2 | NE | 0.96 |
| 0 | NF | 0.72 |

Table A8
Urban rate components.

| Rate Component | PT | OT | SLP | Nursing | NTA | Non-Case-Mix (NCM) |
|---|---|---|---|---|---|---|
| Per Diem Amount | $62.84 | $58.49 | $23.46 | $109.55 | $82.64 | $98.10 |

Table A9
Rural rate components.

| Rate Component | PT | OT | SLP | Nursing | NTA | Non-Case-Mix (NCM) |
|---|---|---|---|---|---|---|
| Per Diem Amount | $71.63 | $65.79 | $29.56 | $104.66 | $78.96 | $99.91 |

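Table A10
Day in stay adjustment factor.

| Day in Stay | Adjustment Factor |
|---|---|
| 1-20 | 1.00 |
| 21-27 | 0.98 |
| 28-34 | 0.96 |
| 35-41 | 0.94 |
| 42-48 | 0.92 |
| 49-55 | 0.90 |
| 56-62 | 0.88 |
| 63-69 | 0.86 |
| 70-76 | 0.84 |
| 77-83 | 0.82 |
| 84-90 | 0.80 |
| 91-97 | 0.78 |
| 98-150 | 0.76 |
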
Table A11
NTA component adjustment factor.

| Day in Stay | Adjustment Factor |
|---|---|
| 1-3 | 3.00 |
| 4-150 | 1.00 |

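As a worked illustration of how Tables A2 and A5 through A11 fit together, the sketch below computes a daily payment rate. It assumes the general PDPM structure: each case-mix component is its base rate times its CMI, the day-in-stay factor from Table A10 applies to PT and OT, the factor from Table A11 applies to NTA, and the non-case-mix amount is added unchanged. The function names and the example patient are hypothetical, and this appendix does not show how SNFsim itself computes revenue.

```python
# Per diem base rates copied from Table A8 (urban) and Table A9 (rural).
BASE_RATES = {
    "urban": {"PT": 62.84, "OT": 58.49, "SLP": 23.46,
              "Nursing": 109.55, "NTA": 82.64, "NCM": 98.10},
    "rural": {"PT": 71.63, "OT": 65.79, "SLP": 29.56,
              "Nursing": 104.66, "NTA": 78.96, "NCM": 99.91},
}


def pt_ot_adjustment(day):
    """Table A10: 1.00 for days 1-20, then 0.02 lower per 7-day block, 0.76 from day 98."""
    if not 1 <= day <= 150:
        raise ValueError("day in stay must be between 1 and 150")
    if day <= 20:
        return 1.00
    if day >= 98:
        return 0.76
    return round(0.98 - 0.02 * ((day - 21) // 7), 2)


def nta_adjustment(day):
    """Table A11: 3.00 for days 1-3, 1.00 afterwards."""
    if not 1 <= day <= 150:
        raise ValueError("day in stay must be between 1 and 150")
    return 3.00 if day <= 3 else 1.00


def per_diem(cmis, day, setting="urban"):
    """Daily rate as the sum of adjusted components (assumed PDPM structure)."""
    base = BASE_RATES[setting]
    pt_ot = pt_ot_adjustment(day)
    return (
        base["PT"] * cmis["PT"] * pt_ot
        + base["OT"] * cmis["OT"] * pt_ot
        + base["SLP"] * cmis["SLP"]
        + base["Nursing"] * cmis["Nursing"]
        + base["NTA"] * cmis["NTA"] * nta_adjustment(day)
        + base["NCM"]
    )


# Hypothetical patient: PT/OT group TA, SLP group SA, nursing CMG ES3, NTA group NF.
example_cmis = {"PT": 1.53, "OT": 1.49, "SLP": 0.68, "Nursing": 4.04, "NTA": 0.72}
day_1_rate = per_diem(example_cmis, day=1)
day_25_rate = per_diem(example_cmis, day=25)
```

Under these assumptions the rate falls over the stay: the NTA component drops to a third of its initial value after day 3, and the PT and OT components taper from day 21 onward.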
PDPM HIPPS code classification build sample.
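The code-building step captioned above can be sketched with the mappings in Tables A1 and A3. This sketch assumes the usual five-character HIPPS layout: one character each for the PT/OT, SLP, nursing, and NTA groups (the first four columns of Table A1), followed by the assessment-type code from Table A3. The function name `hipps_code` is hypothetical and is not part of SNFsim.

```python
from string import ascii_uppercase

# Nursing payment groups in the row order of Table A1; the position in the
# list determines the code letter (ES3 -> A, ..., PA1 -> Y).
NURSING_GROUPS = [
    "ES3", "ES2", "ES1", "HDE2", "HDE1", "HBC2", "CBC2", "CA2", "CBC1",
    "CA1", "BAB2", "BAB1", "HBC1", "LDE2", "LDE1", "LBC2", "LBC1", "CDE2",
    "CDE1", "PDE2", "PDE1", "PBC2", "PA2", "PBC1", "PA1",
]
NURSING_CODE = dict(zip(NURSING_GROUPS, ascii_uppercase))

# Assessment type codes from Table A3.
ASSESSMENT_CODE = {"Initial Patient Assessment": "0", "PPS 5-Day Assessment": "1"}


def _letter(group, prefix, last):
    """Two-letter groups in Table A1 (e.g. TA-TP) map to their second letter."""
    if len(group) != 2 or group[0] != prefix or not "A" <= group[1] <= last:
        raise ValueError(f"unknown payment group: {group!r}")
    return group[1]


def hipps_code(pt_ot, slp, nursing, nta, assessment):
    """Concatenate the five code values into a HIPPS code string."""
    return (
        _letter(pt_ot, "T", "P")    # PT/OT groups TA-TP
        + _letter(slp, "S", "L")    # SLP groups SA-SL
        + NURSING_CODE[nursing]     # nursing groups ES3-PA1
        + _letter(nta, "N", "F")    # NTA groups NA-NF
        + ASSESSMENT_CODE[assessment]
    )


code = hipps_code("TK", "SB", "CBC2", "NE", "Initial Patient Assessment")
```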
Table A12
Estimated Hourly Wages for Full-time, PRN, and Agency CNAs by State, based on Nursa (2024).

| State | Full-time CNA ($/hr) | PRN CNA ($/hr) | Agency CNA ($/hr) |
|---|---|---|---|
| Alabama | 15.04 | 18.05 | 20.30 |
| Alaska | 22.63 | 27.16 | 30.55 |
| Arizona | 19.69 | 23.63 | 26.58 |
| Arkansas | 15.41 | 18.49 | 20.80 |
| California | 22.63 | 27.16 | 30.55 |
| Colorado | 20.95 | 25.14 | 28.28 |
| Connecticut | 20.70 | 24.84 | 27.95 |
| Delaware | 18.57 | 22.28 | 25.07 |
| Florida | 17.67 | 21.20 | 23.85 |
| Georgia | 16.77 | 20.12 | 22.64 |
| Hawaii | 21.63 | 25.96 | 29.20 |
| Idaho | 17.92 | 21.50 | 24.19 |
| Illinois | 19.87 | 23.84 | 26.82 |
| Indiana | 18.10 | 21.72 | 24.44 |
| Iowa | 18.45 | 22.14 | 24.91 |
| Kansas | 17.32 | 20.78 | 23.38 |
| Kentucky | 17.30 | 20.76 | 23.36 |
| Louisiana | 14.63 | 17.56 | 19.75 |
| Maine | 20.65 | 24.78 | 27.88 |
| Maryland | 19.60 | 23.52 | 26.46 |
| Massachusetts | 21.22 | 25.46 | 28.65 |
| Michigan | 18.71 | 22.45 | 25.26 |
| Minnesota | 19.40 | 23.28 | 26.19 |
| Mississippi | 12.35 | 14.82 | 16.67 |
| Missouri | 15.07 | 18.08 | 20.34 |
| Montana | 16.83 | 20.20 | 22.72 |
| Nebraska | 17.00 | 20.40 | 22.95 |
| Nevada | 19.89 | 23.87 | 26.85 |
| New Hampshire | 20.36 | 24.43 | 27.49 |
| New Jersey | 19.02 | 22.82 | 25.68 |
| New Mexico | 16.20 | 19.44 | 21.87 |
| New York | 19.56 | 23.47 | 26.41 |
| North Carolina | 16.24 | 19.49 | 21.92 |
| North Dakota | 18.33 | 22.00 | 24.75 |
| Ohio | 16.05 | 19.26 | 21.67 |
| Oklahoma | 13.39 | 16.07 | 18.08 |
| Oregon | 18.67 | 22.40 | 25.20 |
| Pennsylvania | 17.23 | 20.68 | 23.26 |
| Rhode Island | 19.39 | 23.27 | 26.18 |
| South Carolina | 15.47 | 18.56 | 20.88 |
| South Dakota | 14.90 | 17.88 | 20.12 |
| Tennessee | 15.56 | 18.67 | 21.01 |
| Texas | 16.94 | 20.33 | 22.87 |
| Utah | 16.12 | 19.34 | 21.76 |
| Vermont | 18.60 | 22.32 | 25.11 |
| Virginia | 17.10 | 20.52 | 23.09 |
| Washington | 20.80 | 24.96 | 28.08 |
| West Virginia | 14.22 | 17.06 | 19.20 |
| Wisconsin | 17.58 | 21.10 | 23.73 |
| Wyoming | 17.38 | 20.86 | 23.46 |

Table A13
ICD-10-CM Chapters and corresponding categories.

| Chapter | Title | Categories |
|---|---|---|
| 1 | Certain Infectious and Parasitic Diseases | A00-B99 |
| 2 | Neoplasms | C00-D49 |
| 3 | Diseases of the Blood and Blood-Forming Organs | D50-D89 |
| 4 | Endocrine, Nutritional, and Metabolic Diseases | E00-E89 |
| 5 | Mental, Behavioral, and Neurodevelopmental Disorders | F01-F99 |
| 6 | Diseases of the Nervous System | G00-G99 |
| 7 | Diseases of the Eye and Adnexa | H00-H59 |
| 8 | Diseases of the Ear and Mastoid Process | H60-H95 |
| 9 | Diseases of the Circulatory System | I00-I99 |
| 10 | Diseases of the Respiratory System | J00-J99 |
| 11 | Diseases of the Digestive System | K00-K95 |
| 12 | Diseases of the Skin and Subcutaneous Tissue | L00-L99 |
| 13 | Diseases of the Musculoskeletal System and Connective Tissue | M00-M99 |
| 14 | Diseases of the Genitourinary System | N00-N99 |
| 15 | Pregnancy, Childbirth, and the Puerperium | O00-O9A |
| 16 | Certain Conditions Originating in the Perinatal Period | P00-P96 |
| 17 | Congenital Malformations, Deformations, and Chromosomal Abnormalities | Q00-Q99 |
| 18 | Symptoms, Signs, and Abnormal Clinical and Laboratory Findings | R00-R99 |
| 19 | Injury, Poisoning, and Certain Other Consequences of External Causes | S00-T88 |
| 20 | External Causes of Morbidity | V00-Y99 |
| 21 | Factors Influencing Health Status and Contact with Health Services | Z00-Z99 |
| 22 | Codes for Special Purposes | U00-U85, U89 |

Example of the ICD-10-CM hierarchy for a specific cholera diagnosis.

This hierarchy tree demonstrates that a cholera diagnosis falls under chapter 1 (certain infectious and parasitic diseases), section A00-A09 (intestinal infectious diseases), category A00 (cholera), and could ultimately be one of codes A00.0, A00.1, or A00.9 (biovar cholerae, biovar eltor, or unspecified). Beginning from the top layer and descending, there is an increase in group specificity.
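The top and bottom layers of this hierarchy can be recovered from a code string with the ranges in Table A13, as in the following minimal sketch. The function name `icd10_levels` is hypothetical, and the section layer (such as A00-A09) is omitted because Table A13 does not list section ranges. The lookup relies on three-character categories sorting correctly as plain strings, which holds for the ranges in the table.

```python
# (chapter, first category, last category), copied from Table A13.
CHAPTER_RANGES = [
    (1, "A00", "B99"), (2, "C00", "D49"), (3, "D50", "D89"),
    (4, "E00", "E89"), (5, "F01", "F99"), (6, "G00", "G99"),
    (7, "H00", "H59"), (8, "H60", "H95"), (9, "I00", "I99"),
    (10, "J00", "J99"), (11, "K00", "K95"), (12, "L00", "L99"),
    (13, "M00", "M99"), (14, "N00", "N99"), (15, "O00", "O9A"),
    (16, "P00", "P96"), (17, "Q00", "Q99"), (18, "R00", "R99"),
    (19, "S00", "T88"), (20, "V00", "Y99"), (21, "Z00", "Z99"),
    (22, "U00", "U85"), (22, "U89", "U89"),
]


def icd10_levels(code):
    """Return the chapter, three-character category, and full code."""
    full = code.strip().upper()
    category = full.split(".")[0][:3]
    for chapter, first, last in CHAPTER_RANGES:
        # String comparison works because each category is one letter
        # followed by two characters, and digits sort before letters.
        if first <= category <= last:
            return {"chapter": chapter, "category": category, "code": full}
    raise ValueError(f"no chapter found for code {code!r}")


cholera = icd10_levels("A00.1")
```

Grouping diagnoses by the returned chapter or category is one way to trade specificity for fewer distinct values when diagnoses are used as features.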

Flow of environment for the snf_v0 Gymnasium environment.
Table A14
PPO Hyperparameters

| Hyperparameter | Value |
|---|---|
| Learning rate (α) | 3 × 10⁻⁴ |
| Batch size | 64 |
| Steps per update | 2048 |
| Epochs per batch | 10 |
| Clipping parameter (ε) | 0.2 |
| Max gradient norm | 0.5 |
| Discount factor (γ) | 0.99 |
| GAE λ | 0.95 |
| Entropy coefficient | 0.01 |
| Total timesteps | 200,000 |
| Policy architecture | MultiInputPolicy |
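
For orientation, the sketch below works out what the settings in Table A14 imply for the size of a training run. The dictionary keys follow the keyword names of the PPO constructor in Stable-Baselines3. That naming is an assumption suggested by the MultiInputPolicy entry, not something the table states, and a single environment instance is also assumed. The helper `training_budget` is hypothetical.

```python
import math

# Values copied from Table A14; key names assume a Stable-Baselines3-style PPO.
PPO_KWARGS = {
    "policy": "MultiInputPolicy",
    "learning_rate": 3e-4,
    "batch_size": 64,
    "n_steps": 2048,       # steps per update
    "n_epochs": 10,        # epochs per batch
    "clip_range": 0.2,
    "max_grad_norm": 0.5,
    "gamma": 0.99,
    "gae_lambda": 0.95,
    "ent_coef": 0.01,
}
TOTAL_TIMESTEPS = 200_000


def training_budget(kwargs, total_timesteps, n_envs=1):
    """Derive rollout and update counts from the hyperparameters."""
    rollout_size = kwargs["n_steps"] * n_envs
    minibatches = rollout_size // kwargs["batch_size"]
    return {
        "rollout_size": rollout_size,
        # Rounded up on the assumption that training proceeds in whole rollouts.
        "policy_updates": math.ceil(total_timesteps / rollout_size),
        "minibatches_per_epoch": minibatches,
        "gradient_steps_per_update": minibatches * kwargs["n_epochs"],
        # 1 / (1 - gamma): rough number of steps over which rewards still matter.
        "effective_horizon": 1.0 / (1.0 - kwargs["gamma"]),
    }


budget = training_budget(PPO_KWARGS, TOTAL_TIMESTEPS)
```

With a discount factor of 0.99 the effective horizon is about 100 decision steps, which is one way to relate a discounted objective to the infinite-horizon setting discussed in Section 5.1.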

References

  1. 1
    Skilled nursing facilities patient-driven payment model technical report
    1. Acumen
    (2018)
    http://www.cms.gov/Medicare/Medicare-Fee-for-Service-Payment/SNFPPS/Downloads/PDPM_Technical_Report_508.pdf, Accessed, 29 August 2024.
  2. 2
    Computation, modeling, and simulation of HIV-AIDS epidemics with vaccination
    1. RR Ahangar
    (2022)
    Journal of Applied Mathematics and Physics 10:1066–1082.
    https://doi.org/10.4236/jamp.2022.104073
  3. 3
  4. 4
    An integrative model-based approach to hospital layout
    1. TW Butler
    2. KR Karwan
    3. JR Sweigart
    4. GR Reeves
    (1992)
    IIE Transactions 24:144–152.
    https://doi.org/10.1080/07408179208964211
  5. 5
    Patient driven payment model
    1. Centers for Medicare and Medicaid Services
    https://www.cms.gov/medicare/medicare-fee-for-service-payment/snfpps/pdpm.ca, Accessed, 10 June 2023.
  6. 6
  7. 7
    Applications of computer simulation to health care
    1. W England
    2. SD Roberts
    (1978)
    Technical report, IEEE.
  8. 8
  9. 9
    Recent advances in reinforcement learning in finance
    1. B Hambly
    2. R Xu
    3. H Yang
    (2023)
    Mathematical Finance 33:437–503.
    https://doi.org/10.1111/mafi.12382
  10. 10
    Simulation-based optimization of radiotherapy: Agent-based modeling and reinforcement learning
    1. A Jalalimanesh
    2. H Shahabi Haghighi
    3. A Ahmadi
    4. M Soltani
    (2017)
    Mathematics and Computers in Simulation 133:235–248.
    https://doi.org/10.1016/j.matcom.2016.05.008
  11. 11
  12. 12
  13. 13
    A markovian model for hospital admission scheduling
    1. P Kolesar
    (1970)
    Management Science 16:B384.
    https://doi.org/10.1287/mnsc.16.6.B384
  14. 14
    Skilled nursing facilities: Too many beds
    1. R Laes-Kushner
    (2018)
    https://repository.escholarship.umassmed.edu/handle/20.500.14038/26962, Accessed, 25 September 2024.
  15. 15
  16. 16
    The UVA/PADOVA Type 1 diabetes simulator: new features
    1. CD Man
    2. F Micheletto
    3. D Lv
    4. M Breton
    5. B Kovatchev
    6. C Cobelli
    (2014)
    Journal of Diabetes Science and Technology 8:26–34.
    https://doi.org/10.1177/1932296813514502
  17. 17
  18. 18
  19. 19
    Hospital readmission from skilled nursing facilities (SNFs): perspectives of hospital and SNF providers
    1. KE Minges
    2. M Campbell Britton
    3. BW Clark
    4. GM Ouellet
    5. B Hodshon
    6. SI Chaudhry
    (2019)
    Journal of the American Medical Directors Association 20:1050–1051.
    https://doi.org/10.1016/j.jamda.2019.03.005
  20. 20
  21. 21
    The revolving door of rehospitalization from skilled nursing facilities
    1. V Mor
    2. O Intrator
    3. Z Feng
    4. DC Grabowski
    (2010)
    Health Affairs 29:57–64.
    https://doi.org/10.1377/hlthaff.2009.0629
  22. 22
    Comparison of reinforcement learning algorithms applied to the cart-pole problem
    1. S Nagendra
    2. N Podila
    3. R Ugarakhod
    4. K George
    (2017)
    2017 International Conference on Advances in Computing, Communications and Informatics (ICACCI).
    https://doi.org/10.1109/ICACCI.2017.8125811
  23. 23
    CNA salary data by state
    1. Nursa
    (2024)
    https://nursa.com/salary, Accessed, 1 March 2024.
  24. 24
    Efficiency evaluation of skilled nursing facilities
    1. YA Ozcan
    2. SE Wogen
    3. LW Mau
    (1998)
    Journal of Medical Systems 22:211–224.
    https://doi.org/10.1023/a:1022657600192
  25. 25
    The Synthetic Data Vault
    1. N Patki
    2. R Wedge
    3. K Veeramachaneni
    (2016)
    2016 IEEE International Conference on Data Science and Advanced Analytics.
    https://doi.org/10.1109/DSAA.2016.49
  26. 26
  27. 27
  28. 28
    Tools for thinking: Modelling in management science
    1. M Pidd
    (1997)
    Journal of the Operational Research Society 48.
    https://doi.org/10.2307/3010517
  29. 29
    Is a skilled nursing facility’s rehospitalization rate a valid quality measure?
    1. M Rahman
    2. DC Grabowski
    3. V Mor
    4. EC Norton
    (2016)
    Health Services Research 51:2158–2175.
    https://doi.org/10.1111/1475-6773.12603
  30. 30
    A systems analysis of a university-health-service outpatient clinic
    1. EJ Rising
    2. R Baron
    3. B Averill
    (1973)
    Operations Research 21:1030–1047.
    https://doi.org/10.1287/opre.21.5.1030
  31. 31
    Computer simulation of hospital patient scheduling systems
    1. GH Robinson
    2. P Wing
    3. LE Davis
    (1968)
    Health Services Research 3:130–141.
  32. 32
    Microsimulation model calibration using incremental mixture approximate bayesian computation
    1. CM Rutter
    2. J Ozik
    3. M DeYoreo
    4. N Collier
    (2019)
    The Annals of Applied Statistics 13:2189–2212.
    https://doi.org/10.1214/19-aoas1279
  33. 33
    Proximal policy optimization algorithms
    1. J Schulman
    2. F Wolski
    3. P Dhariwal
    4. A Radford
    5. O Klimov
    (2017)
    arXiv.
  34. 34
    Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm
    1. D Silver
    2. T Hubert
    3. J Schrittwieser
    4. I Antonoglou
    (2017)
    arXiv.
  35. 35
    Fonctions de répartition à n dimensions et leurs marges
    1. M Sklar
    (1959)
    Annales de l'ISUP 8:229–231.
  36. 36
    What is social science microsimulation?
    1. M Spielauer
    (2011)
    Social Science Computer Review 29:9–20.
    https://doi.org/10.1177/0894439310370085
  37. 37
    Autonomous vehicle navigation using evolutionary reinforcement learning
    1. A Stafylopatis
    2. K Blekas
    (1998)
    European Journal of Operational Research 108:306–318.
    https://doi.org/10.1016/S0377-2217(97)00372-X
  38. 38
  39. 39
  40. 40
    Agent-based modeling in public health: Current applications and future directions
    1. M Tracy
    2. M Cerdá
    3. KM Keyes
    (2018)
    Annual Review of Public Health 39:77–94.
    https://doi.org/10.1146/annurev-publhealth-040617-014317
  41. 41
  42. 42
    Skilled nursing facility (SNF) billing guidelines 2024
    1. K Williamson
    (2024)
    https://tranquilmedsolutions.com/skilled-nursing-facility-billing-services, Accessed, 7 July 2026.

Article and author information

Author details

  1. Caroline Strickland

    Department of Computer Science, London, Canada
    For correspondence
    cstrick4@uwo.ca
    ORCID: 0000-0003-2458-3848
  2. Daniel J Lizotte

    1. Department of Computer Science, London, Canada
    2. Department of Epidemiology and Biostatistics, London, Canada
    ORCID: 0000-0002-9258-8619

Funding

This work was supported in part by funding from the Natural Sciences and Engineering Research Council of Canada and from PointClickCare (NSERC Alliance grant ALLRP 566302-21).

Publication history

  1. Version of Record published: June 3, 2026 (version 1)

Copyright

© 2026, Strickland, Wagner, Wang, Lizotte

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.
