Data Documentation

LTCFoCUS

Facility Level Data Dictionary

County Level Data Dictionary

State Level Data Dictionary

MAFoCUS

Plan Level Data Dictionary

Contract Level Data Dictionary

HHFoCUS

Dedicated performance scoring for Home Health agencies within the Medicaid ecosystem.

FREQUENTLY ASKED QUESTIONS

Where do the data come from?
- The variables in LTCFocus were developed from several sources. Descriptions of each follow. Each variable label in the Data Dictionary includes the source of the data.
  - Certification And Survey Provider Enhanced Reporting (CASPER) - formerly known as Online Survey Certification And Reporting (OSCAR) - data are administrative data collected by state survey agencies during nursing facility annual certification inspections. Inspection surveys generally occur at least once every 15 months and all data gathered during inspections are compiled in the CASPER database. Because not every facility is surveyed every 12 months, some facilities have no survey in a given calendar year. Therefore, yearly estimates are derived from the closest survey within 6 months of the year of interest (before or after). Note that for some facilities, information for adjacent years is taken from a single survey. Facilities without a survey in the given timeframe have missing information for the year.
  - Minimum Data Set (MDS) data are resident-level data on residents’ clinical and functional status. The MDS is collected for every nursing home resident upon admission and at least quarterly thereafter. It is also collected whenever there is a change in a resident’s overall status. Data include the residents’ diagnoses, treatments, medications, activities of daily living (ADL), and mood/behaviors. CMS collects all nursing home MDS assessments from the states into a national repository.
  - The Residential History File (RHF) is a data resource developed at the Brown University Center for Gerontology and Healthcare Research. It is built using Medicare enrollment data, Medicare claims, and MDS data. It can be used to track individuals as they move through the long-term care system, including between different care settings and different care types. The goal of the RHF is to create a per-person chronological history of health service utilization and location of care throughout a calendar year.

How are the data aggregated?
- The LTCFocus data are distributed as three separate files, one for each level of aggregation: nursing facility, county, and state. Each file contains one observation per unit (facility, county, or state) per year.

What is the timeframe of the aggregation?
- We created two forms of aggregates: incidence measures, which are based on all admissions in the facility, county, or state during each calendar year; and prevalence measures, which are based on all residents in the facility, county, or state on the first Thursday in April.

Why the first Thursday in April?
- Research has shown that the nursing home population fluctuates both by season during the year and by day of the week. The nursing home population is highest during the winter months and lowest during the summer months. In addition, we have found that nursing home admissions and discharges fluctuate during each week, with the greatest number of admissions occurring on Mondays and the greatest number of discharges occurring on Fridays. We sought to avoid these issues by calculating all MDS prevalence measures based on the nursing home population on the first Thursday in April each year.
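For readers who want to line other data up with the same reference date, the first Thursday in April can be computed directly. This is a minimal sketch using only the Python standard library; the function name `first_thursday_in_april` is illustrative and is not part of LTCFocus.

```python
from datetime import date, timedelta


def first_thursday_in_april(year: int) -> date:
    """Return the date of the first Thursday in April of the given year."""
    april_first = date(year, 4, 1)
    # date.weekday() numbers Monday as 0, so Thursday is 3.
    days_until_thursday = (3 - april_first.weekday()) % 7
    return april_first + timedelta(days=days_until_thursday)


# Example: list the reference date for a few years.
for year in (2020, 2021, 2022):
    print(year, first_thursday_in_april(year).isoformat())
```

The modulo handles the case where April 1 is itself a Thursday, in which case the offset is zero.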

Is every nursing home included?
- Each file contains information derived from all Medicare- and Medicaid-certified facilities in the 48 contiguous states and Hawaii. Alaska and Washington, DC are excluded due to sample size limitations. County-level data are included only for counties with at least one nursing home.

Where can I see a full list of variables?
- The Data Dictionaries for all three file types are available through the same link that is emailed to you after you submit a data download request on the Data webpage.

How are the variables named?
- With few exceptions, variable names in the LTCFocus datafiles follow a consistent pattern across levels of aggregation. Certain variables are then given a suffix depending on the level of data. The suffix _cty indicates a county-level variable, while the suffix _sta indicates a state-level variable. Variables with neither of these two suffixes are measured at the level of the nursing facility.
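The suffix convention can be applied programmatically when sorting through a list of variable names. The sketch below assumes the suffix is always the final part of the name; the function name and the example variable names are hypothetical, and the "few exceptions" mentioned above would need to be handled separately using the Data Dictionary.

```python
def aggregation_level(variable_name: str) -> str:
    """Infer the level of aggregation from a variable name's suffix."""
    name = variable_name.lower()
    if name.endswith("_cty"):
        return "county"
    if name.endswith("_sta"):
        return "state"
    # No level suffix: measured at the nursing facility level.
    return "facility"


# Hypothetical variable names, used only to illustrate the convention.
for name in ("example_measure", "example_measure_cty", "example_measure_sta"):
    print(name, "->", aggregation_level(name))
```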

What does _mds3 mean in the variable names?
- The MDS transitioned from version 2.0 to version 3.0 in October 2010. The two versions are not compatible across many items. In addition, data quality dropped for the last quarter of data collected under version 2.0, spanning July to October 2010. Therefore, MDS measures provided for facilities in 2010 reflect data collected in the first half of the year extrapolated to the entire year. After 2010, variables that were measured differently in MDS 3.0 compared to MDS 2.0 were assigned new variable names with the suffix _mds3. Those variables that were measured identically in both MDS versions retained the same variable name.

Why are some values missing in certain years?
- Variables come from many sources and may contain missing values for a number of reasons. In some instances, measures that were once routinely collected by surveyors may no longer be valid. For example, CMS modified the MDS assessment in October 2019 when it replaced Resource Utilization Groups (RUG-IV), its nursing home case-mix reimbursement model, with the Patient Driven Payment Model (PDPM).

What does LNE mean in the dataset?
- LNE means “low number of events”. Because much of the data are aggregated from CMS data, which are covered under the strict terms of a data use agreement, the most common reason that a facility is missing information on an item is adherence to CMS’s cell suppression policy. This policy stipulates that no cell of 10 or less may be displayed. In addition, percentages or other mathematical formulas may not be used if they result in the display of a cell of 10 or less. Therefore, in these data, if either the numerator or the denominator was less than or equal to 10 after aggregating, the result was set to LNE. Because this is done after aggregating, facilities whose values were set to LNE in the facility-level file can still contribute to the measure at the county and state levels.
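The suppression rule, and the reason a suppressed facility can still count toward a county or state value, can be illustrated with a small sketch. This is not the code used to build LTCFocus; the function name, the constant names, and the counts are all made up for illustration, and only the threshold of 10 comes from the description above.

```python
from typing import Union

LNE = "LNE"
SUPPRESSION_THRESHOLD = 10  # cells of 10 or less may not be displayed


def suppressed_percent(numerator: int, denominator: int) -> Union[float, str]:
    """Return a percentage, or LNE if either count is 10 or less."""
    if numerator <= SUPPRESSION_THRESHOLD or denominator <= SUPPRESSION_THRESHOLD:
        return LNE
    return 100.0 * numerator / denominator


# Hypothetical (numerator, denominator) counts for three facilities in one county.
facility_counts = [(4, 30), (9, 50), (20, 80)]

# Facility level: the rule is applied to each facility's own counts.
facility_values = [suppressed_percent(n, d) for n, d in facility_counts]

# County level: counts are summed first, then the rule is applied once.
county_numerator = sum(n for n, _ in facility_counts)
county_denominator = sum(d for _, d in facility_counts)
county_value = suppressed_percent(county_numerator, county_denominator)

print(facility_values)  # the first two facilities are suppressed
print(county_value)     # all three facilities contribute to the county value
```

When reading the files, LNE cells should be treated as suppressed values rather than as zeros.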

How can I combine LTCFocus with other data?
- Researchers will need to rename the LTCFocus linking variables (e.g., ZIP code, county, state, year) to match the names used in the files they wish to link. Note that county codes are unique only within a state, so county-level files must be merged by state and county in addition to year. To merge with the facility-level file, link on the provider (i.e., facility) number (prov1680) in the LTCFocus facility file.
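The following sketch shows the rename-then-merge pattern for a county-level link, using plain Python dictionaries so that it runs without extra libraries. Every column name here (`state`, `county`, `year`, `st`, `cnty`, `yr`, and the two measure columns) is an assumption made for illustration; check the Data Dictionary and your own file for the actual names.

```python
def rename_columns(rows, mapping):
    """Return copies of the rows with columns renamed according to mapping."""
    return [{mapping.get(col, col): value for col, value in row.items()} for row in rows]


def left_join(left_rows, right_rows, keys):
    """Left-join two lists of dicts on the given key columns."""
    lookup = {tuple(row[k] for k in keys): row for row in right_rows}
    joined = []
    for row in left_rows:
        match = lookup.get(tuple(row[k] for k in keys), {})
        # Values from the left row win if a column appears in both files.
        joined.append({**match, **row})
    return joined


# Hypothetical county-level LTCFocus rows; the same county code appears in two states.
ltcfocus_rows = [
    {"state": "RI", "county": "007", "year": 2019, "example_measure_cty": 41.5},
    {"state": "MA", "county": "007", "year": 2019, "example_measure_cty": 38.2},
]

# Hypothetical external file that names its linking variables differently.
external_rows = [
    {"st": "MA", "cnty": "007", "yr": 2019, "example_external_value": 200},
    {"st": "RI", "cnty": "007", "yr": 2019, "example_external_value": 100},
]

# Rename the LTCFocus linking variables to match the external file, then merge
# on state AND county AND year, because county codes repeat across states.
ltcfocus_renamed = rename_columns(ltcfocus_rows, {"state": "st", "county": "cnty", "year": "yr"})
merged = left_join(ltcfocus_renamed, external_rows, keys=("st", "cnty", "yr"))
```

Merging on county and year alone would match both rows above to the same external record, which is why the state must be part of the key.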

What is an accpt_id?
- CMS assigns every building (i.e., SNF) a new provider number whenever the building changes ownership. However, researchers may be interested in building trends over time, regardless of ownership. The accpt_id variable is equal to the first provider number assigned to a building and is stable (i.e., unchanging) across years. Because the accpt_id may differ from a building's current provider number, it should not be used to combine LTCFocus with other data.
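As a sketch of how accpt_id supports within-building trends, the example below groups facility-year rows by accpt_id and lists the provider numbers seen over time. It assumes each facility-year row carries both accpt_id and prov1680; the identifier values and the function name are made up for illustration.

```python
from collections import defaultdict


def provider_history(rows):
    """Map each accpt_id to its (year, prov1680) pairs in chronological order."""
    history = defaultdict(list)
    for row in sorted(rows, key=lambda r: r["year"]):
        history[row["accpt_id"]].append((row["year"], row["prov1680"]))
    return dict(history)


# Hypothetical rows: one building whose provider number changes in 2016.
facility_rows = [
    {"year": 2016, "prov1680": "000002", "accpt_id": "000001"},
    {"year": 2014, "prov1680": "000001", "accpt_id": "000001"},
    {"year": 2015, "prov1680": "000001", "accpt_id": "000001"},
]

history = provider_history(facility_rows)
print(history["000001"])
```

Grouping on accpt_id keeps the building's years together across the ownership change, while links to outside data still go through prov1680.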