TWANG for SAS Macros (twang_mac.sas)
Setup Guide
TWANG is a package of functions in the R Environment for
Statistical Computing and Graphics. The package contains functions
for estimating propensity scores and associated weights using
Generalized Boosted Models and functions for assessing the
covariate balance provided by the resulting weights.
The TWANG for SAS macros implement the functions in the R
package by creating an R script file, running it in R batch mode,
and then porting the results back to SAS.
To use the macros, a user must have the macro file (twang_mac.sas)
and both SAS and R installed on their computer. Users will
need to work with their tech support to install SAS.
R is standalone freeware that users can download and install from
The Comprehensive R Archive Network (http://cran.us.r-project.org/).
The software can be installed by clicking on the link for the user's
computer platform (e.g., Windows users would click on "Download R
for Windows" and then click on the "base" link to download the
standard R software).
Users will need to note the directory where the R software is
installed and the name of the executable file. For Windows users,
the directory for the standard installation is
C:\Program Files\R\R-3.0.4\bin\x64
where 3.0.4 is replaced by the current version of R at the time of
installation. The executable is R.exe for batch implementation.
Users should use R or R.exe for the executable, not Rcmd or Rscript.
The macros only support R or R.exe.
To use the TWANG for SAS macros, the user will need to import the
macros into the SAS session through a %include statement with the
file name of the SAS macro file. For example,
%include "C:\Users\uname\SASFile\twang_mac.sas";
would be the correct syntax for user "uname" who has stored the
macro file in "SASFile\twang_mac.sas".
After the macro source code is included, the TWANG procedures can
be implemented using the specific macro calls following the help
files for the specific macros.
TWANG for SAS Macros
ps: estimates propensity scores
plot: generates default diagnostic plots
dxwts: evaluates quality of resulting weights
mnps: estimates propensity scores for 3+ treatments
mnplot: generates diagnostic plots for 3+ treatments
mnbaltable: controls tabular balance checking for 3+ treatments
CBPS: estimates propensity scores using the Covariate Balancing
Propensity Score (CBPS) method
update_twang: updates the twang package
The macro file also contains several utility macros that are used
by the other macros.
Note on file creation. The macros will create a directory for
storing the files necessary for the twang package to load in R. The
directory is in the user's root directory in AppData\Local\twang.
The update_twang macro will automatically install and update the
package in this directory. Users should allow the macros to perform
their default updates. If an update does not occur, the update_twang
macro can be called directly. Typically, users will not need to
update manually; updating is handled automatically.
Help files for Macros
Macro: ps
Propensity Score Estimation
Description:
'ps' estimates propensity scores using a Generalized Boosted
Model and evaluates their quality using covariate balance.
It implements the ps function in the twang package of R.
Usage:
%ps(treatvar=,
vars=,
class=,
dataset=,
ntrees=10000,
intdepth=3,
shrinkage=0.01,
permtestiters=0,
stopmethod=ks.mean es.mean,
sampw=,
estimand=ATE,
output_dataset=_inputds,
Rcmd=,
plotname=,
objpath=)
Arguments:
treatvar: The name of the treatment indicator variable in the
data set named by dataset.
vars: List of the names of the covariates to be tested for
balance. The variables' names should not be in quotes
and should be separated by spaces, e.g., VAR1 VAR2 ....
class: List of the names of categorical variables among the
covariates. The variables' names should not be in quotes
and should be separated by spaces, e.g., VAR1 VAR2 ....
dataset: The dataset, which must include the treatment assignment
variable and the covariates named in vars.
ntrees: number of gbm iterations passed on to 'gbm'.
intdepth: 'interaction.depth' passed on to 'gbm'. Default equals 3.
shrinkage: 'shrinkage' passed on to 'gbm'. Default equals 0.01.
permtestiters: a non-negative integer giving the number of
iterations of the permutation test for the KS statistic.
If 'permtestiters=0' then the function returns an
analytic approximation to the p-value. Setting
'permtestiters=200' will yield precision to within 3% if
the true p-value is 0.05. Use 'permtestiters=500' to be
within 2%. Default = 0.
stopmethod: A method or set of methods for measuring and
summarizing balance across covariates. Current options are
'ks.mean', 'ks.max', 'es.mean', and 'es.max'. 'ks' refers
to the Kolmogorov-Smirnov statistic and 'es' refers to
standardized effect size (also called standardized bias or
standardized differences). These are summarized across
covariates by either the maximum ('.max') or the mean
('.mean'). Multiple stopping rules can be requested. List
the option name for all methods of interest without quotation
marks and separated only by a space.
sampw: Variable name for optional sampling weights.
estimand: The causal effect of interest. Options are ATE (average
treatment effect), which attempts to estimate the change in
the outcome if the treatment were applied to the entire
population versus if the control were applied to the entire
population, or ATT (average treatment effect on the
treated), which attempts to estimate the analogous effect,
averaging only over the treated population. Default
equals ATE.
output_dataset: The name of the dataset for the resulting weight
estimates. The default is a temporary SAS data set
called _inputds.
Rcmd: The file name for the R executable.
plotname: The file name for an optional output file of the default
diagnostic plots.
objpath: The folder name for an optional permanent file of the
R object with the resulting GBM model fit, a log of
the R session, and files created by the macro to run
the estimation.
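The precision figures quoted for permtestiters can be reproduced with
a short calculation. The sketch below assumes that "precision" refers
to roughly two binomial standard errors of a simulated p-value,
2*sqrt(p*(1-p)/iterations); that reading is an assumption and is not
stated in the twang documentation. The variable names are
illustrative only.

```sas
/* Sketch: approximate Monte Carlo precision of a permutation p-value. */
/* Assumes precision = 2 * binomial standard error (an assumption).    */
data _null_;
    p = 0.05;                       /* true p-value used in the text   */
    do iters = 200, 500;            /* candidate permtestiters values  */
        half_width = 2 * sqrt(p * (1 - p) / iters);
        put iters= half_width= 6.4; /* written to the SAS log          */
    end;
run;
```

Under this assumption the half-widths are about 0.03 for 200
iterations and about 0.02 for 500 iterations, in line with the 3% and
2% figures above.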
Details:
ps has no special requirements for the naming of the treatment
indicator or covariates and places no limits on the number of
covariates. However, categorical variables need to be
specified in the "class" parameter. Categorical variables
should be listed in both the "vars" parameter and the "class"
parameter. R, and in particular the twang package in R, can be
slow and require large amounts of memory to process large datasets.
Users might obtain better computational performance by including
only the variables to be used by ps in the dataset specified by the
dataset parameter.
Unless the user has added R to the PATH environment variable,
the full path must be specified in Rcmd. For example, for the
default setup of R Version 3.0.1 on Windows 7, the specification
is
Rcmd=C:/Program Files/R/R-3.0.1/bin/x64/R.exe,
The value 3.0.1 in the example would be replaced by the current
version number at the time R was installed.
For plotname and objpath the full path can be given. If the full
path is not given for plotname, the plot file is placed in the
folder specified by objpath, if it is specified, or otherwise in
the folder where the SAS code is launched (BATCH SAS) or the
user's home directory (interactive SAS). If the full path is not
given for objpath, the directory it names is assumed to be a
subdirectory of the directory where the code is launched
(BATCH SAS) or the user's home directory (interactive SAS).
If objpath is not specified, then the macro writes the
user-specified input data to a temporary CSV file, "datafile.csv", in
the SAStmp directory. It also writes an R script file "ps.r" to
this folder. It then runs the R script, which produces the temporary
files wgt.csv, baltab.csv, and summary.csv in the SAStmp folder.
These are read into SAS and used to produce the final output. The
macro also creates a temporary file ps.Rout, which is a log of the R
run.
If objpath is specified, all the files are created in the folder
it designates. The files will remain in that folder until the
user deletes them.
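As an illustration of the parameters described above, a call to ps
might look like the following sketch. The dataset work.study, the
variables treat, age, income, and region, the output dataset name,
the plot file name, and the objpath folder are all hypothetical
placeholders; the Rcmd path follows the example above and must be
adapted to the local R installation.

```sas
/* Load the macro definitions (path as in the setup example). */
%include "C:\Users\uname\SASFile\twang_mac.sas";

/* Hypothetical data set and variable names throughout. */
%ps(treatvar=treat,
    vars=age income region,          /* all covariates                */
    class=region,                    /* categorical covariates only   */
    dataset=work.study,
    ntrees=10000,
    intdepth=3,
    shrinkage=0.01,
    permtestiters=0,
    stopmethod=ks.mean es.mean,      /* no quotes, space separated    */
    estimand=ATE,
    output_dataset=work.study_wts,
    Rcmd=C:/Program Files/R/R-3.0.1/bin/x64/R.exe,
    plotname=study_ps_plots.pdf,     /* placed in objpath folder      */
    objpath=C:/Users/uname/twang_out);
```

Because objpath is given here, the files the macro creates would
remain in that folder after the run rather than in the SAStmp
directory.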
Value:
Creates SAS temporary data sets named _weights, _baltab, and _summ.
_summ and _baltab contain diagnostic information on the weights
and are printed to the listing file.
_weights: contains the weights and estimated propensity scores
generated by the ps function in R. There is one weight
variable and one propensity score for each stopping rule
specified by the user. The weight variables are named
according to the stopping rule and estimand so that
if the stopping rule is "es.mean" and the estimand
is "ATE" the variable with these weights is named
"es_mean_ATE". Names for other stopping rules and
either estimand will follow the same convention. The
propensity scores follow the same naming convention
except that "ps_" is prepended to the beginning of
the variable names. The variable with propensity
scores estimated with the "es.mean" stopping rule
and the "ATE" estimand is named "ps_es_mean_ATE".
The file also contains a variable tempID which is used
to merge the file with the input dataset.
_baltab: contains the data on the balance of the covariates
before and after weighting. The data set contains
records for multiple tables created in R. Each table
summarizes the balance on all of the covariates and
missing data indicators for covariates with missing
data. There is one table for the unweighted data and
an additional table for the weights generated by
each stopping rule specified by the user.
In each table, there is one record for each
continuous covariate, one record for each level of a
categorical (class) variable, and one record for each
missing value indicator for covariates with missing
data. Each record contains summary statistics for
the treatment and control groups and tests of
group differences. All statistics and tests are
weighted using the weights corresponding to the
table. Results for the unweighted table are not
weighted.
The variables included are:
row_name: Covariate name with weight name
appended as prefix
tx_mn: Treatment group mean
tx_sd: Treatment group standard deviation
ct_mn: Control group mean
ct_sd: Control group standard deviation
std_eff_sz: Standard difference in group means
stat: t-statistic for testing difference in means
p: p-value for testing difference in means
ks: KS statistic comparing groups
distributions
ks_pval: p-value for testing difference in
distributions
table_name: Weights being evaluated
_summ: contains the data summarizing the weights and
the covariate balance. The data set contains
one record for unweighted data and one for
the weights corresponding to each of the
stopping rules the user specified.
The variables included are:
row_name: Comparison name, unw (unweighted) or
concatenation of the stopping rule and the
estimand
n_treat: Treatment group sample size
n_ctrl: Control group sample size
ess_treat: Treatment group effective sample size
after weighting
ess_ctrl: Control group effective sample size
after weighting
max_es: Maximum of the (absolute) standardized
biases across all the covariates
mean_es: Mean of the (absolute) standardized
biases across all the covariates
max_ks: Maximum of the KS statistics across all
the covariates
max_ks_p: Permutation-based p-value for testing
the group differences using the maximum
KS statistic; only calculated when
permtestiters > 0
mean_ks: Mean of the KS statistics across all
the covariates
iter: The number of iterations for the GBM
chosen by the stopping rule.
Creates the dataset specified by output_dataset with the
estimated weights and propensity scores. The default is a temporary
SAS dataset named _inputds. The dataset contains the user input data
with estimated weights and propensity scores appended. This data set
can be used to estimate treatment effects with the propensity score
based weights. The weight variables and estimated propensity score
variables are appended from _weights and follow the naming
conventions described above.
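As a sketch of how the output dataset might be used, the weights can
be passed to a weighted outcome model. The example below assumes the
default output dataset _inputds, a run with the "es.mean" stopping
rule and the "ATE" estimand (so the weight variable is es_mean_ATE),
a numeric 0/1 treatment indicator named treat, and an outcome
variable named outcome; treat and outcome are hypothetical names.
PROC SURVEYREG is one reasonable choice for weighted estimation, not
a requirement of the macros.

```sas
/* Weighted difference in mean outcome between treatment and control. */
/* treat and outcome are hypothetical variable names.                 */
proc surveyreg data=_inputds;
    model outcome = treat;    /* coefficient on treat estimates the effect */
    weight es_mean_ATE;       /* weights appended by the ps macro          */
run;
```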
Macro: dxwts
Weight Evaluation
Description:
'dxwts' evaluates the balance between two groups achieved by
weighting with one or more sets of weights provided by the user,
for each covariate specified by the user.
Usage:
%dxwts(treatvar=,
vars=,
class=,
dataset=,
weightvars=,
estimand=,
sampw=,
permtestiters=,
Rcmd=,
objpath=);
Arguments:
treatvar: The name of the treatment indicator variable in the
data set named by dataset.
vars: List of the names of the covariates to be tested for
balance. The variables' names should not be in quotes and
should be separated by spaces, e.g., VAR1 VAR2 ....
class: List of the names of categorical variables among the
covariates. The variables' names should not be in quotes
and should be separated by spaces, e.g., VAR1 VAR2 ....
dataset: The name of the SAS data set that contains the
covariates, treatment indicator variable, and weights
to be assessed.
weightvars: The names of the variables in data set named by dataset
with the weights to be assessed.
estimand: The causal effect of interest. Options are ATE (average
treatment effect), which attempts to estimate the change in
the outcome if the treatment were applied to the entire
population versus if the control were applied to the entire
population, or ATT (average treatment effect on the
treated), which attempts to estimate the analogous effect,
averaging only over the treated population. Estimand
should match the estimand used in defining the weights.
sampw: Variable name for optional sampling weights.
permtestiters: a non-negative integer giving the number of
iterations of the permutation test for the KS statistic.
If 'permtestiters=0' then the function returns an
analytic approximation to the p-value. Setting
'permtestiters=200' will yield precision to within 3% if
the true p-value is 0.05. Use 'permtestiters=500' to be
within 2%.
Rcmd: The file name for the R executable.
objpath: The folder name for an optional permanent file of the
R object with the log of the R session and files
created by the macro.
Details:
This function tests the balance of covariates after weighting
by the specified weight variables.
Creates SAS temporary data sets named _baltab and _summ.
These files are parallel to the ones created by ps. They
contain diagnostic information on the weights and are printed
to the listing file.
_baltab: contains the data on the balance of the covariates
before and after weighting. The data set contains
records for multiple tables created in R. Each table
summarizes the balance on all of the covariates and
missing data indicators for covariates with missing
data. There is one table for the unweighted data and
an additional table for the weights generated by
each stopping rule specified by the user.
In each table, there is one record for each
continuous covariate, one record for each level of a
categorical (class) variable, and one record for each
missing value indicator for covariates with missing
data. Each record contains summary statistics for
the treatment and control groups and tests of
group differences. All statistics and tests are
weighted using the weights corresponding to the
table. Results for the unweighted table are not
weighted.
The variables included are:
row_name: Covariate name with weight name
appended as prefix
tx_mn: Treatment group mean
tx_sd: Treatment group standard deviation
ct_mn: Control group mean
ct_sd: Control group standard deviation
std_eff_sz: Standard difference in group means
stat: t-statistic for testing difference in means
p: p-value for testing difference in means
ks: KS statistic comparing groups
distributions
ks_pval: p-value for testing difference in
distributions
table_name: Weights being evaluated
_dxsumm: contains the data summarizing the weights and
the covariate balance. The data set contains
one record for unweighted data and one for
the weights corresponding to each of the
stopping rules the user specified.
The variables included are:
row_name: Comparison name, unw (unweighted) or
concatenation of the stopping rule and the
estimand
n_treat: Treatment group sample size
n_ctrl: Control group sample size
ess_treat: Treatment group effective sample size
after weighting
ess_ctrl: Control group effective sample size
after weighting
max_es: Maximum of the (absolute) standardized
biases across all the covariates
mean_es: Mean of the (absolute) standardized
biases across all the covariates
max_ks: Maximum of the KS statistics across all
the covariates
max_ks_p: Permutation-based p-value for testing
the group differences using the maximum
KS statistic; only calculated when
permtestiters > 0
mean_ks: Mean of the KS statistics across all
the covariates.
Unless the user has added R to the PATH environment variable,
the full path must be specified in Rcmd. For example, for the
default setup of R Version 3.0.1 on Windows 7, the specification
is
Rcmd=C:/Program Files/R/R-3.0.1/bin/x64/R.exe,
The value 3.0.1 in the example would be replaced by the current
version number at the time R was installed.
For objpath the full path can be given. If not, the directory specified
by this parameter is assumed to be a subdirectory of the directory
where the code is launched (BATCH SAS) or the user's home directory
(interactive SAS).
If objpath is not specified, then the macro writes an R script file
"plot.r" to the SAStmp folder. The macro also creates a temporary
file plot.Rout which is a log of the R run.
If objpath is specified, all the files are created in the folder
it designates. The files will remain in that folder until the
user deletes them.
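As an illustration, a call to dxwts that checks two sets of weights
might look like the following sketch. The dataset work.study_wts and
the variables treat, age, income, and region are hypothetical
placeholders, and listing several weight variables separated by
spaces is assumed to follow the same convention as vars. The weight
variable names follow the naming convention used by ps.

```sas
/* Hypothetical data set and variable names throughout. */
%dxwts(treatvar=treat,
       vars=age income region,
       class=region,
       dataset=work.study_wts,              /* must contain the weights   */
       weightvars=es_mean_ATE ks_mean_ATE,  /* assumed space separated    */
       estimand=ATE,                        /* match the weights' estimand */
       permtestiters=0,
       Rcmd=C:/Program Files/R/R-3.0.1/bin/x64/R.exe,
       objpath=C:/Users/uname/twang_out);
```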
Macro: plot
Diagnostic Plot Generation
Description:
'plot' generates diagnostic plots available in the twang package
according to the user's specifications.
Usage:
%plot(inputobj=,
plotname=,
plotformat=,
plots=,
subset=,
color=TRUE,
Rcmd=,
objpath= )
Arguments:
inputobj: The name of the file containing the R object produced by
the ps macros with the propensity score model fitting
results.
plotname: The file name for the resulting plots.
plotformat: The file format for the resulting plots. Typically this
will match the file extension given in plotname. If not
specified by the user it will equal the file extension
from the filename given by plotname. Valid values
are:
jpg - JPEG
pdf - PDF
png - PNG
wmf - Windows enhanced metafile
ps - postscript
If an invalid value is specified, the plot will be
saved as PDF.
plots: An indicator of which type of plot is desired. The options
are
optimize or 1 A plot of the balance criteria as a function
of the GBM iteration
boxplot or 2 Boxplots of the propensity scores for the
treatment and control cases
es or 3 Plots of the standardized effect size of the
covariates before and after reweighting
t or 4 Plots of the p-values from t-statistics comparing
means of treated and control subjects for covariates,
before and after weighting.
ks or 5 Plots of the p-values from Kolmogorov-Smirnov
statistics comparing distributions of covariates
of treated and control subjects, before and
after weighting.
histogram or 6 Histogram of weights for treated and control
subjects. (Currently unavailable.)
subset: If multiple 'stop.method' rules were used in the 'ps()' call,
'subset' restricts the plots to a subset of the stopping
rules that were employed. This argument expects a subset of
the integers from 1 to k, if k 'stop.method's were used.
color: If set to FALSE, grayscale figures will be produced.
Rcmd: The file name for the R executable.
objpath: The folder name for an optional permanent file of the
R object with the resulting GBM model fit.
Details:
This function produces diagnostic plots created by twang.
Unless the user has added R to the PATH environment variable,
the full path must be specified in Rcmd. For example, for the
default setup of R Version 3.0.1 on Windows 7, the specification
is
Rcmd=C:/Program Files/R/R-3.0.1/bin/x64/R.exe,
The value 3.0.1 in the example would be replaced by the current
version number at the time R was installed.
For inputobj, plotname and objpath the full path can be given.
If the full path is not given for inputobj or plotname, the input
object is searched for in, and the plot file is placed in, the
folder specified by objpath, if it is specified, or otherwise the
folder where the SAS code is launched (BATCH SAS) or the user's
home directory (interactive SAS). If the full path is not given
for objpath, the directory it names is assumed to be a
subdirectory of the directory where the code is launched (BATCH
SAS) or the user's home directory (interactive SAS).
If objpath is not specified, then the macro writes an R script file
"plot.r" to the SAStmp folder. The macro also creates a temporary
file plot.Rout which is a log of the R run.
If objpath is specified, all the files are created in the folder
it designates. The files will remain in that folder until the
user deletes them.
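As an illustration, a call to plot that requests the standardized
effect size plot might look like the following sketch. The file name
ps.RData for the saved R object is an assumption (check the objpath
folder from the earlier ps run for the actual name), and the plot
file name and objpath folder are hypothetical placeholders.

```sas
/* inputobj file name is assumed; verify it in the objpath folder. */
%plot(inputobj=ps.RData,
      plotname=study_es_plot.pdf,
      plotformat=pdf,             /* matches the plotname extension */
      plots=es,                   /* equivalently, plots=3          */
      color=TRUE,
      Rcmd=C:/Program Files/R/R-3.0.1/bin/x64/R.exe,
      objpath=C:/Users/uname/twang_out);
```

Since full paths are not given for inputobj and plotname, both would
be resolved relative to the objpath folder, as described above.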
Macro: mnps
Description:
'mnps' calculates propensity scores for 3+ treatments using a
generalized boosted model as implemented in 'gbm'. It also
provides diagnostics of the model.
Usage:
%mnps(treatvar=,
vars=,
class=,
dataset=,
ntrees=10000,
intdepth=3,
shrinkage=0.01,
permtestiters=0,
stopmethod=ks.mean es.mean,
sampw=,
estimand=ATE,
treatatt=NULL,
collapseto=pair,
output_dataset=_inputds,
return_ps=FALSE,
Rcmd=,
plotname=,
objpath=);
Arguments:
treatvar: The name of the treatment indicator variable in the
data set named by dataset.
vars: List of the names of the covariates to be tested for
balance. The variables' names should not be in quotes
and should be separated by spaces, e.g., VAR1 VAR2 ....
class: List of the names of categorical variables among the
covariates. The variables' names should not be in quotes
and should be separated by spaces, e.g., VAR1 VAR2 ....
dataset: The dataset, which must include the treatment assignment
variable and the covariates named in vars.
ntrees: number of gbm iterations passed on to 'gbm'.
intdepth: 'interaction.depth' passed on to 'gbm'. Default equals 3.
shrinkage: 'shrinkage' passed on to 'gbm'. Default equals 0.01.
permtestiters: a non-negative integer giving the number of
iterations of the permutation test for the KS statistic.
If 'permtestiters=0' then the function returns an
analytic approximation to the p-value. Setting
'permtestiters=200' will yield precision to within 3% if
the true p-value is 0.05. Use 'permtestiters=500' to be
within 2%. Default = 0.
stopmethod: A method or set of methods for measuring and
summarizing balance across covariates. Current options are
'ks.mean', 'ks.max', 'es.mean', and 'es.max'. 'ks' refers
to the Kolmogorov-Smirnov statistic and 'es' refers to
standardized effect size (also called standardized bias or
standardized differences). These are summarized across
covariates by either the maximum ('.max') or the mean
('.mean'). Multiple stopping rules can be requested. List
the option name for all methods of interest without quotation
marks and separated only by a space.
sampw: Variable name for optional sampling weights.
estimand: The causal effect of interest. Options are ATE (average
treatment effect), which attempts to estimate the change in
the outcome if the treatment were applied to the entire
population versus if the control were applied to the entire
population, or ATT (average treatment effect on the
treated), which attempts to estimate the analogous effect,
averaging only over the treated population. Default
equals ATE.
treatatt: If the estimand is specified as ATT, this argument is
used to specify which treatment condition is considered
'the treated'. It must equal one of the levels of the
treatment variable exactly including case. It is
ignored for ATE analyses.
collapseto: Specifies the level of detail in outputted balance
tables. If equal to 'none' then no balance tables are
printed and they must be explored by the mnbaltable
macro. If equal to 'pair' all pairs of treatments
are tested when the estimand is ATE and all comparisons
against the target are tested when the estimand is ATT.
Tables are printed for all stopping methods. If equal to
'covariate' the maximum value of each balance statistic
is reported by covariate for all stopping rules. If
equal to 'stop.method' the maximum value of each balance
statistic is reported for each stopping rule.
output_dataset: The name of the dataset for the resulting weight
estimates. The default is a temporary SAS data set
called _inputds.
return_ps: Boolean variable for whether to return the estimated
propensity scores (TRUE) or not (FALSE). Equals FALSE
for no return of the propensity scores by default.
Rcmd: The file name for the R executable.
plotname: The file name for an optional output file of the default
diagnostic plots.
objpath: The folder name for an optional permanent file of the
R object with the resulting GBM model fit, a log of
the R session, and files created by the macro to run
the estimation.
Details:
mnps is syntactically very similar to ps. It requires 3+ treatments;
ps should be used when there are only two treatments being compared.
mnps has no special requirements for the naming of the treatment
indicator or covariates and places no limits on the number of
covariates. However, categorical variables need to be
specified in the "class" parameter. Categorical variables
should be listed in both the "vars" parameter and the "class"
parameter. R, and in particular the twang package in R, can be
slow and require large amounts of memory to process large datasets.
Users might obtain better computational performance by including
only the variables to be used by mnps in the dataset specified by the
dataset parameter.
Unless the user has added R to the PATH environment variable,
the full path must be specified in Rcmd. For example, for the
default setup of R Version 3.0.1 on Windows 7, the specification
is
Rcmd=C:/Program Files/R/R-3.0.1/bin/x64/R.exe,
The value 3.0.1 in the example would be replaced by the current
version number at the time R was installed.
When the number of treatments is greater than 4, balance tables
with all variables and all stopping rules can become very
large and hard to use. Setting collapseto = 'none' and using
the mnbaltable macro to explore balance may be preferable. The
default plot output can also become very large with 4+ treatments
and the mnplot macro might be a more effective way to visually
assess the quality of the propensity scores.
For plotname and objpath the full path can be given. If the full
path is not given for plotname, the plot file is placed in the
folder specified by objpath, if it is specified, or otherwise in
the folder where the SAS code is launched (BATCH SAS) or the
user's home directory (interactive SAS). If the full path is not
given for objpath, the directory it names is assumed to be a
subdirectory of the directory where the code is launched
(BATCH SAS) or the user's home directory (interactive SAS).
If objpath is not specified, then the macro writes the
user-specified input data to a temporary CSV file, "datafile.csv", in
the SAStmp directory. It also writes an R script file "ps.r" to
this folder. It then runs the R script, which produces the temporary
files wgt.csv, baltab.csv, and summary.csv in the SAStmp folder.
These are read into SAS and used to produce the final output. The
macro also creates a temporary file ps.Rout, which is a log of the R
run.
If objpath is specified, all the files are created in the folder
it designates. The files will remain in that folder until the
user deletes them.
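As an illustration, an ATT analysis with three or more treatments
might be requested as in the following sketch. The dataset
work.study3, the treatment variable program with a level named A,
the covariates, the output dataset name, and the objpath folder are
all hypothetical placeholders.

```sas
/* Hypothetical data set, variable names, and treatment level. */
%mnps(treatvar=program,
      vars=age income region,
      class=region,
      dataset=work.study3,
      stopmethod=es.mean,
      estimand=ATT,
      treatatt=A,              /* must match a treatment level exactly */
      collapseto=pair,         /* pairwise balance tables              */
      return_ps=TRUE,          /* also return propensity scores        */
      output_dataset=work.study3_wts,
      Rcmd=C:/Program Files/R/R-3.0.1/bin/x64/R.exe,
      objpath=C:/Users/uname/twang_out);
```

With many treatment groups, setting collapseto=none and exploring
balance with the mnbaltable macro may be easier to work with, as
noted above.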
Value:
Creates SAS temporary data sets named _weights and _baltab for
either estimand. When the estimand is ATE it creates two
additional datasets: _summ1 and _summ2, and when the estimand is
ATT it creates one additional dataset: _summ1. _baltab, _summ1,
and _summ2 (when created) contain diagnostic information on
the weights and are printed to the listing file.
_weights: contains the weights generated by the mnps function in R.
There is one weight variable for each stopping rule
specified by the user. The weight variables are named
according to the stopping rule and estimand so that
if the stopping rule is "es.mean" and the estimand
is "ATE" the variable with these weights is named
"es_mean_ATE". Names for other stopping rules and
either estimand will follow the same convention. The
file also contains a variable tempID which is used to
merge the file with the input dataset.
_psests: contains the estimated propensity scores generated by
the mnps function in R, if return_ps = "TRUE". There is
one propensity variable for each stopping rule
specified by the user for each treatment group when
estimand equals "ATE" and one for each treatment group
other than the group specified by treatatt when
estimand equals "ATT". The propensity score variables are
named according to the treatment group, the stopping rule,
and estimand. Treatment groups are identified by numeric
alphabetical rankings of the group names or level of the
treatment variable (treatvar) so that if treatvar
has levels "A", "B", and "C", the variable for group "A"
when the stopping rule is "es.mean" and the estimand
is "ATE" is named "ps_2_es_mean_ATE". Names for other
treatment groups, stopping rules, and either estimand
will follow the same convention. The file also contains
a variable tempID which is used to merge the file with
the input dataset.
_txxwalk: contains a crosswalk from the treatment groups to
the propensity score variables, when return_ps =
"TRUE". It contains variable Tx_Level and
Prop_Score_Variable_Name. Its contents are printed
to the SAS log.
_baltab: contains the data on the balance of the covariates
before and after weighting. The data set contains
records for multiple tables created in R. The
information in the table depends on the value of the
collapseto parameter and the estimand. When
collapseto = none, no balance table is returned and
this dataset is not created.
When collapseto = pair (the default value if not
specified by the user), then the tables summarize
the balance between each pair of treatments on all
of the covariates and missing data indicators for
covariates with missing data. When the estimand
equals ATT the comparisons are between the target
treatment and each other treatment. There is one
table for the unweighted data and an additional
table for the weights generated by each stopping
rule specified by the user.
In each table, there is one record for each
continuous covariate, one record for each level of a
categorical (class) variable, and one record for each
missing value indicator for covariates with missing
data. Each record contains summary statistics and
tests of group differences. All statistics and tests are
weighted using the weights corresponding to the
table. Results for the unweighted table are not
weighted.
When the estimand is "ATE", the included variables are:
tmt1: Name of treatment in pair with name that comes
first alphanumerically
tmt2: Name of treatment in pair with name that comes
second alphanumerically
var: Variable name
mean1: Group mean for tmt1 treatment
mean2: Group mean for tmt2 treatment
pop_sd: The "population" standard deviation,
the standard deviation from the pooled data from
all the treatment groups
std_eff_sz: Difference in group means divided by
pop_sd
p: p-value for testing difference in means
ks: KS statistic comparing groups
distributions
ks_pval: p-value for testing difference in
distributions
stop_method: The stop method used for the weights being evaluated
When the estimand is "ATT", the included variables are:
var: Variable name
tx_mn: Treatment group mean
tx_sd: Treatment group standard deviation
ct_mn: Control group mean
ct_sd: Control group standard deviation
std_eff_sz: Standard difference in group means
stat: t-statistic for testing difference in means
p: p-value for testing difference in means
ks: KS statistic comparing groups
distributions
ks_pval: p-value for testing difference in
distributions
control: The treatment group being compared to the
target group
stop_method: The stop method used for the weights being
evaluated
When collapseto = covariate, then the tables
present the maximum imbalance across all pairwise
comparisons (all treatment pairs when the estimand is
ATE and all treatments versus the target when the
estimand is ATT) by covariate. There is one
table for the unweighted data and an additional
table for the weights generated by each stopping
rule specified by the user.
In each table, there is one record for each
continuous covariate, one record for each level of a
categorical (class) variable, and one record for each
missing value indicator for covariates with missing
data. Each record contains summary statistics and
tests of group differences. All statistics and tests
are weighted using the weights corresponding to the
table. Results for the unweighted table are not
weighted.
The tables are the same for both estimands. The
included variables are:
var: Variable name
max_std_eff_sz: The maximum of the standardized mean
difference between treatment pairs
min_p: The smallest p-value from testing
the difference between treatments
max_ks: The maximum KS statistic comparing the
distribution of the covariate
between two treatments
min_ks_pval: The minimum p-value from testing
the KS statistics between
treatments
stop_method: The stop method used for the weights being
evaluated
When collapseto = stop.method, then there is one table.
It presents the maximum imbalance across all pairwise
comparisons (all treatment pairs when the estimand is
ATE and all treatments versus the target when the
estimand is ATT) and all covariates. There is one
record for the unweighted data and an additional
record for the weights generated by each stopping
rule specified by the user. Each record contains
summary statistics and tests of group differences.
All statistics and tests are weighted using the
weights corresponding to the row. Results for the
unweighted row are not weighted.
The tables are the same for both estimands. The
included variables are:
max_std_eff_sz: The maximum of the standardized mean
difference between treatment pairs
across all covariates
min_p: The smallest p-value from testing
the difference between treatments
across all covariates
max_ks: The maximum KS statistic comparing the
distribution of a covariate
between any two treatments across
all covariates
min_ks_pval: The minimum p-value from testing
the KS statistics between
any two treatments and across all
covariates
stop_method: The stop method used for the weights
being evaluated
_summ: contains the data summarizing the weights and
the covariate balance, when the estimand is ATT.
The data set contains one record for unweighted data
and one for the weights corresponding to each of the
stopping rules the user specified.
The variables included are:
comp_treat: The treatment being compared to the
target
row_name: Comparison name, unw (unweighted) or
concatenation of the stopping rule and the
estimand
n_treat: Treatment group sample size
n_ctrl: Control group sample size
ess_treat: Treatment group effective sample size
after weighting
ess_ctrl: Control group effective sample size
after weighting
max_es: Maximum of the (absolute) standardized
biases across all the covariates
mean_es: Mean of the (absolute) standardized
biases across all the covariates
max_ks: Maximum of the KS statistics across all
the covariates
max_ks_p: Permutation-based p-value for testing
the group differences using the maximum
KS statistic; only calculated when
permtestiters > 0
mean_ks: Mean of the KS statistics across all
the covariates
iter: The number of iterations for the GBM
chosen by the stopping rule.
_summ1: contains the data summarizing the covariate balance,
when the estimand is ATE. The data set contains one
record for unweighted data and one for the weights
corresponding to each of the stopping rules the user
specified.
The variables included are:
max_std_eff_sz: The maximum of the standardized mean
difference between treatment pairs
across all covariates
min_p: The smallest p-value from testing
the difference between treatments
across all covariates
max_ks: The maximum KS statistic comparing the
distribution of a covariate
between any two treatments across
all covariates
min_ks_pval: The minimum p-value from testing
the KS statistics between
any two treatments and across all
covariates
stop_method: The stop method used for the weights
being evaluated
_summ2: contains the data summarizing the weights, when the
estimand is ATE. The data set contains one record
for each treatment.
The variables included are:
n: The sample size of the treatment
group
ESS_[stop method]: The effective sample size for
weights corresponding to the
stop method. There is one
variable for each user
specified stopping method.
[stop method] is the specified
stop method name, e.g.
ESS_es_mean.
Creates the dataset specified by output_dataset with the
estimated weights. The default is a temporary SAS dataset
named _inputds. The dataset contains the user input data
with estimated weights appended. This data set can be used to
estimate treatment effect with the propensity score based
weights. The weight variables are appended from _weights and
follow the naming conventions described above.
Macro: mnbaltable
Description:
'mnbaltable' produces tables of covariate balance for 3+
treatments using the stored results from the mnps macro. It
produces all pairwise summaries or user-controlled summaries.
Usage:
%mnbaltable(inputobj=,
collapseto=pairs,
subset_var=,
subset_treat=,
subset_stop_method=,
es_cutoff=,
ks_cutoff=,
p_cutoff=,
ks_p_cutoff=,
Rcmd=,
objpath= );
Arguments:
inputobj: The name of the file containing the R object produced by
the ps macros with the propensity score model fitting
results.
collapseto: Specifies the level of detail in outputted balance
tables. If equal to 'pairs' all pairs of treatments
are tested when the estimand is ATE and all comparisons
against the target are tested when the estimand is ATT.
Tables are printed for all stopping methods. If equal to
'covariate' the maximum value of the balance statistics
are reported by covariate for all stopping rules. If
equal to 'stop.method' the maximum value of the balance
statistics are reported for each stopping rule.
subset_var: specifies variables to be included in the table when
collapseto = pairs. The value of subset_var must equal
a name or names of variables used with %mnps. The variables'
names should not be in quotes and should be separated by
spaces, e.g., VAR1 VAR2 ...
subset_treat: specifies the treatments to include in the tabled
comparisons when collapseto = pairs. The value of
subset_treat must equal a level of the treatment
variable used with %mnps. The names are case sensitive.
subset_stop_method: specifies the stop methods to include in
the tabled comparisons when collapseto = pairs. The
value of subset_stop_method must equal a stopping
method used with %mnps.
es_cutoff: The minimum value for absolute standardized mean
difference for tabled results when collapseto = pairs
ks_cutoff: The minimum value for KS statistics for tabled results
when collapseto = pairs
p_cutoff: The maximum value for p-value for tests of mean
differences for tabled results when collapseto = pairs
ks_p_cutoff: The maximum value for p-value for tests of the KS
statistic for tabled results when collapseto = pairs
Rcmd: The file name for the R executable.
objpath: The folder name for an optional permanent file of the
R object with the resulting GBM model fit.
Details:
This function produces diagnostic balance tables created by twang
for 3+ treatments following the use of %mnps.
Unless the user has added R to the path environmental variable,
the full path must be specified in Rcmd. For example for the
default setup of R Version 3.0.1 on Windows 7, the specification
is
Rcmd=C:/Program Files/R/R-3.0.1/bin/x64/R.exe,
The value 3.0.1 in the example would be replaced by the current
version number at the time R was installed.
For inputobj and objpath the full path can be given.
If a full path is not given, the input object is searched for in,
and the balance table csv file is placed in, the folder specified
by objpath if it is specified; otherwise the folder where the SAS
code is launched (BATCH SAS) or the user's home directory
(interactive SAS) is used. A directory specified in objpath without
a full path is assumed to be a subdirectory of the directory where
the code is launched (BATCH SAS) or the user's home directory
(interactive SAS).
If objpath is not specified, then the macro writes an R script file
"mnbaltable.r" to the SAStmp folder. The macro also creates temporary
files mnbaltable.Rout, which is a log of the R run, and mnbaltable.csv,
which includes the tabled balance statistics.
If objpath is specified, all the files are created in the folder
it designates. The files will remain in that folder until the
user deletes them.
When the estimand = ATE and subset_treat equals just one of
the treatment levels, then statistics for pairwise comparisons
of other treatments with this treatment are reported. When the
estimand = ATE and subset_treat equals two or more treatment
levels, then statistics for all the pairwise comparisons among
the specified treatments are reported.
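As an illustration of the options described above, a call might look like
the following sketch. The file name mnps_fit.RData, the folder
C:/Users/uname/twang_out and the cutoff value 0.2 are hypothetical
placeholders, not values required by the macro; the Rcmd path follows the
example given above and must match the local R installation.

```sas
/* Sketch only: the inputobj file name, the objpath folder and the   */
/* es_cutoff value are hypothetical and should be replaced.          */
/* Requests pairwise balance tables, keeping only records whose      */
/* absolute standardized mean difference is at least 0.2.            */
%mnbaltable(inputobj=mnps_fit.RData,
            collapseto=pairs,
            es_cutoff=0.2,
            Rcmd=C:/Program Files/R/R-3.0.1/bin/x64/R.exe,
            objpath=C:/Users/uname/twang_out);
```

Because objpath is specified in this sketch, the R script, the log and
the csv file would remain in that folder after the run.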
Value:
Creates SAS temporary data set named _baltab. _baltab contains
diagnostic information on the balance and is printed to the
listing file.
_baltab: contains the data on the balance of the covariates
before and after weighting. The data set contains
records for multiple tables created in R. The
information in the table depends on the value of the
collapseto parameter and the estimand. When
collapseto = none, no balance table is returned and
this dataset is not created.
When collapseto = pairs (the default value if not
specified by the user), then the tables summarize
the balance between each pair of treatments on all
of the covariates and missing data indicators for
covariates with missing data. When the estimand
equals ATT the comparisons are between the target
treatment and each other treatment. There is one
table for the unweighted data and an additional
table for the weights generated by each stopping
rule specified by the user.
In each table, there is one record for each
continuous covariate, one record for each level of a
categorical (class) variable, and one record for each
missing value indicator for covariates with missing
data. Each record contains summary statistics and
tests of group differences. All statistics and tests are
weighted using the weights corresponding to the
table. Results for the unweighted table are not
weighted.
When the estimand is "ATE", the included variables are:
tmt1: Name of treatment in pair with name that comes
first alphanumerically
tmt2: Name of treatment in pair with name that comes
second alphanumerically
var: Variable name
mean1: Group mean for tmt1 treatment
mean2: Group mean for tmt2 treatment
pop_sd: The "population" standard deviation,
the standard deviation from the pooled data from
all the treatment groups
std_eff_sz: Difference in group means divided by
pop_sd
p: p-value for testing difference in means
ks: KS statistic comparing the groups'
distributions
ks_pval: p-value for testing difference in
distributions
stop_method: The stop method used for the weights being evaluated
When the estimand is "ATT", the included variables are:
var: Variable name
tx_mn: Treatment group mean
tx_sd: Treatment group standard deviation
ct_mn: Control group mean
ct_sd: Control group standard deviation
std_eff_sz: Standard difference in group means
stat: t-statistic for testing difference in means
p: p-value for testing difference in means
ks: KS statistic comparing the groups'
distributions
ks_pval: p-value for testing difference in
distributions
control: The treatment group being compared to the
target group
stop_method: The stop method used for the weights being
evaluated
When collapseto = covariate, then the tables
present the maximum imbalance across all pairwise
comparisons (all treatment pairs when the estimand is
ATE and all treatments versus the target when the
estimand is ATT) by covariate. There is one
table for the unweighted data and an additional
table for the weights generated by each stopping
rule specified by the user.
In each table, there is one record for each
continuous covariate, one record for each level of a
categorical (class) variable, and one record for each
missing value indicator for covariates with missing
data. Each record contains summary statistics and
tests of group differences. All statistics and tests
are weighted using the weights corresponding to the
table. Results for the unweighted table are not
weighted.
The tables are the same for both estimands. The
included variables are:
var: Variable name
max_std_eff_sz: The maximum of the standardized mean
difference between treatment pairs
min_p: The smallest p-value from testing
the difference between treatments
max_ks: The maximum KS statistic comparing the
distribution of the covariate
between two treatments
min_ks_pval: The minimum p-value from testing
the KS statistics between
treatments
stop_method: The stop method used for the weights being
evaluated
When collapseto = stop.method, then there is one table.
It presents the maximum imbalance across all pairwise
comparisons (all treatment pairs when the estimand is
ATE and all treatments versus the target when the
estimand is ATT) and all covariates. There is one
record for the unweighted data and an additional
record for the weights generated by each stopping
rule specified by the user. Each record contains
summary statistics and tests of group differences.
All statistics and tests are weighted using the
weights corresponding to the row. Results for the
unweighted row are not weighted.
The tables are the same for both estimands. The
included variables are:
max_std_eff_sz: The maximum of the standardized mean
difference between treatment pairs
across all covariates
min_p: The smallest p-value from testing
the difference between treatments
across all covariates
max_ks: The maximum KS statistic comparing the
distribution of a covariate
between any two treatments across
all covariates
min_ks_pval: The minimum p-value from testing
the KS statistics between
any two treatments and across all
covariates
stop_method: The stop method used for the weights
being evaluated
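Since _baltab is an ordinary temporary SAS data set, it can be
post-processed with standard SAS steps. The sketch below assumes the
estimand was ATE and collapseto = pairs, so that the variables tmt1, tmt2,
var, std_eff_sz, ks and stop_method listed above are present. The
thresholds 0.2 and 0.1 and the data set name _baltab_flag are
hypothetical choices for illustration only.

```sas
/* Sketch only: thresholds and the output data set name are          */
/* hypothetical. Keeps records whose absolute standardized effect    */
/* size exceeds 0.2 or whose KS statistic exceeds 0.1.               */
data _baltab_flag;
  set _baltab;
  where abs(std_eff_sz) > 0.2 or ks > 0.1;
run;

/* List the flagged covariates by treatment pair and stop method.    */
proc print data=_baltab_flag noobs;
  var tmt1 tmt2 var std_eff_sz ks stop_method;
run;
```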
Macro: mnplot
Diagnostic Plot Generation
Description:
'mnplot' generates diagnostic plots available in the twang package
according to the user's specifications.
Usage:
%mnplot(inputobj=,
plotname=,
plotformat=,
plots=,
subset=,
color=TRUE,
pairwisemax=TRUE,
treatments=,
singleplot=,
multipage=FALSE,
Rcmd=,
objpath= )
Arguments:
inputobj: The name of the file containing the R object produced by
the ps macros with the propensity score model fitting
results.
plotname: The file name for the resulting plots.
plotformat: The file format for the resulting plots. Typically this
will match the file extension given in plotname. If not
specified by the user it will equal the file extension
from the filename given by plotname. Valid values
are:
jpg - JPEG
pdf - PDF
png - PNG
wmf - Windows enhanced metafile
ps - postscript
If an invalid value is specified, the plot will be
saved as PDF.
plots: An indicator of which type of plot is desired. The options
are
optimize or 1 A plot of the balance criteria as a function
of the GBM iteration
boxplot or 2 Boxplots of the propensity scores for the
treatment and control cases
es or 3 Plots of the standardized effect size of the
covariates before and after reweighting
t or 4 Plots of the p-values from t-statistics comparing
means of treated and control subjects for covariates,
before and after weighting.
ks or 5 Plots of the p-values from Kolmogorov-Smirnov
statistics comparing distributions of covariates
of treated and control subjects, before and
after weighting.
histogram or 6 Histogram of weights for treated and control
subjects. (Currently unavailable.)
subset: If multiple 'stop.method' rules were used in the 'ps()' call,
'subset' restricts the plots to a subset of the stopping
rules that were employed. This argument expects a subset of
the integers from 1 to k, if k 'stop.method's were used.
color: If set to FALSE, grayscale figures will be produced
pairwisemax: A Boolean variable specifying whether or not the
plots should provide summary across all pairwise
comparison of treatments or present separate plots for
each pairwise comparison
treatments: The treatments to be compared if pairwisemax=FALSE. By
default all pairs are compared if treatments is not
specified. The treatments parameter must specify one or two
treatments. The values must be values of treatment
variable used when fitting with %mnps. If
treatments takes on one value then all pairwise
comparisons between that specified treatment and
all the other treatments are plotted. If two treatments
are specified then only the plots for this pairwise
comparison are generated.
figurerows: The number of rows of panels in each plot.
singleplot: For plot calls that produce multiple plots, specifying
an integer value of 'singleplot' will return only the
corresponding plot. E.g., specifying 'singleplot = 2'
will return the second plot. The value must be a positive
integer between 1 and the total number of plots created
by the plot request. The value of 'singleplot' is ignored
when pairwisemax=TRUE since then only a single plot is
created.
multipage: When multiple frames of a figure are produced,
'multipage=TRUE' will print each frame on a different
page of the figure file.
Rcmd: The file name for the R executable.
objpath: The folder name for an optional permanent file of the
R object with the resulting GBM model fit.
Details:
This function produces diagnostic plots created by twang.
Unless the user has added R to the path environmental variable,
the full path must be specified in Rcmd. For example for the
default setup of R Version 3.0.1 on Windows 7, the specification
is
Rcmd=C:/Program Files/R/R-3.0.1/bin/x64/R.exe,
The value 3.0.1 in the example would be replaced by the current
version number at the time R was installed.
For inputobj, plotname and objpath the full path can be given.
If a full path is not given, the input object is searched for in,
and the plot file is placed in, the folder specified by objpath if
it is specified; otherwise the folder where the SAS code is
launched (BATCH SAS) or the user's home directory (interactive SAS)
is used. A directory specified in objpath without a full path is
assumed to be a subdirectory of the directory where the code is
launched (BATCH SAS) or the user's home directory (interactive SAS).
If objpath is not specified, then the macro writes an R script file
"plot.r" to the SAStmp folder. The macro also creates a temporary
file plot.Rout which is a log of the R run.
If objpath is specified, all the files are created in the folder
it designates. The files will remain in that folder until the
user deletes them.
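As an illustration, the following sketch requests the standardized effect
size plot, summarized across all pairwise comparisons, and saves it as a
PDF. The file names mnps_fit.RData and mnps_es.pdf and the folder
C:/Users/uname/twang_out are hypothetical placeholders; the Rcmd path
follows the example above and must match the local R installation.

```sas
/* Sketch only: the inputobj and plotname file names and the objpath */
/* folder are hypothetical and should be replaced.                   */
/* plots=es requests the standardized effect size plot;              */
/* pairwisemax=TRUE summarizes across all treatment pairs.           */
%mnplot(inputobj=mnps_fit.RData,
        plotname=mnps_es.pdf,
        plotformat=pdf,
        plots=es,
        pairwisemax=TRUE,
        Rcmd=C:/Program Files/R/R-3.0.1/bin/x64/R.exe,
        objpath=C:/Users/uname/twang_out);
```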
Macro: CBPS
Covariate Balanced Propensity Score Estimation
Description:
'CBPS' estimates propensity scores using the Covariate
Balanced Propensity Score methods of Imai and Ratkovic (2014)
and evaluates their quality. It implements the CBPS function
in the CBPS package of R.
Usage:
%CBPS(treatvar=,
vars=,
class=,
dataset=,
estimand=ATE,
method=over,
output_dataset=_inputds,
Rcmd=,
objpath=)
Arguments:
treatvar: The name of the treatment indicator variable in the
data set named by dataset.
vars: List of the names of the covariates to be tested for
balance. The variables' names should not be in quotes
and should be separated by spaces, e.g., VAR1 VAR2 ....
class: List of the names of categorical variables among the
covariates. The variables' names should not be in quotes
and should be separated by spaces, e.g., VAR1 VAR2 ....
dataset: The dataset, which must include the treatment assignment
variable and the covariates named in vars.
estimand: The causal effect of interest. Options are ATE (average
treatment effect), which attempts to estimate the change in
the outcome if the treatment were applied to the entire
population versus if the control were applied to the entire
population, or ATT (average treatment effect on the
treated), which attempts to estimate the analogous effect,
averaging only over the treated population. Default
equals ATE.
method: The CBPS fitting method: "over" (the default) for fitting an
over-identified model that combines the propensity score and
covariate balancing conditions; "exact" for fitting a model
that only contains the covariate balancing conditions.
output_dataset: The name of the dataset for the resulting weight
estimates. The default is a temporary SAS data set
called _inputds.
Rcmd: The file name for the R executable.
plotname: The file name for an optional output file of the default
diagnostic plots.
objpath: The folder name for an optional permanent file of the
R object with the resulting CBPS model fit, a log of
the R session, and files created by the macro to run
the estimation.
Details:
CBPS has no special requirements for the naming of the treatment
indicator or covariates and places no limits on the number of
covariates. However, categorical variables need to be
specified in the "class" parameter. Categorical variables
should be listed in the "vars" parameter and also in the "class"
parameter.
Unless the user has added R to the path environmental variable,
the full path must be specified in Rcmd. For example for the
default setup of R Version 3.0.1 on Windows 7, the specification
is
Rcmd=C:/Program Files/R/R-3.0.1/bin/x64/R.exe,
The value 3.0.1 in the example would be replaced by the current
version number at the time R was installed.
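The following sketch illustrates the requirement that a categorical
covariate appear in both vars and class. The data set name mydata, the
variable names treat, age, income and region, the output data set name
mydata_wts and the folder C:/Users/uname/twang_out are all hypothetical
and stand in for the user's own names.

```sas
/* Sketch only: data set, variable and folder names are hypothetical.*/
/* region is categorical, so it is listed in vars and again in class.*/
%CBPS(treatvar=treat,
      vars=age income region,
      class=region,
      dataset=mydata,
      estimand=ATT,
      method=exact,
      output_dataset=mydata_wts,
      Rcmd=C:/Program Files/R/R-3.0.1/bin/x64/R.exe,
      objpath=C:/Users/uname/twang_out);
```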
Value:
Creates SAS temporary data sets named _weights, _baltab, and _summ.
_summ and _baltab contain diagnostic information on the weights
and are printed to the listing file.
_weights: contains the weights generated by the CBPS function in R.
There is one weight variable, named cbps. The
file also contains a variable tempID which is used to
merge the file with the input dataset.
_baltab: contains the data on the balance of the covariates
before and after weighting. The data set contains
records for multiple tables created in R. Each table
summarizes the balance on all of the covariates and
missing data indicators for covariates with missing
data. There is one table for the unweighted data and
an additional table for the weights generated by
each stopping rule specified by the user.
In each table, there is one record for each
continuous covariate, one record for each level of a
categorical (class) variable, and one record for each
missing value indicator for covariates with missing
data. Each record contains summary statistics for
the treatment and control groups and tests of
group differences. All statistics and tests are
weighted using the weights corresponding to the
table. Results for the unweighted table are not
weighted.
The variables included are:
row_name: Covariate name with weight name
appended as prefix
tx_mn: Treatment group mean
tx_sd: Treatment group standard deviation
ct_mn: Control group mean
ct_sd: Control group standard deviation
std_eff_sz: Standard difference in group means
stat: t-statistic for testing difference in means
p: p-value for testing difference in means
ks: KS statistic comparing the groups'
distributions
ks_pval: p-value for testing difference in
distributions
table_name: Weights being evaluated
_dxsumm: contains the data summarizing the weights and
the covariate balance. The data set contains
one record for unweighted data and one for
the weights corresponding to each of the
stopping rules the user specified.
The variables included are:
row_name: Comparison name, unw (unweighted) or
concatenation of the stopping rule and the
estimand
n_treat: Treatment group sample size
n_ctrl: Control group sample size
ess_treat: Treatment group effective sample size
after weighting
ess_ctrl: Control group effective sample size
after weighting
max_es: Maximum of the (absolute) standardized
biases across all the covariates
mean_es: Mean of the (absolute) standardized
biases across all the covariates
max_ks: Maximum of the KS statistics across all
the covariates
max_ks_p: Permutation-based p-value for testing
the group differences using the maximum
KS statistic; only calculated when
permtestiters > 0
mean_ks: Mean of the KS statistics across all
the covariates.
Creates the dataset specified by output_dataset with the
estimated weights. The default is a temporary SAS dataset
named _inputds. The dataset contains the user input data
with estimated weights appended. This data set can be used to
estimate treatment effect with the propensity score based
weights. The weight variables are appended from _weights and
follow the naming conventions described above.
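One way the output data set might be used is a weighted outcome
regression. The sketch below assumes the default output data set
_inputds and the weight variable cbps described above; the outcome
variable name outcome and the treatment variable name treat are
hypothetical, and PROC SURVEYREG is only one of several SAS procedures
that accept weights.

```sas
/* Sketch only: outcome and treat are hypothetical variable names.   */
/* cbps is the weight variable appended to the output data set.      */
/* Fits a weighted regression of the outcome on the treatment        */
/* indicator with design-based standard errors.                      */
proc surveyreg data=_inputds;
  model outcome = treat;
  weight cbps;
run;
```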
Macro: update_twang
R package updating
Description:
'update_twang' updates the TWANG package in the TWANG folder
in the C:\Users\username\AppData\Local directory. The macro
is primarily used by other macros automatically when the
package is out of date.
Usage:
%update_twang(Rcmd=)
Rcmd: The file name for the R executable.
Details:
Unless the user has added R to the path environmental variable,
the full path must be specified in Rcmd. For example for the
default setup of R Version 3.0.1 on Windows 7, the specification
is
Rcmd=C:/Program Files/R/R-3.0.1/bin/x64/R.exe,
The value 3.0.1 in the example would be replaced by the current
version number at the time R was installed.
Macro: remove_twang_folder
Folder removal
Description:
'remove_twang_folder' removes the TWANG folder created in the
C:\Users\username\AppData\Local directory.
Usage:
%remove_twang_folder
References:
McCaffrey, D. F., Ridgeway, G., & Morral, A. (2004). “Propensity
Score Estimation with Boosted Regression for Evaluating Adolescent
Substance Abuse Treatment,” _Psychological Methods_ 9(4):403-425.
Ridgeway, G., McCaffrey, D., Morral, A., Burgette, L., & Griffin,
B. A. (2012). “Toolkit for Weighting and Analysis of Nonequivalent
Groups: A Tutorial for the Twang Package.” R package.