% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/extend_family.R
\name{extend_family}
\alias{extend_family}
\title{Extend a family}
\usage{
extend_family(
  family,
  latent = FALSE,
  latent_y_unqs = NULL,
  latent_ilink = NULL,
  latent_ll_oscale = NULL,
  latent_ppd_oscale = NULL,
  augdat_y_unqs = NULL,
  augdat_link = NULL,
  augdat_ilink = NULL,
  augdat_args_link = list(),
  augdat_args_ilink = list(),
  ...
)
}
\arguments{
\item{family}{An object of class \code{family}.}

\item{latent}{A single logical value indicating whether to use the latent
projection (\code{TRUE}) or not (\code{FALSE}). Note that setting \code{latent = TRUE}
causes all arguments starting with \code{augdat_} to be ignored.}

\item{latent_y_unqs}{Only relevant for a latent projection where the original
response space has finite support (i.e., the original response values may
be regarded as categories), in which case this needs to be the character
vector of unique response values (which will be assigned to \code{family$cats}
internally) or may be left at \code{NULL} (so that \pkg{projpred} will try to
infer it from \code{family$cats}). See also section "Latent projection" below.}

\item{latent_ilink}{Only relevant for the latent projection, in which case
this needs to be the inverse-link function. If the original response family
was the \code{\link[=binomial]{binomial()}} or the \code{\link[=poisson]{poisson()}} family, then \code{latent_ilink} can be
\code{NULL}, in which case an internal default will be used. Can also be \code{NULL}
in all other cases, but then an internal default based on \code{family$linkinv}
will be used, which might not work for all families. See also section
"Latent projection" below.}

\item{latent_ll_oscale}{Only relevant for the latent projection, in which
case this needs to be the function computing response-scale (not
latent-scale) log-likelihood values. If \code{!is.null(family$cats)} (after
taking \code{latent_y_unqs} into account) or if the original response family was
the \code{\link[=binomial]{binomial()}} or the \code{\link[=poisson]{poisson()}} family, then \code{latent_ll_oscale} can be
\code{NULL}, in which case an internal default will be used. Can also be \code{NULL}
in all other cases, but then downstream functions will have limited
functionality (a message thrown by \code{\link[=extend_family]{extend_family()}} will state what
exactly won't be available). See also section "Latent projection" below.}

\item{latent_ppd_oscale}{Only relevant for the latent projection, in which
case this needs to be the function sampling response values given latent
predictors that have been transformed to response scale using
\code{latent_ilink}. If \code{!is.null(family$cats)} (after taking \code{latent_y_unqs}
into account) or if the original response family was the \code{\link[=binomial]{binomial()}} or
the \code{\link[=poisson]{poisson()}} family, then \code{latent_ppd_oscale} can be \code{NULL}, in which
case an internal default will be used. Can also be \code{NULL} in all other
cases, but then downstream functions will have limited functionality (a
message thrown by \code{\link[=extend_family]{extend_family()}} will state what exactly won't be
available). See also section "Latent projection" below. Note that although
this function has the abbreviation "PPD" in its name (which stands for
"posterior predictive distribution"), \pkg{projpred} currently uses it only
in \code{\link[=proj_predict]{proj_predict()}}, i.e., for sampling from what would better be termed
the posterior-projection predictive distribution (PPPD).}

\item{augdat_y_unqs}{Only relevant for augmented-data projection, in which
case this needs to be the character vector of unique response values (which
will be assigned to \code{family$cats} internally) or may be left at \code{NULL} if
\code{family$cats} is already non-\code{NULL}. See also section "Augmented-data
projection" below.}

\item{augdat_link}{Only relevant for augmented-data projection, in which case
this needs to be the link function. Use \code{NULL} for the traditional
projection. See also section "Augmented-data projection" below.}

\item{augdat_ilink}{Only relevant for augmented-data projection, in which
case this needs to be the inverse-link function. Use \code{NULL} for the
traditional projection. See also section "Augmented-data projection" below.}

\item{augdat_args_link}{Only relevant for augmented-data projection, in which
case this may be a named \code{list} of arguments to pass to the function
supplied to \code{augdat_link}.}

\item{augdat_args_ilink}{Only relevant for augmented-data projection, in
which case this may be a named \code{list} of arguments to pass to the function
supplied to \code{augdat_ilink}.}

\item{...}{Ignored (exists only to swallow up further arguments which might
be passed to this function).}
}
\value{
The \code{family} object extended in the way needed by \pkg{projpred}.
}
\description{
This function adds some internally required elements to an object of class
\code{family} (see, e.g., \code{\link[=family]{family()}}). It is called internally by
\code{\link[=init_refmodel]{init_refmodel()}}, so you will rarely need to call it yourself.
}
\details{
In the following, \eqn{N}, \eqn{C_{\mathrm{cat}}}{C_cat},
\eqn{C_{\mathrm{lat}}}{C_lat}, \eqn{S_{\mathrm{ref}}}{S_ref}, and
\eqn{S_{\mathrm{prj}}}{S_prj} from help topic \link{refmodel-init-get} are used.
Note that \eqn{N} does not necessarily denote the number of original
observations; it can also refer to new observations. Furthermore, let \eqn{S}
denote either \eqn{S_{\mathrm{ref}}}{S_ref} or \eqn{S_{\mathrm{prj}}}{S_prj},
whichever is appropriate in the context where it is used.
}
\section{Augmented-data projection}{
As their first input, the functions supplied to arguments \code{augdat_link} and
\code{augdat_ilink} have to accept:
\itemize{
\item For \code{augdat_link}: an \eqn{S \times N \times C_{\mathrm{cat}}}{S x N x
C_cat} array containing the probabilities for the response categories. The
order of the response categories is the same as in \code{family$cats} (see
argument \code{augdat_y_unqs}).
\item For \code{augdat_ilink}: an \eqn{S \times N \times C_{\mathrm{lat}}}{S x N x
C_lat} array containing the linear predictors.
}

The return value of these functions needs to be:
\itemize{
\item For \code{augdat_link}: an \eqn{S \times N \times C_{\mathrm{lat}}}{S x N x
C_lat} array containing the linear predictors.
\item For \code{augdat_ilink}: an \eqn{S \times N \times C_{\mathrm{cat}}}{S x N x
C_cat} array containing the probabilities for the response categories. The
order of the response categories has to be the same as in \code{family$cats} (see
argument \code{augdat_y_unqs}).
}
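
As a minimal sketch of such a pair (not \pkg{projpred}'s internal
implementation), the following assumes a multinomial-logit parameterization in
which the first category of \code{family$cats} serves as the reference category,
so that \eqn{C_{\mathrm{cat}} = C_{\mathrm{lat}} + 1}{C_cat = C_lat + 1}. The
function names \code{ilink_softmax_sketch} and \code{link_logratio_sketch} are
hypothetical:

\if{html}{\out{<div class="sourceCode r">}}\preformatted{# Hypothetical inverse link (assumption: first category is the reference).
# Input: S x N x C_lat array of linear predictors.
# Output: S x N x C_cat array of category probabilities.
ilink_softmax_sketch <- function(eta_arr) {
  d <- dim(eta_arr)
  # Prepend a linear predictor of zero for the reference category.
  eta_full <- array(0, dim = c(d[1], d[2], d[3] + 1L))
  eta_full[, , -1] <- eta_arr
  # Subtract the maximum per draw and observation for numerical stability.
  eta_max <- apply(eta_full, c(1, 2), max)
  exp_eta <- exp(sweep(eta_full, c(1, 2), eta_max, "-"))
  # Normalize so that the probabilities sum to 1 across categories.
  sweep(exp_eta, c(1, 2), apply(exp_eta, c(1, 2), sum), "/")
}

# Hypothetical link (inverse of the function above).
# Input: S x N x C_cat array of category probabilities.
# Output: S x N x C_lat array of linear predictors.
link_logratio_sketch <- function(prb_arr) {
  log_prb <- log(prb_arr)
  # Log probabilities of the reference category as an S x N matrix.
  log_ref <- array(log_prb[, , 1], dim = dim(log_prb)[1:2])
  # Log ratios of all other categories relative to the reference category.
  sweep(log_prb[, , -1, drop = FALSE], c(1, 2), log_ref, "-")
}
}\if{html}{\out{</div>}}

Under these assumptions, applying \code{link_logratio_sketch()} to the output of
\code{ilink_softmax_sketch()} recovers the linear predictors (up to
floating-point error).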

For the augmented-data projection, the response vector resulting from
\code{extract_model_data} (see \code{\link[=init_refmodel]{init_refmodel()}}) is coerced to a \code{factor} (using
\code{\link[=as.factor]{as.factor()}}) in multiple places throughout this package. Inside of
\code{\link[=init_refmodel]{init_refmodel()}}, the levels of this \code{factor} have to be identical to
\code{family$cats} (\emph{after} applying \code{\link[=extend_family]{extend_family()}} inside of
\code{\link[=init_refmodel]{init_refmodel()}}). Everywhere else, these levels have to be a subset of
\verb{<refmodel>$family$cats} (where \verb{<refmodel>} is an object resulting from
\code{\link[=init_refmodel]{init_refmodel()}}). See argument \code{augdat_y_unqs} for how to control
\code{family$cats}.

For ordinal \pkg{brms} families, be aware that the submodels (onto which the
reference model is projected) currently have the following restrictions:
\itemize{
\item The discrimination parameter \code{disc} is not supported (i.e., it is a
constant with value 1).
\item The thresholds are \code{"flexible"} (see \code{\link[brms:brmsfamily]{brms::brmsfamily()}}).
\item The thresholds do not vary across the levels of a \code{factor}-like variable
(see argument \code{gr} of \code{\link[brms:addition-terms]{brms::resp_thres()}}).
\item The \code{"probit_approx"} link is replaced by \code{"probit"}.
}

For the \code{\link[brms:brmsfamily]{brms::categorical()}} family, be aware that:
\itemize{
\item For multilevel submodels, the group-level effects are allowed to be
correlated between different response categories.
\item For multilevel submodels, \pkg{mclogit} versions < 0.9.4 may throw the
error \code{'a' (<number> x 1) must be square}. Updating \pkg{mclogit} to a
version >= 0.9.4 should fix this.
}
}

\section{Latent projection}{
The function supplied to argument \code{latent_ilink} needs to have the prototype

\if{html}{\out{<div class="sourceCode r">}}\preformatted{latent_ilink(lpreds, cl_ref, wdraws_ref = rep(1, length(cl_ref)))
}\if{html}{\out{</div>}}

where:
\itemize{
\item \code{lpreds} accepts an \eqn{S \times N}{S x N} matrix containing the linear
predictors.
\item \code{cl_ref} accepts a numeric vector of length \eqn{S_{\mathrm{ref}}}{S_ref},
containing \pkg{projpred}'s internal cluster indices for the reference
model's draws.
\item \code{wdraws_ref} accepts a numeric vector of length
\eqn{S_{\mathrm{ref}}}{S_ref}, containing weights for the reference model's
draws. These weights should be treated as not being normalized (i.e., they
don't necessarily sum to \code{1}).
}

The return value of \code{latent_ilink} needs to contain the linear predictors
transformed to the original response space, with the following structure:
\itemize{
\item If \code{is.null(family$cats)} (after taking \code{latent_y_unqs} into account): an
\eqn{S \times N}{S x N} matrix.
\item If \code{!is.null(family$cats)} (after taking \code{latent_y_unqs} into account): an
\eqn{S \times N \times C_{\mathrm{cat}}}{S x N x C_cat} array. In that case,
\code{latent_ilink} needs to return \emph{probabilities} (for the response categories
given in \code{family$cats}, after taking \code{latent_y_unqs} into account).
}
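
For illustration, a minimal sketch of a function with this prototype. It
assumes a hypothetical reference model with a log link, a response without
finite support (so \code{family$cats} is \code{NULL}), and an inverse link that
does not depend on any unprojected parameters. The name
\code{latent_ilink_log_sketch} is hypothetical:

\if{html}{\out{<div class="sourceCode r">}}\preformatted{# Hypothetical inverse-link function for a log link (illustrative assumption).
latent_ilink_log_sketch <- function(lpreds, cl_ref,
                                    wdraws_ref = rep(1, length(cl_ref))) {
  # `cl_ref` and `wdraws_ref` are part of the required prototype, but are not
  # needed here because no unprojected reference model draws are involved.
  # The S x N matrix of latent predictors is mapped to an S x N matrix on the
  # original response scale.
  exp(lpreds)
}
}\if{html}{\out{</div>}}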

The function supplied to argument \code{latent_ll_oscale} needs to have the
prototype

\if{html}{\out{<div class="sourceCode r">}}\preformatted{latent_ll_oscale(ilpreds, dis, y_oscale, wobs = rep(1, ncol(ilpreds)),
                 cens, cl_ref, wdraws_ref = rep(1, length(cl_ref)))
}\if{html}{\out{</div>}}

where:
\itemize{
\item \code{ilpreds} accepts the return value from \code{latent_ilink}.
\item \code{dis} accepts a vector of length \eqn{S} containing dispersion parameter
draws.
\item \code{y_oscale} accepts a vector of length \eqn{N} containing response values on
the original response scale.
\item \code{wobs} accepts a numeric vector of length \eqn{N} containing observation
weights.
\item \code{cens} accepts a vector containing censoring indicators for the
observations for which to calculate the response-scale log-likelihood values
(i.e., for the observations from the second dimension of \code{ilpreds}). When
calling \code{latent_ll_oscale}, \pkg{projpred} always specifies argument \code{cens}
(with value \code{NULL} if attribute \code{cens_var} of \code{latent_ll_oscale} does not
exist or is \code{NULL}), so a default value for \code{cens} may be defined but will
never be used.
\item \code{cl_ref} accepts the same input as argument \code{cl_ref} of \code{latent_ilink}.
\item \code{wdraws_ref} accepts the same input as argument \code{wdraws_ref} of
\code{latent_ilink}.
}

In case of censoring (in the response values, i.e., survival or time-to-event
analysis), the latent projection (with response-scale analyses) can be used
by setting an attribute \code{cens_var} of the \code{latent_ll_oscale} function to a
right-hand side formula with the name of the variable containing the
censoring indicators (e.g., \code{0} = uncensored, \code{1} = censored) on its
right-hand side. This variable named in the \code{cens_var} attribute is then
retrieved (internally, whenever calling the \code{latent_ll_oscale} function) from
the original dataset (possibly subsetted to the observations corresponding to
the second dimension of \code{ilpreds}), \code{newdata}, or element \code{data} from
\code{\link[=varsel]{varsel()}}'s argument \code{d_test}, whichever is applicable. The content of the
retrieved variable is passed to argument \code{cens} of the \code{latent_ll_oscale}
function. Note that only the performance statistics \code{"elpd"}, \code{"mlpd"}, and
\code{"gmpd"} take censoring into account (on response scale).

The return value of \code{latent_ll_oscale} needs to be an \eqn{S \times N}{S x N}
matrix containing the response-scale (not latent-scale) log-likelihood values
for the \eqn{N} observations from its inputs.
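
A minimal sketch combining the prototype, the \code{cens_var} attribute, and the
return value described above. It assumes (purely for illustration) an
exponential response distribution whose mean is returned by \code{latent_ilink},
right-censoring coded as \code{1} = censored, and a hypothetical censoring
variable named \code{cens_ind} in the dataset. Multiplying the log-likelihood
values by the observation weights is likewise an assumption of this sketch, and
the name \code{latent_ll_oscale_exp_sketch} is hypothetical:

\if{html}{\out{<div class="sourceCode r">}}\preformatted{# Hypothetical response-scale log-likelihood for a right-censored exponential
# response (illustrative assumption; `ilpreds` is taken to hold the means).
latent_ll_oscale_exp_sketch <- function(ilpreds, dis, y_oscale,
                                        wobs = rep(1, ncol(ilpreds)), cens,
                                        cl_ref,
                                        wdraws_ref = rep(1, length(cl_ref))) {
  S <- nrow(ilpreds)
  N <- ncol(ilpreds)
  # Without censoring information, treat all observations as uncensored.
  if (is.null(cens)) cens <- rep(0, N)
  # Expand the observation-level inputs to the S x N layout of `ilpreds`.
  y_mat <- matrix(y_oscale, nrow = S, ncol = N, byrow = TRUE)
  cens_mat <- matrix(cens, nrow = S, ncol = N, byrow = TRUE)
  rate <- 1 / ilpreds
  # Uncensored observations contribute the log density, censored ones the
  # log survival probability.
  ll_event <- dexp(y_mat, rate = rate, log = TRUE)
  ll_cens <- pexp(y_mat, rate = rate, lower.tail = FALSE, log.p = TRUE)
  ll <- ifelse(cens_mat == 1, ll_cens, ll_event)
  # Apply the observation weights column-wise; the result is S x N.
  sweep(ll, 2, wobs, "*")
}
# Name the (hypothetical) variable holding the censoring indicators.
attr(latent_ll_oscale_exp_sketch, "cens_var") <- ~ cens_ind
}\if{html}{\out{</div>}}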

The function supplied to argument \code{latent_ppd_oscale} needs to have the
prototype

\if{html}{\out{<div class="sourceCode r">}}\preformatted{latent_ppd_oscale(ilpreds_resamp, dis_resamp,
                  wobs = rep(1, ncol(ilpreds_resamp)), cl_ref,
                  wdraws_ref = rep(1, length(cl_ref)), idxs_prjdraws)
}\if{html}{\out{</div>}}

where:
\itemize{
\item \code{ilpreds_resamp} accepts the return value from \code{latent_ilink}, but possibly
with resampled (clustered) draws (see argument \code{nresample_clusters} of
\code{\link[=proj_predict]{proj_predict()}}).
\item \code{dis_resamp} accepts a vector of length \code{dim(ilpreds_resamp)[1]} containing
dispersion parameter draws, possibly resampled (in the same way as the draws
in \code{ilpreds_resamp}, see also argument \code{idxs_prjdraws}).
\item \code{wobs} accepts a numeric vector of length \eqn{N} containing observation
weights.
\item \code{cl_ref} accepts the same input as argument \code{cl_ref} of \code{latent_ilink}.
\item \code{wdraws_ref} accepts the same input as argument \code{wdraws_ref} of
\code{latent_ilink}.
\item \code{idxs_prjdraws} accepts a numeric vector of length \code{dim(ilpreds_resamp)[1]}
containing the resampled indices of the projected draws (i.e., these indices
are values from the set \eqn{\{1, ..., \texttt{dim(ilpreds)[1]}\}}{{1, ...,
dim(ilpreds)[1]}} where \code{ilpreds} denotes the return value of
\code{latent_ilink}).
}

The return value of \code{latent_ppd_oscale} needs to be a
\eqn{\texttt{dim(ilpreds\_resamp)[1]} \times N}{dim(ilpreds_resamp)[1] x N}
matrix containing the response-scale (not latent-scale) draws from the
posterior(-projection) predictive distributions for the \eqn{N} observations
from its inputs.
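
A minimal sketch of a function with this prototype, assuming a Poisson response
whose mean is returned by \code{latent_ilink}. Since \pkg{projpred} already
provides an internal default for the \code{\link[=poisson]{poisson()}} family, this is purely
illustrative, and the name \code{latent_ppd_oscale_pois_sketch} is hypothetical:

\if{html}{\out{<div class="sourceCode r">}}\preformatted{# Hypothetical response-scale sampling function for a Poisson response
# (illustrative assumption; `ilpreds_resamp` is taken to hold the means).
latent_ppd_oscale_pois_sketch <- function(ilpreds_resamp, dis_resamp,
                                          wobs = rep(1, ncol(ilpreds_resamp)),
                                          cl_ref,
                                          wdraws_ref = rep(1, length(cl_ref)),
                                          idxs_prjdraws) {
  # Draw one response value per (resampled) draw and observation.
  ppd <- rpois(length(ilpreds_resamp), lambda = as.vector(ilpreds_resamp))
  # Restore the layout of `ilpreds_resamp`: one row per draw, one column per
  # observation.
  matrix(ppd, nrow = nrow(ilpreds_resamp), ncol = ncol(ilpreds_resamp))
}
}\if{html}{\out{</div>}}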

If the bodies of these three functions involve parameter draws from the
reference model which have not been projected (e.g., for \code{latent_ilink}, the
thresholds in an ordinal model), \code{\link[=cl_agg]{cl_agg()}} is provided as a helper function
for aggregating these reference model draws in the same way as the draws have
been aggregated for the first argument of these functions (e.g., \code{lpreds} in
case of \code{latent_ilink}).

The weights passed to argument \code{wdraws_ref} are nonconstant only in case of
\code{\link[=cv_varsel]{cv_varsel()}} with \code{cv_method = "LOO"} and \code{validate_search = TRUE}.
In that case, the weights passed to this argument are the PSIS-LOO CV weights
for one observation. Note that although argument \code{wdraws_ref} has the suffix
\verb{_ref}, \code{wdraws_ref} does not necessarily receive weights for the \emph{initial}
reference model's posterior draws: In case of \code{\link[=cv_varsel]{cv_varsel()}} with
\code{cv_method = "kfold"}, these weights may refer to one of the \eqn{K} reference
model refits (but in that case, they are constant anyway).

If \code{family$cats} is not \code{NULL} (after taking \code{latent_y_unqs} into account),
then the response vector resulting from \code{extract_model_data} (see
\code{\link[=init_refmodel]{init_refmodel()}}) is coerced to a \code{factor} (using \code{\link[=as.factor]{as.factor()}}) in multiple
places throughout this package. Inside of \code{\link[=init_refmodel]{init_refmodel()}}, the levels of
this \code{factor} have to be identical to \code{family$cats} (\emph{after} applying
\code{\link[=extend_family]{extend_family()}} inside of \code{\link[=init_refmodel]{init_refmodel()}}). Everywhere else, these levels
have to be a subset of \verb{<refmodel>$family$cats} (where \verb{<refmodel>} is an
object resulting from \code{\link[=init_refmodel]{init_refmodel()}}).
}

