Outcomes and endpoints

Definition of outcomes, endpoints and group comparisons

Outcomes and endpoints are defined in the Simulation tab, in the dedicated section.

Outcomes

Outcomes represent a post-processing of the simulation outputs done for each individual. They correspond to the measure of interest per individual. Outcomes can be of different types: values (e.g. Cmax), binary true/false (e.g. true if Cmax < threshold), or time-to-event (e.g. time to nadir).

To create a new outcome, the user can click on the “plus” button next to Outcomes or on the “+ New outcome element” within the menu of an endpoint:

First, each outcome is given an element name, which is used to select the outcome when defining an endpoint. Then, an output element to work on is selected. Only the output elements selected in the Simulation section are available. The definition of the outcome includes the following options to post-process the selected output.

For continuous, count and categorical outputs:

  •  Relative to baseline: divides the output values by the baseline (“ratio”) or subtracts the baseline from them (“difference”), the baseline being the first value of the output. Because the first value of the transformed output is non-informative (ratio=1 or difference=0), it is removed.
  •  Output processing average per id / max per id / min per id: takes the average, minimum or maximum over all output values (possibly divided or subtracted by the first value) for each individual. In case of minimum or maximum, the user can choose “value of min/max” to take the value of the min or max, or choose “time of min/max” to take the time of the min or max. When “time of min/max” is selected, the outcome is of type ‘time-to-event’.
  • Apply threshold: applies a logical test (usually comparing the outcome value to a threshold) to get a binary true/false outcome. A threshold value and a comparison sign (=, !=, >=, <=, >, <) must be set.
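The options above can be pictured as a small pipeline applied to the output of one individual. The Python sketch below is illustrative only: the function names and the list-based data layout are assumptions made for this example, not Simulx internals.

```python
import operator

# Comparison signs offered by "Apply threshold", mapped to Python operators.
COMPARATORS = {
    "=": operator.eq, "!=": operator.ne,
    ">=": operator.ge, "<=": operator.le,
    ">": operator.gt, "<": operator.lt,
}

def relative_to_baseline(values, mode):
    """Divide ("ratio") or subtract ("difference") by the first value, then drop it."""
    baseline = values[0]
    if mode == "ratio":
        return [v / baseline for v in values[1:]]
    if mode == "difference":
        return [v - baseline for v in values[1:]]
    raise ValueError(f"unknown mode: {mode}")

def max_per_id(times, values):
    """Return (time of max, value of max) for one individual."""
    i = max(range(len(values)), key=values.__getitem__)
    return times[i], values[i]

def apply_threshold(value, sign, threshold):
    """Turn a value outcome into a binary true/false outcome."""
    return COMPARATORS[sign](value, threshold)

# Hypothetical concentration profile for one individual.
times = [0, 1, 2, 4, 8]
conc = [2.0, 6.0, 9.0, 5.0, 3.0]

ratios = relative_to_baseline(conc, "ratio")      # baseline point removed
t_max, c_max = max_per_id(times, conc)            # "time of max" and "value of max"
below_limit = apply_threshold(c_max, "<", 10.0)   # binary true/false outcome
```

Here "value of max" gives a value outcome, "time of max" a time-to-event outcome, and the threshold step a binary outcome.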

For single time-to-event outputs:

  • Has event / has no event / time of event: “has event” and “has no event” determine whether the individual had or did not have an event during the observation period. This results in a binary true/false outcome. “time of event” uses the TTE output as outcome directly. The outcome is then of type ‘time-to-event’.

For repeated time-to-event outputs:

  • Has at least one event / number of events per id / time of event #: “has at least one event” leads to a binary true/false outcome depending on whether the individual has at least one event during the observation period. “number of events per id” counts the total number of events for each individual. This is an outcome of type ‘value’. “time of event #” generates a time-to-event outcome with a single event per individual. The index of the event to be considered can be typed in by the user.
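As a rough illustration of the three options for repeated time-to-event outputs, the sketch below works on a plain list of event times per individual. The function names are hypothetical, and treating an individual with too few events as censored at the end of the observation period is an assumption of this example, not a documented Simulx behavior.

```python
def has_at_least_one_event(event_times):
    """Binary true/false outcome."""
    return len(event_times) > 0

def number_of_events(event_times):
    """Outcome of type 'value'."""
    return len(event_times)

def time_of_event(event_times, index, end_of_observation):
    """Time-to-event outcome for the event with the given 1-based index.

    Returns (time, observed). Assumption: with fewer than `index` events,
    the outcome is censored at the end of the observation period.
    """
    ordered = sorted(event_times)
    if index <= len(ordered):
        return ordered[index - 1], True
    return end_of_observation, False

# Hypothetical event times for three individuals, observed until t = 100.
events = {"id1": [12.0, 30.5, 71.0], "id2": [], "id3": [45.0]}

any_event = {i: has_at_least_one_event(t) for i, t in events.items()}
counts = {i: number_of_events(t) for i, t in events.items()}
second_event = {i: time_of_event(t, 2, 100.0) for i, t in events.items()}
```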

When creating a new outcome, the outcome can be added to a new or an existing endpoint via the option “Add to endpoint” at the bottom. Only the endpoints containing outcomes of the same type are shown.

All defined outcomes, whether used or not in an endpoint, appear in the list of outcomes:

Outcomes of the same type can be combined using AND/OR for binary true/false outcomes (e.g. Cmax < threshold1 and Ctrough > threshold2), or MIN/MAX for value and time-to-event outcomes (e.g. max(CmaxParent, CmaxMetabolite)).
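Combining outcomes amounts to an element-wise operation per individual. The short sketch below uses hypothetical outcome values stored in dictionaries keyed by individual id; the variable names are illustrative only.

```python
# Hypothetical binary outcomes per individual.
cmax_ok = {"id1": True, "id2": True, "id3": False}       # Cmax < threshold1
ctrough_ok = {"id1": True, "id2": False, "id3": False}   # Ctrough > threshold2

# Hypothetical value outcomes per individual.
cmax_parent = {"id1": 8.2, "id2": 5.1, "id3": 9.7}
cmax_metabolite = {"id1": 6.4, "id2": 7.3, "id3": 9.0}

# AND / OR for binary true/false outcomes.
both_ok = {i: cmax_ok[i] and ctrough_ok[i] for i in cmax_ok}
either_ok = {i: cmax_ok[i] or ctrough_ok[i] for i in cmax_ok}

# MAX for value outcomes (MIN works the same way with min()).
cmax_any = {i: max(cmax_parent[i], cmax_metabolite[i]) for i in cmax_parent}
```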

Endpoints

Endpoints summarize the outcome values over all individuals, for each simulation group and each replicate. To create a new endpoint, one or several outcomes of the same type can be selected from the drop-down menu (green highlight below). Endpoints are also created when at the bottom of an outcome definition the option “Add to endpoint: New endpoint” is selected. Once an endpoint is created, its name can be edited (blue highlight below).

The options to summarize the individual outcomes into an endpoint depend on the type of outcomes:

  • value outcomes: the endpoint can be
    • geometric mean. The coefficient of variation is also calculated.
    • arithmetic mean. The standard deviation is also calculated.
    • median. The 5th and 95th percentiles are also calculated.
  • binary true/false outcomes: the endpoint will be the percentage of true. The total number of true is also calculated.
  • time-to-event outcomes: the endpoint will be the median survival (time at which the Kaplan-Meier curve is equal to 0.5). The lower bound (5th) and upper bound (95th) of the confidence interval of the median survival are also calculated.
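The summaries for value and binary outcomes can be sketched with the Python standard library as follows. Two points are assumptions of this example, since the text above does not specify them: the coefficient of variation is computed as the geometric CV, sqrt(exp(s²) − 1) with s the standard deviation of the log-values, and the percentiles use the default interpolation of `statistics.quantiles`. The function names are hypothetical.

```python
import math
import statistics

def summarize_values(values, method):
    """Summarize the value outcomes of one group in one replicate."""
    if method == "geometric mean":
        logs = [math.log(v) for v in values]
        gmean = math.exp(statistics.fmean(logs))
        # Assumption: geometric coefficient of variation.
        cv = math.sqrt(math.exp(statistics.variance(logs)) - 1.0)
        return gmean, cv
    if method == "arithmetic mean":
        return statistics.fmean(values), statistics.stdev(values)
    if method == "median":
        # 19 cut points at 5%, 10%, ..., 95% (assumed interpolation method).
        cuts = statistics.quantiles(values, n=20)
        return statistics.median(values), (cuts[0], cuts[-1])
    raise ValueError(f"unknown method: {method}")

def summarize_binary(flags):
    """Return (percentage of true, number of true)."""
    n_true = sum(flags)
    return 100.0 * n_true / len(flags), n_true

# Hypothetical outcomes for four individuals.
values = [1.0, 2.0, 4.0, 8.0]
gmean, gcv = summarize_values(values, "geometric mean")
mean, sd = summarize_values(values, "arithmetic mean")
median, (p5, p95) = summarize_values(values, "median")
pct_true, n_true = summarize_binary([True, False, True, True])
```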

Group comparison

The endpoints can be compared across simulation groups by activating the toggle “Group comparison” (green highlight). One of the simulation groups must be defined as the reference group (blue highlight) and all other groups will be compared to this reference group. For each endpoint, the group comparison can rely on a direct comparison of the endpoint values, or on a statistical test (purple highlight). The exact comparison or test applied depends on the type of the endpoint.

Endpoint “median” for value outcome

One calculates the difference between the median of the test group and of the reference group: \( \textrm{median}_{test} - \textrm{median}_{ref} \) with \(\textrm{median}_{test}\) and \(\textrm{median}_{ref}\) the median in the test and reference group respectively.

In case of a direct comparison, this difference is compared using operators =, !=, >, >=, <, <=  to a user-defined value (default: 0). When this logical comparison is true, the trial simulation is considered as ‘success’.

Example: When the direct comparison “difference > 25” is true, it means that the median in the test group is larger than in the reference group by more than 25 units (e.g. ng/mL if the outcome is a peak concentration).
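A direct comparison is a plain logical test on the difference of the two endpoint values. The sketch below shows this for the median; the function name and data are hypothetical.

```python
import operator
import statistics

OPERATORS = {
    "=": operator.eq, "!=": operator.ne,
    ">": operator.gt, ">=": operator.ge,
    "<": operator.lt, "<=": operator.le,
}

def direct_comparison_median(test_values, ref_values, sign, value=0.0):
    """Return True ('success') if median_test - median_ref <sign> value holds."""
    difference = statistics.median(test_values) - statistics.median(ref_values)
    return OPERATORS[sign](difference, value)

# Hypothetical peak concentrations (ng/mL) in the test and reference groups.
test = [130.0, 142.0, 150.0, 161.0, 175.0]   # median 150
ref = [100.0, 105.0, 110.0, 118.0, 120.0]    # median 110

success = direct_comparison_median(test, ref, ">", 25.0)   # 40 > 25
```

Note that “>” is strict: a difference of exactly 25 would only count as a success with “>=”.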

In case of a statistical test, a Wilcoxon rank sum test (equivalent to the Mann-Whitney test) (“Same individuals among groups” not selected) or a Wilcoxon signed rank test (“Same individuals among groups” selected) is done to compare the medians of the test and reference groups. “difference \(\neq\) 0” represents the alternative hypothesis H1. By default, a two-sided test is done (sign \(\neq\)) and one checks if the medians significantly differ from each other. One-sided tests can be defined by choosing “>” or “<”. The minimal or maximal difference (depending on the direction of the test) can also be specified (default: 0). The statistical test results in a p-value, which is compared to a user-defined threshold (default: 0.05). If the p-value is below the threshold, the trial simulation is considered as ‘success’.

Example: When the statistical test of the alternative hypothesis H1 “difference > 25” results in a small p-value, it means that the median of the test group is significantly larger than the median of the reference group by more than 25 units (e.g. ng/mL if the outcome is a peak concentration).
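To make the unpaired one-sided case concrete, here is a minimal sketch of a rank sum (Mann-Whitney) test for H1 “difference > d”. It assumes the minimal difference is handled by shifting the test values by d, and it uses a normal approximation without tie or continuity correction, which is crude for small groups. How Simulx computes the p-value is not stated in this section, so this is an illustration only, with hypothetical names and data.

```python
import math
from statistics import NormalDist

def rank_sum_test_greater(test_values, ref_values, min_difference=0.0):
    """One-sided test of H1: test - ref > min_difference.

    Returns the Mann-Whitney U statistic and an approximate p-value.
    """
    shifted = [t - min_difference for t in test_values]
    u = 0.0
    for t in shifted:
        for r in ref_values:
            if t > r:
                u += 1.0      # test value ranks above reference value
            elif t == r:
                u += 0.5      # ties count for one half
    n1, n2 = len(shifted), len(ref_values)
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    p_value = 1.0 - NormalDist().cdf((u - mean_u) / sd_u)
    return u, p_value

# Hypothetical peak concentrations (ng/mL).
test = [130.0, 142.0, 150.0, 161.0, 175.0]
ref = [100.0, 105.0, 110.0, 118.0, 120.0]

u_stat, p_value = rank_sum_test_greater(test, ref, min_difference=25.0)
success = p_value < 0.05   # compared to the user-defined threshold
```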

Endpoint “arithmetic mean” for value outcome

One calculates the difference between the mean of the test group and of the reference group: \( \textrm{mean}_{test} - \textrm{mean}_{ref} \) with \(\textrm{mean}_{test}\) and \(\textrm{mean}_{ref}\) the mean in the test and reference group respectively.

In case of a direct comparison, this difference is compared using operators =, !=, >, >=, <, <=  to a user-defined value (default: 0). When this logical comparison is true, the trial simulation is considered as ‘success’.

Example: When the direct comparison “difference > 14” is true, it means that the mean in the test group is larger than in the reference group by more than 14 units (e.g. ng/mL if the outcome is a peak concentration).

In case of a statistical test, an unpaired t-test (“Same individuals among groups” not selected) or a paired t-test (“Same individuals among groups” selected) is done to compare the test and reference groups. “difference \(\neq\) 0” represents the alternative hypothesis H1. By default, a two-sided test is done (sign \(\neq\)) and one checks if the two means significantly differ from each other. One-sided tests can be defined by choosing “>” or “<”. The minimal or maximal difference (depending on the direction of the test) can also be specified (default: 0). The statistical test results in a p-value, which is compared to a user-defined threshold (default: 0.05). If the p-value is below the threshold, the trial simulation is considered as ‘success’.

Example: When the statistical test of the alternative hypothesis H1 “difference > 14” results in a small p-value, it means that the mean of the test group is significantly larger than the mean of the reference group by more than 14 units (e.g. ng/mL if the outcome is a peak concentration).
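The role of the minimal difference in a t-test can be seen from the test statistic, where it is subtracted from the observed mean difference. The sketch below computes the statistic and degrees of freedom only; the p-value would then be read from a Student t distribution, which is left out here because the Python standard library does not provide one. The unpaired version assumes equal variances (pooled Student t-test), which is an assumption of this example; the function names are hypothetical.

```python
import math
import statistics

def paired_t_statistic(test_values, ref_values, min_difference=0.0):
    """t statistic and degrees of freedom for paired samples."""
    diffs = [t - r for t, r in zip(test_values, ref_values)]
    n = len(diffs)
    std_error = statistics.stdev(diffs) / math.sqrt(n)
    return (statistics.fmean(diffs) - min_difference) / std_error, n - 1

def unpaired_t_statistic(test_values, ref_values, min_difference=0.0):
    """t statistic and degrees of freedom, assuming equal variances (pooled)."""
    n1, n2 = len(test_values), len(ref_values)
    pooled_var = ((n1 - 1) * statistics.variance(test_values)
                  + (n2 - 1) * statistics.variance(ref_values)) / (n1 + n2 - 2)
    std_error = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    mean_diff = statistics.fmean(test_values) - statistics.fmean(ref_values)
    return (mean_diff - min_difference) / std_error, n1 + n2 - 2

# Hypothetical outcomes; H1 is "difference > 14".
test = [30.0, 34.0, 38.0, 42.0, 46.0]
ref = [10.0, 12.0, 14.0, 16.0, 18.0]

t_paired, df_paired = paired_t_statistic(test, ref, min_difference=14.0)
t_unpaired, df_unpaired = unpaired_t_statistic(test, ref, min_difference=14.0)
```

With the same data, the paired statistic is larger because the between-individual variability cancels out in the individual differences.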

Endpoint “geometric mean” for value outcome

One calculates the ratio of the test geometric mean divided by the reference geometric mean: \( \textrm{geoMean}_{test} / \textrm{geoMean}_{ref} \) with \(\textrm{geoMean}_{test}\) and \(\textrm{geoMean}_{ref}\) the geometric mean in the test and reference group respectively.

In case of a direct comparison, this ratio is compared using operators =, !=, >, >=, <, <=  to a user-defined value (default: 1). When this logical comparison is true, the trial simulation is considered as ‘success’.

Example: When the direct comparison “ratio > 2” is true, it means that the geometric mean in the test group is more than twice the geometric mean in the reference group.

In case of a statistical test, an unpaired t-test (“Same individuals among groups” not selected) or a paired t-test (“Same individuals among groups” selected) is done on the log-transformed values (which are assumed to follow a normal distribution) to compare the means of the test and reference groups. “ratio \(\neq\) 1” represents the alternative hypothesis H1. By default, a two-sided test is done (sign \(\neq\)) and one checks if the geometric means significantly differ from each other. One-sided tests can be defined by choosing “>” or “<”. The minimal or maximal ratio (depending on the direction of the test) can also be specified (default: 1). The statistical test results in a p-value, which is compared to a user-defined threshold (default: 0.05). If the p-value is below the threshold, the trial simulation is considered as ‘success’.

Example: When the statistical test of the alternative hypothesis H1 “ratio > 2” results in a small p-value, it means that the geometric mean of the test group is significantly larger than twice the geometric mean of the reference group.
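The reason a t-test on log-transformed values addresses the ratio of geometric means is that the log of this ratio equals the difference of the means of the logs. A hypothesis such as “ratio > 2” therefore corresponds to “difference of log-means > log(2)”. The following short check uses hypothetical data:

```python
import math
import statistics

# Hypothetical outcomes in the test and reference groups.
test = [8.0, 16.0, 32.0]
ref = [2.0, 4.0, 8.0]

ratio = statistics.geometric_mean(test) / statistics.geometric_mean(ref)

log_difference = (statistics.fmean([math.log(v) for v in test])
                  - statistics.fmean([math.log(v) for v in ref]))

# "ratio > 2" on the original scale is "log_difference > log(2)" on the log scale.
same_decision = (ratio > 2.0) == (log_difference > math.log(2.0))
```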

Endpoint “percent true” for binary true/false outcome

One calculates the odds ratio between the test group and the reference group. The definition of the odds ratio differs depending on whether the samples are paired or not.

“Same individuals among groups” not selected (unpaired samples)

The odds ratio is \( \frac{pTest}{1-pTest} / \frac{pRef}{1-pRef} \) with \(pTest\) and \(pRef\) the fraction of true outcomes in the test and reference group respectively. \(1-pTest\) represents the fraction of false outcomes.

The odds ratio can be defined equivalently using the following contingency table.

“Same individuals among groups” selected (paired samples)

In case of identical individuals among groups, the individuals which have the same outcome value in both groups (so Ref=True and Test=True, or Ref=False and Test=False) are not counted. The odds ratio is defined as \( \frac{nTest_TRef_F}{nTest_FRef_T}\), with \(nTest_TRef_F\) the number of individuals with true in the test group and false in the reference group, and \(nTest_FRef_T\) the number of individuals with false in the test group and true in the reference group. It corresponds to the contingency table below. The odds ratio can frequently be zero or infinity, in particular in the absence of measurement noise, when an individual true in the reference group is also necessarily true in the test group (corresponding for instance to a higher dose).
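Both definitions of the odds ratio can be written in terms of counts. The sketch below is illustrative: the function names and data are hypothetical, and returning infinity or NaN when a denominator is zero is a choice made for this example.

```python
import math

def _safe_ratio(numerator, denominator):
    """Ratio of counts, with inf or nan when the denominator is zero."""
    if denominator == 0:
        return math.inf if numerator > 0 else math.nan
    return numerator / denominator

def odds_ratio_unpaired(test_flags, ref_flags):
    """(pTest / (1 - pTest)) / (pRef / (1 - pRef)), computed from counts."""
    a = sum(test_flags)              # test, true
    b = len(test_flags) - a          # test, false
    c = sum(ref_flags)               # reference, true
    d = len(ref_flags) - c           # reference, false
    return _safe_ratio(a * d, b * c)

def odds_ratio_paired(test_flags, ref_flags):
    """Ratio of discordant pairs; concordant individuals are not counted."""
    n_test_true_ref_false = sum(t and not r for t, r in zip(test_flags, ref_flags))
    n_test_false_ref_true = sum(r and not t for t, r in zip(test_flags, ref_flags))
    return _safe_ratio(n_test_true_ref_false, n_test_false_ref_true)

# Hypothetical binary outcomes for the same eight individuals in both groups.
test = [True, True, True, False, True, False, True, True]
ref = [True, False, False, False, True, False, False, True]

or_unpaired = odds_ratio_unpaired(test, ref)   # (6/2) / (3/5)
or_paired = odds_ratio_paired(test, ref)       # 3 discordant pairs vs 0
```

In this data set every individual that is true in the reference group is also true in the test group, so the paired odds ratio is infinite, which is the situation described above.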

 

In case of a direct comparison, this odds ratio is compared using operators =, !=, >, >=, <, <=  to a user-defined value (default: 1). When this logical comparison is true, the trial simulation is considered as ‘success’.

Example: When the direct comparison “odds ratio > 2” is true, it means that the odds in the test group are more than twice the odds in the reference group.

In case of a statistical test, a Fisher’s exact test (“Same individuals among groups” not selected) or a McNemar’s exact test (“Same individuals among groups” selected) is done to compare the results of the test and reference groups via the construction of a 2×2 contingency table (which contains more information than the endpoints \(pTest\) and \(pRef\)). “odds ratio \(\neq\) 1” represents the alternative hypothesis H1. By default, a two-sided test is done (sign \(\neq\)) and one checks if the odds ratio significantly differs from 1. One-sided tests can be defined by choosing “>” or “<”. The minimal or maximal odds ratio (depending on the direction of the test) can also be specified (default: 1). The statistical test results in a p-value, which is compared to a user-defined threshold (default: 0.05). If the p-value is below the threshold, the trial simulation is considered as ‘success’.

Example: When the statistical test of the alternative hypothesis H1 “odds ratio > 2” results in a small p-value, it means that the odds of the test group are significantly larger than twice the odds of the reference group.
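For the default case (odds ratio compared to 1), both exact tests reduce to sums of binomial coefficients. The sketch below shows a one-sided Fisher’s exact test and a two-sided exact McNemar test. It does not cover a user-defined odds ratio different from 1, and the function names are hypothetical.

```python
from math import comb

def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test, H1: odds ratio > 1.

    Table layout: a = test true, b = test false,
                  c = reference true, d = reference false.
    The p-value is the hypergeometric probability of observing at least
    `a` true outcomes in the test group, with all margins fixed.
    """
    n_test, n_ref, n_true = a + b, c + d, a + c
    total = comb(n_test + n_ref, n_true)
    upper = min(n_test, n_true)
    favourable = sum(comb(n_test, k) * comb(n_ref, n_true - k)
                     for k in range(a, upper + 1))
    return favourable / total

def mcnemar_exact_two_sided(n_test_true_ref_false, n_test_false_ref_true):
    """Exact McNemar test: binomial test (p = 0.5) on the discordant pairs."""
    n = n_test_true_ref_false + n_test_false_ref_true
    if n == 0:
        return 1.0
    k = min(n_test_true_ref_false, n_test_false_ref_true)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)

# Hypothetical counts: 6/8 true in test, 3/8 true in reference.
p_fisher = fisher_exact_greater(6, 2, 3, 5)
# Hypothetical discordant pairs: 9 (test true, ref false) vs 1 (test false, ref true).
p_mcnemar = mcnemar_exact_two_sided(9, 1)
```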

Endpoint “median survival” for time-to-event outcome

One calculates the difference between the median survival of the test group and of the reference group: \( \textrm{medSurv}_{test} - \textrm{medSurv}_{ref} \) with \(\textrm{medSurv}_{test}\) and \(\textrm{medSurv}_{ref}\) the median survival (time at which the Kaplan-Meier estimate equals 0.5) in the test and reference group respectively.
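The median survival can be read off the Kaplan-Meier estimate as sketched below. Because the estimate is a step function, this example takes the first time at which it drops to 0.5 or below, which is an assumed convention; the function returns None when the curve never reaches 0.5. Names and data are hypothetical.

```python
def km_median_survival(times, observed):
    """First time at which the Kaplan-Meier estimate is <= 0.5, or None.

    `observed[i]` is True for an event and False for a censored time.
    """
    data = sorted(zip(times, observed))
    at_risk = len(data)
    survival = 1.0
    i = 0
    while i < len(data):
        t = data[i][0]
        events = 0
        removed = 0
        # Group all individuals sharing the same time.
        while i < len(data) and data[i][0] == t:
            events += 1 if data[i][1] else 0
            removed += 1
            i += 1
        if events:
            survival *= 1.0 - events / at_risk
            if survival <= 0.5:
                return t
        at_risk -= removed
    return None

# Hypothetical time-to-event outcomes (days) in the two groups.
test_median = km_median_survival([40, 55, 90, 90, 120, 150],
                                 [True, False, True, True, False, True])
ref_median = km_median_survival([5, 8, 12, 12, 20, 25],
                                [True, False, True, True, False, True])
difference = test_median - ref_median
```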

In case of a direct comparison, this difference is compared using operators =, !=, >, >=, <, <=  to a user-defined value (default: 1). When this logical comparison is true, the trial simulation is considered as ‘success’.

Example: The direct comparison “difference > 60” being true means that the median survival in the test group is larger than in the reference group by more than 60 time units (e.g. days).

In case of a statistical test, a log-rank test is done to compare the Kaplan-Meier survival curves. When “Same individuals among groups” is selected, a variance correction is applied (see Jung 1999). “difference \(\neq\) 0” represents the alternative hypothesis H1. By default, a two-sided test is done (sign \(\neq\)) and one checks if the survival curves significantly differ. One-sided tests can be defined by choosing “>” or “<”. For the log-rank test, it is not possible to define a “minimal difference”. The statistical test results in a p-value, which is compared to a user-defined threshold (default: 0.05). If the p-value is below the threshold, the trial simulation is considered as ‘success’.

Example: When the statistical test of the alternative hypothesis H1 “difference \(\neq\) 0” results in a small p-value, it means that the survival curves of the two groups differ significantly.

Calculating the outcomes and endpoints

The outcomes, endpoints and group comparisons are calculated when clicking on the task “Outcomes&Endpoints” or when running the scenario with the task “Outcomes&Endpoints” selected. As the outcomes are a post-processing of the simulation outputs, the “Simulation” task must run first.

The calculated values are displayed in the Results tab and Plots tab.
