import numpy as np
import pandas as pd
import patsy as pt
import lifelines as lf
import matplotlib.pyplot as plt
import statsmodels.api as sm
18 Python

- The following document is simply a program written 3–4 years ago. Since I use Python very little, I did not document the functions, some of them rather obscure, that were used.
- Execution used the `python3` Jupyter kernel.
- The document was not compiled to PDF; only this HTML version is available.

Two analysis packages are used: mainly `lifelines` (Kaplan-Meier, Cox, AFT…) and `statsmodels` (discrete-time logit estimation). The `statsmodels` package is also able to estimate Kaplan-Meier survival curves and Cox proportional hazards models. The `lifelines` package covers almost all standard methods, with the exception of competing risks.
Loading the data

```python
trans = pd.read_csv("https://raw.githubusercontent.com/mthevenin/analyse_duree/master/bases/transplantation.csv")
trans.head(10)
trans.info()
```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 103 entries, 0 to 102
Data columns (total 10 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 id 103 non-null int64
1 year 103 non-null int64
2 age 103 non-null int64
3 died 103 non-null int64
4 stime 103 non-null int64
5 surgery 103 non-null int64
6 transplant 103 non-null int64
7 wait 103 non-null int64
8 mois 103 non-null int64
9 compet 103 non-null int64
dtypes: int64(10)
memory usage: 8.2 KB
18.1 Package lifelines
Documentation: https://lifelines.readthedocs.io/en/latest/
18.1.1 Non-parametric: Kaplan-Meier

KM estimator and median duration

```python
from lifelines import KaplanMeierFitter

T = trans['stime']
E = trans['died']

kmf = KaplanMeierFitter()
kmf.fit(T, E)
print(kmf.survival_function_)

a = "DUREE MEDIANE:"
b = kmf.median_survival_time_
print(a, b)
```
KM_estimate
timeline
0.0 1.000000
1.0 0.990291
2.0 0.961165
3.0 0.932039
5.0 0.912621
... ...
1400.0 0.151912
1407.0 0.151912
1571.0 0.151912
1586.0 0.151912
1799.0 0.151912
[89 rows x 1 columns]
DUREE MEDIANE: 100.0
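As a sanity check on what `KaplanMeierFitter` computes, the product-limit estimator can be written in a few lines of plain Python. This is a minimal sketch on a toy sample (not the transplantation data): at each event time the survival curve is multiplied by (1 − deaths/at-risk).

```python
def kaplan_meier(T, E):
    """Product-limit estimator: S(t) = prod over event times of (1 - d_t / n_t)."""
    S, surv = 1.0, {}
    for t in sorted(set(T)):
        at_risk = sum(1 for u in T if u >= t)                       # n_t
        deaths = sum(1 for u, e in zip(T, E) if u == t and e == 1)  # d_t
        if deaths:
            S *= 1 - deaths / at_risk
            surv[t] = S
    return surv

# toy sample: events at t=1, 2, 3; censoring at t=2 and t=5
print(kaplan_meier([1, 2, 2, 3, 5], [1, 1, 0, 1, 0]))
```

Censored observations leave the curve flat but still shrink the risk set, which is why S(t) drops faster after a censoring time.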
```python
kmf.plot()
```
<AxesSubplot: xlabel='timeline'>
Comparing the survival functions

```python
ax = plt.subplot(111)
kmf = KaplanMeierFitter()
for name, grouped_df in trans.groupby('surgery'):
    kmf.fit(grouped_df['stime'], grouped_df['died'], label=name)
    kmf.plot(ax=ax)

from lifelines.statistics import multivariate_logrank_test
results = multivariate_logrank_test(trans['stime'], trans['surgery'], trans['died'])
results.print_summary()
```
t_0 | -1 |
null_distribution | chi squared |
degrees_of_freedom | 1 |
test_name | multivariate_logrank_test |

| test_statistic | p | -log2(p) |
|---|---|---|
| 6.59 | 0.01 | 6.61 |
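The `-log2(p)` column reported by lifelines is just a readability transform of the p-value (a value of k means p ≈ 2^-k, so larger is more significant). Recomputing it from the unrounded p-value of the test above:

```python
import math

p = 0.010255  # log-rank p-value before rounding
print(round(-math.log2(p), 2))  # 6.61, the value lifelines reports
```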
18.2 Semi-parametric: Cox model

18.2.1 Estimation

```python
model = 'year + age + C(surgery) -1'
X = pt.dmatrix(model, trans, return_type='dataframe')
design_info = X.design_info
YX = X.join(trans[['stime','died']])
YX.drop(['C(surgery)[0]'], axis=1, inplace=True)
YX.head()

from lifelines import CoxPHFitter
cph = CoxPHFitter()
cph.fit(YX, duration_col='stime', event_col='died')
cph.print_summary()
cph.plot()
```
model | lifelines.CoxPHFitter |
duration col | 'stime' |
event col | 'died' |
baseline estimation | breslow |
number of observations | 103 |
number of events observed | 75 |
partial log-likelihood | -289.31 |
time fit was run | 2023-08-21 11:46:42 UTC |
coef | exp(coef) | se(coef) | coef lower 95% | coef upper 95% | exp(coef) lower 95% | exp(coef) upper 95% | cmp to | z | p | -log2(p) | |
---|---|---|---|---|---|---|---|---|---|---|---|
C(surgery)[1] | -0.99 | 0.37 | 0.44 | -1.84 | -0.13 | 0.16 | 0.88 | 0.00 | -2.26 | 0.02 | 5.40 |
year | -0.12 | 0.89 | 0.07 | -0.25 | 0.01 | 0.78 | 1.01 | 0.00 | -1.78 | 0.08 | 3.72 |
age | 0.03 | 1.03 | 0.01 | 0.00 | 0.06 | 1.00 | 1.06 | 0.00 | 2.19 | 0.03 | 5.12 |
Concordance | 0.65 |
Partial AIC | 584.61 |
log-likelihood ratio test | 17.63 on 3 df |
-log2(p) of ll-ratio test | 10.90 |
<AxesSubplot: xlabel='log(HR) (95% CI)'>
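The `exp(coef)` column is the hazard ratio. For example, for surgery (coef = −0.99 in the table above):

```python
import math

coef_surgery = -0.99       # Cox coefficient for C(surgery)[1]
hr = math.exp(coef_surgery)
print(round(hr, 2))  # 0.37: surgery divides the instantaneous death risk by ~2.7
```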
18.2.2 Testing the PH assumption

PH test: Schoenfeld residuals, method 1

```python
cph.check_assumptions(YX, p_value_threshold=0.05)
```
The ``p_value_threshold`` is set at 0.05. Even under the null hypothesis of no violations, some
covariates will be below the threshold by chance. This is compounded when there are many covariates.
Similarly, when there are lots of observations, even minor deviances from the proportional hazard
assumption will be flagged.
With that in mind, it's best to use a combination of statistical tests and visual tests to determine
the most serious violations. Produce visual plots using ``check_assumptions(..., show_plots=True)``
and looking for non-constant lines. See link [A] below for a full example.
null_distribution | chi squared |
degrees_of_freedom | 1 |
model | <lifelines.CoxPHFitter: fitted with 103 total ... |
test_name | proportional_hazard_test |
test_statistic | p | -log2(p) | ||
---|---|---|---|---|
C(surgery)[1] | km | 4.01 | 0.05 | 4.47 |
rank | 3.74 | 0.05 | 4.23 | |
age | km | 1.18 | 0.28 | 1.86 |
rank | 1.06 | 0.30 | 1.72 | |
year | km | 2.07 | 0.15 | 2.73 |
rank | 2.08 | 0.15 | 2.75 |
1. Variable 'C(surgery)[1]' failed the non-proportional test: p-value is 0.0452.
Advice: with so few unique values (only 2), you can include `strata=['C(surgery)[1]', ...]` in
the call in `.fit`. See documentation in link [E] below.
---
[A] https://lifelines.readthedocs.io/en/latest/jupyter_notebooks/Proportional%20hazard%20assumption.html
[B] https://lifelines.readthedocs.io/en/latest/jupyter_notebooks/Proportional%20hazard%20assumption.html#Bin-variable-and-stratify-on-it
[C] https://lifelines.readthedocs.io/en/latest/jupyter_notebooks/Proportional%20hazard%20assumption.html#Introduce-time-varying-covariates
[D] https://lifelines.readthedocs.io/en/latest/jupyter_notebooks/Proportional%20hazard%20assumption.html#Modify-the-functional-form
[E] https://lifelines.readthedocs.io/en/latest/jupyter_notebooks/Proportional%20hazard%20assumption.html#Stratification
PH test: Schoenfeld residuals, method 2

```python
from lifelines.statistics import proportional_hazard_test
zph = proportional_hazard_test(cph, YX, time_transform='all')
zph.print_summary()
```
null_distribution | chi squared |
degrees_of_freedom | 1 |
model | <lifelines.CoxPHFitter: fitted with 103 total ... |
test_name | proportional_hazard_test |
test_statistic | p | -log2(p) | ||
---|---|---|---|---|
C(surgery)[1] | identity | 5.54 | 0.02 | 5.75 |
km | 4.01 | 0.05 | 4.47 | |
log | 3.69 | 0.05 | 4.19 | |
rank | 3.74 | 0.05 | 4.23 | |
age | identity | 1.61 | 0.20 | 2.29 |
km | 1.18 | 0.28 | 1.86 | |
log | 0.61 | 0.44 | 1.20 | |
rank | 1.06 | 0.30 | 1.72 | |
year | identity | 0.80 | 0.37 | 1.43 |
km | 2.07 | 0.15 | 2.73 | |
log | 1.34 | 0.25 | 2.02 | |
rank | 2.08 | 0.15 | 2.75 |
PH test: interaction with time

```python
from lifelines.utils import to_episodic_format
from lifelines import CoxTimeVaryingFitter

# Reshape YX to long (episodic) format
long = to_episodic_format(YX, duration_col='stime', event_col='died')

# Create the time interaction variable
long['surgery_t'] = long['C(surgery)[1]'] * long['stop']

# Estimation
ctv = CoxTimeVaryingFitter()
ctv.fit(long,
        id_col='id',
        event_col='died',
        start_col='start',
        stop_col='stop')
ctv.print_summary(4)
```
model | lifelines.CoxTimeVaryingFitter |
event col | 'died' |
number of subjects | 103 |
number of periods | 31938 |
number of events | 75 |
partial log-likelihood | -287.3290 |
time fit was run | 2023-08-21 11:46:43 UTC |
coef | exp(coef) | se(coef) | coef lower 95% | coef upper 95% | exp(coef) lower 95% | exp(coef) upper 95% | cmp to | z | p | -log2(p) | |
---|---|---|---|---|---|---|---|---|---|---|---|
C(surgery)[1] | -1.7547 | 0.1730 | 0.6743 | -3.0764 | -0.4331 | 0.0461 | 0.6485 | 0.0000 | -2.6022 | 0.0093 | 6.7542 |
age | 0.0289 | 1.0293 | 0.0134 | 0.0025 | 0.0552 | 1.0025 | 1.0568 | 0.0000 | 2.1479 | 0.0317 | 4.9785 |
year | -0.1231 | 0.8842 | 0.0668 | -0.2541 | 0.0079 | 0.7756 | 1.0080 | 0.0000 | -1.8415 | 0.0656 | 3.9312 |
surgery_t | 0.0022 | 1.0022 | 0.0011 | 0.0001 | 0.0044 | 1.0001 | 1.0044 | 0.0000 | 2.0239 | 0.0430 | 4.5402 |
Partial AIC | 582.6581 |
log-likelihood ratio test | 21.5846 on 4 df |
-log2(p) of ll-ratio test | 12.0103 |
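With the interaction term, the hazard ratio of surgery is no longer constant: HR(t) = exp(−1.7547 + 0.0022·t). A quick computation with the coefficients above gives the effect at t = 0 and the time at which the ratio crosses 1:

```python
import math

b_surgery, b_interaction = -1.7547, 0.0022  # coefficients from the fit above

def hr(t):
    """Hazard ratio of surgery at time t (days)."""
    return math.exp(b_surgery + b_interaction * t)

print(round(hr(0), 3))                    # 0.173: strongly protective at t = 0
print(round(-b_surgery / b_interaction))  # 798: day at which HR(t) reaches 1
```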
18.2.3 Binary time-varying variable

18.3 Discrete-time model

18.3.1 Continuous adjustment of the duration

Logistic model estimated with the `statsmodels` package. The `to_episodic_format` function from `lifelines` puts the data into the required long format.

For the duration, we use here the variable mois (stime grouped into 30-day intervals).
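The long format needed by the discrete-time logit is the classic person-period expansion: each subject contributes one row per time unit at risk, with the event coded only on the last row. A minimal sketch of the idea behind `to_episodic_format` (simplified, hypothetical helper, not the lifelines implementation):

```python
def to_person_period(durations, events):
    """Expand each subject into one row per discrete time unit at risk."""
    rows = []
    for i, (d, e) in enumerate(zip(durations, events)):
        for t in range(1, d + 1):
            # event indicator is 1 only on the final period of an uncensored subject
            rows.append({'id': i, 'stop': t, 'died': int(bool(e) and t == d)})
    return rows

# subject 0: event at t=3; subject 1: censored at t=2
print(to_person_period([3, 2], [1, 0]))
```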
```python
import statsmodels.formula.api as smf  # R-style formula interface => what we will use
import statsmodels.api as sm           # Python-style interface

# Transform the data to long format
td = pd.read_csv("https://raw.githubusercontent.com/mthevenin/analyse_duree/master/bases/transplantation.csv")
td.drop(['id'], axis=1, inplace=True)
td['dur'] = td['mois']
td = to_episodic_format(td, duration_col='mois', event_col='died')
```
Assessing the duration adjustment with polynomial functions

```python
td['t2'] = td['stop']**2
td['t3'] = td['stop']**3

fit1 = smf.glm(formula="died ~ stop", data=td, family=sm.families.Binomial()).fit()
fit2 = smf.glm(formula="died ~ stop + t2", data=td, family=sm.families.Binomial()).fit()
fit3 = smf.glm(formula="died ~ stop + t2 + t3", data=td, family=sm.families.Binomial()).fit()
```
Comparing the AICs

```python
print("AIC for fit t1")
print(fit1.aic)
print("AIC for fit t1 + t2")
print(fit2.aic)
print("AIC for fit t1 + t2 + t3")
print(fit3.aic)
```

AIC for fit t1
504.5211512753311
AIC for fit t1 + t2
492.11522432726747
AIC for fit t1 + t2 + t3
486.50534103180416
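The AIC reported by statsmodels is the usual 2k − 2·logL, so each extra polynomial term costs a penalty of 2 and must improve the log-likelihood by more than 1 to lower the AIC. For instance, with the values of the full model estimated next (log-likelihood −230.34, 7 parameters):

```python
def aic(loglik, k):
    """Akaike information criterion: 2*k - 2*log-likelihood."""
    return 2 * k - 2 * loglik

print(round(aic(-230.34, 7), 2))  # 474.68
```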
Model estimation

```python
tdfit = smf.glm(formula="died ~ stop + t2 + t3 + year + age + surgery", data=td, family=sm.families.Binomial()).fit()
tdfit.summary()
```
Dep. Variable: | died | No. Observations: | 1127 |
Model: | GLM | Df Residuals: | 1120 |
Model Family: | Binomial | Df Model: | 6 |
Link Function: | Logit | Scale: | 1.0000 |
Method: | IRLS | Log-Likelihood: | -230.34 |
Date: | Mon, 21 Aug 2023 | Deviance: | 460.67 |
Time: | 13:46:50 | Pearson chi2: | 1.30e+03 |
No. Iterations: | 7 | Pseudo R-squ. (CS): | 0.07732 |
Covariance Type: | nonrobust |
coef | std err | z | P>|z| | [0.025 | 0.975] | |
Intercept | 7.0827 | 5.308 | 1.334 | 0.182 | -3.320 | 17.486 |
stop | -0.3721 | 0.082 | -4.516 | 0.000 | -0.534 | -0.211 |
t2 | 0.0142 | 0.005 | 2.835 | 0.005 | 0.004 | 0.024 |
t3 | -0.0002 | 7.85e-05 | -2.113 | 0.035 | -0.000 | -1.2e-05 |
year | -0.1327 | 0.074 | -1.798 | 0.072 | -0.277 | 0.012 |
age | 0.0333 | 0.015 | 2.270 | 0.023 | 0.005 | 0.062 |
surgery | -1.0109 | 0.449 | -2.254 | 0.024 | -1.890 | -0.132 |
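In the discrete-time logit, exp(coef) is an odds ratio, which approximates the Cox hazard ratio when events are rare within each interval. Comparing the surgery effect across the two models (coefficients taken from the tables above):

```python
import math

or_surgery = math.exp(-1.0109)  # discrete-time logit coefficient
hr_surgery = math.exp(-0.99)    # Cox coefficient from section 18.2
print(round(or_surgery, 3), round(hr_surgery, 3))  # 0.364 0.372
```

The two estimates are close, as expected with monthly intervals and a moderate event rate.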
18.3.2 Discrete adjustment of the duration

Creating the intervals for the example (quartiles of the duration in months)

```python
td['ct4'] = pd.qcut(td['stop'], [0, .25, .5, .75, 1.])
td['ct4'].value_counts(normalize=True)*100
td.ct4 = pd.Categorical(td.ct4)
td['ct4'] = td.ct4.cat.codes
```
For each individual, we keep a single observation per interval.

```python
td2 = td
td2['t'] = td2['ct4']
td2 = td2.sort_values(['id', 'stop'])
td2 = td2.groupby(['id','ct4']).last()
```
Estimation

```python
td2fit = smf.glm(formula="died ~ C(t) + year + age + surgery", data=td2, family=sm.families.Binomial()).fit()
td2fit.summary()
```
Dep. Variable: | died | No. Observations: | 197 |
Model: | GLM | Df Residuals: | 190 |
Model Family: | Binomial | Df Model: | 6 |
Link Function: | Logit | Scale: | 1.0000 |
Method: | IRLS | Log-Likelihood: | -111.24 |
Date: | Mon, 21 Aug 2023 | Deviance: | 222.48 |
Time: | 13:46:50 | Pearson chi2: | 221. |
No. Iterations: | 4 | Pseudo R-squ. (CS): | 0.1808 |
Covariance Type: | nonrobust |
coef | std err | z | P>|z| | [0.025 | 0.975] | |
Intercept | 12.4467 | 6.654 | 1.871 | 0.061 | -0.594 | 25.488 |
C(t)[T.1] | -1.0334 | 0.419 | -2.467 | 0.014 | -1.854 | -0.212 |
C(t)[T.2] | -1.6152 | 0.545 | -2.965 | 0.003 | -2.683 | -0.547 |
C(t)[T.3] | -0.4789 | 0.599 | -0.799 | 0.424 | -1.654 | 0.696 |
year | -0.2032 | 0.093 | -2.181 | 0.029 | -0.386 | -0.021 |
age | 0.0469 | 0.018 | 2.533 | 0.011 | 0.011 | 0.083 |
surgery | -1.1102 | 0.503 | -2.209 | 0.027 | -2.095 | -0.125 |
18.4 Parametric AFT models

```python
from lifelines import WeibullAFTFitter, LogLogisticAFTFitter
```

Weibull

```python
aftw = WeibullAFTFitter()
aftw.fit(YX, duration_col='stime', event_col='died')
aftw.print_summary()
```
model | lifelines.WeibullAFTFitter |
duration col | 'stime' |
event col | 'died' |
number of observations | 103 |
number of events observed | 75 |
log-likelihood | -488.17 |
time fit was run | 2023-08-21 11:46:50 UTC |
coef | exp(coef) | se(coef) | coef lower 95% | coef upper 95% | exp(coef) lower 95% | exp(coef) upper 95% | cmp to | z | p | -log2(p) | ||
---|---|---|---|---|---|---|---|---|---|---|---|---|
lambda_ | C(surgery)[1] | 1.97 | 7.17 | 0.78 | 0.44 | 3.50 | 1.56 | 33.05 | 0.00 | 2.53 | 0.01 | 6.45 |
age | -0.06 | 0.94 | 0.02 | -0.11 | -0.01 | 0.90 | 0.99 | 0.00 | -2.49 | 0.01 | 6.28 | |
year | 0.16 | 1.18 | 0.12 | -0.08 | 0.40 | 0.93 | 1.49 | 0.00 | 1.33 | 0.18 | 2.44 | |
Intercept | -3.02 | 0.05 | 8.73 | -20.13 | 14.09 | 0.00 | 1.31e+06 | 0.00 | -0.35 | 0.73 | 0.46 | |
rho_ | Intercept | -0.59 | 0.56 | 0.09 | -0.77 | -0.41 | 0.46 | 0.67 | 0.00 | -6.33 | <0.005 | 31.93 |
Concordance | 0.65 |
AIC | 986.34 |
log-likelihood ratio test | 18.87 on 3 df |
-log2(p) of ll-ratio test | 11.75 |
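In an AFT model, exp(coef) is a time ratio (acceleration factor) rather than a hazard ratio: a positive coefficient stretches survival times. For surgery in the Weibull fit above:

```python
import math

coef_surgery = 1.97  # lambda_ coefficient for C(surgery)[1] above
print(round(math.exp(coef_surgery), 2))  # 7.17: survival times multiplied by ~7
```

Note the sign flip relative to the Cox model: longer survival times (AFT coefficient > 0) correspond to a lower hazard (Cox coefficient < 0).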
Log-logistic

```python
aftl = LogLogisticAFTFitter()
aftl.fit(YX, duration_col='stime', event_col='died')
aftl.print_summary()
```
model | lifelines.LogLogisticAFTFitter |
duration col | 'stime' |
event col | 'died' |
number of observations | 103 |
number of events observed | 75 |
log-likelihood | -482.58 |
time fit was run | 2023-08-21 11:46:51 UTC |
coef | exp(coef) | se(coef) | coef lower 95% | coef upper 95% | exp(coef) lower 95% | exp(coef) upper 95% | cmp to | z | p | -log2(p) | ||
---|---|---|---|---|---|---|---|---|---|---|---|---|
alpha_ | C(surgery)[1] | 2.27 | 9.70 | 0.69 | 0.92 | 3.63 | 2.50 | 37.56 | 0.00 | 3.29 | <0.005 | 9.95 |
age | -0.04 | 0.96 | 0.02 | -0.08 | -0.00 | 0.92 | 1.00 | 0.00 | -2.01 | 0.04 | 4.48 | |
year | 0.24 | 1.27 | 0.12 | 0.01 | 0.47 | 1.01 | 1.60 | 0.00 | 2.06 | 0.04 | 4.66 | |
Intercept | -10.43 | 0.00 | 8.34 | -26.77 | 5.92 | 0.00 | 372.19 | 0.00 | -1.25 | 0.21 | 2.24 | |
beta_ | Intercept | -0.18 | 0.84 | 0.10 | -0.37 | 0.01 | 0.69 | 1.01 | 0.00 | -1.86 | 0.06 | 3.99 |
Concordance | 0.66 |
AIC | 975.16 |
log-likelihood ratio test | 21.69 on 3 df |
-log2(p) of ll-ratio test | 13.69 |
19 Package statsmodels

https://www.statsmodels.org/dev/duration.html

The package can estimate Kaplan-Meier type survival functions and Cox models.

19.1 Kaplan-Meier

```python
km = sm.SurvfuncRight(trans["stime"], trans["died"])
km.summary()
```
Surv prob | Surv prob SE | num at risk | num events | |
---|---|---|---|---|
Time | ||||
1 | 0.990291 | 0.009661 | 103 | 1.0 |
2 | 0.961165 | 0.019037 | 102 | 3.0 |
3 | 0.932039 | 0.024799 | 99 | 3.0 |
5 | 0.912621 | 0.027825 | 96 | 2.0 |
6 | 0.893204 | 0.030432 | 94 | 2.0 |
... | ... | ... | ... | ... |
852 | 0.250655 | 0.048731 | 14 | 1.0 |
979 | 0.227868 | 0.049341 | 11 | 1.0 |
995 | 0.205081 | 0.049390 | 10 | 1.0 |
1032 | 0.182295 | 0.048877 | 9 | 1.0 |
1386 | 0.151912 | 0.049277 | 6 | 1.0 |
62 rows × 4 columns
Log-rank tests are available through the survdiff function (same name as in R). On the graphics side, the programming seems rather heavy and would deserve simplification, so it is not covered here.

Comparing S(t) with log-rank tests

Result: (test statistic, p-value)

Unweighted test

```python
sm.duration.survdiff(trans.stime, trans.died, trans.surgery)
```
(6.5900123232343875, 0.010255246157888975)
Breslow (Gehan-Breslow weighting)

```python
sm.duration.survdiff(trans.stime, trans.died, trans.surgery, weight_type='gb')
```
(8.989753779902493, 0.0027149757927903417)
Tarone-Ware weighting

```python
sm.duration.survdiff(trans.stime, trans.died, trans.surgery, weight_type='tw')
```
(8.462352726451392, 0.0036257256194570653)
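survdiff returns the chi-squared statistic and its p-value. With 1 degree of freedom the p-value can be recovered by hand, since the chi-squared(1) survival function is erfc(sqrt(x/2)). A sketch using only the standard library:

```python
import math

def chi2_sf_1df(x):
    """P(X > x) for X ~ chi-squared with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2))

print(chi2_sf_1df(6.59))  # ~0.0103, matching the unweighted test above
```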
19.2 Cox model

```python
mod = smf.phreg("stime ~ year + age + surgery", trans, status='died', ties="efron")
rslt = mod.fit()
print(rslt.summary())
```
Results: PHReg
=============================================================
Model: PH Reg Sample size: 103
Dependent variable: stime Num. events: 75
Ties: Efron
-------------------------------------------------------------
log HR log HR SE HR t P>|t| [0.025 0.975]
-------------------------------------------------------------
year -0.1196 0.0673 0.8872 -1.7765 0.0757 0.7775 1.0124
age 0.0296 0.0135 1.0300 2.1872 0.0287 1.0031 1.0577
surgery -0.9873 0.4363 0.3726 -2.2632 0.0236 0.1584 0.8761
=============================================================
Confidence intervals are for the hazard ratios
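The statsmodels table reports confidence intervals on the hazard-ratio scale; they can be reproduced from the log HR and its standard error (surgery row above; the last digit may differ slightly because the inputs are rounded):

```python
import math

log_hr, se = -0.9873, 0.4363  # surgery: log HR and its SE from the PHReg output
hr = math.exp(log_hr)
ci = (math.exp(log_hr - 1.96 * se), math.exp(log_hr + 1.96 * se))
print(round(hr, 4), round(ci[0], 4), round(ci[1], 4))  # ~ 0.3726 0.1584 0.8762
```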