2009-09-09 8 views
5

Necesito calcular las varianzas dentro y entre ejecución de algunos datos como parte del desarrollo de un nuevo método de química analítica. También necesito intervalos de confianza de estos datos usando el lenguaje RCalcular dentro y entre varianzas e intervalos de confianza en R

Supongo que necesito usar anova o algo así?

Mis datos es como

> variance 
    Run Rep Value 
1 1 1 9.85 
2 1 2 9.95 
3 1 3 10.00 
4 2 1 9.90 
5 2 2 8.80 
6 2 3 9.50 
7 3 1 11.20 
8 3 2 11.10 
9 3 3 9.80 
10 4 1 9.70 
11 4 2 10.10 
12 4 3 10.00 

Respuesta

1

He estado buscando un problema similar. He encontrado referencias a intervalos de confianza de cálculo por Burdick y Graybill (Burdick, R. y Graybill, F. 1992, Intervalos de confianza en componentes de varianza, CRC Press)

Utilizando un código que he estado intentando obtengo estos valores



> kiaraov = aov(Value~Run+Error(Run),data=kiar) 

> summary(kiaraov) 

Error: Run 
    Df Sum Sq Mean Sq 
Run 3 2.57583 0.85861 

Error: Within 
      Df Sum Sq Mean Sq F value Pr(>F) 
Residuals 8 1.93833 0.24229    
> confint = 95 

> a = (1-(confint/100))/2 

> grandmean = as.vector(kiaraov$"(Intercept)"[[1]][1]) # Grand Mean (I think) 

> within = summary(kiaraov)$"Error: Within"[[1]]$"Mean Sq" # S2^2Mean Square Value for Within Run 

> dfRun = summary(kiaraov)$"Error: Run"[[1]]$"Df" 

> dfWithin = summary(kiaraov)$"Error: Within"[[1]]$"Df" 

> Run = summary(kiaraov)$"Error: Run"[[1]]$"Mean Sq" # S1^2Mean Square for between Run 

> between = (Run-within)/((dfWithin/(dfRun+1))+1) # (S1^2-S2^2)/J 

> total = between+within 

> between # Between Run Variance 
[1] 0.2054398 

> within # Within Run Variance 
[1] 0.2422917 

> total # Total Variance 
[1] 0.4477315 

> betweenCV = sqrt(between)/grandmean * 100 # Between Run CV% 

> withinCV = sqrt(within)/grandmean * 100 # Within Run CV% 

> totalCV = sqrt(total)/grandmean * 100 # Total CV% 

> #within confidence intervals 

> withinLCB = within/qf(1-a,8,Inf) # Within LCB 

> withinUCB = within/qf(a,8,Inf) # Within UCB 

> #Between Confidence Intervals 

> n1 = dfRun 

> n2 = dfWithin 

> G1 = 1-(1/qf(1-a,n1,Inf)) # According to Burdick and Graybill this should be a 

> G2 = 1-(1/qf(1-a,n2,Inf)) 

> H1 = (1/qf(a,n1,Inf))-1 # and this should be 1-a, but my results don't agree 

> H2 = (1/qf(a,n2,Inf))-1 

> G12 = ((qf(1-a,n1,n2)-1)^2-(G1^2*qf(1-a,n1,n2)^2)-(H2^2))/qf(1-a,n1,n2) # again, should be a, not 1-a 

> H12 = ((1-qf(a,n1,n2))^2-H1^2*qf(a,n1,n2)^2-G2^2)/qf(a,n1,n2) # again, should be 1-a, not a 

> Vu = H1^2*Run^2+G2^2*within^2+H12*Run*within 

> Vl = G1^2*Run^2+H2^2*within^2+G12*within*Run 

> betweenLCB = (Run-within-sqrt(Vl))/J # Betwen LCB 

> betweenUCB = (Run-within+sqrt(Vu))/J # Between UCB 

> #Total Confidence Intervals 

> y = (Run+(J-1)*within)/J 

> totalLCB = y-(sqrt(G1^2*Run^2+G2^2*(J-1)^2*within^2)/J) # Total LCB 

> totalUCB = y+(sqrt(H1^2*Run^2+H2^2*(J-1)^2*within^2)/J) # Total UCB 

> result = data.frame(Name=c("within", "between", "total"),CV=c(withinCV,betweenCV,totalCV),LCB=c(sqrt(withinLCB)/grandmean*100,sqrt(betweenLCB)/grandmean*100,sqrt(totalLCB)/grandmean*100),UCB=c(sqrt(withinUCB)/grandmean*100,sqrt(betweenUCB)/grandmean*100,sqrt(totalUCB)/grandmean*100)) 

> result 
    Name  CV  LCB  UCB 
1 within 4.926418 3.327584 9.43789 
2 between 4.536327  NaN 19.73568 
3 total 6.696855 4.846030 20.42647 

Aquí el intervalo de confianza inferior para entre rUN CV es menor que cero, por lo reportado como NaN.

Me encantaría tener una mejor manera de hacerlo. Si tengo tiempo, podría tratar de crear una función para hacer esto.

Paul.

-

Editar: Yo escribo con el tiempo una función, aquí está (Cuidado con este hotel)

#' avar Function 
#' 
#' Calculate thewithin, between and total %CV of a dataset by ANOVA, and the 
#' associated confidence intervals 
#' 
#' @param dataf - The data frame to use, in long format 
#' @param afactor Character string representing the column in dataf that contains the factor 
#' @param aresponse Charactyer string representing the column in dataf that contains the response value 
#' @param aconfidence What Confidence limits to use, default = 95% 
#' @param digits Significant Digits to report to, default = 3 
#' @param debug Boolean, Should debug messages be displayed, default=FALSE 
#' @returnType dataframe containing the Mean, Within, Between and Total %CV and LCB and UCB for each 
#' @return 
#' @author Paul Hurley 
#' @export 
#' @examples 
#' #Using the BGBottles data from Burdick and Graybill Page 62 
#' assayvar(dataf=BGBottles, afactor="Machine", aresponse="weight") 
avar<-function(dataf, afactor, aresponse, aconfidence=95, digits=3, debug=FALSE){ 
    dataf<-subset(dataf,!is.na(with(dataf,get(aresponse)))) 
    nmissing<-function(x) sum(!is.na(x)) 
    n<-nrow(subset(dataf,is.numeric(with(dataf,get(aresponse))))) 
    datadesc<-ddply(dataf, afactor, colwise(nmissing,aresponse)) 
    I<-nrow(datadesc) 
    if(debug){print(datadesc)} 
    if(min(datadesc[,2])==max(datadesc[,2])){ 
     balance<-TRUE 
     J<-min(datadesc[,2]) 
     if(debug){message(paste("Dataset is balanced, J=",J,"I is ",I,sep=""))} 
    } else { 
     balance<-FALSE 
     Jh<-I/(sum(1/datadesc[,2], na.rm = TRUE)) 
     J<-Jh 
     m<-min(datadesc[,2]) 
     M<-max(datadesc[,2]) 
     if(debug){message(paste("Dataset is unbalanced, like me, I is ",I,sep=""))} 
     if(debug){message(paste("Jh is ",Jh, ", m is ",m, ", M is ",M, sep=""))} 
    } 
    if(debug){message(paste("Call afactor=",afactor,", aresponse=",aresponse,sep=""))} 
    formulatext<-paste(as.character(aresponse)," ~ 1 + Error(",as.character(afactor),")",sep="") 
    if(debug){message(paste("formula text is ",formulatext,sep=""))} 
    aovformula<-formula(formulatext) 
    if(debug){message(paste("Formula is ",as.character(aovformula),sep=""))} 
    assayaov<-aov(formula=aovformula,data=dataf) 
    if(debug){ 
     print(assayaov) 
     print(summary(assayaov)) 
    } 
    a<-1-((1-(aconfidence/100))/2) 
    if(debug){message(paste("confidence is ",aconfidence,", alpha is ",a,sep=""))} 
    grandmean<-as.vector(assayaov$"(Intercept)"[[1]][1]) # Grand Mean (I think) 
    if(debug){message(paste("n is",n,sep=""))} 

    #This line commented out, seems to choke with an aov object built from an external formula 
    #grandmean<-as.vector(model.tables(assayaov,type="means")[[1]]$`Grand mean`) # Grand Mean (I think) 
    within<-summary(assayaov)[[2]][[1]]$"Mean Sq" # d2e, S2^2 Mean Square Value for Within Machine = 0.1819 
    dfRun<-summary(assayaov)[[1]][[1]]$"Df" # DF for within = 3 
    dfWithin<-summary(assayaov)[[2]][[1]]$"Df" # DF for within = 8 
    Run<-summary(assayaov)[[1]][[1]]$"Mean Sq" # S1^2Mean Square for Machine 
    if(debug){message(paste("mean square for Run ?",Run,sep=""))} 
    #Was between<-(Run-within)/((dfWithin/(dfRun+1))+1) but my comment suggests this should be just J, so I'll use J ! 
    between<-(Run-within)/J # d2a (S1^2-S2^2)/J 
    if(debug){message(paste("S1^2 mean square machine is ",Run,", S2^2 mean square within is ",within))} 
    total<-between+within 
    between # Between Run Variance 
    within # Within Run Variance 
    total # Total Variance 
    if(debug){message(paste("between is ",between,", within is ",within,", Total is ",total,sep=""))} 

    betweenCV<-sqrt(between)/grandmean * 100 # Between Run CV% 
    withinCV<-sqrt(within)/grandmean * 100 # Within Run CV% 
    totalCV<-sqrt(total)/grandmean * 100 # Total CV% 
    n1<-dfRun 
    n2<-dfWithin 
    if(debug){message(paste("n1 is ",n1,", n2 is ",n2,sep=""))} 
    #within confidence intervals 
    if(balance){ 
     withinLCB<-within/qf(a,n2,Inf) # Within LCB 
     withinUCB<-within/qf(1-a,n2,Inf) # Within UCB 
    } else { 
     withinLCB<-within/qf(a,n2,Inf) # Within LCB 
     withinUCB<-within/qf(1-a,n2,Inf) # Within UCB 
    } 
#Mean Confidence Intervals 
    if(debug){message(paste(grandmean,"+/-(sqrt(",Run,"/",n,")*qt(",a,",df=",I-1,"))",sep=""))} 
    meanLCB<-grandmean+(sqrt(Run/n)*qt(1-a,df=I-1)) # wrong 
    meanUCB<-grandmean-(sqrt(Run/n)*qt(1-a,df=I-1)) # wrong 
    if(debug){message(paste("Grandmean is ",grandmean,", meanLCB = ",meanLCB,", meanUCB = ",meanUCB,aresponse,sep=""))} 
    if(debug){print(summary(assayaov))} 
#Between Confidence Intervals 
    G1<-1-(1/qf(a,n1,Inf)) 
    G2<-1-(1/qf(a,n2,Inf)) 
    H1<-(1/qf(1-a,n1,Inf))-1 
    H2<-(1/qf(1-a,n2,Inf))-1 
    G12<-((qf(a,n1,n2)-1)^2-(G1^2*qf(a,n1,n2)^2)-(H2^2))/qf(a,n1,n2) 
    H12<-((1-qf(1-a,n1,n2))^2-H1^2*qf(1-a,n1,n2)^2-G2^2)/qf(1-a,n1,n2) 
    if(debug){message(paste("G1 is ",G1,", G2 is ",G2,sep="")) 
     message(paste("H1 is ",H1,", H2 is ",H2,sep="")) 
     message(paste("G12 is ",G12,", H12 is ",H12,sep="")) 
    } 
    if(balance){ 
     Vu<-H1^2*Run^2+G2^2*within^2+H12*Run*within 
     Vl<-G1^2*Run^2+H2^2*within^2+G12*within*Run 
     betweenLCB<-(Run-within-sqrt(Vl))/J # Betwen LCB 
     betweenUCB<-(Run-within+sqrt(Vu))/J # Between UCB 
    } else { 
     #Burdick and Graybill seem to suggest calculating anova of mean values to find n1S12u/Jh 
     meandataf<-ddply(.data=dataf,.variable=afactor, .fun=function(df){mean(with(df, get(aresponse)), na.rm=TRUE)}) 
     meandataaov<-aov(formula(paste("V1~",afactor,sep="")), data=meandataf) 
     sumsquare<-summary(meandataaov)[[1]]$`Sum Sq` 
     #so maybe S12u is just that bit ? 
     Runu<-(sumsquare*Jh)/n1 
     if(debug){message(paste("n1S12u/Jh is ",sumsquare,", so S12u is ",Runu,sep=""))} 
     Vu<-H1^2*Runu^2+G2^2*within^2+H12*Runu*within 
     Vl<-G1^2*Runu^2+H2^2*within^2+G12*within*Runu 
     betweenLCB<-(Runu-within-sqrt(Vl))/Jh # Betwen LCB 
     betweenUCB<-(Runu-within+sqrt(Vu))/Jh # Between UCB 
     if(debug){message(paste("betweenLCB is ",betweenLCB,", between UCB is ",betweenUCB,sep=""))} 
    } 
#Total Confidence Intervals 
    if(balance){ 
     y<-(Run+(J-1)*within)/J 
     if(debug){message(paste("y is ",y,sep=""))} 
     totalLCB<-y-(sqrt(G1^2*Run^2+G2^2*(J-1)^2*within^2)/J) # Total LCB 
     totalUCB<-y+(sqrt(H1^2*Run^2+H2^2*(J-1)^2*within^2)/J) # Total UCB 
    } else { 
     y<-(Runu+(Jh-1)*within)/Jh 
     if(debug){message(paste("y is ",y,sep=""))} 
     totalLCB<-y-(sqrt(G1^2*Runu^2+G2^2*(Jh-1)^2*within^2)/Jh) # Total LCB 
     totalUCB<-y+(sqrt(H1^2*Runu^2+H2^2*(Jh-1)^2*within^2)/Jh) # Total UCB 
    } 
    if(debug){message(paste("totalLCB is ",totalLCB,", total UCB is ",totalUCB,sep=""))} 
# result<-data.frame(Name=c("within", "between", "total"),CV=c(withinCV,betweenCV,totalCV), 
#   LCB=c(sqrt(withinLCB)/grandmean*100,sqrt(betweenLCB)/grandmean*100,sqrt(totalLCB)/grandmean*100), 
#   UCB=c(sqrt(withinUCB)/grandmean*100,sqrt(betweenUCB)/grandmean*100,sqrt(totalUCB)/grandmean*100)) 
    result<-data.frame(Mean=grandmean,MeanLCB=meanLCB, MeanUCB=meanUCB, Within=withinCV,WithinLCB=sqrt(withinLCB)/grandmean*100, WithinUCB=sqrt(withinUCB)/grandmean*100, 
      Between=betweenCV, BetweenLCB=sqrt(betweenLCB)/grandmean*100, BetweenUCB=sqrt(betweenUCB)/grandmean*100, 
      Total=totalCV, TotalLCB=sqrt(totalLCB)/grandmean*100, TotalUCB=sqrt(totalUCB)/grandmean*100) 
    if(!digits=="NA"){ 
     result$Mean<-signif(result$Mean,digits=digits) 
     result$MeanLCB<-signif(result$MeanLCB,digits=digits) 
     result$MeanUCB<-signif(result$MeanUCB,digits=digits) 
     result$Within<-signif(result$Within,digits=digits) 
     result$WithinLCB<-signif(result$WithinLCB,digits=digits) 
     result$WithinUCB<-signif(result$WithinUCB,digits=digits) 
     result$Between<-signif(result$Between,digits=digits) 
     result$BetweenLCB<-signif(result$BetweenLCB,digits=digits) 
     result$BetweenUCB<-signif(result$BetweenUCB,digits=digits) 
     result$Total<-signif(result$Total,digits=digits) 
     result$TotalLCB<-signif(result$TotalLCB,digits=digits) 
     result$TotalUCB<-signif(result$TotalUCB,digits=digits) 
    } 
    return(result) 
} 

assayvar<-function(adata, aresponse, afactor, anominal, aconfidence=95, digits=3, debug=FALSE){ 
    result<-ddply(adata,anominal,function(df){ 
       resul<-avar(dataf=df,afactor=afactor,aresponse=aresponse,aconfidence=aconfidence, digits=digits, debug=debug) 
       resul$n<-nrow(subset(df, !is.na(with(df, get(aresponse))))) 
       return(resul) 
      }) 
    return(result) 
} 
3

Si desea aplicar una función (como var) a través de un factor como Run o Rep, puede utilizar tapply:

> with(variance, tapply(Value, Run, var)) 
      1   2   3   4 
0.005833333 0.310000000 0.610000000 0.043333333 
> with(variance, tapply(Value, Rep, var)) 
      1   2   3 
0.48562500 0.88729167 0.05583333 
+0

Nice! Eso es Elegant Code en mi opinión. – kpierce8

6

Tienes cuatro grupos de tres observaciones:

> run1 = c(9.85, 9.95, 10.00) 
> run2 = c(9.90, 8.80, 9.50) 
> run3 = c(11.20, 11.10, 9.80) 
> run4 = c(9.70, 10.10, 10.00) 
> runs = c(run1, run2, run3, run4) 
> runs 
[1] 9.85 9.95 10.00 9.90 8.80 9.50 11.20 11.10 9.80 9.70 10.10 10.00 

hacen que algunas etiquetas:

> n = rep(3, 4) 
> group = rep(1:4, n) 
> group 
[1] 1 1 1 2 2 2 3 3 3 4 4 4 

Calcular Intraserie Estadísticas:

> withinRunStats = function(x) c(sum = sum(x), mean = mean(x), var = var(x), n = length(x)) 
> tapply(runs, group, withinRunStats) 
$`1` 
     sum   mean   var   n 
29.800000000 9.933333333 0.005833333 3.000000000 

$`2` 
    sum mean var  n 
28.20 9.40 0.31 3.00 

$`3` 
    sum mean var  n 
32.10 10.70 0.61 3.00 

$`4` 
     sum  mean   var   n 
29.80000000 9.93333333 0.04333333 3.00000000 

usted puede hacer algunas ANOVA aquí:

> data = data.frame(y = runs, group = factor(group)) 
> data 
     y group 
1 9.85  1 
2 9.95  1 
3 10.00  1 
4 9.90  2 
5 8.80  2 
6 9.50  2 
7 11.20  3 
8 11.10  3 
9 9.80  3 
10 9.70  4 
11 10.10  4 
12 10.00  4 

> fit = lm(runs ~ group, data) 
> fit 

Call: 
lm(formula = runs ~ group, data = data) 

Coefficients: 
(Intercept)  group2  group3  group4 
    9.933e+00 -5.333e-01 7.667e-01 -2.448e-15 

> anova(fit) 
Analysis of Variance Table 

Response: runs 
      Df Sum Sq Mean Sq F value Pr(>F) 
group  3 2.57583 0.85861 3.5437 0.06769 . 
Residuals 8 1.93833 0.24229     
--- 
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

> degreesOfFreedom = anova(fit)[, "Df"] 
> names(degreesOfFreedom) = c("treatment", "error") 
> degreesOfFreedom 
treatment  error 
     3   8 

error o dentro de los grupos varianza:

> anova(fit)["Residuals", "Mean Sq"] 
[1] 0.2422917 

Treatmen t o variación entre grupos:

> anova(fit)["group", "Mean Sq"] 
[1] 0.8586111 

Esto debería darle suficiente para hacer intervalos de confianza.

3

Voy a tomar una grieta en esto cuando tengo más tiempo, pero mientras tanto, aquí está la dput() para la estructura de datos de Kiar:

structure(list(Run = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4), Rep = c(1, 
2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3), Value = c(9.85, 9.95, 10, 9.9, 
8.8, 9.5, 11.2, 11.1, 9.8, 9.7, 10.1, 10)), .Names = c("Run", 
"Rep", "Value"), row.names = c(NA, -12L), class = "data.frame") 

... en caso de que desea tomar una tiro rápido en eso.

Cuestiones relacionadas