2012-04-06 11 views
6

Creating a trend line from a SQL data set

The following code returns the number of resolved tickets and the number of opened tickets per period (a period is YYYY, WW), going back a given number of days. For example, if @NoOfDays is 7:

resolved | opened | week | year | period
56 | 30 | 13 | 2012 | 2012, 13
237 | 222 | 14 | 2012 | 2012, 14

'resolved' and 'opened' are plotted as lines (y) over period (x). I would like to add another column, 'trend', that returns a number which, when plotted over period, forms a trend line (simple linear regression). I do want to use both sets of values as the data source for the trend.

This is the code I have:

SELECT a.resolved, b.opened, a.weekClosed AS week, a.yearClosed AS year, 
    CAST(a.yearClosed as varchar(5)) + ', ' + CAST(a.weekClosed as varchar(5)) AS period 
FROM 
    (SELECT TOP (100) PERCENT COUNT(DISTINCT TicketNbr) AS resolved, { fn WEEK(date_closed) } AS weekClosed, { fn YEAR(date_closed) } AS yearClosed 
    FROM v_rpt_Service 
    WHERE (date_closed >= DateAdd(Day, DateDiff(Day, 0, GetDate()) - @NoOfDays, 0)) 
    GROUP BY { fn WEEK(date_closed) }, { fn YEAR(date_closed) }) AS a 
LEFT OUTER JOIN 
    (SELECT TOP (100) PERCENT COUNT(DISTINCT TicketNbr) AS opened, { fn WEEK(date_entered) } AS weekEntered, { fn YEAR(date_entered) } AS yearEntered 
    FROM v_rpt_Service AS v_rpt_Service_1 
    WHERE (date_entered >= DateAdd(Day, DateDiff(Day, 0, GetDate()) - @NoOfDays, 0)) 
    GROUP BY { fn WEEK(date_entered) }, { fn YEAR(date_entered) }) AS b ON a.weekClosed = b.weekEntered AND a.yearClosed = b.yearEntered 
ORDER BY year, week 

Edit:

According to serc.carleton.edu/files/mathyouneed/best_fit_line_dividing.pdf, it looks like I want to split the data in half and then calculate the average of each half. Then I need to find the line of best fit and use its slope and y-intercept to calculate the values to return in 'trend' using y = mx + b?

I know this is very possible in SQL; however, the program I'm embedding the SQL in has limitations on what I can do.

The red and blue dots are the numbers I'm returning now (opened and resolved). I need to return a value for each period in 'trend' to create the purple line. (This image is hypothetical.)

Hypothetical Chart

Is this for MS SQL Server, or for a different RDBMS?

MS SQL Server is correct.

Answers

1

I figured it out. I split the data into multiple derived tables and subqueries, essentially dividing the data in half. These are my formulas for each value:

*(each row is a week)* 
y1 = average of data first half 
y2 = average of data second half 
x1 = 1/4 of number of weeks 
x2 = 3/4 of number of weeks 
m = (y1-y2)/(x1-x2) 
b = y2 - (m * x2) 
trend = (m * row_number) + b 
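
As a rough illustration of the formulas above, here is a minimal sketch on made-up weekly totals. The `weekly` data, its `total` column (resolved + opened), and the other names are hypothetical and not taken from the ticket view. It assumes SQL Server 2008 or later and an even number of weeks, and it divides by decimal constants so that integer truncation does not zero out x1 and x2.

```sql
-- Hypothetical sample data: one row per week, total = resolved + opened (invented values)
WITH weekly AS (
    SELECT rn, CAST(total AS float) AS total, COUNT(*) OVER () AS n
    FROM (VALUES (1, 10), (2, 12), (3, 14), (4, 16),
                 (5, 18), (6, 20), (7, 22), (8, 24)) AS v(rn, total)
),
pts AS (
    SELECT
        MAX(n) / 4.0                               AS x1,  -- 1/4 of number of weeks
        MAX(n) * 3 / 4.0                           AS x2,  -- 3/4 of number of weeks
        AVG(CASE WHEN rn <= n / 2 THEN total END)  AS y1,  -- average of first half
        AVG(CASE WHEN rn >  n / 2 THEN total END)  AS y2   -- average of second half
    FROM weekly
),
line AS (
    -- NULLIF guards against x1 = x2 (for example, an empty input)
    SELECT (y1 - y2) / NULLIF(x1 - x2, 0) AS m, x2, y2
    FROM pts
)
SELECT w.rn,
       w.total,
       l.m * w.rn + (l.y2 - l.m * l.x2) AS trend   -- trend = m * row_number + b
FROM weekly AS w
CROSS JOIN line AS l
ORDER BY w.rn;
```

On this perfectly linear sample the slope comes out as 2, but the trend values sit one unit above the data, because n/4 and 3n/4 are only approximate midpoints of the two halves when rows are numbered from 1.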

And here is my (very dirty) SQL code:

SELECT resolved_half1,resolved_half2,opened_half1,opened_half2, c.period, 
((SUM (resolved_half1) OVER() + SUM(opened_half1) OVER()) - (SUM(resolved_half2) OVER() + SUM(opened_half2) OVER()))/((COUNT(resolved_half1) OVER() + COUNT(opened_half1) OVER())/2) as y1, 
((SUM(resolved_half2) OVER() + SUM(opened_half2) OVER())/(COUNT(resolved_half2) OVER() + COUNT (opened_half2) OVER())) as y2, 
((COUNT(c.period) OVER())/4) as x1, 
(((COUNT(c.period) OVER())/4) * 3) as x2, 
((CAST(((SUM (resolved_half1) OVER() + SUM(opened_half1) OVER()) - (SUM(resolved_half2) OVER() + SUM(opened_half2) OVER()))/((COUNT(resolved_half1) OVER() + COUNT(opened_half1) OVER())/2) as float) - CAST(((SUM(resolved_half2) OVER() + SUM(opened_half2) OVER())/(COUNT(resolved_half2) OVER() + COUNT (opened_half2) OVER())) as float))/(CAST(((COUNT(c.period) OVER())/4) as float) - CAST((((COUNT(c.period) OVER())/4) * 3) as float))) as m, 
(CAST(((SUM(resolved_half2) OVER() + SUM(opened_half2) OVER())/(COUNT(resolved_half2) OVER() + COUNT (opened_half2) OVER())) as float) - (((CAST(((SUM (resolved_half1) OVER() + SUM(opened_half1) OVER()) - (SUM(resolved_half2) OVER() + SUM(opened_half2) OVER()))/((COUNT(resolved_half1) OVER() + COUNT(opened_half1) OVER())/2) as float) - CAST(((SUM(resolved_half2) OVER() + SUM(opened_half2) OVER())/(COUNT(resolved_half2) OVER() + COUNT (opened_half2) OVER())) as float))/(CAST(((COUNT(c.period) OVER())/4) as float) - CAST((((COUNT(c.period) OVER())/4) * 3) as float))) * (((COUNT(c.period) OVER())/4) * 3))) as b, 
((((CAST(((SUM (resolved_half1) OVER() + SUM(opened_half1) OVER()) - (SUM(resolved_half2) OVER() + SUM(opened_half2) OVER()))/((COUNT(resolved_half1) OVER() + COUNT(opened_half1) OVER())/2) as float) - CAST(((SUM(resolved_half2) OVER() + SUM(opened_half2) OVER())/(COUNT(resolved_half2) OVER() + COUNT (opened_half2) OVER())) as float))/(CAST(((COUNT(c.period) OVER())/4) as float) - CAST((((COUNT(c.period) OVER())/4) * 3) as float))) * (ROW_NUMBER() OVER(ORDER BY c.yearClosed,c.weekClosed))) + (CAST(((SUM(resolved_half2) OVER() + SUM(opened_half2) OVER())/(COUNT(resolved_half2) OVER() + COUNT (opened_half2) OVER())) as float) - (((CAST(((SUM (resolved_half1) OVER() + SUM(opened_half1) OVER()) - (SUM(resolved_half2) OVER() + SUM(opened_half2) OVER()))/((COUNT(resolved_half1) OVER() + COUNT(opened_half1) OVER())/2) as float) - CAST(((SUM(resolved_half2) OVER() + SUM(opened_half2) OVER())/(COUNT(resolved_half2) OVER() + COUNT (opened_half2) OVER())) as float))/(CAST(((COUNT(c.period) OVER())/4) as float) - CAST((((COUNT(c.period) OVER())/4) * 3) as float))) * (((COUNT(c.period) OVER())/4) * 3)))) as trend, 
ROW_NUMBER() OVER(ORDER BY c.yearClosed,c.weekClosed) as row 

FROM 
    (SELECT *, CAST(yearClosed as varchar(5)) + ', ' + CAST(weekClosed as varchar(5)) AS period 
    FROM (SELECT  TOP (100) PERCENT COUNT(DISTINCT TicketNbr) AS resolved_half1, { fn WEEK(date_closed) } AS weekClosed, { fn YEAR(date_closed) } AS yearClosed 
          FROM   v_rpt_Service 
     WHERE (date_closed >= DateAdd(Day, DateDiff(Day, 0, GetDate()) - (180), 0)) 

     GROUP BY { fn WEEK(date_closed) }, { fn YEAR(date_closed) }) AS a 
     LEFT OUTER JOIN 
     (SELECT TOP (100) PERCENT COUNT(DISTINCT TicketNbr) AS opened_half1, { fn WEEK(date_entered) } AS weekEntered, { fn YEAR(date_entered) } AS yearEntered 
     FROM v_rpt_Service AS v_rpt_Service_1 
     WHERE (date_entered >= DateAdd(Day, DateDiff(Day, 0, GetDate()) - (180), 0)) 
     GROUP BY { fn WEEK(date_entered) }, { fn YEAR(date_entered) }) AS b ON a.weekClosed = b.weekEntered AND a.yearClosed = b.yearEntered) as c 
     LEFT OUTER JOIN 
     (SELECT *, CAST(yearClosed as varchar(5)) + ', ' + CAST(weekClosed as varchar(5)) AS period 
     FROM (SELECT TOP (100) PERCENT COUNT(DISTINCT TicketNbr) AS resolved_half2, { fn WEEK(date_closed) } AS weekClosed, { fn YEAR(date_closed) } AS yearClosed 
     FROM v_rpt_Service 
     WHERE (date_closed >= DateAdd(Day, DateDiff(Day, 0, GetDate()) - (180/2), 0)) 
     GROUP BY { fn WEEK(date_closed) }, { fn YEAR(date_closed) }) AS d 
     LEFT OUTER JOIN 
     (SELECT TOP (100) PERCENT COUNT(DISTINCT TicketNbr) AS opened_half2, { fn WEEK(date_entered) } AS weekEntered, { fn YEAR(date_entered)} AS yearEntered 
     FROM v_rpt_Service AS v_rpt_Service_1 
     WHERE (date_entered > = DateAdd(Day, DateDiff(Day, 0, GetDate()) - (180/2), 0)) 
     GROUP BY { fn WEEK(date_entered) }, { fn YEAR(date_entered) }) AS e ON d.weekClosed = e.weekEntered AND d.yearClosed = e.yearEntered 
) as f ON c.yearClosed = f.yearClosed AND c.weekClosed = f.weekClosed AND c.weekEntered = f.weekEntered AND c.yearEntered = f.yearEntered AND c.period = f.period 
GROUP BY c.period, resolved_half1,resolved_half2,opened_half1,opened_half2,c.yearClosed,c.weekClosed 
ORDER BY row 

This code uses a hard-coded value of 180 days. I still need to be able to use a variable to select the number of days (without getting a divide-by-zero error), and the code really needs to be cleaned up. If anyone can do those two things for me (I'm not the best at SQL), the bounty is theirs.
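
One plausible source of the divide-by-zero error is integer arithmetic: with fewer than four weeks of data, COUNT(...)/4 truncates to 0, so x1 - x2 is 0. The sketch below is only an assumption about how that could be avoided, using a variable for the window and a decimal divisor wrapped in NULLIF. The DECLARE is there just to make the snippet runnable on its own, and the `n`, `y1`, `y2` sample values are invented.

```sql
DECLARE @NoOfDays int = 7;  -- stand-in for the report parameter; any positive number of days

-- Start of the full window and of its second half, both derived from the variable
SELECT
    DATEADD(day, DATEDIFF(day, 0, GETDATE()) - @NoOfDays, 0)     AS full_start,
    DATEADD(day, DATEDIFF(day, 0, GETDATE()) - @NoOfDays / 2, 0) AS half_start;

-- n = number of weekly rows; y1, y2 = half averages (hypothetical values)
SELECT n,
       n / 4 - (n / 4) * 3                              AS int_denominator,  -- 0 when n < 4
       (y1 - y2) / NULLIF(n / 4.0 - n * 3 / 4.0, 0)     AS m                 -- NULL instead of an error
FROM (VALUES (2, 13.0, 21.0), (8, 13.0, 21.0)) AS v(n, y1, y2);
```

NULLIF turns a zero denominator into NULL, so the row returns a NULL slope rather than aborting the whole query.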

Image:

Chart

0

I think this will do the trick. If you post some real sample data, I'll see whether I can modify it to fix any problems:

DECLARE @noOfDays INT 
SET @noofdays = 180 

;WITH tickets AS 
(
SELECT DISTINCT 
DATENAME(YEAR,date_closed) + RIGHT('000' + CAST(DATEPART(WEEK,date_closed) AS VARCHAR(5)),3) as Period 
,TicketNbr -- column name as used in the question 
,1 as ticket_type --resolved 
FROM v_rpt_Service 
WHERE (date_closed >= DateAdd(Day, DateDiff(Day, 0, GetDate()) - @NoOfDays, 0)) 
UNION ALL 
SELECT DISTINCT 
DATENAME(YEAR,date_entered) + RIGHT('000' + CAST(DATEPART(WEEK,date_entered) AS VARCHAR(5)),3) as Period -- opened tickets are bucketed by date_entered 
,TicketNbr 
,0 as ticket_type --opened 
FROM v_rpt_Service 
WHERE (date_entered >= DateAdd(Day, DateDiff(Day, 0, GetDate()) - @NoOfDays, 0)) 
) 
,tickets2 AS 
(
SELECT 
Period 
,SUM(CASE WHEN ticket_type = 0 THEN 1 ELSE 0 END) as opened 
,SUM(CASE WHEN ticket_type = 1 THEN 1 ELSE 0 END) as closed 
FROM tickets 
GROUP BY 
Period 
) 
,tickets3 AS 
(
SELECT 
Period 
,row_number() OVER (ORDER BY period ASC) as row 
,opened 
,closed 
,COUNT(period) OVER() as base 
,SUM(opened) OVER() as [Sumopened] 
,SUM(opened * opened) OVER() as [Sumopened^2] 
,SUM(opened * closed) OVER() as [Sumopenedclosed] 
,SUM(closed) OVER() as [Sumclosed] 
,SUM(closed * closed) OVER() as [Sumclosed^2] 
,SUM(opened * closed) OVER() * COUNT(period) OVER() AS [nSumopenedclosed] 
,SUM(opened) OVER() * SUM(closed) OVER() AS [Sumopened*Sumclosed] 
,SUM(opened * opened) OVER() * COUNT(period) OVER() AS [nSumopened^2] 
,SUM(opened) OVER() * SUM(opened) OVER() as [Sumopened*Sumopened] 
FROM tickets2 
) 
--Formula for linear regression is Y = A + BX 
SELECT 
period 
,opened 
,closed 
,((1.0/base) * [Sumclosed]) - 
([Sumopenedclosed] - ([Sumopened*Sumclosed]/base))/([Sumopened^2] - ([Sumopened*Sumopened]/base)) *((1.0/base) * [Sumopened]) 
+ row * ([Sumopenedclosed] - ([Sumopened*Sumclosed]/base))/([Sumopened^2] - ([Sumopened*Sumopened]/base)) 
AS trend_point 
,((1.0/base) * [Sumclosed]) - 
([Sumopenedclosed] - ([Sumopened*Sumclosed]/base))/([Sumopened^2] - ([Sumopened*Sumopened]/base)) *((1.0/base) * [Sumopened]) AS A 
,([Sumopenedclosed] - ([Sumopened*Sumclosed]/base))/([Sumopened^2] - ([Sumopened*Sumopened]/base)) as B 
from tickets3 
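
As written, the query above appears to fit closed against opened and then apply that slope to the row number. If the goal is a trend over time, the independent variable would be the period's row number instead. The following is a minimal least-squares sketch under that assumption, on invented data; `weekly`, `total`, and the other names are hypothetical.

```sql
-- Hypothetical sample: x = row number of the period, y = value to trend (invented values)
WITH weekly AS (
    SELECT CAST(rn AS float) AS x, CAST(total AS float) AS y
    FROM (VALUES (1, 10), (2, 14), (3, 13), (4, 19), (5, 21)) AS v(rn, total)
),
sums AS (
    SELECT COUNT(*)   AS n,
           SUM(x)     AS sx,
           SUM(y)     AS sy,
           SUM(x * y) AS sxy,
           SUM(x * x) AS sxx
    FROM weekly
),
coef AS (
    -- slope = (n*Sxy - Sx*Sy) / (n*Sxx - Sx^2); intercept = mean(y) - slope * mean(x)
    SELECT (n * sxy - sx * sy) / NULLIF(n * sxx - sx * sx, 0) AS slope,
           sx / n AS mean_x,
           sy / n AS mean_y
    FROM sums
)
SELECT w.x AS row_no,
       w.y AS total,
       (c.mean_y - c.slope * c.mean_x) + c.slope * w.x AS trend
FROM weekly AS w
CROSS JOIN coef AS c
ORDER BY w.x;
```

For these five points the slope works out to 2.7 and the intercept to 7.3 (up to floating-point rounding). NULLIF returns a NULL trend when there is only one period instead of raising a divide-by-zero error.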
3

I was interested in the problem, and I've found that the best way to digest a complex query is to reformat it using my own style and conventions. I applied them to your solution, and the result is below. I have no idea whether this will be of any value to you...

  • There were a few bits of code that I don't think are part of MS T-SQL syntax, such as the {fn xxx} construct and the WEEK(xxx) function.
  • This code compiles, but I can't run it because I don't have a properly configured data table.
  • I made a number of coding changes that would take a lot of explaining, and I'm going to skip most of that. Add a comment if you want something explained.
  • I threw in a lot of whitespace. The difference between readable and unreadable code is often just the perception and sensibility of the viewer, and you may hate my conventions.
  • I'm not sure what the final result set should be (i.e., which columns get returned).

More notes:

  • This query will not return items entered in a week if no items were also closed in that week.
  • Weeks can be partial, i.e. not all seven days may be present (adjust @Interval to always include full weeks, but what about odd intervals?).
  • I multiplied the count(*) values by 1.0 to convert them to decimals early (this avoids integer math and integer truncation).
  • I converted it to a CTE so that the earlier formulas can be replaced by symbols in the later formulas (at which point things became much more readable).
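
The first note follows from the LEFT OUTER JOIN, which keeps only weeks present on the "closed" side. One possible way around it, shown here as a sketch rather than as part of the answer's query, is a FULL OUTER JOIN with COALESCE. The `closed` and `entered` sample sets and the `year_no`/`week_no` aliases are invented for illustration.

```sql
-- Hypothetical weekly counts: week 13 has only closed tickets, week 15 only entered tickets
WITH closed AS (
    SELECT * FROM (VALUES (2012, 13, 56), (2012, 14, 237)) AS v(yr, wk, resolved)
),
entered AS (
    SELECT * FROM (VALUES (2012, 14, 222), (2012, 15, 30)) AS v(yr, wk, opened)
)
SELECT COALESCE(c.yr, e.yr)    AS year_no,
       COALESCE(c.wk, e.wk)    AS week_no,
       COALESCE(c.resolved, 0) AS resolved,  -- 0 when nothing was closed that week
       COALESCE(e.opened, 0)   AS opened     -- 0 when nothing was entered that week
FROM closed AS c
FULL OUTER JOIN entered AS e
    ON c.yr = e.yr AND c.wk = e.wk
ORDER BY year_no, week_no;
```

With these sample rows, weeks 13, 14, and 15 all appear in the output, whereas a LEFT OUTER JOIN from `closed` would drop week 15.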

So here's what I came up with:

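-- Assumed declarations (not in the original answer): window start dates derived from @NoOfDays
DECLARE @NoOfDays int = 180;
DECLARE @FullInterval datetime = DATEADD(day, DATEDIFF(day, 0, GETDATE()) - @NoOfDays, 0);
DECLARE @HalfInterval datetime = DATEADD(day, DATEDIFF(day, 0, GETDATE()) - @NoOfDays / 2, 0);

;WITH cte as (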
select 
    c.period 
    ,resolved_half1 
    ,resolved_half2 
    ,opened_half1 
    ,opened_half2 
    ,row = row_number() over(order by c.yearClosed, c.weekClosed) 
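    -- over() totals across all weeks; without it each aggregate sees only its own one-row group
    ,y1 = ((SUM(resolved_half1) over() + SUM(opened_half1) over()) - (SUM(resolved_half2) over() + SUM(opened_half2) over()))/((count(resolved_half1) over() + count(opened_half1) over())/2.0) 
    ,y2 = ((SUM(resolved_half2) over() + SUM(opened_half2) over())/(count(resolved_half2) over() + count(opened_half2) over())) 
    ,x1 = ((count(c.period) over())/4.0) 
    ,x2 = (((count(c.period) over())/4.0) * 3) 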
from (select 
      a.yearclosed 
     ,a.weekClosed 
     ,a.resolved_half1 
     ,b.yearEntered 
     ,b.weekEntered 
     ,b.opened_half1 
     ,cast(a.yearClosed as varchar(5)) + ', ' + cast(a.weekClosed as varchar(5)) period 
     from (-- Number of items per week that closed within @Interval 
       select 
       count(distinct TicketNbr) * 1.0 resolved_half1 
       ,datepart(wk, date_closed)  weekClosed 
       ,year(date_closed)    yearClosed 
       from v_rpt_Service 
       where date_closed >= @FullInterval 
       group by 
       datepart(wk, date_closed) 
       ,year(date_closed)) a 
     left outer join (-- Number of items per week that were entered within @Interval 
          select 
          count(distinct TicketNbr) * 1.0 opened_half1 
          ,datepart(wk, date_entered)  weekEntered 
          ,year(date_entered)    yearEntered 
          from v_rpt_Service 
          where date_entered >= @FullInterval 
          group by 
          datepart(wk, date_entered) 
          ,year(date_entered)) b 
      on a.weekClosed = b.weekEntered 
      and a.yearClosed = b.yearEntered) c 
    left outer join (select 
         d.yearclosed 
         ,d.weekClosed 
         ,d.resolved_half2 
         ,e.yearEntered 
         ,e.weekEntered 
         ,e.opened_half2 
         ,cast(yearClosed as varchar(5)) + ', ' + cast(weekClosed as varchar(5)) period 
        from (select 
          count(distinct TicketNbr) * 1.0 resolved_half2 
          ,datepart(wk, date_closed)  weekClosed 
          ,year(date_closed)    yearClosed 
          from v_rpt_Service 
          where date_closed >= @HalfInterval 
          group by 
          datepart(wk, date_closed) 
          ,year(date_closed)) d 
        left outer join (select 
             count(distinct TicketNbr) * 1.0 opened_half2 
             ,datepart(wk, date_entered)  weekEntered 
             ,year(date_entered)    yearEntered 
             from v_rpt_Service 
             where date_entered >= @HalfInterval 
             group by 
              datepart(wk, date_entered) 
              ,year(date_entered)) e 
         on d.weekClosed = e.weekEntered 
         and d.yearClosed = e.yearEntered) f 
    on c.period = f.period 
group by 
    c.period 
    ,resolved_half1 
    ,resolved_half2 
    ,opened_half1 
    ,opened_half2 
    ,c.yearClosed 
    ,c.weekClosed 
) 
SELECT 
    row 
    ,Period 
    ,x1 
    ,y1 
    ,x2 
    ,y2 
    ,m = ((y1 - y2)/(x1 - x2)) 
    ,b = (y2 - (((y1 - y2)/(x1 - x2)) * x2)) 
    ,trend = ((((y1 - y2)/(x1 - x2)) * (row)) + (y2 - (((y1 - y2)/(x1 - x2)) * x2))) 
from cte 
order by row 

As an addendum, all of subquery "c" could be replaced by something like the following, and "f" by a slightly modified version. Whether performance is better or worse depends on table size, indexing, and other imponderables.

select 
    datepart(wk, date_closed) weekClosed 
    ,year(date_closed)   yearClosed 
    ,count (distinct case 
        when date_closed >= @FullInterval then TicketNbr 
        else null 
       end)   resolved_half1 
    ,count (distinct case 
        when date_entered >= @FullInterval then TicketNbr 
        else null 
       end)   opened_half1 
from v_rpt_Service 
where date_closed >= @FullInterval 
    or date_entered >= @FullInterval 
group by 
    datepart(wk, date_closed) 
    ,year(date_closed) 