How do I calculate the longest streak in SQL?

4

The GROUP BY is missing.

To select the total man-days of attendance for the whole office (all employees):

Select Count(*) from Employee where IsPresent=1

To select man-days of attendance per employee:

Select Id,Count(*) 
from Employee 
where IsPresent=1 
group by id;

But that is not good enough, because it counts the total days of attendance and not the length of continuous attendance.
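
To make the difference concrete, here is a minimal Python sketch (not from the original answer; the `attendance` list is made-up data for one employee, ordered by date) that contrasts the total count of present days with the longest consecutive run.

```python
from itertools import groupby

def total_present(days):
    """Total days present: what COUNT(*) ... WHERE IsPresent=1 gives per employee."""
    return sum(days)

def longest_streak(days):
    """Length of the longest run of consecutive 1s, with days given in date order."""
    return max((len(list(run)) for present, run in groupby(days) if present),
               default=0)

# Hypothetical attendance flags for one employee, ordered by date.
attendance = [1, 1, 0, 1, 1, 1, 0, 1]
print(total_present(attendance), longest_streak(attendance))
```

The two numbers differ as soon as there is a single absence between two runs, which is why a plain GROUP BY count cannot answer the question.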

What you need to do is build a temporary table with an extra date column, date2, which is initially set to today. The table lists every day on which an employee was absent.

create table tmpdb.absentdates as 
Select id, date, current_date as date2 
from EMPLOYEE 
where IsPresent=0 
order by id, date;

So the trick is to compute the date difference between two absence days to find the length of the run of present days between them. Now fill date2 with the next absence date for each employee. The most recent record per employee will not be updated; it keeps today's value, because there are no records with a date later than today in the database.

update tmpdb.absentdates 
set date2 = 
    (select min(a2.date) 
    from tmpdb.absentdates a2 
    where a2.id = tmpdb.absentdates.id 
    and a2.date > tmpdb.absentdates.date) 
where exists 
    (select 1 from tmpdb.absentdates a2 
    where a2.id = tmpdb.absentdates.id 
    and a2.date > tmpdb.absentdates.date)

The update above reads from the same table it is updating, which may cause the query to deadlock, so it is better to create two copies of the temporary table.

create table tmpdb.absentdatesX as 
Select id, date 
from EMPLOYEE 
where IsPresent=0 
order by id, date; 

create table tmpdb.absentdates as 
select *, current_date as date2 
from tmpdb.absentdatesX;

You also need to insert the hire date, assuming that the earliest date per employee in the database is the hire date.

insert into tmpdb.absentdates 
select e.id, min(e.date), current_date 
from EMPLOYEE e 
group by e.id

Now update date2 with the next later absence date, so that date2 - date can be computed.

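update tmpdb.absentdates 
set date2 = 
    (select min(x.date) 
    from tmpdb.absentdatesX x 
    where x.id = tmpdb.absentdates.id 
    and x.date > tmpdb.absentdates.date) 
where exists 
    (select 1 from tmpdb.absentdatesX x 
    where x.id = tmpdb.absentdates.id 
    and x.date > tmpdb.absentdates.date)

This shows the number of days each employee was continuously present: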

select id, datediff(date2, date) as continuousPresence 
from tmpdb.absentdates 
group by id, continuousPresence 
order by id, continuousPresence

But you only want the longest streak:

select id, max(datediff(date2, date)) as continuousPresence 
from tmpdb.absentdates 
group by id 
order by id

However, the above is still problematic, because datediff does not take holidays and weekends into account.

So instead we rely on the count of records as the legitimate working days.

create table tmpdb.absentCount as 
Select a.id, a.date, a.date2, count(*) as continuousPresence 
from EMPLOYEE e, tmpdb.absentdates a 
where e.id = a.id 
    and e.IsPresent = 1 
    and e.date >= a.date 
    and e.date < a.date2 
group by a.id, a.date, a.date2 
order by a.id, a.date;

Remember, whenever you use an aggregate such as count or avg, you need to GROUP BY the non-aggregated items in the select list, because those are what you are aggregating by.

Now select the maximum streak:

select id, max(continuousPresence) 
from tmpdb.absentCount 
group by id

For a listing of the dates of the longest streak:

select c.id, c.date, c.date2, c.continuousPresence 
from tmpdb.absentCount c 
where c.continuousPresence = 
    (select max(m.continuousPresence) from tmpdb.absentCount m where m.id = c.id);

There may be some errors above (T-SQL / SQL Server), but this is the general idea.
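
For anyone who wants to try the idea end to end, here is a minimal sketch using Python's built-in `sqlite3` module rather than SQL Server. The table and column names (`employee`, `is_present`), the `today` parameter and the sample rows are assumptions made for illustration, and the temporary tables of the answer are folded into common table expressions. It takes absence dates (plus the earliest record, treated as the hire date) as boundaries and counts the present records between each boundary and the next.

```python
import sqlite3

def longest_streaks(rows, today="9999-12-31"):
    """Return {employee id: longest run of present records}.

    rows: (id, ISO date string, is_present) tuples. `today` stands in for the
    answer's "today" default for date2 and is an assumption of this sketch.
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE employee (id INTEGER, date TEXT, is_present INTEGER)")
    con.executemany("INSERT INTO employee VALUES (?, ?, ?)", rows)
    query = """
        WITH boundaries AS (
            -- every absence date opens a span ...
            SELECT id, date FROM employee WHERE is_present = 0
            UNION
            -- ... and so does the earliest record (treated as the hire date)
            SELECT id, MIN(date) FROM employee GROUP BY id
        ),
        spans AS (
            -- date2 is the next absence date, or `today` if there is none
            SELECT b.id, b.date AS date1,
                   COALESCE((SELECT MIN(x.date) FROM employee x
                             WHERE x.id = b.id AND x.is_present = 0
                               AND x.date > b.date), ?) AS date2
            FROM boundaries b
        ),
        counted AS (
            -- count present records in the span instead of using a date difference
            SELECT s.id,
                   (SELECT COUNT(*) FROM employee e
                    WHERE e.id = s.id AND e.is_present = 1
                      AND e.date >= s.date1 AND e.date < s.date2) AS streak
            FROM spans s
        )
        SELECT id, MAX(streak) FROM counted GROUP BY id ORDER BY id
    """
    result = dict(con.execute(query, (today,)).fetchall())
    con.close()
    return result

sample = [
    (3, "2000-01-01", 0), (3, "2000-01-02", 1), (3, "2000-01-03", 1),
    (3, "2000-01-04", 0), (3, "2000-01-05", 1), (3, "2000-01-06", 1),
    (3, "2000-01-07", 0),
    # Friday and Monday with no weekend rows: two present records in a row.
    (9, "2000-01-07", 1), (9, "2000-01-10", 1),
]
print(longest_streaks(sample))
```

Because the streak is a record count, days with no row at all (weekends, holidays) neither extend nor break a streak in this sketch.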

Source

2010-06-15 23:58:21

+0

For some reason I couldn't create the temporary table ... it gives me an unknown object error ... so I used a SELECT INTO statement ... – Vishal

1

Try this:

select 
    e.Id, 
    e.date, 
    (select 
     max(e1.date) 
    from 
     employee e1 
    where 
     e1.Id = e.Id and 
     e1.date < e.date and 
     e1.IsPresent = 0) StreakStartDate, 
    (select 
     min(e2.date) 
    from 
     employee e2 
    where 
     e2.Id = e.Id and 
     e2.date > e.date and 
     e2.IsPresent = 0) StreakEndDate   
from 
    employee e 
where 
    e.IsPresent = 1

Then find the longest streak for each employee:

select id, max(datediff(day, StreakStartDate, StreakEndDate)) 
from (<use subquery above>) s 
group by id

I'm not entirely sure this query has the correct syntax, because I don't have a database at hand right now. Also note that the streak start and streak end columns do not contain the first and last day on which the employee was present, but the nearest dates on which they were absent. If the dates in the table are roughly evenly spaced, this does not matter; otherwise the query becomes a bit more complex, because we need to find the nearest presence dates. Those improvements would also handle the case where the longest streak is the first or the last streak.

The main idea is, for each date on which the employee was present, to find the start of the streak and the end of the streak.

For each row in the table where the employee was present, the streak start is the latest date before the current row's date on which the employee was absent.
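
A minimal sketch of this idea, run through Python's built-in `sqlite3` module instead of SQL Server. The table name `employee`, the column `is_present` and the function name are assumptions for illustration, and `julianday` replaces `datediff` because SQLite has no such function. Since both boundaries are absence dates, the sketch subtracts 1 to get the number of days in between, which is only right when there is a row for every calendar day.

```python
import sqlite3

QUERY = """
    SELECT id,
           CAST(MAX(julianday(streak_end) - julianday(streak_start) - 1) AS INTEGER)
    FROM (
        SELECT e.id,
               -- nearest absence before the current present day
               (SELECT MAX(e1.date) FROM employee e1
                WHERE e1.id = e.id AND e1.date < e.date
                  AND e1.is_present = 0) AS streak_start,
               -- nearest absence after the current present day
               (SELECT MIN(e2.date) FROM employee e2
                WHERE e2.id = e.id AND e2.date > e.date
                  AND e2.is_present = 0) AS streak_end
        FROM employee e
        WHERE e.is_present = 1
    )
    GROUP BY id
    ORDER BY id
"""

def longest_by_nearest_absence(rows):
    """rows: (id, ISO date string, is_present). Returns {id: streak or None}."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE employee (id INTEGER, date TEXT, is_present INTEGER)")
    con.executemany("INSERT INTO employee VALUES (?, ?, ?)", rows)
    result = dict(con.execute(QUERY).fetchall())
    con.close()
    return result

# Made-up rows: one per day from 2000-01-01 to 2000-01-07 for employee 5.
rows = [(5, "2000-01-0%d" % (i + 1), int(flag)) for i, flag in enumerate("0101110")]
print(longest_by_nearest_absence(rows))
```

A streak with no absence before it or after it yields NULL boundaries, so an employee who was never absent comes back as `None` here; that is the first/last-streak limitation mentioned above.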

Source

2010-06-15 23:05:45 STO

0

I did this once to determine the consecutive days on which a firefighter had been on shift for at least 15 minutes.

Your case is a bit simpler.

If you want to assume that no employee came in more than 32 times in a row, you could use a Common Table Expression. But a better approach would be to use a temporary table and a WHILE loop.

You will need a column called StartingRowID. Keep joining from your temporary table to the employeeWorkDay table to find the employee's next consecutive work day, and insert those rows back into the temporary table. When @@ROWCOUNT = 0, you have captured the longest streak.

Now aggregate by StartingRowID to get the first day of the longest streak. I'm running short on time, or I would include some sample code.
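
Since the answer has no sample code, here is a minimal sketch of the loop under my own assumptions: it uses Python's built-in `sqlite3` module instead of T-SQL, the table `work_day` and its columns are hypothetical, `start_date` plays the role of StartingRowID, and `cursor.rowcount` stands in for @@ROWCOUNT.

```python
import sqlite3

def longest_streak_iterative(rows):
    """Return {id: (first day of longest streak, length)}, growing streaks one day per pass."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE work_day (id INTEGER, date TEXT, is_present INTEGER)")
    con.executemany("INSERT INTO work_day VALUES (?, ?, ?)", rows)
    # Seed: every present day is a streak of one day starting on that day.
    con.execute(
        "CREATE TEMP TABLE streak AS "
        "SELECT id, date AS start_date, date AS last_date, 1 AS n_days "
        "FROM work_day WHERE is_present = 1"
    )
    n_days = 1
    while True:
        # Extend only the streaks added in the previous pass by one more day.
        cur = con.execute(
            "INSERT INTO streak "
            "SELECT s.id, s.start_date, w.date, s.n_days + 1 "
            "FROM streak s JOIN work_day w "
            "  ON w.id = s.id AND w.is_present = 1 "
            " AND w.date = date(s.last_date, '+1 day') "
            "WHERE s.n_days = ?",
            (n_days,),
        )
        if cur.rowcount == 0:  # nothing could be extended: the longest streak is in
            break
        n_days += 1
    best = {}
    ordered = con.execute(
        "SELECT id, start_date, n_days FROM streak "
        "ORDER BY id, n_days DESC, start_date"
    )
    for emp, start, length in ordered:
        best.setdefault(emp, (start, length))  # first row per id is the longest
    con.close()
    return best

# Made-up rows: one per day from 2000-01-01 to 2000-01-07 for employee 5.
rows = [(5, "2000-01-0%d" % (i + 1), int(flag)) for i, flag in enumerate("0101110")]
print(longest_streak_iterative(rows))
```

The loop runs once per day of the longest streak, so it does not depend on a recursion limit, at the cost of keeping every partial streak in the temporary table.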

Source

2010-06-15 23:12:38

4

EDIT Here is a SQL Server version of the query:

with LowerBound as (select second_day.EmployeeId 
     , second_day."DATE" as LowerDate 
     , row_number() over (partition by second_day.EmployeeId 
      order by second_day."DATE") as RN 
    from T second_day 
    left outer join T first_day 
     on first_day.EmployeeId = second_day.EmployeeId 
     and first_day."DATE" = dateadd(day, -1, second_day."DATE") 
     and first_day.IsPresent = 1 
    where first_day.EmployeeId is null 
    and second_day.IsPresent = 1) 
, UpperBound as (select first_day.EmployeeId 
     , first_day."DATE" as UpperDate 
     , row_number() over (partition by first_day.EmployeeId 
      order by first_day."DATE") as RN 
    from T first_day 
    left outer join T second_day 
     on first_day.EmployeeId = second_day.EmployeeId 
     and first_day."DATE" = dateadd(day, -1, second_day."DATE") 
     and second_day.IsPresent = 1 
    where second_day.EmployeeId is null 
    and first_day.IsPresent = 1) 
select LB.EmployeeID, max(datediff(day, LowerDate, UpperDate) + 1) as LongestStreak 
from LowerBound LB 
inner join UpperBound UB 
    on LB.EmployeeId = UB.EmployeeId 
    and LB.RN = UB.RN 
group by LB.EmployeeId

SQL Server version of the test data:

create table T (EmployeeId int 
    , "DATE" date not null 
    , IsPresent bit not null 
    , constraint T_PK primary key (EmployeeId, "DATE") 
) 


insert into T values (1, '2000-01-01', 1); 
insert into T values (2, '2000-01-01', 0); 
insert into T values (3, '2000-01-01', 0); 
insert into T values (3, '2000-01-02', 1); 
insert into T values (3, '2000-01-03', 1); 
insert into T values (3, '2000-01-04', 0); 
insert into T values (3, '2000-01-05', 1); 
insert into T values (3, '2000-01-06', 1); 
insert into T values (3, '2000-01-07', 0); 
insert into T values (4, '2000-01-01', 0); 
insert into T values (4, '2000-01-02', 1); 
insert into T values (4, '2000-01-03', 1); 
insert into T values (4, '2000-01-04', 1); 
insert into T values (4, '2000-01-05', 1); 
insert into T values (4, '2000-01-06', 1); 
insert into T values (4, '2000-01-07', 0); 
insert into T values (5, '2000-01-01', 0); 
insert into T values (5, '2000-01-02', 1); 
insert into T values (5, '2000-01-03', 0); 
insert into T values (5, '2000-01-04', 1); 
insert into T values (5, '2000-01-05', 1); 
insert into T values (5, '2000-01-06', 1); 
insert into T values (5, '2000-01-07', 0);

Sorry, this one is written in Oracle, so substitute the appropriate SQL Server date arithmetic.

Assumptions:

Date is either a Date value or a DateTime with a time component of 00:00:00.
The primary key is (EmployeeId, Date)
All fields are not null

If a date is missing for an employee, they were not present. (This is used to handle the beginning and end of the data series, but it also means that missing dates in the middle will break streaks. That could be a problem, depending on requirements.)

with LowerBound as (select second_day.EmployeeId 
     , second_day."DATE" as LowerDate 
     , row_number() over (partition by second_day.EmployeeId 
      order by second_day."DATE") as RN 
    from T second_day 
    left outer join T first_day 
     on first_day.EmployeeId = second_day.EmployeeId 
     and first_day."DATE" = second_day."DATE" - 1 
     and first_day.IsPresent = 1 
    where first_day.EmployeeId is null 
    and second_day.IsPresent = 1) 
, UpperBound as (select first_day.EmployeeId 
     , first_day."DATE" as UpperDate 
     , row_number() over (partition by first_day.EmployeeId 
      order by first_day."DATE") as RN 
    from T first_day 
    left outer join T second_day 
     on first_day.EmployeeId = second_day.EmployeeId 
     and first_day."DATE" = second_day."DATE" - 1 
     and second_day.IsPresent = 1 
    where second_day.EmployeeId is null 
    and first_day.IsPresent = 1) 
select LB.EmployeeID, max(UpperDate - LowerDate + 1) as LongestStreak 
from LowerBound LB 
inner join UpperBound UB 
    on LB.EmployeeId = UB.EmployeeId 
    and LB.RN = UB.RN 
group by LB.EmployeeId

Test data:

create table T (EmployeeId number(38) 
     , "DATE" date not null check ("DATE" = trunc("DATE")) 
     , IsPresent number not null check (IsPresent in (0, 1)) 
     , constraint T_PK primary key (EmployeeId, "DATE") 
    ) 
    /

    insert into T values (1, to_date('2000-01-01', 'YYYY-MM-DD'), 1); 
    insert into T values (2, to_date('2000-01-01', 'YYYY-MM-DD'), 0); 
    insert into T values (3, to_date('2000-01-01', 'YYYY-MM-DD'), 0); 
    insert into T values (3, to_date('2000-01-02', 'YYYY-MM-DD'), 1); 
    insert into T values (3, to_date('2000-01-03', 'YYYY-MM-DD'), 1); 
    insert into T values (3, to_date('2000-01-04', 'YYYY-MM-DD'), 0); 
    insert into T values (3, to_date('2000-01-05', 'YYYY-MM-DD'), 1); 
    insert into T values (3, to_date('2000-01-06', 'YYYY-MM-DD'), 1); 
    insert into T values (3, to_date('2000-01-07', 'YYYY-MM-DD'), 0); 
    insert into T values (4, to_date('2000-01-01', 'YYYY-MM-DD'), 0); 
    insert into T values (4, to_date('2000-01-02', 'YYYY-MM-DD'), 1); 
    insert into T values (4, to_date('2000-01-03', 'YYYY-MM-DD'), 1); 
    insert into T values (4, to_date('2000-01-04', 'YYYY-MM-DD'), 1); 
    insert into T values (4, to_date('2000-01-05', 'YYYY-MM-DD'), 1); 
    insert into T values (4, to_date('2000-01-06', 'YYYY-MM-DD'), 1); 
    insert into T values (4, to_date('2000-01-07', 'YYYY-MM-DD'), 0); 
    insert into T values (5, to_date('2000-01-01', 'YYYY-MM-DD'), 0); 
    insert into T values (5, to_date('2000-01-02', 'YYYY-MM-DD'), 1); 
    insert into T values (5, to_date('2000-01-03', 'YYYY-MM-DD'), 0); 
    insert into T values (5, to_date('2000-01-04', 'YYYY-MM-DD'), 1); 
    insert into T values (5, to_date('2000-01-05', 'YYYY-MM-DD'), 1); 
    insert into T values (5, to_date('2000-01-06', 'YYYY-MM-DD'), 1); 
    insert into T values (5, to_date('2000-01-07', 'YYYY-MM-DD'), 0);
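
To check the logic without a database, here is a small Python sketch that traces what the query does with the test data above. The function name and the compact `flags` encoding of the 23 rows are my own: a lower bound is a present day whose previous calendar day is not a present day, an upper bound is a present day whose next calendar day is not a present day, and the two sorted lists are paired by position, just like the join on RN.

```python
from datetime import date, timedelta

def longest_streaks(rows):
    """rows: (employee_id, date, is_present). Returns {employee_id: longest streak}."""
    present = {}
    for emp, day, is_present in rows:
        if is_present:
            present.setdefault(emp, set()).add(day)
    one_day = timedelta(days=1)
    result = {}
    for emp, days in present.items():
        # LowerBound: present today, not present (or no row) yesterday.
        lower = sorted(d for d in days if d - one_day not in days)
        # UpperBound: present today, not present (or no row) tomorrow.
        upper = sorted(d for d in days if d + one_day not in days)
        # The n-th lower bound pairs with the n-th upper bound (join on RN).
        result[emp] = max((u - l).days + 1 for l, u in zip(lower, upper))
    return result

# The same test data, one character per day starting at 2000-01-01.
flags = {1: "1", 2: "0", 3: "0110110", 4: "0111110", 5: "0101110"}
rows = [(emp, date(2000, 1, 1) + timedelta(days=i), int(f))
        for emp, days in flags.items() for i, f in enumerate(days)]
print(longest_streaks(rows))
```

Employee 2 has no present day at all, so, as with the inner join in the query, that employee does not appear in the result.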

Source

2010-06-15 23:17:54

1

Here is an alternative version, to handle missing days differently. Say you only record a row for work days, and being at work Monday to Friday one week and Monday to Friday the following week counts as ten consecutive days. This query assumes that missing dates in the middle of a series of rows are non-working days.

with LowerBound as (select second_day.EmployeeId 
     , second_day."DATE" as LowerDate 
     , row_number() over (partition by second_day.EmployeeId 
      order by second_day."DATE") as RN 
    from T second_day 
    left outer join T first_day 
     on first_day.EmployeeId = second_day.EmployeeId 
     and first_day."DATE" = dateadd(day, -1, second_day."DATE") 
     and first_day.IsPresent = 1 
    where first_day.EmployeeId is null 
    and second_day.IsPresent = 1) 
, UpperBound as (select first_day.EmployeeId 
     , first_day."DATE" as UpperDate 
     , row_number() over (partition by first_day.EmployeeId 
      order by first_day."DATE") as RN 
    from T first_day 
    left outer join T second_day 
     on first_day.EmployeeId = second_day.EmployeeId 
     and first_day."DATE" = dateadd(day, -1, second_day."DATE") 
     and second_day.IsPresent = 1 
    where second_day.EmployeeId is null 
    and first_day.IsPresent = 1) 
select LB.EmployeeID, max(datediff(day, LowerDate, UpperDate) + 1) as LongestStreak 
from LowerBound LB 
inner join UpperBound UB 
    on LB.EmployeeId = UB.EmployeeId 
    and LB.RN = UB.RN 
group by LB.EmployeeId 

go 

with NumberedRows as (select EmployeeId 
     , "DATE" 
     , IsPresent 
     , row_number() over (partition by EmployeeId 
      order by "DATE") as RN 
--  , min("DATE") over (partition by EmployeeId, IsPresent) as MinDate 
--  , max("DATE") over (partition by EmployeeId, IsPresent) as MaxDate 
    from T) 
, LowerBound as (select SecondRow.EmployeeId 
     , SecondRow.RN 
     , row_number() over (partition by SecondRow.EmployeeId 
      order by SecondRow.RN) as LowerBoundRN 
    from NumberedRows SecondRow 
    left outer join NumberedRows FirstRow 
     on FirstRow.IsPresent = 1 
     and FirstRow.EmployeeId = SecondRow.EmployeeId 
     and FirstRow.RN + 1 = SecondRow.RN 
    where FirstRow.EmployeeId is null 
    and SecondRow.IsPresent = 1) 
, UpperBound as (select FirstRow.EmployeeId 
     , FirstRow.RN 
     , row_number() over (partition by FirstRow.EmployeeId 
      order by FirstRow.RN) as UpperBoundRN 
    from NumberedRows FirstRow 
    left outer join NumberedRows SecondRow 
     on SecondRow.IsPresent = 1 
     and FirstRow.EmployeeId = SecondRow.EmployeeId 
     and FirstRow.RN + 1 = SecondRow.RN 
    where SecondRow.EmployeeId is null 
    and FirstRow.IsPresent = 1) 
select LB.EmployeeId, max(UB.RN - LB.RN + 1) 
from LowerBound LB 
inner join UpperBound UB 
    on LB.EmployeeId = UB.EmployeeId 
    and LB.LowerBoundRN = UB.UpperBoundRN 
group by LB.EmployeeId
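
The effect of switching from dates to row numbers can be seen in a small Python sketch (the function name and the two-week sample are made up for illustration): rows are numbered per employee in date order, and streak bounds are found on neighbouring rows rather than neighbouring calendar days.

```python
def longest_streak_by_row_number(rows):
    """rows: (employee_id, ISO date string, is_present). Returns {employee_id: streak}."""
    by_emp = {}
    for emp, day, is_present in sorted(rows):  # per employee, in date order
        by_emp.setdefault(emp, []).append(bool(is_present))
    result = {}
    for emp, flags in by_emp.items():
        n = len(flags)
        # Lower bound: present row whose previous row is missing or absent.
        lower = [i for i in range(n) if flags[i] and (i == 0 or not flags[i - 1])]
        # Upper bound: present row whose next row is missing or absent.
        upper = [i for i in range(n) if flags[i] and (i == n - 1 or not flags[i + 1])]
        if lower:
            result[emp] = max(u - l + 1 for l, u in zip(lower, upper))
    return result

# Two full work weeks (Mon-Fri) with no rows for the weekend in between.
two_weeks = [(1, "2000-01-%02d" % d, 1) for d in (3, 4, 5, 6, 7, 10, 11, 12, 13, 14)]
print(longest_streak_by_row_number(two_weeks))
```

With the date-based version the weekend would split this into two streaks of five; counting rows gives ten, which is the behaviour described above.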

Source

2010-06-16 04:59:01
