How To Solve A Gap-and-islands Problem With A High Volume Set Of Data In Impala

I have a Type 2 dimension in an Impala table with ~500M rows and 102 columns (C1, C2, ..., C8, ..., C100, Eff_DT, EXP_DT). I need to select only the rows that have distinct

Solution 1:

In all likelihood, the previous solution will work if you modify it to partition and group by the id column:

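select id, c1, c2, min(eff_dt) as eff_dt, max(exp_dt) as exp_dt
from (select t.*,
             -- position of the row in the id's full history
             row_number() over (partition by id order by eff_dt) as seqnum,
             -- position of the row among the id's rows with the same c1, c2
             row_number() over (partition by id, c1, c2 order by eff_dt) as seqnum_1
      from t
     ) t
-- seqnum - seqnum_1 stays constant within each run of consecutive identical rows
group by id, c1, c2, (seqnum - seqnum_1);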
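To see what the seqnum - seqnum_1 difference does, here is a small sketch that runs the same query against SQLite (not Impala) from Python on a made-up history. The table name t, its columns, and the rows are hypothetical; SQLite needs version 3.25 or later for window functions, and the final ORDER BY is added only to make the output deterministic.

```python
import sqlite3

# Same gap-and-islands query as above, executed by SQLite instead of Impala.
QUERY = """
select id, c1, c2, min(eff_dt) as eff_dt, max(exp_dt) as exp_dt
from (select t.*,
             row_number() over (partition by id order by eff_dt) as seqnum,
             row_number() over (partition by id, c1, c2 order by eff_dt) as seqnum_1
      from t
     ) t
group by id, c1, c2, (seqnum - seqnum_1)
order by id, eff_dt
"""

# Hypothetical history: id 1 changes c1 from 'A' to 'B' and back to 'A';
# id 2 has two consecutive versions with identical attributes.
ROWS = [
    (1, "A", "X", "2020-01-01", "2020-01-31"),
    (1, "A", "X", "2020-02-01", "2020-02-29"),
    (1, "B", "X", "2020-03-01", "2020-03-31"),
    (1, "A", "X", "2020-04-01", "2020-04-30"),
    (1, "A", "X", "2020-05-01", "9999-12-31"),
    (2, "A", "Y", "2020-01-01", "2020-06-30"),
    (2, "A", "Y", "2020-07-01", "9999-12-31"),
]


def collapse_islands(rows):
    """Load rows into an in-memory table t and return the collapsed versions."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "create table t (id integer, c1 text, c2 text, eff_dt text, exp_dt text)"
        )
        conn.executemany("insert into t values (?, ?, ?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in collapse_islands(ROWS):
        print(row)
```

For id 1, the two leading 'A' rows get difference 0, the 'B' row gets 2, and the two trailing 'A' rows get 1, so the later 'A' run is kept separate from the earlier one even though c1 and c2 match. Grouping by the difference together with c1 and c2 is what keeps each island distinct.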

You should be able to extend this to as many columns as you wish by adding them to the select list, to the second PARTITION BY, and to the GROUP BY.
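With 100 tracked columns, typing the list out three times is error-prone, so one option is to generate the query text. The helper below, `build_collapse_query`, is a hypothetical name; it assumes the key column is `id`, the dates are `eff_dt`/`exp_dt`, and the tracked columns are named `c1` to `c100` as in the question. It only builds a string and does not talk to Impala.

```python
def build_collapse_query(columns, key="id", table="t", start="eff_dt", end="exp_dt"):
    """Return the gap-and-islands query for an arbitrary list of tracked columns.

    All parameter defaults are assumptions about the table layout.
    """
    if not columns:
        raise ValueError("need at least one tracked column")
    cols = ", ".join(columns)
    return (
        f"select {key}, {cols}, min({start}) as {start}, max({end}) as {end}\n"
        f"from (select src.*,\n"
        f"             row_number() over (partition by {key} order by {start}) as seqnum,\n"
        f"             row_number() over (partition by {key}, {cols} order by {start}) as seqnum_1\n"
        f"      from {table} src\n"
        f"     ) islands\n"
        f"group by {key}, {cols}, (seqnum - seqnum_1)"
    )


if __name__ == "__main__":
    # c1 .. c100, matching the column naming in the question
    tracked = [f"c{i}" for i in range(1, 101)]
    print(build_collapse_query(tracked))
```

Each tracked column appears in exactly three places (select list, second PARTITION BY, GROUP BY), which is why generating the list once and reusing it avoids mismatches between them.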
