How To Solve A Gap-and-islands Problem With A High Volume Set Of Data In Impala
I have a Type 2 dimension in an Impala table with ~500M rows and 102 columns (C1, C2, ..., C8, ..., C100, Eff_DT, EXP_DT). I need to select only the rows that have distinct
Solution 1:
In all likelihood, the previous solution will work when modified for the id column:
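select id, c1, c2, min(eff_dt) as eff_dt, max(exp_dt) as exp_dt
from (select t.*,
             -- position of the row within its id
             row_number() over (partition by id order by eff_dt) as seqnum,
             -- position of the row among rows with the same id and column values
             row_number() over (partition by id, c1, c2 order by eff_dt) as seqnum_1
      from t
     ) t
-- the difference is constant across each run of consecutive rows with identical values
group by id, c1, c2, (seqnum - seqnum_1);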
You should be able to expand the number of columns as you wish: add each column you want compared to both the second partition by clause and the group by list.
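To see why the difference of the two row numbers identifies each island, here is a minimal sketch that runs the same query shape on a handful of rows. It uses Python's built-in sqlite3 module as a stand-in for Impala (an assumption made purely so the example can be run locally; it needs SQLite 3.25 or later for window functions). The sample rows, the function name collapse_islands, and the output aliases first_eff_dt and last_exp_dt are hypothetical, not part of the original question.

```python
import sqlite3

# Hypothetical sample rows: (id, c1, c2, eff_dt, exp_dt).
# For id 1 the values go a/x, a/x, b/x, a/x, so the two a/x runs must stay separate.
ROWS = [
    (1, "a", "x", "2020-01-01", "2020-01-31"),
    (1, "a", "x", "2020-02-01", "2020-02-29"),
    (1, "b", "x", "2020-03-01", "2020-03-31"),
    (1, "a", "x", "2020-04-01", "2020-04-30"),
    (2, "a", "y", "2020-01-01", "2020-06-30"),
]

# Same structure as the query above; only the aliases differ.
ISLANDS_SQL = """
select id, c1, c2,
       min(eff_dt) as first_eff_dt,
       max(exp_dt) as last_exp_dt
from (select t.*,
             row_number() over (partition by id order by eff_dt) as seqnum,
             row_number() over (partition by id, c1, c2 order by eff_dt) as seqnum_1
      from t
     ) s
group by id, c1, c2, (seqnum - seqnum_1)
order by id, first_eff_dt
"""


def collapse_islands(rows):
    """Load rows into an in-memory table t and return one row per island."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "create table t (id integer, c1 text, c2 text, eff_dt text, exp_dt text)"
        )
        conn.executemany("insert into t values (?, ?, ?, ?, ?)", rows)
        return conn.execute(ISLANDS_SQL).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in collapse_islands(ROWS):
        print(row)
```

For id 1, seqnum runs 1, 2, 3, 4, while seqnum_1 is 1, 2 for the first a/x run, 1 for b/x, and 3 for the last a/x row. The differences are therefore 0, 0, 2, 1, which keeps the two a/x runs in separate groups. Note that the technique relies only on row order: it does not check whether one row's exp_dt actually meets the next row's eff_dt, so a date gap between two otherwise identical consecutive rows would still be merged.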