At the 2007 BI Conference (great event, btw – had a blast), I was asked the question “How would you compute the number of days before the current period that someone spent 20% of the value they had spent until that day.”

I gave, I thought at the time, a reasonable answer. I suggested doing a nonempty on the leaves of the time hierarchy, computing the running sum for each day from the beginning of time until the current day (or the last day in the current period if the current member wasn’t at the day level), filtering the result so that the running sum exceeded 20% of the total and then computing the number of days between the first day in the filtered result and the current day.

Then I returned to my office and tried it myself – it was hard. If you’re an MDX expert, go try it yourself before you read on.  I learned a new trick while doing it that might help me other unrelated problems.

Anyway, computing a running sum itself is pretty easy. I’m using Adventure Works (AS2005 has been out for a while – time to wean myself off of Foodmart) :

 

with member measures.rs as

 

sum([date].calendar.currentmember.level.members(0):[date].calendar.currentmember, measures.[internet sales amount])

 

select [Date].[Calendar].[Month].members on 0,

[Customer].[Country].[Country].members on 1

from [Adventure Works]

where

 measures.rs

And this takes no time at all (if you don’t have As2005 sp2, please install – we made some optimizations in this area).

So far, so good. Now I want to get that result:

 

 

with

member measures.TargetPercentage as 0.20

member measures.rs as sum([date].calendar.currentmember.level.members(0):[date].calendar.currentmember, measures.[internet sales amount])

member measures.DayCount as

 

rank(

      filter(

            {[date].calendar.currentmember.level.members(0):[date].calendar.currentmember} as AllDays

            , measures.rs > measures.TargetPercentage * sum(AllDays, measures.[internet sales amount])

            ).item(0).item(0)

            , {[date].calendar.currentmember:[date].calendar.currentmember.level.members(0)}

)

 

select

      [Customer].[Country].[Country].members on 0

from [Adventure Works]

where

(

      measures.DayCount

      ,[Date].[Calendar].[Date].&[402]

)

 

 

This works just fine. There is one interesting thing here – the use of a set alias “AllDay”. The filter function iterates over the set ({[date].calendar.currentmember.level.members(0): [date].calendar.currentmember}) and changes the current member in the calendar hierarchy. To compare it to the original current member (the one in the where clause)  I need to create the set alias so I can compare the running sum to each member in the filter set to the original running sum.

And I’m using the rank function to find where the first member in the filtered result is compared to the current member.

 

It’s still not quite right though.  If the current member in the calendar hierarchy is not at the date level, it will not return me the number of days. To fix this, I take the descendants of the current member at leaves:

 

with

member measures.TargetPercentage as 0.20

member measures.rs as sum([date].calendar.currentmember.level.members(0):[date].calendar.currentmember, measures.[internet sales amount])

member measures.DayCount as

 

rank(

      filter(

            {[date].calendar.[date].members(0): tail(descendants([date].calendar.currentmember,,leaves),1).item(0)} as AllDays

            , measures.rs > measures.TargetPercentage * sum(AllDays, measures.[internet sales amount])

            ).item(0).item(0)

            , tail(descendants([date].calendar.currentmember,,leaves),1).item(0): {[date].calendar.[date].members(0)}

)

 

select

      [Customer].[Country].[Country].members on 0

from [Adventure Works]

where

(

      measures.DayCount

      ,[Date].[Calendar].[Date].&[402]

)

 

I thought I was done – performance on my laptop for this query was pretty good. But when I tried a non-trivial number of values, performance was pretty bad. It took 42 seconds for the 48 months and the 6 countries in Adventure Works. The problem is I’m computing the running sum so many times – as many times as there are days between the first day and the current day. Considering that, I’m impressed with how fast the response was but it’s still too slow.

I got it down to 6 seconds. But this is where things get tricky.

The first thing was to reduce the number of running sums by exploiting the fact that the running sum of sales today is equal to the sum of days in the current month + the sum of months in the current quarter + etc, etc, etc. In MDX terms, the sum over this:

union(

      {[Date].[Calendar].currentmember},

      generate(

            ascendants([Date].[Calendar].currentmember)

                  ,iif(

                  [Date].[Calendar].currentmember is [Date].[Calendar].currentmember.firstsibling

                  ,{}

                  ,[Date].[Calendar].currentmember.firstsibling : [Date].[Calendar].currentmember.prevmember

            )

      )

,ALL)

The next approach was to filter over this set to find the first member in time where the condition was satisfied and then find its first leaf descendant where the condition was satisfied:

with

member measures.myRatio as 0.20

member measures.rs as

  sum({[Date].[Calendar].currentmember.level.members(0):[Date].[Calendar].currentmember},[Measures].[Internet Sales Amount])

 

member measures.DayCount as

rank(

      filter(    

            descendants(           

                  hierarchize(

                        filter(

                              union(

                                    {[Date].[Calendar].currentmember},

                                    generate(

                                          ascendants([Date].[Calendar].currentmember)

                                                ,iif(

                                                [Date].[Calendar].currentmember is [Date].[Calendar].currentmember.firstsibling

                                                ,{}

                                                ,[Date].[Calendar].currentmember.firstsibling : [Date].[Calendar].currentmember.prevmember

                                          )

                                    )

                              ,ALL)

                          as s

                          ,measures.rs > measures.myRatio * sum(s,[Measures].[Internet Sales Amount])

                        )

                  ).item(0).item(0)

            ,,leaves)

      ,measures.rs > measures.myRatio * sum(s,[Measures].[Internet Sales Amount])

      ).item(0)

, {tail( descendants([Date].[Calendar].currentmember,,leaves),1).item(0).item(0) : [Date].[Calendar].[Date].members(0)}

)

 

select

[Date].[Calendar].[Month].members on 0,

[customer Geography].[Country].members on 1

from [Adventure Works]

where (measures.DayCount)

 

This works just fine – and query time went down from 42 seconds to 9 seconds! Not bad.

But there’s something here I found unappealing. I dislike that once that ascendant member is found, the running sums are recalculated for each leaf descendent. For example, if that ascendant member is a year member, I’ll do those calculations 365 times. I wanted to find a way to walk down the tree the same way the generate(ascendants(…)) technique was used to traverse up.

And this is where I spent hours trying various things. I kept coming up to the same problem that I overwrote the current member in the time hierarchy and lost track of where I was. It was then a colleague suggested using set aliases.

 

Some history – set aliases are intended to be used in the context in which they are defined. IN other works, when you do filter( <exp> as <alias>,  <condition>), the alias should only be used in the condition and not outside the filter function. And that’s the way it was first implemented in AS2005. But it turned out there was a problem because AS2000 allowed aliases to be used outside. And some clever developers  exploited that. So to support backwards compatibility, we relaxed the restriction.  Aliases can be used in the expression where they’re defined outside the filter at the same point and below in the query execution tree. For example, this works:

with member measures.x as

sum(

      filter([Customer].[Country].[Country].members as myAlias, true),

      [Measures].[Internet Sales Amount]/count(myAlias)

)

 

select measures.x on 0 from [Adventure Works

But this doesn’t:

with member measures.x as

sum(

      filter([Customer].[Country].[Country].members as myAlias, true),

      [Measures].[Internet Sales Amount]

)

/

count(myAlias)

select measures.x on 0 from [Adventure Works]

So the next approach was to exploit this “feature” and create set aliases using the filter function. The engine is smart enough to see the condition always evaluates to true, so it’s the filter is a performance no-op but leaves the alias intact so I can use it. Here’s the result:

with

member measures.myRatio as 0.20

member measures.rs as

  sum({[Date].[Calendar].currentmember.level.members(0):[Date].[Calendar].currentmember},[Measures].[Internet Sales Amount])

 

member measures.DayCount as

rank(

      iif(isleaf(                                                

        filter(

            iif(isleaf(                                          

              filter(

                  iif(isleaf(                                    

                     filter(

                        iif(isleaf(

                              filter(                                  

                              hierarchize(

                                    filter(

                                      union(

                                      {[Date].[Calendar].currentmember},

                                      generate(

                                           ascendants([Date].[Calendar].currentmember)

                                           ,iif(

                                                [Date].[Calendar].currentmember is [Date].[Calendar].currentmember.firstsibling

                                                ,{}

                                                ,[Date].[Calendar].currentmember.firstsibling : [Date].[Calendar].currentmember.prevmember

                                           )

                                      ),ALL) as s

                                      ,measures.rs > measures.myRatio * sum(s,[Measures].[Internet Sales Amount])

                                    )

                              ).item(0).item(0)

                              as t0, true).item(0)

                              )

                              , t0.item(0)

                              , filter(t0.item(0).children, measures.rs > measures.myRatio * sum(s,[Measures].[Internet Sales Amount])).item(0).item(0)

                         )

                        as t1, true).item(0)

                        )

                        , t1.item(0)

                        , filter(t1.item(0).children, measures.rs > measures.myRatio * sum(s,[Measures].[Internet Sales Amount])).item(0).item(0)

                  )

                  as t2, true).item(0)

                  )

                  , t2.item(0)

                  , filter(t2.item(0).children, measures.rs > measures.myRatio * sum(s,[Measures].[Internet Sales Amount])).item(0).item(0)

            )

            as t3, true).item(0)

            )

            , t3.item(0)

            , filter(t3.item(0).children, measures.rs > measures.myRatio * sum(s,[Measures].[Internet Sales Amount])).item(0).item(0)

      )

, {tail( descendants([Date].[Calendar].currentmember,,leaves),1).item(0).item(0) : [Date].[Calendar].[Date].members(0)}

)

 

select

[Date].[Calendar].[Month].members on 0,

[customer Geography].[Country].members on 1

from [Adventure Works]

where (measures.DayCount)

 

This was my final expression. 6 seconds! I find that member in the hierarchy that first matches the condition. Then the nested IIF(isleaf(…)) functions check whether the member is at the leaf and if not, look among the child membes until I’m guaranteed to be at leaf (as many nested IIFs as needed to guarantee to get to leaf given that I know how many levels in the hierarchy. Kind of a hack, but I didn’t find a way to cleverly recurse down.)

 

If I had to put an expression like this in a cube, I’d likely use the second approach and not the fastest – it’s so complicated to be virtually unmaintainable. But perhaps you can find a use for set aliases to solve your own problems.