Smart and fast way to compute triple summation?

Illustration
genny_smith - 2022-05-04T13:31:07+00:00
Question: Smart and fast way to compute triple summation?

Hi all,   I need to compute the following triple summation for different values of n for all s.   Also, ~p, l_s and q_s1, q_s2 and q_s3 are known and π is the multinomial distribution. It's quite easy to do using for-loops (which I did below) but the execution takes forever as n increases. For comparison when n=10 it takes 345 seconds. I need to work with n=10000.   Is there a smart way to speed up things?   clear clc load data.mat tic n = 10; l_s = height( segments ); % # of segments L = sum( segments.NewLength ); % total legnth of Segments expSlength = zeros(l_s,1); meanPass = median( barriers.prePass ); qs1 = segments.NewLength ./ L; qs2 = segments.segDis2Mth ./ L; qs3 = (L - segments.NewLength - segments.segDis2Mth) ./ L; for s = 1 : l_s for k = 0 : n innersum = 0; for t = 0 : n-k B = 0; for r = 0 : k A = segments.NewLength(s) / (k + 1) * meanPass^r; B = B + A; end PROB = [qs1(s), qs2(s), qs3(s)]; X = [ k, t, n-k-t ]; p_stk = mnpdf(X, PROB); innersum = innersum + p_stk * meanPass^t * B; end expSlength(s) = expSlength(s) + innersum; end end toc  

Expert Answer

Profile picture of Kshitij Singh Kshitij Singh answered . 2025-11-20

PART 4: Swapped loops:

 

tic
n   = 50; 
l_s = numel(segments.batNetID);   % # of segments
L   = sum(segments.NewLength);    % total legnth of Segments
expSlength = zeros(l_s, 1);

meanPass = median( barriers.prePass );
qs1 = segments.NewLength ./ L;
qs2 = segments.segDis2Mth ./ L;
qs3 = (L - segments.NewLength - segments.segDis2Mth) ./ L;

meanPass_pow  = cumprod([1, repmat(meanPass, 1, n)]);
meanPass_powC = cumsum(meanPass_pow);

gammaln2 = gammaln(1:n+1);
logPROB  = log([qs1, qs2, qs3]).';

for k = 0:n
   X = zeros(n - k + 1, 3);
   for t = 0:n-k
      X(t+1, :) = [k, t, n-k-t];
   end
   c = gammaln2(n + 1) - sum(gammaln2(X + 1), 2);

     % VERSION 1: full inner loop
     % for s = 1:l_s
     %   B = meanPass_powC(k + 1) * segments.NewLength(s) / (k + 1);      
     %   p_stk = exp(c + X * logPROB(:, s));
     %   
     %   innersum = 0;
     %   for t = 0 : n-k
     %      innersum = innersum + p_stk(t+1) * meanPass_pow(t+1);
     %   end
     %   expSlength(s) = expSlength(s) + innersum * B;
     % end

     % VERSION 2: Partially vectorized
     % p_stk = exp(c + X * logPROB);
     % B = segments.NewLength * (meanPass_powC(k + 1) / (k + 1));
     % for s = 1:l_s
     %    innersum = 0;
     %    for t = 0 : n-k
     %       innersum = innersum + p_stk(t+1, s) * meanPass_pow(t+1);
     %    end
     %    expSlength(s) = expSlength(s) + innersum * B(s);
     % end

     % VERSION 3: fully vectorized
     p_stk      = exp(c + X * logPROB);
     B          = segments.NewLength * (meanPass_powC(k + 1) / (k + 1));
     expSlength = expSlength + (meanPass_pow(1:n-k+1) * p_stk).' .* B;  
  end
  toc

 n = 50;
 ARRAYFUN:   8.2 sec
 VERSION 1: 19.5 sec
 VERSION 2:  6.2 sec
 VERSION 3:  5.0 sec
It is getting near to be solvable without pain. Try a PARFOR (I do not have the Parallel Processing Toolbox) in addition.
 
I'm astonished that the 50 lines of code in mnpdf() can be replaced by the fully vectorized exp(c + X * logPROB).
 
[EDITED] For n=10000 a temporary array of the size 6.9GB is required for p_stk, while the other temporary arrays are vectors of 735kB only. This might run successfully on a 16GB machine.


Not satisfied with the answer ?? ASK NOW

Get a Free Consultation or a Sample Assignment Review!