Excel: Counting Unique Values - why CSE?


Hi all. I was wondering if anyone could explain why, when using the standard FREQUENCY/MATCH construction for counting unique values within a range, in almost every single case is this done using a CSE formula, even though the non-CSE equivalent with SUMPRODUCT is a perfectly valid one? For example, to calculate the number of unique entries in the range A1:A10 (I will assume that there are no blanks in this range, and also that the entries are not such that they merit any prior coercion, e.g. with "~"), we almost always see: =SUM(IF(FREQUENCY(MATCH(A1:A10,A1:A10,0),ROW(A1:A10)-ROW(A1)+1),1)) with CSE. But why? Why not: =SUMPRODUCT(N(FREQUENCY(MATCH(A1:A10,A1:A10,0),ROW(A1:A10)-ROW(A1)+1)>0)) ? (Or some other suitable coercer - double-unary, addition of zero, etc. - in place of my use of the N function, if you prefer.) After all, in almost all other cases, the general consensus seems to be that we only resort to a CSE formula when absolutely necessary, i.e. if there does not exist a non-CSE set-up. But here that is clearly not the case. And yet, as I said, practically every single example that I've seen (and that amounts to several hundred, I would imagine) uses the CSE construction, not the non-CSE one. Is this simply an example of habit? Someone posted the CSE version one day (perhaps not realising that it was achievable using simply SUMPRODUCT) and people simply caught on, perhaps also not realising that it might be translatable into a non-CSE equivalent? Or is there a more logical explanation? Is it the case, for example, that, for whatever reasons, the CSE version is actually preferable? Does it, for example, offer better performance in terms of calculation speed and/or efficiency? I have not tested this hypothesis, but can this really be the case? If so, does it apply to all cases of CSE vs SUMPRODUCT? Or is there something in particular about the combination of FREQUENCY and MATCH which lends itself better to a CSE construction than to a non-CSE one? Again, I cannot imagine how this could be possible. And if it's not particular to this construction, but rather true in general - that this CSE construction with IF(...,1) performs better than SUMPRODUCT - then shouldn't we be using it in all cases? So instead of: =SUMPRODUCT(N(A1:A10="A"),N(B1:B10="B")) we should all be using the CSE: =SUM(IF(A1:A10="A",IF(B1:B10="B",1))) Apologies for the rather esoteric post, but this has been on my mind for a while. I know of no other situation in which a CSE formula is preferred to such an extent over a non-CSE one, and I cannot for the life of me think why this might be the case. Regards


This question generated 12 answers. To proceed to the answers, click here.

This thread is current as of June 19, 2015.


For more resources for Microsoft Excel