just in case: UDF for Exponential moving average in Pig Latin

Today I ran into the fact that Pig has no native way to calculate a moving average.

For example:

A = {(5, 1), (2, 2), (7, 3), (4, 4)}
We need to calculate the EMA of the first field, using the second field as the weight index, with alpha = 0.5. A weight index k gets the coefficient alpha^(k-1).

ema(A) = (5*1 + 2*0.5 + 7*0.25 + 4*0.125) / (1 + 0.5 + 0.25 + 0.125) = 4.4
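
The arithmetic above can be checked with a few lines of plain Python. This is only a sketch of the formula, not the UDF itself, and the variable names are illustrative:

```python
# Worked check of the example in plain Python (no Pig needed).
A = [(5, 1), (2, 2), (7, 3), (4, 4)]
alpha = 0.5

# Weight index k maps to coefficient alpha ** (k - 1): 1, 0.5, 0.25, 0.125
numer = sum(value * alpha ** (k - 1) for value, k in A)
denom = sum(alpha ** (k - 1) for _, k in A)

ema = numer / denom  # 8.25 / 1.875
print(ema)
```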

With Pig and a Python (Jython) UDF it looks like this:

REGISTER 'python_udf.py' USING jython AS myfuncs;

B = GROUP A ALL;
C = FOREACH B {
    GENERATE A as src,
            myfuncs.EMA(A, 1, 4, 0.5) as ema;
}

DUMP C;

The UDF (python_udf.py):

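@outputSchema("value:double")
def EMA(D, weight_field, wmax, alpha):
    """
    Calculates the exponential moving average of field 0 over bag D.
    Note: weights are reversed! Weight index 1 gets coefficient 1.0,
    index 2 gets alpha, index 3 gets alpha**2, and so on up to wmax.
    """
    # Precompute the coefficient for each weight index: alpha ** (w - 1)
    weights = range(1, wmax + 1)
    weights_values = {}
    wv = 1.0
    for w in weights:
        weights_values[w] = wv
        wv *= alpha
    denom = sum(weights_values.values())
    numer = 0.0
    for weight in weights:
        # Sum the values (field 0) of all tuples carrying this weight index
        numer += sum(x[0] for x in D if x[weight_field] == weight) * weights_values[weight]
    return numer / denom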

Pretty straightforward, but it works. If you know a more elegant way, please share it!
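
To try the same computation outside Pig, a sketch like the one below may help. It makes some assumptions: `ema_local` and its `value_field` parameter are hypothetical names that are not part of the UDF above, and the no-op `outputSchema` stand-in exists only so the file runs under a plain Python interpreter, where Pig does not supply that decorator.

```python
# Outside Pig the outputSchema decorator is not defined, so this sketch
# supplies a no-op stand-in (an assumption made for local testing only).
try:
    outputSchema
except NameError:
    def outputSchema(schema):
        def decorator(func):
            return func
        return decorator


@outputSchema("value:double")
def ema_local(bag, value_field, weight_field, wmax, alpha):
    # Hypothetical compact variant: coefficient for index k is alpha ** (k - 1).
    coeff = dict((k, alpha ** (k - 1)) for k in range(1, wmax + 1))
    # Tuples whose weight index falls outside 1..wmax are ignored.
    numer = sum(t[value_field] * coeff[t[weight_field]]
                for t in bag if t[weight_field] in coeff)
    return numer / sum(coeff.values())


A = [(5, 1), (2, 2), (7, 3), (4, 4)]
result = ema_local(A, 0, 1, 4, 0.5)
print(result)
```

Passing the value field explicitly avoids hard-coding field 0, at the cost of one more argument in the Pig call.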

Monday, April 28, 2014
