python - pandas DataFrame groupby like MySQL, but into a new column


    df = pd.DataFrame({'a':[11,11,22,22],'mask':[0,0,0,1],'values':np.arange(10,30,5)})
    df
        a  mask  values
    0  11     0      10
    1  11     0      15
    2  22     0      20
    3  22     0      25

Now, how can I group by a, keep the column names intact, and put the result of a custom function into a new column z:

    def calculate_df_stats(dfs):
        mask_ = list(dfs['mask'])
        mean = np.ma.array(list(dfs['values']), mask=mask_).mean()
        return mean

    df['z'] = df.groupby('a').agg(calculate_df_stats)  # does not work

and generate:

        a  mask  values     z
    0  11     0      10  12.5
    1  22     0      20  25

Or whatever replaces the values column with the masked mean.

Also, can the solution be applied to a function that works on two columns and returns its result in a new column?

Thanks!

Edit: To clarify further, let's say I have a table like this in MySQL:

    SELECT * FROM `reader_datapoint` WHERE `wavelength` = '560' LIMIT 200;

which gives me this result: http://pastebin.com/qxiawcjq

If I run this:

    SELECT *, AVG(action_value) FROM `reader_datapoint` WHERE `wavelength` = '560' GROUP BY `reader_plate_id`;

I get:

    datapoint_id  plate_id  coordinate_x  coordinate_y  res_value  wavelength  ignore  avg(action_value)
    193           1         0             0             2.1783     560         null    2.090027083333334
    481           2         0             0             1.7544     560         null    1.4695583333333333
    769           3         0             0             2.0161     560         null    1.6637885416666673

How can I replicate this behaviour in pandas? Note that the column names stay the same, the first value of each group is taken, and a new column is added.

If you want the original columns in your result, you can first calculate the grouped and aggregated dataframe. You will have to aggregate the original columns in some way; I took the first occurring value as an example:

    >>> import numpy as np
    >>> import pandas as pd
    >>> df = pd.DataFrame({'a':[11,11,22,22],'mask':[0,0,0,1],'values':np.arange(10,30,5)})
    >>>
    >>> grouped = df.groupby("a")
    >>>
    >>> result = grouped.agg('first')
    >>> result
        mask  values
    a
    11     0      10
    22     0      20

Then add a column 'z' to this result by applying your function to the groupby object 'grouped':

    >>> def calculate_df_stats(dfs):
    ...     mask_ = list(dfs['mask'])
    ...     mean = np.ma.array(list(dfs['values']), mask=mask_).mean()
    ...     return mean
    ...
    >>> result['z'] = grouped.apply(calculate_df_stats)
    >>>
    >>> result
        mask  values     z
    a
    11     0      10  12.5
    22     0      20  20.0
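For running this outside an interactive session, the same steps can be sketched as a standalone script. One small change is my own assumption rather than part of the session above: the columns the function needs are selected explicitly before calling `apply`, so the function never receives the grouping column. As far as I know, recent pandas releases have changed how `apply` treats grouping columns, and selecting the columns should sidestep that.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a': [11, 11, 22, 22],
    'mask': [0, 0, 0, 1],
    'values': np.arange(10, 30, 5),
})


def calculate_df_stats(dfs):
    """Mean of 'values', skipping rows where 'mask' is non-zero."""
    masked = np.ma.array(dfs['values'].to_numpy(), mask=dfs['mask'].to_numpy())
    return masked.mean()


grouped = df.groupby('a')

# Keep the original column names by taking the first row of each group.
result = grouped.agg('first')

# Explicit column selection (an assumption, see above): the function
# only sees 'mask' and 'values', not the grouping column 'a'.
result['z'] = grouped[['mask', 'values']].apply(calculate_df_stats)

print(result)
```

The new column is aligned on the group keys in the index, so the order in which the groups are processed does not matter.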

In your function definition you can always use more columns (just refer to them by name) to compute the returned result.
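As a minimal sketch of a function that uses two columns, here is a weighted mean per group. The `weights` column and the `weighted_mean` function are hypothetical and only serve as illustration; they are not part of the original question.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the 'weights' column is invented for this example.
df = pd.DataFrame({
    'a': [11, 11, 22, 22],
    'values': [10.0, 15.0, 20.0, 25.0],
    'weights': [1.0, 3.0, 1.0, 1.0],
})


def weighted_mean(group):
    """Combine two columns of one group into a single number."""
    return np.average(group['values'], weights=group['weights'])


grouped = df.groupby('a')

# First row of each group keeps the original column names.
result = grouped.agg('first')

# One value per group, computed from both columns, stored in a new column.
result['z'] = grouped[['values', 'weights']].apply(weighted_mean)

print(result)
```

Any function that takes the per-group dataframe and returns a scalar can be dropped in the same way.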

