Python - pandas DataFrame groupby like MySQL, but into a new column
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [11, 11, 22, 22], 'mask': [0, 0, 0, 1], 'values': np.arange(10, 30, 5)})
df
    a  mask  values
0  11     0      10
1  11     0      15
2  22     0      20
3  22     1      25
Now, how can I group by 'a', keep the column names intact, and put the result of a custom function into a new column 'z':
def calculate_df_stats(dfs):
    # masked mean of 'values', ignoring rows where 'mask' is 1
    mask_ = list(dfs['mask'])
    mean = np.ma.array(list(dfs['values']), mask=mask_).mean()
    return mean

df['z'] = df.groupby('a').agg(calculate_df_stats)  # does not work
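A quick way to see why the 'agg' call above fails: 'agg' hands a user function one column (a Series) per group, so indexing it by column name cannot work, whereas 'apply' on the grouped frame hands over each group's whole sub-DataFrame. The sketch below only records what each method passes in; the helper names record_agg_input and record_apply_input are made up for illustration, and the behaviour shown is what current pandas versions do.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [11, 11, 22, 22], 'mask': [0, 0, 0, 1],
                   'values': np.arange(10, 30, 5)})

agg_inputs = []
apply_inputs = []

def record_agg_input(x):
    # Remember the type of object agg passed in, then return any scalar
    agg_inputs.append(type(x).__name__)
    return x.iloc[0]

def record_apply_input(g):
    # Same idea for apply: g is one group's rows
    apply_inputs.append(type(g).__name__)
    return g['values'].iloc[0]

df.groupby('a').agg(record_agg_input)
df.groupby('a')[['mask', 'values']].apply(record_apply_input)
```

agg_inputs ends up containing only 'Series' and apply_inputs only 'DataFrame', which is why a function that looks up several columns by name belongs in 'apply'.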
and generate:
    a  mask  values     z
0  11     0      10  12.5
1  22     0      20  20.0
where 'z' is the masked mean of the 'values' column for each group.
And can the solution be applied to a function that works on 2 columns and returns its result in a new column?
Thanks!
Edit: to clarify further, suppose I have a table like this in MySQL:
select * from `reader_datapoint` where `wavelength` = '560' limit 200;
which gives me a result like this: http://pastebin.com/qxiawcjq
If I run this:
select *, avg(action_value) from `reader_datapoint` where `wavelength` = '560' group by `reader_plate_id`;
I get:
datapoint_id  plate_id  coordinate_x  coordinate_y  res_value  wavelength  ignore  avg(action_value)
         193         1             0             0     2.1783         560    NULL  2.090027083333334
         481         2             0             0     1.7544         560    NULL  1.4695583333333333
         769         3             0             0     2.0161         560    NULL  1.6637885416666673
How can I replicate this behaviour in pandas? Note that the column names stay the same, the first value of each group is taken, and a new column is added.
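As a rough pandas counterpart of the "select *, avg(...) ... group by" query, named aggregation (available since pandas 0.25) can produce the "first value" columns and an aggregated extra column in a single 'agg' call. This is a sketch on the small example frame from the question, with a plain mean standing in for AVG; the output column name avg_values is chosen for illustration. Note that MySQL itself does not promise the first row for non-aggregated columns: the value it returns is indeterminate, and MySQL 5.7.5 and later reject such a query under the default ONLY_FULL_GROUP_BY mode.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [11, 11, 22, 22], 'mask': [0, 0, 0, 1],
                   'values': np.arange(10, 30, 5)})

# Rough equivalent of: select *, avg(values) from df group by a
# as_index=False keeps 'a' as an ordinary column, like the SQL result set.
result = df.groupby('a', as_index=False).agg(
    mask=('mask', 'first'),         # "first" value of each non-key column
    values=('values', 'first'),
    avg_values=('values', 'mean'),  # the aggregated extra column
)
```

Each keyword names an output column and maps it to an (input column, aggregation) pair, so the original column names are kept while new ones are added alongside them.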
If you want the original columns in your result, you can first calculate the grouped and aggregated DataFrame (but you have to aggregate the original columns in some way; I took the first occurring value as an example):
>>> import numpy as np
>>> import pandas as pd
>>>
>>> df = pd.DataFrame({'a': [11, 11, 22, 22], 'mask': [0, 0, 0, 1], 'values': np.arange(10, 30, 5)})
>>>
>>> grouped = df.groupby("a")
>>>
>>> result = grouped.agg('first')
>>> result
    mask  values
a
11     0      10
22     0      20
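One caveat with agg('first'): pandas' 'first' takes the first non-null value of each column independently, so when the data contain missing values the "first row" can be stitched together from different rows. If you need the literal first row of each group, head(1) keeps whole rows. The frame below, with a NaN deliberately placed in 'mask', is a made-up example.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: the first 'mask' entry of group 11 is missing (NaN)
df = pd.DataFrame({'a': [11, 11, 22, 22],
                   'mask': [np.nan, 1, 0, 1],
                   'values': [10, 15, 20, 25]})

# 'first' picks the first non-null value of each column separately
per_column_first = df.groupby('a').agg('first')

# head(1) keeps the whole first row of each group, NaN included
first_rows = df.groupby('a').head(1).set_index('a')
```

For group 11, per_column_first reports mask 1.0 (taken from the second row), while first_rows keeps the NaN from the actual first row.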
and then add the column 'z' to the result by applying your function to the groupby result 'grouped':
>>> def calculate_df_stats(dfs):
...     mask_ = list(dfs['mask'])
...     mean = np.ma.array(list(dfs['values']), mask=mask_).mean()
...     return mean
...
>>> result['z'] = grouped.apply(calculate_df_stats)
>>>
>>> result
    mask  values     z
a
11     0      10  12.5
22     0      20  20.0
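If the masked mean is all you need, the per-group Python function can be avoided: replacing masked entries with NaN via Series.where and taking a grouped mean() (which skips NaN by default) gives the same numbers on this data. This is an alternative to the np.ma approach above, not a drop-in replacement for arbitrary functions; a group whose values are all masked simply gets NaN here.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [11, 11, 22, 22], 'mask': [0, 0, 0, 1],
                   'values': np.arange(10, 30, 5)})

result = df.groupby('a').agg('first')
# Keep values only where mask == 0; masked entries become NaN,
# and mean() ignores NaN, much like np.ma ignores masked values.
result['z'] = df['values'].where(df['mask'] == 0).groupby(df['a']).mean()
```

The grouped mean is indexed by 'a', so the assignment aligns with result's index without any extra merging.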
In the function definition you can use more columns (just access them by name) and return the result.
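For example, a function can read both 'mask' and 'values' by name and return a pd.Series; each key of that Series becomes a new column, which can then be joined onto the 'first'-aggregated frame. The function name masked_stats and the extra n_masked column are hypothetical. Selecting [['mask', 'values']] before apply keeps the grouping column out of the sub-frames, which recent pandas versions otherwise warn about.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [11, 11, 22, 22], 'mask': [0, 0, 0, 1],
                   'values': np.arange(10, 30, 5)})
grouped = df.groupby('a')

def masked_stats(g):
    # g is one group's sub-DataFrame, so any column can be read by name
    kept = g.loc[g['mask'] == 0, 'values']
    return pd.Series({'z': kept.mean(), 'n_masked': g['mask'].sum()})

# Each key of the returned Series becomes a column; join aligns on 'a'
result = grouped.agg('first').join(grouped[['mask', 'values']].apply(masked_stats))
```

Returning a Series instead of a scalar is what lets one function produce several new columns at once.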