I have data in the form:
FUND|BROKER|QTY
F1|B1|10
F1|B1|50
F1|B2|20
F1|B3|20
When I group it by FUND and BROKER, I would like to calculate each group's QTY as a percentage of the fund-level total, like so:
FUND|BROKER|QTY %|QTY EXPLANATION
F1|B1|60%|(10+50)/(10+50+20+20)
F1|B2|20%|(20)/(10+50+20+20)
F1|B3|20%|(20)/(10+50+20+20)
Or, when I group by just FUND, like so:
FUND|BROKER|QTY %|QTY EXPLANATION
F1|B1|16.66|(10)/(10+50)
F1|B1|83.33|(50)/(10+50)
F1|B2|100|(20)/(20)
F1|B3|100|(20)/(20)
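To make the expected arithmetic concrete, here is a plain-Python check (no Spark) that reproduces both tables above from the sample rows:

```python
from collections import defaultdict

# Sample rows from the question: (FUND, BROKER, QTY)
rows = [("F1", "B1", 10), ("F1", "B1", 50), ("F1", "B2", 20), ("F1", "B3", 20)]

fund_total = defaultdict(int)   # total QTY per fund
group_total = defaultdict(int)  # total QTY per (fund, broker)
for fund, broker, qty in rows:
    fund_total[fund] += qty
    group_total[(fund, broker)] += qty

# First table: each (fund, broker) total as a percentage of the fund total
pct_by_group = {k: 100 * v / fund_total[k[0]] for k, v in group_total.items()}
# {('F1', 'B1'): 60.0, ('F1', 'B2'): 20.0, ('F1', 'B3'): 20.0}

# Second table: each row's QTY as a percentage of its (fund, broker) total
pct_by_row = [100 * qty / group_total[(fund, broker)] for fund, broker, qty in rows]
# [16.66..., 83.33..., 100.0, 100.0]
```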
I would like to achieve this with Spark SQL if possible, or with DataFrame functions. I think I need window functions so I can get at the group totals, but I've not had much luck using them the right way. My attempt so far:
Dataset<Row> result = sparkSession.sql(
    "SELECT fund_short_name, broker_short_name, first(quantity) / sum(quantity) AS new_col "
  + "FROM margin_summary GROUP BY fund_short_name, broker_short_name");
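The shape I've been trying to reach uses `SUM(quantity) OVER (PARTITION BY ...)` rather than a plain `GROUP BY`. The sketch below runs the same SQL against an in-memory SQLite table (via Python's stdlib `sqlite3`, which supports the same window syntax since SQLite 3.25) purely as a stand-in for the Spark table; in Spark the same query strings would be passed to `sparkSession.sql(...)`. The table and column names (`margin_summary`, `fund_short_name`, `broker_short_name`, `quantity`) are those from my attempt above.

```python
import sqlite3

# SQLite stand-in for the Spark table (window functions need SQLite >= 3.25)
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE margin_summary "
            "(fund_short_name TEXT, broker_short_name TEXT, quantity INTEGER)")
con.executemany("INSERT INTO margin_summary VALUES (?, ?, ?)",
                [("F1", "B1", 10), ("F1", "B1", 50), ("F1", "B2", 20), ("F1", "B3", 20)])

# Grouping 1: each (fund, broker) total as a percentage of the fund total
pct_per_broker = con.execute("""
    SELECT DISTINCT fund_short_name, broker_short_name,
           100.0 * SUM(quantity) OVER (PARTITION BY fund_short_name, broker_short_name)
                 / SUM(quantity) OVER (PARTITION BY fund_short_name) AS qty_pct
    FROM margin_summary
    ORDER BY broker_short_name
""").fetchall()

# Grouping 2: each row's quantity as a percentage of its (fund, broker) total
pct_per_row = con.execute("""
    SELECT fund_short_name, broker_short_name,
           100.0 * quantity
                 / SUM(quantity) OVER (PARTITION BY fund_short_name, broker_short_name) AS qty_pct
    FROM margin_summary
    ORDER BY broker_short_name, quantity
""").fetchall()
```

Both query strings use only standard window-function syntax, so my hope is that the same SQL works unchanged under `sparkSession.sql`, but I haven't verified that on Spark itself.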