Channel: How do I calculate percentages over groups in spark? - Stack Overflow

How do I calculate percentages over groups in spark?

I have data in the form:

FUND|BROKER|QTY
F1|B1|10
F1|B1|50
F1|B2|20
F1|B3|20

When I group it by FUND and BROKER, I would like to calculate each group's QTY as a percentage of the total for the fund, like so:

FUND|BROKER|QTY %|QTY EXPLANATION
F1|B1|60%|(10+50)/(10+50+20+20)
F1|B2|20%|(20)/(10+50+20+20)
F1|B3|20%|(20)/(10+50+20+20)

Or, when I group by just FUND, like so:

FUND|BROKER|QTY %|QTY EXPLANATION
F1|B1|16.66|(10)/(10 + 50)
F1|B1|83.33|(50)/(10 + 50)
F1|B2|100|(20)/(20)
F1|B3|100|(20)/(20)
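To make the arithmetic in the two tables concrete, here is a minimal plain-Java sketch (no Spark involved) that reproduces both sets of percentages from the four sample rows. The class name `GroupPercentages` and its helper methods are hypothetical, invented only for this illustration; the idea is simply "sum per group, then divide".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper class, for illustration only.
final class GroupPercentages {
    // One input row from the question: fund, broker, quantity.
    static final class Row {
        final String fund;
        final String broker;
        final double qty;
        Row(String fund, String broker, double qty) {
            this.fund = fund;
            this.broker = broker;
            this.qty = qty;
        }
    }

    // The four sample rows from the question.
    static List<Row> sample() {
        return Arrays.asList(
            new Row("F1", "B1", 10), new Row("F1", "B1", 50),
            new Row("F1", "B2", 20), new Row("F1", "B3", 20));
    }

    // Sum quantities per key, keeping first-seen key order.
    static Map<String, Double> totals(List<Row> rows, Function<Row, String> key) {
        Map<String, Double> out = new LinkedHashMap<>();
        for (Row r : rows) {
            out.merge(key.apply(r), r.qty, Double::sum);
        }
        return out;
    }

    // First table: each (fund, broker) total as a percentage of its fund total.
    static Map<String, Double> shareOfFund(List<Row> rows) {
        Map<String, Double> byFund = totals(rows, r -> r.fund);
        Map<String, Double> byPair = totals(rows, r -> r.fund + "|" + r.broker);
        Map<String, Double> out = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : byPair.entrySet()) {
            String fund = e.getKey().substring(0, e.getKey().indexOf('|'));
            out.put(e.getKey(), 100.0 * e.getValue() / byFund.get(fund));
        }
        return out;
    }

    // Second table: each row as a percentage of its (fund, broker) total.
    static List<Double> shareOfPair(List<Row> rows) {
        Map<String, Double> byPair = totals(rows, r -> r.fund + "|" + r.broker);
        List<Double> out = new ArrayList<>();
        for (Row r : rows) {
            out.add(100.0 * r.qty / byPair.get(r.fund + "|" + r.broker));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(shareOfFund(sample()));
        System.out.println(shareOfPair(sample()));
    }
}
```

In both cases the denominator is a total taken over a wider set of rows than the numerator, which is the part a plain GROUP BY cannot express on its own.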

I would like to achieve this using Spark SQL if possible, or otherwise through DataFrame functions.

I think I have to use window functions so that I can get access to the total of the grouped dataset, but I've not had much luck using them the right way.

Dataset<Row> result = sparkSession.sql(
    "SELECT fund_short_name, broker_short_name, "
    + "first(quantity) / sum(quantity) AS new_col "
    + "FROM margin_summary "
    + "GROUP BY fund_short_name, broker_short_name");
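In a query like the one above, `sum(quantity)` is evaluated per (fund, broker) group, so it never sees the wider total. One possible direction, offered as a sketch rather than a confirmed answer, is a windowed sum with `OVER (PARTITION BY ...)`. The class `PercentQueries`, its constant names, and the `qty` and `qty_pct` aliases are hypothetical; the view `margin_summary` and its columns are assumed to be exactly as in the query above.

```java
// Hypothetical holder for the two query strings; names are illustrative only.
final class PercentQueries {
    // Share of the fund total per (fund, broker): aggregate first in a
    // subquery, then divide by a sum windowed over the fund.
    static final String SHARE_OF_FUND = String.join("\n",
        "SELECT fund_short_name, broker_short_name,",
        "       100 * qty / sum(qty) OVER (PARTITION BY fund_short_name) AS qty_pct",
        "FROM (SELECT fund_short_name, broker_short_name, sum(quantity) AS qty",
        "      FROM margin_summary",
        "      GROUP BY fund_short_name, broker_short_name) t");

    // Share of the (fund, broker) total per input row: no aggregation of
    // rows at all, only a sum windowed over the pair.
    static final String SHARE_OF_PAIR = String.join("\n",
        "SELECT fund_short_name, broker_short_name,",
        "       100 * quantity / sum(quantity)",
        "           OVER (PARTITION BY fund_short_name, broker_short_name) AS qty_pct",
        "FROM margin_summary");

    public static void main(String[] args) {
        System.out.println(SHARE_OF_FUND);
        System.out.println();
        System.out.println(SHARE_OF_PAIR);
    }
}
```

Assuming the same `sparkSession` as in the question, either string could be passed as `sparkSession.sql(PercentQueries.SHARE_OF_FUND)`. The first query should correspond to the 60/20/20 table and the second to the 16.66/83.33/100/100 table, though neither has been run against the real `margin_summary` data.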
