
Answer by Vamsi Prabhala for How do I calculate percentages over groups in spark?

PySpark SQL solution.

This can be done using sum as a window function over two windows: one partitioned by fund and broker, the other partitioned by fund only.

from pyspark.sql import Window
from pyspark.sql.functions import sum  # note: shadows Python's built-in sum

# One window per (fund, broker) pair, one per fund.
w1 = Window.partitionBy(df.fund, df.broker)
w2 = Window.partitionBy(df.fund)

# Each broker's share of its fund's total quantity; distinct collapses
# the per-row results down to one row per (fund, broker).
res = df.withColumn('qty_pct', sum(df.qty).over(w1) / sum(df.qty).over(w2))
res.select(res.fund, res.broker, res.qty_pct).distinct().show()
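For concreteness, a minimal end-to-end sketch, assuming a local SparkSession and a hypothetical three-column dataset (fund, broker, qty):

from pyspark.sql import SparkSession, Window
from pyspark.sql.functions import sum

spark = SparkSession.builder.appName('qty-pct-demo').getOrCreate()

# Hypothetical sample data: two funds traded through a few brokers.
df = spark.createDataFrame(
    [('f1', 'b1', 100), ('f1', 'b1', 50), ('f1', 'b2', 150),
     ('f2', 'b1', 80), ('f2', 'b3', 20)],
    ['fund', 'broker', 'qty'])

w1 = Window.partitionBy(df.fund, df.broker)
w2 = Window.partitionBy(df.fund)

res = df.withColumn('qty_pct', sum(df.qty).over(w1) / sum(df.qty).over(w2))
res.select(res.fund, res.broker, res.qty_pct).distinct().show()
# Expected shares: f1/b1 -> 0.5, f1/b2 -> 0.5, f2/b1 -> 0.8, f2/b3 -> 0.2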

Edit: the second version, res2, is simpler.

# Each row's share of the total qty within its (fund, broker) window.
res2 = df.withColumn('qty_pct', df.qty / sum(df.qty).over(w1))
res2.show()
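Note that res2 keeps one output row per input row: each qty is divided by the row's (fund, broker) window total, so no distinct is needed. If the input already has one row per (fund, broker), that window total is just the row's own qty, so the two-window version above is the one that gives the broker-within-fund share.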

The SQL solution would be:

select distinct fund,
       broker,
       100 * sum(qty) over (partition by fund, broker)
           / sum(qty) over (partition by fund) as qty_pct
from tbl
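This matches the first PySpark version except for the 100 * factor, which turns the fraction into a percentage. As a minimal sketch of running it in Spark itself, assuming the df from the example above, one can register the DataFrame as a temporary view:

df.createOrReplaceTempView('tbl')
spark.sql("""
    select distinct fund, broker,
           100 * sum(qty) over (partition by fund, broker)
               / sum(qty) over (partition by fund) as qty_pct
    from tbl
""").show()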
