Python Spark: counting the number of distinct values per key

This article is a repost (the link to the original was not preserved). 2017-07-12 14:07 · Tags: python, spark

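In the `pyspark` shell, where `sc` is the SparkContext:

>>> rdd = sc.parallelize([("a", "1"), ("b", 1), ("a", 1), ("a", 1)])  # "1" (str) and 1 (int) are different values
>>> sorted(rdd.distinct().countByKey().items())  # sorted() gives a stable list; in Python 3 .items() is a view
[('a', 2), ('b', 1)]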
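To see what `distinct().countByKey()` computes, here is a minimal pure-Python sketch that reproduces it on a plain list, with no Spark involved. `pairs` mirrors the sample data above. Comparing the counts with and without deduplication shows why `distinct()` has to come first.

```python
from collections import Counter

# Same sample data as the RDD above; "1" (str) and 1 (int) are different values.
pairs = [("a", "1"), ("b", 1), ("a", 1), ("a", 1)]

# Like rdd.distinct(): drop duplicate (key, value) pairs.
unique_pairs = set(pairs)

# Like countByKey(): count how many deduplicated pairs each key has,
# i.e. the number of distinct values per key.
distinct_counts = Counter(key for key, _ in unique_pairs)

# Without the distinct step, every record is counted instead.
raw_counts = Counter(key for key, _ in pairs)

print(sorted(distinct_counts.items()))  # [('a', 2), ('b', 1)]
print(sorted(raw_counts.items()))       # [('a', 3), ('b', 1)]
```

Skipping `distinct()` therefore answers a different question: how many records each key has, not how many distinct values.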

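Alternatively, keep the result distributed with `reduceByKey`. It returns an RDD instead of collecting a dictionary to the driver, as `countByKey` does: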

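from operator import add

# Map each distinct (key, value) pair to (key, 1), then sum the 1s per key
rdd.distinct().map(lambda x: (x[0], 1)).reduceByKey(add)

# Same idea, extracting the keys first
rdd.distinct().keys().map(lambda k: (k, 1)).reduceByKey(add)

# Both are lazy and return an RDD; append .collect() to fetch the (key, count) pairs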
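The `reduceByKey(add)` variants above can be checked locally. This sketch uses a hypothetical helper, `reduce_by_key`, that imitates `RDD.reduceByKey` on a plain list (it is not a Spark API). It confirms that both variants give the same per-key counts.

```python
from operator import add


def reduce_by_key(pairs, func):
    """Hypothetical local stand-in for RDD.reduceByKey: fold the values of each key with func."""
    result = {}
    for key, value in pairs:
        result[key] = func(result[key], value) if key in result else value
    return result


pairs = [("a", "1"), ("b", 1), ("a", 1), ("a", 1)]
unique_pairs = set(pairs)  # like rdd.distinct()

# Variant 1: map each distinct (key, value) pair to (key, 1), then sum per key.
v1 = reduce_by_key([(k, 1) for k, _ in unique_pairs], add)

# Variant 2: take the keys first (like .keys()), then map each key to (key, 1).
keys = [k for k, _ in unique_pairs]
v2 = reduce_by_key([(k, 1) for k in keys], add)

print(sorted(v1.items()))  # [('a', 2), ('b', 1)]
```

Spark applies the function within each partition first and then merges the partial results across partitions. The function passed to `reduceByKey` should therefore be associative and commutative; `operator.add` on integers is both.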

For reference, the PySpark API documentation for the two methods used above:

distinct(numPartitions=None)

Return a new RDD containing the distinct elements in this RDD.

 
>>> sorted(sc.parallelize([1, 1, 2, 3]).distinct().collect())
[1, 2, 3]
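Equal elements may sit in different partitions, so `distinct()` needs a shuffle; the optional `numPartitions` argument controls its output partitions. Conceptually, `distinct()` is a keyed reduction: each element becomes a key, duplicates collapse onto the same key, and the keys are returned. Below is a local sketch of that idea. The helper name `local_distinct` is hypothetical, and elements must be hashable, as they are here.

```python
def local_distinct(elements):
    """Hypothetical local imitation of RDD.distinct() as a keyed reduction."""
    # map: x -> (x, None), so equal elements share the same key
    keyed = [(x, None) for x in elements]
    # reduce by key: keep a single entry per key
    collapsed = {}
    for key, value in keyed:
        collapsed.setdefault(key, value)
    # map: (x, None) -> x
    return list(collapsed)


print(sorted(local_distinct([1, 1, 2, 3])))  # [1, 2, 3]
# Whole tuples are compared, so ("a", "1") and ("a", 1) both survive.
print(len(local_distinct([("a", "1"), ("a", 1), ("a", 1)])))  # 2
```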
         

countByKey()

Count the number of elements for each key, and return the result to the master as a dictionary.

 
>>> rdd = sc.parallelize([("a", 1), ("b", 1), ("a", 1)])
>>> sorted(rdd.countByKey().items())
[('a', 2), ('b', 1)]
