Bonus Speed - Profiling - 快!(上)

有冇試過同人講自己寫既program好很快?但係又唔知有幾快呀。
有冇“朋友”行埋來同你講叫你咁咁咁就會快d,但係佢講既野又係9up多ga。
又有冇試過冇啦啦食曬你部機d ram 呀,但係你又唔知邊q個問題喎。

因為大家未試過profiling 吧。

講真,有時用個profiler check一check,一試便知龍與鳳這個道理大家又點會唔明白呀~

問題係做完試完改完,個老細又唔會加你人工,最多咪講句thank you(on9)=[













不過,你一生唔會得一個老細,為下一個老細加油吧!轉工急先鋒!上!




入正題

今次我用左一條由Project Euler抽出了一條十分簡單既問題做我地既main course: 題目如下:
Find the sum of all the multiples of 3 or 5 below 1000.
姐係咁: 3+5+6+9+10+12+....999 = ?

好,唔明既朋友我地停一停諗一諗,緊接下來我會用3種方法去解決。

  1. 開個loop由0數到1000,check每一個number可唔可以mod(%) by 3or5 [Brute force]
  2. construct 一個formula 計條數出來 [Fast, require Maths]
  3. multiprocessing / multithreading 去協助方法1。[Fast, Require Multi core machine]
好,如果你想話我on9,咁就一定ga 啦,因為計d咁既野,用1或者2冇死錯人啦,用3! 你以為自己好醒呀,教人用multi processing! 

但係。。。。如果我set個數由below 1000 去到below 1000M 呢,你除左用方法2去配條式出來就要不然開個1000M既loop,喂,你估loop既耐先,用幾多ram先。

唉~~說到這裡,我還未跟大家提及使用甚麼語言。
今次我會用pythonjava(下集) 一齊去搞搞佢,因為會得出一個好有意思既對比
註:code既syntax,唔係今次既賣點。

係python,我會用 cProfile  (Built-in, no step to install!)  去check下我個running time,Lets Start with Constant Value first.
BOUND = 100000000
NUM_PROCESS = 10
# Warning, we assume STEP is an integar only.
STEP = BOUND/NUM_PROCESS
方法一:
range去開loop是常識吧~ 
錯~你應該要用xrange了, 分別好簡單,xrange係唔用memory去開個loop index list,同你慳返d ram,相反,range係未入loop前就已經要開個list去做index,姐係你個range愈大愈食ram。
def alternate_method_xrange():
  # loop throught each value from 1..1000 by range / xrange
  # check value and add it up if condition matched
  a = 0
  for x in xrange(BOUND):
    if x%3==0 or x%5==0:
      a += x
  return a 


因為方法2其實只係一條好普通既數學題,在下唔講解了。
def fastest_way():
  n = int((BOUND-1)/3)
  m = int((BOUND-1)/5)
  r = int((BOUND-1)/(5*3))

  return (3*(n+1)*n/2) + (5*(m+1)*m/2) - (15*(r+1)*r/2)

方法三:
用multiprocessing(留心係multiprocessing)去開一個process pool, 意思係個pool入面有好多worker等住幫你做野,你比task list, function入個pool,個pool.map會自己分工。
from multiprocessing import Pool
import time
"""
use in multiprocessing_the_right_way
"""
def mod_check_with_range(r):
  a = 0
  for x in xrange(r, r+STEP):
    if x%3==0 or x%5==0:
      a += x
  return a

"""
Where Magic happen
"""
def alternate_method_multiprocess_the_right_way():
  t = time.time()
  p = Pool(NUM_PROCESS)
  res = p.map(mod_check_with_range, range(0, BOUND, STEP))
  r = sum(res)
  print '\tmap(mod_check_with_range, range(bound)):\n\t\t%s seconds' % \
     (time.time() - t)
  return r 


[因為cProfile 同multiprocessing一齊用會有少少問題,我地用time去計running time]

好,在下用cPython2.7.5  2.3GHz Intel Core i7 , Mac OSX Mountain Lion:

def main():
  print fastest_way()
  print alternate_method_xrange()
  print alternate_method_range()

if __name__ == "__main__":
  cProfile.run('main()') #The only place using cProfile, How cool is that
  print alternate_method_multiprocess_the_right_way() #Since cProfile will have no meaning when multiprocess come in, we let it run individually  


The Result as Follow:





分析:
answer: bound below 1000M is 2333333316666668
speed - formula(0s) > multiprocess(3.568s) > xrange(16.047) > range(16.460s)

Woww! 快左好多呀。 未算。。。。 其實仲有得再快既~~

bonus:
我地用cPython,係因為我地內置跟機附設。問題係世界那麼大,有樣野叫pypy,佢同cPython有好多分別,其中speed係最大分別,因為加左JIT(just in time complier)。好,先睇result [以後再講pypy]:


神呀~ 又快左呀!!
speed - formula(0s) > multiprocess(1.018 3xfaster) > xrange(2.063 8xfaster) > range(2.386 8xfaster)


總結:
今次只是玩弄性質,不用太過在意解決既問題。今後會再講“multiprocessing” 同“multithreading” 同埋java既分析。等唔切既朋友可以上我github到睇返python 同java 既coding。因為再講會太長,唔好意思!
呀~github既source file 入面有easter egg,你會搵到的~ =]


p.s. 一路寫blog一路無盡~!




Comments

Popular posts from this blog

民意 - 身份 - 真偽

GAE - How to make a Free(Cheap) application