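Optimization should be driven by profiling results

Profiling efficiently

A complete set of unit tests is a prerequisite; fix the silly little problems first.

The Julia set

The Julia set problem has a CPU-bound component and an explicit set of inputs, so it can be used to profile both CPU and RAM usage. It generates a fractal sequence that yields a complex output image. Each point is computed independently, so the problem is embarrassingly parallel.

Rendering the output as a grayscale image helps visualize the computation: white pixels mark the points that are expensive to compute, black pixels the points that are cheap.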
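
To make the independence of each point concrete, here is a minimal sketch of the per-point computation. The helper name escape_count and the three sample points are illustrative assumptions; the constant c and the limit of 300 iterations are the values used in the script in this post.

```python
def escape_count(z, c, max_iterations):
    """Return how many times z = z*z + c runs before abs(z) reaches 2."""
    n = 0
    while abs(z) < 2 and n < max_iterations:
        z = z * z + c
        n += 1
    return n


c = complex(-0.62772, -0.42193)
points = [complex(2.0, 0.0), complex(0.0, 0.0), complex(-1.8, 1.8)]
# Each count depends only on its own starting point, so the calls could run
# in any order or on different cores.
counts = [escape_count(z, c, 300) for z in points]
print(counts)
```

A point with a high count maps to a bright pixel, a point that escapes immediately maps to a black one.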

Timing a function with time.time()

import time
import numpy as np
from matplotlib import pylab as plt
from functools import wraps

x1, x2, y1, y2 = -1.8, 1.8, -1.8, 1.8
c_real, c_imag = -0.62772, -0.42193


def timefn(fn):
    @wraps(fn)
    def measure_time(*args, **kwargs):
        st = time.time()
        result = fn(*args, **kwargs)
        ed = time.time()
        print("@timefn {} took {}s".format(fn.__name__, ed - st))
        return result
    return measure_time


def calc_z_serial_purepython(max_iterations, zs, cs):
    output = [0] * len(zs)
    for i in range(len(zs)):
        n = 0
        z = zs[i]
        c = cs[i]
        while abs(z) < 2 and n < max_iterations:
            z = z * z + c
            n += 1
        output[i] = n
    return output


@timefn
def calc_pure_python(desired_width, max_iterations):
    """
    Build zs and cs (a list of constant complex numbers)
    """
    x_step = (float(x2 - x1) / float(desired_width))
    y_step = (float(y1 - y2) / float(desired_width))
    x, y = [], []
    ycoord = y2
    while ycoord > y1:
        y.append(ycoord)
        ycoord += y_step
    xcoord = x1
    while xcoord < x2:
        x.append(xcoord)
        xcoord += x_step

    zs, cs = [], []
    for ycoord in y:
        for xcoord in x:
            zs.append(complex(xcoord, ycoord))
            cs.append(complex(c_real, c_imag))

    print("Length of x:", len(x))
    print("total elements:", len(zs))
    output = calc_z_serial_purepython(max_iterations, zs, cs)
    if desired_width == 1000:
        assert sum(output) == 33219980

    return output


if __name__ == "__main__":
    desired_width = 1000
    mat = calc_pure_python(desired_width=desired_width, max_iterations=300)
    mat = np.array(mat).reshape(desired_width, desired_width)
    plt.imshow(mat, cmap='gray')
    plt.title(r'Julia set fractal')
    plt.axis('off')
    plt.show()

Output

Length of x: 1000
total elements: 1000000
calc_z_serial_purepython took 10.0214s
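
time.time() reads the wall clock, which the system may adjust while the program runs. As a hedged alternative, the same decorator can be written with time.perf_counter(), a monotonic clock intended for measuring short durations. The names timefn_perf and slow_square are made up for this sketch.

```python
import time
from functools import wraps


def timefn_perf(fn):
    """Variant of timefn that uses the monotonic perf_counter clock."""
    @wraps(fn)
    def measure_time(*args, **kwargs):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        elapsed = time.perf_counter() - start
        print("@timefn_perf {} took {:.6f}s".format(fn.__name__, elapsed))
        return result
    return measure_time


@timefn_perf
def slow_square(x):
    time.sleep(0.01)  # stand-in for real work
    return x * x


value = slow_square(7)
```

Because of @wraps, the decorated function keeps its own name and docstring, which keeps profiler reports readable.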

Figure: Julia fractal

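Running timeit from the command line

pythonw -m timeit -n 5 -r 5 -s "import Julia" "Julia.calc_pure_python(desired_width=1000, max_iterations=300)"
-n runs the statement n times per measurement and -r repeats the measurement r times, reporting the best result; -s gives the setup statement, which is excluded from the timing.
You can also use the %timeit magic function in IPython: %timeit calc_pure_python(desired_width=1000, max_iterations=300)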

# %timeit calc_pure_python(desired_width=1000, max_iterations=300)
13.8 s ± 709 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
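
The same measurement can be scripted with the standard timeit module. The sketch below times a small stand-in workload (the function work is hypothetical, chosen so the example finishes quickly) and mirrors the -n and -r flags with the number and repeat arguments.

```python
import timeit


def work():
    # Hypothetical stand-in for Julia.calc_pure_python
    return sum(i * i for i in range(1000))


# number=5 corresponds to -n 5, repeat=5 corresponds to -r 5
timings = timeit.repeat(work, number=5, repeat=5)
best_per_call = min(timings) / 5
print("best of 5: {:.6f}s per call".format(best_per_call))
```

Taking the minimum rather than the mean follows the command-line tool: the fastest run is the one least disturbed by other activity on the machine.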

Timing with the Unix time command

/usr/bin/time -p pythonw Julia.py or
gtime -v pythonw Julia.py, which uses GNU time to show more information (brew install gnu-time)
Output

Length of x: 1000
total elements: 1000000
@timefn calc_pure_python took 11.55336880683899s
real        12.38
user        11.76
sys          0.31

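real records the total wall-clock time.
user records the CPU time spent on the task, excluding time spent in kernel functions.
sys records the time spent in kernel functions.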
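
The difference between wall-clock and CPU time can also be observed from inside Python. In this sketch, time.perf_counter() plays the role of real and time.process_time() reports user plus sys for the current process; the helper name measure is an assumption.

```python
import time


def measure(fn):
    """Return (wall_seconds, cpu_seconds) for one call of fn."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    fn()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return wall, cpu


# Sleeping consumes wall-clock time but almost no CPU time.
sleep_wall, sleep_cpu = measure(lambda: time.sleep(0.2))
print("sleep: wall={:.3f}s cpu={:.3f}s".format(sleep_wall, sleep_cpu))
```

A large gap between real and user + sys usually means the program is waiting on I/O or on other processes rather than computing.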

Using the cProfile module

Use the built-in cProfile to analyze the cumulative time spent in each function:
python -m cProfile -s cumulative Julia.py
Output

Length of x: 1000
total elements: 1000000
@timefn calc_pure_python took 16.961487770080566s
         36546792 function calls (36540128 primitive calls) in 18.020 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    326/1    0.017    0.000   18.021   18.021 {built-in method builtins.exec}
        1    0.000    0.000   18.021   18.021 Julia.py:1(<module>)
        1    0.047    0.047   16.962   16.962 Julia.py:11(measure_time)
        1    0.931    0.931   16.914   16.914 Julia.py:34(calc_pure_python)
        1   12.273   12.273   15.785   15.785 Julia.py:21(calc_z_serial_purepython)
 34220052    3.511    0.000    3.511    0.000 {built-in method builtins.abs}
  2021548    0.192    0.000    0.192    0.000 {method 'append' of 'list' objects}

The call to calc_z_serial_purepython takes 15.785 s cumulatively, so about 16.914 - 15.785 = 1.129 s is spent in calc_pure_python outside that call.
The built-in abs is called 34220052 times and takes about 3.511 s.
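
cProfile can also be driven from code, which is handy when only part of a program should be profiled. The functions inner and outer below are hypothetical stand-ins for the Julia routines; the report is captured in a string buffer so it can be inspected or logged.

```python
import cProfile
import io
import pstats


def inner(n):
    return sum(abs(i) for i in range(n))


def outer():
    return [inner(2000) for _ in range(50)]


profiler = cProfile.Profile()
profiler.enable()
outer()
profiler.disable()

buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(5)  # top 5 rows by cumtime
report = buffer.getvalue()
print(report)
```

As in the command-line output, cumtime includes time spent in callees, while tottime counts only the function's own body.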

Generating a cProfile statistics file for analysis

python -m cProfile -o profile.state Julia.py

Analyzing it in an interactive session

In [1]: import pstats

In [2]: p = pstats.Stats("profile.state")

In [3]: p.sort_stats("cumulative")
Out[3]: <pstats.Stats at 0x10b2dee80>

In [4]: p.print_stats()
Wed Jan 30 14:16:30 2019    profile.state

         36546758 function calls (36540094 primitive calls) in 18.139 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    326/1    0.017    0.000   18.140   18.140 {built-in method builtins.exec}
        1    0.000    0.000   18.140   18.140 Julia.py:1(<module>)
        1    0.040    0.040   17.200   17.200 Julia.py:11(measure_time)
        1    0.975    0.975   17.160   17.160 Julia.py:34(calc_pure_python)
        1   12.447   12.447   15.979   15.979 Julia.py:21(calc_z_serial_purepython)
 34220052    3.532    0.000    3.532    0.000 {built-in method builtins.abs}
  2021548    0.203    0.000    0.203    0.000 {method 'append' of 'list' objects}
In [5]: p.print_callers()  # print caller information
   Ordered by: cumulative time
Function                                was called by...
                                                ncalls  tottime  cumtime
{built-in method builtins.exec}         <-      28    0.015    0.018  collections/__init__.py:357(namedtuple)
                                                 1    0.000    0.000  six.py:21(<module>)
Julia.py:1(<module>)                    <-       1    0.000   18.140  {built-in method builtins.exec}
Julia.py:11(measure_time)               <-       1    0.040   17.200  Julia.py:1(<module>)
Julia.py:34(calc_pure_python)           <-       1    0.975   17.160  Julia.py:11(measure_time)
Julia.py:21(calc_z_serial_purepython)   <-       1   12.447   15.979  Julia.py:34(calc_pure_python)
{built-in method builtins.abs}          <-      72    0.000    0.000  datetime.py:356(__new__)
                                          34219980    3.532    3.532  Julia.py:21(calc_z_serial_purepython)
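
The save-then-inspect workflow can be reproduced in a plain script. This is a sketch under assumptions: busy is a made-up workload and the statistics file is written to a temporary directory rather than next to the script.

```python
import cProfile
import os
import pstats
import tempfile


def busy():
    return sorted(str(i) for i in range(5000))


profiler = cProfile.Profile()
profiler.runcall(busy)

# Write the raw statistics to disk, as `python -m cProfile -o` does.
path = os.path.join(tempfile.mkdtemp(), "profile.state")
profiler.dump_stats(path)

# Load the file again and query it, as in the interactive session.
stats = pstats.Stats(path)
stats.strip_dirs().sort_stats("cumulative")
stats.print_callers("busy")  # restrict the listing to entries matching "busy"
total_calls = stats.total_calls
```

strip_dirs() shortens file paths in the report, and the string passed to print_callers is treated as a pattern that filters the listed functions.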

Visualizing the profile with runsnakerun

echo "install runsnakerun Python 3 port"
conda install wxpython
wget https://codeload.github.com/venthur/snakerunner/zip/master
unzip master -d ./
pythonw runsnakerun/squaremap.py install
pythonw runsnakerun/runsnake.py ~/Desktop/profile.state

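Line-by-line profiling with line_profiler

pip install line_profiler
Decorate the functions you want to analyze with @profile
Comment out any unneeded imports
kernprof -l -v Julia.py; -l requests line-by-line profiling and -v prints the results instead of only writing the *.lprof file
Profiling output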

Length of x: 1000
total elements: 1000000
@timefn calc_pure_python took 114.2270290851593s
Wrote profile results to Julia.py.lprof
Timer unit: 1e-06 s

Total time: 68.5963 s
File: Julia.py
Function: calc_z_serial_purepython at line 21

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    21                                           @profile
    22                                           def calc_z_serial_purepython(max_iterations, zs, cs):
    23         1       5881.0   5881.0      0.0      output = [0] * len(zs)
    24   1000001     532830.0      0.5      0.8      for i in range(len(zs)):
    25   1000000     489566.0      0.5      0.7          n = 0
    26   1000000     594124.0      0.6      0.9          z = zs[i]
    27   1000000     550479.0      0.6      0.8          c = cs[i]
    28  34219980   26084661.0      0.8     38.0          while abs(z) < 2 and n < max_iterations:
    29  33219980   21142269.0      0.6     30.8              z = z * z + c
    30  33219980   18619273.0      0.6     27.1              n += 1
    31   1000000     577216.0      0.6      0.8          output[i] = n
    32         1          1.0      1.0      0.0      return output

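The % Time column shows that evaluating the while condition takes 38.0%, updating z takes 30.8%, and incrementing n takes 27.1%. Python's dynamic machinery is clearly expensive. It is now worth splitting the while condition apart to analyze it further.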

Length of x: 1000
total elements: 1000000
@timefn calc_pure_python took 178.96182918548584s
Wrote profile results to Julia.py.lprof
Timer unit: 1e-06 s

Total time: 103.592 s
File: Julia.py
Function: calc_z_serial_purepython at line 21

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    21                                           @profile
    22                                           def calc_z_serial_purepython(max_iterations, zs, cs):
    23         1       7309.0   7309.0      0.0      output = [0] * len(zs)
    24   1000001     548930.0      0.5      0.5      for i in range(len(zs)):
    25   1000000     493039.0      0.5      0.5          n = 0
    26   1000000     606243.0      0.6      0.6          z = zs[i]
    27   1000000     555052.0      0.6      0.5          c = cs[i]
    28                                                   # splite while condations
    29                                                   # ------------------------------------------------
    30   1000000     492872.0      0.5      0.5          while True:
    31  34219980   23040259.0      0.7     22.2              not_yet_escaped = abs(z) < 2
    32  34219980   19264794.0      0.6     18.6              iterations_left = n < max_iterations
    33  34219980   17987205.0      0.5     17.4              if not_yet_escaped and iterations_left:
    34                                                           # +++++++++++++++++++++++++++++++++++++++++
    35                                                           # while abs(z) < 2 and n < max_iterations:
    36  33219980   21210997.0      0.6     20.5                      z = z * z + c
    37  33219980   18282424.0      0.6     17.6                      n += 1
    38                                                           # ++++++++++++++++++++++++++++++++++++++++
    39                                                       else:
    40   1000000     522318.0      0.5      0.5                  break
    41                                                   # ------------------------------------------------
    42   1000000     580586.0      0.6      0.6          output[i] = n
    43         1          0.0      0.0      0.0      return output

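Testing n is clearly cheaper. Because `and` short-circuits, we can test n first and z second, which skips one abs(z) evaluation out of every 301 checks for a point that runs to max_iterations. Let's try changing the order of the while condition.

z first, then n
Total time: 65.2372 s
35 34219980 24824304.0 0.7 38.1 while abs(z) < 2 and n < max_iterations:

n first, then z
Total time: 63.592 s
35 34219980 23858361.0 0.7 37.5 while n < max_iterations and abs(z) < 2:

The order evidently makes very little difference, and the trick applies too narrowly. Following our initial hypothesis, we profile the other function next.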
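
The effect of short-circuit evaluation on the number of abs() calls can be counted directly. The helper count_abs_calls and its n_first flag are hypothetical names for this sketch; the loop body is the same as in calc_z_serial_purepython.

```python
def count_abs_calls(z, c, max_iterations, n_first):
    """Run the escape loop and count how often abs() is evaluated."""
    calls = 0

    def counted_abs(value):
        nonlocal calls
        calls += 1
        return abs(value)

    n = 0
    if n_first:
        while n < max_iterations and counted_abs(z) < 2:
            z = z * z + c
            n += 1
    else:
        while counted_abs(z) < 2 and n < max_iterations:
            z = z * z + c
            n += 1
    return n, calls


# A point that never escapes (z = 0, c = 0) runs all 300 iterations.
print(count_abs_calls(0j, 0j, 300, n_first=False))  # z first: (300, 301)
print(count_abs_calls(0j, 0j, 300, n_first=True))   # n first: (300, 300)
```

Only points that reach max_iterations save a call, and each saves exactly one, which is consistent with the small difference measured between the two orderings.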

Length of x: 1000
total elements: 1000000
@timefn calc_pure_python took 36.08730697631836s
Wrote profile results to Julia.py.lprof
Timer unit: 1e-06 s

Total time: 34.3176 s
File: Julia.py
Function: calc_pure_python at line 45

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    45                                           @timefn
    46                                           @profile
    47                                           def calc_pure_python(desired_width, max_iterations):
    48                                               """
    49                                               Build zs and cs (a list of constant complex numbers)
    50                                               """
    51         1         10.0     10.0      0.0      x_step = (float(x2 - x1) / float(desired_width))
    52         1          2.0      2.0      0.0      y_step = (float(y1 - y2) / float(desired_width))
    53         1          2.0      2.0      0.0      x, y = [], []
    54         1          1.0      1.0      0.0      ycoord = y2
    55      1001        791.0      0.8      0.0      while ycoord > y1:
    56      1000        769.0      0.8      0.0          y.append(ycoord)
    57      1000        740.0      0.7      0.0          ycoord += y_step
    58         1          1.0      1.0      0.0      xcoord = x1
    59      1001        694.0      0.7      0.0      while xcoord < x2:
    60      1000        753.0      0.8      0.0          x.append(xcoord)
    61      1000        799.0      0.8      0.0          xcoord += x_step
    62                                           
    63         1          1.0      1.0      0.0      zs, cs = [], []
    64      1001        887.0      0.9      0.0      for ycoord in y:
    65   1001000     708149.0      0.7      2.1          for xcoord in x:
    66   1000000    1116311.0      1.1      3.3              zs.append(complex(xcoord, ycoord))
    67   1000000    1055806.0      1.1      3.1              cs.append(complex(c_real, c_imag))
    68                                           
    69         1         87.0     87.0      0.0      print("Length of x:", len(x))
    70         1         14.0     14.0      0.0      print("total elements:", len(zs))
    71         1   31425719.0 31425719.0     91.6      output = calc_z_serial_purepython(max_iterations, zs, cs)
    72         1          2.0      2.0      0.0      if desired_width == 1000:
    73         1       6025.0   6025.0      0.0          assert sum(output) == 33219980
    74                                           
    75         1          2.0      2.0      0.0      return output

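Creating the lists evidently takes comparatively little time.

Diagnosing memory usage with memory_profiler

Analyze RAM allocation line by line, or use the %memit magic function in IPython; usage mirrors line_profiler and %timeit.
python -m memory_profiler Julia.py (memory profiling is slow)

RAM analysis output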

Length of x: 1000
total elements: 1000000
@timefn calc_pure_python took 341.9512851238251s
Filename: Julia.py

Line #    Mem usage    Increment   Line Contents
================================================
    45   42.395 MiB   42.395 MiB   @timefn
    46                             @profile
    47                             def calc_pure_python(desired_width, max_iterations):
    48                                 """
    49                                 Build zs and cs (a list of constant complex numbers)
    50                                 """
    51   42.395 MiB    0.000 MiB       x_step = (float(x2 - x1) / float(desired_width))
    52   42.395 MiB    0.000 MiB       y_step = (float(y1 - y2) / float(desired_width))
    53   42.395 MiB    0.000 MiB       x, y = [], []
    54   42.395 MiB    0.000 MiB       ycoord = y2
    55   42.398 MiB    0.000 MiB       while ycoord > y1:
    56   42.398 MiB    0.004 MiB           y.append(ycoord)
    57   42.398 MiB    0.000 MiB           ycoord += y_step
    58   42.398 MiB    0.000 MiB       xcoord = x1
    59   42.418 MiB    0.004 MiB       while xcoord < x2:
    60   42.418 MiB    0.000 MiB           x.append(xcoord)
    61   42.418 MiB    0.000 MiB           xcoord += x_step
    62                             
    63   42.418 MiB    0.000 MiB       zs, cs = [], []
    64  104.699 MiB    0.000 MiB       for ycoord in y:
    65  104.699 MiB    0.000 MiB           for xcoord in x:
    66  104.699 MiB    0.027 MiB               zs.append(complex(xcoord, ycoord))
    67  104.699 MiB    0.031 MiB               cs.append(complex(c_real, c_imag))
    68                             
    69  104.703 MiB    0.004 MiB       print("Length of x:", len(x))
    70  104.703 MiB    0.000 MiB       print("total elements:", len(zs))
    71  108.703 MiB    4.000 MiB       output = calc_z_serial_purepython(max_iterations, zs, cs)
    72  108.707 MiB    0.004 MiB       if desired_width == 1000:
    73  108.707 MiB    0.000 MiB           assert sum(output) == 33219980
    74                             
    75  108.723 MiB    0.016 MiB       return output

A more detailed RAM analysis (7038 s)

Length of x: 1000
total elements: 1000000
@timefn calc_pure_python took 7038.780977964401s
Filename: Julia.py

Line #    Mem usage    Increment   Line Contents
================================================
    20  104.754 MiB  104.754 MiB   @profile
    21                             def calc_z_serial_purepython(max_iterations, zs, cs):
    22  112.387 MiB    7.633 MiB       output = [0] * len(zs)
    23  112.453 MiB    0.000 MiB       for i in range(len(zs)):
    24  112.453 MiB    0.000 MiB           n = 0
    25  112.453 MiB    0.004 MiB           z = zs[i]
    26  112.453 MiB    0.004 MiB           c = cs[i]
    27                                     # splite while condations
    28                                     # ------------------------------------------------
    29                                     # while True:
    30                                     #     not_yet_escaped = abs(z) < 2
    31                                     #     iterations_left = n < max_iterations
    32                                     #     if not_yet_escaped and iterations_left:
    33                                     # +++++++++++++++++++++++++++++++++++++++++
    34  112.453 MiB    0.004 MiB           while n < max_iterations and abs(z) < 2:
    35  112.453 MiB    0.000 MiB               z = z * z + c
    36  112.453 MiB    0.000 MiB               n += 1
    37                                         # ++++++++++++++++++++++++++++++++++++++++
    38                                         # else:
    39                                         #     break
    40                                     # ------------------------------------------------
    41  112.453 MiB    0.000 MiB           output[i] = n
    42   50.055 MiB    0.000 MiB       return output


Filename: Julia.py

Line #    Mem usage    Increment   Line Contents
================================================
    45   42.223 MiB   42.223 MiB   @timefn
    46                             @profile
    47                             def calc_pure_python(desired_width, max_iterations):
    48                                 """
    49                                 构造 zs 和 cs (常复数列)
    50                                 """
    51   42.223 MiB    0.000 MiB       x_step = (float(x2 - x1) / float(desired_width))
    52   42.223 MiB    0.000 MiB       y_step = (float(y1 - y2) / float(desired_width))
    53   42.223 MiB    0.000 MiB       x, y = [], []
    54   42.223 MiB    0.000 MiB       ycoord = y2
    55   42.223 MiB    0.000 MiB       while ycoord > y1:
    56   42.223 MiB    0.000 MiB           y.append(ycoord)
    57   42.223 MiB    0.000 MiB           ycoord += y_step
    58   42.223 MiB    0.000 MiB       xcoord = x1
    59   42.254 MiB    0.004 MiB       while xcoord < x2:
    60   42.254 MiB    0.004 MiB           x.append(xcoord)
    61   42.254 MiB    0.004 MiB           xcoord += x_step
    62                             
    63   42.254 MiB    0.000 MiB       zs, cs = [], []
    64  104.746 MiB    0.000 MiB       for ycoord in y:
    65  104.746 MiB    0.000 MiB           for xcoord in x:
    66  104.746 MiB    0.066 MiB               zs.append(complex(xcoord, ycoord))
    67  104.746 MiB    0.059 MiB               cs.append(complex(c_real, c_imag))
    68                             
    69  104.750 MiB    0.004 MiB       print("Length of x:", len(x))
    70  104.750 MiB    0.000 MiB       print("total elements:", len(zs))
    71   50.172 MiB   50.172 MiB       output = calc_z_serial_purepython(max_iterations, zs, cs)
    72   50.172 MiB    0.000 MiB       if desired_width == 1000:
    73   56.398 MiB    6.227 MiB           assert sum(output) == 33219980
    74                             
    75   56.422 MiB    0.023 MiB       return output
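
memory_profiler is a third-party package and, as the run times above show, slow. As a rough cross-check, the standard-library tracemalloc module can measure how much memory a block of code allocates. Note that this is a different technique (it traces allocations made through Python's allocator rather than sampling the process's memory usage), and the helper build_points and the grid size are assumptions for this sketch.

```python
import tracemalloc


def build_points(width):
    """Build a width x width list of complex numbers, like zs in the script."""
    return [complex(x, y) for y in range(width) for x in range(width)]


tracemalloc.start()
before, _ = tracemalloc.get_traced_memory()
zs = build_points(100)  # 10,000 complex objects
after, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

grown = after - before
print("allocated {} bytes for {} points".format(grown, len(zs)))
```

Dividing the growth by the number of points gives a per-element cost that can be compared with the increments reported by memory_profiler.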

Inspecting objects on the heap with heapy

/usr/local/bin/pip -V
pip 18.1 from /usr/local/lib/python2.7/site-packages/pip (python 2.7)

Install guppy for Python 2.7: /usr/local/bin/pip install guppy
Verify the installation:
python2 -c "import guppy; print guppy.__doc__"

Top level package of Guppy, a library and programming environment
currently providing in particular the Heapy subsystem, which supports
object and heap memory sizing, profiling and debugging.

What is exported is the following:

hpy()	Create an object that provides a Heapy entry point.
Root()	Create an object that provides a top level entry point.

The modified Python 2 code

# encoding=utf-8
from __future__ import print_function
import time
import numpy as np
# from matplotlib import pylab as plt
from functools import wraps

x1, x2, y1, y2 = -1.8, 1.8, -1.8, 1.8
c_real, c_imag = -0.62772, -0.42193


def timefn(fn):
    @wraps(fn)
    def measure_time(*args, **kwargs):
        st = time.time()
        result = fn(*args, **kwargs)
        ed = time.time()
        print("@timefn {} took {}s".format(fn.__name__, ed - st))
        return result
    return measure_time

# @profile
def calc_z_serial_purepython(max_iterations, zs, cs):
    output = [0] * len(zs)
    for i in range(len(zs)):
        n = 0
        z = zs[i]
        c = cs[i]
        # split the while condition
        # ------------------------------------------------
        # while True:
        #     not_yet_escaped = abs(z) < 2
        #     iterations_left = n < max_iterations
        #     if not_yet_escaped and iterations_left:
        # +++++++++++++++++++++++++++++++++++++++++
        while n < max_iterations and abs(z) < 2:
            z = z * z + c
            n += 1
            # ++++++++++++++++++++++++++++++++++++++++
            # else:
            #     break
        # ------------------------------------------------
        output[i] = n
    return output


@timefn
# @profile
def calc_pure_python(desired_width, max_iterations):
    """
    Build zs and cs (a list of constant complex numbers)
    """
    x_step = (float(x2 - x1) / float(desired_width))
    y_step = (float(y1 - y2) / float(desired_width))
    x, y = [], []
    ycoord = y2
    while ycoord > y1:
        y.append(ycoord)
        ycoord += y_step
    xcoord = x1
    while xcoord < x2:
        x.append(xcoord)
        xcoord += x_step
    from guppy import hpy; hp = hpy()
    print("heapy after creating y and x lists of floats")
    h = hp.heap()
    print(h)

    zs, cs = [], []
    for ycoord in y:
        for xcoord in x:
            zs.append(complex(xcoord, ycoord))
            cs.append(complex(c_real, c_imag))

    print("heapy after creating zs and cs using complex numbers")
    h = hp.heap()
    print(h)

    print("Length of x:", len(x))
    print("total elements:", len(zs))
    output = calc_z_serial_purepython(max_iterations, zs, cs)
    print("heapy after calling calc_z_serial_purepython")
    h = hp.heap()
    print(h)
    
    if desired_width == 1000:
        assert sum(output) == 33219980

    return output


if __name__ == "__main__":
    desired_width = 1000
    mat = calc_pure_python(desired_width=desired_width, max_iterations=300)
    # mat = np.array(mat).reshape(desired_width, desired_width)
    # plt.imshow(mat, cmap='gray')
    # plt.title(r'Julia set fractal')
    # plt.axis('off')
    # plt.show()

Memory analysis results

heapy after creating y and x lists of floats
Partition of a set of 74521 objects. Total size = 11301352 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0  33460  45  4425272  39   4425272  39 str
     1  17624  24  1526264  14   5951536  53 tuple
     2    214   0   883216   8   6834752  60 dict of module
     3   4709   6   602752   5   7437504  66 types.CodeType
     4    573   1   599736   5   8037240  71 dict of type
     5   4553   6   546360   5   8583600  76 function
     6    573   1   512760   5   9096360  80 type
     7    443   1   497288   4   9593648  85 dict (no owner)
     8     93   0   226744   2   9820392  87 unicode
     9   2231   3   178480   2   9998872  88 __builtin__.wrapper_descriptor
<221 more rows. Type e.g. '_.more' to view.>
heapy after creating zs and cs using complex numbers
Partition of a set of 2074529 objects. Total size = 91556576 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0 2000003  96 64000096  70  64000096  70 complex
     1    494   0 16365936  18  80366032  88 list
     2  33462   2  4425400   5  84791432  93 str
     3  17623   1  1526200   2  86317632  94 tuple
     4    214   0   883216   1  87200848  95 dict of module
     5   4709   0   602752   1  87803600  96 types.CodeType
     6    573   0   599736   1  88403336  97 dict of type
     7   4552   0   546240   1  88949576  97 function
     8    573   0   512760   1  89462336  98 type
     9    449   0   498968   1  89961304  98 dict (no owner)
<221 more rows. Type e.g. '_.more' to view.>
Length of x: 1000
total elements: 1000000
heapy after calling calc_z_serial_purepython
Partition of a set of 2174931 objects. Total size = 102092800 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0 2000003  92 64000096  63  64000096  63 complex
     1    495   0 24492472  24  88492568  87 list
     2  33462   2  4425400   4  92917968  91 str
     3 100951   5  2422824   2  95340792  93 int
     4  17623   1  1526200   1  96866992  95 tuple
     5    214   0   883216   1  97750208  96 dict of module
     6   4709   0   602752   1  98352960  96 types.CodeType
     7    573   0   599736   1  98952696  97 dict of type
     8   4552   0   546240   1  99498936  97 function
     9    573   0   512760   1 100011696  98 type
<221 more rows. Type e.g. '_.more' to view.>
@timefn calc_pure_python took 16.4142379761s

By watching how much memory each stage allocates, you can decide where to trigger garbage collection manually.
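
As a rough illustration of where those bytes go and how to release them, the sketch below estimates the footprint of a list of complex numbers with sys.getsizeof and then drops the list explicitly. The helper list_footprint is hypothetical, and exact sizes depend on the interpreter build, so none are stated here.

```python
import gc
import sys


def list_footprint(items):
    """Size of the list object plus each distinct element it references."""
    seen = {id(item): sys.getsizeof(item) for item in items}
    return sys.getsizeof(items) + sum(seen.values())


zs = [complex(x, 0.5) for x in range(1000)]
footprint = list_footprint(zs)
print("zs occupies roughly {} bytes".format(footprint))

# Once zs is no longer needed, drop the reference so the memory can be reused.
# CPython frees the objects as soon as their reference count hits zero;
# gc.collect() additionally cleans up any reference cycles.
del zs
unreachable = gc.collect()
```

For plain lists of numbers, del alone is enough; the explicit gc.collect() matters only when objects refer to each other in cycles.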

Plotting live variable instances in real time with dowser

pass

Examining CPython bytecode with the dis module

In [1]: import dis

In [2]: import Julia

In [3]: dis.dis(Julia.calc_z_serial_purepython)
 24           0 LOAD_CONST               1 (0)
              2 BUILD_LIST               1
              4 LOAD_GLOBAL              0 (len)
              6 LOAD_FAST                1 (zs)
              8 CALL_FUNCTION            1
             10 BINARY_MULTIPLY
             12 STORE_FAST               3 (output)

 25          14 SETUP_LOOP              94 (to 110)
             16 LOAD_GLOBAL              1 (range)
             18 LOAD_GLOBAL              0 (len)
             20 LOAD_FAST                1 (zs)
             22 CALL_FUNCTION            1
             24 CALL_FUNCTION            1
             26 GET_ITER
        >>   28 FOR_ITER                78 (to 108)
             30 STORE_FAST               4 (i)

 26          32 LOAD_CONST               1 (0)
             34 STORE_FAST               5 (n)

 27          36 LOAD_FAST                1 (zs)
             38 LOAD_FAST                4 (i)
             40 BINARY_SUBSCR
             42 STORE_FAST               6 (z)

 28          44 LOAD_FAST                2 (cs)
             46 LOAD_FAST                4 (i)
             48 BINARY_SUBSCR
             50 STORE_FAST               7 (c)

 36          52 SETUP_LOOP              44 (to 98)
        >>   54 LOAD_FAST                5 (n)
             56 LOAD_FAST                0 (max_iterations)
             58 COMPARE_OP               0 (<)
             60 POP_JUMP_IF_FALSE       96
             62 LOAD_GLOBAL              2 (abs)
             64 LOAD_FAST                6 (z)
             66 CALL_FUNCTION            1
             68 LOAD_CONST               2 (2)
             70 COMPARE_OP               0 (<)
             72 POP_JUMP_IF_FALSE       96

 37          74 LOAD_FAST                6 (z)
             76 LOAD_FAST                6 (z)
             78 BINARY_MULTIPLY
             80 LOAD_FAST                7 (c)
             82 BINARY_ADD
             84 STORE_FAST               6 (z)

 38          86 LOAD_FAST                5 (n)
             88 LOAD_CONST               3 (1)
             90 INPLACE_ADD
             92 STORE_FAST               5 (n)
             94 JUMP_ABSOLUTE           54
        >>   96 POP_BLOCK

 43     >>   98 LOAD_FAST                5 (n)
            100 LOAD_FAST                3 (output)
            102 LOAD_FAST                4 (i)
            104 STORE_SUBSCR
            106 JUMP_ABSOLUTE           28
        >>  108 POP_BLOCK

 44     >>  110 LOAD_FAST                3 (output)
            112 RETURN_VALUE

In general, prefer the more concise built-in functions, which compile to fewer bytecode instructions.
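
A small comparison makes this concrete. The two functions below (hypothetical names) compute the same sum, once with an explicit loop and once with the built-in sum; dis.get_instructions lets us count the bytecode instructions of each. The exact counts vary between Python versions, so only their relative order is meaningful.

```python
import dis


def total_loop(values):
    total = 0
    for v in values:
        total += v
    return total


def total_builtin(values):
    return sum(values)


def instruction_count(fn):
    """Number of bytecode instructions in fn's code object."""
    return len(list(dis.get_instructions(fn)))


loop_count = instruction_count(total_loop)
builtin_count = instruction_count(total_builtin)
print("loop:", loop_count, "builtin:", builtin_count)
```

With the built-in, the per-element work happens inside the interpreter's C code instead of being dispatched one bytecode instruction at a time.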

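Unit testing with a no-op @profile decorator

Keep the unit tests covering the whole program.

Use a no-op decorator so that the unit tests do not fail when @profile has not been injected into the local namespace.

A test based on the nosetests framework, for example: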

import unittest


@profile  # not defined unless a profiler injects it, so this line raises NameError
def some_fn(nbr):
    return nbr * 2


class TestCase(unittest.TestCase):
    def test(self):
        result = some_fn(2)
        self.assertEqual(result, 4)

NameError

ERROR: Failure: NameError (name 'profile' is not defined)

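Solution: add a no-op condition at the top of the file that defines a do-nothing profile decorator when the namespace does not already provide one.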

import unittest

# line_profiler
# Hand-written no-op profile decorator, defined only when kernprof is not running
if "line_profiler" not in dir():
    def profile(func):
        def inner(*args, **kwargs):
            return func(*args, **kwargs)
        return inner
# When run under nosetests, dir() is
# ['__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__', 'unittest']

# When run with kernprof -l -v ex.py, dir() is
# ['__builtins__', '__file__', '__name__', 'args', 'builtins', 'execfile_', 'extension', 'line_profiler', 'options', 'parser', 'prof', 'script_file', 'unittest', 'usage']


@profile
def some_fn(nbr):
    return nbr * 2


class TestCase(unittest.TestCase):
    def test(self):
        result = some_fn(2)
        self.assertEqual(result, 4)
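
Checking dir() for line_profiler ties the fallback to how one particular version of kernprof executes the script. A somewhat more robust sketch, assuming the profiler injects profile into the builtins module (which, to my knowledge, kernprof -l and python -m memory_profiler both do), looks the name up there and falls back to an identity decorator otherwise.

```python
import builtins

try:
    # Present only when a profiler has injected it into builtins.
    profile = builtins.profile
except AttributeError:
    def profile(func):
        """No-op fallback: return the function unchanged."""
        return func


@profile
def some_fn(nbr):
    return nbr * 2


print(some_fn(2))
```

Returning func itself rather than a wrapper keeps the function's name and docstring intact and adds no call overhead when no profiler is active.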