- Introduction to Python
- Getting started with Python and the IPython notebook
- Functions are first class objects
- Data science is OSEMN
- Working with text
- Preprocessing text data
- Working with structured data
- Using SQLite3
- Using HDF5
- Using numpy
- Using Pandas
- Computational problems in statistics
- Computer numbers and mathematics
- Algorithmic complexity
- Linear Algebra and Linear Systems
- Linear Algebra and Matrix Decompositions
- Change of Basis
- Optimization and Non-linear Methods
- Practical Optimization Routines
- Finding roots
- Optimization Primer
- Using scipy.optimize
- Gradient descent
- Newton’s method and variants
- Constrained optimization
- Curve fitting
- Finding parameters for ODE models
- Optimization of graph node placement
- Optimization of standard statistical models
- Fitting ODEs with the Levenberg–Marquardt algorithm
- 1D example
- 2D example
- Algorithms for Optimization and Root Finding for Multivariate Problems
- Expectation Maximization (EM) Algorithm
- Monte Carlo Methods
- Resampling methods
- Resampling
- Simulations
- Setting the random seed
- Sampling with and without replacement
- Calculation of Cook’s distance
- Permutation resampling
- Design of simulation experiments
- Example: Simulations to estimate power
- Check with R
- Estimating the CDF
- Estimating the PDF
- Kernel density estimation
- Multivariate kernel density estimation
- Markov Chain Monte Carlo (MCMC)
- Using PyMC2
- Using PyMC3
- Using PyStan
- C Crash Course
- Code Optimization
- Using C code in Python
- Using functions from various compiled languages in Python
- Julia and Python
- Converting Python Code to C for speed
- Optimization bake-off
- Writing Parallel Code
- Massively parallel programming with GPUs
- Writing CUDA in C
- Distributed computing for Big Data
- Hadoop MapReduce on AWS EMR with mrjob
- Spark on a local machine using 4 nodes
- Modules and Packaging
- Tour of the Jupyter (IPython3) notebook
- Polyglot programming
- What you should know and learn more about
- Wrapping R libraries with Rpy
From Numbers to Functions: Stability and Conditioning
Suppose we have a computer algorithm \(g(x)\) that represents the mathematical function \(f(x)\). \(g(x)\) is stable if for some small perturbation \(\epsilon\), \(g(x+\epsilon) \approx f(x)\).
A mathematical function \(f(x)\) is well-conditioned if \(f(x + \epsilon) \approx f(x)\) for all small perturbations \(\epsilon\).
That is, the function \(f(x)\) is well-conditioned if the solution varies gradually as the problem varies. For a well-conditioned function, all small perturbations have small effects. However, a poorly-conditioned problem needs only some small perturbation to have a large effect. For example, inverting a nearly singular matrix is a poorly-conditioned problem.
A numerical algorithm \(g(x)\) is numerically-stable if \(g(x) \approx f(x')\) for some \(x' \approx x\). Note that stability is a property that relates the algorithm \(g(x)\) to the problem \(f(x)\).
That is, the algorithm \(g(x)\) is numerically stable if it gives nearly the right answer to nearly the right question. Numerically unstable algorithms tend to amplify approximation errors due to computer arithmetic over time. If we used an infinite-precision numerical system, stable and unstable algorithms would have the same accuracy. However, as we have seen (e.g. the variance calculation), when using floating point numbers, algebraically equivalent algorithms can give different results.
In general, we need both a well-conditioned problem and numerical stability of the algorithm to obtain reliably accurate answers. In this case, we can be sure that \(g(x) \approx f(x)\).
Unstable version
import numpy as np
import matplotlib.pyplot as plt

# Catastrophic cancellation occurs when subtracting
# two numbers that are very close to one another
# Here is another example
def f(x):
    return (1 - np.cos(x))/(x*x)
x = np.linspace(-4e-8, 4e-8, 100)
plt.plot(x, f(x));
plt.axvline(1.1e-8, color='red')
plt.xlim([-4e-8, 4e-8]);
# We know from L'Hopital's rule that the answer is 0.5 at 0
# and should be very close to 0.5 throughout this tiny interval
# but errors arise due to catastrophic cancellation
print('%.30f' % np.cos(1.1e-8))
print('%.30f' % (1 - np.cos(1.1e-8)))  # exact answer is 6.05e-17
print('%2f' % ((1 - np.cos(1.1e-8))/(1.1e-8*1.1e-8)))
0.999999999999999888977697537484
0.000000000000000111022302462516
0.917540
Stable version
# Numerically stable version of the function
# using the long-forgotten half-angle formula from trigonometry
def f1(x):
    return 2*np.sin(x/2)**2/(x*x)
x = np.linspace(-4e-8, 4e-8, 100)
plt.plot(x, f1(x));
plt.axvline(1.1e-8, color='red')
plt.xlim([-4e-8, 4e-8]);
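As a quick check (added here as an illustration, not part of the original notebook), evaluating the stable form at the same point marked above shows that it stays close to the true limit of 0.5:

# the half-angle form avoids the catastrophic cancellation,
# so the computed value stays near the true limit 0.5
print('%.6f' % f1(1.1e-8))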
Stable and unstable versions of variance
\[s^2 = \frac{1}{n-1}\sum(x - \bar{x})^2\]
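Expanding the square and using \(\bar{x} = \frac{1}{n}\sum x_i\) gives the algebraically equivalent "sum of squares" form implemented in the code below:

\[s^2 = \frac{1}{n(n-1)}\left(n\sum x_i^2 - \Big(\sum x_i\Big)^2\right)\]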
# sum of squares method (vectorized version)
def sum_of_squares_var(x):
    n = len(x)
    return 1.0/(n*(n-1))*(n*np.sum(x**2) - (np.sum(x))**2)
This should set off warning bells - big number minus big number!
# direct method
def direct_var(x):
    n = len(x)
    xbar = np.mean(x)
    return 1.0/(n-1)*np.sum((x - xbar)**2)
Much better - at least the squaring occurs after the subtraction
# Welford's method: single-pass running update of mean and sum of squares
def welford_var(x):
    s = 0
    m = x[0]
    for i in range(1, len(x)):
        delta = x[i] - m
        m += delta/(i + 1)
        s += delta*(x[i] - m)
    return s/(len(x) - 1)
Classic algorithm from Knuth’s Art of Computer Programming
x_ = np.random.uniform(0, 1, int(1e6))
x = 1e12 + x_
# correct answer
np.var(x_)
0.0835
sum_of_squares_var(x)
737870500.8189
direct_var(x)
0.0835
welford_var(x)
0.0835
Lesson: Mathematical formulas may behave differently when directly translated into code!
This problem also appears in naive algorithms for finding simple regression coefficients and Pearson’s correlation coefficient.
See this series of blog posts for a clear explanation:
- http://www.johndcook.com/blog/2008/09/28/theoretical-explanation-for-numerical-results/
- http://www.johndcook.com/blog/2008/09/26/comparing-three-methods-of-computing-standard-deviation/
- http://www.johndcook.com/blog/2008/10/20/comparing-two-ways-to-fit-a-line-to-data/
- http://www.johndcook.com/blog/2008/11/05/how-to-calculate-pearson-correlation-accurately/
Avoiding catastrophic cancellation by formula rearrangement
There are a couple of common tricks that may be useful if you are worried about catastrophic cancellation.
Use library functions where possible
Instead of
np.log(x + 1)
which can be inaccurate for \(x\) near zero, use
np.log1p(x)
Similarly, instead of
np.sin(x)/x
use
np.sinc(x/np.pi)
(note that np.sinc computes the normalized sinc function \(\sin(\pi x)/(\pi x)\), so the argument must be divided by \(\pi\) to obtain \(\sin(x)/x\)).
See if NumPy's base functions have what you need.
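For instance, here is a quick sketch (added for illustration; the test value is arbitrary) comparing the naive and library forms of \(\log(1 + x)\) for a very small argument:

import numpy as np

x = 1e-15
# naive form: forming 1 + x loses most of x to rounding before the log is taken
print(np.log(1 + x))
# library form: accurate for small x (the true value is approximately x here)
print(np.log1p(x))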
Rationalize the numerator to remove cancellation for the following problem
\[\sqrt{x + 1} - \sqrt{x}\]
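For instance (a sketch added for illustration, with the test value chosen arbitrarily), rationalizing the numerator gives \(\sqrt{x+1} - \sqrt{x} = \frac{1}{\sqrt{x+1} + \sqrt{x}}\), which involves no subtraction of nearly equal numbers:

import numpy as np

x = 1e16
# naive form: x + 1 rounds to x in double precision, so the difference is 0
print(np.sqrt(x + 1) - np.sqrt(x))
# rationalized form: no cancellation, gives the correct answer of about 5e-9
print(1/(np.sqrt(x + 1) + np.sqrt(x)))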
Use basic algebra to remove cancellation for the following problem
\[\frac{1}{\sqrt x} - \frac{1}{\sqrt{x + 1}}\]
Use trigonometric identities to remove cancellation for the following 3 problems
\[\sin (x+ \epsilon) - \sin x\]

\[\frac{1 - \cos x}{\sin x}\]

\[\int_N^{N+1} \frac{dx}{1 + x^2}\]
Poorly conditioned problems
# The tangent function is poorly conditioned
x1 = 1.57078
x2 = 1.57079
t1 = np.tan(x1)
t2 = np.tan(x2)
print('t1 =', t1)
print('t2 =', t2)
print('% change in x =', 100.0*(x2-x1)/x1)
print('% change in tan(x) =', (100.0*(t2-t1)/t1))
t1 = 61249.0085315
t2 = 158057.913416
% change in x = 0.000636626389427
% change in tan(x) = 158.057913435
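The sensitivity can be quantified by the relative condition number \(\kappa(x) = |x\,f'(x)/f(x)|\) (a sketch added here for illustration, not part of the original notes). For \(f(x) = \tan x\), \(f'(x) = \sec^2 x\), so \(\kappa\) blows up as \(x\) approaches \(\pi/2\), consistent with the large percentage change seen above:

# relative condition number of tan at x1: |x * sec(x)**2 / tan(x)| = |x / (sin(x)*cos(x))|
kappa = abs(x1 / (np.cos(x1)**2 * np.tan(x1)))
print(kappa)   # very large near pi/2, so small relative changes in x are greatly amplified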
Ill-conditioned matrices
In this example, we want to solve a simple linear system Ax = b where A and b are given.
A = 0.5*np.array([[1, 1], [1+1e-10, 1-1e-10]])
b1 = np.array([2, 2])
b2 = np.array([2.01, 2])
np.linalg.solve(A, b1)
array([ 2., 2.])
np.linalg.solve(A, b2)
array([-99999989.706, 99999993.726])
The condition number of a matrix is a useful diagnostic - it is defined as the norm of A times the norm of the inverse of A. If this number is large, the matrix is ill-conditioned. Since there are many ways to calculate a matrix norm, there are also many condition numbers, but they are roughly equivalent for our purposes.
np.linalg.cond(A)
19999973849.2252
np.linalg.cond(A, 'fro')
19999998343.1927
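The condition number also bounds how much a relative perturbation in \(b\) can be amplified in the solution. Here is a minimal sketch (added for illustration, not in the original notes) checking the standard bound \(\|\delta x\|/\|x\| \le \kappa(A)\,\|\delta b\|/\|b\|\) for this system:

x1 = np.linalg.solve(A, b1)
x2 = np.linalg.solve(A, b2)
rel_change_x = np.linalg.norm(x2 - x1) / np.linalg.norm(x1)
rel_change_b = np.linalg.norm(b2 - b1) / np.linalg.norm(b1)
print(rel_change_x)                        # huge relative change in the solution
print(np.linalg.cond(A) * rel_change_b)    # upper bound implied by the condition number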
Simple things to try with ill-conditioned matrices
- Can you remove dependent or collinear variables? If one variable is (almost) an exact multiple of another, it provides no additional information and can be removed from the matrix.
- Can you normalize the data so that all variables are on the same scale? For example, if columns represent feature values, standardizing features to have zero mean and unit standard deviation can be helpful.
- Can you use functions from linear algebra libraries instead of rolling your own? For example, the lstsq function from scipy.linalg will deal with collinear variables sensibly (see the sketch below).
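A minimal sketch (added here as an illustration, not from the original notes) of scipy.linalg.lstsq applied to the nearly singular system above; the cond cutoff is an arbitrary illustrative value that tells lstsq to treat tiny singular values as zero:

from scipy.linalg import lstsq

# solve the nearly singular system in a least-squares sense;
# cond sets a cutoff below which singular values are treated as zero
sol, residues, rank, sv = lstsq(A, b2, cond=1e-8)
print(sv)     # singular values of A, showing how close A is to singular
print(rank)   # effective rank after discarding tiny singular values
print(sol)    # minimum-norm least-squares solution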