- Introduction to Python
- Getting started with Python and the IPython notebook
- Functions are first class objects
- Data science is OSEMN
- Working with text
- Preprocessing text data
- Working with structured data
- Using SQLite3
- Using HDF5
- Using numpy
- Using Pandas
- Computational problems in statistics
- Computer numbers and mathematics
- Algorithmic complexity
- Linear Algebra and Linear Systems
- Linear Algebra and Matrix Decompositions
- Change of Basis
- Optimization and Non-linear Methods
- Practical Optimization Routines
- Finding roots
- Optimization Primer
- Using scipy.optimize
- Gradient descent
- Newton’s method and variants
- Constrained optimization
- Curve fitting
- Finding parameters for ODE models
- Optimization of graph node placement
- Optimization of standard statistical models
- Fitting ODEs with the Levenberg–Marquardt algorithm
- 1D example
- 2D example
- Algorithms for Optimization and Root Finding for Multivariate Problems
- Expectation Maximization (EM) Algorithm
- Monte Carlo Methods
- Resampling methods
- Resampling
- Simulations
- Setting the random seed
- Sampling with and without replacement
- Calculation of Cook’s distance
- Permutation resampling
- Design of simulation experiments
- Example: Simulations to estimate power
- Check with R
- Estimating the CDF
- Estimating the PDF
- Kernel density estimation
- Multivariate kernel density estimation
- Markov Chain Monte Carlo (MCMC)
- Using PyMC2
- Using PyMC3
- Using PyStan
- C Crash Course
- Code Optimization
- Using C code in Python
- Using functions from various compiled languages in Python
- Julia and Python
- Converting Python Code to C for speed
- Optimization bake-off
- Writing Parallel Code
- Massively parallel programming with GPUs
- Writing CUDA in C
- Distributed computing for Big Data
- Hadoop MapReduce on AWS EMR with mrjob
- Spark on a local machine using 4 nodes
- Modules and Packaging
- Tour of the Jupyter (IPython3) notebook
- Polyglot programming
- What you should know and learn more about
- Wrapping R libraries with Rpy
Finite representation of numbers
For integers, most languages have a maximum and minimum representable number. Python integers are actually objects, so they intelligently switch to arbitrary precision when you go beyond these limits, but this is not true for most other languages, including C and R. With a 64-bit signed representation, the maximum is 2^63 - 1 and the minimum is -2^63.
import sys
sys.maxint
9223372036854775807
2**63-1 == sys.maxint
True
# Python handles "overflow" of integers gracefully by
# switching from integers to "long" arbitrary precision numbers
sys.maxint + 1
9223372036854775808L
Integer division
This has been illustrated more than once because it is such a common source of bugs. Be very careful when dividing one integer by another. Here are some common workarounds.
# Explicit float conversion
print float(1)/3
0.333333333333
# Implicit float conversion
print (0.0 + 1)/3
print (1.0 * 1)/3
0.333333333333
0.333333333333
# Telling Python to ALWAYS do floating point division with '/'
# Integer division can still be done with '//'
# The __future__ package contains features that only become
# standard in some later Python release.
from __future__ import division
print (1/3)
print (1//3)
0.333333333333
0
Documentation about the __future__ package
Overflow in languages such as C “wraps around” and gives negative numbers
This will not work out of the box because the VM is missing some packages. If you really, really want to run this, you can issue the following commands from the command line and have your sudo password ready. It is not necessary to run this - it is just an example to show integer overflow in C - it does not happen in Python.
sudo apt-get update
sudo apt-get install build-essential
sudo apt-get install llvm
pip install git+https://github.com/dabeaz/bitey.git
%%file demo.c

#include "limits.h"

long limit() {
    return LONG_MAX;
}

long overflow() {
    long x = LONG_MAX;
    return x + 1;
}
Writing demo.c
! clang -emit-llvm -c demo.c -o demo.o
import bitey
import demo
demo.limit(), demo.overflow()
(9223372036854775807, -9223372036854775808)
Floating point numbers
A floating point number is stored in 3 pieces (sign bit, exponent, mantissa), so that every float is represented as ±mantissa × 2^exponent. Because of this, the interval between consecutive numbers is smallest (high precision) for numbers close to 0 and largest for numbers close to the lower and upper bounds.
Because exponents have to be signed to represent both small and large numbers, but it is more convenient to store unsigned numbers, the exponent has an offset (also known as the exponent bias). For example, if the exponent is an unsigned 8-bit number, it can represent the range (0, 255). By using an offset of 127, it will now represent the range (-127, 128).
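You can see this layout directly by reinterpreting a double's bits. A minimal sketch using the standard struct module (for 64-bit doubles the exponent field is 11 bits with a bias of 1023, and the mantissa field is 52 bits):

```python
import struct

def float_bits(x):
    # Reinterpret the float as its raw IEEE 754 64-bit pattern
    bits = struct.unpack('>Q', struct.pack('>d', x))[0]
    sign = bits >> 63
    biased_exp = (bits >> 52) & 0x7FF        # 11-bit exponent field
    mantissa = bits & ((1 << 52) - 1)        # 52-bit fraction field
    # Subtract the bias (1023 for doubles) to recover the true exponent
    return sign, biased_exp - 1023, mantissa

print(float_bits(1.0))   # (0, 0, 0):  +1.0 = +1.0 x 2^0
print(float_bits(-2.0))  # (1, 1, 0):  -2.0 = -1.0 x 2^1
```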
from IPython.display import Image
Binary representation of a floating point number
Image(url='../Images/ca76e1b9890dbf60404525dc1a556ac0.jpg')
Intervals between consecutive floating point numbers are not constant
Because of this, if you are adding many numbers, it is more accurate to add the small numbers before the large numbers.
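A minimal sketch of this effect: near 1e16 the gap between consecutive doubles is about 2.0, so adding 1.0 to a running total that already holds 1e16 does nothing, while summing the small terms first preserves them.

```python
big = 1e16
tiny = [1.0] * 10**6          # true total of the small terms: 1e6

# Large-first: each 1.0 is smaller than the gap (~2.0) between
# consecutive doubles near 1e16, so every addition is rounded away.
large_first = big
for t in tiny:
    large_first += t

# Small-first: accumulate the small terms while they still matter,
# then add the large term once at the end.
small_first = sum(tiny) + big

print(large_first - big)   # 0.0       -- all 10**6 additions lost
print(small_first - big)   # 1000000.0 -- small terms preserved
```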
Image(url='http:///fig1.jpg')
Floating point numbers on your system
Information about the floating point representation on your system can be obtained from sys.float_info. Definitions of the stored values are given at https://docs.python.org/2/library/sys.html#sys.float_info
import sys
print sys.float_info
sys.float_info(max=1.7976931348623157e+308, max_exp=1024, max_10_exp=308, min=2.2250738585072014e-308, min_exp=-1021, min_10_exp=-307, dig=15, mant_dig=53, epsilon=2.220446049250313e-16, radix=2, rounds=1)
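The epsilon field above is the machine epsilon: the gap between 1.0 and the next representable double. A quick sketch of what it means in practice:

```python
import sys

eps = sys.float_info.epsilon   # 2.220446049250313e-16 for doubles

# eps is the smallest increment that changes 1.0
print(1.0 + eps > 1.0)         # True

# Half of eps is rounded away when added to 1.0
print(1.0 + eps/2 == 1.0)      # True
```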
Floating point numbers may not be precise
'%.20f' % (0.1 * 0.1 * 100)
'1.00000000000000022204'
# Because of this, don't check for equality of floating point numbers!

# Bad
s = 0.0
for i in range(1000):
    s += 1.0/10.0
    if s == 1.0:
        break
print i

# OK
TOL = 1e-9
s = 0.0
for i in range(1000):
    s += 1.0/10.0
    if abs(s - 1.0) < TOL:
        break
print i
999
9
# Loss of precision
1 + 6.022e23 - 6.022e23
0.0000
Lesson: Avoid algorithms that subtract two numbers that are very close to one another. The loss of significance is greater when both numbers are very large, due to the limited number of precision bits available.
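A classic example is the quadratic formula. When b^2 is much larger than 4ac, one root requires subtracting two nearly equal numbers; rewriting the formula algebraically avoids the cancellation. A minimal sketch for x^2 - 1e8 x + 1 = 0 (whose small root is very close to 1e-8):

```python
from math import sqrt

a, b, c = 1.0, -1e8, 1.0
d = sqrt(b*b - 4*a*c)          # d is very close to -b

naive = (-b - d) / (2*a)       # subtracts two nearly equal numbers
stable = (2*c) / (-b + d)      # algebraically equal, no cancellation

print(naive)                   # off by tens of percent
print(stable)                  # ~1e-8, accurate to full precision
```

The stable form comes from multiplying numerator and denominator by the conjugate, a standard trick for avoiding catastrophic cancellation.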
Associative law does not necessarily hold
6.022e23 - 6.022e23 + 1
1.0000
1 + 6.022e23 - 6.022e23
0.0000
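When you really do need an accurate sum of terms at very different scales, the standard library's math.fsum tracks the rounding error of each partial sum and recovers the exact result:

```python
import math

vals = [1.0, 6.022e23, -6.022e23]

print(sum(vals))        # 0.0 -- the 1.0 is absorbed by the huge term
print(math.fsum(vals))  # 1.0 -- fsum compensates for the rounding
```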
Distributive law does not hold
import numpy as np

a = np.exp(1); b = np.pi; c = np.sin(1)
a*(b+c) == a*b + a*c
False
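The same lesson as before applies: compare floats within a tolerance rather than exactly. A sketch using numpy's allclose (math.isclose is the scalar equivalent in Python 3):

```python
import numpy as np

a, b, c = np.exp(1), np.pi, np.sin(1)

print(a*(b+c) == a*b + a*c)             # False: rounding differs by 1 ulp
print(np.allclose(a*(b+c), a*b + a*c))  # True: equal within tolerance
```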
# Loss of precision can be a problem when calculating likelihoods
probs = np.random.random(1000)
np.prod(probs)
0.0000
# When multiplying lots of small numbers, work in log space
np.sum(np.log(probs))
-980.0558
Lesson: Work in log space for very small or very big numbers to reduce underflow/overflow
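When you need to exponentiate a log-space total (e.g. to normalize likelihoods), shift by the maximum first so the exponentials do not underflow. This is the "log-sum-exp" trick; a hand-rolled sketch is below (SciPy also provides an equivalent logsumexp function):

```python
import numpy as np

np.random.seed(0)
logp = np.log(np.random.random(1000))   # log-likelihood terms

# The direct product of the probabilities underflows to zero
print(np.prod(np.exp(logp)))            # 0.0

def logsumexp(x):
    # log(sum(exp(x))) computed stably: factor out the largest term
    m = np.max(x)
    return m + np.log(np.sum(np.exp(x - m)))

print(logsumexp(logp))                  # finite, no underflow
```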