- Introduction to Python
- Getting started with Python and the IPython notebook
- Functions are first class objects
- Data science is OSEMN
- Working with text
- Preprocessing text data
- Working with structured data
- Using SQLite3
- Using HDF5
- Using numpy
- Using Pandas
- Computational problems in statistics
- Computer numbers and mathematics
- Algorithmic complexity
- Linear Algebra and Linear Systems
- Linear Algebra and Matrix Decompositions
- Change of Basis
- Optimization and Non-linear Methods
- Practical Optimizatio Routines
- Finding roots
- Optimization Primer
- Using scipy.optimize
- Gradient deescent
- Newton’s method and variants
- Constrained optimization
- Curve fitting
- Finding paraemeters for ODE models
- Optimization of graph node placement
- Optimization of standard statistical models
- Fitting ODEs with the Levenberg–Marquardt algorithm
- 1D example
- 2D example
- Algorithms for Optimization and Root Finding for Multivariate Problems
- Expectation Maximizatio (EM) Algorithm
- Monte Carlo Methods
- Resampling methods
- Resampling
- Simulations
- Setting the random seed
- Sampling with and without replacement
- Calculation of Cook’s distance
- Permutation resampling
- Design of simulation experiments
- Example: Simulations to estimate power
- Check with R
- Estimating the CDF
- Estimating the PDF
- Kernel density estimation
- Multivariate kerndel density estimation
- Markov Chain Monte Carlo (MCMC)
- Using PyMC2
- Using PyMC3
- Using PyStan
- C Crash Course
- Code Optimization
- Using C code in Python
- Using functions from various compiled languages in Python
- Julia and Python
- Converting Python Code to C for speed
- Optimization bake-off
- Writing Parallel Code
- Massively parallel programming with GPUs
- Writing CUDA in C
- Distributed computing for Big Data
- Hadoop MapReduce on AWS EMR with mrjob
- Spark on a local mahcine using 4 nodes
- Modules and Packaging
- Tour of the Jupyter (IPython3) notebook
- Polyglot programming
- What you should know and learn more about
- Wrapping R libraries with Rpy
Estimating parameters of a linear regreession model
We will show how to estimate regression parameters using a simple linear modesl
\[y \sim ax + b\]
We can restate the linear model
\[y = ax + b + \epsilon\]
as sampling from a probability distribution
\[y \sim \mathcal{N}(ax + b, \sigma^2)\]
Now we can use pymc to estimate the paramters \(a\), \(b\) and \(\sigma\) (pymc2 uses precision \(\tau\) which is \(1/\sigma^2\) so we need to do a simple transformation). We will assume the following priors
\[\begin{split}a \sim \mathcal{N}(0, 100) \\ b \sim \mathcal{N}(0, 100) \\ \tau \sim \text{Gamma}(0.1, 0.1)\end{split}\]
Here we need a helper function to let PyMC know that the mean is a deterministic function of the parameters \(a\), \(b\) and \(x\). We can do this with a decorator, like so:
@pymc.deterministic def mu(a=a, b=b, x=x): return a*x + b
# observed data n = 21 a = 6 b = 2 sigma = 2 x = np.linspace(0, 1, n) y_obs = a*x + b + np.random.normal(0, sigma, n) data = pd.DataFrame(np.array([x, y_obs]).T, columns=['x', 'y'])
data.plot(x='x', y='y', kind='scatter', s=50);
# define priors a = pymc.Normal('slope', mu=0, tau=1.0/10**2) b = pymc.Normal('intercept', mu=0, tau=1.0/10**2) tau = pymc.Gamma("tau", alpha=0.1, beta=0.1) # define likelihood @pymc.deterministic def mu(a=a, b=b, x=x): return a*x + b y = pymc.Normal('y', mu=mu, tau=tau, value=y_obs, observed=True) # inference m = pymc.Model([a, b, tau, x, y]) mc = pymc.MCMC(m) mc.sample(iter=11000, burn=10000)
[-----------------100%-----------------] 11000 of 11000 complete in 6.1 sec
abar = a.stats()['mean'] bbar = b.stats()['mean'] data.plot(x='x', y='y', kind='scatter', s=50); xp = np.array([x.min(), x.max()]) plt.plot(a.trace()*xp[:, None] + b.trace(), c='red', alpha=0.01) plt.plot(xp, abar*xp + bbar, linewidth=2, c='red');
pymc.Matplot.plot(mc)
Plotting intercept Plotting slope Plotting tau
/Users/cliburn/anaconda/lib/python2.7/site-packages/numpy/core/fromnumeric.py:2507: VisibleDeprecationWarning: rank is deprecated; use the ndim attribute or function instead. To find the rank of a matrix see numpy.linalg.matrix_rank. VisibleDeprecationWarning)
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论