- Introduction to Python
- Getting started with Python and the IPython notebook
- Functions are first class objects
- Data science is OSEMN
- Working with text
- Preprocessing text data
- Working with structured data
- Using SQLite3
- Using HDF5
- Using numpy
- Using Pandas
- Computational problems in statistics
- Computer numbers and mathematics
- Algorithmic complexity
- Linear Algebra and Linear Systems
- Linear Algebra and Matrix Decompositions
- Change of Basis
- Optimization and Non-linear Methods
- Practical Optimization Routines
- Finding roots
- Optimization Primer
- Using scipy.optimize
- Gradient descent
- Newton’s method and variants
- Constrained optimization
- Curve fitting
- Finding parameters for ODE models
- Optimization of graph node placement
- Optimization of standard statistical models
- Fitting ODEs with the Levenberg–Marquardt algorithm
- 1D example
- 2D example
- Algorithms for Optimization and Root Finding for Multivariate Problems
- Expectation Maximization (EM) Algorithm
- Monte Carlo Methods
- Resampling methods
- Resampling
- Simulations
- Setting the random seed
- Sampling with and without replacement
- Calculation of Cook’s distance
- Permutation resampling
- Design of simulation experiments
- Example: Simulations to estimate power
- Check with R
- Estimating the CDF
- Estimating the PDF
- Kernel density estimation
- Multivariate kernel density estimation
- Markov Chain Monte Carlo (MCMC)
- Using PyMC2
- Using PyMC3
- Using PyStan
- C Crash Course
- Code Optimization
- Using C code in Python
- Using functions from various compiled languages in Python
- Julia and Python
- Converting Python Code to C for speed
- Optimization bake-off
- Writing Parallel Code
- Massively parallel programming with GPUs
- Writing CUDA in C
- Distributed computing for Big Data
- Hadoop MapReduce on AWS EMR with mrjob
- Spark on a local machine using 4 nodes
- Modules and Packaging
- Tour of the Jupyter (IPython3) notebook
- Polyglot programming
- What you should know and learn more about
- Wrapping R libraries with Rpy
Using EM
Suppose we augment with the latent variable \(z\) that indicates which of the \(k\) Gaussians our observation \(y\) came from. The derivation of the E and M steps is the same as for the toy example, only with more algebra.
For the E-step, we compute the responsibilities \(\gamma_{j,i}\), the posterior probabilities that observation \(y_i\) came from component \(j\):

\[\gamma_{j,i} = \frac{w_j \, \mathcal{N}(y_i \mid \mu_j, \Sigma_j)}{\sum_{l=1}^k w_l \, \mathcal{N}(y_i \mid \mu_l, \Sigma_l)}\]

For the M-step, we have to find \(\theta = (w, \mu, \Sigma)\) that maximizes \(Q\). By taking derivatives with respect to \((w, \mu, \Sigma)\) respectively and solving (remember to use Lagrange multipliers for the constraint that \(\sum_{j=1}^k w_j = 1\)), we get

\[w_j = \frac{1}{n}\sum_{i=1}^n \gamma_{j,i}, \qquad \mu_j = \frac{\sum_{i=1}^n \gamma_{j,i}\, y_i}{\sum_{i=1}^n \gamma_{j,i}}, \qquad \Sigma_j = \frac{\sum_{i=1}^n \gamma_{j,i}\,(y_i - \mu_j)(y_i - \mu_j)^T}{\sum_{i=1}^n \gamma_{j,i}}\]
```python
import numpy as np
from scipy.stats import multivariate_normal as mvn

def em_gmm_orig(xs, pis, mus, sigmas, tol=0.01, max_iter=100):
    n, p = xs.shape
    k = len(pis)

    ll_old = 0
    for _ in range(max_iter):
        # E-step: responsibility of each component for each observation
        ws = np.zeros((k, n))
        for j in range(k):
            for i in range(n):
                ws[j, i] = pis[j] * mvn(mus[j], sigmas[j]).pdf(xs[i])
        ws /= ws.sum(0)

        # M-step: update mixture weights
        pis = np.zeros(k)
        for j in range(k):
            for i in range(n):
                pis[j] += ws[j, i]
        pis /= n

        # M-step: update component means
        mus = np.zeros((k, p))
        for j in range(k):
            for i in range(n):
                mus[j] += ws[j, i] * xs[i]
            mus[j] /= ws[j, :].sum()

        # M-step: update component covariances
        sigmas = np.zeros((k, p, p))
        for j in range(k):
            for i in range(n):
                ys = np.reshape(xs[i] - mus[j], (p, 1))
                sigmas[j] += ws[j, i] * np.dot(ys, ys.T)
            sigmas[j] /= ws[j, :].sum()

        # update complete log likelihood
        ll_new = 0.0
        for i in range(n):
            s = 0
            for j in range(k):
                s += pis[j] * mvn(mus[j], sigmas[j]).pdf(xs[i])
            ll_new += np.log(s)

        if np.abs(ll_new - ll_old) < tol:
            break
        ll_old = ll_new

    return ll_new, pis, mus, sigmas
```
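As a sanity check, the same E- and M-step updates can be exercised on synthetic data drawn from a two-component Gaussian mixture. The sketch below is not part of the original notes: it is a vectorized re-implementation of the updates (the name `em_gmm_vect`, the synthetic data, and the starting values are our own assumptions), which should recover mixture weights near \(2/3\) and \(1/3\) and means near the true centers.

```python
import numpy as np
from scipy.stats import multivariate_normal as mvn

def em_gmm_vect(xs, pis, mus, sigmas, tol=0.01, max_iter=100):
    # vectorized sketch of the same E/M updates as the looped version
    n, p = xs.shape
    k = len(pis)
    ll_old = 0
    for _ in range(max_iter):
        # E-step: (k, n) matrix of responsibilities
        ws = np.array([pi * mvn(mu, sigma).pdf(xs)
                       for pi, mu, sigma in zip(pis, mus, sigmas)])
        ws /= ws.sum(0)
        # M-step: weights, means, covariances in closed form
        pis = ws.sum(axis=1) / n
        mus = ws @ xs / ws.sum(axis=1)[:, None]
        sigmas = np.array([(ws[j] * (xs - mus[j]).T) @ (xs - mus[j])
                           / ws[j].sum() for j in range(k)])
        # observed-data log likelihood for the convergence check
        dens = np.array([pi * mvn(mu, sigma).pdf(xs)
                         for pi, mu, sigma in zip(pis, mus, sigmas)])
        ll_new = np.log(dens.sum(0)).sum()
        if np.abs(ll_new - ll_old) < tol:
            break
        ll_old = ll_new
    return ll_new, pis, mus, sigmas

# synthetic data: 200 points near (0, 0), 100 points near (5, 5)
rng = np.random.default_rng(0)
xs = np.concatenate([rng.multivariate_normal([0, 0], np.eye(2), 200),
                     rng.multivariate_normal([5, 5], np.eye(2), 100)])
pis0 = np.array([0.5, 0.5])
mus0 = np.array([[-1.0, -1.0], [4.0, 6.0]])
sigmas0 = np.array([np.eye(2)] * 2)
ll, pis, mus, sigmas = em_gmm_vect(xs, pis0, mus0, sigmas0)
```

Because the two components are well separated, the responsibilities are nearly hard assignments and the fit converges in a handful of iterations.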