- Introduction to Python
- Getting started with Python and the IPython notebook
- Functions are first class objects
- Data science is OSEMN
- Working with text
- Preprocessing text data
- Working with structured data
- Using SQLite3
- Using HDF5
- Using numpy
- Using Pandas
- Computational problems in statistics
- Computer numbers and mathematics
- Algorithmic complexity
- Linear Algebra and Linear Systems
- Linear Algebra and Matrix Decompositions
- Change of Basis
- Optimization and Non-linear Methods
- Practical Optimization Routines
- Finding roots
- Optimization Primer
- Using scipy.optimize
- Gradient descent
- Newton’s method and variants
- Constrained optimization
- Curve fitting
- Finding parameters for ODE models
- Optimization of graph node placement
- Optimization of standard statistical models
- Fitting ODEs with the Levenberg–Marquardt algorithm
- 1D example
- 2D example
- Algorithms for Optimization and Root Finding for Multivariate Problems
- Expectation Maximization (EM) Algorithm
- Monte Carlo Methods
- Resampling methods
- Resampling
- Simulations
- Setting the random seed
- Sampling with and without replacement
- Calculation of Cook’s distance
- Permutation resampling
- Design of simulation experiments
- Example: Simulations to estimate power
- Check with R
- Estimating the CDF
- Estimating the PDF
- Kernel density estimation
- Multivariate kernel density estimation
- Markov Chain Monte Carlo (MCMC)
- Using PyMC2
- Using PyMC3
- Using PyStan
- C Crash Course
- Code Optimization
- Using C code in Python
- Using functions from various compiled languages in Python
- Julia and Python
- Converting Python Code to C for speed
- Optimization bake-off
- Writing Parallel Code
- Massively parallel programming with GPUs
- Writing CUDA in C
- Distributed computing for Big Data
- Hadoop MapReduce on AWS EMR with mrjob
- Spark on a local machine using 4 nodes
- Modules and Packaging
- Tour of the Jupyter (IPython3) notebook
- Polyglot programming
- What you should know and learn more about
- Wrapping R libraries with Rpy
Using EM
Suppose we augment with the latent variable \(z\) that indicates which of the \(k\) Gaussians our observation \(y\) came from. The derivation of the E and M steps is the same as for the toy example, only with more algebra.
For the E-step, we compute the responsibilities \(\gamma_{j,i}\), the posterior probabilities that observation \(y_i\) came from component \(j\):

\[\gamma_{j,i} = \frac{w_j \, \mathcal{N}(y_i \mid \mu_j, \Sigma_j)}{\sum_{l=1}^k w_l \, \mathcal{N}(y_i \mid \mu_l, \Sigma_l)}\]

For the M-step, we have to find \(\theta = (w, \mu, \Sigma)\) that maximizes \(Q\). By taking derivatives with respect to \((w, \mu, \Sigma)\) respectively and solving (remember to use Lagrange multipliers for the constraint that \(\sum_{j=1}^k w_j = 1\)), we get

\[w_j = \frac{1}{n}\sum_{i=1}^n \gamma_{j,i}, \qquad \mu_j = \frac{\sum_{i=1}^n \gamma_{j,i}\, y_i}{\sum_{i=1}^n \gamma_{j,i}}, \qquad \Sigma_j = \frac{\sum_{i=1}^n \gamma_{j,i}\,(y_i - \mu_j)(y_i - \mu_j)^T}{\sum_{i=1}^n \gamma_{j,i}}\]
```python
import numpy as np
from scipy.stats import multivariate_normal as mvn

def em_gmm_orig(xs, pis, mus, sigmas, tol=0.01, max_iter=100):
    n, p = xs.shape
    k = len(pis)

    ll_old = 0
    for _ in range(max_iter):
        # E-step: responsibility of each component for each observation
        ws = np.zeros((k, n))
        for j in range(k):
            for i in range(n):
                ws[j, i] = pis[j] * mvn(mus[j], sigmas[j]).pdf(xs[i])
        ws /= ws.sum(0)

        # M-step: update mixture weights
        pis = np.zeros(k)
        for j in range(k):
            for i in range(n):
                pis[j] += ws[j, i]
        pis /= n

        # M-step: update component means
        mus = np.zeros((k, p))
        for j in range(k):
            for i in range(n):
                mus[j] += ws[j, i] * xs[i]
            mus[j] /= ws[j, :].sum()

        # M-step: update component covariances
        sigmas = np.zeros((k, p, p))
        for j in range(k):
            for i in range(n):
                ys = np.reshape(xs[i] - mus[j], (p, 1))
                sigmas[j] += ws[j, i] * np.dot(ys, ys.T)
            sigmas[j] /= ws[j, :].sum()

        # update complete log likelihood
        ll_new = 0.0
        for i in range(n):
            s = 0
            for j in range(k):
                s += pis[j] * mvn(mus[j], sigmas[j]).pdf(xs[i])
            ll_new += np.log(s)

        if np.abs(ll_new - ll_old) < tol:
            break
        ll_old = ll_new

    return ll_new, pis, mus, sigmas
```
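As a sanity check, the same E- and M-step updates can be exercised on synthetic data drawn from a two-component Gaussian mixture. The sketch below is not part of the original notes: it is a vectorized re-implementation of the updates (the name `em_gmm_vect`, the synthetic data, and the starting values are our own assumptions), which should recover mixture weights near \(2/3\) and \(1/3\) and means near the true centers.

```python
import numpy as np
from scipy.stats import multivariate_normal as mvn

def em_gmm_vect(xs, pis, mus, sigmas, tol=0.01, max_iter=100):
    # vectorized sketch of the same E/M updates as the looped version
    n, p = xs.shape
    k = len(pis)
    ll_old = 0
    for _ in range(max_iter):
        # E-step: (k, n) matrix of responsibilities
        ws = np.array([pi * mvn(mu, sigma).pdf(xs)
                       for pi, mu, sigma in zip(pis, mus, sigmas)])
        ws /= ws.sum(0)
        # M-step: weights, means, covariances in closed form
        pis = ws.sum(axis=1) / n
        mus = ws @ xs / ws.sum(axis=1)[:, None]
        sigmas = np.array([(ws[j] * (xs - mus[j]).T) @ (xs - mus[j])
                           / ws[j].sum() for j in range(k)])
        # observed-data log likelihood for the convergence check
        dens = np.array([pi * mvn(mu, sigma).pdf(xs)
                         for pi, mu, sigma in zip(pis, mus, sigmas)])
        ll_new = np.log(dens.sum(0)).sum()
        if np.abs(ll_new - ll_old) < tol:
            break
        ll_old = ll_new
    return ll_new, pis, mus, sigmas

# synthetic data: 200 points near (0, 0), 100 points near (5, 5)
rng = np.random.default_rng(0)
xs = np.concatenate([rng.multivariate_normal([0, 0], np.eye(2), 200),
                     rng.multivariate_normal([5, 5], np.eye(2), 100)])
pis0 = np.array([0.5, 0.5])
mus0 = np.array([[-1.0, -1.0], [4.0, 6.0]])
sigmas0 = np.array([np.eye(2)] * 2)
ll, pis, mus, sigmas = em_gmm_vect(xs, pis0, mus0, sigmas0)
```

Because the two components are well separated, the responsibilities are nearly hard assignments and the fit converges in a handful of iterations.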