- Introduction to Python
- Getting started with Python and the IPython notebook
- Functions are first class objects
- Data science is OSEMN
- Working with text
- Preprocessing text data
- Working with structured data
- Using SQLite3
- Using HDF5
- Using numpy
- Using Pandas
- Computational problems in statistics
- Computer numbers and mathematics
- Algorithmic complexity
- Linear Algebra and Linear Systems
- Linear Algebra and Matrix Decompositions
- Change of Basis
- Optimization and Non-linear Methods
- Practical Optimization Routines
- Finding roots
- Optimization Primer
- Using scipy.optimize
- Gradient descent
- Newton’s method and variants
- Constrained optimization
- Curve fitting
- Finding parameters for ODE models
- Optimization of graph node placement
- Optimization of standard statistical models
- Fitting ODEs with the Levenberg–Marquardt algorithm
- 1D example
- 2D example
- Algorithms for Optimization and Root Finding for Multivariate Problems
- Expectation Maximization (EM) Algorithm
- Monte Carlo Methods
- Resampling methods
- Resampling
- Simulations
- Setting the random seed
- Sampling with and without replacement
- Calculation of Cook’s distance
- Permutation resampling
- Design of simulation experiments
- Example: Simulations to estimate power
- Check with R
- Estimating the CDF
- Estimating the PDF
- Kernel density estimation
- Multivariate kernel density estimation
- Markov Chain Monte Carlo (MCMC)
- Using PyMC2
- Using PyMC3
- Using PyStan
- C Crash Course
- Code Optimization
- Using C code in Python
- Using functions from various compiled languages in Python
- Julia and Python
- Converting Python Code to C for speed
- Optimization bake-off
- Writing Parallel Code
- Massively parallel programming with GPUs
- Writing CUDA in C
- Distributed computing for Big Data
- Hadoop MapReduce on AWS EMR with mrjob
- Spark on a local machine using 4 nodes
- Modules and Packaging
- Tour of the Jupyter (IPython3) notebook
- Polyglot programming
- What you should know and learn more about
- Wrapping R libraries with Rpy
Using better algorithms and data structures
The first major optimization is to see if we can reduce the algorithmic complexity of our solution, say from \(\mathcal{O}(n^2)\) to \(\mathcal{O}(n \log(n))\). Unless you are going to invent an entirely new algorithm (possible but uncommon), this involves research into whether the data structures used are optimal, or whether there is a way to reformulate the problem to take advantage of better algorithms. If your initial solution is by “brute force”, there is sometimes room for huge performance gains here.
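As a small illustration of moving from \(\mathcal{O}(n^2)\) to \(\mathcal{O}(n \log(n))\), consider finding the smallest gap between any two numbers in a collection. This problem is not from the course; the function names `min_gap_brute` and `min_gap_sorted` are hypothetical and chosen only for this sketch. The brute-force version compares every pair, while the reformulated version sorts once and compares only neighbours.

```python
from itertools import combinations


def min_gap_brute(values):
    """Compare every pair of values: O(n^2) comparisons."""
    return min(abs(a - b) for a, b in combinations(values, 2))


def min_gap_sorted(values):
    """Sort once, then compare adjacent values only: O(n log n) overall."""
    ordered = sorted(values)
    return min(b - a for a, b in zip(ordered, ordered[1:]))


values = [75, 10, 42, 99, 40]
assert min_gap_brute(values) == min_gap_sorted(values)
```

The reformulation works because, after sorting, the two closest values must be adjacent, so no other pairs need to be checked.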
Taking a course in data structures and algorithms is a very worthwhile investment of your time if you are developing novel statistical algorithms - perhaps Bloom filters, locality-sensitive hashing, priority queues, Barnes-Hut partitioning, dynamic programming or minimum spanning trees can be used to solve your problem - in which case you can expect to see dramatic improvements over naive brute-force implementations.
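To make one of these concrete, here is a minimal sketch of how a priority queue can help, assuming the task is to find the k smallest values in a large collection (a task chosen for illustration, not taken from the course). The function names are hypothetical. The standard library `heapq.nsmallest` keeps only a bounded heap, so the cost is roughly \(\mathcal{O}(n \log(k))\) rather than the \(\mathcal{O}(n \log(n))\) of sorting everything.

```python
import heapq
import random


def k_smallest_sort(values, k):
    """Sort the whole collection, then slice: O(n log n)."""
    return sorted(values)[:k]


def k_smallest_heap(values, k):
    """Use a heap-based selection: roughly O(n log k)."""
    return heapq.nsmallest(k, values)


random.seed(0)
values = [random.random() for _ in range(100_000)]

# Both approaches return the same k values in ascending order
assert k_smallest_sort(values, 10) == k_smallest_heap(values, 10)
```

The gain is largest when k is much smaller than n; for k close to n, sorting is usually the simpler and faster choice.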
Suppose you were given two lists xs and ys and asked to find the unique elements in common between them.
import numpy as np

xs = np.random.randint(0, 1000, 10000)
ys = np.random.randint(0, 1000, 10000)
# This is easy to solve using a nested loop
def common1(xs, ys):
    """Using lists."""
    zs = []
    for x in xs:
        for y in ys:
            if x == y and x not in zs:
                zs.append(x)
    return zs

%timeit -n1 -r1 common1(xs, ys)
1 loops, best of 1: 14.7 s per loop
# However, it is much more efficient to use the set data structure
def common2(xs, ys):
    return list(set(xs) & set(ys))

%timeit -n1 -r1 common2(xs, ys)
1 loops, best of 1: 2.82 ms per loop
assert(sorted(common1(xs, ys)) == sorted(common2(xs, ys)))
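The speed-up comes from how membership is tested: checking whether a value is in a list scans the list, while a set lookup takes constant time on average. The sketch below spells out what the set intersection does in a single pass over each input. The name `common_single_pass` is hypothetical, and plain Python lists are assumed here in place of the NumPy arrays used above.

```python
def common_single_pass(xs, ys):
    """Unique common elements using one pass over each input."""
    seen_in_ys = set(ys)      # building the set is O(len(ys))
    result = set()
    for x in xs:              # one pass over xs
        if x in seen_in_ys:   # average O(1) lookup, versus O(len(ys)) for a list
            result.add(x)     # a set also removes duplicates for free
    return sorted(result)


small_xs = [1, 2, 2, 3, 5]
small_ys = [2, 3, 3, 4]
assert common_single_pass(small_xs, small_ys) == [2, 3]
```

The total work is proportional to len(xs) + len(ys) on average, compared with len(xs) * len(ys) comparisons for the nested loop.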
We have seen many such examples in the course - for example, numerical quadrature versus Monte Carlo integration, gradient descent versus conjugate gradient descent, random walk Metropolis versus Hamiltonian Monte Carlo.