- Introduction to Python
- Getting started with Python and the IPython notebook
- Functions are first class objects
- Data science is OSEMN
- Working with text
- Preprocessing text data
- Working with structured data
- Using SQLite3
- Using HDF5
- Using numpy
- Using Pandas
- Computational problems in statistics
- Computer numbers and mathematics
- Algorithmic complexity
- Linear Algebra and Linear Systems
- Linear Algebra and Matrix Decompositions
- Change of Basis
- Optimization and Non-linear Methods
- Practical Optimization Routines
- Finding roots
- Optimization Primer
- Using scipy.optimize
- Gradient descent
- Newton’s method and variants
- Constrained optimization
- Curve fitting
- Finding parameters for ODE models
- Optimization of graph node placement
- Optimization of standard statistical models
- Fitting ODEs with the Levenberg–Marquardt algorithm
- 1D example
- 2D example
- Algorithms for Optimization and Root Finding for Multivariate Problems
- Expectation Maximization (EM) Algorithm
- Monte Carlo Methods
- Resampling methods
- Resampling
- Simulations
- Setting the random seed
- Sampling with and without replacement
- Calculation of Cook’s distance
- Permutation resampling
- Design of simulation experiments
- Example: Simulations to estimate power
- Check with R
- Estimating the CDF
- Estimating the PDF
- Kernel density estimation
- Multivariate kernel density estimation
- Markov Chain Monte Carlo (MCMC)
- Using PyMC2
- Using PyMC3
- Using PyStan
- C Crash Course
- Code Optimization
- Using C code in Python
- Using functions from various compiled languages in Python
- Julia and Python
- Converting Python Code to C for speed
- Optimization bake-off
- Writing Parallel Code
- Massively parallel programming with GPUs
- Writing CUDA in C
- Distributed computing for Big Data
- Hadoop MapReduce on AWS EMR with mrjob
- Spark on a local machine using 4 nodes
- Modules and Packaging
- Tour of the Jupyter (IPython3) notebook
- Polyglot programming
- What you should know and learn more about
- Wrapping R libraries with Rpy
Example
From http://scipy-lectures.github.io/intro/numpy/exercises.html#data-statistics
The data in populations.txt describes the populations of hares and lynxes (and carrots) in northern Canada during 20 years:
Compute and print, based on the data in populations.txt...
- The mean and std of the populations of each species for the years in the period.
- Which year each species had the largest population.
- Which species has the largest population for each year. (Hint: argsort & fancy indexing of np.array(['H', 'L', 'C']))
- The years in which any of the populations is above 50000. (Hint: comparisons and np.any)
- The 2 years in which each species had its lowest populations. (Hint: argsort, fancy indexing)
- Compare (plot) the change in hare population (see help(np.gradient)) and the number of lynxes. Check the correlation (see help(np.corrcoef)).
... all without for-loops.
# download the data locally
import os
if not os.path.exists('populations.txt'):
    ! wget http://scipy-lectures.github.io/_downloads/populations.txt

# peek at the file to see its structure
! head -n 6 populations.txt

# year hare lynx carrot
1900 30e3 4e3 48300
1901 47.2e3 6.1e3 48200
1902 70.2e3 9.8e3 41500
1903 77.4e3 35.2e3 38200
1904 36.3e3 59.4e3 40600
# load data into a numpy array
import numpy as np
data = np.loadtxt('populations.txt').astype('int')
data[:5, :]

array([[ 1900, 30000,  4000, 48300],
       [ 1901, 47200,  6100, 48200],
       [ 1902, 70200,  9800, 41500],
       [ 1903, 77400, 35200, 38200],
       [ 1904, 36300, 59400, 40600]])
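As a small self-contained check of the loading step, the sketch below parses the six lines printed by `head` from an in-memory string instead of the downloaded file. It relies on `np.loadtxt` skipping lines that start with `#` (its default comment character) and reading values such as `30e3` as floats, which is why the cast to integers follows. The names `sample` and `raw` are introduced here for illustration only.

```python
import io
import numpy as np

# The first lines of populations.txt, copied from the `head` output above
sample = """# year hare lynx carrot
1900 30e3 4e3 48300
1901 47.2e3 6.1e3 48200
1902 70.2e3 9.8e3 41500
1903 77.4e3 35.2e3 38200
1904 36.3e3 59.4e3 40600
"""

# loadtxt ignores the '#' header line and returns a float array
raw = np.loadtxt(io.StringIO(sample))

# cast to integers, as in the cell above
data = raw.astype('int')
print(raw.dtype, data.shape)
print(data)
```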
# provide convenient named variables
populations = data[:, 1:]
year, hare, lynx, carrot = data.T
# The mean and std of the populations of each species for the years in the period
print("Mean (hare, lynx, carrot):", populations.mean(axis=0))
print("Std (hare, lynx, carrot):", populations.std(axis=0))

Mean (hare, lynx, carrot): [ 34080.9524 20166.6667 42400. ]
Std (hare, lynx, carrot): [ 20897.9065 16254.5915 3322.5062]
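Note that `std` with NumPy's default `ddof=0` is the population standard deviation. The following is a minimal sketch, using only the first five rows printed earlier rather than the full file, of how `axis=0` reduces over the years and how the sample standard deviation (`ddof=1`) differs. The variable names are illustrative.

```python
import numpy as np

# First five rows (hare, lynx, carrot), copied from the data[:5, :] output
populations = np.array([[30000,  4000, 48300],
                        [47200,  6100, 48200],
                        [70200,  9800, 41500],
                        [77400, 35200, 38200],
                        [36300, 59400, 40600]])

n = populations.shape[0]

col_mean = populations.mean(axis=0)            # one value per species
pop_std = populations.std(axis=0)              # divides by n (ddof=0)
sample_std = populations.std(axis=0, ddof=1)   # divides by n - 1

print("mean:", col_mean)
print("population std:", pop_std)
print("sample std:", sample_std)
```

The two estimates differ by the factor sqrt(n / (n - 1)), which matters for short series like this one.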
# Which year each species had the largest population.
print("Year with largest population (hare, lynx, carrot)",
      year[np.argmax(populations, axis=0)])

Year with largest population (hare, lynx, carrot) [1903 1904 1900]
# Which species has the largest population for each year.
species = ['hare', 'lynx', 'carrot']
largest = np.take(species, np.argmax(populations, axis=1))
list(zip(year.tolist(), largest.tolist()))

[(1900, 'carrot'), (1901, 'carrot'), (1902, 'hare'),
 (1903, 'hare'), (1904, 'lynx'), (1905, 'lynx'),
 (1906, 'carrot'), (1907, 'carrot'), (1908, 'carrot'),
 (1909, 'carrot'), (1910, 'carrot'), (1911, 'carrot'),
 (1912, 'hare'), (1913, 'hare'), (1914, 'hare'),
 (1915, 'lynx'), (1916, 'carrot'), (1917, 'carrot'),
 (1918, 'carrot'), (1919, 'carrot'), (1920, 'carrot')]
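The exercise hint suggests argsort and fancy indexing of `np.array(['H', 'L', 'C'])` rather than `np.take`. A minimal sketch of that route, restricted to the first five rows printed earlier, might look like this (the variable names are illustrative):

```python
import numpy as np

# First five years and rows (hare, lynx, carrot), copied from the data[:5, :] output
year = np.array([1900, 1901, 1902, 1903, 1904])
populations = np.array([[30000,  4000, 48300],
                        [47200,  6100, 48200],
                        [70200,  9800, 41500],
                        [77400, 35200, 38200],
                        [36300, 59400, 40600]])

species = np.array(['H', 'L', 'C'])

# Fancy indexing: an integer array of column indices selects one label per year
largest = species[np.argmax(populations, axis=1)]

# The argsort route from the hint: the last column of the row-wise
# argsort holds the index of the largest value in each row
largest_via_argsort = species[np.argsort(populations, axis=1)[:, -1]]

pairs = list(zip(year.tolist(), largest.tolist()))
print(pairs)
```

For these five years the labels agree with the first five entries of the output above (carrot, carrot, hare, hare, lynx).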
# Which years any of the populations is above 50000
print(year[np.any(populations > 50000, axis=1)])

[1902 1903 1904 1912 1913 1914 1915]
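To make the intermediate steps visible, here is a small sketch on the first five rows only: the comparison produces a boolean array with the same shape as the data, `np.any` along `axis=1` collapses it to one flag per year, and that flag array is then used as a mask on `year`. The names `above` and `mask` are illustrative.

```python
import numpy as np

# First five years and rows (hare, lynx, carrot), copied from the data[:5, :] output
year = np.array([1900, 1901, 1902, 1903, 1904])
populations = np.array([[30000,  4000, 48300],
                        [47200,  6100, 48200],
                        [70200,  9800, 41500],
                        [77400, 35200, 38200],
                        [36300, 59400, 40600]])

above = populations > 50000      # boolean array, shape (5, 3)
mask = np.any(above, axis=1)     # True where at least one species exceeds 50000

print(above)
print(mask)
print(year[mask])                # boolean indexing keeps only the flagged years
```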
# The top 2 years for each species when they had the lowest populations.
print(year[np.argsort(populations, axis=0)[:2]])

[[1917 1900 1916]
 [1916 1901 1903]]
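The indexing here is compact, so a short sketch on the first five rows may help. `np.argsort(..., axis=0)` returns, for each species column, the row indices ordered from smallest to largest population; the first two rows of that index array therefore point at the two lowest years, and indexing `year` with a 2-D integer array returns a 2-D array of years. Because only five years are used, the result differs from the full-data output above.

```python
import numpy as np

# First five years and rows (hare, lynx, carrot), copied from the data[:5, :] output
year = np.array([1900, 1901, 1902, 1903, 1904])
populations = np.array([[30000,  4000, 48300],
                        [47200,  6100, 48200],
                        [70200,  9800, 41500],
                        [77400, 35200, 38200],
                        [36300, 59400, 40600]])

# Row indices that would sort each column in ascending order
order = np.argsort(populations, axis=0)

# Rows 0 and 1 of `order` index the two smallest values per species
lowest_two = year[order[:2]]
print(order)
print(lowest_two)
```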
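# Compare the change in hare population with the number of lynxes
import matplotlib.pyplot as plt
plt.plot(year, lynx, 'r-', year, np.gradient(hare), 'b--')
plt.legend(['lynx', 'grad(hare)'], loc='best')
# Check the correlation
print(np.corrcoef(lynx, np.gradient(hare)))

[[ 1.     -0.9179]
 [-0.9179  1.    ]]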
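For readers unfamiliar with the two functions, the sketch below reproduces them by hand on the first five years only. It assumes the default behaviour of `np.gradient` with unit spacing: central differences at interior points and one-sided differences at the two ends. The correlation is then recomputed from the Pearson formula. The names `manual` and `r_manual` are illustrative, and the value obtained on five points is not expected to match the full-data figure above.

```python
import numpy as np

# First five years of hare and lynx counts, copied from the data[:5, :] output
hare = np.array([30000, 47200, 70200, 77400, 36300])
lynx = np.array([4000, 6100, 9800, 35200, 59400])

grad = np.gradient(hare)

# The same differences written out explicitly
manual = np.empty(len(hare))
manual[1:-1] = (hare[2:] - hare[:-2]) / 2.0   # central differences
manual[0] = hare[1] - hare[0]                 # forward difference at the start
manual[-1] = hare[-1] - hare[-2]              # backward difference at the end

# Off-diagonal entry of the 2x2 correlation matrix
r = np.corrcoef(lynx, grad)[0, 1]

# Pearson correlation from its definition
x = lynx - lynx.mean()
y = grad - grad.mean()
r_manual = (x * y).sum() / np.sqrt((x ** 2).sum() * (y ** 2).sum())

print(grad)
print(r, r_manual)
```

A negative coefficient means that the hare population tends to be falling when lynx numbers are high, which is the pattern the plot is meant to show.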