
Example

Published 2025-02-25 23:43:38

From http://scipy-lectures.github.io/intro/numpy/exercises.html#data-statistics

The data in populations.txt describes the populations of hares and lynxes (and carrots) in northern Canada for the years 1900 to 1920.

Compute and print, based on the data in populations.txt...

  • The mean and std of the populations of each species for the years in the period.
  • Which year each species had the largest population.
  • Which species has the largest population for each year. (Hint: argsort & fancy indexing of np.array(['H', 'L', 'C']))
  • In which years any of the populations is above 50000. (Hint: comparisons and np.any)
  • The top 2 years for each species when they had the lowest populations. (Hint: argsort, fancy indexing)
  • Compare (plot) the change in hare population (see help(np.gradient)) and the number of lynxes - Check correlation (see help(np.corrcoef)).

... all without for-loops.

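# import the libraries used below
import os
import numpy as np
import matplotlib.pyplot as plt
# download the data locally (the "!" lines are IPython shell escapes)
if not os.path.exists('populations.txt'):
    ! wget http://scipy-lectures.github.io/_downloads/populations.txt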
# peek at the file to see its structure
! head -n 6 populations.txt
# year      hare    lynx    carrot
1900        30e3    4e3     48300
1901        47.2e3  6.1e3   48200
1902        70.2e3  9.8e3   41500
1903        77.4e3  35.2e3  38200
1904        36.3e3  59.4e3  40600
# load data into a numpy array
data = np.loadtxt('populations.txt').astype('int')
data[:5, :]
array([[ 1900, 30000,  4000, 48300],
       [ 1901, 47200,  6100, 48200],
       [ 1902, 70200,  9800, 41500],
       [ 1903, 77400, 35200, 38200],
       [ 1904, 36300, 59400, 40600]])
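To try the steps without downloading the file, one option is to parse the five rows shown above from an in-memory string. This is only a sketch: the names SAMPLE and sample are illustrative and not part of the original exercise, and any statistic computed on this subset will differ from the results for the full file.

```python
import io

import numpy as np

# First five data rows of populations.txt, copied from the listing above.
SAMPLE = """# year      hare    lynx    carrot
1900        30e3    4e3     48300
1901        47.2e3  6.1e3   48200
1902        70.2e3  9.8e3   41500
1903        77.4e3  35.2e3  38200
1904        36.3e3  59.4e3  40600
"""

# np.loadtxt treats lines starting with '#' as comments and splits on whitespace.
sample = np.loadtxt(io.StringIO(SAMPLE)).astype(int)

print(sample.shape)  # (5, 4)
print(sample[0])     # first row: year 1900
```

Values such as 47.2e3 are read as floats first, which is why the conversion with astype is applied after loading.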
# provide convenient named variables
populations = data[:, 1:]
year, hare, lynx, carrot = data.T
# The mean and std of the populations of each species for the years in the period
print("Mean (hare, lynx, carrot):", populations.mean(axis=0))
print("Std (hare, lynx, carrot):", populations.std(axis=0))
Mean (hare, lynx, carrot): [ 34080.9524  20166.6667  42400.    ]
Std (hare, lynx, carrot): [ 20897.9065  16254.5915   3322.5062]
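Note that std() with default arguments divides by N (the population standard deviation). A minimal sketch of the difference from the sample estimate, using only the hare counts for 1900 to 1904 from the rows shown earlier (hare_sample is an illustrative name, and these numbers are not the full-data results):

```python
import numpy as np

# Hare counts for 1900-1904, copied from the first five rows of the data.
hare_sample = np.array([30000, 47200, 70200, 77400, 36300])

pop_std = hare_sample.std()           # ddof=0 (default): divide by N
sample_std = hare_sample.std(ddof=1)  # ddof=1: divide by N - 1

# The default result matches the textbook formula sqrt(mean((x - mean(x))**2)).
manual = np.sqrt(((hare_sample - hare_sample.mean()) ** 2).mean())

print(pop_std, sample_std, manual)
```

With only a few observations the two estimates differ noticeably, so it is worth stating which one is reported.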
# Which year each species had the largest population.
print("Year with largest population (hare, lynx, carrot)",
      year[np.argmax(populations, axis=0)])
Year with largest population (hare, lynx, carrot) [1903 1904 1900]
# Which species has the largest population for each year.
species = ['hare', 'lynx', 'carrot']
list(zip(year, np.take(species, np.argmax(populations, axis=1))))
[(1900, 'carrot'),
 (1901, 'carrot'),
 (1902, 'hare'),
 (1903, 'hare'),
 (1904, 'lynx'),
 (1905, 'lynx'),
 (1906, 'carrot'),
 (1907, 'carrot'),
 (1908, 'carrot'),
 (1909, 'carrot'),
 (1910, 'carrot'),
 (1911, 'carrot'),
 (1912, 'hare'),
 (1913, 'hare'),
 (1914, 'hare'),
 (1915, 'lynx'),
 (1916, 'carrot'),
 (1917, 'carrot'),
 (1918, 'carrot'),
 (1919, 'carrot'),
 (1920, 'carrot')]
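The solution above uses argmax and np.take; the hint in the task list points to argsort and fancy indexing instead. A sketch of that variant on the first five rows of the data (populations_sample, years_sample and labels are illustrative names):

```python
import numpy as np

# Hare, lynx and carrot columns for 1900-1904, copied from the rows shown earlier.
populations_sample = np.array([
    [30000,  4000, 48300],
    [47200,  6100, 48200],
    [70200,  9800, 41500],
    [77400, 35200, 38200],
    [36300, 59400, 40600],
])
years_sample = np.arange(1900, 1905)
labels = np.array(['H', 'L', 'C'])

# argsort orders each row ascending, so the last column is the index of the maximum.
largest = np.argsort(populations_sample, axis=1)[:, -1]

# Fancy indexing: an integer array used as an index picks one label per year.
winners = labels[largest]
print(winners)  # ['C' 'C' 'H' 'H' 'L']
```

For these five years the labels agree with the first five entries of the list above (carrot, carrot, hare, hare, lynx).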
# In which years any of the populations is above 50000
print(year[np.any(populations > 50000, axis=1)])
[1902 1903 1904 1912 1913 1914 1915]
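The comparison produces a boolean array with the same shape as populations, and the axis argument of any decides how it is reduced. A small sketch on the first five rows of the data (the variable names are illustrative):

```python
import numpy as np

# Hare, lynx and carrot columns for 1900-1904, copied from the rows shown earlier.
populations_sample = np.array([
    [30000,  4000, 48300],
    [47200,  6100, 48200],
    [70200,  9800, 41500],
    [77400, 35200, 38200],
    [36300, 59400, 40600],
])
years_sample = np.arange(1900, 1905)

above = populations_sample > 50000   # boolean array, one entry per (year, species)

any_per_year = above.any(axis=1)     # reduce across species: one flag per year
any_per_species = above.any(axis=0)  # reduce across years: one flag per species

print(years_sample[any_per_year])    # [1902 1903 1904]
print(any_per_species)               # [ True  True False]
```

Reducing along axis=1 answers "in which years", while axis=0 answers "which species ever exceeded the threshold" within this subset.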
# The top 2 years for each species when they had the lowest populations.
print(year[np.argsort(populations, axis=0)[:2]])
[[1917 1900 1916]
 [1916 1901 1903]]
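In the result above each column is a species and each row is a rank (lowest, then second lowest). Transposing gives one row per species, which can be easier to read. A sketch on the first five rows of the data, so the years differ from the full-data result (names are illustrative):

```python
import numpy as np

# Hare, lynx and carrot columns for 1900-1904, copied from the rows shown earlier.
populations_sample = np.array([
    [30000,  4000, 48300],
    [47200,  6100, 48200],
    [70200,  9800, 41500],
    [77400, 35200, 38200],
    [36300, 59400, 40600],
])
years_sample = np.arange(1900, 1905)

# Sort each column ascending and keep the row indices of the two smallest values.
lowest_idx = np.argsort(populations_sample, axis=0)[:2]

# Fancy indexing maps row indices to years; .T puts one species on each row.
lowest_years = years_sample[lowest_idx].T

for name, pair in zip(['hare', 'lynx', 'carrot'], lowest_years.tolist()):
    print(name, pair)
```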
# Compare (plot) the change in hare population and the number of lynxes, then check correlation
plt.plot(year, lynx, 'r-', year, np.gradient(hare), 'b--')
plt.legend(['lynx', 'grad(hare)'], loc='best')
print(np.corrcoef(lynx, np.gradient(hare)))
[[ 1.     -0.9179]
 [-0.9179  1.    ]]
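For evenly spaced samples, np.gradient uses central differences at interior points and one-sided differences at the two ends. A sketch that checks this by hand on the hare and lynx counts for 1900 to 1904 (a five-row subset, so the correlation it prints is not the full-data value shown above; variable names are illustrative):

```python
import numpy as np

# Hare and lynx counts for 1900-1904, copied from the rows shown earlier.
hare_sample = np.array([30000, 47200, 70200, 77400, 36300])
lynx_sample = np.array([4000, 6100, 9800, 35200, 59400])

grad = np.gradient(hare_sample)  # unit spacing: one sample per year

# Same quantity written out by hand.
manual = np.empty(len(hare_sample))
manual[0] = hare_sample[1] - hare_sample[0]                # forward difference
manual[-1] = hare_sample[-1] - hare_sample[-2]             # backward difference
manual[1:-1] = (hare_sample[2:] - hare_sample[:-2]) / 2.0  # central differences

print(np.allclose(grad, manual))  # True

# np.corrcoef returns a symmetric 2x2 matrix; the off-diagonal entry is the correlation.
corr = np.corrcoef(lynx_sample, grad)
print(corr[0, 1])
```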
