Saving .dta files in Python

Posted on 2024-12-06 02:20:21

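I'm wondering whether anyone knows of a Python package that lets you save numpy arrays/recarrays in the .dta format of the statistical data analysis software Stata. This would really speed up a few steps in a system I have.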

时光倒影 2024-12-13 02:20:21

The scikits.statsmodels package includes a reader for Stata data files, which relies in part on PyDTA, as pointed out by @Sven. In particular, genfromdta() returns an ndarray. For example, in a session using Python 2.7 and statsmodels 0.3.1:

>>> import scikits.statsmodels.api as sm
>>> arr = sm.iolib.genfromdta('/Applications/Stata12/auto.dta')
>>> type(arr)
<type 'numpy.ndarray'>

The savetxt() function can then be used to save the array as a text file that Stata can import. For example, we can export the array above as

>>> sm.iolib.savetxt('auto.txt', arr, fmt='%2s', delimiter=",")

and read it in Stata without a dictionary file as follows:

. insheet using auto.txt, clear

I believe a *.dta writer should be added in the near future.
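The text-file route above can also be sketched with plain NumPy and the standard csv module, for readers who do not have that statsmodels version installed. This is only a minimal sketch: the function name structured_array_to_csv and the sample array cars are hypothetical, and the header row is an assumption added so that Stata can pick up variable names on import.

```python
import csv

import numpy as np


def structured_array_to_csv(arr, path):
    """Write a NumPy structured array or recarray to a CSV file.

    The first row holds the field names; each later row holds one record.
    """
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(arr.dtype.names)   # header: field names
        writer.writerows(arr.tolist())     # records as tuples of Python values


# Hypothetical sample data, loosely modeled on Stata's auto dataset.
cars = np.array(
    [("AMC Concord", 4099.0, 22), ("Buick Regal", 5189.0, 20)],
    dtype=[("make", "U20"), ("price", "f8"), ("mpg", "i4")],
)
```

Calling structured_array_to_csv(cars, 'auto.csv') would produce a file that can be read in Stata with the same insheet command shown above, pointed at auto.csv.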


北风几吹夏 2024-12-13 02:20:21

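The only Python library for Stata interoperability that I could find provides read-only access to .dta files. However, the R foreign library provides a function write.dta, and RPy provides a Python interface to R. Maybe a combination of these tools can help you.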


_畞蕅 2024-12-13 02:20:21

pandas DataFrame objects now have a to_stata method, so you can do, for instance:

import pandas as pd
df = pd.read_stata('my_data_in.dta')
df.to_stata('my_data_out.dta')

Disclaimer: the first step is quite slow (in my test, reading a 51 MB .dta file took around 1 minute; see also this question), and the second step produces a file that can be much larger than the original (in my test, the size went from 51 MB to 111 MB). This answer may look less elegant, but it is probably more efficient.
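Since the original question was about numpy arrays/recarrays rather than an existing .dta file, here is a minimal sketch of how the same to_stata method might be applied to a structured array. The helper name save_recarray_as_dta and the sample data are hypothetical; the sketch assumes the field names are valid Stata variable names and uses numeric fields only to keep it simple.

```python
import numpy as np
import pandas as pd


def save_recarray_as_dta(arr, path):
    """Convert a NumPy structured array/recarray to a DataFrame and write .dta."""
    # Column names are taken from the array's field names.
    df = pd.DataFrame.from_records(arr)
    # write_index=False keeps the pandas row index out of the Stata file.
    df.to_stata(path, write_index=False)
    return df


# Hypothetical sample data with numeric fields only.
records = np.array(
    [(4099.0, 22), (4749.0, 17)],
    dtype=[("price", "f8"), ("mpg", "i8")],
)
```

After save_recarray_as_dta(records, 'my_data_out.dta'), the file can be checked by loading it again with pd.read_stata('my_data_out.dta'). Integer columns may come back with a narrower integer type than they started with, so compare values rather than dtypes.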
