Reading list for scientific programmers
I am working to become a scientific programmer. I have enough background in math and statistics but lack a programming background. I have found it very hard to learn how to use a language for scientific programming (SP), because most of the SP references are close to trivial.
My work involves statistical/financial modelling and no physical models. Currently, I use Python extensively with numpy and scipy. I have also used R and Mathematica. I know enough C/C++ to read code. I have no experience in Fortran.
I don't know if this is a good list of languages for a scientific programmer. If it is, what is a good reading list for learning the syntax and design patterns of these languages in scientific settings?
Comments (17)
For generic C++ in scientific environments, Modern C++ Design by Andrei Alexandrescu is probably the standard book about the common design patterns.
Once you are up and running, I would strongly recommend reading this blog.
It describes how you use C++ templates to provide type-safe units. So, for example, if you multiply a velocity by a time you get a distance, and so on.
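To give a feel for the idea, here is a minimal sketch in Python. Note that this is not the blog's code: the blog relies on C++ templates so that unit errors are caught at compile time, whereas this runtime analogue only raises an error when the offending line runs. The `Quantity` class and the two-dimension (length, time) bookkeeping are hypothetical names made up for this illustration.

```python
from dataclasses import dataclass

# Dimension exponents are stored as (length, time); hypothetical convention.
LENGTH = (1, 0)
TIME = (0, 1)
VELOCITY = (1, -1)  # length / time


@dataclass(frozen=True)
class Quantity:
    value: float
    dims: tuple

    def __mul__(self, other):
        # Multiplying quantities adds their dimension exponents.
        new_dims = tuple(a + b for a, b in zip(self.dims, other.dims))
        return Quantity(self.value * other.value, new_dims)

    def __add__(self, other):
        # Adding quantities only makes sense for identical dimensions.
        if self.dims != other.dims:
            raise TypeError("cannot add quantities with different dimensions")
        return Quantity(self.value + other.value, self.dims)


velocity = Quantity(3.0, VELOCITY)
duration = Quantity(4.0, TIME)
distance = velocity * duration  # dimensions work out to LENGTH
```

Here `distance + duration` would raise a `TypeError`, which is the kind of mistake the template-based approach rejects before the program even runs.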
Reading source code helps a lot, too. Python is great in this sense. I have learnt a great deal just by digging through the source code of scientific Python tools. On top of this, following your favourite tools' mailing lists and forums can enhance your skills further.
This might be useful: The Nature of Mathematical Modeling.
Donald Knuth: Seminumerical Algorithms, Volume 2 of The Art of Computer Programming
Press, Teukolsky, Vetterling, Flannery: Numerical Recipes in C++ (the book is great, just beware of the license)
Modern C++ Design
and have a gander at the source code for the GNU Scientific Library.
Writing Scientific Software: A Guide to Good Style is a good book with overall advice for modern scientific programming.
For Java I recommend a look at Unit-API
Implementations are Eclipse UOMo (http://www.eclipse.org/uomo) or JScience.org (work in progress for Unit-API, earlier implementations of JSR-275 exist)
At some stage you're going to need floating point arithmetic. It's hard to do it well, less hard to do it competently, and easy to do it badly. This paper is a must read:
What Every Computer Scientist Should Know About Floating-Point Arithmetic
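As a small taste of the issues that paper covers, here is a minimal Python sketch, assuming standard IEEE 754 double precision (which is what CPython uses on common platforms). It is an illustration only, not taken from the paper.

```python
import math

# 0.1 and 0.2 have no exact binary representation,
# so their sum is not exactly equal to the double nearest 0.3.
a = 0.1 + 0.2
exactly_equal = (a == 0.3)
close_enough = math.isclose(a, 0.3)  # compares with a relative tolerance

# Adding values one at a time accumulates rounding error.
values = [0.1] * 10
naive_total = 0.0
for v in values:
    naive_total += v

# math.fsum tracks the lost low-order bits and returns an accurately rounded sum.
accurate_total = math.fsum(values)
```

The practical lessons are to compare floats with a tolerance rather than `==`, and to prefer summation routines that compensate for rounding when adding many terms.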
I thoroughly recommend
Scientific and Engineering C++: An Introduction with Advanced Techniques and Examples by Barton and Nackman
Don't be put off by its age; it's excellent. Numerical Recipes in your favourite language (so long as it is C, C++ or Fortran) is compendious and excellent for learning from, but it does not always give the best algorithm for each problem.
I also like
Parallel Scientific Computing in C++ and MPI: A Seamless Approach to Parallel Algorithms and their Implementation by Karniadakis
The sooner you start parallel computing the better.
My first suggestion is that you look at the top 5 universities for your specific field, look at what they're teaching and what the professors are using for research. That's how you can discover the relevant language/approach.
Also have a look at this stackoverflow question ("practices-for-programming-in-a-scientific-environment").
You're doing statistical/finance modeling? I use R in that field myself, and it is quickly becoming the standard for statistical analysis, especially in the social sciences, but in finance as well (see, for instance, http://rinfinance.com). Matlab is probably still more widely used in industry, but I have the sense that this may be changing. I would only fall back to C++ as a last resort if performance is a major factor.
Look at these related questions for help finding reading materials related to R:
In terms of book recommendations related to statistics and finance, I still think that the best general option is David Ruppert's "Statistics and Finance" (you can find most of the R code here and the author's website has matlab code).
Lastly, if your scientific computing isn't statistical, then I actually think that Mathematica is the best tool. It seems to get very little mention amongst programmers, but it is the best tool for pure scientific research in my view. It has much better support for things like integration and partial differential equations than Matlab. They have a nice list of books on the Wolfram website.
In terms of languages, I think you have a good coverage. Python is great for experimentation and prototyping, Mathematica is good for helping with the theoretical stuff, and C/C++ are there if you need to do serious number crunching.
I might also suggest you develop an appreciation of an assembly language and also a functional language (such as Haskell), not really to use, but rather because of the effect they have on your programming skills and style, and of the concepts they bring home to you. They might also come in handy one day.
I would also consider it vital to learn about parallel programming (concurrent/distributed) as this is the only way to access the sort of computing power that sometimes is necessary for scientific problems. Exposure to functional programming would be quite helpful in this regard, whether or not you actually use a functional language to solve the problem.
Unfortunately I don't have much to suggest in the way of reading, but you may find The Scientist and Engineer's Guide to Digital Signal Processing helpful.
I'm a scientific programmer who just entered the field in the past 2 years. I'm into more biology and physics modeling, but I bet what you're looking for is pretty similar. While I was applying to jobs and internships there were two things that I didn't think would be that important to know, but caused me to end up missing out on opportunities. One was MATLAB, which has already been mentioned. The other was database design -- no matter what area of SP you're in, there's probably going to be a lot of data that has to be managed somehow.
The book Database Design for Mere Mortals by Michael Hernandez was recommended to me as being a good start and helped me out a lot in my preparation. I would also make sure you at least understand some basic SQL if you don't already.
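For a sense of what "basic SQL" means in practice, here is a minimal sketch using Python's built-in `sqlite3` module. The table names, tickers, and prices are hypothetical and invented purely for illustration; the point is the pattern of separate related tables joined and aggregated with a query.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Two related tables: one row per instrument, many price rows per instrument.
conn.execute(
    "CREATE TABLE instrument (id INTEGER PRIMARY KEY, ticker TEXT NOT NULL UNIQUE)"
)
conn.execute(
    "CREATE TABLE price ("
    " instrument_id INTEGER NOT NULL REFERENCES instrument(id),"
    " day TEXT NOT NULL,"
    " close REAL NOT NULL,"
    " PRIMARY KEY (instrument_id, day))"
)

# Hypothetical sample data.
conn.executemany(
    "INSERT INTO instrument (id, ticker) VALUES (?, ?)",
    [(1, "AAA"), (2, "BBB")],
)
conn.executemany(
    "INSERT INTO price (instrument_id, day, close) VALUES (?, ?, ?)",
    [(1, "2020-01-02", 10.0), (1, "2020-01-03", 12.0), (2, "2020-01-02", 20.0)],
)

# Join the tables and aggregate: number of observations and mean close per ticker.
rows = conn.execute(
    "SELECT i.ticker, COUNT(*), AVG(p.close)"
    " FROM price AS p JOIN instrument AS i ON i.id = p.instrument_id"
    " GROUP BY i.ticker ORDER BY i.ticker"
).fetchall()
conn.close()
```

Keeping instruments and prices in separate tables linked by a key is the kind of design decision a database design book walks through in much more depth.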
I would suggest any of the Numerical Recipes books (pick a language).
Depending on the languages you use or if you will be doing visualization there can be other suggestions.
Another book I really like is Object-Oriented Implementation of Numerical Methods, by Didier Besset. He shows how to implement many equations in Java and Smalltalk, but more importantly he does a fantastic job of showing how to rework equations for use on a computer and how to deal with the errors that arise from the computer's limitations.
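To illustrate what reworking an equation for the computer can look like, here is a minimal Python sketch. It is not taken from Besset's book; it compares the textbook one-pass variance formula with Welford's update, a well-known numerically stable alternative. The data values are hypothetical and chosen so that the mean is huge compared with the spread.

```python
def naive_variance(xs):
    # Textbook formula: (sum of squares - (sum)^2 / n) / (n - 1).
    # Subtracts two huge, nearly equal numbers, so most digits cancel.
    n = len(xs)
    sum_x = 0.0
    sum_sq = 0.0
    for x in xs:
        sum_x += x
        sum_sq += x * x
    return (sum_sq - sum_x * sum_x / n) / (n - 1)


def welford_variance(xs):
    # Welford's update: keeps a running mean and a running sum of
    # squared deviations, so it never subtracts huge numbers.
    mean = 0.0
    m2 = 0.0
    for n, x in enumerate(xs, start=1):
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    return m2 / (len(xs) - 1)


# Hypothetical data: the exact sample variance is 30.
data = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]
naive = naive_variance(data)
stable = welford_variance(data)
```

Both functions are algebraically equivalent, yet with double-precision floats the naive version is badly wrong on this data while the Welford version recovers the variance of 30.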
Donald Knuth's book on seminumerical algorithms.
MATLAB is widely used in engineering for design, rapid development, and even production applications (my current project has a MATLAB-generated DLL for doing some advanced number crunching that was easier to do than in our native C++, and our FPGAs use MATLAB-generated cores for signal processing too, which is much easier than coding the same by hand in VHDL). There's also a financial toolbox for MATLAB that may be of interest to you.
This is not to say that MATLAB is the best choice for your field, but at least in engineering, it's widely used and not going anywhere soon.
One issue scientific programmers face is maintaining a repository of code (and data) that others can use to reproduce your experiments. In my experience this is a skill not required in commercial development.
Here are some readings on this:
These are in the context of computational biology, but I assume they apply to most scientific programming.
Also, look at Python Scripting for Computational Science.
Ok here's my list of books that I've been using for the very same purpose:
Numerical Methods for Scientists and Engineers
Numerical Recipes 3rd Edition: The Art of Scientific Computing
CUDA by Example: An Introduction to General-Purpose GPU Programming
Using OpenMP: Portable Shared Memory Parallel Programming (Scientific and Engineering Computation)
Parallel Programming in C with MPI and OpenMP
Donald Knuth: Seminumerical Algorithms, Volume 2 of The Art of Computer Programming
Also, I have found myself using R rather than Python lately.