- Introduction to Python
- Getting started with Python and the IPython notebook
- Functions are first class objects
- Data science is OSEMN
- Working with text
- Preprocessing text data
- Working with structured data
- Using SQLite3
- Using HDF5
- Using numpy
- Using Pandas
- Computational problems in statistics
- Computer numbers and mathematics
- Algorithmic complexity
- Linear Algebra and Linear Systems
- Linear Algebra and Matrix Decompositions
- Change of Basis
- Optimization and Non-linear Methods
- Practical Optimization Routines
- Finding roots
- Optimization Primer
- Using scipy.optimize
- Gradient descent
- Newton’s method and variants
- Constrained optimization
- Curve fitting
- Finding parameters for ODE models
- Optimization of graph node placement
- Optimization of standard statistical models
- Fitting ODEs with the Levenberg–Marquardt algorithm
- 1D example
- 2D example
- Algorithms for Optimization and Root Finding for Multivariate Problems
- Expectation Maximization (EM) Algorithm
- Monte Carlo Methods
- Resampling methods
- Resampling
- Simulations
- Setting the random seed
- Sampling with and without replacement
- Calculation of Cook’s distance
- Permutation resampling
- Design of simulation experiments
- Example: Simulations to estimate power
- Check with R
- Estimating the CDF
- Estimating the PDF
- Kernel density estimation
- Multivariate kernel density estimation
- Markov Chain Monte Carlo (MCMC)
- Using PyMC2
- Using PyMC3
- Using PyStan
- C Crash Course
- Code Optimization
- Using C code in Python
- Using functions from various compiled languages in Python
- Julia and Python
- Converting Python Code to C for speed
- Optimization bake-off
- Writing Parallel Code
- Massively parallel programming with GPUs
- Writing CUDA in C
- Distributed computing for Big Data
- Hadoop MapReduce on AWS EMR with mrjob
- Spark on a local machine using 4 nodes
- Modules and Packaging
- Tour of the Jupyter (IPython3) notebook
- Polyglot programming
- What you should know and learn more about
- Wrapping R libraries with Rpy
Exercises
1. Write a function to find the complementary strand given a DNA sequence. For example:
Given ATCGTTA, return TAGCAAT.
Note: The complementary base pairs are A|T and C|G.
# YOUR CODE HERE
def complement(dna):
    """Return complementary strand given DNA sequence."""
    # Map each base to its partner, preserving case
    table = str.maketrans('actgACTG', 'tgacTGAC')
    return dna.translate(table)

print(complement('ATCGTTA'))
TAGCAAT
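One property of `str.translate` worth noting is that characters missing from the table pass through unchanged, so an ambiguous base such as `N` is silently kept. A minimal sketch of a stricter variant follows; the name `strict_complement` and the choice to raise `ValueError` are assumptions for illustration, not part of the original solution.

```python
# Translation table for the four bases, in both cases
COMPLEMENT = str.maketrans('actgACTG', 'tgacTGAC')

def strict_complement(dna):
    """Complement a DNA string, rejecting anything other than A, C, G, T (hypothetical helper)."""
    invalid = set(dna) - set('actgACTG')
    if invalid:
        raise ValueError('unexpected bases: %s' % ''.join(sorted(invalid)))
    return dna.translate(COMPLEMENT)

print(strict_complement('ATCGTTA'))       # TAGCAAT
print('ATNGTTA'.translate(COMPLEMENT))    # TANCAAT: 'N' is left untouched
```

Whether to reject or pass through unknown symbols depends on the data; sequences with ambiguity codes may need a larger table instead.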
2. Write regular expressions that match the following:
- Phone numbers with the format (919)-1234567 (e.g. (123)-9876543 should match, but not 234-1234567 or (123)-666666)
- Email addresses such as john.doe@duke.edu (e.g. steve@gmail.com should match, but not steve@gmail)
- DNA sequences with the motif A-C-T-G, where - indicates 0 or 1 other nucleotide (any of A, C, T or G)
# YOUR CODE HERE
import re

# Literal parentheses around 3 digits, a hyphen, then 7 digits
phone_pat = re.compile(r'\(\d{3}\)-\d{7}')
for s in ['(123)-9876543', '234-1234567', '(123)-666666']:
    m = phone_pat.match(s)
    if m:
        print('Matched', s)
    else:
        print('Not matched', s)

Matched (123)-9876543
Not matched 234-1234567
Not matched (123)-666666
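Note that `match` only anchors the pattern at the start of the string, so trailing characters are ignored. If the whole string must be a phone number, `fullmatch` is one option. The sketch below assumes that stricter reading of the exercise; `is_phone` is a hypothetical helper name.

```python
import re

# Same phone pattern: "(ddd)-ddddddd"
phone_pat = re.compile(r'\(\d{3}\)-\d{7}')

def is_phone(s):
    """Return True only if the entire string is a phone number (hypothetical helper)."""
    return phone_pat.fullmatch(s) is not None

# Columns: input, result with match(), result with fullmatch()
for s in ['(123)-9876543', '(123)-98765432', '234-1234567', '(123)-666666']:
    print(s, phone_pat.match(s) is not None, is_phone(s))
```

Only the second string differs between the two checks: `match` accepts it because the first 13 characters fit the pattern, while `fullmatch` rejects the extra digit.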
Note: This is just for practice. Real email validators should not rely on regular expressions, because the rules for a valid email address are extremely complex; addresses should probably be checked with a parser.
# Local part: word characters with optional dot-separated pieces;
# domain: one or more dot-terminated labels followed by a final label
email_pat = re.compile(r'\w+(\.\w+)*@(\w+\.)+\w+')
for s in ['johm@', 'john.doe@duke.edu', 'steve@gmail.com', 'steve@gmail']:
    m = email_pat.match(s)
    if m:
        print('Matched', s)
    else:
        print('Not matched', s)

Not matched johm@
Matched john.doe@duke.edu
Matched steve@gmail.com
Not matched steve@gmail
# Each optional position accepts at most one nucleotide (A, C, G or T)
motif_pat = re.compile(r'A[ACGT]?C[ACGT]?T[ACGT]?G')
for s in ['GATTACA', 'ACTG', 'AACCTTGG', 'AAACCCTTTGGG']:
    m = motif_pat.match(s)
    if m:
        print('Matched', s)
    else:
        print('Not matched', s)

Not matched GATTACA
Matched ACTG
Matched AACCTTGG
Not matched AAACCCTTTGGG
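The loop above uses `match`, which only tests the start of each sequence. If the motif may occur anywhere in a sequence, `finditer` can report every hit with its position. This is a minimal sketch, assuming a pattern restricted to the four nucleotides; `find_motifs` is a hypothetical helper name.

```python
import re

# A-C-T-G with at most one extra nucleotide between consecutive letters
motif_pat = re.compile(r'A[ACGT]?C[ACGT]?T[ACGT]?G')

def find_motifs(seq):
    """Return (start, matched_text) for each non-overlapping motif hit in seq."""
    return [(m.start(), m.group()) for m in motif_pat.finditer(seq)]

print(motif_pat.match('GGACTGTT'))      # None: the motif is not at position 0
print(find_motifs('GGACTGTT'))          # [(2, 'ACTG')]
print(find_motifs('TTAGCATGGACTG'))     # [(2, 'AGCATGG'), (9, 'ACTG')]
```

Because the optional positions are greedy, each hit is the longest form of the motif that fits at that starting position.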
3. Download ‘Pride and Prejudice’ by Jane Austen from Project Gutenberg.
- Remove all punctuation and convert to lower case
- Count how many times the word ‘married’ appears
- Count how often the words ‘daughter’ and ‘married’ appear in the same 10-word window
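# YOUR CODE HERE
import os

# Download the text once (the leading "!" is an IPython shell escape)
if not os.path.exists('pride_and_prejudice.txt'):
    ! curl 'http://www.gutenberg.org/cache/epub/1342/pg1342.txt' > 'pride_and_prejudice.txt'

import string
from toolz import partition  # assumption: partition is toolz's non-overlapping chunker

with open('pride_and_prejudice.txt') as f:
    s = f.read()

# Lower-case the text and delete all ASCII punctuation
s = s.lower().translate(str.maketrans('', '', string.punctuation))
words = s.split()

size = 10
# Non-overlapping windows of 10 consecutive words
windows = list(partition(size, words))
print("'daughter' and 'married' appear %d times in the same 10-word window" %
      sum('daughter' in window and 'married' in window for window in windows))
# Note: s.count counts substring occurrences, so a word such as 'unmarried' would also be counted
print("The word 'married' appears %d times" % s.count('married'))

'daughter' and 'married' appear 5 times in the same 10-word window
The word 'married' appears 61 times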
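The solution above counts substrings and splits the text into non-overlapping blocks of 10 words. Two alternative readings of the exercise are whole-word counts and overlapping (sliding) windows. The sketch below illustrates both on a made-up sentence rather than the novel; `tokenize`, `sliding_cooccurrence` and the sample text are assumptions for illustration.

```python
import string
from collections import Counter

def tokenize(text):
    """Lower-case, strip ASCII punctuation, split on whitespace."""
    return text.lower().translate(str.maketrans('', '', string.punctuation)).split()

def sliding_cooccurrence(words, a, b, size=10):
    """Count overlapping windows of `size` consecutive words containing both a and b."""
    n = max(len(words) - size + 1, 1)
    return sum(a in words[i:i + size] and b in words[i:i + size] for i in range(n))

sample = "Her daughter was married; the unmarried sister stayed."
words = tokenize(sample)

print(Counter(words)['married'])           # 1: whole-word count
print(' '.join(words).count('married'))    # 2: substring count also hits 'unmarried'
print(sliding_cooccurrence(words, 'daughter', 'married', size=4))  # 2
```

Sliding windows count the same pair of words once per window that covers both, so the totals are not directly comparable with the non-overlapping count.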
4. Download “The Gutenberg Webster’s Unabridged Dictionary” from Project Gutenberg.
- First extract all defined words (109561 words; note that the solution below does not reproduce this number)
- Count the number of defined English words containing 3 or more vowels (aeiou)
- Find all the longest palindromes (a palindrome is a word that is spelt the same forwards and backwards, e.g. ‘deified’)
# YOUR CODE HERE
# If you look at the plain text file,
# it is quite hard to figure out how to extract a defined word.
# We have more luck with the HTML file.
import os

if not os.path.exists('websters.html'):
    ! curl 'www.gutenberg.org/cache/epub/29765/pg29765.html' > 'websters.html'
! head -n 400 websters.html | tail -n 30
# Notice that in the HTML, word definitions have the structure
# <p id="xxxxxxx">WORD</br> or <p id="xxxxxxx">WORD NEWLINE
import re

with open('websters.html') as f:
    text = f.read()

# Note: the trailing [...] is a character class, so it matches any single
# character from that set rather than the literal text '<br/>' or a newline
word_pat = re.compile(r'<p id="id\d+">([A-Z]+)[<br/>|\r\n+]')
words = word_pat.findall(text)

# Count words containing 3 or more vowels
count = 0
for word in words:
    if sum(word.count(v) for v in 'AEIOU') >= 3:
        count += 1
print("Number of words is %d" % len(words))
print("Number of words with 3 or more vowels is %d" % count)

# A palindrome equals its own reverse
palindromes = [word for word in words if word == word[::-1]]
max_len = max(len(p) for p in palindromes)
print("Longest palindromes are", [p for p in palindromes if len(p) == max_len])

Number of words is 103020
Number of words with 3 or more vowels is 69210
Longest palindromes are ['MALAYALAM']
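The terminator in the headword pattern is written with square brackets, which makes it a character class (one character out of `<`, `b`, `r`, `/`, `>`, `|`, `+`, carriage return or newline) rather than a choice between `<br/>` and a line break. The sketch below contrasts it with an explicit alternation on a tiny made-up HTML fragment. The fragment and the names `loose` and `strict` are assumptions, and the effect of the stricter pattern on the real dictionary file has not been checked here.

```python
import re

# Toy fragment (hypothetical): two headwords and one line that is not a headword
html = ('<p id="id001">AARDVARK<br/>\n'
        '<p id="id002">ABACUS\n'
        '<p id="id003">A</p>\n')

# Character class: any ONE character from the bracketed set ends the match
loose = re.compile(r'<p id="id\d+">([A-Z]+)[<br/>|\r\n+]')
# Alternation: the literal tag <br/> or a line break ends the match
strict = re.compile(r'<p id="id\d+">([A-Z]+)(?:<br/>|\r?\n)')

print(loose.findall(html))    # ['AARDVARK', 'ABACUS', 'A']
print(strict.findall(html))   # ['AARDVARK', 'ABACUS']
```

The loose pattern accepts the third line because the `<` of `</p>` is in the class, which is one possible reason a word count can differ from the expected total.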