STATISTICAL PROGRAMMING - PYTHON

Professor: IGNACIO LARRU MARTÍNEZ
E-Mail: ilarru@faculty.ie.edu

Ignacio Larrú is a freelance Python developer. His work involves developing new products and advising tech startups on process automation using Python. Additionally, Ignacio is a tech lead investor and CFO at K Fund, a venture capital fund specialized in technology startups. Ignacio is a former VP of investment banking at Credit Agricole and also works as a freelance consultant on the use of Big Data technologies to implement new business models. Previously he worked as an entrepreneur and founder of 6 different start-ups, ranging from online retailers to complex software in the civil sector. He started his career as an IT consultant with PricewaterhouseCoopers, developing software applications for leading financial institutions.

Published by IE Publishing Department. Last revised, November 2016.
WHY THIS COURSE?

It is easy to fall in love with Python given its simple syntax and coder-friendly structures. Since its appearance in 1991, Python has become one of the most popular interpreted programming languages. Among these interpreted languages, Python has distinguished itself by its large scientific computing community and its useful libraries for data manipulation. Python provides excellent capabilities for data analysis, interactive and exploratory computing, and data visualization through its various dedicated libraries (primarily pandas, but also NumPy and matplotlib).

This course gives you a general overview of the Python programming language, coupled with specific use cases for data analysis.

OBJECTIVES

The objectives of this course are as follows:

- Learn how to write Python programs
- Apply solid computer science design principles to our programs
- Learn the various data analysis specific functionalities in Python

METHODOLOGY

This course is organized around presentation of concepts, active discussions, programming assignments and class participation. Class participation is mandatory. Your voice is indispensable. It is important that you come to class prepared in order to enrich class discussions.

BIBLIOGRAPHY

There is no required book for the course, but students who want a reference guide to the Python language can use any of the various available resources, for example:

- Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython, by Wes McKinney
- Python Programming: An Introduction to Computer Science, 2nd Edition, by John Zelle

PROGRAM

SESSION 1

In this session we will discuss general course principles while we write our first program (hello world). Additionally, we will discuss simple data types and variable assignment in Python.
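As a rough idea of what a first program along these lines might look like, here is a minimal sketch. The variable names are hypothetical and chosen only for illustration; the values are taken from this syllabus.

```python
# A first program: print a greeting to the screen.
print("Hello, world!")

# Variable assignment with the simple built-in data types.
# All names below are illustrative assumptions, not course material.
course_name = "Statistical Programming - Python"  # str: text
session_count = 20                                 # int: whole number
participation_weight = 0.20                        # float: decimal number
participation_is_mandatory = True                  # bool: True or False

# type() reports the data type of the value a variable refers to.
print(type(course_name).__name__)
print(type(session_count).__name__)
print(type(participation_weight).__name__)
print(type(participation_is_mandatory).__name__)
```

Python infers the type from the assigned value, so no type declaration is needed before using a variable.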
SESSION 2 CONDITIONAL EXECUTION, ITERATION AND PROGRAM PLANNING

In this session we will learn how to code conditions (if/else) in our programs, together with while/for loops, and how to write pseudocode to help us plan our software designs.

SESSION 3 LISTS, TUPLES, SETS AND DICTIONARIES

Once we know how to manage the execution flow of our programs, we will learn more advanced data structures such as lists and dictionaries.

SESSION 4 FUNCTIONS

We don't want to write the same code every time we need to perform certain actions. In this session we will learn how to encapsulate code in functions so we can reuse it efficiently.

SESSION 5 FILES AND EXCEPTIONS

In this session we will learn how to use files to persist the state of our program and how to handle exceptions to catch unexpected behavior in our execution flow.

SESSION 6 STRING, DATE MANIPULATION & REGULAR EXPRESSIONS

String manipulation is heavily used in any data analysis project. In this session we will learn how to manage textual data and dates in Python.

SESSION 7 OBJECT ORIENTED PROGRAMMING

We can take code encapsulation one step further by using object oriented design in our programs. In this session we will learn the basic principles of object oriented programming (objects, classes, properties and methods), as we will work with objects for the rest of the course.

SESSION 8 OBJECT ORIENTED PROGRAMMING

We can take code encapsulation one step further by using object oriented design in our programs. In this session we will learn how

SESSION 9 ADVANCED OBJECT ORIENTED PROGRAMMING

In this session we will continue with advanced OOP topics like inheritance, duck typing and interfaces.
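To make the terms inheritance and duck typing concrete, here is a minimal sketch. The classes `Shape`, `Rectangle` and `LandPlot` and the function `total_area` are hypothetical names invented for this illustration; they are not taken from the course material.

```python
class Shape:
    """Base class: acts as an informal interface that requires area()."""

    def __init__(self, name):
        self.name = name

    def area(self):
        # Subclasses are expected to override this method.
        raise NotImplementedError

    def describe(self):
        return f"{self.name} with area {self.area():.2f}"


class Rectangle(Shape):
    """Inheritance: Rectangle reuses describe() and overrides area()."""

    def __init__(self, width, height):
        super().__init__("rectangle")
        self.width = width
        self.height = height

    def area(self):
        return self.width * self.height


class LandPlot:
    """Not a Shape subclass, but it also provides an area() method."""

    def area(self):
        return 12.5


def total_area(items):
    # Duck typing: any object with an area() method is accepted,
    # regardless of its class.
    return sum(item.area() for item in items)


print(Rectangle(2, 3).describe())
print(total_area([Rectangle(2, 3), LandPlot()]))
```

`total_area` never checks the type of its items, which is the point of duck typing: what matters is that each object responds to `area()`.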
SESSIONS 10 & 11 GUI DEVELOPMENT

In these sessions we will learn how to develop a graphical user interface using the tkinter module, so that our users can interact with our program more easily through a window-based interface.

SESSION 12 NUMPY BASICS: ARRAYS AND VECTORIZED COMPUTATION

In this session we will start with the data analysis functionalities of Python, from the multidimensional array object to linear algebra operations and random numbers.

SESSION 13

Pandas data structures are a powerful tool for data management in Python. In this session we will start working with them.

SESSION 14 PLOTTING AND VISUALIZATION

Data visualization is a very important step in any data analysis project. In this session we will review the plotting functions in Python (matplotlib module) to help us achieve our goals.

SESSION 15 DESCRIPTIVE AND INFERENTIAL STATISTICS

In this session we will learn the electronic data analysis capabilities of Python, together with the most common statistical tests to validate our hypotheses.

SESSION 16 REGRESSION ANALYSIS IN PYTHON

In this session we will start the review of the SciPy module with its capabilities regarding regression analysis (linear and logistic).

SESSION 17 CLASSIFICATION ALGORITHMS

In this session we will discuss the various classification algorithms in the SciPy module and how to evaluate them.

SESSION 18 CLUSTERING

In this session we will discuss the various clustering alternatives available in Python.
SESSION 19 RECOMMENDATION

In this session we will review how to implement collaborative filtering algorithms in Python.

SESSION 20 FINAL EXAM

EVALUATION METHOD

Final evaluation will be based on (1) engagement in the classroom, (2) the final exam and (3) a course assignment to be prepared in groups, with a breakdown of percentage contribution as follows:

Criteria               Score %
Class Participation    20%
Individual work        50%
Workgroups             30%
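One way to read the breakdown above is as a weighted average. The sketch below assumes this reading and uses a hypothetical 0-10 scale with made-up example scores; the syllabus specifies neither the grading scale nor the exact formula, and `final_grade` is an illustrative name only.

```python
# Weights taken from the breakdown above, as whole percentages.
WEIGHTS = {
    "Class Participation": 20,
    "Individual work": 50,
    "Workgroups": 30,
}


def final_grade(scores):
    """Return the weighted average of the per-criterion scores."""
    if set(scores) != set(WEIGHTS):
        raise ValueError("scores must cover exactly the criteria in WEIGHTS")
    weighted_sum = sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)
    return weighted_sum / sum(WEIGHTS.values())


# Hypothetical scores on an assumed 0-10 scale.
example_scores = {
    "Class Participation": 8,
    "Individual work": 7,
    "Workgroups": 9,
}
print(final_grade(example_scores))  # (20*8 + 50*7 + 30*9) / 100 = 7.8
```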