Intro to Python for data analytics

Intro and Objectives

Python is a full-featured, general-purpose programming language used for everything from web development to scientific computing. Our focus will be on learning Python from the perspective of doing data analytics work. Python has several advantages for data science work:

  • easy to learn

  • tons of data science related libraries

  • many Python data science tutorials

  • Python, being a full-featured language, is much more powerful than R for many of the data science related tasks you’ll face

  • it is widely used

In this first part of the intro we’ll learn about fundamental Python programming concepts such as:

  • Jupyter notebooks and Jupyter lab

  • variables, numpy, math, peek at plotting

  • looping and conditional logic

  • lists and dictionaries

  • reading files

  • the Spyder IDE, which we’ll start using as our programs become more complex

I’m assuming that if you are in this class, you have some familiarity with computer programming. So, this is a “whirlwind” intro to Python. The syntax is really quite easy to learn, so I focus instead on some of the concepts and data structures that those of you who only have a VBA background might not be familiar with, such as lists, dictionaries, and sequential reading of files. The Jupyter notebooks upon which this session is based are for you to use as a starting point for further exploration and learning.
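As a small taste of those three ideas, here is a minimal sketch that reads a text file one line at a time, collects values in a list, and tallies counts in a dictionary. The file name, the column names, and the `count_by_category` helper are made up for illustration and do not come from the course notebooks.

```python
import tempfile
from pathlib import Path


def count_by_category(path):
    """Read a simple comma-separated file line by line.

    Returns a dictionary of row counts per category and a list of amounts.
    (Hypothetical helper, written only for this example.)
    """
    counts = {}    # dictionary: category -> number of rows seen
    amounts = []   # list: every amount, in file order
    with open(path) as f:
        next(f)            # skip the header line
        for line in f:     # sequential read, one line per iteration
            category, amount = line.strip().split(",")
            counts[category] = counts.get(category, 0) + 1
            amounts.append(float(amount))
    return counts, amounts


# Made-up sample data, written to a temporary file so the example is self-contained
sample = "category,amount\nfood,12.5\nfuel,40.0\nfood,7.25\n"
with tempfile.TemporaryDirectory() as tmp:
    data_file = Path(tmp) / "expenses.csv"
    data_file.write_text(sample)
    counts, amounts = count_by_category(data_file)

print(counts)        # counts per category
print(sum(amounts))  # total of all amounts
```

If you are coming from VBA, the dictionary plays a role similar to a `Scripting.Dictionary`, and the `for line in f` loop replaces the usual `Do While Not EOF` pattern.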

Readings

StackOverflow is THE number one Q&A site for all things programming. There are tags for every conceivable programming language. It is essential that you learn how to ask good questions on sites like this or when asking questions of me or in our Help Forum.

Downloads

Inside the Download file, in addition to the files needed for this session, you’ll also find a folder with all of the Jupyter notebooks for the Whirlwind Tour of Python book listed above.

Activities

Note

Using Jupyter notebooks with conda virtual environments is an evolving thing. As of now (Fall 2024), and described at this part of our software page, I am using a dedicated conda virtual env named jupyter for launching Jupyter Lab.

$ conda activate jupyter

$ jupyter lab

Then, within Jupyter Lab, I’m changing the notebook kernel to our datasci conda virtual env.

But, as described at the link above, if you end up launching Jupyter Lab from the base conda env, you’ll be fine.
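If you are ever unsure which environment a notebook is actually running in, one quick check is to ask Python itself from a notebook cell. This is only a sketch: the `describe_interpreter` helper is a made-up name, and the expectation that the environment name (for example datasci) shows up in the printed paths is an assumption about how conda typically lays out its environments.

```python
import sys


def describe_interpreter():
    """Collect a few facts about the Python interpreter running this code.

    (Hypothetical helper, written only for this example.)
    """
    return {
        "executable": sys.executable,        # full path to the python binary
        "prefix": sys.prefix,                # root folder of the active environment
        "version": sys.version.split()[0],   # e.g. a string like "3.x.y"
    }


info = describe_interpreter()
for key, value in info.items():
    print(f"{key}: {value}")
```

After switching kernels, rerun the cell; the paths should change to point into the environment behind the newly selected kernel.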

We will begin with an overview of Python and its use in data analytics. Then we’ll start to learn Python in the context of data analysis, by working through a number of Jupyter notebooks together. While working through the notebooks, the topic of Conda virtual environments will come up. Conda is the package and environment management system for the Anaconda Python distro. Here is a nice newbie intro to Conda virtual environments.

If you end up using Anaconda from Windows or Mac, eventually you’ll learn about creating virtual environments (see the pcda VM page). Until then, things will work just fine in the base environment. You may have to install a few libraries but that’s no big deal.

I didn’t make screencasts for these last two. They are very short. Just explore them.

  • 05-file-globbing-pcda.ipynb

    • processing a bunch of data files by globbing

  • 06-more-on-conditions-pcda.ipynb

    • if-elif-else logic (Python has no then keyword)
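To give a flavor of what these two notebooks cover, here is a minimal sketch that combines both ideas: it uses `glob` to find every CSV file in a folder and an if-elif-else chain to label each one by its line count. The file names, the thresholds, and the `size_label` helper are invented for illustration and are not taken from the notebooks.

```python
import glob
import os
import tempfile


def size_label(n_lines):
    """Classify a file by its number of lines (thresholds are arbitrary)."""
    if n_lines == 0:
        return "empty"
    elif n_lines < 3:
        return "small"
    else:
        return "large"


with tempfile.TemporaryDirectory() as tmp:
    # Create a few made-up files so the example is self-contained
    samples = {
        "a.csv": "",
        "b.csv": "x\ny\n",
        "c.csv": "1\n2\n3\n4\n",
        "notes.txt": "ignored\n",   # not matched by the *.csv pattern
    }
    for name, text in samples.items():
        with open(os.path.join(tmp, name), "w") as f:
            f.write(text)

    # Globbing: collect every path matching the wildcard pattern
    labels = {}
    for path in sorted(glob.glob(os.path.join(tmp, "*.csv"))):
        with open(path) as f:
            n_lines = sum(1 for _ in f)   # count lines without loading the whole file
        labels[os.path.basename(path)] = size_label(n_lines)

print(labels)
```

Note that `glob.glob` returns paths in no guaranteed order, which is why the loop sorts them first.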

Explore