Teaching Coding Bootcamp for Scientists

Recently, I’ve had the chance to teach several Python coding bootcamps. Several have been optional, “attend if you want” workshops, and most recently I taught one as part of an institutional bridge program funded by an NSF CAREER grant (not my grant). It’s been an interesting journey, and I wanted to summarize a few thoughts about it.

Motivation

First off: why a coding bootcamp? Put simply, I think college students need to learn at least a bit of coding to stay competitive in today’s job market. Particularly in the technical / STEM fields, it’s highly likely that a given student will either (a) directly need to write code, (b) use someone else’s code, or (c) do work that could be made vastly easier with a little automation. In any of these cases, a smidgen of knowledge about coding (what it can do, what it can’t) and enough familiarity to not be afraid of code will serve students well.

In addition to those altruistic reasons, the fact of the matter is that I enjoy coding, and increasing my own expertise. Having a concrete goal, like these workshops, gives me the motivation to really improve, and to make sure that what I’m doing matches best practices.

Why Python? Python makes programming fun again. I enjoy coding in Python; the syntax is clean, the available libraries and communities are incredibly robust, and it’s a very good general-purpose language for most things I’m interested in: science, data analysis, machine learning, task automation, and helper utilities.

Where to start

When I first decided to teach a Python bootcamp, like any good scientist, I knew to stand on the shoulders of giants. So I did some searching, and found an excellent introductory course in Python for scientists from the Software Carpentry Foundation.

It uses a scientist examining patient inflammation data as a framework to teach core Python concepts, including venerable Python idioms like:

first, second = 'Grace', 'Hopper'
third, fourth = second, first
print(third, fourth)

It has a good introduction to the very basics of programming, and very quickly gets to using matplotlib and graphing to help analyze data. It’s a bit clunky, and doesn’t get into what I consider the heart of programming, functions, until later than it needs to. Still, it’s an incredible resource, and I’m glad I found it.

The early approach

Suitably armed with some great ideas, a general template, and a passion for education, it was time to write up some lessons.

In my first iteration, I veered too far into trying to explain with words instead of demonstrating. I wrote code like:

## Let's start with a function/shortcut to collect user data.
def GetData():  # this line tells the computer that the shorthand for all these steps
                # is called "GetData". When I later type "GetData()", I mean
                # "do all this stuff".
    print("Non-Linear Fitting for Dose-Response Data\n")  # this shows a nice title on
                                                          # the screen to start
    input_x = input("Type X values, comma separated, then hit return\n")
    # this line does a lot! It prints a prompt for X data to the user,
    # then grabs that data and saves it as "input_x"
    # the user will input data, so it will receive something like:
    # 1,2,3,4,5

In hindsight, yikes! That approach is tiring to read, and it isn’t teaching the right lesson: good code should be self-evident enough in function that it doesn’t need those kinds of comments.
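For illustration, here is a minimal sketch of how that same step might look when the names and a short docstring carry the explanation instead of line-by-line comments. This is not the code from the workshop; the function names `parse_comma_separated_values` and `prompt_for_x_values` are hypothetical, and splitting the parsing away from the prompt is just one possible design.

```python
from typing import List


def parse_comma_separated_values(raw_text: str) -> List[float]:
    """Convert text such as "1,2,3,4,5" into a list of floats."""
    # Skip empty pieces so a trailing comma does not cause an error.
    return [float(item) for item in raw_text.split(",") if item.strip()]


def prompt_for_x_values() -> List[float]:
    """Ask the user for comma-separated X values and return them as floats."""
    raw_text = input("Type X values, comma separated, then hit return\n")
    return parse_comma_separated_values(raw_text)
```

Keeping the parsing in its own function means it can be tried out directly, without typing anything at a prompt, for example `parse_comma_separated_values("1,2,3,4,5")`.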

A more refined approach

The best workshop was the latest one, enabled through the NSF CAREER grant. It was focused in a way that previous efforts had not been: students had their own data (EKG responses from crickets) that they needed to analyze. I think that immediacy of need, and the clear use case, helped me build a better focus.

As before, you have to build the basics first, but it was nice to be able to do a “functions first” approach, such as:

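import datetime
from typing import List

import pandas as pd


def make_EKG_dataframe_from_raw(
    list_of_times: List[str], list_of_responses: List[float]
) -> pd.DataFrame:
    """Create a dataframe from the raw recorded EKG data."""
    # Parse each timestamp string (year-month-day hour:minute:second) into a datetime.
    times_as_dt = [
        datetime.datetime.strptime(each, "%Y-%m-%d %H:%M:%S") for each in list_of_times
    ]
    # Index the responses by timestamp: one row per reading.
    df = pd.DataFrame.from_dict(
        {key: val for key, val in zip(times_as_dt, list_of_responses)}, orient="index"
    )
    return df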
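One payoff of working in small functions is that each piece can be checked on its own. The sketch below isolates just the timestamp-parsing step from the function above, using only the standard library. The names `parse_timestamp`, `seconds_since_start`, and `TIMESTAMP_FORMAT`, along with the sample timestamps, are made up for illustration and are not from the workshop materials or the students' data.

```python
import datetime
from typing import List

# Same format string as in make_EKG_dataframe_from_raw.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_timestamp(text: str) -> datetime.datetime:
    """Turn a string like '2021-06-01 13:45:00' into a datetime object."""
    return datetime.datetime.strptime(text, TIMESTAMP_FORMAT)


def seconds_since_start(list_of_times: List[str]) -> List[float]:
    """Return each reading's offset, in seconds, from the first reading."""
    parsed = [parse_timestamp(each) for each in list_of_times]
    return [(each - parsed[0]).total_seconds() for each in parsed]


# Hypothetical sample timestamps, invented for this example.
sample_times = ["2021-06-01 13:45:00", "2021-06-01 13:45:01", "2021-06-01 13:46:00"]
offsets = seconds_since_start(sample_times)
print(offsets)
```

A helper like this can be reused for any recording that uses the same timestamp format, which is the kind of reuse a "functions first" approach is meant to encourage.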

What’s next?

Based on that experience, I’m going to rework my slides and exercises around a more focused goal for the next workshop I teach (hopefully in September or October this year), and really work on putting the reusability of functions first and foremost in the students’ minds.

Resources:

I’ve found a number of resources useful in introducing Python to scientists with limited or no programming background. Here are a few:
