Teaching Coding Bootcamp for Scientists
Recently, I’ve had the chance to teach several Python coding bootcamps. Several have been optional, “attend if you want” workshops, and most recently I taught one as part of an institutional bridge program funded by an NSF CAREER grant (not my grant). It’s been an interesting journey, and I wanted to summarize a few thoughts about it.
Motivation
First off: why a coding bootcamp? Put simply, I think college students need to learn at least a bit of coding to stay competitive in today’s job market. Particularly in the technical / STEM fields, it’s highly likely that a given student will (a) directly need to write code, (b) use someone else’s code, or (c) do work that could be made vastly easier with a little automation. In any of these cases, a smidgen of knowledge about coding, what it can do and what it can’t, plus enough familiarity to not be afraid of code, will serve students well.
In addition to those altruistic reasons, the fact of the matter is that I enjoy coding, and increasing my own expertise. Having a concrete goal, like these workshops, gives me the motivation to really improve, and to make sure that what I’m doing matches best practices.
Why Python? Python makes programming fun again. I enjoy coding in Python; the syntax is clean, the available libraries and communities are incredibly robust, and it’s a very good general-purpose language for most things I’m interested in: science, data analysis, machine learning, task automation, and helper utilities.
Where to start
When I first decided to teach a Python bootcamp, like any good scientist, I knew to stand on the shoulders of giants. So I did some searching and found an excellent introductory course in Python for scientists from the Software Carpentry Foundation.
It uses the scenario of a scientist examining patient inflammation data as a framework to teach core Python concepts, even venerable Python idioms like:
first, second = 'Grace', 'Hopper'
third, fourth = second, first
print(third, fourth)
It has a good introduction to the very basics of programming, and very quickly gets to using matplotlib and graphing to help analyze data. That said, it’s a bit clunky, and it doesn’t get into what I consider the heart of programming, functions, until later than it needs to. Still, it’s an incredible resource, and I’m glad I found it.
The early approach
Suitably armed with some great ideas, a general template, and a passion for education, it was time to write up some lessons.
In my first iteration, I veered too far into trying to explain with words instead of demonstrating. I wrote code like:
## Let's start with a function/shortcut to collect user data.
def GetData():  # this line tells the computer that the shorthand for all these steps
                # is called "GetData". When I later type "GetData()", I mean
                # "do all this stuff".
    print("Non-Linear Fitting for Dose-Response Data\n")  # this shows a nice title on
                                                          # the screen to start
    input_x = raw_input("Type X values, comma seperated, then hit return\n")
    # this line does a lot! It prints a prompt for X data to the user,
    # then grabs that data and saves it as "input_x"
    # the user will input data, so it will receive something like:
    # 1,2,3,4,5
In hindsight, yikes! That approach is tiring to read, and it isn’t teaching the right lesson: good code should be self-evident enough in function that it doesn’t need those kinds of comments.
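For contrast, here is a minimal sketch of how the same prompt might be written so that the names and docstrings carry the explanation. The function names and the injectable `ask` parameter are illustrative assumptions, not code from the actual workshop, and the sketch uses Python 3’s `input` rather than `raw_input`.

```python
from typing import Callable, List


def parse_comma_separated_floats(raw_text: str) -> List[float]:
    """Convert text such as "1,2,3" into a list of floats."""
    # Skip empty pieces so a trailing comma does not raise an error.
    return [float(piece) for piece in raw_text.split(",") if piece.strip()]


def prompt_for_x_values(ask: Callable[[str], str] = input) -> List[float]:
    """Ask the user for comma-separated X values and return them as floats.

    `ask` defaults to the built-in input(); passing a different callable
    (a hypothetical convenience) lets the function be exercised without a keyboard.
    """
    return parse_comma_separated_floats(
        ask("Type X values, comma separated, then hit return\n")
    )
```

Splitting the parsing from the prompting also means each piece can be checked on its own, which is an easy first taste of why functions are worth the trouble.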
A more refined approach
The best workshop was the latest one, enabled through the NSF CAREER grant proposal, and it was focused in a way that previous efforts had not been: students had their own data (EKG responses from crickets) that they needed to analyze. I think that immediacy of need, and the clear use case, helped me build a better focus.
As before, you have to build the basics first, but it was nice to be able to do a “functions first” approach, such as:
import datetime
from typing import List

import pandas as pd


def make_EKG_dataframe_from_raw(
    list_of_times: List[str], list_of_responses: List[float]
) -> pd.DataFrame:
    """Create a dataframe from the raw recorded EKG data."""
    times_as_dt = [
        datetime.datetime.strptime(each, "%Y-%m-%d %H:%M:%S") for each in list_of_times
    ]
    # Each timestamp becomes a row label (index); its response becomes the row value.
    df = pd.DataFrame.from_dict(
        {key: val for key, val in zip(times_as_dt, list_of_responses)}, orient="index"
    )
    return df
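One payoff of a functions-first approach is reuse: small functions can be combined into new steps without rewriting anything. Here is a minimal sketch using only the standard library, assuming the same timestamp format as above. `parse_times`, `seconds_since_start`, and the sample timestamps are hypothetical, invented purely for illustration.

```python
import datetime
from typing import List

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed to match the recorded data


def parse_times(list_of_times: List[str]) -> List[datetime.datetime]:
    """Turn timestamp strings into datetime objects."""
    return [datetime.datetime.strptime(each, TIME_FORMAT) for each in list_of_times]


def seconds_since_start(times: List[datetime.datetime]) -> List[float]:
    """Express each time as seconds elapsed since the first reading."""
    start = times[0]
    return [(each - start).total_seconds() for each in times]


# Made-up timestamps, not real recordings.
example_times = ["2020-01-01 12:00:00", "2020-01-01 12:00:01", "2020-01-01 12:00:03"]
elapsed = seconds_since_start(parse_times(example_times))
print(elapsed)  # [0.0, 1.0, 3.0]
```

Each function does one job, so students can test them separately and then chain them together.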
What’s next?
Based on that experience, I’m going to rework my slides and exercises to have a more focused goal in the next workshop that I teach (hopefully in September or October this year), and really work on getting the reusability of functions first and foremost on the minds of the students.
Resources:
I’ve found a number of resources useful in introducing Python to scientists with limited or no programming background. Here are a few: