A Python-Centered Fast Track to Multidisciplinary Data Science for Mathematics Students

John Ringland

Mathematics Department, University at Buffalo

Math students, through their comfort with abstraction, have a lot to offer the practice of data analysis, but many have led a sheltered life with no exposure to data or even to the "STE" of STEM. This course is designed to give such students a rapid introduction to tools and methods for acquiring, storing, manipulating, exploring, and displaying data - both small and big. Applications are drawn from fields including astronomy, bioinformatics, health care informatics, mathematics, physics, urban and regional planning, and data journalism. The course embodies the following four principles: 1. Students learn by doing. The course is taught in an intensively hands-on format: rarely do 10 minutes pass without the students being actively engaged in an activity. 2. Open-ended data explorations can challenge and stimulate the interest of students at all levels of experience and ability. 3. Writing reports gives students opportunities to integrate knowledge, to discover ways they can express themselves, and to create products they can be proud of. 4. Non-proprietary software empowers students for the long term. We seek to provide tools and skills that students can continue to use after graduation - at work and at home, regardless of their career path. The Python programming language stands alone in being completely free, powerful enough, broadly applicable enough, and easy enough to learn quickly to meet the ambitious demands of our course. It has excellent facilities for text processing and web scraping, for sophisticated graphics, and for tabular data processing beyond the capacities of spreadsheets, and it interfaces with databases of all kinds, with big-data frameworks like Hadoop and Apache Spark, and with machine learning systems like Google's TensorFlow. Geographical Information Systems like QGIS (and its proprietary counterparts) are scriptable with Python. I will show examples of course modules and of work produced by students.