Welcome Aboard! 🙌

MATH/COSC 3570 Introduction to Data Science

Dr. Cheng-Han Yu
Department of Mathematical and Statistical Sciences
Marquette University

Taipei, Taiwan

Taiwan location

My Journey

  • Assistant Professor (2020/08 - )

  • Postdoctoral Fellow

  • PhD in Statistics

  • MA in Economics/PhD program in Statistics

How to Reach Me

  • Office hours TuTh 4:50 - 5:50 PM and Wed 12 - 1 PM in Cudahy Hall 353.
  • 📧
    • I will answer your question within 24 hours.
    • If you message me on the weekend, expect a reply on Monday.
    • Start your subject line with [math3570] or [cosc3570] followed by a clear description of your question.
  • I will NOT reply to your e-mail if … Check the e-mail policy in the syllabus!

When You Have Two Dr. Yus in One Semester 😂

This really happened last semester. A student in my class sent me her homework for another class, Linear Systems, because the instructor of that course also has the last name Yu, and she mixed us up. If you remember to put the course number in the subject line, you greatly reduce the chance of sending a message to the wrong person. It saves both your time and mine, so we can all work more efficiently. Right?

What is This Course?

  • Every aspect of doing a practical data science project, from importing data to deploying what we learn from data.

❓ What are the prerequisites?
👉 COSC 1010 (Intro Programming), and either MATH 4720 (Intro Stats) or MATH 2780 (Intro Regression)


❓ Is this like another intro stats course?
👉 No. Statistics and data science are closely related, but they are not the same.

👉 Nowadays, data science is a broader subject than statistics.

👉 Statistics focuses more on analyzing and learning from data, which is one part of the entire data science workflow.


❓ Is this like another intro CS or programming course?
👉 Absolutely not. We learn to code for doing data science, not for understanding computer systems and structures.

What is NOT Covered in This Course

  • Advanced data analytics and computing
    • MATH 4750 Statistical Computing
    • MATH 4760 Time Series Analysis
    • MATH 4780 Regression Analysis
    • MATH 4790 Bayesian Statistics
    • COSC 4600 Fundamentals of Artificial Intelligence
    • COSC 4610 Data Mining
    • COEN 4860 Introduction to Neural Networks
  • Big data: We start with small, in-memory data sets. You can't tackle big data without first gaining experience with small data.
  • Database: You’ll learn SQL in
    • COSC 4800 Principles of Database Systems
    • INSY 4052 Database Management Systems.

What Computing Languages?

R: ~ 70%

Python: ~ 30%

  • You've learned Python in COSC 1010. Being R-Python bilingual is becoming increasingly important!

👉 Wouldn't it be great to add both languages to your resume? 😎

❌ If you do NOT want to learn R and/or Python, do NOT take this course! (3570 is offered every semester)

❌ Drop deadline: 01/24/2024 11:59 PM.

Course Materials

Course Website - https://math3570-s24.github.io/website/

  • All course materials

Learning Management System (D2L)

  • News

  • Assessments > Grades

Grading Policy ✨

  • 40% In-class lab exercises

  • 30% Homework

  • 30% Final project competition

  • Extra credit opportunities

  • ❌ You have to participate (in-person) in the final presentation in order to pass the course.
  • ❌ You will NOT be allowed to do any extra credit projects/homework/exam to compensate for a poor grade.
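To see how the three components combine, here is a minimal Python sketch of the weighted course percentage. The function name `course_percentage` and the component keys are hypothetical, scores are assumed to be on a 0-100 scale, and extra credit and the lab-exercise deduction are not modeled.

```python
from typing import Dict

# Weights from the grading policy: 40% labs, 30% homework, 30% project.
# The dictionary keys are illustrative names, not an official gradebook format.
WEIGHTS = {"lab": 0.40, "homework": 0.30, "project": 0.30}


def course_percentage(scores: Dict[str, float]) -> float:
    """Return the weighted course percentage from component scores in [0, 100]."""
    if set(scores) != set(WEIGHTS):
        raise ValueError(f"expected exactly these components: {sorted(WEIGHTS)}")
    # Weighted sum; extra credit is deliberately left out of this sketch.
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


if __name__ == "__main__":
    print(course_percentage({"lab": 100, "homework": 90, "project": 85}))
```

For example, 100 on labs, 90 on homework, and 85 on the project gives 0.4 × 100 + 0.3 × 90 + 0.3 × 85 = 92.5.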

Grade-Percentage Conversion

  • \([x, y)\) means greater than or equal to \(x\) and less than \(y\). For example, 94.0 is in [94, 100], so the grade is A, while 93.8 is in [90, 94), so the grade is A-.
Grade Percentage
A [94, 100]
A- [90, 94)
B+ [87, 90)
B [84, 87)
B- [80, 84)
C+ [77, 80)
C [74, 77)
C- [70, 74)
D+ [65, 70)
D [60, 65)
F [0, 60)
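The conversion rule can be sketched in Python as below, assuming the percentage is not rounded before conversion (consistent with 93.8 receiving an A-). `letter_grade` and `CUTOFFS` are illustrative names, not part of any course tool.

```python
# Lower bounds from the Grade-Percentage Conversion table, highest first.
# Each grade covers [lower bound, next higher lower bound); A covers [94, 100].
CUTOFFS = [
    (94, "A"), (90, "A-"), (87, "B+"), (84, "B"), (80, "B-"),
    (77, "C+"), (74, "C"), (70, "C-"), (65, "D+"), (60, "D"), (0, "F"),
]


def letter_grade(percentage: float) -> str:
    """Map a course percentage in [0, 100] to a letter grade (no rounding)."""
    if not 0 <= percentage <= 100:
        raise ValueError("percentage must be in [0, 100]")
    for lower, grade in CUTOFFS:
        # Closed at the lower end: a score equal to the cutoff earns that grade.
        if percentage >= lower:
            return grade
    raise AssertionError("unreachable: the F cutoff is 0")


if __name__ == "__main__":
    print(letter_grade(94.0), letter_grade(93.8))  # A A-
```

Scanning the cutoffs from highest to lowest and returning the first one the score reaches is exactly the half-open interval rule: a score lands in the first interval whose lower bound it meets.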

Lab Exercises (40%)

  • Graded as Complete/Incomplete and used as evidence of attendance and participation.

  • You are allowed to have two incomplete lab exercises without any penalty.

  • Beyond that, 2% will be taken off your grade percentage for each additional missing/incomplete exercise.

  • You will create an RStudio project in Posit Cloud to save all of your lab exercises. (We'll go through the how-to together.)

  • ❌ No make-up lab exercises for any reason.
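A minimal Python sketch of the lab-exercise deduction, assuming "2%" means 2 percentage points off the overall course percentage for each incomplete exercise beyond the first two. `lab_deduction` is a hypothetical helper.

```python
FREE_INCOMPLETES = 2     # incomplete labs allowed without penalty (from the syllabus)
PENALTY_PER_EXTRA = 2.0  # assumed: percentage points off the course percentage


def lab_deduction(num_incomplete: int) -> float:
    """Percentage points deducted for missing/incomplete lab exercises."""
    if num_incomplete < 0:
        raise ValueError("num_incomplete must be non-negative")
    # Only exercises beyond the free allowance are penalized.
    extra = max(0, num_incomplete - FREE_INCOMPLETES)
    return PENALTY_PER_EXTRA * extra


if __name__ == "__main__":
    for n in range(6):
        print(n, lab_deduction(n))
```

Under this reading, two incomplete labs cost nothing, three cost 2 points, and five cost 6 points.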

Homework (30%)

  • The homework assignments are individual. Submit your own work.

  • ❌ You may not directly share or discuss answers/code with anyone other than the instructor. But you are welcome to discuss the problems in general and ask for advice.

  • Homework will be assigned through GitHub:
    • clone/pull the homework repo into Posit Cloud
    • work on the Quarto file in the repo (We'll go through the how-to together.)
  • You will have at least one week to complete your assignment.

  • ❌ No make-up homework unless you have COVID or an excused absence. 🙏

Project (30%)

  • You will be doing a final group project.

  • Your project can be:

    1. Data analysis using statistical models or machine learning algorithms

    2. Introduce an R or Python package not learned in class, including a live demo

    3. Introduce a data science tool (visualization, computing, etc.) not learned in class, including a live demo

    4. Web development: Website or dashboard for data science, including live demo

  • The final project presentation is on Monday, 5/6, 10:30 AM - 12:30 PM

  • More information will be released later.

Sharing/Reusing Code Policy

  • Unless explicitly stated otherwise, you may make use of any online resources, but you must explicitly cite where you obtained any code you directly use or use as inspiration in your solutions.
  • ❌ Any recycled code that is discovered and is not explicitly cited will be treated as plagiarism, regardless of source. 😱

Academic Integrity

This course expects all students to follow University and College statements on academic integrity.

  • Honor Pledge and Honor Code: I recognize the importance of personal integrity in all aspects of life and work. I commit myself to truthfulness, honor, and responsibility, by which I earn the respect of others. I support the development of good character, and commit myself to uphold the highest standards of academic integrity as an important aspect of personal integrity. My commitment obliges me to conduct myself according to the Marquette University Honor Code.

Q & A

❓ Will this course require any textbooks or other materials to be purchased?
👉 No required textbooks. All resources are free online!


❓ Does it teach the basics of data science?
👉 Yes, I think so.


❓ What do you think will be the most interesting part of the course?
👉 I love data visualization and web development.


❓ Since this course is cross-listed in COSC and MATH, is there any difference between the section we registered for and the other one?
👉 No difference. MATH 3570 and COSC 3570 are exactly the same course.


❓ How much time do you think most students should spend on reading and assignments for this course?
👉 Everyone is different. The more, the better.

Q & A

❓ How accessible are you outside of class AND office hours?
👉 We can schedule a Teams/in-person meeting if you need.


❓ Will this class help me better understand how to code proficiently?
👉 Just like learning to speak a foreign language, you need to code a lot, frequently and consistently, to become proficient in any programming language. There is no shortcut.


❓ Do you know of any internships or research positions offered through Marquette University that incorporate the skills learned in this Data Science course?
👉 Quite a few, for example at Northwestern Mutual and Direct Supply. I'll share internship info with you if I get any.


❓ Your question.