CX4240: Introduction to Computational Data Analysis (2021 Spring)

Logistics

  • Lecture time: Mons and Weds, 3:30pm-4:45pm
  • Location: https://bluejeans.com/5341114422
  • Instructor: Chao Zhang
  • Teaching Assistants: Brad Baker <bbradt@gatech.edu> and Yue Yu <yueyu@gatech.edu>
  • Office Hours:
    • Instructor: Weds 2:30-3:20pm @ bluejeans.com/5341114422
    • TA Office Hour: Mons 2:30-3:20pm @ bluejeans.com/881278440
  • Piazza: https://piazza.com/gatech/spring2021/cx4240
    • Piazza will be the main place for course discussions and announcements. If you have questions, please ask them on Piazza first because 1) other students may have the same question; 2) you will get help faster than by email.
    • If it's something you would rather not discuss publicly on Piazza, send an email with CX4240 in the subject line.

Course Content

Q: What will be covered in this course? A: This course introduces techniques for computational data analysis, with an emphasis on machine learning algorithms and their applications to real-world data. On the technique side, we will cover key supervised machine learning methods (linear regression, logistic regression, neural networks, tree-based models) and unsupervised methods (k-means, Gaussian mixture models, expectation-maximization, dimension reduction). On the application side, the course will introduce various applications of these techniques, particularly in text data analysis and natural language processing. It will show how to formulate real-world tasks as data analysis problems, present key methods for solving these problems, and discuss their advantages and disadvantages.

Q: Who will benefit from this course? A: The learning objective is that, by the end of this course, students are able to formulate the data analysis problems at hand, choose appropriate computational models to acquire insights from data automatically, and even come up with innovative solutions to open problems in this field. The course will be helpful for students who want to solve practical problems using machine learning and data science techniques. It will also provide useful techniques for students who want to do cutting-edge research in data mining, machine learning, natural language processing, and related areas.

Q: What are the prerequisites? A: Prerequisites for this course include 1) solid knowledge of probability, statistics, calculus, and linear algebra; 2) basic knowledge of machine learning; 3) solid programming skills, preferably in Python.

Schedule

Date        Topic                                  Due
01/18/2021  No Class (Martin Luther King Day)
01/20/2021  Course Overview                        Piazza Signup
01/25/2021  Math Basics I
01/27/2021  Math Basics II                         HW1 Out
02/01/2021  Data Analysis Toolbox
02/03/2021  Example Projects
02/08/2021  Linear Regression                      HW1 Due
02/10/2021  Linear Regression
02/15/2021  Naïve Bayes Classifier                 HW2 Out
02/17/2021  Logistic Regression
02/22/2021  Support Vector Machine
02/24/2021  Neural Networks
03/01/2021  Neural Networks                        HW2 Due
03/03/2021  CNNs and RNNs
03/08/2021  Decision Trees                         HW3 Out
03/10/2021  Random Forest & Midterm Review
03/15/2021  Midterm Exam Day
03/17/2021  Clustering Analysis and K-Means
03/22/2021  Hierarchical Clustering                HW3 Due
03/24/2021  No Class (Mid-Semester Break)
03/29/2021  Gaussian Mixture Model                 HW4 Out
03/31/2021  Dimension Reduction                    Project presentation signup
04/05/2021  Application I: Text Embedding
04/07/2021  Application II: Text Classification
04/12/2021  Review Class                           HW4 Due
04/14/2021  Project Video Preparation              Presentation video due 04/15
04/19/2021  Project Presentation and Peer Grading
04/21/2021  Final Exam Day
04/26/2021  No Class                               Project report due

Grading

Homework (40%)

There will be four assignments, each accounting for 10% of your final score. Each assignment may be programming, written analysis, or both, and tests your understanding of the material taught in class.

  • Late policy: Assignments are due at 11:59PM of the due date. You will be allowed 2 total late days (48 hours) without penalty for the entire semester (for homework only, not applicable to exams or projects). Once those days are used, you will be penalized according to the following policy:
    • Homework is worth full credit before the due time.
    • It is worth 75% credit for the first 24 hours after the due time.
    • It is worth 50% credit for the following 24 hours.
    • It is worth zero credit after that.
  • Follow the Georgia Tech Academic Honor Code.
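
As a rough illustration of the late policy above, here is a minimal Python sketch of how credit for a single homework submission might be computed. The function name `homework_credit` is hypothetical. Two details are assumptions that the policy does not spell out: lateness is rounded up to whole 24-hour days, and free late days are spent automatically before any penalty applies. Check with the instructor or TAs if your case depends on either one.

```python
import math

# Credit multipliers from the late policy, keyed by the number of
# penalized late days (late days not covered by a free late day).
CREDIT_BY_PENALIZED_DAYS = {0: 1.0, 1: 0.75, 2: 0.50}


def homework_credit(hours_late, free_days_left=2):
    """Return (credit_fraction, free_days_remaining) for one submission.

    Assumptions (not stated in the syllabus): lateness is rounded up to
    whole 24-hour days, and free late days are used before any penalty.
    """
    if hours_late <= 0:
        # Submitted on time: full credit, no free days consumed.
        return 1.0, free_days_left
    days_late = math.ceil(hours_late / 24)
    free_used = min(days_late, free_days_left)
    penalized_days = days_late - free_used
    # More than two penalized days earns zero credit.
    credit = CREDIT_BY_PENALIZED_DAYS.get(penalized_days, 0.0)
    return credit, free_days_left - free_used
```

Under these assumptions, a submission that is 30 hours late with both free days still available keeps full credit and uses up both free days, while the same submission with no free days left earns 50% credit.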

Project (30%)

You need to complete a project on using computational data analysis techniques to tackle a real-life data analysis problem. Each project needs to be completed in a team of 2-4 people. Here are some guidelines and resources for doing your project smoothly.

The breakdown of the 30% project score is as follows:

  • Project presentation (10%)
    • Every team needs to make a YouTube video presentation of its project and post the link on Canvas.
    • The deadline for uploading the link to your video is 04/13 11:59pm ET; we will create pages for submitting video links on Canvas.
  • Project report (10%): you need to write up a final report for your project and submit it on Canvas by 04/26 11:59pm. Here are some instructions and templates for the final project report.
  • Project peer grading (10%): you need to grade 10 project presentations by other teams, each of which counts for 1 point. Please use this Google form for peer grading and see here some FAQs about peer grading.

Midterm Exam (15%)

A take-home midterm exam will be held on March 15 in lieu of the regular class:

  • You will have a 24-hour time window to complete the exam. The exam will be released on Canvas on March 14 at 11:59PM ET and will close on March 15 at 11:59PM ET.
  • The midterm exam will be a take-home, open-book exam. However, no peer communication is allowed: you may not message or collaborate with others, and that includes posting questions or answers on websites during the exam period.
  • There will be no make-up exams. You will get zero credit for a missed midterm exam.
  • We will release the detailed instructions before the exam.

Final Exam (15%)

A take-home final exam will be held on April 21 in lieu of the regular class:

  • You will have a 24-hour time window to complete the exam. The exam will be released on Canvas on April 20 at 11:59PM ET and will close on April 21 at 11:59PM ET.
  • Like the midterm exam, the final exam will be a take-home, open-book exam. However, no peer communication is allowed: you may not message or collaborate with others, and that includes posting questions or answers on websites during the exam period.
  • There will be no make-up exams. You will get zero credit for a missed final exam.
  • We will release the detailed instructions before the exam.
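
To see how the four graded components combine, the following Python sketch computes a weighted final score from the percentages listed in this section (homework 40%, project 30%, midterm 15%, final exam 15%). The function name `final_score` and the 0-100 scale for each component are assumptions made for illustration; the sketch does not model letter-grade cutoffs or any curve, which this syllabus does not specify.

```python
# Component weights taken from the Grading section.
WEIGHTS = {
    "homework": 0.40,
    "project": 0.30,
    "midterm": 0.15,
    "final": 0.15,
}


def final_score(scores):
    """Combine per-component percentages (0-100) into a weighted total.

    `scores` is a dict with one entry per key in WEIGHTS. The 0-100
    scale is an assumption, not something the syllabus defines.
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


example = {"homework": 90, "project": 85, "midterm": 80, "final": 70}
print(round(final_score(example), 2))
```

For the example scores, the weighted sum is 0.40 × 90 + 0.30 × 85 + 0.15 × 80 + 0.15 × 70 = 84.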

FAQs

  • For remote lectures, will there be recordings?
    • Yes. We will record every remote lecture and paper presentation on BlueJeans and share a link after class.

Resources

Other resources, such as machine learning toolboxes and datasets, will be provided throughout the course.