CX4240: Introduction to Computational Data Analysis (2025 Spring)

Logistics

  • Lecture time: Mondays and Wednesdays, 3:30pm-4:45pm
  • Location: Howey Physics L2
  • Instructor: Chao Zhang
  • Teaching Assistants:
    • Aditi Agarwal <aditiagarwal@gatech.edu>
    • Aditya Mavle <amavle3@gatech.edu>
    • Rui Feng <rfeng@gatech.edu>
    • Vanshika Shah <vshah316@gatech.edu>
  • Office Hours:
    • Office Hour 1: Mon 2-3pm @ KACB 3121 (online option: link)
    • Office Hour 2: Thu 1-2pm @ KACB 3121 (online option: link)
  • Piazza: https://piazza.com/gatech/spring2025/cx4240

Course Content

Q: What will be covered in this course? A: This course introduces techniques for computational data analysis, with an emphasis on machine learning algorithms and their applications to real-world data. On the technique side, we will cover key machine learning methods (linear regression, logistic regression, neural networks, tree-based models) and self-supervised learning for foundation models. On the application side, we will show how to formulate real-world tasks as data analysis problems, present key methods for solving these problems, and discuss their advantages and disadvantages.

Q: Who will benefit from this course? A: By the end of this course, students should be able to formulate the data analysis problems they face, choose appropriate computational models to extract insights from data automatically, and even devise innovative solutions to open problems in the field. The course will help students who want to solve practical problems using machine learning and data science techniques, and it provides useful techniques for students who want to do cutting-edge research in data mining, machine learning, natural language processing, and related areas.

Q: What are the prerequisites? A: Prerequisites for this course include 1) solid knowledge of probability, statistics, calculus, and linear algebra; 2) basic knowledge of machine learning; 3) solid programming skills, preferably in Python.

Schedule

Date       | Topic                                       | Due
           | Module 1: Background                        |
01/06/2025 | Course Overview                             |
01/08/2025 | ML Basics, Optimization and MLE             | Piazza Signup
01/13/2025 | Hands-On Tutorial: Python for Data Analysis |
           | Module 2: Linear Models                     |
01/15/2025 | Linear Regression                           | HW1 Out
01/20/2025 | No Class (Martin Luther King Day)           |
01/22/2025 | Linear Regression                           |
01/27/2025 | Logistic Regression                         |
01/29/2025 | Naïve Bayes Classifier                      |
02/03/2025 | Project Guidelines                          | HW1 Due
02/05/2025 | Feature Design and Text Data                | HW2 Out
02/10/2025 | Hands-On Tutorial: Linear Models            |
           | Module 3: Neural Networks                   |
02/12/2025 | Neural Networks                             |
02/17/2025 | Neural Networks                             | HW2 Due
02/19/2025 | CNNs and RNNs                               | HW3 Out
02/24/2025 | Transformers                                |
02/26/2025 | Hands-On Tutorial: Neural Networks          |
           | Module 4: Tree Models                       |
03/03/2025 | Decision Trees                              |
03/05/2025 | Random Forest                               | HW3 Due
03/10/2025 | Hands-On Tutorial: Tree Models              |
03/12/2025 | Midterm Exam                                |
03/17/2025 | No Class (Spring Break)                     |
03/19/2025 | No Class (Spring Break)                     |
           | Module 5: Large Language Models             |
03/24/2025 | Large Language Model (LLM)                  |
03/26/2025 | LLM Instruction Fine-Tuning                 | Project Presentation Signup
03/31/2025 | LLM Alignment                               |
04/02/2025 | Hands-On Tutorial: LLMs                     |
           | Module 6: Projects                          |
04/07/2025 | Project Presentation                        |
04/09/2025 | Project Presentation                        |
04/14/2025 | Project Presentation                        |
04/16/2025 | Project Presentation                        |
04/21/2025 | No Class                                    | Project Report Due

Grading

Homework (30%)

There will be three assignments, each accounting for 10% of your final score. Each assignment includes written analysis and/or programming to test your understanding of the material taught.

  • Late policy: Assignments are due at 11:59PM on the due date. You are allowed 2 late days in total (48 hours) without penalty over the entire semester (for homework only; not applicable to exams or projects). Once those days are used, late homework is penalized as follows:
    • Homework is worth full credit if submitted before the due time.
    • It is worth 75% credit during the first 24 hours after the due time.
    • It is worth 50% credit during the following 24 hours.
    • It is worth zero credit after that.
  • Follow the Georgia Tech Academic Honor Code.
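
As an illustration only, the late policy above can be sketched in a few lines of Python. The function name `homework_credit` and its parameters are hypothetical, and the sketch makes two assumptions that the policy does not spell out: lateness is rounded up to whole 24-hour days, and any remaining free late days are spent before a penalty applies. Check with the instructor or TAs for the authoritative interpretation.

```python
import math


def homework_credit(hours_late: float, free_late_days_left: int = 2):
    """Return (credit multiplier, free late days remaining) for one submission.

    Assumptions (not stated in the syllabus): lateness is rounded up to
    whole 24-hour days, and free late days are consumed before any penalty.
    """
    if hours_late <= 0:
        # Submitted on time: full credit, no late days used.
        return 1.0, free_late_days_left

    days_late = math.ceil(hours_late / 24)

    # Spend free late days first, one whole day at a time.
    used = min(days_late, free_late_days_left)
    penalized_days = days_late - used

    if penalized_days == 0:
        credit = 1.0
    elif penalized_days == 1:
        credit = 0.75   # first penalized 24 hours
    elif penalized_days == 2:
        credit = 0.50   # second penalized 24 hours
    else:
        credit = 0.0    # anything later receives no credit

    return credit, free_late_days_left - used


# A submission 5 hours late with both free days available uses one free day.
print(homework_credit(5))
# A submission 30 hours late with no free days left falls in the 50% window.
print(homework_credit(30, free_late_days_left=0))
```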

Project (30%)

You will complete a project that applies computational data analysis techniques to machine learning problems in finance. Each project must be completed by a team of 4-6 people.

Exam (40%)

One exam will be held on March 12 in lieu of the regular class:

  • It will be a closed-book exam, so no notes or communication with peers is allowed.
  • There will be no make-up exams, so be sure to attend on the scheduled date. Missing the exam will result in zero credit.
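
For reference, the component weights in this section (homework 30%, project 30%, exam 40%) combine into a final score as a weighted sum. The sketch below assumes each component is scored on a 0-100 scale; the names `WEIGHTS` and `final_score` are hypothetical, and the actual conversion to letter grades is not specified here.

```python
# Component weights taken from the section headings above.
WEIGHTS = {"homework": 0.30, "project": 0.30, "exam": 0.40}


def final_score(homework: float, project: float, exam: float) -> float:
    """Weighted sum of the three components, each assumed to be on 0-100."""
    scores = {"homework": homework, "project": project, "exam": exam}
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Example: 90 on homework, 80 on the project, 70 on the exam
# gives 27 + 24 + 28, i.e. about 79 overall.
print(final_score(90, 80, 70))
```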

Resources

Other resources, such as machine learning toolboxes and datasets, will be provided throughout the course.