CX4240: Project Guidelines and Resources

How to come up with good project ideas?

Here are some strategies for coming up with good course project ideas. You are not limited to these strategies: you know your own background and interests best, so you are the one best placed to find an idea that suits them.

  • Strategy 1: Leverage some recent big advances in AI. Can you do some cool things with recent models such as ChatGPT and Stable Diffusion?
  • Strategy 2: Think about using data analysis techniques for specific domains and tasks. For example, maybe we want to use social media data to better understand people’s reactions to COVID-19, or use machine learning to predict stock prices, or extract insights from biomedical research papers. What are the challenges of applying existing models to such specific domains and tasks? For example, maybe the data genre is unusual in a way that limits the model’s performance, or maybe we lack domain-specific labels. And what new knowledge can we gain for these domains by applying our methods?
  • Strategy 3: Think about the drawbacks of existing techniques and whether you can address them. Such drawbacks often arise in specific contexts. For example, maybe a technique does not work for streaming data. Can you design an online version of the model, to avoid training it from scratch every time? Or maybe the model is very slow to train. Can you speed up its training process? Or maybe the model is very large. Can you apply existing model pruning and compression techniques to reduce its size?
  • Strategy 4: Conduct an empirical but comprehensive study to gain deeper insights into the methods. For the same problem, many different techniques can be applied. What are their differences besides the mathematical properties we have learned in class? For what kind of data will one method be superior to another? You can do empirical studies to find the answers. The goal here is not just to apply an existing model to some other datasets and obtain dry numbers. Instead, the goal is to obtain insights that help us understand the strengths and weaknesses of different methods more comprehensively. In what scenarios do those methods work? In what scenarios do they not work? And why? Can you convince us?
  • Strategy 5: Use ChatGPT.
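
To make the "online version" idea in Strategy 3 concrete, here is a minimal sketch of an incremental update. It is only an illustration under simplifying assumptions: the "model" is just a running mean and variance (Welford's algorithm), and the class name `RunningStats` and the example stream are made up for this sketch. The point is that each new observation updates the existing state in constant time instead of recomputing from the full history.

```python
class RunningStats:
    """Running mean and sample variance, updated one observation at a time.

    Hypothetical helper for illustration; uses Welford's algorithm.
    """

    def __init__(self):
        self.n = 0        # number of observations seen so far
        self.mean = 0.0   # current running mean
        self._m2 = 0.0    # sum of squared deviations from the running mean

    def update(self, x):
        """Fold one new observation into the state without revisiting old data."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        # Sample variance; not defined for fewer than two observations.
        return self._m2 / (self.n - 1) if self.n > 1 else float("nan")


# Made-up stream of values arriving one at a time.
stream = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
stats = RunningStats()
for x in stream:
    stats.update(x)

print(stats.mean, stats.variance)
```

A real project would apply the same pattern to the parameters of an actual model, but the structure (keep a small state, update it per observation) would be similar.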

Whichever strategy you choose, always ask yourselves the following questions, which we listed in our first lecture:

  • What is the problem?
  • Is this problem a practical and interesting one?
  • What datasets will you use for the problem?
  • What are some potential data analytics solutions to the problem?
  • What is your idea/plan for solving the problem?
  • How will you formulate the idea and design your models?
  • How will you evaluate your model or validate your idea/hypothesis?
  • What are your final conclusions/insights from this project?
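
For the evaluation question above, one common option is to compare your method against a simple baseline using k-fold cross-validation. The sketch below is only one possible setup, not a required procedure: the toy data, the 1-nearest-neighbor "method", the majority-class baseline, and all function names are assumptions made up for illustration, and it uses only the Python standard library.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def majority_baseline(train_x, train_y, test_x):
    """Predict the most common training label for every test point."""
    label = max(set(train_y), key=train_y.count)
    return [label] * len(test_x)


def nearest_neighbor(train_x, train_y, test_x):
    """Predict the label of the closest training point (1-NN, 1-D features)."""
    preds = []
    for x in test_x:
        j = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
        preds.append(train_y[j])
    return preds


def cross_val_accuracy(predict, xs, ys, k=5, seed=0):
    """Mean accuracy of `predict` over k held-out folds."""
    folds = k_fold_indices(len(xs), k, seed)
    scores = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in range(len(xs)) if i not in held_out]
        preds = predict(
            [xs[i] for i in train],
            [ys[i] for i in train],
            [xs[i] for i in fold],
        )
        correct = sum(p == ys[i] for p, i in zip(preds, fold))
        scores.append(correct / len(fold))
    return sum(scores) / len(scores)


# Toy data, made up for illustration: the label is 1 when x > 0.5.
rng = random.Random(42)
xs = [rng.random() for _ in range(100)]
ys = [int(x > 0.5) for x in xs]

baseline_acc = cross_val_accuracy(majority_baseline, xs, ys)
knn_acc = cross_val_accuracy(nearest_neighbor, xs, ys)
print(f"baseline: {baseline_acc:.2f}, 1-NN: {knn_acc:.2f}")
```

Reporting the baseline next to your method makes it easier to argue that any improvement comes from the method rather than from the dataset being easy.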

How to find proper datasets for my project?

There are different ways to obtain proper datasets for your project:

  1. Online public datasets. Here are several compiled dataset lists that you can use for your project:
  2. Kaggle is a great source for finding different datasets:
  3. Collect your own data! Example here

Project report

What content should the final project report cover?

  • Some key ingredients in your project report:
    • What is the problem you are trying to solve, and why is it interesting?
    • What data are you using?
    • What’s the method you are using?
    • What results did you obtain?
    • What are the key takeaways and conclusions from this project?

What should be the format of the final project?

  • Please use a single-column, single-spaced layout with 1-inch margins and a 10pt or 11pt font for your report. Using LaTeX is highly recommended, although not mandatory. You can find an example LaTeX template here.

How long should the project report be?

  • We do not have strict requirements on length. Concise and clear reports are always preferred. Typically, a final report is 5 to 8 pages long, including references.