CSE8803: Project Guideline and Resources

Project resources

How to come up with good project ideas?

Here are some strategies that can help you come up with good ideas for course projects. You are not limited to these strategies: you know your own background and interests best, so you are the one most likely to find the idea that suits them.

  • Strategy 1: Address some drawbacks of existing models. Think about where existing deep learning models fall short in specific respects. For example, maybe the method cannot handle streaming data. Can you design an online version of the model to avoid retraining it from scratch every time? Or maybe the model is very slow to train. Can you speed up its training process? Or maybe the model is very large. Can you apply existing model pruning and compression techniques to reduce its size? Or maybe the model is not robust to minor perturbations?
  • Strategy 2: Conduct comprehensive empirical studies to gain deeper insights into the methods we have learned. Every paper shows us that its method is the winner in the experiment section… but is that really the whole story? Do these methods really work well for all kinds of datasets and applications? You can do empirical studies to find out. The goal here is not just to apply an existing model to some other datasets and obtain another set of numbers. Instead, the goal is to obtain insights that help us understand the strengths and weaknesses of different methods more comprehensively. In what scenarios do those methods work? In what scenarios do they not work? And why? Can you convince us?
  • Strategy 3: Work on a benchmark for text data and NLP. There are popular benchmarks like SQuAD (https://rajpurkar.github.io/SQuAD-explorer/) or GLUE (https://gluebenchmark.com/leaderboard). Can you draw insights from the leading models on those leaderboards? Do you think you can combine some of those ideas to create a more powerful method? The goal is not to reach the top position on those leaderboards (computational resources matter a lot there), but to research, create new ideas, and validate those ideas using the leaderboards.
  • Strategy 4: Apply the models we have learned to solve problems in specific domains. For example, maybe we want to use social media data to better understand people's reactions to COVID-19, or maybe we want to extract insights from biomedical research papers. What are the challenges of applying existing models to such specific domains and tasks? For example, maybe the text genres are unusual in ways that limit the model's performance, or maybe we lack domain-specific labels. And what new knowledge can we gain about these domains by applying our methods?

How to find proper datasets for my project?

There are different ways to obtain proper datasets for your project:

  1. Online public datasets. Here are several compiled dataset lists that you can use for your project:
  2. Kaggle is a great source for finding different datasets:
  3. Collect your own data! For example, you can crawl Twitter data using their public API.

Project report

What content should the final project report cover?

  • Some key ingredients in your project report:
    • What is the problem you are trying to solve, and why is this problem interesting?
    • What data are you using?
    • What's the method you are using?
    • What are the results you attain?
    • What are the key takeaways and conclusion from this project?

What should the format of the final project report be?

  • Please use a single-column, single-spaced layout with 1-inch margins and a 10pt or 11pt font for your report. Using LaTeX is highly recommended, although not mandatory. You can find an example LaTeX template here.
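
If you start from a blank LaTeX file instead of a template, the layout described above can be approximated with a preamble like the following. This is only a minimal sketch, not the course template: the choice of the article class and the geometry package is an assumption, and the title and section name are placeholders.

```latex
% Minimal sketch of the requested layout (assumed packages, not the official template).
% The article class is single-column and single-spaced by default.
\documentclass[11pt]{article}      % use [10pt] for the smaller allowed font size
\usepackage[margin=1in]{geometry}  % 1-inch margins on all sides

\title{Project Title}              % placeholder
\author{Your Name}                 % placeholder

\begin{document}
\maketitle

\section{Problem}                  % placeholder section
Report text goes here.

\end{document}
```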

How long should the project report be?

  • We do not have strict requirements on length. Concise and clear reports are always preferred. Typically, a final report is between 6 and 10 pages, including references.