Low-Resource Text Mining

Text data account for more than 80% of all data in organizations and play a critical role in countless domains. Yet the success stories of existing text mining and natural language processing tools still depend on large amounts of labeled data, which are often too costly to obtain in practice. The goal of this project is to develop next-generation text mining methods that turn massive text data into actionable knowledge with limited human supervision. We study an array of fundamental text mining tasks, such as text classification, event extraction, and taxonomy construction. Departing from the prevailing supervised models for these tasks, our methods require little human supervision yet still achieve strong performance.

Representative publications:

Combining the above pieces, we have also developed the TextCube system, which facilitates on-demand text mining and learning with little human labeling effort. The following video provides a detailed introduction to the system.