Ge Lee
Ge — pronounced like the letter `g`, or `guh` as in 李格.
I'm a PhD candidate in Computer Science @ RMIT University, supervised by Prof. Zhifeng Bao and Dr. Shixun Huang. I also work with Dr. Yanchang Zhao @ CSIRO Data61, and I'm currently a visiting researcher in the Data Science discipline @ The University of Queensland.
My research sits at the intersection of data management and machine learning. I study how to make large, uncurated data collections easier to understand and work with. My recent work focuses on tabular data, discovering overlap between tables for scalable retrieval and deduplication. Increasingly, I am extending this toward cost-aware agentic workflows for selective and efficient data processing under practical budget and latency constraints. Throughout, the aim is the same: data management that is fast enough to use, affordable enough to run, and reliable enough to trust.
Publications
-
2027
-
Alignment-Guided Largest Table Overlap Size Estimation
Ge Lee, Shixun Huang, Zhifeng Bao, Shazia Sadiq, Yanchang Zhao.
Large table repositories need fast overlap estimates for blocking and query-by-table retrieval, but exact computation is too expensive and existing estimators struggle under structural variation and domain shift. This work introduces ALORE, an alignment-guided, hypergraph-based estimator that predicts table overlap size accurately and efficiently across heterogeneous repositories.
-
2026
-
Shape-Agnostic Table Overlap Discovery: A Maximum Common Subhypergraph Approach
Ge Lee, Shixun Huang, Zhifeng Bao, Felix Naumann, Shazia Sadiq, Yanchang Zhao.
Tables often share content despite reordered rows, columns, and missing metadata, but existing rectangular overlap definitions miss many valid matches. This work introduces SALTO, a shape-agnostic notion of table overlap, and HyperSplit, a hypergraph-based algorithm that finds exact cell-level overlaps efficiently for copy detection, deduplication, and version comparison.
-
AgenticScholar: Agentic Data Management with Pipeline Orchestration for Scholarly Corpora
Hai Lan, Tingting Wang, Zhifeng Bao, Guoliang Li, Daomin Ji, Ge Lee, Feng Luo, Zi Huang, Hailang Qiu, Gang Hua.
Scholarly analysis increasingly requires reasoning across papers, where evidence is scattered across text, tables, figures, code snippets, citations, and bibliographic context, while questions evolve from retrieval into multi-step synthesis, comparison, trend tracing, and idea exploration. AgenticScholar compiles natural-language requests into evidence-grounded executable DAG workflows, supporting paper retrieval, structured extraction, cross-paper synthesis with ranking and inconsistency checking, trend analysis, milestone paper selection, and under-explored problem–method discovery. Its agentic core unifies a structure-aware scholarly knowledge base with hybrid planning and reusable operators, while exposing plans, intermediate results, and data lineage for traceability.
-
2025
-
Representative Time Series Discovery for Data Exploration
Ge Lee, Shixun Huang, Zhifeng Bao, Yanchang Zhao.
Large time series collections need compact summaries for exploration, but existing methods lack controllable similarity-bounded coverage. This work introduces RTSD, which finds the smallest set of representative time series that collectively covers a user-specified proportion of the data, and MLGreedyET, a self-supervised greedy framework that solves it with low time and memory costs.
-
2024
-
Cost-effective Data Labelling for Graph Neural Networks
Shixun Huang, Ge Lee, Zhifeng Bao, Shirui Pan.
Education
-
PhD, Computer Science
RMIT University, 2023–present
-
Bachelor of Computer Science (Hons)
RMIT University, 2022
-
Bachelor of Computer Science
RMIT University, 2019–2021
Awards
- CSIRO Data61 Top-Up Scholarship, 2023
- RMIT Vice-Chancellor's PhD Scholarship, 2023
- RMIT Vice-Chancellor's List for Academic Excellence, 2021 & 2022
Teaching
-
Introduction to Information Systems (INFS7900)
Casual Academic, The University of Queensland, Sem 1 2026
Service
- Reviewer, KDD 2024