Experience

Research Assistant and Ph.D. in Computer Science and Engineering at The Chinese University of Hong Kong.

2021–2026

GPU optimization, agentic systems, LLM post-training.

Software Engineer at Huawei 2012 Labs.

2018.10–2020.6

Huawei's GPU/NPU project: operator, runtime, and compiler optimization. Based in Shanghai and Hangzhou; left the position during COVID-19.

B.Eng. in Computer Science and Automation at Chongqing University.

2014–2018

Member of MSRA Student Association; Member of DJI RoboMaster Team.

Research

Search-R3: Unifying reasoning and embedding in large language models — Y Gui et al.

Under review

Post-trains LLMs to produce latent embedding representations.

PilotANN: Memory-bounded GPU acceleration for vector search — Y Gui et al.

SIGKDD

GPU acceleration for vector search in RAG pipelines.

SPT: Fine-tuning transformer-based language models efficiently with sparsification — Y Gui et al.

Preprint

A sparse attention approach developed concurrently with DeepSeek Sparse Attention (DSA); discontinued after DSA proved superior.

HGL: Accelerating heterogeneous GNN training with holistic representation and optimization — Y Gui et al.

SC

20× GPU speedup by compiler optimization on traditional neural networks.

Vertex-centric visual programming for graph neural networks — Y Wu, Y Gui, et al.

SIGMOD

An MLOps platform for recommendation systems.

Featured Projects

Search-R3

A post-training framework that adapts LLMs to generate search embeddings directly through their chain-of-thought reasoning process.

Qwen3.5-Sonnet-9B

An offline-RL-distilled LLM for agentic coding. It is FP8-quantized, fits on a single 24 GB GPU with a 200K-token context, and is optimized for long tool-call chains in agents such as OpenCode and Claude Code.