Xi Chen

Ph.D.-trained Data Scientist

Building scalable ML systems across intelligent manufacturing, autonomous driving, and RAG architectures

About Me

Ph.D.-trained Data Scientist with 5+ years of experience developing scalable machine learning systems across intelligent manufacturing, autonomous driving, and retrieval-augmented generation (RAG). Adept at bridging academic research and real-world deployment, with hands-on expertise in statistical machine learning, sequence modeling, uncertainty quantification, and agent-based architectures. Delivered high-impact solutions in industry, leading cross-functional ML product development with measurable gains in accuracy, efficiency, and user value.

Experience

Sep 2024 - Present

Senior Data Scientist

RAG Legal AI LLM NLP

LexisNexis

  • Architected a legal chatbot using a Retrieval-Augmented Generation (RAG) pipeline. Implemented hybrid search and custom re-ranking strategies, increasing answer usefulness by more than 10%.
  • Led a squad to deliver a "Timeline" generation feature, reducing LLM API costs by 80% and improving quality ratings by 15%.
  • Prototyped a "Doc Locator" feature that outperformed the Human Review Test Baseline by 13%; it was integrated as a Citation Agent in a custom ReAct-style workflow.
  • Built a domain-specific evaluation dataset and a programmatic evaluation tool, shortening development cycles by 70%.

Jun 2023 - Aug 2024

Research Associate

Deep Learning Autonomous Driving Graph Neural Networks Uncertainty Quantification

The University of Arizona

  • Proposed an end-to-end sequence modeling framework for V2I cooperative perception, achieving a 4% reduction in error metrics (final displacement error, FDE, and miss rate, MR) on the V2X-Seq benchmark.
  • Designed an uncertainty quantification module based on conformal prediction, providing statistical coverage guarantees for safety-critical AI deployment.
  • Engineered a custom architecture combining GNNs, cross-graph attention, and a multimodal decoder.

May 2021 - Dec 2022

Research Assistant

Trajectory Prediction Transformer Graph Attention Networks

U of Arizona x Intel Autonomous Driving Research Collaboration

  • Designed a deep learning framework for multimodal trajectory forecasting from heterogeneous sensor data, using cross-attention and Transformers.
  • Simulated driving scenarios in CARLA with synthetic sensor noise and communication dropouts.
  • Demonstrated a 5% improvement in prediction accuracy in scenarios with high connected-vehicle penetration.

Aug 2019 - Apr 2021

Bayesian Optimization & Tensor Regression

Statistical Machine Learning Gaussian Process Bayesian Optimization Kernel Learning

The University of Arizona

  • Proposed a novel kernel function for Gaussian processes that preserves the structural information of tensor-valued spatial data.
  • Combined Bayesian optimization with the Gaussian process model for efficient design-space exploration and identification of high-performing parameters.
  • Applied the model to predict antenna performance from 3D-printed geometric design tensors, demonstrating an 8% improvement in prediction accuracy and sample efficiency.

Special Projects

Agentic RAG for Temporal Statute Validation

Engineered a "Temporal Verification" pipeline using LangGraph to autonomously audit historical case-law snippets, validating quantitative entities against live statute APIs.

LangGraph AsyncIO RAG

Fine-tuning BLIP Model with LoRA

Fine-tuned a multimodal BLIP model with LoRA for visual question answering (VQA) in autonomous driving scenarios, improving performance through parameter-efficient fine-tuning.

LoRA BLIP Hugging Face PyTorch

Education

Ph.D., Systems and Industrial Engineering

The University of Arizona

Full scholarship, 4.0/4.0 GPA

M.S., Data Science and Statistics

The University of Arizona

4.0/4.0 GPA

M.S., Control Science and Engineering

Beihang University

B.S., Quality and Reliability Engineering

Beihang University

B.S., Economics (Dual Degree)

Peking University

Technical Skills

Programming Languages & Tools

Python R SQL MATLAB Git CMake ROS SAS

LLM & RAG

GPT Hugging Face LLaMA FAISS LangChain Google ADK ReAct Agents LoRA Fine-tuning

Deep Learning & Machine Learning

PyTorch TensorFlow Keras PySpark Pandas NumPy Scikit-learn

Data Visualization

Matplotlib Plotly Seaborn Power BI Tableau ggplot
