Hi, it's Yifan. I am a first-year CS Ph.D. student at Stony Brook University, advised by
Chenyu You.
My research interests broadly lie in Computer Vision, Machine Learning, Cognitive Science, and Medical Image Analysis.
My current research focuses on decoding brain signals and on potential applications of generative models.
I am intrigued by the intersection of cognitive science and machine learning, and I am committed to developing reliable machine learning systems.
Previously, I graduated from ShanghaiTech University with a major in computer science, advised by
Kan Ren.
I also spent a wonderful year at UC Berkeley as an exchange student in my junior year, where I worked as a research intern in Whitney's Lab.
News
[02/2025] Our paper DiffLens has been accepted by CVPR 2025!
[01/2025] Our paper Neuron Path has been accepted by ICLR 2025!
[08/2024] I will join Stony Brook University in Fall 2024 as a new Ph.D. student!
[03/2024] Our paper EEGFormer has been accepted by AAAI 2024 SSS on Clinical FMs!
[10/2023] Our paper VEATIC Dataset has been accepted by WACV 2024!
[06/2023] I've completed my one-year exchange at UC Berkeley!
Vision Transformer models exhibit immense power yet remain opaque to human understanding, posing challenges and risks for practical applications.
While prior research has attempted to demystify these models through input attribution and neuron role analysis,
there has been a notable gap in considering layer-level information and the holistic path of information flow across layers.
In this paper, we investigate the significance of influential neuron paths within vision Transformers, where a neuron path is a sequence of neurons from the model input to the output that most significantly impacts model inference.
We first propose a joint influence measure to assess the contribution of a set of neurons to the model outcome.
We further provide a layer-progressive neuron locating approach that efficiently selects the most influential neuron at each layer, aiming to discover the crucial neuron path from input to output within the target model.
Our experiments demonstrate the superiority of our method over existing baseline solutions in finding the most influential neuron path along which the information flows.
Additionally, the discovered neuron paths illustrate that vision Transformers exhibit specific inner working mechanisms for processing visual information within the same image category.
We further analyze the key effects of these neurons on the image classification task, showing that the discovered neuron paths already preserve the model's capability on downstream tasks, which may also shed some light on real-world applications such as model pruning.
The project website, including implementation code, is available at https://foundation-model-research.github.io/NeuronPath/.
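To make the idea concrete, here is a minimal sketch of what layer-progressive neuron locating could look like on a toy model. The residual-MLP stand-in for a vision Transformer and the joint influence proxy (the drop in the target logit when the path selected so far plus one candidate neuron are ablated together) are my illustrative assumptions, not the exact measure or algorithm from the paper.

```python
# A minimal sketch of layer-progressive neuron locating, NOT the paper's
# exact algorithm: the toy residual-MLP "ViT" stand-in and the joint
# influence proxy (target-logit drop under joint ablation of the path so
# far plus one candidate neuron) are illustrative assumptions.
import torch
import torch.nn as nn

torch.manual_seed(0)

class ToyBlock(nn.Module):
    def __init__(self, dim, hidden):
        super().__init__()
        self.fc1 = nn.Linear(dim, hidden)
        self.fc2 = nn.Linear(hidden, dim)
        self.mask = None  # hidden-neuron indices to ablate, or None

    def forward(self, x):
        h = torch.relu(self.fc1(x))
        if self.mask is not None:
            h = h.clone()
            h[..., self.mask] = 0.0  # zero out the selected hidden neurons
        return x + self.fc2(h)       # residual connection, ViT-style

class ToyModel(nn.Module):
    def __init__(self, dim=16, hidden=32, n_layers=4, n_classes=10):
        super().__init__()
        self.blocks = nn.ModuleList(ToyBlock(dim, hidden) for _ in range(n_layers))
        self.head = nn.Linear(dim, n_classes)

    def forward(self, x):
        for blk in self.blocks:
            x = blk(x)
        return self.head(x)

@torch.no_grad()
def find_neuron_path(model, x, target):
    """Greedily pick, layer by layer, the neuron whose joint ablation with
    the already-selected path changes the target logit the most."""
    base = model(x)[0, target].item()
    path = []
    for blk in model.blocks:
        best_j, best_drop = None, -float("inf")
        for j in range(blk.fc1.out_features):
            blk.mask = [j]  # earlier blocks keep their selected masks
            drop = base - model(x)[0, target].item()
            if drop > best_drop:
                best_j, best_drop = j, drop
        blk.mask = [best_j]  # fix the winner before moving to the next layer
        path.append(best_j)
    return path

model = ToyModel().eval()
x = torch.randn(1, 16)
print("influential neuron path:", find_neuron_path(model, x, target=3))
```

The greedy, one-neuron-per-layer search keeps the procedure linear in depth rather than exponential in the number of candidate paths, which is the efficiency argument the abstract gestures at.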
Self-supervised learning has emerged as a highly effective approach in the fields of
natural language processing and computer vision. It is also applicable to brain signals such as
electroencephalography (EEG) data, given the abundance of unlabeled data that exists across
a wide spectrum of real-world medical applications, ranging from seizure detection to wave analysis.
Existing works leveraging self-supervised learning for EEG modeling mainly focus on pretraining on
each individual dataset for a single downstream task, which cannot leverage the power of abundant
data and may derive sub-optimal solutions that lack generalization. Moreover, these methods rely on
end-to-end model learning that is not easy for humans to understand. In this paper,
we present a novel EEG foundation model, namely EEGFormer, pretrained on large-scale compound EEG data.
The pretrained model can not only learn universal representations of EEG signals with adaptable performance on
various downstream tasks but also provide interpretable outcomes of the useful patterns within the data.
To validate the effectiveness of our model, we extensively evaluate it on various downstream tasks and assess
the performance under different transfer settings. Furthermore, we demonstrate how the learned model exhibits
transferable anomaly detection performance and provides valuable interpretability of the acquired patterns
via self-supervised learning.
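As a rough illustration of this kind of self-supervised setup, the sketch below masks random EEG patches and trains a Transformer to reconstruct them from context. The patch length, masking ratio, encoder configuration, and MSE objective are assumptions made for the sake of example; they do not reflect EEGFormer's actual architecture or pretraining recipe.

```python
# A minimal sketch of masked-patch self-supervised pretraining on unlabeled
# EEG; patch length, masking ratio, encoder, and MSE reconstruction loss
# are illustrative assumptions, not EEGFormer's actual design.
import torch
import torch.nn as nn

PATCH, D = 64, 128  # samples per EEG patch, embedding width (assumed)

class MaskedEEGModel(nn.Module):
    def __init__(self, n_layers=4, n_heads=4):
        super().__init__()
        self.embed = nn.Linear(PATCH, D)
        self.mask_token = nn.Parameter(torch.zeros(1, 1, D))
        layer = nn.TransformerEncoderLayer(D, n_heads, dim_feedforward=256,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.decode = nn.Linear(D, PATCH)

    def forward(self, patches, mask):
        tok = self.embed(patches)  # (B, T, D)
        tok = torch.where(mask[..., None], self.mask_token.expand_as(tok), tok)
        return self.decode(self.encoder(tok))  # reconstruct raw patches

def pretrain_step(model, opt, eeg, mask_ratio=0.5):
    """One self-supervised step: mask random patches, reconstruct them."""
    mask = torch.rand(eeg.shape[:2]) < mask_ratio
    recon = model(eeg, mask)
    loss = ((recon - eeg) ** 2)[mask].mean()  # loss on masked patches only
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

model = MaskedEEGModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
eeg = torch.randn(8, 32, PATCH)  # a toy batch: 8 recordings x 32 patches
print("pretraining loss:", pretrain_step(model, opt, eeg))
```

Because the objective needs no labels, a loop like this can in principle be run over pooled, compound EEG corpora, which is what distinguishes foundation-model pretraining from the single-dataset pretraining the abstract critiques.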
Human affect recognition has been a significant topic in psychophysics and computer vision.
However, the currently published datasets have many limitations. For example, most datasets contain frames
that convey only information about facial expressions. Due to these limitations,
it is very hard either to understand the mechanisms of human affect recognition or for computer vision models
trained on those datasets to generalize well to common cases. In this work, we introduce a brand-new
large dataset, the Video-based Emotion and Affect Tracking in Context Dataset (VEATIC), that overcomes
the limitations of previous datasets. VEATIC has 124 video clips from Hollywood movies, documentaries,
and home videos with continuous valence and arousal ratings of each frame via real-time annotation.
Along with the dataset, we propose a new computer vision task to infer the affect of the selected character
via both context and character information in each video frame. Additionally, we propose a simple model to
benchmark this new computer vision task. We also compare the performance of the model pretrained on our
dataset with that of models pretrained on other similar datasets. Experiments show competitive results for the model pretrained on VEATIC,
indicating the generalizability of VEATIC.
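To illustrate the proposed task, here is a minimal sketch of a context-plus-character model: one branch encodes the full frame (context), another encodes the selected character's crop, and a fused head regresses per-frame valence and arousal. The backbones, fusion, and output scaling are illustrative assumptions, not the paper's benchmark model.

```python
# A minimal sketch of a two-stream context + character model for per-frame
# valence/arousal regression; the tiny conv backbones, concatenation fusion,
# and tanh output scaling are illustrative assumptions.
import torch
import torch.nn as nn

class ContextCharacterNet(nn.Module):
    def __init__(self, feat=128):
        super().__init__()
        def branch():  # tiny conv encoder; any image backbone would do
            return nn.Sequential(
                nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, feat))
        self.context = branch()    # encodes the full video frame
        self.character = branch()  # encodes the selected character's crop
        self.head = nn.Linear(2 * feat, 2)  # -> (valence, arousal)

    def forward(self, frame, char_crop):
        z = torch.cat([self.context(frame), self.character(char_crop)], dim=-1)
        return torch.tanh(self.head(z))  # ratings assumed scaled to [-1, 1]

model = ContextCharacterNet()
frame = torch.randn(4, 3, 128, 128)   # a toy batch of full frames
crop = torch.randn(4, 3, 64, 64)      # matching character crops
print(model(frame, crop).shape)       # torch.Size([4, 2])
```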
Professional Activity
Conference Reviewer: MICCAI 2025, CVPR 2025
Journal Reviewer: Pattern Recognition, TMI, TNNLS
Teaching Assistant: CSE 549, IAE 101
Awards
[06/2024] I received the honor of being an Outstanding Graduate at ShanghaiTech.
[12/2023] I received the honor of being a 2022-2023 Merit Student at ShanghaiTech.
[07/2023] I received the Undergraduate International Exchange Special Scholarship at ShanghaiTech.
[12/2022] I received the honor of being a 2021-2022 Merit Student at ShanghaiTech.