Daichi Yashima

Ph.D. Student, Keio University
JSPS Research Fellow (DC1)

I am a Ph.D. student in Computer Science at Keio University, advised by Prof. Komei Sugiura. I am supported by the JSPS Research Fellowship for Young Scientists (DC1). I started my Ph.D. in April 2026 after completing the Master's program in one year.

My research focuses on foundation models and multimodal language understanding for embodied AI: building systems that can execute complex tasks in the physical world. I work on multimodal large language models, vision-language-action models, video understanding, and mobile manipulation.

News

Publications

2026

MLLM-as-a-Judge Exhibits Model Preference Bias
S. Koyama, Y. Wada, D. Yashima, and K. Sugiura
Preprint
ABMAMBA: Multimodal Large Language Model with Aligned Hierarchical Bidirectional Scan for Efficient Video Captioning
D. Yashima, S. Kurita, Y. Oda, S. Suzuki, S. Otsuki, and K. Sugiura
ICPR 2026 (h5-index: 68)
HiFlow: Tokenization-Free Scale-Wise Autoregressive Policy Learning via Flow Matching
D. Yashima, K. Seno, S. Kurita, Y. Oda, and K. Sugiura
Preprint
AnoleVLA: Lightweight Vision-Language-Action Model with Deep State Space Models for Mobile Manipulation
Y. Takagi, M. Kambara, D. Yashima, K. Seno, K. Tokura, and K. Sugiura
Preprint
ReMoRa: Multimodal Large Language Model based on Refined Motion Representation for Long-Video Understanding
D. Yashima, S. Kurita, Y. Oda, and K. Sugiura
CVPR 2026 (Acceptance Rate: 25.42%, h5-index: 450)
NaiLIA: Multimodal Nail Design Retrieval Based on Dense Intent Descriptions and Palette Queries
K. Amemiya, D. Yashima, K. Katsumata, T. Komatsu, R. Korekata, S. Otsuki, and K. Sugiura
CVPR 2026 Findings (Acceptance Rate, main + findings: 36%; h5-index: 450)

2025

AIRoA MoMa Dataset: A Large-Scale Hierarchical Dataset for Mobile Manipulation
R. Takanami, P. Khrapchenkov, S. Morikuni, J. Arima, Y. Takaba, S. Maeda, T. Okubo, G. Sano, S. Sekioka, A. Kadoya, M. Kambara, N. Nishiura, H. Suzuki, T. Yoshimoto, K. Sakamoto, S. Ono, H. Yang, D. Yashima, A. Horo, T. Motoda, K. Chiyoma, H. Ito, K. Fukuda, A. Goto, K. Morinaga, Y. Ikeda, R. Kawada, M. Yoshikawa, N. Kosuge, Y. Noguchi, K. Ota, T. Matsushima, Y. Iwasawa, Y. Matsuo, and T. Ogata
Preprint
Open-Vocabulary Mobile Manipulation Based on Double Relaxed Contrastive Learning With Dense Labeling
D. Yashima, R. Korekata, and K. Sugiura
IEEE RA-L (IF: 5.2, h5-index: 132)
Mobile Manipulation Instruction Generation from Multiple Images with Automatic Metric Enhancement
K. Katsumata, M. Kambara, D. Yashima, R. Korekata, and K. Sugiura
IEEE RA-L (IF: 5.2, h5-index: 132)

Experience

Research

Industry

Fellowships

Talks