Sitemap
A list of all the posts and pages found on the site. For you robots out there, there is an XML version available for digesting as well.
Pages
Daichi Yashima
Daichi Yashima is a robotics researcher at Keio University focused on foundation models, multimodal language understanding, and embodied AI.
Posts
portfolio
publications
Mobile Manipulation Instruction Generation from Multiple Images with Automatic Metric Enhancement
Published in IEEE RA-L, 2025
In this study, we propose a novel training method that leverages both learning-based and n-gram-based automatic evaluation metrics as rewards to generate free-form mobile manipulation instructions. A minimal sketch of the mixed-reward idea follows this entry.
Recommended citation: K. Katsumata, M. Kambara, D. Yashima, R. Korekata, and K. Sugiura, "Mobile Manipulation Instruction Generation from Multiple Images with Automatic Metric Enhancement", IEEE RA-L, vol. 10, no. 3, pp. 3022–3029, 2025.
Download Paper
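The paper's actual reward formulation isn't reproduced here. Purely as a rough illustration of mixing an n-gram metric with a learning-based one, the sketch below blends a BLEU-style modified n-gram precision with a placeholder `learned_metric` (a hypothetical stand-in for a trained scorer); the weighting `alpha` is likewise an assumption.

```python
from collections import Counter

def ngram_precision(candidate, reference, n=2):
    """BLEU-style modified n-gram precision between token lists (no brevity penalty)."""
    cand = list(zip(*[candidate[i:] for i in range(n)]))
    ref = Counter(zip(*[reference[i:] for i in range(n)]))
    if not cand:
        return 0.0
    matched = sum(min(c, ref[g]) for g, c in Counter(cand).items())
    return matched / len(cand)

def learned_metric(candidate, reference):
    # Hypothetical stand-in for a learning-based metric (e.g., a trained scorer);
    # plain token overlap here so the sketch runs end to end.
    return len(set(candidate) & set(reference)) / max(len(set(reference)), 1)

def mixed_reward(candidate, reference, alpha=0.5):
    # A scalar reward blending both signals, e.g., for policy-gradient fine-tuning.
    return (alpha * learned_metric(candidate, reference)
            + (1 - alpha) * ngram_precision(candidate, reference))

cand = "pick up the red cup and move to the kitchen".split()
ref = "grab the red cup and carry it to the kitchen".split()
print(f"reward = {mixed_reward(cand, ref):.3f}")
```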
Open-Vocabulary Mobile Manipulation Based on Double Relaxed Contrastive Learning With Dense Labeling
Published in IEEE RA-L, 2025
In this study, we propose RelaX-Former, a method that leverages unlabeled positive samples and introduces a double relaxed contrastive learning approach to handle unlabeled positive and negative samples, improving the alignment between images and text. A generic relaxed contrastive loss is sketched after this entry.
Recommended citation: D. Yashima, R. Korekata, and K. Sugiura, "Open-Vocabulary Mobile Manipulation Based on Double Relaxed Contrastive Learning With Dense Labeling", IEEE RA-L, vol. 10, no. 2, pp. 1728–1735, 2025.
Download Paper
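The exact double relaxed objective is defined in the paper; the sketch below only illustrates one plausible reading of "relaxed contrastive" learning: off-diagonal image-text pairs carry a soft weight estimating how likely they are to be unlabeled positives, so they are pushed apart less aggressively than confident negatives. The `pos_weight` matrix and the weighting scheme are assumptions, not the paper's loss.

```python
import torch
import torch.nn.functional as F

def relaxed_info_nce(img_emb, txt_emb, pos_weight, tau=0.07):
    """InfoNCE with softened negatives: pos_weight[i, j] in [0, 1] estimates how
    likely the off-diagonal pair (i, j) is an unlabeled positive; higher means
    it contributes less repulsion to the denominator."""
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    logits = img @ txt.T / tau                         # (B, B) scaled cosine similarities
    eye = torch.eye(len(img))
    weights = (1.0 - eye) * (1.0 - pos_weight) + eye   # diagonal (true pairs) keeps weight 1
    exp = torch.exp(logits) * weights
    loss = -torch.log(torch.diag(exp) / exp.sum(dim=1))
    return loss.mean()

B, D = 4, 32
loss = relaxed_info_nce(torch.randn(B, D), torch.randn(B, D),
                        pos_weight=torch.rand(B, B) * 0.3)
print(f"loss = {loss.item():.3f}")
```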
AIRoA MoMa Dataset: A Large-Scale Hierarchical Dataset for Mobile Manipulation
Published in arXiv, 2025
We present the AIRoA MoMa Dataset, a large-scale hierarchical dataset designed to advance research in mobile manipulation within indoor environments. A hypothetical schema illustrating the hierarchy is sketched after this entry.
Recommended citation: R. Takanami, P. Khrapchenkov, S. Morikuni, J. Arima, Y. Takaba, S. Maeda, T. Okubo, G. Sano, S. Sekioka, A. Kadoya, M. Kambara, N. Nishiura, H. Suzuki, T. Yoshimoto, K. Sakamoto, S. Ono, H. Yang, D. Yashima, A. Horo, T. Motoda, K. Chiyoma, H. Ito, K. Fukuda, A. Goto, K. Morinaga, Y. Ikeda, R. Kawada, M. Yoshikawa, N. Kosuge, Y. Noguchi, K. Ota, T. Matsushima, Y. Iwasawa, Y. Matsuo, and T. Ogata, "AIRoA MoMa Dataset: A Large-Scale Hierarchical Dataset for Mobile Manipulation", arXiv preprint arXiv:2509.25032, 2025.
Download Paper
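The dataset's real format is specified by the release itself and is not reproduced here. Purely as an illustration of what "hierarchical" can mean for mobile manipulation (a high-level task decomposed into language-annotated subtasks over low-level robot states), here is a hypothetical schema; every name in it (`Step`, `Subtask`, `Episode`, the fields) is invented for this sketch.

```python
from dataclasses import dataclass, field

@dataclass
class Step:
    timestamp: float
    joint_positions: list[float]              # low-level robot state
    base_velocity: tuple[float, float, float]

@dataclass
class Subtask:
    instruction: str                          # e.g. "open the drawer"
    steps: list[Step] = field(default_factory=list)

@dataclass
class Episode:
    task: str                                 # high-level goal, e.g. "tidy the desk"
    subtasks: list[Subtask] = field(default_factory=list)

ep = Episode(task="tidy the desk",
             subtasks=[Subtask("pick up the mug",
                               [Step(0.0, [0.1] * 7, (0.0, 0.0, 0.0))])])
print(ep.task, "->", [s.instruction for s in ep.subtasks])
```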
NaiLIA: Multimodal Nail Design Retrieval Based on Dense Intent Descriptions and Palette Queries
Published in CVPR 2026 Findings, 2026
We propose NaiLIA, a multimodal retrieval method for nail design images that comprehensively aligns with dense intent descriptions and palette queries. A toy scoring sketch follows this entry.
Recommended citation: K. Amemiya, D. Yashima, K. Katsumata, T. Komatsu, R. Korekata, S. Otsuki, and K. Sugiura, "NaiLIA: Multimodal Nail Design Retrieval Based on Dense Intent Descriptions and Palette Queries", CVPR Findings, 2026.
Download Paper
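NaiLIA's actual model is not described in this listing. The toy sketch below only shows one way a retrieval score could combine a dense text-intent embedding with a color-palette query: cosine similarity for the text side plus a nearest-color match for the palette. The embeddings, the RGB palette distance, and the 0.5 weighting are all placeholders.

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def palette_score(query_rgb, design_rgb):
    """Mean over query colors of best-match closeness to the design's palette (1 = identical)."""
    q = np.asarray(query_rgb, dtype=float) / 255.0
    d = np.asarray(design_rgb, dtype=float) / 255.0
    dists = np.linalg.norm(q[:, None, :] - d[None, :, :], axis=-1)  # (|q|, |d|)
    return float(1.0 - dists.min(axis=1).mean() / np.sqrt(3))       # sqrt(3) = max RGB distance

def retrieval_score(text_q, text_e, pal_q, pal_e, w=0.5):
    # Weighted sum of text-intent similarity and palette agreement.
    return w * cosine(text_q, text_e) + (1 - w) * palette_score(pal_q, pal_e)

rng = np.random.default_rng(0)
score = retrieval_score(rng.normal(size=64), rng.normal(size=64),
                        [(255, 0, 80), (240, 240, 240)],
                        [(250, 10, 90), (230, 235, 240), (10, 10, 10)])
print(f"score = {score:.3f}")
```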
ReMoRa: Multimodal Large Language Model based on Refined Motion Representation for Long-Video Understanding
Published in CVPR 2026, 2026
We propose ReMoRa, a video MLLM that processes videos by operating directly on their compressed representations, using sparse RGB keyframes for appearance and a refined motion representation for temporal dynamics. A simplified keyframe/motion split is sketched after this entry.
Recommended citation: D. Yashima, S. Kurita, Y. Oda, and K. Sugiura, "ReMoRa: Multimodal Large Language Model based on Refined Motion Representation for Long-Video Understanding", CVPR, 2026.
Download Paper
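As a simplified reading of "sparse RGB keyframes for appearance plus a motion representation for temporal dynamics", the sketch below strides over the frames for keyframes and approximates motion with frame differences. A real compressed-domain pipeline would instead pull I-frames and codec motion vectors; the stride value and the difference-based motion proxy are assumptions.

```python
import numpy as np

def split_video(frames, keyframe_stride=8):
    """frames: (T, H, W, 3) uint8 array.
    Returns sparse RGB keyframes plus a coarse per-frame motion magnitude."""
    keyframes = frames[::keyframe_stride]                     # appearance stream
    diffs = np.abs(np.diff(frames.astype(np.int16), axis=0))  # temporal dynamics
    motion = diffs.mean(axis=(1, 2, 3))                       # (T-1,) motion magnitude
    return keyframes, motion

video = np.random.randint(0, 256, size=(32, 8, 8, 3), dtype=np.uint8)
keys, motion = split_video(video)
print(keys.shape, motion.shape)   # (4, 8, 8, 3) (31,)
```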
AnoleVLA: Lightweight Vision-Language-Action Model with Deep State Space Models for Mobile Manipulation
Published in arXiv, 2026
We propose AnoleVLA, a lightweight VLA that uses a deep state space model to process multimodal sequences efficiently, outperforming a representative large-scale VLA by 21 points in task success rate while achieving approximately three times faster inference. A toy state-space layer illustrating the linear-time idea follows this entry.
Recommended citation: Y. Takagi, M. Kambara, D. Yashima, K. Seno, K. Tokura, and K. Sugiura, "AnoleVLA: Lightweight Vision-Language-Action Model with Deep State Space Models for Mobile Manipulation", arXiv preprint arXiv:2603.15046, 2026.
Download Paper
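AnoleVLA's architecture is not reproduced here; the toy layer below only illustrates why a deep state space model scales well: a linear recurrence mixes a token sequence in O(T·D) time, versus O(T²·D) for self-attention. The diagonal parameterization and the parameter values are placeholders.

```python
import numpy as np

def ssm_layer(x, A, B, C):
    """x: (T, D) input tokens; A, B, C: (D,) diagonal SSM parameters.
    Recurrence: h_t = A * h_{t-1} + B * x_t;  y_t = C * h_t."""
    h = np.zeros_like(x[0])
    ys = []
    for x_t in x:                 # a real implementation would use a parallel scan
        h = A * h + B * x_t
        ys.append(C * h)
    return np.stack(ys)

T, D = 16, 8
rng = np.random.default_rng(0)
tokens = rng.normal(size=(T, D))  # e.g. fused vision + language tokens
y = ssm_layer(tokens, A=np.full(D, 0.9), B=np.ones(D), C=np.ones(D))
print(y.shape)                    # (16, 8); the last step could feed an action head
```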
HiFlow: Tokenization-Free Scale-Wise Autoregressive Policy Learning via Flow Matching
Published in arXiv, 2026
We propose HiFlow, a tokenization-free coarse-to-fine autoregressive policy that operates directly on raw continuous actions via flow matching, eliminating the need for discrete action tokenizers. A minimal flow-matching loss is sketched after this entry.
Recommended citation: D. Yashima, K. Seno, S. Kurita, Y. Oda, and K. Sugiura, "HiFlow: Tokenization-Free Scale-Wise Autoregressive Policy Learning via Flow Matching", arXiv preprint, 2026.
Download Paper
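The scale-wise, coarse-to-fine part of HiFlow is specific to the paper; the sketch below shows only the underlying flow-matching objective on raw continuous actions: interpolate between a noise sample and an expert action, then regress the constant velocity of that path. The 7-dimensional action space and the small MLP are assumptions.

```python
import torch
import torch.nn as nn

# Velocity network: maps an interpolated action plus a time scalar to a velocity.
net = nn.Sequential(nn.Linear(7 + 1, 64), nn.ReLU(), nn.Linear(64, 7))

def flow_matching_loss(actions):
    """actions: (B, 7) expert actions, treated as the target distribution x1."""
    x0 = torch.randn_like(actions)              # noise sample
    t = torch.rand(len(actions), 1)             # random time in [0, 1]
    xt = (1 - t) * x0 + t * actions             # point on the linear path x0 -> x1
    v_target = actions - x0                     # the path's constant velocity
    v_pred = net(torch.cat([xt, t], dim=-1))    # predict velocity at (xt, t)
    return ((v_pred - v_target) ** 2).mean()

loss = flow_matching_loss(torch.randn(32, 7))
loss.backward()
print(f"loss = {loss.item():.3f}")
```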
ABMAMBA: Multimodal Large Language Model with Aligned Hierarchical Bidirectional Scan for Efficient Video Captioning
Published in ICPR 2026, 2026
We propose ABMamba, a fully open MLLM based on Deep State Space Models with linear computational complexity that enables scalable video captioning, achieving competitive performance with approximately three times higher throughput. A bidirectional-scan sketch follows this entry.
Recommended citation: D. Yashima, S. Kurita, Y. Oda, S. Suzuki, S. Otsuki, and K. Sugiura, "ABMAMBA: Multimodal Large Language Model with Aligned Hierarchical Bidirectional Scan for Efficient Video Captioning", ICPR, 2026.
Download Paper
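ABMamba's aligned hierarchical scan is paper-specific; the sketch below illustrates just the bidirectional-scan building block: run a causal linear recurrence forward and over the reversed sequence, then merge, so every position sees both directions in linear time. The EMA-style recurrence and the additive merge are assumptions.

```python
import numpy as np

def scan(x, decay=0.9):
    """Causal exponential-moving-average scan over x: (T, D)."""
    h = np.zeros_like(x[0])
    out = np.empty_like(x)
    for t, x_t in enumerate(x):
        h = decay * h + (1 - decay) * x_t
        out[t] = h
    return out

def bidirectional_scan(x):
    forward = scan(x)
    backward = scan(x[::-1])[::-1]  # reverse, scan, reverse back
    return forward + backward       # each position now sees both directions

tokens = np.random.default_rng(0).normal(size=(12, 4))  # e.g. video-frame tokens
print(bidirectional_scan(tokens).shape)                 # (12, 4)
```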
