Paper Title Number 4
Published in GitHub Journal of Bugs, 2024
This paper is about fixing template issue #693.
Recommended citation: Your Name, You. (2024). "Paper Title Number 4." GitHub Journal of Bugs. 1(3).
Download Paper
Published in IEEE RA-L, 2025
In this study, we propose a novel training method that uses both learning-based and n-gram-based automatic evaluation metrics as rewards to generate free-form mobile manipulation instructions.
Recommended citation: K. Katsumata, M. Kambara, D. Yashima, R. Korekata, and K. Sugiura, "Mobile Manipulation Instruction Generation from Multiple Images with Automatic Metric Enhancement", IEEE RA-L, vol. 10, no. 3, pp. 3022–3029, 2025.
Download Paper
Published in IEEE RA-L, 2025
In this study, we propose RelaX-Former, a method that uses a double relaxed contrastive learning approach to handle unlabeled positive and negative samples, improving the alignment between images and text.
Recommended citation: D. Yashima, R. Korekata, and K. Sugiura, "Open-Vocabulary Mobile Manipulation Based on Double Relaxed Contrastive Learning With Dense Labeling", IEEE RA-L, vol. 10, no. 2, pp. 1728–1735, 2025.
Download Paper
Published in arXiv, 2025
We present the AIRoA MoMa Dataset, a large-scale hierarchical dataset designed to advance research in mobile manipulation within indoor environments.
Recommended citation: R. Takanami, P. Khrapchenkov, S. Morikuni, J. Arima, Y. Takaba, S. Maeda, T. Okubo, G. Sano, S. Sekioka, A. Kadoya, M. Kambara, N. Nishiura, H. Suzuki, T. Yoshimoto, K. Sakamoto, S. Ono, H. Yang, D. Yashima, A. Horo, T. Motoda, K. Chiyoma, H. Ito, K. Fukuda, A. Goto, K. Morinaga, Y. Ikeda, R. Kawada, M. Yoshikawa, N. Kosuge, Y. Noguchi, K. Ota, T. Matsushima, Y. Iwasawa, Y. Matsuo, and T. Ogata, "AIRoA MoMa Dataset: A Large-Scale Hierarchical Dataset for Mobile Manipulation", arXiv preprint arXiv:2509.25032, 2025.
Download Paper
Published in CVPR 2026 Findings, 2026
We propose NaiLIA, a multimodal retrieval method for nail design images that comprehensively aligns with dense intent descriptions and palette queries.
Recommended citation: K. Amemiya, D. Yashima, K. Katsumata, T. Komatsu, R. Korekata, S. Otsuki, and K. Sugiura, "NaiLIA: Multimodal Nail Design Retrieval Based on Dense Intent Descriptions and Palette Queries", CVPR Findings, 2026.
Download Paper
Published in CVPR 2026, 2026
We propose ReMoRa, a video MLLM that processes videos by operating directly on their compressed representations, using sparse RGB keyframes for appearance and a refined motion representation for temporal dynamics.
Recommended citation: D. Yashima, S. Kurita, Y. Oda, and K. Sugiura, "ReMoRa: Multimodal Large Language Model based on Refined Motion Representation for Long-Video Understanding", CVPR, 2026.
Download Paper
Published in arXiv, 2026
We propose AnoleVLA, a lightweight VLA that uses a deep state space model to process multimodal sequences efficiently, outperforming a representative large-scale VLA by 21 points in task success rate while achieving approximately three times faster inference.
Recommended citation: Y. Takagi, M. Kambara, D. Yashima, K. Seno, K. Tokura, and K. Sugiura, "AnoleVLA: Lightweight Vision-Language-Action Model with Deep State Space Models for Mobile Manipulation", arXiv preprint arXiv:2603.15046, 2026.
Download Paper
Published in arXiv, 2026
We propose HiFlow, a tokenization-free coarse-to-fine autoregressive policy that operates directly on raw continuous actions via flow matching, eliminating the need for discrete action tokenizers.
Recommended citation: D. Yashima, K. Seno, S. Kurita, Y. Oda, and K. Sugiura, "HiFlow: Tokenization-Free Scale-Wise Autoregressive Policy Learning via Flow Matching", arXiv, 2026.
Download Paper
Published in ICPR 2026, 2026
We propose ABMamba, a fully open MLLM based on Deep State Space Models with linear computational complexity that enables scalable video captioning, achieving competitive performance with approximately three times higher throughput.
Recommended citation: D. Yashima, S. Kurita, Y. Oda, S. Suzuki, S. Otsuki, and K. Sugiura, "ABMAMBA: Multimodal Large Language Model with Aligned Hierarchical Bidirectional Scan for Efficient Video Captioning", ICPR, 2026.
Download Paper
Published in arXiv, 2026
We investigate whether MLLMs used as automatic judges are biased toward outputs from specific models and find both self-preference and family-level preference biases. We also propose Pomms, an ensemble method that reduces this bias while maintaining evaluation quality.
Recommended citation: S. Koyama, Y. Wada, D. Yashima, and K. Sugiura, "MLLM-as-a-Judge Exhibits Model Preference Bias", arXiv, 2026.
Download Paper