Ph.D. · Project Researcher, National Institute of Informatics (NII), Tokyo
Multimodal AI Safety · Multilingual LLM Evaluation · Vision-Language Models
"To build AI that is powerful, responsible, and accessible."
I am a Ph.D.-trained researcher at the National Institute of Informatics (NII), Tokyo, working on multimodal AI safety — specifically how frontier vision-language models (VLMs) behave when safety matters, across languages, modalities, and cultures.
My core project extends DeepMind's Multimodal Safety Test Suite (MSTS) to Japanese as its 12th language. This is more than a translation task: it is a full cross-linguistic safety evaluation involving human-translated, human-edited prompts; model evaluations under both multimodal (image + text) and text-only conditions; and a rigorous human annotation campaign of more than 1,700 safety judgments. I evaluate five frontier VLMs (GPT-5, Gemini-2.5-Flash, Qwen2.5-VL, InternVL2.5, and LLM-JP-3-VILA) and find that Japanese inputs consistently produce higher violation rates than English, by 10 to 46 percentage points depending on model and condition. This reveals a systematic safety gap that text-only evaluation misses entirely.
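The per-model gap described above boils down to a simple comparison of violation rates across language conditions. A minimal sketch, with illustrative toy judgments rather than the actual MSTS-JA data:

```python
# Hypothetical sketch: per-model safety gap between Japanese and
# English prompt conditions. The judgment lists below are toy data,
# not real evaluation results.

def violation_rate(judgments):
    """Fraction of responses judged unsafe (True = violation)."""
    return sum(judgments) / len(judgments)

def safety_gap_pp(ja_judgments, en_judgments):
    """Japanese-minus-English violation-rate gap, in percentage points."""
    return 100 * (violation_rate(ja_judgments) - violation_rate(en_judgments))

# Toy per-prompt binary safety judgments for one model and condition.
ja = [True, True, False, True, False, True, True, False, True, True]      # 7/10 violations
en = [True, False, False, False, False, True, False, False, True, False]  # 3/10 violations

print(f"gap: {safety_gap_pp(ja, en):+.0f} pp")  # gap: +40 pp
```

The same computation, run per model and per condition (multimodal vs. text-only), yields the gap figures quoted above.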
On evaluation methodology, I build and validate automated LLM-as-Judge pipelines, comparing them against human judgments through correlation analysis and ablation studies across modality and language conditions. A key finding: text-only reference answers actively hurt automated evaluation when applied to multimodal prompts — because models appropriately incorporate visual context that text-only references don't anticipate. This has direct implications for how we design reward signals for safer model alignment.
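Validating an LLM judge against human annotators comes down to chance-corrected agreement on the same prompts. A minimal sketch using Cohen's kappa on binary safety labels; the labels here are illustrative, not drawn from the actual annotation campaign:

```python
# Hypothetical sketch of judge-vs-human validation: Cohen's kappa
# between an automated LLM judge and human safety labels
# (1 = violation, 0 = safe). All labels below are toy data.

def cohens_kappa(a, b):
    """Chance-corrected agreement between two binary raters."""
    n = len(a)
    po = sum(x == y for x, y in zip(a, b)) / n   # observed agreement
    p_a1 = sum(a) / n                            # rater A's positive rate
    p_b1 = sum(b) / n                            # rater B's positive rate
    pe = p_a1 * p_b1 + (1 - p_a1) * (1 - p_b1)   # agreement expected by chance
    return (po - pe) / (1 - pe)

human = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]
judge = [1, 1, 0, 0, 1, 0, 0, 0, 1, 1]  # disagrees with humans on two prompts

print(f"kappa = {cohens_kappa(human, judge):.2f}")  # kappa = 0.60
```

Computing kappa separately per modality and language condition is what surfaces effects like the text-only-reference degradation on multimodal prompts.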
Beyond safety evaluation, I contribute to an end-to-end agent evaluation framework for Japanese, localizing benchmarks like OSWorld to assess whether LLM-powered agents can complete real-world tasks — not just call tools. I also study agentic architectures empirically: my ICAART 2026 paper shows that well-designed single-agent systems significantly outperform multi-agent decomposition for vision-based reasoning tasks.
I serve as a Session Chair at AAAI and ICAART, a Board Member of Women in AI Myanmar, and a volunteer with GDG Tokyo. Trilingual: English · Japanese (JLPT N3) · Burmese.
Full list: 4 journal papers (Scopus-indexed) · 12 conference papers · Google Scholar ↗
I believe access to AI education should not depend on geography, gender, or privilege. Since 2022, I have served as an Ambassador — and now Board Member — of Women in AI Myanmar, helping lead national programs that open doors for students exploring AI for the first time, with a focus on women and underrepresented communities.
In 2025 alone, these programs reached over 900 participants across three initiatives.
As a trilingual communicator (English · Japanese · Burmese), I also publish AI articles on Medium and speak at events across global communities — translating cutting-edge research into accessible insights for audiences encountering these ideas for the first time.
Open to research collaborations, applied scientist roles, and conversations about multilingual AI safety in the APAC region.