Open Science Community

Cohere Labs Community Blog

Research notes, technical essays, and personal stories from the Cohere Labs Community

Research notes, stories, and ideas from people shaping AI together: transparent, collaborative, and community-led.


Latest posts

When AI Doctors "See" What Isn't There: Why Better Accuracy Doesn't Mean Better Vision

RLVR fine-tuning raises accuracy on medical VQA benchmarks while quietly degrading visual grounding: a new counterfactual evaluation framework identifies the gap.

June 19, 2026 Anas Zafar 8 min read

community research medical-ai visual-grounding multimodal open-science

A Community That Met Me Halfway

The people, projects, and conversations that turned a moment of change into a community I could give back to.

June 15, 2026 Ankita Maity 6 min read

community research cultural-benchmark multilingual-evaluation open-science

Talking to a 4-Year-Old: A Multilingual Benchmark for Children's AI Companions

A 2,312-prompt, 23-language benchmark for child–AI conversations that evaluates four production models and validates the LLM-as-judge pipeline with five independent judges (Cohen's κ up to 0.71).

June 04, 2026 Batuhan Aktas 10 min read

community research multilingual-evaluation child-safety-benchmark open-science

Mix, Fine-Tune, Break: What Happens When You Stress-Test a Multilingual Model's Safety

What happens to a multilingual model's safety guardrails when you fine-tune it on harmful data and probe it with code-mixed inputs, and why current binary benchmarks can't tell you.

May 25, 2026 Tanav Singh Bajaj 9 min read

community research multilingual-evaluation cultural-benchmark open-science

From Showing Up to Leading: Three Years Inside Cohere Labs Community

A community lead reflects on three years of learning, research, and building programs inside the Cohere Labs Open Science Community.

May 19, 2026 Ahmad Anis 6 min read

community open-science research leadership
