Multimodal LLM Researcher (MLLM)

Pika

Location TBD · On-site · Full-time · Posted 1 month ago

Salary (est.): $150,000 - $250,000
Experience: Mid
Job Type: Full-time
Posted: 1 month ago

About This Role

At Pika, we are pioneering next-generation creative infrastructure built around real-time, multimodal generation and intelligent, agentic platforms. We are seeking accomplished Multimodal LLM Researchers (LLM, VLM, and Audio LM) to drive forward our mission to make agentic real-time generative technology accessible, dynamic, and transformative for millions of creators. As a core member of our research team, you will be integral to designing and building foundational technologies, developing novel approaches for large multimodal language models (L

Key Responsibilities

1. Lead and contribute to research efforts focused on real-time, multimodal generation (including text, image, video, and audio synthesis), as well as orchestration of agentic platform infrastructure
2. Design and prototype novel algorithms and architectures for high-fidelity, real-time multimodal synthesis and interactive experiences
3. Focus on real-time aspects of model inference and synthesis across modalities
4. Work on diffusion model distillation and/or develop diffusion-based world models for multimodal applications
5. Train and finetune autoregressive and diffusion models in LLM, VLM, or Audio LM contexts with a focus on real-time performance
6. Curate specific datasets, especially for video, audio, cross-modal, and sensory-rich data
7. Collaborate with cross-functional teams to bring research advancements into production-ready technologies
8. Publish work in top-tier conferences and journals; communicate research results internally and externally
9. Stay at the cutting edge of real-time multimodal generative AI and agentic orchestration

Requirements

5+ years of relevant experience, including research during graduate studies, in large language models, vision-language models, audio language models, deep learning, or related fields
Demonstrated impact as first author on major publications in top conferences or journals (e.g., NeurIPS, ICML, ICLR), or a comparable frontier research background
Deep expertise in at least one area: language modeling (LLM), vision-language modeling (VLM), or audio language modeling (Audio LM)
Strong experience with generative models, including autoregressive and diffusion models, and their real-time deployment
Hands-on experience curating, constructing, or augmenting large, high-quality multimodal datasets
Experience developing and deploying real-time systems and/or agentic orchestration infrastructure
Strong programming and prototyping skills (Python, PyTorch, TensorFlow, etc.)
Passion for building creative tools and platforms that empower users
Excellent communication and collaboration skills

Perks & Benefits

Competitive salary and substantial equity in a high-growth startup

Full health benefits, 401(k) matching, and more

Collaborative, mission-driven team environment with major growth opportunities

Flexible on-site/remote hybrid (HQ in Palo Alto, CA)
