Location: New York (In-person, Dumbo, Brooklyn)
Employment Type: Full-Time
Department: Engineering
Compensation: $180,000 – $280,000 + Equity
About Sunset
Sunset is building the data infrastructure layer for real-world AI training. The company partners with frontier AI labs to transform messy, multi-modal enterprise data into high-quality training datasets. The data comes from hundreds of venture-backed startups that have gone through wind-down processes.
Backed by investors such as Floodgate, Afore Capital, and Hustle Fund, Sunset is a fast-growing, in-person team based in Dumbo, Brooklyn. The mission is ambitious: turn fragmented, real-world data into structured intelligence that powers the next generation of AI systems.
Job Overview
As a Data Engineer at Sunset, you will own the systems that convert raw, unstructured, and often chaotic enterprise data into structured, high-value training data. A core challenge of this role lies in entity resolution and de-identification across diverse data sources and formats.
You won’t just process data. You’ll reconstruct relationships, map complex entity linkages, and help model the structure of real-world business interactions hidden inside disconnected datasets.
What You’ll Work On
You’ll take full ownership of complex problems from day one. In your first 90 days, you may:
- Extend entity resolution systems to support new data types such as audio transcripts, design files, and embedded references in PDF documents
- Build coreference resolution across Slack messages, email threads, and project management tools (e.g., Linear) so that pronouns like “me” and “him,” as well as named entities, resolve to the correct person or entity
- Design de-identification systems that replace sensitive information (PII) with consistent pseudonyms while preserving relationships across datasets
- Develop scalable ingestion pipelines for unfamiliar and evolving data formats
- Tackle ambiguous data challenges where structure must be inferred, not provided
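
To give a flavor of the de-identification work described above, here is a minimal sketch of consistent pseudonymization in Python. It is an illustration under stated assumptions, not Sunset's actual system: the `pseudonymize` function, the `SECRET_KEY` value, and the `KIND_hash` label format are all hypothetical. The idea is that a keyed hash maps the same normalized value to the same pseudonym everywhere, so relationships between records survive even after the original value is removed.

```python
import hashlib
import hmac

# Hypothetical key for illustration only; a real system would load this
# from a secrets manager and never hard-code it.
SECRET_KEY = b"example-key-not-for-production"


def pseudonymize(value: str, kind: str, key: bytes = SECRET_KEY) -> str:
    """Map a sensitive value to a stable pseudonym such as PERSON_<8 hex chars>."""
    # Naive normalization (lowercase, collapse whitespace) so trivial
    # formatting differences do not produce different pseudonyms.
    normalized = " ".join(value.lower().split())
    message = f"{kind}:{normalized}".encode("utf-8")
    digest = hmac.new(key, message, hashlib.sha256).hexdigest()
    return f"{kind.upper()}_{digest[:8]}"


# The same person appearing in two sources gets the same pseudonym,
# while a different person gets a different one.
slack_author = pseudonymize("Jane Doe", "person")
email_sender = pseudonymize("  jane   DOE ", "person")
other_person = pseudonymize("John Roe", "person")
```

This sketch only handles formatting differences. Linking “Jane Doe,” “J. Doe,” and “jane@example.com” to one identity is the entity resolution problem itself, and would have to happen before pseudonyms are assigned.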
What We’re Looking For
- Strong product-minded engineer with experience building and shipping data pipelines at scale
- Advanced Python skills and familiarity with NER (Named Entity Recognition), record linkage, and coreference resolution
- Comfortable working in ambiguous environments without detailed specifications
- Someone who prefers end-to-end ownership over narrowly defined tasks
- Deep integration of AI tools into your workflow and problem-solving approach
This Role May Not Be a Fit If
- You prefer remote or hybrid work (this is fully in-person, 5 days/week in Brooklyn)
- You are primarily focused on theoretical or research-heavy ML work
- You prefer long planning cycles or narrowly scoped responsibilities
Tech Stack
Python, PostgreSQL, Redis, AWS
(Tools are selected based on problem fit rather than strict standardization.)
Compensation & Benefits
- $180K – $280K base salary + meaningful equity
- Fully covered medical, dental, and vision insurance
- Unlimited PTO
- $500 in-office setup stipend
Hiring Process
- Intro Chat (20 min): Mutual fit and expectations
- Technical Session (1 hour): Collaborative problem-solving exercise
- Onsite (2–3 hours): System design, product deep dive, and team interviews
- References → Offer