Senior Data Engineer
Cobalt
Data Science
New York, NY, USA
USD 140k-180k / year + Equity
Location: New York City
Employment Type: Full-time
Location Type: On-site
Department: Engineering
Compensation: $140K – $180K • Offers Equity
Company Description
We have an internet for money, but we still can't tell real companies from fake ones. Cobalt ID is building the business identity infrastructure for the financial internet. While others focus on consumers, we separate real companies from synthetic ones.
With AI accelerating fraud rings and shell companies at global scale, distinguishing a legitimate business from a sophisticated fraudster is now one of the hardest problems in fintech. We’re mapping 100M+ businesses and counting to expose hidden financial crime networks and ensure real businesses are never left out of the financial ecosystem.
Role Description
Our knowledge graph is built by fusing data from hundreds of messy, heterogeneous sources, some of which only we can access.
As a Senior Data Engineer, you'll own the data layer that makes everything else possible. You'll build the ingestion pipelines, entity resolution systems, and data quality infrastructure that connect raw source data to a unified view of every entity in our graph, in a way that’s fast, accurate, and explainable for compliance.
The problems here are specific and mostly unsolved by the industry. Similar infrastructure powers leading social media platforms, search engines, and data fusion platforms, but hasn't yet been applied to this problem. If you're energized by turning chaos into structure at massive scale, this role is for you.
This is a full-time on-site role for a Senior Data Engineer located in New York, NY.
What you'll do:
Design and build production data pipelines that ingest, normalize, and link data from hundreds of heterogeneous sources
Build and maintain data quality infrastructure: monitoring, validation, deduplication, and freshness tracking across millions of data points
Develop the ingestion and processing layer for unstructured and semi-structured data, including document parsing and extraction from inconsistent sources
Instrument and monitor pipeline health, data coverage, and entity resolution accuracy as the system scales
Ship to production constantly: we're a small team, and everything you build matters
Collaborate directly with founders and customers to shape what we build next
Base Qualifications
4+ years building production data pipelines and infrastructure (we care more about skill and impact than years alone)
Experience with large-scale data processing. You've built ETL/ELT systems that handled messy, real-world data at meaningful volume
Hands-on experience with entity resolution, record linkage, or data deduplication. You understand the algorithmic and practical challenges of matching records across noisy sources
Strong fundamentals in data modeling and pipeline orchestration
Comfort with ambiguity and fast iteration in an early-stage environment
You care about data quality as a first-class engineering problem, not an afterthought
You want to be close to the problem and the customer, not siloed from product decisions
Preferred Qualifications
Experience ingesting and normalizing data across unstructured / semi-structured sources
Background in knowledge graph construction, graph databases, or large-scale entity graph systems
Experience with NLP or LLM-based approaches to entity resolution or document extraction
Background in fraud detection, identity systems, ads ranking, recommendation systems, or other domains that require profiling and linking entities at scale
Familiarity with data infrastructure on cloud platforms at production scale
The Team:
We’re a small, tight-knit, and highly technical team. Our founders and early team members come from Waymo, Google, Meta, Brex, and Virtu Financial. We value technical depth and curiosity, low ego, and fast execution.
We’ve partnered with investors who understand the plumbing of the global financial system. We recently raised a round led by Nyca Partners, with participation from operators who built the modern fintech stack at Ramp, Plaid, and Brex.