Staff/Senior Applied Data Scientist - Research

HG Insights

Full Time | Staff | Remote

Pune, Maharashtra, India; Remote (US) | Remote | Posted 5 weeks ago

Job Description

<h3><strong>About HG Insights</strong></h3> <p>HG Insights is the pioneer of Revenue Growth Intelligence. For more than a decade, we have delivered comprehensive, AI-driven datasets on B2B buyers, technology adoption, IT spend, and buyer intent, sourced from billions of data points. Today, we are a trusted partner to Fortune 500 technology companies, hyperscalers, and innovative B2B vendors seeking precise go-to-market analytics and decision-making.</p> <p>Through an evolving suite of AI agents that incorporate first-party data and buyer signals, HG Insights enables AI-powered GTM automation across sales, marketing, RevOps, and data analytics teams, modernizing GTM execution from strategy through activation.</p> <h3><strong>Role Overview</strong></h3> <p>The Staff/Senior Applied Data Scientist - Research is a collaborative analytical partner to the Head of Data Science, contributing to the design and validation of GTM insights that power the Contextual Intelligence initiative.</p> <p>You will co-develop insight logic: selecting signals, designing scoring frameworks, prototyping models in Python, and validating outputs. You will also contribute to the production-ready briefs that the engineering team implements in the data production pipeline.</p> <p>This role sits at the intersection of statistical modeling, structured data analysis, and applied AI.
You are comfortable reasoning about how to measure something rigorously, how entities and relationships in a knowledge graph can be leveraged, and how to use LLMs as a practical tool in the insight development workflow: not as a subject of research, but as part of the toolkit.</p> <h3><strong>What You Will Do</strong></h3> <h3><strong>Insight & Model Development</strong></h3> <ul> <li>Co-develop scoring frameworks and metrics models, contributing to signal selection, weighting logic, and model structure across a range of GTM insight types (acquisition, expansion, retention, strategic)</li> <li>Prototype insight logic in Python notebooks: assembling features from HG's structured data assets, implementing model components, and stress-testing outputs</li> <li>Design and run validation experiments to confirm that insight outputs are directionally correct, well-calibrated, and meaningful across the full vendor universe</li> <li>Contribute to ontology and entity design, thinking through how vendors, products, companies, and relationships should be structured to support a given insight, informed by a conceptual understanding of the knowledge graph schema</li> </ul> <h3><strong>Production Brief Development</strong></h3> <ul> <li>Translate insight designs into clear, implementation-ready production briefs</li> <li>Document model specifications precisely: component definitions, feature engineering, aggregation logic, edge case handling, and expected output distributions</li> <li>Participate in handoff reviews with the production function, answering implementation questions and refining specs based on feasibility feedback</li> </ul> <h3><strong>Insight Research & Discovery</strong></h3> <ul> <li>Contribute to the prioritized insights catalog, researching new insight ideas, assessing data availability, and framing feasibility</li> <li>Stay current on GTM data science approaches, competitive intelligence methodologies, and relevant analytical techniques that could expand the 
insight library</li> </ul> <h3><strong>What We're Looking For</strong></h3> <h3><strong>Core Skills</strong></h3> <ul> <li>Statistical modeling depth: Ability to design and implement a range of scoring and metrics models from first principles; comfortable with component weighting, normalization, signed rate-of-change metrics, composite aggregation, and distribution analysis; knows when a technique is appropriate and why</li> <li>Python for analytical prototyping: Strong notebook-based Python for data manipulation, feature construction, model prototyping, and output validation; pandas, NumPy, and Scikit are daily tools</li> <li>SQL: Proficient in querying structured data at scale; used for signal extraction, feature derivation, and validation checks across large vendor and company datasets</li> <li>Analytical rigor & validation thinking: Ability to critically evaluate whether a model is measuring what it claims to measure; designs validation experiments, checks edge cases, and flags when outputs don't pass a sanity check</li> <li>Clear technical communication: Able to translate analytical logic into precise written specifications; the production brief is a key deliverable</li> </ul> <h3><strong>Applied AI & Graph Literacy</strong></h3> <ul> <li>LLM API usage: Hands-on experience using Claude, GPT, or equivalent APIs as a practical tool; can design effective prompts, integrate LLM steps into an analytical workflow, and evaluate output quality critically</li> <li>Knowledge graph concepts: Conceptual understanding of how entities, relationships, and properties are structured in a graph; able to reason about how graph-derived features (e.g., vendor-product-company traversals) should inform insight design, without necessarily writing production Cypher</li> </ul> <h3><strong>Nice to Have</strong></h3> <ul> <li>GTM/Management Consulting or IT Research experience; familiarity with concepts like install base, intent signals, competitive intelligence, and market analysis. 
Experience writing Cypher or querying graph-structured data directly</li> <li>Experience working collaboratively with engineering, product, and GTM teams</li> <li>Experience in a B2B SaaS or data products environment</li> </ul> <h3><strong>Tools & Environment</strong></h3> <h3><strong>Primary</strong></h3> <ul> <li>Python (pandas, NumPy, SciPy, Jupyter)</li> <li>SQL</li> <li>LLM APIs (Claude, GPT)</li> <li>Git and version control</li> </ul> <h3><strong>Working Knowledge</strong></h3> <ul> <li>Databricks</li> <li>Cloud storage</li> <li>Knowledge graph concepts</li> </ul>