The Stahlman Method: A Guide to Connecting Macro and Micro in Data Analysis

2025-04-21
Richard Makara

Meet Roger: The Human Data Catalog

Roger is a Senior Quantitative Analyst who describes himself as having "a natural inclination for sensing and understanding how different things are connected to each other." His professional journey has taken him from reporting, to building models, to managing teams of experts, and now to his current position, where he's responsible for stress testing strategy and providing actionable insights to senior management.

He has been fascinated by figuring out how things work since childhood, when he "fiercely played with Legos." Over time, this evolved into a deep interest in numbers, math, and statistics.

We've chatted lots of times before, so I'm excited to showcase his thoughts and tips on the topic of data analysis.

You can skip ahead to the step-by-step guide below or read the full interview for context.

The Importance of Context in Data Analysis

If I do not understand the context, the reason or the mechanism behind something, then the outcome is more likely to be unhappy for all parties involved.

For him, the most important thing in data analysis isn't technical skills or fancy tools—it's context. He's adamant that understanding the context around the data is the crucial difference between successful and unsuccessful analytical projects.

Throughout his career, he has observed a consistent pattern: projects where he truly understands the underlying mechanisms succeed, while those lacking this understanding typically fail. This has shaped his entire approach to tackling new analytical challenges.

Starting with the Macro Picture

It's about bringing the macro picture and the micro picture together, having a good overview of what we are doing, having an understanding or at least a desire to figure out the truth that the data is telling you.

When I asked how he approaches data problems, his answer emphasized beginning with the macro picture: the big "why" behind what you're trying to solve.

He shared an example from his work analyzing the financials of publicly listed companies. Initially, he faced a huge table of seemingly random figures that made it nearly impossible to make useful investment decisions.

His breakthrough came when he started to understand what actually drives value in different types of companies. Modern tech companies often expense their investments rather than capitalizing them, which renders traditional profitability metrics less effective.

This realization allowed him to take a step back and break down the whole chain into more manageable links. Instead of applying a one-size-fits-all approach, he developed a framework that recognized different stages of company growth and identified the relevant indicators for each stage. For mature companies, he looked for certain metrics; for growth-phase companies, he searched for prerequisites to free cash flow; and for early-stage companies, he needed entirely different signals.
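
To make that framework a bit more tangible, here's a minimal sketch of how a stage-aware screen could be organized in code. The stage labels and metric names are my own illustrative placeholders, not Roger's actual criteria.

```python
# Illustrative sketch only: a stage-aware screening framework.
# Stage labels and metric names are hypothetical examples,
# not Roger's actual investment criteria.

STAGE_METRICS = {
    "mature": ["free_cash_flow_yield", "return_on_invested_capital"],
    "growth": ["revenue_growth", "gross_margin_trend", "path_to_free_cash_flow"],
    "early": ["addressable_market", "cash_runway_months"],
}

def relevant_metrics(stage: str) -> list[str]:
    """Return the indicators worth examining for a given company stage."""
    if stage not in STAGE_METRICS:
        raise ValueError(f"Unknown stage: {stage!r}")
    return STAGE_METRICS[stage]

print(relevant_metrics("growth"))
```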

Finding the Starting Point

When we have puzzle pieces in front of us, we need to understand what we need to achieve. No, these puzzle pieces are not meant for eating and they are not meant for throwing across the room. Instead, we want to put the pieces together to get a picture. Now we know what we are after.

When I asked how to find a starting point when you don't have a clear end picture, Roger responded with a puzzle metaphor that perfectly captures his philosophy.

Sometimes in data analysis, you have the equivalent of the box with the picture on it—a clear vision of what you're trying to create. But more often, you don't have that box at all. You have to start making sense of the pieces themselves, generating a rough idea of what the complete picture might be.

He describes this as an exploratory phase, one that requires creativity and experimentation. The senior analyst believes strongly that we need to push back against the common misconception that data analysis is an exact science with predetermined steps and timelines. In his view, the first part of any data project is inherently creative—you're discovering what's possible with the materials you have, and the first approach rarely works out perfectly.

Managing the Details

You kind of have to somehow group those details together because some of those details are going to be relevant, some are not going to be relevant at all.

Once you understand the macro picture, you face a new challenge: too many details. The key is sorting through them effectively to find what truly matters.

Roger compares this process to principal component analysis, noting that attributes that initially seem important might turn out to have little variance or impact when you examine the actual data. By grouping related details and filtering out the noise, he works toward forming testable hypotheses.
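
Roger's PCA analogy can be made concrete with a quick variance check. The sketch below is purely illustrative: it assumes a hypothetical table of numeric candidate attributes and uses scikit-learn to see how much variance each component actually explains.

```python
# Illustrative only: checking which attributes actually carry variance,
# in the spirit of Roger's PCA analogy. The file name is hypothetical.
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

features = pd.read_csv("candidate_attributes.csv").select_dtypes("number")

scaled = StandardScaler().fit_transform(features)
pca = PCA().fit(scaled)

# Components (and the attributes loading on them) that explain little
# variance are candidates for dropping from further analysis.
for i, ratio in enumerate(pca.explained_variance_ratio_, start=1):
    print(f"PC{i}: {ratio:.1%} of total variance")
```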

When modeling credit risk factors like loss given default (LGD), he develops hypotheses about what might affect outcomes—such as whether customers with collateral have different loss patterns than those without. These hypotheses emerge from understanding the basic process flow: selling assets, looking at guarantees, considering other enforcement processes.
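
To show what testing a hypothesis like the collateral one might look like, here's a minimal, hypothetical sketch comparing realised losses for collateralised versus uncollateralised exposures. The column names and the choice of a non-parametric test are my assumptions, not Roger's actual methodology.

```python
# Hypothetical sketch: does collateral status change the loss distribution?
# Column names (`has_collateral` as a 1/0 flag, `realised_lgd`) are illustrative.
import pandas as pd
from scipy.stats import mannwhitneyu

loans = pd.read_csv("workout_history.csv")  # hypothetical extract

with_coll = loans.loc[loans["has_collateral"] == 1, "realised_lgd"]
without_coll = loans.loc[loans["has_collateral"] == 0, "realised_lgd"]

# Non-parametric test, since LGD distributions are typically skewed
# and bounded between 0 and 1.
stat, p_value = mannwhitneyu(with_coll, without_coll, alternative="two-sided")
print(f"Median LGD with collateral:    {with_coll.median():.2f}")
print(f"Median LGD without collateral: {without_coll.median():.2f}")
print(f"Mann-Whitney U p-value:        {p_value:.4f}")
```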

Despite this methodical approach, the analyst acknowledges the inherent uncertainty at this stage. Not all hypotheses will be supported by data, and sometimes there simply aren't enough observations to draw meaningful conclusions. Accepting this ambiguity is part of the process.

Improving Your Chances of Success

Speaking with people and making notes on the information they provide is super important.

When I asked what improves chances of success in the detailed phase of analysis, my interviewee didn't hesitate: conversations with relevant experts and good note-taking are essential.

His approach begins with friendly conversations where he can ask basic questions without judgment. These initial discussions help establish a baseline understanding of the problem domain. From there, he methodically gathers details, confirming what's clear and seeking explanations for what's fuzzy.

After several such interviews, a solid understanding emerges of what's being modeled, the processes behind it, and which factors are likely most important. Throughout this process, it's crucial to map out not just what's known, but also what remains uncertain—creating space for the data to eventually confirm or refute each potential factor.

Moving to Implementation

Well, once we have a pretty good idea what we are trying to model, at the same time we start to look for what kind of data do we have available.

The transition from exploration to implementation happens when you have both a solid understanding of the problem and a good evaluation of available data. These two aspects must align for the project to move forward effectively.

Roger recognizes that theoretical importance doesn't always translate to practical modeling—if the data doesn't exist to capture a supposedly critical factor, that factor simply can't be included in the model, regardless of expert opinions.

When I suggested that available data seems to be the bounding box while business problem exploration remains unbounded, he agreed with this characterization. While the context and details are boundless in theory, practical constraints always set boundaries on what's possible:

  • data availability,
  • time and resource limitations,
  • technical system requirements, and
  • the specific application context.

Some models fit neatly within these constraints, while others require compromises that might undermine their effectiveness.

Understanding these boundaries helps develop feasible solutions rather than theoretically perfect but impractical ones.

Working with Stakeholders

I listen to them, I'm conscious of them, but I take some of what they say with a grain of salt as quite often data can overrule their perception.

The quantitative analyst's approach to stakeholders is refreshingly candid. He acknowledges stakeholder input but doesn't always prioritize their opinions, especially when the data tells a different story.

He referenced the concept of "directors deciding the color of the bike shed"—a phenomenon where stakeholders fixate on minor details while missing the big picture. In his experience, stakeholders can get stuck on irrelevant aspects of a project while overlooking the more important elements.

Then again, stakeholders can sometimes help us see a crucial aspect we missed, such as how to communicate the changes effectively.

Despite this skepticism, Roger believes strongly in getting early feedback.

His team follows the maxim of "fail quickly"—presenting initial findings to stakeholders as soon as possible to avoid wasting time on approaches that aren't working.

These early presentations often yield useful feedback on values or features that help refine the approach, even while maintaining a data-driven perspective on the core analysis.

The Refinement Process

I would say it's mostly the code itself.

When I asked what gets refined during implementation, the answer focused on the code or model itself, as the data understanding should be solid by this point in the process.

By the time a team presents anything to stakeholders, they've already had extensive conversations with data experts—both those who understand the business context and those who manage the technical ETL processes. This groundwork ensures that the data foundation is trustworthy before building models on top of it.

Business stakeholders are often less familiar with the data specifics, which is why they aren't typically consulted on data-related issues. Instead, stakeholder interactions focus on the business implications and applications of findings.

Knowing When You're Done

When I feel that this is good enough is when I can explain how it works and I can explain how it links to reality.

Several key indicators tell an analyst when an analysis is complete. First and foremost is the ability to explain the model clearly and connect it to real-world mechanisms—when the model genuinely mirrors the processes it's meant to represent.

Technical accuracy metrics also matter. Large error boundaries create uneasiness, while narrower boundaries that align with expectations from experience provide confidence. Additionally, tracking whether you've reasonably covered your initial hypotheses is important, acknowledging that some will work and others won't.
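
As one way to picture what "error boundaries" can mean in practice, the sketch below bootstraps a confidence interval around a model's mean absolute error. The data is synthetic and the metric choice is an assumption, purely for illustration.

```python
# Illustrative sketch: bootstrapping an error interval for a model.
# `y_true` and `y_pred` are synthetic placeholders for actuals and predictions.
import numpy as np

rng = np.random.default_rng(seed=42)
y_true = rng.normal(0.35, 0.10, size=500)          # placeholder actuals
y_pred = y_true + rng.normal(0.0, 0.05, size=500)  # placeholder predictions

abs_errors = np.abs(y_true - y_pred)
boot_means = [
    rng.choice(abs_errors, size=abs_errors.size, replace=True).mean()
    for _ in range(2000)
]
low, high = np.percentile(boot_means, [2.5, 97.5])

# A narrow interval that matches expectations from experience builds
# confidence; a wide one signals the model may need more work.
print(f"MAE 95% bootstrap interval: [{low:.3f}, {high:.3f}]")
```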

External factors inevitably influence the completion decision as well. Time and resource pressures often force a practical endpoint, and diminishing returns set in after a certain point—more time doesn't necessarily create a substantially better model.

Connecting to Reality

You have to explain it to the regulators and auditors and everybody. And it has to make sense to them.

A model must connect to reality in an explainable way. Roger shared an example from his work on a probability of default (PD) model that illustrates this principle.

Initially, his team struggled to find economic drivers that produced the expected patterns in their model. After extensive testing, they discovered a combination that worked effectively: one driver that spiked first during economic downturns, and another that kept probabilities elevated before gradually declining.
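
Roger didn't name the drivers or the model form, so the following is only a hedged sketch of how two macro drivers with different timing profiles might be linked to an observed default rate. All variable names and lags are assumptions.

```python
# Hedged sketch only: combining two macro drivers in a default-rate model.
# Driver names, the two-quarter lag, and the logit-of-rate specification
# are assumptions for illustration, not the actual model described above.
import numpy as np
import pandas as pd
import statsmodels.api as sm

macro = pd.read_csv("macro_history.csv")  # hypothetical quarterly series

X = pd.DataFrame({
    "fast_driver": macro["driver_a"],           # spikes early in a downturn
    "slow_driver": macro["driver_b"].shift(2),  # stays elevated, decays later
}).dropna()

# Regress the logit of the observed default rate on the two drivers,
# so fitted values map back into (0, 1).
default_rate = macro.loc[X.index, "observed_default_rate"]
y = np.log(default_rate / (1.0 - default_rate))

fit = sm.OLS(y, sm.add_constant(X)).fit()
print(fit.summary())
```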

This approach created not just a mathematically sound model, but one with a sensible economic storyline that could be explained to stakeholders, management, regulators, and auditors. He emphasizes that complexity isn't a virtue if it prevents understanding—a model that's too complicated will generate confusion and "stupid questions" rather than confidence.

The final model showed good accuracy (better than their previous model) and mimicked observed behavior. Understanding both its strengths (when the model works exceptionally well) and weaknesses (when the model can underperform) helped determine that these trade-offs were acceptable given their goals and constraints.

Advice for Analysts

Be curious and be relentless about understanding what is the mechanism behind things, how stuff works.

As our conversation drew to a close, I asked for final advice to analysts. The first and most emphatic recommendation was to cultivate relentless curiosity about underlying mechanisms—understanding how things truly work provides powerful analytical tools.

My interviewee also urged analysts to be open-minded about methods, cautioning against the tendency to apply familiar techniques to every problem.

"If you learned regression and some basic methods at school," he noted, "not every problem is a nail because regression is a hammer."

Looking beyond one's immediate discipline for solutions can be valuable—he once found a perfect modeling approach for a credit risk problem in actuarial science.

Another key piece of advice was to embrace the Pareto principle—recognize when pursuing perfection leads to burnout rather than substantive improvements. Defining both minimum viable product standards and best-in-class visions for every project helps teams identify where on this spectrum to aim given their constraints.

He illustrated this concept with LGD models:

  • a basic version might apply a single fixed value to every loan,
  • while the best-in-class version would account for specific customer information to make tailored predictions.

Most projects land somewhere between these extremes, and understanding this spectrum helps analysts set realistic goals.
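
To make that spectrum concrete in code, here's a minimal sketch contrasting the two extremes. The fixed value, feature names, and choice of estimator are all illustrative assumptions rather than anything Roger described.

```python
# Illustrative sketch of the two ends of the LGD spectrum.
# The 45% fixed value, column names, and model choice are assumptions.
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

loans = pd.read_csv("loan_portfolio.csv")      # hypothetical current portfolio
history = pd.read_csv("workout_history.csv")   # hypothetical resolved defaults

# MVP: one fixed loss-given-default applied to every loan.
loans["lgd_mvp"] = 0.45

# Best-in-class direction: predictions tailored to customer-level information.
feature_cols = ["collateral_value", "loan_to_value", "customer_segment_code"]
model = GradientBoostingRegressor().fit(history[feature_cols],
                                        history["realised_lgd"])
loans["lgd_tailored"] = model.predict(loans[feature_cols])
```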

The Stahlman Method: A Step-by-Step Guide

After distilling the wisdom from my conversation with Mr. Stahlman, I've created this practical guide based on his methodology. Whether you're analyzing financial data, building predictive models, or tackling any complex analytical problem, these steps can help you navigate from confusion to clarity.

Step 1: Understand the Macro Context

  • Identify the mechanism behind the problem: Ask "How does this process actually work in reality?"
  • Interview domain experts: Schedule conversations with friendly faces where you can ask "dumb questions"
  • Map out the process: Create a flowchart or written description of how the system functions
  • Define what you're looking for: Clarify what success would look like in concrete terms

Step 2: Break Down the Problem

  • Create a logical chain: Work backward from your goal to identify prerequisites
  • Draw boundaries: Determine what's in scope and what's out of scope
  • Accept ambiguity: Recognize that you don't have the full picture yet, and that's okay
  • Think like a puzzle solver: Without the box picture, start organizing pieces that seem related

Step 3: Collect and Organize Details

  • Group related information: Cluster details that seem connected
  • Filter for relevance: Identify which variables or factors might matter most
  • Document everything: Take thorough notes during all conversations and discovery sessions
  • Map knowledge gaps: Explicitly list what you don't know yet but need to find out

Step 4: Form Testable Hypotheses

  • Create clear statements: "X might influence Y because of Z mechanism"
  • Prioritize hypotheses: Focus on those most likely to have significant impact
  • Consider alternatives: Develop multiple explanations for the patterns you observe
  • Keep your list visible: Refer back to your hypotheses throughout the project

Step 5: Evaluate Available Data

  • Inventory your data assets: What information do you have access to?
  • Identify gaps: What critical information is missing?
  • Assess data quality: How reliable and complete is your data?
  • Determine feasibility: Given data constraints, which hypotheses can you actually test?
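
As a lightweight illustration of what such an inventory-and-quality pass might look like, here's a small pandas sketch; the file name and the missingness threshold are assumptions.

```python
# Quick, illustrative data-quality pass over a candidate dataset.
# The file name and the 20% missingness threshold are assumptions.
import pandas as pd

df = pd.read_csv("candidate_dataset.csv")  # hypothetical source

summary = pd.DataFrame({
    "dtype": df.dtypes.astype(str),
    "missing_pct": df.isna().mean().round(3),
    "n_unique": df.nunique(),
})
print(summary.sort_values("missing_pct", ascending=False))

# Flag columns too sparse to support hypothesis testing.
too_sparse = summary.index[summary["missing_pct"] > 0.20].tolist()
print("Columns likely unusable without extra sourcing:", too_sparse)
```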

Step 6: Get Early Feedback

  • Present initial findings quickly: Don't wait for perfection
  • Focus on the storyline: Ensure stakeholders understand the underlying logic
  • Listen selectively: Consider feedback on approach, but let data guide your conclusions
  • Identify refinement areas: What aspects need more work based on feedback?

Step 7: Refine Your Approach

  • Improve your code/model: Focus on technical implementation
  • Test against reality: Does the model behavior match what happens in the real world?
  • Check accuracy metrics: Are error boundaries reasonably narrow?
  • Revisit hypotheses: Have you addressed most of your initial questions?

Step 8: Define Completion Criteria

  • Set MVP standards: What's the minimum acceptable result?
  • Envision best-in-class: What would ideal look like with unlimited resources?
  • Apply the Pareto principle: Focus on the 20% of work that delivers 80% of value
  • Balance time constraints: Recognize when additional effort yields diminishing returns

Step 9: Create an Explainable Story

  • Connect to the real world: Show how your model mimics actual mechanisms
  • Simplify without distorting: Make it understandable without losing accuracy
  • Acknowledge limitations: Be transparent about what your analysis can and cannot do
  • Highlight key insights: What valuable truths has your analysis revealed?

Conclusion

Roger's approach to data analysis combines big-picture thinking with meticulous attention to detail. Throughout our conversation, he emphasized that success comes from truly understanding both the macro context and the micro details of your data.

His methodology—start with the big picture, define the puzzle, group details into testable hypotheses, validate with data, and ensure connections to reality—offers a roadmap for analysts facing complex problems with ambiguous solutions.

As he puts it, data analysis requires both understanding and a desire to "figure out the truth that the data is telling you," while remaining aware that "data has its own problems and it depends on the context."

Give Roger a follow on LinkedIn, where he occasionally shares wisdom nuggets from his journey. 🫶