Deeply understanding the problem is perhaps the most critical, yet underestimated, step in solving complex business challenges. Skipping this step is like trying to build a skyscraper on a shaky foundation—it might stand for a while, but cracks will inevitably appear. In the fast-paced world of tech, where new frameworks and shiny tools emerge daily, it's all too easy for teams to dive headfirst into the solution space, focusing on implementation before grasping the problem. But no matter how cutting-edge or "innovative" the solution may be, if it’s misaligned with the core issue, it’s doomed to fail.

This problem is especially prevalent in engineering teams. The allure of the latest tech stacks, microservices architectures, or DevOps tools can cloud judgment, making us eager to "solve" problems that haven't been fully defined. Early in my career, I often found myself rushing to use the hottest programming language or framework without ever stepping back to ask the crucial question: What problem are we actually trying to solve? Like many engineers, I was guilty of falling into the trap of letting the technology drive the solution, rather than allowing the problem to dictate the tools. Fortunately, I had mentors who continually reminded me to focus on the root cause, challenging me to look beyond the excitement of the tech and dig deeper into the real pain points.

So, what does it mean to truly understand the problem? It starts with asking simple but powerful questions: "What exactly are we solving for, and why is it a problem now?" It's not just about uncovering the symptoms; it's about drilling down into the underlying issues and understanding the business context. This isn’t a task reserved only for senior engineers or architects—everyone on the team should feel empowered to challenge assumptions and ensure the problem is fully decoded before jumping into code or solutions.

Let's walk through three real-world scenarios to see how this plays out.

1: Improving the Wrong System/Sub-system

Imagine an online store where customers are experiencing timeouts when attempting to complete their purchases. The team immediately assumes that the database service, which has been slower than usual, is the root cause. Without properly diagnosing the issue, you instruct the database team to improve performance. After six weeks of work, the database team reports a 50% improvement in query response times. Yet customers continue to experience timeouts. The problem persists because the database contributed only 5% of the total latency, so even a much faster database barely moves the end-to-end number. The real issue, network latency between the front-end and mid-tier services deployed in different geographic regions, remained unresolved.

This approach wasted valuable time because the team focused on a specific service without looking at the problem holistically.
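The arithmetic behind this outcome can be checked in a few lines. The sketch below assumes that "a 50% improvement in query response times" means database latency dropped by half, and that the other components were unchanged; the function name is hypothetical and used only for illustration.

```python
def end_to_end_gain(share_of_total: float, component_speedup: float) -> float:
    """Fraction of total latency removed when a single component gets faster.

    share_of_total: fraction of end-to-end latency spent in the component (0..1).
    component_speedup: fraction by which the component's own latency drops (0..1).
    """
    return share_of_total * component_speedup


# Figures from the scenario: the database accounts for 5% of total latency
# and its response time improved by 50% (assumed to mean latency halved).
gain = end_to_end_gain(share_of_total=0.05, component_speedup=0.50)
print(f"End-to-end latency reduced by {gain:.1%}")  # prints: End-to-end latency reduced by 2.5%
```

Six weeks of work bought roughly a 2.5% reduction in what the customer actually experiences, which is why the timeouts did not go away.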

The Correct Approach:
Instead of assuming the problem lies with a specific service, the correct approach would be to thoroughly analyze the entire call graph and measure latency across all services. Start with the front-end, move to mid-tier services, check the database layer, and if necessary, investigate storage and network layers. By examining the entire system, you would have quickly discovered that the majority of the latency was caused by high network delay between the front-end and the mid-tier aggregator service, which were deployed in different regions.
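One way to make that analysis concrete is to break a single slow request into per-hop timings and rank the hops by their share of the total. The following is a minimal sketch, not a tracing implementation: the hop names and millisecond values are illustrative assumptions (chosen so the database lands at 5%, as in the scenario), and in practice the numbers would come from whatever distributed tracing or per-service timing data you already collect.

```python
def latency_shares(hop_latency_ms: dict[str, float]) -> list[tuple[str, float]]:
    """Return (hop, share of total latency) pairs, largest share first."""
    total = sum(hop_latency_ms.values())
    shares = [(hop, ms / total) for hop, ms in hop_latency_ms.items()]
    return sorted(shares, key=lambda item: item[1], reverse=True)


# Hypothetical timings for one checkout request, in milliseconds.
checkout_request_ms = {
    "front-end": 200.0,
    "network: front-end -> mid-tier (cross-region)": 3000.0,
    "mid-tier aggregator": 400.0,
    "database": 200.0,
    "storage": 200.0,
}

# Print the hops from most to least expensive.
for hop, share in latency_shares(checkout_request_ms):
    print(f"{hop:<48}{share:6.1%}")
```

With these assumed numbers, the cross-region network hop sits at the top of the list with 75% of the latency and the database near the bottom with 5%, so the ranking alone would have pointed the team away from the database on day one.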

Had the team identified the root cause in the first iteration, it would have avoided the time and resources spent optimizing the database and could have implemented a solution that addressed the actual issue.

Key Takeaway:
Jumping to conclusions without a complete understanding of the problem often leads to wasted time and resources. By investigating the entire system, you can identify the root cause and fix the issue faster.

2: Building the Wrong Product/Solution

Your team is excited about a new technology that promises to simplify application deployment to just a few clicks. Driven by enthusiasm for this breakthrough, your team quickly builds a product that automates deployments, reducing deployment times from minutes to seconds. However, when customers start using the product, they express dissatisfaction. The problem they’re facing isn’t in deployment times, but in their broken build pipelines. Developers are spending days fixing build issues before even thinking about deployment, making your new tool irrelevant to their actual needs.

This failure occurred because the team focused on building a solution to a problem that wasn’t critical to the customers. Reducing deployment times is meaningless when the real issue lies with the broken automation in the build process.

The Correct Approach:
To truly solve customer problems, the first step should always be to understand their real pain points. This can be done by working closely with a product manager or, better yet, conducting interviews with potential customers yourself. These interviews are crucial for gathering requirements and asking the right questions—though mastering this skill is a topic for another time.

In this case, for example, the real issue customers faced wasn’t slow deployment, but inefficient build pipelines and frequent automation failures. If the team had taken the time to dig deeper, they would have realized that improving and automating the build pipeline would have significantly reduced the time customers spend fixing errors. By focusing on the right problem, the team could have delivered a far more effective solution that genuinely improves the customer’s experience, making the product valuable and widely adopted.

Key Takeaway:
Focusing on the wrong problem can lead to building solutions that customers don’t need. Taking the time to deeply understand the customer’s real pain points ensures that the solution you deliver addresses the most critical issues.

3: Prioritizing Features in the Wrong Order

Imagine a company developing a cloud infrastructure platform for enterprises, where customers can deploy and manage applications across multiple cloud environments. The platform needs several key features: auto-scaling for handling traffic spikes, advanced security configurations, detailed performance monitoring, and a smooth user interface (UI) for developers.

The team is excited about the project and starts by focusing heavily on the UI and auto-scaling capabilities, reasoning that users would be drawn to the platform’s ease of use and performance during high traffic loads. After six months of development, they release the first version of the platform. However, customers quickly start reporting issues. They like the smooth UI and appreciate the auto-scaling, but they are hesitant to use the platform because it lacks the robust security and detailed monitoring features they need to comply with regulations and keep track of their applications’ performance.

Even though all the features (UI, auto-scaling, security, monitoring) are important, customers prioritized security and monitoring because they are critical both to operating in a multi-cloud environment and to getting feedback on how the solution itself is being used. Without these, the platform cannot support large-scale enterprise applications, regardless of how good the UI and auto-scaling are.

The Correct Approach:
Before jumping into development, the team should have prioritized gathering customer feedback to determine which features were truly essential for the platform’s success. There are a couple of effective ways to do this. First, conducting initial customer interviews, either by the team or with a product manager, often reveals key pain points and makes it relatively easy to identify customer priorities. Alternatively, a simpler approach could involve presenting a roadmap of feature delivery timelines to a subset of customers. While this method is helpful, it's less effective than direct interviews due to the longer feedback cycle and the risk of misaligned planning.

In this case, either method would have made it clear that security and monitoring were top priorities, because enterprises rely heavily on these features for compliance and operational transparency. A polished UI and auto-scaling are valuable, but they were less critical for the initial version of the product. Had the team focused on delivering security and monitoring first, the platform would have been more attractive to enterprise customers from day one. They could have rolled out UI and auto-scaling enhancements in later versions, keeping customers engaged and improving the product over time.
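Even a very simple tally of interview results can make the priority order hard to ignore. The sketch below assumes each interviewed customer was asked which features would block them from adopting the first release; the interview data and the function name are hypothetical and exist only to illustrate the idea.

```python
from collections import Counter


def rank_features(interviews: list[list[str]]) -> list[tuple[str, int]]:
    """Count how many interviewed customers named each feature as a launch blocker."""
    votes: Counter[str] = Counter()
    for blockers in interviews:
        votes.update(set(blockers))  # at most one vote per customer per feature
    return votes.most_common()


# Hypothetical answers from five customer interviews.
interviews = [
    ["security", "monitoring"],
    ["security"],
    ["monitoring", "security", "auto-scaling"],
    ["monitoring", "ui"],
    ["security"],
]

# Print features from most to least frequently named as a blocker.
for feature, count in rank_features(interviews):
    print(f"{feature:<14}{count} of {len(interviews)} customers")
```

With this made-up data, security and monitoring come out well ahead of UI and auto-scaling. A real prioritization would weigh more than raw counts (deal size, regulatory deadlines, effort), but even this crude view would have challenged the team's initial plan.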

Key Takeaway:
When building a solution that requires multiple features, it’s crucial to prioritize based on what’s most important to the customer at that moment. Focusing on the wrong features first can delay the product’s success, even if the overall solution is well-designed. Prioritizing critical features ensures early adoption and provides a strong foundation for future improvements.

