Ryan McCorvie On Why the Next Generation of AI Startups Will Be Built Around Data, Not Models


Early artificial intelligence startups were often defined by their models. Progress depended on training larger systems, improving architecture, and increasing the computing power available for experimentation. Companies that could build stronger models gained clear advantages in capability and performance.

That environment has changed. High-performance AI models are now widely accessible through software platforms and open development communities. Startups no longer need to train complex systems from the ground up in order to build useful products.

Berkeley-area statistician and data scientist Ryan McCorvie has spent much of his career studying how organizations interpret large datasets and apply statistical models to real-world decision-making. His work reflects a broader reality in artificial intelligence: the performance of a system often depends less on the algorithm itself and more on how the surrounding data is gathered and structured.

“Access to strong models has become much easier,” notes McCorvie. “As a result, the model itself is less likely to be the primary source of differentiation between competing companies.” Multiple startups can rely on similar underlying technology while building products that behave very differently.

What increasingly separates successful AI companies from the rest is the data that shapes how those systems behave. Information gathered through product usage, workflows, and user interactions determines how effectively an AI system can respond to real problems. Companies that organize and refine that information well will have an advantage that compounds over time.

The next generation of AI startups will therefore compete less on algorithm design and more on the quality of the data environments they create around their products.

Models Are Rapidly Becoming Infrastructure

Training a sophisticated AI model once required large research teams, expensive hardware, and extensive technical expertise. These requirements limited how many organizations could participate in the field. A startup that wanted to build an advanced system had to dedicate significant resources to model development.

Today, many companies access powerful models through cloud platforms or shared development ecosystems. Developers can integrate language processing, image analysis, and decision systems directly into applications without building the underlying models themselves. Adoption has expanded quickly. McKinsey research found that about 63 percent of surveyed organizations use open-source AI models somewhere in their technology stack.

“This has changed the role models play within the software stack,” says McCorvie. “In many products, the model functions as a component rather than the central product innovation. It provides intelligence capabilities in the same way that cloud storage provides data capacity or mapping services provide geographic information.”

When many companies rely on similar models, competition moves elsewhere. Two businesses might build products using the same model while producing very different results. The difference often comes from the information the system receives and how that information is organized.

The quality of that input matters greatly. After all, a capable model can only produce useful results when the surrounding data environment gives it clear signals. Without that context, even the most advanced model will struggle to deliver reliable outputs.

Data Is Emerging as the Real Competitive Advantage

Artificial intelligence systems depend on information to interpret requests and generate meaningful responses. The structure and quality of that information strongly influence how well a system performs.

Companies that gather proprietary data gain the ability to refine AI behavior in ways that competitors cannot easily duplicate. Information collected within a product environment often reflects the specific problems that product is designed to solve. When developers use that information to adjust their systems, the technology becomes better suited to its intended tasks.

Business leaders increasingly recognize how central this data advantage has become. IBM research reports that 72 percent of CEOs believe proprietary data is essential for unlocking the value of generative AI. This view reflects a growing consensus that competitive advantage in AI depends heavily on access to unique datasets.

Building high-quality datasets requires careful work. Raw information rarely arrives in a format that machines can interpret effectively. Teams must clean the data, organize it into consistent structures, and identify which signals matter most. Over time, this effort produces a resource that strengthens the system. The AI becomes more accurate because the data reflects real usage patterns rather than generic information.

Products that interact frequently with users also create continuous opportunities for improvement. Every correction, adjustment, or refinement made during normal operation can contribute to a better understanding of how the system should respond.

Ryan McCorvie: Why Vertical AI Companies Are Positioned to Win

Many early AI products attempted to serve broad audiences. These tools aimed to handle a wide variety of tasks across multiple industries. While this approach allowed companies to reach large markets, it often struggled with specialized problems.

Some startups now take a different path by focusing on narrow professional environments. Instead of building general tools, they design systems that address a small set of specific tasks within a particular field.

Professional workflows generate large volumes of structured information. Documents, records, and operational data produced during everyday work can provide valuable signals for improving AI systems. When companies build products designed to operate within those environments, they gain access to information that can refine the technology.

“Understanding how professionals organize their work also matters,” explains McCorvie. “Knowledge of industry practices allows developers to structure data in ways that reflect how decisions are actually made. Without that context, an AI system may produce responses that sound plausible but fail to solve practical problems.”

Specialized products, therefore, benefit from both domain knowledge and access to relevant data. As the system processes more information from the environment it serves, its performance improves in ways that general tools may struggle to match. Companies that operate in these focused settings often build datasets closely tied to real work. Competitors entering the same space later may find it difficult to replicate that information.

Building an AI Company Means Building a Data Engine

Developing an AI product now requires more than selecting a capable model and connecting it to an interface. Companies must also design systems that capture and organize the information produced during everyday product use. These internal processes determine whether an AI system improves over time or remains static after launch.

Data pipelines gather information from user interactions, internal workflows, and system activity. Once collected, that information must be cleaned, structured, and prepared so that machine learning systems can interpret it effectively.
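The clean-and-structure step described above can be sketched in a few lines of Python. This is a minimal illustration, not a production pipeline: the `InteractionRecord` type, the field names (`user_id`, `prompt`, `response`), and the sample events are all hypothetical, chosen only to show raw events being normalized, filtered, and deduplicated.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class InteractionRecord:
    """Hypothetical structured form of one user interaction."""
    user_id: str
    prompt: str
    response: str


def clean_event(raw: dict) -> Optional[InteractionRecord]:
    """Normalize one raw event; return None if a required field is missing."""
    user_id = str(raw.get("user_id") or "").strip()
    # Collapse runs of whitespace so equivalent text compares equal.
    prompt = " ".join(str(raw.get("prompt") or "").split())
    response = " ".join(str(raw.get("response") or "").split())
    if not (user_id and prompt and response):
        return None
    return InteractionRecord(user_id, prompt, response)


def build_dataset(raw_events: Iterable[dict]) -> list[InteractionRecord]:
    """Clean every event, drop incomplete ones, and remove exact duplicates."""
    seen = set()
    dataset = []
    for raw in raw_events:
        record = clean_event(raw)
        if record is None or record in seen:
            continue
        seen.add(record)
        dataset.append(record)
    return dataset


# Illustrative input only; real events would come from product logs.
raw_events = [
    {"user_id": " u1 ", "prompt": "Summarize   the contract", "response": "Done."},
    {"user_id": "u1", "prompt": "Summarize the contract", "response": "Done."},
    {"user_id": "u2", "prompt": "", "response": "Orphan response"},
]
dataset = build_dataset(raw_events)
```

In this sketch the second event duplicates the first once whitespace is normalized, and the third is dropped for lacking a prompt, so only one record survives. A real pipeline would add steps this example omits, such as schema validation and privacy filtering.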

Many organizations are investing heavily in building these systems, but turning that investment into working AI products remains difficult. A Qlik study found that 94 percent of businesses are increasing spending on data readiness for AI initiatives, yet only 21 percent have successfully operationalized AI across their organizations.

Companies that treat data infrastructure as a long-term operational priority often build stronger products. As information accumulates through normal use, the system gradually learns from real-world patterns and improves its performance. The product then functions less like static software and more like a system that becomes more capable as it processes new information.

What This Change Means for AI Founders

Founders entering the AI industry today face a different strategic environment than early builders of machine-learning startups. Access to powerful models is no longer restricted to organizations with large research budgets or specialized computing resources. Small teams can integrate advanced AI capabilities into products quickly.

With model access no longer the constraint, one of the most important challenges is designing a product that produces meaningful information during normal use. AI systems improve when they receive consistent signals about their performance in real-world situations.

Products that naturally generate structured records of user interactions provide valuable opportunities to refine the technology over time. Interface design also shapes the quality of that information. Systems that encourage users to correct outputs, adjust instructions, or provide feedback create clearer signals about how the AI should behave. Each interaction contributes additional context that developers can use to improve the system’s responses.
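One way to picture such a feedback signal is a record that stores what the model produced alongside what the user finally kept. The sketch below is a hypothetical illustration: the `FeedbackEvent` type, the task labels, and the idea of summarizing corrections as a per-task rate are assumptions made for the example, not a method described by McCorvie.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class FeedbackEvent:
    """Hypothetical record of one output and what the user kept."""
    task: str
    model_output: str
    final_output: str  # the text the user kept after any edits

    @property
    def corrected(self) -> bool:
        # Treat any difference beyond surrounding whitespace as a correction.
        return self.model_output.strip() != self.final_output.strip()


def correction_rate_by_task(events):
    """Return the share of outputs that users changed, grouped by task."""
    totals = defaultdict(int)
    corrected = defaultdict(int)
    for event in events:
        totals[event.task] += 1
        if event.corrected:
            corrected[event.task] += 1
    return {task: corrected[task] / totals[task] for task in totals}


# Illustrative events only.
events = [
    FeedbackEvent("summary", "Draft A", "Draft A"),
    FeedbackEvent("summary", "Draft B", "Draft B, revised"),
    FeedbackEvent("classification", "invoice", "invoice"),
]
rates = correction_rate_by_task(events)
```

Here half of the summaries were edited and no classifications were, which would point a team toward the summary task first. The corrected pairs themselves are the kind of structured record the paragraph above describes.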

Founders who treat data collection as a central part of product design often create stronger foundations for long-term growth. When a product continuously gathers useful information from the environments it serves, that information becomes a dataset that strengthens the system over time.

The Takeaway

Artificial intelligence startups once competed primarily on their ability to train stronger models. Improvements in computing power, model architecture, and training techniques drove progress in the field for many years.

The environment surrounding AI development has changed as powerful models have become widely available through shared platforms and development ecosystems. Startups can now build sophisticated products without training large systems themselves.

The information surrounding those models has become increasingly important. Data gathered through real product usage shapes how AI systems interpret inputs and generate responses.

Companies that collect and organize that information effectively can refine their technology in ways that competitors cannot easily replicate. The startups that define the next phase of artificial intelligence will therefore be those that build strong data environments around the models they use.

Products that learn continuously from real-world interactions will become more capable over time, giving those companies a lasting advantage as the field continues to expand.