Stack Overflow users don’t trust AI. They’re using it anyway


TL;DR

Stack Overflow CEO Prashanth Chandrasekar discusses the company's pivot to AI, including launching AI Assist and data licensing deals, despite low user trust in AI. The platform now focuses on enterprise SaaS and community features for complex problems.

Key Takeaways

  • Stack Overflow responded to ChatGPT's launch by reallocating 10% of staff to develop AI solutions, launching features like AI Assist while banning AI-generated answers to maintain trust.
  • The company now generates revenue primarily through enterprise SaaS (Stack Internal), data licensing deals with AI labs, and advertising, with data licensing becoming a key business model.
  • Over 80% of Stack Overflow users use or intend to use AI for coding, but only 29% trust it, highlighting a tension between adoption and skepticism in the developer community.
  • Stack Overflow has expanded its mission to include community features like chatrooms and challenges to attract users beyond just Q&A, addressing the threat of AI tools like GitHub Copilot.
  • The company balances its original mission of accurate knowledge with modern AI integration, facing pushback from some users over data licensing deals while adapting to new internet business models.

Tags

Stack Overflow, AI, ChatGPT, developer community, data licensing

Today, I’m talking with Prashanth Chandrasekar, who is the CEO of Stack Overflow. I last had Prashanth on the show in 2022, one month before ChatGPT launched. While the generative AI boom had tons of impact on all sorts of companies, it immediately upended everything about Stack Overflow in an existential way.

Stack Overflow, if you’re not familiar with it, is the question and answer forum for developers writing code. Before the AI explosion, it was a thriving community where developers asked for and received help with complicated programming problems. But if there’s one thing AI is good at, it’s helping developers write code — and not just write code, but develop entire working apps. On top of that, Stack Overflow’s forums themselves became flooded with AI-generated answers, bringing down the quality of the community as a whole.

You’ll hear Prashanth explain that it was clear more or less from the jump how big a deal ChatGPT was going to be, and his response was pure Decoder bait. He called a company emergency, reallocated about 10 percent of the staff to figure out solutions to the ChatGPT problem, and made some pretty huge decisions about structure and organization to navigate that change.


Three years later, Prashanth says Stack Overflow is now very comfortable primarily as an enterprise SaaS business, which provides AI-based solutions that are tailored to different companies’ internal systems. Stack Overflow also operates a big data licensing business, selling data from its community back to all those AI companies, large and small.

That’s a pretty big pivot from being seen as the place where everyone can go to get help with their code. So I had to ask him: does Stack Overflow even attract new users anymore, in 2025, when ChatGPT can do it all for you? Prashanth said yes, of course. You’ll hear him explain that while AI can handle simple problems, for thorny, complex ones, you really want to talk to a real person. That’s where Stack Overflow still brings people together. 

You’ll hear us come back to a single stat in particular: Prashanth says more than 80 percent of Stack Overflow users want to use AI or are already using AI for code-related topics, but only 29 percent of that population actually trusts AI to do useful work. 

That’s a huge split, and it’s one I see all over in AI right now. AI is everywhere, in everything, and yet huge numbers of people say they hate it. We see it in the Decoder inbox, in the comments on The Verge, and on our videos on YouTube. Everyone says they hate it — and yet numbers don’t lie about how many millions of people are using it and apparently deriving some benefit.

It’s a big contradiction and hard to unpack. But Prashanth was pretty willing to get into it with me. I think you’ll find his answers and his insight very interesting.

Okay: Prashanth Chandrasekar, CEO of Stack Overflow. Here we go.

This interview has been lightly edited for length and clarity.

Prashanth Chandrasekar, you’re the CEO of Stack Overflow. Welcome to Decoder.

Wonderful to see you again. It’s been a hot minute. It has been three years since the last time we spoke, so it’s great to see you again.

I should have said welcome back to Decoder. You were last on the show in October 2022. One month later, ChatGPT launched. 

[Laughs] That was an interestingly timed interview, right before the world changed.

Right before the world changed. Software development is certainly the thing that has changed the most since AI models have hit. There are a lot of new products in your universe to talk about, and there’s what Stack Overflow itself is doing in the world of AI. So I want to talk about all of that.

But first, take me back to that moment. We had spent an entire conversation in 2022 talking about community and moderation, how you were going to build a funnel of people learning to code, and learning to use Stack Overflow. That was a big part of our conversation. The engineer pipeline, both them learning to write software and being a part of the software development community, was very much on your mind. And then, all of software development changed because of AI tools. Describe that moment for me because I think it contextualizes everything that happened afterwards.

It was definitely a very, very surprising moment. It wasn’t an unexpected moment in many ways because here came this technology that obviously some people knew about but not in a way that captured everybody’s imagination using this beautiful interface. We were in the middle of wrapping up our calendar year, and at that point, we were thinking about our priorities for the next year. 

It became very clear what we needed to focus on because this was going to be this very, very huge change to how people consume technology. Welcome to technology. It’s constantly changing, and I think this wave especially is completely unprecedented. I don’t think there is any sort of analogy or prior wave that I could look to, including the cloud and maybe the internet. I don’t think we’re still fully consuming what this is at the moment.

So, we went into what is the equivalent of a code red situation inside the company. It was an existential moment, especially for our public platform because the primary job, if you will, is all about making sure people get answers to their questions. Now, you have this really, really slick natural language interface that allows you to do that at a moment’s notice. We had to sort of organize our thoughts, and what I ended up doing was carving out 10 percent of the company’s resources to specifically focus on a response to this. 

We set a specific date to respond in a meaningful fashion, so we said the summer of 2023. I was going to go speak at the WeAreDevelopers Conference in Berlin, and I effectively told the company, “We’ve got six months to go and produce our response.” At least, it would be our initial response because this is going to keep iterating.

That’s how we mobilized the company. We acknowledged it was a code red moment, we carved out a team of 10 percent, so about 40 people or so since we were a somewhat medium-sized company. Then, we got to work. That was the moment.

Take me inside that room. Very few people ever get to send the code red memo, right? This is not a thing most people ever get to do. Maybe you think about doing it, but no one’s going to read your memo. Everyone has to read your memo. You’re the CEO. 

Take me inside that room where you said, “Okay, I have identified an existential threat to our company. People have come to us for answers to software development questions.” Again, the last time you were on the show, you were talking about the idea that there were objectively right answers to software development questions and that the community could provide them and vote on them. Well, now you’ve got a robot that can do it and can do it as much as you want, as long as you want. There are tools like Cursor AI and Claude Code that can run off and do it for you. 

So you’ve got all that, and you say, “I need to take 10 percent of the company.” I’m curious how big the company is. I know there have been some changes, but 10 percent of the company is 40, 50 people. How did you identify and say, “This is the moment I need to pull these people in the room. I’m making this decision, and the right answer is 40, 50 people are going to set aside their time to deliver me a plan by this time next year?”

The instinct has come from a couple different experiences. My experience right before this was at Rackspace in the cloud services space. The business I was running at Rackspace was actually around responding to Amazon Web Services as a cloud technology threat. I was on the team that ultimately built that business from the ground up, and it was effectively 10 percent of Rackspace’s population that went and created that. So, I had some practice seeing and responding to a disruptive threat that you encounter. It was my turn now to put that into motion at Stack by appointing somebody like myself when I was at Rackspace to do exactly the same thing. 

The other data point goes all the way back a couple decades or more when I was in business school. My professor was Clayton Christensen, and he wrote the book The Innovator’s Dilemma. I’ve always thought about that in the context of technology. In technology, it is a very consistent theme that every so often, you will have disruptive threats, and there’s a very specific way in which you need to respond to that. It’s very much about how history suggests you should carve out an autonomous team that has very different incentives and can pursue things in a very different way relative to the rest of your business. 

And remember, Stack Overflow is really two parts. We have our public platform, which is this big web disruptor, and which we should talk about more broadly in regard to the internet. The other side is the enterprise business, where we are serving large companies with a private version of Stack Overflow. Thankfully, people continue to see value in having a knowledge base that’s very accurate. Increasingly over the past few years, it’s actually even become more valuable because you need really great context for AI agents and assistants to work. I’ve got plenty of examples, so we can talk about that. 

So, that’s where that response came from. I had been through it in a couple different dimensions prior to that. And just in terms of how I communicated to the team, the memo was actually like a series of memos. Every Friday, I send a company email. I just sent one right before I got on here. I am pretty transparent in those:  “Here’s what’s on my mind, here’s what we should be doing, here are some great things that happened, here are some people who demonstrated core values.”

I’ve done that religiously for… I’ve been at the company for six years, and I do that every Friday. The team basically knows what’s on my mind, and so it wasn’t one big memo to activate it. It was a series of emails leading up to this moment saying, “here’s what we’re going to be, we’ve got to respond to this, here’s what we’re thinking about now,” and so on and so forth. This went on until I could put the flagpole down and say, “Hey, by the WeAreDevelopers Conference, we have to produce a meaningful response on the public platform as well as on the enterprise front because it’s a great opportunity to integrate AI into our SaaS application, which is a different vector.” Hopefully that helps.

Did you actually type the words “code red?”

I think I definitely used “disruptive.” I used “existential moment.” I used all those things, but I don’t know if I used the exact words, “code red.”

[Laughs] I just think about that moment where you’re like, “All right, I’m going to hit the C and the O… I’m saying these words, it’s happening.”

It was very clear. We have a very specific communication cadence with the company, like many others. The tone and seriousness of what we were working on was very obvious to people, especially when you carve out resources and take people away from certain teams. People are going to ask, “Wow, what about my staff?” Here you go, there’s the reason. So, it becomes very obvious.

How did you make those decisions to pull people away? How did you decide which people, how did you decide which teams? Those are all trade-offs, right?

Yeah, no doubt. I think this is a hard problem to solve. You certainly want very talented people, but I think you want the types of people who are willing to break glass or go against the grain and not be encumbered by historical norms. I very specifically picked a combination of people. The people who are leading it were newer and came from the outside of the company because remember, we were going through a transformation. 

I joined a company that was engineering-led in 2019 and all about this public platform, and we were transforming into this product-led organization. We specifically appointed a newer person who had come from the outside, was interested in building highly innovative, fast iterating products, and had the DNA and the drive to do it.

I also personally stayed much closer to it. I, in fact, ran product for an interim period of time with that person reporting directly to me. That was another way to stay very, very close to what was happening on the ground until the actual launch. The rest of the team was a combination of very talented engineers, designers, and people who had context of how the site worked in the past and who could provide us with all the unlocks that we needed.

I think about Stack Overflow in what are probably two reductive terms in this context. You have inputs, you have outputs. The inputs are users answering questions. The outputs are the answers to those questions that people come and search for. There’s a whole community that makes that system run. The software platform manages that community with the moderators, but it’s really inputs and outputs. There are people who are asking questions and people who are answering questions. 

Both sides of that are deeply affected by AI. I think this brings us to the open web part of the conversation: the input side is being flooded by AI-generated slop. In 2022, you had to ban AI-generated answers on Stack Overflow. Then, on the output side, the ability for AI tools to supply the answers is overwhelming. 

So, let’s just break it into two parts. How did you think about the input side, where there’s a flood of people saying, “Oh, I can answer these questions faster than ever by just asking ChatGPT and pasting the answer in. Maybe that’s not good enough, but I can just do it.” Then, how did you think about the output side?

We noticed two things right out of the gate. One was the number of questions that were being asked and answered on Stack went through the roof because people started using, to your point, ChatGPT to answer these questions. That fueled this spike, which is kind of counterintuitive, but I think people just felt like, “Wow, I can game the system, so let me go do it.” Very quickly, we had to be extremely shrewd, and our community members are amazing at figuring out what’s real and what’s not. They were able to call out very quickly that these posts were actually ChatGPT-generated. That’s kind of what initiated the ban, which we completely supported and still support, by the way. You still cannot answer any of the questions on Stack Overflow with AI-generated content.

The reason for that, Nilay, is because our proposition is to be the trusted vital source for technologies. That’s our vision for the company. So for us, it’s all about making sure that there are only a few places where you can go and not deal with AI slop, where a community of experts have actually voted and curated it so you can trust it for various purposes. On the input side, it made sense to do that, and we continue to do that. 

Fast-forward a little bit to now, and we have created all sorts of new entry points onto the site, even though we’ve had high standards to ask a question on Stack Overflow. We just launched our AI Assist feature into general availability earlier this week, and it’s been super exciting to watch how users are using that. It is effectively an AI conversational interface grounded on our 90 million questions and answers. 

Then, there’s the ability for people to ask subjective questions, going back to our last conversation three years ago. Now people are able to ask open-ended questions because there’s a place for Q&A, which is the canonical answer to a question. There’s also a place for discussion and conversation because there’s so much changing. It’s not like all the answers have been figured out, so let’s just make sure that people have an ability to do that. That’s aligned with our mission of “cultivating community,” which is one of the three parts of our mission. The other ones are “power learning” and “unlocking growth.” So, we have done all these things to make sure that we’re not restrictive on the entry point and the question-asking experience.

The other thing on the answer side is that we realized it’s very important to go wherever the user is spending time. Now that the world has changed and people are in fact using Cursor AI and GitHub Copilot to write their code, our goal is to be the vital source for technology. So let’s show up wherever our users are. We’ve actually become a lot more headless. 

For example, we recently launched MCP servers for both our public platform and our enterprise product. What people are using our platform to do now is to not only invoke those MCP servers — let’s say if they’re writing code in Cursor and want to know the difference between version one and version two — but also to write back to our platform straight from Cursor if they want to engage and get a deeper answer, which is very unique in the industry.

So, that’s been our product principle: just go anywhere the user is. But ultimately, we just want to be that trustworthy vital source for technologists, whether it’s inside companies or outside companies.

How do you monetize in a world where you’re headless, where you’re just another database that someone’s querying from Cursor? How does that make you money?

We make money primarily in two ways. We have a third way, thankfully, but the third way is the smallest part, so I’ll start with the biggest. Our enterprise business, what we call Stack Internal, is now used by 25,000 companies around the world. Some of the world’s largest organizations, banks, tech companies, and retail companies use this product to share knowledge internally. Increasingly, they’re now able to use that trustworthy knowledge to power their AI assistants and AI agents to go do various things. 

A good example of this is Uber, which is a customer of Stack Overflow Internal and has Uber Genie. Uber has thousands of questions and answers on our platform. Uber Genie plugs into that content through our APIs, and then it’s able to go into things like Slack channels to automatically answer questions and drive productivity so that you’re not bothering people. It’s rooted in the organization’s knowledge on our platform.

So, the enterprise business is our primary business. The second business is our data licensing business, which we actually built only over the past couple of years. One of the things we also noticed was that a lot of the AI labs were leveraging our data for LLM pre-training and post-training needs, along with retrieval-augmented generation (RAG) indexing, so we put up a whole bunch of anti-scrapers. We worked with third-party companies, and very quickly we got calls from a lot of them saying, “We need access to your data. Let’s work together to formally get access.” We had to do that, and now we’ve struck partnership agreements with every single AI lab that you can think of, every cloud hyperscaler that you can think of — companies like Google, OpenAI — and even partnerships with Databricks and Snowflake, even though they’re not doing LLM pre-training. That’s been our second business more recently. 

And the third one, which is the smallest part of our company, is advertising. I think most people assume that Stack Overflow is supported entirely by advertising, but it’s only about 20 percent of our company revenue. We have a very captive, very important audience of developers who spend time on the site, so we have large advertisers that want to get their attention on various products. In fact, there’s a lot of competition now, so they increasingly want to do that. That’s how we make money. 

So, in the context of becoming headless, for us it’s about our enterprise product. It works on a subscription model with hybrid pricing. That’s how we make money there. The data licensing is similar in that if people want access, they’ve got to pay for that. Then, yes, advertising is limited to some of the largest companies, and they pay us for that. But there’s always going to be… I would say it’s an “and” versus an “or.” We’re not going to be completely headless. I think we just want to give the user the option to be headless. Plenty of people still come to the site, and in that case, we’re able to balance that out with these mechanisms.

Do you think new users are going to come to Stack Overflow? Stack Overflow is a product of the mobile era. There’s an explosion of software development. There’s an explosion of community. There’s a culture in the value of building apps and services, and there’s new tools. Stack Overflow is one of the central gathering points for that community in that era. 

New developers today might just open Claude Code, Cursor, GitHub, or whatever, and just talk to that. They might never actually venture out into a community in a similar way. Do you think you can get people to come to Stack Overflow directly and seek out answers from other people, or are they just going to talk to the AIs?

I think that for simple questions… By the way, when we saw the questions decline in early 2023, what we realized is that pretty much all those declines were with very simple questions. The complex questions still get asked on Stack because there’s no other place. The LLMs are only as good as their data, which is typically human-curated, and we’re one of the best places for that, if not the best, for technology. It’s still a very active site with a lot of engagement and a lot of monthly active usage. 

The questions being asked are quite advanced, I would say. What we’re also increasingly seeing through the new mechanisms that we’ve opened up… because to answer your question, we want to give people other reasons to come to the site besides just getting their answers. So, we have had to broaden our site’s purpose, hence the mission of “cultivate community, power learning, and unlock growth.” 

What we’ve done is open up new entry points, new ways for people to engage. We, for example, unlocked the ability for humans to chat with each other to get directional guidance. That’s been a very popular feature on the site where people are engaging with other experts. For example, we have people asking OpenAI API questions, and they can go into the OpenAI chatroom and engage with other people who have similar questions, or Python experts.

We also opened up the ability for people to demonstrate their knowledge with challenges, effectively like hackathons.  We’ve opened up a whole series of challenges, which are a very popular feature now. People spend time to go and solve these challenges that we post, and that way, they can showcase their understanding of the fundamentals, which I think is very important in terms of where the world is going. 

If people are just using vibe coding tools and code gen tools, companies bringing in young talent need to know that they’re not relying on people who only took the shortcut, but on people who also understand the fundamentals. We’re one of the few places where you can actually prove that you’ve learned the fundamentals. So, that’s the other reason why we’ve opened up these new mechanisms.

Then, there’s the third part of the mission, which is unlocking growth. There’s going to be a lot of job disruption because of all this. If people’s jobs are going to change quite dramatically, junior developers are going to need a home, even though I think it’s a shortsighted move by many companies to stop hiring them considering you need a pipeline. They’re going to need to connect with other people, to be able to progress, learn, and get jobs. Jobs are a very important part. We struck a partnership with Indeed this past year to partner on tech jobs. It’s all to broaden the scope of our site so that there are many reasons beyond asking questions. People still ask them, but we also want to give them more reasons to come to the site.

This comes to the big tension in all of this. I see it playing out in all kinds of different communities. I see it playing out in our own comments in a lot of ways. You want to build a community of people who are helping other people get better, and that is being disrupted on every side by AI. Communities that are built around people are pretty resistant to the incursion of AI.

This has definitely happened on Stack Overflow. Your moderators essentially revolted over limits on their ability to remove AI-generated answers as fast as they want to. When you partnered with OpenAI, a bunch of users started deleting content so it wouldn’t be fed into OpenAI for training, and you had to ban a bunch of them. How are you managing that balance? Because if you build communities around people, I would say the culture — right now anyway — is that those communities will push back against AI very hard.

I would say one of the most important things that we’ve focused on (and that I’ve spent time on over the past few years) is this whole push and pull of how we think about AI in the context of our site. Because it’s pretty clear to us that if we don’t modernize the site in the context of us leveraging AI as an entry point, it’s going to be less relevant over time. That’s not good. So, we’ve taken a very aggressive stance by incorporating AI into the public platform with AI Assist, which has been fantastic to see. I’ll walk you through the decision on why we did that. Then, we did the same thing on the enterprise side. 

If I think about the user base at Stack Overflow, it’s kind of like a big nation, right? We’ve got 100 million people, and there are definitely people on both sides of the spectrum. We have something called the 1-9-90 rule. One percent are the hardcore users who have spent a lot of time with their blood, sweat, and tears curating knowledge, spending their time on the site, and contributing. Nine percent are doing it in a medium way, and 90 percent are consuming and mostly lurking. 

We ask people on the site whether or not they’re using AI. Our own surveys basically say, if you took a look at the Stack Overflow 2025 Developer Survey, over 80 percent of our community members are using AI or intend to use AI. Eighty percent. But the trust level when they’re using AI is only about 29 percent. Only 29 percent of our user base actually trusts what’s coming out of AI, which is actually quite appropriate considering where we are because there should be skepticism of this new technology. 

So, there’s enthusiasm to try it but not to fully trust it. And with this 1-9-90 rule, I think what we have is a core group of users that are always going to be the protectors of the company’s original mission, which was to create this completely accurate knowledge base and do nothing more. Then, we have a very large number of people who are, let’s say, the next generation of developers, who are looking to leverage the latest and greatest tools. It’s very clear to us based on surveys and additional research that they want to use natural language as the interface to be able to do this.

It is the most meaningful change in terms of computer science development. If you look all the way back to object-oriented programming many decades ago, that wasn’t actually such a huge boom. It didn’t actually create this sort of change. But now, we’re in this moment where everything’s been unlocked. It’s a huge change effort, and we’ve had to decide to respect the original mission and keep accuracy at the heart of it. We’re not comfortable using AI for answers, for example, because it will generate slop. It hallucinates, hence the low trust score. But why don’t we incorporate natural language interfaces so that’s the preferred way to engage? So, we ended up doing that, both on the public side as well as on the enterprise side.

That’s been well received by the vast majority of users, but there will always be a vocal minority who will push back against incorporation. Beyond the site, there’s just a level of broader concern about what all this does to jobs, and what’s going to happen if we let the cat out of the bag. So, there’s that concern also, which is understandable.

Let me put a fine point on that. I think I understand that in a sharper way. If I am somebody in your 1 percent who spends a lot of time on Stack Overflow helping other people, the reason I answer questions for free on your platform, which you monetize in lots of ways, is because I can directly see that my effort helps other people grow and that I’m helping other people solve problems. That is one very self-contained dynamic. The last time you were on the show, our entire conversation was about that dynamic and how you got people to participate in that dynamic and the value of it.

Then suddenly, there’s a very clear economic benefit to the company that owns the database because it’s selling my effort to OpenAI, which is happening across the board. It’s going to do these data licensing deals with all these AI providers, which are going to train on the answers that I have painstakingly entered into this database to help other people, and now the next generation of software engineers is going to get autocomplete that’s based on my work, and I’ve gotten nothing. I’ve heard that from lots and lots of people. I’ve heard that in our own community, and I think I have felt that as various media companies have made these deals.

How do you respond to that? Because it feels like you were providing a database that you had to monetize in some ways, but the interaction people had was the value, and now there’s another kind of economic value that is maybe overshadowing, recasting, or re-characterizing the interaction that people have.

There are a couple of points there. One is about this company’s original DNA and why people came together to do this thing. When I joined the company, I asked a question like, “What’s people’s incentive to spend time doing this?” I asked the founders, specifically [co-founder] Joel Spolsky, about this. His point was that the software development community is very altruistic. People just want to help each other out because people understand how frustrating… I used to write code many years ago. I recently picked it back up with some of the code-generation tools, which is interesting to compare and contrast. I just remember how frustrating it was if you got stuck on something. Stack was a huge boon when it was created to unlock this. It was truly out of that. That was the reason. 

Even before ChatGPT, we also asked the question, “Should we incentivize users by paying them? Should we give them a monetary benefit?” That wasn’t something the user base was asking for. We went and researched it with people. People were not in it for the money. Plus, it complicates things because how do you judge the payment for a particular JavaScript question relative to a particular Python question? It goes down a rabbit hole, which is untenable. So that’s one: the original reason people got together was the mission.

Secondly, in terms of why we have to do this and whether it’s unfair. The primary reason we have to go down the licensing route is because the model of the internet has literally been turned upside down. I know you talk about this, Nilay, with the “DoorDash problem.” People relied on a model of the internet where users go to search engines and websites and you monetize off of ads. I really empathize with content sites that are heavily dependent on advertising because I think most content sites’ traffic is down 30 or 40 percent, something like that. There’s this huge sea change where companies that support these platforms have to… we’re a business ultimately. So, what do we have to do? We have to do what is necessary and adopt a new business model to survive, thrive, and do all the things.

Thankfully for us, we had an enterprise business, which is independent of all of this. Thankfully for us, we still had the advertising business, and large advertisers still cared about our community. So, data licensing only felt right in terms of making sure that we can effectively capitalize on the moment and be able to invest back into our community so that people who are there for the right reasons saw the benefits. We’ve invested with all these new features I just mentioned. Whether it’s these new content types, challenges, chat, AI Assist, any of these things, they all take resources to go and build. So, we had to go and leverage the funds we received to be able to go do that.

Now in the future, we may consider other ways. For example, should we pay our users, give them a piece of the data licensing revenue? Perhaps. We always ask that question. There are always ways for us to continue, but this is the current setup that we have right now. It’s balancing a lot of things.

You mentioned that to get to the data licensing deals, you had to put up a bunch of anti-scraper tools. You had to go into secondary and tertiary layers of the stack to get deals from Databricks and other kinds of providers. The AI companies were just scraping your site before. They probably still are. Whether or not they’re paying you, they’re probably still just going through the front door because all of them appear to be doing that. Did you have to say, “We’re stopping you,” and then go get the deal? Or did you say, “Hey, we know you’re doing this, but you have to pay us or we’re going to start litigating?”

It’s somewhere in between. We put up the anti-scrapers very quickly. We even changed the way in which people received our data dumps. Again, there’s a balance because we never wanted to prevent our community users from grabbing our data for their legitimate needs, like their school projects or PhD theses. So, we’ve continued to be open about our data for our community members, but they have to be community members, not companies looking to commercialize off the data.

We were very specific about the policy terms. We put up technology that prevented people from grabbing it, so we knew exactly who was scraping and who wasn’t. We reached out to some of those folks and said, “Look, stand down, because you’re putting a lot of pressure on our servers by doing what you’re doing, so take it easy here.”

But I think my characterization of those companies is that they don’t care. Some of them care and want to be good citizens, and some of them absolutely do not care and they would prefer the smoke. You can just categorize them. There’s a reason Amazon is suing Perplexity. They told Perplexity to stop it and Perplexity won’t. The New York Times, as we’re speaking today, is suing Perplexity. Then, there are other players acting in different ways and striking different kinds of deals. 

Walk me through one of those deals. When you went and struck your deal with OpenAI, was it, “We’re going to stop you, and if you want the door to be open again, you have to pay us?” Or was it, “You know this is wrong. We can take all the technical and legal measures, but we should actually just get to the deal correctly?” Walk me through that conversation.

We were already incorporating something like OpenAI into our product. Remember the code-red situation, when we were about to announce our AI response. So, we were actually using that technology to do what we had to do to incorporate AI into the public platform and our enterprise product. We had a relationship with them, and we also said, “Look, this is not going to work. It’s not tenable, and this is the new way of working. Maybe we need a new business arrangement for you to use the data. Let’s actually have a conversation.” And credit to them, they were very partner-centric around that. I was very impressed by OpenAI and companies like Google, which are all very open to engaging on this topic and wanted to be responsible AI partners.

They got it immediately, even before we asked them. It wasn’t this big, “Let’s go have this conversation from the ground up and justify why it had to be done.” We just said, “Look, this is what needs to happen because this is a new business model.” We got into the conversation pretty quickly: “What exactly are you looking for? What format of data do you want? Do you want to scrape the content? Do you want bulk uploads? Do you want API calls? What do you want?”

So, we got into that whole mix. And mind you, Nilay, these are recurring revenue-type deals. These are not one-time payments. If you want access and you want continued access in the future, you’ve got to keep paying, even with the historical data. So that’s how these are set up.

So yes, they were very collaborative partners. But you’re right. There are contradictory players. They say something, and their actions show otherwise in terms of how they’ve engaged. There are holdouts for sure, and people who are not exactly consistent with their word, and that’s unfortunate. I think every company like us has to decide what to do about that. We’re in various stages of these conversations with people on how to make sure we sensibly get them to do the right thing.

Now you have to name one of those companies. Who do you think is holding out differently than their public posture?

I’d rather not be specific, but all the usual suspects that you’re covering are the usual suspects that we are encountering. That’s how I would put that. 

Let me ask you about the recurring revenue piece and then I want to get into the Decoder questions because I think they’ll be illuminating after this conversation. There’s a sense that we’ve done all the pre-training that we’re going to do, right? Scraping the internet is not the future of these models, and there needs to be some other leap. 

Stack Overflow’s existing corpus of information is the valuable thing. There’s a lot of information. There are 20 years of stuff in that database. What’s the value of “you have to pay us again to train the next version of Gemini or GPT” versus the value of “there’s incremental information being added to the existing database?” Because that seems like a clear split to me.

The way we’ve thought about this is that every model that’s being trained is trained on some corpus of information. You’re going from GPT X to Y. If you’re leveraging our original data or some derivative of that from a prior model in the new model that you’re training, then you have to pay us for it. That’s effectively the legal requirement for doing that. So, it’s a cumulative aspect. Let’s not forget that. People have to pay for the cumulative data. It’s not just that it was used back in the day. And yes, relative to 20 years, one year’s worth of information is going to be less, but that’s why you’re getting 20 plus one. That’s the idea. So, that’s the way the legal agreement has been set up.

Is it per year? Is it that every year’s worth of data is a chunk of money? How does that work?

No, it’s cumulative, like the whole corpus: historical data as well as anything going forward for the following year. All that is one accumulated data set, and that’s effectively charged as one.

So, this year’s data doesn’t get pulled into the data set for Gemini 3, which just came out, right? Every new question and answer on Stack Overflow since Gemini 3 came out is not incorporated into Gemini 3’s training.

Correct.

So you’re kind of betting that they’re just going to train ever-bigger models. Is that how it’s structured in your mind?

Yeah. And some companies have asked for wider use cases. There are pre-training use cases. Even beyond that, you can leverage the data in many different ways for AI and non-AI use cases, like search use cases. But correct. There may be scenarios where larger models are built, and our data is going to be useful for those scenarios. But there’s going to be RAG indexing, post-training needs, all sorts of scenarios. It’s quite interesting to see some of the frontier labs ask for very specific slices of data that they find useful.

Remember, there’s not only questions and answers. We’ve got the comment history, the metadata history, the voting history, the history of User A going down this path. So, it’s a lot of excellent context for things like reasoning and being able to mimic the human brain. It’s almost like one human brain that’s been documented. 

This, I think, brings me to the Decoder questions. You’ve restructured the company. There have been some rounds of layoffs. You’ve refocused on the SaaS business in a real way. I think we should talk about that. But there’s the idea that we’re going to train ever-bigger models and that will be the growing part of the business, versus wanting some slices, versus RAG actually being the future for a lot of these other businesses. You would make different decisions based on which one of those is going to grow faster, and I don’t think anybody knows. Maybe you know. You can tell me if you know, or if you know someone who knows.