Pinecone Now Valued at $750M, Arguably the Most Important Element in the Modern Data Stack
Funding announcement posts are often full of over-the-top venture capitalist claims about vision, foresight, and category mastery. I won’t do that here (or will I?). Instead, I’ll talk about our connection to Pinecone—stories that go back over a decade with the founding team, leading to the news today: Pinecone has raised a $100M Series B led by A16Z—with explosive growth justifying their new $750M valuation.
Along the way, I’ll tie in some Avengers analogies. (I would reference Star Wars but I couldn’t figure out who’d be Darth Vader.)
They say partnership in venture capital is everything. Thankfully, my partnership with founder Edo Liberty and CTO Ram Sriharsha goes back more than 10 years.
Edo = Tony Stark
I first met Edo when he was at Yahoo’s research lab and I was leading engineering teams, some of which were using Hadoop to count unique users of Yahoo by counting cookies. Yahoo assigns a unique cookie to each browser instance on a machine; the number of cookies in a given day is the union of cookies across multiple browsers, incognito sessions, robots, and cookie clearing, and it can reach the high billions of uniques. “select count(distinct(cookies))” at that scale isn’t fun, especially when the underlying JVM has run out of heap.
We wanted something superior and, of course, reached for HyperLogLog. Dissatisfied, we instead extended stochastic streaming algorithms into DataSketches, which is now a popular OSS project. After scientifically solving big data problems at Yahoo, Edo eventually went on to run AI Research Labs at Amazon. I have always considered him a dynamic, multi-talented, and brilliant person, with an eye for what’s next but a pragmatic approach. He is also someone who lives life to the fullest (I’m excited to use this round to invest in bubble wrap to protect him from his extreme sports hobbies). He’s similar to Tony Stark, except Edo loves his family and other people.
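To make the distinct-counting problem above concrete, here is a minimal sketch of one well-known streaming technique, the K-Minimum-Values (KMV) estimator. It is an illustration only, not the Yahoo or DataSketches implementation; the class name `KMVSketch`, its methods, and the choice of SHA-1 as the hash are all assumptions made for this example. The idea is to keep only the k smallest hash values ever seen, so memory stays fixed no matter how many cookies stream by.

```python
import hashlib
import heapq

HASH_SPACE = 2 ** 64  # hashes are treated as integers in [0, 2**64)


class KMVSketch:
    """Hypothetical K-Minimum-Values sketch for approximate distinct counts."""

    def __init__(self, k=1024):
        self.k = k
        self._heap = []        # negated hashes, so heap[0] is the largest kept hash
        self._members = set()  # hashes currently kept, used to skip duplicates

    @staticmethod
    def _hash(item):
        # First 8 bytes of SHA-1 as a roughly uniform 64-bit integer.
        digest = hashlib.sha1(item.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def _add_hash(self, h):
        if h in self._members:
            return
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, -h)
            self._members.add(h)
        elif h < -self._heap[0]:
            # New hash is smaller than the largest kept one: swap them.
            evicted = -heapq.heappushpop(self._heap, -h)
            self._members.discard(evicted)
            self._members.add(h)

    def add(self, item):
        self._add_hash(self._hash(item))

    def merge(self, other):
        """Return a new sketch estimating the union of both input streams."""
        merged = KMVSketch(min(self.k, other.k))
        for h in self._members | other._members:
            merged._add_hash(h)
        return merged

    def estimate(self):
        if len(self._heap) < self.k:
            return float(len(self._heap))  # exact while under capacity
        kth_smallest = -self._heap[0]
        # If k hashes fit below kth_smallest, density suggests this many overall.
        return (self.k - 1) * HASH_SPACE / kth_smallest
```

Because two sketches can be merged without revisiting the raw data, per-partition sketches can be built in parallel and combined afterward, which is the property that makes this family of algorithms a good fit for unions across large, distributed logs. With k = 1024 the relative error is typically a few percent.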
Ram = Vision
Ram and I have a parallel but distinct story. Often, engineers are described as being “10x” developers. Ram isn’t a 10x developer; he’s a 1000x developer. His intellect reminds me of Vision from the Avengers, with a giant, caring heart inside, except Ram is human!
Working on the data team together, we were dissatisfied with Hadoop’s performance and wanted more. We went so far as to rewrite the whole thing in C++ with a custom file format that looks precisely like Parquet (including metadata in the footer). Having sniffed around the literature for a better way, we discovered a project at the UC Berkeley AMPLab named Spark. We were intrigued by the graph processing model and immediately hopped on the next BART train to Berkeley to meet with Ion Stoica, Matei Zaharia, and Reynold Xin. In rapid succession, we sponsored the lab and hired some of their grad students as interns at Yahoo. From that, Databricks was born, formed by the AMPLab team. Ram became an early employee at Databricks and one of their most important engineers.
Fast-forward to 2021: I was CTO at Splunk, and Ram was running our machine learning and security research teams. I left to work at Menlo Ventures. Ram stayed, but we chatted often. I wanted to found or incubate a company with Ram, and we quickly landed on vector embeddings, either applying them to cybersecurity problems or as a database. Ram was still in contact with Edo since they had worked closely together in the past. When he learned Edo had started a vector database company, Ram joined Pinecone right away.
AI/Data Architecture Changes: That Spark/Databricks Feeling Hits Different
At that point, I knew we had another inflection point in data and AI. I knew this feeling because I had had it before: it felt exactly like the day we took BART to Berkeley and met the Spark team that formed Databricks.
Vector embedding databases were always going to be the future of data. Vectors are the new oil, just as folks once said that data was the new oil. Embeddings are a richer, higher-fidelity way to represent any data, structured or unstructured. Semantic search is clearly superior to lexical search and is going to change the search category for decades. The next great enterprise companies in security, observability, sales, marketing, and more will all be built on embeddings.
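A toy example may help show why semantic search over embeddings behaves differently from lexical search. The sketch below is illustrative only: the three-dimensional vectors are hand-made stand-ins (real embeddings come from a trained model and have hundreds or thousands of dimensions), and the names `cosine_similarity`, `lexical_overlap`, and `semantic_search` are hypothetical helpers, not part of any Pinecone API.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def lexical_overlap(query, document):
    """Number of whole words the query and the document share."""
    return len(set(query.lower().split()) & set(document.lower().split()))


# ASSUMPTION: made-up 3-d "embeddings" chosen by hand for illustration.
DOCS = {
    "how to repair a car engine": [0.9, 0.1, 0.0],
    "best pasta recipes for dinner": [0.0, 0.2, 0.9],
    "stock market news today": [0.1, 0.9, 0.1],
}

QUERY = "fixing my automobile"
QUERY_VECTOR = [0.8, 0.2, 0.1]  # also hand-made; sits near the car document


def semantic_search(query_vector, docs, top_k=1):
    """Rank documents by cosine similarity to the query vector."""
    ranked = sorted(
        docs,
        key=lambda text: cosine_similarity(query_vector, docs[text]),
        reverse=True,
    )
    return ranked[:top_k]


if __name__ == "__main__":
    for text in DOCS:
        print(text, "| shared words:", lexical_overlap(QUERY, text))
    print("semantic top hit:", semantic_search(QUERY_VECTOR, DOCS)[0])
```

The query shares no words with any document, so a purely lexical match finds nothing, while the nearest vector still surfaces the car repair document. A vector database does the same nearest-neighbor lookup, only over far more vectors and with indexing so it does not have to scan them all.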
The idea that a company could build a database for vectors in the cloud, as Snowflake did for OLAP, was a mind-blowing opportunity that was both impossibly technically challenging and lucrative. If anyone could build a Snowflake-like cloud database with separation of storage and compute, vertical/horizontal scaling, CRUD semantics, and a custom vector storage layer, it was going to be Edo, Ram, and the Pinecone team.
Closing the Deal
When I learned Ram had joined Pinecone, I made it my mission to get in front of it. I quickly connected with Edo. After exchanging ideas about the art of the possible with vector databases and sharing a few dinners (including one with Edo’s wife), we eventually reached a deal. Menlo led their Series A in December of 2021.
$17 million at $170 million post in December 2021 for a vector database when nobody understood vector embeddings sounded bold.
We were okay with Menlo looking crazy at that point. It was clear: Pinecone would be an anchor piece in the architecture of AI. Though we couldn’t have predicted the timing of the generative AI hype (crypto was dominant at the time), we did know that Pinecone would be fantastic due to semantic search, applications in machine learning, and, eventually, language models, like the ones we are all in love with today.
Pinecone was already going to be a massive hit based on semantic search alone. However, with the rise of LLMs, developers quickly realized that hallucinations and a lack of model freshness, caused by the untenable pairing of model size and cost, were real problems. Pinecone filled that gap immediately, to the point that the pairing of OpenAI and Pinecone became “a thing” now known as the OP stack.
That combination sparked incredible and explosive growth at Pinecone. It is clear that vector databases will be one of the key anchor elements of the modern AI data stack, and that Pinecone is the emerging category leader with a proven team. I’m incredibly proud and excited to be on the journey with Edo and Ram. We’re also thrilled to welcome Peter Levine and A16Z to the team as we continue to design the future of AI with Pinecone.
PS: To celebrate this milestone, I cleaned up and promoted the Julia Pinecone API (Pinecone.jl) to 1.0. Thank you to the amazing Pinecone team for keeping me on my toes by using every HTTP 20x status code that I didn’t know existed! Silly me to hardcode HTTP 200 when HTTP 202 would be better!
Also, if you’re a Pinecone user, check out the Pinecone command line interface I wrote, which helps you manage indexes and CRUD against data.
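On the status-code lesson in the PS above, one way to avoid hardcoding a single value is to accept the whole 2xx success range. This is a minimal Python sketch of the idea, not the Pinecone.jl code; the helper name `is_success` is hypothetical.

```python
from http import HTTPStatus


def is_success(status_code):
    """Return True for any 2xx status, not only 200 OK."""
    return 200 <= int(status_code) <= 299


if __name__ == "__main__":
    for status in (HTTPStatus.OK, HTTPStatus.ACCEPTED, HTTPStatus.NOT_FOUND):
        print(int(status), status.phrase, is_success(status))
```

A client written this way treats 201 Created and 202 Accepted the same as 200 OK, and still rejects redirects and errors.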