Open source software security has gained the attention of governments in the U.S. and E.U., especially in the wake of the Log4Shell vulnerability. With this new focus comes a growing list of proposed rules and regulations to address current and potential security weaknesses. The overarching purpose of these regulations is to improve cybersecurity for government, business, and citizens. But they also bring a new set of issues for open source maintainers, such as whether liability for security issues falls on the maintainers and whether they have the time and motivation to comply with what many see as unfunded mandates. Last week, Tidelift co-founder and General Counsel Luis Villa sat down with FLOSS Weekly host Doc Searls and guest Simon Phipps to discuss what comes next with government regulations, where machine learning (ML) and artificial intelligence (AI) fit in, Wikipedia, and more. Below we share highlights from their entertaining discussion. To hear it in full, listen to the episode on FLOSS Weekly.

Wikipedia, open source, and paying the maintainers

Doc: You were telling us earlier, before we started—well I had always thought that citations [in Wikipedia] were an original thing, having written and edited many Wikipedia pages, and worked hard to make sure citations are everywhere. I did not know that was not there from the start.

Luis: Citations were baked in in year three, or four. This is one of the things that I love to tell open source people about Wikipedia, because Wikipedia is very open source-y in a lot of ways. Open source people who bring their experience and preconceptions to Wikipedia are sometimes sort of surprised. Colloquially we were like the Linux community. But of course, if you talk to a Linux maintainer, it's actually like the networking drivers community and the storage layer community, and they sort of interact a little bit, and Wikipedia is the same way. English Wikipedia, German Wikipedia: within each of these Wikipedias there are sub-communities. There's actually still no one way to do citations in English Wikipedia; there's like six last time I counted, and it's slightly different in German and in Spanish. And that drives external people [crazy]. They're like, what's the API for citations? I have bad news for you. It's an IRC channel and you put a bot in the IRC channel. That's actually been fixed now, but that was the case when I was at Wikipedia about ten years ago.

Doc: I don't do them [citations] so often that I remember how to do them every single time, and it's almost always copy it from somewhere else, or I look at what somebody else has done.

Luis: In one of these awesome examples of the power of open source: have you ever heard of Zotero? Zotero is an academic tool that helps researchers collect and store their citations. Wikipedia now runs a copy of Zotero in server mode, and in the VisualEditor you can give it a URL and it will attempt to pull out the author, publication date, all the stuff that you want in a citation, and put it into a properly formatted citation, all through the magic of collaboration between Zotero and a programming team. There was no standard for Wikitext; the standard was the PHP implementation. So this heroic team reverse engineered Wikitext from the PHP implementation into a specification reimplemented in JavaScript, so they could do the VisualEditor. It’s one of those things where you're like, how hard could it be? How hard could a WYSIWYG editor be? And the answer is many years of some of the smartest programmers you know, but you would never know that behind the scenes. This might be a great transition for those wondering what we do at Tidelift.

Doc: Go for it. Tell us about that.

Luis: One of the ways in which open source and Wikipedia are very parallel is that these are long-lived things now. We have to think about, how does this get maintained over time? People's interest waxes and wanes. Not all articles are [well maintained]. For example, the person who started my Wikipedia article now has three kids and doesn't edit Wikipedia anymore.

Tidelift is a way of thinking about this maintenance problem over the long term for open source. We observed that our average customer has about 4000 packages at Tidelift. Doc, Simon—when we all started, you could count all your packages on one hand, right? You could establish a relationship with these communities by dropping them an email. Now, if you've got 4000 packages as part of your production setup, you can't build human relationships with them. So how do you know that they're going to still be there tomorrow, the day after tomorrow—ten years from now?

The answer is you have to think creatively, and that's where we at Tidelift come in. We build relationships between our enterprise customers and, for those of you who have seen the XKCD comic, the person in Nebraska who's maintaining the one little stick. We build relationships with them, to encourage them and pay them out of our customers' money to keep doing what they do. Because they are deeply under-appreciated: that person in Nebraska is a small cog in a big machine, and the way open source works these days, they often get a lot of requests and not a lot of love. We bring in paying them to help make those pieces more durable.

Doc: Tell me some more about Tidelift, because it fascinates me that you're building relationships with communities. I know, there are countless millions of communities—there's communities of communities that don't have any ontology. The relationships between them are all entirely random. And then over here, you're taking some money from one of Tidelift’s customers, and you're somehow going to send it to the right people over here. How does that all work? It blows my mind to think that you might even attempt it.

Luis: Some days it blows my mind that we attempted it too, so you're not alone there. The basic model, you can almost think of as Spotify. We measure what our customers are using in order to identify what packages we need to reach out to first. To some extent, of course, they're all using some things like Kubernetes. The amount of support we can offer to something like Kubernetes, that is already supported by a bunch of the biggest companies in the world very directly and by a flourishing commercial ecosystem, is relatively very small. That's not what our customers need, and not what we focus on.

Instead, it turns out, as soon as you start going down that long tail, you get to packages with one or two maintainers very, very quickly. Some of the biggest packages in the world, like libcurl or a bunch of the JavaScript ecosystem, basically rely on one person. We identify those packages where you, as a customer, get a lot of maintenance bang for the buck, by saying, hey, you and all of our other customers are using this JavaScript framework, we're gonna go find that person and give them money. We have several of our top maintainers making six figures from us now. Lots are making smaller amounts. I don't want to say that every listener who has any kind of package should run and sign up. You can check; our webpage does allow you to check.

That's really where the sweet spot is: those large packages that are widely used but usually have one or two maintainers. We think we can make a big difference there in the long-term sustainability and viability of those projects.

Government regulation and AI

Just last week, the Biden-Harris administration announced an executive order that sets out to address AI regulation, including a blueprint for an AI Bill of Rights. Like the cybersecurity executive orders before it, the order proposes next steps and calls on the community for support and guidance. Doc, Luis, and Simon discuss what this regulation means, regulations of the past, and hopes and worries for the future of regulated AI.

Doc: Let's dig a little deeper into government and regulation. I tried reading through what came out of the White House, and all I could think of was, it's way too early for this. This is no fun. Is it going to take away a lot of fun? There are real threats, but do we need bureaucracy right away on this thing? And I'm coming from the libertarian spirit that actually animates Silicon Valley and most of the open source world: it's a lot of individuals scratching their own itch and having fun doing cool things. Do you want to throw sand in those tires?

Simon: We watched the web being a wild west for twenty years, and then finally it comes time for the European Union and for California to lead to some legislation about it. The European Union makes GDPR and GDPR is a steaming mess that really benefits nobody. The way the web works is now so much a part of people's economic lives that you can't fundamentally disrupt it in the way that you would have done if you had started legislating much sooner and let the legislation evolve in the places where it turned out to not be quite right.

Though, I'm really pleased to see both the US and Europe legislating around AI. I'm actually quite pleased with the Cyber Resilience Act (CRA) that is legislating for the protection of users of devices, because you know they're not going to get it right now—they are legislating early in the life cycle, but that will mean that there is a legislative framework that we can tell them is wrong. Rather than letting everybody go invent new ways of stealing money from the general public, and then in fifteen years, someone comes along and weakly says, oh, we really ought to be legislating some of this. So I'm quite pleased to see this, though having said that, I'm not very pleased about the CRA and the way that it actually looks.

Luis: Doc, to your point about libertarianism, one of the good questions that techno-libertarianism raises is, are these folks competent to be legislating this? And the cookie popups drive me bonkers, because as Simon says: privacy legislation, good! Getting permission for every cookie on the internet? That was obviously a terrible idea fifteen years ago, and the fact that it's still going on is a triumph of—

Doc: It’s tokenism. The legislation was being negotiated at a point when it was not possible to do the real thing that needed doing, which was to regulate surveillance capitalism. Instead, it all got reduced down to an argument over cookies. The token thing that everybody could agree did the least harm to everybody's business models was cookie pop-ups. It's not there because it does anybody any good. It's there because it's the avatar for doing the least harm to everybody else who wants to carry on surveilling the general public and stealing their identity and information in order to make money.

Luis: Every time I see a cookie pop-up, I cringe at the thought of, ‘and now they're gonna figure out large language models.’ But the alternative of starting earlier is not ideal; I think this is just too central to the economy, right? One of my favorite books about the law is called The Accidental Republic and it's about how the U.S. came to regulate railroads. Railroads went from this thing that the law analogized to horse-driven carriages, and so there was very little liability on them, because how much damage could a horse-driven carriage really do? At its peak, all the major railroad systems had their own hospital systems throughout the U.S. because they maimed and killed so many people every year, like Civil War levels of carnage, from railroad accidents. At some point, the U.S. government sort of invented modern regulation in order to regulate the railways. It's because they were so central, and so huge and so powerful. And software tech is the same way. The question is not, are we going to be regulated? The question is, how are we going to be regulated? And do we engage with that, and in a healthy way?

— — — — — — —

To hear more of their thoughts on the proposed government regulation, AI bias, and how Wikipedia communities and open source are birds of a feather, listen to the episode in full on FLOSS Weekly. For even more on government regulation and open source software, read Tidelift’s response to the Office of the National Cyber Director’s Request for Information, where we discuss how and why supporting open source maintainers is the best way to improve open source software security. And to keep up to date on government regulations, subscribe to the Tidelift newsletter and check out our government open source cybersecurity resource center.

