For those of you who may have been living under a rock for the last year, Artificial Intelligence (AI) and Machine Learning (ML) are now at the center of almost any conversation about technology, thanks to the rapid pace of innovation led by tools like OpenAI’s popular conversational chatbot ChatGPT, Stability AI’s Stable Diffusion, and one of Google’s newest AI offerings, Gemini. Because they are accessible and have diverse use cases, these tools have helped businesses and even the most casual internet users streamline productivity, increase customer engagement, and generate content. But while we often hear about how these tools are being used (and misused) in the news, we hear a lot less about the open source software underpinning these tools and others like them.
Many of the open source projects powering AI are written in Python. Why? A recent TechRepublic article listed a few of the top reasons why Python is powering many ML and AI initiatives:
- “Being free and open-source makes Python community friendly and guarantees improvements in the long run
- Exhaustive libraries ensure there is a solution for every problem
- Smooth implementation and integration make it accessible for people with varying skill levels
- Increases productivity by reducing the time to code and debug
- Can be used for Soft Computing and Natural Language Processing as well
- Works seamlessly with C and C++ code modules”
According to the same article, the packages below are the top Python libraries and frameworks used in ML and AI today.
- NumPy [included in the Tidelift Subscription], which stands for Numerical Python, is used in nearly every field of science and engineering to perform mathematical and logical operations. It is widely relied on in the education sector and across game, software, and web development. (source)
- SciPy [included in the Tidelift Subscription], which stands for Scientific Python, is built on NumPy and is geared toward statistics and signal processing. (source)
- Pandas [included in the Tidelift Subscription] is a library that offers powerful, flexible, and easy-to-use data manipulation and data analysis capabilities. Pandas is especially useful in data wrangling (also known as munging), the process of taking data from an unusable state to the more structured one needed for processing. (source)
- Matplotlib [included in the Tidelift Subscription] is used for creating static, animated, and interactive data visualizations in Python. It also offers an open source alternative to MATLAB, a programming language used for complex calculations. (source)
- Scikit-learn [included in the Tidelift Subscription] (also known as sklearn) is a machine learning library built on top of other Python libraries in this list, like NumPy, SciPy, and Matplotlib. It provides a vast assortment of algorithms and tools for ML visualizations, preprocessing, model fitting, and evaluation. (source)
- TensorFlow is an open source library released by Google in 2015 for building and training machine learning models. It also aids in data automation, model tracking, performance monitoring, and model retraining. (source)
- Keras is a high-level API, bundled with TensorFlow, that provides a simple and easy interface for ML solutions. (source)
- PyTorch is used to train deep learning models for tasks such as image recognition, reinforcement learning, and language processing. (source)
- Plotly, like Matplotlib, is used for creating interactive graphs and data visualizations in Python. One difference is that Plotly often requires fewer lines of code to create plots. (source)
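To make the list a bit more concrete, here is a minimal sketch of how three of these libraries are commonly combined: NumPy to produce numeric arrays, pandas to wrangle them into a clean table, and scikit-learn to fit a model. The dataset, the column names (`hours`, `score`), and the choice of linear regression are all assumptions made up for illustration; none of them come from the TechRepublic article.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# NumPy: generate a small synthetic dataset (hypothetical values).
rng = np.random.default_rng(seed=0)
hours = rng.uniform(0, 10, size=50)
scores = 5.0 * hours + 20.0 + rng.normal(0, 1.0, size=50)

# pandas: hold the data in a labeled table and do some light wrangling.
df = pd.DataFrame({"hours": hours, "score": scores})
df.loc[3, "score"] = np.nan  # simulate one missing measurement
df = df.dropna()             # drop rows with missing values

# scikit-learn: fit a simple linear model on the cleaned table.
model = LinearRegression()
model.fit(df[["hours"]], df["score"])

slope = float(model.coef_[0])
intercept = float(model.intercept_)
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```

Real projects use far messier data and more capable models, but the hand-off from arrays to a table to a fitted model tends to look similar.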
How does Tidelift intersect with the AI world?
Looking at the tools listed above, over half are included in the Tidelift Subscription. That means Tidelift directly partners with the maintainers of those libraries, and pays them, to ensure they adhere to secure software development practices like those defined in the U.S. government’s NIST Secure Software Development Framework and the OpenSSF Scorecard project.
Because they are being paid by Tidelift and its customers for important security work that maintainers have historically been expected to do for free, Tidelift’s partnered maintainers are able to make time to do the sometimes extensive work required to ensure their packages are secure and well maintained.
For example, according to Tidelift’s 2023 open source maintainer impact report, 100% of open vulnerabilities have had a fixed release made and/or documented mitigations to address known risk, 94% of lifted packages have a discoverable security policy, and 82% of packages have a documented maintenance plan.
As organizations work to adopt AI and ML into their user-facing products, there needs to be an equal effort to ensure that these tools (and their dependencies) are safe to use, especially if they touch Personally Identifiable Information (PII) or other sensitive information.
We’re excited that Tidelift’s maintainer partners are working on these important tools that power the latest innovation in AI and ML. With the support of our customers who rely on these tools, we are committed to ensuring maintainers have the time and money to continue their important work.
Additional Resources: