Anacondas and Machine Learning
All about Anaconda: what it's made up of and why it's so important
"It is essential to have good tools, but it is also essential that the tools should be used in the right way." -Wallace D. Wattles
Summary
- Anaconda is a software package that distributes the popular Python and R libraries used for Machine Learning, such as TensorFlow, pandas, and scikit-learn.
- The Anaconda software is made up of 3 core components: Anaconda, Miniconda, and conda.
- Anaconda is a full distribution that has all the major Machine Learning Libraries pre-installed.
- Miniconda is a lightweight version of the Anaconda distribution: unlike Anaconda, it does not have Machine Learning libraries pre-installed.
- Conda is responsible for installing software packages, updating software packages and deleting software packages from your system.
Anaconda is probably the first thing you need to install on your system if you are serious about Machine Learning. It provides all the tools you'll need for Machine Learning in a single place.
If you have no idea what Anaconda is but you want to get started with it, then this article is for you.
Let us Begin
What is Anaconda
According to Wikipedia, Anaconda is a distribution of the Python and R programming languages for scientific computing (data science, machine learning applications, large-scale data processing, predictive analytics, etc.), that aims to simplify package management and deployment.
In simple words, Anaconda is a software package that distributes the popular Python and R libraries used for Machine Learning, such as TensorFlow, pandas, and scikit-learn. It makes it very simple for you to download and manage these libraries, and also to manage the environments where you use them.
What makes up Anaconda
The Anaconda software is made up of 3 core components: Anaconda, Miniconda, and Conda. Below we'll get to understand them more clearly.
Anaconda
Remember, earlier we spoke about how the Anaconda software helps you download and manage libraries, and also manage the environments where you use them. Anaconda here is simply the full distribution of the Anaconda software, which comes with Python and the major Machine Learning libraries installed.
In simple words, Anaconda is a full distribution that has the major Machine Learning libraries pre-installed. So when you download Anaconda, you don't need to separately install tools like scikit-learn, NumPy, pandas, or Matplotlib, because they are already installed. Some larger frameworks, such as TensorFlow, may not be part of the default installation, depending on the version, and can be added afterwards with conda.
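If you want to see for yourself which libraries your installation already has, here is a minimal sketch in Python. The list of names and the `check_installed` helper are my own choices for illustration, not something that ships with Anaconda. Note that the import name can differ from the package name (scikit-learn is imported as `sklearn`).

```python
import importlib.util

# Import names of a few libraries mentioned in this article (illustrative list).
LIBRARIES = ["numpy", "pandas", "sklearn", "matplotlib", "tensorflow"]


def check_installed(names):
    """Return a dict mapping each import name to True if Python can locate it."""
    # find_spec looks the module up without actually importing it.
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_installed(LIBRARIES).items():
        print(f"{name}: {'installed' if found else 'missing'}")
```

What gets reported as installed or missing depends entirely on your own environment, so don't be surprised if your results differ from someone else's.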
Miniconda
Miniconda, just like Anaconda, is a distribution of the Anaconda software, but it's more of a lightweight version: unlike the Anaconda distribution, Miniconda does not have Machine Learning libraries pre-installed.
In simple words, when you install the Miniconda distribution you don't get any Machine Learning libraries along with it. All you get is a Python environment and conda, the Anaconda software's package manager (which we will talk about next).
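After installing either distribution, a quick way to confirm you have a Python interpreter and the conda tool is the small sketch below. It only uses Python's standard library. The `describe_setup` function is a hypothetical helper I made up for this article, and it assumes conda, if installed, is reachable on your PATH.

```python
import shutil
import sys


def describe_setup():
    """Report the running Python version and where (if anywhere) conda is found."""
    version = ".".join(str(part) for part in sys.version_info[:3])
    conda_path = shutil.which("conda")  # None when conda is not on PATH
    return {"python_version": version, "conda_path": conda_path}


info = describe_setup()
print("Python version:", info["python_version"])
print("conda executable:", info["conda_path"] or "not found on PATH")
```

If conda shows up as not found, it may simply mean the installer did not add it to your PATH, so treat this as a hint rather than a verdict.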
Conda
Unlike Anaconda and Miniconda, Conda is not a distribution but rather a package manager.
To truly understand the difference between Anaconda, Miniconda and Conda let's look at the difference between a software distribution and a package manager.
A software distribution is a collection of packages (such as libraries), pre-built and pre-configured, that can be installed and used on a system. A package manager is a tool that automates the process of installing, updating, and removing packages.
In other words, Anaconda and Miniconda are responsible for providing software packages like TensorFlow, scikit-learn, pandas, PyTorch, etc., while Conda is responsible for installing, updating, and deleting software packages on your system.
I'll further illustrate using an analogy. Imagine you go to a toilet shop to get stuff for your toilet: pipes, toilet seats, a water heater, a sink, etc. That shop is your software distribution (Anaconda or Miniconda). Now you need someone to install the things you bought (the software packages that come from a software distribution), so you call a plumber (the package manager) to install your stuff.
Let's say you just found out about a newer, cooler version of your current toilet seat, and you want your current one updated or removed completely (updating or removing software packages). You call the plumber once more. In simple, non-technical terms, that's how package managers and software distributions work together.
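The three jobs of the package manager map onto three conda subcommands: `conda install`, `conda update`, and `conda remove`. The sketch below only builds and prints those commands so you can see their shape. The `conda_command` helper is hypothetical, and `pandas` is just an example package.

```python
import shlex


def conda_command(action, package):
    """Build the argument list for a basic conda package operation."""
    allowed = {"install", "update", "remove"}
    if action not in allowed:
        raise ValueError(f"action must be one of {sorted(allowed)}")
    return ["conda", action, package]


# Print the command for each of the three operations on an example package.
for action in ("install", "update", "remove"):
    print(shlex.join(conda_command(action, "pandas")))
```

To actually run one of these, you could type the printed line into your terminal, or pass the list to `subprocess.run(...)` on a machine where conda is installed.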
Difference between the Anaconda and Miniconda Distributions
I will try to differentiate these two distributions using another analogy. Anaconda is like a tool shop, which has a lot of tools, including those you might not need at the moment, while Miniconda is like a workbench that holds only the essential tools you need.
In other words, when you install the Anaconda distribution, you get a whole lot of Machine Learning libraries, including those you don't need, just like a fully stocked tool shop.
Meanwhile, when you install the Miniconda distribution, you install only a Python environment and the Conda package manager, which you can use to install the tools you find useful, just like a workbench.
Which to Download between Miniconda and Anaconda
To answer objectively, the core difference between the two is that Anaconda comes with a lot of Machine Learning packages, while Miniconda comes with just the Conda package manager and a Python environment.
I'll show you the difference between the two distributions so you can choose what suits your needs.
I got these differences from Stack Overflow
Choose Anaconda if you:
- Are new to conda or Python
- Like the convenience of having Python and over 1500 scientific packages automatically installed at once
- Have the time and disk space (a few minutes and 3 GB).
- Don't want to install each of the packages you want to use individually.
Choose Miniconda if you:
- Do not mind installing each of the packages you want to use individually.
- Do not have time or disk space to install over 1500 packages at once.
- Just want fast access to Python and the conda commands, and wish to sort out the other programs later.
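If disk space is the deciding factor for you, the rough rule of thumb from the lists above can be sketched in a few lines of Python. This is only an illustration: the `suggest_distribution` function is hypothetical, and the 3 GB threshold is simply the figure quoted above, not an official requirement.

```python
import shutil

ANACONDA_MIN_FREE_GB = 3  # figure quoted in the list above, used as a rough threshold


def suggest_distribution(free_gb, want_everything_preinstalled):
    """Suggest a distribution from free disk space and personal preference."""
    if want_everything_preinstalled and free_gb >= ANACONDA_MIN_FREE_GB:
        return "Anaconda"
    return "Miniconda"


# Measure free space on the drive that holds the current directory.
free_gb = shutil.disk_usage(".").free / 1024**3
print(f"Free space: {free_gb:.1f} GB")
print("Suggestion:", suggest_distribution(free_gb, want_everything_preinstalled=True))
```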
In summary, don't take these as orders but rather as suggestions. I personally use Miniconda because I love to have control over what I have on my system. But my choice is purely personal and not based on any special technical insight or expertise.
Why should I make use of the Anaconda Software
Below I'll state the benefits of the Anaconda Software and its distributions.
- It is free and open-source.
- It has more than 1500 Python/R data science packages.
- Anaconda simplifies package management and deployment.
- It has tools to easily collect data from sources using machine learning and AI.
- It creates an environment that is easily manageable for deploying any project.
- Anaconda is the industry standard for developing, testing and training on a single machine.
- It has good community support, so you can ask your questions there.
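To make the point about manageable environments a bit more concrete, here is a small sketch of how a `conda create` command for a project environment is put together. The `create_env_command` helper, the environment name `ml-project`, the Python version, and the package list are all made up for illustration. The sketch only prints the commands and does not run conda.

```python
import shlex


def create_env_command(env_name, python_version, packages=()):
    """Build the argument list for creating a named conda environment."""
    return ["conda", "create", "--name", env_name, f"python={python_version}", *packages]


cmd = create_env_command("ml-project", "3.11", ["pandas", "scikit-learn"])
print(shlex.join(cmd))
# Once the environment exists, you switch to it from your shell with:
print("conda activate ml-project")
```

Keeping one environment per project means the packages for one project do not interfere with another, which is a big part of why conda environments are convenient.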
To learn more about the Anaconda distribution and how to install it, click here.
Conclusion
The Anaconda Software is a really good platform to make use of if you're interested in Machine Learning.
To download a conda cheat sheet, click here.
Follow me on Twitter so you don't miss my announcements and tweets on Machine learning.
Subscribe to my blog for free so you can get alerts when I put out a new article.