Jul 16, 2019 By Team YoungWonks *
The top machine learning libraries today are, not surprisingly, in great demand across the world. These libraries have freed developers from manually coding every algorithm and every mathematical and statistical formula. So what is a machine learning library, and which are the leading names in this area? With Machine Learning (ML) and Artificial Intelligence (AI) driving the latest phase of the tech revolution, it is increasingly important to be aware of key concepts in both these areas.
What is a Machine Learning Library?
A machine learning library, also called a machine learning framework, is a set of routines and functions written in a given programming language. These are interfaces, libraries or tools that help developers create machine learning models easily and quickly, without getting into the low-level details of the underlying algorithms. So a robust collection of such libraries makes it easier for developers to carry out complex tasks without having to rewrite many lines of code. In other words, they offer a clear, concise way to define machine learning models using a set of pre-built, optimized components. Their goal is to simplify machine learning and make it easier for developers to use.
A good ML library / framework is typically optimized for performance, developer friendly in that it follows familiar ways of building models, and easy to understand and code with.
Another key concept in machine learning that we need to look at is that of neural networks. What is a neural network? Artificial neural networks (ANN) or connectionist systems are essentially computing systems that are inspired by, but not necessarily identical to, the biological neural networks that make up animal brains. Such systems “learn” to carry out tasks by going through examples, and usually without being programmed with any task-specific instructions. For example, for image recognition, a neural network can learn to identify images containing cats by analyzing example images that have been manually labeled as “cat” or “no cat” and then using the results to identify cats in other images. In other words, they automatically come up with identifying characteristics from the learning material that they process.
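The idea of learning from labeled examples can be illustrated with a single artificial neuron (a perceptron), the simplest building block of a neural network. This is only a toy sketch in plain Python: the two features per image ("ear pointiness" and "whisker density") and all the numbers are made up for illustration, and a real image classifier would learn its features from pixels.

```python
# Toy labeled examples: each "image" is reduced to two made-up features
# (hypothetical ear pointiness and whisker density, both between 0 and 1).
# Label 1 means "cat", label 0 means "no cat".
examples = [
    ((0.9, 0.8), 1),
    ((0.8, 0.9), 1),
    ((0.7, 0.7), 1),
    ((0.2, 0.1), 0),
    ((0.1, 0.3), 0),
    ((0.3, 0.2), 0),
]


def predict(weights, bias, features):
    """Return 1 ("cat") if the weighted sum of the features is positive, else 0."""
    total = bias + sum(w * x for w, x in zip(weights, features))
    return 1 if total > 0 else 0


def train(examples, epochs=100, learning_rate=0.1):
    """Perceptron rule: nudge the weights whenever a prediction is wrong."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        mistakes = 0
        for features, label in examples:
            error = label - predict(weights, bias, features)
            if error != 0:
                mistakes += 1
                weights = [w + learning_rate * error * x
                           for w, x in zip(weights, features)]
                bias += learning_rate * error
        if mistakes == 0:  # every example is classified correctly
            break
    return weights, bias


weights, bias = train(examples)

# Classify two examples the neuron has never seen.
print(predict(weights, bias, (0.85, 0.75)))  # prints 1 ("cat")
print(predict(weights, bias, (0.15, 0.20)))  # prints 0 ("no cat")
```

Nobody wrote a rule such as "pointy ears mean cat"; the weights that separate the two groups were found from the examples alone.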
And while there are several machine learning libraries out there, in this blog, we shall be looking at some of the leading names in the field:
TensorFlow is a free and open-source software library used for research and production in the field of machine learning. Created by the Google Brain team, it has paved the way for a revolution of sorts because it allows easy and effective implementation of machine learning algorithms. It is an efficient library for numerical computation and is widely used for machine learning applications such as neural networks. Moreover, TensorFlow, together with its companion library TensorFlow Probability, offers distributions such as Bernoulli, Beta, Chi2, Uniform and Gamma, which are important for probabilistic approaches such as Bayesian models. With the advent of high-level APIs (Application Programming Interfaces) like Keras, it has become much easier to build models that make predictions with a greater degree of accuracy.
Support for parallel processing is one of TensorFlow’s main draws, as it allows large volumes of data to be processed quickly. It is important to note that TensorFlow offers stable APIs for Python and C. Models can be trained on CPUs as well as GPUs (Graphics Processing Units), and training can be distributed across several devices. It is also fairly flexible in its operations. Using it, one can train multiple neural networks on multiple GPUs, which makes the models very efficient on large-scale systems. Other advantages include a large community, which is not surprising given its Google roots, and a large team of software engineers that continues to work on stability improvements. One can thus use this library / framework to come up with a wide variety of useful products; it even has feature columns that can be looked at as intermediaries between raw data and estimators, thus bridging the input data with one’s model.
Other key features include layered components and an event logger and visualizer (TensorBoard, a set of visualization tools). There are a host of advantages on offer here: better computational graph visualizations; good library management with seamless performance, quick updates and frequent releases with new features; good debugging methods; scalability; and compatibility with various hardware backends such as GPUs and ASICs.
However, TensorFlow also has some disadvantages. For instance, it has been criticized for its limited support for symbolic loops, which are needed for working with variable-length sequences. Even on the computation speed and usage front, TensorFlow can do better. Plus, its official GPU support is limited to Nvidia hardware, and although stable APIs exist for Python and C, Python is by far the best-supported language. Overall, TensorFlow still has a lot of features to offer, and there is a strong and growing Internet community out there to help one with it.
Look at our blogs (https://www.youngwonks.com/blog/How-to-install-Tensorflow-on-a-Mac and https://www.youngwonks.com/blog/How-to-install-Tensorflow-on-Windows) to see how one can install TensorFlow on Mac and Windows respectively. Bear in mind that certain versions of TensorFlow might not work on certain operating systems, so the user needs to check that the version of TensorFlow being downloaded / installed is compatible with the computer’s operating system.
Keras is an open-source neural-network library written in Python, and it supports multiple back-end neural network computation engines. It is capable of running on top of frameworks such as TensorFlow, Microsoft Cognitive Toolkit and Theano. So it can be described as a minimalist Python library for deep learning that runs on top of one of these back ends. Built to enable fast experimentation with deep neural networks, it focuses on being user-friendly, modular, and extensible. This library includes many implementations of commonly used neural-network building blocks like layers, objectives (loss functions), activation functions and optimizers, and a whole set of tools aimed at making working with image and text data a lot easier. Along with standard neural networks, Keras supports convolutional and recurrent neural networks. It also supports other common utility layers like dropout, batch normalization, and pooling.
Developed and maintained by Google engineer François Chollet, Keras has several key features: modularity, where a model can be understood as a sequence or a graph of standalone modules; minimalism, so that the library offers just enough to get an outcome, without any frills, while maximizing readability; and extensibility, wherein new components are easy to add and use within the framework, thus helping researchers to do more trials. It is also important to note that everything in Keras is native Python.
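To give a feel for how these building blocks fit together, here is a minimal sketch that stacks layers, an activation function, dropout, a loss and an optimizer into one model. It assumes a recent TensorFlow 2.x installation, where Keras is available as tensorflow.keras; the layer sizes (4 inputs, 16 hidden units, 3 output classes) are arbitrary choices for illustration.

```python
from tensorflow import keras

# A small feed-forward classifier assembled from Keras building blocks.
# The sizes (4 inputs, 16 hidden units, 3 classes) are illustrative only.
model = keras.Sequential([
    keras.Input(shape=(4,)),                      # 4 input features
    keras.layers.Dense(16, activation="relu"),    # hidden layer
    keras.layers.Dropout(0.2),                    # utility layer against overfitting
    keras.layers.Dense(3, activation="softmax"),  # one probability per class
])

# Choose the optimizer, the objective (loss) and a metric to report.
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

model.summary()  # prints the layers and their parameter counts
```

Once compiled, such a model would be trained with model.fit(...) on labeled data.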
Keras is quite user friendly and in addition to ease of learning and ease of model building, it boasts the advantages of broad adoption, support for a wide range of production deployment options, integration with back-end engines / frameworks as mentioned earlier, and strong support for multiple GPUs and distributed training. Plus, Keras is backed by big names in the tech industry such as Google, Microsoft, Amazon, Apple, Nvidia, Uber and others, adding to its credibility.
Among its disadvantages, it is said that developers end up having to delve into the Keras source code and tinker with it for almost anything beyond the simplest use-cases. Another point is that the Keras data-processing tools have been found to be less useful, with users needing to write their own sequence / non-sequence pre-processing routines or use a good external package like Scikit-learn. Going beyond surface-level customization is also said to be not all that easy.
Scikit-learn is a free machine learning library for Python built on SciPy. It is quite an effective tool for data mining and data analysis and can be used for both personal and commercial purposes. It incorporates several classification, regression and clustering algorithms including support vector machines, random forests, gradient boosting, k-means and DBSCAN, and is designed to interoperate with the Python numerical and scientific libraries NumPy and SciPy. With Scikit-learn, different tasks can be conducted: think model selection, clustering, preprocessing, and more. The library supports different types of traditional machine learning methods and operations in addition to giving developers the means to complete implementations. Today, it is being used by big companies from different industries such as music streaming, hotel bookings, etc.
Advantages of Scikit-learn are that it has a clean API and is fast, easy to use, comprehensive and robust; it is also well documented and supported, released under a permissive license, and has a very active developer community. Thanks to its simplicity and accessibility, it is often the first choice in Machine Learning when one is working on a Python project, especially one that doesn’t require massively scalable algorithms.
It has some limitations too: for one, it has less focus on statistics. It is also not ideal for deep learning, which leaves it best suited to traditional machine learning tasks rather than large neural networks.
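As a rough illustration of how preprocessing, a ready-made algorithm and evaluation fit together, the sketch below trains a random forest on the small iris dataset that ships with Scikit-learn. The dataset, the split size and the parameter values are arbitrary choices made for this example.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Load a small built-in dataset (150 flowers, 4 measurements each, 3 species).
X, y = load_iris(return_X_y=True)

# Hold back a quarter of the data to check the model on unseen examples.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Chain a preprocessing step and a ready-to-use classifier into one estimator.
classifier = make_pipeline(
    StandardScaler(),
    RandomForestClassifier(n_estimators=100, random_state=0),
)
classifier.fit(X_train, y_train)

accuracy = classifier.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.2f}")
```

Swapping in another algorithm, say a support vector machine, only means replacing the RandomForestClassifier step; the fit and score calls stay the same.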
Comparison between the machine learning libraries
1. TensorFlow vs Scikit-learn:
When you compare the two, you’ll realise that TensorFlow is a symbolic math library used primarily for neural network-based models. Scikit-learn is a Python-based library supporting different types of traditional machine learning methods and operations.
TensorFlow is more low level and helps one implement machine learning algorithms by using rather basic steps and then working one’s way up. Scikit-learn, on the other hand, is more high level and offers ready-to-use algorithms. In other words, TensorFlow offers low level programming to work with mathematics as well as methods for defining neural network layers whereas Scikit-learn doesn’t have a deep learning framework.
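TensorFlow runs on multiple processors including GPUs, whereas Scikit-learn runs on the CPU only (although many of its estimators can use several CPU cores through the n_jobs parameter).
Also, a Scikit-learn-style interface can be layered on top of TensorFlow (wrappers exist that expose Keras models as Scikit-learn estimators), but the opposite isn’t possible. TensorFlow can perform automatic differentiation while Scikit-learn cannot.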
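Automatic differentiation simply means that the library works out derivatives for us, which is what makes training neural networks practical. A minimal sketch, assuming TensorFlow 2.x and its tf.GradientTape API (the function being differentiated is an arbitrary example):

```python
import tensorflow as tf

x = tf.Variable(3.0)

# Record the operations applied to x so that TensorFlow can differentiate them.
with tf.GradientTape() as tape:
    y = x ** 2 + 2.0 * x

# dy/dx = 2x + 2, which is 8 at x = 3.
dy_dx = tape.gradient(y, x)
print(float(dy_dx))  # prints 8.0
```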
2. Keras vs TensorFlow:
Given that several helpful features of Keras have been incorporated in TensorFlow for easier model building, it may not make much sense to compare the two.
As of mid-2017, Keras has been fully adopted and integrated into TensorFlow. This TensorFlow + Keras integration means that one can define one’s model using the easy-to-use interface of Keras and then drop down into TensorFlow if one needs a specific TensorFlow functionality or in order to implement a custom feature that Keras does not support but TensorFlow does.
In fact, using Keras inside of TensorFlow gives one the best of both worlds. The Keras API is similar to Scikit-learn’s, which is widely regarded as one of the best machine learning APIs out there. It is, after all, modular, Python oriented, and super easy to use. And when you need a custom layer implementation, a more complex loss function, etc., you can drop down into TensorFlow and have the code integrate with your Keras model automatically.
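Dropping down into TensorFlow can be as simple as writing a loss function with plain TensorFlow operations and handing it to a Keras model. The sketch below assumes TensorFlow 2.x with its bundled Keras; the function name mean_absolute_error_tf and the tiny dataset are made up for illustration.

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras


def mean_absolute_error_tf(y_true, y_pred):
    """Custom loss written with low-level TensorFlow operations."""
    y_true = tf.cast(y_true, y_pred.dtype)
    return tf.reduce_mean(tf.abs(y_true - y_pred))


# The model itself is still defined with the high-level Keras interface.
model = keras.Sequential([
    keras.Input(shape=(1,)),
    keras.layers.Dense(1),
])

# Keras accepts the TensorFlow function wherever it expects a loss.
model.compile(optimizer="sgd", loss=mean_absolute_error_tf)

# Tiny made-up dataset that follows y = 2x.
features = np.array([[0.0], [1.0], [2.0], [3.0]], dtype="float32")
targets = 2.0 * features

history = model.fit(features, targets, epochs=5, verbose=0)
print(history.history["loss"])  # one loss value per epoch
```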
3. Keras vs Scikit-learn:
Keras is a deep learning library for Python that covers convnets, recurrent neural networks and more, while Scikit-learn is a Python module for machine learning built on top of SciPy and distributed under the 3-Clause BSD license. Developers pick Keras as it offers quality documentation and supports TensorFlow and Theano backends, whereas Scikit-learn is chosen because it is easy to use and well suited to scientific computing. Keras is more flexible than Scikit-learn in that it lets us define our own neural network architectures instead of choosing from pre-defined models.
*Contributors: Written by Vidya Prabhu; Research input by Prajwal Manurajan; Lead image by: Leonel Cruz