PyTorch -- a leading light in Deep Learning
Today I am writing about PyTorch, an open-source Machine Learning framework for Python that is especially powerful for Deep Learning. I will touch on what PyTorch is at a high level and what makes it exciting, speak to its main benefits, and finish with an example of PyTorch's distinctive coding style.
What is PyTorch?
PyTorch (often referred to colloquially, and imported as, torch) is a framework with a large ecosystem for Machine Learning tasks that shines especially in Deep Learning areas such as Computer Vision. PyTorch is built around the tensor, a multi-dimensional matrix containing elements of a single data type, which provides a convenient structure for storing, manipulating, and operating on data. Tensors work in concert with a dynamic computational graph, built on the fly at runtime in contrast to the static graphs used by some other libraries, to power Deep Learning.
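To make those two ideas concrete, here is a minimal sketch (my own toy values, not part of the example later in this post) that builds a small tensor, runs an operation on it, and lets autograd record and walk back through the graph it built along the way:

import torch

# A tensor is a multi-dimensional array of a single data type.
x = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
print(x.shape)   # torch.Size([2, 2])
print(x.dtype)   # torch.float32

# The computational graph is built dynamically: operations on tensors that
# require gradients are recorded as they execute, so ordinary Python control
# flow (loops, ifs) can shape the graph differently on every forward pass.
w = torch.ones(2, 2, requires_grad=True)
y = (x * w).sum()
y.backward()     # gradients flow back through the recorded graph
print(w.grad)    # gradient of y with respect to w (equal to x here)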
PyTorch is favored in academia and research, though popular interest has surged with the Practical Deep Learning for Coders course and the PyTorch-based fastai library giving many their introduction to, well, practical Deep Learning and PyTorch as well. I have been through some of the course content at a casual level and it is very interesting stuff, pairing the higher-level fastai library with a top-down teaching style that dives head-first into implementing concepts. A common frustration among Deep Learning learners (who learn and learn about Machine Learners; there is a lot of learning here) is being taught bottom-up, quite often starting with the lowly but key tensor, as in the still-terrific DataCamp courses that use PyTorch. Just throwing some resources at you!
PyTorch's syntax is heavily rooted in Object-Oriented Programming, in contrast to the more procedural or declarative approaches taken by some other ML and Deep Learning frameworks. OOP makes it great for scalability and for those like me who come from a Software Engineering background where OOP is used heavily. Finally, PyTorch has a large ecosystem, including libraries for tasks from vision to graph neural networks and integration with scikit-learn, another popular ML framework.
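To give a taste of that OOP flavor, here is a minimal sketch of the usual pattern (the class name TinyClassifier and the layer sizes are placeholders of my own, not anything from a library): a model is a class that subclasses torch.nn.Module, declares its layers in __init__, and defines its computation in forward.

import torch
import torch.nn as nn

class TinyClassifier(nn.Module):
    # A hypothetical model: sizes here are arbitrary, chosen for illustration.
    def __init__(self, in_features: int, hidden: int, out_features: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_features, hidden),
            nn.ReLU(),
            nn.Linear(hidden, out_features),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

model = TinyClassifier(in_features=4, hidden=8, out_features=2)
dummy = torch.randn(5, 4)     # a batch of 5 examples with 4 features each
print(model(dummy).shape)     # torch.Size([5, 2])

Calling model(dummy) invokes forward under the hood, which is exactly the pattern you will see in the longer example at the end of this post.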
Benefits of PyTorch
I will point to two main benefits of PyTorch, both of which I've touched on already and will now focus on: first, its Object-Oriented Programming design, and second, its popularity in academia and research.
Object-Oriented Programming, while often difficult to grasp at first, is a powerful way of structuring code and modeling, well, objects. In PyTorch's case, the object is often a neural network itself, a complex thing that intuitively makes sense to model with OOP. As above, so below, as they say, and while no model is perfect, some are very useful; OOP and tensors make a powerfully useful combination in PyTorch.
PyTorch's popularity in research means knowing it helps you digest research papers and keep up with the ever-expanding ML field, especially the blazing-fast development of Deep Learning within it. It also means that paper implementations found through arXiv and Papers With Code make for great learning resources and material to adapt for your own work.
A PyTorch example
I am going to wrap this blog post up with some sample PyTorch code, taken from the excellent official documentation and presented without much added comment. As mentioned, it uses tensors and an OOP paradigm. Read it over, even if you're unfamiliar with torch, and look out for the next entry in this series, where I will once again appreciate you coming here for information on Deep Learning:
# -*- coding: utf-8 -*-
import torch
import math
# Create Tensors to hold input and outputs.
x = torch.linspace(-math.pi, math.pi, 2000)
y = torch.sin(x)
# For this example, the output y is a linear function of (x, x^2, x^3), so
# we can consider it as a linear layer neural network. Let's prepare the
# tensor (x, x^2, x^3).
p = torch.tensor([1, 2, 3])
xx = x.unsqueeze(-1).pow(p)
# In the above code, x.unsqueeze(-1) has shape (2000, 1), and p has shape
# (3,), for this case, broadcasting semantics will apply to obtain a tensor
# of shape (2000, 3)
# Use the nn package to define our model as a sequence of layers. nn.Sequential
# is a Module which contains other Modules, and applies them in sequence to
# produce its output. The Linear Module computes output from input using a
# linear function, and holds internal Tensors for its weight and bias.
# The Flatten layer flattens the output of the linear layer to a 1D tensor,
# to match the shape of `y`.
model = torch.nn.Sequential(
    torch.nn.Linear(3, 1),
    torch.nn.Flatten(0, 1)
)
# The nn package also contains definitions of popular loss functions; in this
# case we will use Mean Squared Error (MSE) as our loss function.
loss_fn = torch.nn.MSELoss(reduction='sum')
learning_rate = 1e-6
for t in range(2000):
    # Forward pass: compute predicted y by passing x to the model. Module objects
    # override the __call__ operator so you can call them like functions. When
    # doing so you pass a Tensor of input data to the Module and it produces
    # a Tensor of output data.
    y_pred = model(xx)
    # Compute and print loss. We pass Tensors containing the predicted and true
    # values of y, and the loss function returns a Tensor containing the
    # loss.
    loss = loss_fn(y_pred, y)
    if t % 100 == 99:
        print(t, loss.item())
    # Zero the gradients before running the backward pass.
    model.zero_grad()
    # Backward pass: compute gradient of the loss with respect to all the learnable
    # parameters of the model. Internally, the parameters of each Module are stored
    # in Tensors with requires_grad=True, so this call will compute gradients for
    # all learnable parameters in the model.
    loss.backward()
    # Update the weights using gradient descent. Each parameter is a Tensor, so
    # we can access its gradients like we did before.
    with torch.no_grad():
        for param in model.parameters():
            param -= learning_rate * param.grad
# You can access the first layer of `model` like accessing the first item of a list
linear_layer = model[0]
# For linear layer, its parameters are stored as `weight` and `bias`.
print(f'Result: y = {linear_layer.bias.item()} + {linear_layer.weight[:, 0].item()} x + {linear_layer.weight[:, 1].item()} x^2 + {linear_layer.weight[:, 2].item()} x^3')
Pretty wild, huh? Join me next time on my -- our -- adventure into the world of PyTorch!
(Photo by Linus Sandvide on Unsplash)