PyTorch Design Philosophy¶
This document is designed to help contributors and module maintainers understand the high-level design principles that have developed over time in PyTorch. These are not meant to be hard-and-fast rules, but to serve as a guide to help trade off different concerns and to resolve disagreements that may come up while developing PyTorch. For more information on contributing, module maintainership, and how to escalate a disagreement to the Core Maintainers, please see PyTorch Governance.
Design Principles¶
Principle 1: Usability over Performance¶
This principle may be surprising! As one Hacker News poster wrote: PyTorch is amazing! […] Although I’m confused. How can a ML framework be not obsessed with speed/performance? See Hacker News discussion on PyTorch.
Soumith’s blog post on Growing the PyTorch Community goes into this in some depth, but at a high-level:
PyTorch’s primary goal is usability
A secondary goal is to have reasonable performance
We believe the ability to maintain our flexibility to support researchers who are building on top of our abstractions remains critical. We can’t see what the future of what workloads will be, but we know we want them to be built first on PyTorch and that requires flexibility.
In more concrete terms, we operate in a usability-first manner and try to avoid jumping to restriction-first regimes (for example, static shapes, graph-mode only) without a clear-eyed view of the tradeoffs. Often there is a temptation to impose strict user restrictions upfront because it can simplify implementation, but this comes with risks:
The performance may not be worth the user friction, either because the performance benefit is not compelling enough or it only applies to a relatively narrow set of subproblems.
Even if the performance benefit is compelling, the restrictions can fragment the ecosystem into different sets of limitations that can quickly become incomprehensible to users.
We want users to be able to seamlessly move their PyTorch code to different hardware and software platforms, to interoperate with different libraries and frameworks, and to experience the full richness of the PyTorch user experience, not a least common denominator subset.
Principle 2: Simple Over Easy¶
Here, we borrow from The Zen of Python:
Explicit is better than implicit
Simple is better than complex
A more concise way of describing these two goals is Simple Over Easy. Let’s start with an example because simple and easy are often used interchangeably in everyday English. Consider how one may model devices in PyTorch:
Simple / Explicit (to understand, debug): every tensor is associated with a device. The user explicitly specifies tensor device movement. Operations that require cross-device movement result in an error.
Easy / Implicit (to use): the user does not have to worry about devices; the system figures out the globally optimal device placement.
In this specific case, and as a general design philosophy, PyTorch favors exposing simple and explicit building blocks rather than APIs that are easy-to-use by practitioners. The simple version is immediately understandable and debuggable by a new PyTorch user: you get a clear error if you call an operator requiring cross-device movement at the point in the program where the operator is actually invoked. The easy solution may let a new user move faster initially, but debugging such a system can be complex: How did the system make its determination? What is the API for plugging into such a system and how are objects represented in its IR?
Some classic arguments in favor of this sort of design come from A Note on Distributed Computation (TLDR: Do not model resources with very different performance characteristics uniformly, the details will leak) and the End-to-End Principle (TLDR: building smarts into the lower-layers of the stack can prevent building performant features at higher layers in the stack, and often doesn’t work anyway). For example, we could build operator-level or global device movement rules, but the precise choices aren’t obvious and building an extensible mechanism has unavoidable complexity and latency costs.
A caveat here is that this does not mean that higher-level “easy” APIs are not valuable; certainly there is a value in, for example, higher-levels in the stack to support efficient tensor computations across heterogeneous compute in a large cluster. Instead, what we mean is that focusing on simple lower-level building blocks helps inform the easy API while still maintaining a good experience when users need to leave the beaten path. It also allows space for innovation and the growth of more opinionated tools at a rate we cannot support in the PyTorch core library, but ultimately benefit from, as evidenced by our rich ecosystem. In other words, not automating at the start allows us to potentially reach levels of good automation faster.
Principle 3: Python First with Best In Class Language Interoperability¶
This principle began as Python First:
PyTorch is not a Python binding into a monolithic C++ framework. It is built to be deeply integrated into Python. You can use it naturally like you would use NumPy, SciPy, scikit-learn, or other Python libraries. You can write your new neural network layers in Python itself, using your favorite libraries and use packages such as Cython and Numba. Our goal is to not reinvent the wheel where appropriate.
One thing PyTorch has needed to deal with over the years is Python overhead: we first rewrote the autograd engine in C++, then the majority of operator definitions, then developed TorchScript and the C++ frontend.
Still, working in Python provides easily the best experience for our users: it is flexible, familiar, and perhaps most importantly, has a huge ecosystem of scientific computing libraries and extensions available for use. This fact motivates a few of our most recent contributions, which attempt to hit a Pareto optimal point close to the Python usability end of the curve:
TorchDynamo, a Python frame evaluation tool capable of speeding up existing eager-mode PyTorch programs with minimal user intervention.
torch_function and torch_dispatch extension points, which have enabled Python-first functionality to be built on-top of C++ internals, such as the torch.fx tracer and functorch respectively.
These design principles are not hard-and-fast rules, but hard won choices and anchor how we built PyTorch to be the debuggable, hackable and flexible framework it is today. As we have more contributors and maintainers, we look forward to applying these core principles with you across our libraries and ecosystem. We are also open to evolving them as we learn new things and the AI space evolves, as we know it will.