[Translation] TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems [2]

2018. 11. 15. 20:19

2. Programming Model and Basic Concepts

A TensorFlow computation is described by a directed graph, which is composed of a set of nodes. The graph represents a dataflow computation, with extensions for allowing some kinds of nodes to maintain and update persistent state and for branching and looping control structures within the graph in a manner similar to Naiad [36]. Clients typically construct a computational graph using one of the supported frontend languages (C++ or Python). An example fragment to construct and then execute a TensorFlow graph using the Python front end is shown in Figure 1, and the resulting computation graph in Figure 2.

In a TensorFlow graph, each node has zero or more inputs and zero or more outputs, and represents the instantiation of an operation. Values that flow along normal edges in the graph (from outputs to inputs) are tensors: arrays of arbitrary dimensionality whose underlying element type is specified or inferred at graph-construction time. Special edges, called control dependencies, can also exist in the graph: no data flows along such edges, but they indicate that the source node for the control dependence must finish executing before the destination node for the control dependence starts executing. Since our model includes mutable state, control dependencies can be used directly by clients to enforce happens-before relationships. Our implementation also sometimes inserts control dependencies to enforce orderings between otherwise independent operations as a way of, for example, controlling the peak memory usage.
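To make the difference between data edges and control dependencies concrete, here is a minimal sketch in plain Python. It is a toy model, not TensorFlow's API: the `Node` class, `execution_order`, and `run_graph` are hypothetical names introduced only for illustration, each node has a single output, and the graph is assumed to be acyclic.

```python
from collections import deque


class Node:
    """One instantiated operation in a toy dataflow graph (hypothetical class)."""

    def __init__(self, name, fn, inputs=(), control_inputs=()):
        self.name = name
        self.fn = fn                                # computes the node's single output
        self.inputs = list(inputs)                  # data edges: values flow along these
        self.control_inputs = list(control_inputs)  # control edges: ordering only


def execution_order(nodes):
    """Return an order in which every source node precedes its destinations."""
    pending = {n: len(n.inputs) + len(n.control_inputs) for n in nodes}
    consumers = {n: [] for n in nodes}
    for n in nodes:
        for src in n.inputs + n.control_inputs:
            consumers[src].append(n)
    ready = deque(n for n in nodes if pending[n] == 0)
    order = []
    while ready:
        n = ready.popleft()
        order.append(n)
        for c in consumers[n]:
            pending[c] -= 1
            if pending[c] == 0:
                ready.append(c)
    return order


def run_graph(nodes):
    """Execute every node once; only data edges supply arguments."""
    values = {}
    for node in execution_order(nodes):
        args = [values[src.name] for src in node.inputs]
        values[node.name] = node.fn(*args)
    return values


# Mutable state shared by two otherwise independent nodes.
state = {"w": 0}
assign = Node("assign", lambda: state.update(w=5))
# No data flows from `assign` to `read`; the control edge only forces ordering.
read = Node("read", lambda: state["w"], control_inputs=[assign])

values = run_graph([read, assign])
print(values["read"])
```

Without the control edge, nothing in this toy graph would order `read` after `assign`, so `read` could observe the old value. This is the happens-before use the paragraph above describes.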

Operations and Kernels

An operation has a name and represents an abstract computation (e.g., “matrix multiply”, or “add”). An operation can have attributes, and all attributes must be provided or inferred at graph-construction time in order to instantiate a node to perform the operation. One common use of attributes is to make operations polymorphic over different tensor element types (e.g., add of two tensors of type float versus add of two tensors of type int32). A kernel is a particular implementation of an operation that can be run on a particular type of device (e.g., CPU or GPU). A TensorFlow binary defines the sets of operations and kernels available via a registration mechanism, and this set can be extended by linking in additional operation and/or kernel definitions/registrations. Table 1 shows some of the kinds of operations built into the core TensorFlow library.
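The split between an abstract operation and its device-specific kernels can be sketched with a small registry. This is a plain-Python illustration under stated assumptions, not TensorFlow's actual registration mechanism: `KERNELS`, `register_kernel`, `find_kernel`, and the attribute name `"T"` are all hypothetical, and tensors are modeled as flat Python lists.

```python
# Toy kernel registry: (operation name, device type, element type) -> implementation.
KERNELS = {}


def register_kernel(op_name, device_type, dtype):
    """Decorator that records one implementation of an operation (hypothetical)."""
    def decorator(fn):
        KERNELS[(op_name, device_type, dtype)] = fn
        return fn
    return decorator


@register_kernel("Add", "CPU", "float")
def add_float_cpu(x, y):
    return [float(a) + float(b) for a, b in zip(x, y)]


@register_kernel("Add", "CPU", "int32")
def add_int32_cpu(x, y):
    return [int(a) + int(b) for a, b in zip(x, y)]


def find_kernel(op_name, device_type, attrs):
    """Pick a kernel using the element-type attribute fixed at graph-construction time."""
    key = (op_name, device_type, attrs["T"])
    try:
        return KERNELS[key]
    except KeyError:
        raise LookupError(f"no kernel registered for {key}") from None


# The same abstract "Add" operation resolves to different kernels by attribute.
int_sum = find_kernel("Add", "CPU", {"T": "int32"})([1, 2], [3, 4])
float_sum = find_kernel("Add", "CPU", {"T": "float"})([1, 2], [0.5, 0.5])
print(int_sum, float_sum)
```

Extending the set of kernels in this sketch means registering another function under a new key, for example a `"GPU"` entry, which loosely mirrors the idea of linking additional registrations into the binary.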

Sessions

Client programs interact with the TensorFlow system by creating a Session. To create a computation graph, the Session interface supports an Extend method to augment the current graph managed by the session with additional nodes and edges (the initial graph when a session is created is empty). The other primary operation supported by the session interface is Run, which takes a set of output names that need to be computed, as well as an optional set of tensors to be fed into the graph in place of certain outputs of nodes. Using the arguments to Run, the TensorFlow implementation can compute the transitive closure of all nodes that must be executed in order to compute the outputs that were requested, and can then arrange to execute the appropriate nodes in an order that respects their dependencies (as described in more detail in Section 3.1). Most of our uses of TensorFlow set up a Session with a graph once, and then execute the full graph or a few distinct subgraphs thousands or millions of times via Run calls.
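The Extend and Run behavior described above can be sketched in a few lines of plain Python. `ToySession` is a hypothetical class, not the TensorFlow Session API: nodes are identified by name, each has one output, and the graph is assumed to be acyclic. The recursive evaluation visits exactly the transitive closure of the requested outputs, and a fed value cuts off everything upstream of it.

```python
class ToySession:
    """Toy stand-in for a session that owns a graph (hypothetical class)."""

    def __init__(self):
        self.graph = {}  # node name -> (function, input names); initially empty

    def extend(self, name, fn, inputs=()):
        """Augment the current graph with one more node and its incoming edges."""
        self.graph[name] = (fn, tuple(inputs))

    def run(self, fetches, feeds=None):
        """Compute the requested outputs, executing only the nodes they need."""
        values = dict(feeds or {})  # fed values replace the outputs of those nodes
        executed = []

        def compute(name):
            if name in values:      # already computed in this run, or fed
                return values[name]
            fn, inputs = self.graph[name]
            args = [compute(i) for i in inputs]  # dependencies run first
            values[name] = fn(*args)
            executed.append(name)
            return values[name]

        return [compute(f) for f in fetches], executed


sess = ToySession()
sess.extend("x", lambda: 3.0)
sess.extend("w", lambda: 2.0)
sess.extend("y", lambda w, x: w * x, inputs=["w", "x"])
sess.extend("unused", lambda y: y + 1.0, inputs=["y"])

outputs, executed = sess.run(["y"])                       # "unused" is never executed
fed_outputs, fed_executed = sess.run(["y"], feeds={"x": 10.0})  # "x" is fed, not run
print(outputs, executed)
print(fed_outputs, fed_executed)
```

The graph is built once and `run` is then called repeatedly with different fetches and feeds, which matches the usage pattern the paragraph describes.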

Variables

In most computations a graph is executed multiple times. Most tensors do not survive past a single execution of the graph. However, a Variable is a special kind of operation that returns a handle to a persistent mutable tensor that survives across executions of a graph. Handles to these persistent mutable tensors can be passed to a handful of special operations, such as Assign and AssignAdd (equivalent to +=), that mutate the referenced tensor. For machine learning applications of TensorFlow, the parameters of the model are typically stored in tensors held in variables, and are updated as part of the Run of the training graph for the model.
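A small plain-Python sketch of this lifetime difference follows. `ToyVariable` and `training_step` are hypothetical names, a scalar stands in for a tensor, and the loss `(w - 3)^2` is an assumption chosen only to give the update something to do. Intermediate values live for one step and are discarded, while the variable's value persists and is mutated in place.

```python
class ToyVariable:
    """Holder for a persistent, mutable value (hypothetical class)."""

    def __init__(self, initial_value):
        self.value = initial_value

    def assign(self, new_value):
        """Overwrite the stored value."""
        self.value = new_value
        return self.value

    def assign_add(self, delta):
        """In-place addition, the equivalent of +=."""
        self.value += delta
        return self.value


def training_step(weight, learning_rate=0.1):
    """One execution of a toy training graph for the loss (w - 3)^2."""
    gradient = 2.0 * (weight.value - 3.0)  # temporary: gone after this step
    weight.assign_add(-learning_rate * gradient)
    return (weight.value - 3.0) ** 2       # loss after the update


w = ToyVariable(0.0)
losses = [training_step(w) for _ in range(50)]  # same graph, executed many times
print(w.value, losses[-1])
```

Each call to `training_step` corresponds to one run of the training graph: only `w.value` carries information from one step to the next.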
