This document discusses techniques for jointly optimizing deep neural network models and hardware to improve efficiency. It covers approaches such as exploiting activation and weight statistics, sparsity, compression, pruning of neurons and synapses, decomposition of trained filters, and knowledge distillation. The goal is to reduce the number of operations, memory usage, and energy consumption so that inference can run efficiently on hardware such as mobile phones and accelerators. Evaluation methodologies are also presented to guide energy-aware design space exploration; a small illustrative sketch of one of the listed techniques follows below.
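As one illustration of the pruning approach listed above, the sketch below shows generic magnitude-based weight pruning in NumPy. The function name `magnitude_prune`, the `sparsity` parameter, and the threshold-selection strategy are assumptions made for this example; they are not the specific procedure prescribed by this document.

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude weights so that roughly the
    requested fraction of entries (`sparsity`) becomes zero.

    This is a generic magnitude-pruning sketch, not the exact method
    described in the document.
    """
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    # Threshold = k-th smallest magnitude; entries at or below it are removed.
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask

# Example: prune 80% of the weights of a random fully connected layer.
rng = np.random.default_rng(0)
w = rng.normal(size=(256, 128))
w_pruned = magnitude_prune(w, sparsity=0.8)
print("non-zero fraction:", np.count_nonzero(w_pruned) / w_pruned.size)
```

Pruned weight matrices like `w_pruned` can then be stored in a sparse format and skipped during multiply-accumulate operations, which is how sparsity translates into reduced operations, memory traffic, and energy on suitable hardware.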