0% found this document useful (0 votes)
44 views20 pages

Deep Learning for Fluid Velocity Field Estimation a Review

Uploaded by

Pavan Yadav
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
44 views20 pages

Deep Learning for Fluid Velocity Field Estimation a Review

Uploaded by

Pavan Yadav
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 20

Ocean Engineering 271 (2023) 113693

Contents lists available at ScienceDirect

Ocean Engineering
journal homepage: www.elsevier.com/locate/oceaneng

Review

Deep learning for fluid velocity field estimation: A review


Changdong Yu a , Xiaojun Bi b ,∗, Yiwei Fan c
a
College of Information and Communication Engineering, Harbin Engineering University, Harbin, 150001, China
b
College of Information and Engineering, Minzu University of China, Beijing, 100081, China
c
College of Shipbuilding Engineering, Harbin Engineering University, Harbin, 150001, China

ARTICLE INFO ABSTRACT

Keywords: Deep learning technique, has made tremendous progress in fluid mechanics in recent years, because of its
Deep learning mighty feature extraction capacity from complicated and massive fluid data. Motion estimation and analysis of
PIV fluid data is one of the significant research topics in fluid mechanics. In this paper, we provide a comprehensive
Fluid motion estimation
review of fluid motion (i.e., velocity field) estimation methods based on deep learning. Essentially, the fluid
Velocity field reconstruction
super-resolution (SR) reconstruction task can also be regarded as an velocity field estimation from low
Optical flow
resolution to high resolution. To this end, we mainly give a review on two topics: fluid motion estimation
and later velocity field super-resolution reconstruction. Specifically, we first introduce the basic principle and
component of deep learning methods. We then review and analyze deep learning based methods on fluid
motion estimation. Note we mainly investigate the commonly used fluid motion estimation approach here,
particle image velocimetry (PIV) algorithm, which extract velocity field from successive particle images pair
in a non-contact manner. In addition, SR reconstruction methods for velocity fields based on deep learning
technique are also reviewed. Eventually, we give a discussion and possible routes for the future research works.
To our knowledge, this paper are the first to give a review of deep learning-based approaches for fluid velocity
field estimation.

1. Introduction the displacement correlation peak value of the corresponding interro-


gation window between the image pair. Namely, the estimated dis-
The acquisition of the global velocity field (i.e., motion fields) is placement vector within each interrogation window is regarded as the
of great significance to research the structure of complicated fluid average velocity in that window. As a result, this correlation-based ap-
flows in the fluid mechanics. As a widely used velocity field estima- proach provides a sparse (i.e., low-resolution) velocity field. Therefore,
tion technique, particle image velocimetry (PIV) (Nguyen et al., 2012; many modifications have been made to enhance the computational ac-
Khalid et al., 2019; Fleit and Baranya, 2019), which can extract the curacy and efficiency of the cross-correlation algorithm (Wereley et al.,
velocity vectors of the whole field from successive particle images. 2002; Scarano, 2001). Furthermore, many advanced post-stage velocity
Fig. 1 shows the working principle of PIV technique and is described as field processing methods, such as outlier detection (Westerweel and
follows (Adrian and Westerweel, 2011). First, the small tracer particles Scarano, 2005; Wang et al., 2018a) and spline interpolation (Astarita
are cast into the fluid medium to be measured, then the tracer particles and Cardone, 2005; Cholemari, 2007), are put forward to further re-
in the area to be measured are illuminated with a uniform sheet of duce error. The cross-correlation algorithm has been relatively mature
light. The illuminated tracer particles are continuously photographed after forty years of development, and it has achieved good perfor-
by a camera, so as to obtain successive particle image pair. Finally, mance in the International PIV Challenges (Stanislas et al., 2005,
the corresponding algorithms are adopted to calculate the particle 2008). Essentially, the cross-correlation algorithm still cannot provide
image pair to obtain the velocity field. Hence, how to obtain high a dense velocity field at the pixel level. Optical flow algorithm (Horn
resolution velocity field is always a core problem in the PIV estimation and Schunck, 1981) has been a popular research direction in the
community. computer vision community that estimates the motion field between
There are two main traditional PIV estimation algorithms: cross- image pairs by solving the optimal value of an objective function. In
correlation algorithm and optical flow algorithm. Cross-correlation al- contrast to the cross-correlation approach, optical flow method can
gorithm (Adrian, 2007) obtains the displacement vector by querying provide a dense velocity field for the whole image. Additionally, optical

∗ Corresponding author.
E-mail address: [email protected] (X. Bi).

https://doi.org/10.1016/j.oceaneng.2023.113693
Received 5 October 2022; Received in revised form 2 December 2022; Accepted 10 January 2023
Available online 24 January 2023
0029-8018/© 2023 Elsevier Ltd. All rights reserved.
C. Yu et al. Ocean Engineering 271 (2023) 113693

Chen et al., 2022a), speech recognition (Zhang et al., 2018a; Lee et al.,
Terminology
2021), natural language processing (NLP) (Strubell et al., 2019; Otter
PIV Particle image velocimetry et al., 2020), etc. Deep learning has a strong ability to learn data fea-
CFD Computational fluid dynamics tures, and can handle fluid mechanics topics with complex nonlinear,
CV Computer vision high-dimensional, and big data characteristics. Therefore, deep learning
NLP Natural language processing techniques have made tremendous progress in the fluid mechanics
community in recent years. Since the use of shallow convolutional
AI Artificial intelligence
neural networks for PIV estimation was proposed by Rabault et al.
ML Machine learning
(2017) in 2017, deep learning-based fluid motion estimation methods
DL Deep learning
have continued to emerge (Lee et al., 2017; Cai et al., 2019b). These
DNN Deep neural networks models achieve better outcomes in terms of precision and inference
FNN Fully-connected neural network speed.
DBN Deep belief network In fact, the quest for high-resolution velocity fields has always been
CNN Convolutional neural network the pursuit in fluid mechanics. In addition to PIV estimation techniques
RNN Recurrent neural network to get high-resolution velocity fields, another significant computational
LSTM Long short-term memory fluid dynamics (CFD) technique can also provide high-resolution flow
GRU Gated recurrent unit fields. CFD technique can simulate complicated turbulent fields with
MLP Multi-layer perceptron, each layer of high fidelity by mesh refinement. However, the experimental simu-
neurons is fully connected to the next lations are expensive and time-consuming, especially as the number
layer of spatial meshes increases. Meanwhile, for complex high-dimensional
Receptive field The mapping area size of the pixel of the flow fields, the process of numerical modeling and solution is also
output feature map on the original map very complicated. Deep learning has a powerful nonlinear function
Feedforward network One-way propagation, no connection and fitting ability, which is able to mine useful feature information from
feedback between layers a large amount of fluid data. Therefore, more researchers tend to use
SR Super-resolution deep learning methods to estimate high-resolution flow fields around
HR High-resolution physical models (Guo et al., 2016; Ling et al., 2016). This task is
similar to super-resolution (SR) reconstruction tasks in the computer
LR Low-resolution
vision. The reconstruction of high-resolution (HR) velocity field from
PINN Physics-informed neural network
low-resolution(LR) counterpart can also be regarded as a process of
GAN Generative adversarial net
velocity field estimation. It aims at estimating a HR fluid data from
a low-resolution counterpart. Therefore, this task is in line with the
theme of our article review.
In this article, we give a comprehensive overview of deep learning-
based fluid velocity field estimation, which covers two topics: fluid
motion estimation (for PIV) and velocity field SR reconstruction. We
first introduce the basic theory and knowledge of deep learning, paving
the way for the subsequent introduction of research topics. Deep learn-
ing based velocity field estimation approaches including optical flow
learning and cross-correlation are then investigated and compared. Af-
ter that, we also review the velocity field SR reconstruction approaches
based on deep learning. Note that we not only describe but also analyze
the advantages and disadvantages of various algorithms in the review
process. Finally, we present our own conclusions and summarize the
trends and challenges for future work. To our knowledge, this is the first
Fig. 1. Schematic diagram of the basic principle of PIV technique. review paper that comprehensively covers deep learning-based fluid
motion estimation and velocity field reconstruction.

flow methods are easily embedded with prior physical constraints to 2. Principles of deep learning
make the algorithm more suitable for different fluid scenarios (Heitz
et al., 2010). Therefore, optical flow algorithm has attracted many Deep learning technology (LeCun et al., 2015; Du et al., 2016) has
researchers in fluid mechanics to improve it and apply it to fluid increasingly become a research hotspot and mainstream direction in
motion estimation (Corpetti et al., 2006; Kapulla et al., 2011; Hua the artificial intelligence (AI) domain. Deep learning-based architec-
et al., 2014; Zhong et al., 2017). Although optical flow algorithms ture is a deep machine learning model that usually contains multiple
have been widely used in various fluid scenarios, there are still two layers of neural networks. In the learning process, deep learning maps
obvious problems. First, the optical flow algorithm is time-consuming the input data from low-level to high-level to a new feature space,
in the process of variational optimization. In addition, the optical which makes it have the characteristics of hierarchical and distributed
flow method is sensitive to noise, especially changing illumination, abstraction. In this way, complicated nonlinear functions can be well
which will affect the accuracy of velocity field estimation. Review fitted and high-dimensional nonlinear input data can be processed.
articles (Heitz et al., 2010; Liu et al., 2015) give a comprehensive During the development of deep learning technology, there are many
description of optical flow methods for fluid motion estimation and different typical models such as Deep Belief Network (DBN) (Mohamed
comparisons with cross-correlation algorithms. PIV-equipment et al., 2009), Convolutional Neural Network (CNN) (Li et al., 2016),
Deep learning (DL), as a significant branch of machine learning Recurrent Neural Network (RNN) (Pascanu et al., 2013), etc. For the
(ML), has achieved distinguished performance in different fields such commonly used models in the field of fluid mechanics, we mainly
as computer vision (CV) (Krizhevsky et al., 2012; Wang and Bi, 2021; describe the CNN and RNN architectures in this section.

2
C. Yu et al. Ocean Engineering 271 (2023) 113693

Fig. 2. Schematic diagram of a general CNN structure.

2.1. Convolutional neural network size is 𝑊out × 𝐻out × 𝐶out , and the depth dimension 𝐶out of the
output feature map equals the quantity of convolution kernels (𝑁). For
Different from the fully-connected neural network(FNN), the CNN simplicity, only one convolution kernel is used here as an illustration, so
is a feedforward neural network with convolutional computations that the depth of the obtained feature map is 1. In the model structure, the
is inspired by the receptive field mechanism of the cerebral cortex initial convolution mainly extracts low-level detail feature information,
in biology. CNN introduces a sparse connection mechanism in the such as corners, edges, and colors. With the forward pass of features,
convolutional layer, which effectively avoids the loss of spatial features subsequent convolutional layers are adopted to extract higher-level
in the image through the local receptive field. In addition, the weight and richer feature information. Generally, the size and stride of the
sharing mechanism and pooling operation are used to overcome the convolution kernel is able to set manually, and its parameters are
training overfitting problem caused by excessive model parameters. In obtained through learning during training process.
summary, the three interesting operations of CNN are local receptive
fields, weight sharing and pooling layers. Through the above opera- 2.1.2. Pooling layer
tions, it is possible to build a deeper CNN network and effectively learn Pooling layer usually follows a convolutional layer, which is used to
more abstract features in images, thereby enhancing the recognition downsample the feature maps output by the convolutional layer. This
and classification capabilities of the model. The general structure of operation is able to effectively reduce network parameters, computa-
CNN model (see Fig. 2) mainly consists of input layer, convolutional tional cost and training difficulty of the network, and avoid overfitting.
layer, pooling layers, fully connected layers, output layer and activation In addition, the pooling operation has feature invariance, that is, it
function, respectively. In addition, deconvolution layers have important no longer cares about the specific location of the feature but cares
applications in pixel-level upsampling processes such as image SR about the existence of the feature. Pooling operations mainly contain
reconstruction and optical flow estimation tasks. We next describe the max pooling and average pooling operation. The average pooling is
basic components of a CNN model in detail. to compute the average value of all elements, and use the obtained
average value as the output of the pooling layer. In contrast, the output
of the max pooling layer is the maximum value of all elements as the
2.1.1. Convolutional layer
output. It can be seen from Fig. 4 that after a pooling operation with a
It is well known that the convolutional layer is a key part of the CNN
step size of 2, the image becomes 1/4 of the original image. Through
architecture, which plays an important role in the feature extraction
the max pooling operation, the largest representative feature of the
process. As depicted in Fig. 3(a) , different from the fully connected
feature map is able to be extracted, while reducing the amount of data
mode (red arrow in the figure), convolutional layers are connected in
by 75%. Therefore, the max pooling operation is widely used in the
a sparse manner, i.e., only a fraction of input neurons 𝑥𝑘 are connected
CNN model.
(solid connecting lines) to output neurons 𝑦𝑙 . Here we assume that
( )𝑇
the input vector is denoted by 𝑋 = 𝑥1 , 𝑥2 , … , 𝑥𝑘 , and the output
( )𝑇 2.1.3. Activation function
is denoted by 𝑌 = 𝑦1 , 𝑦2 , … , 𝑦𝑙 , then the relationship between the The activation function is mainly adopted in the convolutional
output and input is described as follows: layer or the fully connected layer, and its main purpose is to per-
𝑌 =𝑊𝑋+𝑏 (1) form nonlinear fitting on the output features, thereby increasing the
nonlinear feature representation ability of the network. Different kinds
where the weight matrix 𝑊 represents the sparse matrix, and 𝑏 rep- of activation functions have emerged to enhance the ability of the
resents the bias parameter. The working principle of the convolution model in a targeted manner. For instance, the Sigmoid function, Tanh
layer is to use the convolution filter to convert the input image into a function, ReLU function (Glorot et al., 2011) and Softmax function
feature map and then feed it to the next layer. Specifically, as illustrated are common activation functions in convolutional neural networks (see
in Fig. 3(b), the convolution kernel (gray cube in the figure) of size Fig. 5).
3 × 3 × 4 shifts horizontally or vertically on the feature map (light First, the definition formula of Sigmoid is as follows:
green cube in the figure) of size 5 × 5 × 4 with a step size (here 𝑠 = 1).
Then, the weight coefficient of the convolution kernel (3 × 3 × 4 = 36) 𝑠(𝑥) = 1∕ (1 + 𝑒−𝑥 ) (3)
is multiplied by the element of the corresponding position of the feature The function image of Sigmoid function is shown in Fig. 5(a). The
map, and the output result of the corresponding position is obtained by Sigmoid function is a continuous monotonous curve that maps the
summative. The output feature map can be calculated by the following input value to the interval from 0 to 1, so it is often used in the
formula: probability prediction problem of binary classification. In addition,
( )
𝑊out = 𝑊in − 𝐾size ∕𝑆size + 1 the Sigmoid function has the problem of easy gradient disappearance
( ) during training. For the Tanh activation function, it is depicted in
𝐻out = 𝐻in − 𝐾size ∕𝑆size + 1 (2)
Fig. 5(b), and its description formula is defined as follows:
𝐶out = 𝑁 ( )( )
𝑡(𝑥) = 1 − 𝑒−2𝑥 1 + 𝑒−2𝑥 (4)
where 𝑊in and 𝐻in denote the width and height dimensions of the
input feature map, 𝐾size and 𝑆size denote the convolution kernel size The Tanh function is a hyperbolic tangent function, which converges
and convolution step size, respectively. The final output feature map faster than the Sigmoid function. However, both functions have the

3
C. Yu et al. Ocean Engineering 271 (2023) 113693

Fig. 3. Schematic diagram of connections between convolutional layers (a) and convolution operation in CNN (b).

popular, and its description formula is described as follows:

𝑟(𝑥) = max(𝑥, 0) (5)

Fig. 5(c) represents that the value of function is 0 when 𝑥 is less than
0, and it is a one-dimensional linear function when 𝑥 is greater than
0, keeping the gradient at a constant value. This method effectively
solves the gradient disappearance of Sigmoid and Tanh functions. At
the same time, it also has the advantages of strong sparsity and easier
convergence. The function image of Softmax function is shown in
Fig. 5(d), which is described as follows:
Fig. 4. Schematic diagram of max pooling (a) and average pooling (b).
( ) ( ) ∑ ( )
𝑚 𝑥𝑖 = exp 𝑥𝑖 ∕ exp 𝑥𝑗 (6)
𝑗
( )
where 𝑖 denotes the label of the classification, and 𝑚 𝑥𝑖 denotes the
probability of one of the categories. The Softmax function is actually a
normalization function, which is often used in the output layer of the
feedforward neural network, mapping the output value to a probability
between 0 and 1, and is mainly used as an activation function in
multi-classification problems.

2.1.4. Fully connected layer


In the CNN structure, after highly abstract modeling of the two-
dimensional information of the input image through the convolution
and pooling layers, it is necessary to further reduce the dimensional-
ity of the feature map using the fully connected layer. In this way,
deep two-dimensional features are mapped to a one-dimensional sam-
ple space. Essentially, the fully connected layer is also a convolution
calculation, which performs convolution operations on the input multi-
channel feature maps and accumulates the convolution results (see
Fig. 6). During the convolution operation, it is necessary to expand
the calculation feature matrix in front of the fully connected layer
into a one-dimensional vector, which is adopted as the input data for
the fully connected layer calculation. The essence of a fully connected
layer is a linear function, which cannot effectively model nonlinear
features, so multiple fully connected layers are usually used in neural
Fig. 5. Schematic diagram of different activation functions: Sigmoid function (a), Tanh networks. Fig. 6 shows the schematic diagram of the fully connected
function (b), ReLU function (c) and Softmax function (d). operation. Here, two fully connected layers are designed to flatten
the input feature map from the pooling layer into a one-dimensional
vector, and then complete a feature weighting through the first fully
problem that the gradient is easy to disappear. Compared with the first connected layer ((FC1) layer and ReLU activation. Then the result
two activation functions, the application of the ReLU function is more of the first feature weighting is passed through the FC2 layer and

4
C. Yu et al. Ocean Engineering 271 (2023) 113693

Fig. 6. Schematic diagram of the fully connected operation.

Fig. 7. Schematic diagram of deconvolution computation process.


Softmax activation function to complete the second feature weighting, Source: Reprinted from Dumoulin and Visin (2016).
and finally the prediction result is output.
Taking the input to the FC1 layer as an example, the operation
process is shown in the following formula:
where 𝐹 denotes the network function, 𝐿 denotes the loss function.
𝐶1 = 𝑋1 × 𝐻11 + 𝑋2 × 𝐻12 + ⋯ + 𝑋𝑘 × 𝐻1𝑘 + 𝑏1 Once the training process is completed, 𝑤 is fixed to the optimal value.
𝐶2 = 𝑋1 × 𝐻21 + 𝑋2 × 𝐻22 + ⋯ + 𝑋𝑘 × 𝐻2𝑘 + 𝑏2 Since 2012, CNNs have attracted much attention from researchers,
(7)
⋯ and many classic model structures have emerged, such as LeNet (Le-
𝐶𝑘 = 𝑋1 × 𝐻𝑘1 + 𝑋2 × 𝐻𝑘2 + ⋯ + 𝑋𝑘 × 𝐻𝑘𝑘 + 𝑏𝑘 Cun et al., 1998), Alex-Net (Krizhevsky et al., 2012), ZFNet (Zeiler
We convert Eq. (7) into a matrix, which is defined as follows: and Fergus, 2014), VGGNet (Simonyan and Zisserman, 2014) and
GoogleNet (Szegedy et al., 2015). Nowadays, CNNs are still an active
⎛ 𝐶1 ⎞ ⎛ 𝐻11 𝐻12 ⋯ 𝐻1𝑘 ⎞ ⎛ 𝑋1 ⎞ ⎛ 𝑏1 ⎞
⎜ ⋯ ⎟=⎜ ⋯ ⎟∗⎜ ⋯ ⎟+⎜ ⋯ ⎟ topic applied to many different tasks.
⋯ ⋯ ⋯ (8)
⎜ ⎟ ⎜ ⎟ ⎜ ⎟ ⎜ ⎟
⎝ 𝐶𝑘 ⎠ ⎝ 𝐻𝑘1 𝐻1𝑘 ⋯ 𝐻𝑘𝑘 ⎠ ⎝ 𝑋𝑘 ⎠ ⎝ 𝑏𝑘 ⎠
where 𝑋1 ∼ 𝑋𝑘 and 𝐶1 ∼ 𝐶𝑘 denote the input and output, 𝐻11 ∼ 2.2. Recurrent neural network
𝐻𝑘𝑘 represents the weight of the fully connected layer, and 𝑏1 ∼
𝑏𝑘 represents the bias of the fully connected layer. Eq. (8) further When feedforward networks such as convolutional neural networks
proves that the calculation method of the fully connected layer is
process input data, adjacent input samples are independent of each
a convolution calculation. In addition, although the fully connected
other in the process of feature learning. Yet, when the input data
layer generally only accounts for 10% 20% of the model, its parameter
is sequential information with dependencies, the CNN model cannot
volume accounts for more than 80% of the total.
effectively learn the temporal correlation features between adjacent
samples. Instead, RNN introduce self-recurrent operations in their own
2.1.5. Deconvolution layer
The deconvolution (transposed-convolutional) (Zeiler et al., 2010) neurons, so that the output of each time is not only related to the input
is a variant of the convolution operation that essentially inverts the at present, but related to the input at previous timestamps. Fig. 8(a)
forward and backward computations of convolution. Fig. 7 presents shows schematic diagram of the internal structure of an RNN neuron.
the calculation process of a 3 × 3 convolution kernel moving on a It can be seen that it learns an implicit representation of the input
4 × 4 feature map with a stride of 1. In the forward pass of the model, sequence through an internal recurrent structure. When given an input
( )
the convolution operation reduces the size of the feature map layer sequence 𝑥𝑡 = 𝑥1 , 𝑥2 , … , 𝑥𝑛 , the recurrent update process of RNN
by layer. Conversely, the deconvolution operation increases the feature hidden neurons is as follows:
map size layer by layer until it returns to the original image size. There- ( )
fore, the deconvolution operation is similar to an upsampling operation. ℎ𝑡 = 𝑓 𝑊𝑥ℎ 𝑥𝑡 + 𝑊ℎℎ ℎ𝑡−1 + 𝑏ℎ (10)
However, unlike the traditional upsampling method (e.g., bilinear up-
where 𝑊𝑥ℎ is the weight input to the variable 𝑥𝑡 at this moment, 𝑊ℎℎ is
sampling), the convolution kernel parameters of deconvolution can be
the neuron state ℎ𝑡−1 at the previous moment as the weight input at this
learned from training, which allows the model to learn better feature
moment. 𝑏ℎ is the bias value, and 𝑓 is the activation function, which
representation capabilities in the data.
usually adopts the Tanh function. Further, the network output 𝑦𝑡 at the
For the fluid velocity field estimation task, it is essentially a process
of pursuing a high-resolution velocity field. For PIV estimation, the current time 𝑡 can be obtained by the following iterative formula:
model extracts feature from low-dimensional particle image pairs and 𝑦𝑡 = 𝑊ℎ𝑦 ℎ𝑡 + 𝑏𝑦 (11)
finally upsamples to a high-resolution velocity field. Similarly, the
SR reconstruction of the flow field is also a process of estimating where 𝑊ℎ𝑦 is the network weight when ℎ𝑡 is used as input, and 𝑏𝑦 is
the HR velocity field from the LR velocity field data. Therefore, the the corresponding bias value.
deconvolution operation is of great significance in the fluid velocity Although RNN can handle time series data, there is still a serious
field estimation task. problem about gradient vanishing in the process of training. As a
result, RNN can only have short-term memory and cannot effectively
2.1.6. Parameter optimization
process long-term time series data. In order to address these issues,
Once the neural network model is designed and determined, its
some excellent variants appear immediately, e.g., representative Long
parameters 𝑤 need to be trained and optimized on the corresponding
Short-Term Memory (LSTM) (Hochreiter and Schmidhuber, 1997) and
dataset. Assuming that the sample of the training set is 𝑝 and the
Gated Recurrent Unit (GRU) (Cho et al., 2014). These networks add
corresponding label is 𝑞, the training process of model learning is equal
information storage units to the structure of the recurrent neural net-
to addressing an optimization problem,
) work, which makes the network have stronger memory capabilities
𝑤 = arg min 𝐿(𝐹 (𝑝, 𝑤)), 𝑞 (9) than RNNs.
𝑤

5
C. Yu et al. Ocean Engineering 271 (2023) 113693

Fig. 8. Schematic diagram of the RNN structure (a), the LSTM structure (b) and GRU structure (c). Where 𝑡 represents the input and 𝐶𝑡 denotes the long-term state of the unit.

( )
2.2.1. LSTM ℎ𝑡 = 𝑜𝑡 × tanh 𝐶𝑡 (17)
Different from the basic structure of RNN, LSTM uses three ‘‘gates’’
The input 𝑥𝑡 at the current time and the previous hidden state ℎ𝑡−1
to constrain the state and output at different times, i.e., input gate,
are adopted as the input of the current Sigmoid activation function to
output gate and forget gate. LSTM combines short-term memory with
calculate the output 𝑜𝑡 . The output 𝑜𝑡 is then multiplied by the long-term
long-term memory through a gate structure, which can effectively
state 𝐶𝑡 to get the current hidden state ℎ𝑡 .
alleviate the problem of gradient disappearance. The gate structure is a
In summary, the forget gate constrains the amount of sample in-
fully connected layer that uses a bitwise multiplication operation, and
formation that can be passed from a previous time to the current cell
its activation function is a sigmoid function. The sigmoid function will
state. The input gate determines the amount of sample information that
output a value between 0 and 1 to represent the amount of sample
the input can save to the current unit. The output gate determines how
information that can transmit the gate at the current time. 0 represents
much information a cell state can output to the current state output
that no feature information can be transmitted, and 1 represents that
value. Afterwards, LSTM was improved and generalized for various
all feature information can be transmitted. The gate structure can be
applications (Mirza and Cosan, 2018; Koutnik et al., 2014).
shown in Fig. 8(b), and its specific calculation process is described by
the following formula, which can be divided into three parts:
2.2.2. GRU
(1) Forget gate is used to forget information, discarding some un-
GRU is improved on the basis of LSTM, and it has better effect
wanted information from the long-term state:
than LSTM while simplifying the structure of LSTM. Compared with the
( [ ] )
𝑓𝑡 = 𝜎 𝑊𝑓 ⋅ ℎ𝑡−1 , 𝑥𝑡 + 𝑏𝑓 (12) three-gate structure of the LSTM, GRU simplifies it to two gates: update
gate and reset gate. The former is designed to control the amount of
where 𝑊𝑓 denotes the weight matrix of the forget gate, and 𝑏𝑓 denotes information transmitted from the previous moment to the current state,
the bias value. The input of the forget gate 𝑓𝑡 is ℎ𝑡−1 and 𝑥𝑡 , and and the latter is adopted to control the amount of information forgotten
the output value is between 0 and 1, which is multiplied by each at the previous moment. As illustrated in Fig. 8(c), 𝑥𝑡 is the input data,
corresponding position element in the previous long-term state 𝐶𝑡−1 . ℎ𝑡 is the GRU output state. 𝑟𝑡 and 𝑧𝑡 are the reset gate and the update
The operation of multiplying with 0 here means forgetting the infor- gate, respectively. These two gates together handle the computation
mation, and 1 The operation of multiplying represents receiving this from state ℎ𝑡−1 to ℎ𝑡 , and the specific formula is described as follows:
information. ( [ ])
(2) Input gate is used to determine new memory information and 𝑧𝑡 = 𝜎 𝑊𝑧 ⋅ ℎ𝑡−1 , 𝑥𝑡 (18)
store new information in the long-term state: ( [ ])
𝑟𝑡 = 𝜎 𝑊𝑟 ⋅ ℎ𝑡−1 , 𝑥𝑡 (19)
( [ ] ) ( [ ])
𝑖𝑡 = 𝜎 𝑊𝑖 ⋅ ℎ𝑡−1 , 𝑥𝑡 + 𝑏𝑖 (13) ̃ℎ𝑡 = tanh 𝑊 ⋅ 𝑟𝑡 × ℎ𝑡−1 , 𝑥𝑡 (20)
( [ ] ) ( )
𝐶𝑡′ = tanh 𝑊𝑐 ⋅ ℎ𝑡−1 , 𝑥𝑡 + 𝑏𝑐 (14) ℎ𝑡 = 1 − 𝑧𝑡 × ℎ𝑡−1 + 𝑧𝑡 × ℎ𝑡̃ (21)
𝐶𝑡 = 𝑓𝑡 × 𝑐𝑡−1 + 𝑖𝑡 × 𝐶𝑡′ (15) where 𝑊𝑧 , 𝑊𝑟 and 𝑊 denote the weight matrices of the update gate, re-
where 𝑊𝑖 and 𝑊𝑐 still represent the corresponding weight matrix, 𝑏𝑖 set gate and hidden state respectively. Based on GRU, some researchers
and 𝑏𝑐 represent the bias value. This part can be divided into three have improved and further improved its performance (Zhao et al.,
operations. First, the input 𝑥𝑡 at the current moment and the hidden 2017; Che et al., 2018).
state ℎ𝑡−1 at the previous moment are adopted as the input of the input
gate, and the output value 𝑖𝑡 is controlled within the range of 0 to 1 3. Review of fluid motion estimation
through the Sigmoid activation function. The second operation uses
the Tanh activation function to create a new candidate state 𝐶𝑡′ , as The fluid motion estimation methods based on deep learning are de-
described in Eq. (14). Finally, Eq. (15) is used to add the candidate veloped along with deep learning technology. The conceptual study of
deep learning for fluid motion estimation could date back to 1990s (Teo
state 𝐶𝑡′ to the current long-term state 𝐶𝑡 .
et al., 1991; Cenedese et al., 1992; Hassan and Philip, 1997; Grant
(3) Finally, the state information ℎ𝑡 is output, and its calculation
and Pan, 1997). However, at the time neural networks with only a
process is described as follows:
few layers were adopted to process only part of the PIV tasks such
( [ ] )
𝑜𝑡 = 𝜎 𝑊𝑜 ⋅ ℎ𝑡−1 , 𝑥𝑡 + 𝑏𝑜 (16) as feature extraction and process optimization. In fact, early work just

6
C. Yu et al. Ocean Engineering 271 (2023) 113693

Fig. 9. A temporal overview of a deep learning-based fluid motion estimation models (for PIV).

used an artificial neural network for pattern recognition of particle the existing optical flow models into three types: U-Net, spatial pyramid
trajectories (Hassan and Philip, 1997; Grant and Pan, 1997). On the network and recurrent iteration network.
other hand, limited by the computing power of hardware systems,
U-Net
deep learning technology also fell into a low period of development
Dosovitskiy et al. (2015) first propose to use CNNs to estimate opti-
at that time. But this has laid a certain theoretical foundation for the
cal flow fields in 2015, which presents two networks namely FlowNetS
subsequent study of fluid motion estimation based on deep learning.
and FlowNetC. Both architectures are based on the encoder–decoder
Krizhevsky et al. (2012) put forward deep neural networks for image
structure (i.e., U-Net), which consists of encoder and decoder parts.
classification tasks in 2012 and achieved the best performance. Deep
Fig. 11 exhibits the architecture of FlowNetS and its encoder part
learning technology has once again attracted the interest and research
consists of consecutive convolution layers to extract deep feature maps.
of researchers in various fields. In 2017, Rabault et al. (2017) put
Because the stride of some layers is 2, the size of the feature map will
forward a convolutional neural network (CNN) that truly applies deep
decrease as the network develops deeper. The decoder part of FlowNetS
learning methods to estimate velocity field. Likewise in 2017, Kutz
is composed of a series of deconvolution layers that recover the size
(2017) gave a short review about deep learning methods in fluid dy-
of feature maps. Finally, the model outputs fine optical flow field at
namics and point out that it is just a matter of time before deep learning
the 1/4 resolution of the original input. Different from the structure
achieves outstanding performance in the complex turbulence modeling.
of FlowNetS, the input of FlowNetC is divided into two branches, and
Meanwhile, they also suggest building open challenge datasets in the
each branch consists of 3 convolutional layers to extract features of
fluid community similar to ImageNet (Deng et al., 2009) to fairly
image pairs. The outputs of these two branches are then fed to the
compare approaches.
correlation layers to compute the feature matching cost. The decoder
Since then, deep learning methods for fluid velocity field estimation
part of FlowNetS and FlowNetC is the same. After that, based on this U-
have gradually emerged. Fig. 9 presents the development process of
Net architecture, many variants appeared successively (Ilg et al., 2017;
fluid velocity field estimation methods based on deep learning in recent
Yu et al., 2016; Lai et al., 2017). For example, FlowNet2 (Ilg et al.,
years. The design of these network models is inseparable from the inspi-
2017) model is putforward to further enhance the accuracy for optical
ration of traditional algorithms. The ideas of optical flow learning and
flow estimation. However, the improvement of the accuracy is obtained
cross-correlation learning have important guiding significance for fluid
by stacking sub-networks, which results in a large and time-consuming
motion estimation. Next, we first review deep optical flow learning
model. Meanwhile, the structure of U-Net is regarded as a general
and cross-correlation learning methods for velocity field estimation.
approximation method and lacks the exploration of optical flow theory.
Subsequently, we give consideration and analysis on the applicability
of deep learning methods. Spatial pyramid
The optical flow model SPyNet based on spatial pyramid was firstly
3.1. Optical learning proposed by Ranjan and Black (2017). The SPyNet is a coarse-to-fine
spatial pyramid architecture to predict optical flow at different resolu-
3.1.1. A short overview of deep optical flow learning tions. Although the SPyNet model is smaller, the estimation accuracy
Optical flow estimation has always been a research hotspot in is not as good as FlowNet2. The SPyNet estimates large motion on
the field of computer vision (CV), which is mainly used for motion coarse layers and warps the second image towards the first using the
estimation and analysis of objects. With the maturity of deep learning, upsampled flow from the previous level. Therefore, only the remaining
the deep optical flow learning method has completely surpassed the flow for each level needs to be calculated. In order to further deal
performance of traditional optical flow algorithms (Dosovitskiy et al., with the large displacement and reduce the model parameters, based
2015; Ilg et al., 2017). Just as the traditional optical flow algorithm on (Ranjan and Black, 2017), two representative pyramid structures
has aroused the interest of experimental mechanics researchers, the PWCNet (Sun et al., 2018) and LiteFlowNet (Hui et al., 2018a) are pro-
deep learning optical flow approach has also attracted the attention of posed simultaneously in 2018. This architecture adopts feature warping
researchers. The research on velocity field estimation based on deep operation instead of image warping under different scale. Meanwhile,
optical flow learning has progressed along with the development of it also adopts cost volume layer to compute the matching cost on
optical flow learning models. Fig. 10 presents the evolution of represen- each pyramidal layer. The two networks achieve excellent results on
tative deep learning optical flow networks. According to the literature different optical flow datasets. In the following period, many optical
review (Zhai et al., 2021) for optical flow approaches, we can divide flow networks (Wang et al., 2018b; Liu et al., 2019; Yang and Ramanan,

7
C. Yu et al. Ocean Engineering 271 (2023) 113693

Fig. 10. A temporal overview of deep learning optical flow models, where the pink stars represent milestone approaches.

with high efficiency in running time, training speed and model size. It
has become a new milestone architecture, which has attracted many
researchers to improve and enhance it (Jiang et al., 2021; Zhang et al.,
2021).

3.1.2. Optical flow learning for velocity field estimation


Supervised learning
Since traditional optical flow algorithms play an important role
in fluid motion estimation, it is conceivable that deep optical flow
learning also has attractive prospect. Estimating fluid motion fields
from particle image pairs can also be considered as an image processing
issue in the CV area. Cai et al. (2019b) first modified the optical flow
model FlowNetS (Dosovitskiy et al., 2015) for velocity field estimation.
The original FlowNetS use the interpolation approaches to improve
the flow field into full resolution. However, it finds that the inter-
polation method ignores the small-scale vortex structure information
of the flow field, which is critical for complex flow fields. Hence,
two more deconvolutional layers are integrated into the final stage of
the model. In this way, a dense high-resolution velocity field (pixel-
Fig. 11. The architecture of FlowNetS (Dosovitskiy et al., 2015).
level) can be obtained. In addition, considering the supervised learning
method, the corresponding dataset needs to be established to train the
models. The generated PIV dataset contains various flow fields and has
2019; Hur and Roth, 2019) are still improved based on this spatial strong generalization. Up to our knowledge, this dataset is a public
pyramid structure. PIV dataset which can be a benchmark for comparing different deep
Recurrent iteration learning algorithms. The main process of generating the PIV dataset
The coarse-to-fine refine strategy of spatial pyramid structure effec- is as follows. First, the particle image generator (PIG) is adopted to
tively improves the estimation accuracy. But the refinement of optical generate particle image (Raffel et al., 2007) and a particle can be
flow is limited by the number of pyramid levels. Some related works defined by a two-dimensional Gaussian function:
adopted iterative refinement methods to enhance results on optical [ ( )2 ( )2 ]
− 𝑥 − 𝑥0 − 𝑦 − 𝑦0
flow. In 2020, a novel optical flow architecture named recurrent all- 𝐼𝑝 (𝑥, 𝑦) = 𝐼0 exp , (22)
(1∕8)𝑑𝑝2
pairs field transforms (RAFT) (Teed and Deng, 2020) is proposed and
( )
becomes a new milestone approach. As shown in Fig. 12, RAFT first where 𝐼0 , 𝑑𝑝 and 𝑥0 , 𝑦0 represent the intensity, diameter and center
extracts the features of the image pair using the feature encoder. The position of the particle, respectively. Additionally, the particle seed-
context encoder has the same structure as the feature encoder to extract ing density determines the particle quantity in the observed domain.
the context features of the first image, which provides more semantic Therefore, different control parameters can determine a unique particle
features for the subsequent flow interference to improve accuracy. image. The generated image then moves symmetrically following the
Then, a 4D(W × H × W × H) correlation layer that builds a correlation flow motion to get an image pair. For different flow patterns, it can be
volume to compute the matching of the corresponding feature vectors. extracted in different ways such as computational fluid dynamics (CFD)
Finally, a recurrent update operator based Conv-GRU block is utilized and open source addresses (e.g., 2D DNS-turbulence flow Carlier, 2005,
to update and refine optical flow. It is worth mentioning that the surface quasi-geostrophic (SQG) model of sea flow Resseguier et al.,
number of iterations of the update operator of RAFT can be selected in 2017 and the Johns Hopkins Turbulence Databases (JHTBD) Li et al.,
training and testing until the model evaluates to a satisfactory effect. 2008). The components of the dataset are illustrated in Table 1.
The design idea also imitates the optimization process of the traditional With the advent of more favorable optical flow models, Cai et al.
variational algorithm. RAFT reaches optimal estimation performance (2019a) then put forward the PIV-LiteFlowNet-en model that is based

8
C. Yu et al. Ocean Engineering 271 (2023) 113693

Fig. 12. The architecture of RAFT (Teed and Deng, 2020). It consists of 3 main components: a feature encoder along with context encoder, a 4D correlation layer and GRU-based
update operator.

Table 1
Main descriptive details of the public PIV dataset from Cai et al. (2019b).
Case name Main property Parameters Quantity
[ ]
Uniform Uniform flow field by CFD |𝑑𝑥| ∈ 0, 5 1000

Re = 800 600
Re = 1000 600
Back-step Backward stepping flow by CFD
Re = 1200 1000
Re = 1500 1000
Re = 40 50
Re = 150 500
Cylinder Flow over a circular cylinder by CFD Re = 200 500
Re = 300 500
Re = 400 500
DNS-turbulence (Carlier, 2005) A homogeneous and isotropic turbulence flow – 2000

SQG (Resseguier et al., 2017) Sea surface flow driven by SQG model – 1500

JHTDB-channel (Li et al., Channel flow provided by JHTDB – 1600


2008)

JHTDB-mhd1024 Forced MHD turbulence provided by JHTDB – 800

JHTDB-isotropic1024 Forced isotropic turbulence provided by JHTDB – 2000

on the LiteFlowNet (Hui et al., 2018a). Similarly, special deconvolution incorporated into the loss function of the network. Furthermore, Guo
layers are added to the model to obtain high-precision small-scale fluid et al. (2022) presented a multi-frame velocity field estimation network,
information (see Fig. 13). The results show that the PIV-LiteFlowNet- which effectively transfers and fuses flow field features between dif-
en model is superior to the previous PIV-NetS model in both accuracy ferent frames, thereby further improving accuracy and computational
and efficiency. However, the increase in accuracy comes in exchange efficiency.
for adding layers to the model. This increases the redundancy of With the emergence of a new milestone optical flow model RAFT,
model parameters, which has a large room for improvement. As for researchers began to further explore the potential of RAFT. Lagemann
LiteFlowNet, it enjoys some merits for fluid motion estimation. (1) It et al. (2021b) and Yu et al. (2021a) successively modified the RAFT
imitates the coarse-to-fine estimation strategy of the variational optical to represent an excellent PIV estimator. Specifically, Lagemann et al.
flow approach, which can effectively handle motions with different (2021b) crop particle images to small resolution (32px × 32px) and
displacement sizes. (2) Feature warping (Brox et al., 2004) operation performs feature extraction without spatial downsampling operation.
is adopted in each pyramid level to reduce the distance between the By this means, the feature of the original image can be extracted and
second image and first image. (3) Special flow regularization module utilized effectively. Different from Lagemann et al. (2021b), Yu et al.
can utilize image features to smooth the velocity field and decrease (2021a) improve the resolution of the feature encoder from 1/8 to
error vector. This process acts like a regularization term in the vari- 1/4 by removing the residual blocks of the encoder. This operation
ational optical flow formulation. (4) Prior assumptions can be easily also further compresses the model size. In essence, the improved ideas
embedded into the multilayer loss function of the LiteFlowNet model, in Lagemann et al. (2021b), Yu et al. (2021a) are both to improve the
so researchers can couple knowledge of fluid physics to the network. ability to extract image feature information. The obvious advantage
Inspired by the above, based on LiteFlowNet, Yu et al. (2021b) put of RAFT is that it iteratively refines the velocity field using a Conv-
forward a supervised learning network to specifically solve the problem GRU-based update operator. Different from performing an iterative
of PIV estimation in conditions of illumination variation. Brightness estimation from coarse to fine in the LiteFlowNet model, RAFT can
gradient constancy and first-order divergence-curl smoothing terms are update and iterate the velocity field multiple times during training

9
C. Yu et al. Ocean Engineering 271 (2023) 113693

Fig. 13. The architecture of PIV-LiteFlowNet-en (Cai et al., 2019a). The flow field in the figure represents DNS-turbulence flow.

and testing until a satisfactory result is obtained. In addition, the This is due to the optical flow architecture RAFT is superior to Lite-
special upsampling method (the convex combination upsampler) also FlowNet. Subsequently, Zhang et al. (2022) further developed the unsu-
increases the resolution of the flow fields to full resolution with high pervised fluid motion estimation strategy to embed more prior physical
precision. Benefiting from the huge advantages of the RAFT architec- knowledge into the framework (here termed as Un-LiteFlowNet-PIV-
ture, these two approaches achieve the state-of-the-art PIV estimation 2). This framework consist of a PDE (partial differential equation)-
performance. It has high accuracy not only on the PIV dataset, but also constrained motion predictor and a physical based corrector. This
has excellent performance in the real flow field (Yu et al., 2021c). prediction–correction scheme is applied to LiteFlowNet or PWCNet
architecture to further improve the estimation accuracy. Moreover, it
Unsupervised learning
also has great generalization ability to complex real-world fluid scenes.
Supervised learning networks usually require a large amount of
Overall, the aforementioned deep learning-based optical flow methods
training data with ground truth, so the corresponding dataset usually
have achieved encouraging results in recent years. It can be predicted
need to be artificially generated. To avoid this problem, Zhang and
that the deep optical flow model has important application prospects
Piggott (2020) proposed an unsupervised version of LiteFlowNet (called
and research potential in fluid motion estimation.
UnLiteFlowNet-PIV) inspired by Yu et al. (2016), Meister et al. (2018).
The total unsupervised loss of the UnLiteFlowNet-PIV is a combina-
3.2. Cross-correlation learning
tion of photometric loss, flow smoothness loss and consistency loss.
Concretely, the photometric loss is defined in terms of the difference
Inspired by cross-correlation algorithms, some scholars have also
between the two frames and corresponding forward–backward warped
elaborately designed deep cross-correlation learning networks to per-
images, which is defined as follows:
( ) ∑ ( ( )) form end-to-end PIV estimation (Lee et al., 2017; Gao et al., 2021).
𝐿𝑃 𝐼1 , 𝐼2 , 𝐅𝑓 , 𝐅𝑏 = 𝜌 𝐼1 (𝐱) − 𝐼2 𝐱 + 𝐅𝑓 (𝐱) Typically, Lee et al. (2017) adopt a four-level cascaded deep convo-
𝐱∈𝑃 (23) lutional network called PIV-DCNN to gradually generate coarse-to-fine
( ( ))
+ 𝜌 𝐼2 (𝐱) − 𝐼1 𝐱 + 𝐅𝑏 (𝐱) velocity vectors (see Fig. 14). Same as the traditional cross-correlation
where 𝑃 represents real number space, 𝐱 + 𝐅𝑓 and 𝐱 + 𝐅𝑏 (𝐱) denotes the method, the input of the PIV-DCNN is corresponding two patches (color
corresponding coordinates position in the other image. Here 𝜌 denotes squares) from the successive images pair. Every sub-net can estimate a
the Charbonnier penalty function (Teng et al., 2005). In addition, a velocity vector from two patches. F1 at level 1 can be regarded as an
second-order smooth term is used to reduce the error vector and extractor of the large displacement vector, and networks at level 2, 3
enhance the regularization effect, which is described as follows: and 4 then refine the vectors by calculating the residual vector (VecRes)
( ) ∑ ∑ ( ) after central difference window offset. The performance of PIV-DCNN
𝐿𝑆 𝐅𝑓 , 𝐅𝑏 = 𝜌 𝐅𝑓 (𝐬) − 2𝐅𝑓 (𝐱) + 𝐅𝑓 (𝐫) is competitive with the classical PIV algorithms, e.g., the window
(𝐬,𝐫)∈𝐍(𝐱) 𝐱∈𝑃 (24) deformation iterative multi-grid (i.e., WIDIM) algorithm. However, this
( )
+𝜌 𝐅𝑏 (𝐬) − 2𝐅𝑏 (𝐱) + 𝐅𝑏 (𝐫) requires a lot of execution time due to the large number of patches
where 𝑁 denotes a four channel filter including 𝑥, 𝑦 and two diagonals, that need to be recalculated. Furthermore, the velocity field output
and 𝑠 and 𝑟 respectively denote the two pixels before and after 𝑥, more by the PIV-DCNN is a sparse and low-resolution velocity field. The
details can refer to Zhang et al. (2014). The consistency loss indicates proposal of method PIV-DCNN provides an important reference for
that the forward and backward flow estimates should be consistent, velocity estimation based on deep learning.
which is described as follows: Although deep learning-based PIV methods have showed potential,
( ) ∑ ( 𝑓 ( )) e.g., high accuracy and spatial resolution, the generalization ability and
𝐿𝐶 𝐅 𝑓 , 𝐅 𝑏 = 𝜌 𝐅 + 𝐅𝑏 𝐱 + 𝐅𝑓
robustness of the related approaches still can be further enhanced for
𝐱∈𝑃 (25)
( ( )) real applications. Gao et al. (2021) put forward a deep learning model
+ 𝜌 𝐅𝑏 + 𝐅𝑓 𝐱 + 𝐅𝑏
called CC-FCN integrated with cross correlation strategy, which can
The forward flow 𝐅𝑓 should be the inverse of the backward flow fight against noise and achieve satisfactory results in practical applica-
( )
𝐅𝑏 𝐱 + 𝐅𝑓 at the corresponding pixel in the second image. Finally, tions. The CC-FCN network synergistically combines cross-correlation
experimental results show that UnLiteFlowNet-PIV can achieve com- learning and fully convolutional network. Two types of calculations
petitive results compared with supervised learning methods. are used as input to the model, including the particle image pair and
Lagemann et al. (2021a) replaced the LiteFlowNet model in this initial velocity field obtained by cross correlation. The embedded cross-
framework with the RAFT model, which achieved better performance. correlation approach estimates a coarse velocity field with a large

10
C. Yu et al. Ocean Engineering 271 (2023) 113693

Fig. 14. Schematic diagram of the PIV-DCNN structure.


Source: Reprinted from Lee et al. (2017), with permission
from Springer Nature.

interrogation window. It is well known that the execution process of superior performance for different flow fields compared to traditional
cross-correlation algorithm is based on the calculation of interrogation algorithms. Estimation accuracy are continuously improved as better
window matching, which has strong robustness and against the effects variants emerge. The supervised learning method has achieved high-
of noise in particle images. Therefore, using this prior velocity field as precision velocity field estimation results, and the variant based on
a reference velocity field can contribute to enhancing the robustness the optical flow model RAFT can achieve more outstanding predic-
of the model. The integration module then integrates these features tion results than the LiteFlowNet. Similarly, in the unsupervised PIV
appropriately and fed to a series of deconvolution layers to predict estimation framework, the effect of embedding the RAFT model It
dense (single-pixel) velocity fields. The trained model can reasonably is also better than embedding the LiteFlowNet. This proves that the
estimate the high-precision velocity field in the real flow fields. new milestone optical flow model RAFT has stronger feature rep-
Fig. 15 presents the architecture of CC-FCN and examples of velocity fields of the jet flow estimated by different methods. As shown in Fig. 15(b), all methods achieve good results for velocity field estimation. To further compare the velocity field details extracted by these three approaches, a section of a typical near-wall region (labeled with a black dashed box) with rich flow structures was enlarged for observation. Note that the colored maps in the figure show the magnitude of the velocity, and the two yellow dots and one white dot indicate the centers of the near-wall vortices and the saddle, respectively. The cross-correlation algorithm, as a mature fluid motion estimation method, can extract flow structures with relatively high reliability. In addition, the spatial information of the typical structures (i.e., two vortex centers and a saddle) captured by CC-FCN is closer to the cross-correlation benchmark than that of PIV-LiteFlowNet-en. Both CC-FCN and the cross-correlation method extract relatively high velocities around the vortices, while the former resolves more details of the high-speed regions. In contrast, the flow field extracted by PIV-LiteFlowNet-en shows smaller high-speed regions around the two vortices. More comparisons and descriptions can be found in Gao et al. (2021). Furthermore, due to the cross-correlation calculation and the multi-layer deconvolution embedded in the CC-FCN model, its velocity field estimation is relatively time-consuming.

3.3. Applicability analysis

We compare deep learning-based fluid motion estimation methods on the public PIV dataset in this subsection. In addition, we also give a modest analysis and discussion of the applicability and computing costs of these methods. We first summarize the test results of deep learning methods on the public PIV dataset. For a fair comparison, we uniformly use the averaged endpoint error (AEE) metric to evaluate the performance of the methods. The AEE is defined as the 𝐿2 distance between the estimated flow 𝐅𝑒 and the ground-truth flow 𝐅𝑔:

\mathrm{AEE} = \| \mathbf{F}_e - \mathbf{F}_g \|_2    (26)
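As a concrete reference, a minimal NumPy sketch of Eq. (26) is given below; the function name and the (H, W, 2) array layout are our own assumptions for illustration.

```python
import numpy as np

def aee(flow_est: np.ndarray, flow_gt: np.ndarray) -> float:
    """Averaged endpoint error (Eq. (26)) between two velocity fields.

    Both inputs are assumed to have shape (H, W, 2), storing the
    (u, v) displacement components at each pixel.
    """
    diff = flow_est - flow_gt                   # per-pixel vector error
    epe = np.sqrt(np.sum(diff ** 2, axis=-1))   # L2 norm at each pixel
    return float(np.mean(epe))                  # average over the field
```

Note that the values reported in Table 2 are further scaled to pixels per 100 pixels for easier comparison.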
Table 2 presents the estimation results of different methods on the public PIV dataset. For easy distinction, UnLiteFlowNet-PIV-2 denotes the newer unsupervised framework proposed by Zhang et al. (2022). It can be seen that the deep learning-based methods achieve competitive accuracy, and the RAFT-based models in particular show a strong representation ability in the fluid estimation task and can become the new basic architecture for PIV estimation. Furthermore, the RAFT has fewer parameters and a shorter inference time compared to the LiteFlowNet. For example, the LightPIVNet model proposed by Yu et al. (2021a) has a parameter size of only 3.725M, which further compresses the RAFT parameter structure. Although the evaluation results of unsupervised learning methods on the training dataset are slightly inferior to those of supervised learning methods, they integrate prior fluid-physics knowledge into the training loss. This enables the models to retain strong robustness and generalization ability in the face of unknown flow fields. Similarly, the CC-FCN model embedded with cross-correlation learning also has good generalization ability.

Overall, the current deep learning methods can achieve high estimation accuracy on the PIV dataset, and their calculation errors are already very low. This is due to the powerful feature learning and representation capabilities of deep learning techniques. Optical flow methods occupy half of the PIV estimation methods, which also proves the advantage of optical flow learning for fluid motion estimation. At present, the optical flow model RAFT has great advantages in inference time, parameter count and accuracy, but it is worth noting that its 4D cost volume consumes more memory during computation. With the advent of better optical flow variants, more applicable fluid motion estimators will emerge.

In addition, compared with the RAFT, the LiteFlowNet and PWC-Net models based on the multi-level pyramid structure can more easily embed the prior physical constraints of the fluid, which is important for applications in specific flow scenarios. Embedding more specific prior fluid knowledge into optical flow learning networks remains an interesting direction for future work.

4. Review of velocity field reconstruction

In this part, we first present a short review of related super-resolution (SR) deep learning methods. Then, fluid super-resolution reconstruction is described around three aspects: CNNs, generative adversarial nets (GANs) and physics-informed neural networks (PINNs). Finally, we also give an applicability analysis of the fluid SR reconstruction methods.
Fig. 15. Schematic diagram of the structure of the CC-FCN model (a) and comparison of the jet velocity field estimated by different methods (left column: cross-correlation method, middle: CC-FCN method and right: PIV-LiteFlowNet-en method).
Source: Reprinted from Gao et al. (2021), with the permission of AIP Publishing.

4.1. An overview of SR methods

As the name implies, the SR technique reconstructs the corresponding high-resolution (HR) data from low-resolution (LR) data. SR techniques have extensive applications in areas such as satellite imagery, medical imaging and monitoring equipment. The super-resolution convolutional neural network (SRCNN) is the pioneering work of deep learning in SR reconstruction (Dong et al., 2014). As shown in Fig. 16, the network structure of SRCNN is relatively simple and contains just three convolutional layers. It first adopts bicubic interpolation to enlarge the LR image to the target size, then fits the nonlinear mapping through the three-layer convolutional network, and eventually outputs the HR image. The three convolutions can be interpreted as three stages: image feature extraction, nonlinear feature mapping and final reconstruction.
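To make this three-stage pipeline concrete, the following is a minimal PyTorch sketch of an SRCNN-style network; the 9–5–5 kernel sizes follow one configuration explored by Dong et al. (2014), while the channel widths and scale factor are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SRCNNLike(nn.Module):
    """Three-stage SRCNN-style network: feature extraction,
    nonlinear mapping and final reconstruction."""

    def __init__(self, channels: int = 1):
        super().__init__()
        self.extract = nn.Conv2d(channels, 64, kernel_size=9, padding=4)
        self.map = nn.Conv2d(64, 32, kernel_size=5, padding=2)
        self.reconstruct = nn.Conv2d(32, channels, kernel_size=5, padding=2)

    def forward(self, lr: torch.Tensor, scale: int = 4) -> torch.Tensor:
        # Bicubic pre-upsampling to the target size, as in the original SRCNN.
        x = F.interpolate(lr, scale_factor=scale, mode="bicubic",
                          align_corners=False)
        x = F.relu(self.extract(x))      # stage 1: feature extraction
        x = F.relu(self.map(x))          # stage 2: nonlinear mapping
        return self.reconstruct(x)       # stage 3: reconstruction
```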
Since then, super-resolution methods based on deep learning have been studied and developed intensively. Anwar et al. (2020) evaluated single-image SR methods on benchmark datasets and classified them into nine categories: linear, residual, recursive, progressive, densely connected, multi-branch, attention-based, multiple-degradation and adversarial designs (see Fig. 17).
Table 2
The Averaged Endpoint Error (AEE) of various methods for the public PIV-Dataset (Cai et al., 2019b). Note that the error
unit is set to pixels per 100 pixels for easier comparison.
Methods Back-step Cylinder JHTDB-channel DNS-turbulence SQG
WIDIM (Scarano, 2001) 3.4 8.3 8.4 30.4 45.7
HS (Horn and Schunck, 1981) 4.5 6.9 6.9 13.5 15.6
PIV-DCNN (Lee et al., 2017) 4.9 7.8 11.7 33.4 47.9
PIV-NetS-noRef (Jiang et al., 2021) 13.9 23.7 23.7 52.5 52.5
PIV-NetS (Jiang et al., 2021) 7.2 15.5 15.5 28.2 29.4
PIV-LiteFlowNet (Li et al., 2008) 5.6 10.4 10.4 19.6 20.2
PIV-LiteFlowNet-en (Li et al., 2008) 3.3 7.5 7.5 12.2 12.6
PIV-RAFT (Yu et al., 2021a) 2.1 3.1 12.6 10.7 17.9
LightPIVNet (Yu et al., 2021a) 1.3 2.4 5.8 7.7 9.9
RAFT256-PIV (Lagemann et al., 2021b) 1.6 1.4 13.7 9.3 11.7
RAFT32-PIV (Lagemann et al., 2021b) 0.4 1.8 1.1 2.8 2.1
CC-FCN (Gao et al., 2021) 3.4 3.3 13.5 10.5 22.5
UnLiteFlowNet-PIV (Zhang and Piggott, 2020) 10.1 7.8 9.6 13.5 19.7
UnLiteFlowNet-PIV-2 (Zhang et al., 2022) 9.4 6.9 8.4 15.0 17.3
UnPwcNet-PIV (Zhang et al., 2022) 8.2 7.1 13.4 21.5 25.2
URAFT-PIV (Lagemann et al., 2021a) 6.5 6.6 8.1 12.5 13.2

Fig. 16. Schematic diagram of the SRCNN structure (Dong et al., 2014).

A linear network has a simple structure consisting of a single path or multiple paths without any skip connections. In such a design, several convolutional layers are stacked, and the features from the initial layer are fed to the subsequent layers in order. Linear networks differ in where they upsample, i.e., early upsampling or late upsampling. For example, DnCNN (Zhang et al., 2017) directly learns to predict the high-frequency residual rather than the super-resolved image itself. Similar to SRCNN, the architecture of DnCNN is very simple because it only stacks several convolutional layers; however, its performance depends heavily on the precision of the noise estimation, and DnCNN is relatively computationally expensive due to the batch normalization (BN) operation after each convolutional layer. Compared to linear networks, residual learning adopts skip connections to avoid vanishing gradients, which makes it feasible to design very deep networks (He et al., 2016). Lim et al. (2017) put forward the enhanced deep super-resolution network (EDSR), which introduced residual blocks to expand the depth of the model and removed the unnecessary BN layers of previous networks to stabilize the training process. Subsequently, Yu et al. (2018) improved the EDSR by removing a redundant convolution layer and splitting the residual body into two parts for feature fusion, further improving the SR performance. Furthermore, Tai et al. (2017) proposed a deep CNN architecture with 52 convolutional layers, called the deep recursive residual network (DRRN). The model adopts global and local residual learning to ease the training of very deep networks, and uses recursive learning to increase the depth without increasing the number of parameters. To extract richer feature information from images, the ideas of dense connections and multi-branch designs were introduced to SR networks such as SRDenseNet (Tong et al., 2017), RDN (Zhang et al., 2018c) and IDN (Hui et al., 2018b).
sary BN layers of the previous network to stabilize training process. multiple degradations may occur simultaneously. Many existing CNN-
Subsequently, Yu et al. (2018) improved the EDSR by removing the based SR approaches assume bicubic downsampling of LR images from
redundant convolution layer and splitting the residual body into two HR images. However, when the actual degradation does not satisfy
parts for feature fusion, thus further improving the SR performance. this assumption, it will inevitably lead to poor performance. To ad-
Furthermore, Tai et al. (2017) proposed a deep CNN architecture in dress these issues, Zhang et al. (2018b) present a general framework
which 52 convolutional layers are employed, and called deep recurrent with dimensional stretching strategies. This framework utilizes a single
residual network (DRRN). Specifically, the model adopts global and convolutional SR models to take as input two important factors of
local residual learning to solve the problem of training very deep the SR degradation process, namely the blur kernel and the noise
networks, and uses recursive learning to increase the depth without level. Therefore, this method can solve multiple and even spatially
increasing the model parameters. To extract richer feature information varying degradations, which greatly enhances the practicability. Simi-
of images, the design ideas of dense connection and multi-branch are larly, the super resolution multiple degenerate network (SRMD) (Zhang

13
C. Yu et al. Ocean Engineering 271 (2023) 113693

Fig. 17. Schematic diagram of the classification of existing single image SR approaches.

Fig. 18. Schematic diagram of the DRLN structure (a) Anwar and Barnes (2020) and Laplacian attention module (b).

Similarly, the super-resolution network for multiple degradations (SRMD) (Zhang et al., 2018d) employed a scheme that concatenates the LR image with its degradation maps as input.

In addition, the use of generative adversarial networks (GANs) for SR tasks is an interesting and promising research direction. The physics-informed neural network (PINN) (Raissi et al., 2019) is another recently proposed network that has shown great advantages in the field of flow field reconstruction. Next, we review velocity field reconstruction from four aspects: CNN-based methods, GAN-based methods, RNN-based methods and PINN-based methods.

4.2. CNN-based methods

Inspired by machine learning, Fukami et al. (2019) developed two deep learning models, a convolutional neural network and a hybrid downsampled skip-connection/multi-scale (DSC-MS) model, and applied both to two-dimensional cylinder wake flows. The results show a superior ability to reconstruct turbulent and laminar flows from LR flow field data, and this work pioneered the task of flow field reconstruction using deep learning methods.
Subsequently, Liu et al. (2020) also proposed two deep learning networks to address the SR reconstruction of turbulent flows from LR coarse flow field data. One is the static convolutional neural network (SCNN) based on the SRCNN (Dong et al., 2014). The other is a novel multiple temporal paths convolutional neural network (termed MTPC) that takes a time series of velocity fields as input and outputs an HR flow field. The MTPC simultaneously considers and integrates the spatial and temporal information of the flow field, which enhances the reconstruction of the fine-scale structures of turbulence. Fig. 19 shows the SR results of different methods for isotropic turbulence fields at low resolution 𝑟 = 4. The velocity field estimated by bicubic interpolation is smooth because only low-frequency information can be captured. In contrast, the SRCNN and MTPC achieve good reconstruction results that are closer to the direct numerical simulation (DNS) reference, and the latter estimates more details and reconstructs finer structures than the SRCNN. In other words, the MTPC recovers more high-frequency content from the low-resolution velocity field; the authors attribute this to the MTPC effectively integrating the extra spatio-temporal information of the flow fields.

Since high-resolution data are broadly pursued in physics and engineering, researchers have further applied deep learning SR techniques to various flow field reconstruction tasks. For instance, Kong et al. (2020) put forward a multipath SR convolutional neural network (MPSRC) for super-resolution temperature field reconstruction, which achieves better reconstruction results with lower error and higher peak signal-to-noise ratio (PSNR). Although this method made great progress, some challenges remain; for example, when the magnification factor is large, the super-resolution reconstruction performance degrades to a certain extent. Ferdian et al. (2020) proposed 4DFlowNet to produce noise-free SR 4D flow MRI data. Building on Fukami et al. (2019), Fukami et al. (2021) subsequently constructed a deep learning-based spatio-temporal network to reconstruct turbulence fields. Kong et al. (2021) used simple convolutional layers to build a multi-path model architecture to reconstruct low-resolution supersonic flow fields. When the sampling factor is low (𝑟 = 4), the model can reconstruct a clear background wave; but when the sampling factor is large (𝑟 = 8), the reconstructed shock structure is completely lost and the reconstruction is seriously distorted. Recently, Chen et al. (2022b) proposed a multi-branch fusion convolutional neural network (MBFCNN) to reconstruct the flow field in a supersonic combustor and achieved excellent reconstruction results. The MBFCNN can provide rich information on the evolution of the wave system structure under the self-ignition conditions of a hydrogen-fueled scramjet and greatly improves the detection precision. Deng et al. (2022) subsequently developed a dual-branch network based on a multi-head attention mechanism to reconstruct flow field schlieren images in a supersonic combustor, and the results demonstrate that the model can effectively reconstruct the basic wave system structure of a complicated flow field.

It can be seen that general CNN-based SR techniques improve accuracy by building temporal multi-branch network structures, because flow field data have multi-scale spatial and temporal characteristics (Pope and Pope, 2000). However, networks designed with this idea also increase model complexity and neglect computational efficiency. Inspired by single-image SR reconstruction methods in computer vision, Bi et al. (2022) put forward a multi-scale integration network (FlowSRNet) to reconstruct HR flow fields. Considering the multi-scale spatial characteristics of fluid flows, a lightweight multi-scale aggregation block (LMAB), inspired by Gao et al. (2019), is carefully designed with a parallel cascading architecture and a feature aggregation module. Furthermore, a corresponding SR dataset containing a variety of fluid flows is built to train and verify the proposed approach. The results demonstrate that FlowSRNet achieves outstanding SR performance for various flow fields, while remaining lightweight: its parameter count is only 0.432M when stacking two LMABs in the backbone architecture (see Fig. 20).
and the reconstruction result is seriously distorted. Recently, Chen et al. Because of the novelty of GAN, some researchers improved GAN
(2022b) proposed a multi-branch fusion convolutional neural network and applied it to flow fields super-resolution reconstruction. In the
(called MBFCNN) to reconstruct the flow field in a supersonic com- TempoGAN network proposed by Xie et al. (2018), two aspects are
bustor and achieved great reconstruction results. The MBFCNN model considered in the discriminator to discriminate space and time re-
can predict a rich information source for the evolution of the wave spectively. This method can generate more detailed real and time-
system structure under the self-ignition conditions of the hydrogen- consistent physical quantities of flow field. Xu et al. (2020) proposed
fueled scramjet and greatly improves the detection precision. Deng an algorithm for data-driven 3D super-resolution that increases spa-
et al. (2022) subsequently developed a dual-branch network based on a tial resolution by twofold along each spatial direction. The approach,
multi-head attention mechanism to reconstruct the flow field schlieren known as 3D-Superresolution Generative Adversarial Network (3D-SR-
image in a supersonic combustor, and results demonstrate that the GAN), constructs a generator and a discriminator network to study
model can effectively reconstruct the basic wave system structure of topographic information and at a given LR counterpart to infer high-
a complicated flow field. resolution 3D turbulent flame structure. Lee et al. (2018) proposed a
It can be seen that the general CNN-based SR techniques improve deep learning GAN network method for simulating small-scale features
the accuracy by building a temporal multi-branch network structure. of turbulent flow. This method is novel in processing 3D convolution
This is because flow field data has multi-scale spatial and temporal to achieve 3D structure prediction and predicting accurate solutions
characteristics (Pope and Pope, 2000). However, the network designed with less computational cost. Deng et al. (2019) develop SR recon-
by this idea also increases the complexity of the model and ignores the struction methods from LR flow fields using GAN-based deep learning
computational efficiency. Inspired by single-image SR reconstruction frameworks SRGAN (Ledig et al., 2017) and ESRGAN (Wang et al.,
methods in the field of computer vision, Bi et al. (2022) put forward 2018c). The analysis of reconstructed instantaneous flow fields and
a multi-scale integration network (named FlowSRNet) to reconstruct spatial correlation shows that both models can accurately reconstruct
the HR flow fields. Considering the multi-scale spatial characteristics high spatial resolution flow field in complicated flow structures.
of the fluid flows, a lightweight multi-scale aggregation block (LMAB) However, it is worth noting that training GAN is difficult and
is carefully designed inspired by Gao et al. (2019), which includes a unstable, and it is prone to discriminator convergence and generator
parallel cascading architecture and feature aggregation module. Fur- divergence, leading to model collapse. To deal with this dilemma, Wu
thermore, a corresponding SR dataset is build to train and verify et al. (2020) proposed a generative adversarial network embedded with
the proposed approach, which contains a variety of fluid flows. The statistical constraints. By enforcing covariance constraints on the train-
results demonstrate that the FlowSRNet model achieves outstanding SR ing data, the network can fit the statistics of training data produced by
performance for various flow fields. Meanwhile, the FlowSRNet has the solving the fully decomposed partial differential equations. The results
advantage of being lightweight, and its parameters are only 0.432M show that this statistical regularization results in better performance in
when stacking two LMABs in the backbone network architecture (see comparison to the standard GAN. Lin et al. (2019) proposed a novel
Fig. 20). GAN network to solve the problem that traditional GAN networks are

15
C. Yu et al. Ocean Engineering 271 (2023) 113693

Fig. 19. Schematic diagram of the MTPC structure (Liu et al., 2020) (a) and comparison of different SR methods for the forced isotropic turbulence at 𝑟=4 (b).
Source: Reprinted from Liu et al. (2020), with the permission from AIP Publishing.

It is well known that unsteady flows vary in both time and space. Hence, some works have developed recurrent neural networks (e.g., LSTM) to predict the spatio-temporal characteristics of specific flow data (Pawar et al., 2019; Huang et al., 2019). The recurrent neural network itself has the advantage of extracting temporal features, which has broad application prospects in the task of predicting the features of flow field data (Li et al., 2021, 2022).

4.4. PINN-based methods

Both the data-driven CNN-based and GAN-based methods improve SR performance by modifying the model architecture, and both rely on high-quality training data. However, the training data imply prior physical knowledge of the various fluid scenarios, and this knowledge is not explicitly represented by deep learning models. In recent years, the physics-informed neural network (PINN) (Raissi et al., 2019, 2020) has been proposed by combining data-driven neural networks with physical laws. Specifically, a neural network can be thought of as a general function approximator (Hornik et al., 1989), composed of a fully connected or residual neural network, while the physical laws are imposed by embedding partial differential equations (PDEs) into the loss of the neural network using automatic differentiation. Therefore, physical constraints can be easily embedded into PINNs to accomplish the task of flow field reconstruction.
Fig. 20. Schematic diagram of the FlowSRNet structure (Bi et al., 2022).

At present, PINNs have been successfully applied to reconstruct velocity and pressure distributions from flow visualizations, including the cylinder wake flow field and the biological flow field of intracranial aneurysm images (Raissi et al., 2020). Furthermore, the PINN can also be used to address the incompressible N–S equations, based on either the velocity–pressure or the velocity–vorticity formulation (Rao et al., 2020; Jin et al., 2021). Cai et al. (2021) put forward a new PINN-based technique to predict the full continuous 3D velocity and pressure fields from snapshots of 3D temperature fields obtained by tomographic background-oriented schlieren imaging; experiments show that the PINN method can efficiently infer velocities and pressures from the regression data without any information about initial or boundary conditions.

Wang et al. (2022) adopted a PINN to reconstruct dense velocity fields from sparse tomographic PIV data. The PINN not only increases the velocity resolution, but can also be used to predict the pressure field of the flow. Fig. 22 shows the proposed PINN structure and the reconstruction results. The inputs of the PINN are the space coordinates 𝐗 = (𝑥, 𝑦, 𝑧) and time 𝑡, and the outputs are the velocity 𝐔 = (𝑢, 𝑣, 𝑤) and pressure 𝑝. The physical laws are described by the incompressible N–S equations:

e_1 = u_t + u u_x + v u_y + w u_z + p_x - (1/Re)(u_{xx} + u_{yy} + u_{zz}),
e_2 = v_t + u v_x + v v_y + w v_z + p_y - (1/Re)(v_{xx} + v_{yy} + v_{zz}),
e_3 = w_t + u w_x + v w_y + w w_z + p_z - (1/Re)(w_{xx} + w_{yy} + w_{zz}),
e_4 = u_x + v_y + w_z.    (28)

where 𝑅𝑒 denotes the Reynolds number, and 𝑒1, 𝑒2, 𝑒3 and 𝑒4 are the residuals of the N–S equations. The partial derivatives 𝜕∕𝜕𝐗 and 𝜕∕𝜕𝑡 can be computed by automatic differentiation in open-source frameworks (e.g., TensorFlow or PyTorch). Finally, the loss function of the PINN is defined as follows:

L = L_{data} + \alpha L_{eqns}    (29)

where 𝛼 denotes a weighting coefficient, 𝐿data represents the loss between the measured data and the predicted data, and 𝐿eqns represents the total residual of the N–S equations. Although there is still a certain gap between the reconstruction quality of PINNs and that of traditional classical methods, the PINN has become a promising data assimilation technology, which has aroused the interest of researchers.
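A compact sketch of Eqs. (28)–(29) in PyTorch is given below, using automatic differentiation to form the N–S residuals; `model` is a placeholder network mapping (x, y, z, t) to (u, v, w, p), and the sampling of data and collocation points is omitted.

```python
import torch

def ns_residual_loss(model, coords, Re):
    """Residuals e1-e4 of Eq. (28) evaluated at collocation points.

    `coords` has shape (N, 4) with columns (x, y, z, t); `model`
    returns a tensor of shape (N, 4) with columns (u, v, w, p).
    """
    coords = coords.clone().requires_grad_(True)
    u, v, w, p = model(coords).unbind(dim=-1)

    def grads(f):
        # First derivatives of a scalar field w.r.t. (x, y, z, t).
        return torch.autograd.grad(f, coords, torch.ones_like(f),
                                   create_graph=True)[0]

    gu, gv, gw, gp = grads(u), grads(v), grads(w), grads(p)

    def laplacian(g):
        # f_xx + f_yy + f_zz, differentiating the first derivatives again.
        return (grads(g[:, 0])[:, 0] + grads(g[:, 1])[:, 1]
                + grads(g[:, 2])[:, 2])

    ux, uy, uz, ut = gu.unbind(dim=-1)
    vx, vy, vz, vt = gv.unbind(dim=-1)
    wx, wy, wz, wt = gw.unbind(dim=-1)

    e1 = ut + u * ux + v * uy + w * uz + gp[:, 0] - laplacian(gu) / Re
    e2 = vt + u * vx + v * vy + w * vz + gp[:, 1] - laplacian(gv) / Re
    e3 = wt + u * wx + v * wy + w * wz + gp[:, 2] - laplacian(gw) / Re
    e4 = ux + vy + wz
    return (e1 ** 2 + e2 ** 2 + e3 ** 2 + e4 ** 2).mean()

# Total PINN loss of Eq. (29): supervised data term plus weighted residuals,
# e.g. loss = data_loss + alpha * ns_residual_loss(model, points, Re).
```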

Fig. 21. Schematic diagram of the GAN structure.

4.5. Applicability analysis

We also give an applicability analysis of deep learning-based methods for fluid flow field reconstruction. Deep learning techniques such as CNNs and GANs have many meaningful applications in flow field reconstruction tasks, showing great advantages. The CNN is relatively easy to build and improve and has a strong feature representation ability for flow fields; considering the temporal features of the flow field, building a multi-branch network is a mainstream strategy, but it also leads to a relative increase in computing time. The GAN can also effectively reconstruct texture details and feature information of the flow field, but an obvious shortcoming is that it is difficult to train and converge. Similarly, the PINN has a large training cost, and the process of solving partial derivatives multiple times also increases the calculation time. Overall, researchers need to consider the characteristics of the specific flow field data to select and design the deep learning model architecture. In addition, the introduction of prior physical constraints is an important trend in the integration and development of deep learning technology and fluid mechanics.

5. Conclusion

A comprehensive survey of deep learning-based methods for fluid velocity field estimation is given in this paper. Specifically, we divide methods for fluid velocity field estimation into two categories: fluid motion estimation and velocity field super-resolution reconstruction methods. First, the principle of deep learning is introduced and described in detail. Second, we mainly investigate deep learning methods for velocity field estimation from successive pairs of particle images. Finally, we describe the deep learning approaches for fluid velocity field reconstruction. It is worth noting that we also report experimental results of the representative methods for a better understanding of the motivation of the proposed methods. For the development of fluid velocity field estimation methods, we give the following summary and prospect.

Fluid motion estimation. Deep learning methods have achieved remarkable success in the field of fluid motion estimation because they are good at dealing with fluid data with complex nonlinearity, high dimensionality, and large quantities.
Fig. 22. Schematic diagram of the PINN structure.
Source: Reprinted from Wang et al. (2022), with the permission of AIP Publishing.

Compared with traditional algorithms, deep learning can well address problems such as low accuracy, low spatial resolution and low computational efficiency. Yet, the following challenges still remain for the development of deep learning methods for fluid motion estimation: (1) The robustness and generalization ability of the models still need to be further improved for practical applications. Some challenges inevitably exist in real images, such as severe noise and large displacements; how to effectively solve these problems is a promising direction. (2) Learning more characteristics of the fluid velocity field, i.e., temporal and high-dimensional properties. Most current works are devoted to estimating the velocity field from particle image pairs while ignoring the time-resolved properties of the velocity field. It is well known that fluids have dynamic temporal properties that are non-local in both time and space; obtaining the time-resolved characteristics of the fluid velocity field is therefore an interesting direction. In addition, it is an inevitable trend to move from estimating two-dimensional (2D) fluid motion to three-dimensional (3D) fluid motion. In computer vision, 3D motion estimation models have begun to appear and develop, such as scene flow estimation using point cloud technology. Similar to the 2D optical flow models, 3D scene flow models could be well applied to fluid estimation, but a difficulty here is how to construct a corresponding 3D dataset; we believe this problem will be solved well in the future. (3) Improving the interpretability of neural network structures. Diverse fluids have complex physical properties, and designing a more physically interpretable network architecture and embedding more prior physical knowledge can effectively enhance model performance. This is a trend in the future integration of deep learning technology and fluid mechanics. (4) Developing small-sample neural network architectures. Deep learning training requires a large amount of data, while computational fluid dynamics is limited by the high cost of data acquisition. Therefore, developing small-sample neural network motion modeling methods is an important research direction that can be generalized in engineering.

Velocity field SR reconstruction. Deep learning-based image SR techniques have achieved good performance; compared with traditional interpolation methods, these networks show better performance for flow field reconstruction, and the deep learning SR methods cover a wide range of application tasks and scenarios for fluid reconstruction. Similar to the above, the future development of deep learning in this direction also has several directions. (1) It is a trend to reconstruct high-dimensional (i.e., 3D or higher) velocity field components; compared to building a PIV dataset, building a high-dimensional SR dataset seems moderately difficult. (2) The characteristics of different flow fields are different, so it is of significance to choose a suitable neural network model for a specific flow field reconstruction problem. (3) The establishment of standard flow field datasets is of great significance to the development of flow field reconstruction, and will contribute to the testing and improvement of models. (4) Super-resolution is a pixel-level task that is time-consuming on large volumes of fluid data, so building lightweight model architectures has practical significance in engineering applications.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

Data will be made available on request.

Acknowledgment

This work is supported by the National Natural Science Foundation of China under Grant 62236011.

References

Adrian, R.J., 2007. Hairpin vortex organization in wall turbulence. Phys. Fluids 19 (4), 041301.
Adrian, R.J., Westerweel, J., 2011. Particle Image Velocimetry, No. 30. Cambridge University Press.
Anwar, S., Barnes, N., 2020. Densely residual laplacian super-resolution. IEEE Trans. Pattern Anal. Mach. Intell.
Anwar, S., Khan, S., Barnes, N., 2020. A deep journey into super-resolution: A survey. ACM Comput. Surv. 53 (3), 1–34.
Astarita, T., Cardone, G., 2005. Analysis of interpolation schemes for image deformation methods in PIV. Exp. Fluids 38 (2), 233–243.
Bi, X., Liu, A., Fan, Y., Yu, C., Zhang, Z., 2022. FlowSRNet: A multi-scale integration network for super-resolution reconstruction of fluid flows. Phys. Fluids 34 (12), 127104.
Brox, T., Bruhn, A., Papenberg, N., Weickert, J., 2004. High accuracy optical flow estimation based on a theory for warping. In: European Conference on Computer Vision. Springer, pp. 25–36.
Cai, S., Liang, J., Gao, Q., Xu, C., Wei, R., 2019a. Particle image velocimetry based on a deep learning motion estimator. IEEE Trans. Instrum. Meas. 69 (6), 3538–3554.
Cai, S., Wang, Z., Fuest, F., Jeon, Y.J., Gray, C., Karniadakis, G.E., 2021. Flow over an espresso cup: inferring 3-D velocity and pressure fields from tomographic background oriented Schlieren via physics-informed neural networks. J. Fluid Mech. 915.
Cai, S., Zhou, S., Xu, C., Gao, Q., 2019b. Dense motion estimation of particle images via a convolutional neural network. Exp. Fluids 60 (4), 1–16.
Carlier, J., 2005. Second set of fluid mechanics image sequences. European Project Fluid Image Analysis and Description (FLUID)-http://www.fluid.irisa.fr.
Cenedese, A., Romano, G., Paglialunga, A., Terlizzi, M., 1992. Neural Net for Trajectories Recognition in a Flow. Technical Report, Universita degli Studi La Sapienza, Rome, Italy.
Che, Z., Purushotham, S., Cho, K., Sontag, D., Liu, Y., 2018. Recurrent neural networks for multivariate time series with missing values. Sci. Rep. 8 (1), 1–12.
Chen, Z., Bi, X., Zhang, Y., Yue, J., Wang, H., 2022a. LightweightDeRain: learning a lightweight multi-scale high-order feedback network for single image de-raining. Neural Comput. Appl. 34 (7), 5431–5448.
Chen, H., Guo, M., Tian, Y., Le, J., Zhang, H., Zhong, F., 2022b. Intelligent reconstruction of the flow field in a supersonic combustor based on deep learning. Phys. Fluids 34 (3), 035128.
Cho, K., Van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., Bengio, Y., 2014. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078.
Cholemari, M.R., 2007. Modeling and correction of peak-locking in digital PIV. Exp. Fluids 42 (6), 913–922.
Corpetti, T., Heitz, D., Arroyo, G., Mémin, E., Santa-Cruz, A., 2006. Fluid experimental flow estimation based on an optical-flow scheme. Exp. Fluids 40 (1), 80–97.
Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L., 2009. Imagenet: A large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, pp. 248–255.
Deng, X., Guo, M., Chen, H., Tian, Y., Le, J., Zhang, H., 2022. Dual-path flow field reconstruction for a scramjet combustor based on deep learning. Phys. Fluids 34 (9), 095118.
Deng, Z., He, C., Liu, Y., Kim, K.C., 2019. Super-resolution reconstruction of turbulent velocity fields using a generative adversarial network-based artificial intelligence framework. Phys. Fluids 31 (12), 125111.
Dong, C., Loy, C.C., He, K., Tang, X., 2014. Learning a deep convolutional network for image super-resolution. In: European Conference on Computer Vision. Springer, pp. 184–199.
Dosovitskiy, A., Fischer, P., Ilg, E., Hausser, P., Hazirbas, C., Golkov, V., Van Der Smagt, P., Cremers, D., Brox, T., 2015. Flownet: Learning optical flow with convolutional networks. In: Proceedings of the IEEE International Conference on Computer Vision. pp. 2758–2766.
Du, X., Cai, Y., Wang, S., Zhang, L., 2016. Overview of deep learning. In: 2016 31st Youth Academic Annual Conference of Chinese Association of Automation. YAC, IEEE, pp. 159–164.
Dumoulin, V., Visin, F., 2016. A guide to convolution arithmetic for deep learning. arXiv preprint arXiv:1603.07285.
Ferdian, E., Suinesiaputra, A., Dubowitz, D.J., Zhao, D., Wang, A., Cowan, B., Young, A.A., 2020. 4DFlowNet: super-resolution 4D flow MRI using deep learning and computational fluid dynamics. Front. Phys. 138.
Fleit, G., Baranya, S., 2019. An improved particle image velocimetry method for efficient flow analyses. Flow Meas. Instrum. 69, 101619.
Fukami, K., Fukagata, K., Taira, K., 2019. Super-resolution reconstruction of turbulent flows with machine learning. J. Fluid Mech. 870, 106–120.
Fukami, K., Fukagata, K., Taira, K., 2021. Machine-learning-based spatio-temporal super resolution reconstruction of turbulent flows. J. Fluid Mech. 909.
Gao, S.H., Cheng, M.M., Zhao, K., Zhang, X.Y., Yang, M.H., Torr, P., 2019. Res2net: A new multi-scale backbone architecture. IEEE Trans. Pattern Anal. Mach. Intell. 43 (2), 652–662.
Gao, Q., Lin, H., Tu, H., Zhu, H., Wei, R., Zhang, G., Shao, X., 2021. A robust single-pixel particle image velocimetry based on fully convolutional networks with cross-correlation embedded. Phys. Fluids 33 (12), 127125.
Glorot, X., Bordes, A., Bengio, Y., 2011. Deep sparse rectifier neural networks. In: Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics. In: JMLR Workshop and Conference Proceedings, pp. 315–323.
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y., 2014. Generative adversarial nets. Adv. Neural Inf. Process. Syst. 27.
Grant, I., Pan, X., 1997. The use of neural techniques in PIV and PTV. Meas. Sci. Technol. 8 (12), 1399.
Guo, C., Fan, Y., Yu, C., Han, Y., Bi, X., 2022. Time-resolved particle image velocimetry algorithm based on deep learning. IEEE Trans. Instrum. Meas. 71, 1–13.
Guo, X., Li, W., Iorio, F., 2016. Convolutional neural networks for steady flow approximation. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. pp. 481–490.
Hassan, Y., Philip, O., 1997. A new artificial neural network tracking technique for particle image velocimetry. Exp. Fluids 23 (2), 145–154.
He, K., Zhang, X., Ren, S., Sun, J., 2016. Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 770–778.
Heitz, D., Mémin, E., Schnörr, C., 2010. Variational fluid flow measurements from image sequences: synopsis and perspectives. Exp. Fluids 48 (3), 369–393.
Hochreiter, S., Schmidhuber, J., 1997. Long short-term memory. Neural Comput. 9 (8), 1735–1780.
Horn, B.K., Schunck, B.G., 1981. Determining optical flow. Artificial Intelligence 17 (1–3), 185–203.
Hornik, K., Stinchcombe, M., White, H., 1989. Multilayer feedforward networks are universal approximators. Neural Netw. 2 (5), 359–366.
Hua, M., Bie, X., Zhang, M., Wang, W., 2014. Edge-aware gradient domain optimization framework for image filtering by local propagation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 2838–2845.
Huang, J., Liu, H., Cai, W., 2019. Online in situ prediction of 3-D flame evolution from its history 2-D projections via deep learning. J. Fluid Mech. 875.
Hui, T.W., Tang, X., Loy, C.C., 2018a. Liteflownet: A lightweight convolutional neural network for optical flow estimation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 8981–8989.
Hui, Z., Wang, X., Gao, X., 2018b. Fast and accurate single image super-resolution via information distillation network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 723–731.
Hur, J., Roth, S., 2019. Iterative residual refinement for joint optical flow and occlusion estimation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 5754–5763.
Ilg, E., Mayer, N., Saikia, T., Keuper, M., Dosovitskiy, A., Brox, T., 2017. Flownet 2.0: Evolution of optical flow estimation with deep networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 2462–2470.
Jiang, S., Campbell, D., Lu, Y., Li, H., Hartley, R., 2021. Learning to estimate hidden motions with global motion aggregation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 9772–9781.
Jin, X., Cai, S., Li, H., Karniadakis, G.E., 2021. NSFnets (Navier-Stokes flow nets): Physics-informed neural networks for the incompressible Navier-Stokes equations. J. Comput. Phys. 426, 109951.
Kapulla, R., Hoang, P., Szijarto, R., Fokken, J., 2011. Parameter sensitivity of optical flow applied to PIV images. In: Proceedings of the Fachtagung "Lasermethoden in der Strömungsmesstechnik". Ilmenau, Germany, pp. 6–8.
Khalid, M., Pénard, L., Mémin, E., 2019. Optical flow for image-based river velocity estimation. Flow Meas. Instrum. 65, 110–121.
Kong, C., Chang, J.T., Li, Y.F., Chen, R.Y., 2020. Deep learning methods for super-resolution reconstruction of temperature fields in a supersonic combustor. AIP Adv. 10 (11), 115021.
Kong, C., Chang, J., Wang, Z., Li, Y., Bao, W., 2021. Data-driven super-resolution reconstruction of supersonic flow field by convolutional neural networks. AIP Adv. 11 (6), 065321.
Koutnik, J., Greff, K., Gomez, F., Schmidhuber, J., 2014. A clockwork rnn. In: International Conference on Machine Learning. PMLR, pp. 1863–1871.
Krizhevsky, A., Sutskever, I., Hinton, G.E., 2012. Imagenet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 25.
Kutz, J.N., 2017. Deep learning in fluid dynamics. J. Fluid Mech. 814, 1–4.
Lagemann, C., Klaas, M., Schröder, W., 2021a. Unsupervised recurrent all-pairs field transforms for particle image velocimetry. In: 14th International Symposium on Particle Image Velocimetry, Vol. 1, No. 1.
Lagemann, C., Lagemann, K., Mukherjee, S., Schröder, W., 2021b. Deep recurrent optical flow learning for particle image velocimetry data. Nat. Mach. Intell. 3 (7), 641–651.
Lai, W.S., Huang, J.B., Yang, M.H., 2017. Semi-supervised learning for optical flow with generative adversarial networks. Adv. Neural Inf. Process. Syst. 30.
LeCun, Y., Bengio, Y., Hinton, G., 2015. Deep learning. Nature 521 (7553), 436–444.
LeCun, Y., Bottou, L., Bengio, Y., Haffner, P., 1998. Gradient-based learning applied to document recognition. Proc. IEEE 86 (11), 2278–2324.
Ledig, C., Theis, L., Huszár, F., Caballero, J., Cunningham, A., Acosta, A., Aitken, A., Tejani, A., Totz, J., Wang, Z., et al., 2017. Photo-realistic single image super-resolution using a generative adversarial network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 4681–4690.
Lee, J., Lee, S., You, D., 2018. Deep learning approach in multi-scale prediction of turbulent mixing-layer. arXiv preprint arXiv:1809.07021.
Lee, W., Seong, J.J., Ozlu, B., Shim, B.S., Marakhimov, A., Lee, S., 2021. Biosignal sensors and deep learning-based speech recognition: A review. Sensors 21 (4), 1399.
Lee, Y., Yang, H., Yin, Z., 2017. PIV-DCNN: cascaded deep convolutional neural networks for particle image velocimetry. Exp. Fluids 58 (12), 1–10.
Li, Y., Chang, J., Wang, Z., Kong, C., 2021. An efficient deep learning framework to reconstruct the flow field sequences of the supersonic cascade channel. Phys. Fluids 33 (5), 056106.
Li, Y., Hao, Z., Lei, H., 2016. Survey of convolutional neural network. J. Comput. Appl. 36 (9), 2508.
Li, Y., Perlman, E., Wan, M., Yang, Y., Meneveau, C., Burns, R., Chen, S., Szalay, A., Eyink, G., 2008. A public turbulence database cluster and applications to study Lagrangian evolution of velocity increments in turbulence. J. Turbul. (9), N31.
Li, Y., Wang, Z., Jiang, W., Xie, Z., Kong, C., Chang, J., 2022. Research on time sequence prediction of the flow field structure of supersonic cascade channels in wide range based on artificial neural network. Phys. Fluids 34 (1), 016106.
Lim, B., Son, S., Kim, H., Nah, S., Mu Lee, K., 2017. Enhanced deep residual networks for single image super-resolution. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. pp. 136–144.
Lin, J., Lensink, K., Haber, E., 2019. Fluid flow mass transport for generative networks. arXiv preprint arXiv:1910.01694.
Ling, J., Kurzawski, A., Templeton, J., 2016. Reynolds averaged turbulence modelling using deep neural networks with embedded invariance. J. Fluid Mech. 807, 155–166.
Liu, P., Lyu, M., King, I., Xu, J., 2019. Selflow: Self-supervised learning of optical flow. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 4571–4580.
Liu, T., Merat, A., Makhmalbaf, M., Fajardo, C., Merati, P., 2015. Comparison between optical flow and cross-correlation methods for extraction of velocity fields from particle images. Exp. Fluids 56 (8), 1–23.
Liu, B., Tang, J., Huang, H., Lu, X.Y., 2020. Deep learning methods for super-resolution reconstruction of turbulent flows. Phys. Fluids 32 (2), 025105.
Meister, S., Hur, J., Roth, S., 2018. Unflow: Unsupervised learning of optical flow with a bidirectional census loss. In: Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 32, No. 1.
Mirza, A.H., Cosan, S., 2018. Computer network intrusion detection using sequential LSTM neural networks autoencoders. In: 2018 26th Signal Processing and Communications Applications Conference. SIU, IEEE, pp. 1–4.
Mohamed, A.r., Dahl, G., Hinton, G., et al., 2009. Deep belief networks for phone recognition. In: Nips Workshop on Deep Learning for Speech Recognition and Related Applications, Vol. 1, No. 9. p. 39.
Nguyen, T.D., Wells, J.C., Nguyen, C.V., 2012. Velocity measurement of near-wall flow over inclined and curved boundaries by extended interfacial particle image velocimetry. Flow Meas. Instrum. 23 (1), 33–39.
Otter, D.W., Medina, J.R., Kalita, J.K., 2020. A survey of the usages of deep learning for natural language processing. IEEE Trans. Neural Netw. Learn. Syst. 32 (2), 604–624.
Pascanu, R., Gulcehre, C., Cho, K., Bengio, Y., 2013. How to construct deep recurrent neural networks. arXiv preprint arXiv:1312.6026.
Pawar, S., Rahman, S., Vaddireddy, H., San, O., Rasheed, A., Vedula, P., 2019. A deep learning enabler for nonintrusive reduced order modeling of fluid flows. Phys. Fluids 31 (8), 085101.
Pope, S.B., Pope, S.B., 2000. Turbulent Flows. Cambridge University Press.
Rabault, J., Kolaas, J., Jensen, A., 2017. Performing particle image velocimetry using artificial neural networks: a proof-of-concept. Meas. Sci. Technol. 28 (12), 125301.
Raffel, M., Willert, C.E., Wereley, S.T., Kompenhans, J., 2007. Mathematical Background of Statistical PIV Evaluation.
Raissi, M., Perdikaris, P., Karniadakis, G.E., 2019. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. J. Comput. Phys. 378, 686–707.
Raissi, M., Yazdani, A., Karniadakis, G.E., 2020. Hidden fluid mechanics: Learning velocity and pressure fields from flow visualizations. Science 367 (6481), 1026–1030.
Ranjan, A., Black, M.J., 2017. Optical flow estimation using a spatial pyramid network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 4161–4170.
Rao, C., Sun, H., Liu, Y., 2020. Physics-informed deep learning for incompressible laminar flows. Theor. Appl. Mech. Lett. 10 (3), 207–212.
Resseguier, V., Mémin, E., Chapron, B., 2017. Geophysical flows under location uncertainty, part II quasi-geostrophy and efficient ensemble spreading. Geophys. Astrophys. Fluid Dyn. 111 (3), 177–208.
Scarano, F., 2001. Iterative image deformation methods in PIV. Meas. Sci. Technol. 13 (1), R1.
Simonyan, K., Zisserman, A., 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.
Stanislas, M., Okamoto, K., Kähler, C.J., Westerweel, J., 2005. Main results of the second international PIV challenge. Exp. Fluids 39 (2), 170–191.
Stanislas, M., Okamoto, K., Kähler, C.J., Westerweel, J., Scarano, F., 2008. Main results of the third international PIV challenge. Exp. Fluids 45 (1), 27–71.
Strubell, E., Ganesh, A., McCallum, A., 2019. Energy and policy considerations for deep learning in NLP. arXiv preprint arXiv:1906.02243.
Sun, D., Yang, X., Liu, M.Y., Kautz, J., 2018. Pwc-net: Cnns for optical flow using pyramid, warping, and cost volume. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 8934–8943.
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A., 2015. Going deeper with convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 1–9.
Tai, Y., Yang, J., Liu, X., 2017. Image super-resolution via deep recursive residual network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 3147–3155.
Teed, Z., Deng, J., 2020. Raft: Recurrent all-pairs field transforms for optical flow. In: European Conference on Computer Vision. Springer, pp. 402–419.
Teng, C.H., Lai, S.H., Chen, Y.S., Hsu, W.H., 2005. Accurate optical flow computation under non-uniform brightness variations. Comput. Vis. Image Underst. 97 (3), 315–346.
Teo, C., Lim, K., Hong, G., Yeo, M., 1991. A neural net approach in analyzing photograph in PIV. In: Conference Proceedings 1991 IEEE International Conference on Systems, Man, and Cybernetics. IEEE, pp. 1535–1538.
Tong, T., Li, G., Liu, X., Gao, Q., 2017. Image super-resolution using dense skip connections. In: Proceedings of the IEEE International Conference on Computer Vision. pp. 4799–4807.
Wang, H., Bi, X., 2021. Person re-identification based on graph relation learning. Neural Process. Lett. 53 (2), 1401–1415.
Wang, H.P., Gao, Q., Wang, S.Z., Li, Y.H., Wang, Z.Y., Wang, J.J., 2018a. Error reduction for time-resolved PIV data based on Navier–Stokes equations. Exp. Fluids 59 (10), 1–21.
Wang, H., Liu, Y., Wang, S., 2022. Dense velocity reconstruction from particle image velocimetry/particle tracking velocimetry using a physics-informed neural network. Phys. Fluids 34 (1), 017116.
Wang, Y., Yang, Y., Yang, Z., Zhao, L., Wang, P., Xu, W., 2018b. Occlusion aware unsupervised learning of optical flow. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 4884–4893.
Wang, X., Yu, K., Wu, S., Gu, J., Liu, Y., Dong, C., Qiao, Y., Change Loy, C., 2018c. Esrgan: Enhanced super-resolution generative adversarial networks. In: Proceedings of the European Conference on Computer Vision (ECCV) Workshops.
Wereley, S.T., Gui, L., Meinhart, C., 2002. Advanced algorithms for microscale particle image velocimetry. AIAA J. 40 (6), 1047–1055.
Westerweel, J., Scarano, F., 2005. Universal outlier detection for PIV data. Exp. Fluids 39 (6), 1096–1100.
Wu, J.L., Kashinath, K., Albert, A., Chirila, D., Xiao, H., et al., 2020. Enforcing statistical constraints in generative adversarial networks for modeling chaotic dynamical systems. J. Comput. Phys. 406, 109209.
Xie, Y., Franz, E., Chu, M., Thuerey, N., 2018. tempogan: A temporally coherent, volumetric gan for super-resolution fluid flow. ACM Trans. Graph. 37 (4), 1–15.
Xu, W., Luo, W., Wang, Y., You, Y., 2020. Data-driven three-dimensional super-resolution imaging of a turbulent jet flame using a generative adversarial network. Appl. Opt. 59 (19), 5729–5736.
Yang, G., Ramanan, D., 2019. Volumetric correspondence networks for optical flow. Adv. Neural Inf. Process. Syst. 32.
Yu, C., Bi, X., Fan, Y., Han, Y., Kuai, Y., 2021a. LightPIVNet: An effective convolutional neural network for particle image velocimetry. IEEE Trans. Instrum. Meas. 70, 1–15.
Yu, C.D., Fan, Y.-W., Bi, X.J., Han, Y., Kuai, Y.F., 2021b. Deep particle image velocimetry supervised learning under light conditions. Flow Meas. Instrum. 80, 102000.
Yu, J., Fan, Y., Yang, J., Xu, N., Wang, Z., Wang, X., Huang, T., 2018. Wide activation for efficient and accurate image super-resolution. arXiv preprint arXiv:1808.08718.
Yu, J.J., Harley, A.W., Derpanis, K.G., 2016. Back to basics: Unsupervised learning of optical flow via brightness constancy and motion smoothness. In: European Conference on Computer Vision. Springer, pp. 3–10.
Yu, C., Luo, H., Fan, Y., Bi, X., He, M., 2021c. A cascaded convolutional neural network for two-phase flow PIV of an object entering water. IEEE Trans. Instrum. Meas. 71, 1–10.
Zeiler, M.D., Fergus, R., 2014. Visualizing and understanding convolutional networks. In: European Conference on Computer Vision. Springer, pp. 818–833.
Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R., 2010. Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. IEEE, pp. 2528–2535.
Zhai, M., Xiang, X., Lv, N., Kong, X., 2021. Optical flow and scene flow estimation: A survey. Pattern Recognit. 114, 107861.
Zhang, Z., Geiger, J., Pohjalainen, J., Mousa, A.E.D., Jin, W., Schuller, B., 2018a. Deep learning for environmentally robust speech recognition: An overview of recent developments. ACM Trans. Intell. Syst. Technol. 9 (5), 1–28.
Zhang, C., Li, Z., Cai, R., Chao, H., Rui, Y., 2014. As-rigid-as-possible stereo under second order smoothness priors. In: European Conference on Computer Vision. Springer, pp. 112–126.
Zhang, Y., Li, K., Li, K., Wang, L., Zhong, B., Fu, Y., 2018b. Image super-resolution using very deep residual channel attention networks. In: Proceedings of the European Conference on Computer Vision. ECCV, pp. 286–301.
Zhang, M., Piggott, M.D., 2020. Unsupervised learning of particle image velocimetry. In: International Conference on High Performance Computing. Springer, pp. 102–115.
Zhang, Y., Tian, Y., Kong, Y., Zhong, B., Fu, Y., 2018c. Residual dense network for image super-resolution. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 2472–2481.
Zhang, M., Wang, J., Tlhomole, J., Piggott, M.D., 2022. Learning to estimate and refine fluid motion with physical dynamics. arXiv preprint arXiv:2206.10480.
Zhang, F., Woodford, O.J., Prisacariu, V.A., Torr, P.H., 2021. Separable flow: Learning motion cost volumes for optical flow estimation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 10807–10817.
Zhang, K., Zuo, W., Chen, Y., Meng, D., Zhang, L., 2017. Beyond a gaussian denoiser: Residual learning of deep cnn for image denoising. IEEE Trans. Image Process. 26 (7), 3142–3155.
Zhang, K., Zuo, W., Zhang, L., 2018d. Learning a single convolutional super-resolution network for multiple degradations. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 3262–3271.
Zhao, R., Wang, D., Yan, R., Mao, K., Shen, F., Wang, J., 2017. Machine health monitoring using local feature-based gated recurrent unit networks. IEEE Trans. Ind. Electron. 65 (2), 1539–1548.
Zhong, Q., Yang, H., Yin, Z., 2017. An optical flow algorithm based on gradient constancy assumption for PIV image processing. Meas. Sci. Technol. 28 (5), 055208.