
Mock Paper 8NN

The document consists of a series of questions and answers related to neural networks, specifically focusing on convolutional neural networks (CNNs) and multi-layer perceptrons (MLPs). It covers various topics including advantages of CNNs, functions of convolutional filters, pooling techniques, activation functions, and training methods. The content is structured in a quiz format, providing insights into key concepts and terminology in deep learning.


1. What is the primary advantage of CNNs over MLPs?
a) Larger dataset support
b) Fewer parameters for image processing
c) Faster training time for all tasks
d) Avoiding the use of activation functions

2. What does a convolutional filter do in CNNs?
a) Scales pixel values
b) Slides across the image
c) Enhances image resolution
d) Removes noise from the image

3. What does max pooling in CNNs achieve?
a) Reduces computational complexity
b) Upscales image resolution
c) Multiplies features
d) Removes irrelevant layers

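To make the sliding-filter behaviour in Q2 concrete, here is a minimal NumPy sketch of a valid (no-padding), stride-1 2D convolution; the image and kernel values are made up for illustration.

```python
import numpy as np

def conv2d(image, kernel):
    """Slide a kernel across an image (valid convolution, stride 1)."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Element-wise product of the kernel with the current window
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)  # toy 5x5 "image"
kernel = np.array([[1.0, 0.0, -1.0]] * 3)         # simple vertical-edge filter
print(conv2d(image, kernel).shape)                # (3, 3) feature map
```
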
4. What is the typical stride used for max pooling in CNNs?
a) 1
b) 2
c) 4
d) Variable

5. Which of the following is NOT a feature of convolutional operations?
a) Edge detection
b) Capturing spatial relationships
c) Generating a flattened 1D vector
d) Producing feature maps

6. What is the purpose of activation functions in a CNN?
a) To enhance pooling layers
b) To introduce non-linearity
c) To remove redundant weights
d) To increase data dimensionality

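The stride-2 max pooling from Q4, and its effect on feature-map size (see also Q9), in a small NumPy sketch; the input values are arbitrary.

```python
import numpy as np

def max_pool2d(x, size=2, stride=2):
    """2x2 max pooling with stride 2 halves each spatial dimension."""
    oh = (x.shape[0] - size) // stride + 1
    ow = (x.shape[1] - size) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = x[i * stride:i * stride + size,
                          j * stride:j * stride + size].max()
    return out

fmap = np.random.rand(28, 28)   # e.g. an MNIST-sized feature map
print(max_pool2d(fmap).shape)   # (14, 14): the map is reduced, not removed
```
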
7. In MNIST classification, what does a CNN operate on?
a) 1D vectors of pixel values
b) 2D images
c) Augmented datasets only
d) Images converted to grayscale

8. What happens when the learning rate in gradient descent is too high?
a) The model trains very slowly
b) Gradient descent may overshoot and diverge
c) The loss stagnates
d) The model performs optimally

9. How does pooling affect the feature map size?
a) Reduces it
b) Doubles it
c) Leaves it unchanged
d) Depends on filter size

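The overshoot described in Q8 is easy to reproduce on a one-dimensional quadratic loss; this is a toy sketch, not any particular model.

```python
def gradient_descent(lr, steps=20, x=5.0):
    """Minimize f(x) = x^2, whose gradient is f'(x) = 2x."""
    for _ in range(steps):
        x = x - lr * 2 * x   # one gradient descent update per step
    return x

print(gradient_descent(lr=0.1))  # ~0.058: converging toward the minimum at 0
print(gradient_descent(lr=1.1))  # ~192: magnitude grows every step (divergence)
```
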
10. What is a common filter size used in CNNs for image convolution?
a) 2x2
b) 3x3
c) 5x5
d) Variable depending on the task

11. What does the term "parameters" in a NN refer to?
a) Data features
b) Model weights and biases
c) Hidden layers
d) Activation functions

12. What is the main purpose of data preprocessing in training NNs?
a) To reduce data size
b) To speed up testing
c) To help convergence
d) To eliminate noise completely

13. Which loss function is commonly used for classification tasks in NNs?
a) Mean Absolute Error (MAE)
b) Mean Squared Error (MSE)
c) Cross-entropy loss
d) Hinge loss

14. What is the gradient in the context of the gradient descent algorithm?
a) A measure of model accuracy
b) The direction of steepest increase in loss
c) A scaling factor for activation functions
d) A way to select the best training data

15. Why is the learning rate important in gradient descent?
a) It prevents overfitting
b) It controls the size of parameter updates
c) It increases computational efficiency
d) It determines the activation function

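A minimal NumPy sketch of the cross-entropy loss from Q13, assuming integer class labels and softmax probabilities as input; the numbers are illustrative.

```python
import numpy as np

def cross_entropy(probs, labels, eps=1e-12):
    """Mean negative log-probability assigned to the true class.

    probs:  (batch, classes) softmax outputs
    labels: (batch,) integer class indices
    """
    picked = probs[np.arange(len(labels)), labels]
    return -np.mean(np.log(picked + eps))

probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1]])
labels = np.array([0, 1])
print(cross_entropy(probs, labels))  # ~0.29: confident, correct predictions
```
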
16. What does "mini-batch gradient descent" involve?
a) Computing the loss for one training example at a time
b) Computing the loss for a subset of the training data
c) Using all training data for a parameter update
d) Shuffling data without parameter updates

17. Which optimizer combines momentum with adaptive learning rates?
a) Stochastic Gradient Descent (SGD)
b) Adam
c) Adagrad
d) Mini-batch Gradient Descent

18. What does the "softmax" activation function do?
a) Maps logits to a probability distribution
b) Acts as a loss function
c) Normalizes values to the [0,1] range
d) Removes noise from training data

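Q18's mapping from logits to a probability distribution takes a few lines; this numerically stable NumPy sketch subtracts the max so that the exponentials cannot overflow.

```python
import numpy as np

def softmax(logits):
    """Map raw scores to a probability distribution summing to 1."""
    shifted = logits - np.max(logits)  # stability: avoid exp overflow
    exps = np.exp(shifted)
    return exps / exps.sum()

p = softmax(np.array([2.0, 1.0, 0.1]))
print(p)        # [0.659 0.242 0.099] (approximately)
print(p.sum())  # 1.0
```
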
19. What can happen if the learning rate is too small?
a) Gradient descent converges too quickly
b) Gradient descent fails to converge
c) Training takes a long time
d) Loss increases exponentially

20. Which method is preferred for large datasets like ImageNet?
a) Stochastic Gradient Descent (SGD)
b) Full Gradient Descent
c) Mini-batch Gradient Descent
d) Batch Normalization

21. What does a Multi-Layer Perceptron (MLP) primarily consist of?
a) CNN layers and pooling layers
b) Weights, biases, and activation functions
c) Only output neurons
d) Training data and test data

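Combining Q16 and Q20: a hedged sketch of a mini-batch SGD loop on linear regression in NumPy; the learning rate and batch size are illustrative choices, not prescribed values.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))  # toy dataset
y = X @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.normal(size=1000)

w = np.zeros(3)
lr, batch_size = 0.1, 32        # illustrative hyperparameters

for epoch in range(20):
    order = rng.permutation(len(X))            # shuffle once per epoch
    for start in range(0, len(X), batch_size):
        b = order[start:start + batch_size]    # a subset, not the full set
        grad = 2 * X[b].T @ (X[b] @ w - y[b]) / len(b)
        w -= lr * grad                         # one update per mini-batch

print(w)  # close to [2.0, -1.0, 0.5]
```
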
22. How is the error in training a NN typically reduced?
a) By scaling the output layer
b) By adjusting weights and biases
c) By reducing the number of neurons
d) By reshuffling the training data

23. What is the first step in training a NN?
a) Updating the weights
b) Feeding training data into the network
c) Initializing weights randomly
d) Calculating the final output

24. What is the role of activation functions in a NN?
a) To optimize weights
b) To add non-linearity to the model
c) To preprocess input data
d) To minimize the loss function

25. Which of the following represents a common non-linear activation function?
a) Softmax
b) Sigmoid
c) Mean Squared Error
d) Batch Normalization

26. What does the decision boundary of a NN signify?
a) The network's overall loss
b) The point of convergence
c) The separation of classes in feature space
d) The final set of weights

27. What kind of datasets benefit the most from MLPs?
a) Sequential data
b) Structured tabular data
c) Large image datasets
d) Unstructured text data

28. How is a labeled dataset used in training NNs?
a) As test data for prediction
b) To compare predictions with the true labels
c) To identify missing features
d) To initialize weights and biases

29. What is the output of a NN when using a sigmoid function?
a) A binary classification (0 or 1)
b) A value in the range [0,1]
c) The sum of all input features
d) A probability distribution

30. What is the purpose of weight adjustments during training?
a) To maximize the loss function
b) To minimize the error
c) To increase the neuron count
d) To normalize input features

31. What is the primary reason for the resurgence of NNs around 2010?
a) Introduction of random forests
b) Advancements in computing
c) Development of rule-based AI systems
d) Decline in the effectiveness of SVMs

32. Who made major contributions to the success of NNs?
a) Turing, Neumann, and Minsky
b) LeCun, Hinton, and Yoshua Bengio
c) Andrew Ng, Goodfellow, and Fei-Fei Li
d) Elon Musk, Hassabis, and Hinton

33. Why are deep NNs often more effective than shallow ones?
a) They reduce the need for activation functions
b) They perform better at learning hierarchical representations
c) They require fewer parameters to train
d) They eliminate the vanishing gradient problem

34. What is a perceptron in the context of NNs?
a) The most complex type of NN
b) A unit that maps inputs to outputs
c) A layer that performs pooling operations
d) A non-linear activation function

35. What type of activation function is the sigmoid function?
a) Linear
b) Non-linear
c) Exponential
d) Polynomial

36. What does the term 'deep' in deep learning refer to?
a) The number of neurons in the input layer
b) The number of hidden layers in the network
c) The number of training examples used
d) The use of complex activation functions

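Q34's input-to-output mapping fits in a few lines; a minimal NumPy sketch of a single perceptron using the sigmoid from Q35 (weights and inputs are made up).

```python
import numpy as np

def sigmoid(z):
    """Non-linear squashing: maps any real z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def perceptron(x, w, b):
    """Weighted sum of the inputs plus a bias, passed through an activation."""
    return sigmoid(np.dot(w, x) + b)

x = np.array([0.5, -1.0, 2.0])  # illustrative inputs
w = np.array([0.4, 0.3, -0.2])  # illustrative weights
print(perceptron(x, w, b=0.1))  # a single output in (0, 1)
```
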
37. What is the purpose of parallel computing in NNs?
a) To increase the depth of the network
b) To speed up matrix operations
c) To simplify activation functions
d) To reduce the size of datasets

38. How does the sigmoid function output values?
a) In the range [-1,1]
b) In the range [0,1]
c) As whole numbers only
d) As probabilities greater than 1

39. Which activation function is most prone to the vanishing gradient problem?
a) ReLU
b) Sigmoid
c) Leaky ReLU
d) Softmax

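The vanishing gradient named in Q39 follows from the sigmoid's derivative, which never exceeds 0.25; a quick NumPy check with illustrative values.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    """sigma'(z) = sigma(z) * (1 - sigma(z)), at most 0.25 (at z = 0)."""
    s = sigmoid(z)
    return s * (1 - s)

print(sigmoid_grad(0.0))  # 0.25, the maximum possible value
print(sigmoid_grad(5.0))  # ~0.0066: saturated regions pass almost no gradient
print(0.25 ** 10)         # ~9.5e-07: the best case across ten stacked layers
```
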
40. Why is deep learning particularly useful for speech and computer vision tasks?
a) It requires no labeled data
b) It provides a flexible, learnable framework
c) It eliminates the need for preprocessing
d) It relies on rule-based methods

41. What is the main purpose of transfer learning in CNNs?
a) To train a CNN from scratch
b) To reuse a pre-trained network
c) To reduce the size of the dataset
d) To increase the number of layers in a CNN

42. What is a common starting point for the learning rate when fine-tuning?
a) The original learning rate
b) Twice the original learning rate
c) One-tenth of the original learning rate
d) Half the original learning rate

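A hedged sketch of the transfer-learning recipe behind Q41 and Q42, using torchvision's ResNet-18 as an example backbone; freezing the pre-trained layers and lowering the learning rate for the new head are common conventions, not fixed rules.

```python
import torch.nn as nn
import torch.optim as optim
from torchvision.models import resnet18, ResNet18_Weights

model = resnet18(weights=ResNet18_Weights.DEFAULT)  # load pre-trained weights
for p in model.parameters():
    p.requires_grad = False                         # freeze the backbone

model.fc = nn.Linear(model.fc.in_features, 10)      # fresh head for 10 classes

# Fine-tune only the new head, with a reduced learning rate
optimizer = optim.SGD(model.fc.parameters(), lr=0.01, momentum=0.9)
```
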
43. Which architecture introduced the concept of "dropout"?
a) AlexNet
b) VGGNet
c) GoogLeNet
d) ResNet

44. What is a key feature of GoogLeNet's architecture?
a) Fully connected layers at every stage
b) Use of "Inception" modules
c) 11x11 convolutional filters
d) 152 layers

45. How does ResNet address the vanishing gradient problem?
a) By using smaller learning rates
b) By employing "residual connections"
c) By increasing the number of epochs
d) By reducing the number of layers

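Q45's residual connection is just an addition of a block's input to its output; a minimal NumPy sketch of the idea, with a stand-in two-layer transformation as F(x).

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

def residual_block(x, W1, W2):
    """out = relu(F(x) + x): the skip path gives gradients a direct route."""
    fx = relu(x @ W1) @ W2   # F(x): a small two-layer transformation
    return relu(fx + x)      # add the input back before the final activation

x = np.random.randn(4, 8)   # batch of 4, width 8
W1 = np.random.randn(8, 8)
W2 = np.random.randn(8, 8)
print(residual_block(x, W1, W2).shape)  # (4, 8): same shape, by design
```
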
46. What is the total number of layers in ResNet's winning model for ImageNet?
a) 22
b) 152
c) 16
d) 8

47. What advantage does GoogLeNet have over AlexNet and VGGNet?
a) Fewer parameters and higher efficiency
b) Larger training datasets
c) Deeper networks with more parameters
d) Use of only max pooling layers

48. Which architecture first utilized ReLU activations?
a) GoogLeNet
b) VGGNet
c) AlexNet
d) ResNet

49. What is the key innovation in VGGNet?
a) Global average pooling
b) Deeper networks with small convolutional filters
c) Fully connected layers after every layer
d) Use of skip connections

50. What is the main takeaway from ResNet?
a) Deeper networks require larger learning rates
b) Residual connections enable training deep NNs
c) Shallow networks outperform deep NNs
d) Fully connected layers are essential for DNNs

51. Which rule is used to calculate gradients in backpropagation?
a) Sum rule
b) Chain rule
c) Product rule
d) Mean value theorem

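Q51's chain rule is exactly what backpropagation applies layer by layer; a small numeric check in plain Python comparing the analytic chain-rule gradient with a finite difference.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# f(w) = sigmoid(w * x); by the chain rule, df/dw = sigmoid'(w * x) * x
w, x = 0.7, 2.0
s = sigmoid(w * x)
analytic = s * (1 - s) * x

eps = 1e-6
numeric = (sigmoid((w + eps) * x) - sigmoid((w - eps) * x)) / (2 * eps)
print(analytic, numeric)  # the two agree to several decimal places
```
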
52. What is the primary advantage of the ReLU activation function?
a) It saturates for large positive inputs
b) It squashes numbers between 0 and 1
c) Computational efficiency and faster convergence
d) Zero-centered output

53. Which activation function addresses the "dying ReLU" problem?
a) Sigmoid
b) Tanh
c) Leaky ReLU
d) Softmax

54. What is the derivative of the identity function f(x) = x?
a) x
b) 1
c) 0
d) Undefined

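A minimal NumPy sketch of Q53's Leaky ReLU next to plain ReLU; the 0.01 negative slope is a conventional default, not the only choice.

```python
import numpy as np

def relu(x):
    """Zero output and zero gradient for x < 0: units there can 'die'."""
    return np.maximum(0, x)

def leaky_relu(x, alpha=0.01):
    """A small negative slope keeps a non-zero gradient flowing for x < 0."""
    return np.where(x > 0, x, alpha * x)

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(x))        # [ 0.     0.     0.     1.5  ]
print(leaky_relu(x))  # [-0.02  -0.005  0.     1.5  ]
```
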
55. Which approach is used to mitigate the vanishing gradient problem?
a) Increase the batch size
b) Use activation functions like ReLU
c) Reduce the number of layers in the network
d) Train with a larger learning rate

56. What does automatic differentiation in deep learning libraries simplify?
a) Calculating forward passes
b) Deriving partial derivatives by hand
c) Initializing weights
d) Evaluating model accuracy

57. What is a key feature of modern DL frameworks regarding backpropagation?
a) Manual computation of gradients
b) Use of pre-computed gradients
c) Automatic gradient computation
d) No backward pass required

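Q56 and Q57 in a few lines of PyTorch: mark a tensor with requires_grad and the framework derives the gradient itself (a toy function, not a network).

```python
import torch

x = torch.tensor(3.0, requires_grad=True)
y = x ** 2 + 2 * x  # forward pass: y = x^2 + 2x
y.backward()        # automatic differentiation computes dy/dx = 2x + 2
print(x.grad)       # tensor(8.) -- no hand-derived partial derivatives needed
```
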
58. What is the main objective of learning rate scheduling?
a) To increase the learning rate over time
b) To adjust the learning rate during training
c) To fix the learning rate for faster convergence
d) To decrease the number of training epochs

59. What technique penalizes large weights in the network to prevent overfitting?
a) Dropout
b) Batch Normalization
c) ℓ2 weight decay
d) Early stopping

60. What is the purpose of dropout during training?
a) To permanently remove redundant units
b) To improve validation accuracy
c) To randomly drop units and their connections
d) To initialize weights more efficiently

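Q59's ℓ2 weight decay appears as one extra term in the update rule; a sketch of a single SGD step with decay, where `lam` is an illustrative coefficient.

```python
import numpy as np

def sgd_step(w, grad, lr=0.1, lam=1e-2):
    """Adding (lam/2)*||w||^2 to the loss adds lam*w to the gradient,
    shrinking large weights slightly on every step."""
    return w - lr * (grad + lam * w)

w = np.array([3.0, -2.0, 0.5])
grad = np.zeros_like(w)   # even with zero loss gradient...
print(sgd_step(w, grad))  # ...the weights decay toward zero
```
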
61. What is the key parameter for early stopping?
a) Dropout rate
b) Number of layers
c) Patience (number of epochs)
d) Regularization coefficient

62. What is the typical dropout rate used in neural networks?
a) 5-10%
b) 20-50%
c) 60-80%
d) 90-100%

63. What does the MNIST dataset primarily contain?
a) Images of animals
b) Handwritten digits
c) Scenes from nature
d) Medical imaging data

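Q61's patience parameter is easiest to see in a loop; a minimal sketch with a made-up validation-loss history.

```python
val_losses = [0.90, 0.70, 0.60, 0.61, 0.62, 0.63, 0.64]  # illustrative history
patience = 3  # stop after this many epochs without improvement

best, waited = float("inf"), 0
for epoch, loss in enumerate(val_losses):
    if loss < best:
        best, waited = loss, 0  # improvement: reset the counter
    else:
        waited += 1             # no improvement this epoch
        if waited >= patience:
            print(f"early stop at epoch {epoch}, best val loss {best}")
            break
```
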
64. Why is the softmax layer preferred for multi-class classification?
a) It is faster than sigmoid activations
b) It outputs interpretable probability values
c) It reduces the complexity of the model
d) It eliminates the need for dropout

65. What is the input size for images in the MNIST dataset?
a) 16x16 pixels
b) 28x28 pixels
c) 32x32 pixels
d) 64x64 pixels

66. Which technique reduces spatial dimensions in CNNs?
a) Fully connected layers
b) ReLU activation
c) Pooling layers
d) Softmax layers

67. What feature makes CNNs efficient for image data?
a) Large weight matrices
b) Use of sparsely connected layers
c) Hierarchical feature extraction
d) Overfitting prevention mechanisms

68. What is a primary application of neural networks discussed in the lecture?
a) Handwritten digit recognition
b) Real-time weather prediction
c) Stock market analysis
d) Chatbot development

69. What activation function is commonly used in the output layer for classification?
a) ReLU
b) Sigmoid
c) Softmax
d) Tanh

70. Which preprocessing step helps in obtaining zero-centered data?
a) Normalization
b) Scaling within [0,1]
c) Mean subtraction
d) All of the above

71. What is the main characteristic of a Multi-Layer Perceptron (MLP)?
a) Use of convolutional layers
b) Fully connected layers
c) Hierarchical feature extraction
d) Handling sequential data

72. What role do weights and biases play in an MLP?
a) They determine the activation function
b) They adjust outputs based on input features
c) They standardize the input data
d) They define the number of neurons in the network

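Q70's mean subtraction in one line of NumPy, applied to a made-up feature matrix.

```python
import numpy as np

X = np.array([[10.0, 200.0],
              [12.0, 220.0],
              [14.0, 240.0]])

X_centered = X - X.mean(axis=0)  # subtract each feature's (column's) mean
print(X_centered.mean(axis=0))   # [0. 0.]: the data is now zero-centered
```
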
73. How is the error minimized during training in MLPs?
a) By normalizing data
b) By adjusting weights and biases
c) By increasing the number of layers
d) By using larger datasets

74. What is the function of the sigmoid activation?
a) Maps outputs between [-1,1]
b) Maps outputs between [0,1]
c) Maps inputs into larger values
d) Scales inputs within [-1,1]

75. What is a decision boundary in the context of neural networks?
a) The threshold for neuron activation
b) The boundary separating data classes
c) The point where weights are initialized
d) A measure of loss during training

76. What distinguishes neural networks from other machine learning models?
a) Use of manual feature extraction
b) Capability to learn hierarchical representations
c) Dependence on small datasets
d) Inability to generalize

77. What is the output of the sigmoid activation function?
a) Always 0
b) Binary values (0 or 1)
c) A value between 0 and 1
d) Positive integers

78. Which algorithm re-popularized neural networks after 2010?
a) Support vector machines
b) Random forests
c) Deep learning
d) Rule-based systems

