3rd_data(1) (1)
a) variance boosting
b) gradient boosting(Y)
c) mean boosting
2. Select which of the following is the most important language for data science?
a) Java
b) Ruby
c) R(Y)
3. Select which of the following data mining techniques is used to uncover patterns in data?
a) data bagging
b) data booting
c) data merging
d) data dredging(Y)
4. The goal of ___________ is to focus on summarizing and explaining a specific set of data
a) inferential statistics
b) descriptive statistics(Y)
c) none of these
d) all of these
a) inclusive
b) exhaustive
c) mutually exclusive(Y)
6. Select which of the following approaches should be used if you can’t fix the variable?
a) randomize it(Y)
b) non stratify it
c) generalize it
7. Focusing on describing or explaining data versus going beyond immediate data and making
inferences is the difference between _______
8. Select which of the following standard probability density functions is applicable
to discrete random variables?
a) gaussian distribution
b) poisson distribution(Y)
c) rayleigh distribution
d) exponential distribution
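As a small illustration of why the Poisson distribution applies to discrete random variables, the sketch below evaluates its probability mass function at integer counts only and checks that those probabilities add up to approximately 1. The rate lam = 3.0 and the helper name poisson_pmf are assumptions chosen for illustration; they do not come from the question.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for a Poisson random variable with rate lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

lam = 3.0  # assumed rate, for illustration only

# The distribution puts mass on the integers 0, 1, 2, ... and nowhere else.
probs = [poisson_pmf(k, lam) for k in range(50)]
total = sum(probs)

print(round(total, 6))  # the mass over 0..49 is essentially all of it
```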
a) P(x)
b) ∑ P(x)
c) ∑ x P(x)(Y)
d) 1
c) the range
d) the mean
10. If a test was generally very easy, except for a few students who had very low scores, then
the distribution of scores would be defined as _____
a) positively skewed
b) negatively skewed(Y)
d) normal
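The skew in question 10 can be checked numerically. The sketch below uses a made-up list of scores (mostly high, with two very low values) and computes the mean, the median, and a population skewness coefficient. The data are hypothetical; only the shape of the distribution matters.

```python
import statistics

# Hypothetical scores for an easy test: most students score high, two score very low.
scores = [95, 92, 90, 94, 96, 91, 93, 97, 20, 15]

mean = statistics.mean(scores)
median = statistics.median(scores)
sd = statistics.pstdev(scores)

# Population skewness: average cubed deviation divided by the cubed standard deviation.
skew = sum((s - mean) ** 3 for s in scores) / (len(scores) * sd**3)

print(mean, median, skew)
```

The low scores form a long left tail, so the mean falls below the median and the skewness coefficient is negative.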
11. Select which of the following approaches should be used to ask a data analysis question?
a) 7.5
b) 400005.2(Y)
c) 7
d) 4
13. Select when the conditional density functions get converted into the marginal density
functions.
b) to judge how the trained model performs outside the sample on test data
c) both to assess the predictive performance of the models and to judge how the trained model
performs outside the sample on test data(Y)
d) none of these
16. Explain how you can prevent a clustering algorithm from getting stuck in bad local optima.
c) both set the same seed value for each run and use multiple random initialization
d) none of these
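The multiple-random-initialization idea from question 16 can be sketched as follows: run a simple k-means several times from different seeded starting points and keep the run with the lowest within-cluster cost. The one-dimensional data, the helper name kmeans_1d, and the choice of ten seeds are all assumptions made for illustration, not part of the question.

```python
import random

def kmeans_1d(points, k, rng, iters=20):
    """Plain Lloyd iterations on 1-D data; returns (centers, within-cluster cost)."""
    centers = rng.sample(points, k)  # random initialization from the data
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: (p - centers[j]) ** 2)
            clusters[nearest].append(p)
        # Keep the old center if a cluster ends up empty.
        centers = [sum(c) / len(c) if c else centers[j]
                   for j, c in enumerate(clusters)]
    cost = sum(min((p - c) ** 2 for c in centers) for p in points)
    return centers, cost

# Hypothetical data with three visible groups.
points = [1.0, 1.2, 0.8, 5.0, 5.1, 4.9, 9.0, 9.2, 8.8]

# Fixed seeds keep every run reproducible while still varying the start.
runs = [kmeans_1d(points, 3, random.Random(seed)) for seed in range(10)]
best_centers, best_cost = min(runs, key=lambda r: r[1])

print(sorted(best_centers), best_cost)
```

A single run may settle in a poor local optimum depending on its starting centers; comparing several runs by cost makes that less likely to affect the final result.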
17. You run gradient descent for 15 iterations with α = 0.3 and compute J(θ) after each iteration.
You find that the value of J(θ) decreases quickly and then levels off. Based on this, select which
of the following conclusions seems most plausible?
a) rather than using the current value of α, use a larger value of α (say α = 1.0)
b) rather than using the current value of α, use a smaller value of α (say α = 0.1)
d) none of these
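The behaviour described in question 17 can be reproduced with a minimal sketch. It runs 15 gradient descent steps with a learning rate of 0.3 on a toy quadratic cost. The cost function J(theta) = (theta - 3)^2 and the starting point theta = 0 are assumptions for illustration; the question does not specify them.

```python
def cost(theta: float) -> float:
    """Assumed toy cost J(theta) = (theta - 3)^2, minimized at theta = 3."""
    return (theta - 3.0) ** 2

def gradient(theta: float) -> float:
    """Derivative of the toy cost with respect to theta."""
    return 2.0 * (theta - 3.0)

alpha = 0.3   # learning rate from the question
theta = 0.0   # assumed starting point
costs = []
for _ in range(15):
    theta -= alpha * gradient(theta)  # gradient descent update
    costs.append(cost(theta))

for i, j in enumerate(costs, start=1):
    print(f"iteration {i:2d}: J = {j:.6f}")
```

On this toy problem the cost falls sharply in the first few iterations and then flattens near zero, which is the pattern a workable learning rate typically produces as the parameters approach a minimum.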
a) 0.5
b) 4
c) 2(Y)
19. A fair six-sided die is rolled twice. Calculate the probability of getting 2 on the first roll
and not getting 4 on the second roll.
a) 1/36
b) 1/18
c) 5/36(Y)
d) 1/6
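The answer to question 19 can be verified by enumerating all 36 equally likely outcomes of two rolls and counting those with a 2 on the first roll and anything other than 4 on the second. The sketch below uses exact fractions, so no rounding is involved.

```python
from fractions import Fraction
from itertools import product

# All ordered outcomes (first roll, second roll) of a fair six-sided die.
outcomes = list(product(range(1, 7), repeat=2))

# Favorable: first roll is 2 and second roll is not 4.
favorable = [(a, b) for a, b in outcomes if a == 2 and b != 4]

prob = Fraction(len(favorable), len(outcomes))
print(prob)  # 5/36
```

This matches the product rule for independent events: (1/6) × (5/6) = 5/36.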
20. Suppose you have trained a logistic regression classifier and, for a new example x, it outputs a
prediction hθ(x) = 0.2. This means
c) All of these(N)
d) None(N)
21. For the t distribution, increasing the sample size will have an effect on
a) degrees of freedom
b) the t-ratio
d) all of these(Y)
a) events(Y)
b) composition
c) trials
d) functions
23. Select when the conditional density functions get converted into the marginal
density functions.
a) kernel SVM
b) neural networks
d) all of these(Y)
25. Select which of the following is defined as the rule or formula to test a null hypothesis?
a) test statistic(Y)
b) population statistic
c) variance statistic
d) null statistic
26. The point where the null hypothesis gets rejected is described as
a) significant value
b) rejection value
c) acceptance value
d) critical value(Y)
a) kernel SVM
b) neural networks
d) all of these(Y)
a) describe those groups that have reliable differences between group means(Y)
d) none of these
29. The process of constructing a mathematical model or function that can be used to predict or
determine one variable by another variable is described as
a) regression(Y)
b) correlation
c) residual
d) outlier plot
30. In the regression equation Y = 75.65 + 0.50X, the intercept is defined as
a) 0.5
b) 75.65(Y)
c) 1
d) indeterminable
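Reading the equation in question 30 in the usual form Y = b0 + b1·X, the intercept b0 is the predicted value of Y when X = 0 and the slope b1 is the change in Y per unit change in X. The sketch below evaluates the equation to make that concrete; the function name predict is an illustrative assumption.

```python
def predict(x: float) -> float:
    """Evaluate the regression equation Y = 75.65 + 0.50 * X."""
    intercept = 75.65
    slope = 0.50
    return intercept + slope * x

print(predict(0))               # value of Y at X = 0, i.e. the intercept
print(predict(1) - predict(0))  # change in Y for a one-unit change in X, i.e. the slope
```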
a) 1-α
b) β
c) α(Y)
d) 1-β
32. Statement 1: It is possible to train a network well by initializing all the weights as 0.
Statement 2: It is possible to train a network well by initializing biases as 0.
Choose which of the statements given above is true.
c) both statements are true
d) both statements are false
c) that as the predictor variable increases, the likelihood of the outcome occurring decreases
34. Predict which of the following would have a constant input in each epoch of training a deep
learning model?
35. Statement 1: It is possible to train a network well by initializing all the weights as 0.
Statement 2: It is possible to train a network well by initializing biases as 0.
Choose which of the statements given above is true.
c) both statements are true
d) both statements are false
a) time consuming
b) highly restricted(Y)
c) more generalized
b) actual output is determined by computing the outputs of units for each hidden layer(Y)
c) the hidden layers' output is not at all important; they are only meant for supporting the input
and output layers
39. Write what is meant by "generalized" in the statement "back propagation is a generalized delta
rule".
a) because delta rule can be extended to hidden layer units(Y)
b) because delta is applied to only input and output layers, thus making it more simple and
generalized
c) it has no significance
d) it has no significance
a) ∆wij= µ(si) aj
d) ∆wij= µ bi aj(Y)
41. Write how input layer units are connected to the second layer in competitive learning networks.
b) feedback manner
b) McCulloch-Pitts model(Y)
b) McCulloch-Pitts model(Y)
44. Judge which of the following is/are true about random forest and gradient boosting
ensemble methods?
1. Both methods can be used for classification tasks
2. Random forest is used for classification whereas gradient boosting is used for regression tasks
3. Random forest is used for regression whereas gradient boosting is used for classification tasks
4. Both methods can be used for regression tasks
a) 1
b) 2
c) 1 and 4(Y)
d) 3
45. In random forest you can generate hundreds of trees (say T1, T2, ..., Tn) and then aggregate
the results of these trees. Determine which of the following is true about an individual tree (Tk)
in random forest?
1. Individual tree is built on a subset of the features
2. Individual tree is built on all the features
3. Individual tree is built on a subset of observations
4. Individual tree is built on the full set of observations
a) 1 and 3(Y)
b) 1 and 4
c) 2 and 3
d) 2 and 4
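The sampling behind question 45 can be sketched as below: each tree gets a bootstrap sample of the rows (drawn with replacement) and a random subset of the feature indices. The helper name sample_for_tree and the sizes used are illustrative assumptions. Note also that many random forest implementations draw the feature subset at each split rather than once per tree; the per-tree draw here simply follows the question's framing.

```python
import random

def sample_for_tree(n_rows, n_features, max_features, rng):
    """Pick the row and feature indices one tree would see (illustrative only)."""
    # Bootstrap sample: draw n_rows indices with replacement.
    row_idx = [rng.randrange(n_rows) for _ in range(n_rows)]
    # Feature subset: draw max_features distinct feature indices.
    feat_idx = sorted(rng.sample(range(n_features), max_features))
    return row_idx, feat_idx

rng = random.Random(0)  # seeded for reproducibility
rows, feats = sample_for_tree(n_rows=100, n_features=10, max_features=3, rng=rng)

distinct_rows = len(set(rows))
print(distinct_rows, feats)
```

Because rows are drawn with replacement, some observations repeat and others are left out, so each tree effectively sees only a subset of the distinct observations.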
46. Suppose you are given the following graph, which shows the ROC curves for two different
classification algorithms, random forest (red) and logistic regression (blue). Select which of the
following algorithms you would take into consideration in your final model building on the basis
of performance.
a) random forest(Y)
b) logistic regression
d) none of these.
47. In random forest or gradient boosting algorithms, features can be of any type. For example,
a feature can be continuous or categorical. Select which of the following options is true
when you consider these types of features?
a) only random forest algorithm handles real valued attributes by discretizing them
b) only gradient boosting algorithm handles real valued attributes by discretizing them
d) none of these
48. Analyze: the cell body of a neuron is analogous to which mathematical operation?
a) summing(Y)
b) differentiator
c) integrator
49. Explain the advantage of basis functions over multi-layer feedforward neural
networks.
50. Suppose you are using a bagging-based algorithm, say a random forest, in model building.
Select which of the following can be true?
1. The number of trees should be as large as possible
2. You will have interpretability after using random forest
a) 2
b) 1 and 2
c) none of these
d) 1(Y)
a) activation
b) synchronization
c) learning(Y)
d) all of these(Y)
54. Test: in which of the following cases will k-means clustering fail to give good results?
1) Data points with outliers
2) Data points with different densities
3) Data points with non-convex shapes
a) 1 and 2
b) 2 and 3
c) 1, 2 and 3(Y)
d) 1 and 3
55. Evaluate which of the following scenarios prefers a failover cluster instance over a standalone
instance in SQL Server?
a) high confidentiality
b) high availability(Y)
c) high integrity
a) destination address
57. Measure: a Windows failover cluster can support up to how many nodes?
a) 12
b) 14
c) 16(Y)
d) 18
58. Select: an exciting new feature in SQL Server 2014 is the support for the deployment of a
failover cluster instance (FCI) with ___________
b) in memory database
59. Select which of the following is a Windows failover cluster quorum mode?
a) node majority(Y)
60. Recommend: the benefits that SQL Server failover cluster instances provide are ____________
61. Select which of the following argument is used to set importance values?
a) set(Y)
b) value
c) all of the mentioned
a) all znodes are ephemeral, which means they describe a “temporary” state(Y)
c) offline snapshots are coordinated by the Master using ZooKeeper to communicate with the
region servers using a two-phase-commit-like transaction
a) stat(Y)
b) put
c) receive
d) gets
64. Propose which of the following has a design policy of using ZooKeeper only for transient
data
a) Hive
b) Impala
c) HBase(Y)
d) Oozie
65. Write which of the following specifies the required minimum number of observations for
each column pair in order to have a valid result?
a) min_periods(Y)
b) max_periods
c) minimum_periods
66. Validate: according to analysts, for what can traditional IT systems provide a foundation
when they’re integrated with big data technologies like Hadoop?
a) open source
b) real-time(Y)
c) java based
d) distributed computing approach
68. Write what the five V’s of big data are. (Probable duplicate)
a) volume
b) velocity
c) variety
d) all of these(Y)
69. Write the different features of big data analytics.
a) open source
b) scalability
c) data recovery
d) all of these(Y)
70. Write: Facebook tackles big data with _______ based on Hadoop
a) Project Prism(Y)
b) Prism
c) Project Data
d) Project Bid
71. Write the unit of data that flows through a Flume agent.
a) record
b) event(Y)
c) row
d) log
72. Validate: as companies move past the experimental phase with Hadoop, many cite the need
for additional capabilities, including _______________
75. After SVM learning, each Lagrange multiplier αi takes either a zero or a non-zero value. What
does it express in each situation?
a) a non-zero αi indicates the data point i is a support vector, meaning it touches the margin
boundary(Y)
b) a non-zero αi indicates that the learning has not yet converged to a global minimum(N)
c) a zero αi indicates that the data point i has become a support vector data point, on the margin(N)
d) a zero αi indicates that the learning process has identified support for vector i(N)
76. Validate, Active learning, creating data for analytics through reinforcement learning
a) performance element
b) changing element
c) learning element(Y)
d) none of these
78. Propose which of the following has a design policy of using ZooKeeper only for transient
data
a) Hive
b) Impala
c) HBase(Y)
d) Oozie
79. Write which of the following specifies the required minimum number of observations for
each column pair in order to have a valid result?
a) min_periods(Y)
b) max_periods
c) minimum_periods
80. Validate: according to analysts, for what can traditional IT systems provide a foundation
when they’re integrated with big data technologies like Hadoop?
a) open source
b) real-time(Y)
c) java based
82. Write what the five V’s of big data are. (Probable duplicate)
a) volume
b) velocity
c) variety
d) all of these(Y)
83. Write the different features of big data analytics.
a) open source
b) scalability
c) data recovery
d) all of these(Y)
84. Write: Facebook tackles big data with _______ based on Hadoop
a) Project Prism(Y)
b) Prism
c) Project Data
d) Project Bid
85. Write the unit of data that flows through a Flume agent.
a) record
b) event(Y)
c) row
d) log
86. As companies move past the experimental phase with Hadoop, many cite the need for
additional capabilities, including