3rd_data(1) (1)

Uploaded by

sandippatra567
Copyright
© © All Rights Reserved
1. Which of the following is one of the largest subclasses of boosting?

a) variance boosting

b) gradient boosting(Y)

c) mean boosting

d) all of the mentioned

2. Which of the following is the most important language for data science?

a) Java

b) Ruby

c) R(Y)

d) none of the mentioned

3. Which of the following data mining techniques is used to uncover patterns in data?

a) data bagging

b) data booting

c) data merging

d) data dredging(Y)

4. The goal of ___________ is to focus on summarizing and explaining a specific set of data

a) inferential statistics

b) descriptive statistics(Y)

c) none of these

d) all of these

5. Non-overlapping categories or intervals are defined as ______

a) inclusive

b) exhaustive

c) mutually exclusive(Y)

d) mutually exclusive and exhaustive

6. Which of the following approaches should be used if you can’t fix a variable?

a) randomize it(Y)

b) non stratify it

c) generalize it

d) none of the mentioned

7. Focusing on describing or explaining data versus going beyond the immediate data and making
inferences is the difference between _______

a) central tendency and common tendency

b) mutually exclusive and mutually exhaustive properties

c) descriptive and inferential(Y)

d) positive skew and negative skew

8. Which of the following standard probability distributions is applicable to discrete random
variables?

a) gaussian distribution

b) poisson distribution(Y)

c) rayleigh distribution

d) exponential distribution
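A small numerical check, assuming an illustrative rate λ = 3: the Poisson distribution is defined only on the non-negative integers k = 0, 1, 2, …, so strictly speaking it has a probability mass function rather than a density. Summing that mass function over k recovers a total of 1 and a mean of λ.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for a Poisson random variable with rate lam; zero for negative k."""
    if k < 0:
        return 0.0
    return math.exp(-lam) * lam**k / math.factorial(k)

lam = 3.0  # illustrative rate (assumption, not from the question)
probs = [poisson_pmf(k, lam) for k in range(60)]  # the tail beyond k = 59 is negligible for lam = 3
total = sum(probs)
mean = sum(k * p for k, p in enumerate(probs))
print(f"sum of P(X=k) = {total:.6f}")  # ≈ 1
print(f"E[X] = {mean:.6f}")            # ≈ lam
```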

9. The expected value of a discrete random variable ‘x’ is defined by _______

a) P(x)

b) ∑ P(x)

c) ∑ x P(x)(Y)

d) 1
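The definition E(x) = ∑ x P(x) translates directly into code. A minimal sketch, using exact fractions and a fair six-sided die as the (assumed) example distribution:

```python
from fractions import Fraction

def expected_value(dist):
    """E[X] = Σ x·P(x) for a discrete distribution given as {x: P(x)}."""
    total_prob = sum(dist.values())
    if total_prob != 1:
        raise ValueError(f"probabilities sum to {total_prob}, not 1")
    return sum(x * p for x, p in dist.items())

# A fair six-sided die: each face has probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}
print(expected_value(die))  # 7/2
```

A single value x = 4 with probability 0.5 contributes the term x·P(x) = 4 × 0.5 = 2 to the sum; on its own that is not a complete distribution, so the function above would reject it.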

10. The denominator (bottom) of the z-score formula is defined as

a) the standard deviation(Y)

b) the difference between a score and the mean

c) the range

d) the mean
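The z-score is z = (x − mean) / standard deviation, so the standard deviation is the denominator. A short sketch; it assumes the population standard deviation (statistics.pstdev), and some texts use the sample version (statistics.stdev) instead:

```python
import statistics

def z_score(x, data):
    """Standardize x: (x - mean) / standard deviation (population SD here)."""
    mu = statistics.mean(data)
    sigma = statistics.pstdev(data)
    return (x - mu) / sigma

scores = [2, 4, 4, 4, 5, 5, 7, 9]   # illustrative data: mean 5, population SD 2
print(z_score(9, scores))  # 2.0
```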

10. If a test was generally very easy, except for a few students who had very low scores, then
the distribution of scores would be described as _____

a) positively skewed

b) negatively skewed(Y)

c) not skewed at all

d) normal
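A quick check with made-up scores (mostly high, two very low): the few low scores pull the mean below the median and give a negative moment coefficient of skewness, i.e. a long tail on the low side.

```python
import statistics

def skewness(data):
    """Moment coefficient of skewness g1 = m3 / m2**1.5, using population moments."""
    mu = statistics.fmean(data)
    n = len(data)
    m2 = sum((x - mu) ** 2 for x in data) / n
    m3 = sum((x - mu) ** 3 for x in data) / n
    return m3 / m2 ** 1.5

scores = [95, 92, 98, 90, 96, 94, 40, 35]   # hypothetical: an easy test, two very low scores
print("mean:", statistics.fmean(scores), "median:", statistics.median(scores))  # mean: 80.0 median: 93.0
print("skewness:", round(skewness(scores), 3))  # negative => tail on the low side
```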

11. Which of the following approaches should be used when asking a data analysis question?

a) find only one solution for particular problem

b) find out the question which is to be answered(Y)

c) find out answer from dataset without asking question

d) none of the mentioned

12. What is the mean of this set of numbers: 4, 6, 7, 9, 2000000?

a) 7.5

b) 400005.2(Y)

c) 7

d) 4
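The arithmetic is (4 + 6 + 7 + 9 + 2000000) / 5 = 2000026 / 5 = 400005.2. The sketch below checks this with Python's statistics module and shows how one extreme value drags the mean far away from the median:

```python
import statistics

values = [4, 6, 7, 9, 2000000]
mean = statistics.mean(values)
median = statistics.median(values)
print(mean)    # 400005.2
print(median)  # 7
```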

13. When do conditional density functions reduce to marginal density functions?

a) only if random variables exhibit statistical dependency

b) only if random variables exhibit statistical independency(Y)

c) only if random variables exhibit deviation from its mean value

d) if random variables do not exhibit deviation from its mean value
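A discrete analogue of the same idea, using probability mass functions in place of densities and hypothetical marginals: when the joint distribution factorizes as p(x, y) = p(x) p(y), every conditional p(x | y) equals the marginal p(x); for a dependent pair it does not.

```python
from fractions import Fraction as F

def conditional_x_given_y(joint, y):
    """p(x | y) = p(x, y) / p(y), where p(y) is the joint summed over x."""
    p_y_val = sum(p for (_, yi), p in joint.items() if yi == y)
    return {xi: p / p_y_val for (xi, yi), p in joint.items() if yi == y}

def marginal_x(joint):
    """p(x) = sum over y of p(x, y)."""
    out = {}
    for (xi, _), p in joint.items():
        out[xi] = out.get(xi, 0) + p
    return out

# Independent X and Y (hypothetical marginals): the joint is the product of the marginals.
p_x = {0: F(1, 4), 1: F(3, 4)}
p_y = {0: F(1, 3), 1: F(2, 3)}
independent = {(x, y): p_x[x] * p_y[y] for x in p_x for y in p_y}

# Dependent X and Y: X always equals Y.
dependent = {(0, 0): F(1, 2), (1, 1): F(1, 2), (0, 1): F(0), (1, 0): F(0)}

for name, joint in [("independent", independent), ("dependent", dependent)]:
    same = all(conditional_x_given_y(joint, y) == marginal_x(joint) for y in (0, 1))
    print(name, "p(x|y) == p(x) for every y:", same)
```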

14. Which of the following is/are advantages of decision trees?

a) possible scenarios can be added

b) use a white box model, if given result is provided by a model

c) worst, best and expected values can be determined for different scenarios

d) all of the mentioned(Y)

15. What is the purpose of performing cross-validation?

a) to assess the predictive performance of the models

b) to judge how the trained model performs outside the sample on test data

c) both to assess the predictive performance of the models and to judge how the trained model
performs outside the sample on test data(Y)

d) none of these
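A minimal k-fold cross-validation sketch. It assumes a trivial stand-in model ("predict the mean of the training folds") so the focus stays on the procedure: fit on k − 1 folds, score on the held-out fold, repeat, and average. The folds here are contiguous; with ordered data you would normally shuffle first.

```python
import statistics

def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds of near-equal size."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_validate(y, k):
    """Score a baseline 'predict the training mean' model with k-fold CV (mean squared error)."""
    errors = []
    for test_idx in k_fold_indices(len(y), k):
        test_set = set(test_idx)
        train = [y[i] for i in range(len(y)) if i not in test_set]
        prediction = statistics.fmean(train)                                 # fit on training folds only
        mse = statistics.fmean((y[i] - prediction) ** 2 for i in test_idx)   # score on held-out fold
        errors.append(mse)
    return errors

y = [3.1, 2.9, 3.4, 3.0, 2.8, 3.3, 3.2, 2.7, 3.5, 3.1]  # illustrative targets
fold_errors = cross_validate(y, k=5)
print([round(e, 4) for e in fold_errors])
print("estimated out-of-sample MSE:", round(statistics.fmean(fold_errors), 4))
```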

16. How can you prevent a clustering algorithm from getting stuck in bad local optima?

a) set the same seed value for each run

b) use multiple random initializations(Y)

c) both set the same seed value for each run and use multiple random initializations

d) none of these
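A sketch of the multiple-initialization strategy, on 1-D data for brevity: run k-means (Lloyd's algorithm) from several random starts and keep the run with the lowest inertia (sum of squared distances to the nearest center). The helper names and the dataset are illustrative; the n_init argument is similar in spirit to the n_init parameter of scikit-learn's KMeans.

```python
import random

def kmeans_1d(points, k, rng, max_iter=100):
    """One run of Lloyd's algorithm on 1-D data, starting from k randomly chosen points."""
    centers = rng.sample(points, k)
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: (p - centers[j]) ** 2)
            clusters[nearest].append(p)
        # Update step: move each center to its cluster mean (an empty cluster keeps its center).
        new_centers = [sum(c) / len(c) if c else centers[j] for j, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    inertia = sum(min((p - c) ** 2 for c in centers) for p in points)
    return centers, inertia

def kmeans_restarts(points, k, n_init=20, seed=0):
    """Run k-means n_init times from different random starts and keep the lowest-inertia run."""
    rng = random.Random(seed)
    runs = [kmeans_1d(points, k, rng) for _ in range(n_init)]
    return min(runs, key=lambda run: run[1])

data = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9, 9.8, 10.1, 10.0]   # three well-separated groups
centers, inertia = kmeans_restarts(data, k=3)
print(sorted(round(c, 2) for c in centers), round(inertia, 3))
```

Setting the same seed for every run (option a) would just repeat one initialization, so it cannot escape a bad optimum.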

17. You run gradient descent for 15 iterations with α = 0.3 and compute J(θ) after each iteration.
You find that the value of J(θ) decreases quickly and then levels off. Based on this, which of the
following conclusions seems most plausible?

a) rather than using the current value of α, use a larger value of α (say α = 1.0)

b) rather than using the current value of α, use a smaller value of α (say α = 0.1)

c) α = 0.3 is an effective choice of learning rate(Y)

d) none of these
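A toy illustration, assuming the cost J(θ) = (θ − 3)² and a start at θ = 0 (neither is from the question): with α = 0.3, J falls quickly and then flattens out near its minimum, which is the pattern described above.

```python
def cost(theta):
    """Toy cost J(θ) = (θ - 3)^2, minimized at θ = 3."""
    return (theta - 3.0) ** 2

def grad(theta):
    """dJ/dθ for the toy cost."""
    return 2.0 * (theta - 3.0)

def gradient_descent(alpha, theta0=0.0, iters=15):
    """Run batch gradient descent and record J(θ) after each iteration."""
    theta, history = theta0, []
    for _ in range(iters):
        theta -= alpha * grad(theta)
        history.append(cost(theta))
    return theta, history

theta, history = gradient_descent(alpha=0.3)
print([f"{j:.2e}" for j in history[:5]], "...", f"{history[-1]:.2e}")
```

For this particular cost, α = 1.0 makes θ bounce between 6 and 0 without reducing J at all (gradient_descent(alpha=1.0) reports a cost of 9.0 at every step), one way a larger learning rate can backfire; how large is too large depends on the curvature of J.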

18. If P(x) = 0.5 and x = 4, what is E(x)?

a) 0.5

b) 4

c) 2(Y)

19. A fair six-sided die is rolled twice. What is the probability of getting a 2 on the first roll
and not getting a 4 on the second roll?

a) 1/36

b) 1/18

c) 5/36(Y)

d) 1/6
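Because the two rolls are independent, the probability is (1/6) × (5/6) = 5/36. Enumerating all 36 equally likely outcomes confirms it:

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))   # all 36 (first, second) pairs
favorable = [o for o in outcomes if o[0] == 2 and o[1] != 4]
print(Fraction(len(favorable), len(outcomes)))    # 5/36
```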

20. Suppose you have trained a logistic regression classifier and, on a new example x, it outputs a
prediction hθ(x) = 0.2. This means

a) our estimate for P(y=1 | x)(N)

b) our estimate for P(y=0 | x)(Y)

c) All of these(N)

d) None(N)
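In logistic regression, hθ(x) is the model's estimate of P(y = 1 | x; θ), so the estimate of P(y = 0 | x; θ) is 1 − hθ(x). A small check, where the linear score θᵀx is chosen (as an assumption) so that the sigmoid returns exactly 0.2:

```python
import math

def sigmoid(z):
    """Logistic function: h_theta(x) = 1 / (1 + exp(-theta^T x))."""
    return 1.0 / (1.0 + math.exp(-z))

z = math.log(0.2 / 0.8)   # the linear score theta^T x that gives h_theta(x) = 0.2
p_y1 = sigmoid(z)         # estimate of P(y = 1 | x; theta)
p_y0 = 1.0 - p_y1         # estimate of P(y = 0 | x; theta)
print(round(p_y1, 6), round(p_y0, 6))  # 0.2 0.8
```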

21. For the t distribution, increasing the sample size affects

a) degrees of freedom

b) the t-ratio

c) standard error of the means

d) all of these(Y)

22. In a random experiment, observations of a random variable are classified as ___________

a) events(Y)

b) composition

c) trials

d) functions

24. Which of the following are universal approximators?

a) kernel SVM

b) neural networks

c) boosted decision trees

d) all of these(Y)

25. Which of the following is the rule or formula used to test a null hypothesis?

a) test statistic(Y)

b) population statistic

c) variance statistic

d) null statistic

26. The point where the null hypothesis gets rejected is called the

a) significant value

b) rejection value

c) acceptance value

d) critical value(Y)
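A worked one-sample z-test with hypothetical numbers (H0: µ = 100, known σ = 15, n = 36, sample mean 106), separating the two ideas in questions 25 and 26: the test statistic is the formula computed from the data, and the critical value is the cut-off beyond which H0 is rejected.

```python
import math
from statistics import NormalDist

alpha = 0.05
critical = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided critical value, about 1.96

# Hypothetical one-sample z-test with known population standard deviation.
sample_mean, mu0, sigma, n = 106.0, 100.0, 15.0, 36
z = (sample_mean - mu0) / (sigma / math.sqrt(n))  # test statistic
print(f"critical value = {critical:.3f}, z = {z:.2f}, reject H0: {abs(z) > critical}")
```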

28. What is the function of a post hoc test in ANOVA?

a) describe those groups that have reliable differences between group means(Y)

b) set the critical value for the F test (or chi-square)

c) determine if any statistically significant group differences have occurred

d) none of these

29. The process of constructing a mathematical model or function that can be used to predict or
determine one variable from another variable is called

a) regression(Y)

b) correlation

c) residual

d) outlier plot

30. In the regression equation Y = 75.65 + 0.50X, the intercept is

a) 0.5

b) 75.65(Y)

c) 1

d) indeterminable
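The intercept is the constant term of the fitted line, i.e. the predicted Y when X = 0, so for Y = 75.65 + 0.50X it is 75.65. A least-squares fit on points generated (as an illustration) exactly from that line recovers both coefficients:

```python
def ols_fit(xs, ys):
    """Simple least-squares fit Y = a + bX; returns (intercept a, slope b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    a = my - b * mx
    return a, b

xs = [0, 2, 4, 6, 8]
ys = [75.65 + 0.50 * x for x in xs]   # points lying exactly on Y = 75.65 + 0.50X
a, b = ols_fit(xs, ys)
print(f"intercept = {a:.2f}, slope = {b:.2f}")  # intercept = 75.65, slope = 0.50
```

Printing the raw float a without formatting may show trailing digits such as 75.65000000000001; that is floating-point rounding, not a different intercept.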

31. The probability of a Type I error is denoted by

a) 1-α

b) β

c) α(Y)

d) 1-β
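A Monte Carlo sketch of what α means: if the data are generated under H0 and a two-sided z-test at level α is applied many times, H0 is (wrongly) rejected in about a fraction α of the trials. The sample size, trial count, and seed are arbitrary choices.

```python
import math
import random
from statistics import NormalDist

def type1_error_rate(alpha=0.05, n=30, trials=10_000, seed=1):
    """Estimate P(reject H0 | H0 true) for a two-sided z-test by simulation."""
    rng = random.Random(seed)
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]   # H0 is true: mean 0, known sigma = 1
        z = (sum(sample) / n) / (1.0 / math.sqrt(n))       # z statistic
        if abs(z) > critical:
            rejections += 1
    return rejections / trials

rate = type1_error_rate()
print(rate)   # should land close to alpha = 0.05
```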

32. Statement 1: It is possible to train a network well by initializing all the weights as 0.
Statement 2: It is possible to train a network well by initializing the biases as 0.
Which of the statements given above is true?

a) statement 1 is true while statement 2 is false

b) statement 2 is true while statement 1 is false(Y)

c) both statements are true

d) both statements are false
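A sketch of why statement 1 fails while statement 2 holds, using a hypothetical 2-2-1 sigmoid network trained by plain SGD on XOR. With every weight starting at 0, both hidden units compute the same output and receive the same gradient, so they stay identical and the network behaves as if it had a single hidden unit. With random weights and zero biases, the units start different and can learn different features.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(W, b, v, c, data, lr=0.5, epochs=200):
    """SGD on a 2-input, 2-hidden-unit, 1-output network with squared-error loss.

    W[j]: input weights of hidden unit j, b[j]: its bias, v[j]: its weight into the output.
    Lists are updated in place; the output bias c (a float) is returned.
    """
    for _ in range(epochs):
        for x, y in data:
            h = [sigmoid(W[j][0] * x[0] + W[j][1] * x[1] + b[j]) for j in range(2)]
            err = v[0] * h[0] + v[1] * h[1] + c - y          # dL/dy_hat for L = 0.5 * err**2
            for j in range(2):
                delta = err * v[j] * h[j] * (1.0 - h[j])     # error backpropagated to hidden unit j
                v[j] -= lr * err * h[j]
                b[j] -= lr * delta
                W[j][0] -= lr * delta * x[0]
                W[j][1] -= lr * delta * x[1]
            c -= lr * err
    return c

data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]   # XOR, for illustration

# Statement 1: every weight (and bias) starts at 0.
W0, b0, v0 = [[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0], [0.0, 0.0]
train(W0, b0, v0, 0.0, data)
print("zero init, hidden units identical:", W0[0] == W0[1] and v0[0] == v0[1])   # True

# Statement 2: random weights, biases at 0.
rng = random.Random(0)
W1 = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(2)]
b1, v1 = [0.0, 0.0], [rng.uniform(-1, 1) for _ in range(2)]
train(W1, b1, v1, 0.0, data)
print("random weights, hidden units identical:", W1[0] == W1[1])
```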

33. Large values of the log-likelihood statistic indicate

a) that there are a greater number of explained vs. unexplained observations

b) that the statistical model fits the data well

c) that as the predictor variable increases, the likelihood of the outcome occurring decreases

d) that the statistical model is a poor fit of the data(Y)

34. Which of the following would have a constant input in each epoch of training a deep
learning model?

a) weight between input and hidden layer(Y)

b) weight between hidden and output layer

c) biases of all hidden layer neurons

d) activation function of output layer

36. What is the stability-plasticity dilemma?

a) system can neither be stable nor plastic

b) static inputs and categorization can’t be handled

c) dynamic inputs and categorization can’t be handled(Y)

d) none of the mentioned

37. What is a drawback of template matching?

a) time consuming

b) highly restricted(Y)

c) more generalized

d) none of the mentioned

38. What is true regarding the backpropagation rule?

a) it is a feedback neural network

b) actual output is determined by computing the outputs of units for each hidden layer(Y)

c) the hidden layers’ outputs are not important; they are only meant to support the input and
output layers

d) none of the mentioned

39. What is meant by “generalized” in the statement “backpropagation is a generalized delta
rule”?

a) because the delta rule can be extended to hidden layer units(Y)

b) because the delta rule is applied only to the input and output layers, making it simpler and
more generalized

c) it has no significance

d) none of the mentioned

40. Which equation represents the correlation learning law?

a) ∆wij= µ(si) aj

b) ∆wij= µ(bi – si) aj

c) ∆wij= µ(bi – si) aj f′(xi), where f′(xi) is the derivative of the activation function f at xi

d) ∆wij= µ bi aj(Y)
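The correlation law can be read as a supervised form of Hebbian learning: the Hebbian update ∆wij = µ si aj uses the unit's actual output si, while the correlation update ∆wij = µ bi aj substitutes the desired output bi. A minimal sketch for one output unit i, with illustrative values for µ and the inputs aj:

```python
def hebbian_update(mu, s_i, a):
    """Hebbian law: dw_ij = mu * s_i * a_j (uses the unit's actual output s_i)."""
    return [mu * s_i * a_j for a_j in a]

def correlation_update(mu, b_i, a):
    """Correlation law: dw_ij = mu * b_i * a_j (uses the desired output b_i instead)."""
    return [mu * b_i * a_j for a_j in a]

a = [1.0, 0.0, -1.0]   # input activations a_j (illustrative)
print(hebbian_update(0.1, s_i=-1.0, a=a))     # here the actual output disagrees with the target
print(correlation_update(0.1, b_i=1.0, a=a))  # [0.1, 0.0, -0.1]
```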

41. How are input layer units connected to the second layer in competitive learning networks?

a) feed forward manner(Y)

b) feedback manner

c) feed forward and feedback

d) feed forward or feedback

42. What is the name of the model in the figure below?

a) rosenblatt perceptron model

b) mcculloch-pitts model(Y)

c) widrow’s adaline model

d) none of the mentioned

44. Which of the following is/are true about the random forest and gradient boosting ensemble
methods?
1. Both methods can be used for classification tasks.
2. Random forest is used for classification, whereas gradient boosting is used for regression tasks.
3. Random forest is used for regression, whereas gradient boosting is used for classification tasks.
4. Both methods can be used for regression tasks.

a) 1

b) 2

c) 1 and 4(Y)

d) 3

45. In a random forest you can generate hundreds of trees (say T1, T2, ..., Tn) and then aggregate
the results of these trees. Which of the following is true about an individual tree (Tk) in a random
forest?
1. An individual tree is built on a subset of the features.
2. An individual tree is built on all the features.
3. An individual tree is built on a subset of the observations.
4. An individual tree is built on the full set of observations.

a) 1 and 3(Y)

b) 1 and 4

c) 2 and 3

d) 2 and 4
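A sketch of the data one tree sees, under assumed sizes (10 rows, 6 features, 2 features per tree): a bootstrap sample of the rows (drawn with replacement, so some rows repeat and others are left out) and a random subset of the features. Note the simplification: Breiman's random forest, and scikit-learn's RandomForestClassifier, draw a fresh feature subset at every split rather than once per tree; drawing once per tree, as here, matches the wording of the question.

```python
import random

def sample_for_one_tree(n_rows, n_features, max_features, rng):
    """Draw the training data for one tree: bootstrap rows plus a random feature subset."""
    rows = [rng.randrange(n_rows) for _ in range(n_rows)]            # n draws with replacement
    features = sorted(rng.sample(range(n_features), max_features))   # subset without replacement
    return rows, features

rng = random.Random(42)
for t in range(3):
    rows, features = sample_for_one_tree(n_rows=10, n_features=6, max_features=2, rng=rng)
    print(f"tree {t}: distinct rows used = {len(set(rows))}/10, features = {features}")
```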

46. Suppose you are given the following graph, which shows the ROC curves for two different
classification algorithms: random forest (red) and logistic regression (blue). Which of these
algorithms would you choose for your final model on the basis of performance?

a) random forest(Y)

b) logistic regression

c) both random forest & logistic regression

d) none of these
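The ROC figure is not reproduced here; the usual reading is that the curve lying closer to the top-left corner, i.e. the one with the larger area under the curve (AUC), indicates better performance. A minimal sketch of AUC computed as the probability that a randomly chosen positive example is scored above a randomly chosen negative one; the labels and scores are made up for illustration.

```python
def roc_auc(labels, scores):
    """AUC as P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

labels = [1, 1, 0, 0]
print(roc_auc(labels, [0.9, 0.4, 0.6, 0.1]))  # 0.75
print(roc_auc(labels, [0.9, 0.8, 0.3, 0.1]))  # 1.0 (perfect ranking)
```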

47. In random forest or gradient boosting algorithms, features can be of any type; for example, a
feature can be continuous or categorical. Which of the following options is true when you consider
these types of features?

a) only random forest algorithm handles real valued attributes by discretizing them

b) only gradient boosting algorithm handles real valued attributes by discretizing them

c) both algorithms can handle real valued attributes by discretizing them(Y)

d) none of these

48. The cell body of a neuron is analogous to which mathematical operation?

a) summing(Y)

b) differentiator

c) integrator

d) none of the mentioned
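The summing role of the cell body is what the McCulloch-Pitts neuron models: it adds up the weighted inputs and fires if the sum reaches a threshold. A minimal sketch with illustrative weights and threshold chosen so that a two-input neuron computes logical AND:

```python
def mcculloch_pitts(inputs, weights, threshold):
    """Sum the weighted inputs (the 'cell body'), then output 1 if the sum reaches the threshold."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= threshold else 0

# With unit weights and threshold 2, the neuron fires only when both inputs are 1 (logical AND).
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, mcculloch_pitts((x1, x2), (1, 1), threshold=2))
```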

49. What is the advantage of basis function networks over multilayer feedforward neural networks
(MLFFNN)?

a) training of basis function is faster than MLFFNN(Y)

b) training of basis function is slower than MLFFNN

c) storing in basis function is faster than MLFFNN

d) none of the mentioned

50. Suppose you are using a bagging-based algorithm, say a random forest, in model building.
Which of the following can be true?
1. The number of trees should be as large as possible.
2. You will have interpretability after using a random forest.

a) 2

b) 1 and 2

c) none of these

d) 1(Y)

51. What does a basic counterpropagation network consist of?

a) a feed forward network only

b) a feed forward network with hidden layer

c) two feed forward networks with a hidden layer(Y)

d) none of the mentioned

52. The process of adjusting the weights is known as

a) activation

b) synchronization

c) learning(Y)

d) none of the mentioned

53. How do you handle missing or corrupted data in a dataset?

a) drop missing rows or columns

b) replace missing values with mean/median/mode

c) assign a unique category to missing values

d) all of these(Y)
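The three strategies can be sketched in plain Python on a few hypothetical records, with None marking a missing value (the field names and values are illustrative; a dataframe library would normally be used on real data):

```python
import statistics

# Hypothetical records; None marks a missing value.
rows = [
    {"age": 25, "city": "A"},
    {"age": None, "city": "B"},
    {"age": 35, "city": None},
    {"age": 30, "city": "A"},
]

# a) Drop rows that contain any missing value.
complete = [r for r in rows if None not in r.values()]

# b) Impute a numeric column with its mean (the median or mode work the same way).
observed_ages = [r["age"] for r in rows if r["age"] is not None]
age_mean = statistics.fmean(observed_ages)
imputed = [{**r, "age": age_mean if r["age"] is None else r["age"]} for r in rows]

# c) Give missing categorical values their own category.
flagged = [{**r, "city": "Missing" if r["city"] is None else r["city"]} for r in rows]

print(len(complete), age_mean, [r["city"] for r in flagged])  # 2 30.0 ['A', 'B', 'Missing', 'A']
```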

54. In which of the following cases will k-means clustering fail to give good results?
1) Data points with outliers
2) Data points with different densities
3) Data points with non-convex shapes

a) 1 and 2

b) 2 and 3

c) 1, 2 and 3(Y)

d) 1 and 3

55. In which of the following scenarios is a failover cluster instance preferred over a standalone
instance in SQL Server?

a) high confidentiality

b) high availability(Y)

c) high integrity

d) none of the mentioned

56. The resources owned by a WSFC node include ___________

a) destination address

b) SQL server browser

c) one file share resource, if the FILESTREAM feature is installed(Y)

d) none of the mentioned

57. A Windows failover cluster can support up to how many nodes?

a) 12

b) 14

c) 16(Y)

d) 18

58. An exciting new feature in SQL Server 2014 is support for deploying a failover cluster
instance (FCI) with ___________

a) cluster shared volumes (CSV)(Y)

b) in memory database

c) column oriented database

d) all of the mentioned

59. Which of the following is a Windows failover cluster quorum mode?

a) node majority(Y)

b) no majority: read only

c) file read majority

d) none of the mentioned

60. Benefits that SQL Server failover cluster instances provide include ____________

a) protection at the instance level through redundancy

b) disaster recovery solution using a multi-subnet FCI

c) zero reconfiguration of applications and clients during failovers

d) all of the mentioned(Y)

61. Which of the following arguments is used to set importance values?

a) set(Y)

b) value

c) all of the mentioned

62. Identify the correct statement.

a) all znodes are ephemeral, which means they are describing a “temporary” state(Y)

b) hbase/replication/state contains the list of RegionServers in the main cluster

c) offline snapshots are coordinated by the Master using ZooKeeper to communicate with the
RegionServers using a two-phase-commit-like transaction

d) none of the mentioned

63. To register a watch on a znode’s data, which command do you need to use to access its
current content or metadata?

a) stat(Y)

b) put

c) receive

d) gets

64. Which of the following has a design policy of using ZooKeeper only for transient data?

a) Hive

b) Impala

c) HBase(Y)

d) Oozie

65. Which of the following specifies the required minimum number of observations for each column
pair in order to have a valid result?

a) min_periods(Y)

b) max_periods

c) minimum_periods

d) all of the mentioned
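In pandas, min_periods is a parameter of DataFrame.corr (and of DataFrame.cov). A small sketch with made-up columns: each pairwise correlation is computed over the rows where both columns are present, and any pair with fewer than min_periods such rows comes back as NaN.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, np.nan],
    "b": [2.0, 4.0, 6.0, 8.0],
    "c": [np.nan, np.nan, 1.0, 2.0],
})

# a-b share 3 complete rows (valid); a-c share 1 and b-c share 2, so both become NaN.
result = df.corr(min_periods=3)
print(result)
```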

66. According to analysts, for what can traditional IT systems provide a foundation when they’re
integrated with big data technologies like Hadoop?

a) big data management and data mining(Y)

b) data warehousing and business intelligence

c) management of Hadoop clusters

d) collecting and storing unstructured data

67. All of the following accurately describe Hadoop, except

a) open source

b) real-time(Y)

c) Java based

d) distributed computing approach

68. What are the five V’s of big data?

a) volume

b) velocity

c) variety

d) all of these(Y)

69. What are the different features of big data analytics?

a) open source

b) scalability

c) data recovery

d) all of these(Y)

70. Facebook tackles big data with _______ based on Hadoop.

a) Project Prism(Y)

b) Prism

c) Project Data

d) Project Bid

71. What is the unit of data that flows through a Flume agent?

a) record

b) event(Y)

c) row

d) log

72. As companies move past the experimental phase with Hadoop, many cite the need for additional
capabilities, including _______________

a) improved data storage and information retrieval

b) improved extract, transform and load features for data integration

c) improved data warehousing functionality

d) improved security, workload management and SQL support(Y)

73. What was Hadoop named after?

a) creator Doug Cutting’s favorite circus act

b) Cutting’s high school rock band

c) the toy elephant of Cutting’s son(Y)

d) a sound Cutting’s laptop made during Hadoop development

74. When the JobTracker schedules a task, it first looks for

a) a node with an empty slot in the same rack as the DataNode(Y)

b) any node on the same rack as the DataNode

c) any node on the rack adjacent to the DataNode’s rack

d) just any node in the cluster

75. After SVM learning, each Lagrange multiplier αi takes either a zero or a non-zero value. What
does each situation express?

a) a non-zero αi indicates that data point i is a support vector, meaning it touches the margin
boundary(Y)

b) a non-zero αi indicates that the learning has not yet converged to a global minimum(N)

c) a zero αi indicates that data point i has become a support vector data point, on the margin(N)

d) a zero αi indicates that the learning process has identified support for vector i(N)
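The KKT complementary-slackness condition αi [yi (w·xi + b) − 1] = 0 explains the two cases: a non-zero αi forces the point onto the margin (a support vector), while points with αi = 0 lie on or beyond the margin and do not contribute to w = ∑ αi yi xi. A check on a toy 1-D hard-margin dataset whose multipliers were worked out by hand (they are not produced by a solver here):

```python
# Toy 1-D hard-margin SVM: x = -1 (label -1) and x = +1 (label +1) sit on the margin,
# while x = 3 (label +1) lies beyond it.
xs = [-1.0, 1.0, 3.0]
ys = [-1, 1, 1]
alphas = [0.5, 0.5, 0.0]   # hand-derived Lagrange multipliers for this dataset

w = sum(a * y * x for a, y, x in zip(alphas, ys, xs))   # w = sum of alpha_i * y_i * x_i
b = ys[0] - w * xs[0]                                    # from a support vector: y_i (w x_i + b) = 1

for a, y, x in zip(alphas, ys, xs):
    margin = y * (w * x + b)
    role = "support vector" if a > 1e-9 else "not a support vector"
    print(f"x={x:+.0f}  alpha={a:.1f}  y(wx+b)={margin:.1f}  -> {role}")
```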

76. Active learning, creating data for analytics through reinforcement learning

a) performance element

b) changing element

c) learning element(Y)

d) none of these
