Data science and big data analytics: choose the correct answer

 
QUESTION 1
The TFIDF (or TF-IDF) is a measure that considers both ______________ and ________________.
A.commonness of a term, the scarcity of the term
B.uncommonness of a term, the scarcity of the term
C.length of a term, the scarcity of the term
D.uncommonness of a term, the weakness of the term
5 points  
QUESTION 2
The term __________ refers to a specific implementation of association rules mining that many companies use for a variety of purposes.
A.market research analysis
B.market prediction analysis
C.market competitive analysis
D.market basket analysis
5 points  
QUESTION 3
A distribution over a fixed vocabulary of words is formally defined as a:
A.subject
B.topic
C.story
D.text line
5 points  
QUESTION 4
Your customer provided you with 3,000 unlabeled records and asked you to separate them into three groups. What is the correct analytical method to use?
A.K-means clustering
B.Naive Bayesian classification
C.Linear regression
D.Logistic regression 
5 points  
QUESTION 5
Time series analysis attempts to model the underlying structure of ________________taken over time.
A.observation
B.patterns
C.solution
D.facts
5 points  
QUESTION 6
Which of the following algorithms is not an example of an ensemble learning algorithm?
A.Random Forest
B.AdaBoost
C.Gradient Boosting
D.Decision Trees  
5 points  
QUESTION 7
What is the difference between supervised learning and unsupervised learning?
A.Supervised learning algorithms work on labeled data. On the other hand, unsupervised learning algorithms work on unlabeled data.
B.Supervised learning algorithms work on unlabeled data. On the other hand, unsupervised learning algorithms work on labeled data.
C.Supervised learning algorithms work on raw data. On the other hand, unsupervised learning algorithms work on processed data.
D.None of these
5 points  
QUESTION 8

The ____________ is the most iterative phase and the one for which teams tend to underestimate the amount of effort involved.
A.Discovery Phase
B.Model Building Phase
C.Operationalization Phase
D.Data Preparation Phase

5 points  
QUESTION 9
How many levels does fdata contain in the following R code? data = c(1,2,2,3,1,2,3,3,1,2,3,3,1); fdata = factor(data)
A.2,3,2
B.1,2,3
C.5,3,1
D.1,2,6
5 points  
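In R, factor() stores the distinct values of a vector as its levels, in sorted order. The snippet below is only a Python analogue of that idea, not R itself; the names levels and codes are illustrative.

```python
# Same values as the R vector in the question above.
data = [1, 2, 2, 3, 1, 2, 3, 3, 1, 2, 3, 3, 1]

# Analogue of levels(factor(data)): the sorted distinct values.
levels = sorted(set(data))

# Analogue of the integer codes a factor stores internally
# (1-based position of each value within the levels).
codes = [levels.index(value) + 1 for value in data]

print("levels:", levels)
print("number of levels:", len(levels))
```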
QUESTION 10
A _______ is a table-like data structure available in languages like R and Python
A.data frame
B.data file
C.data table
D.database
5 points  
QUESTION 11
Which of the following are “Measures of Central Tendency”?
A.Mean, Range, Mode
B.Mean, Standard Deviation, Range
C.Mode, Mean, Median
D.Range, Standard Deviation, Variance  
5 points  
QUESTION 12

________________________ is a probabilistic classification method based on Bayes’ theorem.

A.Naive function
B.Naive process
C.Naive Bayes
D.None of these
5 points  
QUESTION 13

What is a type I error? What is a type II error? Is one always more serious than the other? Why?

5 points  
QUESTION 14

The ______________ function builds a recursive partitioning and regression tree model and has four parameters.

A.lpart()
B.mpart()
C.rpart()
D.None of these
5 points  
QUESTION 15
In least squares regression, which of the following is not a required assumption about the error term ε?
A.The expected value of the error term is one.
B.The variance of the error term is the same for all values of x.
C.The values of the error term are independent.
D.The error term is normally distributed. 
5 points  
QUESTION 16
During the Model Building phase, the team builds and executes _____________________________________________.
A.The models based on the work done in the Planning phase
B.The business requirements provided by business analysts
C.The models based on the work done in the Model Planning phase
D.None of the above
5 points  
QUESTION 17
Which of the following is the most important language in Data Science?
A.C#
B.Java
C.Ruby
D.R
5 points  
QUESTION 18
Your organization has a website where visitors randomly receive one of two coupons. It is also possible that visitors to the website will not receive a coupon. You have been asked to determine if offering a coupon to visitors to your website has any impact on their purchase decision. Which analysis method should you use?
A.One-way ANOVA
B.K-means clustering
C.Association rules
D.Student's t-test

5 points  
QUESTION 19
How many steps does a text analysis problem consist of?
A.2
B.1
C.3
D.4
5 points  
QUESTION 20
A time series can consist of all of the following components except:
A.Time lapse
B.Trend
C.Cyclic
D.Seasonality
5 points  
QUESTION 21
Additional time series methods include all of the following except:
A.Autoregressive Moving Average with Exogenous inputs (ARMAX)
B.Spectral analysis
C.Kalman filtering
D.Single variable time series filtering
5 points  
QUESTION 22
The goal of POS tagging is to ______ whose input is a sentence.
A.build a text file 
B.build a model 
C.build a database
D.build a text graph  
5 points  
QUESTION 23
What happens in the final Operationalize phase? 
A.Requirements are gathered
B.The team delivers final reports, briefings, code, and technical documents. They may also run a pilot project to implement the models in a production environment.
C.The team delivers draft reports, draft briefings, code, and some technical documents. They may also run a pilot project to implement the models in a production environment.
D.None of the above
5 points  
QUESTION 24
What can be done if during the Discovery Phase the team decides that the available data is insufficient?
A.Cancel the project
B.Collect Additional Data
C.Work with what you already have
D.Do nothing
5 points  
QUESTION 25
In regression, the equation that describes how the response variable (y) is related to the explanatory variable (x) is
A.the correlation model
B.the regression model
C.used to compute the correlation coefficient
D.None of the above
5 points  
QUESTION 26
In Chapter 8, a time series consists of an __________________ sequence of equally spaced values over time.
A.Unordered
B.Bilateral
C.ordered
D.lateral
5 points  
QUESTION 27
One advantage of ARIMA modeling is that the analysis can be based on _________________________ for the variable of interest.
A.future time series data
B.historical time series data
C.historical time lapse data
D.None of the above
5 points  
QUESTION 28
What are the ‘resources’ being assessed in the Discovery Phase?
A.Cloud Resources
B.The business environment and business partners' resources
C.Technology, Tools, Systems, Data, and People
D.None of the above
5 points  
QUESTION 29
Suppose you are using a bagging-based algorithm, say a Random Forest, in model building. Which of the following can be true?

1. The number of trees should be as large as possible
2. You will have interpretability after using Random Forest

A.1
B.2
C.1 and 2
D.None of these   
5 points  
QUESTION 30
The HDFS block size is larger than the size of the disk blocks so that _____________________
A.Only HDFS files can be stored in the disk used.
B.The seek time is maximum
C.Transfer of large files made of multiple disk blocks is not possible.
D.A single file larger than the disk size can be stored across many disks in the cluster.
5 points  
QUESTION 31
The IDF inversely corresponds to the ______________________ , which is defined to be the number of documents in the corpus that contain a term.
A.document frequency (DF)
B.directory frequency (DF)
C.docker frequency (DF)
D.None of the above
5 points  
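The relationship described in the question above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the three-document corpus is invented, the helper names are hypothetical, and log(N / DF) is only one common IDF variant (libraries often add smoothing terms).

```python
import math

# Hypothetical, already-tokenized corpus (illustrative data only).
corpus = [
    ["big", "data", "analytics"],
    ["data", "science", "with", "r"],
    ["time", "series", "analysis"],
]

def document_frequency(term, docs):
    """Number of documents in the corpus that contain the term."""
    return sum(1 for doc in docs if term in doc)

def idf(term, docs):
    """One common IDF variant: log(N / DF). Assumes the term occurs in the corpus."""
    return math.log(len(docs) / document_frequency(term, docs))

def tfidf(term, doc, docs):
    """Raw count of the term in one document, weighted by its IDF."""
    return doc.count(term) * idf(term, docs)

for term in ("data", "series"):
    print(term, document_frequency(term, corpus), round(idf(term, corpus), 3))
```

A term that appears in more documents gets a smaller IDF, so it contributes less to the TF-IDF score of any single document.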
QUESTION 32
The arima() function in R uses ___________________________________ to estimate the model coefficients.
A.Maximum Likelihood Estimation (MLE)
B.Mini Likelihood Estimation (MLE)
C.Minimum Likelihood Estimation (MLE)
D.Mining Likelihood Estimation (MLE)
5 points  
QUESTION 33
R functionality is divided into a number of ________
A.Stored Procedures
B.Functions
C.Domains
D.Packages
5 points  
QUESTION 34
A __________________________ is a simple and widely used visualization for finding the relationship among multiple variables and can represent data with up to five variables.
A.scatterplot
B.Dotchart and Barplot
C.Straight Plot
D.Box-and-Whisker Plot
5 points  
QUESTION 35
Which of the following R functions best provides descriptive statistics, such as the mean and median, about the variables in the sales data frame?
A.ggplot2()
B.dplyr()
C.stringr()
D.summary()
5 points  
QUESTION 36
Your colleague, who is new to Hadoop, approaches you with a question. They want to know how best to access their data. This colleague has a strong background in data flow languages and programming. Which query interface would you recommend?
A.Howl
B.Pig
C.Hive
D.HBase 
5 points  
QUESTION 37
Many quantitative analysts use R as their ____ tool.
A.Leading tool
B.Programming tool
C.Primary Tool
D.All of the above
5 points  
QUESTION 38
In R, the ___________________ function creates a time series object from a vector or a matrix. 
A.ts()
B.tk()
C.ttime()
D.plot()
5 points  
QUESTION 39
According to Chapter 4 of your textbook, clustering analysis groups _______________ objects based on the objects’ __________.
A.similarity, cost
B.position, similarity
C.similarity, attributes
D.rank, attributes
5 points  
QUESTION 40
You have run the association rules algorithm on your data set, and the two rules {banana, apple} => {grape} and {apple, orange}=> {grape} have been found to be relevant. What else must be true?
A.{banana, apple, grape, orange} must be a frequent itemset.
B.{banana, apple} => {orange} must be a relevant rule.
C.{grape} => {banana, apple} must be a relevant rule.
D.{grape, apple, orange} must be a frequent itemset. 
5 points  
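Association rules are usually scored with support and confidence. The sketch below computes both for the two rules named in the question above; the five transactions are invented for illustration and the helper names are hypothetical.

```python
# Hypothetical transactions (illustrative data only).
transactions = [
    {"apple", "banana", "grape"},
    {"apple", "orange", "grape"},
    {"apple", "banana", "grape", "orange"},
    {"banana", "orange"},
    {"apple", "grape"},
]

def support(itemset, txns):
    """Fraction of transactions that contain every item in the itemset."""
    return sum(1 for t in txns if itemset <= t) / len(txns)

def confidence(lhs, rhs, txns):
    """support(lhs and rhs together) divided by support(lhs)."""
    return support(lhs | rhs, txns) / support(lhs, txns)

for lhs in ({"banana", "apple"}, {"apple", "orange"}):
    rhs = {"grape"}
    print(sorted(lhs), "=>", sorted(rhs),
          "support:", support(lhs | rhs, transactions),
          "confidence:", confidence(lhs, rhs, transactions))
```

Because the support of a rule X => Y is the support of the combined itemset X ∪ Y, a rule can only pass a minimum-support threshold if that combined itemset does.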
QUESTION 41
In regression analysis, the variable that is being predicted is the
A.Response, or dependent variable 
B.Independent variable 
C.Intervening variable
D.Usually X
5 points  
QUESTION 42
Which function is used to create the vector with more than one element?
A.library()
B.plot()
C.c()
D.par()
5 points  
QUESTION 43
During the Model Building phase, the team develops _____________________________
A.Data application for testing, training, and production purposes
B.Datasets for testing, training, and production purposes
C.Datasets for prediction, training, and production purposes
D.Datasets for testing, training, and development purposes
5 points  
QUESTION 44
A data analyst must know when to pick the most suitable method for a given classification problem. When there is nonlinear data or there are discontinuities in the input variables that would affect the output, the best method to choose would be
A.Simple Regression
B.Naive Bayes
C.Logistic programming
D.Decision Tree
5 points  
QUESTION 45
Vectors come in two types: _____ and _____.
A.Atomic vectors and list
B.Atomic vectors and matrix
C.Atomic vectors and array
D.None of the above
5 points  
QUESTION 46
Which of the following is performed by a data scientist?
A.Define the question
B.Create reproducible code
C.Challenge results
D.All of the above mentioned  
5 points  
QUESTION 47
Regression modeling is a statistical framework for developing a mathematical equation that describes how
A.one explanatory and one or more response variables are related
B.several explanatory and several response variables are related
C.one response and one or more explanatory variables are related 
D.All of these are correct.
5 points  
QUESTION 48
Text analysis, sometimes called text analytics, refers to the ___________, ________, and __________ of textual data to derive useful insights.
A.representation, processing, and modeling
B.representation, processing, and designing
C.presentation, processing, and modeling
D.presentation, storing, and modeling
5 points  
QUESTION 49
What is a major difference between BI and Analytics?
A.Analytics has predictive capabilities whereas BI helps in informed decision-making based on analysis of past data
B.Analytics has no predictive capabilities whereas BI helps in informed decision-making based on analysis of past data 
C.Analytics is not always reliable whereas BI helps in informed decision-making based on analysis of past data
D.None of the above
5 points  
QUESTION 50
A ___________________________ is a specific table layout that allows visualization of the performance of a classifier.
A.positive matrix
B.true matrix
C.confusion matrix
D.false positive rate matrix
5 points  
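A table of actual versus predicted labels for a binary classifier can be tallied as follows. This is a minimal sketch: the label vectors are invented, and "yes" is arbitrarily treated as the positive class.

```python
from collections import Counter

# Hypothetical actual and predicted labels (illustrative data only).
actual    = ["yes", "yes", "no", "no",  "yes", "no", "no", "yes"]
predicted = ["yes", "no",  "no", "yes", "yes", "no", "no", "yes"]

# Count each (actual, predicted) pair.
counts = Counter(zip(actual, predicted))

tp = counts[("yes", "yes")]  # true positives
fn = counts[("yes", "no")]   # false negatives
fp = counts[("no", "yes")]   # false positives
tn = counts[("no", "no")]    # true negatives

accuracy = (tp + tn) / len(actual)

print("            pred yes  pred no")
print(f"actual yes  {tp:8d}  {fn:7d}")
print(f"actual no   {fp:8d}  {tn:7d}")
print("accuracy:", accuracy)
```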
QUESTION 51
Which of the following is correct?
A.Raw data is original source of data
B.Preprocessed data is original source of data
C.Raw data is the data obtained after processing steps
D.None of the mentioned 
5 points  
QUESTION 52
Unsupervised learning is where you only have input data (X) and ______________________ output variables.
A.two corresponding
B.three corresponding
C.no corresponding
D.two or more
5 points  
QUESTION 53
Which of the following is not a critical characteristic of Big Data?
A.Velocity
B.Volume
C.Variety
D.Value
5 points  
QUESTION 54
Which of the following are identified in the Communicate Results phase?
A.Key findings
B.A quantification of the business value
C.A narrative to summarize and convey findings to stakeholders
D.All of the above
5 points  
QUESTION 55
Which SQL function is used to count the number of rows in a SQL query?
A.COUNT()
B.NUMBER()
C.SUM()
D.COUNT(*) 
5 points  
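The behavior of the SQL counting aggregate can be tried out with Python's built-in sqlite3 module. The table name sales and its columns are hypothetical; the sketch only shows that COUNT(*) counts rows while COUNT(column) skips NULL values.

```python
import sqlite3

# In-memory database with a hypothetical sales table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [(1, 10.0), (2, None), (3, 25.5)],  # the second row has a NULL amount
)

# COUNT(*) counts every row returned by the query.
all_rows = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]

# COUNT(amount) counts only rows where amount is not NULL.
non_null = conn.execute("SELECT COUNT(amount) FROM sales").fetchone()[0]

conn.close()
print("rows:", all_rows, "non-null amounts:", non_null)
```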
QUESTION 56
Which of the following sorts a data frame x by the order of the elements in B?
A.x[rev(order(x$B)),]
B.x[ordersort(x$B),]
C.x[order(x$B),]
D.None
5 points  
QUESTION 57
Which of the following is not a good practical use of Big Data Analytics?
A.Location Tracking
B.Precision Medicine
C.Customer Discrimination
D.Fraud Detection & Handling
5 points  
QUESTION 58
How many types of R objects are present in R data type?
A.1
B.17
C.4
D.6
5 points  
QUESTION 59
A ______ is a workspace that is typically isolated from production applications and warehouse environments.
A.Private Environment
B.Sand Environment
C.Sandbox
D.Production box
5 points  
QUESTION 60
Data smoothing in predictive analytics is, essentially, trying to find the “signal” in the “noise” by discarding data points that are considered “noisy”. There are various smoothing techniques. Which of the following is NOT a smoothing technique?
A.Laplace smoothing
B.Kneser-Ney smoothing
C.Katz smoothing
D.Statistical smoothing
5 points  
QUESTION 61
All of the following are the steps followed during a text analysis problem:
A.searching, parsing, and retrieval, and text mining
B.saving, parsing, and retrieval, and text mining
C.searching, parsing, and retrieval, and text saving 
D.searching, parting, and retrieval, and text mining
5 points  
QUESTION 62
Unlike Pig and Hive, which are intended for _____________ , Apache HBase is capable of providing _________________ and write access to datasets with billions of rows and millions of columns.
A.batch application, historic data read
B.batch application, real-time data read
C.batch application, speed data read
D.batch application, deep data read
5 points  
QUESTION 63
When writing SQL query statements, a RIGHT OUTER JOIN is used to specify that _____________________________ from the table on the right-hand side of the join should be returned, regardless of whether there is a matching record found.
A.some rows
B.partial rows
C.all rows
D.all columns
5 points  
QUESTION 64
The conditional probability of event C occurring, given that event A has already occurred, is denoted as
A.P(A| C)
B.P(A| A) C
C.P(C| C) A
D.None of these
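
A conditional probability of this kind is the joint probability of both events divided by the probability of the conditioning event. The sketch below estimates it from a handful of invented records; the data and variable names are illustrative only.

```python
from fractions import Fraction

# Hypothetical observations: whether events A and C occurred (illustrative data only).
records = [
    {"A": True,  "C": True},
    {"A": True,  "C": False},
    {"A": True,  "C": True},
    {"A": False, "C": True},
    {"A": False, "C": False},
]

n = len(records)
p_a = Fraction(sum(r["A"] for r in records), n)                    # probability of A
p_a_and_c = Fraction(sum(r["A"] and r["C"] for r in records), n)   # probability of A and C together

# Probability of C among the cases where A occurred.
p_c_given_a = p_a_and_c / p_a

print("P(A) =", p_a, " P(A and C) =", p_a_and_c, " C given A =", p_c_given_a)
```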
