Cyber Policy: Problem Set
Calculate the mean, standard deviation, and standard error for the ransom variable in d and store them in ransom.mean, ransom.sd, and ransom.se respectively.
What is a substantive difference between the standard deviation of ransom requests and their standard error? Explain in plain English:
What does the confidence interval tell us here? What does the 95% stand for?
What will happen to the confidence interval if we only use a subset of observations?
Compare, in plain English, these two confidence intervals. So, CI describes our uncertainty about something. About what?
How does this uncertainty change if instead of d we use only its subset (d.sample)?
What is the probability that an actual mean ransom in the population of companies is 9 or more? (Reminder: ransom is coded in millions of dollars, so 9 stands for 9 mln dollars) To answer this question calculate z.score and p.value. You might want to round the p.value to three digits at the end using round(p.value, digits = 3).
What is the probability that an actual mean ransom request in the population of companies is 8.3 or less?
What do these two p.values tell us about our confidence regarding the proximity of the sample mean ransom to the population mean ransom?
What does it stand for in the chunk above? You might want to go through the last example in Handout-1 to help you answer this question.
Finally, conduct the statistical test and figure out what is the chance that on average the hackers request the same amount of ransom across the small and medium-sized companies. Use t.test() function to calculate a p.value.
We conducted t-tests for each pair of groups. What conclusion about the differences in average ransom requests can we make based on these three p.values?
What does the constant stand for in this example?
Next, use internet_sales_share (X) to explain (or predict) ransom (Y). Using function lm(), regress ransom on internet_sales_share (no need to specify constant as 1 this time). Store the resulting model in linear.model
What does the coefficient for internet_sales_share stand for? Explain in plain English
What does the number inside the parenthesis stand for?
What do the three stars mean?
What do a constant stand for in this case?
Next, we conduct regression analysis with adjustment. Regress both internet_sales_share and critical_industry on ransom.
Store the results in model.with.critical.industry. Next, regress both internet_sales_share and size on ransom. Store the results in model.with.size.
Display the results of linear.model, model.with.critical.industry, and model.with.size in the same table by using stargazer(
Why do you think the coefficient for internet_sales_share in model.with.critical.industry is the same as in linear.model?
Why do you think the coefficient for internet_sales_share in model.with.size is different from the one in linear.model?
What does the coefficient for internet_sales_share stand for in model.with.size? Explain in plain English