The Truth About Statistics

Lately it seems everyone has been bad-mouthing government statistics on every subject. Pointing to a history of abuses, skeptics dismiss the whole field as nothing but a pack of lies and distortions. We therefore need to ask whether there is anything good about statistics, and there is plenty.

The field of statistics is a maze of esoteric terminology in which the uninitiated layman can easily get lost. So here we stick to the basics and avoid most technicalities to get the message across.

Statistics is the collection, analysis, and interpretation of data for making decisions while facing uncertainties and future probabilities.  Statisticians are “number crunchers doing the math” to explain trends and clarify issues.  In science, business, and education they use quantitative measures with compound names like econometrics and biometrics.

In statistical analysis we use data on the whole “population” or on just a small representative “sample” of it. A statistical population is not limited to people. It may be the population of St George’s Town, but it may also be the population of nutmeg trees on Belvidere estate or all the monkeys in the Grand Etang forest.

There are two kinds of statistics, namely “descriptive” and “inferential” (statistical inference). Descriptive statistics, for example a population census, simply describe features of a population such as the minimum, maximum, mean (average), mode, and median values, with results often illustrated graphically in charts and diagrams.
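As a rough illustration of the descriptive measures named above, the short Python sketch below summarises a small list of numbers. The yield figures are invented purely for the example and do not come from any real survey; the function name `describe` is likewise just a label chosen here.

```python
import statistics

# Hypothetical data: yields (kg) from ten nutmeg trees, invented for illustration.
yields_kg = [12, 15, 15, 18, 20, 22, 15, 30, 25, 18]


def describe(values):
    """Return the basic descriptive statistics of a list of numbers."""
    return {
        "minimum": min(values),
        "maximum": max(values),
        "mean": statistics.mean(values),      # arithmetic average
        "median": statistics.median(values),  # middle value when sorted
        "mode": statistics.mode(values),      # most frequent value
    }


summary = describe(yields_kg)
for name, value in summary.items():
    print(f"{name}: {value}")
```

A census would apply the same kind of summary to every member of the population rather than to ten made-up trees.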

In inferential statistical analysis it is often not feasible to study the whole population, so the statistician takes a small data sample to represent it. After analysing the data, he or she makes inferences about the whole population, making allowance for a “margin of error”.

The random sample must be typical and truly representative of the whole population.

An inferential example is a questionnaire polling 1,000 (2%) of a voting population of 50,000 to infer or predict how the whole population would vote. Another is the Bureau of Standards conducting a survey of ten supermarkets to determine the average percentage of defective products sold in national supermarkets. Sampling avoids the prohibitive cost of contacting every single member of the population.
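To give a feel for the “margin of error” in the polling example, here is a minimal Python sketch using the textbook formula for a sample proportion. It rests on assumptions the article does not state: a simple random sample, a 95% confidence level (z = 1.96), a worst-case 50/50 split of opinion, and a finite population correction because the sample is drawn from only 50,000 voters.

```python
import math


def margin_of_error(sample_size, population_size, proportion=0.5, z_score=1.96):
    """Approximate margin of error for a sample proportion.

    Assumes simple random sampling; z_score=1.96 corresponds to 95% confidence,
    and proportion=0.5 is the most cautious (widest) case.
    """
    standard_error = math.sqrt(proportion * (1 - proportion) / sample_size)
    # Finite population correction: shrinks the error when the sample
    # is a noticeable share of the whole population.
    correction = math.sqrt((population_size - sample_size) / (population_size - 1))
    return z_score * standard_error * correction


moe = margin_of_error(sample_size=1000, population_size=50000)
print(f"Margin of error: about {moe * 100:.1f} percentage points")
```

Under these assumptions the poll of 1,000 voters carries a margin of error of roughly three percentage points either way.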

Statistics uses two kinds of data, namely “cross-sectional” and “time series”. The examples so far involved cross-sectional data, which capture the behaviour of variables at a fixed point in time, for example the choice a voter makes on Election Day.

Time series analysis, by contrast, is used for “forecasting” future behaviour from past behaviour. It analyses a large amount of historical data to quantify how a certain variable would change over time as related variables change. The variable that changes is the “dependent” variable, and the “independent” variables are the agents of change.

In time series analysis the huge volume of data is impractical to process by hand and calls for statistical computer software. “Econometrics” estimates the numerical values of changes and “diagnostics” check the statistical soundness of the model.

The following is a simplified example of a popular forecasting technique known as “multiple regression analysis”.

Assume government wishes to know which of these variables, farming technique (T), export price (XP), and disease control (DC), would have the greatest growth impact on national soursop production.

Annual production data for 15 years (2000-2014) is collected from the Central Statistical Office – hopefully.  Your computer processes the data and the output is a “multiple regression equation” with “parameter” values as follows:

P = 80.2 + 0.60T + 0.30XP – 0.22DC

By interpretation, 80.2 is the baseline level of production when farming technique, export price, and disease control are all zero. Reading the variables in percentage terms, production rises by 0.60 percent for every 1 percent improvement in farming technique, double the 0.30 percent rise produced by a 1 percent increase in export price. The disease control coefficient is negative, so, as the equation is written, production falls by 0.22 percent for every 1 percent increase in DC.
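The way each parameter is read off the equation can be checked with a few lines of Python. This is only a sketch: the coefficients are copied from the equation above, but the article does not state the units of T, XP, and DC, so the baseline input values are hypothetical and chosen only to show the arithmetic.

```python
# Coefficients copied from the article's regression equation.
INTERCEPT = 80.2
COEFFICIENTS = {"t": 0.60, "xp": 0.30, "dc": -0.22}


def predict_production(t, xp, dc):
    """Evaluate P = 80.2 + 0.60T + 0.30XP - 0.22DC for given inputs."""
    return (INTERCEPT
            + COEFFICIENTS["t"] * t
            + COEFFICIENTS["xp"] * xp
            + COEFFICIENTS["dc"] * dc)


def marginal_effect(variable, baseline):
    """Change in P when one variable rises by one unit, others held fixed."""
    bumped = dict(baseline)
    bumped[variable] += 1
    return predict_production(**bumped) - predict_production(**baseline)


# Hypothetical baseline values, invented for illustration only.
baseline = {"t": 10, "xp": 10, "dc": 10}

for name in COEFFICIENTS:
    print(f"{name}: {marginal_effect(name, baseline):+.2f}")

# Rank variables by the size of their effect, ignoring the sign.
ranked = sorted(COEFFICIENTS, key=lambda name: abs(COEFFICIENTS[name]), reverse=True)
print("Largest effect:", ranked[0])
```

Because the model is linear, the effect of a one-unit change in any variable equals its coefficient regardless of the baseline, which is why the coefficients can be compared directly.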

Policy makers observe that the farming technique variable had the greatest impact on production in the past fifteen years. Therefore, decisions are made to concentrate resource allocation on improving farming techniques to increase future soursop production.

Every profession observes rules of conduct and internationally accepted “best practices”. Lawyers take the Oath of Attorney, doctors swear the Hippocratic Oath, and statisticians follow international standards such as those of the ISO.

The United Nations Statistical Commission (UNSC) is the world policeman of statistics and statistical practitioners. The UNSC has adopted ten Fundamental Principles of Official Statistics to promote high-quality statistics.

The key principles are professionalism in processing protocols, scientific methods for interpretation, impartiality of public access, independence from political interference, data confidentiality, prevention of fraudulent use, timeliness, and relevance.

We live in the Information Age, in which mathematically generated statistics are indispensable to most human activity. Statistics are needed for evidence-based policy planning, government accountability to citizens, informed business decisions, and social research analysis.

There are numerous practical applications of statistics in modern life.

In economics, statistical methods are used to analyse data and measure inflation rates, budget estimates, trade policy effects, and tax incidence.

Medical statistics report the percentages of preventable diseases caused by unhealthy habits.

Insurance policies are based on the mathematical statistics of pooled risk, probability, and the Law of Large Numbers.
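The Law of Large Numbers can be seen in a small simulation. The Python sketch below assumes, purely for illustration, that each policyholder independently has a 5% chance of making a claim in a year; that figure, the pool sizes, and the function name are made up for the example and do not come from any insurer.

```python
import random

CLAIM_PROBABILITY = 0.05  # hypothetical chance that one policyholder claims


def observed_claim_rate(pool_size, rng):
    """Simulate one year for a pool and return the share who made a claim."""
    claims = sum(1 for _ in range(pool_size) if rng.random() < CLAIM_PROBABILITY)
    return claims / pool_size


rng = random.Random(42)  # fixed seed so the run is repeatable
rates = {size: observed_claim_rate(size, rng) for size in (100, 10_000, 100_000)}

for size, rate in rates.items():
    print(f"Pool of {size:>7,}: observed claim rate {rate:.4f}")
```

As the pool grows, the observed claim rate tends to settle closer to the underlying 5%, which is what lets an insurer price a large pool with more confidence than a small one.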

Statistical computer models provide weather forecasts that warn us in time to prepare for emergencies.

And without market research and statistical data analysis a new business is doomed to failure.

Jay Bruno
