4 Basic Statistics For Data Science (Methods, Uses & Examples)

Data science is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge and insights from structured and unstructured data, and to apply those insights across a broad range of domains. It is closely related to data mining, machine learning, and big data.

Data scientists use a blend of tools, algorithms, and machine learning principles to discover hidden patterns in raw data. Statistical methods and techniques play an important role in carrying out these tasks.

Let’s look at the basic statistical concepts used in data science.

Basic Statistics for Data Science

Statistics is the discipline of analyzing data. As such, it intersects heavily with data science and machine learning. The following are the 4 basic concepts of statistics used in data science.

1. Descriptive Statistics

Descriptive statistics summarize the data at hand through numbers such as the mean, median, mode, variance, and standard deviation, so that the data is easier to understand. They do not involve any generalization or inference beyond the data available. In other words, descriptive statistics only represent the available data (the sample) and do not rely on probability theory.

In business, descriptive statistics give the analyst a view of the key metrics and measures mentioned above. They include exploratory data analysis, unsupervised learning, clustering, and basic data summaries. Descriptive statistics are usually the starting point for any analysis, and they often help us arrive at hypotheses to be tested later with more formal inference.

Descriptive statistics are very important because raw data on its own is hard to interpret, especially when there is a lot of it. Descriptive statistics therefore enable us to present the data in a more meaningful way.

To understand the role of descriptive statistics, let’s consider the following example. You have the marks obtained by 100,000 students in a particular examination and you may be interested in the overall performance of these students. Descriptive statistics allow us to do this. 

The mean gives the average score of the students. The median and quartiles show where a particular student stands relative to the others, while the variance and standard deviation show how spread out the scores are.
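The measures named above can be computed with a few lines of code. The following is a minimal sketch using Python's standard `statistics` module; the `marks` list and the `describe` helper are made up for illustration and stand in for the much larger set of student marks.

```python
import statistics

# Hypothetical exam marks for a small class (illustrative data only).
marks = [35, 42, 42, 50, 58, 61, 67, 73, 88, 94]


def describe(values):
    """Return common descriptive statistics for a list of numbers."""
    # quantiles(n=4) returns the three cut points that split the data into quarters.
    q1, _q2, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "q1": q1,
        "q3": q3,
        "variance": statistics.variance(values),  # sample variance
        "std_dev": statistics.stdev(values),      # sample standard deviation
    }


summary = describe(marks)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

The same helper works unchanged on a list of 100,000 marks; only the input changes, not the summary.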

2. Inferential Statistics

In inferential statistics, we make an inference from a sample about the population. The main aim of inferential statistics is to draw conclusions from the sample and generalize them to the population. For example, suppose you want to find the average salary of data analysts across a country. There are two options available to you:

  • The first option is to consider the salary of data analysts across the country and take an average of it.
  • The second option is to take a sample of the salary of data analysts from major IT cities of a country and take their average and consider that for the whole country.

The first option is rarely practical, because collecting the salary of every data analyst in the country is difficult, time-consuming, and costly. The second option overcomes these issues: we collect a small sample of salaries and use its average as an estimate of the country’s average. The estimate is only as good as the sample is representative of the whole population. This is inferential statistics, where we make an inference from a sample about the population.

The most common methodologies in inferential statistics are hypothesis tests, confidence intervals, and regression analysis.
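As a rough illustration of a confidence interval for the salary example, the sketch below draws a random sample from a simulated population and estimates the population mean. Everything here is an assumption made for illustration: the salary figures are synthetic, the helper name `mean_confidence_interval` is hypothetical, and the interval uses the normal approximation, which is reasonable only for fairly large samples.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Synthetic "population" of salaries; in a real study this is unknown.
population = [random.gauss(60_000, 12_000) for _ in range(100_000)]

# Draw a simple random sample, since surveying everyone is too costly.
sample = random.sample(population, 500)


def mean_confidence_interval(values, confidence=0.95):
    """Normal-approximation confidence interval for the population mean."""
    n = len(values)
    sample_mean = statistics.mean(values)
    standard_error = statistics.stdev(values) / math.sqrt(n)
    # Critical value of the standard normal distribution (about 1.96 for 95%).
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return sample_mean - z * standard_error, sample_mean + z * standard_error


low, high = mean_confidence_interval(sample)
print(f"sample mean: {statistics.mean(sample):.0f}")
print(f"95% confidence interval: ({low:.0f}, {high:.0f})")
print(f"population mean (normally unknown): {statistics.mean(population):.0f}")
```

The interval widens when the sample is smaller or the salaries vary more, which is how the uncertainty of the inference shows up in the numbers.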

3. Prediction

Prediction overlaps quite a bit with inference, but modern prediction tends to have a different mindset. Prediction is the process of trying to guess an outcome given a set of observed realizations of that outcome and some predictors. Regression, logistic regression, boosting, random forests, and deep learning are all machine learning methods used for prediction.

Predictive analytics uses historical data to predict future events. Typically, historical data is used to build a mathematical model that captures important trends. Then that predictive model is used on current data to predict what will happen next or to suggest actions to take for optimal outcomes.

Predictive analytics has received a lot of attention in recent years due to advances in supporting technology, particularly in the areas of big data and machine learning.
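The build-a-model-then-predict workflow described above can be sketched with the simplest possible model, a straight line fitted by ordinary least squares. The data below (advertising spend versus sales) and the function names `fit_line` and `predict` are hypothetical and chosen only for illustration; real predictive models usually involve more predictors and a held-out test set.

```python
# Hypothetical historical data: advertising spend (x) and resulting sales (y).
spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [3.1, 4.9, 7.2, 8.8, 11.0]


def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Use the fitted line to predict the outcome for a new predictor value."""
    return slope * x + intercept


# Build the model from historical data, then apply it to a new value.
slope, intercept = fit_line(spend, sales)
forecast = predict(6.0, slope, intercept)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, forecast={forecast:.2f}")
```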

4. Experimental Design

At the heart of every data science project lies the planning, design, and execution of experiments. Such experiments aim at understanding the data, potentially cleaning it, and performing the necessary data analysis for knowledge discovery and decision-making. Without knowing the experimental design processes that are used in practice, researchers may not be able to discover what is really hidden in their data.

Experimental design is the act of controlling your experimental process to optimize the chance of arriving at sound conclusions. The most notable example of experimental design is randomization. In randomization, a treatment is assigned at random across experimental units to make the treatment groups as comparable as possible. Clinical trials are a well-known example of randomization in practice.

Random sampling is a related idea: units are drawn at random from the population of interest so that the results generalize better to that population.
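Both ideas can be sketched in a few lines with Python's standard `random` module. The helper names `randomize` and `random_sample`, the participant labels, and the 50/50 split are assumptions made for illustration; real trials often use more elaborate schemes such as blocked or stratified randomization.

```python
import random


def randomize(units, seed=None):
    """Randomly split experimental units into treatment and control groups."""
    rng = random.Random(seed)
    shuffled = list(units)  # copy so the caller's list is not modified
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def random_sample(population, k, seed=None):
    """Draw k units from the population without replacement."""
    rng = random.Random(seed)
    return rng.sample(list(population), k)


# Hypothetical participants in a trial.
participants = [f"P{i:02d}" for i in range(1, 21)]

treatment, control = randomize(participants, seed=42)
print("treatment:", sorted(treatment))
print("control:  ", sorted(control))

# Hypothetical survey: pick 5 of the participants at random.
surveyed = random_sample(participants, 5, seed=7)
print("surveyed: ", sorted(surveyed))
```

Because assignment depends only on the random shuffle, no characteristic of a participant can systematically influence which group they end up in.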

Practice Problems

  1. What are the four basic concepts of statistics used in Data Science?
  2. What is Descriptive Statistics?
  3. What is Inferential Statistics?
  4. What is the difference between Descriptive & Inferential Statistics?
  5. Mean, Median, Mode, Quartiles, and Standard Deviation come under
    • Descriptive Statistics
    • Inferential Statistics
  6. Hypothesis Tests, Confidence Intervals, and Regression Analysis come under
    • Descriptive Statistics
    • Inferential Statistics
  7. What is meant by Probability?
  8. What is meant by Prediction in Statistics?

FAQs

Why is statistics useful for data science?

Advanced machine learning algorithms in data science utilize statistics to identify and convert data patterns into usable evidence. Data scientists use statistics to collect, evaluate, analyze, and draw conclusions from data, as well as to implement quantitative mathematical models for pertinent variables.

What type of statistics is used in data science?

Two main types of statistics are used in data science.
1. Descriptive statistics summarize the data at hand through numbers such as the mean, median, mode, variance, and standard deviation, so that the data is easier to understand. They do not involve any generalization or inference beyond the data available.
2. Inferential statistics make an inference from a sample about the population. The main aim is to draw conclusions from the sample and generalize them to the population.

What topics in statistics are needed for data science?

Data analysis requires descriptive statistics and probability theory, at a minimum. These concepts will help you make better business decisions from data. Key concepts include probability distributions, statistical significance, hypothesis testing, and regression.

Conclusion

Data science is the study of analyzing data for actionable insights, and statistics is the discipline of analyzing data, so the two are directly related. Data science uses four basic concepts of statistics: descriptive statistics, inferential statistics, prediction, and experimental design.
