Data Collection & Organization (Methods, Tools, Types & Techniques)

In today’s world, knowledge is power, information is knowledge, and data is information in raw form. But before you can use that data for any purpose, you need to gather and organize it. This initial stage in data handling and statistics is called data collection and data organization.

Let’s understand these two terms and their significance.

What is Data?

Statistics is a branch of mathematics. It involves gathering information, summarizing it, and deciding what it means. Statistics can help to predict such things as the weather and how sports teams will perform. They can also describe specific things about large groups of people, for example, the reading level of students, the opinions of voters, or the average weight of a city’s residents.

Data is the word used to describe information. This could be facts, observations, numbers, graphs, or measurements – any kind of information that has been collected and can be analyzed. 

Data can be classified into two types.

  • Primary Data 
  • Secondary Data

What is Primary Data?

Primary data is data that is collected for the first time through personal experience or evidence, particularly for research. It is also described as raw data or first-hand information. Collecting it is costly, because the analysis is done by an agency or an external organization and requires human resources and investment. The investigator supervises and controls the data collection process directly.

The data is mostly collected through observations, physical testing, mailed questionnaires, surveys, personal interviews, telephonic interviews, case studies, focus groups, etc.

What is Secondary Data?

Secondary data is second-hand data that is already collected and recorded by some researchers for their purpose, and not for the current research problem. It is accessible in the form of data collected from different sources such as government publications, censuses, internal records of the organization, books, journal articles, websites and reports, etc.

This method of gathering data is affordable, readily available, and saves cost and time. However, the one disadvantage is that the information assembled is for some other purpose and may not meet the present research purpose or may not be accurate.

Difference Between Primary and Secondary Data

These are the differences between primary and secondary data.

[Image: table comparing primary and secondary data]

What is Data Handling?

Data handling is the method of performing statistical analysis on the given data. It is the process that comprises data collection, data organization, data analysis, and finally its depiction with the help of graphs or charts. 

The numbers representing the speed of the wind, its direction, temperature, and humidity are the data collected by the meteorological department. But how does this data help you? It helps in predicting the weather of a place. The data that the temperature is $40^{\circ}$ becomes information when it leads to a realization that the weather is very hot. 

Information is the interpretation and understanding of data. What you handle in your day-to-day life is called raw data. This kind of data does not have any meaning by itself. It’s only after it is organized and structured properly that it is of any use or meaning to us.

The two initial stages in data handling are

  • Data Collection
  • Data Organization

Data Collection

In data handling or statistics, data collection is the process of gathering information from all the relevant sources to find a solution to a problem. It helps to evaluate the outcome of the problem. The data collection methods allow a person to reach an answer to the relevant question. The next step after the data is collected is data organization.

Depending on the type of data, the data collection method is divided into two categories namely,

  • Primary Data Collection methods
  • Secondary Data Collection methods

Primary Data Collection Methods

Primary data or raw data is a type of information that is obtained directly from a first-hand source through experiments, surveys, or observations. There are several methods to collect this type of data. 

Observation Method: The observation method is used when the study relates to behavioural science. This method is planned systematically. It is subject to many controls and checks. The different types of observations are:

  • Structured and unstructured observation
  • Controlled and uncontrolled observation
  • Participant, non-participant, and disguised observation

Interview Method: This is the method of collecting data in the form of verbal responses. It is carried out in two ways:

  • Personal Interview: In this method, a person known as an interviewer asks questions face-to-face to the other person. The personal interview can be structured or unstructured, and may take the form of a direct investigation, a focused conversation, etc.
  • Telephonic Interview: In this method, an interviewer obtains information by contacting people on the telephone and asking for their answers or views verbally.

Questionnaire Method: In this method, a set of questions is mailed to the respondent, who should read, answer, and then return the questionnaire. The questions are printed in a definite order on the form. A good questionnaire should have the following features:

  • Short and simple
  • Should follow a logical sequence
  • Provide adequate space for answers
  • Avoid technical terms
  • Should have a good physical appearance, and quality of the paper to attract the attention of the respondent

Schedules: This method is similar to the questionnaire method with a slight difference. Enumerators are specially appointed for the purpose of filling in the schedules. The enumerator explains the aims and objectives of the investigation and may remove any misunderstandings that have come up. Enumerators should be trained to perform their job with hard work and patience.

Secondary Data Collection Methods

Secondary data is data collected by someone other than the actual user. It means that the information is already available, and someone analyses it. Sources of secondary data include magazines, newspapers, books, journals, etc. The data may be either published or unpublished.

Published data are available in various resources including

  • Government publications
  • Public records
  • Historical and statistical documents
  • Business documents
  • Technical and trade journals

Unpublished data are available in various resources including

  • Diaries
  • Letters
  • Unpublished biographies, etc.

Data Organization

Data organization is the way to arrange raw data in an understandable order. Organizing data includes classification, frequency distribution tables, pictorial representation, graphical representation, etc.

Data organization helps us to arrange the data so that we can easily read and work with it. It is difficult to work with or do any analysis on raw data. Hence, we need to organize the data to represent it in a proper way.

For example, if we want to find the median of a data set, the first step is to arrange the data in ascending or descending order.
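
The median example above can be sketched in Python. This is only an illustration: the function name `median` and the sample marks are assumptions made for this sketch, not part of the original lesson.

```python
def median(values):
    """Return the median of a non-empty list of numbers."""
    ordered = sorted(values)      # step 1: arrange the data in ascending order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                # odd count: the single middle value
        return ordered[mid]
    # even count: the mean of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


marks = [72, 45, 88, 91, 60]      # hypothetical raw data
print(median(marks))              # 72
```

Sorting gives 45, 60, 72, 88, 91, so the middle value, 72, is the median. Without the sorting step, the "middle" of the raw list would be 88, which is not the median.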

Why is Data Organization Important?

There are a lot of benefits to organizing data. Some of these include

  • Reduces Time for Processing: Disorganized data creates many bottlenecks. Suppose you have data on the results of 1000 students in a school, and you need to find out how many students scored a percentage greater than 90. If your data is unorganized, it will take a lot of time and resources to gather the required information. But if you have organized the data in descending order of percentages, it will be very quick and easy to pick out the required information.
  • Reduces Errors in the Decision-Making Process: Organizing data also helps in reducing data loss and errors. If different sets of data are being confused with one another, the solution is to organize the data properly.
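
The student-results example can be sketched as follows. This is a rough illustration under stated assumptions: the helper `count_above` and the sample percentages are hypothetical, and the data is sorted in ascending rather than descending order because Python's standard `bisect` module expects ascending lists.

```python
from bisect import bisect_right


def count_above(sorted_scores, threshold):
    """Count values strictly greater than threshold in an ascending-sorted list."""
    # bisect_right returns the index just past the last value <= threshold,
    # so every value from that index onward is greater than the threshold.
    return len(sorted_scores) - bisect_right(sorted_scores, threshold)


percentages = [55.0, 91.5, 78.0, 90.0, 96.0, 83.5]   # hypothetical results
organized = sorted(percentages)                       # organize the data once
print(count_above(organized, 90))                     # 2
```

Once the list is organized, each such question needs only a quick binary search instead of a check of every record, which is where the time saving comes from.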

Types of Data Organization

Data organization can be of various types, depending on the requirement of the user. Sometimes, the repeated values in the data are collected together to know the mode of the data or sometimes the data is organized in increasing or decreasing order, to find the median of the given set of data.
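
Collecting repeated values together to find the mode might look like the sketch below. The function name `modes` and the sample data are assumptions for illustration, and the function expects a non-empty list.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs most often (a data set can have several modes)."""
    counts = Counter(values)              # groups repeated values together
    highest = max(counts.values())        # the largest number of occurrences
    return sorted(v for v, c in counts.items() if c == highest)


shoe_sizes = [6, 7, 7, 8, 6, 7, 9]        # hypothetical data
print(modes(shoe_sizes))                  # [7]
```

Here 7 occurs three times, more than any other value, so it is the mode.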

The different types of data, based on which data is organized, are given below:

  • Chronological Data: Chronological data are grouped or classified according to the time, such as days, weeks, months, and years. For example, the growth of population with time in years.
  • Spatial Data: Spatial data are classified based on geographical locations or areas such as cities, states, countries, etc.
  • Qualitative Data: Qualitative data are categorized under different attributes like nationality, gender, religion, marital status, etc. Such data cannot be measured but can be classified based on their presence and absence of qualitative characteristics. For example, categorizing the population of males and females in a city.
  • Quantitative Data: Quantitative data are number-based data, such as height, age, marks of students, salary, etc. Unlike the qualitative attributes above, such data can be measured.

Ways of Organization of Data in Statistics

Certain tools and methods help us to organize data efficiently. There are two ways to organize data:

  • Frequency Distribution Table: A frequency distribution table is a comprehensive way of representing the organization of raw data of a quantitative variable. This table shows how various values of a variable are distributed and their corresponding frequencies. There are two types of frequency tables.
    • Discrete Frequency Distribution: In a discrete frequency distribution, the values of the variable are determined individually. The number of times each value occurs denotes the frequencies of the particular value or observation. Discrete frequency distribution is also known as ungrouped frequency distribution.
    • Continuous Frequency Distribution: A continuous frequency distribution is a series in which the data are classified into different class intervals without gaps and their respective frequencies are assigned as per the class intervals and class width.
  • Graphical Method: Graphical Representation is a way of analyzing numerical data. It exhibits the relation between data, ideas, information, and concepts in a diagram. It is easy to understand and it is one of the most important learning strategies. It always depends on the type of information in a particular domain. There are different types of graphical representation. Some of them are as follows:
    • Line Graph: A line graph or linear graph is used to display continuous data, and it is useful for predicting future events over time.
    • Bar Graph: A bar graph is used to display categorical data, and it compares the data using solid bars to represent the quantities.
    • Histogram: A histogram is a graph that uses bars to represent the frequency of numerical data that are organized into intervals. Since all the intervals are equal and continuous, all the bars have the same width.
    • Line Plot: A line plot shows the frequency of data on a given number line. An × is placed above the number line each time that value occurs.
    • Circle Graph: Also known as a pie chart, a circle graph shows the relationships of the parts to the whole. The circle is considered 100%, and each category occupies a share of it given as a specific percentage, like 15%, 56%, etc.
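
The two kinds of frequency distribution table described above can be sketched in Python. The function names, the sample marks, and the choice to treat each class interval as including its lower limit and excluding its upper limit are all assumptions made for this illustration.

```python
from collections import Counter


def discrete_frequency(values):
    """Ungrouped table: each distinct value with the number of times it occurs."""
    return sorted(Counter(values).items())


def grouped_frequency(values, start, width, classes):
    """Grouped table: counts per class interval [lower, upper), with no gaps."""
    table = []
    for i in range(classes):
        lower = start + i * width
        upper = lower + width
        # a value equal to the upper limit is counted in the next class
        count = sum(1 for v in values if lower <= v < upper)
        table.append(((lower, upper), count))
    return table


marks = [12, 25, 25, 31, 38, 12, 47, 25]      # hypothetical marks out of 50
print(discrete_frequency(marks))
# [(12, 2), (25, 3), (31, 1), (38, 1), (47, 1)]
print(grouped_frequency(marks, start=10, width=10, classes=4))
# [((10, 20), 2), ((20, 30), 3), ((30, 40), 2), ((40, 50), 1)]
```

The grouped table is also the data behind a histogram: each class interval becomes one bar, and the count becomes the height of that bar.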

Practice Problems

  • What is meant by data?
  • What are the two types of data?
  • What is the frequency distribution table?
  • What are the most common types of graphical representation of data?
  • What is data handling?


What is data collection, with examples?

Data collection is the gathering of information from relevant sources. Examples include collecting data from information service providers and other external data sources; tracking social media, discussion forums, review sites, blogs, and other online channels; conducting surveys, questionnaires, and forms online, in person, or by phone, email, or regular mail; and holding focus groups and one-on-one interviews.

What is data organization in statistics?

Data organization refers to the systematic arrangement of collected figures (raw data) so that the data becomes easy to understand and more convenient for further statistical treatment.

What are the two methods of data organization?

The two methods of data organization are:

  • Frequency Distribution Table: A frequency distribution table is a comprehensive way of representing the organization of raw data of a quantitative variable. This table shows how various values of a variable are distributed and their corresponding frequencies.
  • Graphical Method: Graphical Representation is a way of analyzing numerical data. It exhibits the relation between data, ideas, information, and concepts in a diagram. It is easy to understand and it is one of the most important learning strategies. It always depends on the type of information in a particular domain.


Data handling is the method of performing statistical analysis on the given data. It is a process that comprises two main activities – data collection and data organization. There are two methods of data collection – primary data collection and secondary data collection. The data organization helps us in two important ways – reducing the time of accessing information and reducing the error in the decision-making process.
