How To Filter Rows In Pandas
Filtering Rows and Columns in Pandas Python: techniques you must know
In this tutorial, I will focus on the most commonly used techniques to filter and subset data in data frames, which will be widely useful in your future data analysis projects.
The tutorial is based on real data which can be downloaded from the website worldpopulationreview.
Import libraries and data
import pandas as pd
dataset = pd.read_csv('data.csv')
dataset
You get a dataset with 232 rows and 17 columns.
If you would like to know how to get the data without importing a file, you can read my other post, Make Beautiful Nightingale Rose Chart in Python.
Filtering rows based on conditions
1. Filtering rows on meeting one condition
The syntax for filtering rows by one condition is very simple: dataframe[condition].
#Showing data of Brazil only
dataset[dataset['name']=='Brazil'] #Method 1
In Python, the equality operator is ==, a double equal sign.
Another way of achieving the same result is using Pandas chaining. It is very convenient to use chaining to combine one Pandas command with another Pandas command or with user-defined functions. Here we can apply the Pandas eq() function and chain it with the name Series to check element-wise equality and filter the data.
dataset[dataset.name.eq('Brazil')] #Method 2
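Here is a minimal sketch of both methods on a tiny stand-in frame. The column names follow the tutorial, but the three rows and their numbers are made up for illustration and are not taken from data.csv.

```python
import pandas as pd

# Stand-in for the tutorial's dataset; the values are illustrative, not real figures.
dataset = pd.DataFrame({
    "name": ["Brazil", "China", "India"],
    "pop2020": [210000, 1400000, 1380000],
})

# Method 1: the comparison builds a boolean Series, which then selects the rows.
mask = dataset["name"] == "Brazil"
method1 = dataset[mask]

# Method 2: eq() builds the same boolean Series and is easy to chain.
method2 = dataset[dataset.name.eq("Brazil")]

print(mask.tolist())             # one True/False per row
print(method1.equals(method2))   # both methods select the same rows
```

The thing to notice is that the condition is just a Series of True and False values, so you can store it in a variable, inspect it, and reuse it.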
2. Filtering rows by more than one condition
The syntax for filtering rows by more than one condition is dataframe[(condition1) & (condition2)]
#Showing only the rows in which pop2020 is more than 50,000 AND Density is higher than 100:
dataset[(dataset['pop2020'] > 50000) & (dataset['Density'] > 100)]
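The parentheses around each condition matter, because & binds more tightly than > in Python. Below is a small sketch, again with made-up numbers, that also shows | for OR. The four row labels A to D are placeholders, not countries from the real data.

```python
import pandas as pd

# Illustrative values only; column names follow the tutorial.
dataset = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "pop2020": [80000, 30000, 90000, 20000],
    "Density": [150, 300, 50, 40],
})

# AND: a row is kept only when both conditions are True.
both = dataset[(dataset["pop2020"] > 50000) & (dataset["Density"] > 100)]

# OR: a row is kept when at least one condition is True.
either = dataset[(dataset["pop2020"] > 50000) | (dataset["Density"] > 100)]

print(both["name"].tolist())    # ['A']
print(either["name"].tolist())  # ['A', 'B', 'C']
```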
3. Filtering rows on NOT meeting one condition
In Python, the not-equal operator is !=, an exclamation mark followed by an equal sign.
#Excluding China from the data
dataset[dataset['name']!='China'] #Method 1
dataset[dataset.name!='China'] #Method 2
4. Filtering rows on NOT meeting more than one condition
dataset[(dataset['name']!='China') & (dataset['name']!='India')] #Method 1
dataset[(dataset.name!='China') & (dataset.name!='India')] #Method 2
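As a rough sketch, here are three equivalent ways to write "neither China nor India". The ne() method and the ~ operator are not used in the lines above; I am adding them here as alternatives, on a four-row stand-in frame.

```python
import pandas as pd

# Stand-in frame with only the name column.
dataset = pd.DataFrame({"name": ["China", "India", "Brazil", "Kenya"]})

# Two != conditions joined with &: keep rows that are neither China nor India.
with_operator = dataset[(dataset["name"] != "China") & (dataset["name"] != "India")]

# ne() is the method form of !=, mirroring eq() for ==.
with_ne = dataset[dataset.name.ne("China") & dataset.name.ne("India")]

# ~ negates a whole mask, so "not (China or India)" selects the same rows.
negated = dataset[~((dataset["name"] == "China") | (dataset["name"] == "India"))]

print(with_operator["name"].tolist())  # ['Brazil', 'Kenya']
```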
5. Filtering rows based on a list
What if we would like to get data for twenty countries we are interested in for further analysis? Writing code with the method above, like dataframe[(condition1) | (condition2) | … | (condition20)], is very inefficient. We need the isin() function to solve this problem. It lets us select rows using a list or any iterable.
We need to create a list containing the names of the 20 countries and then pass the list to the isin() function.
Listname = ['France', 'Italy', 'South Africa', 'Tanzania', 'Myanmar', 'Kenya',
            'South Korea', 'Colombia', 'Spain', 'Argentina', 'Uganda', 'Ukraine',
            'Algeria', 'Sudan', 'Iraq', 'Afghanistan', 'Poland', 'Canada',
            'Morocco', 'Saudi Arabia']
dataset[dataset.name.isin(Listname)]
Tip: you don't need to type the country names one by one. You can use dataset.name.unique() to get the unique country names, and then copy and paste the names you would like to select.
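To make the comparison concrete, here is a small sketch with three countries instead of twenty. The frame is a stand-in, and the repeated 'Italy' row is there only to show what unique() does; the real data has one row per country.

```python
import pandas as pd

# Stand-in frame; the duplicate 'Italy' is deliberate, to demonstrate unique().
dataset = pd.DataFrame({"name": ["France", "Italy", "Kenya", "Brazil", "Italy"]})

wanted = ["France", "Italy", "Kenya"]

# Long form: one equality test per country, joined with | (OR).
long_form = dataset[
    (dataset["name"] == "France")
    | (dataset["name"] == "Italy")
    | (dataset["name"] == "Kenya")
]

# Short form: isin() tests membership in the list in a single call.
short_form = dataset[dataset.name.isin(wanted)]

# unique() lists each name once, in order of first appearance.
names = dataset.name.unique().tolist()

print(long_form.equals(short_form))
print(names)  # ['France', 'Italy', 'Kenya', 'Brazil']
```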
6. Filtering rows based on values NOT in a list
Listname_excl = ['China', 'India', 'United States', 'Indonesia', 'Pakistan']
dataset[~dataset.name.isin(Listname_excl)]
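One detail worth knowing, sketched below on a stand-in frame: isin() returns False for a missing name, so after ~ flips the mask, rows with a missing name are kept. The None entry is my own addition to show this; I have not checked whether the real data has missing names.

```python
import pandas as pd

# Stand-in frame; the None entry is hypothetical.
dataset = pd.DataFrame({"name": ["China", "India", "Brazil", None]})

excluded = ["China", "India"]

# True where the name is in the list, False otherwise (including missing names).
mask = dataset.name.isin(excluded)

# ~ flips every True/False, so we keep everything that is NOT in the list.
kept = dataset[~mask]

print(mask.tolist())
print(len(kept))
```

If you don't want the rows with missing names, you can chain a dropna(subset=['name']) call onto the result.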
7. Filtering rows containing certain characters
# select rows containing 'Korea'
dataset[dataset.name.str.contains("Korea")]
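str.contains() takes a few optional parameters that are handy in practice. The sketch below uses a stand-in frame with one missing name (my assumption, to show why na=False is useful) and shows case-insensitive matching too.

```python
import pandas as pd

# Stand-in frame; the None entry is hypothetical.
dataset = pd.DataFrame({"name": ["South Korea", "North Korea", "Kenya", None]})

# na=False treats a missing name as "no match", so the mask is purely True/False.
korea = dataset[dataset.name.str.contains("Korea", na=False)]

# case=False ignores capitalisation in both the pattern and the names.
korea_any_case = dataset[dataset.name.str.contains("korea", case=False, na=False)]

print(korea["name"].tolist())  # ['South Korea', 'North Korea']
```

By default the pattern is treated as a regular expression. If your search text contains characters such as . or (, passing regex=False makes it a plain substring search.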
8. Filtering rows based on row number
dataset.filter(regex='0$', axis=0) #select row numbers ending with 0, like 0, 10, 20, 30
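With axis=0, filter() matches the regex against the index labels read as text. Here is a small sketch on a 25-row stand-in frame, next to an arithmetic alternative that I am adding for comparison and that only works when the index holds integers.

```python
import pandas as pd

# 25 rows with the default integer index 0..24; 'value' is a placeholder column.
dataset = pd.DataFrame({"value": range(25)})

# Regex on the index labels: keep labels whose text ends with '0'.
by_regex = dataset.filter(regex="0$", axis=0)

# Same rows via arithmetic on the index (integer index only).
by_modulo = dataset[dataset.index % 10 == 0]

print(by_regex.index.tolist())  # [0, 10, 20]
```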
Filtering columns based on conditions
1. Filtering columns containing a string or a substring
If we would like to get all columns with population data, we can write
dataset.filter(like='pop', axis=1) #Method 1
Inside the parentheses, like will search for all column names containing 'pop'. The 'pop' doesn't need to be at the start of the column name. If we put like='n', we will get data from the columns 'name', 'Density', 'WorldPercentage', and 'rank' because they all have 'n' in their names.
We can also use regex to get the same result.
dataset.filter(regex='pop', axis=1) #Method 2
If you would like to select column names starting with pop, just put a caret in front: ^pop.
Another way of filtering the columns is using loc and the str.contains() function.
dataset.loc[:, dataset.columns.str.contains('pop')] #Method 3
All three methods give you the same result, but personally I recommend the second one. It gives you more flexibility with less code.
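Here is a quick sketch that runs the three methods side by side. The frame has a single row of placeholder values, and 'worldpop_share' is a hypothetical column I made up so that "contains pop" and "starts with pop" give different answers.

```python
import pandas as pd

# One-row stand-in; 'worldpop_share' is hypothetical and not in the real data.
dataset = pd.DataFrame({
    "name": ["Brazil"],
    "pop2020": [1.0],
    "pop2019": [1.0],
    "Density": [1.0],
    "worldpop_share": [1.0],
})

by_like = dataset.filter(like="pop", axis=1)                   # Method 1
by_regex = dataset.filter(regex="pop", axis=1)                 # Method 2
by_loc = dataset.loc[:, dataset.columns.str.contains("pop")]   # Method 3

# The caret anchors the match to the start of the column name.
starts_with = dataset.filter(regex="^pop", axis=1)

print(by_like.columns.tolist())      # ['pop2020', 'pop2019', 'worldpop_share']
print(starts_with.columns.tolist())  # ['pop2020', 'pop2019']
```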
2. Filtering columns containing more than one string
If we would like to get the country and all population data, we need to use |, which is the logical OR operator in regex.
dataset.filter(regex='^pop|name', axis=1)
You will get a nice dataset in good order.
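Two details of the pattern are easy to miss, so here is a sketch on a one-row stand-in frame. In '^pop|name' the caret applies only to pop, so name can match anywhere in a column name. The 'nickname' column is hypothetical and exists only to show that; the selected columns also keep the frame's own order, not the order in the pattern.

```python
import pandas as pd

# One-row stand-in; 'nickname' is a hypothetical column, values are placeholders.
dataset = pd.DataFrame(
    [[1, "Brazil", 1.0, 1.0, 1.0, "BR"]],
    columns=["rank", "name", "pop2020", "Density", "pop2019", "nickname"],
)

# '^pop|name' means: starts with 'pop', OR contains 'name' anywhere.
loose = dataset.filter(regex="^pop|name", axis=1)

# Grouping makes the caret apply to both alternatives.
anchored = dataset.filter(regex="^(pop|name)", axis=1)

print(loose.columns.tolist())     # ['name', 'pop2020', 'pop2019', 'nickname']
print(anchored.columns.tolist())  # ['name', 'pop2020', 'pop2019']
```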
Removing rows with missing data
The dropna() function drops the rows where at least one element is missing.
dataset.dropna(axis=0)
If you want to drop only the rows where all elements are missing, pass how='all':
dataset.dropna(how='all')
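The difference between the two calls is easiest to see on a small example. The three rows below are made up: one complete, one with a single missing value, and one completely empty.

```python
import numpy as np
import pandas as pd

# Stand-in frame with illustrative values.
dataset = pd.DataFrame({
    "name": ["Brazil", "Kenya", None],
    "pop2020": [1.0, np.nan, np.nan],
})

# Default (how='any'): a row is dropped if at least one value is missing.
any_missing = dataset.dropna(axis=0)

# how='all': a row is dropped only if every value in it is missing.
all_missing = dataset.dropna(how="all")

print(len(any_missing), len(all_missing), len(dataset))  # 1 2 3
```

Note that dropna() returns a new data frame and leaves dataset itself unchanged, so assign the result to a variable if you want to keep it.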
Now you are able to filter and subset a dataset according to your own requirements and needs. Congratulations!
If you would like to learn how to slice data in Pandas, please don't miss my other post, Slicing Data in Pandas Python: techniques you must know.
Source: https://python.plainenglish.io/filtering-rows-and-columns-in-pandas-python-techniques-you-must-know-6cdfc32c614c
Posted by: culbertsoncrin1958.blogspot.com