pandas Data Preprocessing Examples

news/2024/7/7 9:58:17

Sorting (ascending by default)

# food_info is assumed to be a DataFrame that has already been loaded.
# By default, sort_values sorts by the given column in ascending order and returns a new DataFrame.
# With inplace=True, the DataFrame is sorted in place instead and nothing is returned.
food_info.sort_values("Sodium_(mg)", inplace=True)
print(food_info["Sodium_(mg)"])
# ascending=False sorts in descending order instead.
food_info.sort_values("Sodium_(mg)", inplace=True, ascending=False)
print(food_info["Sodium_(mg)"])
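
The snippet above depends on a food_info DataFrame that is not shown being loaded. As a minimal sketch, the same sorting behaviour can be tried on a small made-up table. The "Food" column and all values here are hypothetical; only the "Sodium_(mg)" column name comes from the original. It also shows the non-in-place form, which returns a new DataFrame and leaves the original untouched.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for food_info; the values are made up for illustration.
food_info = pd.DataFrame({
    "Food": ["BUTTER", "CHEESE", "MILK", "EGG"],
    "Sodium_(mg)": [714.0, 621.0, 44.0, np.nan],
})

# Without inplace=True, sort_values returns a new, sorted DataFrame
# and leaves food_info in its original row order.
sorted_asc = food_info.sort_values("Sodium_(mg)")
sorted_desc = food_info.sort_values("Sodium_(mg)", ascending=False)

print(sorted_asc["Sodium_(mg)"].tolist())
print(sorted_desc["Sodium_(mg)"].tolist())
print(food_info["Sodium_(mg)"].tolist())
```

Missing values are placed at the end in both directions, because na_position defaults to "last". The original row labels travel with the rows, so the index of the sorted result is no longer 0, 1, 2, ...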


Opening a CSV file

import pandas as pd
import numpy as np
titanic_survival = pd.read_csv("titanic_train.csv")
titanic_survival.head()
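
If titanic_train.csv is not at hand, read_csv can be tried on in-memory text instead. This is only a sketch: the rows are made up, and of the column names only "Age" appears in the original; "Survived" and "Pclass" are assumptions used for illustration.

```python
import io
import pandas as pd

# Made-up rows standing in for titanic_train.csv (hypothetical data).
csv_text = """Survived,Pclass,Age
0,3,22
1,1,38
1,3,
0,1,35
"""

# read_csv accepts any file-like object, so a StringIO works like a file path.
sample = pd.read_csv(io.StringIO(csv_text))

print(sample.head(2))   # first two rows only
print(sample.shape)     # (rows, columns)
print(sample.dtypes)    # Age becomes float64 because of the empty field
```

The empty Age field in the third row is read as NaN, which is the missing-value marker that the next section counts.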


Counting null values

# pandas uses NaN ("not a number") to mark a missing value.
# pd.isnull() takes a Series and returns a Series of True/False values, True where the value is missing.
age = titanic_survival["Age"]
# print(age.loc[0:10])
age_is_null = pd.isnull(age)
# print(age_is_null)
# Use the boolean mask to keep only the rows where Age is missing.
age_null_true = age[age_is_null]
# print(age_null_true)
age_null_count = len(age_null_true)
print(age_null_count)
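
The same count can be obtained more compactly by summing the boolean mask, since True counts as 1. The sketch below uses a small made-up Series rather than the Titanic data, so the numbers are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical ages; two of the five values are missing.
age = pd.Series([22.0, np.nan, 38.0, np.nan, 35.0], name="Age")

# Mask-and-len approach, as in the snippet above.
count_via_len = len(age[pd.isnull(age)])

# Shorter alternative: sum the True values of the mask directly.
count_via_sum = int(age.isnull().sum())

print(count_via_len, count_via_sum)
```

Both approaches give the same number; the sum form avoids building the intermediate filtered Series.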


http://www.niftyadmin.cn/n/4714813.html
