pandas数据预处理实例

news/2024/7/7 9:58:17
  1. 排序,默认从小到大排

    #By default, pandas will sort the data by the column we specify in ascending order and return a new DataFrame
    # Sorts the DataFrame in-place, rather than returning a new DataFrame.
    #print food_info["Sodium_(mg)"]
    food_info.sort_values("Sodium_(mg)", inplace=True)
    print (food_info["Sodium_(mg)"])
    #Sorts by descending order, rather than ascending.
    food_info.sort_values("Sodium_(mg)", inplace=True, ascending=False)
    print (food_info["Sodium_(mg)"])
    

    运行结果:
    在这里插入图片描述

  2. 打开一个csv文件

    import pandas as pd
    import numpy as np
    titanic_survival = pd.read_csv("titanic_train.csv")
    titanic_survival.head()
    

    运行结果:
    在这里插入图片描述

  3. 计算空值个数

    #The Pandas library uses NaN, which stands for "not a number", to indicate a missing value.
    #we can use the pandas.isnull() function which takes a pandas series and returns a series of True and False values
    age = titanic_survival["Age"]
    # print(age.loc[0:10])
    age_is_null = pd.isnull(age)
    # print (age_is_null)
    age_null_true = age[age_is_null]
    # print (age_null_true)
    age_null_count = len(age_null_true)
    print(age_null_count)
    

    运行结果:
    在这里插入图片描述


http://www.niftyadmin.cn/n/4714813.html

相关文章

pandas常用预处理方法

求均值,表格中含有空值: #The result of this is that mean_age would be nan. This is because any calculations we do with a null value also result in a null value mean_age sum(titanic_survival["Age"]) / len(titanic_survival[&qu…

VS 2010之多显示器支持 / Multi-Monitor Support (VS 2010 and .NET 4 Series)

【原文地址】Multi-Monitor Support (VS 2010 and .NET 4 Series) 【原文发表日期】 Monday, August 31, 2009 10:37 PM 这是我针对即将发布的VS 2010 和 .NET 4所撰写的 贴子系列的第四篇。 今天的贴子讨论其中一个IDE改进,我知道很多人都在迫切期望VS 2010的--…

pandas自定义函数

sort_values和reset_index new_titanic_survival titanic_survival.sort_values("Age",ascendingFalse) print (new_titanic_survival[0:10]) titanic_reindexed new_titanic_survival.reset_index(dropTrue) print(titanic_reindexed.iloc[0:10])运行结果&#xf…

Series结构

读取csv文件: import pandas as pd fandango pd.read_csv(fandango_score_comparison.csv) series_film fandango[FILM] print(series_film[0:5]) series_rt fandango[RottenTomatoes] print (series_rt[0:5])运行结果: 制作Series # Import the Se…

折线图的绘制

to_datetime import pandas as pd unrate pd.read_csv(unrate.csv) unrate[DATE] pd.to_datetime(unrate[DATE]) print(unrate.head(12))运行结果: 绘图 from pandas.plotting import register_matplotlib_converters #%matplotlib inline #Using the different…

技术人员不应该固步自封

能力的提高不是通过量,而是通过质来提高的。 经常听到人们说,这点东西犯不到花这么大力气。 如果是学术问题,我觉得OK,确实是这样,因为有思路就行了。 但是技术问题则不同,光有想法是不够的。工程上是要…

子图的操作

读数据绘图: import pandas as pd from pandas.plotting import register_matplotlib_convertersunrate pd.read_csv(unrate.csv) unrate[DATE] pd.to_datetime(unrate[DATE]) first_twelve unrate[0:12] plt.plot(first_twelve[DATE], first_twelve[VALUE]) plt…

字符串相似度算法 / The Arithmetic of String Similarity Degree

dongle2001的《字符串相似度算法介绍(整理)》中提到,算法分为三类: 1、编辑距离(Levenshtein Distance) 编辑距离就是用来计算从原串(s)转换到目标串(t)所需要的最少的插入,删除和替换 的数目…