Custom functions in pandas

news/2024/7/7 9:57:51

sort_values and reset_index

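import pandas as pd
# titanic_survival is assumed to be a DataFrame loaded earlier (e.g. with pd.read_csv)
# Sort the passengers by age, oldest first
new_titanic_survival = titanic_survival.sort_values("Age", ascending=False)
print(new_titanic_survival[0:10])
# Discard the shuffled index labels and renumber the rows from 0
titanic_reindexed = new_titanic_survival.reset_index(drop=True)
print(titanic_reindexed.iloc[0:10])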
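
As a self-contained illustration of the two calls above, here is a minimal sketch on a small made-up DataFrame (the name `toy` and its values are assumptions for illustration, not the Titanic data). It shows that sort_values puts missing ages last by default, and that reset_index(drop=True) replaces the shuffled labels with a fresh 0..n-1 index.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for titanic_survival
toy = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Age": [22.0, np.nan, 58.0, 4.0],
})

# Sort by Age, oldest first; rows with a missing Age go last by default
sorted_toy = toy.sort_values("Age", ascending=False)
print(sorted_toy.index.tolist())  # the original labels, now out of order

# drop=True throws the old index away instead of keeping it as a column
reindexed = sorted_toy.reset_index(drop=True)
print(reindexed)
```

Without drop=True the old labels would be kept as an extra column named index, which is rarely what is wanted after a sort.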

Custom functions

# This function returns the hundredth item from a Series
def hundredth_row(column):
    # Extract the hundredth item (position 99, so the Series needs at least 100 rows)
    hundredth_item = column.iloc[99]
    return hundredth_item

# Return the hundredth item from each column.
# The result gets its own name so that it does not overwrite the function.
hundredth_values = titanic_survival.apply(hundredth_row)
print(hundredth_values)
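
To make it clearer what DataFrame.apply passes to the function, here is a small sketch on made-up data (`toy` and `second_item` are hypothetical names used only for this illustration). With the default axis=0 the function is called once per column, each time receiving that column as a Series, and the return values are collected into a Series indexed by column name.

```python
import pandas as pd

# Hypothetical toy data; the example above uses titanic_survival instead
toy = pd.DataFrame({
    "Pclass": [1, 3, 2],
    "Fare": [71.28, 7.25, 13.0],
})

def second_item(column):
    # `column` is a Series holding one column of the DataFrame
    return column.iloc[1]

# Default axis=0: the function is called once for each column
result = toy.apply(second_item)
print(result)
```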

Counting missing values per column

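# Count the missing (null) values in a column.
# The function is renamed from not_null_count because it counts the nulls, not the non-nulls.
def null_count(column):
    column_null = pd.isnull(column)   # boolean mask: True where the value is missing
    null = column[column_null]        # keep only the missing entries
    return len(null)

# Apply the function to every column
column_null_count = titanic_survival.apply(null_count)
print(column_null_count)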
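
The same per-column count can also be obtained without a custom function. The sketch below, on made-up data (`toy` and `count_nulls` are hypothetical names), compares the apply-based approach with the built-in isnull().sum() chain; it is offered as an equivalent shortcut, not as the method the original post uses.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with some missing entries
toy = pd.DataFrame({
    "Age": [22.0, np.nan, 58.0, np.nan],
    "Cabin": ["C85", None, None, None],
    "Pclass": [1, 3, 2, 3],
})

def count_nulls(column):
    # Same logic as the custom function above
    return len(column[pd.isnull(column)])

via_apply = toy.apply(count_nulls)

# isnull() gives a boolean DataFrame; sum() counts the True values per column
vectorized = toy.isnull().sum()
print(vectorized)
```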

Exercise

# Passing axis=1 makes DataFrame.apply() iterate over rows instead of columns.
def which_class(row):
    pclass = row['Pclass']
    if pd.isnull(pclass):
        return "Unknown"
    elif pclass == 1:
        return "First Class"
    elif pclass == 2:
        return "Second Class"
    elif pclass == 3:
        return "Third Class"

classes = titanic_survival.apply(which_class, axis=1)
print(classes)
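
For a plain value-to-label lookup like this one, a row-wise apply is not strictly necessary. A possible alternative, sketched here on made-up data (`toy` and `labels` are hypothetical names), is Series.map with a dictionary followed by fillna. Note one difference from which_class: any value missing from the dictionary, not only null, ends up as "Unknown".

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the NaN stands for a passenger with no recorded class
toy = pd.DataFrame({"Pclass": [1.0, 3.0, np.nan, 2.0]})

labels = {1: "First Class", 2: "Second Class", 3: "Third Class"}

# map() looks each value up in the dictionary; values without a match become NaN,
# which fillna() then replaces with "Unknown"
classes = toy["Pclass"].map(labels).fillna("Unknown")
print(classes)
```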

Discretizing continuous values

def is_minor(row):
    # A missing Age compares as False, so rows without an age are reported as not minor
    return row["Age"] < 18

minors = titanic_survival.apply(is_minor, axis=1)
# print(minors)

# Three-way label that handles a missing Age explicitly
def generate_age_label(row):
    age = row["Age"]
    if pd.isnull(age):
        return "unknown"
    elif age < 18:
        return "minor"
    else:
        return "adult"

age_labels = titanic_survival.apply(generate_age_label, axis=1)
print(age_labels)
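
The same three-way labelling can be written without a row-wise apply. The sketch below uses numpy.select on made-up ages (`toy` is a hypothetical name); it is one possible vectorized equivalent of generate_age_label, not the approach taken in the original post.

```python
import numpy as np
import pandas as pd

# Hypothetical toy ages, including a boundary value (18.0) and a missing value
toy = pd.DataFrame({"Age": [4.0, 17.9, 18.0, np.nan, 40.0]})
age = toy["Age"]

# Conditions are checked in order and the first match wins;
# rows matching none of them get the default label
labels = np.select(
    [age.isnull(), age < 18],
    ["unknown", "minor"],
    default="adult",
)

# Wrap the result in a Series aligned with the original rows
age_labels = pd.Series(labels, index=toy.index)
print(age_labels)
```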

Adding a column

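# Attach the labels as a new column
titanic_survival['age_labels'] = age_labels
# pivot_table aggregates with the mean by default, so this gives the survival rate per age group
age_group_survival = titanic_survival.pivot_table(index="age_labels", values="Survived")
print(age_group_survival)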
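
Because pivot_table averages by default, the table above holds the mean of Survived within each age group. The following sketch on made-up data (`toy` and its values are assumptions) shows this and compares it with the equivalent groupby call.

```python
import pandas as pd

# Hypothetical toy data: an age label and a 0/1 survival flag per passenger
toy = pd.DataFrame({
    "age_labels": ["minor", "adult", "adult", "unknown", "minor"],
    "Survived": [1, 0, 1, 0, 1],
})

# Default aggregation is the mean, i.e. the share of survivors per group
by_pivot = toy.pivot_table(index="age_labels", values="Survived")

# The same numbers from groupby; this returns a Series rather than a DataFrame
by_group = toy.groupby("age_labels")["Survived"].mean()

print(by_pivot)
print(by_group)
```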
