Pandas Text Processing

Examples of Pandas text-processing operations

In this chapter, we discuss string operations on a basic Series/Index. In later chapters, we will learn how to apply these string functions to a DataFrame.

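Pandas provides a set of string functions that make it easy to operate on string data. Most importantly, these functions ignore (exclude) missing/NaN values: a missing element stays NaN in the result instead of raising an error.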

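Almost all of these methods mirror Python's built-in string methods (see https://docs.python.org/3/library/stdtypes.html#string-methods). You call them through the Series' .str accessor, which applies the string operation to each element in turn.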
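The sketch below shows what the .str accessor does, assuming pandas 1.x or 2.x. It checks that the vectorised method gives the same result as calling the Python string method on each element by hand. Missing values pass through unchanged. The sample Series is made up for illustration.

```python
import numpy as np
import pandas as pd

s = pd.Series(['Tom', 'William Rick', np.nan])

# .str.upper() calls str.upper on every non-missing element
upper = s.str.upper()
print(upper)

# The same result built element by element with plain Python,
# leaving the missing value untouched
manual = pd.Series([x.upper() if isinstance(x, str) else x for x in s])
print(upper.equals(manual))  # True
```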

Let's look at how each operation works.

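Method	Description
lower()	Converts the strings in the Series/Index to lowercase.
upper()	Converts the strings in the Series/Index to uppercase.
len()	Computes the length of each string.
strip()	Removes whitespace (including newlines) from both sides of each string in the Series/Index.
split(' ')	Splits each string on the given pattern.
cat(sep=' ')	Concatenates the Series/Index elements with the given separator.
get_dummies()	Splits each string on '|' (by default) and returns a DataFrame of one-hot encoded values.
contains(pattern)	Returns True for each element that contains the substring (or pattern), otherwise False.
replace(a,b)	Replaces the value a with the value b.
repeat(value)	Repeats each element the specified number of times.
count(pattern)	Returns the number of occurrences of the pattern in each element.
startswith(pattern)	Returns True if the element in the Series/Index starts with the pattern.
endswith(pattern)	Returns True if the element in the Series/Index ends with the pattern.
find(pattern)	Returns the position of the first occurrence of the pattern, or -1 if it does not occur.
findall(pattern)	Returns a list of all occurrences of the pattern.
swapcase()	Swaps the case of each character (lowercase to uppercase and vice versa).
islower()	Checks whether all characters in each string in the Series/Index are lowercase. Returns a boolean.
isupper()	Checks whether all characters in each string in the Series/Index are uppercase. Returns a boolean.
isnumeric()	Checks whether all characters in each string in the Series/Index are numeric. Returns a boolean.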

Let's create a Series and see how all of the above functions work.

Example

    import pandas as pd
    import numpy as np

    s = pd.Series(['Tom', 'William Rick', 'John', 'Alber@t', np.nan, '1234', 'SteveSmith'])
    print(s)

Output:

    0             Tom
    1    William Rick
    2            John
    3         Alber@t
    4             NaN
    5            1234
    6      SteveSmith
    dtype: object

lower()

Example

    import pandas as pd
    import numpy as np

    s = pd.Series(['Tom', 'William Rick', 'John', 'Alber@t', np.nan, '1234', 'SteveSmith'])
    print(s.str.lower())

Output:

    0             tom
    1    william rick
    2            john
    3         alber@t
    4             NaN
    5            1234
    6      stevesmith
    dtype: object
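A common use of lower() is case-insensitive comparison: normalise the case first, then compare. This is a minimal sketch with a made-up Series.

```python
import pandas as pd

names = pd.Series(['Tom', 'TOM', 'tom', 'John'])

# Lowercase every element, then compare against a lowercase target
is_tom = names.str.lower() == 'tom'
print(is_tom.tolist())  # [True, True, True, False]
```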

upper()

Example

    import pandas as pd
    import numpy as np

    s = pd.Series(['Tom', 'William Rick', 'John', 'Alber@t', np.nan, '1234', 'SteveSmith'])
    print(s.str.upper())

Output:

    0             TOM
    1    WILLIAM RICK
    2            JOHN
    3         ALBER@T
    4             NaN
    5            1234
    6      STEVESMITH
    dtype: object

len()

Example

    import pandas as pd
    import numpy as np

    s = pd.Series(['Tom', 'William Rick', 'John', 'Alber@t', np.nan, '1234', 'SteveSmith'])
    print(s.str.len())

Output (the dtype is float64 because the missing value is NaN):

    0     3.0
    1    12.0
    2     4.0
    3     7.0
    4     NaN
    5     4.0
    6    10.0
    dtype: float64
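The lengths returned by len() can be used directly as a filter. This sketch assumes pandas 1.x/2.x, where the missing value gives a NaN length and NaN > 5 evaluates to False, so the missing row is dropped.

```python
import numpy as np
import pandas as pd

s = pd.Series(['Tom', 'William Rick', 'John', 'Alber@t', np.nan, '1234', 'SteveSmith'])

lengths = s.str.len()          # float64, NaN for the missing element
long_names = s[lengths > 5]    # keep only strings longer than 5 characters
print(long_names.tolist())     # ['William Rick', 'Alber@t', 'SteveSmith']
```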

strip()

Example

    import pandas as pd

    s = pd.Series(['Tom ', ' William Rick', 'John', 'Alber@t'])
    print(s)
    print("After Stripping:")
    print(s.str.strip())

Output:

    0             Tom
    1     William Rick
    2             John
    3          Alber@t
    dtype: object
    After Stripping:
    0             Tom
    1    William Rick
    2            John
    3         Alber@t
    dtype: object
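strip() has one-sided variants, lstrip() and rstrip(). All three also accept a string of characters to remove in place of whitespace. The Series in this sketch is made up for illustration.

```python
import pandas as pd

s = pd.Series(['  Tom  ', '--John--'])

left = s.str.lstrip()        # remove leading whitespace only
right = s.str.rstrip()       # remove trailing whitespace only
dashes = s.str.strip('-')    # remove '-' characters from both ends

print(left.tolist())    # ['Tom  ', '--John--']
print(right.tolist())   # ['  Tom', '--John--']
print(dashes.tolist())  # ['  Tom  ', 'John']
```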

split(pattern)

Example

    import pandas as pd

    s = pd.Series(['Tom ', ' William Rick', 'John', 'Alber@t'])
    print(s)
    print("After splitting on ' ':")
    print(s.str.split(' '))

Output (the leading and trailing spaces produce empty strings):

    0             Tom
    1     William Rick
    2             John
    3          Alber@t
    dtype: object
    After splitting on ' ':
    0              [Tom, ]
    1    [, William, Rick]
    2               [John]
    3            [Alber@t]
    dtype: object
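Splitting on a single space produces empty strings wherever a string has leading or trailing spaces. Calling split() with no pattern avoids this, because it splits on runs of whitespace the way Python's str.split() does. With expand=True, the pieces are returned as DataFrame columns. This is a sketch assuming pandas 1.x/2.x; missing pieces are filled with None or NaN depending on the version.

```python
import pandas as pd

s = pd.Series(['Tom ', ' William Rick', 'John', 'Alber@t'])

# No pattern: split on whitespace, with no empty strings at the edges
tokens = s.str.split()
print(tokens.tolist())
# [['Tom'], ['William', 'Rick'], ['John'], ['Alber@t']]

# expand=True: one DataFrame column per piece
parts = s.str.split(expand=True)
print(parts)
```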

cat(sep=pattern)

Example

    import pandas as pd

    s = pd.Series(['Tom ', ' William Rick', 'John', 'Alber@t'])
    print(s.str.cat(sep='_'))

Output:

    Tom _ William Rick_John_Alber@t
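cat() also handles missing values and element-wise joins. This sketch uses made-up data and the na_rep parameter of Series.str.cat. When a single Series is joined without na_rep, missing values are skipped. Passing another Series joins the two element by element and returns a Series rather than one string.

```python
import numpy as np
import pandas as pd

first = pd.Series(['Tom', 'John', np.nan])
last = pd.Series(['Smith', 'Doe', 'Brown'])

joined = first.str.cat(sep=', ')                  # NaN is skipped
joined_na = first.str.cat(sep=', ', na_rep='?')   # NaN replaced by '?'
full = first.str.cat(last, sep=' ', na_rep='?')   # element-wise join

print(joined)          # Tom, John
print(joined_na)       # Tom, John, ?
print(full.tolist())   # ['Tom Smith', 'John Doe', '? Brown']
```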

get_dummies()

Example

    import pandas as pd

    s = pd.Series(['Tom ', ' William Rick', 'John', 'Alber@t'])
    print(s.str.get_dummies())

Output:

        William Rick  Alber@t  John  Tom
    0              0        0     0     1
    1              1        0     0     0
    2              0        0     1     0
    3              0        1     0     0
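None of the strings in the example above contain '|', so each whole string becomes its own column. get_dummies() is more useful when an element holds several tags separated by '|' (the default sep). The tag Series in this sketch is made up.

```python
import pandas as pd

tags = pd.Series(['a|b', 'a', 'b|c'])

# Each element is split on '|' and every distinct token becomes a 0/1 column
dummies = tags.str.get_dummies()
print(dummies)
#    a  b  c
# 0  1  1  0
# 1  1  0  0
# 2  0  1  1
```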

contains(pattern)

Example

    import pandas as pd

    s = pd.Series(['Tom ', ' William Rick', 'John', 'Alber@t'])
    print(s.str.contains(' '))

Output:

    0     True
    1     True
    2    False
    3    False
    dtype: bool
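contains() treats its pattern as a regular expression by default. The case parameter turns off case sensitivity, and na sets the result for missing values, so the output is a clean boolean mask. A sketch with a made-up Series, assuming pandas 1.x/2.x:

```python
import numpy as np
import pandas as pd

s = pd.Series(['Tom', 'William Rick', 'john', np.nan])

# Case-insensitive substring test; the missing value becomes False
mask = s.str.contains('j', case=False, na=False)
print(mask.tolist())   # [False, False, True, False]

# Regular expression: strings that start with an uppercase letter
caps = s.str.contains(r'^[A-Z]', na=False)
print(caps.tolist())   # [True, True, False, False]
```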

replace(a,b)

Example

    import pandas as pd

    s = pd.Series(['Tom ', ' William Rick', 'John', 'Alber@t'])
    print(s)
    print("After replacing @ with $:")
    print(s.str.replace('@', '$', regex=False))

Output:

    0             Tom
    1     William Rick
    2             John
    3          Alber@t
    dtype: object
    After replacing @ with $:
    0             Tom
    1     William Rick
    2             John
    3          Alber$t
    dtype: object
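The regex parameter controls whether the first argument of replace() is read as a regular expression. Its default changed to False in pandas 2.0, so passing it explicitly keeps the behaviour the same across versions. This sketch uses a made-up Series.

```python
import pandas as pd

s = pd.Series(['Tom 1', 'John 22', 'v1.2'])

# regex=True: replace every run of digits with '#'
digits = s.str.replace(r'\d+', '#', regex=True)
print(digits.tolist())    # ['Tom #', 'John #', 'v#.#']

# regex=False: '.' is a literal dot, not "any character"
literal = s.str.replace('.', '-', regex=False)
print(literal.tolist())   # ['Tom 1', 'John 22', 'v1-2']
```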

repeat(value)

Example

    import pandas as pd

    s = pd.Series(['Tom ', ' William Rick', 'John', 'Alber@t'])
    print(s.str.repeat(2))

Output:

    0                      Tom Tom
    1     William Rick William Rick
    2                      JohnJohn
    3                Alber@tAlber@t
    dtype: object
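repeat() also accepts a sequence of integers, one per element, so each string can be repeated a different number of times. A minimal sketch with made-up data:

```python
import pandas as pd

s = pd.Series(['a', 'b', 'c'])

# Element i is repeated counts[i] times
repeated = s.str.repeat([1, 2, 3])
print(repeated.tolist())   # ['a', 'bb', 'ccc']
```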

count(pattern)

Example

    import pandas as pd

    s = pd.Series(['Tom ', ' William Rick', 'John', 'Alber@t'])
    print("Number of 'm's in each string:")
    print(s.str.count('m'))

Output:

    Number of 'm's in each string:
    0    1
    1    1
    2    0
    3    0
    dtype: int64
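The pattern passed to count() is a regular expression, so a character class can count several characters at once. This sketch counts vowels in the same sample Series.

```python
import pandas as pd

s = pd.Series(['Tom ', ' William Rick', 'John', 'Alber@t'])

# Count every upper- or lowercase vowel in each string
vowels = s.str.count(r'[aeiouAEIOU]')
print(vowels.tolist())   # [1, 4, 1, 2]
```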

startswith(pattern)

Example

    import pandas as pd

    s = pd.Series(['Tom ', ' William Rick', 'John', 'Alber@t'])
    print("Strings that start with 'T':")
    print(s.str.startswith('T'))

Output:

    Strings that start with 'T':
    0     True
    1    False
    2    False
    3    False
    dtype: bool
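A typical use of startswith() is selecting rows. If the Series contains missing values, the na parameter sets their result to False, so the mask can index the Series directly. A sketch with made-up data:

```python
import numpy as np
import pandas as pd

s = pd.Series(['Tom', 'Tim', 'John', np.nan])

# Keep only the strings that start with 'T'; NaN counts as "no"
t_names = s[s.str.startswith('T', na=False)]
print(t_names.tolist())   # ['Tom', 'Tim']
```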

endswith(pattern)

Example

    import pandas as pd

    s = pd.Series(['Tom ', ' William Rick', 'John', 'Alber@t'])
    print("Strings that end with 't':")
    print(s.str.endswith('t'))

Output:

    Strings that end with 't':
    0    False
    1    False
    2    False
    3     True
    dtype: bool
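endswith() does not accept regular expressions and has no case option, so a case-insensitive check can chain lower() first. A minimal sketch with made-up data:

```python
import pandas as pd

s = pd.Series(['Alber@t', 'ROBERT', 'John'])

# Lowercase first so that 'T' and 't' both match
ends_t = s.str.lower().str.endswith('t')
print(ends_t.tolist())   # [True, True, False]
```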

find(pattern)

Example

    import pandas as pd

    s = pd.Series(['Tom ', ' William Rick', 'John', 'Alber@t'])
    print(s.str.find('e'))

Output:

    0    -1
    1    -1
    2    -1
    3     3
    dtype: int64

A value of -1 indicates that the pattern does not occur in the element.
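Like Python's str.find, find() takes an optional start position. rfind() returns the position of the last occurrence instead of the first. A sketch with made-up data:

```python
import pandas as pd

s = pd.Series(['banana', 'apple', 'cherry'])

first = s.str.find('a')             # first 'a'
from2 = s.str.find('a', start=2)    # first 'a' at index 2 or later
last = s.str.rfind('a')             # last 'a'

print(first.tolist())   # [1, 0, -1]
print(from2.tolist())   # [3, -1, -1]
print(last.tolist())    # [5, 0, -1]
```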

findall(pattern)

Example

    import pandas as pd

    s = pd.Series(['Tom ', ' William Rick', 'John', 'Alber@t'])
    print(s.str.findall('e'))

Output:

    0     []
    1     []
    2     []
    3    [e]
    dtype: object

An empty list ([]) indicates that the pattern does not occur in the element.
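findall() accepts a regular expression, which makes it more useful than a fixed letter. This sketch collects every uppercase letter in the sample Series.

```python
import pandas as pd

s = pd.Series(['Tom ', ' William Rick', 'John', 'Alber@t'])

# All matches of the character class, as a list per element
caps = s.str.findall(r'[A-Z]')
print(caps.tolist())   # [['T'], ['W', 'R'], ['J'], ['A']]
```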

swapcase()

Example

    import pandas as pd

    s = pd.Series(['Tom', 'William Rick', 'John', 'Alber@t'])
    print(s.str.swapcase())

Output:

    0             tOM
    1    wILLIAM rICK
    2            jOHN
    3         aLBER@T
    dtype: object

islower()

Example

    import pandas as pd

    s = pd.Series(['Tom', 'William Rick', 'John', 'Alber@t'])
    print(s.str.islower())

Output:

    0    False
    1    False
    2    False
    3    False
    dtype: bool

isupper()

Example

    import pandas as pd

    s = pd.Series(['Tom', 'William Rick', 'John', 'Alber@t'])
    print(s.str.isupper())

Output:

    0    False
    1    False
    2    False
    3    False
    dtype: bool
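Every name in the examples above mixes cases, so islower() and isupper() return False throughout. This made-up Series shows when they return True. Like Python's own methods, they return False for a string with no cased characters, such as '123'.

```python
import pandas as pd

s = pd.Series(['tom', 'TOM', 'Tom', '123'])

lower_flags = s.str.islower()
upper_flags = s.str.isupper()

print(lower_flags.tolist())   # [True, False, False, False]
print(upper_flags.tolist())   # [False, True, False, False]
```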

isnumeric()

Example

    import pandas as pd

    s = pd.Series(['Tom', 'William Rick', 'John', 'Alber@t'])
    print(s.str.isnumeric())

Output:

    0    False
    1    False
    2    False
    3    False
    dtype: bool
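isnumeric() follows Python's str.isnumeric(). It is True only when every character is numeric, so a decimal point or a minus sign makes it False. It is therefore not a general test for "can be parsed as a number". A sketch with made-up data:

```python
import pandas as pd

s = pd.Series(['1234', '12.5', '-7', 'Tom'])

numeric = s.str.isnumeric()
print(numeric.tolist())   # [True, False, False, False]
```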
