Python string-processing techniques

1. How do you split a string that contains multiple delimiters?

Real-world case

A string must be split into segments on its delimiters, and it contains several different delimiters. For example:

s = 'asd;aad|dasd|dasd,sdasd|asd,,Adas|sdasd;Asdasd,d|asd'

Here ',', ';', '|' and '\t' are all delimiters. How can the string be split?

Solution

Method 1: call split() repeatedly, handling one delimiter per pass:

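# Works on Python 2 and Python 3
def mySplit(s, ds):
    res = [s]
    for d in ds:
        t = []
        # split every piece produced so far on the current delimiter
        # (a plain loop instead of map(): map() is lazy in Python 3)
        for x in res:
            t.extend(x.split(d))
        res = t
    # drop the empty strings left by adjacent delimiters such as ',,'
    return [x for x in res if x]

s = 'asd;aad|dasd|dasd,sdasd|asd,,Adas|sdasd;Asdasd,d|asd'
result = mySplit(s, ';,|\t')
print(result)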

C:\Users\Administrator>C:\Python\Python27\python.exe E:\python-intensive-training\s2.py
['asd', 'aad', 'dasd', 'dasd', 'sdasd', 'asd', 'Adas', 'sdasd', 'Asdasd', 'd', 'asd']

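Method 2: use re.split() with a character class; the + makes a run of consecutive delimiters such as ',,' count as a single separator:

>>> import re
>>> re.split(r'[,;\t|]+', 'asd;aad|dasd|dasd,sdasd|asd,,Adas|sdasd;Asdasd,d|asd')
['asd', 'aad', 'dasd', 'dasd', 'sdasd', 'asd', 'Adas', 'sdasd', 'Asdasd', 'd', 'asd']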
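Both methods above assume single-character delimiters. As a hedged sketch, the re.split() idea extends to delimiters that are several characters long: escape each one with re.escape() so characters like '|' match literally, and join them with '|' inside a non-capturing group. The function name split_any is hypothetical, not from the original article, and it assumes at least one non-empty delimiter.

```python
import re


def split_any(s, delimiters):
    """Split s on any of the given delimiter strings (each may be multi-character)."""
    # Try longer delimiters first, so 'ab' wins over 'a' at the same position.
    ordered = sorted(delimiters, key=len, reverse=True)
    # re.escape() makes regex metacharacters such as '|' or '.' match literally.
    # The group is non-capturing, so re.split() does not return the delimiters.
    pattern = '(?:' + '|'.join(re.escape(d) for d in ordered) + ')+'
    # Drop empty pieces produced by leading or trailing delimiters.
    return [piece for piece in re.split(pattern, s) if piece]


print(split_any('a::b--c', ['::', '--']))
```

Ordering by length matters only when one delimiter is a prefix of another; for single characters the result is the same as the character-class version.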

2. How do you check whether string a starts or ends with string b?

Real-world case

A directory contains the following files:

quicksort.c graph.py heap.java install.sh stack.cpp ...

The files with the .sh and .py suffixes must now be given execute permission.

Solution

Use the string methods startswith() and endswith(). Both also accept a tuple and return True if any element of it matches:

>>> import os, stat
>>> os.listdir('.')
['heap.java', 'quicksort.c', 'stack.cpp', 'install.sh', 'graph.py']
>>> [name for name in os.listdir('.') if name.endswith(('.sh', '.py'))]
['install.sh', 'graph.py']
>>> # add the owner-execute bit to the file's existing permission bits
>>> os.chmod('install.sh', os.stat('install.sh').st_mode | stat.S_IXUSR)
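The session above changes a single file by hand. A minimal sketch that applies the same endswith() test and chmod() call to every matching file in a directory might look like this. make_executable is a hypothetical helper name, and the sketch assumes a POSIX system where the execute bit is meaningful.

```python
import os
import stat


def make_executable(directory, suffixes=('.sh', '.py')):
    """Add the owner-execute bit to each file in directory ending with one of suffixes.

    suffixes must be a str or a tuple: endswith() raises TypeError for a list.
    Returns the sorted names of the files that were changed.
    """
    changed = []
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        # Skip subdirectories; endswith() with a tuple matches any listed suffix.
        if os.path.isfile(path) and name.endswith(suffixes):
            mode = os.stat(path).st_mode
            os.chmod(path, mode | stat.S_IXUSR)
            changed.append(name)
    return sorted(changed)
```

Called as make_executable('.'), it would handle install.sh and graph.py from the example listing in one pass.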

[root@iZ28i253je0Z t]# ls -l install.sh
-rwxr--r-- 1 root root 0 Sep 15 18:13 install.sh

3. How do you change the format of text inside a string?

Real-world case

A certain program writes a log file whose dates use the format yyyy-mm-dd:

2016-09-15 18:27:26 statu unpacked python3-pip:all
2016-09-15 19:27:26 statu half-configured python3-pip:all
2016-09-15 20:27:26 statu installd python3-pip:all
2016-09-15 21:27:26 configure asdasdasdas:all python3-pip:all

These dates must be converted to the US format mm/dd/yyyy, e.g. 2016-09-15 --> 09/15/2016. How can this be done?

Solution

Use the regular-expression function re.sub() to perform the replacement.

Capture each part of the date with a capturing group, then reorder the groups in the replacement string:

>>> log = '2016-09-15 18:27:26 statu unpacked python3-pip:all'
>>> import re
>>> # reorder the numbered groups
>>> re.sub(r'(\d{4})-(\d{2})-(\d{2})', r'\2/\3/\1', log)
'09/15/2016 18:27:26 statu unpacked python3-pip:all'
>>> # use named groups instead
>>> re.sub(r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})', r'\g<month>/\g<day>/\g<year>', log)
'09/15/2016 18:27:26 statu unpacked python3-pip:all'
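re.sub() also accepts a function instead of a replacement string; the function receives each match object and returns the replacement text. The sketch below uses that to validate each date with datetime.strptime() before rewriting it, leaving strings such as 2016-13-45 that are not real calendar dates untouched. The names DATE_RE and to_us_dates are hypothetical.

```python
import re
from datetime import datetime

# \b keeps the pattern from matching inside a longer run of digits.
DATE_RE = re.compile(r'\b(\d{4}-\d{2}-\d{2})\b')


def to_us_dates(text):
    """Rewrite every valid yyyy-mm-dd date in text as mm/dd/yyyy."""
    def repl(match):
        try:
            parsed = datetime.strptime(match.group(1), '%Y-%m-%d')
        except ValueError:
            # Not a real calendar date: keep the original text unchanged.
            return match.group(0)
        return parsed.strftime('%m/%d/%Y')
    return DATE_RE.sub(repl, text)
```

Because the compiled pattern is reused, the function can be applied line by line to a large log file.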

4. How do you join many small strings into one large string?

Real-world case

While designing a network program, we defined a custom UDP-based protocol that sends a series of parameters to the server in a fixed order:

hwDetect: "<0112>"
gxDepthBits: "<32>"
gxResolution: "<1024x768>"
gxRefresh: "<60>"
fullAlpha: "<1>"
lodDist: "<100.0>"
DistCull: "<500.0>"

In the program, the parameters are collected into a list in that order:

["<0112>","<32>","<1024x768>","<60>","<1>","<100.0>","<500.0>"]

Finally, the parameters must be joined into a single data packet and sent:

"<0112><32><1024x768><60><1><100.0><500.0>"

Solution

Method 1: iterate over the list and append each string in turn with the + operator:

>>> result = ''
>>> for n in ["<0112>", "<32>", "<1024x768>", "<60>", "<1>", "<100.0>", "<500.0>"]:
...     result += n
...
>>> result
'<0112><32><1024x768><60><1><100.0><500.0>'

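Method 2: str.join() concatenates every string in the list in a single call. It is generally faster than repeated +, which may have to build a new intermediate string at each step: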

>>> result = ''.join(["<0112>", "<32>", "<1024x768>", "<60>", "<1>", "<100.0>", "<500.0>"])
>>> result
'<0112><32><1024x768><60><1><100.0><500.0>'

If the list contains numbers, convert them with a generator expression first, because join() accepts only strings:

>>> hello = [222, 'sd', 232, '2e', 0.2]
>>> ''.join(str(x) for x in hello)
'222sd2322e0.2'
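Putting this section together, a hedged sketch of building the UDP packet keeps the fixed parameter order in a list and joins the values in that order. PARAM_ORDER and build_packet are hypothetical names, and ASCII encoding is an assumption about the protocol.

```python
# Hypothetical: the fixed order in which the protocol expects the parameters.
PARAM_ORDER = ['hwDetect', 'gxDepthBits', 'gxResolution', 'gxRefresh',
               'fullAlpha', 'lodDist', 'DistCull']


def build_packet(params):
    """Join the parameter values in protocol order and return them as bytes.

    Raises KeyError if a required parameter is missing from params.
    """
    # str() converts any numeric values, since join() accepts only strings.
    payload = ''.join(str(params[name]) for name in PARAM_ORDER)
    # Sockets send bytes, so the text must be encoded (ASCII assumed here).
    return payload.encode('ascii')
```

The resulting bytes could then be passed to sendto() on a socket created with socket.socket(socket.AF_INET, socket.SOCK_DGRAM); the server address is not specified in the article.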

5. How do you left-, right-, or center-align strings?

Real-world case

A dictionary holds a set of attribute values:

{'ip': '127.0.0.1', 'blog': 'www.anshengme.com', 'title': 'Hello world', 'port': '80'}

The program should print the contents in the following aligned format. How can this be done?

ip    : 127.0.0.1
blog  : www.anshengme.com
title : Hello world
port  : 80

Solution

Method 1: use str.ljust(), str.rjust() and str.center() to left-, right- or center-align text:

>>> info = {'ip': '127.0.0.1', 'blog': 'www.anshengme.com', 'title': 'Hello world', 'port': '80'}
>>> # length of the longest key in the dictionary
>>> max(map(len, info.keys()))
5
>>> w = max(map(len, info.keys()))
>>> for k in info:
...     print(k.ljust(w), ':', info[k])
...
ip    : 127.0.0.1
blog  : www.anshengme.com
title : Hello world
port  : 80
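To compare the three str methods on the same data, a small sketch can pick ljust, rjust or center by name with getattr() and return the whole block as one string instead of printing it. format_info is a hypothetical helper; it assumes a non-empty dictionary with string values.

```python
def format_info(info, how='ljust'):
    """Return info as 'key : value' lines with the keys aligned.

    how names one of the str alignment methods: 'ljust', 'rjust' or 'center'.
    """
    # Width of the longest key, so that every colon lines up.
    w = max(map(len, info))
    lines = []
    for k, v in info.items():
        # getattr() selects k.ljust, k.rjust or k.center by name.
        padded = getattr(k, how)(w)
        lines.append(padded + ' : ' + v)
    return '\n'.join(lines)


info = {'ip': '127.0.0.1', 'blog': 'www.anshengme.com',
        'title': 'Hello world', 'port': '80'}
print(format_info(info, 'rjust'))
```

Returning a string rather than printing makes the result easy to log or to test.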

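Method 2: the built-in format() function does the same job with alignment specs such as '<20', '>20' and '^20' (left, right and center in a field 20 characters wide):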

>>> for k in info:
...     print(format(k, '^' + str(w)), ':', info[k])
...
 ip   : 127.0.0.1
blog  : www.anshengme.com
title : Hello world
port  : 80
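Because the width in a format spec can itself be a replacement field, the fill character, alignment and width can all be chosen at run time instead of being concatenated by hand. A minimal sketch using f-strings (Python 3.6+); format_with_spec is a hypothetical name.

```python
def format_with_spec(info, align='<', fill=' '):
    """Return aligned 'key : value' lines using a format spec built at run time.

    align is '<' (left), '>' (right) or '^' (center); fill is a single character.
    """
    w = max(map(len, info))  # width of the longest key
    # The nested {fill}{align}{w} fields expand to a spec such as ' <5' or '.>5'.
    return '\n'.join(f'{k:{fill}{align}{w}} : {v}' for k, v in info.items())


info = {'ip': '127.0.0.1', 'blog': 'www.anshengme.com',
        'title': 'Hello world', 'port': '80'}
print(format_with_spec(info, '<', '.'))
```

Note that '^' puts the extra padding character on the right when the padding is odd, so ' ip  ' is produced for a width of 5.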

6. How do you remove unwanted characters from a string?

Real-world case

Remove the extra whitespace around user input: ' [email protected] '

Remove the '\r' characters from text edited on Windows: 'hello world\r\n'

Remove Unicode combining marks (accents) from text: 'ní hǎo, chī fàn'

Solution

Use the string methods strip(), lstrip() and rstrip() to remove characters from both ends, the left end, or the right end of a string:

>>> email = ' [email protected] '
>>> email.strip()
'[email protected]'
>>> email.lstrip()
'[email protected] '
>>> email.rstrip()
' [email protected]'
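strip() and its siblings also accept an optional argument listing the characters to remove. It is treated as a set of characters, not a substring: 'www.example.com'.strip('cmowz.') returns 'example'. The sketch below applies rstrip('\r') to the Windows line-ending case and adds the common ' '.join(s.split()) idiom, which also collapses whitespace inside the string. Both helper names are hypothetical.

```python
def strip_cr(text):
    """Drop the carriage return that Windows line endings leave before each newline."""
    # rstrip('\r') only touches the end of each line; a carriage return
    # in the middle of a line is kept.
    return '\n'.join(line.rstrip('\r') for line in text.split('\n'))


def squeeze_spaces(text):
    """Trim both ends and collapse every internal run of whitespace to one space."""
    # split() with no argument splits on any whitespace run and drops empty pieces.
    return ' '.join(text.split())


print(repr(strip_cr('hello world\r\n')))
```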

To delete a character at a specific position, slice around it and concatenate the pieces:

>>> s = 'abc:123'   # sample string; the ':' at index 3 is removed
>>> s[:3] + s[4:]
'abc123'
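The slice-and-concatenate trick hard-codes the index. A small hypothetical helper, delete_at, generalises it. A naive s[:i] + s[i + 1:] gives the wrong answer for i = -1, because s[0:] is the whole string, so this sketch normalises negative indices first.

```python
def delete_at(s, i):
    """Return s without the character at index i (negative indices allowed)."""
    if not -len(s) <= i < len(s):
        raise IndexError('string index out of range')
    # Turn a negative index into the equivalent positive one; otherwise
    # i == -1 would make s[i + 1:] equal to s[0:], the entire string.
    i %= len(s)
    return s[:i] + s[i + 1:]


print(delete_at('abc:123', 3))
```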

To delete characters anywhere in the string, use str.replace() or re.sub():

>>> s = '\tabc\t123\txyz'
>>> s.replace('\t', '')
'abc123xyz'

re.sub() can delete several different characters in a single pass:

>>> import re
>>> s = '\tabc\t123\txyz\ropq\r'
>>> re.sub('[\t\r]', '', s)
'abc123xyzopq'
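When the same cleanup runs many times, the pattern can be compiled once with re.compile(). As a sketch under an assumption about what counts as unwanted, the class below covers every ASCII control character (0x00-0x1F plus DEL, 0x7F), which includes '\t', '\r' and also '\n'; narrow it if newlines must survive. CONTROL_CHARS and remove_control_chars are hypothetical names.

```python
import re

# ASCII control characters: 0x00-0x1F plus DEL (0x7F).
CONTROL_CHARS = re.compile(r'[\x00-\x1f\x7f]')


def remove_control_chars(text):
    """Return text with every ASCII control character deleted."""
    return CONTROL_CHARS.sub('', text)


print(remove_control_chars('\tabc\t123\txyz\ropq\r'))
```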

The string translate() method can map or delete many different characters in a single call:

>>> s = 'abc123xyz'
>>> # map a->x, b->y, c->z, x->a, y->b, z->c (Python 2: string.maketrans)
>>> s.translate(str.maketrans('abcxyz', 'xyzabc'))
'xyz123abc'

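>>> s = '\rasd\t23\bAds'
>>> # the third argument lists characters to delete (Python 2: s.translate(None, '\r\t\b'))
>>> s.translate(str.maketrans('', '', '\r\t\b'))
'asd23Ads'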
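Since str.maketrans() takes an optional third argument in Python 3, one table can both map some characters and delete others. The table is built once and can be reused across calls; TABLE and swap_and_clean are hypothetical names.

```python
# Map a<->x, b<->y, c<->z and delete '\r', '\t' and '\b' in the same pass.
TABLE = str.maketrans('abcxyz', 'xyzabc', '\r\t\b')


def swap_and_clean(text):
    """Apply the combined mapping-and-deletion table to text."""
    return text.translate(TABLE)


print(swap_and_clean('\rabc\t123\bxyz'))
```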

>>> # decomposed form: each accent is a separate combining character
>>> s = 'ni\u0301 ha\u030co, chi\u0304 fa\u0300n'
>>> # map each combining mark's code point to None, which deletes it
>>> s.translate(dict.fromkeys([0x0301, 0x030c, 0x0304, 0x0300]))
'ni hao, chi fan'
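The translate() approach only works when the accents are already separate combining characters, and every mark has to be listed by hand. Text often uses precomposed letters such as 'í' (U+00ED) instead. A hedged alternative using the standard unicodedata module first decomposes the text with NFD normalization, then drops every character that unicodedata.combining() reports as a combining mark. strip_accents is a hypothetical name.

```python
import unicodedata


def strip_accents(text):
    """Remove combining marks (accents) from text, including precomposed letters."""
    # NFD splits a precomposed letter such as U+00ED into 'i' + U+0301.
    decomposed = unicodedata.normalize('NFD', text)
    # combining() returns a non-zero class for combining marks; keep everything else.
    return ''.join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_accents('n\u00ed h\u01ceo, ch\u012b f\u00e0n'))
```

Unlike the hand-written table, this handles any accent without listing code points, at the cost of also removing marks you might want to keep in some scripts.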

Summary

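This article has collected a set of string-processing techniques in Python, presenting each one as a real-world case, a solution and a worked example. We hope it is useful as a reference when learning or using Python.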

