Sharing String-Processing Techniques in Python

1. How do you split a string that contains multiple separators?

Practical case

We need to split a string on its separators to extract the individual fields, but the string contains several different separators, for example:

s = 'asd;aad|dasd|dasd,sdasd|asd,,Adas|sdasd;Asdasd,d|asd'

Here ',', ';', '|' and '\t' are all delimiters. How should the string be split?

Solution

Method 1: call the split() method repeatedly, handling one delimiter per pass.

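# Originally written for Python 2 with map(); the explicit loop below also works
# on Python 3, where map() is lazy and would never run the extend() calls.
def mySplit(s, ds):
    res = [s]
    for d in ds:
        t = []
        for x in res:
            t.extend(x.split(d))
        res = t
    # Drop the empty strings left by adjacent delimiters.
    return [x for x in res if x]

s = 'asd;aad|dasd|dasd,sdasd|asd,,Adas|sdasd;Asdasd,d|asd'
result = mySplit(s, ';,|\t')
print(result)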

C:\Users\Administrator>C:\Python\Python27\python.exe E:\python-intensive-training\s2.py
['asd', 'aad', 'dasd', 'dasd', 'sdasd', 'asd', 'Adas', 'sdasd', 'Asdasd', 'd', 'asd']

Method 2: use re.split() to split on all of the delimiters in a single pass.

>>> import re
>>> re.split(r'[,;\t|]+', 'asd;aad|dasd|dasd,sdasd|asd,,Adas|sdasd;Asdasd,d|asd')
['asd', 'aad', 'dasd', 'dasd', 'sdasd', 'asd', 'Adas', 'sdasd', 'Asdasd', 'd', 'asd']
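The regular-expression approach can be wrapped in a small helper. The following is a minimal sketch, assuming the delimiters are single characters; the function name split_on_any is hypothetical and not part of the original article.

```python
import re

def split_on_any(text, delimiters):
    """Split text on any single character in delimiters, dropping empty pieces."""
    # re.escape guards characters such as '|' or ']' that are special in a regex.
    pattern = '[' + re.escape(delimiters) + ']+'
    return [piece for piece in re.split(pattern, text) if piece]

s = 'asd;aad|dasd|dasd,sdasd|asd,,Adas|sdasd;Asdasd,d|asd'
parts = split_on_any(s, ';,|\t')
print(parts)
```

Filtering out empty pieces matters when the text begins or ends with a delimiter, because re.split() returns an empty string at that position.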

2. How do you determine whether string a starts or ends with string b?

Practical case

A directory contains the following files:

quicksort.c graph.py heap.java install.sh stack.cpp ......

We now need to grant execute permission to the files whose names end in .sh or .py.

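Solution

Use the string methods startswith() and endswith(). Both accept a tuple of candidates, so several suffixes can be checked in one call.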

>>> import os, stat
>>> os.listdir('./')
['heap.java', 'quicksort.c', 'stack.cpp', 'install.sh', 'graph.py']
>>> [name for name in os.listdir('./') if name.endswith(('.sh', '.py'))]
['install.sh', 'graph.py']
>>> os.chmod('install.sh', os.stat('install.sh').st_mode | stat.S_IXUSR)

[root@iZ28i253je0Z t]# ls -l install.sh
-rwxr--r-- 1 root root 0 Sep 15 18:13 install.sh
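The interactive session above changes only install.sh. A minimal sketch of applying the same idea to every matching file follows; the helper name add_exec_permission and the temporary demo directory are assumptions made for illustration, and the permission bits behave as described on POSIX systems.

```python
import os
import stat
import tempfile

def add_exec_permission(directory, suffixes=('.sh', '.py')):
    """Grant the owner execute permission on files whose names end with a suffix."""
    changed = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        # endswith() accepts a tuple, so one call checks every suffix.
        if name.endswith(suffixes) and os.path.isfile(path):
            os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR)
            changed.append(name)
    return changed

# Demo in a throwaway directory so that no real files are touched.
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ['quicksort.c', 'graph.py', 'heap.java', 'install.sh', 'stack.cpp']:
        open(os.path.join(demo_dir, name), 'w').close()
    changed = add_exec_permission(demo_dir)
    # Record whether the owner-execute bit is now set on each changed file.
    exec_bits = [bool(os.stat(os.path.join(demo_dir, n)).st_mode & stat.S_IXUSR)
                 for n in changed]
    print(changed)
```

Note that suffixes must be a tuple (or a single string); endswith() raises TypeError if given a list.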

3. How do you adjust the format of a string?

Practical case

A piece of software writes a log file in which dates use the format yyyy-mm-dd:

2016-09-15 18:27:26 statu unpacked python3-pip:all
2016-09-15 19:27:26 statu half-configured python3-pip:all
2016-09-15 20:27:26 statu installd python3-pip:all
2016-09-15 21:27:26 configure asdasdasdas:all python3-pip:all

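The dates need to be converted to the US format mm/dd/yyyy, for example 2016-09-15 --> 09/15/2016. How should this be handled?

Solution

Use the regular-expression function re.sub() to perform the substitution.

Capture each part of the date with a capture group in the pattern, then rearrange the captured groups in the replacement string.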

>>> log = '2016-09-15 18:27:26 statu unpacked python3-pip:all'
>>> import re
>>> # Refer to the groups by position
>>> re.sub(r'(\d{4})-(\d{2})-(\d{2})', r'\2/\3/\1', log)
'09/15/2016 18:27:26 statu unpacked python3-pip:all'
>>> # Refer to the groups by name
>>> re.sub(r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})', r'\g<month>/\g<day>/\g<year>', log)
'09/15/2016 18:27:26 statu unpacked python3-pip:all'
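To convert a whole log rather than a single line, the pattern can be compiled once and applied to each line. This is a minimal sketch under that assumption; the names DATE_RE and to_us_dates are hypothetical.

```python
import re

# Compile once so the pattern can be reused for every line.
DATE_RE = re.compile(r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})')

def to_us_dates(lines):
    """Rewrite yyyy-mm-dd dates as mm/dd/yyyy in each line."""
    return [DATE_RE.sub(r'\g<month>/\g<day>/\g<year>', line) for line in lines]

log_lines = [
    '2016-09-15 18:27:26 statu unpacked python3-pip:all',
    '2016-09-15 19:27:26 statu half-configured python3-pip:all',
]
converted = to_us_dates(log_lines)
for line in converted:
    print(line)
```

Named groups make the replacement string easier to read than positional references when a pattern has more than two or three groups.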

4. How do you join several small strings into one large string?

Practical case

When designing a network program, we define a UDP-based protocol that passes a series of parameters to the server in a fixed order:

hwDetect: "<0112>"
gxDepthBits: "<32>"
gxResolution: "<1024x768>"
gxRefresh: "<60>"
fullAlpha: "<1>"
lodDist: "<100.0>"
DistCull: "<500.0>"

In the program, the parameters are collected into a list in that order:

["<0112>","<32>","<1024x768>","<60>","<1>","<100.0>","<500.0>"]

Finally, the parameters must be concatenated into a single data packet for sending:

"<0112><32><1024x768><60><1><100.0><500.0>"

Solution

Method 1: iterate over the list and append each string in turn with the '+' operator.

>>> result = ''
>>> for n in ["<0112>","<32>","<1024x768>","<60>","<1>","<100.0>","<500.0>"]:
...     result += n
...
>>> result
'<0112><32><1024x768><60><1><100.0><500.0>'

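Method 2: use the str.join() method, which joins all the strings in the list in one call. This is usually faster, because repeated '+' can build a new intermediate string on every iteration.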

>>> result = ''.join(["<0112>","<32>","<1024x768>","<60>","<1>","<100.0>","<500.0>"])
>>> result
'<0112><32><1024x768><60><1><100.0><500.0>'

If the list contains numbers, convert them to strings with a generator expression:

>>> hello = [222, 'sd', 232, '2e', 0.2]
>>> ''.join(str(x) for x in hello)
'222sd2322e0.2'
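Both ideas, joining and converting on the fly, can be combined when the raw parameter values are not yet wrapped in angle brackets. The sketch below assumes exactly that; build_packet is a hypothetical helper, not something defined by the protocol in the article.

```python
def build_packet(values):
    """Wrap each value in angle brackets and join them into one string."""
    # The generator expression converts non-string values on the fly.
    return ''.join('<{}>'.format(v) for v in values)

packet = build_packet(['0112', 32, '1024x768', 60, 1, 100.0, 500.0])
print(packet)
```

Passing a generator expression to join() avoids building an intermediate list of formatted strings in the calling code.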

5. How do you left-, right-, and center-align a string?

Practical case

A dictionary stores a series of attribute values:

{ 'ip': '127.0.0.1', 'blog': 'www.anshengme.com', 'title': 'Hello world', 'port': '80' }

The program should print the contents in the following format. How can this be done?

ip    : 127.0.0.1
blog  : www.anshengme.com
title : Hello world
port  : 80

Solution

Method 1: use the string methods str.ljust(), str.rjust(), and str.center() for left, right, and center alignment.

>>> info = {'ip': '127.0.0.1', 'blog': 'www.anshengme.com', 'title': 'Hello world', 'port': '80'}
>>> # Length of the longest key in info
>>> max(map(len, info.keys()))
5
>>> w = max(map(len, info.keys()))
>>> for k in info:
...     print(k.ljust(w), ':', info[k])
...
ip    : 127.0.0.1
blog  : www.anshengme.com
title : Hello world
port  : 80

The lines appear in the dictionary's iteration order, which is insertion order on Python 3.7+ and may differ on older versions.

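Method 2: use the built-in format() function with a format spec such as '<20', '>20', or '^20' (left, right, and center alignment to width 20) to do the same thing.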

>>> for k in info:
...     print(format(k, '^' + str(w)), ':', info[k])
...
 ip   : 127.0.0.1
blog  : www.anshengme.com
title : Hello world
port  : 80
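The same alignment specs work inside str.format(), which makes it easy to switch alignment with one parameter. This is a minimal sketch; format_rows and its align parameter are hypothetical names introduced for illustration.

```python
def format_rows(mapping, align='<'):
    """Return 'key : value' lines with keys padded to the longest key."""
    width = max(len(key) for key in mapping)
    # The format spec is an alignment character followed by the width, e.g. '<5'.
    spec = '{:' + align + str(width) + '}'
    return [spec.format(key) + ' : ' + str(value) for key, value in mapping.items()]

info = {'ip': '127.0.0.1', 'blog': 'www.anshengme.com',
        'title': 'Hello world', 'port': '80'}
left = format_rows(info, '<')
right = format_rows(info, '>')
print('\n'.join(left))
print('\n'.join(right))
```

Here '<' pads on the right (left-aligned keys) and '>' pads on the left (right-aligned keys); '^' would center them.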

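6. How do you remove unwanted characters from a string?

Practical case

Strip unnecessary whitespace from user input: ' [email protected] '

Remove the '\r' characters from text edited on Windows: 'hello world\r\n'

Remove Unicode combining marks (tone marks) from a string: 'ní hǎo, chī fàn'

Solution

Method 1: use the string methods strip(), lstrip(), and rstrip() to remove characters from the ends of a string.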

>>> email = ' [email protected] '
>>> email.strip()
'[email protected]'
>>> email.lstrip()
'[email protected] '
>>> email.rstrip()
' [email protected]'

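Method 2: to delete a character at a fixed position, use slicing plus '+' concatenation.

>>> s = 'abc:123'  # assumed example input; the character at index 3 is removed
>>> s[:3] + s[4:]
'abc123'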

Method 3: use the string replace() method or the regular-expression function re.sub() to delete characters at any position.

>>> s = '\tabc\t123\txyz'
>>> s.replace('\t', '')
'abc123xyz'

Use re.sub() to remove several different characters at once:

>>> import re
>>> s = '\tabc\t123\txyz\ropq\r'  # assumed example input
>>> re.sub(r'[\t\r]', '', s)
'abc123xyzopq'

Method 4: the string translate() method can map or remove several different characters in a single pass.

>>> s = 'abc123xyz'
>>> # Python 3 uses str.maketrans(); Python 2 used string.maketrans()
>>> s.translate(str.maketrans('abcxyz', 'xyzabc'))
'xyz123abc'

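>>> s = '\rasd\t23\bAds'
>>> # Python 3: the third argument of str.maketrans() lists characters to delete
>>> # (Python 2 equivalent: s.translate(None, '\r\t\b'))
>>> s.translate(str.maketrans('', '', '\r\t\b'))
'asd23Ads'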

# python2.7
>>> i = u'ni\u0301 ha\u030co, chi\u0304 fa\u0300n'
>>> i
u'ni\u0301 ha\u030co, chi\u0304 fa\u0300n'
>>> # Mapping a code point to None deletes that character
>>> i.translate(dict.fromkeys([0x0301, 0x030c, 0x0304, 0x0300]))
u'ni hao, chi fan'
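Listing the combining code points by hand only works when they are known in advance. A more general alternative, which is a different technique from the translate() call above, is to decompose the text and drop every combining character using the standard unicodedata module. The following is a minimal Python 3 sketch; strip_combining_marks is a hypothetical name.

```python
import unicodedata

def strip_combining_marks(text):
    """Remove combining marks (such as tone marks) from text."""
    # NFD splits precomposed characters like 'í' into a base letter plus a mark.
    decomposed = unicodedata.normalize('NFD', text)
    # unicodedata.combining() returns 0 for characters that are not combining marks.
    return ''.join(ch for ch in decomposed if not unicodedata.combining(ch))

plain = strip_combining_marks('ni\u0301 ha\u030co, chi\u0304 fa\u0300n')
print(plain)
```

Normalizing first means the function handles both precomposed characters and base-letter-plus-mark sequences.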

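Summary

This article collected several string-processing techniques in Python: splitting on multiple separators, checking prefixes and suffixes, reformatting text with re.sub(), joining strings, aligning output, and removing unwanted characters. Each technique was presented as a practical case followed by a solution and an example, and may serve as a reference when learning or using Python.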
