Python Basics: strings 03

2019-12-26 11:02:08


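  • Find the occurrence count and index positions of a substring
  • Check whether a substring exists and find its index position
  • Find the occurrence count and index positions of every character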

Find the occurrence count and index positions of a substring
  • Count the occurrences of a substring with str.count()

str.count(sub[, start[, end]])

In [35]: mainStr = 'This is a sample string and a sample code. It is very short.'
    ...:
    ...: # Get the occurrence count of sub-string in main string.
    ...: count = mainStr.count('sample')
    ...:
    ...: print("'sample' sub string frequency / occurrence count : " , count)
'sample' sub string frequency / occurrence count :  2
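
The optional start and end arguments in the signature restrict counting to a slice of the string. A minimal sketch, reusing the sample sentence from above (the variable names are illustrative):

```python
main_str = 'This is a sample string and a sample code. It is very short.'

# Count across the whole string.
total = main_str.count('sample')

# Count only from index 20 onward, which skips the first 'sample' at index 10.
from_20 = main_str.count('sample', 20)

# Count only inside main_str[0:20], which covers just the first 'sample'.
first_20 = main_str.count('sample', 0, 20)

print(total, from_20, first_20)  # 2 1 1
```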
  • Count occurrences with a Python regular expression
In [36]: import re
    ...:
    ...: # Create a Regex pattern to match the substring
    ...: regexPattern = re.compile("sample")
    ...:
    ...: # Get a list of strings that matches the given pattern i.e. substring
    ...: listOfMatches = regexPattern.findall(mainStr)
    ...:
    ...: print("'sample' sub string frequency / occurrence count : ", len(listOfMatches))
'sample' sub string frequency / occurrence count :  2
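
One caveat with the regex approach: characters such as '.' have a special meaning in a pattern. If the substring should be matched literally, re.escape() can be applied first. The text below is a made-up example, not taken from the article:

```python
import re

text = 'Price: 3.50 or 3x50 or 3.50'
sub = '3.50'

# Unescaped, '.' matches any character, so '3x50' is counted as well.
unescaped = len(re.findall(sub, text))

# re.escape() makes every character of the substring literal.
escaped = len(re.findall(re.escape(sub), text))

print(unescaped, escaped)  # 3 2
```

With the escaped pattern the result agrees with text.count(sub).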
  • Count overlapping substrings

str.count() counts only non-overlapping occurrences, so it undercounts substrings that overlap.

In [37]: mainStr = 'thathatthat'

In [38]: # string.count() will not be able to count occurrences of overlapping sub-strings
    ...: count = mainStr.count('that')

In [39]: count
Out[39]: 2
------ 'that' actually occurs 3 times when overlapping matches are counted ------
In [40]: # Custom function that counts occurrences, including overlapping ones
    ...: def frequencyCount(mainStr, subStr):
    ...:    counter = pos = 0
    ...:    while(True):
    ...:        pos = mainStr.find(subStr , pos)
    ...:        # pos is the start index for find(), which returns -1 when nothing is found
    ...:        if pos > -1:
    ...:            counter = counter + 1
    ...:            pos = pos + 1
    ...:        else:
    ...:            break
    ...:    return counter
    ...:

In [41]: # count occurrences of overlapping substrings
    ...: count = frequencyCount(mainStr, 'that')
    ...:
    ...: print("'that' sub string frequency count : ", count)
'that' sub string frequency count :  3
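
As an alternative to the hand-written loop, a regex lookahead can also find overlapping matches. This is a sketch of a different technique, not the method used above: a lookahead matches at a position without consuming characters, so the scan moves on one character at a time.

```python
import re

main_str = 'thathatthat'
sub = 'that'

# (?=...) is a zero-width lookahead: it succeeds wherever sub starts,
# but consumes nothing, so overlapping occurrences are all visited.
pattern = re.compile('(?=' + re.escape(sub) + ')')
overlapping = [m.start() for m in pattern.finditer(main_str)]

print(len(overlapping), overlapping)  # 3 [0, 3, 7]
```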
  • Find the occurrence count and all start index positions

Using Python regex finditer()

In [50]: print('**** Find Occurrence count and all index position of a sub-string in a String **** ')
    ...:
    ...: import re
    ...:
    ...: mainStr = 'This is a sample string and a sample code. It is very Short.'
    ...:
    ...: # Create a Regex pattern to match the substring
    ...: regexPattern = re.compile('sample')
    ...:
    ...: # Iterate over all matches of the substring using the iterator of match objects returned by finditer()
    ...: iteratorOfMatchObs = regexPattern.finditer(mainStr)
    ...: indexPositions = []
    ...: count = 0
    ...: for matchObj in iteratorOfMatchObs:
    ...:    indexPositions.append(matchObj.start())
    ...:    count = count + 1
    ...:
    ...: print("Occurrence Count of substring 'sample' : ", count)
    ...: print("Index Positions of 'sample' are : ", indexPositions)
**** Find Occurrence count and all index position of a sub-string in a String ****
Occurrence Count of substring 'sample' :  2
Index Positions of 'sample' are :  [10, 30]
  • Find the index positions of overlapping substrings with a custom function
In [51]: def frequencyCountAndPositions(mainStr, subStr):
    ...:    counter = pos = 0
    ...:    indexpos = []
    ...:    while(True):
    ...:        pos = mainStr.find(subStr , pos)
    ...:        # pos is the start index for find(), which returns -1 when nothing is found
    ...:        if pos > -1:
    ...:            indexpos.append(pos)
    ...:            counter = counter + 1
    ...:            pos = pos + 1
    ...:        else:
    ...:            break
    ...:    return (counter, indexpos)
    ...:

In [52]: mainStr = 'thathatthat'
    ...:
    ...: result = frequencyCountAndPositions(mainStr, 'that')
    ...:
    ...: print("Occurrence Count of overlapping sub-strings 'that' : ", result[0])
    ...: print("Index Positions of 'that' are : ", result[1])
Occurrence Count of overlapping sub-strings 'that' :  3
Index Positions of 'that' are :  [0, 3, 7]
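
The same search can be written as a generator, which yields positions lazily instead of building the whole list up front. The function name iter_positions is hypothetical and the empty-substring guard is an added assumption; this is only one possible variant of the function above.

```python
def iter_positions(main_str, sub_str):
    """Yield the start index of every occurrence, overlapping ones included."""
    if not sub_str:
        return  # an empty substring would match at every position; skip it
    pos = main_str.find(sub_str)
    while pos != -1:
        yield pos
        # Restart one character after the last hit so overlaps are found.
        pos = main_str.find(sub_str, pos + 1)

positions = list(iter_positions('thathatthat', 'that'))
print(len(positions), positions)  # 3 [0, 3, 7]
```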
  • Find the index position of the n-th occurrence
In [54]: mainStr = 'This is a sample string and a sample code. It is very Short.'
    ...:
    ...: result = frequencyCountAndPositions(mainStr, 'is')
    ...: if result[0] >= 2:
    ...:    print("Index Positions of 2nd Occurrence of sub-string 'is'  : ", result[1][1])
    ...:
Index Positions of 2nd Occurrence of sub-string 'is'  :  5
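
When only the n-th occurrence is needed, the search can stop as soon as it is reached rather than collecting every position. A minimal sketch, assuming a 1-based n and a return value of -1 when there are fewer than n occurrences (find_nth is a hypothetical helper name):

```python
def find_nth(main_str, sub_str, n):
    """Return the start index of the n-th (1-based) occurrence, or -1."""
    pos = -1
    for _ in range(n):
        # Continue one character after the previous hit.
        pos = main_str.find(sub_str, pos + 1)
        if pos == -1:
            return -1
    return pos

main_str = 'This is a sample string and a sample code. It is very Short.'
second = find_nth(main_str, 'is', 2)
print(second)                       # 5
print(find_nth(main_str, 'is', 4))  # -1, there are only three occurrences
```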
Check whether a substring exists and find its index position
  • Use the in / not in operators
In [55]: mainStr = "This is a sample String with sample message."
    ...:
    ...: # Use in operator to check if sub string exists in another string
    ...: if "sample" in mainStr:
    ...:    print ('Sub-string Found')
    ...: else:
    ...:    print('Sub-string not found')
    ...:
Sub-string Found
In [56]: mainStr = "This is a sample String with sample message."
    ...:
    ...: if "Hello" not in mainStr:
    ...:    print ("Sub-string doesn't exist in main String")
    ...:
Sub-string doesn't exist in main String
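
The in operator only answers yes or no. To also get the index position, str.find(), str.rfind() and str.index() can be used. A short sketch on the same sentence (variable names are illustrative):

```python
main_str = 'This is a sample String with sample message.'

# str.find() returns the lowest index of the substring, or -1 if it is absent.
first = main_str.find('sample')
absent = main_str.find('Hello')

# str.rfind() returns the highest index instead.
last = main_str.rfind('sample')

print(first, last, absent)  # 10 29 -1

# str.index() behaves like find() but raises ValueError when nothing is found.
try:
    main_str.index('Hello')
except ValueError:
    print('Sub-string not found')
```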

  • Ignore case
In [57]: mainStr = "This is a sample String with sample message."
    ...:
    ...: # use in operator to check if sub string exists by ignoring case of strings
    ...: # Convert both the strings to lower case then check for membership using in operator
    ...: if "SAMple".lower() in mainStr.lower():
    ...:    print('Sub-string Found')
    ...: else:
    ...:    print('Sub-string not found')
    ...:
Sub-string Found
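
For text outside plain ASCII, str.casefold() is a more aggressive alternative to lower() for caseless matching. The German sentence below is an invented example used only to show the difference:

```python
main_str = 'Die Straße ist lang.'
sub = 'STRASSE'

# lower() keeps 'ß' as it is, so the comparison fails.
with_lower = sub.lower() in main_str.lower()

# casefold() maps 'ß' to 'ss', so the comparison succeeds.
with_casefold = sub.casefold() in main_str.casefold()

print(with_lower, with_casefold)  # False True
```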
  • Check whether the string contains any element of a list
In [58]: mainStr = "This is a sample String with sample message."
    ...:
    ...: listOfstrs = ['Hello', 'here', 'with', 'here', 'who']
    ...:
    ...: def checkIfAny(mainStr, listOfStr):
    ...:    for subStr in listOfStr:
    ...:        if subStr in mainStr:
    ...:            return (True, subStr)
    ...:    return (False, "")
    ...:
    ...: # Check if mainStr string contains any string from the list
    ...: result = checkIfAny(mainStr, listOfstrs)
    ...: if result[0]:
    ...:    print('Sub-string Found in main String : ', result[1])
    ...:
Sub-string Found in main String :  with

Using any() and a list comprehension

In [59]: # Check if any string from the list exists in given string
    ...: result = any([subStr in mainStr for subStr in listOfstrs])
    ...:
    ...: if result:
    ...:    print('A string from list Found in main String ')
    ...:
A string from list Found in main String
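
any() also accepts a generator expression, which stops at the first hit, and next() with a default can return the matching string itself. A sketch under the same inputs (snake_case names are illustrative):

```python
main_str = 'This is a sample String with sample message.'
list_of_strs = ['Hello', 'here', 'with', 'here', 'who']

# The generator is consumed lazily: any() stops at the first True.
found_any = any(s in main_str for s in list_of_strs)

# next() returns the first matching string, or None if nothing matches.
first_match = next((s for s in list_of_strs if s in main_str), None)

print(found_any, first_match)  # True with
```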
  • Check whether the string contains all elements of a list
In [60]: mainStr = "This is a sample String with sample message."
    ...: listOfstrs = ['sample', 'String', 'with']
    ...:
    ...: # Check if all strings from the list exist in the given string
    ...: result = all([subStr in mainStr for subStr in listOfstrs])
    ...:
    ...: if result:
    ...:    print('All strings from list Found in main String ')
    ...:
All strings from list Found in main String
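
When all() returns False it is often useful to know which strings are missing. A small sketch; the list named required is a made-up input that deliberately includes one absent word:

```python
main_str = 'This is a sample String with sample message.'
required = ['sample', 'String', 'message', 'Hello']

# True only if every required string occurs in main_str.
has_all = all(s in main_str for s in required)

# Collect the strings that are not present.
missing = [s for s in required if s not in main_str]

print(has_all, missing)  # False ['Hello']
```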
  • Use Python regex

Case-sensitive

In [61]: # Create a pattern to match string 'sample'
    ...: patternObj = re.compile("sample")

In [62]: mainStr = "This is a sample String with sample message."
    ...:
    ...: # search for the pattern in the string and return the match object
    ...: matchObj = patternObj.search(mainStr)
    ...:
    ...: # check if match object is not None
    ...: if matchObj:
    ...:    print('Sub-string Found')
    ...: else:
    ...:    print('Sub-string Not Found')
    ...:
Sub-string Found

Ignore case

In [63]: # search for the sub-string in string by ignoring case
    ...: matchObj =  re.search('SAMple', mainStr, flags=re.IGNORECASE)
    ...:
    ...: if matchObj:
    ...:    print('Sub-string Found')
    ...: else:
    ...:    print('Sub-string Not Found')
    ...:
Sub-string Found
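
The match object returned by re.search() also carries the index position of the hit. A brief sketch using the same case-insensitive search:

```python
import re

main_str = 'This is a sample String with sample message.'

match = re.search('SAMple', main_str, flags=re.IGNORECASE)
if match:
    # start() is the index of the first match, span() is (start, end),
    # and group() is the text that actually matched.
    print(match.start(), match.span(), match.group())  # 10 (10, 16) sample
```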
Find the occurrence count and index positions of every character
  • Use collections.Counter()

collections.Counter([iterable-or-mapping])

In [65]: from collections import Counter

In [66]: mainStr = 'This is a sample string and a sample code. It is a very short string. 001122'
    ...:
    ...: # Counter is a dict sub class that keeps the characters in string as keys and their frequency as value
    ...: frequency = Counter(mainStr)
    ...:
    ...: print("Occurrence Count of all characters :")
    ...: # Iterate over the dictionary and Print the frequency of each character
    ...: for (key, value) in frequency.items():
    ...:    print("Occurrence Count of ", key, " is : ", value)
    ...:
Occurrence Count of all characters :
Occurrence Count of  T  is :  1
Occurrence Count of  h  is :  2
Occurrence Count of  i  is :  5
Occurrence Count of  s  is :  8
Occurrence Count of     is :  15
Occurrence Count of  a  is :  6
Occurrence Count of  m  is :  2
Occurrence Count of  p  is :  2
Occurrence Count of  l  is :  2
Occurrence Count of  e  is :  4
Occurrence Count of  t  is :  4
Occurrence Count of  r  is :  4
Occurrence Count of  n  is :  3
Occurrence Count of  g  is :  2
Occurrence Count of  d  is :  2
Occurrence Count of  c  is :  1
Occurrence Count of  o  is :  2
Occurrence Count of  .  is :  2
Occurrence Count of  I  is :  1
Occurrence Count of  v  is :  1
Occurrence Count of  y  is :  1
Occurrence Count of  0  is :  2
Occurrence Count of  1  is :  2
Occurrence Count of  2  is :  2
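
Counter also offers most_common() for the top entries, and it returns 0 for keys it has never seen instead of raising KeyError. A short sketch on the same string:

```python
from collections import Counter

main_str = 'This is a sample string and a sample code. It is a very short string. 001122'
frequency = Counter(main_str)

# The three most frequent characters with their counts.
top_three = frequency.most_common(3)
print(top_three)       # [(' ', 15), ('s', 8), ('a', 6)]

# A character that never occurs has a count of 0.
print(frequency['z'])  # 0
```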
  • Use Python regex
In [67]: import re
    ...:
    ...: # Create a Regex pattern to match alphanumeric characters
    ...: regexPattern = re.compile('[a-zA-Z0-9]')
    ...:
    ...: mainStr = 'This is a sample string and a sample code. It is a very short string. 001122'
    ...:
    ...: # Iterate over all the alphanumeric characters in string (that matches the regex pattern)
    ...: # While Iterating keep on updating the frequency count of each character in a dictionary
    ...: iteratorOfMatchObs = regexPattern.finditer(mainStr)
    ...: frequencyOfChars = {}
    ...: indexPositions = {}
    ...:
    ...: for matchObj in iteratorOfMatchObs:
    ...:    frequencyOfChars[matchObj.group()] = frequencyOfChars.get(matchObj.group(), 0) + 1
    ...:    indexPositions[matchObj.group()] = indexPositions.get(matchObj.group(), []) + [matchObj.start()]
    ...:
    ...: # Iterate over the dictionary and Print the frequency of each character
    ...: for (key, value) in frequencyOfChars.items():
    ...:    print("Occurrence Count of ", key , " is : ", value , ' & Index Positions : ', indexPositions[key])
    ...:
Occurrence Count of  T  is :  1  & Index Positions :  [0]
Occurrence Count of  h  is :  2  & Index Positions :  [1, 57]
Occurrence Count of  i  is :  5  & Index Positions :  [2, 5, 20, 46, 65]
Occurrence Count of  s  is :  8  & Index Positions :  [3, 6, 10, 17, 30, 47, 56, 62]
Occurrence Count of  a  is :  6  & Index Positions :  [8, 11, 24, 28, 31, 49]
Occurrence Count of  m  is :  2  & Index Positions :  [12, 32]
Occurrence Count of  p  is :  2  & Index Positions :  [13, 33]
Occurrence Count of  l  is :  2  & Index Positions :  [14, 34]
Occurrence Count of  e  is :  4  & Index Positions :  [15, 35, 40, 52]
Occurrence Count of  t  is :  4  & Index Positions :  [18, 44, 60, 63]
Occurrence Count of  r  is :  4  & Index Positions :  [19, 53, 59, 64]
Occurrence Count of  n  is :  3  & Index Positions :  [21, 25, 66]
Occurrence Count of  g  is :  2  & Index Positions :  [22, 67]
Occurrence Count of  d  is :  2  & Index Positions :  [26, 39]
Occurrence Count of  c  is :  1  & Index Positions :  [37]
Occurrence Count of  o  is :  2  & Index Positions :  [38, 58]
Occurrence Count of  I  is :  1  & Index Positions :  [43]
Occurrence Count of  v  is :  1  & Index Positions :  [51]
Occurrence Count of  y  is :  1  & Index Positions :  [54]
Occurrence Count of  0  is :  2  & Index Positions :  [70, 71]
Occurrence Count of  1  is :  2  & Index Positions :  [72, 73]
Occurrence Count of  2  is :  2  & Index Positions :  [74, 75]
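
The same index table can be built without a regex by combining enumerate() with collections.defaultdict. Note one assumption: str.isalnum() also accepts non-ASCII letters and digits, unlike the [a-zA-Z0-9] pattern, although for this string the result is the same.

```python
from collections import defaultdict

main_str = 'This is a sample string and a sample code. It is a very short string. 001122'

# Map each alphanumeric character to the list of indexes where it occurs.
index_positions = defaultdict(list)
for index, char in enumerate(main_str):
    if char.isalnum():
        index_positions[char].append(index)

print(index_positions['h'])       # [1, 57]
print(len(index_positions['s']))  # 8
```

The occurrence count of a character is simply the length of its index list.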
  • Find repeated characters with collections.Counter()
In [69]: from collections import Counter
    ...:
    ...: mainStr = 'This is a sample string and a sample code. It is a very short string. 001122'
    ...:
    ...: listOfDupChars = []
    ...: # Counter is a dict sub class that keeps the characters in string as keys and their frequency as value
    ...: frequency = Counter(mainStr)
    ...:
    ...: # Iterate over the dictionary and collect every character that occurs more than 4 times
    ...: for (key, value) in frequency.items():
    ...:    if value > 4:
    ...:        listOfDupChars.append(key)
    ...: print('Duplicate characters ; ', listOfDupChars)
Duplicate characters ;  ['i', 's', ' ', 'a']
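
The threshold above keeps only characters that occur more than 4 times. If "duplicate" should mean any character that occurs more than once, the condition becomes count > 1. A sketch on a made-up word, relying on Counter preserving first-seen order (Python 3.7+):

```python
from collections import Counter

main_str = 'programming'

# Keep every character whose count is greater than 1.
duplicates = [char for char, count in Counter(main_str).items() if count > 1]

print(duplicates)  # ['r', 'g', 'm']
```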
