I. Sets
1. Defining a set
In [74]: s = {} # empty braces create an empty dict, not a set
In [75]: type(s)
Out[75]: dict
In [76]: s = set() # an empty set has to be created with set()
In [77]: type(s)
Out[77]: set
In [78]: help(set)
Help on class set in module builtins:
class set(object)
| set() -> new empty set object
| set(iterable) -> new set object
|
| Build an unordered collection of unique elements.
|
| Methods defined here:
In [80]: s = set([1, 2])
In [81]: s
Out[81]: {1, 2}
In [82]: s = set("xxj")
In [83]: s
Out[83]: {'j', 'x'}
In [84]: s = {1, 2, 1, 3}
In [85]: s
Out[85]: {1, 2, 3}
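A set is unordered, its elements cannot repeat, and every element must be hashable (which, for the built-in types, means immutable).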
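The three properties above can be checked in a few lines. This is a minimal sketch; the variable names are illustrative and not part of the original session, and the exact wording of the TypeError message varies between Python versions, so it is not printed here.

```python
# Empty braces create a dict; set() is the only way to get an empty set.
empty_dict = {}
empty_set = set()
print(type(empty_dict).__name__, type(empty_set).__name__)  # dict set

# Duplicates collapse when the set is built.
unique = set([3, 1, 3, 2, 1])
print(sorted(unique))  # [1, 2, 3]

# Elements must be hashable: a tuple works, a list does not.
ok = {(1, 2), "a", 3}
list_rejected = False
try:
    bad = {[1, 2]}
except TypeError:
    list_rejected = True
print(list_rejected)  # True
```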
II. Set operations
1. Adding elements
## set.add()
In [86]: s
Out[86]: {1, 2, 3}
In [87]: s.add("a") # adds a single element in place; the element must be hashable
In [88]: s
Out[88]: {1, 2, 3, 'a'}
In [89]: s.add(3)
In [90]: s
Out[90]: {1, 2, 3, 'a'}
In [93]: s.add([1, 2])
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-93-2beaf0c16593> in <module>()
----> 1 s.add([1, 2])
TypeError: unhashable type: 'list'
In [94]: help(s.add)
In [95]: s.add((1, 2))
In [96]: s
Out[96]: {(1, 2), 1, 2, 3, 'a'}
## set.update() # adds the elements of one or more iterables in place
In [99]: help(s.update)
Help on built-in function update:
update(...) method of builtins.set instance
Update a set with the union of itself and others.
In [127]: s = set()
In [128]: s
Out[128]: set()
In [129]: type(s)
Out[129]: set
In [101]: s.update(10)
-----------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-101-c184888ad9c5> in <module>()
----> 1 s.update(10)
TypeError: 'int' object is not iterable
In [131]: s.update(["a"])
In [132]: s
Out[132]: {'a'}
In [133]: s.update(["a"], ["b"])
In [134]: s
Out[134]: {'a', 'b'}
In [135]: s.update(["a"], ["b"], 1)
-----------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-135-fc556b8d9726> in <module>()
----> 1 s.update(["a"], ["b"], 1)
TypeError: 'int' object is not iterable
In [136]: s.update(["a"], ["b"], "xj")
In [137]: s
Out[137]: {'a', 'b', 'j', 'x'}
In [139]: s.update([["S", "B"]])
-----------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-139-da563f39a191> in <module>()
----> 1 s.update([["S", "B"]])
TypeError: unhashable type: 'list'
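The difference between add and update shown in the sessions above comes down to whether the argument is treated as one element or as a source of elements. A small sketch, with illustrative values:

```python
s = set()

s.add("xj")        # add() inserts its argument as ONE element
s.update("xj")     # update() iterates its argument: adds 'x' and 'j'
print(sorted(s))   # ['j', 'x', 'xj']

s.update([1, 2], (2, 3))  # several iterables may be passed at once
s.add((1, 2))             # a tuple is hashable, so it can be an element
print(len(s))             # 7
```

Passing a string to update is a common source of surprise, since the string is split into its characters.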
2. Removing elements
## set.remove()
In [142]: s
Out[142]: {'a', 'b', 'j', 'x'}
In [143]: s.remove("a")
In [144]: s
Out[144]: {'b', 'j', 'x'}
In [151]: s.remove("S")
-----------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-151-332efdd48daa> in <module>()
----> 1 s.remove("S")
KeyError: 'S'
## set.pop()
In [153]: s = {1, 2, 3, 4}
In [154]: s.pop()
Out[154]: 1
In [155]: s
Out[155]: {2, 3, 4}
In [156]: s.pop(5)
-----------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-156-23a1c03efc29> in <module>()
----> 1 s.pop(5)
TypeError: pop() takes no arguments (1 given)
In [157]: s.pop()
Out[157]: 2
In [158]: s.pop()
Out[158]: 3
In [159]: s.pop()
Out[159]: 4
In [160]: s.pop()
-----------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-160-e76f41daca5e> in <module>()
----> 1 s.pop()
KeyError: 'pop from an empty set'
## set.discard()
In [165]: help(set.discard)
Help on method_descriptor:
discard(...)
Remove an element from a set if it is a member.
If the element is not a member, do nothing.
In [166]: s = {1, 2, 3}
In [167]: s.discard(2)
In [168]: s.discard(1, 3)
-----------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-168-8702b734cbc4> in <module>()
----> 1 s.discard(1, 3)
TypeError: discard() takes exactly one argument (2 given)
In [169]: s.discard(2) # no error is raised when the element is not present
In [170]: s
Out[170]: {1, 3}
In [32]: s.clear()
In [33]: s
Out[33]: set()
In [47]: del(s)
In [48]: s
-----------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-48-f4d5d0c0671b> in <module>()
----> 1 s
NameError: name 's' is not defined
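Summary:
remove deletes the given element and raises KeyError if it is not present
discard deletes the given element and does nothing if it is not present
pop removes and returns an arbitrary element (the choice is unspecified rather than truly random) and raises KeyError if the set is empty
clear empties the set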
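The four removal methods can be compared side by side. The following is a sketch with illustrative names; which element pop returns is an implementation detail, so the code does not assume a particular value.

```python
s = {1, 2, 3}

s.discard(99)        # missing element: silently ignored

remove_failed = False
try:
    s.remove(99)     # missing element: raises KeyError
except KeyError:
    remove_failed = True

popped = s.pop()     # removes and returns an arbitrary element
remaining = set(s)   # copy of what is left after the pop
s.clear()            # empties the set in place

pop_failed = False
try:
    s.pop()          # popping from an empty set raises KeyError
except KeyError:
    pop_failed = True

print(remove_failed, pop_failed, len(remaining))  # True True 2
```

When it does not matter whether the element was present, discard avoids wrapping remove in a try block.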
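3. Modifying elements
A set cannot modify a single element in place; remove the old value and add the new one instead.
4. Lookup
A set cannot be indexed: it is not a linear structure, so it has no indexes.
A set has no method for accessing a single element.
A set has no search method (nothing comparable to list.index).
For membership tests (in and not in), a set is far more efficient than a list: O(1) on average versus O(n).
O(1) is not always faster than O(n) in practice; it also depends on the size of the data.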
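The cost difference for membership tests can be measured with timeit. This is only a rough sketch: the collection size, the repeat count, and the choice of the last element as the target are assumptions made for illustration, and the absolute timings depend on the machine.

```python
import timeit

n = 50_000
data_list = list(range(n))
data_set = set(data_list)
target = n - 1  # worst case for the list: the last element

# Each call runs the membership test 100 times and returns the total seconds.
list_time = timeit.timeit(lambda: target in data_list, number=100)
set_time = timeit.timeit(lambda: target in data_set, number=100)

print(f"list: {list_time:.6f}s  set: {set_time:.6f}s")
```

With only a handful of elements the two timings are usually close, which is the point of the caveat about data size.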
III. Set algebra
1. Intersection
## set.intersection()
In [1]: s1 = {1, 2, 3}
In [2]: s2 = {2, 3, 4}
In [3]: s1.intersection()
Out[3]: {1, 2, 3}
In [4]: s1.intersection(s2) # returns the intersection; does not modify the original set
Out[4]: {2, 3}
In [26]: s2.intersection(s1)
Out[26]: {2, 3}
In [5]: s1.intersection([2,3])
Out[5]: {2, 3}
In [6]: help(set.intersection)
In [7]: s1.intersection(2)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-7-94b820092aa3> in <module>()
----> 1 s1.intersection(2)
TypeError: 'int' object is not iterable
In [17]: s1.intersection_update(s2) # the _update version of set.intersection: modifies the original set and returns None
In [18]: s1
Out[18]: {2, 3}
In [19]: s2
Out[19]: {2, 3, 4}
In [20]: s1 = {1, 2, 3}
In [21]: s2 = {2, 3, 4}
In [22]: s1 & s2 # set overloads the bitwise AND operator to compute the intersection
Out[22]: {2, 3}
In [23]: s1
Out[23]: {1, 2, 3}
In [24]: s2
Out[24]: {2, 3, 4}
2. Difference
In [27]: s1
Out[27]: {1, 2, 3}
In [28]: s2
Out[28]: {2, 3, 4}
In [29]: s1.difference(s2)
Out[29]: {1}
In [30]: s2.difference(s1)
Out[30]: {4}
In [31]: s1
Out[31]: {1, 2, 3}
In [32]: s2
Out[32]: {2, 3, 4}
In [33]: s1.difference_update(s2)
In [34]: s1
Out[34]: {1}
In [35]: s2
Out[35]: {2, 3, 4}
In [36]: s1 = {1, 2, 3} # reset s1 after the in-place update
In [38]: s1
Out[38]: {1, 2, 3}
In [39]: s2
Out[39]: {2, 3, 4}
In [40]: s1 - s2 # set overloads the - operator to compute the difference; equivalent to s1.difference(s2)
Out[40]: {1}
In [41]: s2 - s1
Out[41]: {4}
In [42]: s1 + s2
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-42-1659087814e1> in <module>()
----> 1 s1 + s2
TypeError: unsupported operand type(s) for +: 'set' and 'set'
In [50]: s1.symmetric_difference(s2) # symmetric difference
Out[50]: {1, 4}
In [51]: s1.symmetric_difference_update(s2)
In [52]: s1
Out[52]: {1, 4}
In [53]: s2
Out[53]: {2, 3, 4}
In [55]: s1 # s1 has been reset; set overloads the ^ (XOR) operator to compute the symmetric difference
Out[55]: {1, 2, 3}
In [56]: s2
Out[56]: {2, 3, 4}
In [57]: s1 ^ s2
Out[57]: {1, 4}
3. Union
In [58]: s1
Out[58]: {1, 2, 3}
In [59]: s2
Out[59]: {2, 3, 4}
In [60]: s1.union(s2) # does union have an _update version? Yes: update is the in-place version of union
Out[60]: {1, 2, 3, 4}
In [61]: s1 | s2 # set overloads the | operator to compute the union
Out[61]: {1, 2, 3, 4}
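One detail the sessions above do not show is that the method forms and the operator forms differ in what they accept. The methods take any iterable, while the operators require a set or frozenset on both sides. A short sketch with illustrative values, also showing the in-place operators that mirror the _update methods:

```python
s1 = {1, 2, 3}

# Methods accept any iterable.
via_method = s1.union([3, 4])
print(sorted(via_method))  # [1, 2, 3, 4]

# Operators require a set (or frozenset) on both sides.
operator_rejected_list = False
try:
    s1 | [3, 4]
except TypeError:
    operator_rejected_list = True

# In-place operators mirror the *_update methods.
s2 = {1, 2, 3}
s2 |= {4}         # like s2.update({4})
s2 &= {2, 3, 4}   # like s2.intersection_update({2, 3, 4})
s2 -= {2}         # like s2.difference_update({2})
s2 ^= {3, 5}      # like s2.symmetric_difference_update({3, 5})
print(sorted(s2))  # [4, 5]
```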
4. Set predicates
In [68]: s1 = {2, 3}
In [69]: s2 = {1, 2, 3, 4}
In [70]: s1.isdisjoint(s2) # True if the two sets have no elements in common
Out[70]: False
In [71]: s1.issubset(s2) # True if s1 is a subset of s2
Out[71]: True
In [72]: s1.issuperset(s2) # True if s1 is a superset of s2
Out[72]: False
In [73]: s2.issuperset(s1)
Out[73]: True
In [74]: s1 = {"a", "b"}
In [75]: s1.isdisjoint(s2)
Out[75]: True
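The subset and superset checks also have operator forms, which additionally distinguish proper subsets. A brief sketch reusing the values from the session above under different names:

```python
a = {2, 3}
b = {1, 2, 3, 4}

print(a <= b)   # True: same as a.issubset(b)
print(a < b)    # True: proper subset (a <= b and a != b)
print(b >= a)   # True: same as b.issuperset(a)
print(a <= a, a < a)             # True False
print(a.isdisjoint({"a", "b"}))  # True
```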
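IV. Uses and limitations of sets
Sets are commonly used for deduplication and for fast membership tests on large amounts of data.
bytes and bytearray restrict their elements to 8-bit integers in the range 0-255 (str has no such restriction; its elements are Unicode characters).
Set elements cannot repeat and must be hashable; the mutable built-in types (list, dict, set, bytearray) are not hashable.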
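Two practical consequences of these limitations are sketched below, with illustrative data. Deduplicating through a set loses the original order, so dict.fromkeys is used here as one order-preserving alternative (it relies on dicts keeping insertion order, which is guaranteed from Python 3.7). And because a set is itself unhashable, frozenset is needed to store sets inside a set.

```python
items = ["b", "a", "b", "c", "a"]

# set() removes duplicates but does not keep the original order.
unique_unordered = set(items)
print(len(unique_unordered))  # 3

# dict.fromkeys keeps the first occurrence of each item, in order.
unique_ordered = list(dict.fromkeys(items))
print(unique_ordered)  # ['b', 'a', 'c']

# A set is mutable and unhashable; frozenset is the hashable variant.
groups = {frozenset({1, 2}), frozenset({2, 1}), frozenset({3})}
print(len(groups))  # 2
```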