Python Data Analysis (Chinese-English Bilingual) · Sets

2022-12-01 15:17:35

1.2.6: Sets

Sets are unordered collections of distinct hashable objects. But what does it mean for an object to be hashable? That’s a more technical topic, and we will not go into details here. In practice, it means you can use sets for immutable objects like numbers and strings, but not for mutable objects like lists and dictionaries.
There are two types of sets. One is called just a "set", and the other is called a "frozen set". The difference between the two is that a frozen set cannot be changed once it has been created. In other words, it’s immutable. In contrast, your usual, normal set is mutable.
You can think of a set as an unordered collection of objects. One of the key ideas about sets is that they cannot be indexed, so the objects inside a set don’t have positions. Another key feature of sets is that the elements can never be duplicated. So if you have a given element in your set, say the number 3, and you try adding that number to the set again, nothing happens. This means that all of the objects inside a set are always unique, or distinct. Python sets are especially useful for keeping track of distinct objects and for doing mathematical set operations like unions, intersections, and set differences.
Let’s next experiment with using sets. Let me start by creating an empty set. I’m going to create an object, a set that I’m going to call ids. The idea is that this would contain the distinct ids in my study or my data set.
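
The properties described above can be checked in a few lines. This is only a minimal sketch: the variable names (members, frozen, numbers) and the sample values are made up for illustration and do not come from the lecture.

```python
# Numbers and strings are hashable, so they can be members of a set.
members = {1, 2, "a", "b"}

# A list is mutable and therefore unhashable; adding one raises TypeError.
try:
    members.add([3, 4])
    list_rejected = False
except TypeError:
    list_rejected = True

# A regular set is mutable: it can grow after it has been created.
members.add(5)

# A frozenset is immutable: it has no add() method at all.
frozen = frozenset([1, 2, 3])
frozen_has_add = hasattr(frozen, "add")

# Adding an element that is already present leaves the set unchanged.
numbers = {1, 2, 3}
numbers.add(3)

# Sets have no positions, so indexing such as numbers[0] raises TypeError.
try:
    numbers[0]
    indexing_rejected = False
except TypeError:
    indexing_rejected = True
```

The try/except blocks are there only so that the sketch runs from top to bottom; at an interactive prompt you would simply see the TypeError.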
I can create an empty set by just calling the built-in set function, that is, typing set followed by a pair of parentheses. In this case, I would have created a set called ids, and it would be empty. It would have no objects in it.
Let’s say that I want to do something a little different: I’d like to create a set that has a few members in it. In this case, the syntax is very similar. I type set, followed by parentheses, and inside the parentheses I insert a list. Let’s say that the ids of our subjects are the following: 1, 2, 4, 6, 7, 8, and 9. This is my initial set. If I want to ask how many members I have in this set, I can use the len function. Python tells me that I have seven objects in this set.
Let’s say I want to add one more id to this set, id number 10. So I type ids.add, and I add an object with the id number 10 to my set. If I type ids, Python shows me the current members of the set, and id number 10 has been added. If I try adding, let’s say, number 2, which I already have in my set, and then ask what the members of the set are now, you’ll see that nothing has happened. This is one of the key features of sets. In other words, if you already have an object in the set and you try adding that same object again, nothing happens.
We can remove members from sets using the pop method. In that case, Python removes an arbitrary member of the set and returns it to you.
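
The steps so far might look roughly like the following. The set name ids and its values come from the text; the helper names (empty_size, initial_size, size_after_adds, removed) are introduced here only so the results can be inspected.

```python
# Create an empty set by calling set() with no arguments.
ids = set()
empty_size = len(ids)          # no members yet

# Create a set with a few members by passing a list to set().
ids = set([1, 2, 4, 6, 7, 8, 9])
initial_size = len(ids)        # seven members

# Add a new id, then try to add one that is already present.
ids.add(10)
ids.add(2)                     # 2 is already a member, so nothing changes
size_after_adds = len(ids)     # eight members

# pop() removes an arbitrary member and returns it.
removed = ids.pop()
```

Which member pop() removes is not something to rely on, which is why the sketch stores the result in removed rather than assuming a particular value.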
So I can run this a couple of times. If I look at the contents of my set, I can see that I now have five objects remaining in it.
Let me redefine my ids set. Let’s say that it consists of individuals with ids ranging from 0 to 9. I can look at the contents, and this looks correct. Imagine that each of these individuals is either male or female. I’m going to construct a set that I’m going to call males. It’s a set, and I build it from a list. Let’s say that these are the ids of the males.
A very useful property of sets is that we can use them for mathematical set operations. I can now use the set males to define a new set that I’m going to call females. I define females as all of the ids minus males. If I ask Python what the type of females is, Python tells me it’s a set. I can look at the contents of that set, and I can also look at the contents of my males set. I see that these two have no members in common.
There are other ways to perform set operations in Python. For example, I can perform the set union operation in a very handy way. Let’s say that I want to create a set that I’m going to call everyone, which consists of all of the males and all of the females. The shorthand for a set union in Python is the vertical bar (|). Again, if I look at the contents of the set everyone, I can see that all of the set members are there.
Finally, I can take the intersection of two sets using the ampersand (&) operator.
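
The set difference and union described above can be sketched as follows. The text does not list which ids belong to males, so the values used here are hypothetical, and building ids with range(10) is an assumption about how the set was constructed.

```python
# All ids from 0 to 9 (assumed to be built with range).
ids = set(range(10))

# Hypothetical male ids; the text does not give the actual values.
males = set([1, 3, 5, 6, 7])

# Set difference: every id that is not in males.
females = ids - males
females_type = type(females)   # the result of a difference is again a set

# Set union with the vertical bar: all males together with all females.
everyone = males | females
```

With these assumed values, females contains 0, 2, 4, 8, and 9, and everyone ends up equal to the original ids set.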
Let’s say I want to take everyone and intersect it with another set. I can define another set, which in this case consists of the ids 1, 2, and 3. Then I can ask Python to return everybody who is in the intersection of these two sets, one containing the members 1, 2, and 3 and the other containing everybody. In this case, the answer is the set that consists of the ids 1, 2, and 3.
As a simple application of sets, let’s use them to count the number of unique letters in a word. Let me first define my word of interest. Let’s go with something a little more complicated, something like antidisestablishmentarianism. I spelt that right. What I can do next is construct a set from my string, which is called "word", by passing it to set. I’m going to call this set "letters". To find out how many unique letters I have in this word, I just ask Python to return the length of the letters object, which is 12. So in this case, we were able to use a set to simply count the number of unique letters in a string.
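
As a rough sketch of these last two examples: the names word and letters come from the text, while some, common, and unique_count are hypothetical names chosen here, and everyone is assumed to hold the ids 0 through 9.

```python
# Assumed contents of everyone: the ids 0 through 9.
everyone = set(range(10))

# Another set holding the ids 1, 2, and 3 (the name "some" is made up).
some = set([1, 2, 3])

# Set intersection with the ampersand: members present in both sets.
common = everyone & some

# Counting the unique letters in a word: a set keeps one copy of each letter.
word = "antidisestablishmentarianism"
letters = set(word)
unique_count = len(letters)
```

Because a string is a sequence of characters, passing it to set() yields one member per distinct character, so len() gives the number of unique letters directly.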
