Python Data Analysis · Using the NumPy Random Module

2.4.3: Using the NumPy Random Module

NumPy makes it possible to generate all kinds of random variables. We'll explore just a couple of them to get you familiar with the NumPy random module. There are two reasons for using NumPy to deal with random variables: first, it offers a broad range of different kinds of random variables, and second, it is very fast. Let's start by generating numbers from the standard uniform distribution, which is the completely flat distribution between 0 and 1, so that any floating-point number between these two endpoints is equally likely. (Strictly speaking, np.random.random draws from the half-open interval [0, 1): it can return 0 but never 1.) As usual, we first import NumPy as np. To generate just one realization from this distribution, we type np.random.random(). We can use the same function to generate multiple realizations, that is, an array of random numbers from the same distribution. To generate a 1d array, I simply pass the size of that array, say 5 in this case: np.random.random(5) returns five random numbers drawn from the uniform distribution on [0, 1). It's also possible to use the same function to generate a 2d array of random numbers. In this case, inside the parentheses we pass the dimensions of the array as a tuple: the first element is the number of rows and the second is the number of columns. Calling np.random.random((5, 3)) generates a 2d table of random numbers with five rows and three columns. Let's then look at the normal distribution.
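Before moving on to the normal distribution, here is a minimal sketch of the three uniform calls just described. The seed value is an arbitrary addition so the output is reproducible; it is not part of the original walkthrough.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed, only for reproducible output

one = np.random.random()          # a single float drawn from [0, 1)
row = np.random.random(5)         # 1d array with 5 draws
table = np.random.random((5, 3))  # 2d array: 5 rows, 3 columns (shape passed as a tuple)

print(one)
print(row.shape)    # (5,)
print(table.shape)  # (5, 3)
```

In NumPy 1.17 and later, the documentation recommends the Generator interface for new code (rng = np.random.default_rng(), then rng.random, rng.normal and rng.integers); the legacy np.random functions used in this section still work.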
The normal distribution requires the mean and the standard deviation as its input parameters. Here I'd like to set the mean equal to 0 and the standard deviation equal to 1, which gives us the so-called standard normal distribution. To be clear, there are infinitely many normal distributions, one for each choice of parameter values, but only the one with mean 0 and standard deviation 1 has its own name: the standard normal distribution. To generate random numbers from the standard normal distribution, or from a normal distribution in general, we use the np.random.normal function. The first argument is the mean of the distribution, in this case 0, and the second argument is the standard deviation, here 1. The call np.random.normal(0, 1) draws one realization, a single number, from the standard normal distribution. If we'd instead like a 1d array of numbers from the same distribution, we specify the length of the array as the third argument: np.random.normal(0, 1, 5) generates an array of five numbers. Finally, we can use the same function to generate 2d or even 3d arrays of random numbers. In that case we need another pair of parentheses, because the dimensions of the array are passed as a tuple. To generate a 2d array with two rows and five columns, those two numbers go inside the tuple, and we still use the same function: np.random.normal(0, 1, (2, 5)).
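The three forms of np.random.normal described above can be sketched as follows. The large-sample check at the end is an addition for illustration: with many draws, the sample mean and standard deviation should land close to the parameters 0 and 1. The seed is arbitrary.

```python
import numpy as np

np.random.seed(1)  # arbitrary seed for reproducibility

z = np.random.normal(0, 1)          # one draw from the standard normal
v = np.random.normal(0, 1, 5)       # 1d array of 5 draws
m = np.random.normal(0, 1, (2, 5))  # 2d array: 2 rows, 5 columns

# Sanity check: a large sample should have mean close to 0 and std close to 1
big = np.random.normal(0, 1, 100_000)

print(z)
print(v.shape, m.shape)  # (5,) (2, 5)
print(round(big.mean(), 2), round(big.std(), 2))
```

The positional arguments correspond to the keyword parameters loc, scale and size, so np.random.normal(loc=0, scale=1, size=(2, 5)) is an equivalent, more explicit spelling.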
Let's revisit our example where we roll 10 dice and add the results together. Remember, we defined a random variable y as the sum of the random variables x1 through x10, where each x variable is a standard six-sided die with the numbers 1 to 6 on its faces. We can code this example using NumPy arrays. My overall strategy is to generate a table in which each element corresponds to one roll of a die, so each element is an integer between 1 and 6. If the table has 10 columns, I can then create the variable y by summing each row of the table across all of its columns. The number of rows in the table is then equal to the number of realizations of y that I would like to generate. Let's look at this on the whiteboard. I'll sketch a small version of the table, drawing only a few of its rows and columns. The first column is x1, the second column is x2, and so on up to the last column, x10. The rows of the table correspond to different realizations of y. For example, my first realization of y, called y1, is the sum across the first row, over all 10 columns of the table. My second realization, y2, is the sum across the second row, again over all 10 columns. The only problem is that we don't yet know how to generate an array of random integers in NumPy. Let's Google it. The first hit is numpy.random.randint, which looks promising, so let's take a closer look at its help page.
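Before looking at the NumPy version, here is a plain-Python sketch of the definition of y, drawing realizations one at a time with the standard library's random module. It is an illustration under our own assumptions, not the earlier course code, and the function name roll_sum is hypothetical. Note that the standard library's random.randint(1, 6) includes both endpoints, unlike NumPy's randint, whose upper bound is exclusive.

```python
import random

def roll_sum(n_dice=10):
    """Return one realization of y = x1 + ... + xn for n fair six-sided dice."""
    # random.randint(a, b) includes both a and b, so each roll is 1..6
    return sum(random.randint(1, 6) for _ in range(n_dice))

random.seed(0)  # arbitrary seed for reproducibility
ys = [roll_sum() for _ in range(1000)]  # 1000 realizations of y

print(min(ys), max(ys))
```

Every value must fall between 10 (all ones) and 60 (all sixes); the loop over realizations is exactly what the NumPy table replaces.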
If we look at the input arguments, we see that at least one argument (low) must be provided, and we can optionally provide two more: high and size. (Recent NumPy versions also accept a dtype argument.) Note that high is exclusive: np.random.randint(1, 7) returns an integer from 1 to 6, which is exactly one die roll. Let's try out this function. I'll type np.random.randint(1, 7) to generate just one realization, and the function seems to be working fine. When dealing with relatively large amounts of data, such as large arrays, it's very helpful to start small when writing your code. Starting small keeps things much more manageable, and being able to see all of the data on the screen makes it much easier to locate potential problems. So instead of 10 columns, I'm going to use just three columns for now, and instead of 100 or 1 million rows, I'll use just 10 rows. Let's generate this random array in NumPy. I'm going to call this variable capital X and type np.random.randint with the same start and end points, 1 and 7, except that in this case we also need to provide a third argument, the size of the array. We would like 10 rows and 3 columns, so the call is X = np.random.randint(1, 7, (10, 3)). If we now look at X, we see that it has 10 rows and 3 columns. We can also check the shape of the array with its shape attribute, X.shape, and Python tells us that X has 10 rows and 3 columns: (10, 3). The next step is to sum each row of X across its columns. NumPy has a function called sum, but I'm not fully sure how to use it, so let's look at the documentation. We could also Google this, but in this case I'm going to use the command-line help in IPython.
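Before looking at np.sum, here is a sketch of the randint steps just described: the exclusive upper bound, the one-argument form, and the small 10 by 3 table. The one-argument call np.random.randint(6), which draws from 0 to 5, is included only to illustrate that low alone is enough; the 10,000-draw check and the seed are additions for illustration.

```python
import numpy as np

np.random.seed(2)  # arbitrary seed for reproducibility

one_roll = np.random.randint(1, 7)  # high is exclusive, so this is 1..6
zero_based = np.random.randint(6)   # only low given: draws from 0..5

# Many draws show that 7 never appears
many = np.random.randint(1, 7, 10_000)
print(np.unique(many))  # [1 2 3 4 5 6]

# Start small: 10 realizations (rows) of 3 dice (columns)
X = np.random.randint(1, 7, (10, 3))
print(X)
print(X.shape)  # (10, 3)
```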
To do that, I type np.sum followed by a question mark (np.sum?), which is IPython's syntax for displaying an object's documentation, and Python returns information about the np.sum function. Scrolling to the top of the page, I see that only one argument, a, is required: the array of elements to sum. The second, optional, argument is called axis. When using two-dimensional or higher-dimensional arrays, we use it to specify the dimension along which the sum is taken. Let's practice using the NumPy sum function. If we type np.sum(X), we get the sum over all of the elements of the array. We can also provide the optional argument axis, in this case equal to 0: np.sum(X, axis=0) sums down the rows, giving one total per column. We can also try summing over dimension 1: np.sum(X, axis=1) sums across the columns, giving one total per row. If we had a three-dimensional array, we could sum over its third dimension by setting axis equal to 2. When I run np.sum(X, axis=2) on our array, however, Python gives me an error saying that the axis is out of bounds. This is because I'm trying to sum over dimension 2, whereas X has only two dimensions, 0 and 1. Summarizing our findings: taking a sum over dimension 0 collapses the rows, and taking a sum over dimension 1 collapses the columns. I'm now ready to write my variable Y, which I define as Y = np.sum(X, axis=1). If I now inspect Y, I see that it has 10 elements, one per row of X, as expected. Let's now put our code together.
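Before assembling the full program, the effect of axis is easiest to see on a small fixed array rather than on random data. The array A below is a made-up example. The out-of-bounds error is caught as ValueError because NumPy's AxisError subclasses both ValueError and IndexError, and where AxisError lives in the NumPy namespace has changed between versions.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])  # 2 rows, 3 columns

print(np.sum(A))          # 21: every element
print(np.sum(A, axis=0))  # [5 7 9]: collapses the rows, one total per column
print(np.sum(A, axis=1))  # [ 6 15]: collapses the columns, one total per row

try:
    np.sum(A, axis=2)     # A has only axes 0 and 1
except ValueError as err:  # AxisError is a subclass of ValueError
    print("error:", err)

# Applied to the small dice table: one sum per row, i.e. one realization of y per row
X = np.random.randint(1, 7, (10, 3))
Y = np.sum(X, axis=1)
print(Y.shape)            # (10,)
```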
The first line of the full program again calls np.random.randint(1, 7, ...), but now with the actual dimensions of the array, 100 rows and 10 columns: X = np.random.randint(1, 7, (100, 10)). The variable Y is formed as a sum, so I use Y = np.sum(X, axis=1), where axis equal to 1 means summing along dimension 1 of the array, across the columns. To plot a histogram of Y, we can simply call plt.hist(Y), with matplotlib.pyplot imported as plt. Let's try running this code. Python plots a histogram that looks very similar to the one we saw before. But let's see what happens as we increase the number of rows in our table. I'll go back to my code and change 100 to 10,000. I'll also put a semicolon at the end of the plt.hist line to suppress the return value that IPython would otherwise print. The histogram now looks smoother. I can increase the number of rows further, to 1 million; remember that in this case we're generating 1 million realizations of the variable y. The histogram looks smoother still. This code is shorter than our earlier code for the same example, which didn't use NumPy. Another difference you probably noticed is that this code is much faster. For array operations like this one, NumPy code often runs more than 10 times faster than equivalent pure-Python loops, although the exact speedup depends on the task. In scientific computation, this makes a big difference.
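Here is a sketch of the full program wrapped in a function, so the number of rows is easy to change. The name simulate_dice_sums is hypothetical, and instead of drawing a histogram this version summarizes the result with np.bincount, so it runs without matplotlib; to reproduce the plot from the walkthrough, pass the returned array to plt.hist. With 10 dice, y ranges from 10 to 60 and its expected value is 10 * 3.5 = 35, so the sample mean should settle near 35 as the number of rows grows.

```python
import numpy as np

def simulate_dice_sums(n_rows, n_dice=10):
    """Return n_rows realizations of y, the sum of n_dice fair six-sided dice."""
    X = np.random.randint(1, 7, (n_rows, n_dice))  # one row per realization, one column per die
    return np.sum(X, axis=1)                       # sum across the columns of each row

np.random.seed(3)  # arbitrary seed for reproducibility
for n_rows in (100, 10_000, 1_000_000):
    Y = simulate_dice_sums(n_rows)
    print(n_rows, Y.min(), Y.max(), round(Y.mean(), 2))

# Count of each possible sum (index = value of y): a text stand-in for plt.hist(Y)
counts = np.bincount(Y, minlength=61)
print(counts[10:61])
```

After the loop, Y holds the 1 million realizations from the last iteration, which is the array the walkthrough's final histogram is drawn from.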
