1 00:00:00,960 --> 00:00:01,740 In this lesson, 2 00:00:01,740 --> 00:00:06,120 I want to show you how you can use loops with pandas data frames and how to 3 00:00:06,120 --> 00:00:11,010 iterate over a pandas data frame. So here, I've got a simple dictionary, 4 00:00:11,280 --> 00:00:15,720 I've got two keys, student and score, and under student 5 00:00:15,750 --> 00:00:18,870 I've got a list of student names, and under score 6 00:00:18,870 --> 00:00:21,480 I've got a list of their corresponding scores. 7 00:00:21,990 --> 00:00:26,990 Now we know that we can loop through a dictionary very simply by creating a for 8 00:00:28,500 --> 00:00:30,540 loop and then we say, well, 9 00:00:30,600 --> 00:00:35,600 we're going to go through each of the keys and values inside this student 10 00:00:36,120 --> 00:00:36,953 dictionary. 11 00:00:37,230 --> 00:00:41,190 And then we're going to get all of the items in order to be able to loop through 12 00:00:41,190 --> 00:00:45,360 it. So now when I print each of the keys, 13 00:00:45,690 --> 00:00:49,770 you can see that it goes through the dictionary and prints both of the keys. 14 00:00:50,910 --> 00:00:54,240 And similarly, I can get it to loop through both of the values. 15 00:00:54,660 --> 00:00:57,540 So this is how we've been looping through dictionaries 16 00:00:57,870 --> 00:01:01,020 and we've been using it in our dictionary comprehensions. 17 00:01:01,890 --> 00:01:05,820 Now you can loop through a data frame in the same way that you loop through a 18 00:01:05,820 --> 00:01:07,800 dictionary. In a lot of ways, 19 00:01:07,830 --> 00:01:12,360 you can consider a data frame pretty much as if you're working with a Python 20 00:01:12,360 --> 00:01:16,170 dictionary. So I'm going to go ahead and import pandas 21 00:01:16,710 --> 00:01:20,730 and I'm going to use pandas to create a new data frame, 22 00:01:21,390 --> 00:01:24,840 and it's going to be created from our student dictionary.
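A minimal sketch of the steps described so far, for anyone reading without the screen recording. Only the two keys, student and score, and Angela's score of 56 (mentioned at the end of the lesson) come from the lesson itself. The name student_dict and the other two names and scores are placeholders assumed for illustration.

```python
import pandas

# Assumed sample data: only the keys and Angela's 56 come from the lesson.
student_dict = {
    "student": ["Angela", "Student B", "Student C"],
    "score": [56, 70, 90],
}

# Looping through a dictionary: .items() yields (key, value) pairs.
for key, value in student_dict.items():
    print(key)    # the key: "student", then "score"
    print(value)  # the list stored under that key

# Building a data frame from the dictionary: each key becomes a column.
student_data_frame = pandas.DataFrame(student_dict)
print(student_data_frame)
```

Each dictionary key becomes a column title, and each list becomes the data in that column.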
23 00:01:25,230 --> 00:01:26,760 So you've seen all of this before, 24 00:01:26,820 --> 00:01:31,820 and I'll just call this the student_data_frame and I can print it for you to see 25 00:01:34,080 --> 00:01:34,910 what it looks like. 26 00:01:34,910 --> 00:01:35,743 Okay. 27 00:01:38,630 --> 00:01:40,280 This is our data frame. 28 00:01:40,280 --> 00:01:45,280 It looks like a pretty standard table with the first column being all of the 29 00:01:45,290 --> 00:01:49,280 indices. So at index zero is this first row, 30 00:01:49,940 --> 00:01:53,480 and that basically denotes the index of each row. 31 00:01:54,230 --> 00:01:56,660 Now working with this data frame, 32 00:01:56,750 --> 00:02:01,750 we can actually loop through a data frame using the same method as before. 33 00:02:02,990 --> 00:02:07,990 So we can say for key, value in our student_data_frame.items(). 34 00:02:14,390 --> 00:02:17,690 So if I print each of the keys, 35 00:02:18,980 --> 00:02:23,180 you can see it's just going to give me the title of each column. 36 00:02:23,810 --> 00:02:26,720 But if I print each of the values, 37 00:02:28,520 --> 00:02:32,030 then it's going to give me the data in each of the columns. 38 00:02:32,660 --> 00:02:37,280 Now this is not particularly useful because it's basically just looping through 39 00:02:37,610 --> 00:02:42,110 the names of our columns and then the data inside each column. 40 00:02:42,710 --> 00:02:46,430 This is why pandas has an inbuilt loop 41 00:02:47,180 --> 00:02:50,690 and it's a method called iterrows. 42 00:02:51,140 --> 00:02:56,140 And it allows us to loop through each of the rows of the data frame rather than 43 00:02:56,540 --> 00:02:57,680 each of the columns. 44 00:02:58,490 --> 00:03:03,490 And the way that we do that is we again use a for loop and then we can get hold 45 00:03:03,910 --> 00:03:07,030 of the index of each row, 46 00:03:07,030 --> 00:03:10,570 so that corresponds to the number in that first column.
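A short sketch of the column-wise loop just described, again assuming placeholder names and scores (only the column titles and Angela's 56 come from the lesson). It shows that looping over .items() on a data frame visits one column at a time, not one row.

```python
import pandas

# Assumed sample data: only the keys and Angela's 56 come from the lesson.
student_dict = {
    "student": ["Angela", "Student B", "Student C"],
    "score": [56, 70, 90],
}
student_data_frame = pandas.DataFrame(student_dict)

# .items() on a data frame yields (column title, column data) pairs.
for key, value in student_data_frame.items():
    print(key)    # the column title: "student", then "score"
    print(value)  # a pandas Series holding that column's data
```

The loop body runs twice here, once per column, which is why this approach is not much help when you want to work row by row.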
47 00:03:11,050 --> 00:03:14,320 And then we can get hold of the data in the row. 48 00:03:15,010 --> 00:03:19,660 And then we can say for index, row in data frame, 49 00:03:19,690 --> 00:03:23,530 which is student_data_frame, and then it's that method, 50 00:03:23,530 --> 00:03:24,850 .iterrows(). 51 00:03:26,290 --> 00:03:31,290 And now I can loop through each of those rows and print out the index for 52 00:03:34,150 --> 00:03:35,260 each of those rows. 53 00:03:36,250 --> 00:03:39,820 So you can see that this is going to print out our data frame here, 54 00:03:40,150 --> 00:03:43,360 and then print out each index in order: 0, 1, 2. 55 00:03:43,750 --> 00:03:46,900 But I can also print out each of the rows. 56 00:03:47,530 --> 00:03:52,530 So now I get the first row has a student and a score, 57 00:03:53,380 --> 00:03:57,310 the second row has a student and a score, and the third row has a student and 58 00:03:57,310 --> 00:03:58,143 a score. 59 00:03:58,540 --> 00:04:03,540 So each of these rows is a pandas Series object. So that means we can tap into the 60 00:04:04,480 --> 00:04:09,480 row and then get hold of the value under a particular column by using the dot 61 00:04:10,690 --> 00:04:13,930 notation. So we can say row.student 62 00:04:14,470 --> 00:04:16,360 and now when it goes through the loop, 63 00:04:16,690 --> 00:04:19,540 you can see first, it's going to print out our entire data frame, 64 00:04:19,870 --> 00:04:24,490 and then it's going to print out each of the students inside that data frame. 65 00:04:25,090 --> 00:04:28,240 Now I can also say row.score, 66 00:04:28,900 --> 00:04:32,320 and now it's going to give me each of the scores inside the data frame.
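A rough sketch of the row-wise loop described above. The names and scores other than Angela's 56 are placeholders assumed for illustration; iterrows, the index, and the dot notation are as described in the lesson.

```python
import pandas

# Assumed sample data: only the keys and Angela's 56 come from the lesson.
student_dict = {
    "student": ["Angela", "Student B", "Student C"],
    "score": [56, 70, 90],
}
student_data_frame = pandas.DataFrame(student_dict)

# .iterrows() yields (index, row) pairs; each row is a pandas Series.
for index, row in student_data_frame.iterrows():
    print(index)        # the row's index: 0, then 1, then 2
    print(row.student)  # the value under the "student" column
    print(row.score)    # the value under the "score" column
```

Dot notation works here because student and score are valid Python names that don't clash with existing Series attributes. The bracket form row["student"] does the same lookup and also handles column titles that contain spaces.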
67 00:04:32,740 --> 00:04:35,320 And I can even do something like this where I say 68 00:04:35,410 --> 00:04:40,410 if the row.student is equal to Angela, 69 00:04:41,440 --> 00:04:46,440 well then we can print that particular row that we're currently looping on, 70 00:04:47,020 --> 00:04:51,850 .score. And this way we would get the student, Angela's score 71 00:04:51,880 --> 00:04:56,050 which happens to be 56, as you can verify here.
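A small sketch of that last check, assuming the same placeholder data as before (Angela's score of 56 is from the lesson; the other names and scores are made up for illustration).

```python
import pandas

# Assumed sample data: only the keys and Angela's 56 come from the lesson.
student_dict = {
    "student": ["Angela", "Student B", "Student C"],
    "score": [56, 70, 90],
}
student_data_frame = pandas.DataFrame(student_dict)

# Print the score only for the row whose student is Angela.
for index, row in student_data_frame.iterrows():
    if row.student == "Angela":
        print(row.score)  # prints 56
```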