1 00:00:00,390 --> 00:00:01,770 Now in the last lesson, 2 00:00:01,800 --> 00:00:06,800 we looked at various ways of finding and locating elements on a particular HTML 3 00:00:07,920 --> 00:00:11,760 page. Now, in this lesson, we're going to put all of that into practice 4 00:00:12,030 --> 00:00:16,890 and you're going to get hold of all of these upcoming events. Now, 5 00:00:16,890 --> 00:00:19,470 because these events are time-dependent, 6 00:00:19,920 --> 00:00:23,400 the events that you'll see will of course be different from what I've got here. 7 00:00:24,120 --> 00:00:27,750 But the idea is to get hold of all of these dates, 8 00:00:27,780 --> 00:00:32,430 so all five of these, and then get hold of all five of these names, 9 00:00:32,670 --> 00:00:35,550 and we're going to create a dictionary from these events. 10 00:00:36,030 --> 00:00:40,470 And by the end of the challenge, you should be able to print out a dictionary 11 00:00:40,470 --> 00:00:41,970 that's structured like this. 12 00:00:42,510 --> 00:00:47,510 It's going to contain five items. The keys start from zero, so this is the first key, 13 00:00:49,440 --> 00:00:54,440 and then the first value is a dictionary with a key of time and a key of name. 14 00:00:55,890 --> 00:00:59,100 And then the values correspond to, of course, 15 00:00:59,100 --> 00:01:01,950 the first date and the first name. 16 00:01:01,980 --> 00:01:06,000 So the first one we've got here is PyCon JP 2020 17 00:01:06,420 --> 00:01:10,920 and you can see down here in this section, upcoming events, 18 00:01:11,040 --> 00:01:13,110 the first one is this item. 19 00:01:13,590 --> 00:01:18,270 So we're basically converting whatever is here in the upcoming events into this 20 00:01:18,300 --> 00:01:21,720 dictionary format. That is the goal 21 00:01:21,930 --> 00:01:25,800 and you're going to need everything that you've learned previously. 
22 00:01:26,130 --> 00:01:26,963 In addition, 23 00:01:27,090 --> 00:01:32,070 it might be worth taking a look at the documentation for locating elements with 24 00:01:32,070 --> 00:01:36,480 Selenium. So I'll link to this page in the course resources as well, 25 00:01:36,870 --> 00:01:41,870 and once you're ready, get inspecting and see if you can complete this challenge 26 00:01:42,540 --> 00:01:46,200 and print out this dictionary. Pause the video now. 27 00:01:47,600 --> 00:01:48,433 All right. 28 00:01:53,180 --> 00:01:55,700 So our goal is to get hold 29 00:01:55,700 --> 00:01:59,600 of this piece of data and this piece of data, 30 00:01:59,630 --> 00:02:00,860 but just the text. 31 00:02:01,370 --> 00:02:05,750 So we're going to have to first figure out how to locate these items. 32 00:02:06,170 --> 00:02:08,960 So if I go ahead and inspect this date, 33 00:02:09,470 --> 00:02:13,700 you can see that it's inside an HTML element called time. 34 00:02:14,690 --> 00:02:18,230 Now, in order to get hold of this time element, 35 00:02:18,320 --> 00:02:20,960 we can of course use an XPath. 36 00:02:21,380 --> 00:02:26,380 But the problem is that this XPath will be specific to this first item, 37 00:02:27,050 --> 00:02:32,050 whereas we actually probably want to use a find method that can find all of the 38 00:02:33,770 --> 00:02:36,680 dates and all of the event names. 39 00:02:37,160 --> 00:02:40,880 So we're going to be using one of these multiple element finds, 40 00:02:41,120 --> 00:02:43,280 so where it says find elements. 41 00:02:44,180 --> 00:02:47,660 Now I think the easiest way here, at least for me, 42 00:02:47,870 --> 00:02:51,050 is to think about it in terms of CSS selectors. 
43 00:02:51,710 --> 00:02:56,710 This is a time element that lives inside an li 44 00:02:57,290 --> 00:03:00,940 which is inside a ul, but we still haven't found anything 45 00:03:00,940 --> 00:03:04,180 that's unique to this particular structure here, 46 00:03:04,720 --> 00:03:08,680 because if you take a look over here, when we look at the latest news, 47 00:03:08,950 --> 00:03:12,100 it's also a time inside an li inside a ul. 48 00:03:12,520 --> 00:03:17,520 And none of this is unique until we get to this div where we have a blog widget 49 00:03:19,450 --> 00:03:23,740 while here we've got an event widget. 50 00:03:24,130 --> 00:03:27,190 So there finally is a unique class name. 51 00:03:27,730 --> 00:03:32,730 So we're going to use this class name and then find the time element. Back in our 52 00:03:34,840 --> 00:03:35,290 code, 53 00:03:35,290 --> 00:03:40,290 let's go ahead and tap into our driver and then use the 54 00:03:41,710 --> 00:03:46,180 find_elements_by_css_selector method, which is going to give us a list of elements that match the selector. 55 00:03:46,630 --> 00:03:51,630 And the selector is going to first find a div with this particular class. 56 00:03:52,330 --> 00:03:55,300 So it has to have the class event-widget, 57 00:03:55,630 --> 00:03:58,990 so we write .event-widget. And then after a space, 58 00:03:59,020 --> 00:04:02,230 we specify the next thing we want to drill down to, 59 00:04:02,650 --> 00:04:04,390 which is a time element. 60 00:04:04,840 --> 00:04:08,320 So let's go ahead and just put the name of the HTML element like this, 61 00:04:08,650 --> 00:04:13,030 and we'll get the event times as a list, hopefully. 
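As a rough sketch of the step just described: the selector is the class `.event-widget`, then a space, then the tag name `time`. The helper name `find_event_times` is made up for illustration. This sketch also assumes a newer Selenium 4 release, where, to my knowledge, the `find_elements_by_css_selector` shortcut is no longer available, so it calls `find_elements` with the strategy string `"css selector"` (the value behind `By.CSS_SELECTOR`) instead.

```python
# Class selector, a space (descendant), then the tag name, as in the lesson.
EVENT_TIME_SELECTOR = ".event-widget time"


def find_event_times(driver):
    """Return every <time> element nested inside an element with class event-widget.

    `driver` is assumed to be a Selenium WebDriver that has already loaded the page.
    """
    # "css selector" is the string value of Selenium's By.CSS_SELECTOR.
    return driver.find_elements("css selector", EVENT_TIME_SELECTOR)
```

With a real browser session you would pass in your own driver, for example `event_times = find_event_times(driver)`, and get back a list of element objects rather than text.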
62 00:04:13,480 --> 00:04:16,779 So let's go ahead and print this out, event_times, 63 00:04:17,260 --> 00:04:22,029 and because this is actually going to be a list of Selenium objects 64 00:04:22,060 --> 00:04:24,190 rather than the actual text, 65 00:04:24,610 --> 00:04:28,690 we'll need to use a for loop in order to actually see what it is. 66 00:04:29,020 --> 00:04:31,690 So for time in event_times, 67 00:04:31,930 --> 00:04:35,950 let's go ahead and print time.text for each one. 68 00:04:39,460 --> 00:04:41,350 So once Selenium has done its thing, 69 00:04:41,380 --> 00:04:46,120 you can see it's now got hold of all five dates. 70 00:04:47,950 --> 00:04:48,280 Now, 71 00:04:48,280 --> 00:04:53,200 the next thing we need to do is to get hold of the event names, 72 00:04:53,710 --> 00:04:55,810 and we're going to use a similar method. 73 00:04:55,810 --> 00:05:00,810 So let's go ahead and inspect the name and you can see that this is now an 74 00:05:02,200 --> 00:05:03,033 anchor tag. 75 00:05:03,580 --> 00:05:08,580 Now you might think that the solution will be just as simple as copying what we 76 00:05:08,950 --> 00:05:13,950 had before and replacing the time element with an anchor tag element. 77 00:05:14,920 --> 00:05:19,840 But you'll see as I write my for loop, for name in event_names, 78 00:05:21,370 --> 00:05:23,230 print name.text. 79 00:05:24,070 --> 00:05:29,070 This does not actually get us what we want because it also gives us the first 80 00:05:30,640 --> 00:05:34,540 anchor tag in that div with class name event-widget 81 00:05:34,900 --> 00:05:39,370 which is this 'more' link here. So if we don't want that more link, 82 00:05:39,670 --> 00:05:42,070 we're going to have to be a little bit more creative. 83 00:05:42,760 --> 00:05:47,760 This anchor tag is also inside an li while that more link is definitely not 84 00:05:50,500 --> 00:05:55,210 inside an li. 
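The printing loops described above, for the dates and for the names, can be sketched like this. Because a real browser isn't available in a standalone snippet, `SimpleNamespace` objects stand in for Selenium elements, and the text values are placeholders rather than real event data. The helper `element_texts` is a made-up name.

```python
from types import SimpleNamespace


def element_texts(elements):
    """Collect the .text of each element, in the order they were found."""
    return [element.text for element in elements]


# Stand-ins for Selenium elements; placeholder values, not real event data.
event_times = [
    SimpleNamespace(text="first date"),
    SimpleNamespace(text="second date"),
]

# Printing the list itself would only show the objects, so loop and print .text.
for time in event_times:
    print(time.text)
```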
So we can narrow down our selector by saying, 85 00:05:55,270 --> 00:05:59,060 okay, so it's inside an element with class event-widget, 86 00:05:59,360 --> 00:06:03,620 but then it's inside an li, and inside that li is the anchor tag. 87 00:06:04,220 --> 00:06:09,080 Now that I've updated that CSS selector, you can see when I hit print, 88 00:06:09,200 --> 00:06:12,770 it gets us the actual names of all of the conferences. 89 00:06:13,520 --> 00:06:18,520 So the final thing to do is to actually create our events dictionary. 90 00:06:20,000 --> 00:06:22,310 You could do this using dictionary comprehension, 91 00:06:22,580 --> 00:06:25,460 but I'm going to do it in a slightly more long form way 92 00:06:25,490 --> 00:06:28,070 just so that for anybody who's a little bit confused, 93 00:06:28,340 --> 00:06:30,290 it'll be a little bit easier to understand. 94 00:06:31,070 --> 00:06:34,880 So we're going to create a for loop and I'm going to use n. So I'm going to say 95 00:06:34,880 --> 00:06:36,530 for n in range 96 00:06:36,860 --> 00:06:41,860 and the range is going to be from zero to the length of event_times. 97 00:06:42,590 --> 00:06:45,530 So it's basically going to be a range from zero to four. 98 00:06:46,250 --> 00:06:50,600 Once I've got that range, now I'm going to add to my events. 99 00:06:51,320 --> 00:06:55,220 The key of the event is the actual n, so the number, 100 00:06:55,610 --> 00:07:00,610 and then the value of the event is a dictionary with the key of time 101 00:07:01,460 --> 00:07:06,410 and also a key of name, like this. 102 00:07:07,010 --> 00:07:11,780 Now the time is going to be from event_times, getting hold of the 103 00:07:11,780 --> 00:07:13,580 item at index n, 104 00:07:13,940 --> 00:07:18,940 and the name is going to be from event_names, the item at index 105 00:07:19,130 --> 00:07:19,963 n. 
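Here is a minimal sketch of that long-form loop. To keep the shape of the loop easy to see, the two lists hold plain placeholder strings instead of whatever the driver returns. The function name `build_events` and the sample values (apart from the PyCon JP 2020 name from the lesson) are assumptions for illustration.

```python
def build_events(event_times, event_names):
    """Pair up the item at each index n into {n: {"time": ..., "name": ...}}."""
    events = {}
    # range(len(...)) runs from 0 up to, but not including, the list length.
    for n in range(len(event_times)):
        events[n] = {
            "time": event_times[n],
            "name": event_names[n],
        }
    return events


# Placeholder data: only "PyCon JP 2020" comes from the lesson itself.
sample_times = ["date 0", "date 1"]
sample_names = ["PyCon JP 2020", "second event"]

events = build_events(sample_times, sample_names)
print(events)
```

This assumes both lists have the same length and the same order, which is what you'd expect when each event on the page has exactly one date and one name.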
106 00:07:21,680 --> 00:07:26,680 Now the final thing we need to deal with is that this gets hold of a Selenium object, and we 107 00:07:27,290 --> 00:07:31,340 have to get hold of the actual text. So let's write .text. 108 00:07:31,700 --> 00:07:34,550 And now we can print our events dictionary. 109 00:07:36,260 --> 00:07:41,000 And once that's done, you can see this is in the exact format that we wanted. 110 00:07:41,360 --> 00:07:43,640 We've got a dictionary with five items, 111 00:07:43,820 --> 00:07:47,450 and each item is a dictionary in itself, with the time 112 00:07:47,690 --> 00:07:50,870 and name of the upcoming Python conferences. 113 00:07:52,040 --> 00:07:55,370 So did you manage to complete that challenge? If not, 114 00:07:55,460 --> 00:07:58,550 it might be worth either reviewing CSS selectors, 115 00:07:58,580 --> 00:08:02,990 which we went through in previous lessons, or reviewing some of the earlier lessons 116 00:08:02,990 --> 00:08:04,760 where we discussed 117 00:08:04,760 --> 00:08:09,760 how to locate and how to get hold of the text from elements using Selenium.
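For anyone curious about the dictionary comprehension mentioned as the alternative to the long-form loop, here is one possible sketch that also pulls the whole challenge together, including the narrowed `.event-widget li a` selector and the `.text` step. The function name `scrape_events` is hypothetical, `driver` is assumed to be a Selenium WebDriver with the page already loaded, and the `find_elements("css selector", ...)` form assumes a newer Selenium 4 release.

```python
# Dates: any <time> inside the event widget.
TIME_SELECTOR = ".event-widget time"
# Names: only anchors inside an li, which skips the 'more' link.
NAME_SELECTOR = ".event-widget li a"


def scrape_events(driver):
    """Build {n: {"time": ..., "name": ...}} from the upcoming events list."""
    # "css selector" is the string value of Selenium's By.CSS_SELECTOR.
    times = driver.find_elements("css selector", TIME_SELECTOR)
    names = driver.find_elements("css selector", NAME_SELECTOR)
    # Same pairing as the for loop, written as a dictionary comprehension;
    # .text turns each element object into its visible text.
    return {
        n: {"time": times[n].text, "name": names[n].text}
        for n in range(len(times))
    }
```

The loop version and the comprehension should produce the same dictionary, so it's really a matter of which one you find easier to read.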