Welcome to our last talk in this room today. Dave has over 20 years of professional experience building, breaking and learning about things. He does a mix of software engineering, cloud, SRE, DevOps and training, with a focus on automation, event-driven architectures and serverless, and he's going to talk to us about proactive Ops. [inaudible] Dave.

Cool, thanks. Yeah, so today I'll be talking to you about proactive Ops, or event-driven IT operations. Before we get into what it's all about, I want to start with a bit of a story. Those of you who have been on call may identify with this.

So it's 2:30 in the morning. You're fast asleep, and you hear this sound. You roll over. You look at the clock: it's 2:30 in the morning. You look at your phone: there are missed calls, there are messages. You get out of bed. Your first thought is this. Your second thought is this. You grab your laptop, you open it up, you open Slack, and
there are a few messages waiting for you. There's also a Zoom call going. You jump on Zoom. Everyone's got their camera off; no one wants to be there. You can tell it's going to be a long night.

You think that checking the logs might be a good place to start. There are some errors. It's actually a lot of errors. I told you: a lot. So then you think, the logs aren't really going to tell me much, so let's try to figure out how we got here. You take a look at the Git history, and then you start looking at it properly and realize that this isn't going to help you find the source of the problem.

It's running on AWS, so you go and have a look at the EC2 instance. It's not properly tagged, there's no really useful information set up there, and no one bothered to configure monitoring properly either.

So let's just pause for a second, and I'll talk about what I won't talk about today. I'm not going to talk about splitting the monolith. Have we got any managers in the room? Ah, okay.
You might think that DevOpsing harder will fix all of your problems. DevOpsing harder won't fix all your problems; I've seen teams try, and it doesn't work. I won't fix your Jira workflow, but if your Jira workflow looks like this, let's talk afterwards. I can't help you with that.

So let's rewind and figure out how we got here. The team's delivering features, you know, a bit every sprint, and they're feeling all right, but the technical debt is also piling up. There are a lot of alerts and a lot of notifications. It's generating a lot of noise, and it's really hard to tell what's real, what's not, and what the team needs to care about.

This is the Ops team on a normal day. They're dealing with the first issue, they're still struggling, and the next one comes along. It's just constantly feeling like that keeper.

So let's do a little retro. If you've got to go into the office on Friday, the boss should be buying you donuts. I don't make the rules, but if you're going into the office on Friday, you deserve donuts.
We should implement observability. We should stop having 2 a.m. outages. If it's your birthday, like the donuts, I don't make the rules: you should be getting the day off, and it should be a paid day off, even for contractors. When things go wrong, publish post-mortems; everyone should know what went wrong and why it went wrong. If you're using SAFe, please stop; it's not helping anyone. And we should start being proactive.

So how do we be proactive? We need to build a platform, and the platform is what should support the team in being proactive. Now, this image was generated by Midjourney, and you'll notice that it's a bit dodgy in parts. Your first proactive platform is going to be dodgy as well; you just need to iterate and it gets better. You don't want to see the first version of this image I got out of Midjourney; I had to iterate to get this one.

Okay, we want to make our customers happy. Our customers are the people we are here to keep happy, so we should be trying to make them look like this, like
this, and like this.

So what do we need in order to make our customers happy? We need events. We have some events that come around every couple of months, we have some events that it's probably too soon to talk about, and then there are some events that we pretend didn't really happen.

But where do we get the events from? The security nerds are probably going, "Oh, we're going to talk about Splunk." No, sorry, we're not going to talk about Splunk today. The data streaming nerds are like, "Oh, Kafka, yeah, cool." Sorry, I'm disappointing you as well. And the observability nerds are like, "He said implement observability, so we're going to talk about OTel." No, sorry: we're going to be talking about webhooks.

Lots of systems have webhooks, which give us data about what's happening in those systems. Tools like GitHub have webhooks. If you're an unwashed open source hippie, GitLab will give you webhooks. If you're stuck using Bitbucket, I'm really sorry, but you can get webhooks out of
Bitbucket. Also, if you're using Jira, you can get event data via webhooks. If you want a really nice ticketing system, use Linear (not sponsored). Asana, ClickUp and ServiceNow are happy to give you webhooks. Zendesk is less happy to give you webhook data; for some reason you tend to have to throw a lot of money at them and then grovel to your account manager. Speaking from experience. Help Scout. Loads of systems will give you events. If you're on AWS, you can get them out of CloudTrail, GuardDuty and Security Hub, and you get CloudWatch events out of various things as well.

So once we've got our events, we want a bus for transporting those events around. Why a bus? Well, buses are really useful, and they can be fun. You can catch them to go shopping, you can catch them to go on a holiday, or you can even catch one to get real coffee. Yeah, you can pick the guy who grew up in Melbourne.

So when we've got our bus on AWS, it's going to be EventBridge. For the
unwashed hippies, again, you can use TriggerMesh, or there are other services around that give you a bus. The cool thing about TriggerMesh is that it uses the CloudEvents standard, like a lot of other platforms and unlike AWS. But we're going to be talking about EventBridge during my talk today.

Another thing that's really useful is Step Functions, which allows us to separate our logic and our flow; we'll talk about that in a bit. Again, you can use Apache Airflow or other such tools, but I'll talk about Step Functions.

Here's a little example flow. We've got GitHub, and we've got a Lambda function with a function URL on it so we can send the events to it. The event goes into EventBridge, matches a rule, and goes to a Step Function. And because I really like doing animations in Keynote, we get to watch our event go "wee!" straight across to our Step Function. We'll talk a bit more about how all of this works.

So what
we want is Lambda functions. An open source alternative is OpenFaaS; there's OpenWhisk; there's a whole bunch of different tools out there, and it's up to you to find the one that works best. I hate managing servers, patching servers, all that stuff, and let's not get started on Kubernetes, so my preference is to use Lambda.

When I'm talking about Lambda functions, I'm not talking about big, long Lambda functions like this. I want to focus on writing really small Lambda functions that can be reused. So here we've got a Lambda function whose purpose is just to match Jira ticket references within any string, and we can reuse this function wherever we're interested in finding Jira ticket references. I love hating on Jira, but all my clients use it, so I use this.

Now, the alternative to doing this is that every time you need the regex to match a Jira ticket, you go to Stack Overflow or ChatGPT or wherever, you copy and paste a slightly different version, and then you
end up asking why it works as you expect here, but doesn't match the ticket when it runs through this bit of code. So if we have this code that we can reuse over and over again, it gives us consistency.

And when I'm talking about having these little Lambda functions, I'm talking about having a lot of them. We want a whole collection of these functions that we can reuse. It might be matching particular strings, or it might wrap libraries, or whatever we want it to do, but it makes it really easy to grab these functions and reuse things.

Now, you're probably going, "A lot of my stuff is integrating with third-party cloud services or third-party applications." A lot of them have APIs. Thankfully, most of them haven't gone down the road of Twitter, expecting you to pay an arm and a leg every month just to access an API, and they publish specs for their APIs in the form of OpenAPI (formerly Swagger) files, which makes it really easy to interact with
the APIs. But you still need to write code to interact with those APIs. This is how we used to do it traditionally: we'd spend most of our day working like this. Lots of fun. And this is how we do it in 2023, and I guarantee you this prompt works perfectly every time. Does all my work for me.

Well, what I've actually gone and done is create a Python code generator that generates Lambda functions which are clients for OpenAPI spec files. I call it picofun, because there is little fun in writing boilerplate code.

So this is a quick demo of the tool. We give it an output directory and a spec file. I've been using a service that gives you free, unauthenticated requests. I've also been using the GitHub API, but that has around 900 endpoints; it takes several minutes to generate the code and then over an hour to deploy, and when you have little bugs and you're preparing for a talk, it's not
much fun. So we're going to use the little one today.

Here we've run the first step, and it's generated all of our Lambda functions for us. We can see all the Lambdas here; over here is just a requirements file; and then we've got our Terraform. I'm not going to talk through all of the Terraform, because it's PyCon, not HashiConf. And here's the code it's generated; again, the amount of code it generates is quite short.

Now we're going to deploy it with Terraform, and through the magic of a pre-recorded demo we can speed up the deployment by a minute or so. Here it's getting ready to do the deployment. I type "yes", and now... come on, do the speed-up... there we go. Now it's almost done, and we should have all of our functions very soon. There we go: it has updated all of my functions, because I've been deploying them over and over again.
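To make the generation step a little more concrete, here is a minimal sketch of the general idea. It is not picofun's actual code or API. It assumes an OpenAPI 3 document that has already been parsed into a Python dict, walks its `paths`, and emits one small Lambda handler source file per operation. The template, the naming rule, and the `event["path"]` shape used for path parameters are all hypothetical. Only body-less, unauthenticated calls are handled, like the free, unauthenticated service used in the demo.

```python
"""Sketch: emit one tiny Lambda handler per OpenAPI operation.

Hypothetical illustration only; this is not picofun's real code or API.
"""
import re

# Keys under an OpenAPI path item that describe operations.
HTTP_METHODS = ("get", "put", "post", "delete", "patch", "head", "options")

# Source template for each generated handler. Doubled braces survive
# str.format() as literal braces in the generated code.
TEMPLATE = '''\
"""Generated client for {method_upper} {path} (sketch)."""
import json
import urllib.request

BASE_URL = {base_url!r}


def handler(event, context):
    # Fill path parameters such as {{id}} from a hypothetical event["path"] dict.
    path = {path!r}.format(**event.get("path", {{}}))
    req = urllib.request.Request(BASE_URL + path, method={method_upper!r})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
'''


def function_name(method: str, path: str, operation: dict) -> str:
    """Prefer operationId; otherwise derive a snake_case name from method and path."""
    raw = operation.get("operationId") or f"{method} {path}"
    return re.sub(r"\W+", "_", raw).strip("_").lower()


def generate(spec: dict) -> dict:
    """Return a mapping of function name -> handler source for every operation."""
    servers = spec.get("servers") or [{"url": ""}]
    base_url = servers[0].get("url", "")
    sources = {}
    for path, item in spec.get("paths", {}).items():
        # Only HTTP verbs are operations; keys like "parameters" are never visited.
        for method in HTTP_METHODS:
            if method in item:
                name = function_name(method, path, item[method])
                sources[name] = TEMPLATE.format(
                    method_upper=method.upper(), path=path, base_url=base_url
                )
    return sources


if __name__ == "__main__":
    demo_spec = {
        "servers": [{"url": "https://api.example.com"}],
        "paths": {
            "/users/{id}": {"get": {"operationId": "getUser"}},
            "/health": {"get": {}},
        },
    }
    for name, source in generate(demo_spec).items():
        print(f"# --- {name}.py ---")
        print(source)
```

Each returned string would be written to its own file and packaged as a Lambda function. In the talk, packaging and deployment are handled by the generated Terraform, which this sketch does not attempt. Keeping every generated handler this small is what keeps the deployed code short and easy to reuse.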
Bad edit. We can see here that the code has been deployed, and we've got a couple of Lambda layers to keep our code nice and short. First we're going to call the user endpoint, and this returns the JSON from the request. Then, if we jump to the 404 test, we'll see what happens when we make a call that returns a 404: it gives us a stack trace. We can do all the error handling within the Step Function, which we'll talk about in a minute.

Whoops. So now we're going to jump back to Step Functions and talk about why I like Step Functions so much. Here I've got a very simple Step Function, designed to listen to Zendesk tickets. I don't know how many of you work in large organizations. I've worked in the enterprise for way too long, and there's always more than one ticketing system in large organizations. The internal tech support is handled by Zendesk, but the engineers all want to use Jira. Well, they don't want to; they're forced to use Jira. So what happens is you get support tickets coming in that reference Jira tickets. The Ops team can link them up, but they rarely do. When you run git blame in six months' time, find the Jira ticket because the code looks a bit dodgy, and go back to that ticket, it's a bit vague. It's actually nice to have the Zendesk ticket that says, "Hey, this blew up production when it was deployed." This Step Function just listens to every comment on every Zendesk ticket and automatically links up the tickets for you.

This one here is a more complicated example. It checks pull requests and does some basic quality checks. The first thing it does is check how many commits are on a pull request. Now, I'm sure some of the devs here are going, "Oh, Dave, don't you know there's the squash and merge button in GitHub?" Yes, I know that button exists. I hate it, and there are a couple of reasons why. Number one, the commits it creates aren't signed, and I like having signed commits, so
you actually know that the right people have made the changes; it adds to the verifiability of the code. The other thing is that it just squashes everything. You might have two or three unrelated things, and it's going to squash them all into one commit. I want engineers thinking about what they're doing and organizing their work properly into nice commits. So we do that check.

Then we check whether the commits are signed. Now you're probably going, "Oh geez, this guy doesn't know the features of GitHub." GitHub will check on your pull request that the commits are signed, and it will stop you from merging, but you can have this hooked up to actually send a notification to the engineer and nag them.

The other thing about sending notifications: engineers are smart, and they're lazy. If you start spamming them with notifications, they're going to set up a filter and send all that mail to a folder they never look at, or set it up to auto-delete. Managers are also lazy, but they're not quite
quite 420 00:18:55,440 --> 00:18:59,760 as smart most of the time they're going 421 00:18:57,720 --> 00:19:02,700 to start getting inboxes full of 422 00:18:59,760 --> 00:19:06,120 notifications if you CC them on 423 00:19:02,700 --> 00:19:07,260 notifications for their staff so and 424 00:19:06,120 --> 00:19:08,880 then they're going to go to the person 425 00:19:07,260 --> 00:19:11,520 and go why do I keep getting messages 426 00:19:08,880 --> 00:19:14,400 about you not signing your commits oh 427 00:19:11,520 --> 00:19:18,299 let me just fix that and the problem 428 00:19:14,400 --> 00:19:20,640 goes away so tip always spam the manager 429 00:19:18,299 --> 00:19:22,679 as well 430 00:19:20,640 --> 00:19:25,559 um so yeah we go through do this check 431 00:19:22,679 --> 00:19:28,080 and here we're actually emitting another 432 00:19:25,559 --> 00:19:29,820 event so we can actually 433 00:19:28,080 --> 00:19:32,160 um react to it in different ways because 434 00:19:29,820 --> 00:19:33,299 if you've got a larger organization some 435 00:19:32,160 --> 00:19:35,039 teams 436 00:19:33,299 --> 00:19:36,960 um they may not care about some of this 437 00:19:35,039 --> 00:19:39,179 stuff some teams 438 00:19:36,960 --> 00:19:40,919 um may not want to have the the actual 439 00:19:39,179 --> 00:19:43,140 manager they might want to have the tech 440 00:19:40,919 --> 00:19:46,260 lead notified and stuff so by having 441 00:19:43,140 --> 00:19:48,059 different event listeners you can have 442 00:19:46,260 --> 00:19:51,840 um different logic based on the the 443 00:19:48,059 --> 00:19:53,640 reaction so you can break things up and 444 00:19:51,840 --> 00:19:55,980 um oh the other thing we've got in here 445 00:19:53,640 --> 00:19:58,320 that I forgot to mention is we're also 446 00:19:55,980 --> 00:20:00,080 checking that the 447 00:19:58,320 --> 00:20:03,600 um every commit 448 00:20:00,080 --> 00:20:05,640 references a jira ticket because in six 449 00:20:03,600 --> 
00:20:07,260 months time when it's 2 30 in the 450 00:20:05,640 --> 00:20:10,260 morning and you're sitting there looking 451 00:20:07,260 --> 00:20:11,880 at get history you want to run get blame 452 00:20:10,260 --> 00:20:14,160 and be able to start jumping back 453 00:20:11,880 --> 00:20:15,840 through tickets and understanding what's 454 00:20:14,160 --> 00:20:18,120 going on or looking at the history and 455 00:20:15,840 --> 00:20:21,480 jumping back through those tickets so 456 00:20:18,120 --> 00:20:25,620 this also enforces that 457 00:20:21,480 --> 00:20:29,220 and the idea of this stuff is that you 458 00:20:25,620 --> 00:20:32,280 get streams of or not streams you get um 459 00:20:29,220 --> 00:20:35,840 flows of events coming from your various 460 00:20:32,280 --> 00:20:38,160 systems you identify things that are 461 00:20:35,840 --> 00:20:40,740 problems that 462 00:20:38,160 --> 00:20:44,160 um if they continue to happen generally 463 00:20:40,740 --> 00:20:46,500 result in outages or other quality 464 00:20:44,160 --> 00:20:49,620 issues so you can put these guardrails 465 00:20:46,500 --> 00:20:51,419 in place and start enforcing it and when 466 00:20:49,620 --> 00:20:54,179 you're building this type of platform 467 00:20:51,419 --> 00:20:56,520 I'm a huge fan of green 468 00:20:54,179 --> 00:20:59,220 um coding so reduce the amount of code 469 00:20:56,520 --> 00:21:00,960 you're writing reuse it as much as 470 00:20:59,220 --> 00:21:03,360 possible those small Lambda functions 471 00:21:00,960 --> 00:21:05,820 are really good and yeah we won't do 472 00:21:03,360 --> 00:21:08,039 recycling we will refactor stuff down 473 00:21:05,820 --> 00:21:10,919 into those smaller bits and break it up 474 00:21:08,039 --> 00:21:12,299 so we can reuse that code because if 475 00:21:10,919 --> 00:21:15,299 you've got 476 00:21:12,299 --> 00:21:17,340 um a whole lot of code that's doing a 477 00:21:15,299 --> 00:21:20,100 bunch of different things you have to 
478 00:21:17,340 --> 00:21:21,900 write all the tests for it and then in 479 00:21:20,100 --> 00:21:24,720 three months time you're like oh I need 480 00:21:21,900 --> 00:21:26,520 that bit from this thing over here so 481 00:21:24,720 --> 00:21:28,380 I'll just copy and paste it and over 482 00:21:26,520 --> 00:21:32,480 time you have Drifters you copy and 483 00:21:28,380 --> 00:21:32,480 paste things from all over the place 484 00:21:33,120 --> 00:21:39,299 um and this is It's a real um mindset 485 00:21:36,659 --> 00:21:41,880 thing you you need to be 486 00:21:39,299 --> 00:21:44,940 um thinking about 487 00:21:41,880 --> 00:21:48,360 um how you engage your your developers 488 00:21:44,940 --> 00:21:50,159 and teach them that this is actually to 489 00:21:48,360 --> 00:21:52,500 make their jobs easier at first they're 490 00:21:50,159 --> 00:21:55,080 going to go oh god I've got a reference 491 00:21:52,500 --> 00:21:57,539 tickets every time I create a commit 492 00:21:55,080 --> 00:22:00,299 it's not that big a deal people get used 493 00:21:57,539 --> 00:22:03,360 to it and if you can demonstrate the the 494 00:22:00,299 --> 00:22:06,179 benefits as well it really helps the 495 00:22:03,360 --> 00:22:09,500 same with writing small bits of code a 496 00:22:06,179 --> 00:22:12,600 lot of developers they they like to 497 00:22:09,500 --> 00:22:16,020 write a lot of code write the tests and 498 00:22:12,600 --> 00:22:18,000 all of that whereas this gives you small 499 00:22:16,020 --> 00:22:20,940 bits that various people can actually 500 00:22:18,000 --> 00:22:25,260 reuse which is really good and we'll 501 00:22:20,940 --> 00:22:28,559 talk about how to pick which bits to 502 00:22:25,260 --> 00:22:30,059 um work on first so the Urgent and 503 00:22:28,559 --> 00:22:31,980 important stuff these are the things 504 00:22:30,059 --> 00:22:35,460 that are causing you problems all the 505 00:22:31,980 --> 00:22:38,820 time so so identify what the problem is 506 
00:22:35,460 --> 00:22:42,000 what data can help you identify that 507 00:22:38,820 --> 00:22:43,020 that problem is likely to happen and 508 00:22:42,000 --> 00:22:45,299 what 509 00:22:43,020 --> 00:22:47,039 um reactions you need now at first maybe 510 00:22:45,299 --> 00:22:48,539 it's just spamming the dev and their 511 00:22:47,039 --> 00:22:51,000 manager 512 00:22:48,539 --> 00:22:54,240 um or maybe it's people being added to 513 00:22:51,000 --> 00:22:57,960 your GitHub organization that don't have 514 00:22:54,240 --> 00:23:01,020 an email address that is tied to the 515 00:22:57,960 --> 00:23:03,299 organization when they get added you get 516 00:23:01,020 --> 00:23:05,280 an event check it if it's not there kick 517 00:23:03,299 --> 00:23:08,340 them out and email the person who added 518 00:23:05,280 --> 00:23:09,900 them saying we need an email address if 519 00:23:08,340 --> 00:23:11,280 they keep adding them back they're just 520 00:23:09,900 --> 00:23:12,780 going to get kicked out time and time 521 00:23:11,280 --> 00:23:14,520 again that's the great thing about 522 00:23:12,780 --> 00:23:17,280 automation you don't have to worry about 523 00:23:14,520 --> 00:23:18,419 it it's just going to keep doing it 524 00:23:17,280 --> 00:23:20,580 um 525 00:23:18,419 --> 00:23:23,820 so that's the stuff you should focus on 526 00:23:20,580 --> 00:23:25,980 first then there's the stuff that is 527 00:23:23,820 --> 00:23:29,340 urgent but not important for the team 528 00:23:25,980 --> 00:23:31,620 building the proactive Ops platform but 529 00:23:29,340 --> 00:23:34,740 build it as a platform so another team 530 00:23:31,620 --> 00:23:36,539 can come along and go this thing's a 531 00:23:34,740 --> 00:23:39,840 real problem for us but only affects 532 00:23:36,539 --> 00:23:43,559 five developers so they will actually go 533 00:23:39,840 --> 00:23:46,860 create that little automation themselves 534 00:23:43,559 --> 00:23:49,940 and be able to go I can get this
event 535 00:23:46,860 --> 00:23:52,380 from here and I can 536 00:23:49,940 --> 00:23:56,520 glue these through functions together 537 00:23:52,380 --> 00:23:59,340 and I've got a reaction I need 538 00:23:56,520 --> 00:24:01,860 um then that frees up the team to work 539 00:23:59,340 --> 00:24:04,260 on other stuff that is important but not 540 00:24:01,860 --> 00:24:07,320 urgent so you can do that later and my 541 00:24:04,260 --> 00:24:10,440 favorite one on the Eisenhower Matrix is 542 00:24:07,320 --> 00:24:13,080 eliminate when you've got tickets coming 543 00:24:10,440 --> 00:24:15,900 in for stuff that's not important and 544 00:24:13,080 --> 00:24:17,940 not urgent just close the ticket don't 545 00:24:15,900 --> 00:24:19,200 give people false hope how many people 546 00:24:17,940 --> 00:24:21,240 have gone through and found 547 00:24:19,200 --> 00:24:23,159 three-year-old tickets that no one's 548 00:24:21,240 --> 00:24:26,460 looked at and there's the occasional 549 00:24:23,159 --> 00:24:29,039 comment from a user going any update 550 00:24:26,460 --> 00:24:32,460 just close the ticket and then they can 551 00:24:29,039 --> 00:24:33,780 stop worrying about it as well so yeah be 552 00:24:32,460 --> 00:24:38,220 conscious of 553 00:24:33,780 --> 00:24:41,360 um you know not giving people false hope 554 00:24:38,220 --> 00:24:43,740 now if you're new to working on 555 00:24:41,360 --> 00:24:46,980 serverless Step Functions Lambda 556 00:24:43,740 --> 00:24:48,260 functions all of that stuff AWS has a 557 00:24:46,980 --> 00:24:51,539 great resource 558 00:24:48,260 --> 00:24:53,539 serverlessland.com there's lots of 559 00:24:51,539 --> 00:24:56,340 patterns there there's documentation 560 00:24:53,539 --> 00:24:58,799 loads of good resources 561 00:24:56,340 --> 00:25:01,500 um I'll make sure everyone who's 562 00:24:58,799 --> 00:25:03,299 grabbing the QR code has got it before 563 00:25:01,500 --> 00:25:08,840 we 564 00:25:03,299 --> 00:25:08,840 um
jump to the next slide 565 00:25:09,000 --> 00:25:15,539 cool so we've got a couple of minutes 566 00:25:11,640 --> 00:25:18,000 left so I'm happy to do questions 567 00:25:15,539 --> 00:25:20,820 um if you want to get 568 00:25:18,000 --> 00:25:23,340 um notified when picofund gets released 569 00:25:20,820 --> 00:25:25,380 I've broken a couple of tests that I 570 00:25:23,340 --> 00:25:28,260 need to fix I'll be throwing the 571 00:25:25,380 --> 00:25:30,900 first preview version over the wall 572 00:25:28,260 --> 00:25:33,080 um either tonight or Monday depending on 573 00:25:30,900 --> 00:25:36,539 what I get up to tonight 574 00:25:33,080 --> 00:25:38,880 and I've got a newsletter which I will 575 00:25:36,539 --> 00:25:41,820 be putting out stuff over the next 576 00:25:38,880 --> 00:25:44,880 little while around these topics so feel 577 00:25:41,820 --> 00:25:48,120 free to um grab the QR code subscribe 578 00:25:44,880 --> 00:25:50,159 and um yeah hopefully you'll get some 579 00:25:48,120 --> 00:25:53,000 useful information 580 00:25:50,159 --> 00:25:53,000 any questions 581 00:25:55,760 --> 00:25:59,120 not at all 582 00:26:01,380 --> 00:26:05,820 why are they never close 583 00:26:03,900 --> 00:26:06,720 uh thanks yeah it's um really 584 00:26:05,820 --> 00:26:08,340 interesting 585 00:26:06,720 --> 00:26:10,740 um so we sort of have a similar thing 586 00:26:08,340 --> 00:26:13,799 with Rollbar and Slack um where we just 587 00:26:10,740 --> 00:26:16,320 sort of dump things in but um yeah it 588 00:26:13,799 --> 00:26:17,580 starts to get to the point where you've 589 00:26:16,320 --> 00:26:20,700 got 590 00:26:17,580 --> 00:26:23,400 um a mix between things that are urgent 591 00:26:20,700 --> 00:26:25,860 and non-urgent and otherwise you end up 592 00:26:23,400 --> 00:26:27,900 with too many queues to monitor or you end 593 00:26:25,860 --> 00:26:29,400 up with everything being you know you 594 00:26:27,900 --> 00:26:31,200 don't know if it's
important or not so 595 00:26:29,400 --> 00:26:33,299 you end up and then you're basically 596 00:26:31,200 --> 00:26:36,179 ignoring it so what are your tips to 597 00:26:33,299 --> 00:26:37,740 separate the have to know from the nice 598 00:26:36,179 --> 00:26:39,900 to know or can ignore 599 00:26:37,740 --> 00:26:41,400 yeah so 600 00:26:39,900 --> 00:26:44,100 um first off 601 00:26:41,400 --> 00:26:46,320 um having notifications going into Slack 602 00:26:44,100 --> 00:26:49,140 it's a great place to put stuff going 603 00:26:46,320 --> 00:26:51,480 we've got monitoring in place but no one 604 00:26:49,140 --> 00:26:53,580 ever really looks at it because there's 605 00:26:51,480 --> 00:26:57,059 just stuff constantly 606 00:26:53,580 --> 00:26:59,520 um showing up there so I find Slack is 607 00:26:57,059 --> 00:27:01,980 good for hey this thing happened but 608 00:26:59,520 --> 00:27:03,840 it's not really important if there is 609 00:27:01,980 --> 00:27:06,179 action that needs to be taken by a 610 00:27:03,840 --> 00:27:09,240 person get your automation creating 611 00:27:06,179 --> 00:27:11,760 tickets and have matching in there so 612 00:27:09,240 --> 00:27:13,980 you can detect if it's a duplicate or 613 00:27:11,760 --> 00:27:16,500 not so you're not filling up Jira with 614 00:27:13,980 --> 00:27:18,059 like you know 500 tickets when you have 615 00:27:16,500 --> 00:27:20,039 an outage 616 00:27:18,059 --> 00:27:21,900 um but that way you've got 617 00:27:20,039 --> 00:27:24,419 something that's actionable and you can 618 00:27:21,900 --> 00:27:26,940 go through review it and prioritize it 619 00:27:24,419 --> 00:27:28,980 accordingly whereas in Slack I've seen 620 00:27:26,940 --> 00:27:31,200 so many times 621 00:27:28,980 --> 00:27:33,120 you'll have a discussion about a 622 00:27:31,200 --> 00:27:35,400 particular alert and it's like oh yeah 623 00:27:33,120 --> 00:27:38,039 we should do that we should do this oh I 624 00:27:35,400 -->
00:27:39,840 think it's this I'm too busy and then 625 00:27:38,039 --> 00:27:42,600 three months later 626 00:27:39,840 --> 00:27:45,539 you see the same type of message I think 627 00:27:42,600 --> 00:27:47,640 this happened once before and 628 00:27:45,539 --> 00:27:50,340 Slack search isn't the best in the world 629 00:27:47,640 --> 00:27:53,460 it's still better than Teams search but 630 00:27:50,340 --> 00:27:56,700 um you know chat apps are where 631 00:27:53,460 --> 00:27:58,200 notifications go to die so get it into a 632 00:27:56,700 --> 00:28:00,600 ticketing system 633 00:27:58,200 --> 00:28:03,480 that's why I didn't mention like you 634 00:28:00,600 --> 00:28:05,159 know sending stuff to Slack 635 00:28:03,480 --> 00:28:07,940 um during my talk 636 00:28:05,159 --> 00:28:07,940 does that help 637 00:28:11,460 --> 00:28:14,000 yep 638 00:28:15,860 --> 00:28:20,299 yeah 639 00:28:17,760 --> 00:28:20,299 yep 640 00:28:20,760 --> 00:28:24,080 any other questions at all 641 00:28:24,720 --> 00:28:29,640 right thanks everyone for coming thank 642 00:28:27,659 --> 00:28:31,260 you you might enjoy a story from the 643 00:28:29,640 --> 00:28:34,500 organizer's office 644 00:28:31,260 --> 00:28:36,360 your VCR rewind image someone happened 645 00:28:34,500 --> 00:28:37,980 to glance over at the monitor just at 646 00:28:36,360 --> 00:28:41,539 that moment and had a small heart attack 647 00:28:37,980 --> 00:28:41,539 thought everything was very broken 648 00:28:41,760 --> 00:28:49,080 thank you for your talk Dave 649 00:28:45,960 --> 00:28:51,860 and please enjoy your mug thank you 650 00:28:49,080 --> 00:28:51,860 thank you
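Dave's point about referencing a ticket in every commit, so that at 2:30 in the morning `git blame` leads you back to the tickets, can be enforced mechanically with a Git `commit-msg` hook. The following is a minimal sketch, not the speaker's implementation. It assumes Jira-style ticket keys such as `OPS-123`, and the `ticket_keys` and `check_commit_message` helpers are hypothetical names.

```python
#!/usr/bin/env python3
"""commit-msg hook sketch: reject commits that don't reference a ticket."""
import re
import sys
from typing import List

# Assumption: Jira-style ticket keys such as "OPS-123"
# (an uppercase project prefix, a hyphen, then a number).
TICKET_KEY = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")


def ticket_keys(message: str) -> List[str]:
    """Return every ticket key mentioned in a commit message, in order."""
    return TICKET_KEY.findall(message)


def check_commit_message(message: str) -> bool:
    """True if the commit message references at least one ticket."""
    return bool(ticket_keys(message))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Git runs commit-msg with the path of the message file as its only
    # argument; exiting non-zero aborts the commit.
    with open(sys.argv[1], encoding="utf-8") as f:
        if not check_commit_message(f.read()):
            print("Commit rejected: reference a ticket, e.g. OPS-123.", file=sys.stderr)
            sys.exit(1)
```

To use it, save the file as `.git/hooks/commit-msg` and make it executable. The pattern is deliberately loose: a string like `UTF-8` would also count as a key. A team that wants stricter enforcement would check the key against its real project prefixes.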
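The GitHub example from the talk works like this: when someone is added to the organization, check for an organization email, and if it is missing, kick them out and email whoever added them. At its core this is a small decision function that an event handler could call, for instance on GitHub's `organization` webhook with the `member_added` action. The sketch below covers only the decision. It assumes the caller has already looked up the member's verified emails. The domain `example.com`, `Decision`, and `review_new_member` are hypothetical, and the actual GitHub API calls to remove the member and send the email are left out rather than guessed at.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical company email domain; replace with your own.
ORG_EMAIL_DOMAIN = "example.com"


@dataclass
class Decision:
    remove_member: bool
    notify: Optional[str]  # login of the person who added them, if they must be emailed
    reason: str


def review_new_member(member_login: str, added_by: str,
                      verified_emails: List[str]) -> Decision:
    """Decide how to react when someone joins the GitHub organization.

    Pure decision logic only: fetching verified emails, removing the
    member and sending the email are left to the caller.
    """
    suffix = "@" + ORG_EMAIL_DOMAIN
    if any(email.lower().endswith(suffix) for email in verified_emails):
        return Decision(False, None, f"{member_login} has an {suffix} email")
    # No organization email: remove the member and tell whoever added them.
    # Re-adding them simply fires another event and gets the same decision.
    return Decision(True, added_by, f"{member_login} has no {suffix} email")
```

Keeping the decision pure makes it easy to test and to reuse as one of the small building blocks the talk recommends. The side effects live in separate, equally small functions.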
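In the Q&A, Dave recommends having automation create tickets with duplicate detection, so an outage does not fill Jira with 500 tickets. One way to sketch that is to fingerprint each alert and reuse the open ticket when the same fingerprint comes in again. This is an illustrative in-memory version only. The `source`/`check`/`resource` alert fields, the `OPS-` keys, and the `TicketDeduper` class are assumptions. A real system would search its ticketing system for an open ticket carrying the fingerprint instead of keeping a dict.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class Ticket:
    key: str
    summary: str
    occurrences: int = 1


@dataclass
class TicketDeduper:
    """In-memory stand-in for 'look for an open ticket before creating one'."""
    open_tickets: Dict[str, Ticket] = field(default_factory=dict)
    next_id: int = 1

    @staticmethod
    def fingerprint(alert: dict) -> str:
        # Hypothetical identity: which check fired, from where, on what.
        # Timestamps and free-text messages are left out so repeats collide.
        identity = f"{alert['source']}|{alert['check']}|{alert['resource']}"
        return hashlib.sha256(identity.encode()).hexdigest()

    def handle(self, alert: dict) -> Tuple[Ticket, bool]:
        """Return (ticket, created) for an incoming alert."""
        fp = self.fingerprint(alert)
        if fp in self.open_tickets:
            ticket = self.open_tickets[fp]
            ticket.occurrences += 1  # e.g. add a comment instead of a new ticket
            return ticket, False
        ticket = Ticket(key=f"OPS-{self.next_id}",
                        summary=f"{alert['check']} on {alert['resource']}")
        self.next_id += 1
        self.open_tickets[fp] = ticket
        return ticket, True
```

The key design choice is which fields go into the fingerprint. Repeats of the same failure collapse into one ticket, while the same check failing on a different resource still gets its own ticket to review and prioritize.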