Overview
Join Knox Hutchinson as he demonstrates how to work with S3 for static object storage or data ingestion.
Recommended Experience
- 1 to 2 years Archives
Related Job Functions
- Any
Knox brings a wealth of data analysis and visualization experience to CBT Nuggets. Knox started off as a CBT Nuggets learner, became a mentor in our Learner Community, and is now a trainer. Having benefited from the CBT Nuggets Learning Experience firsthand, Knox creates training that connects with learners.
Introducing S3
Let's get to know more about how to administer the Simple Storage Service (S3)
When to Use S3
Let's explore what the use cases for S3 are
Knowledge Check
Which of the following is NOT an example of when to use S3?
- A. Replicating a Windows Fileshare
- B. Storing images for your website
- C. Cloud ingestion point for ETL pipeline
- D. Mass-scale storage, aka data lake
Create a Bucket
Now let's understand our options for creating S3 buckets!
Knowledge Check
True or False: With S3, you can use Server Side Encryption (SSE) but NOT Client Side Encryption
Bucket Policies
Let's understand how permissions to buckets and objects are set
Knowledge Check
Which of the following is something you can configure a bucket policy around?
- A. Protocol (HTTP vs HTTPS)
- B. IP Address
- C. IAM User
- D. All of the above
Bucket Versioning
Now let's see how to keep track of our file history with versioning
Knowledge Check
What does enabling bucket versioning do?
- A. Preserves every version of an uploaded file
- B. Preserves every bucket configuration as a version
- C. Integrates with CodeCommit source control
- D. Creates a copy of a file in a different bucket
Lifecycle Policies
Let's explore how to control file rotation as files age
Knowledge Check
Which access tier is best for low latency, fast BUT infrequent reads?
- A. Standard-IA
- B. Standard
- C. Glacier
- D. Deep Glacier
Cross-Region Replication
Now let's explore how to replicate an S3 bucket to a different region!
Knowledge Check
What resource needs access to the KMS keys used to decrypt source files?
- A. The replication job's role
- B. The source bucket owner
- C. The destination bucket owner
- D. The object owner
Conclusion
I hope this has been informative for you and I would like to thank you for consuming.
View Transcript
Introducing S3
0:00[AUDIO LOGO]
0:06Welcome to the content on Simple Storage Service, or S3
0:10for short.
0:10I don't have the metrics in front of me.
0:13But if I were a gambling person, I would place a bet right now,
0:17that S3 is the most popular resource in all of AWS.
0:22If I'm wrong, then it's probably EC2.
0:25But the reason I say that is that S3 powers so much content
0:31distribution and storage around the world.
0:34It's amazing.
0:35But it has so many other use cases too for storing objects.
0:39Now what are objects?
0:40Why is S3 so powerful?
0:43When should we use it?
0:44And how do we start actually administering it, securing it,
0:48making it resilient, and redundant?
0:51That's what this set of content is all about.
0:54So in this set of videos, we are going to dig into S3,
0:57perform some really cool tasks like encryption,
1:00like understanding policies to allow access to S3,
1:04and then even talk about things like replication.
1:07So without further ado, let's talk about the simple storage
1:10service.
1:10Let's go.
When to Use S3
0:00[AUDIO LOGO]
0:06We've briefly talked about this already
0:08in this set of videos, when we were talking about storage
0:11options, when to use S3.
0:14S3 is all about object storage, at the end of the day.
0:17So really, what it comes down to is, what is an object?
0:21And what is S3 capable of versus what it's not capable of?
0:25The biggest comparison to this is
0:27going to be between S3 and EFS.
0:31So without further ado, let's talk about what
0:33S3 brings to the table.
0:34Let's go.
0:36So if you're going to understand S3, what you're really going
0:39to understand are objects.
0:42S3, or Simple Storage Service, is all about storing objects.
0:47Now, objects at the simplest form are files.
0:51But I want you to think about how
0:53you've been interacting with files traditionally
0:55up until this point.
0:56Whether you're working with Linux-based machines,
0:59macOS, Windows, whatever, you have a hard drive, maybe,
1:04or solid-state drive, or whatever it is.
1:06And on that hard drive, you install an operating system.
1:09And then the operating system manages a file system.
1:15In Windows, we know this, pretty much, as NTFS.
1:19And NTFS is really where permissions to access folders
1:24and files are managed.
1:25So you may have a root folder, which contains many subfolders.
1:31These are going to be the world's worst folders.
1:33They look like boots or like the state of Louisiana.
1:36And inside of those, you have all of your different files.
1:40The way these file systems, and really, the way the EFS
1:46service, is designed to work are that the operating system
1:50and the file system protocol allow you to put permissions
1:54on each one of these objects.
1:56And they're inherited as you go down the tree.
2:01We can put things like ACLs, which
2:03are going to be allow statements or deny statements.
2:06We can apply permissions to groups, and so on.
2:09And ultimately, that's what determines access
2:12to these files.
2:13And really, the metadata of all of this, of all of this folder
2:18structure and everything, is baked in to each one
2:21of the objects that you create.
2:22So when you create a file here, that file
2:25is going to have all of the metadata about its NTFS
2:28permissions or whatever file system you're using.
2:32All of those permissions are baked into it.
2:34That's not what S3 is designed to mimic or emulate.
2:39That's what EFS is designed to do.
2:41EFS creates a file share that you
2:44can mount onto something like an EC2 endpoint
2:48and then use the EC2's operating system
2:51to create a file system on EFS.
2:54That's not S3.
2:56S3 is just a bucket.
3:00And we put files that we call objects inside of it.
3:05One of the primary use cases for this
3:08is to hold and store content, in particular static content.
3:13And we've seen this in action already.
3:16We've gone through several sets of videos
3:18where I created a basic React plus Node application.
3:23And instead of having the EC2 instance or the Beanstalk
3:27instance store the content, we put them on an S3 bucket.
3:31And whenever people needed to download
3:33the images or any static content, the website
3:37itself would redirect the browser to download it from S3.
3:41We love this because offloading that workload onto S3,
3:47first of all, saves us a little bit of compute and optimizes
3:51the effort needed by the server hosting my React application.
3:56I don't need to have my server hosting static content anymore,
4:00when the S3 bucket can host it for me.
4:04The S3 bucket, being platform as a service,
4:07is effectively infinitely scalable,
4:10both in terms of storage as well as
4:13in terms of handling the requests that
4:16come in looking for it, or rather,
4:18looking for that content.
4:20To put this another way, the likelihood
4:22that my EC2 instance goes down because of my mismanagement
4:26or whatever is significantly higher than the likelihood
4:30of the S3 bucket going down.
4:32In this particular case, I can clearly
4:35see why I may have static images, or videos,
4:40or any other assets.
4:42And the file system permissions themselves are irrelevant.
4:46I'm expecting anyone from around the world
4:49to be accessing this S3 bucket and downloading this content.
4:53And this really is one of the primary use
4:56cases for S3 buckets.
4:58But it's not the only use case for S3 buckets.
5:01What are some other use cases for S3 buckets?
5:03Well, we also talked a lot about this--
5:05ETL.
5:07S3, because it's so scalable and we
5:09do have the ability to manage permissions
5:12on things like the bucket itself,
5:14makes it a perfect target for extract, transform, and load.
5:19This will be my S3 bucket right here.
5:21And the way you should think about this
5:23is I'm going to have applications running all
5:26around the world, doing all sorts of cool things.
5:30And each one of these applications
5:32might generate and store their own data according to this.
5:38Maybe I have an application that's for new user signups.
5:42I'll put signups right here.
5:45Maybe I have an application that's for orders.
5:49Maybe I have an application that's
5:52for applications, for applications
5:55to talk to other applications.
5:57The point is all of these different apps
5:59are generating data.
6:01And this data is valuable to me, especially
6:04as someone who wants to try to analyze
6:06what's happening across all of my apps
6:09and all of my ecosystems.
6:10So using other types of ETL tools,
6:14their entire purpose is to extract the data, E, out
6:18of these databases, transform them
6:21into a standard format on the fly-- that's the T--
6:25and then load them into a destination, like an S3 bucket.
6:29Because S3, again, is scalable, it
6:33can handle this ingestion point.
6:36It can handle a huge volume of data.
6:39And it can handle a huge velocity of data.
6:44And it can handle a huge variety of data.
6:48This is what we call the three Vs of big data.
6:54And this is what makes S3 a great example of a data lake.
7:00As data gets pulled out of all of these different databases,
7:03throughout all day long, they're probably
7:06going to get converted into some type of file format, like a CSV
7:10or like a Parquet file.
7:13It'll then get dropped onto this S3 bucket-- a.k.a.
7:17a data lake-- so that it's now ingested
7:20into the AWS ecosystem.
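As a minimal sketch of the load step just described, assuming Python: extracted rows are serialized to CSV and given an object key before being dropped into the bucket. The `signups` prefix, the date-partitioned key layout, and both helper names are illustrative assumptions, not something shown in the video.

```python
import csv
import io
from datetime import date


def rows_to_csv_bytes(rows, fieldnames):
    """Serialize a list of dict rows into CSV bytes ready for upload."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue().encode("utf-8")


def build_object_key(source, day, filename):
    """Build a date-partitioned object key (hypothetical layout)."""
    return (
        f"{source}/year={day.year}/month={day.month:02d}/"
        f"day={day.day:02d}/{filename}"
    )


rows = [{"user_id": 1, "plan": "free"}, {"user_id": 2, "plan": "team"}]
body = rows_to_csv_bytes(rows, ["user_id", "plan"])
key = build_object_key("signups", date(2024, 1, 15), "batch-0001.csv")
# `body` would then be uploaded under `key`, for example with boto3's
# s3_client.put_object(Bucket=..., Key=key, Body=body).
```

Keeping each source application under its own prefix is one common way to keep a data lake browsable; other layouts work just as well.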
7:23These databases could be running in AWS already.
7:27But they could also be running on premises in a data center,
7:30colocated, in my HQ, in Azure, or in Google.
7:35Once these files live in S3, I can do a whole bunch of things.
7:39I can use my analytics tools to query S3 files directly.
7:44Or I might want to move these files further
7:47into the ecosystem, into something
7:49like a data warehouse, like the AWS Redshift resource.
7:55The Redshift resource is a data warehouse,
7:58which is a relational database, like a SQL database, that's
8:02been optimized for analytics or read-heavy workloads
8:06and storing a humongous amount of data.
8:09Now, you might be thinking to yourself right now,
8:11now, wait a minute, Knox.
8:12In the previous example, we were talking about content.
8:15Somebody goes to download my website.
8:18And they're redirected to an S3 bucket,
8:21meaning that anybody can access that S3 bucket.
8:23And that sounds fine for static content.
8:26But when it's my critical business data,
8:29how then do I want to stop people
8:32from downloading that data, especially if there's no
8:35file system permissions?
8:36Well, that's a great talking point.
8:38Right here, when we create this S3 bucket right here,
8:43we still have the ability to manage permissions
8:46to the bucket itself or specific objects within the bucket.
8:50The difference is those permissions
8:52are not set on the files or the objects themselves.
8:56They're set by AWS.
8:59And as requests come in to the S3 API that is managed by AWS,
9:06it evaluates, based on IAM policies,
9:09if the requester has permissions to access this S3 bucket
9:13or if they have permissions to access specific objects
9:17in the S3 bucket.
9:18That's right.
9:19We use IAM policies and roles to manage access
9:25to the specific bucket or the objects in the bucket.
9:28And instead of allowing a file system
9:31to evaluate if the permission is going to go through,
9:34we're allowing the S3 API as the request
9:38comes inbound in the first place to manage this.
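To make the idea of the S3 API screening each request concrete, here is a deliberately simplified sketch in Python. It assumes only two rules: an explicit Deny always wins, and without a matching Allow the request is denied. Real AWS evaluation also considers principals, conditions, and several policy types, so treat `is_allowed` and the `example-bucket` name as hypothetical illustrations.

```python
from fnmatch import fnmatchcase


def _as_list(value):
    """Policy fields may hold a single string or a list of strings."""
    return value if isinstance(value, list) else [value]


def is_allowed(statements, action, resource):
    """Simplified check: explicit Deny wins, otherwise an Allow is required."""
    allowed = False
    for stmt in statements:
        action_match = any(fnmatchcase(action, a) for a in _as_list(stmt["Action"]))
        resource_match = any(
            fnmatchcase(resource, r) for r in _as_list(stmt["Resource"])
        )
        if not (action_match and resource_match):
            continue
        if stmt["Effect"] == "Deny":
            return False  # an explicit deny overrides any allow
        allowed = True
    return allowed  # stays False when nothing matched (implicit deny)


statements = [
    {"Effect": "Allow", "Action": "s3:GetObject",
     "Resource": "arn:aws:s3:::example-bucket/*"},
    {"Effect": "Deny", "Action": "s3:*",
     "Resource": "arn:aws:s3:::example-bucket/private/*"},
]
```

With these statements, a read of `images/logo.png` passes, a read under `private/` is refused, and a write anywhere is refused because no statement allows it.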
9:41Now, I don't know what your feelings
9:43are around the IAM policies.
9:45S3 buckets being very, very powerful and very,
9:48very important, now you're being forced
9:51into understanding these IAM policies and specifically
9:55JSON, JavaScript Object Notation, a little bit better.
9:58So that's a big thing you need to wrap your head around.
10:00And if you're spying ahead, you can
10:02see we're going to talk about bucket policy
10:04examples in an upcoming video.
10:06But consider some other use cases for S3 buckets.
10:10One of the best use cases that I particularly love to see
10:13and use S3 for-- let's change the color here--
10:16are logs, log shipping.
10:19That's right.
10:20Think about it.
10:20Well, let me go ahead and rewrite that.
10:22Logs.
10:23I have apps all over the place.
10:25And I may be generating system logs,
10:29OS logs, or even application logs.
10:32And a perfect use case for this is to dump those logs into S3.
10:37We can actually use things like the Kinesis Firehose, which
10:42is a file-streaming or data-streaming service,
10:46to stream our logs in real time into AWS,
10:49who will then write a copy of those logs to an S3 bucket.
10:53You could almost think about this as a live streaming
10:56queue for ingesting real-time data feeds into AWS
11:01and then directing those to various different places
11:04once it's in the AWS ecosystem-- again, like S3 buckets.
11:09In fact, I'm doing similar things with this Beanstalk S3
11:14bucket right here or even the mag store UI.
11:17You see these job tasks?
11:19It creates a folder-like structure right here.
11:23And then we actually see what could actually
11:25be created right here in the actual files themselves.
11:29Now, one of the final things that I'm
11:31going to point out about S3 is one
11:34of the more confusing points when you first learn about it.
11:37See, I'm obviously here in the S3 section right now.
11:41But what region am I operating in?
11:43Global.
11:44But then what the heck is this right here, AWS Region?
11:47How could I be global but also have a region?
11:49Well, ultimately, when I want to write a file
11:52to a disk in Amazon, I have to tell it what region
11:57those disks need to live in.
11:58That's what this is all about.
12:00But every S3 bucket, every single one
12:03that has ever been created, has a globally
12:06unique fully qualified domain name that comes with it.
12:11So when you create an S3 bucket, that S3 bucket
12:15has to have a globally unique name.
12:18That way, traffic from around the world can be routed
12:21to S3:// or https://.
12:28So it shows up as global because it's
12:32going to be a globally resolvable resource.
12:35The DNS of your S3 bucket is, in fact, global.
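As a small sketch of what that globally resolvable name looks like, the helper below builds a virtual-hosted-style HTTPS URL for an object. The function name and the object key are hypothetical; the bucket name and region are the ones used later in this course.

```python
from urllib.parse import quote


def virtual_hosted_url(bucket, region, key):
    """Build a virtual-hosted-style URL; the key is percent-encoded."""
    return f"https://{bucket}.s3.{region}.amazonaws.com/{quote(key)}"


url = virtual_hosted_url("mayc-east", "us-east-2", "images/logo 1.png")
```

The bucket name becomes part of the host name, which is why it has to be unique across all of AWS, while the region in the host name reflects where the data is stored.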
12:39But there is a region that you choose where the buckets live.
12:43And the big thing to understand about that
12:45is around your cost.
12:49While, of course, the amount of data or the volume of data
12:53that you store is going to incur a cost,
12:56you can choose what disk subsystems your data lives on.
13:01So you might be able to put your data
13:04on very, very fast hot disks.
13:07Or if it's older data, you could put it
13:10on colder disks that cost less money.
13:13Or if it's the oldest data, and you really only want
13:16to keep it for archive purposes, you
13:19can migrate your data all the way back there.
13:21And we're talking fractions of a penny per gigabyte
13:25at that point.
13:25We're going to talk more about the risks that come along
13:28with that in a later video.
13:31But it's not just the storage that racks up a cost with S3.
13:36Importantly, when people access the objects within your storage
13:41and download them, you are going to incur egress costs.
13:46As the data is sent out of the data center,
13:49that's where you rack up a bill.
13:52So if you're starting to upload really large 4K videos
13:57that you want to serve publicly around the world, when they
14:01start downloading those 4K videos,
14:03guess what, that's where you're going to incur
14:06the real bulk of your cost.
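A back-of-the-envelope sketch of that egress math, in Python. The per-gigabyte rate below is an assumed placeholder, not a quoted AWS price, and `estimate_egress_cost` is a hypothetical helper; check current pricing for your region before relying on any number.

```python
def estimate_egress_cost(object_size_gb, downloads, price_per_gb):
    """Estimate data-transfer-out cost for repeated downloads of one object."""
    return round(object_size_gb * downloads * price_per_gb, 2)


# Assumed numbers: a 5 GB video downloaded 1,000 times at a placeholder
# rate of $0.09 per GB transferred out.
cost = estimate_egress_cost(5, 1000, 0.09)
```

Storing that single 5 GB file costs very little by comparison, which is the point being made here: popularity, not size on disk, drives the bill.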
14:08So at the end of the day, use cases for this
14:11are going to be static content--
14:15big use case-- or ETL for data ingestion and data lake.
14:24I'm going to put data lake over here.
14:27Or application automations.
14:33If we're doing things like writing log files
14:36or temporarily generating data that other applications need
14:40to pick up and read, this is going
14:42to be a perfect use case for this.
14:44I'm also going to put in here CI/CD pipelines, specifically
14:49uploading artifacts.
14:52Now, if you're not familiar with the DevOps
14:54world of CI/CD pipelines and artifacts,
14:57the idea here is all about I've created an application.
15:01I've made some changes to that application.
15:04And now I need to automate the process
15:06of testing that application and deploying it
15:10into various environments, like a development
15:12environment, a test environment, or even the production
15:15environment.
15:16That's what CI/CD does.
15:18And lots of times when we're creating an application,
15:21we have to compile it.
15:23Or we have to upload supporting dependencies
15:27to make our application run.
15:29And those are known as artifacts.
15:31A perfect place for doing local development
15:33and then getting my artifacts into S3
15:36so that a CI/CD pipeline can pick up those artifacts
15:39and ship them, S3.
15:42That's the go-to use case here.
15:44See, we don't need file system permissions to pull that off,
15:47do we?
15:48We just need a place to store the files.
15:50And this is what S3 brings to the table.
15:54Now, S3, even as it says Simple Storage Service, it's simple.
15:58It can be as simple as you want it to be.
16:00Or it can get pretty complicated.
16:02And we're going to dig into the more complicated aspects
16:05as we progress through this set of videos.
16:07For now, this has been understanding when to use S3.
16:10I hope this has been informative for you,
16:11and I'd like to thank you for viewing.
Create a Bucket
0:00[AUDIO LOGO]
0:06Now we begin by actually diving into it.
0:08In this set of videos, we're going
0:10to walk through the process of creating a bucket.
0:12Importantly, we're going to explore in detail
0:15some of these different options and how
0:17we might want to configure our bucket at the time of creation.
0:21This is actually going to require a lot more elaboration
0:25as we go along.
0:26So in the next set or in the next few videos,
0:28you'll learn more about things like bucket policies.
0:31So without further ado, let's create our first bucket.
0:34Let's go.
0:35So let's dive into it, right?
0:36OK.
0:37What we're going to do first is we're
0:38going to walk through the process of creating
0:40an S3 bucket, but exploring the different possibilities as we
0:44go through the console of creating the bucket.
0:46Right here I'm in the S3 section.
0:48Notice the region does say global because right out
0:52of the gate when you create the bucket, one
0:54of the very first questions they ask
0:56you is, where do you want this data to live
0:59when it gets written to disk?
1:01That's an important thing to get into your head.
1:03Now I'm intentionally going to name this mayc-east because I'm
1:10going to put this data in the East US right
1:13now, US East 2 data center.
1:14The ultimate goal that I have is I'm
1:17going to create replication from this bucket
1:20to a different bucket in a different region
1:22later, something we're going to talk
1:24about at the end of this set of videos.
1:26So just bear with me as we chug along through this.
1:29So do note right here, this is where
1:31you get to choose what region you want to work with.
1:34Now, notice a lot of these are actually grayed out
1:36for me because some of them are not--
1:38what's the right word there?
1:39Not experimental, but you do have
1:41to actually apply to enable some of these regions for you
1:45to actually write it.
1:47So having shown that, we'll leave this right here.
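The same name-and-region choice can be expressed outside the console. As a hedged sketch in Python, the helper below only builds the arguments; it assumes they would be passed to boto3's `create_bucket`, and `create_bucket_params` itself is a hypothetical name.

```python
def create_bucket_params(bucket_name, region):
    """Build keyword arguments for creating a bucket in a chosen region."""
    params = {"Bucket": bucket_name}
    # us-east-1 is the default location and is not passed as a constraint.
    if region != "us-east-1":
        params["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return params


params = create_bucket_params("mayc-east", "us-east-2")
# With boto3 installed and credentials configured, this could be used as:
#   boto3.client("s3", region_name="us-east-2").create_bucket(**params)
```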
1:49Now underneath this come all of the different settings
1:52for configuring your bucket.
1:54If you have a bucket that you want to copy,
1:56you can just choose the bucket that you
1:57want to copy from the dropdown right here,
2:00and it will mimic the exact types of settings.
2:02What kinds of settings would we be looking at anyways?
2:04Well, it starts out with object ownership.
2:07What are we really talking about here?
2:10Remember, objects are the files that
2:13get written inside of a bucket.
2:16By default, the setting is to say
2:20that the person who uploads the file
2:22is not the owner of the object.
2:26Instead, by default, with this ACLs disabled setting,
2:30we're saying that the creator of the bucket
2:33owns all of the contents of the bucket.
2:36This would make sense where I, Knox, create a bucket.
2:40And then I have a fleet of applications that are writing
2:44data into the bucket itself.
2:47I want to be the one who has control
2:49over what happens to the data that
2:52gets written into the bucket.
2:53I don't want the application to be
2:55the one responsible for managing each of the file
2:59permissions that come inbound.
3:01This works particularly well when I only have
3:04one account in the mix here.
3:07I have my AWS account and my applications
3:10are all part of that same AWS account.
3:13And they're also writing data to it.
3:15However, if I have multiple users spread out
3:18across multiple accounts and we want
3:21to give them autonomy over what happens
3:24for each one of the objects that they upload,
3:27we can use access control lists.
3:30You can kind of think about access control
3:33lists as an off-the-shelf canned permission
3:37that you can specify.
3:39Now as it says right here, objects in this bucket
3:42can be owned by other AWS accounts.
3:47Access to the bucket and its objects
3:50can be specified using those ACLs.
3:53This is different from over here where ACLs are disabled.
3:56This goes back again to me, the owner and creator
4:00of this bucket, specifying
4:03what is allowed to happen in this bucket using
4:06only policies.
4:09And this is where it starts to get really confusing.
4:12In this case, I'm going to be writing JavaScript Object
4:16Notation policies to define what can go on inside the bucket,
4:20as well as for a group of objects
4:24or even specific objects--
4:26a specific object itself.
4:28I'm going to be writing JSON
4:30to define each and every permission
4:32to go on for each of the files that
4:34can come in, or importantly, what AWS users or roles are
4:43allowed to have access to the bucket, the objects
4:46or specific objects.
4:48You should think about this policy
4:50as sitting in front of your S3 bucket.
4:53So here is my S3 bucket right here.
4:56With ACLs disabled as a request comes in,
5:00it's going to get screened by my policy
5:03first before it's allowed to progress on
5:06to perform the exact action against the exact object
5:09that we specify in the policy.
5:12Where it gets even more confusing
5:14is when we start to throw public access into the mix,
5:17scroll up just to here, right here under object ownership,
5:22I've said the ACLs are disabled.
5:24That makes you think, OK, well, I'm not using ACLs.
5:27Well, when we start to throw in public access,
5:30we can actually define ACLs in our policy.
5:36Now block all public access is the default setting
5:39because, of course, you want to start off
5:41with the principle of least privilege.
5:44We only want to slowly open up public access whenever
5:48we need to.
5:49So when we uncheck this, we see OK, well, we've
5:52unchecked block all public access.
5:55Now we have to select what pieces we want to block.
5:59So if you're looking at what's going on here,
6:01it's basically new, any, new any.
6:05And we start off with new ACLs or any ACL, new policies
6:12or any policies down here.
6:15That's really what we're trying to block access to.
6:17So what we're saying if we were to check this box off,
6:20block public access to buckets and objects
6:22granted through new ACLs, if somebody uploads an object.
6:27So let's say we check this off, create the bucket.
6:29Then somebody uploads an object.
6:32Then they create an ACL to make that object public access,
6:37this would block that from happening.
6:40Let's say we create it-- let's say instead,
6:43we'll move on to the next one, we check off this box.
6:46And let's say we already had an ACL on this bucket that
6:51allowed any public access.
6:53This is effectively saying we'll ignore it.
6:56So any files that get uploaded after the fact will be
7:00ignoring this particular ACL.
7:02Then we repeat the process again.
7:04But instead of ACLs, we're using policies.
7:07So this is effectively overriding
7:10any ACLs or policies that might grant
7:14public access to this bucket or objects
7:17on the bucket after the fact.
7:18I'm just going to go ahead and leave this at the default,
7:21block all public access for right now.
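The four checkboxes walked through above (new ACLs, any ACLs, new policies, any policies) correspond to four boolean settings in the S3 API. The sketch below, in Python, builds that configuration as a plain dictionary; the function and argument names are hypothetical, and it assumes the result would be passed to boto3's `put_public_access_block`.

```python
def public_access_block(block_new_acls=True, ignore_all_acls=True,
                        block_new_policies=True, restrict_all_policies=True):
    """Map the four console checkboxes onto the API's four boolean flags."""
    return {
        "BlockPublicAcls": block_new_acls,               # new ACLs
        "IgnorePublicAcls": ignore_all_acls,             # any ACLs
        "BlockPublicPolicy": block_new_policies,         # new policies
        "RestrictPublicBuckets": restrict_all_policies,  # any policies
    }


# The default used in this video: block all public access.
config = public_access_block()
# Assumed usage with boto3:
#   s3_client.put_public_access_block(
#       Bucket="mayc-east", PublicAccessBlockConfiguration=config)
```

Leaving every flag on matches the "block all public access" default; a public bucket would need at least the two policy flags turned off before a public bucket policy can take effect.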
7:23Versioning is something we're going
7:25to talk about in an upcoming video that's pretty exciting.
7:28Tags are basically key-value pairs that
7:31help you organize your AWS resources.
7:33And then, of course, encryption.
7:35Now I'm intentionally going to use a KMS key for this.
7:40I already have KMS keys created.
7:42We've already talked about encrypting data
7:44using KMS already as well.
7:46So I'm going to go ahead and choose for my AWS KMS keys,
7:50important to remember these are going to be the KMS
7:53keys in this region, US East 2.
7:58I'm not able to select a KMS key that's not in US East 2.
8:02So when I click this dropdown, these are my KMS keys
8:05that I'm looking at.
8:06I'll choose my demo key that's listed right here.
8:10Now let's go ahead and talk about this real quick
8:13while we have a chance.
8:14You see this SSE, that's Server Side Encryption.
8:20This is basically saying that we are going
8:24to be uploading a file like so.
8:27And as it arrives on the server that hosts the disks,
8:32it's going to be getting my KMS keys, and then encrypting it,
8:36gobbling it up like this before it gets written to the disk.
8:40The S3 resource here is going to be managing access to the KMS
8:45key, encrypting the data as it's written to the disk,
8:48and decrypting it as it's read from the disk.
8:51This is all happening on the server side.
8:56There is a potential for there to be client side encryption.
9:02And in this case, our calling application
9:04would be doing things like retrieving a KMS key,
9:07encrypting the data before it gets uploaded.
9:11So I'll put Enc right here.
9:13I'll just go ahead and write encrypted.
9:15And then uploading that into the S3 resource.
9:20This again, is typically handled all in the application code
9:24itself, which is why we're not seeing any type of options
9:27here.
9:28The thing to understand is that S3 can absolutely
9:32handle any type of encrypted data,
9:34whether it's being encrypted on the server side
9:36or it's being encrypted on the client side.
9:39So in this case, I'm choosing to encrypt it
9:41with my own KMS keys.
9:43And I'm choosing to do this for a very specific reason,
9:45to demonstrate how this is going to work whenever we actually
9:48go through replication again.
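For reference, the server-side encryption choice made here can be written as a default-encryption rule. This Python sketch builds the rule as a dictionary and assumes it would be passed to boto3's `put_bucket_encryption`; the key ARN is a made-up placeholder, and `sse_kms_configuration` is a hypothetical helper.

```python
def sse_kms_configuration(kms_key_arn):
    """Default-encryption rule: encrypt new objects with the given KMS key."""
    return {
        "Rules": [
            {
                "ApplyServerSideEncryptionByDefault": {
                    "SSEAlgorithm": "aws:kms",
                    "KMSMasterKeyID": kms_key_arn,
                }
            }
        ]
    }


# Placeholder ARN; the key must live in the same region as the bucket.
demo_key_arn = "arn:aws:kms:us-east-2:111122223333:key/EXAMPLE-KEY-ID"
encryption = sse_kms_configuration(demo_key_arn)
# Assumed usage with boto3:
#   s3_client.put_bucket_encryption(
#       Bucket="mayc-east", ServerSideEncryptionConfiguration=encryption)
```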
9:50So with this being said, we can also explore things
9:53like in the advanced settings.
9:55Really the only advanced setting we have is with object locks.
9:59And it's basically for something like an in place legal hold
10:06or some type of immutable storage that you need here.
10:10We see this as write once, read many.
10:14And it helps you prevent objects from being
10:16deleted or overwritten for a fixed amount of time
10:20or indefinitely.
10:22So if we are uploading something that
10:24needs to be legally held and unaltered for a specific amount
10:28of time or indefinitely, this is what the object lock does.
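As a hedged sketch of what a fixed retention period looks like in configuration form, the Python helper below builds a default retention rule. It assumes the dictionary would be passed to boto3's `put_object_lock_configuration` on a bucket created with object lock enabled; the helper name and the 30-day value are illustrative only. An indefinite legal hold is set separately on individual objects and is not covered here.

```python
def object_lock_configuration(mode, days):
    """Default retention rule for a bucket that has object lock enabled."""
    if mode not in ("GOVERNANCE", "COMPLIANCE"):
        raise ValueError("mode must be GOVERNANCE or COMPLIANCE")
    if days < 1:
        raise ValueError("days must be a positive number")
    return {
        "ObjectLockEnabled": "Enabled",
        "Rule": {"DefaultRetention": {"Mode": mode, "Days": days}},
    }


# Illustrative value: keep every new object unaltered for 30 days.
lock = object_lock_configuration("COMPLIANCE", 30)
# Assumed usage with boto3:
#   s3_client.put_object_lock_configuration(
#       Bucket="mayc-east", ObjectLockConfiguration=lock)
```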
10:32I click Create the Bucket and boom,
10:34now I have my bucket created.
10:38Weirdly I created this bucket and an old bucket that I had
10:41deleted is now showing back up.
10:44But it's showing as not found.
10:45When I actually click on this bucket,
10:46it shows you that it's not found.
10:48And if I try and delete this one more time,
10:50it shows that it's not found.
10:51So I don't know what's really going
10:52on there, why it's trying to show me this old bucket that I
10:54had already deleted.
10:55But nonetheless, my mayc-east bucket is now created.
10:59Now we've got some more work to do with this bucket
11:01as we start to explore things like the policies themselves.
11:06When we scroll-- oh-oh, that's under permissions right here.
11:08Wrong section.
11:09When we talk about the policies, we're
11:11going to pick it up in the next video to talk about that.
11:14I hope this has been informative for you.
11:15And I'd like to thank you for viewing.
Bucket Policies
0:00[AUDIO LOGO]
0:06This is where the Simple Storage Service
0:08becomes a little less simple when it comes down
0:11to object policies.
0:13Whenever we're creating an S3 bucket,
0:15the thing that we keep in mind is the objects
0:18themselves don't have a file system to attach permissions
0:22to.
0:23Instead, we're going to use the AWS platform to manage access
0:28into the bucket and to each one of those objects.
0:31And as we've seen throughout this set of videos so far, when
0:35we're defining permissions or privileges or access
0:39rights in AWS, we're really defining things
0:42like what feels like an IAM policy.
0:45Put another way, when we're defining access
0:48to the S3 bucket itself or the objects within them,
0:52we're defining them with JSON.
0:54So without further ado, let's explore
0:56some of the many different examples
0:58that AWS gives us for the types of things
1:01that we can manage with a bucket policy.
1:03Let's go.
1:05So you might want to spy on this access section
1:08right now because I see three different things right
1:11off the bat.
1:12And this really all comes down to whether or not
1:14I check that block public access button whenever
1:19I was creating the bucket in the previous video.
1:23First of all, I see public right off the bat.
1:26In this case, I unchecked that box so that block public access
1:30wasn't turned on, then I created a policy
1:34to actually allow this to be a truly public bucket.
1:38Or we could say objects can be public
1:43if I create a policy to allow it,
1:46or bucket and objects are not public.
1:50So if we're looking at it, this whole bucket is public.
1:54If we're looking at this one, maybe an individual object
1:58or a group of objects inside this
2:00could be public, but I have to create a policy to allow it
2:03or nothing in this public--
2:05in this bucket is going to be public.
2:07Now let's start to dig into each one of these
2:10to explore what we've done here.
2:11In this particular bucket, I'm going to go into the bucket,
2:14then go into the permissions and immediately
2:17block all public access bucket settings.
2:19You can see right here it's off.
2:23Scroll down just to here, and the individual checkboxes here
2:26are also missing at this point.
2:28So then I start to explore the bucket policy that allows this
2:32to be a truly public bucket.
2:34If I'm looking at this here, the principal is anyone.
2:38The action is get the object or get the specific version
2:43of the object, and the resource that this is allowable
2:46is the bucket name slash, and then a wild card,
2:50which effectively means all objects in this bucket.
2:54So we are allowing anyone to get the contents of this bucket.
3:00That's effectively what this policy does.
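The policy just described can be written out as a small JSON document. Here is a minimal sketch in Python, assuming a placeholder bucket name (`example-public-bucket`) and a made-up statement ID; the principal, actions, and resource pattern are the ones read out above.

```python
import json


def public_read_policy(bucket_name: str) -> dict:
    """Build a bucket policy that lets anyone read every object in the bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "PublicRead",  # hypothetical statement ID
                "Effect": "Allow",
                "Principal": "*",  # anyone
                "Action": ["s3:GetObject", "s3:GetObjectVersion"],
                # bucket name, slash, wildcard: all objects in the bucket
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            }
        ],
    }


# "example-public-bucket" is a placeholder, not the bucket from the video.
policy = public_read_policy("example-public-bucket")
print(json.dumps(policy, indent=2))
```

The printed JSON is the kind of text you would paste into the bucket policy editor. It only takes effect when block public access is turned off for the bucket.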
3:03And if you want to create a public bucket,
3:06this is how you do it.
3:08This is, again, with ACLs disabled right here.
3:12So what we're seeing right here is
3:15that the bucket owner is going to be the one who is enforcing
3:18the policies above.
3:20These ACLs down here-- they're a list of ACLs,
3:23but they're effectively irrelevant.
3:25We've just granted the bucket owner
3:27full permissions to control what goes on in this bucket.
3:30So let's look at some other buckets
3:31right here, like my Elastic Beanstalk bucket.
3:34This was created by the Elastic Beanstalk CloudFormation
3:38service whenever I deploy Elastic Beanstalk.
3:40And this is really for logs of what's going on
3:43within Elastic Beanstalk.
3:44And it's also for uploading
3:47the archived artifacts to ship my website
3:52into Elastic Beanstalk.
3:54So as I deploy new website code zipped up,
3:58this is where it's going to stage these files before it
4:01could ship to EC2.
4:02If we explore these permissions right here, block
4:06all public access is turned off, but we
4:08control specific operations for specific resources in this S3
4:15bucket.
4:16In this case, you can see things like managing the actual S3
4:21Beanstalk logs.
4:23So, in this case, we're allowing this specific principal,
4:27in this case, an EC2 role to create
4:30new objects in this specific path right there of this S3
4:35bucket.
4:36See, pretty interesting.
4:37Same thing with the EC2 roles down here.
4:40We're allowing this EC2 role to perform
4:43these actions against this specific bucket
4:46and these specific objects inside of the bucket.
4:51So this should be highlighting immediately
4:53how you control permissions into the bucket
4:56and into specific objects of the bucket
4:59and when we might want to use this.
5:01In this case, we're giving permissions
5:03to EC2 instances who've been granted this role the privilege
5:08to look at the contents of these specific paths.
5:11That's how the EC2 roles were able to access--
5:15the workers on the EC2 instances were
5:17able to access this S3 bucket and pull down the new content
5:21that they needed to deploy into the actual website or the web
5:25hosting service itself.
5:28Pretty cool.
5:30And then lastly, the bucket that I just
5:31created right here where nothing is allowed to be public.
5:34That's because we've turned on block all public access,
5:37and currently, there are no policies.
5:39Only I, as the bucket owner, have full permissions
5:43to this particular bucket.
5:45Nobody else has permissions to this bucket.
5:47No other resource has permissions to this bucket.
5:50Now bucket policies can get quite complex,
5:53and this is why they've given us bucket policy examples
5:57to start building off of.
5:59Look at all of these right here.
6:00We can create policies that require server-side encryption.
6:05We can create policies using canned ACLs.
6:08We can create policies that allow it--
6:11allow access based on tags or specifying
6:15what tags are allowed to be created.
6:17We can create policies to whitelist specific IP
6:21addresses.
6:22We can create policies whether the protocol being used
6:26was unencrypted or encrypted.
6:29We already kind of saw access to specific folders
6:32whenever we were looking at that EC2 instance trying to access
6:35things like environment logs.
6:38And we can even require multifactor authentication
6:42for specific operations that take place
6:45on this S3 bucket, which I think is pretty cool too.
6:48We'll actually take a look at that later.
6:51So I highly encourage you to find this document
6:54and take a look at how this is going to work.
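Two items from that list, the protocol check and the IP allow list, can be sketched as policy statements. This is a minimal sketch, not a copy of the AWS examples: the bucket name and CIDR range are placeholders and the statement IDs are made up. The condition keys `aws:SecureTransport` and `aws:SourceIp` are the global condition keys used for these checks.

```python
import json


def transport_and_ip_policy(bucket_name: str, allowed_cidr: str) -> dict:
    """Deny plain-HTTP requests and requests from outside one CIDR range."""
    resources = [
        f"arn:aws:s3:::{bucket_name}",      # the bucket itself
        f"arn:aws:s3:::{bucket_name}/*",    # every object in it
    ]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyUnencryptedTransport",  # hypothetical ID
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": resources,
                # The value is "false" when the request did not use HTTPS.
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            },
            {
                "Sid": "DenyOutsideAllowedRange",  # hypothetical ID
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": resources,
                "Condition": {"NotIpAddress": {"aws:SourceIp": allowed_cidr}},
            },
        ],
    }


# Placeholder bucket name and a documentation-only address range.
ip_policy = transport_and_ip_policy("example-bucket", "203.0.113.0/24")
print(json.dumps(ip_policy, indent=2))
```

Both statements are written as denies, which matches the deny-when-the-condition-fails style that the examples in that document tend to use.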
6:58Understanding the logic of these policies can also be tricky.
7:01Seeing things like Require server side
7:04encryption for an object to be written is going to be a must.
7:08So if I'm exploring the logic here,
7:10I'm saying deny the put object operation
7:15on the specific bucket when the condition is
7:19null for this server-side encryption KMS.
7:24So that's kind of an interesting thing.
7:27We're basically saying the condition here
7:29is when this is the key that's being used for understanding
7:33server side encryption.
7:35If that is set to null under any instance, then
7:39deny the put object operation.
7:41Do you see how it's kind of a weird logic
7:44to understand how specifying this key being set
7:47to null and a higher up tier.
7:49I don't know, it's just one of those things
7:51you have to wrap your head around when
7:53you start to see it.
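One way to get comfortable with that inverted logic is to write the statement out and walk a request through it. The sketch below assumes a placeholder bucket name, and the `put_is_denied` helper is a toy that only understands this one `Null` condition; it is not how the real policy engine is implemented.

```python
SSE_KMS_KEY = "s3:x-amz-server-side-encryption-aws-kms-key-id"


def require_sse_kms_policy(bucket_name: str) -> dict:
    """Deny PutObject whenever the request carries no KMS key ID."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyObjectsThatAreNotSSEKMS",  # hypothetical ID
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:PutObject",
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
                # "true" means: this condition matches when the key is absent.
                "Condition": {"Null": {SSE_KMS_KEY: "true"}},
            }
        ],
    }


def put_is_denied(policy: dict, request_keys: dict) -> bool:
    """Toy check of the Null condition against the keys present in a request."""
    null_condition = policy["Statement"][0]["Condition"]["Null"]
    for key, expect_absent in null_condition.items():
        is_absent = key not in request_keys
        if is_absent != (expect_absent == "true"):
            return False  # condition did not match, so this Deny does not apply
    return True


sse_policy = require_sse_kms_policy("example-bucket")  # placeholder name

# Upload without a KMS key ID: the key is null, so the Deny applies.
print(put_is_denied(sse_policy, {}))
# Upload that names a KMS key (placeholder value): the Deny does not apply.
print(put_is_denied(sse_policy, {SSE_KMS_KEY: "example-key-id"}))
```

Reading it as "deny the write if the encryption key is missing" is usually easier than reading it as "require encryption", even though the effect is the same.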
7:54It's kind of a similar logic right here
7:56for denying a request if they're saying--
7:59if they're not using the correct server side encryption KMS key.
8:04We're again denying the creation in this bucket
8:07if this key right here is not equal right there.
8:13So if the request comes in and the ARN is not
8:17equal to this exact key, then deny the objects.
8:21Again, it's kind of a weird ordering of the logic there,
8:24but it's one that you really want to make
8:26sure you wrap your head around.
8:28Here's where we can actually create policies
8:31using the canned ACLs.
8:33So when we scroll down here.
8:35We can see how I'm granting access
8:37to two different accounts.
8:39Multiple accounts can access an S3 object to create objects
8:43while also giving them public read.
8:46This ACL right here is the named canned ACL
8:50that allows public read.
8:51So in this case, we're allowing these two accounts,
8:55the write operations, while also setting
8:58this to be a public read ACL--
9:00kind of crazy.
9:02A similar policy right here that grants
9:05cross account permissions to create objects
9:08while ensuring that the bucket owner has full control.
9:12Very similar construct right here
9:15where we're specifying the principals that are allowed
9:18to create an object, but we're also adding a conditional ACL
9:22and the name of the ACL is bucket owner full control.
9:25It's again a canned ACL.
9:28Here's where we can allow a user to have
9:30read only access to a specific object that has been tagged
9:35with a specific key and value.
9:37So here we specify the user based on the principal.
9:40They're allowed to get object or historical versions of it
9:43in the specific bucket if the S3 existing object
9:50tag is a key of environment with a value of production.
9:55See again, it's kind of this weird syntax for managing tags
9:59and conditions.
10:00But once more, it's really important to see these things
10:04and just wrap your head around how this is going
10:06to work whenever it comes.
10:08And this is why I really want you to take the time
10:11to explore these bucket policy examples
10:14and try to understand how these conditional statements work
10:17because they're really, really important for working
10:20with S3 buckets.
10:21So now we've explored how policies work on S3 buckets
10:25and how they enforce permissions for authorization
10:30into the S3 buckets and the individual objects themselves.
10:34I hope this has been informative for you,
10:36and I'd like to thank you for viewing.
Bucket Versioning
0:00[MUSIC PLAYING]
0:06One of the critical things that you, as an administrator,
0:08need to understand about S3 is how to track changes
0:13within our files.
0:14What happens if a user overwrites
0:17a file with malicious data or corrupted data?
0:20What happens if they accidentally delete the file?
0:23This is really where you start thinking about backups
0:26at the end of the day.
0:27And S3 implements something called versioning.
0:30But versioning sets up an even bigger conversation
0:34surrounding the amount of storage we use,
0:36and how we actually want to store
0:39all of the different versions of the same file.
0:42So first, let's talk about versioning.
0:45And then in the next video, we'll
0:47talk about file lifecycles.
0:48Let's go.
0:50So here's what I want you to see.
0:51I want you to see versioning in action.
0:54I'm in the bucket that I just created, mayc-east.
0:57Let's go ahead and start by uploading a file into it.
1:00I'll click on Upload File.
1:01I'll add my file.
1:03And then I'll choose something like the CBT Nuggets
1:05artificial intelligence policy that just came out.
1:08I'll choose Upload.
1:09And just like that, it gets uploaded pretty quickly.
1:12Right here, I can close this dialog box.
1:14Because right now, this is just the Upload status page.
1:16And I'll click Close.
1:18There.
1:18Now the policy exists.
1:20Pretty cool, right?
1:21I can explore this object right here.
1:24Understanding that when it comes to actually coding this,
1:28this is actually going to be a key.
1:31That's really an important thing to understand.
1:33When we were to actually access this particular file
1:36programmatically, using code, it would come back to us
1:41like a JSON object.
1:42And the key would be the name of the file.
1:46Other metadata associated with that file
1:49would also come back as part of the object that's
1:52contained here.
1:53It's another reason why we actually look at these objects,
1:56even though they're files, we look at them
1:57as JSON objects.
1:59I see things, like the object URL.
2:02But knowing that public access is actually
2:04restricted-- when I click on it, what's it going to do?
2:07Access denied.
2:08That's exactly what we expected it to do.
2:10Let's jump on back into the console.
2:11One second, though.
2:13My bucket properties immediately tell me
2:16that this file does not have versioning
2:18enabled, nor is this file being replicated,
2:21which is an interesting thing-- nor is object locking going on.
2:26We are using server side encryption.
2:28And it's currently being stored on the Standard storage class.
2:32This is all setting up a big conversation
2:35to talk about what a lot of these things,
2:37if not most of these things, actually mean.
2:40So let's see what happens now, when
2:43I jump back into my bucket, and I try and re upload the file.
2:47I'll go to Add files.
2:49I'll choose my CBT Nuggets artificial intelligence policy.
2:52I'll click Open.
2:53And I'll click Upload.
2:54Well, it goes and uploads just fine.
2:57I'll click Close.
2:58And I still see the file here.
3:00When I click on the file, and I click on the versions,
3:03I see-- wait a minute.
3:04There's no version here.
3:06Hmm.
3:07What's that really mean?
3:08That means that I just overwrote the file
3:11that I had originally uploaded.
3:14The old version of the file is gone.
3:17You want to see it done another way?
3:18Watch this.
3:19I'll call this version 1 file right here.
3:22And then I'll immediately go and say, file, and Save As.
3:25Let's go File-- if it's going to let me.
3:27Why can't I click on File?
3:28Hang on one second.
3:29Let's use Control-S, or Command-S.
3:31There we go.
3:31Now the dialog pops up.
3:32I'll put this on my desktop.
3:34And I'll call this File 1, like this.
3:37Click Save.
3:38And there we go.
3:39Version 1 File is now there.
3:40So I go back into my buckets.
3:42And I'll go back here into mayc-east.
3:45I'll upload the bucket--
3:46or I'll upload the file.
3:47I'll go back to my desktop.
3:49And upload it, just like so.
3:51This is, again, to drive home this concept of what's
3:55really happening here.
3:56So with File 1 RTF now uploaded, I'm
4:00going to reopen that file on my desktop.
4:03So let's reopen it here.
4:04Oh.
4:05Came up on the other screen.
4:07I'll now call this version 2 of my file.
4:12Command-S to save it.
4:14And now I've got a new version of my file, right?
4:17So I'm going to upload this new version
4:19of my file called File 1.
4:24Click Open.
4:25And upload.
4:26OK.
4:27So now I've uploaded that file one more time.
4:30I'll go into File 1 RTF.
4:32I see there are no historical versions of it.
4:35So when I download this file, and open this downloaded file,
4:40it's version 2.
4:41So I've effectively overwritten the original version
4:45of my file.
4:46So what I can do, if I want, is I can
4:50enable versioning on my bucket.
4:53So I'm going to go into mayc-east here.
4:55Go into Properties.
4:57And turn on bucket versioning.
4:59This way, whenever a file wants to be overwritten,
5:03what it actually does--
5:05so here was-- let's call this version 1 of my file.
5:09When I'm ready to work locally on my computer here,
5:13like so-- this is my laptop--
5:14And I create version 2 of my file.
5:17When I want to upload it into the bucket
5:20right here, what's going to happen
5:22is S3 will actually store version 2 of this file
5:26separately from version 1.
5:29So with bucket versioning enabled,
5:32I'll click Save Changes, like so.
5:34Bucket versioning is now enabled.
5:36I'll go back into my objects.
5:39I'll reopen my version 1 file right here.
5:42File 1 RTF.
5:43And I'll call this version 3 of my file.
5:47Command-S to save it.
5:48Close it all out.
5:50And I'll reupload that file.
5:52So let's go back into Add the files.
5:55There's file 1 more time.
5:57Click open.
5:58Hang on.
5:58One second.
5:59One second.
5:59One second.
5:59That was wrong.
6:00Hang on.
6:00We're going to go back into upload the file here.
6:03Add files.
6:06Find the correct one, where I just created
6:08this version 3 of my file.
6:10And click Open.
6:12We'll upload it.
6:13It gets uploaded successfully.
6:16So now, see.
6:16I still have file 1 RTF.
6:19Only one file 1 RTF.
6:21But when I go into versions now-- aah.
6:24There's the old version of my file.
6:27Now, when files get uploaded, they're
6:29going to be uploaded with a version ID.
6:34So if I'm using code, I can say, go to this S3 bucket,
6:40get this particular object based on its key.
6:44But then get this particular version of it.
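To make the key-plus-version-ID idea concrete, here is a toy in-memory model of a single bucket. It is only a sketch for reasoning about the behavior seen in the console: the `VersionedBucket` class and the key `file1.rtf` are made up for illustration, and random hex strings stand in for real version IDs. Against the real service, the equivalent read with boto3 would be `get_object` with `Bucket`, `Key`, and `VersionId` arguments.

```python
import uuid


class VersionedBucket:
    """Toy in-memory model of one S3 bucket; not the real service."""

    def __init__(self):
        self.versioning_enabled = False
        self._versions = {}  # key -> list of (version_id, body), oldest first

    def put(self, key, body):
        if self.versioning_enabled:
            version_id = uuid.uuid4().hex  # stand-in for an S3 version ID
            self._versions.setdefault(key, []).append((version_id, body))
        else:
            version_id = "null"  # unversioned objects have a null version ID
            self._versions[key] = [(version_id, body)]  # overwrite in place
        return version_id

    def get(self, key, version_id=None):
        versions = self._versions[key]
        if version_id is None:
            return versions[-1][1]  # the current version
        for stored_id, body in versions:
            if stored_id == version_id:
                return body
        raise KeyError(version_id)


bucket = VersionedBucket()
bucket.put("file1.rtf", b"version 1")
bucket.put("file1.rtf", b"version 2")      # overwrites version 1
bucket.versioning_enabled = True           # turn on bucket versioning
v3 = bucket.put("file1.rtf", b"version 3")
v4 = bucket.put("file1.rtf", b"version 4")

print(bucket.get("file1.rtf"))             # current version
print(bucket.get("file1.rtf", v3))         # a specific older version
```

Version 1 is gone because it was overwritten before versioning was turned on, while version 2 survives as the oldest stored version once later uploads arrive.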
6:48So one more time, just to see it all in action.
6:52I'll bring the File Explorer back up.
6:54We'll go back to my desktop file.
6:56Open this one up.
6:58I'll call this version 4 of my file.
7:03Close this out.
7:05Close it out.
7:06Do one more upload here.
7:08Just to highlight this, I want to show you--
7:10I want you to be able to see exactly what this is doing.
7:14Hang on.
7:15There we go.
7:15Now I got it fixed.
7:16Version 4 of my file.
7:18Upload.
7:19There it goes.
7:20Now, when I go into file 1 RTF, and I see versions,
7:24I see these different versions that are popping up.
7:27I see which one is the current version,
7:29and I see the old version here.
7:31When I click on the old version, I can see details about this.
7:35Importantly, like when that version was last modified.
7:39Let's jump back here into that file,
7:41and go back into versions.
7:43Also check this out.
7:44When I click on one of the older versions,
7:47I can download that older version.
7:49I can open that older version.
7:52I can delete that older version.
7:54Or, under Actions, I can do some cool things.
7:58I can make this older version public.
8:01I can give that older version a different tag.
8:04I can share the older version with a pre-signed URL
8:09so that somebody can access it.
8:10Or I can download the older version
8:13as a different file or a different format.
8:15Pretty cool, right?
8:16So if I wanted to be like, oh, you know what?
8:19This version broke something right here,
8:22so I need to restore to my older version.
8:24I would just download it, and then reupload it.
8:27And then that would become the current version.
8:30Now, what's the big takeaway from this?
8:32Each one of these versions is a separate file.
8:36At this point, you should see how
8:38my one 384 byte file is actually times three at this point.
8:45Doesn't seem like a huge deal when it's a simple text file.
8:49But all of a sudden, if we're uploading
8:51different versions of images, or large Excel
8:54documents, or anything like--
8:56Power BI files can get pretty large.
8:59Or Tableau files can get pretty large.
9:01Data sources, Parquet files.
9:03Things like that can chew through a lot of space
9:07really quickly.
9:09You could see the volume, or the amount of data
9:13that you're storing go up by multiples very quickly.
9:19And therefore, your bill goes up by multiples.
9:23This is the challenge that comes along with versioning.
9:26And that's why it's really important
9:28to understand that if you're going to be implementing
9:31versioning, you probably also want to be implementing
9:35lifecycle policies.
9:36But that's what we'll talk about in the next video.
9:39For now, this is understanding what bucket versioning does.
9:42I hope this has been informative for you,
9:43and I'd like to thank you for viewing.
Lifecycle Policies
0:00[AUDIO LOGO]
0:06In the previous video, we implemented versioning,
0:09and now we understand that when I want to overwrite or change
0:13an existing file, what it's really doing
0:15is creating a new version of the file
0:18and I can have the ability to roll back to previous versions
0:21if that's what I need to do.
0:23But that came at an expense.
0:25We're now consuming more storage.
0:27So what can I do about that?
0:30One of the cool things that AWS brings to the table for S3
0:34are lifecycle policies.
0:36This is where we manage the lifecycle
0:38of our objects themselves and their
0:41many different versions.
0:42Put another way, we're figuring out
0:44when it's time to archive an old file or an old version.
0:49Let's talk more.
0:50In the previous video, we introduced the challenge
0:53of versioning, and that's when we start uploading files, then
0:57editing those files, then uploading
0:59a new version of those files.
1:00They could start to chew through a lot of space
1:04because we could effectively have hundreds or thousands
1:09of versions of the same file.
1:11And if that one file was just 1 megabyte in size
1:15and there were 1,000 copies of it, now we've
1:18chewed through a gig of storage on just one
1:21file because it gets frequently edited and re-uploaded.
1:24Do you see how this could be a bit of a challenge?
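The arithmetic is simple enough to check directly. This sketch assumes every version is the same size and uses decimal units (1 MB = 1,000,000 bytes); the function name is made up for illustration.

```python
MB = 1000 ** 2
GB = 1000 ** 3


def versioned_storage_bytes(object_size_bytes: int, version_count: int) -> int:
    """Total bytes stored when every version is kept as a separate object."""
    return object_size_bytes * version_count


# The 1 MB file with 1,000 stored versions.
total = versioned_storage_bytes(1 * MB, 1000)
print(total / GB)  # 1.0

# The 384-byte text file from the versioning demo, stored three times.
small_total = versioned_storage_bytes(384, 3)
print(small_total)  # 1152
```

Real versions rarely have identical sizes, so treat this as an estimate of scale rather than a billing calculation.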
1:27This is why we want to start to explore something
1:29called bucket lifecycles.
1:31Now, what are lifecycles all about?
1:33At the end of the day, this S3 bucket lives on disks.
1:39And some disks are faster or more performant than others.
1:43And those faster, more performant disks,
1:46they cost more money.
1:48But it's really, really useful when
1:50we have particularly hot data, data
1:53that's getting accessed or updated a whole lot.
1:56But what happens if I have the current version of my file,
2:01it's hot, but the 100th version of my file
2:06hasn't been accessed for months?
2:08Wouldn't it be ideal to move the older files to something
2:12like cold or archive storage?
2:15This is what a lifecycle policy does.
2:18And where we go to explore these are under the Management
2:21tab of our bucket, in particular,
2:24you see lifecycle rules.
2:26What we're really defining here are
2:27what are the rules or policies for moving old files to slower,
2:33less expensive disks?
2:35A great example-- and if you're the blog writer of this,
2:38I can never remember where I found this,
2:40but this is a really common example that's spread around
2:42with blogs is let's say I have a business expenses app,
2:48and I'm a salesperson.
2:50I travel to a different city and I
2:52take a client out to a nice steak dinner
2:54that costs something like $350.
2:57I take a picture of the receipt, and then I upload it
3:01into our business expenses app.
3:05Behind the scenes, that picture is going
3:08to be living in an S3 bucket.
3:10So for this particular payroll cycle,
3:13that picture is pretty important because the accountant who
3:17manages payroll or business expense reimbursement,
3:21they're going to access this picture
3:23and validate that my expense report meets up
3:26with what the receipt says.
3:28So within the first 30 days, that's
3:31a pretty important picture.
3:32But then let's say 90 days have passed,
3:36am I accessing that picture anymore?
3:38No.
3:39Why does it then need to live on premium SSDs?
3:43We probably want to rotate it back out to slower disks.
3:47And this is what we define with lifecycle rules.
3:50So here let's create a lifecycle rule real quick.
3:53We'll give this a name like liferule,
3:56and then something like mayceast to specify the bucket.
4:01You then have the ability to define the rule scope.
4:04Put another way, this is a filter defining
4:08what exact kinds of files you want this to apply to.
4:13As it says, filters can be applied to file prefixes,
4:17like every file we have begins with dev-
4:22Maybe we want all of the lifecycle rules to apply
4:26to every file that begins with dev-,
4:28that's what the prefix does.
4:30Or maybe we want it to apply based
4:32on tags, or maybe only if objects are of a certain size,
4:37or any combination of these that suits our case.
4:41Filters are highly advisable and right
4:44below here's where we get to see how to specify these.
4:47Here's the prefix, here's the tags, and here are the object
4:51size rules that we can specify.
4:53Or you can just say apply this to all objects.
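The rule scope can also be written down as data. The sketch below uses the filter shape that the lifecycle configuration API accepts (`Prefix`, `Tag` or `Tags`, `ObjectSizeGreaterThan`, `ObjectSizeLessThan`, combined under `And`), with made-up values. The `matches_filter` helper is a toy for reasoning about which objects a filter would select; S3 does the real matching itself.

```python
def matches_filter(rule_filter: dict, key: str, tags: dict, size: int) -> bool:
    """Toy check of whether an object falls inside a lifecycle rule's scope."""
    cond = rule_filter.get("And", rule_filter)  # combined or single condition
    if not key.startswith(cond.get("Prefix", "")):
        return False
    wanted_tags = list(cond.get("Tags", []))
    if "Tag" in cond:
        wanted_tags.append(cond["Tag"])
    for tag in wanted_tags:
        if tags.get(tag["Key"]) != tag["Value"]:
            return False
    if "ObjectSizeGreaterThan" in cond and not size > cond["ObjectSizeGreaterThan"]:
        return False
    if "ObjectSizeLessThan" in cond and not size < cond["ObjectSizeLessThan"]:
        return False
    return True


# Hypothetical scope: keys starting with "dev-", tagged environment=test,
# and larger than 1 KiB.
dev_filter = {
    "And": {
        "Prefix": "dev-",
        "Tags": [{"Key": "environment", "Value": "test"}],
        "ObjectSizeGreaterThan": 1024,
    }
}

print(matches_filter(dev_filter, "dev-report.csv", {"environment": "test"}, 2048))
print(matches_filter(dev_filter, "prod-report.csv", {"environment": "test"}, 2048))
print(matches_filter({}, "anything.txt", {}, 10))  # empty filter: all objects
```

An empty filter is the "apply to all objects" choice, which is why the console asks you to acknowledge it.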
4:56Now, it does tell you if you want this rule to apply
4:59to all objects, we need to acknowledge
5:03that this is going to be moving all of our objects
5:05after a certain time has gone by.
5:07If you want to filter these out, change this up from above.
5:10So you have to acknowledge this.
5:12Now, underneath this, whether we choose the filters or not,
5:16we then have to choose the exact actions that we
5:20want this rule to take.
5:22This is effectively asking--
5:23let me change the color here because that yellow is not
5:25showing up very well.
5:26This is effectively asking what kinds of versions
5:30do we want these actions to take on.
5:32Do we want to move the current version?
5:35Do we want to move the non-current versions?
5:39Do we want to expire versions?
5:42Do we want to permanently delete non-current versions, or so on?
5:47So it's not just about moving them
5:50to a different tier of storage disk,
5:52we could actually delete old versions
5:54if that's what we want to do.
5:55Let's say I just want to move non-current versions of objects
5:59between storage classes.
6:01So I want to keep my current version
6:03on the hot disks and any older versions of that,
6:07I want to move to slower disks.
6:09Now, how do we choose which tier to go to?
6:12When we click the dropdown, we actually
6:14see we have a lot of different options,
6:16and intelligent tiering is pretty cool.
6:18Amazon basically uses machine learning
6:21to move data between tiers because we don't know
6:25the kinds of access patterns that we
6:27may be having on a specific type of object that's
6:30being uploaded.
6:32For instance, think about that expense reimbursement ticket.
6:36I can pretty easily predict that ticket,
6:40that picture is going to be accessed within the first 30
6:43days because accounting has to reimburse that expense
6:46within that first 30 days.
6:48After that, it's probably not going to be accessed anymore.
6:52But if I'm uploading objects where
6:54I have no idea how frequently older versions are going
6:57to be accessed or not, I'm going to choose intelligent tiering
7:00and let Amazon determine which storage class this needs
7:04to go to.
7:05Now, if I knew, for instance, that that picture was going
7:09to be moving to an older tier after 30 days,
7:13I might want to use something like standard tier
7:16like standard SSDs, but under the infrequently accessed tier.
7:21This still gives me millisecond access, with a 30-day minimum storage duration,
7:27but I might only access it once a month or so.
7:30That's what the standard-IA does.
7:33If I want to reduce my cost even more, I might use One Zone-IA.
7:38This is the same thing as Standard-IA,
7:41except for the data is only going to exist
7:42in one availability zone.
7:45So I'll say non-current-- right here,
7:47non-current versions are going to be moved to One Zone-IA
7:52after a certain number of days occur.
7:56Let's say after 20 days have passed,
7:59I want to move non-current versions of my files
8:02into One Zone-IA.
8:04Now, the number of newer versions here is basically
8:07saying I want to keep my non--
8:11like up to a certain number of versions still in the hot tier.
8:15So if I put something like 5 right here,
8:17it would be keeping 5 non-current versions
8:21in the hot tier, all other non-current versions
8:24will be moved.
8:26So in this case, I'm just going to delete this out
8:28because it's optional.
8:29Now, this moves them into One Zone-IA, which still has 30--
8:36[STAMMERING] what am I trying to say here--
8:38with millisecond access 30 days minimum storage duration.
8:43That's what One Zone-IA does.
8:45Is it's still relatively hot, it's not frozen,
8:50but it's not super hot anymore.
8:52It's what's in between, it's cold.
8:53Now, what happens if I want to move data out
8:56of that and into true archive storage
8:58because we're not going to use it anymore?
9:00We like to call this write--
9:03write once, hopefully read never.
9:08This is the WORN strategy.
9:12The idea with archive storage is we are being forced
9:15to keep it for some reason.
9:17Maybe we have some legal reason that we need to archive data,
9:20but this particular storage is going to be super cold.
9:24It can take a long time to access it because it
9:27has to be rehydrated.
9:30And the cost of rehydrating could be extremely expensive,
9:35but nonetheless, the cost to store data in archive storage
9:40is extremely cheap and we've then
9:42met whatever our compliance goal is to store this data.
9:46So I can add another transition underneath this one
9:49by clicking Add transition right there.
9:52Within this, we can start to explore
9:54the different types of Glacier archive storage that AWS offers.
9:59When I click this dropdown, we see
10:01Glacier Flexible Retrieval requires
10:0490 days minimum storage, but retrieval
10:06can take minutes to hours.
10:08Glacier Deep Archive can take a retrieval of--
10:13allows you to retrieve data less--
10:15this data is accessed less than once a year
10:17with a retrieval of hours and requires 180 days minimum.
10:22So in this case, I may specify something
10:24like after 90 days living in One Zone-IA, move it then
10:29to Glacier Deep Archive.
10:31When we put this down here, we have
10:32to acknowledge this lifecycle rule
10:34because we're using glacier.
10:37Down below this, it shows you to review
10:39the transition of what happens.
10:41When we have non-current versions on day 0,
10:44they're just going to stay non-current.
10:46On day 20, they're going to be moved to One Zone-IA.
10:50On day 90, they're going to be moved to Glacier Deep Archive.
10:55So fire off Create rule--
10:58oh and, of course, it does say you
10:59have to have 30 days required before transitioning to that.
11:02It did tell me that.
11:03Didn't it?
11:04Of course, it did.
11:04Right there.
11:05So we'll go ahead and create this rule right there and boom,
11:08the rule is now being created.
11:11Give it a simple refresh and now my rule
11:14has been brought to life.
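The same rule can be expressed as the dictionary that the lifecycle configuration API takes. This is a sketch under stated assumptions: the rule ID is a guess at the name typed in the console, and the first transition uses 30 days rather than 20 on the assumption that the 30-day notice shown by the console applies to the One Zone-IA transition. The `storage_class_for` helper is a toy for reading the timeline and assumes versions start in Standard.

```python
lifecycle_configuration = {
    "Rules": [
        {
            "ID": "liferule-mayceast",  # hypothetical rule name
            "Status": "Enabled",
            "Filter": {},  # empty filter: apply to all objects
            "NoncurrentVersionTransitions": [
                {"NoncurrentDays": 30, "StorageClass": "ONEZONE_IA"},
                {"NoncurrentDays": 90, "StorageClass": "DEEP_ARCHIVE"},
            ],
        }
    ]
}


def storage_class_for(rule: dict, days_noncurrent: int,
                      starting_class: str = "STANDARD") -> str:
    """Toy timeline: where a non-current version sits after N days."""
    current = starting_class
    transitions = sorted(rule["NoncurrentVersionTransitions"],
                         key=lambda t: t["NoncurrentDays"])
    for transition in transitions:
        if days_noncurrent >= transition["NoncurrentDays"]:
            current = transition["StorageClass"]
    return current


rule = lifecycle_configuration["Rules"][0]
for day in (0, 30, 89, 90):
    print(day, storage_class_for(rule, day))
```

With boto3, a dictionary of this shape is what gets passed as `LifecycleConfiguration` to `put_bucket_lifecycle_configuration`. Only non-current versions are touched; the current version of each object stays where it is.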
11:16So now we have the ability to manage versions, and all
11:20of this historical data by moving it through
11:22to older cheaper storage based on how frequently it
11:26gets accessed.
11:27This is the importance of understanding
11:29tiering options in S3 buckets.
11:31I hope this has been informative for you,
11:33and I'd like to thank you for viewing.
Cross-Region Replication
0:00[MUSIC PLAYING]
0:06One of the final things that we're going to talk about
0:08is replication.
0:09Again, remember, this is a really important thing
0:11to understand.
0:12The fully qualified domain name for our S3 bucket
0:16is globally accessible.
0:17However, the files themselves
0:21are stored within a specific region.
0:23And there may be reasons why we need to replicate data from one
0:27region to a different region.
0:29Or, importantly, from one account to a different account.
0:33So in this video, we're going to explore replication.
0:36This is going to be a big video.
0:38Because replication can be big to set up.
0:40So without further ado, let's implement an S3 bucket
0:44replication to a different region.
0:47Let's go.
0:48The final thing that I want you to see and experience
0:51is replication, because replication
0:53can be a big undertaking.
0:55We absolutely have the ability to take a source bucket,
1:00and replicate it to a destination bucket.
1:04Importantly, there are some caveats, or some things
1:08that you need to understand.
1:09The source bucket can be in a different account
1:13than the destination bucket.
1:15Or they can be in the same account.
1:17They can also be in different regions.
1:20Or they can be in the same regions.
1:23They can also be encrypted with KMS keys.
1:26Remember, KMS keys are specific to their region.
1:30So if I'm encrypting data in a source bucket in US East 2,
1:34and I want to send that data to a different region that's
1:38going to be using a different KMS key,
1:40we're going to have to manage that process,
1:43telling it what keys are supposed
1:44to be used to decrypt the data, and then re-encrypt
1:47the data upon arrival.
1:50There are a bunch of different reasons
1:52why you might implement different strategies
1:54of replication.
1:56In the S3 documentation, they actually
1:58have a section of why, or when to use replication.
2:02There's different reasons when you might
2:04use cross-region replication.
2:06First one is to meet compliance requirements.
2:08Second one is to minimize latency to your customers.
2:12Or operational efficiency, which is really about
2:15if you have different compute clusters in different AWS
2:18regions.
2:19But there's also reasons to use same-region replication,
2:22like aggregating multiple buckets into a single bucket,
2:26or configuring replication between a production and a test
2:30environment.
2:31There's even some great reasons to build
2:33in two-way replication.
2:35I love the particular use case for managing failover
2:38and keeping data consistent as it moves from one region
2:41to the next.
2:42So how do we actually implement this?
2:44Well, first of all, you have to have a source bucket
2:47and a destination bucket.
2:48I'm going to go ahead and create a destination bucket right now,
2:51like I said.
2:52I'm going to call this mayc-west.
2:54I'm going to put this in the US West 2.
2:58Replication requires, importantly,
3:01that bucket versioning is enabled,
3:03which means you probably want to consider lifecycle policies
3:07on your buckets as well.
3:08Also, to demonstrate this and understand
3:10the complexities of using KMS keys,
3:13I'm going to choose to use a KMS key.
3:15Now, since I've chosen US West 2, that, of course,
3:18means I have to use a key that I've created in US West 2,
3:22which I have right here.
3:23West KMS 3, which was what was just disabled right there.
3:26I'll go ahead and create this bucket right now.
3:29So just remember, bucket versioning must be enabled.
3:32And I've already got a KMS key in the West
3:36that I'm using to encrypt this data.
3:38Now my goal is to establish replication
3:41between the US East 2 bucket and the US West 2 bucket.
3:45So I'm going to go into US East right here
3:48to be my source bucket.
3:50I'll go into Management.
3:51And right underneath my Lifecycle Rules,
3:54I see Replication Rules.
3:55We're going to create a replication rule,
3:57and get this thing started.
3:58I want to say right now, this process
4:01is so much easier than it used to be in the old versions
4:06of AWS.
4:07Let's go ahead and call this mayc-replication,
4:10just to go ahead and call that a replication rule.
4:12The status of this is going to be enabled.
4:14And it's going to be a top priority.
4:16We can have multiple replication rules.
4:19And this really matters.
4:21Because if an object has already been replicated
4:26to a different region, then it will not
4:29be replicated by a second rule to a different region.
4:33Once it's been replicated, that object
4:36is no longer available for replication.
4:39So setting the priority of your replication rule effectively
4:43dictates what's going to be the target region that you
4:46try to go to first.
4:48All right.
4:49Here we go.
4:50We start off with our source bucket.
4:51Well, that's the bucket that we're working on.
4:54Just like the prefix--
4:56or just like the lifecycle rules,
4:58we specify filters of the types of objects
5:01that we want to replicate.
5:03This can go based off of prefixes or tags.
5:06Notably, not file size.
5:08In this case, I'm going to choose replicate all objects.
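As a rough illustration of the filter options just described, here is a minimal Python sketch of how a rule's filter might be written in the JSON-style replication configuration that S3 accepts. The helper names `build_filter` and `build_rule` are hypothetical, the priority value of 1 is an assumption, and the rule is only a fragment: a complete rule would also need a destination, which is chosen next.

```python
def build_filter(prefix=None, tags=None):
    """Return a replication rule Filter for an optional prefix and optional tags.

    Size-based filtering is deliberately absent: replication filters
    work on prefixes and tags only.
    """
    tag_list = [{"Key": k, "Value": v} for k, v in (tags or {}).items()]
    if not tag_list:
        # No tags: use a prefix filter. The empty prefix matches every object.
        return {"Prefix": prefix or ""}
    if prefix is None and len(tag_list) == 1:
        # Exactly one tag and no prefix: a plain Tag filter is enough.
        return {"Tag": tag_list[0]}
    # Anything more complex is combined under "And".
    combined = {"Tags": tag_list}
    if prefix is not None:
        combined["Prefix"] = prefix
    return {"And": combined}


def build_rule(rule_id, priority, rule_filter):
    """Return a partial replication rule (destination omitted on purpose)."""
    return {
        "ID": rule_id,
        "Status": "Enabled",
        "Priority": priority,  # assumed value; pick what fits your rule set
        "Filter": rule_filter,
    }


# "Replicate all objects" corresponds to an empty prefix.
rule = build_rule("mayc-replication", 1, build_filter())
```

Calling `build_filter(prefix="logs/")` or `build_filter(tags={"team": "data"})` would narrow the rule to a prefix or a tag instead; both values are made up for illustration.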
5:12Now I'm going to choose my destination bucket.
5:14From here, I can choose a bucket that exists in this account.
5:17But notice, I could specify a different bucket
5:20in a different account.
5:21I would need to know the account ID, and then the bucket name.
5:25That's also a good thing to know.
5:27Also keep in mind who owns the objects when
5:30they go to a destination.
5:32We can change the ownership to be the destination bucket
5:35owner.
5:36And therefore, they control the policies for those objects
5:40when they come inbound.
5:41That's a big security thing that you want to keep in mind.
5:44You may have a file that's private.
5:46The destination bucket might be a public bucket.
5:49So do keep that in mind.
5:50Since my bucket was created in this account,
5:53I'll go ahead and just choose to browse S3.
5:55And I'll choose mayc-west right here, and choose
5:59that particular bucket.
6:00Now, the IAM role.
6:02The AWS S3 service
6:05is going to be the one reading the objects out
6:09of your source bucket, and writing them
6:11to a destination bucket.
6:13This is a really important thing to understand.
6:16And that's why they're asking you to choose an IAM role.
6:19Now, I can tell it to create a new role from scratch for me,
6:23but that's going to give me a little bit
6:25of a problem with the KMS keys when I get started.
6:29Because right now, this newly created role
6:32doesn't have access to the currently in use KMS keys.
6:37So I could pre-create this role, and then specify the role here,
6:43while also going into KMS and granting that role
6:46permissions to the keys.
6:48For instance, I want you to see, this
6:50is what that particular role would look like.
6:52I'm going to go ahead and collapse it here under my code
6:55samples for SAA-C03.
6:57You'll see the S3 folder.
7:00And this is what the role is going to look like.
7:03We're going to allow
7:06get replication configuration
7:08and list bucket for the specific source bucket.
7:11Then we're also going to allow get object version
7:15for replication, get object version ACL,
7:18and get object version tagging for all objects in the source bucket.
7:22Then we're going to allow replicate object,
7:25replicate delete, and replicate tags for the destination
7:29bucket.
7:30So we'll give that particular role a name,
7:34then grant that role permission to the KMS keys.
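The role described above can be sketched as a pair of policy documents. This is a minimal Python sketch, not the actual file from the code samples: the source bucket name `mayc-east` is an assumption (only `mayc-west` is named in the walkthrough), and the function name is hypothetical. It leaves out KMS permissions, since the walkthrough grants those separately on the keys themselves.

```python
import json

SOURCE_BUCKET = "mayc-east"  # assumed name for the US East 2 source bucket
DEST_BUCKET = "mayc-west"    # destination bucket created earlier


def replication_role_policy(source_bucket, dest_bucket):
    """Build the permissions policy for a role that S3 uses to replicate objects."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Read the replication settings and list the source bucket.
                "Effect": "Allow",
                "Action": ["s3:GetReplicationConfiguration", "s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{source_bucket}"],
            },
            {
                # Read object versions, their ACLs, and their tags from the source.
                "Effect": "Allow",
                "Action": [
                    "s3:GetObjectVersionForReplication",
                    "s3:GetObjectVersionAcl",
                    "s3:GetObjectVersionTagging",
                ],
                "Resource": [f"arn:aws:s3:::{source_bucket}/*"],
            },
            {
                # Write replicas, delete markers, and tags into the destination.
                "Effect": "Allow",
                "Action": [
                    "s3:ReplicateObject",
                    "s3:ReplicateDelete",
                    "s3:ReplicateTags",
                ],
                "Resource": [f"arn:aws:s3:::{dest_bucket}/*"],
            },
        ],
    }


# Trust policy: lets the S3 service assume the role on your behalf.
TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "s3.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

policy = replication_role_policy(SOURCE_BUCKET, DEST_BUCKET)
print(json.dumps(policy, indent=2))
```

The three statements map one to one onto the three groups of permissions read out above: bucket-level reads on the source, object-level reads on the source, and replication writes on the destination.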
7:37Now, I'm going to go ahead and fix this after the fact,
7:40since I'm already here.
7:41I'll let it create a role for me.
7:43We're also going to specify when you write data to that target
7:47destination, it needs to be encrypted with the encrypt--
7:52So first of all, when we choose encryption,
7:54we're going to replicate with encryption.
7:56The first thing it's asking me is
7:57KMS keys for decrypting the source objects.
8:01I used my demo key for my source bucket.
8:05Next, in this option, is what is the KMS
8:08key that's used for encrypting destination objects?
8:12I'll choose from the dropdown here.
8:13And I'll choose the West KMS S3 key
8:17that I created for the destination bucket.
8:19I also can choose a different storage class.
8:22So if I want to write from standard to, say,
8:25something like One Zone-IA, I can override the destination
8:29target, and write it to a different storage
8:31class.
8:33Pretty cool, right?
8:34I'll go ahead and uncheck that right now,
8:36and use the defaults.
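For reference, the encryption and storage class choices made in the console correspond to a few fields in the rule's destination block. The following is a minimal Python sketch under stated assumptions: the key ARN and account ID are placeholders, and `build_destination` is a hypothetical helper, not part of any AWS library.

```python
def build_destination(bucket_name, replica_kms_key_arn, storage_class=None):
    """Return the Destination block of a replication rule.

    storage_class=None keeps the default behavior (replicas use the same
    storage class as the source object).
    """
    destination = {
        "Bucket": f"arn:aws:s3:::{bucket_name}",
        # Key in the destination region used to encrypt the replicas.
        "EncryptionConfiguration": {"ReplicaKmsKeyID": replica_kms_key_arn},
    }
    if storage_class is not None:
        destination["StorageClass"] = storage_class
    return destination


# Opt in to replicating objects that are encrypted with KMS at the source.
SOURCE_SELECTION = {"SseKmsEncryptedObjects": {"Status": "Enabled"}}

# Placeholder ARN; substitute the real key created in US West 2.
WEST_KEY_ARN = "arn:aws:kms:us-west-2:111122223333:key/EXAMPLE-KEY-ID"

default_class = build_destination("mayc-west", WEST_KEY_ARN)
one_zone = build_destination("mayc-west", WEST_KEY_ARN, storage_class="ONEZONE_IA")
```

Note that the source key does not appear in the destination block. Decrypting source objects depends on the replication role having permission to use that key.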
8:37Now, the last things that I want to enable here.
8:40Replication time control is all about aggressively replicating.
8:45This says, when you enable this, it
8:47replicates 99.99% of new objects within 15 minutes,
8:52and includes replication metrics down below.
8:56Replication metrics are all about streaming
8:58replication task data to CloudWatch.
9:02So you can monitor the amount, or number of objects
9:06that are currently queued for replication, the amount of data
9:09that's been replicated, or even diagnose replication failures
9:13with this.
9:13For my example, I'm going to go ahead and check this off.
9:17It is important to know that in some cases
9:20though, it can take up to two hours for replication
9:25to actually occur between your source and destination.
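To make the two checkboxes concrete, here is a minimal Python sketch of how Replication Time Control and replication metrics might appear inside a rule's destination block. The helper name `with_rtc_and_metrics` is hypothetical, and the 15-minute value mirrors the threshold mentioned above.

```python
RTC_MINUTES = 15  # the replication time threshold described above


def with_rtc_and_metrics(destination):
    """Return a copy of a Destination block with RTC and metrics enabled."""
    updated = dict(destination)  # shallow copy so the input is left untouched
    updated["ReplicationTime"] = {
        "Status": "Enabled",
        "Time": {"Minutes": RTC_MINUTES},
    }
    # Metrics are what feed the replication charts in CloudWatch.
    updated["Metrics"] = {
        "Status": "Enabled",
        "EventThreshold": {"Minutes": RTC_MINUTES},
    }
    return updated


base = {"Bucket": "arn:aws:s3:::mayc-west"}
destination = with_rtc_and_metrics(base)
```

Returning a copy rather than mutating the input keeps it easy to compare a rule with and without these options.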
9:28So with that being said, I'll click Save right here.
9:33It asks me if I want to replicate existing objects.
9:36But since I don't have any existing
9:38objects in my S3 bucket, I'll just
9:39go ahead and leave this at No.
9:41And click Submit.
9:43So now when I see replication rules,
9:45I see I now have replication enabled.
9:48But I'm not done yet, am I?
9:49I still need access to those KMS keys.
9:52So I'm going to go into my replication rules here.
9:56And from here, I'm going to check this out.
9:58We're going to go ahead, and actually,
9:59under Actions, click Edit.
10:04When I scroll down to the IAM role,
10:06I see the name of the role that was created.
10:08S3 CRR-- that's Cross Region Replication-- role
10:13for Mayc East underscore 1.
10:16That's the role that needs access to KMS keys.
10:20So I'm going to jump over to KMS.
10:22Let's go to my KMS keys.
10:23That way, this role that's performing the replication
10:27can decrypt and encrypt the data.
10:29I'll go into my key here.
10:32I'll go to my key users.
10:34I'm going to add that user.
10:36I'll search for CRR.
10:39And if I expand the name, I see the newly created role.
10:43And I'll add them to be a key user.
10:45I'll change my region to be the Oregon region right there.
10:53We'll go into the customer managed keys, West KMS.
10:56And under the Key Users one more time, I'll search for CRR.
11:01Expand the name.
11:03Check off this role.
11:04And now click Add.
11:06Perfect.
11:06So at this point, that role now has
11:09the access to the S3 buckets, and the KMS keys
11:13that are used for encryption and decryption on those S3 buckets.
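Adding a role as a key user in the console amounts, roughly, to listing that role as a principal in the key policy's usage statement. The Python sketch below shows what such a statement might look like. It is an approximation of what the console writes, not an exact copy, and the account ID and role name are placeholders rather than the real generated role name.

```python
def key_user_statement(role_arns):
    """Return a key policy statement that lets the given roles use a KMS key."""
    return {
        "Sid": "Allow use of the key",
        "Effect": "Allow",
        "Principal": {"AWS": list(role_arns)},
        "Action": [
            "kms:Encrypt",
            "kms:Decrypt",
            "kms:ReEncrypt*",
            "kms:GenerateDataKey*",
            "kms:DescribeKey",
        ],
        # Inside a key policy, "*" refers to the key the policy is attached to.
        "Resource": "*",
    }


# Placeholder ARN for the auto-created cross-region replication role.
CRR_ROLE_ARN = "arn:aws:iam::111122223333:role/EXAMPLE-s3crr-role"

statement = key_user_statement([CRR_ROLE_ARN])
```

The same role has to be added on both keys, once in each region, because the source key is needed to decrypt and the destination key to encrypt.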
11:17So my final task here is to do some testing.
11:20I'll go ahead and upload a file.
11:22Let's go ahead-- Oh, I did have files in here.
11:24Duh.
11:24I'll go ahead and add a new file here into the mix.
11:27We'll go into something like Recents.
11:31And I'll upload my Mayc image.
11:33Here we go.
11:34We'll click Upload.
11:35And there we go.
11:37It's now been successfully uploaded.
11:38Within a few minutes, this mayc 5013 JPEG
11:43should be replicated over to mayc-west.
11:46So if I jump over to mayc-west, like I said,
11:49it could take a little while for this to kick in,
11:51but it will eventually appear.
11:53Because I also enabled metrics, I can jump into the metrics.
11:57Scroll on down to the replication rules.
12:00Choose my replication rule right here, and display the charts.
12:04Now, like I said, it's going to take
12:06a while for the replication service to come to life.
12:09You got to give it some time.
12:11It's not going to kick off right away.
12:13It could take up to a few hours for the first replication
12:16to occur.
12:17But once replication is up and running,
12:19because we enabled RTC--
12:22right there, the Replication Time Control.
12:25Right there.
12:26Oh.
12:26Right there.
12:27That means it's going to aggressively replicate data
12:30once the replication jobs get kicked off.
12:34So jumping back to buckets.
12:35Now that a few minutes have passed, I'll go into mayc-west.
12:38And there it is right there, replicated.
12:41So that's how we configure replication
12:44across our S3 buckets.
12:46I hope this has been informative for you,
12:47and I'd like to thank you for viewing.