PriMoThemes — now s2Member® (official notice)

This is now a very OLD forum system. It's in READ-ONLY mode.
All community interaction now occurs at WP Sharks™. See: new forums @ WP Sharks™

AWS Services and Fees

s2Member Plugin. A Membership plugin for WordPress®.


Postby drbyte » December 12th, 2011, 6:01 am

From AWS:

"It looks like there are lots of requests coming from CloudFront, and the bucket has increased in size by 200GB since the beginning of December. Every time CloudFront needs to update its cache, the customer is charged for the requests, as per this text found on http://aws.amazon.com/cloudfront.

(Amazon CloudFront can use Amazon S3 or Amazon EC2 as the origin server to store the original, definitive versions of your files. Normal fees will apply for Amazon S3 and Amazon EC2 usage, including “origin fetches” – data transferred from Amazon S3 or Amazon EC2 to edge locations. )

We would advise that unless you require CloudFront, you should cancel the service."

So, please be advised, guys.

From that page:

When a client requests a page using that domain name, Amazon CloudFront determines the best edge location to serve your content. If an edge location doesn’t have a copy of the file that the end user requests, Amazon CloudFront will get a copy from the origin server and hold it at the edge location so it’s available for future requests. You can also specify a default file (e.g., index.html) that will be served for requests made for the root of your distribution without an object name specified – for instance, requests made to http://abc123.cloudfront.net/ alone, without a file name.
-----------------------------------------------------------

"Amazon CloudFront uses the expiration period you set on your files (through cache control headers) to determine whether it needs to check the origin for an updated version of the file. If you expect that your files will change frequently, the best practice is to use object versioning to manage these changes. To implement object versioning, you create a unique filename in your origin server for each version of your file and use the file name corresponding to the correct version in your web pages or applications. With this technique, Amazon CloudFront caches the version of the object that you want without needing to wait for an object to expire before you can serve a newer version."


So Jason, I was correct. When I was updating each post to use AWS and testing the file, each test was a request for the whole video, some as large as 1GB, even though I only viewed 5 seconds of the file.

Looking at my requests:

Amazon CloudFront: Dec total is: $2.61
Amazon Simple Storage Service: Dec 1st to 12: $39.05

Now, this is what I do not understand, Jason:

AWS Data Transfer (excluding Amazon CloudFront):
$0.120 per GB (up to 10 TB/month data transfer out): 680.766 GB = $81.69

That's from when I had both streaming and the HTML5 fallback in my JW Player code.

Since I removed the HTML5 fallback, the charges have only gone up by three dollars (from the 6th to the 12th).
drbyte
Experienced User

Posts: 269
Joined: May 6, 2010

Re: AWS Services and Fees

Postby drbyte » December 12th, 2011, 8:15 am

An update:

Email from AWS billing:

"Hello,

Per our phone conversation, I would like to let you know that, upon investigating your problem further, I would like you to check whether you have the cache set to refresh after 30 days on CloudFront; this would cause CloudFront to pull most of your data every 30 days. This seems like a very plausible explanation.

Also, to stream through CloudFront your buckets must be public-facing. There are programs that scan for public buckets in order to download all the data off of them, so this could also be causing your data spike.

It is not possible for AWS to stop a third party from downloading content located in your Amazon S3 bucket."

Not sure how, but mine is not public.

Jason, can you please confirm: when using JW Player with the HTML5 fallback, if you take the HTTP link generated in the page source and enter it into Google Chrome's address bar, does it play without being logged in?

Thank you

Sam

Re: AWS Services and Fees

Postby drbyte » December 12th, 2011, 8:24 am

Not sure if this applies, Jason. I found this by searching:

Amazon S3/CloudFront 304s stripping Cache-Control headers

Beware of relying on Cache-Control: max-age and Expires HTTP header fallback behavior on Amazon CloudFront. The Cache-Control header may get stripped on CloudFront 304s, and browsers will then have to fall back to whatever is in the Expires header. If that Expires date has passed, or if you never specified it, all subsequent requests for the resource will be conditionally validated by the browser.

The Problem

I was looking at my web server’s health metrics recently (via Cacti), and noticed a spike in outbound traffic and HTTP requests. Analytics logs didn’t show a large increase in visitors or page loads, and it looked like the additional traffic was simply an increase in requests for static images.

The Investigation

The static images in question have HTTP cache headers set for 1 year into the future, so they can be easily cached by browsers and proxies per performance best practices. The suggested way to set a far expiry date is by setting both the Cache-Control header (e.g., Cache-Control: public, max-age=31536000) as well as an Expires header with a static date set for the same time in the future. The Expires header is an HTTP/1.0 header that sets a specific date, say Jan 1 2011, whereas Cache-Control is relative, in seconds. Theoretically, if both Cache-Control and Expires headers are sent, the Cache-Control header should take precedence, so it's safe to additionally set Expires for fall-back cases.

This combination of caching header behavior works well if you are using Amazon's CloudFront CDN backed by static files on Amazon S3, which is what I use for several sites. The files are uploaded once to S3, and their HTTP headers are set at upload time. For the static images, I am uploading them with a 1-year max-age expiry and an Expires header 1 year from when they're uploaded. For example, I uploaded an image to S3 on Oct 5 2010 with these headers:

Cache-Control: public, max-age=31536000
Expires: Thu, 05 Oct 2011 22:45:05 GMT
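As a sketch, that header pair can be computed like this; how you actually attach the headers at S3 upload time depends on your client, so only the values are shown:

```python
from email.utils import formatdate

ONE_YEAR = 31536000  # seconds

def caching_headers(now_ts: float) -> dict:
    """Far-future caching headers for a static object: a relative
    max-age (HTTP/1.1, takes precedence) plus an absolute Expires
    fallback (HTTP/1.0), both one year out from now_ts."""
    return {
        "Cache-Control": f"public, max-age={ONE_YEAR}",
        "Expires": formatdate(now_ts + ONE_YEAR, usegmt=True),
    }
```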

Theoretically, HTTP/1.1 clients (current web browsers) and even ancient HTTP/1.0 proxies should both be able to understand these headers. Even though the Expires header was for Oct 5 2011 (a couple days ago), Cache-Control should take precedence and the content should still be fresh for all current web browsers that recently downloaded the file. HTTP/1.0 proxies will only understand the Expires header, and they may want to conditionally validate the content if the date is past Oct 5 2011, but they should be a small part of HTTP accesses.

So my first thought was that the additional load on the server was from HTTP/1.0 proxies re-validating the already-expired content since I had set the content to expire in 1 year and that date had just passed. I should have set a much-further expiry in the first place — these images never change. To fix this, I could easily just re-upload the content with a much longer Expires (30 years from now should be sufficient).

However, as I was investigating the issue, I noticed via the F12 Developer Tools that IE9 was conditionally validating some of the already-expired images, even though the Cache-Control header should be taking precedence. Multiple images were being conditionally re-validated (incurring an HTTP request and a 304 response) for every IE session. All of these images had an Expires header date that had recently passed.

After I cleared my IE browser cache, the problem no longer repro’d. It was only after I happened to F5 the page (refresh) that the past-Expires images were being conditionally requested again on subsequent navigations.

The Repro

Take, for example, this request of a static file on my webserver that expired back on Jan 1, 2010:

GET /test/test-public.txt HTTP/1.1
Accept: text/html, application/xhtml+xml, */*
Accept-Language: en-US
User-Agent: Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)
Accept-Encoding: gzip, deflate
Connection: Keep-Alive
Host: cf.nicj.net

HTTP/1.0 200 OK
Date: Sat, 08 Oct 2011 02:28:03 GMT
Cache-Control: public, max-age=946707779
Expires: Fri, 01 Jan 2010 00:00:00 GMT
Last-Modified: Sat, 08 Oct 2011 02:25:58 GMT
ETag: "098f6bcd4621d373cade4e832627b4f6"
Accept-Ranges: bytes
Content-Type: text/plain
Content-Length: 4
Server: AmazonS3

IE and other modern browsers will download this content today, and treat it as fresh for 30 years (946,707,779 seconds), due to the Cache-Control header taking precedence over the Jan 1, 2010 Expires header.

The problem comes when, for whatever reason, a browser conditionally re-validates the content (via If-Modified-Since). Here are IE’s request headers and Amazon’s CloudFront response headers:

GET /test/test-public.txt HTTP/1.1
Accept: text/html, application/xhtml+xml, */*
Accept-Language: en-US
User-Agent: Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)
Accept-Encoding: gzip, deflate
Connection: Keep-Alive
Host: cf.nicj.net
If-Modified-Since: Sat, 08 Oct 2011 02:28:03 GMT

HTTP/1.0 304 Not Modified
Date: Sat, 08 Oct 2011 02:31:54 GMT
Content-Type: text/plain
Expires: Fri, 01 Jan 2010 00:00:00 GMT
Last-Modified: Sat, 08 Oct 2011 02:25:58 GMT
ETag: "098f6bcd4621d373cade4e832627b4f6"
Age: 232

We see the additional If-Modified-Since in the request, and the same Expires date in the response. Unfortunately, there's an important missing header in this response: the Cache-Control header. It appears, at least from my testing, that CloudFront strips the Cache-Control headers from 304 responses.

After this happens, it appears that IE forgets the original Cache-Control header, so all subsequent navigations to the page will trigger conditional GETs for those resources. Since the 304 is missing the Cache-Control header, IE just sees the Expires tag and thinks it needs to always re-validate the content from now on.
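The freshness rule at work here (HTTP/1.1: max-age wins when present, Expires is the fallback) can be illustrated with a rough sketch; this is simplified, since real browser caches also account for the response's current age:

```python
from email.utils import parsedate_to_datetime

def freshness_lifetime(headers: dict) -> float:
    """Seconds a cached response stays fresh: max-age takes
    precedence when present; otherwise Expires minus Date is used;
    0 means 'revalidate on every use'."""
    for directive in headers.get("Cache-Control", "").split(","):
        directive = directive.strip()
        if directive.startswith("max-age="):
            return float(directive.split("=", 1)[1])
    if "Expires" in headers and "Date" in headers:
        delta = (parsedate_to_datetime(headers["Expires"])
                 - parsedate_to_datetime(headers["Date"]))
        return max(delta.total_seconds(), 0.0)
    return 0.0
```

With the 200 response's headers the lifetime is roughly 30 years; strip Cache-Control, as the 304 does, and only the long-past Expires remains, so the lifetime collapses to zero and every subsequent navigation revalidates.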

Why This Would Happen

But what’s causing the re-validation (If-Modified-Since) and subsequent 304 in the first place?

User agents shouldn’t normally re-validate these resources, since the original Cache-Control header should keep them fresh for quite a while, except when you either force a refresh (F5) of the page or the content has passed its natural freshness lifetime.

On F5 refresh, all resources on the page are conditionally re-validated via If-Modified-Since. And, as we’ve seen, the resources on CloudFront are sent back missing the original Cache-Control header, and IE updates its cache with just the Expires tag, instead of keeping the resource still fresh for a year. For some reason, this doesn’t occur with Chrome or Firefox on F5.

In addition, the problem will appear in all browsers when they need to send a If-Modified-Since header for re-validation of content they think might have expired, such as with max-age headers that have expired (shorter-expiring content).

Take, for example, a resource that you set to expire 1 day from now, and either set the Expires header to 1 day from now (per best practices) or simply don’t specify the Expires header:

Cache-Control: public, max-age=86400

For the first 24 hours after your visitor loads the resource, modern browsers won’t re-validate the resource. At hour 24 and 1 second, the browser will send a conditional request. Unfortunately, with CloudFront, the 304 response will be missing the Cache-Control header. The browser then doesn’t realize that the resource should be fresh for another 24 hours. So even if the content wasn’t actually updated after those 24 hours, all subsequent navigations with the resource will trigger a conditional validation of that resource, since the original Cache-Control headers were lost with the 304. Ugh.

How to Avoid the Issue

Note this doesn’t appear to affect Chrome 14 and Firefox 6 in the F5 scenario. Both browsers send conditional If-Modified-Since headers on F5 and get back the same CloudFront response (sans Cache-Control headers), but they don’t appear to be affected by the missing Cache-Control header. Subsequent navigations in Chrome and Firefox after an F5 do not conditionally re-validate the CloudFront content. They do appear to be affected by the missing Cache-Control header for naturally stale content on If-Modified-Since requests.

I haven’t investigated the F5 problem on pre-IE9 versions, but I would assume the problem exists there as well. As far as I can tell, this isn’t fixed in IE10 beta.

I’ve only found this problem on CloudFront’s CDN servers. I couldn’t find a way to get Apache to naturally skip the Cache-Control header for 304s if the header was in the original HTTP 200 response (for example, when using mod_expires on static content).

The bottom line is that requests that send an If-Modified-Since to CloudFront and get a 304 back will essentially lose the Cache-Control hints. If your Expires header is missing, or in the past, the resource will be conditionally validated on every page navigation until it gets evicted from the cache. That can cause a lot of unnecessary requests and will slow down your visitor’s page loads.

The simple solution is to use a much-further expiry time. 30 years should suffice. Then, if the original Cache-Control header is lost from CloudFront 304s, the 30-year-from-now Expires header will keep the resource from having to be validated.

I’m not sure why Amazon CloudFront strips the Cache-Control header from 304 responses. I’ll follow up with them.

Back to my original problem: I think it’s actually Amazon’s CloudFront servers noting that the Expires for a lot of my static images are past-due. They’re checking the origin server to see if any new content is available. The above issue isn’t likely causing a ton of additional load, but it was interesting to find nonetheless!

Re: AWS Services and Fees

Postby drbyte » December 12th, 2011, 11:18 pm

Others I found:

"I just remembered that when I first started, I was incorrectly using the getObject() function of the S3 PHP class. As I was poring over the code, I realized that I didn't want to actually get the files, but I just wanted to get the files' information. So I changed it to getObjectInfo()...but not until I had already run the script on 50 or so files. So my server was in the process of downloading however many files I ran the script on"
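The same pitfall exists outside PHP: fetching an object just to read its metadata is a GET that transfers (and bills) every byte, while the metadata alone only needs a HEAD request. A minimal, generic illustration (the function and method names here are mine, not the S3 PHP class's):

```python
def s3_request_line(key: str, metadata_only: bool) -> str:
    """Build the HTTP request line for an S3 object: HEAD returns
    only the headers (size, type, ETag) with no body transfer,
    while GET downloads - and bills - the entire object."""
    method = "HEAD" if metadata_only else "GET"
    return f"{method} /{key} HTTP/1.1"
```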

Re: AWS Services and Fees

Postby Jason Caldwell » December 15th, 2011, 9:11 pm

Hi Sam. Thanks for posting this useful information.

"Amazon CloudFront uses the expiration period you set on your files (through cache control headers) to determine whether it needs to check the origin for an updated version of the file. If you expect that your files will change frequently, the best practice is to use object versioning to manage these changes. To implement object versioning, you create a unique filename in your origin server for each version of your file and use the file name corresponding to the correct version in your web pages or applications. With this technique, Amazon CloudFront caches the version of the object that you want without needing to wait for an object to expire before you can serve a newer version."


So Jason, I was correct. When I was updating each post to use AWS and testing the file, each test was a request for the whole video, some as large as 1GB, even though I only viewed 5 seconds of the file.
Sorry. I'm not sure I fully understand. Viewing a post would open an RTMP stream to a CloudFront edge location. Streaming files are not fully downloaded, so you're only viewing a portion of the video, not the whole thing. I understand that if the edge location does not yet have a copy of the file, you would be charged for the transfer from your Bucket to the edge location, though.

Is that correct, or am I missing something?

Note: The HTML5 fallback URL should not be loaded at all in most cases.
~ Jason Caldwell / Lead Developer
& Zeitgeist Movie Advocate: http://www.zeitgeistmovie.com/

Is the s2Member plugin working for you? Please rate s2Member at WordPress.org.
You'll need a WordPress.org account (it comes in handy). Then rate s2Member here.
Jason Caldwell
Lead Developer

Posts: 4045
Joined: May 3, 2010
Location: Georgia / USA

Re: AWS Services and Fees

Postby Jason Caldwell » December 15th, 2011, 9:16 pm

Not sure how but mine is not public

Jason..can you please confirm when using JWplayer along with html5 fallback, looking at the source code taken the http link generated and enter it in the address bar of Google Chrome.. does it play without being logged in
When you tell s2Member to auto-configure your Amazon S3/CloudFront Distros, s2Member will modify the ACLs and Bucket Policy, so that files inside this Bucket are NOT publicly available.

However, if you allow access to these files through the s2member_file_download_url() function, provided by the code samples you referenced, this generates URLs that are digitally signed by s2Member. These URLs, one leading to the RTMP stream and another to the MP4 file via HTTP, are both available to whomever you give them to (based on the parameters you pass to the API function).
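For illustration only, an expiring signed S3 URL of the kind used in this era (AWS query-string authentication, Signature Version 2) works roughly like this; the helper name, bucket, keys, and timestamps below are placeholders, not s2Member's actual internals:

```python
import base64
import hashlib
import hmac
from urllib.parse import quote_plus

def signed_s3_url(bucket: str, key: str, access_key: str,
                  secret_key: str, expires_ts: int) -> str:
    """Build an expiring signed S3 GET URL (Signature Version 2):
    HMAC-SHA1 over the canonical string-to-sign, base64-encoded and
    URL-escaped into the query string."""
    string_to_sign = f"GET\n\n\n{expires_ts}\n/{bucket}/{key}"
    digest = hmac.new(secret_key.encode(), string_to_sign.encode(),
                      hashlib.sha1).digest()
    signature = quote_plus(base64.b64encode(digest).decode())
    return (f"https://{bucket}.s3.amazonaws.com/{key}"
            f"?AWSAccessKeyId={access_key}"
            f"&Expires={expires_ts}"
            f"&Signature={signature}")
```

Anyone holding the URL can fetch the file until the Expires timestamp passes, which is exactly why a long validity window on large MP4 files is a bandwidth-cost concern.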

For the benefit of other readers, see:
s2Member -> Download Options -> JW Player Examples

See also: viewtopic.php?f=40&t=12453&src_doc_v=111206#src_doc_s2member_file_download_url%28%29

Re: AWS Services and Fees

Postby drbyte » December 16th, 2011, 12:40 am

Jason Caldwell wrote:When you tell s2Member to auto-configure your Amazon S3/CloudFront Distros, s2Member will modify the ACLs and Bucket Policy, so that files inside this Bucket are NOT publicly available.


I understand this, Jason. It's supposed to be private unless specified otherwise.

Jason Caldwell wrote:However, if you allow access to these files through s2member_file_download_url() function, provided by the code samples you referenced; this generates URLs that are digitally signed by s2Member. These URLs ... one leading to the RTMP stream, and another to the MP4 file via HTTP, are both available to whomever you introduce them to ( and based on the parameters you pass to the API function ).


RTMP is fine; it's the HTTP link that is scary, and since it stays valid for 24 hours, that makes it worse.

I only noticed the above in Google Chrome. Firefox gave me access denied, as did IE9; Chrome and Safari did not. Not sure if it's a browser cache issue or something else.

It's a misleading statement from Amazon when they say "Pay only for what you use".

When I first created my bucket, it must have been somewhere close to, but not in, California. Now, they say that when somebody requests a file through CloudFront, if the file does not already exist at that particular edge server, it gets requested and cached at the server closest to the requester. Meaning, if a 1GB file is to be viewed in Japan, CloudFront would have to request the file from the US server and cache it somewhere in Asia, then transmit it. That request costs money too, on top of the bytes used.

That request is not calculated per byte but as a whole, because the file was not originally stored on that edge server. Be advised that AWS says that if, for some reason, the file cached on that edge server is not popular or they need its space, they will delete it... and the story starts all over. What a way of making money :)

So, if you have 2TB of files that will be viewed each month, then most likely you would be paying for: A. storage, B. CloudFront requests, C. data transferred from S3 to CloudFront, and finally, data transfer from one edge to another... what they call "cache".

In my situation, when I was updating my posts, I was viewing each one after the changes, and each one was a full request bouncing back and forth from one edge to another, plus the bytes of each view. Nice. I calculated it and it sounds about right: total posts changed was around ~700, each post containing a video file request of ~800MB... I hope my math adds up here, but I guess around ~650GB x $0.120 = ~$78.
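For what it's worth, a quick back-of-the-envelope check of those figures (every number is a rough estimate from this thread, not an actual AWS bill):

```python
# Rough check of the transfer estimate above; all figures are
# guesses from this thread, not billed amounts.
POSTS_UPDATED = 700
FILE_GB = 0.8          # ~800MB per video
RATE_PER_GB = 0.120    # first-10TB "data transfer out" tier

transferred_gb = POSTS_UPDATED * FILE_GB   # 560 GB if every view refetched in full
estimated_cost = transferred_gb * RATE_PER_GB
print(f"{transferred_gb:.0f} GB -> ${estimated_cost:.2f}")
```

That lands in the same ballpark as the ~$74 billed for the first few days of December, which supports the full-refetch theory.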

From 12/01/11 to 12/04/11, my AWS data transfer alone was $74.00 (that's when I was updating the posts).

From 12/05/11 to 12/15/11, my AWS data transfer adds up to $11.00 (no post changes), just members viewing some media, I suppose.

Interesting :)

Finally, here's the last email I got from Amazon:

I keep asking them about the AWS data transfer out (excluding CloudFront), and they keep sending me emails about CloudFront.

"Hello Sam,

Charges for CloudFront are solely based on data transfer out and number of requests. That being said, you will not be charged for the storage on the AWS Asia servers. If you are using another Amazon Web Service as the origin of your files, you will be charged separately for use of that service, including for storage, GET requests, and data transfer out of that service to Amazon CloudFront's edge locations. Also, unless a customer downloads the file off of the servers, you will only be charged for what the customer viewed in bytes, not the full data amount.

As far as setting your cache refresh to 30 days unfortunately you've reached the Customer Service team and we're not technically trained to handle your question. At AWS, we keep technical support unbundled from what you pay for the underlying service so that customers only pay for the level of support they require. You have two options to get assistance with this issue.

Basic Support: You can post your question to our Discussion Forums (http://aws.amazon.com/forums). Many experienced AWS developers regularly participate in the forums, and they can often answer questions like this. AWS engineers also review forum posts to ensure the root cause of the issue is not with the AWS infrastructure. We also have technical documentation, including code samples, at our Resource Center here: http://aws.amazon.com/resources
Additionally, be sure not to post your Secret Access key, because revealing this key will compromise the security of your account. Most other values, including your Access Key ID, request signatures, and request IDs aren't sensitive information.

Premium Support: AWS Premium Support is a one-on-one, fast-response support channel that is staffed 24x7x365 with experienced and technical support engineers. All AWS Premium Support plans offer customers of AWS Infrastructure Services an unlimited number of support cases with pay-by-the-month pricing and no long-term contracts. With pricing as low as $49 per month, the four plans provide developers and businesses the flexibility to choose the support plans that meet their specific needs. You can sign up for AWS Premium Support here:

http://aws.amazon.com/premiumsupport
"

Sam

Re: AWS Services and Fees

Postby Jason Caldwell » December 16th, 2011, 6:26 am

Thanks Sam. I see. So if you have lots of extremely large files, the cost adds up in the beginning, but it should become lower as each of these edge locations caches your objects, so long as you have a reasonable cache expiration setting, and so long as the "it-could-happen" situations (automatic expiration of cached objects due to storage optimizations) do NOT arise very often.

RTMP is fine, it's the HTTP link that is scary and since it expires in 24 hours that makes it worse.
I understand. This is worrying you because someone could take this and download the entire file with it, costing you money. It sounds like you might want to consider RTMP as your only option. That way you can be sure that files are not downloaded ( i.e. stolen ), and that your bandwidth costs will remain reasonable. I've seen many companies doing this very thing. Especially when you're dealing with protected content.

Re: AWS Services and Fees

Postby drbyte » December 20th, 2011, 1:15 am

Hi Jason

I did some testing on a local installation, and it seems that each page/post player view counts as one request to both CloudFront and S3, even though the player's autoplay is set to 0.

I can see that because I enabled AWS logging for both CloudFront and S3.

I tested the above using Firefox and IE9, and it seems Firefox is the only one initiating the requests to CloudFront and S3 on page view; IE9 did so only when I started the player.

Investigating the issue further, I found that disabling W3 Total Cache also disables Firefox's CloudFront and S3 requests. ???

I tried disabling each option in W3 Total Cache one at a time; the Browser Cache option did the trick. If it's enabled, you can see the requests on AWS just by viewing the post page; when it's disabled, nothing happens.

Sam
