Akamai DataStream

Akamai BigQuery Integration

Google Cloud Platform (GCP) BigQuery is a columnar data warehouse that lets you analyze data without managing the underlying infrastructure. It also lets you visualize your data with an integrated tool, Google Data Studio.

You can now integrate Akamai DataStream with BigQuery to find meaningful insights, use familiar SQL, and take advantage of a pay-as-you-go model.

Note: You can integrate both raw logs and aggregated metrics streams with BigQuery. In this example, we’ll integrate a raw logs stream that pushes data to BigQuery.

There are five steps to integrate DataStream with BigQuery:

  1. Get started with DataStream

  2. Set up your API client

  3. Set up a GCP account

  4. Integrate DataStream with BigQuery

  5. Make a DataStream API call

 

1. Get started with DataStream

In DataStream, configure a raw logs stream and choose your data sets. For example, you can select Request Header Data to choose the headers that you want to receive when calling the API, such as Authorization, Range, or Accept-Encoding.

You can also choose a sample rate. Unless you have a reason not to, select 100% so that you capture all the traffic that hits your site.

For details, see the get started section.

 

2. Set up an API client for the DataStream API

To integrate DataStream with BigQuery, you need an API client with at least read-only access to the DataStream API. To create an API client, navigate to the Identity and Access Management page in Control Center.

Once you’re on the page, create an API client.


 

Provide a name for your client and grant it access to the Pull Datastream API.

 


Finally, create credentials for your API client. You can see these credentials only once, so be sure to download or copy them; you’ll paste them into your .edgerc file in step 4.

 

3. Set up a GCP account

Create a new project and set up the following products:

Cloud storage

Set up two buckets: one to store the logs, and the other to stage the cloud function script.
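
For reference, assuming the bucket names used by the commands later in this article, you could create both with gsutil:

gsutil mb gs://akamai-datastream            # stores the DataStream logs
gsutil mb gs://akamai-script-cloudfunction  # stages the cloud function script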


Compute engine

Set up one compute instance to call the DataStream API and copy the responses to cloud storage.
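
As a minimal sketch, you could create a small instance with gcloud. The instance name, zone, and machine type below are arbitrary examples, not values required by DataStream:

gcloud compute instances create datastream-puller --zone=us-central1-a --machine-type=n1-standard-1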


BigQuery dataset

Create a BigQuery dataset for your logs. You will add a table to it later.
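
For example, to create the dataset that the commands later in this article refer to:

bq mk --dataset akamai-206503:datastream_logs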


Once you are done, go to APIs & Services in the Google Cloud Platform console and enable the following APIs: Cloud Functions, BigQuery, and Cloud Storage.
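
You can also enable them from the command line. These are the usual service identifiers, but you can confirm them with gcloud services list --available:

gcloud services enable cloudfunctions.googleapis.com
gcloud services enable bigquery.googleapis.com
gcloud services enable storage-component.googleapis.com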

4. Integrate DataStream with BigQuery

Compute engine setup

SSH into the compute engine instance that you previously set up. Then, install the Google Cloud SDK.

For more details, see https://cloud.google.com/sdk/install

Install the Akamai API client tools. Copy the previously created Akamai credentials and paste them into the .edgerc file. For more details, see

https://developer.akamai.com/api/getting-started
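
A minimal .edgerc looks like the sketch below. The section name matches the one used by the API calls later in this article; the values are placeholders for the credentials you created in step 2:

[datastream-pull-api]
host = akab-xxxxxxxxxxxxxxxx.luna.akamaiapis.net
client_token = akab-xxxxxxxxxxxxxxxx
client_secret = xxxxxxxxxxxxxxxx
access_token = akab-xxxxxxxxxxxxxxxx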

Next, grant the compute engine access to GCP resources such as Cloud Storage, BigQuery, and Cloud Functions. For more details, see

https://cloud.google.com/iam/docs/granting-changing-revoking-access
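
For example, to grant the instance's service account write access to BigQuery at the project level (the service account email below is a placeholder):

gcloud projects add-iam-policy-binding akamai-206503 --member=serviceAccount:YOUR_SERVICE_ACCOUNT@akamai-206503.iam.gserviceaccount.com --role=roles/bigquery.dataEditor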

BigQuery table setup

First, you need to get the DataStream schema. You’ll find it here:

https://developer.akamai.com/api/web_performance/datastream/v1-api.zip

Next, prepare a BigQuery schema that matches the DataStream schema. Note that the schema has a lot of nested records.
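
As an illustrative fragment, schema.json nests records like the sketch below. It shows only the fields used by the queries later in this article, and the exact types are assumptions; derive the full schema from the DataStream specification above:

[
  {"name": "data", "type": "RECORD", "mode": "REPEATED", "fields": [
    {"name": "message", "type": "RECORD", "fields": [
      {"name": "reqPath", "type": "STRING"}
    ]},
    {"name": "netPerf", "type": "RECORD", "fields": [
      {"name": "downloadTime", "type": "STRING"}
    ]}
  ]}
]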


Then, use the prepared schema to create a table in BigQuery. This command uses the schema file schema.json to create a table called edgescapedemo:

bq mk --table akamai-206503:datastream_logs.edgescapedemo ./schema.json

Cloud function setup

You also need to write a cloud function. Cloud Functions is GCP's serverless compute product. For more details, see https://cloud.google.com/functions/

A cloud function can act on triggers. Here, the trigger we use is google.storage.object.finalize: as soon as an object is uploaded to cloud storage, the trigger fires. For more details, see

https://cloud.google.com/functions/docs/calling/storage
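
The deploy command below registers jsonLoad as the entry point. Here is a minimal Node.js sketch of what such a function could look like; it assumes the dataset and table created earlier and a newline-delimited JSON payload, and it is not the exact function used in this setup:

// index.js
const { BigQuery } = require('@google-cloud/bigquery');
const { Storage } = require('@google-cloud/storage');

const bigquery = new BigQuery();
const storage = new Storage();

// Triggered by google.storage.object.finalize on the logs bucket.
exports.jsonLoad = async (data) => {
  // data.bucket and data.name identify the object that was just uploaded.
  await bigquery
    .dataset('datastream_logs')
    .table('edgescapedemo')
    .load(storage.bucket(data.bucket).file(data.name), {
      sourceFormat: 'NEWLINE_DELIMITED_JSON',
      writeDisposition: 'WRITE_APPEND',
    });
  console.log(`Loaded ${data.name} into datastream_logs.edgescapedemo`);
};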

Once you’ve prepared the cloud function, you can deploy it with this command:

gcloud beta functions deploy datastream-cloud-function --trigger-resource=akamai-datastream --trigger-event google.storage.object.finalize --source=. --stage-bucket=gs://akamai-script-cloudfunction --entry-point=jsonLoad

5. Make a DataStream API call

Now that all the pieces are in place, you can run your API call script and push the DataStream JSON response file to cloud storage. Once the file is uploaded to cloud storage, the finalize trigger activates the cloud function, which loads your data into the BigQuery table.

Here is the flow:

1. Make a DataStream API call from the compute engine. This can be a cron job:

http --auth-type edgegrid -a datastream-pull-api: ":/datastream-pull-api/v1/streams/851/raw-logs?start=2018-10-30T06:30:00Z&end=2019-10-23T06:40:00Z&page=0&size=100"

2. Push the output to the bucket for DataStream logs:

gsutil cp output.json gs://akamai-datastream

As soon as the file is in the bucket, it activates the cloud function. You can verify that the function completed successfully by looking at its logs.


You can read the logs with this command:

gcloud beta functions logs read datastream-cloud-function

3. Once it's done, open the BigQuery interface and query the table.


Use Cases

Segment number vs download time

In this example, consider a customer who uses DataStream to ingest logs every 5 minutes. The customer has received performance complaints over the past few minutes, or has other statistics, such as page load times, that show an increase. Here, the customer can quickly run a query to see the download times for all objects or scripts on their page.

The customer could use BigQuery to get file types together with their download times. This example SQL query returns download times for .ts and .m3u8 files:

SELECT d.message.reqPath, CAST(d.netPerf.downloadTime AS INT64) AS dtime
FROM `akamai-206503.datastream_logs.edgescapedemo`,
UNNEST(data) AS d
WHERE d.message.reqPath LIKE "%.ts" OR d.message.reqPath LIKE "%.m3u8"
ORDER BY dtime DESC


We can easily point out the files that take longer to download. Then, we can investigate further with more specific queries about a given title, allowing the customer to identify the root cause of the problem.

What's more, Google Data Studio integrates with BigQuery, making it easy to visualize any query or table in a dashboard or report. With one click, you can convert a table into a graph.


Aggregation

Using an aggregated metrics stream, you can also find out if the number of errors has increased. Aggregated streams retrieve real-time counts of HTTP responses, including 4xx and 5xx errors.

Here is the data stream call:

http --auth-type edgegrid -a datastream-pull-api: ":/datastream-pull-api/v1/streams/1201/aggregate-logs?start=2019-02-15T09:19:37Z&end=2019-02-17T10:40:00Z&aggregateMetric=2xx%2C3xx%2C4xx%2C5xx&page=0&size=100"


You can then ingest the returned JSON file into BigQuery and visualize the errors as a time series.
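
As a sketch, a query along these lines could feed such a time series. The table name and the column names (startTime, count4xx, count5xx) are assumptions about how you load the aggregated response, not fields guaranteed by the API:

SELECT d.startTime,
  CAST(d.count4xx AS INT64) + CAST(d.count5xx AS INT64) AS errors
FROM `akamai-206503.datastream_logs.aggregated_demo`,
UNNEST(data) AS d
ORDER BY d.startTime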