tim gross | @0x74696d
http://0x74696d.com/slides/falling-in-and-out-of-love-with-dynamodb.html





http://www.read.seas.harvard.edu/~kohler/class/cs239-w08/decandia07dynamo.pdf



Table rows are referenced by primary key: a hash key or hash key + range key.
primary key | attributes
---------------------------------------------------------------
hash key | attribute | attribute | attribute, etc...
primary key | attributes
--------------------------------------------------------------------------
hash key | range key | attribute | attribute | attribute, etc...
Hash keys and range keys have a parent-child relationship.

Range keys are sorted, but only within the "bucket" of a given hash key.
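For concreteness, defining such a composite key in boto's layer2 API looks roughly like this (a sketch; the table name, key names, and throughput values are made up):

import boto

conn = boto.connect_dynamodb(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)
# composite primary key: the hash key buckets rows, the range key
# sorts them within that bucket
schema = conn.create_schema(hash_key_name='entity_id',
                            hash_key_proto_value=str,
                            range_key_name='relation',
                            range_key_proto_value=str)
table = conn.create_table(name='graph', schema=schema,
                          read_units=10, write_units=10)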

hash key | range key
-------------------------------------------------------------
content_type.entity_id | FAN_OF.content_type.entity_id
content_type.entity_id | FANNED_BY.content_type.entity_id
Querying goes through the API (the Python library is boto):
Which actors does this user follow?
from boto.dynamodb.condition import BEGINS_WITH

results = table.query(hash_key=user_id,
                      range_key_condition=BEGINS_WITH('FAN_OF.Actor.'))
Who are this actor's fans?
results = table.query(hash_key=actor_id,
                      range_key_condition=BEGINS_WITH('FANNED_BY.User.'))
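Maintaining that graph means writing both edges yourself. A minimal sketch (the attribute name and timestamp are assumptions):

import time

now = int(time.time())
# one item per direction, so either side can be queried by its hash key
table.new_item(hash_key='User.%s' % user_id,
               range_key='FAN_OF.Actor.%s' % actor_id,
               attrs={'created': now}).put()
table.new_item(hash_key='Actor.%s' % actor_id,
               range_key='FANNED_BY.User.%s' % user_id,
               attrs={'created': now}).put()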


hash key | range key | attributes
--------------------------------------------------------
series.episode | timestamp | a bunch of attributes
"Hey, you know in MongoDB you can..."
"Nope, hierarchal keys, remember?"
"Well in Redis you can..."
"Nope, hierarchal keys, remember?"
hash key | range key | attributes
------------------------------------------------------------------
day | timestamp | series, episode, a bunch of attributes
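The appeal of the day bucket: a time slice becomes a single query (a sketch; day, start_ts, and end_ts are assumed variables):

from boto.dynamodb.condition import BETWEEN

# all events for one day between two timestamps, in range key order
results = table.query(hash_key=day,
                      range_key_condition=BETWEEN(start_ts, end_ts))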


Leaky abstraction!
Hot hash key --> hot instance


What the fsck do you do?
Amazon's recommended way:
hash key | range key | attributes
-----------------------------------------------------------------
timestamp + random token | session ID | series, episode, etc.
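Writes under this scheme would look something like the following (a sketch; the separator and the 9,999-way token are assumptions, matching the read loop below):

import random

# spread writes across thousands of hash keys so no partition runs hot
key = '%s.%s' % (my_timestamp, random.randrange(9999))
table.new_item(hash_key=key, range_key=session_id,
               attrs={'series': series, 'episode': episode}).put()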
You can't query the timeseries data without running EMR jobs, short of scatter-gathering across every shard:
results = []
for i in range(9999):
    key = '%s.%s' % (my_timestamp, i)  # timestamp + random token
    results.append(table.query(hash_key=key))




On each check-in, if the number of tokens exceeds the limit, shut off the oldest stream the next time it checks in. The video player then complains to the user.
hash key | range key | attributes
-----------------------------------------------------------------
user ID | browser-identifying-GUID | timestamp
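A sketch of the check itself (the limit and the attribute name are assumptions):

MAX_STREAMS = 3  # assumed concurrent-stream limit

# every active stream token for this user lives under one hash key
tokens = list(table.query(hash_key=user_id))
if len(tokens) > MAX_STREAMS:
    # the timestamp attribute says which stream checked in least recently
    oldest = min(tokens, key=lambda t: t['timestamp'])
    oldest.delete()  # that player's next check-in fails and it complains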




# excerpt from a periodic throughput-scaling job: `c` is a CloudWatch
# connection, `Metric` the consumed-capacity metric name, and `count`
# its sum over the sampling window (setup and constants elided)
dyconn = boto.connect_dynamodb(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)

if now.hour > 6:  # check for raising the limit
    pMetric = Metric.replace('Consumed', 'Provisioned')
    provo = c.list_metrics(dimensions={'TableName': tablename},
                           metric_name=pMetric,
                           namespace='AWS/DynamoDB')
    threshold = provo[0].query(start, end, 'Sum', 'Count', 60)[0]['Sum']
    tperc = count / threshold
    if tperc > .80:
        # fetch the current provisioned throughput for both sides...
        provset[P_READ_CAP] = dyconn.describe_table(tablename)\
            ['Table'][PROVISIONED][READ_CAP]
        provset[P_WRITE_CAP] = dyconn.describe_table(tablename)\
            ['Table'][PROVISIONED][WRITE_CAP]
        # ...then raise whichever side is running hot by 20%
        provset[pMetric] = threshold * 1.2
        table = dyconn.get_table(tablename)
        dyconn.update_throughput(table,
                                 provset[P_READ_CAP],
                                 provset[P_WRITE_CAP])


User, series, and episode number make up the schema.
hash key | range key | attributes
----------------------------------------------------------------------
user | series_id.episode_id | video timestamp, watched datetime
Easy to get where a user is in a given video.
from boto.dynamodb.condition import EQ

results = table.query(hash_key=user,
                      range_key_condition=EQ('%s.%s' % (series, episode)))
Easy to get all episodes for a series.
results = table.query(hash_key=user,
                      range_key_condition=BEGINS_WITH('%s.' % (series,)))
Getting the most-recently-watched episode for a series is not too bad either.
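A sketch of that, assuming the watched datetime lives in a 'moddt' attribute as in the migration script later:

# one query for the series' episodes, then pick the newest client-side
episodes = table.query(hash_key=user,
                       range_key_condition=BEGINS_WITH('%s.' % (series,)))
most_recent = max(episodes, key=lambda e: e['moddt'])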
The Amazon-recommended way:
UserEpisode table
hash key | range key | attributes
----------------------------------------------------------------------
user | series_id.episode_id | video timestamp, watched datetime
MostRecentlyWatchedEpisode table
hash key | range key | attributes
----------------------------------------------------------------------
user | series_id | episode_id, video timestamp, watched datetime
Where is this user in a given video?
table = UserEpisodeTable.get_table()
results = table.query(hash_key=user,
                      range_key_condition=EQ('%s.%s' % (series, episode)))
What is the most recently-watched episode for this user for this series?
table = MostRecentlyWatchedEpisode.get_table()
results = table.query(hash_key=user,
                      range_key_condition=EQ('%s' % (series,)))
What are the most recently-watched episodes for this user for all series?
table = MostRecentlyWatchedEpisode.get_table()
# no range key condition: the hash key alone returns a row per series
results = table.query(hash_key=user)
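What the recommended scheme costs you on writes: every playback update now has to touch both tables (a sketch; variable and attribute names are assumptions):

attrs = {'timestamp': video_ts, 'moddt': watched_dt}
# full history, keyed by series_id.episode_id
UserEpisodeTable.get_table().new_item(
    hash_key=user,
    range_key='%s.%s' % (series, episode),
    attrs=attrs).put()
# denormalized "latest" pointer, keyed by series_id alone
MostRecentlyWatchedEpisode.get_table().new_item(
    hash_key=user,
    range_key='%s' % (series,),
    attrs=dict(attrs, episode=episode)).put()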
MostRecentlyWatchedSeries table
hash key | attributes
----------------------------------------
user | ordered list of series_ids
Exporting via EMR can be done, but the production migration would be tough.
import csv
import boto
from multiprocessing import Pool


def write_data(filename):
    """
    This will be called by __main__ for each process in our Pool.
    Error handling and logging of results elided.
    Don't write production code like this!
    """
    conn = boto.connect_dynamodb(aws_access_key_id=MY_ID,
                                 aws_secret_access_key=MY_SECRET)
    table = conn.get_table('my_table_name')

    with open(filename, 'rb') as f:
        reader = csv.reader(f)
        items = []
        for row in reader:
            dyn_row = table.new_item(hash_key='{}'.format(row[0]),
                                     attrs={'series': row[1],
                                            'episode': row[2],
                                            'timestamp': row[3],
                                            'moddt': row[4]})
            items.append(dyn_row)
            # still inside the for loop: flush whenever a full batch
            # of 25 accumulates (the trailing partial batch is elided)
            if len(items) > 25:
                batch_items = items[:25]
                batch_list = conn.new_batch_write_list()
                batch_list.add_batch(table, batch_items)
                response = conn.batch_write_item(batch_list)
                if not response['UnprocessedItems']:
                    items = items[25:]
                else:
                    # drop only the items DynamoDB actually accepted;
                    # unprocessed ones stay queued for the next batch
                    unprocessed = [ui['PutRequest']['Item']['user']
                                   for ui in
                                   response['UnprocessedItems']\
                                           ['my_table_name']]
                    for item in batch_items:
                        if item['user'] not in unprocessed:
                            items.remove(item)
if __name__ == '__main__':
    files = ['xaao', 'xabf', 'xabw', ... ]
    pool = Pool(processes=len(files))
    pool.map(write_data, files)
For full source and notes, see:
http://0x74696d.com/posts/dynamodb-batch-uploads

tim gross | @0x74696d
http://0x74696d.com/slides/falling-in-and-out-of-love-with-dynamodb.html
These slides use `landslide`, press P to get presenter's notes