Social Hash: An Assignment Framework for Optimizing Distributed Systems Operations on Social Networks (NSDI '16)

Document Details

2016

USENIX

Alon Shalita, Brian Karrer, Igor Kabiljo, Arun Sharma, Alessandro Presta, Aaron Adcock, Herald Kllapi, and Michael Stumm

Tags

distributed systems, social networks, optimization, computer science

Summary

This paper presents Social Hash, a framework for producing, serving, and maintaining assignments of objects to components in distributed systems that serve large social networks such as Facebook's Social Graph. The framework uses a two-level scheme that decouples compute-intensive optimization (graph partitioning, run on a slow timescale) from low-overhead dynamic adaptation (load balancing, run frequently). Its effectiveness is demonstrated on two production applications: routing HTTP requests to compute clusters, which reduced the cache miss rate by 25%, and sharding data records across storage subsystems, which halved average response time.

Full Transcript

Social Hash: An Assignment Framework for Optimizing Distributed Systems Operations on Social Networks Alon Shalita, Brian Karrer, Igor Kabiljo, Arun Sharma, Alessandro Presta, and Aaron Adcock, Facebook; Herald Kllapi, University of Athens; Michael Stumm, University of Toronto https://www.usenix.org/conference/nsdi16/technical-sessions/presentation/shalita This paper is included in the Proceedings of the 13th USENIX Symposium on Networked Systems Design and Implementation (NSDI ’16). March 16–18, 2016 Santa Clara, CA, USA ISBN 978-1-931971-29-4 Open access to the Proceedings of the 13th USENIX Symposium on Networked Systems Design and Implementation (NSDI ’16) is sponsored by USENIX. Social Hash: an Assignment Framework for Optimizing Distributed Systems Operations on Social Networks Alon Shalita† , Brian Karrer† , Igor Kabiljo† , Arun Sharma† , Alessandro Presta† , Aaron Adcock† , Herald Kllapi∗ , and Michael Stumm§ † Facebook {alon,briankarrer,ikabiljo,asharma,alessandro,aadcock}@fb.com ∗ University of Athens [email protected] § University of Toronto [email protected] Abstract of edges, and it consumes many hundreds of petabytes of How objects are assigned to components in a distributed storage space. system can have a significant impact on performance The information presented to Facebook users is pri- and resource usage. Social Hash is a framework for marily the result of dynamically generated queries on the producing, serving, and maintaining assignments of ob- Social Graph. For instance, a user’s home profile page jects to components so as to optimize the operations contains the results of hundreds of dynamically triggered of large social networks, such as Facebook’s Social queries. Given the popularity of Facebook, the Social Graph. The framework uses a two-level scheme to de- Graph must be able to service well over a billion queries couple compute-intensive optimization from relatively a second. low-overhead dynamic adaptation. The optimization at The scale of both the graph and the volume of queries the first level occurs on a slow timescale, and in our ap- makes it necessary to use a distributed system design for plications is based on graph partitioning in order to lever- implementing the systems supporting the Social Graph. age the structure of the social network. The dynamic Designing and implementing such a system so that it op- adaptation at the second level takes place frequently to erates efficiently is non-trivial. adapt to changes in access patterns and infrastructure, A problem that repeatedly arises in distributed sys- with the goal of balancing component loads. tems that serve large social networks is one of assign- We demonstrate the effectiveness of Social Hash with ing objects to components; for example, assigning user two real applications. The first assigns HTTP requests requests to compute servers (HTTP request routing), or to individual compute clusters with the goal of minimiz- assigning data records to storage subsystems (storage ing the (memory-based) cache miss rate; Social Hash de- sharding). How such assignments are made can have creased the cache miss rate of production workloads by a significant impact on performance and resource us- 25%. The second application assigns data records to stor- age. 
Moreover, the assignments must satisfy a wide age subsystems with the goal of minimizing the number range of requirements: e.g., they must (i) be amenable of storage subsystems that need to be accessed on multi- to quick lookup, (ii) respect component size constraints, get fetch requests; Social Hash cut the average response and (iii) be able to adapt to changes in the graph, usage time in half on production workloads for one of the stor- patterns and hardware infrastructure, while keeping the age systems at Facebook. load well balanced, and (iv) limit the frequency of as- signment changes to prevent excess overhead. 1 Introduction The relationship between the data of the social net- work and the queries on the social network is m : n — Almost all of the user-visible data and information a query may require several data items and a data item served up by the Facebook app is maintained in a sin- may be required by several queries. This makes finding gle directed graph called the Social Graph [2, 34, 35]. a good assignment of objects to components non-trivial; Friends, Checkins, Tags, Posts, Likes, and Comments finding an optimal solution for many objective functions are all represented as vertices and edges in the graph. As is NP Hard. Moreover, a target optimization goal, such, the graph contains billions of vertices and trillions captured by an objective function, may conflict with the USENIX Association 13th USENIX Symposium on Networked Systems Design and Implementation (NSDI ’16) 455 Components   adaptation in the dynamic assignment step. Our solutions (e.g.,  compute  clusters  or   to the assignment problem rely on being able to benefi- assignment   storage  subsystems)   dynamic     cially group together relatively small, cohesive sets of objects in the Social Graph. In the optimizations per- Groups   formed by the static assignment step, we use graph par- assignment   titioning to extract these sets from the Social Graph or static     from prior access patterns. Optimization methods other than graph partitioning could be used interchangeably, but graph partitioning is expected to be particularly ef- Objects  (e.g.,  data  records  or  HTTP  requests)   fective in the context of social networks, because most Figure 1: Social Hash Abstract Framework requests are social in nature where users that are socially close tend to consume similar data. The Social in So- goal of keeping the loads on the components reasonably cial Hash reflects this essential idea of grouping socially well balanced. In the next subsection, we propose a two- similar objects together. level framework that allows us to trade off these two con- flicting objectives. Contributions This paper describes the Social Hash framework for as- Social Hash Framework signing objects to components given scenario-dependent We have developed a general framework that accommo- optimization objectives, while satisfying the require- dates the HTTP request routing and storage sharding ex- ments of fine-grained load balancing, assignment stabil- amples mentioned above, as well as other variants of the ity, and fast lookup in the context of practical difficulties assignment problem. In our Social Hash framework, the presented by changes in the workload and infrastructure. assignment of objects (such as users or data records) to The Social Hash framework and the two applications components (such as compute clusters or storage subsys- described in this paper have been in production use at tems) is done in two steps. (See Fig. 1.) 
Facebook for over a year. Over 78% of Facebook’s In the first step, each object is assigned to a group, “stateless” Web traffic routing occurs with this frame- where groups are conceptual entities representing clus- work, and the storage sharding application involves tens terings of objects. Importantly, there are usually many of thousands of storage servers. The framework has also more groups than components. This assignment is based been used in other settings (e.g., to distribute vertices in a on optimizing a given, scenario-dependent, objective graph processing system, and to reorder data to improve function. For example, when assigning HTTP requests compression rates). We do not describe these additional to compute clusters, the objective function may seek to applications in this paper. minimize the (main memory) cache miss rate; and when The three most important contributions we make in assigning data records to disk subsystems, the objective this paper are: function may seek to minimize the number of disk sub- 1. the two-step assignment hierarchy of our frame- systems that must be contacted for multi-get queries. Be- work that decouples (a) optimization on the Social cause this optimization is typically computationally in- Graph or previous usage patterns from (b) adapta- tensive, objects are re-assigned to groups only periodi- tion to changes in the workload and hardware in- cally and offline (e.g., daily or weekly). Hence, we refer frastructure; to this as the static assignment step. In the second step, each group is assigned to a com- 2. our use of graph partitioning to exploit the structure ponent. This second assignment is based on inputs of the social network to optimize HTTP routing in from system monitors and system administrators so as to very large distributed systems; rapidly and dynamically respond to changes in the sys- 3. our use of query history to construct bipartite graphs tem and workload. It is able to accommodate compo- that are then partitioned to optimize storage shard- nents going on or offline, and it is responsible for keep- ing. ing the components’ loads well balanced. Because the assignments at this level can change in real time, we re- With respect to (1), the use of a multi-level scheme for fer to this as the dynamic assignment step. allocating resources in distributed systems is not new, not A key attribute of our framework is the decoupling of even when used with graph partitioning. In par- optimization in the static assignment step, and dynamic ticular, some multi-tenant resource allocation schemes 456 13th USENIX Symposium on Networked Systems Design and Implementation (NSDI ’16) USENIX Association have used approaches that are in many respects similar eral records, and a record may be consumed by several to the one being proposed here [19, 26, 27, 28]. How- queries. For example, if the dataset consists of recent ever, the specifics of our approach, especially as they re- posts produced by all the users, a typical query might late to Facebook’s operating environment and workload, fetch the recent posts produced by a user’s friends. are sufficiently interesting and unique to warrant a ded- The assignment of data records to storage subsystems icated discussion and analysis. Regarding (2), edge-cut determines the number of hosts a query needs to commu- based graph partitioning techniques have been used for nicate with to obtain the required data. 
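To make the multi-get pattern concrete, the following is a minimal sketch, under illustrative names (this is not Facebook's client code), of a client that groups the keys of one query by the storage subsystem that owns each record and issues one batched request per subsystem; the query's fanout is then the number of distinct subsystems it touches.

```python
# Illustrative multi-get client: group keys by owning storage subsystem and
# issue one batched request per subsystem (in practice sent in parallel).
from collections import defaultdict

def multi_get(keys, subsystem_of, fetch_batch):
    # subsystem_of: record key -> storage subsystem id (the sharding assignment)
    # fetch_batch:  (subsystem, keys) -> {key: value}; one request per subsystem
    batches = defaultdict(list)
    for key in keys:
        batches[subsystem_of(key)].append(key)

    results = {}
    for subsystem, batch in batches.items():  # each batch is independent
        results.update(fetch_batch(subsystem, batch))

    fanout = len(batches)  # number of subsystems this query had to contact
    return results, fanout
```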
A common opti- numerous optimization applications, but to the best of mization is to group requests destined to the same storage our knowledge not for making routing decisions to re- subsystem and issue a single request for all of them. Ad- duce cache miss rates. Similarly, for (3), graph partition- ditionally, since requests to different storage subsystems ing has previously been applied to storage sharding , are processed independently, they can be sent in paral- but partitioning bipartite graphs based on prior access lel. As a result, the latency of the slowest request will patterns is, as far as we know, novel. determine the latency of a multi-get query, and the more We show that the Social Hash framework enables sig- hosts a query needs to communicate with, the higher the nificant performance improvements as measured on the expected latency (as we show in Section 6.1). It is thus production Social Graph system using live workloads. desirable to choose a data record assignment scheme that Our HTTP request routing optimization cut the cache collocates the data required by similar queries within a miss rate by 25%, and our storage sharding optimization small number of storage subsystems. cut the average response latency in half. 3 The assignment problem 2 Two motivating example applications Assigning objects to system components is a challeng- In this section, we provide more details of the two exam- ing part of scaling an online distributed system. In this ples we mentioned in the Introduction. We discuss and section, we abstract the essential features of our two mo- analyze these applications in significantly greater detail tivating examples to formulate the problem we solve in in later sections. this paper. HTTP request routing optimization. The purpose of HTTP request routing is to assign HTTP requests to com- pute clusters. When a cluster services a request, it fetches 3.1 Requirements any required data from external storage servers, and then caches the data in a cluster-local main memory-based We have the following requirements: cache, such as TAO or Memcache , for later reuse Minimal average query response time: User satis- by other requests. For example, in a social network, a faction can improve with low query response times. client may issue an HTTP request to generate the list of Load balanced components: The better load- recent posts by a user’s friends. The HTTP request will balanced the components, the higher the efficiency of the be routed to one of several compute clusters. The server system; a poorly load-balanced system will reach its ca- will fetch all posts made by the user’s friends from ex- pacity earlier and in some cases may lead to increased ternal databases and cache the fetched data. How HTTP latencies. requests are assigned to compute clusters will affect the Assignment stability: Assignments of objects to cache hit rate (since a cached data record may be con- components should not change too frequently in order sumed by several queries). It is therefore desirable to to avoid excessive overhead. For example, reassigning a choose a HTTP request assignment scheme which as- query from one cluster to another may lead to extra (cold) signs requests with similar data requirements to the same cache misses at the new cluster. compute cluster. Fast lookup: Low latency lookup of the object- Storage sharding optimization. 
The purpose of stor- component assignment is important, given the online na- age sharding is to distribute a set of data records across ture of our target distributed system. several storage subsystems. A query which requires a certain record must communicate with the unique host that serves that record.1 A query may consume sev- 3.2 Practical challenges 1 To simplify our discussion, we disregard the fact that data is typi- Meeting the requirements listed above is challenging for cally replicated across multiple storage servers. a variety of reasons: USENIX Association 13th USENIX Symposium on Networked Systems Design and Implementation (NSDI ’16) 457 Scale: The assignment problem typically requires Objects are first assigned to groups in a optimization- assigning a large number of objects to a substantially based static assignment that is updated on a slow smaller number of components. The combinatorial ex- timescale of a day to a week. Groups are then assigned plosion in the number of possible assignments prevents to components using an adaptation-based dynamic as- simple optimization methods from being effective. signment that is updated on a much faster timescale. Effects of similarity on load balance: Colocating Dynamic assignment is used to keep the system load- similar objects usually results in higher load imbalances balanced despite changes in the workload or changes in than when colocating dissimilar objects. For example, the underlying infrastructure. This two-level design is similar users likely have similar hours of activity, brows- intended to accommodate the disparate requirements and ing devices, and favorite product features, leading to load challenges of efficiently operating a huge social network, imbalance when assigning similar users to the same com- as described in Section 3. pute clusters. Below, we give more concrete details on the abstract Heterogenous and dynamic set of components: framework, how it is implemented, and how it is used. Components are often heterogeneous and thus support In Sections 5 and 6 we will become even more concrete different loads. Further, the desired load on each compo- and present specific implementation issues for our two nent can change over time; e.g., due to hardware failure. examples. We begin by presenting our rationale for using Finally the set of components will change over time as a two-level design. new hardware is introduced and old hardware removed. Dynamic workload: The relationship between data and queries can change over time. A previously rarely 4.1 Rationale accessed data record could become popular, or new types of queries could start requesting data records that were Our two-level approach for assigning objects to compo- previously not accessed. This can happen, for example, if nents is motivated by the observation that there is a con- friendship ties in the network are introduced or removed, flict between the objectives of optimization and adapta- or if product features change their data consumption pat- tion. In theory, one could assign objects to components tern. directly, resulting in only one assignment step. However, Addition and removal of objects: Social networks this would not work well in practice because of diffi- change and grow constantly, so the set of objects that culties adapting to changes: as mentioned, component must be assigned changes over time. For example, users loads often change unpredictably; components are added may join or leave the service. 
or removed from the system dynamically; and the sim- ilarity of objects that are naturally grouped together for The magnitude and relative importance of these prac- optimization leads to unbalanced utilization of resources. tical challenges will differ depending on the distributed Waiting to rerun the assignment algorithm would leave system being targeted. For Facebook, the scale is enor- the system in a suboptimal state for too long, and chang- mous; similar users do have similar patterns; and het- ing assignments on individual objects without re-running erogeneous hardware is prevalent. On the other hand, the assignment algorithm would also be suboptimal. changes to the graph occur at a (relatively) modest rate (in part because we often only consider subgraphs of the An assignment framework must therefore address both Social Graph); and rate of hardware failures is reason- the optimization and adaptation objectives, and it must ably constant and predictable. offer enough flexibility to be able to shift emphasis be- tween these competing objectives at will. With a two- level approach, the static level optimizes the assignment 4 Social Hash Framework to groups where, from the point of view of optimization, the group is treated as a virtual component. The dynamic In this section, we propose a framework called the Social level adapts to changes by assigning groups to compo- Hash Framework which comprises a solution to the as- nents. Multiple groups may be assigned to the same signment problem and, moreover, addresses the practical component; however, all objects in the same group are challenges listed above. guaranteed to be assigned to the same component. (See In Section 1 we introduced the abstract framework Figure 1.) As such, what is particularly propitious about with objects at the bottom, (abstract) groups in the mid- our architecture is that dynamic reassignment of groups dle, and components at the top. Recall that objects are to components does not negate the optimization step be- queries, users, or data records, etc., and components are cause objects in a group remain collocated to the same computer clusters, or storage subsystems, etc.. component, even after reassignment. 458 13th USENIX Symposium on Networked Systems Design and Implementation (NSDI ’16) USENIX Association We are able to seamlessly shift emphasis between   monitoring   graph   static optimization and dynamic adaptation by means of   specifications   Graph   Dynamic     info   Partitioning   Assignment   operator   the parameter n, the ratio of number  of groups to number   console     of components; that is n := |G| |C|. When n = 1, the   Social  Hash  Tbl     Assignment  Tbl   emphasis is entirely on the static optimization. There is   g   c   a 1 : 1 correspondence between groups and components.   key   group     As noted above, this may not work well for some applica-     led Request   Lookup   tions because it may not be sufficiently adaptive. When f ai   group   n  1, we trade off optimization for increased adapta-     tion. When n is too large, the optimization quality may   key   Missing  key     assignment   be severely degraded, and the overhead of dynamic as-     signment may be prohibitive. 
Clearly, the choice of n,   and thus the tradeoff between optimization and adapta-   Figure 2: Social Hash Architecture tion, is best selected on a per-application basis; as we     show in later sections, some applications require less ag-   When a client wishes to look up which component an gressive adaptation than others, allowing more emphasis   object has been assigned   to, it will do so in two steps: to be placed on optimization. first the object key is used to index into the Social Hash Table to obtain the target group, g; second, g is used to 4.2 Framework Overview index into the Assignment Table to obtain the component id c. This is shown in Figure 2. In this subsection, we describe the main elements of Because the Social Hash Table is constructed only pe- the Social Hash framework, as depicted in Fig. 2: the riodically, it is possible that a target key is missing in static assignment algorithm, the dynamic assignment al- the Social Hash table; for example, the key could re- gorithm, the lookup method, and the missing key assign- fer to a new user or a user that has not had any activity ment. In the discussion that follows it is useful to note in the recent past (and hence is not in the access log). that objects are uniquely identified by a key. When an object key is not found in the Social Hash Ta- The static assignment algorithm generates a static ble, then the Missing Key Assignment rule does the ex- mapping from objects to groups using the following in- ception handling and assigns the object to a group on the put: (i) a context dependent graph, which in our work fly. The primary requirement is that these exceptional as- can be either a unipartite graph (e.g., friendship graph) signments are generated in a consistent way so that sub- or a bipartite graph based on access logs (e.g., relating sequent lookups of the same key return the same group. queries and accessed data records); (ii) type of object that Eventually these new keys will be incorporated into the is to be assigned to groups (e.g. data records, users, etc); Social Hash Table by the static partitioning algorithm. (iii) an objective function; (iv) number of groups; and (v) permissible imbalance between groups. The output of the static partitioning algorithm is a hash table of (key, 4.3 Static assignment algorithm group) pairs, indexed by key. We refer to this hash table We use graph partitioning algorithms to partition objects as the Social Hash Table.2 into groups in the static assignment step. Graph par- The dynamic assignment uses the following input: titioning algorithms have been well-studied , and a (i) current component loads, (ii) the desired maximum number of graph partitioning frameworks exist [5, 18]. load per component, and possibly (iii) the historical loads However, social network graphs, like Facebook’s Social per group. The desired load for each component is pro- Graph, can be huge compared to what much of the ex- vided by system operators and monitoring systems, and isting literature contemplates. As a result, an approach the historical loads induced by each group can be de- is needed that is amenable to distributed computation on rived from system logs. As the observed and desired distributed memory systems. We built our custom graph loads change over time, the dynamic assignment shifts partitioning solution on top of the Apache Giraph graph groups among components to balance the load. 
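The two-table lookup just described can be summarized in a few lines. The sketch below is illustrative only (simple in-memory dictionaries stand in for the real key-value stores), and the hash-based fallback is merely one possible way to satisfy the requirement that on-the-fly assignments of missing keys be consistent across lookups:

```python
# Minimal sketch of the two-step lookup: key -> group (Social Hash Table,
# rebuilt periodically offline) and group -> component (Assignment Table,
# updated dynamically). Data structures and the missing-key rule shown here
# are illustrative assumptions, not the production implementation.
import hashlib

class SocialHashLookup:
    def __init__(self, social_hash_table, assignment_table, num_groups):
        self.social_hash_table = social_hash_table  # object key -> group id
        self.assignment_table = assignment_table    # group id -> component id
        self.num_groups = num_groups

    def group_of(self, key):
        group = self.social_hash_table.get(key)
        if group is None:
            # Missing-key rule: assign on the fly, consistently, so repeated
            # lookups of the same key always return the same group.
            digest = hashlib.md5(str(key).encode()).hexdigest()
            group = int(digest, 16) % self.num_groups
        return group

    def component_of(self, key):
        return self.assignment_table[self.group_of(key)]

# Example with n = |G| / |C| = 2 (two groups per component).
lookup = SocialHashLookup(
    social_hash_table={"user:1": 0, "user:2": 3},
    assignment_table={0: "clusterA", 1: "clusterA", 2: "clusterB", 3: "clusterB"},
    num_groups=4,
)
print(lookup.component_of("user:1"))    # -> clusterA
print(lookup.component_of("user:999"))  # consistent fallback for an unknown key
```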
The out- processing system , in part because of its ability to par- put of the dynamic assignment is a hash table of (group, tition graphs in parallel; other graph processing systems component) pairs, called the Assignment Table.2 could have also potentially been used [10, 11, 20]. 2 In practice, any key-value store that supports fast lookups can be The basic strategy in obtaining good static assign- used. We describe it as a hash table for ease of comprehension. ments is the following graph partitioning heuristic. We USENIX Association 13th USENIX Symposium on Networked Systems Design and Implementation (NSDI ’16) 459 assume the algorithm begins with an initial (weight-) the best results. Factors that may affect the the choice of balanced assignment of objects to groups represented as load balancing strategy include: pairs (v, gv ), where v denotes an object and gv denotes the Accuracy in predicting future loads: Low pre- group to which v is initially assigned. Next, for each v, diction accuracy favors a strategy with a high group-to- we record the group g∗v that gives the optimal assignment component ratio (e.g.,  1, 000) and groups being as- for v to minimize the objective function, assuming all signed to components randomly. This is the strategy that other assignments remain the same. This step is repeated is used for HTTP routing. On the other hand, the amount for each object to obtain a list of pairs (v, g∗v ). Each ob- of storage used by data records is easier to predict (in ject can be processed in parallel. Finally, via a swap- our case), and hence warrants a low group-to-component ping algorithm, as many reassignments of v to g∗v are car- ratio and non-random component assignment. ried out under the constraint that group sizes remain un- Dimensionality of loads: A system requiring bal- changed within each iteration; the swapping can again be ancing across multiple different load dimensions (CPU, done in parallel as long as it is properly coordinated (in storage, queries per second, etc.) favors using a high our implementation with a centralized coordinator). This group-to-component ratio and random assignment. overall process is then repeated with the new assignments Group transfer overhead: The higher the overhead taken as the initial condition for the next iteration. The of moving a group from one component to another, the above process is iterated on until it converges or reaches more one would want to limit the rate of moves between a limit on the number of iterations. components by increasing the load imbalance threshold The initial balanced assignment required by the static that triggers a move. assignment algorithm is either obtained via a random as- Assignment memory: It can be more efficient to signment (e.g., when the algorithm is run for the very assign a group back to an underloaded component it was first time) or is obtained from the output of the previous previously assigned to in order to potentially benefit from iteration of the static assignment algorithm modulo the the residual state that may still be present. This favors re- newly added objects that are assigned randomly. membering recent assignments, or using techniques sim- The above procedure manages to produce high quality ilar to consistent hashing. results for the graphs underlying Facebook operations in Finally, we note that load balancing strategies used in a fast and scalable manner. 
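As a simplified, single-machine illustration of this heuristic, the sketch below refines a balanced partition of a friendship graph under the edge-locality (edge-cut) objective: each vertex proposes the group containing most of its neighbors, proposed moves are applied in matched pairs so that group sizes are preserved within an iteration, and a small helper measures the resulting edge locality. The production version performs the per-vertex step and the coordinated swaps in parallel on Apache Giraph; all names here are illustrative.

```python
# Simplified sketch of the balanced local-move heuristic of Section 4.3,
# specialized to the edge-locality objective on a unipartite friendship graph.
from collections import Counter, defaultdict

def refine_partition(adjacency, assignment, num_groups, max_iters=10):
    # adjacency:  vertex -> list of neighbors
    # assignment: vertex -> group id (assumed to start balanced, e.g. random)
    for _ in range(max_iters):
        # Step 1: best group for each vertex with all other assignments fixed.
        desired = {}
        for v, neighbors in adjacency.items():
            if not neighbors:
                desired[v] = assignment[v]
                continue
            counts = Counter(assignment[u] for u in neighbors)
            desired[v] = max(range(num_groups), key=lambda g: counts.get(g, 0))
        # Step 2: apply moves only in matched pairs (swaps), so every group
        # keeps exactly its current size within the iteration.
        wants = defaultdict(list)
        for v, g in desired.items():
            if g != assignment[v]:
                wants[(assignment[v], g)].append(v)
        moved = 0
        for (src, dst), movers in wants.items():
            if src < dst:
                for a, b in zip(movers, wants.get((dst, src), [])):
                    assignment[a], assignment[b] = dst, src
                    moved += 2
        if moved == 0:
            break  # no size-preserving swap improves the objective
    return assignment

def edge_locality(adjacency, assignment):
    # Fraction of friendship edges whose endpoints share a group: the quality
    # measure reported for the routing application in Section 5.
    total = internal = 0
    for v, neighbors in adjacency.items():
        for u in neighbors:
            total += 1
            internal += assignment[v] == assignment[u]
    return internal / total if total else 0.0
```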
Within a day, a small cluster other domains will need to be adapted to the Social Hash of a few hundred machines is able to partition the friend- framework; e.g., load is transferred from one component ship graph of over 1.5B+ Facebook users into 21,000 bal- to another in increments of a group; and the load each anced groups such that each user shares her group with at group incurs is not homogeneous, in part because of the least 50% of her friends. And the same cluster is able to similarity of objects within groups. update the assignment starting from the previous assign- ment within a few hours, easily allowing a weekly (or even daily) update schedule. Finally, it is worth point- 5 Social Hash for Facebook’s Web Traffic ing out that the procedure is able to partition the graph Routing into tens of thousands of groups, and it is amenable to maintaining stability, since each iteration begins with the In this section, we describe how we applied the Social previous assignment and it is easy to limit the movement Hash framework to Facebook’s global web traffic routing of objects across groups. to improve the efficiency of large cache services. This is Facebook’s largest application using the framework and We have successfully used the above heuristic on both has been in production for over a year. unipartite and bipartite graphs, as we describe in more Facebook operates several worldwide data centers, detail in Sections 5 and 6. each divided into front-end clusters containing web and cache tiers, and back-end clusters containing database 4.4 Dynamic assignment and service tiers. To fulfill an HTTP request, a front-end web server may need to access databases or services in The primary objective of dynamic assignment is to keep (possibly remote) back-end clusters. The returned data is component loads well balanced despite changes in access cached within front-end cache services, such as TAO patterns and infrastructure. Load balancing has been well or Memcache. Clearly, the lower the cache miss researched in many domains. However, the specific load rate, the higher the efficiency of hardware usage, and the balancing strategy used for our Social Hash framework lower the response times. may vary from application to application so as to provide In addition, to reduce latencies for users, Facebook 460 13th USENIX Symposium on Networked Systems Design and Implementation (NSDI ’16) USENIX Association has “Point-of-Presence” (PoP) units around the world: For the dynamic assignment step, we kept the existing small-scale computational centers which reside close to consistent hash scheme, which is oblivious to the type of users. PoPs are used for multiple purposes, includ- identifier it receives as input (either user- or group-id). ing peering with other network operators and media To be able to make an HTTP request routing decision caching. When an HTTP request to one of Face- at run time, it is necessary to access both the Social Hash book’s services is issued, the request will first go to a Table and the Assignment Table. The latter is computed nearby PoP. A load balancer in the PoP then routes the on-the-fly using the consistent hash mechanism, which request to one of the front-end clusters over fast commu- requires a fairly small map between clusters and their nication channels. weights; it is therefore easy to hold the map in the POP memories. The former, however, is large, consuming several gigabytes of storage space when uncompressed. 
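As a deliberately simple reading of the dynamic assignment step from Section 4.4, the sketch below greedily moves whole groups off components whose load exceeds an imbalance threshold, driven by historical per-group loads. The actual strategy differs per application (random placement with many groups for HTTP routing, load-history-driven placement for storage sharding), and the names and threshold policy here are illustrative assumptions.

```python
# Illustrative greedy rebalancer for the dynamic assignment step: load moves
# between components only in units of one whole group.
def rebalance(group_load, assignment, components, threshold=1.10):
    comp_load = {c: 0.0 for c in components}
    for g, c in assignment.items():
        comp_load[c] += group_load[g]
    target = sum(comp_load.values()) / len(components)

    moves = []
    while True:
        hot = max(comp_load, key=comp_load.get)
        cold = min(comp_load, key=comp_load.get)
        if comp_load[hot] <= threshold * target:
            break  # every component is within the allowed imbalance
        candidates = [g for g, c in assignment.items() if c == hot]
        if not candidates:
            break
        g = max(candidates, key=lambda x: group_load[x])
        if comp_load[cold] + group_load[g] >= comp_load[hot]:
            break  # no single-group move reduces the peak load further
        assignment[g] = cold
        comp_load[hot] -= group_load[g]
        comp_load[cold] += group_load[g]
        moves.append((g, hot, cold))
    return assignment, moves
```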
5.1 Prior strategy We considered storing the Social Hash Table close to the Prior to using Social Hash, routing decisions were PoP (in its own memory or in a nearby storage service), based on user identifiers, using a consistent hashing but decided not to do so due to added PoP complexity, scheme. To make a routing decision, the user identi- fault tolerance considerations, and limited PoP resources fier was extracted from the request, where it was encoded that could be put to better use by other services. We also within the request persistent attributes (i.e., cookie), and considered sending a lookup request to a data center, but then used to index into a consistent hash ring to obtain rejected this idea due to latency concerns. the cluster id. The segments of the consistent hash ring Instead, we encode the user assigned group within the corresponded in number and weight to the front-end clus- request persistent attributes (i.e., as a cookie) and de- ters. The ring’s weights were dynamic and could be code it to make a routing decision when a request arrives. changed at any time, allowing dynamic changes to the Requests that do not have the group-id encoded in the cluster’s traffic load. The large number of users in com- header are routed to a random front-end cluster, where parison to the small number of clusters, along with the the session creation mechanism accesses a local copy of random nature of the hash ring, ensured that each clus- the Social Hash Table to fetch the group assigned to the ter received a homogeneous traffic pattern. With fixed user. Because the Social Hash Table is updated once a cluster weights, a user would repeatedly be routed to the week, group-ids in the headers may become stale. For same cluster, guaranteeing high hit rates for user-specific this reason, a user request will periodically (at least once data. The consistent nature of the ring also ensured an hour) trigger an update process where the group-id is that changes to cluster weights resulted in relatively mi- updated with its latest value from the Social Hash Ta- nor changes to the user-to-cluster mapping, reducing the ble. This allows long lasting connections to receive fresh number of cache misses after such changes. routing information. Our design eliminates the complexities and overhead of a Social Hash Table lookup at the PoPs, requiring just 5.2 Social Hash implementation a single header read instead. The design is also more re- For the Social Hash static assignment, we used a uni- silient to failure, because even if the data store providing partite graph with vertices representing Facebook’s users the Social Hash Table is down, group-id’s will mostly be and edges representing the friendship ties between them. available in the request headers. We partition the graph using the edge-cut optimization For technical reasons, some requests cannot be tagged criterion, knowing that friends and socially similar users properly with either the user or the group identifier (be- tend to consume the same data, and that they are there- cause the requests may have been issued by crawlers, fore likely to reuse each other’s cached data records. bots or legacy clients). These requests are routed ran- We use a relatively large number of groups for two rea- domly, yet in a consistent manner, to one of the front- sons. First, the global routing scheme needs to be able to end clusters while respecting load constraints. In the past shift traffic across clusters in small quantities. 
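Putting the routing pieces of this section together, here is a compressed, illustrative sketch of the PoP-side decision, assuming a textbook consistent-hash ring over the weighted front-end clusters (the real load balancer and identifiers differ): the group id is read from the request's persistent attributes when present, a user id is used as in the prior strategy when it is not, and untagged traffic falls back to a consistent random spread.

```python
# Illustrative PoP routing decision: hash the group id from the request cookie
# (or the user id as a fallback) onto a consistent-hash ring whose segments
# correspond in number and weight to the front-end clusters.
import bisect, hashlib, random

def _point(value):
    return int(hashlib.md5(str(value).encode()).hexdigest(), 16)

class ConsistentRing:
    def __init__(self, cluster_weights, points_per_unit_weight=100):
        ring = []
        for cluster, weight in cluster_weights.items():
            for i in range(int(weight * points_per_unit_weight)):
                ring.append((_point(f"{cluster}:{i}"), cluster))
        ring.sort()
        self._points = [p for p, _ in ring]
        self._clusters = [c for _, c in ring]

    def route(self, identifier):
        idx = bisect.bisect(self._points, _point(identifier)) % len(self._points)
        return self._clusters[idx]

def route_request(request, ring):
    group_id = request.get("cookie_group_id")  # written by the web tier, refreshed at least hourly
    if group_id is not None:
        return ring.route(f"group:{group_id}")
    user_id = request.get("cookie_user_id")    # prior strategy: hash the user id
    if user_id is not None:
        return ring.route(f"user:{user_id}")
    # Untagged traffic (crawlers, bots, legacy clients): spread it out, but
    # consistently for a given source.
    return ring.route(f"anon:{request.get('source_ip', random.random())}")

ring = ConsistentRing({"clusterA": 1.0, "clusterB": 1.0, "clusterC": 0.5})
print(route_request({"cookie_group_id": 17042}, ring))
```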
Second, three months, 78% of the requests had a valid group-id changes in HTTP request routing will affect many sub- that could be used for routing (and those that did not were systems at Facebook, not just the cache tiers; and it is not tagged with a user-id, a group-id, or any other identi- very difficult to predict how much load each group will fier.). incur on each subsystem. Hence, we have found the best Some may argue that the decreased miss rates strategy to balance the load overall is to use many groups achieved with Social Hash leads to a fault tolerance issue, and assign the groups to clusters randomly. because the data records are less likely to be present in USENIX Association 13th USENIX Symposium on Networked Systems Design and Implementation (NSDI ’16) 461 traffic between front-end clusters. Repeating the process 1.00 of assigning different numbers of groups into compo- nents offline and examining the resulting imbalance on 0.75 known loads led us to use 21,000 groups on a few 10’s of clusters; our group-to-component ratio is thus quite high. Edge locality The combination of new users being added to the sys- 0.50 tem and changes to the friendship graph causes edge lo- cality to degrade over time. We measured the decrease of edge locality from an initial, random assignment of users 0.25 to one of 21,000 groups over the course of four weeks. We observed a nearly linear 0.1% decrease in edge lo- 0.00 cality per week. While small, we decided to update the 10 100 1,000 10,000 100,000 routing assignment once a week so as to minimize a no- Number of groups ticeable decrease in quality. At the same time, we did Figure 3: Edge locality (fraction of edges within groups) vs. the not observed a decrease in cache hit rate between up- number of groups for Facebook’s friendship graph. dates, implying that 0.1% is a negligible amount. The decrease in edge locality implies that a longer update multiple caches simultaneously. This could be a concern schedule would also be satisfactory, and that Social Hash as the recovery of a cache failure would overwhelm the can tolerate a long maintenance breakdown without alter- backing storage systems with excessive traffic and thus ing Facebook’s web traffic routing quality. lead to severely degraded overall performance. How- For the past three months, the Social Hash Table used ever, our experience indicates that a failure of the main- for Facebook’s routing has maintained an edge locality of memory caches within a cluster only causes a temporary over 50%, meaning half the friendships are within each load increase on the backend storage servers that stays of the 21,000 groups. This edge locality is slightly higher within the normal operational load thresholds. than the exploratory values shown in Figure 3, because we iterated longer in the graph partitioning algorithm on the production system than we did in the experiments 5.3 Operational observations from which we obtained the figure. The static assign- ment is well-balanced, with the largest group containing To get a sense of how access patterns of friends relate, at most 0.8% more users than the average group. Each we sampled well over 100 million accesses to TAO data weekly update by the static assignment step resulted in records from access logs. We found that when two users around 1.5% of users switching groups from the previ- access the same data record, there is a 15% chance they ous assignment. All of these updates were suitably small are friends. 
This is millions of times larger than the prob- to avoid noticeable increases in the cache miss rate when ability of two random users being friends. We conclude the updates were introduced into production. that co-locating the processing of friends’ HTTP requests as much as possible is an effective strategy. Figure 3 depicts edge locality vs. the number of groups 5.4 Live traffic experiment used to partition the 1.5B+ Facebook users. Edge lo- cality measures the fraction of “friend” edges connect- To measure the effectiveness of Social Hash-based HTTP ing two users that are both assigned to the same group routing optimization, we performed a live traffic experi- (thus, the goal of static assignment would be to maxi- ment on two identical clusters with the same hardware, mize edge locality). It is not a surprise that edge locality number of hosts and capacity constraints. These clusters decreases with the number of groups. Perhaps a bit more are typical of what Facebook uses in production. Each unexpected is the observation that edge locality remains cluster had many hundred TAO servers, which served the reasonably large even when the number of groups in- local web tier with cached social data. creases significantly (e.g., >20% with 1 million groups); For our experiment, we selected a set of groups ran- intuitively, this is because the friendship graph contains domly from the Social Hash Table. We then routed all many small relatively dense communities. We chose the HTTP requests from users assigned to these groups to smallest number of groups that would satisfy our main one “test” cluster, while HTTP requests from a same requirement for dynamic assignment, namely to be able number of other users were routed to the second, “con- to balance the load by shifting only small amounts of trol” cluster. Hence, the control cluster saw traffic with 462 13th USENIX Symposium on Networked Systems Design and Implementation (NSDI ’16) USENIX Association Percentage change in TAO miss rate (%) 10 5.0 Percentage change in idle CPU (%) 0 2.5 −10 0.0 −20 −2.5 −30 −5.0 Fr i Fr i Fr i Fr i t t t t e e e e u u u u n n n n n n n n d d d d Sa Sa Sa Sa Tu Tu Tu Tu Th Th Th Th Su Su Su Su Mo Mo Mo Mo We We We We Day Day Figure 4: Percentage change in TAO miss rate (left, where lower is better) and CPU idle rate (right, where higher is better) on the Social Hash cluster relative to the cluster with random assignment. Area between red dashed lines: period of the test. Orange dashed lines: traffic shifts. Green dot-dash line: Social Hash Table is updated. The values on the days traffic was shifted (Tuesday and Wednesday, respectively) are not representative attributes very similar to the traffic it received with the 6 Storage sharding prior strategy: the traffic with the prior strategy was sam- pled from all users, while the traffic for the control clus- In this section, we describe in detail how we applied ter was sampled from all users except those associated the Social Hash framework to sharded storage systems with the test cluster. We ran the experiment for 10 days. at Facebook. The assignment problem is to decide how During this time, operational changes that would affect to assign data records (the objects) to storage subsystems hit rates on the two clusters were prevented. (the components). The left hand side of Figure 4 shows the change in cache miss rate between the test and control clusters. It 6.1 Fanout vs. 
Latency is evident that the miss rate drops by over 25% when assigning groups to a cluster as opposed to just users. The objective function we optimize is fanout, the number The right hand side of Figure 4 shows the change in of storage subsystems that must be contacted for multi- average CPU idle rate between the test and the control get queries. We argue and experimentally demonstrate cluster. The test cluster had up to 3% more idle time that fanout is a suitable objective function, since lower compared to the control cluster. fanout is closely correlated with lower latencies. During the experiment, we updated the Social Hash Multiget queries are typically forced to issue requests Table by applying an updated static assignment. The to multiple storage subsystems, and they do so in paral- time at which this occurred is shown with a vertical green lel. As such, the latency of a multi-get query is deter- dot-dash line. We note that the cache miss rate and the mined by the slowest request. By reducing fanout, the CPU idle time are not affected by the update, demon- probability of encountering a request that is unexpect- strating that the transition process is smooth. edly slower than the others is reduced, thus reducing the Figure 5 compares the daily working set size for TAO latency of the query. This is the fundamental argument objects at both clusters. The daily working set of a clus- for using fanout as the objective function for the assign- ter is the total size of all objects that were accessed by ment problem in the context of storage sharding. An- the TAO instance on that front-end cluster at least once other argument is that lower fanout reduces the connec- during that day. The figure shows that the working set tion overhead per data record. size dropped by as much as 8.3%. To further elaborate the relevance of choosing fanout We conclude from this experiment that Social Hash as the objective function, consider this abstract scenario. is effective at improving the efficiency of the cache for Suppose 1% of individual requests to storage servers HTTP requests: fewer requests are sent to backend sys- incur a significant delay due to unanticipated system- tems, and the hardware is utilized in a more efficient way. specific issues (CPU thread scheduling delays, system USENIX Association 13th USENIX Symposium on Networked Systems Design and Implementation (NSDI ’16) 463 1.00 Percentage change in working set (%) 0 0.75 −4 Single call CDF 0.50 Two calls Expected −8 0.25 0.00 i i t e e u u n n n n d d Fr Fr Sa Tu Tu Th Th 0.0 0.5 t 1.0 t 1.5 t 2.0 t Su Su Mo Mo We We Day Latency Figure 5: Percentage change in daily TAO working set size on Figure 6: Cumulative distribution of latency for a single re- the Social Hash cluster relative to the cluster with random as- quest, two requests in parallel, and the expected distribution signment. The red dashed lines indicate the first and last days from two independent samples from the single request distribu- of the test where the test was running only during part of the tion, where t is the average latency of a single call day (so the values for these two days may not be representa-. tive). social network.3 The queries and data records accessed interrupts, etc.). If a query must contact 10 storage by the queries are represented by two types of vertices. 
servers, then one can calculate that the multi-get request A query vertex is edge-connected to a data vertex iff the has a 9.6% chance an individual sub-request will expe- query accesses the data record. The graph partitioning rience a significant delay. If the fanout can be reduced, algorithm is then used to partition the data vertices into one can reduce the probability of incurring the delay. groups so as to minimize the average number of groups each query is connected to. We ran a simple experiment to confirm our under- standing of the relationship between fanout and latency. Clearly, most data needs to be replicated for fault tol- We issued trivial remote requests and measured the la- erance (and other) reasons. Many systems at Facebook tency of a single request (fanout=1) and the latency of do this by organizing machines storing data into non- two requests sent in parallel (fanout=2). Figure 6 shows overlapping sets, each containing each data record ex- the cumulative latency distribution for both cases. A actly once. We refer to such a set as a replica. Since fanout of 1 results in lower latencies than a fanout of 2. assignment is independent between replicas, we will re- If we calculate the expected distribution computed from strict our analysis to scenarios with just one replica. two independent samples from the single request distri- bution, then the observed overall latency for two parallel 6.3 Simplified sharding experiment requests matches the expected distribution quite nicely. One possible caveat to our analysis of the relationship We consider the following simple experiment. We use between fanout and latency is that reducing fanout gener- 40 stripped down storage servers, where data is stored in ally increases the size of the largest request, which could a memory-based, key-value store. We assume that there increase latency. Fortunately, storage subsystems today is one data record per user. We run this setup in two have processors with many cores that can be exploited configurations. In the first, “random” configuration, data by the software to increase the parallelism in servicing a records are distributed across the 40 storage servers us- single, large request. ing a hash function, which is a common practice. In the second, “social” configuration, we use our Social Hash framework to minimize fanout. We then sampled a live traffic pattern, and issued the 6.2 Implementation same set of queries to both configurations, and we mea- sured fanout and latency. With the random configuration, For the static assignment we apply bipartite graph parti- tioning to minimize fanout. We create the bipartite graph 3 In some cases, prior knowledge of which records each query must from logs of queries from the dynamic operations of the retrieve is sufficient to create the graph. 464 13th USENIX Symposium on Networked Systems Design and Implementation (NSDI ’16) USENIX Association 1.00 50 0.75 40 Average fanout 30 random parallel CDF social parallel 0.50 social serial 20 0.25 10 0.00 3 10 30 100 300 1,000 3,000 10,000 0 3t 6t 9t 12 t Latency Number of groups Figure 7: Cumulative latency distribution for fetching data of Figure 8: The average fanout versus number of groups on Face- friends, where t is the average latency of a single call. book’s friendship graph when using edge locality optimization (dotted curve) and our fanout optimization (solid curve), re- the queries needed to issue requests to 38.8 storage sub- spectively. systems on average. 
With the social configuration, the queries needed to issue requests to only 9.9 storage sub- being able to maintain good load balance. systems on average. This decrease in fanout resulted in a In practice, fanout degrades over time. For the 40 2.1X lower average latency for the queries. group solution we used in the simplified application, we The cumulative distribution of latencies for the ran- observed a fanout increase of about 2% on average over dom and social configurations are shown in Figure 7, the course of a week. A single static assignment update where we also include the social configuration’s latency sufficed to bring the fanout back to what it was previ- distribution after disabling parallelism within each ma- ously, requiring only 1.85% of the data records to have chine. Without parallelism, the average latency is still to be moved. With such low impact, we decided static as- lower then with the random configuration, but only by signment updates were only necessary every few months, 23%. Furthermore, the slowest 25% queries on the social relying on dynamic assignment to move groups when configuration without parallelism exhibited substantially necessary in between. Even then, we found that dynamic higher latencies than the 25% slowest queries on the ran- assignment updates were not necessary more than once dom configuration. This figure confirms the importance a week on average. We used the same static assignment of using parallelism within each system. for all replicas, but made dynamic assignment decisions independently for each replica. 6.4 Operational observations After we deployed storage sharding optimized with So- 7 Related work cial Hash to one of the graph databases at Facebook, con- taining thousands of storage servers, we found that mea- As discussed in Section 4.3, graph partitioning has an ex- sured latencies of queries decreased by over 50% on av- tensive literature, and our optimization objectives, edge erage, and CPU utilization also decreased by over 50%. locality and fanout, correspond to edge cut and hyper- We attribute much of this improvement in perfor- graph edge cut. A recent review of graph partitioning mance to our method of assigning data records to groups, methods can be found online. Many graph partition- using graph partitioning on bi-partite graphs generated ing systems have been built and are available. For exam- from prior queries. The solid line in Figure 8 shows the ple, Metis [16, 18] is one that is frequently used. average fanout as a function of the number of groups A Giraph-based approach to graph partitioning called when using our method. The dotted line shows the av- “Spinner” was recently announced. Our work is dis- erage fanout when using standard edge-cut optimization tinct in that their application was optimizing batch pro- criteria on the (unipartite) friendship graph. cessing systems, such as Giraph itself, via increased edge After analyzing expected load balance, we decided on locality, and our graph partitioning system is embedded a group-to-component ratio of 8; the dynamic assign- in the Social Hash framework. ment algorithm then selects which 8 groups to assign to Average fanout in a bipartite graph, when presented as the same storage subsystem, based on the historical load a hypergraph, with vertices being one side of the bipartite patterns. 
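For concreteness, the sketch below shows, under illustrative names, the two quantities this section revolves around: the average fanout of a record-to-group assignment measured over a query log (the objective the bipartite partitioning of Section 6.2 minimizes), and the independence-based estimate of how likely a query with a given fanout is to hit at least one slow sub-request (with p = 0.01 and fanout 10 this reproduces the 9.6% figure from Section 6.1).

```python
# Average fanout of a sharding assignment over a query log, and the simple
# tail-latency argument for minimizing it.
def average_fanout(query_log, group_of):
    # query_log: iterable of queries, each a list of record keys it fetches
    # group_of:  record key -> group (or storage subsystem) id
    total_fanout = 0
    num_queries = 0
    for records in query_log:
        total_fanout += len({group_of(r) for r in records})  # distinct groups touched
        num_queries += 1
    return total_fanout / num_queries if num_queries else 0.0

def prob_any_slow(p, fanout):
    # If each sub-request is independently slow with probability p, a query
    # that contacts `fanout` subsystems sees at least one slow sub-request
    # with probability 1 - (1 - p)**fanout; e.g. p=0.01, fanout=10 -> ~0.096.
    return 1.0 - (1.0 - p) ** fanout
```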
This allowed us to keep fanout small, while still graph, and hyper-edges representing the vertices from USENIX Association 13th USENIX Symposium on Networked Systems Design and Implementation (NSDI ’16) 465 the other side, directly translates into the hypergraph par- We demonstrated the effectiveness of the Social Hash titioning problem. Hypergraph partitioning also has an framework with the HTTP request routing and storage extensive literature [4, 17], and one of the publicly avail- sharding applications. For the former, Social Hash was able parallel solutions is PHG , which can be found in able to decrease the cache miss rate by 25%, and for the Zoltan toolkit. the latter, it was able to cut the average response time in Partitioning online social networks has previously half, as measured on the live Facebook system with live been used to improve performance of distributed sys- traffic production workloads. The approaches we took tems. Ugander and Backstrom discuss partitioning large with both applications was, to the best of our knowledge, graphs to optimize Facebook infrastructure. Stein novel; i.e., graph partitioning the Social Graph to opti- considered a theoretical application of partitioning to mize HTTP request routing, and using query history to Facebook infrastructure. Tran and Zhang consid- construct bipartite graphs that are then partitioned to op- ered a multi-objective optimization problem based on timize storage sharding. edge cut motivated by read and write behaviors in online Our approach has some limitations. It was designed social networks [31, 32]. in the context of optimizing online social networks and Other research has considered data replication in com- hence will not be suitable for every distributed system. bination with partitioning for sharding data for online so- To be successful, both the static and dynamic assignment cial networks. Pujol et al. studied low fanout configura- steps rely on certain characteristics, which tend to be ful- tions via replication of data between hosts and Wang filled by social networks. For the static step, the under- et al. suggested minimizing fan-out by random replica- lying graph must be conducive to partitioning, and the tion and query optimization. Nguyen et al. con- graph must be reasonably sparse so that the partitioning sidered how to place additional replicas of users given is computationally tractable; social graphs almost always a fixed initial assignment of users to servers [22, 30]. meet those characteristics. The social graph cannot be Dean and Barroso investigated the effect of latency changing too rapidly; otherwise the optimized static as- variability on fanout queries in distributed systems, and signment will be obsolete too quickly and the attendant suggested several means to reduce its influence. Jeon et exception handling becomes too computationally com- al. argued for the necessity of parallelizing execu- plex. For the dynamic step, we assume that the workload tion of large requests, in order to tame latencies. and the infrastructure does not change too rapidly. Our contribution differs from these lines of research While we have been able to obtain impressive effi- by presenting a realized framework integrated into pro- ciency gains using the Social Hash framework, we be- duction systems at Facebook. A production application lieve there is much room for further improvement. We to online social networks is provided by Huang et al. 
who are currently: (i) working on improving the performance describe improving infrastructure performance for Ren- of our graph partitioning algorithms, (ii) considering us- ren through a combination of graph partitioning and data ing historical query patterns and bi-partite graph parti- replication methods. Sharding has been considered tioning to further improve cache miss rates, (iii) incorpo- for distributed social network databases by Nicoara, et rating geo-locality considerations for our HTTP routing al. who propose Hermes. optimizations, and (iv) incorporating alternative repli- cation schemes for further reducing fanout in storage sharded systems. 8 Concluding Remarks We introduced the Social Hash framework for produc- Acknowledgements ing, serving, and maintaining assignments of objects to We would like to thank Tony Savor, Kenny Lau, Venkat components in distributed systems. The framework was Venkataramani and Avery Ching for their support, Alex designed for optimizing operations on large social net- Laslavic, Praveen Kumar, Jan Jezabek, Alexander Ramirez, works, such as Facebook’s Social Graph. A key aspect of Omry Yadan, Michael Paleczny, Jianming Wu, Chunhui Zhu, Deyang Zhao and Pavan Athivarapu for helping integrate with the framework is how optimization is decoupled from dy- Facebook systems, Sanjeev Kumar, Kaushik Veeraraghavan, namic adaptation, through a two-level scheme that uses Dionysis Logothetis, Romain Thibaux and Rajesh Nishtala for graph partitioning for optimization at the first level and their feedback on early drafts, Badr El-Said and Laxman Dhuli- dynamic assignment at the second level. The first level pala for their contributions to the framework, and Dimitris leverages the structure of the social network and its us- Achlioptas for discussions on graph partitioning. We would age patterns, while the second level adapts to changes in also like to thank the reviewers for their constructive and help- the data, its access patterns and the infrastructure. ful comments. 466 13th USENIX Symposium on Networked Systems Design and Implementation (NSDI ’16) USENIX Association References M. Jeon, S. Kim, S.-w. Hwang, Y. He, S. Elnikety, A. L. Cox, and S. Rixner. Predictive parallelization: Taming Apache Giraph. http://giraph.apache.org/. tail latencies in Web search. In Proc. 37th Intl. ACM SI- N. Bronson, Z. Amsden, G. Cabrera, P. Chakka, P. Di- GIR Conference on Research & Development in Informa- mov, H. Ding, J. Ferris, A. Giardullo, S. Kulkarni, H. Li, tion Retrieval (SIGIR’14), pages 253–262, 2014. M. Marchukov, D. Petrov, L. Puzar, Y. J. Song, and D. Karger, E. Lehman, T. Leighton, R. Panigrahy, V. Venkataramani. TAO: Facebook’s distributed data store M. Levine, and D. Lewin. Consistent hashing and ran- for the Social Graph. In Proc. 2013 USENIX Annual Tech- dom trees: Distributed caching protocols for relieving hot nical Conference (USENIX ATC’13), pages 49–60, 2013. spots on the World Wide Web. In Proc. 29th Annual ACM A. Buluç, H. Meyerhenke, I. Safro, P. Sanders, and Symp. on Theory of Computing (STOC’97), pages 654– C. Schulz. Recent advances in graph partitioning. CoRR, 663, 1997. abs/1311.3144, 2013. G. Karypis and V. Kumar. A fast and high quality mul- U. V. Catalyurek and C. Aykanat. Hypergraph- tilevel scheme for partitioning irregular graphs. SIAM J. partitioning-based decomposition for parallel sparse- Sci. Comput., 20(1):359–392, Dec. 1998. matrix vector multiplication. IEEE Trans. on Parallel and G. Karypis and V. Kumar. 
[5] C. Chevalier and F. Pellegrini. PT-Scotch: A tool for efficient parallel graph ordering. CoRR, abs/0907.1375, 2009.
[6] R. Cohen, L. Katzir, and D. Raz. An efficient approximation for the generalized assignment problem. Information Processing Letters, 100(4):162–166, 2006.
[7] J. Dean and L. A. Barroso. The tail at scale. CACM, 56(2):74–80, Feb. 2013.
[8] K. Devine, E. Boman, L. Riesen, U. Catalyurek, and C. Chevalier. Getting started with Zoltan: A short tutorial. In Proc. 2009 Dagstuhl Seminar on Combinatorial Scientific Computing, 2009. Also available as Sandia National Labs Tech Report SAND2009-0578C.
[9] K. D. Devine, E. G. Boman, R. T. Heaphy, R. H. Bisseling, and U. V. Catalyurek. Parallel hypergraph partitioning for scientific computing. In Proc. Intl. Parallel and Distributed Processing Symposium (IPDPS), pages 10–20, 2006.
[10] J. E. Gonzalez, Y. Low, H. Gu, D. Bickson, and C. Guestrin. PowerGraph: Distributed graph-parallel computation on natural graphs. In Proc. 10th Symp. on Operating Systems Design and Implementation (OSDI 12), pages 17–30, 2012.
[11] J. E. Gonzalez, R. S. Xin, A. Dave, D. Crankshaw, M. J. Franklin, and I. Stoica. GraphX: Graph processing in a distributed dataflow framework. In Proc. 11th Symp. on Operating Systems Design and Implementation (OSDI 14), pages 599–613, Broomfield, CO, Oct. 2014.
[12] Q. Huang, K. Birman, R. van Renesse, W. Lloyd, S. Kumar, and H. C. Li. An analysis of Facebook photo caching. In Proc. 24th Symp. on Operating Systems Principles (SOSP'13), 2013.
[13] Y. Huang, Q. Deng, and Y. Zhu. Differentiating your friends for scaling online social networks. In Proc. IEEE Intl. Conf. on Cluster Computing (CLUSTER'12), pages 411–419, Sept. 2012.
[14] M. Jeon, S. Kim, S.-w. Hwang, Y. He, S. Elnikety, A. L. Cox, and S. Rixner. Predictive parallelization: Taming tail latencies in Web search. In Proc. 37th Intl. ACM SIGIR Conference on Research & Development in Information Retrieval (SIGIR'14), pages 253–262, 2014.
[15] D. Karger, E. Lehman, T. Leighton, R. Panigrahy, M. Levine, and D. Lewin. Consistent hashing and random trees: Distributed caching protocols for relieving hot spots on the World Wide Web. In Proc. 29th Annual ACM Symp. on Theory of Computing (STOC'97), pages 654–663, 1997.
[16] G. Karypis and V. Kumar. A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM J. Sci. Comput., 20(1):359–392, Dec. 1998.
[17] G. Karypis and V. Kumar. Multilevel k-way hypergraph partitioning. VLSI Design, 11(3):285–300, 2000.
[18] D. LaSalle and G. Karypis. Multi-threaded graph partitioning. In Proc. IEEE 27th Intl. Symp. on Parallel and Distributed Processing (IPDPS'13), pages 225–236, 2013.
[19] H. Lin, K. Sun, S. Zhao, and Y. Han. Feedback-control-based performance regulation for multi-tenant applications. In Proc. 15th Intl. Conf. on Parallel and Distributed Systems (ICPADS'09), pages 134–141, Dec. 2009.
[20] G. Malewicz, M. H. Austern, A. J. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski. Pregel: A system for large-scale graph processing. In Proc. ACM SIGMOD Intl. Conf. on Management of Data (SIGMOD'10), pages 135–146, 2010.
[21] C. Martella, D. Logothetis, and G. Siganos. Spinner: Scalable graph partitioning for the cloud. CoRR, abs/1404.3861, 2014.
[22] K. Nguyen, C. Pham, D. Tran, F. Zhang, et al. Preserving social locality in data replication for online social networks. In Proc. 31st Intl. Conf. on Distributed Computing Systems Workshops (ICDCSW), pages 129–133, 2011.
[23] D. Nicoara, S. Kamali, K. Daudjee, and L. Chen. Hermes: Dynamic partitioning for distributed social network graph databases. In Proc. 18th Intl. Conf. on Extending Database Technology (EDBT'15), pages 25–36, Mar. 2015.
[24] R. Nishtala, H. Fugal, S. Grimm, M. Kwiatkowski, H. Lee, H. C. Li, R. McElroy, M. Paleczny, D. Peek, P. Saab, D. Stafford, T. Tung, and V. Venkataramani. Scaling Memcache at Facebook. In Proc. 10th USENIX Conf. on Networked Systems Design and Implementation (NSDI'13), pages 385–398, 2013.
[25] J. M. Pujol, V. Erramilli, G. Siganos, X. Yang, N. Laoutaris, P. Chhabra, and P. Rodriguez. The little engine(s) that could: Scaling online social networks. SIGCOMM Comput. Commun. Rev., 40(4):375–386, Aug. 2010.
[26] D. Shue, M. J. Freedman, and A. Shaikh. Performance isolation and fairness for multi-tenant cloud storage. In Proc. 10th USENIX Symp. on Operating Systems Design and Implementation (OSDI 12), pages 349–362, 2012.
[27] D. D. C. Shue. Multi-tenant Resource Allocation for Shared Cloud Storage. PhD thesis, Princeton University, 2014.
[28] Y. Song, Y. Sun, and W. Shi. A two-tiered on-demand resource allocation mechanism for VM-based data centers. IEEE Trans. on Services Computing, 6(1):116–129, 2013.
[29] D. Stein. Partitioning social networks for data locality on a memory budget. Master's thesis, University of Illinois, Urbana-Champaign, 2012.
[30] D. A. Tran, K. Nguyen, and C. Pham. S-CLONE: Socially-aware data replication for social networks. Computer Networks, 56(7):2001–2013, 2012.
[31] D. A. Tran and T. Zhang. Socially aware data partitioning for distributed storage of social data. In Proc. IFIP Networking Conference, pages 1–9, May 2013.
[32] D. A. Tran and T. Zhang. S-PUT: An EA-based framework for socially aware data partitioning. Computer Networks, 75:504–518, Dec. 2014.
[33] J. Ugander and L. Backstrom. Balanced label propagation for partitioning massive graphs. In Proc. 6th ACM Intl. Conf. on Web Search and Data Mining (WSDM-13), pages 507–516, 2013.
[34] J. Ugander, B. Karrer, L. Backstrom, and C. Marlow. The anatomy of the Facebook Social Graph. CoRR, abs/1111.4503, 2011.
[35] V. Venkataramani, Z. Amsden, N. Bronson, G. Cabrera III, P. Chakka,
