Facebook datacenter "secrets"

Facebook has open-sourced its servers and data centers. The company has shared many details of its new server and data center designs in its Building Efficient Data Centers with the Open Compute Project article and the accompanying project. The Open Compute Project effort aims to bring this web-scale computing to the masses. The new data center design is built around AMD and Intel processors and the x86 architecture.

[Image: Open Compute Project logo]

You might ask why Facebook open-sourced its data centers. The answer is that Facebook has opened a whole new front in its war with Google over top technical talent and ad dollars. “By releasing Open Compute Project technologies as open hardware,” Facebook writes, “our goal is to develop servers and data centers following the model traditionally associated with open source software projects. Our first step is releasing the specifications and mechanical drawings. The second step is working with the community to improve them.”

[Image: Open Compute server chassis]

Incidentally, this data center approach has some similarities to Google's data center designs, at least to the details Google has published. Despite Google’s professed love for all things open, details of its massive data centers have always been a closely guarded secret. Google usually talks about its servers only once they’re obsolete.

The Open Compute Project is not the first open-source server hardware project. The How to build cheap cloud storage article describes another interesting project.

77 Comments

  1. Tomi Engdahl says:
    Emerson, Facebook team to design ‘rapid deployment data center’
    http://www.cablinginstall.com/articles/2014/05/emerson-facebook-rddc.html

    Emerson Network Power (NYSE: EMR) announced that it is working with Facebook to design and deploy the company’s second data center building in Luleå, Sweden. According to a press release, the “Luleå 2” facility will be the pilot for Facebook’s new “rapid deployment data center (RDDC)”, which was designed and developed in collaboration with Emerson Network Power’s data center design team.

    The Luleå 2 facility will span approximately 125,000 sq. ft. and Emerson will deliver over 250 shippable modules, including power skids, evaporative air handlers, a water treatment plant, and data center superstructure solutions. It will be built next to Facebook’s first data center building in Luleå, which came online in June 2013.

    “Because of our relentless focus on efficiency, we are always looking for ways to optimize our data centers including accelerating build times and reducing material use,”

  2. Pleasanton legal advisor says:
    Thanks for finally writing about Facebook datacenter "secrets". Liked it!
  3. how to do affiliate marketing says:
    whoah this weblog is wonderful i love studying your articles.
    Keep up the great work! You know, a lot of persons
    are searching around for this info, you could aid them greatly.
  4. Tomi Engdahl says:
    Facebook Acquires Security Startup PrivateCore to Better Protect Its Data Centers
    http://recode.net/2014/08/07/facebook-privatecore/

    Facebook announced on Thursday that it has acquired PrivateCore, an online security startup specifically focused on server security. Terms of the deal were not disclosed.

    The two-year-old startup will help Facebook keep its massive data centers safe from malware attacks and other forms of security breaches.

  5. Tomi Engdahl says:
    White Paper Download: Thermal Efficiency: Facebook’s Datacenter Server Design – EEdge Vol 3 Article
    http://www.mentor.com/products/mechanical/techpubs/download?id=84315&contactid=1&PC=L&c=2014_07_09_mad_eedge_fb_article
  6. Tomi Engdahl says:
    Facebook unveils Autoscale, its load-balancing system that achieves an average power saving of 10-15%
    http://thenextweb.com/facebook/2014/08/08/facebook-unveils-autoscale-load-balancing-system-achieves-average-power-saving-10-15/

    Facebook today revealed details about Autoscale, a system for power-efficient load balancing that has been rolled out to production clusters in its data centers. The company says it has “demonstrated significant energy savings.”

    For those who don’t know, load balancing refers to distributing workloads across multiple computing resources, in this case servers. The goal is to optimize resource use, which can mean different things depending on the task at hand.

    The control loop starts with collecting utilization information (CPU, request queue, and so on) from all active servers. The Autoscale controller then decides on the optimal active pool size and passes the decision to the load balancers, which distribute the workload evenly.
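
    To make those steps concrete, here is a minimal Python sketch of that kind of control loop. It only illustrates the idea described above and is not Facebook's Autoscale code; the metric values, target utilization, and server names are invented.

    def collect_utilization(active_servers):
        # In a real system this would query each host's metrics endpoint
        # (CPU load, request queue depth, and so on). Here we fake a reading.
        return {name: 0.35 for name in active_servers}

    def decide_pool_size(utilization, target=0.6):
        # Smallest active pool that keeps average utilization near the target.
        total_load = sum(utilization.values())
        return max(1, round(total_load / target))

    def autoscale_tick(all_servers, active_servers):
        utilization = collect_utilization(active_servers)
        pool_size = decide_pool_size(utilization)
        # The load balancers then spread traffic evenly over this pool and
        # leave the remaining machines idle or in a low-power state.
        return all_servers[:pool_size]

    servers = ["web%03d" % i for i in range(20)]
    print(autoscale_tick(servers, servers))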

  7. Tomi Engdahl says:
    Startup Sees Enterprise Op for TLC NAND
    http://www.eetimes.com/document.asp?doc_id=1323580&amp;

    In some use cases, Fife says, the company will employ lower-cost TLC NAND, particularly for what has been dubbed cold storage of data, and he says the company’s variable-code-rate LDPC-based error-correcting code (ECC) can address endurance concerns. However, he believes multi-level cell (MLC) is still the best option for hyperscale applications.

    Social networking giant Facebook has been vocal about wanting a low-cost flash technology, saying at last year’s Flash Summit that a relatively low-endurance, poor-performance chip would better serve its need to store some 350 million new photos a day. Not long after, Jim Handy, principal analyst at Objective Analysis, concluded that Facebook would have to settle for a hierarchy of DRAM-flash-HDD for the foreseeable future. TLC might be cheaper and viable for cold storage, but not as cheap as Facebook would like, he said.

  8. Tomi Engdahl says:
    Facebook Experimenting With Blu-ray As a Storage Medium
    http://beta.slashdot.org/story/206305

    The discs are held in groups of 12 in locked cartridges and are extracted by a robotic arm whenever they’re needed. One rack contains 10,000 discs, and is capable of storing a petabyte of data, or one million gigabytes. Blu-ray discs offer a number of advantages versus hard drives. For one thing, the discs are more resilient: they’re water- and dust-resistant, and better able to withstand temperature swings.
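
    A quick back-of-the-envelope check of those figures, using only the numbers quoted above:

    discs_per_cartridge = 12
    discs_per_rack = 10000
    rack_capacity_bytes = 1e15  # one petabyte

    gb_per_disc = rack_capacity_bytes / discs_per_rack / 1e9
    cartridges_per_rack = discs_per_rack / discs_per_cartridge
    print("%.0f GB per disc" % gb_per_disc)                 # ~100 GB per disc
    print("%.0f cartridges per rack" % cartridges_per_rack) # ~833 cartridges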

  9. Tomi Engdahl says:
    Facebook decided it couldn’t wait for companies like Arista to come out with new switches, so it will build its own. The Wedge switch (above), already being tested in production networks, will become a design Facebook will contribute to its Open Compute Project, an open-source hardware initiative.

    “We wanted to get agility because we are changing our requirements in a three-month cycle,” far faster than vendors like Arista and Broadcom can field new products, said Yuval Bachar, a former Cisco engineering manager, now working at Facebook.

    The company’s datacenters are approaching a million-server milestone, Bachar said. Today it uses 10 Gbit/s links from its top-of-rack servers, but it will need to upgrade in six to eight months, he said. The Wedge sports up to 32 40G ports.

    The most interesting thing about Wedge is its use of a small server card, currently using an x86 SoC. However it could be replaced with an ARM SoC or “other programmable elements,” Bachar said.

    Source: http://www.eetimes.com/document.asp?doc_id=1323695&page_number=2

  10. Tomi Engdahl says:
    Facebook, the security company
    CSO Joe Sullivan talks about PrivateCore and Facebook’s homegrown security clout.
    http://arstechnica.com/security/2014/08/facebook-the-security-company/

    A VM in a vCage

    The technology PrivateCore is developing, vCage, is a virtual “cage” in the telecom industry’s usage of the word. It is software that is intended to continuously assure that the servers it protects have not had their software tampered with or been exploited by malware. It also prevents physical access to the data running on the server, just as a locked cage in a colocation facility would.

    The software integrates with OpenStack private cloud infrastructure to continuously monitor virtual machines, encrypt what’s stored in memory, and provide additional layers of security to reduce the probability of an outside attacker gaining access to virtual servers through malware or exploits of their Web servers and operating systems. If the “attestation” system detects a change that would indicate that a server has been exploited, it shuts it down and re-provisions another server elsewhere. Sullivan explained that the technology is seen as key to Facebook’s strategy for Internet.org because it will allow the company to put servers in places outside the highly secure (and expensive) data centers it operates in developed countries.
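
    vCage itself is proprietary, but the attest-then-replace pattern described above can be sketched in a few lines of Python. Everything here (the measurement function, fleet records, and host names) is hypothetical and only meant to show the control flow:

    import hashlib

    def measure(server):
        # Hash over the software state that is expected to stay constant.
        return hashlib.sha256(server["boot_image"] + server["hypervisor"]).hexdigest()

    def attest_and_remediate(fleet, expected_measurement):
        for server in fleet:
            if measure(server) != expected_measurement:
                # Any unexpected change is treated as a possible compromise:
                # quarantine the box and re-provision its workload elsewhere.
                server["state"] = "quarantined"
                print("re-provisioning workload from", server["name"], "on a clean host")

    expected = hashlib.sha256(b"known-good-image" + b"known-good-hv").hexdigest()
    fleet = [
        {"name": "edge-01", "boot_image": b"known-good-image", "hypervisor": b"known-good-hv", "state": "active"},
        {"name": "edge-02", "boot_image": b"tampered-image", "hypervisor": b"known-good-hv", "state": "active"},
    ]
    attest_and_remediate(fleet, expected)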

    “We’re trying to get a billion more people on the Internet,” he said. “So we have to have servers closer to where they are.”

    By purchasing PrivateCore, Facebook is essentially taking vCage off the market. The software “is not going to be sold,” Sullivan said. “They had a couple of public customers and a couple of private ones. But they took the opportunity to get to work with us because it will develop their technology faster.”

    Sullivan said the software would not be for sale for the foreseeable future. “The short-term goal is to get it working in one or two test-beds.”

    It’s been 18 months since Facebook was hit by a Java zero-day that compromised a developer’s laptop. Since then, Facebook has done a lot to reduce the potential for attacks and is using the same anomaly detection technology the company developed to watch for fraudulent Facebook user logins to spot problems within its own network and facilities.

    The Java zero-day, he said, “drove home that it’s impossible to secure an employee’s computer 100 percent.” To minimize what an attacker can get to, Facebook has moved virtually everything that employees work with into its own cloud—reducing the amount of sensitive data that resides on individual employees’ computers as much as possible.

  11. Tomi Engdahl says:
    Facebook’s Newest Data Center Is Now Online In Altoona, Iowa
    http://techcrunch.com/2014/11/14/facebooks-newest-data-center-comes-online-in-altoona-iowa/

    Facebook today announced that its newest data center in Altoona, Iowa, is now open for business. The new facility complements the company’s other centers in Prineville, Ore; Forest City, N.C. and Luleå, Sweden (the company also operates out of a number of smaller shared locations). This is the first of two data centers the company is building at this site in Altoona.

    What’s actually more interesting than the fact that the new location is now online is that it’s also the first data center to use Facebook’s new high-performance networking architecture.

    With Facebook’s new approach, however, the entire data center runs on a single high-performance network. There are no clusters, just server pods that are all connected to each other. Each pod has 48 server racks — that’s much smaller than Facebook’s old clusters — and all of those pods are then connected to the larger network.

  12. Tomi Engdahl says:
    Facebook’s New Data Center Is Bad News for Cisco
    http://www.wired.com/2014/11/facebooks-new-data-center-bad-news-cisco/

    Facebook is now serving the American heartland from a data center in the tiny town of Altoona, Iowa. Christened on Friday morning, this is just one of the many massive computing facilities that deliver the social network to phones, tablets, laptops, and desktop PCs across the globe, but it’s a little different from the rest.

    As it announced that the Altoona data center is now serving traffic to some of its 1.35 billion users, the company also revealed how its engineers pieced together the computer network that moves all that digital information through the facility. The rather complicated arrangement shows, in stark fashion, that the largest internet companies are now constructing their computer networks in very different ways—ways that don’t require expensive networking gear from the likes of Cisco and Juniper, the hardware giants that played such a large role when the foundations of the net were laid.

    From the Old to the New

    Traditionally, when companies built computer networks to run their online operations, they built them in tiers. They would create a huge network “core” using enormously expensive and powerful networking gear. Then a smaller tier—able to move less data—would connect to this core. A still smaller tier would connect to that. And so on—until the network reached the computer servers that were actually housing the software people wanted to use.

    For the most part, the hardware that ran these many tiers—from the smaller “top-of-rack” switches that drove the racks of computer servers, to the massive switches in the backbone—was provided by hardware giants like Cisco and Juniper. But in recent years, this has started to change. Many under-the-radar Asian operations and other networking vendors now provide less expensive top-of-rack switches, and in an effort to further reduce costs and find better ways of designing and managing their networks, internet behemoths such as Google and Facebook are now designing their own top-of-rack switches.

    This is well documented. But that’s not all that’s happening. The internet giants are also moving to cheaper gear at the heart of their massive networks. That’s what Facebook has done inside its Altoona data center. In essence, it has abandoned the hierarchical model, moving away from the enormously expensive networking gear that used to drive the core of its networks.

  13. Tomi Engdahl says:
    Facebook Weaves New Fabric
    Smaller switches are better, says datacenter.
    http://www.eetimes.com/document.asp?doc_id=1324638&amp;

    Facebook’s new datacenter marks the latest effort to design a warehouse-sized system as a single network. The effort suggests big datacenters may switch to using more smaller, cheaper aggregation switches rather than relying on –and being limited by– the biggest, fastest boxes they can purchase.

    The company described the fabric architecture of its new Altoona, Iowa, datacenter in a Web post. It said the datacenter uses 10G networking to servers and 40G between all top-of-rack and aggregation switches.

    The news comes just weeks after rival Microsoft announced it is starting to migrate all its servers to 40G links and its switches to 100G. Microsoft suggested it might use FPGAs on future systems to extend bandwidth, given that its needs are surpassing what current and expected Ethernet chips will deliver.

    Big datacenters have long been pushing the edge of networking, which is their chief bottleneck. The new Facebook datacenter appears to try to solve the problem using a novel topology rather than more expensive hardware.

    Chip and systems vendors hurriedly developed efforts for 25G Ethernet earlier this year as another approach for bandwidth-starved datacenters. They hope some datacenters migrate from 10 to 25G to the server with road maps to 50 and possibly 200G for switches.

    Facebook suggested its approach opens up more bandwidth and provides an easier way to scale networks while still tolerating expected component and system failures. It said its 40G fabric could quickly scale to 100G, for which chips and systems are now available, although they are rather expensive.

    Facebook said its new design provides 10x more bandwidth between servers inside the datacenter where traffic growth rates are highest. It said it could tune the approach to a 50x bandwidth increase using the same 10/40G links. The fabric operates at Layer 3 using BGP4 as its only routing protocol with minimal features enabled.

    “Our current starting point is 4:1 fabric oversubscription from rack to rack, with only 12 spines per plane, out of 48 possible.”
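
    One way to read that quoted figure is as simple spine-count arithmetic (this interpretation is ours, not Facebook's):

    spines_possible = 48
    spines_populated = 12

    # With a quarter of the spine switches per plane populated, rack-to-rack
    # capacity is a quarter of the non-blocking maximum: 48 / 12 = 4, i.e. 4:1.
    print("%d:1 oversubscription" % (spines_possible // spines_populated))

    # Populating all 48 spines on the same 10G/40G links would bring the
    # fabric back toward 1:1 (non-blocking), which is the scaling headroom implied.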

  14. Tomi Engdahl says:
    Network lifecycle management and the Open OS
    http://www.edn.com/design/wireless-networking/4438310/Network-lifecycle-management-and-the-Open-OS

    The bare-metal switch ecosystem and standards are maturing, driven by the Open Compute Project.

    For decades, lifecycle management for network equipment was a laborious, error-prone process because command-line interfaces (CLIs) were the only way to configure equipment. Open operating systems and the growing Linux community have now streamlined this process for servers, and the same is beginning to happen for network switches.

    Network lifecycle management involves three phases: on-boarding or provisioning, production, and decommissioning. The state of network equipment is continually in flux as applications are deployed or removed, so network administrators must find ways to configure and manage equipment efficiently and cost-effectively.

    In the server world, the emergence of Linux-based operating systems has revolutionized server on-boarding and provisioning. Rather than using a CLI to configure servers one at a time, system administrators can use automation tools like Chef and Puppet to store and apply configurations with the click of a mouse. For example, suppose an administrator wants to commission four Hadoop servers. Rather than using a CLI to provision each of them separately, the administrator can instruct a technician to click on the Hadoop library in Chef and provision the four servers automatically. This saves time and eliminates the potential for configuration errors due to missed keystrokes or calling up an old driver.
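
    The gain comes from declaring the desired state once and applying it to every host. The toy Python sketch below mimics that pattern; it is not Chef or Puppet code, and the role contents and host names are invented:

    HADOOP_ROLE = {
        "packages": ["openjdk-8-jdk", "hadoop"],
        "services": ["hadoop-datanode"],
    }

    def apply_role(host, role):
        # A real configuration-management tool would connect to the host and
        # converge it to this state idempotently; here we just print the plan.
        for pkg in role["packages"]:
            print("%s: ensure package %s is installed" % (host, pkg))
        for svc in role["services"]:
            print("%s: ensure service %s is running" % (host, svc))

    for host in ["hadoop-01", "hadoop-02", "hadoop-03", "hadoop-04"]:
        apply_role(host, HADOOP_ROLE)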

    This kind of automated provisioning has been a godsend to network administrators and is fast becoming the standard method of lifecycle management for servers. But what about switches?

    Network administrators would like to use the same methodology for switches in their networks, but the historical nature of switches has held them back.

    Traditionally, network switches have been proprietary devices with proprietary operating systems. Technicians must use a CLI or the manufacturer’s own tools to provision a switch.

    Using a CLI for lots of repetitive tasks can lead to errors and lost productivity from repeating the same mundane tasks over and over again.

    Today, three manufacturers (Big Switch, Cumulus, and Pica8) are offering Linux-based OSs for bare-metal switches that allow these switches to be provisioned with standard Linux tools.

    Application programming interfaces (APIs) such as JSON or RESTful interfaces that interact with the operating system CLI are becoming more common. APIs draw a second parallel between server and network lifecycle thinking. Open APIs give developers a common framework to integrate with homegrown and off-the-shelf management, operations, provisioning, and accounting tools. Chef and Puppet are becoming common tools on the server side that also extend functionality to networking. Linux-based network OSs are open and can run applications like Puppet in user space; simply typing “apt-get install puppet” installs it natively on the switch itself.

    The three phases of network lifecycle management (on-boarding or provisioning, production, and decommissioning) all benefit from this combination of CLI, Linux, and open APIs. Tools around Linux help build the base of the stack, getting Linux onto the bare metal through even more fundamental tools like zero-touch provisioning. A custom script using a JSON API might poll the switch OS for accounting data while it is in production. And lastly, Puppet could be used to push a new configuration to the switch, in effect decommissioning the previous application.
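
    As a rough illustration of the "custom script polling the switch OS over a JSON API" step, here is a small Python sketch. The endpoint path, response shape, and addresses are hypothetical; the real API depends on the switch OS in use:

    import json
    import urllib.request

    SWITCHES = ["10.0.0.1", "10.0.0.2"]

    def poll_counters(switch_ip):
        url = "http://%s/api/v1/interfaces/counters" % switch_ip  # hypothetical endpoint
        with urllib.request.urlopen(url, timeout=5) as response:
            return json.load(response)

    for ip in SWITCHES:
        try:
            counters = poll_counters(ip)
            print(ip, counters.get("eth0", {}).get("rx_bytes"))
        except OSError as exc:
            print("%s: poll failed (%s)" % (ip, exc))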

  15. Tomi Engdahl says:
    Bank of America wants to shove its IT into an OpenCompute cloud. What could go wrong?
    Cheaper customised hardware is good for Facebook
    http://www.theregister.co.uk/2015/03/07/bank_of_america_cloud/

    Selling hardware to the financial industry used to be a cash cow for big-name server makers, but they will be getting short shrift from Bank of America, which is shifting its IT into a white-box-powered, software-defined cloud.

    “I worry that some of the partners that we work closely with won’t be able to make this journey,” David Reilly, chief technology officer at the bank, told The Wall Street Journal.

    Bank of America made the decision to slide the bulk of its backend computing systems to the cloud in 2013, and wants to have 80 per cent of its systems running in software-defined data centres within the next three years. Last year it spent over $3bn on new computing kit.

    To make the move, the bank is talking to hardware manufacturers building low-cost, no-brand cloud boxes as part of the OpenCompute Project

    “What works for a Facebook or a Google may not work in a highly regulated environment such as the one we operate within,” Reilly noted.

  16. Tomi Engdahl says:
    Intel gives Facebook the D – Xeons thrust web pages at the masses
    System designs and software libraries published
    http://www.theregister.co.uk/2015/03/10/facebook_open_compute_yosemite/

    Open Compute Summit Facebook is using Intel’s Xeon D processors to build stacks of web servers for the 1.39 billion people who visit the social network every month.

    The OpenRack server design is codenamed Yosemite, is pictured above, and is available for anyone to use under the OpenCompute project. The hardware “dramatically increases speed and more efficiently serves Facebook traffic,” the website’s engineers boast.

    Each sled holds four boards, and on each board sits a single Xeon D-1540 processor package with its own RAM and flash storage. That D-1540 part features eight cores (16 threads) running at 2GHz, plus two 10Gb Ethernet ports, PCIe and other IO.

    Each processor consumes up to 65W, 90W for the whole server card, and 400W (TDP) for a full sled. A single rack can hold 48 sleds, which adds up to 192 Xeon Ds and 1,536 Broadwell cores. The Yosemite motherboard has a 50Gb/s multi-host network interconnect that hooks the four CPU boards through a single Ethernet port.
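
    The rack-level numbers check out with simple arithmetic; the per-rack power figure below is derived from the quoted 400 W per sled and is not stated in the article:

    sleds_per_rack = 48
    cpus_per_sled = 4
    cores_per_cpu = 8          # Xeon D-1540: 8 cores / 16 threads
    watts_per_sled = 400       # quoted TDP for a full sled

    cpus_per_rack = sleds_per_rack * cpus_per_sled
    print(cpus_per_rack)                             # 192 Xeon D packages
    print(cpus_per_rack * cores_per_cpu)             # 1,536 cores
    print(sleds_per_rack * watts_per_sled / 1000.0)  # about 19.2 kW per rack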

    The key thing is that this design is easier for Facebook’s software engineers to program. Each independent server is essentially a single socket processor with its own RAM, storage and NIC, whereas previous designs are two-socket affairs. The single-socket design gets rid of all the NUMA headaches present in a two-socket system, when writing and tuning multi-threaded code to generate and serve web pages.

    “890 million people visit Facebook on mobile every day. We have to build the infrastructure to support this.”

  17. Tomi Engdahl says:
    Facebook’s ‘Wedge’ network switch will soon be on sale to all
    http://www.pcworld.com/article/2895252/facebooks-wedge-network-switch-will-soon-be-on-sale-to-all.html?null

    A network switch that Facebook designed for its own data centers will soon be on sale from Taiwanese manufacturer Accton Technologies, the latest sign of progress from the community hardware effort known as the Open Compute Project.

    Facebook set up the OCP about four years ago as a way for data center operators to collaborate on new hardware designs that they can then ask low-cost manufacturers to produce. Part of the goal is to get cheaper, more standardized hardware than what’s normally supplied by top-tier vendors like Cisco, Hewlett-Packard, and Dell.

    Facebook is already using the top-of-rack switch, known as Wedge, in its own data centers, and it will be available to others in the first half of the year from Accton and its OEM partners, said Jay Parikh, head of Facebook’s infrastructure division. Cumulus Networks and Big Switch Networks will provide software for it, and Facebook has put some of its own network software on Github for companies that want to “roll their own.”

    The company won’t make money from the switch, and it’s not getting into the hardware business. By making the specification open, it hopes other OCP members will make improvements it can benefit from, Parikh said. It’s basically an open source model for hardware.

    Facebook also designed a new server, code-named Yosemite, that it will also submit to OCP. The standard two-socket servers widely used in data centers create bottlenecks for some of its applications, Parikh said, so it worked with Intel to design Yosemite, a new system that’s made up of four single-socket servers.

    The social network is using a new system-on-chip from Intel, and it created a management platform that’s server-card agnostic, so that cards can be sourced from multiple vendors. Up to 192 of the processors can fit into one computer rack, although the main benefit is the flexibility of the design.

    One of the OCP’s goals is to do away with “gratuitous differentiation”—add-on features from vendors that not all customers need but everyone has to pay for because they’re bundled with products. Those variations don’t only make products more expensive, they can also make it complex to manage multi-vendor environments.

    HP’s well-known brand and worldwide support offering could encourage other enterprises to adopt the systems—and Microsoft hopes they will. Microsoft uses the Open CloudServer to run its Azure computing service.

  18. Tomi Engdahl says:
    James Niccolai / PC World:
    Facebook announces Yosemite server chassis and Wedge network switch will be available soon as part of the company’s Open Compute Project

    Facebook’s ‘Wedge’ network switch will soon be on sale to all
    http://www.pcworld.com/article/2895252/facebooks-wedge-network-switch-will-soon-be-on-sale-to-all.html?null

  19. Tomi Engdahl says:
    Convenience trumps ‘open’ in clouds and data centers
    Sorry OpenStack and Open Compute, we’re not all Facebook
    http://www.theregister.co.uk/2015/03/17/openstack_open_compute_vs_proprietary/

    Call it OpenStack. Call it Open Compute. Call it OpenAnything-you-want, but the reality is that the dominant cloud today is Amazon Web Services, with Microsoft Azure an increasingly potent runner-up.

    Both decidedly closed.

    Not that cloud-hungry companies care. While OpenStack parades a banner of “no lock in!” and Open Compute lets enterprises roll-their-own data centres, what enterprises really want is convenience, and public clouds offer that in spades. That’s driving Amazon Web Services to a reported $50bn valuation, and calling into question private cloud efforts.

    For those enterprises looking to go cloud – but not too cloudy – OpenStack feels like a safe bet. It has a vibrant and growing community, lots of media hype, and brand names like HP and Red Hat backing it with considerable engineering resources.

    No wonder it’s regularly voted the top open-source cloud.

    The problem, however, is that “open” isn’t necessarily what people want from a cloud.

    While there are indications that OpenStack is catching on (see this Red Hat-sponsored report from IDG), there are far clearer signs that OpenStack remains a mass of conflicting community-sponsored sub-projects that make the community darling far too complex.

    As one would-be OpenStack user, David Laube, head of infrastructure at Packet, describes:

    Over the course of a month, what became obvious was that a huge amount of the documentation I was consuming was either outdated or fully inaccurate.

    This forced me to sift through an ever greater library of documents, wiki articles, irc logs and commit messages to find the ‘source of truth’.

    After the basics, I needed significant python debug time just to prove various conflicting assertions of feature capability, for example ‘should X work?’. It was slow going.

    While Laube remains committed to OpenStack, he still laments that “the amount of resources it was taking to understand and keep pace with each project was daunting”.

    Open Compute may not compute

    Nor is life much better over in Open Compute Land. While the Facebook project (which aims to open source Facebook’s datacentre designs) has the promise to create a world filled with hyper-efficient data centres, the reality is that most enterprises simply aren’t in a position to follow Facebook’s lead.

    Back in 2012, Bechtel IT exec Christian Reilly lambasted Open Compute, declaring that: “Look how many enterprises have jumped on Open Compute. Oh, yes, none. That would be correct.”

    While that’s not true – companies such as Bank of America, Goldman Sachs, and Fidelity have climbed aboard the Open Compute bandwagon – it’s still the case that few companies are in a position to capitalize on Facebook’s open designs.

    This may change, of course. Companies such as HP are piling into the Open Compute community to make it easier, with HP building a new server line based on Open Compute designs, as but one example.

    The new and the old

    One of the biggest problems with the private cloud is the nature of the workloads enterprises are tempted to run within it.

    As Bittman writes in separate research, while VMs running in private clouds have increased three-fold in the past few years, even as the overall number of VMs has tripled, the number of active VMs running in public clouds has expanded by a factor of 20.

    This means that: “Public cloud IaaS now accounts for about 20 per cent of all VMs – and there are now roughly six times more active VMs in the public cloud than in on-premises private clouds.”

    While a bit dated (2012), Forrester’s findings remain just as true today:

    Asking IT to set up a hyper-efficient Facebook-like data centre isn’t the “fastest way to get [things] done”. Ditto cobbling together a homegrown OpenStack solution. In fact, private cloud is rarely going to be the right way to move fast.

    Sure, there are other reasons, but the cloud that wins will be the cloud that is most convenient. Unless something drastic changes, that means public cloud will emerge triumphant.

  20. Tomi Engdahl says:
    Facebook sued for alleged theft of data center design
    http://www.itworld.com/article/2902314/facebook-sued-for-alleged-theft-of-data-center-design.html

    Facebook is being sued by a British engineering company that claims the social network stole its technique for building data centers and, perhaps worse, is encouraging others to do the same through the Open Compute Project.

    BladeRoom Group (BRG) says it contacted Facebook in 2011 about using its technique, which involves constructing data centers in a modular fashion from pre-fabricated parts. It’s intended to be a faster, more energy-efficient method.

    What happened next isn’t clear, since much of the public version of BRG’s lawsuit is redacted. But it claims Facebook ended up stealing its ideas and using them to build part of a data center in Lulea, Sweden, that opened last year.

    Big data centers can cost billions to build and operate, and giants like Facebook, Google and Microsoft have been working hard to make them more efficient.

    Now, some bleeding-edge companies build data centers from prefabricated parts that are manufactured at a factory, then delivered and assembled quickly on site.

    Facebook revealed such a design at the Open Compute Project Summit last January and said it would allow it to add new capacity twice as fast as its previous approach. It shared its ideas through the OCP, which it set up for companies to collaborate on new ideas for data center infrastructure.

    Soon after, it said it had hired Emerson Network Power to apply the technique in Sweden.

    BRG claims the ideas weren’t Facebook’s.

  21. Tomi Engdahl says:
    One day all this could be yours: Be Facebook, without being Facebook
    The pros and cons of Open Compute
    http://www.theregister.co.uk/2015/03/30/open_compute_for_the_rest_of_use/

    Data centre design is a costly business, costing Apple $1.2bn for a pair of “next-generation” carbon-neutral plants in Ireland and Denmark. Even the smallest average Joe data centre will easily cost north of $1m once the options such as multihomed networks, HVAC systems and other such critical kit is installed with redundancy, security and a thousand other things that only data centre designers think about ahead of time.

    Complexity is the enemy of efficiency and drives costs up. At this point, a lot of companies realised there had to be an easier and cheaper way to do it without paying vendor premium tax out the rear. Fortunately, in these modern times, there is: the open compute movement.

    Google and Amazon are not going to give away their secret sauce to any potential competitor, but Facebook has, open-sourcing their hardware designs with the Open Compute Project in 2011. Open Compute is still in its infancy, but holds a lot of promise for data-intensive and large, cloud-based organisations.

    Those of us outside the hyper-scale tier of web computing will be thinking: “Yeah, so what? I ain’t no Google”. But Open Compute could end up helping even the smallest of compute-intensive players.

    This new infrastructure design can be seen in HP’s Moonshot and other similar systems, producing System-on-a-Chip (SOC) based infrastructure that can be swapped out at will, meaning users no longer have to unrack a huge server or pull it out to fix an issue and making the technology cheap enough to almost be disposable.

    Part of the Open Compute vision is to also support white-label brands, helping you build your own infrastructure from the ground up, thereby removing the vendor premium.

    This road isn’t feasible for anyone except the largest of vendors.

    A number of vendors – including HP, Dell, Quanta and many more – produce Open Compute servers in various configurations that are highly configurable and manageable, designed for one purpose: Open Compute nodes. This saves having to effectively roll your own compute design.

    Open Compute offers some appealing prospects, should it come off:

    Vendor agnostic management – One of the core tenets of Open Compute is that simplicity helps drive out cost

    Vendor agnostic hardware – This is probably what most people see when they think about Open Compute and some of the big savings are actually quite dull, but important.

    Every component has a use and failure is going to happen – rather than repair the hardware in situ, it’s pulled out and replaced. Such an approach also helps reduce the hardware cost, because there is no on-board redundancy and failure should have little or no effect on the services being provided.

    Quick, dynamic management – Large companies traditionally suffer from very long-winded approvals processes, sometimes meaning servers can take several weeks to be put into service.

    As with all good things, there are a certain amount of downsides that need to be understood about Open Compute. Small businesses will find their returns very limited. Open Compute is designed for data-centric companies that need to be able to scale. Less than forty servers and there will be minimal savings.

    The first is to prepare the basics. Power and cooling are key.

    Next, understand the basics of the Open Compute model. It is designed around disposable commodity hardware. Whilst all that is good, you need to understand how to set up your software to ensure that none of your "disposable" machines lie on the same hardware, or even the same rack if possible. This helps ensure that a single failure doesn’t take down your public-facing infrastructure.
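
    A minimal Python sketch of that kind of anti-affinity check follows; the inventory format and service names are made up, and real schedulers and orchestration tools have this built in:

    from collections import defaultdict

    placement = {
        "web-a": {"host": "node-01", "rack": "r1"},
        "web-b": {"host": "node-17", "rack": "r2"},
        "web-c": {"host": "node-33", "rack": "r1"},  # shares a rack with web-a
    }

    def rack_conflicts(placement):
        # Group replicas by rack and flag any rack holding more than one.
        by_rack = defaultdict(list)
        for replica, where in placement.items():
            by_rack[where["rack"]].append(replica)
        return {rack: reps for rack, reps in by_rack.items() if len(reps) > 1}

    print(rack_conflicts(placement))  # {'r1': ['web-a', 'web-c']}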

    Another tip is to make space.

    Start small until you know what you’re doing. Data centres are complex environments, so don’t immediately deploy Open Compute gear into a production environment.

    Starting small translates as working in a less important dev environment first.

    Open Compute is in its infancy, but it aims to do what no other large-scale vendor is prepared to do – share technology and improvements with anyone who is interested and tweak designs to optimise them further.

    One COO, who wished to remain anonymous, had this to say about Open Compute: “The days of traditional vendor hardware are coming to an end. Our largest customers are now our most fearsome competitors. The hardware business has become a race to the bottom, where we are ‘collectively’ fighting in a battle we’ve already lost.”

  22. Tomi Engdahl says:
    White Paper Download: Thermal Efficiency: Facebook’s Datacenter Server Design – EEdge Vol 3 Article
    http://www.mentor.com/products/mechanical/techpubs/download?id=84315&contactid=1&PC=L&c=2015_04_21_mad_eedge_fb_article#
  23. Tomi Engdahl says:
    Rackspace in Crawley: This is a local data centre for local people
    But everywhere is 127.0.0.1 for Uncle Sam
    http://www.theregister.co.uk/2015/04/24/rackspace_data_centre/

    Rackspace has completed its Crawley data centre in West Sussex, and claims that it is among the most power-efficient in the UK.

    The new facility is 130,000 sq ft in area, and the site covers 15 acres in all. It is designed for up to 50,000 servers. The amount of power available is initially 6MW across two suites, with plans for 12MW across four suites. Its physical security includes a perimeter fence, badge readers, and fingerprint scanners.

    The data centre is scheduled to open in May, took 15 months to build, and was designed by Digital Realty with Rackspace. It will be Rackspace’s 10th data centre in the world.

    The Crawley warehouse is built to Open Compute Project (OCP) standards.

    The facility has a PUE (Power Usage Effectiveness) of 1.15, whereas Rackspace states that the UK average is 1.7 – the lower the better.
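
    For reference, PUE is total facility power divided by the power delivered to IT equipment, so the two quoted figures translate into overhead like this (the example IT load is invented):

    def pue(total_facility_kw, it_load_kw):
        return total_facility_kw / it_load_kw

    it_load = 1000.0                      # kW of servers, storage and network gear
    print(pue(it_load * 1.15, it_load))   # 1.15 -> 15% overhead for cooling, losses, etc.
    print(pue(it_load * 1.70, it_load))   # 1.70 -> the quoted UK average, 70% overhead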

  24. Tomi Engdahl says:
    Facebook’s Open Compute could make DIY data centres feasible
    Fitting in never looked so good
    http://www.theregister.co.uk/2015/05/07/build_v_buy_your_datacenter2/

    DIY vs COTS: Part 2 Last time I looked at the PC versus console battle as a metaphor for DIY versus Commercial Off the Shelf (COTS) data centres, and touched on the horrors of trying to run a DIY data centre.

    Since 2011, however, we’ve had the Open Compute Project, initiated by Facebook. The ideal is some kind of industry-standard data centre, with OCP members agreeing open interfaces and specs.

    Does Open Compute shift the DIY data centre story back in favour of build and against buy?

    The PC-versus-console metaphor is relevant to an examination of Open Compute. Of particular note is that after the dust had cleared, the PC gaming market settled into a sense of equilibrium.

    DIY data centre types of today are fortunate. The market as a whole has ground down the margins on servers to the point that the Open Compute Project handles most of this. For those needing a little bit more vendor testing and certification, Supermicro systems with their integrated IPKVMs are such good value for dollar that you can go the DIY route but still get most of the benefits of COTS and still keep it cheap.

    The ODMs are getting in on the deal. Huawei, Lenovo, ZTE, Xiaomi, Wiwynn/Wistron, Pegatron, Compal and Lord knows how many others are now either selling directly to customers or selling on through the channel with minimal added margin.

    Recently, it has been noted that this is affecting storage. It’s only noticeable there because – unlike servers – it’s a relatively new phenomenon. Networking is next, and I wouldn’t want to be the CEO of Cisco right about now.

    DIY data centres made easy

    The selection of ultra-low-margin servers and storage is getting better and better every month. In fact, the low-margin providers are even now certifying their solutions for various hypervisors. The near universal adoption of virtualisation combined with the sheer number of people adopting these models means that finding benchmarks, quirks, foibles and driver conflicts is now a minor research project for the average SMB.

    Put simply: DIY data centres are no longer required to recreate significant chunks of the COTS vendors’ value-add, because there is an in-between.

    Anyone willing to maintain their own spares cabinet and deal with some minor supply chain issues can use Open Compute to make DIY data centres cheaply and easily. And while that’s great for an enterprise, the value of this decreases the smaller you get.

    We also had many Sys Admins working together, pooling the resources of MSPs and individual companies until collectively we had the budget of an upper-midmarket company and the manpower resources of an enterprise. Even with the advances to the DIY market, the cost of dealing with supply chain issues makes COTS the better plan.

    A very limited number of people will know what you’re talking about if you quote an Open Compute model. Only the nerdiest of spreadsheet nerds will understand what you mean if you try to use a Supermicro model name for anything. Nearly everyone knows what’s in a Dell R710 or can discuss issues with HP Gen 9 servers in depth.

    COTS servers are the consoles of the data centre. In the fullness of time, you’ll end up paying a lot more. From not getting access to BIOS updates unless you pay for support to having to pay a licence to access the IPKVM functionality of your server’s baseband management controller, COTS servers cost. They’re a lot up front and they nickel and dime you until the bitter end.

    The collapse of COTS server margins seems inevitable. Even the proudest banner waver of ultra-high-margin servers – HP – has decided to build an Open Compute solution. Win/win for everyone, surely?

    Not quite.

    Unlike the PC-versus-console wars, the DIY-versus-COTS data centre wars are just beginning. The Open Compute Project may ultimately be the stabilising influence that provides a usable equilibrium between low margin and model stability, but we’re not quite there yet.

    HP is only dipping its toe in the water. You can’t buy their Open Compute nodes unless they really like you, and you buy lots of them. It’s their way of not losing hyperscale customers, it is not here to benefit the masses. Dell, Supermicro and so forth don’t sell Open Compute nodes and we are only just now starting to see differentiation in Open Compute designs.

    Open Compute servers are where gaming notebooks were about 10 years ago.

    Storage is lagging servers here by about two years, but is experiencing greater pressures from hyper-convergence

    Software has already advanced to the point that it doesn’t really matter if all the nodes in your virtual cluster are the same.

    When the majority of the market can be served by a “sweet spot” Open Compute server that is essentially homogenous, regardless of supplier, then DIY data centre supply chain issues evaporate.

    Hardware vendors won’t survive that level of commoditisation. They need margins to keep shareholders happy, buy executive yachts and keep up the completely unrealistic double-digit annual growth that Wall Street demands. As soon as any of the hardware-sales-reliant big names start posting consistent revenue declines, they’ll enter a death spiral and evaporate.

    Selling the hardware departments to China, as IBM has done with its x86 commodity line, will only delay this for a few years. Manufacturers in China can show growth by taking customers away from the US makers, but very soon here those US suppliers will no longer be selling hardware. Then the OEMs in China will have to compete among themselves. That battle will be vicious and there will be casualties.

    Market consolidation will occur and the handful of survivors will collectively – but not together, if you know what I mean, anti-trust investigators – put up prices.

    DIY versus COTS is an old, old debate. There is no one answer to this that will apply to all businesses. It is, however, worth taking the time to think beyond this refresh cycle and beyond just the hardware.

  25. Tomi Engdahl says:
    Facebook scares Cisco with 6-pack network switch platform update
    http://www.cloudhub.uk.com/2385/facebook-scares-cisco-6-pack-network-switch-platform-update?utm_source=Outbrain&utm_medium=Cpc&utm_campaign=Inquirer%252BReferral&WT.mc_is=977=obinlocal

    Facebook is building a fresh network by itself to support its own operations, but, in a move that should worry networking equipment giant Cisco, is giving the software and designs for it away for free.

    While it has been ongoing for a while, Facebook has announced a new facet in the project: a networking product called “6-pack.”

    It’s funny how the social network also chose to announce the update to its networking technology, which it is hoping will challenge the networking industry, on the same day that Cisco, the market share leader of this $23 billion market reported its financial earnings.

    In its blog post, Facebook dug at Cisco by pointing out how “traditional networking technologies…tend to be too closed, monolithic, and iterative for the scale at which we operate and the pace at which we move”.

    The 6-pack is a switch platform that will be installed in the social network’s vision of its own scalable data centre, a vision that it says only it can build because of its high demands.

    Facebook’s top networking engineer, Yuval Bachar, explained in a blog post that as the social network’s infrastructure has scaled, it has “frequently run up against the limits of traditional networking technologies.”

    “Over the last few years we’ve been building our own network, breaking down traditional network components and rebuilding them into modular disaggregated systems that provide us with the flexibility, efficiency, and scale we need,” he added.

    Introducing “6-pack”: the first open hardware modular switch
    https://code.facebook.com/posts/717010588413497/introducing-6-pack-the-first-open-hardware-modular-switch/

    But even with all that progress, we still had one more step to take. We had a TOR, a fabric, and the software to make it run, but we still lacked a scalable solution for all the modular switches in our fabric. So we built the first open modular switch platform. We call it “6-pack.”

    The platform

    The “6-pack” platform is the core of our new fabric, and it uses “Wedge” as its basic building block. It is a full mesh non-blocking two-stage switch that includes 12 independent switching elements. Each independent element can switch 1.28Tbps. We have two configurations: One configuration exposes 16x40GE ports to the front and 640G (16x40GE) to the back, and the other is used for aggregation and exposes all 1.28T to the back. Each element runs its own operating system on the local server and is completely independent, from the switching aspects to the low-level board control and cooling system. This means we can modify any part of the system with no system-level impact, software or hardware. We created a unique dual backplane solution that enabled us to create a non-blocking topology.
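
    The per-element numbers are easy to sanity-check; the chassis total below is simple multiplication of the quoted figures, not a number from the post:

    ports_front = 16
    ports_back = 16
    gbps_per_port = 40

    front = ports_front * gbps_per_port   # 640 Gb/s to the front panel
    back = ports_back * gbps_per_port     # 640 Gb/s across the backplane
    print(front + back, "Gb/s per switching element")   # 1280 -> the quoted 1.28 Tb/s

    elements = 12
    print(elements * (front + back) / 1000.0, "Tb/s across the chassis")  # 15.36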

    We run our networks in a split control configuration. Each switching element contains a full local control plane on a microserver that communicates with a centralized controller. This configuration, often called hybrid SDN, provides us with a simple and flexible way to manage and operate the network, leading to great stability and high availability.

    The only common elements in the system are the sheet metal shell, the backplanes, and the power supplies, which make it very easy for us to change the shell to create a system of any radix with the same building blocks.

    If you’re familiar with “Wedge,” you probably recognize the central switching element used on that platform as a standalone system utilizing only 640G of the switching capacity. On the “6-pack” line card we leveraged all the “Wedge” development efforts (hardware and software) and simply added the backside 640Gbps Ethernet-based interconnect. The line card has an integrated switching ASIC, a microserver, and a server support logic to make it completely independent and to make it possible for us to manage it like a server.

    The fabric card is a combination of two line cards facing the back of the system. It creates the full mesh locally on the fabric card, which in turn enables a very simple backplane design.

    “6-pack” is already in production testing, alongside “Wedge” and “FBOSS.” We plan to propose the “6-pack” design as a contribution to the Open Compute Project, and we will continue working with the OCP community to develop open network technologies that are more flexible, more scalable, and more efficient.

  26. Tomi Engdahl says:
    Julie Bort / Business Insider:
    How Facebook’s Open Compute Project became a major force in data center hardware, with hundreds of companies, including HP, Foxconn, and Goldman Sachs on board — How Facebook is eating the $140 billion hardware market — It started out as a controversial idea inside Facebook.

    How Facebook is eating the $140 billion hardware market
    http://uk.businessinsider.com/facebook-open-compute-project-history-2015-6?op=1?r=US

    It started out as a controversial idea inside Facebook. In four short years, it has turned the $141 billion data-center computer-hardware industry on its head.

    Facebook’s extraordinary Open Compute Project is doing for hardware what Linux, Android, and many other popular products did for software: making it free and “open source.”

    That means that anyone can look at, use, or modify the designs of the hugely expensive computers that big companies use to run their operations — all for free. Contract manufacturers are standing by to build custom designs and to build, in bulk, standard designs agreed upon by the group.

    In software, open source has been revolutionary and disruptive. That movement created Linux, which is the software running most data centers around the world, and Android, the most popular smartphone platform in the world.

    Jonathan Heiliger dreamed up OCP in 2011 back when he was leading Facebook’s infrastructure team

    It started off with Facebook’s data centers.

    Most companies lease space in already existing data centers. But for huge tech companies like Google, Microsoft, Apple, and Amazon, it’s more efficient to build their own.

    The trouble was, in 2011, data centers were becoming known as one of the dirtiest, carbon-spewing parts of the tech industry.

    Facebook built its state-of-the-art data center in Prineville, Oregon, where it invented ways to use less electricity. So Facebook published the Prineville designs to contribute to the green data-center movement.

    Then it occurred to Heiliger: Why not share all of the Facebook’s hardware designs?

    Heiliger argued that the technology, particularly the hardware, “is not our competitive advantage” and that “open source should be a core tenet at Facebook.”

    There are some huge advantages to making hardware open source.

    Hardware engineers, no matter who they work for, could collaborate. Ideas would flow. New tech would be invented more quickly. Difficult tech problems would be fixed faster. And everyone would share equally in the results.

    It would be 180 degrees from the classic culture of patents and lawsuits and trade secrets that has ruled the tech industry for decades. But Facebook didn’t make hardware, so there was no risk to its business.

    Zuck was in. One argument was particularly persuasive: “A company in Mountain View thinks their tech was a differentiator. We didn’t believe that,” Heiliger says, referring to the fact that Google builds much of its own hardware and a lot of its own software and keeps most of that stuff a closely guarded secret.

    Now that OCP has become a phenomenon, Google’s top hardware-infrastructure guy (a legend in his world), Urs Hölzle, offers a begrudging respect for the project

    When asked about OCP, Hölzle told us, “It actually makes a lot of sense because it’s open source for hardware. It’s relatively basic today,” he said. “It could be the start of something a little bit deeper.”

    “It will be relevant only for the very, very large companies — for the Facebooks, the Ebays, the Microsofts.”

    That’s because Heiliger did several smart things when he started this project.

    First, he hired Frank Frankovsky away from Dell to help Facebook invent hardware and to lead Open Compute Project. Frankovsky quickly became its face and biggest evangelist.

    Next, he got Intel, a much older company with lots of experience in open source, on board. Intel’s legal team set up OCP’s legal structure

    Then, he asked Goldman Sachs’ Don Duet to join the board.

    He knew they were onto something almost immediately at OCP’s first conference.

    “We thought maybe 50 people would show up.” Instead over 300 came. “That was incredible,” he remembers.

    Goldman has been happy to buy OCP servers.

    Duet says Goldman will never go back to buying servers the old way. “We’ve been clear to the vendor community. There’s no reason to go backwards. We didn’t go back after adopting open-source operating systems.”

    The man told him that OCP had turned his company into a $1 billion business, with hundreds of new customers.

    “You convinced us that it was the right thing to do and it was going to be ok, and we’re not only more profitable but we see new channels of business we hadn’t seen before. It wouldn’t have happened without you,”

    Last December, Frankovsky left Facebook to launch his own OCP-inspired hardware startup, an optical-storage company still in stealth. He remains the chairman of the OCP project. And there have been other startups, like Rex Computing, launched by a teenage electronics wunderkind.

    But perhaps the biggest watershed moment for OCP happened just a few weeks ago, on March 10, 2015.

    He said HP’s server unit had agreed to become an OCP contract manufacturer and had launched a new line of OCP servers.

    Both HP and Dell had been watching and involved in OCP for years, even contributing to the designs. But behind the scenes they were not completely on board.

    One day, Frankovsky hopes that Cisco will follow HP’s lead and join the open-source hardware movement.

    The open-source hardware movement will lead to its own massive hits that will totally change the industry.

    And there’s a good reason for that, says Frankovsky: “Openness always wins, as long as you do it right. You don’t want to wind up on the wrong side of this one. It’s inevitable.”

  27. Tomi Engdahl says:
    The case against Open Compute Project Storage flotation
    OCP-S caught in no-man’s land between enterprise and hyper-scale
    http://www.theregister.co.uk/2015/07/02/open_compute_project_storage_flotation_questionable/

    Did you know there was a storage part of the Open Compute Project? If not, you do now.

    The Facebook-generated OCP aims to make good, basic hardware available for data centres at low cost, with no bezel tax and no unwanted supplier differentiation justifying high prices. Its main focus is servers, but that’s not all, as there is also a storage aspect.

    The OCP-S project covers:

    Cold Storage
    Fusion-io
    Hyve – “Torpedo” 2 x OpenU storage server that can accommodate 15 3.5″ drives in a 3 x 5 array
    OpenNVM – Open-source project for creating new interfaces to non-volatile memory
    OpenVault – 30 drives in 2U

    These seem to be limited use cases: JBODs or disk drawers, flash, and archive storage.

    Web access via the hot links provided for each category is variable. Neither of the Fusion-io links for the specification and CAD models works.

    El Reg: What do you understand the status of the OCP storage (OCP-S) initiative to be?

    Michael Letschin: While a work in progress, the lack of storage industry support means the OCP-S concept is still very much a pipe dream for all but the largest webscale companies. For customers to be considering the move, it’s fair to say that they will have to have taken the leap and embraced software-defined storage (SDS) as a starting point.

    El Reg: Do you think there is a need for it?

    Michael Letschin: The concept behind the Open Compute project is completely worthwhile, but though it brings the promise of true commodity hardware to the forefront, it hinges on whether systems can be integrated easily into the current data centre.

    El Reg: Is storage flash and disk hardware developing so fast that OCP-S cannot keep up?

    Michael Letschin: No. The interfaces for these drives are still much the same, so as to allow for integration into existing infrastructures. There is no reason that OCP-S would be any different.

    El Reg: Is storage driven so much by software that interest in OCP-S (hardware-based) items on their own is low?

    Michael Letschin: Given scale-up solutions are still the norm for enterprises, the concept of a single head storage server is not of much interest today. As scale-out becomes more commonplace, the OCP-S hardware will understandably become more appealing: the pieces become modules that are essentially just bent metal.

    Michael Letschin: In today’s environments, yes. OCP-S assumes scale-out and this makes sense for the likes of Facebook, but it’s still early days for software-defined scale-out in the enterprise market. For example, the Open Rack standard is designed with new data centres in mind.

