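Translated content:
NoSQL Distilled, Chapter 4: Distribution Models
About the author:
Section summary:
Chapter 4 covers the details of running NoSQL on a distributed cluster. Today's post covers the first distribution option, which is no distribution at all: deployment on a single server. Because this is the start of a new chapter, we begin with the chapter introduction and then move on to today's content.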
The primary driver of interest in NoSQL has been its ability to run databases on a large cluster. As data volumes increase, it becomes more difficult and expensive to scale up—buy a bigger server to run the database on. A more appealing option is to scale out—run the database on a cluster of servers. Aggregate orientation fits well with scaling out because the aggregate is a natural unit to use for distribution.
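The main force behind NoSQL is the need for a database that can run on a large cluster. As data volumes grow, scaling up by buying a bigger server becomes increasingly difficult and expensive. By contrast, scaling out by running the database on a cluster of servers has become the more attractive choice. Aggregate orientation suits scaling out well, because the aggregate is a natural unit of data distribution across the cluster.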
Depending on your distribution model, you can get a data store that will give you the ability to handle larger quantities of data, the ability to process a greater read or write traffic, or more availability in the face of network slowdowns or breakages. These are often important benefits, but they come at a cost. Running over a cluster introduces complexity—so it’s not something to do unless the benefits are compelling.
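Different distribution models bring different benefits. Some let the data store handle larger quantities of data, some let it handle more read or write traffic, and some cope better with network slowdowns or failures. These benefits all matter, but each comes at a cost. Running a database on a cluster is complex, so it should not be chosen unless those benefits are really needed.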
Broadly, there are two paths to data distribution: replication and sharding. Replication takes the same data and copies it over multiple nodes. Sharding puts different data on different nodes. Replication and sharding are orthogonal techniques: You can use either or both of them. Replication comes into two forms: master-slave and peer-to-peer. We will now discuss these techniques starting at the simplest and working up to the more complex: first single-server, then master-slave replication, then sharding, and finally peer-to-peer replication.
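Broadly speaking, there are two paths to data distribution: replication and sharding. Replication copies the same data to multiple nodes, while sharding places different data on different nodes. The two techniques are orthogonal: you can use either one or both together. Replication comes in two forms, master-slave and peer-to-peer. We now discuss these techniques from the simplest to the most complex: first single server, then sharding, then master-slave replication, and finally peer-to-peer replication.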
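To make the difference between replication and sharding concrete, here is a minimal sketch in Python. It is not taken from the book: the node names, the function names, and the hash-modulo placement rule are all assumptions chosen for illustration, and real databases use more elaborate placement schemes.

```python
import hashlib

# Hypothetical node names, used only for this illustration.
NODES = ["node-a", "node-b", "node-c"]


def replicate(key, value, nodes=NODES):
    """Replication: every node receives a copy of the same record."""
    return {node: {key: value} for node in nodes}


def shard_for(key, nodes=NODES):
    """Sharding: a stable hash of the key picks exactly one node."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]


def shard(records, nodes=NODES):
    """Place each record on the single node chosen by shard_for."""
    placement = {node: {} for node in nodes}
    for key, value in records.items():
        placement[shard_for(key, nodes)][key] = value
    return placement
```

With replication, each node ends up holding the same record. With sharding, each record lives on exactly one node. Because the two are orthogonal, a system could also shard first and then replicate each shard.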
Now we move on to today's content:
4.1 Single Server
The first and the simplest distribution option is the one we would most often recommend: no distribution at all. Run the database on a single machine that handles all the reads and writes to the data store. We prefer this option because it eliminates all the complexities that the other options introduce; it’s easy for operations people to manage and easy for application developers to reason about.
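The first and simplest distribution option, and the one we recommend most of the time, is not to distribute at all and to use a single server. (Translator's note: at this point, doesn't it feel a bit philosophical? Not distributing is also a form of distribution.) We put the database on one server and let it handle all reads and writes to the data. We recommend this option because it is much simpler than the other distribution options: a single server is easy for operations people to manage and easy for application developers to work with.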
Although a lot of NoSQL databases are designed around the idea of running on a cluster, it can make sense to use NoSQL with a single-server distribution model if the data model of the NoSQL store is more suited to the application. Graph databases are the obvious category here—these work best in a single-server configuration. If your data usage is mostly about processing aggregates, then a single-server document or key-value store may well be worthwhile because it’s easier on application developers.
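Although many NoSQL databases were built for clusters, running NoSQL on a single machine can still make good sense if your application needs the data model that a particular NoSQL database offers. (Translator's note: as long as the data model of a NoSQL database fits your application's needs, running it on a single machine works very well.) Graph databases are the kind of NoSQL database that is especially suited to a single machine. (Translator's note: an earlier chapter described graph databases as the odd one out in the NoSQL camp.) If your data usage is mostly about processing aggregates, you can run a document database or a key-value database on a single machine, which is also simpler for application developers.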
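As a rough illustration of the single-server option with an aggregate-oriented store, the sketch below keeps every aggregate in one in-process dictionary that stands in for the single machine. The class name SingleServerStore, its methods, and the JSON serialization are assumptions made for this example; they are not the API of any real database.

```python
import json


class SingleServerStore:
    """Toy key-value store: one dict stands in for the single server."""

    def __init__(self):
        self._data = {}

    def put(self, key, aggregate):
        # Store a serialized copy so that later changes made by the
        # caller to the original object do not alter the stored value.
        self._data[key] = json.dumps(aggregate)

    def get(self, key):
        # Return a fresh copy of the whole aggregate, or None if absent.
        raw = self._data.get(key)
        return None if raw is None else json.loads(raw)


store = SingleServerStore()
store.put("order:1", {"customer": "ann", "items": ["book", "pen"]})
order = store.get("order:1")
```

Every read and write goes through the same object, so there is no placement or consistency logic to reason about, which is the simplicity the text describes.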
That is all for today. Next time we cover sharding.
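Appendix: vocabulary and phrases in this section
key-value store (键值数据库): used here as a general term for key-value databases, for example Redis. Cassandra is also often listed here, although the book classifies it as a column-family store.
document store (文档数据库): used here as a general term for document databases, for example MongoDB, CouchDB, and Apache Jackrabbit.
Graph databases (图数据库): for example Neo4j.
application developers (应用程序开发人员)
operations people (运维人员)
running on a cluster (运行在一个集群上)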