taartech: Greenplum

· Greenplum was a big data analytics company, headquartered in San Mateo, California, whose database management system also bears the Greenplum name

· It is a division of Pivotal Software

· The latest stable version of Greenplum is 5.10.0, which was released in July 2018

· It runs on the Linux operating system

· Greenplum is actively developed by the Greenplum Database open source community and Pivotal

History of Greenplum

Greenplum, the company, was founded in September 2003 by Scott Yara and Luke Lonergan as a merger of two smaller companies:

a. Metapa

b. Didera

Greenplum was based in San Mateo, California. In April 2005 it released database management system software based on PostgreSQL, which it called Bizgres. The company raised several rounds of venture capital in 2006 and 2007.

· In July 2006, a partnership with Sun Microsystems was announced (Sun went on to acquire MySQL AB in 2008). The Bizgres project was supported until 2008, after which the product was simply called "Greenplum"

· In July 2010, Greenplum was acquired by EMC Corporation, becoming the foundation of EMC's big data software division. At the time of the acquisition, Greenplum's products were the Greenplum Database, Chorus (a management tool), and Data Science Labs.

· In 2012, Greenplum became part of Pivotal Software, and its database management system was sold by Pivotal as the Pivotal Greenplum Database

· In 2013, Hawq was announced: a variant of Greenplum that stores its data in the Hadoop Distributed File System (HDFS) of Apache Hadoop

· In 2015, the Greenplum Database and Hawq open-source software projects were announced

Technology

Pivotal's Greenplum Database uses massively parallel processing (MPP) techniques. Each cluster (group of computers) consists of:

· a master node, which stores the catalog information

· a standby master node

· segment nodes, where the table data resides

Each segment node runs one or more segments. A segment is a modified PostgreSQL database instance that is assigned a content identifier. The rows of every table are divided among the segments according to the distribution key columns that the user specifies in the data definition language (DDL). Each content identifier has both a primary segment and a mirror segment, and the two never run on the same physical host. When a query reaches the master node, the master parses and plans it and then dispatches the plan to all of the segments. The segments execute the plan and then either return the requested data or insert the query result into a database table.
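The distribution, mirroring, and dispatch steps described above can be sketched as a toy, single-process Python model. It is illustrative only and rests on several assumptions. `zlib.crc32` stands in for Greenplum's own internal hash function. The names `Segment`, `segment_for`, `load`, `dispatch`, and `check_mirror_placement` are hypothetical, as are the host names `sdw1` to `sdw3`. The "query" is a plain Python predicate, not a parsed and planned SQL statement.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class Segment:
    content_id: int      # content identifier shared by the primary/mirror pair
    primary_host: str    # host running the primary segment instance
    mirror_host: str     # host running the mirror; must differ from primary_host
    rows: list = field(default_factory=list)  # rows stored on this segment


def check_mirror_placement(segments):
    """Raise ValueError if any primary and its mirror share a physical host."""
    for seg in segments:
        if seg.primary_host == seg.mirror_host:
            raise ValueError(
                f"content {seg.content_id}: primary and mirror both on {seg.primary_host}"
            )


def segment_for(row, dist_key, n_segments):
    """Map a row to a content ID by hashing its distribution-key columns.
    zlib.crc32 is a stand-in; Greenplum uses its own internal hash function."""
    key = "|".join(str(row[col]) for col in dist_key).encode("utf-8")
    return zlib.crc32(key) % n_segments


def load(rows, dist_key, segments):
    """Route each row to the segment chosen by its distribution key.
    Assumes segments[i].content_id == i."""
    for row in rows:
        segments[segment_for(row, dist_key, len(segments))].rows.append(row)


def dispatch(segments, predicate):
    """Toy scatter/gather: each segment filters its local rows and the
    'master' concatenates the partial results. A real cluster runs the
    segments in parallel on separate hosts; this loop is sequential."""
    result = []
    for seg in segments:
        result.extend(r for r in seg.rows if predicate(r))
    return result


# Usage: three segments spread over three hypothetical hosts.
segments = [
    Segment(0, "sdw1", "sdw2"),
    Segment(1, "sdw2", "sdw3"),
    Segment(2, "sdw3", "sdw1"),
]
check_mirror_placement(segments)

rows = [{"id": i, "region": "EU" if i % 2 else "US"} for i in range(12)]
load(rows, dist_key=("id",), segments=segments)

eu_rows = dispatch(segments, lambda r: r["region"] == "EU")
print(sorted(r["id"] for r in eu_rows))  # [1, 3, 5, 7, 9, 11]
```

The hash is deterministic, so every row with the same distribution key value lands on the same segment. This is what allows joins and aggregations on that key to run locally on each segment without moving data between hosts.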

Greenplum Version 5

In September 2017, version 5 of Greenplum Database was released. It included the first iteration of the Greenplum project's strategy of merging upstream PostgreSQL code into Greenplum. It also brought the general availability (GA) of the GPORCA optimizer, a cost-based SQL optimizer designed for big data.


Wednesday, 28 August 2019
