Wednesday, 28 August 2019

Greenplum



·    Greenplum (database management system) was a big data analytics company headquartered in
    San Mateo, California
·     It is a division of Pivotal Software
·     The latest stable version of Greenplum is 5.10.0, which was released in July 2018
·     It runs on the Linux operating system
·     Greenplum is actively developed by the Greenplum Database open source community and Pivotal
History of Greenplum
Greenplum, the company, was founded in September 2003 by Scott Yara and Luke Lonergan. It was a merger of two smaller companies:
a.    Metapa 
b.    Didera 
Greenplum, based in San Mateo, California, released its database management system software, which is based on PostgreSQL, in April 2005 and called it Bizgres. Rounds of venture capital were invested during 2006 and 2007.
·     In July 2006, a partnership with Sun Microsystems was announced. Sun Microsystems later acquired
    MySQL AB (in 2008). The Bizgres project was supported through 2008, by which time the product was simply called "Greenplum"
·     In July 2010, Greenplum was acquired by EMC Corporation, becoming the foundation of EMC's big
    data software division. At the time of acquisition, Greenplum's products were the Greenplum Database, Chorus
    (a management tool), and Data Science Labs
·    In 2012, Greenplum became part of Pivotal Software, and its database management system software became
    known as the Pivotal Greenplum Database, sold through Pivotal Software
·    In 2013, Hawq, a variant that uses Apache Hadoop to store data in the Hadoop file system, was announced
·    In 2015, the GreenplumDB and Hawq open source software projects were announced
Technology
Pivotal's Greenplum database product uses massively parallel processing (MPP) techniques. Each cluster (group of computers) consists of:
·       a master node (where catalog information is stored)
·       a standby master node
·       segment nodes (where all of the data resides)
Each segment node runs one or more segments, which are modified PostgreSQL database instances, each assigned a content identifier. For every table, the data is divided among the segment nodes based on the distribution column keys that the user specifies in the data definition language. Each segment content identifier has both a primary segment and a mirror segment, which do not run on the same physical host. When a query reaches the master node, it is parsed, planned, and dispatched to all of the segments, which execute the query plan and either return the requested data or insert the result of the query into a database table.
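The distribution and dispatch steps described above can be illustrated with a small Python sketch. It is a toy model under stated assumptions, not Greenplum's implementation: the CRC32 hash is a stand-in for whatever hash Greenplum actually uses, and the names segment_for, distribute, mirror_host and scatter_gather_sum, the segment count, and the sample table are all hypothetical.

```python
import zlib
from collections import defaultdict

NUM_SEGMENTS = 4  # hypothetical number of segments in the cluster


def segment_for(key, num_segments=NUM_SEGMENTS):
    """Map a distribution-key value to a segment content identifier.

    CRC32 is only a deterministic stand-in; it is not Greenplum's hash function.
    """
    return zlib.crc32(str(key).encode("utf-8")) % num_segments


def distribute(rows, key_column, num_segments=NUM_SEGMENTS):
    """Divide a table's rows among segments by the distribution key column."""
    segments = defaultdict(list)
    for row in rows:
        segments[segment_for(row[key_column], num_segments)].append(row)
    return segments


def mirror_host(primary_host, num_hosts):
    """Pick the next host for the mirror so it never shares the primary's host."""
    if num_hosts < 2:
        raise ValueError("mirroring needs at least two physical hosts")
    return (primary_host + 1) % num_hosts


def scatter_gather_sum(segments, column):
    """Mimic the master: every segment computes a partial sum, then they are combined."""
    partials = [sum(row[column] for row in rows) for rows in segments.values()]
    return sum(partials)


# Hypothetical table distributed by customer_id.
rows = [{"customer_id": i, "amount": 10 * i} for i in range(1, 9)]
segments = distribute(rows, "customer_id")
total = scatter_gather_sum(segments, "amount")
```

Because each row lands on exactly one segment, the combined partial sums equal the sum over the whole table, which is why the master can hand the same plan to every segment and merge the results.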
Greenplum Version 5
In September 2017, Version 5 of Greenplum Database was released. It included the first iteration of the Greenplum project's strategy of merging newer PostgreSQL versions back into Greenplum, as well as general availability of the GPORCA optimizer for cost-based optimization of SQL designed for big data.


