BigQuery – What, Where & How?
Google has rolled out its next big product, BigQuery, for querying big datasets. It is aimed at those who lack the budget, infrastructure or hardware to do the job themselves. BigQuery uses Google’s own ample infrastructure and hardware to run very fast SQL-like queries against append-only tables. Once you have moved your massive datasets to BigQuery, you can sit back while Google takes care of your big data needs.
There are two primary ways to interact with BigQuery: through a command-line or browser-based tool, or through client libraries for languages such as PHP, Python or Java that make calls to the BigQuery REST API. Additionally, several third-party tools let you load and visualize your data.
BigQuery – The Modules
Now that you have an idea of what Google’s BigQuery is here to do, let’s take a closer look at the core building blocks that make it up.
Projects
Projects are the top-level containers in the Google Cloud Platform. They hold your BigQuery data along with your billing and authorized-user information. Every project has a unique ID and a user-friendly name. Billing is done on a per-project basis, so it is usually simplest to create a single project for your company, which your accounts department can look after.
Tables
A table in BigQuery contains your data together with a corresponding table schema, which describes field names, types and other information. BigQuery also supports views, which are virtual tables defined by an SQL query.
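To make the idea of a schema concrete, here is a minimal sketch of a table resource in the JSON shape the REST API uses, built as a plain Python dictionary. The project, dataset, table and field names are hypothetical placeholders, and the helper function is just for illustration.

```python
import json


def make_table_resource(project_id, dataset_id, table_id, fields):
    """Build a table resource: a reference to the table plus its schema.

    `fields` is a list of (name, type, mode) tuples.
    """
    return {
        "tableReference": {
            "projectId": project_id,
            "datasetId": dataset_id,
            "tableId": table_id,
        },
        "schema": {
            "fields": [
                {"name": name, "type": field_type, "mode": mode}
                for name, field_type, mode in fields
            ]
        },
    }


# Hypothetical names, for illustration only.
table = make_table_resource(
    "my-project",
    "web_logs",
    "page_views",
    [
        ("url", "STRING", "REQUIRED"),
        ("visits", "INTEGER", "NULLABLE"),
        ("viewed_at", "TIMESTAMP", "NULLABLE"),
    ],
)
print(json.dumps(table, indent=2))
```

Each schema field carries a name, a type and a mode, which is the "field names, types and other information" described above.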
There are three main ways to create a table in BigQuery:
→ Loading data into a new table
→ Running a query
→ Copying a table
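The three routes listed above each correspond to a kind of job configuration in the REST API. The sketch below builds one request body for each as plain dictionaries. The bucket, project, dataset and table names are hypothetical, and only a minimal subset of the available configuration fields is shown.

```python
def table_ref(project_id, dataset_id, table_id):
    """Reference to a table, as used inside job configurations."""
    return {"projectId": project_id, "datasetId": dataset_id, "tableId": table_id}


def load_job(source_uri, destination, schema_fields):
    """Create a table by loading a file from Cloud Storage into it."""
    return {"configuration": {"load": {
        "sourceUris": [source_uri],
        "schema": {"fields": schema_fields},
        "destinationTable": destination,
    }}}


def query_job(sql, destination):
    """Create a table by writing the result of a query into it."""
    return {"configuration": {"query": {
        "query": sql,
        "destinationTable": destination,
    }}}


def copy_job(source, destination):
    """Create a table by copying an existing one."""
    return {"configuration": {"copy": {
        "sourceTable": source,
        "destinationTable": destination,
    }}}


# Hypothetical names, for illustration only.
raw = table_ref("my-project", "web_logs", "page_views")
top = table_ref("my-project", "web_logs", "top_pages")
backup = table_ref("my-project", "web_logs", "page_views_backup")

jobs = [
    load_job("gs://my-bucket/page_views.csv", raw,
             [{"name": "url", "type": "STRING"}, {"name": "visits", "type": "INTEGER"}]),
    query_job("SELECT url, visits FROM web_logs.page_views WHERE visits > 100", top),
    copy_job(raw, backup),
]
for job in jobs:
    print(list(job["configuration"].keys())[0])
```

In each case the new table is named by the destinationTable entry; what differs is where its rows come from.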
Datasets
Datasets contain your tables and control access to them. They also let you organize your tables, and you must have at least one dataset before you can load data into BigQuery. To share data with others, you set ACLs (Access Control Lists) on your datasets. ACLs can be set only on datasets, not on the tables within them.
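As a rough illustration of a dataset-level ACL, the sketch below builds a dataset resource whose access list grants read or write access to individual users. The helper function, project name, dataset name and email addresses are all hypothetical; the READER and WRITER roles and the userByEmail key follow the dataset resource format as I understand it, so check the official reference before relying on them.

```python
def make_dataset_resource(project_id, dataset_id, readers=(), writers=()):
    """Build a dataset resource with an access list (the dataset's ACL)."""
    access = [{"role": "READER", "userByEmail": email} for email in readers]
    access += [{"role": "WRITER", "userByEmail": email} for email in writers]
    return {
        "datasetReference": {"projectId": project_id, "datasetId": dataset_id},
        "access": access,
    }


# Hypothetical names and addresses, for illustration only.
dataset = make_dataset_resource(
    "my-project",
    "web_logs",
    readers=["analyst@example.com"],
    writers=["etl@example.com"],
)
for entry in dataset["access"]:
    print(entry["role"], entry["userByEmail"])
```

Every table in the dataset is covered by this one access list, which is why sharing is decided per dataset.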
Jobs
In BigQuery, jobs are the tasks you give the service to load, query, export or copy data. BigQuery runs them asynchronously, simply because they can take a long time to complete. You can poll a job to find out its status, or view it later in the Google Developers Console.
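Polling an asynchronous job usually comes down to a small loop. The sketch below assumes a caller-supplied fetch_job function (hypothetical) that returns the job resource as a dictionary, for example by calling the jobs.get endpoint. It relies on the job's status carrying a state of PENDING, RUNNING or DONE, with an errorResult entry when the job has failed. The responses here are simulated so the example runs without credentials.

```python
import time


def wait_for_job(fetch_job, poll_interval=1.0, max_polls=60, sleep=time.sleep):
    """Poll until the job reports DONE, then return the final job resource.

    `fetch_job` is any zero-argument callable returning the job as a dict.
    Raises RuntimeError if the job finished with an error, and
    TimeoutError if it is still not done after `max_polls` attempts.
    """
    for _ in range(max_polls):
        job = fetch_job()
        status = job.get("status", {})
        if status.get("state") == "DONE":
            if "errorResult" in status:
                raise RuntimeError(status["errorResult"].get("message", "job failed"))
            return job
        sleep(poll_interval)
    raise TimeoutError("job did not finish within the allowed number of polls")


# Simulated server responses standing in for real API calls.
responses = iter([
    {"status": {"state": "PENDING"}},
    {"status": {"state": "RUNNING"}},
    {"status": {"state": "DONE"}},
])
finished = wait_for_job(lambda: next(responses), poll_interval=0, sleep=lambda s: None)
print(finished["status"]["state"])  # DONE
```

Passing the fetch and sleep functions in as arguments keeps the loop independent of any particular client library and makes it easy to test.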
BigQuery – How do I Interface With it?
Playing around (or interfacing, if you prefer the no-nonsense approach) with BigQuery falls into three areas:
Load, Query and Export
First, load your data into BigQuery. This has to be done before any querying or exporting. Once your data is loaded, you can query it or, if you choose to take the data out, export it.
Query and View
You can query or view your data only after it has been loaded into BigQuery. To view it, use the bigquery.tabledata.list() method or the bigquery.jobs.getQueryResults() method. To query it, use bigquery.jobs.query() or bigquery.jobs.insert().
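Each of the methods named above maps onto an HTTP request against the REST API. The sketch below shows that mapping as I understand it for version 2 of the API. The base URL and path templates should be checked against the official reference, the build_request helper is hypothetical, and the project, dataset, table and job identifiers are placeholders.

```python
BASE_URL = "https://www.googleapis.com/bigquery/v2"

# Method name -> (HTTP verb, path template). Assumed from the v2 REST API.
ENDPOINTS = {
    "tabledata.list": (
        "GET", "/projects/{projectId}/datasets/{datasetId}/tables/{tableId}/data"),
    "jobs.getQueryResults": ("GET", "/projects/{projectId}/queries/{jobId}"),
    "jobs.query": ("POST", "/projects/{projectId}/queries"),
    "jobs.insert": ("POST", "/projects/{projectId}/jobs"),
}


def build_request(method_name, **params):
    """Return the HTTP verb and full URL for one of the methods above."""
    http_verb, template = ENDPOINTS[method_name]
    return http_verb, BASE_URL + template.format(**params)


# Hypothetical identifiers, for illustration only.
print(build_request("tabledata.list", projectId="my-project",
                    datasetId="web_logs", tableId="page_views"))
print(build_request("jobs.query", projectId="my-project"))
```

The two viewing methods are reads (GET), while the two querying methods submit work to BigQuery (POST). A real call would also need an OAuth 2.0 access token, which is omitted here.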
Data Management
BigQuery isn’t only about viewing and querying your data. It also provides functions for managing what you have loaded. You can list projects, jobs, tables and datasets, and get information about jobs, tables and datasets. BigQuery also lets you update, patch or delete tables and datasets.
Conclusion
Since BigQuery is a form of IaaS (Infrastructure as a Service), it can be used as a complement to MapReduce. This, coupled with BigQuery’s enticing pricing, is sure to make this latest product from Google a massive hit with companies that need to query big datasets for their business.