Breaking New Ground: A Unified Data Solution with Machine Learning, Speed and Ease of Use
Imagine being able to arrive at your destination as much as 200 times quicker, or being able to complete your most important tasks as much as 200 times faster than normal. That would be pretty impressive. What if you could get answers to your analytics queries that many times faster, and run your machine learning algorithms on your data with maximum efficiency, simply by plugging a pre-configured and pre-optimized system into your infrastructure? That's what the IBM Integrated Analytics System (IIAS) is designed to do.
As part of an organization's "ground to cloud" hybrid data warehouse strategy, IIAS is a machine-learning-enabled, cloud-ready unified data solution (in the past, this was called a "data warehouse appliance") that can accelerate your analytics queries up to 210[1] times. From a machine learning perspective, IIAS is pre-loaded with Apache™ Spark and IBM Data Science Experience (DSX), enabling organizations to use the system as an integral part of their data science collaborations.
Converging Analytics and ML Technologies
IIAS represents a convergence of Db2 Warehouse and PureData System for Analytics that enables organizations to write analytics queries and machine learning algorithms once and run them anywhere across their hybrid infrastructure. It can handle mixed workloads spanning structured to unstructured data, and offers integration with Hadoop, high-speed query routing, bulk data movement and real-time data ingest.
Architected for Performance
Built on the latest IBM POWER8 technology, IIAS leverages 4X the threads per core, 4X the memory bandwidth and 6X more cache at lower latency compared to select x86 architectures, which helps optimize an organization's analytics, as shown in Figure 1. Its all-flash storage can translate into faster insights than disk storage, with high reliability and operational efficiency. The system is designed for massively parallel performance, using in-memory BLU columnar processing with dynamic movement of data from storage. It skips the processing of irrelevant data, and patented compression techniques help preserve data order, so data can be processed without first being decompressed. Performance also benefits from Spark being embedded in the core engine: because Spark is co-located on the same data nodes as the data, unnecessary network and hardware latencies are removed.
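To make the two ideas above more concrete (skipping irrelevant data, and evaluating predicates on data that is still compressed), here is a toy sketch. It is a generic illustration under stated assumptions, not IBM's BLU implementation: it assumes a simple order-preserving dictionary encoding (codes assigned in sorted value order) and per-block min/max summaries, and the names `encode_column`, `build_synopsis`, `count_in_range` and `BLOCK_SIZE` are hypothetical.

```python
from bisect import bisect_left, bisect_right

# Toy illustration only; not IBM BLU Acceleration's actual algorithm.
BLOCK_SIZE = 4  # hypothetical rows per block; real systems use far larger blocks


def encode_column(values):
    """Order-preserving dictionary encoding: codes follow the sort order of values."""
    dictionary = sorted(set(values))
    code_of = {v: i for i, v in enumerate(dictionary)}
    return dictionary, [code_of[v] for v in values]


def build_synopsis(codes):
    """Record the min/max code of each block so whole blocks can be skipped."""
    return [
        (min(block), max(block))
        for block in (codes[i:i + BLOCK_SIZE] for i in range(0, len(codes), BLOCK_SIZE))
    ]


def count_in_range(dictionary, codes, synopsis, low, high):
    """Count rows with low <= value <= high without decoding any row.

    Because the encoding preserves order, the value range maps to a code
    range, so comparisons run directly on the compressed codes.
    Returns (matching_rows, blocks_actually_scanned).
    """
    lo_code = bisect_left(dictionary, low)
    hi_code = bisect_right(dictionary, high) - 1
    matches, blocks_scanned = 0, 0
    for b, (bmin, bmax) in enumerate(synopsis):
        if bmax < lo_code or bmin > hi_code:
            continue  # data skipping: no row in this block can qualify
        blocks_scanned += 1
        block = codes[b * BLOCK_SIZE:(b + 1) * BLOCK_SIZE]
        matches += sum(lo_code <= c <= hi_code for c in block)
    return matches, blocks_scanned


# Usage with made-up sample data.
prices = [5, 7, 6, 8, 50, 55, 52, 51, 20, 22, 21, 23]
dictionary, codes = encode_column(prices)
synopsis = build_synopsis(codes)
print(count_in_range(dictionary, codes, synopsis, 20, 23))  # → (4, 1)
```

Only one of the three blocks is scanned, and the comparison never decodes a value. This works only because the codes preserve value order; with an arbitrary code assignment, a range predicate would have to decode each row first.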
Design Simplicity
IIAS is designed around simplicity and ease of use. For data experts who don't want to be database experts, IIAS helps provide fast time to value with an easy-to-deploy, easy-to-operate "Load and Go" architecture. As a preconfigured system (what we've often called an appliance), it can help lower the total cost of ownership, with built-in tools for data migration and data movement. Using a common analytics engine enables organizations to write their analytics queries once and run them across multiple environments, with IBM Fluid Query providing data virtualization through federated queries. I cover this in more detail in the "A Hybrid Approach to the Cloud and your Data" section below.
With no configuration, storage administration or physical data model needed, and no indexing or tuning necessary, business intelligence developers and DBAs can achieve fast delivery times. IIAS is also data-model agnostic and can handle both structured and unstructured data and workloads. It also comes with a self-service management dashboard.
Business analysts can run ad hoc queries without tuning or creating indexes, run complex queries against large datasets, and load and query data simultaneously.
Machine Learning Built-in
IIAS offers organizations the opportunity to embrace a machine learning ecosystem simply by plugging a preconfigured, ready-to-go system into their existing infrastructure. It provides the building blocks for a cognitive experience, including fast data ingest, data mining, prediction, transformations, statistics, spatial analytics, and data preparation for predictive and prescriptive in-place analytics.
Because IIAS comes preconfigured with IBM's award-winning Data Science Experience (DSX), data scientists, engineers, business analysts and cognitive app developers can build, train and deploy models through its sophisticated yet easy-to-use interface, and collaborate on cognitive applications across multiple platforms. DSX Local instances from an expanded IIAS can be joined to create a larger DSX Local cluster that supports additional users. For those who prefer notebooks, IIAS offers built-in Jupyter notebooks (Zeppelin coming soon) for visualizing and coding data science tasks in Python, R and Scala. RStudio is also built in, and with Spark embedded on the system (see Figure 2), tasks can be parallelized and accelerated using the sparklyr and dplyr libraries.
Users can now create and deploy models through both programmatic and visual builder interfaces, going from ingesting and cleaning data to training, deploying and scoring a model in three to four simple steps.
A Hybrid Approach to the Cloud and your Data
When it comes to your data, a one-size-fits-all approach rarely works. IIAS is built on the Common SQL Engine, a set of shared components and capabilities across the IBM hybrid data management offering family that helps deliver seamless interoperability across your infrastructure.
For example, a data warehouse that your team has been using might need to be moved to the cloud to meet seasonal capacity demands. Migrating this workload to IBM Db2 Warehouse on Cloud can be done seamlessly with tools like IBM Bluemix® Lift. The Common SQL Engine helps ensure no application rewrites are required on your part.
Essentially, the Common SQL Engine provides a single view of your data, regardless of where it physically resides or whether it is structured, semi-structured or unstructured. The built-in data virtualization service in the Common SQL Engine helps unify data access across the logical data warehouse, allowing an organization to federate queries across Db2, Hadoop and even third-party data sources.
Integrated and Open
IIAS provides integration with tools for model building and scoring, including IBM SPSS, SAS, open source R and Fuzzy Logix. For BI and visualization, it integrates with IBM Cognos, Tableau, MicroStrategy, Business Objects, SAS, Microsoft Excel, SPSS, Kognitio and QlikView. For those looking to build their own custom analytics solutions, IIAS integrates with open source R, Java, C, C++, Python and Lua, enabling organizations to use the skill sets they already have. Integration with IBM InfoSphere Information Governance Catalog also helps users with self-service data discovery.
The Secret Sauce – the Sum of the Parts
IBM Integrated Analytics System (IIAS) is the only unified data solution currently on the market equipped with the full combination of capabilities discussed above. In my view, the key differentiator of IIAS is the convergence of multiple analytics technologies onto a single platform that together create a hybrid data warehouse capable of massive parallelism, scalability and query acceleration, with an embedded machine learning engine and built-in cognitive tools. Integration with open source, IBM and third-party analytics and BI technologies, all based on a common analytics engine with simple load-and-go features, makes it a very open platform. Add to this the simplicity and performance characteristics mentioned earlier, and it's easy to see how IIAS can help organizations tackle their most challenging analytics and cognitive workloads more efficiently and effectively. In summary (see Figure 3 below), the IBM Integrated Analytics System is designed to help organizations do data science faster.
For more information, read the announcement letter or visit the solution page.
Dinesh Nirmal – Vice President, IBM Analytics Development
Follow me on Twitter: @dineshknirmal
[1] Based on IBM internal tests of 97 analytics queries run on a full rack of IBM N3001-010 (Mako) and a full rack of IBM Integrated Analytics System (IIAS), queries ran on average 5 times faster, with a median of 2 times faster and a maximum of 210 times faster. More than 80% of queries ran faster. Performance is based on measurements using an internal IBM benchmark in a controlled environment. This benchmark is a variant of an industry-standard decision support workload. It is configured to use a 30 TB scale factor and a single user issuing queries, and contains a mix of queries that are compute-bound or I/O-bound in the test environment. Note: Actual throughput or performance will vary depending on many factors, including workload characteristics, application logic and concurrency.