Db2 for z/OS – Thirty-Five and Still Hip

Db2 for z/OS is one of the most business-critical products in the IBM portfolio and remains core to many transaction processing, advanced analytics, and machine learning initiatives.

Looking back…

Over the past 35 years Db2 has been on an exciting and transformational journey. Retired IBM Fellow Edgar F. Codd published his famous paper, “A Relational Model of Data for Large Shared Data Banks”, in 1969. From that, “Sequel” – later renamed SQL – was born.

Db2 launched in 1983 on MVS, but Don Haderle (retired IBM Fellow and considered to be the “father of Db2”) views 1988 as a seminal point in its development, when DB2 Version 2 proved it was viable for online transaction processing (OLTP), the lifeblood of business computing at the time.

Thus was born a single database and the relational model for transactions and business intelligence.

Success on the mainframe led to ports to open systems platforms such as UNIX and Linux, on both IBM and non-IBM hardware.

Db2 helped position IBM as an overall solution provider of hardware, software and services. Its early success, coupled with IBM WebSphere in the 1990s, put it in the spotlight as the database system for several Olympic Games – 1992 Barcelona, 1996 Atlanta and the 1998 Winter Olympics in Nagano. Performance was critical – any failure or delay would be visible to viewers and the world’s press as they waited for event scores to appear.

Today…

Mainframes continue to store some of the world’s most valued data. The platform is capable of 110,000 million instructions per second, which (doing the math) translates into a theoretical 9.5 quadrillion instructions per day. With such high-value data, some of which holds highly sensitive financial and personal information, the mainframe becomes a potential target for cyber criminals. Thankfully, the IBM Z platform is designed to be one of the most securable platforms. Another key capability of the platform is the integrity of the z/OS system and IBM’s commitment to resolve any integrity-related issues.

Db2 for z/OS is a strong foundation for the IBM Z analytics portfolio, with the latest iteration, version 12, providing enhanced performance over the previous version. Db2 leverages the reliability, availability and serviceability capabilities of the IBM Z platform, which delivers five nines (99.999 percent) – near-continuous – data availability.

Advanced in-memory techniques result in fast transaction execution with less CPU, making Db2 an in-memory database. Rich in security, resiliency, simplified management and analytics functionality, Db2 continues to provide a strong foundation to help deliver insight to the right users, at the right time.

The ability to ingest hundreds of thousands of rows each second is critical for more and more applications, particularly for mobile computing and the Internet of Things (IoT) where tracking website clicks, capturing call data records for a mobile network carrier, tracking events generated by “smart meters” and embedded devices can all generate huge volumes of transactions.

Many consider a NoSQL database essential for high data ingestion rates. Db2 12, however, allows for very high insert rates without having to partition or shard the database — all while being able to query the data using standard SQL with Atomicity, Consistency, Isolation, Durability (ACID) compliance.
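The pattern described above – batching inserts into a single atomic transaction, then querying the same data with standard SQL – can be sketched with Python’s DB-API. This is only an illustration: sqlite3 stands in for a Db2 connection, and the table and row counts are invented for the example, not a Db2 benchmark.

```python
import sqlite3

# Hypothetical event table; sqlite3 stands in for a Db2 connection here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (device_id INTEGER, reading REAL)")

# Batch the inserts inside one transaction: all rows commit atomically
# (the A and D in ACID) or none do.
rows = [(d, d * 0.5) for d in range(100_000)]
with conn:
    conn.executemany("INSERT INTO events VALUES (?, ?)", rows)

# The same data is immediately queryable with standard SQL.
count, = conn.execute("SELECT COUNT(*) FROM events").fetchone()
print(count)
```

The point of the sketch is the contrast with a sharded NoSQL store: the rows land in one unpartitioned table, yet remain fully queryable with ordinary SQL under ACID guarantees.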

In 2016 Db2 for z/OS moved to a continuous delivery model that delivers new capabilities and enhancements through the service stream in just weeks, and sometimes days, instead of multi-year release cycles. This helps deliver greater agility while maintaining the quality, reliability, stability and security demanded by its customer base.

We also enhance performance with every release, and now provide millions of inserts per second, trillions of rows in a single table, staggering CPU reductions… the list goes on.

Db2 for z/OS is the data server at the heart of many of today’s data warehouses, powering IBM analytics solutions such as Cognos, SPSS, QMF, ML for z/OS, IBM DB2 Analytics Accelerator and more. In short, Db2 creates a sense of “Data Gravity”, where its high value prompts organizations to co-locate their analytics solutions with their data. This helps remove unnecessary network and infrastructure latencies as well as helping reduce security vulnerabilities. The sheer volume and velocity of the transactions, the richness of data in each transaction, coupled with data in log files, is a potential gold mine for machine learning and AI applications to exploit cognitive capabilities and do smarter work, more intelligently and more securely. And so Machine Learning for z/OS was released, built on open source technology, leveraging the latest innovations while making any perceived complexities of the platform transparent to data scientists through the IBM Data Science Experience interface.

Tomorrow…

The future is hybrid cloud. Customers will always need on-prem data and applications, but the move to cloud (public or private) is in high demand. We see the opportunity to help customers reduce capital and management costs, enabling them to focus on running their data and advanced analytics to create business advantages while providing a dynamic, elastic scale-out infrastructure in the cloud from any of our data centers around the world. Cloud-enabling applications and middleware such as Db2 for z/OS also helps clients rapidly provision new services and instances on demand – again, for both public and private clouds.

To the end user, the processing platform is (and should be) transparent, as it is to the applications that connect to or through Db2 for z/OS.

We recognize the draw of cloud — and how fast it’s changing. It’s why this DBMS offering continues to leverage a continuous delivery model to speed this transformational journey.

Our “One Team” approach has made this work possible. Many talented people participate in this work, but some of the key players driving the effort are Namik Hrle, IBM Fellow, and Distinguished Engineers Jeff Josten and John Campbell.

Your next move…

To stay connected to what’s happening next for Db2 for z/OS, I encourage you to check in regularly at ibm.com/analytics/db2/zos and also at the World of DB2.

Dinesh Nirmal,
VP IBM Analytics Development
Follow me on Twitter @DineshNirmalIBM

The Five Pillars of Fluid ML

A few months ago, I was talking with the CTO of a major bank about machine learning. At one point he shook his head ruefully and said, “Dinesh, it only took me 3 weeks to develop a model. It’s been 11 months, and we still haven’t deployed it.”

This is just one example of the hazards you meet when machine learning encounters the real world. One thing is becoming clear: Machine learning data and models aren’t static. They never will be.

We need to embrace the fact that machine learning will only work over the long term if it’s fluid. In this case, being fluid means building your machine learning system on five important pillars as shown in figure #1:


Figure #1: Five pillars of “Fluid ML”

1. Managed.

For machine learning to do real and lasting work for an organization, you need thoughtful, durable, transparent infrastructure. That starts with identifying the data pipelines and correcting any issues around poor or missing data that can hamstring the accuracy of the models. It also means integrated governance and version control for models. Be sure that the version of each model – and there may be thousands of models being used concurrently – clearly indicates its inputs; regulators will want to know.
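As a rough illustration of that version-control idea, here is a minimal, hypothetical model registry in Python. The class, field names and snapshot references are all invented for the sketch; the point is simply that every model version records exactly which inputs produced it.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical registry entry: each model version records which inputs
# (feature list, training-data snapshot) produced it, for audit.
@dataclass
class ModelVersion:
    model_name: str
    version: int
    features: list
    training_data_ref: str
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())

registry = {}

def register(entry: ModelVersion):
    registry[(entry.model_name, entry.version)] = entry

register(ModelVersion("churn", 1, ["tenure", "balance"], "snapshot-2018-01"))
register(ModelVersion("churn", 2, ["tenure", "balance", "age"], "snapshot-2018-02"))

# A regulator asking "what fed version 2?" gets a direct answer.
print(registry[("churn", 2)].features)
```

In a real deployment this lookup table would live in a governed store rather than an in-memory dict, but the shape of the record is the part that matters.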

2. Resilient.

Being fluid means accepting from the outset that your models will fall out of sync. That “drift” can happen quickly or slowly depending on what’s changing in the real world. You need a way to do the data science equivalent of regression testing – and you need to do that testing frequently without burning up your time.

That means configuring a system that lets you set accuracy thresholds and automatic alerts to let you know your models need attention. Will you need to retrain the model on old data, acquire new data, or re-engineer your features from scratch? The answer depends on the data and the model, but the first step is knowing there’s a problem.
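A minimal sketch of such a threshold check in Python follows; the window size and the 0.90 threshold are illustrative assumptions, not recommendations, and the alerting hook is left as a boolean flag.

```python
# Minimal drift-alert sketch: compare a model's rolling accuracy on
# recent transactions against a configured threshold and flag when the
# model needs attention.

def check_drift(recent_outcomes, threshold=0.90, window=100):
    """recent_outcomes: booleans, True where the prediction was correct."""
    sample = recent_outcomes[-window:]      # rolling window of outcomes
    accuracy = sum(sample) / len(sample)
    return accuracy, accuracy < threshold   # (score, needs-attention flag)

# 100 scored transactions where accuracy has slipped to 85%.
outcomes = [True] * 85 + [False] * 15
accuracy, needs_attention = check_drift(outcomes)
print(accuracy, needs_attention)
```

In practice the flag would feed an alerting system rather than a print statement; the key design point is that the threshold is configured once and the check runs automatically, so drift is detected without manual review.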

WATCH: I introduce the concept of Fluid ML in my keynote at O’Reilly Strata Data Conference in February.

3. Performant.

Most machine learning is computationally intense – both during training and particularly once models have been deployed. Most enterprises need models that can score transactions in milliseconds, not minutes – to identify and prevent fraud or leverage a fleeting opportunity. You need excellent performance in both realms. Ideally, you can train models on GPUs and then deploy them on high-performance CPUs with enough memory to do real-time scoring.

And of course you want everything to run fast and error-free regardless of where you deploy: on-prem, cloud, or multi-cloud. Here, Fluid ML equals flexibility for the run time environment, without compromise.

4. Measurable.

These days, organizations across sectors are budgeting generously for machine learning projects, but those budgets will dry up if data science teams can’t deliver concrete results. You need to be able to quantify and visualize changes over time: improvements in data access and data volume, improvements in model accuracy, and ultimately improvements to the bottom line.

Begin with the end in mind. Think not only about what you need to measure now, but also about what you’ll want to measure in the future as your data science work matures. Is the system fluid enough to track those long term goals?

5. Continuous.

I started by pointing out that machine learning data and models aren’t static and never will be. The fifth and final pillar of Fluid ML is about continuous learning as the world changes. Ensure that your system lets you use tools like Jupyter and Zeppelin notebooks that can plug into processes for scheduling evaluations and retrain models.
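The evaluate-then-retrain cycle that such a scheduled notebook process might run can be sketched as follows. Everything here is a placeholder hook for illustration – the toy models, `evaluate`, and the `retrain` callback are not a real IBM API.

```python
# Sketch of the evaluate-then-retrain cycle a notebook scheduler might
# run on a timer. All functions are illustrative stand-ins.

def evaluate(model, holdout):
    correct = sum(1 for x, y in holdout if model(x) == y)
    return correct / len(holdout)

def scheduled_check(model, holdout, retrain, threshold=0.9):
    score = evaluate(model, holdout)
    if score < threshold:
        return retrain()      # hand back a freshly trained model
    return model              # still healthy; keep serving it

def stale_model(x):
    return 0                  # always predicts class 0

def fresh_model(x):
    return x % 2              # matches the current data pattern

holdout = [(i, i % 2) for i in range(10)]
model = scheduled_check(stale_model, holdout, retrain=lambda: fresh_model)
print(evaluate(model, holdout))
```

The stale model scores only 0.5 on the holdout set, so the scheduled check triggers retraining and the refreshed model takes over – the "continuous" pillar reduced to a dozen lines.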

At the same time, expect your own learning to grow and evolve as you absorb the advantages and limitations of various algorithms, languages, data sets, and tools. Fluid machine learning requires not only continuous improvement from the data and the system, but also continuous improvement from you and your teams.

 

The first three pillars are about being “always-on” and the final two are about continuous learning. Wherever you are in your data science journey, the pillars of Fluid ML can bring focus to each moment and clarity for the future. It’s a bright future, and thinking carefully about machine learning can get us there. Try it today at datascience.ibm.com.

 

Dinesh Nirmal,

VP IBM Analytics Development

Follow me on Twitter @DineshNirmalIBM

Breaking New Ground: A Unified Data Solution with Machine Learning, Speed and Ease of Use

Imagine being able to arrive at your destination as much as 200 times quicker, or to complete your most important tasks as much as 200 times faster than normal. That would be pretty impressive. What if you could get answers to your analytics queries that many times faster and run your machine learning algorithms with maximum efficiency on your data, simply by plugging a pre-configured and pre-optimized system into your infrastructure? That’s what the IBM Integrated Analytics System (IIAS) is designed to do.

As part of an organization’s “ground to cloud” hybrid data warehouse strategy, IIAS is a machine-learning-enabled, cloud-ready unified data solution (in the past, this was called a “data warehouse appliance”) that can accelerate your analytics queries up to 210[1] times. From a machine learning perspective, IIAS is pre-loaded with Apache™ Spark and IBM Data Science Experience (DSX), enabling organizations to use the system as an integral part of their data science collaborations.

Converging analytics and ML technologies

IIAS represents a convergence of Db2 Warehouse and PureData System for Analytics that enables organizations to write analytics queries and machine learning algorithms and run them anywhere across their hybrid infrastructure. It can handle mixed workloads from structured to unstructured data, offering integration with Hadoop, high-speed query routing, bulk data movement and real-time data ingest.

Architected for Performance

Built on the latest IBM POWER8 technology, IIAS leverages 4X threads per core, 4X memory bandwidth and 6X more cache at lower latency compared to select x86 architectures, which helps optimize an organization’s analytics – as shown in figure #1. The all-flash storage translates to potentially faster insights than disk storage, with high reliability and operational efficiency. The system is designed for massively parallel performance, leveraging in-memory BLU columnar processing with dynamic movement of data from storage. It skips processing of irrelevant data, and patented compression techniques help preserve order so data can be processed without decompressing it. Another aspect of performance is that Spark is embedded into the core engine and therefore co-located on the same data node, which removes unnecessary network and hardware latencies.


Figure #1: Optimized Hardware for Big Data and Analytics

Design Simplicity

IIAS is designed around simplification and ease of use. For data experts who don’t want to be database experts, IIAS helps provide fast time to value with an easy-to-deploy, easy-to-operate “Load and Go” architecture. As a preconfigured system (what we’ve often called an appliance), it can help lower the total cost of ownership with built-in tools for data migration and data movement. Using a common analytics engine enables organizations to write their analytics queries once and run them across multiple environments, with IBM Fluid Query providing data virtualization through federated queries. I cover this in more detail in the “A hybrid approach to the cloud and your data” section below.

With no configuration, no storage administration and no physical data model needed – nor any indexing or tuning necessary – business intelligence developers and DBAs can achieve fast delivery times. IIAS is also data model agnostic and is able to handle structured and unstructured data and workloads. It also comes with a self-service management dashboard.

Business analysts can run ad hoc queries without the need to tune or create indexes, can run complex queries against large datasets, and can load and query data simultaneously.

Machine Learning built-in.

IIAS offers organizations the opportunity to embrace a machine learning ecosystem by simply plugging a preconfigured, ready-to-go system into the client’s existing infrastructure. It’s all an organization needs for a truly cognitive experience: fast data ingest, data mining, prediction, transformations, statistics, spatial analysis and data preparation for predictive and prescriptive in-place analytics.

Preconfigured with IBM’s award-winning Data Science Experience (DSX), data scientists, engineers, business analysts and cognitive app developers can build, train and deploy models through a sophisticated but easy-to-use interface, allowing them to collaborate on cognitive applications across multiple platforms. DSX Local instances from an expanded IIAS can be joined to create a larger DSX Local cluster to support additional users. For those who prefer notebooks, IIAS offers built-in Jupyter Notebooks (Zeppelin coming soon) for visualizing and coding data science tasks using Python, R and Scala. RStudio is also built in, and Spark is embedded on the system (see figure #2), allowing tasks to be parallelized and accelerated by leveraging the sparklyr and dplyr libraries.


Figure #2: The power of embedded Spark 

Users can now create and deploy models through programmatic as well as visual builder interfaces – a simple three to four steps covering ingesting data, cleaning data, training, deploying and scoring a model.

A hybrid approach to the cloud and your data

When it comes to your data, a one-size-fits-all approach rarely works. The IIAS is built on the Common SQL Engine, a set of shared components and capabilities across the IBM hybrid data management offering family that helps deliver seamless interoperability across your infrastructure.

For example, a data warehouse that your team has been using might need to be moved to the cloud to meet seasonal capacity demands. Migrating this workload to IBM Db2 Warehouse on Cloud can be done seamlessly with tools like IBM Bluemix® Lift. The Common SQL Engine helps ensure no application rewrites are required on your part.

Essentially, the Common SQL Engine provides a view of your data, regardless of where it physically sits or whether it is unstructured or semi-structured data. The system’s built-in data virtualization service in the Common SQL Engine helps unify data access across the logical data warehouse allowing an organization to federate across Db2, Hadoop and even third-party data sources.
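In the same spirit, here is a toy Python sketch of federation: one query surface over two physically separate stores. In-memory sqlite3 databases stand in for the Db2, Hadoop and third-party sources mentioned above, and the federation function is invented for illustration.

```python
import sqlite3

# Two physically separate stores holding slices of the same logical table.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE sales (region TEXT, amount REAL)")
warehouse.executemany("INSERT INTO sales VALUES (?, ?)",
                      [("east", 100.0), ("west", 250.0)])

cloud = sqlite3.connect(":memory:")
cloud.execute("CREATE TABLE sales (region TEXT, amount REAL)")
cloud.executemany("INSERT INTO sales VALUES (?, ?)", [("east", 75.0)])

def federated_query(sql, sources):
    # Run the same SQL against every source and merge the results;
    # callers never see which backend held which rows.
    rows = []
    for src in sources:
        rows.extend(src.execute(sql).fetchall())
    return rows

rows = federated_query("SELECT region, amount FROM sales",
                       [warehouse, cloud])
print(sorted(rows))
```

A real data virtualization layer also pushes predicates down to each source and handles type mapping, but the essential contract is the one shown: one query text, many backends, one unified answer.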

Integrated and Open

IIAS provides integration with tools for model building and scoring including IBM SPSS, SAS, Open Source R, Fuzzy Logix. For BI and visualization there is integration with IBM Cognos, Tableau, Microstrategy, Business Objects, SAS, Microsoft Excel, SPSS, Kognito and Qlikview. And for those looking to build their own custom analytics solutions IIAS integrates with Open Source R, Java, C, C++, Python and LUA enabling organizations to use the skills sets they already have. Integration with IBM Infosphere Governance Catalog also helps users with self-service data discovery.

The Secret Sauce – the sum of the parts.

IBM Integrated Analytics System (IIAS) is the only unified data solution currently in the market equipped with the combined set of capabilities discussed above. And the key differentiator of IIAS, in my view, is the convergence of multiple analytics technologies onto a single platform that together create a hybrid data warehouse capable of massive parallelism, scalability and query acceleration, with an embedded machine learning engine and built-in cognitive tools. Integration with open source technologies, as well as IBM and third-party analytics and BI technologies, all based on a common analytics engine offering simplicity with load-and-go features, makes it a very open platform. Add to this the simplicity and performance characteristics mentioned earlier and it’s easy to see how IIAS can help organizations more efficiently and effectively tackle their most challenging analytics and cognitive workloads like never before. In summary (see figure #3 below), the IBM Integrated Analytics System is designed to help organizations do data science faster.

Figure #3 : IIAS – Do data science faster

For more information, read the announcement letter or visit the solution page.

 

Dinesh Nirmal,

VP Analytics Development

Follow me on Twitter @DineshNirmalIBM


 

  1. Based on IBM internal tests of 97 analytics queries run on a full rack of IBM N3001-010 (Mako) and a full rack of IBM Integrated Analytics System (IIAS), the average speed was 5 times faster, the median was 2 times faster and the maximum was 210 times faster. More than 80% of queries ran faster. Performance is based on measurements using an internal IBM benchmark in a controlled environment. This benchmark is a variant of an industry standard decision support workload. It is configured to use a 30TB scale factor and a single user issuing queries, and contains a mix of queries that are compute-bound or I/O-bound in the test environment. Note: Actual throughput or performance will vary depending upon many factors, including considerations such as the workload characteristics, application logic and concurrency.

Mihai Nicolae: Code Craftsman, Aspiring Chef and World Traveler

As much as I love meeting long-time IBMers and hearing their perspective on our evolution over the years, it’s a special pleasure to visit with our newer team members and to hear their visions for IBM’s future. You’ll remember my conversations with Martyna Kuhlmann, Ketki Purandare, and Phu Truong.

This time, I’m talking with Mihai Nicolae, a developer working out of our Markham office near Toronto. In just two years with IBM, Mihai has already been transformational on flagship products – Db2, Watson Data Platform, and Data Science Experience. He’s currently trading time between DSX Local, IBM Data Platform, and the new Machine Learning Hub in Toronto.


Dinesh and Mihai

I hope you’ll take as much inspiration from our conversation as I did.

Dinesh: Where are you from originally?

Mihai: Romania. I’m very grateful — and always will be — for my parents having the courage to emigrate to Canada in their forties for me to have the opportunity to attend university here.

Dinesh: I bet they’re proud of you.

Mihai: Oh absolutely, I can’t ever have a doubt about that based on how much they talk about it.

Dinesh: If my son’s first job out of college was at IBM, I’d be proud, too. Tell me about your experience so far.

Mihai: I’ve been at IBM for two years full-time. Currently, I’m working on DSX Local and IBM Data Platform, which just started in January, after my time on the Db2 team. It’s been an amazing journey, especially GA-ing the product in only 4 months.

Dinesh: First of all, thanks and kudos to you and the team for delivering DSX in such a short amount of time. You’re now diving into machine learning. Did you take ML classes at university?

Mihai: I took one Intro-to-AI class, but frankly I feared the stats component of the ML course — and that 40% of my performance would depend on a 2-3 hour, stats-intensive exam.  At this point, I know that no hard thing is insurmountable if you put in the work.


Mihai at Big Sur.

Dinesh: Where do you see machine learning or data science going from here?

Mihai: I think it’ll be a vital component of every business. AI is the once-in-a-lifetime technology destined to advance humanity at an unprecedented scale. I think the secrets to defeating cancer, reversing climate change, and managing the global economy lie within the growing body of digital data.

But reaching that potential has to happen with the trust of end-users, trust in security and lack of bias. That’s why I think IBM will be a leader in those efforts: because IBMers really do value trust — I see it in the way we interact with each other day to day, as much as I see it in our interactions with clients. Trustworthiness is not something that can be compartmentalized.

Dinesh: Well said. I know you also work on encryption. Where does that fit in?

Mihai: When data is the core of everything, encryption is critical — encryption plus everything to do with security, including authentication and authorization. They’re all essential for earning and keeping user trust.

Dinesh: I love your passion for your work. Do you ever leave the office? What are your hobbies?

Mihai: Ha! I go to the gym, and I recently subscribed to one of those recipe services that delivers ingredients in pre-determined amounts. But traveling is really my fixation: California, Miami, Rhode Island and Massachusetts last year. And this year, I’ve been to the Dominican Republic, and then I head to Nova Scotia this summer.


…and at the Grand Canyon.

Dinesh: Nice. Do you have a particular dream destination?

Mihai: Thailand has a water festival in April, where you get to have a water fight for three days. It’s the Thai new year. That might be my next big pick.

Dinesh: I travel a lot and I think there can be something really creative about travel, especially with the types of trips you’re talking about. I like asking developers whether they think of themselves as creative people. What’s your thought?

Mihai: Travel is definitely creative, but you’re making me think of the recipe service. I think of cooking from a card like learning programming from sample code: You get the immediate wow factor from building and running the working product but you don’t necessarily understand how and why the pieces fit so well together, or even what the pieces are. But over time, and with experience, you get understanding and appreciation. I think that’s when innovation and creativity can flourish.

Dinesh: Thanks, Mihai. Thanks for taking the time, thanks for the great work, and thanks for evolving IBM for our customers.

Dinesh Nirmal

Vice President Analytics Development

Follow me on Twitter @DineshNirmalIBM

 


Home town: Constanta, Romania

Currently working on: DSX Local, Machine Learning Hub Toronto

Favorite programming language: Python

Top 5 future travel destinations:

  1. Thailand for Songkran
  2. Australia for scuba diving in Great Barrier Reef and surfing
  3. Brazil for Rio Carnaval
  4. Mexico for Mayan ruins and Diez y Seis
  5. Germany for Oktoberfest and driving on the Autobahn

 

 

IBM Machine Learning for z/OS – Like no other

Like no other Private Cloud

With many of the top banks, retailers, and insurance organizations using IBM® z Systems®, combined with tried and tested virtualization capabilities, an EAL5+ security rating and the ability to handle billions of transactions a day[1], the platform becomes attractive as a private cloud for running advanced analytics as well as cloud managed services.

Those organizations are in an enviable position, with volumes of new and historical business-critical data available on such powerful and reliable systems. The sheer volume and velocity of the transactions, the richness of data in each transaction, coupled with data in log files, is a potential gold mine for machine learning applications to exploit cognitive capabilities and do smarter work, more intelligently — and more securely.

Leveraging Machine Learning on z Systems

Set against an ever-steepening curve of information growth, Chief Information Officers and data scientists constantly battle to gain deeper insights from the volumes of transaction and log data on the platform (and many other platforms) and turn those insights into concrete gains. In most cases, the CIOs already have astute teams of data scientists and data engineers combing through this data – and yet they see their teams struggle to make enough time for the deep work they’re trained to do.

“Enterprises are well aware of the tremendous potential and value of the transactional and operational data on their z Systems. Yet most of them struggle with how to expose the data within the enterprise in a secure and controlled way that’s still flexible enough to foster innovation and support efficient access for a variety of roles – data scientists, operations, and application developers. Not an easy task, but organizations that can do so potentially obtain an edge over the competition.”

—Andrei Lurie, DB2 for z/OS Architect, IBM

Machine learning has the potential to be the perfect intelligent app — to hike efficiency, create and cement deep personal relationships with customers, push into new lines of business and new markets while helping to minimize financial risk and fraud.

I have heard customers say that the mainframe has never been hacked. But that doesn’t mean cyber criminals aren’t trying, nor that unscrupulous people aren’t attempting to commit fraud. Having applications that embed predictive models – models that can analyze, sense, react and become smarter with every transaction and interaction in such a business-critical environment – brings us a long way toward identifying and preventing potential fraud.

But z Systems is not just about transactions. It is already considered to be a hybrid transaction and analytics processing (HTAP) environment with a complete set of the analytics capabilities and acceleration technologies available today. IBM has also added full exploitation of Apache Spark™ on both z/OS and Linux® on z Systems – a solid base for building, testing and deploying high performance machine learning applications.

“By running advanced Apache Spark™ analytics directly on their production systems, enterprises can improve both the efficiency and timeliness of their insights. Moving Spark inside the mainframe also simplifies and can help reduce security risks, as there is only one copy of the data to protect, and that copy resides inside z/OS’s security-rich environment.”

— Fred Reiss, Chief Architect, IBM Spark Technology Center

For all these reasons and more, we are delivering the full range of our machine learning capabilities to z/OS – essentially bringing advanced ML to the world’s most valued data.

Machine Learning without Compromise.

When asked to describe machine learning I break it down into three perspectives: Freedom, Productivity and Trust. I find these resonate well with customers’ needs.

Freedom. Think of freedom as a set of unified but powerful capabilities, such as the flexibility of the interfaces that can be used to interact with machine learning – whether a Jupyter notebook or intuitive graphical interfaces catering to the needs of various personas, from beginners to expert data scientists. With support for Python™, Java™ and Scala, different organizations can leverage their preferred programming language and skills when building machine learning applications. Machine learning from IBM can be developed on and deployed across different computing environments, such as private cloud and public cloud – including IBM z Systems with z/OS – with a choice of frameworks such as SparkML, TensorFlow™ and H2O.

With the data available to machine learning solutions, users can create advanced algorithms or choose from a set of predefined powerful algorithms and models without requiring advanced data science expertise.

Think of all this capability running on one of the highest performing platforms available: IBM z Systems. It means machine learning can be brought to bear many thousands of times per second[2] — which can help reduce costs and risks, finding and leveraging new opportunities at every transaction and interaction.

Productivity. To make machine learning consumable it has to be easy and intuitive for end users. To this end, IBM machine learning was built around three core principles: simplicity, collaboration (across multiple personas) and convergence of a wide range of technologies from our analytics portfolios and our research laboratories. The user experience is key, whether the user is a data scientist – advanced or beginner – or a computing generalist. Across personas, IBM Machine Learning lets users engage and collaborate on machine learning projects – leveraging the combined skills of the team. Wizards within the tools provide step-by-step processes and workflows that automate many aspects of building, testing and deploying learning machines. As part of the process, IBM Cognitive Assistance for Data Scientists (CADS) automates the selection of the best algorithm for a given training data set. It starts by allocating a small part of the data set to each candidate algorithm, then estimates performance on the full data set. It uses the estimate to rank the algorithms, and allocates more data to the best ones. It iterates until the best algorithms get all of the data set.
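The allocation loop just described can be sketched roughly in Python. The candidate algorithms and the scoring function here are stand-ins invented for illustration – not the real CADS internals – but the flow matches the description: score each candidate on a small slice, rank, keep the best half, and give the survivors more data.

```python
# Each candidate's asymptotic accuracy; the scoring function is a toy
# stand-in whose estimate tightens as an algorithm sees more rows.
candidates = {"tree": 0.78, "logistic": 0.84, "svm": 0.81, "naive": 0.70}

def estimated_score(true_acc, n_rows):
    return true_acc - 1.0 / n_rows   # estimate improves with more data

def cads_style_select(candidates, total_rows=1024):
    pool = dict(candidates)
    n = total_rows // 8                   # small initial slice each
    while len(pool) > 1:
        scores = {name: estimated_score(acc, n)
                  for name, acc in pool.items()}
        ranked = sorted(scores, key=scores.get, reverse=True)
        keep = ranked[: max(1, len(ranked) // 2)]   # keep the best half
        pool = {name: pool[name] for name in keep}
        n = min(total_rows, n * 2)        # survivors get more data
    return next(iter(pool))

print(cads_style_select(candidates))
```

The design trade-off this captures is spending most of the data budget on the front-runners: weak algorithms are eliminated cheaply on small slices, while the eventual winner is ranked on (nearly) the full data set.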

Trust. Once a model is built and tested, it needs to be deployed. A model – in fact the entire machine learning application (learning machine) – is similar to a living organism, evolving and adapting over time as it encounters new data with each transaction and interaction. This is achieved through a continuous feedback loop that enables the model to adapt and change, altering specific parameters within the model itself to become smarter and more consistent over time – while avoiding overfitting. This auto-tuning is key to reducing manual intervention. Of course, some human intervention or model adaptation may be necessary where a human judgement or decision is required. Therefore, keeping track of the versions of the models over the lifecycle of the learning machine is important for audit purposes or to fall back to a previous version. Another aspect of trust is of course the governance and security (the who, how, when, where) of the data, the models, and the many machine learning artifacts. IBM z Systems is recognized in the industry when it comes to security[3] – a key reason why some of the biggest names and well-known organizations across many industries run their business-critical applications and data on the platform.

These three perspectives are summarized in Figure #1 below.


Figure #1: IBM Machine Learning, the complete picture.

From a technology point of view, our aim is to free up data science teams to do the deep work that’s being asked of them — work that gets harder and harder as the world moves faster and with less certainty. Ultimately, the gains that CIOs are seeking will come from a collaboration between smart data systems and smart data scientists. Machine learning on z/OS will help enable and encourage that collaboration.

IBM Machine Learning Hub – Beyond the Technology

While the technology aspects may deliver very advanced machine learning capabilities, IBM recognizes the need to nurture and partner with organizations as they embrace and fully exploit its machine learning technologies. The first IBM Machine Learning Hub will provide the means to achieve this, with the aim of accelerating and enriching organizations' machine learning knowledge and expertise.

The “hub” will allow organizations to access IBM's world-class data science professionals, who can provide education, training and expert advice on all aspects of machine learning, as well as lead and deliver proofs-of-concept and full client engagements. They focus on delivering tailored machine learning knowledge and skills transfer built around the needs and wants of customers. This combination of the technology and the knowledge and skills base is an opportunity to provide a unique machine learning experience, what I consider to be the machine learning ecosystem.

Let me close this blog post by inviting you to take a look at a short video on machine learning here and to read the recent announcement, IBM Brings Machine Learning to the Private Cloud.

 

Dinesh Nirmal,

VP Analytics Development

Follow me on Twitter @DineshNirmalIBM

 


[1] http://www-03.ibm.com/press/us/en/pressrelease/51623.wss 

[2] based on IBM SPSS Modeler Scoring Adapter for DB2 for z/OS performance http://www.ibmsystemsmag.com/mainframe/Business-Strategy/BI-and-Analytics/SPSS-Modeler-Scoring/?page=3

[3] EAL5 + Security Rating http://www.redbooks.ibm.com/redpapers/pdfs/redp5153.pdf


Apache Spark, Spark and the Spark logo are trademarks of the Apache Software Foundation (ASF).

Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.

Python is a registered trademark of the PSF.

Java and all Java-based trademarks and logos are trademarks or registered trademarks of Oracle and/or its affiliates.

TENSORFLOW is a trademark of Google Inc.

IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both.

Welcome to the Private Cloud

Readers of this blog know that I like to imagine the world through the eyes of my young son. I'm struck by his constant drive to push himself to his next edge of independence. I also know his appetite for danger goes only so far. He understands some of the safety boundaries we have set to protect him from the chaos and to help him thrive. Home is a safe environment in which we prepare him for the outside world and for public places like school, where he interacts with other students and shares resources with them.

His effort to find the right balance of exploration and safety resonates with what we mean by “private cloud” and preparing clients for a hybrid cloud environment: private plus public, as in figure #1 below:


Figure #1: Hybrid cloud, the path toward optimal business outcomes

Hybrid cloud can provide ultimate flexibility by allowing organizations to place data and associated workloads where they make the most sense for optimal business outcomes, which I discuss in more detail later in this post.

Private Cloud defined

In the simplest terms, a private cloud (sometimes also called an internal cloud, dedicated cloud or corporate cloud) provides all the benefits of cloud provisioning and management capabilities, alongside the scalability, agility and developer-driven services available from cloud vendors, but behind the firewall. Figure #2 below offers some details about the differences.

Public and private clouds are both destinations for the execution of business workloads. More and more, we see organizations eager to take a hybrid approach, which allows those workloads to execute seamlessly “together” across public and private cloud, giving those customers ultimate flexibility based on factors including (but not limited to):

  • The volumes and types of data
  • Sensitivity of the data
  • Performance and service levels required
  • Security requirements
  • Business criticality
  • Data regulation and governance
  • Types of systems, processes, and applications

How you put the pieces together depends on the needs of your business. There are many economic and service level factors to consider. A private cloud is often the responsibility of the organization running it. Besides the factors mentioned above, the responsibilities can include hardware, software, support, maintenance, service-level agreements with the business and all the necessary human and technical resources associated with them. With a public cloud, many of these economic and service level responsibilities can be devolved to a third party, allowing the organization using the public cloud to focus on its core business processes and needs.

That said, some enterprise customers are seeing that many of the benefits typically associated with the public cloud — lower cost, speed of provisioning, reduced management — are increasingly available on private cloud configurations that also allow data to be governed securely, smoothly, and transparently.


Figure #2: Private and public cloud differentiation

Life behind the firewall

What we mean by “behind the firewall” depends on individual clients and their needs. It might mean that the data is maintained completely within a client's own protected data center by the client themselves. Or it might mean that the data and apps live on fully dedicated bare-metal servers off-site, with a cloud vendor like IBM managing hardware, maintenance, connectivity, redundancy, and security on the client's behalf, all of which helps that client drastically reduce capital expenses for servers, in-house IT staff and the burdens of obtaining and updating software.

Avoiding expenses and hassle is just the beginning of what's possible, but let's first consider why maintaining a private cloud while exploring public cloud options is the right fit for so many organizations. Broadly, private cloud configurations can address two particular needs:

  1. The need to create a highly secure and reliable home for sensitive data, to perform advanced analytics, and to maintain data sovereignty, while allowing that data to be in conversation with data and analytics that are accumulating in the public cloud. In this sense, private cloud is one end of a private/public cloud hybrid configuration in which data is accessed, moved, and managed using secure, service-layer APIs.
  2. The need to modernize systems and processes, even behind the firewall. Organizations that see the benefits of maintaining a private cloud nevertheless demand the clear advantages of the public cloud I mentioned before: elastic scalability, agility, consumability of API-driven services, easier management, and rapid provisioning, to name just a few. The key concepts here are:
  • virtualization — The use of virtual operating systems and highly elastic virtual processing power.
  • federation — The ability to take several different physical entities and represent them as a single logical entity.
  • data fabric — A software-defined approach for connecting disparate storage and management resources across private and public cloud. The approach enables multiple components to interoperate through a set of common, standardized services and APIs regardless of physical location, type of data, or type of service. As mentioned above, clear data governance is particularly crucial in hybrid environments — and even more so when country-specific compliance rules require different data policies across geographies.

As my colleague wrote:

Private cloud is about delivering an elastic data fabric behind the client's firewall. From a user perspective, the experience goes from “Provision me a database to do xyz” to “Here is my data and my analytical needs, please help.” There is no need for dedicated repositories for a specific application, and user needs are met automatically, with limited human intervention.


Figure #3: Hybrid cloud architecture

Path to Cloud Benefits

Regardless of their focus, organizations are hungry for simplicity, transparency, and the ability to move toward cloud without starting from scratch. They know that their future success lives at the edges of wide networks, at the points of direct contact with customers and the outside world. Mobile phones, IoT sensors, and other connected devices are the new lifelines to current and potential customers, who passively or actively exchange volumes of data with servers. That data runs the gamut in terms of privacy and sensitivity: from the temperature of the toaster to credit card information, from glucose levels to the current whereabouts of my son’s backpack. All that activity at the outer edges of the network has shifted a portion of the business into the cloud even for traditionally cloud-wary sectors like finance, government, and healthcare. For those organizations, a private cloud offers an environment for core-mission, transactional workloads even as the public cloud allows them to explore CPU-intensive or streaming applications that are (for now) less central to the business. Not surprisingly, these sectors are exploring tunable hybrid cloud infrastructures. Figure #3 above offers some perspective.

Alongside the need to stay connected to customers, pressure to come to the cloud is also intense in terms of cost savings, easier management / provisioning, and — perhaps ironically — security. Security threats evolve so rapidly and attacks come from so many directions that internal security teams can struggle to keep up. And since some of the most severe cyber-attacks can come from within a company’s own ranks rather than from exterior bots or hackers, the internal teams are finding that the security of the cloud providers can be advantageous in terms of speed, currency, and completeness. As Cameron McKenzie points out, “Enterprises are starting to seriously consider the cloud as a viable option because they’ve realized that security is a battle they can’t win on their own.”

Advantages of IBM Private Cloud

Right now, IBM Private Cloud can help provide the best of the public and private cloud worlds. In fact, a recent InformationWeek post about private cloud states that “IBM is the market leader.” Our deep, in-house knowledge can help organizations breathe easy in terms of performance, cost, security, and white-glove attention and support. We start with the assumption that those organizations need to leverage the systems and processes they have in place by cloud-enabling their investments — rather than starting from square one.

Think of the IBM Private Cloud as a stack. You still need that physical infrastructure that offers high availability, scalability, performance — a strong data and analytics foundation to ingest, prepare, wrangle, discover and transform data into trusted assets. On top of that you need the ability to manage existing investments in applications and solutions as well as creating new services and apps that are cloud-enabled and can be rapidly provisioned – everything from management of the infrastructure to a collaborative development environment. Oh, and the need for security and governance of the data, transactions and applications over their lifecycles doesn’t go away.  All these layers in the stack (regardless of whether an organization buys into all of them) can be provided by IBM today – and many of them were well established and available before the mainstream adoption of cloud.

Customer environments are, without exception, multi-vendor, consisting of an array of heterogeneous platforms. That's why the private cloud platform is designed to co-exist and integrate with many different technology infrastructures. The goal is to bring cognitive analytics capabilities to wherever the data is, with flexibility in mind, such as delivering offerings in multiple form factors to help meet the diverse needs of our clients on their cloud and cognitive journeys. A great example is the use of Docker images, which make it possible to run our analytics and other offerings across many different infrastructures while leveraging the attributes of private cloud.

Innovation and Investment for client success

We're innovating and investing on clients' behalf to bring them not only the expected benefits of private and public cloud, but also robust internal partnerships with IBM Power and IBM z Systems, business partners like the ones described above, and access to market-leading data management solutions and world-class descriptive, predictive and prescriptive analytics, all in a cloud-enabled, integrated, secure and governed environment. All this comes together within the private cloud data platform, with tried and tested infrastructure, governance, security, data fabric capabilities and cognitive computing services, and with the flexibility to provision data and policies across private and public cloud environments. This is an optimal hybrid model.

 

In subsequent posts, we’ll look at private cloud strategies related to data repositories, analytics, content management, and integration/governance issues — and how these strategies braid together.

In the meantime, I encourage you to visit the IBM private cloud page, a great place to explore and try some of the capabilities that exist today, and to get a preview of what's coming soon.

 

Dinesh Nirmal, Vice President, Analytics Development.

Follow me on Twitter @DineshNirmalIBM

Phu Truong: Humble Leader, Loves Logic, Hates Calculations

You in the Private Cloud: A bi-weekly series of conversations with IBM talent around the world

If you’ve seen the movie “Hidden Figures” — and if you haven’t I highly recommend you do, and not just because IBM is a central character — you’ve seen how the race to get a man into space was profoundly affected at the 11th hour by one courageous woman, and the help her boss, her friends, her teachers, and her family gave her to get to that one minute in time when she made a difference.

People first. To build rockets to the stars and machines that think, people need to dream things up, and work with sustained, supported effort to make them real.

We have many talented people in IBM Private Cloud. This year, I’ll continue to meet and talk to as many of you as I can and I’ll post our conversations here every two weeks. My hope is we’ll get to know each other, and feel even more connected and supported in our work.

This week, I was able to lure Phu Truong away from coding on the IBM data platform to meet at IBM Silicon Valley Laboratories, San Jose, CA.

You were an intern with us until just a few months ago. Why did you choose to come to IBM full time, out of all the choices in Silicon Valley? The appeal of IBM is the opportunity to work on new technologies, specifically, new technologies on the back end. A few weeks into my internship the senior engineer I worked with set me to work on learning Node.js® and React. I want to be a full stack engineer so now I’m working on UI, but to be really great there you need a feel for art, and I don’t have that. The back end is pure logic. I loved it, so much so that I started staying very late at night to work.


Some people love their jobs because of people, or culture, but clearly, you love the technical work. How did you decide on computer programming as a profession? I come from Vietnam, and I had no programming background there, to be honest. I studied mathematics at university and planned to go into it professionally, but I’m very bad at calculations. I make mistakes all the time! What I love about mathematics is logic — the feeling I get when I solve a problem using logical thinking is intensely satisfying to me. I feel very good about myself.  So when I came to the U.S., I had a fresh start. I asked my friends to help me find a field that uses logical thinking to solve problems, and they recommended computer science. One week into my first CS class, Data Structures and Algorithms, I knew I’d found my profession.

So now you’re at IBM, you worked on the Data Science Experience (DSX) and now you’re working on the IBM data platform. Are you thinking of following the full path from engineer to Senior Technical Staff Member (STSM) to Distinguished Engineer (DE) to IBM Fellow? I don’t know, that may be too much!


I hear great things about you so maybe not! You’re already mentoring others in your team on Node.js, after being here only a few months. I consider it more like sharing knowledge. When a colleague comes to me with a question, I might know something they don’t and they might know something I don’t. I might say something wrong when we’re working together and that’s an opportunity for them to correct me and for me to learn. Growing up, I helped my younger brother with his schoolwork, so I guess it’s natural for me to help. But it benefits everyone.

What do you like to do outside of work? I like to play Ping Pong with my friends from San Jose State, or go with them to the beach. And I love to travel—I want to go to Cancun, because of all the natural landscapes the beach is my favorite and I’ve heard it’s spectacular there. After that, Paris and London. I love eating out, so much so that I tell my friends I want to marry a chef!

You have an adventurous spirit! IBM is an international company so, I don’t know about Cancun, but travel to Europe is likely. What’s it like living in the heart of Silicon Valley after growing up in Vietnam? I grew up in Saigon, in a very tall, very thin town house: Saigon is famous for thin houses. Here, being surrounded by rolling green hills and close to the beach is wonderful. I think my family worried about me when I moved here, not that it was dangerous, but that I might just chase money and give up on my education: I worked as a waiter, a data entry clerk and a school bus driver, any job I could get, I took. But I never gave up on my education. I think now they don’t worry about me anymore. I think they might be proud of me.


You’ve achieved a great deal here in a very short period of time, making a significant contribution to two products that customers like. It’s tremendous, and I’m happy you’re here. I am as well. I think the biggest difference between Vietnam and here is in education and learning. In Vietnam, education was driven by memorizing things and was not interesting to me. And, we are taught to do exactly what teachers tell us to do; they don’t give students a chance to explore their interests. So to be first at San Jose State and now at IBM where it’s part of my job to learn new skills—well, I like it very much.

Name: Phu Truong
Hometown: Saigon, Vietnam
Currently working on: IBM Data Platform
Favorite Programming Language: Node.js
Top 3 travel destinations: Cancun, Paris, London
Best Vietnamese Food in Silicon Valley: Pho Y 1 on the Capitol Expressway, San Jose

Dinesh Nirmal,  

Vice President, Analytics Development
Follow me on Twitter @DineshNirmalIBM

Node.js is a trademark of Joyent, Inc. and is used with its permission. We are not endorsed by or affiliated with Joyent.

Welcome!

Why a Cloud First Strategy Can Benefit Customers with IBM BigInsights on Cloud.

Hadoop – the early years

The origins of Apache™ Hadoop® go back as far as 2003, with the emergence of a new file system, followed by the introduction of MapReduce and the birth of Hadoop in 2006. It achieved fame as the fastest system to sort a terabyte of data, and when it became an Apache open source project (Apache Hadoop) it sent a signal that it was ready for prime time. The world never looked back. Within IT shops and even board rooms there was huge interest and excitement, even hype, with suggestions that it might replace the enterprise warehouse.

A quick Hadoop refresh

The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than relying on hardware to deliver high availability, the library itself is designed to detect and handle failures at the application layer, delivering a highly available service on top of a cluster of computers, each of which may be prone to failure.
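The “simple programming models” in question are map and reduce. As a toy, self-contained sketch of the paradigm, here is the classic word count, with Hadoop's shuffle phase simulated by an in-process sort rather than a distributed cluster:

```python
from itertools import groupby
from operator import itemgetter

def mapper(lines):
    """Map step: emit a (word, 1) pair for every word seen."""
    for line in lines:
        for word in line.split():
            yield word.lower(), 1

def reducer(pairs):
    """Reduce step: sum the counts for each word. Hadoop delivers pairs
    grouped by key; sorting here simulates that shuffle-and-sort phase."""
    for word, group in groupby(sorted(pairs, key=itemgetter(0)), key=itemgetter(0)):
        yield word, sum(count for _, count in group)

# On a real cluster the map and reduce tasks run as separate processes on
# different nodes; here we simply chain the two generators.
counts = dict(reducer(mapper(["to be or not to be"])))
```

Under real Hadoop (for example via Hadoop Streaming), the framework handles partitioning, shuffling and recovery from node failures around exactly this kind of mapper/reducer pair.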

Maturity, BigInsights platforms and the move to cloud

All technologies move through a maturity cycle. Hadoop is no exception and is maturing fast. IBM® saw the opportunity to build enterprise-ready, mission-critical, Hadoop-based solutions, delivering its BigInsights™ portfolio, which adds significant value to the core Apache Hadoop open source software. IBM also helped lead and drive the Open Data Platform initiative (ODPi) [see ODPi.org] to encourage interoperability across different Hadoop vendors.

Built around a no-charge, open-source-based core called the IBM Open Platform with Apache Hadoop (IOP), which includes Apache Spark™, IBM BigInsights brings a rich set of capabilities, from advanced and high-performance analytics such as BigSQL to visualization through BigSheets, all neatly brought together to meet the needs of different personas. IBM BigInsights is cited as a leader in The Forrester Wave™: Big Data Hadoop Distributions, Q1 2016, available from Forrester.

Usage of Hadoop varies widely across customers: some have clusters of several thousand nodes, others just 5 or 10. Running hundreds or thousands of nodes can be off-putting for some customers in terms of capital and management costs. With BigInsights having established itself as a leader, and with IBM focused on a Cloud First strategy, we saw the opportunity to help customers reduce those capital and management costs so they can focus on running analytics for business advantage, while providing BigInsights on a dynamic, elastic, scale-out infrastructure in the cloud through IBM SoftLayer and Bluemix technologies from any of our many data centers around the world.


Figure 1: IBM Open Platform and BigInsights – cloud services.

 

The following report cites IBM as a leader: “The Forrester Wave™: Big Data Hadoop Cloud Solutions, Q2 2016,” which states:

“IBM differentiates BigInsights with end-to-end advanced analytics. IBM BigInsights runs atop IBM’s SoftLayer cloud infrastructure and can be deployed on any of 17 global data centers. IBM’s client relationships require it to be flexible in how it offers Hadoop in the cloud and offer highly customized configurations. IBM is making significant investments in Spark, offering data science notebooks that run with the platform. Enterprises using IBM’s data management stack will find BigInsights a natural extension to their existing data platform. The company has also launched an ambitious open source project, Apache SystemML, from its newly minted Spark Technology Center. IBM’s customers value the maturity and depth of its Hadoop extensions, such as BigSQL, which is one of the fastest and most SQL-compliant of all the SQL-for-Hadoop engines. In addition, BigQuality, BigIntegrate, and IBM InfoSphere Big Match provide a mature and feature-rich set of tools that run natively with YARN to handle the toughest Hadoop use cases.”

The report shows IBM scored among the highest in the solution configuration, data security, data management, development, cloud platform integration, ability to execute, road map, professional services, fixes and partnerships criteria.

 

In closing…

To conclude, there has never been a better time to invest in your BigInsights projects, whether on-prem or in the cloud. The IBM Cloud First strategy is helping customers better manage their costs and focus on delivering business value and insight. IBM can help abstract the complexities of managing infrastructures in a highly performing, highly available, security-rich and elastic scale-out environment across 17 worldwide multi-tenant data centers. IBM BigInsights, combined with making data easy and our leadership and investment in Apache Spark, is helping deliver a next generation analytics platform capable of advanced analytics, machine learning, streaming, powerful SQL, graph analytics and more.

For more information on IBM BigInsights or to get started on BigInsights on Cloud click here.

Dinesh Nirmal – Vice President, Next Generation Platform, Big Data & Analytics on z

Follow me on Twitter:  @IBM_Innovation

 

 

TRADEMARK DISCLAIMER: Apache, Apache Hadoop, Hadoop, Apache Spark, Spark and the Spark logo are trademarks of The Apache Software Foundation.

IBM, IBM BigInsights, BigInsights are trademarks of the IBM Corporation.

 

Piotr Gnysinski: QA Wizard, Former Farmhand, and Family Man

I originally started the “You in the Private Cloud” series as a way to introduce our talented team to each other across our many geographies. I knew it was important for us to know each other as more than email addresses or voices during meetings.

But I didn’t realize at the time that it would become one of the favorite parts of my job. I truly love settling in for great conversations with the terrific people working on IBM Analytics offerings across the globe.

This time was no different. Many of you know that we have a vibrant presence in Krakow, Poland. And while there recently I got the chance to visit with Piotr Gnysinski who works as Test Lead on the Information Governance Catalog, a key part of our InfoSphere Information Server offering.


Piotr with Dinesh

Dinesh: I know you worked for a while for Comarch, founded by Janusz Filipiak, a famous, larger-than-life figure in tech. What was it like working there?

Piotr: When I joined, Comarch was already a big company. It was my first job in IT and the first time I experienced emotions from customers coming our way: real people on the receiving end of my work — sometimes with real joyful reactions, sometimes with irritation as a result of bugs that made it through to the field.

I had to switch to real proactive thinking. I would say this attitude —this deep and strong engagement for customer advocacy and not just technical skills — is the most important single characteristic that can help someone do well in our business, or any business for that matter.

Dinesh: You’ve got a reputation for designing robust testing frameworks that cover a lot of ground. I think testing can seem like a mystery to many of us. Give me a sense of how you approach things.

Piotr: It depends on what you're testing, but a big tool for us across the board is the idea of pair-wise testing. We know from studies that most defects (65-97%) can be discovered by testing the interactions between the values of two variables[1]. A factor could be the browser vendor, the underlying operating system, and so on.

So, when you have an almost infinite number of tests you could run and very limited time, you first think of all those possible factors and figure out their possible values, then you classify these into groups called “equivalence classes”. You know that testing a single value from a class will probably give the same result as testing any other value in the group, so now you use algorithms that make sure each pair of classes is covered at least once — and you make sure to mix up which specific values are getting tested in the different pairs. That gives you good coverage.

I’ll send you a link to some information about Combinatorial Test Design if anybody wants to read up some more.
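Production combinatorial test design tools build covering arrays with specialized algorithms; as a rough illustrative sketch of the idea only (the factor names and values below are invented for the example), a greedy loop can repeatedly pick the test case that covers the most still-uncovered value pairs:

```python
from itertools import combinations, product

def pairwise_suite(factors):
    """Greedy covering-array sketch: keep adding the candidate test case that
    covers the most still-uncovered value pairs until every pair is covered."""
    names = list(factors)

    def pairs(case):
        # All (factor, value) pairs this case exercises together.
        return {((a, case[a]), (b, case[b])) for a, b in combinations(names, 2)}

    candidates = [dict(zip(names, values)) for values in product(*factors.values())]
    uncovered = set.union(*(pairs(c) for c in candidates))
    suite = []
    while uncovered:
        best = max(candidates, key=lambda c: len(pairs(c) & uncovered))
        suite.append(best)
        uncovered -= pairs(best)
    return suite

# Three factors of two values each: the full product is 8 cases,
# but pairwise coverage needs noticeably fewer.
factors = {"browser": ["firefox", "chrome"],
           "os": ["linux", "windows"],
           "locale": ["en", "pl"]}
tests = pairwise_suite(factors)
```

Real combinatorial test design tools scale this far beyond a toy greedy search, handling many factors, constraints between values, and higher-strength (three-way and above) coverage.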


Piotr with wife Justyna, daughter Julia, and son Szymon

Dinesh: What do you do on weekends for fun?

Piotr: Almost every weekend, my wife Justyna and I take our son and daughter on some adventure: water park, bike riding, or visiting the playground. But my favorite is to bring them to visit Henrykow, which is a small village with about 30 people. My aunt and uncle have a farm and I used to go there every summer when I was a kid. I collected so many fantastic memories from there.

So now, whenever I have a chance, I pack up the family and two hours later we are in ‘Neverland’. They still keep livestock and they still work the land, so my kids get to see and do all that as well. For instance, not so long ago, they witnessed a calf being born, they very often get to ‘drive’ — being on my lap — a tractor, play in the hay for hours, or we go through the woods or the swamps, which always ends up with at least two of us all wet and muddy.


At the beach with friends and family 

Dinesh: It looks like you also make it to the gym once in a while. Am I crazy?

Piotr: Ha! Yes, I do weights mostly. There is something very satisfying in pushing yourself over imagined limits and doing completely exhausting training sessions, after which you can barely move. Yeah, gym is fun!

I’ll also get ideas for work at the gym, usually related to current work stuff: how are we going to approach creating our environment matrix for an upcoming release or how can we improve a process that was raised during a Lessons Learned session. Nothing revolutionary that would change the IT world, but very down-to-earth solutions that help us get better and better at what we do.


Dinesh Nirmal

Vice President, Analytics Development

Follow me on Twitter @DineshNirmalIBM

 


Piotr’s hometown is  Bedzin, Poland, most famous for its castle.

 


Piotr: “A nearby roundabout, which was designed back when we had Communism here aiming to be perfect non-collision intersection for cars and trams. What we are left with, is this ’roundabout’ that is called ‘a kidney’ and where cars cross paths with trams three times before they leave it 🙂 It makes just about as much sense as Communism itself.”

Favorite programming language: JavaTM

Top 5 authors:

  1. Terry Pratchett
  2. Andrzej Sapkowski
  3. James Whitaker
  4. J.K. Rowling
  5. Wiktor Suworow

[1] IBM Haifa Research Laboratory, Combinatorial Test Design (CTD): http://research.ibm.com/haifa/dept/svt/papers/The_value_of_CTD.pdf

Opening up the Knowledge Universe.

IBM Data Science Experience Comes to a Powerful, Open, Big Data Platform.

I have just finished presenting at the DataWorks Summit in San Jose, CA, where a partnership between IBM and Hortonworks was announced, the aim of which is to help organizations further leverage their Hadoop infrastructures with advanced data science and machine learning capabilities.

Some Background.

When Apache™ Hadoop® first hit the market there was huge interest in how the technology could be leveraged, from performing complex analytics on huge data sets using MapReduce on clusters of thousands of cheap commodity servers, to predictions that it would replace the enterprise data warehouse. About three years ago Apache™ Spark™ gained a lot of interest, unleashing a multi-purpose advanced analytics platform to the masses: a platform capable of performing streaming analytics, graph analytics, SQL and machine learning with a focus on efficiency, speed and simplicity.

I won't go into details on the size of the Hadoop market, but many organizations invested heavily for numerous reasons including, but not limited to, its reputation as an inexpensive way to store massive amounts of data and the ability to perform advanced queries and analytics on large data sets with rapid results thanks to the MapReduce paradigm. From one perspective, it was a data scientist's dream to be able to reveal deeper insights and value from one's data in ways not previously possible.

Spark represented a different but complementary opportunity allowing data scientists to apply cognitive techniques on data using machine learning – and other ways of querying data – in HDFS™ as well as data stored on native operating systems.

Many organizations, including IBM, made investments in Hadoop- and Spark-based offerings. Customers were enthused because these powerful analytics technologies were all based on open source, representing freedom and low cost. Organizations including IBM participated in initiatives such as ODPi to help ensure interoperability and commonality between their offerings without introducing proprietary code.

Self-Service, Consumable, Cognitive tools.

Frustrated with IT departments not being able to respond fast enough to the needs of the business, departments sought a “platform” that would allow them to perform “self-service” analytics without having to be die-hard data scientists, engineers, or developers.

The IBM Data Science Experience (DSX) emerged as a tool that helps abstract complexity and unify all aspects of the data science disciplines, regardless of technical ability, allowing a single user or multiple personas to collaborate on data science initiatives in the cloud, locally (on-prem), or while disconnected from the office (desktop). Whether you prefer your favorite Jupyter notebook, R Studio, Python, or Spark, or a rich graphical UI that gives advanced users all the tools they need – while cognitively guiding inexperienced users through a step-by-step process of building, training, testing, and deploying a model – DSX helps unify these aspects into an end-to-end experience.
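The build-train-test-deploy loop that DSX guides users through can be sketched in a few lines. This is a minimal pure-Python illustration of the lifecycle stages only (the data, model form, and variable names are invented for the example – DSX itself works with notebooks, Spark, R Studio, and far richer models):

```python
# Build: choose a model form - here a simple linear regression y = a*x + b.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 4.0, 6.2, 7.9]  # roughly y = 2x

# Train: fit slope a and intercept b with closed-form least squares.
n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
    / sum((x - mean_x) ** 2 for x in xs)
b = mean_y - a * mean_x

# Deploy: expose the trained model as a callable scoring function.
def predict(x):
    return a * x + b

# Test: evaluate error on a held-out data point.
test_x, test_y = 5.0, 10.1
error = abs(predict(test_x) - test_y)

print(round(a, 2), round(error, 2))
```

In DSX the same stages are surfaced either as notebook code like this or as guided steps in the graphical UI, with deployment producing a scoring endpoint rather than a local function.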

Figure #1: Data Science Experience – Making data simple and accessible to all.

Enterprise Ready.

A lot needs to happen for machine learning to be enterprise ready and robust enough to withstand business-critical situations. Through DSX (see figure #1), advanced machine learning capabilities, statistical methods, and rich visualizations such as Brunel are available. Sophisticated capabilities such as automated data cleansing help ensure models are executing against trusted data. Deciding which parts of the data set are key to the predictive model (feature selection) can be a difficult task; fortunately, this capability is automated as part of the machine learning process within DSX. An issue many data scientists face is the potential for predictive models to be impacted by rogue data or sudden changes in the marketplace. IBM machine learning helps address this by keeping the model in its optimal state through a continuous feedback loop that can fine-tune the model’s parameters without taking it offline. This allows the model to sense and respond to each interaction (at a level of granularity defined by policy) without any human intervention.
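To make the idea of feature selection concrete, here is a deliberately simple pure-Python sketch that ranks candidate features by their absolute correlation with the target. The toy data and the correlation heuristic are illustrative assumptions only – DSX’s automated feature selection uses more sophisticated methods than a single Pearson score:

```python
import math

# Toy data set: three candidate features (f0, f1, f2) and a target per row.
rows = [
    (1.0, 5.0, 0.3, 2.0),
    (2.0, 4.8, 0.1, 4.1),
    (3.0, 5.2, 0.4, 6.0),
    (4.0, 4.9, 0.2, 8.2),
]

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

target = [r[-1] for r in rows]
# Score each feature by how strongly it tracks the target.
scores = {i: abs(pearson([r[i] for r in rows], target)) for i in range(3)}
best = max(scores, key=scores.get)
print(best)  # 0 - feature f0 tracks the target almost perfectly
```

A model trained only on the highest-scoring features sees less noise, which is exactly why automating this step matters for keeping models executing against trusted data.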

A Knowledge Universe – Unleashing Cognitive Insights on Hadoop Data Lakes – with Power.

The potential of integrating the richness of DSX and its cognitive ML capabilities with all the data residing in HDFS (as well as many other data sources outside of Hadoop) is an exciting proposition for the data science community. It could help unlock deeper insights, increasing an organization’s knowledge about itself, the market, products, competitors, customers, and sentiment at scale, at speeds approaching real time. One of the key features delivered as part of Hadoop 2.0 was YARN (Yet Another Resource Negotiator), which manages the resources involved when queries are submitted to a Hadoop cluster far more efficiently than earlier versions of Hadoop could – ideal for managing ever-increasing cognitive workloads.

Simply put, I cannot think of a better opportunity than now for organizations to leverage their Hadoop investments. The combination of Hadoop-based technologies integrated with IBM ML and DSX unleashes cognitive insights to a very large Hadoop install base.

All very promising so far – but there is one more nugget that will help organizations with their cognitive workloads. IBM just announced HDF 3.0 for IBM Power Systems, bringing the built-for-big-data performance and efficiency of Power Systems with POWER8 to the edge of the data platform for streaming analytics applications. This solution joins the recently launched HDP for Power Systems, which offers a 2.4X price-performance advantage [1] over x86-based deployments.

I’m excited at the possibilities that lie ahead – how data scientists and machine learning experts might leverage and benefit from our offerings and the integration with Hadoop infrastructures – how they might take it to the next level in ways we’ve not yet imagined as we continue to enrich our offerings with more capabilities.

For more information on how to get started with machine learning, visit datascience.ibm.com.

 

Dinesh Nirmal – VP Analytics Development.  

Follow me on Twitter @DineshNirmalIBM

 


 

IBM, the IBM logo, ibm.com, IBM Elastic Storage Server, IBM Spectrum Scale, POWER8 and Power Systems are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. If these and other IBM trademarked terms are marked on their first occurrence in this information with a trademark symbol (® or TM), these symbols indicate U.S. registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the web at “Copyright and trademark information” at http://www.ibm.com/legal/copytrade.shtml.

Apache Spark, Apache Hadoop, HDFS, Spark, Apache, Hadoop and the Spark, Hadoop logos are trademarks of The Apache Software Foundation.

Other company, product or service names may be trademarks or service marks of others.

1 – Based on IBM internal testing of 10 queries (simple, medium, complex) with varying run times, running against a 10TB DB on 10 IBM Power Systems S822LC for Big Data servers (20 C/40 T), 256GB memory, HDP 2.5.3, compared to published Hortonworks results based on the same 10 queries running on 10 AWS d2.8xlarge EC2 nodes (Intel Xeon E5-2676 v3), HDP 2.5. Individual results may vary based on workload size and other conditions. Data as of April 20, 2017; pricing is based on web prices for the Power Systems S822LC for Big Data (https://ibm.biz/BdiuBC) and the HP DL380 (Intel Xeon, 20 C/40 T, 2 X E5-2630 v4; 256 GB) found at marketplace.hpe.com

Martyna Kuhlmann – DB2 Regression Tester and Artist.

All tech companies draw international talent, but arguably none more so than IBM. Our Analytics Development team alone has labs in Canada, Germany, India, Japan, China, and the US. It’s fascinating for me to hear first-hand the different paths you took to IBM; we are richer for your talent, for your different ways of thinking, and for what you bring to the team. From Martyna, I learned what it’s like to live in a tiny village in Poland, steeped in the intimacy of village life and in Poland’s tradition of excellence in mathematics.

You came to IBM as an intern from the University of Saskatchewan. What was it like, moving from the prairies of Canada to the big city?

Leaving my village in Poland was much harder than moving from Saskatoon to Toronto. I grew up in a village of just 500 people, so when I left at 19 years old I was saying goodbye to many people I’d known since I was born: friends and family, but also the people in the shops, the personae of village life. In a small village, people are always helping each other; you know everyone. Now I live in an apartment building in Markham and I don’t know the people living next-door!


What was it that led you to choose computer science as a field of study?

Both my parents had studied math, so that was the family business, you could say, and my plan going into university – that lasted about eight months, by which time I’d had enough of calculus to last a lifetime. I took one computer science class and absolutely fell in love. After three years I interviewed at IBM and I can now admit that I was extremely stressed about it! I hardly slept the night before. I wanted so badly to work for IBM. I got the job, and it was a dream come true. My parents, on the other hand, were supportive but disappointed, in a minor key, not by IBM but by my choice of field: I was the last hope in the family to carry on their math legacy (my sisters, most rebelliously, studied psychology and neuroscience), so when I told my father his comment was, “Well, better computer science than statistics.”

You are practically a black sheep! Why was computer science love at first sight – or first class – for you?

Because it’s not about solving the equation. It’s about creating universal solutions, and I found coming up with innovative ways to solve a problem intensely rewarding. Part of it is the immediate value. I have a problem, I write a program, problem solved. It takes longer to write a big piece of software, but each function is a tangible step towards your goal. In math, you come up with a theorem and it takes months to prove. You could have an idea and start exploring it just to realize a week later that you’ve hit a dead end and you have to start all over again.


The potential for real-world application is also satisfying. My parents’ work might find its applications in 50, 100, or 200 years — that doesn’t mean it’s less valuable, but since I’m a very impatient person I wouldn’t be able to do work with gratification delayed beyond my lifetime. I need to see the impact of my work right away.

What are you working on now?

I work on the infrastructure team, maintaining an environment for testing. Some of the work is fascinating, and some is not especially glamorous, but at IBM you always have a bonus factor: the people are wonderful. That’s what really makes the difference in my work – solving problems with very smart people from all over the world, who also happen to be incredibly nice to be around.

You’ve perhaps found some of the sense of community of your home village — or fostered it, here at IBM. What do you like to do outside of work?

I find myself drawn to the most tedious hobbies imaginable. I picked up painting recently, and I sew … painting was an antidote to being on the computer for hours and hours, a way to rest my eyes and my mind and to be creative using a completely different part of myself. I’ll sit in front of a canvas for hours painting, I find it deeply relaxing. I need to be completely clear here: I have no talent, none. I paint because I love it!

I also have a cat, named ‘Data’ – she has chewed through 23 computer cables in my home. I’m counting, because I actually find it kind of impressive – her focused dedication to cable destruction.


A lot of the people I interview seem to like food – Sebastian Muszytowski, one of our Python experts in Boeblingen, loves to bake, and Phu Truong knows all the best places to eat out in San Jose.

Not me. I eat once a day, at 6 pm. But I love coffee.

What do you see yourself doing in the future? What excites you?

That’s a tough question! I suppose it’s having an impact. Right now, there is a lot of potential on the infrastructure team – we’re planning to create some regression tools and leverage automation, which is all very exciting! But the minute I notice there are not as many areas to improve, I will look for another role at IBM.

Yes, it’s important that we use our own machine learning technology, so that’s wonderful.

Absolutely, and the potential is so exciting. Sometimes I’m working on code developed years ago, so I can’t just stroll over to the creator and ask about it, but if we can create self-healing technology, imagine the possibilities!

 


 

Home town: Księży Las, Poland
Currently working on: DB2 regression infrastructure and tooling
Favorite programming language: Prolog
Top 5 painting inspirations:
1) A busy day at work (art relaxes me)
2) Struggling with a problem (engaging the right hemisphere can work miracles!)
3) Cool characters from games and movies
4) Art on DeviantArt
5) Lack of internet connection
Dinesh Nirmal

Vice President Analytics Development.

Follow me on Twitter @dineshnirmalIBM