Desmond Chan

Artificial Intelligence Is Already Deep Inside Your Wallet – Here’s How

Artificial intelligence (AI) is the key for financial service companies and banks to stay ahead of the ever-shifting digital landscape, especially given competition from Google, Apple, Facebook, Amazon and others moving strategically into fintech. AI startups are building data products that not only automate the ingestion of vast amounts of data, but also provide predictive and actionable insights into how people spend and save across digital channels. Financial companies are now the biggest acquirers of such data products, as they can leverage the massive data sets they sit upon to achieve higher profitability, productivity, and operational excellence. Here are five ways financial service companies are embracing AI today to go even deeper inside your wallet.

Your Bank Knows More About You Than Facebook
Banks and financial service companies today live or die by their ability to differentiate their offering and meet the unique needs of their customers in real time. Retention is key, and artificial intelligence is already disrupting what it means for financial service companies to “know the customer.” Google, Facebook, Twitter, and other walled gardens already deeply understand this, which is why they are so keen to collect massive amounts of data on their users, even if they don’t have fintech infrastructure yet.

So how does your bank know more about you than Facebook? Using AI platforms, banks can bridge customer data across multiple accounts – including bank, credit, loans, social media profiles, and more – to build a 360-degree view of the customer. Once they have this, predictive applications suggest in real time the “next best” offer to keep the person happy based on their spending, risk tolerance, investment history, and debt. For example, based on one transaction – a mortgage – financial companies use AI to recommend a checking account to pay for the mortgage, credit cards to buy furniture, home insurance, or even mutual funds that are focused on real estate. Financial services companies can now also predict customer satisfaction and dissatisfaction, allowing them to intercept churn by offering exclusive deals or promotions before the customer becomes dissatisfied.

Credit “Risk” Is Becoming Competitive Opportunity
A limited amount of data is used for credit risk scoring today, and it’s heavily weighted toward existing credit history, length of credit usage, and payment history. Naturally, this results in many qualified customers – or anyone trying to access credit for the first time – being rejected for loans, credit cards and more. Credit card companies, including Amazon, are realizing there is a big revenue opportunity that is missed by the current credit assessment system. With AI, employment history data, social media data, and shopping and purchasing patterns are used to build a 360-degree view of the credit “opportunity” as opposed to pure risk. Even better, AI data products can provide real-time updates of credit scores based on recent employment status changes or transactions, so that your credit score is not a fixed number but something that evolves. With this capability, banks and financial services companies are finding overlooked or untapped credit opportunities that even the most sophisticated tech company is missing.

Predict the Next DDoS Attack
The distributed denial-of-service (DDoS) attack against Dyn in October brought to the public forefront the scale and severity of cyber attacks. In the financial realm, security breaches and cyber attacks are not only costly, but also have a damaging impact on brand trust and customer loyalty. Experts and analysts agree that such DDoS attacks will become more prevalent in the future, in part because current cybersecurity practices are built upon rules-based systems and require a lot of human intervention. Many of the current cybersecurity solutions on the market are focused on detection rather than prevention. They can tell you an attack is happening, but not how to predict one or what to do once it’s discovered.

Leveraging AI platforms, banks, credit card companies, and financial service providers are beginning to predict and prevent such cyber attacks with far greater precision than what’s in use today. Using traffic pattern analysis and traffic pattern prediction, AI data products inspect financial traffic in real time and identify threats based on previous sessions. Effectively, this means that a financial company can shut down harmful connections before they compromise the entire website or server. Importantly, as more data is ingested, the AI data product evolves and gets smarter as hackers change their methodology. This takes the notion of prevention to a whole new level, as it anticipates the bad actors’ next move.

Putting an End to Money Laundering
The estimated amount of money laundered globally in one year is 2 to 5 percent of global GDP, or upwards of US$2 trillion. Efforts to combat money laundering are never-ending, as criminals find new ways to stay ahead of law enforcement and technology. Customer activity monitoring is currently done through rules-based filtering, in which rigid and inflexible rules are used to determine whether something is suspicious. This system not only creates major loopholes and many false positives, but also wastes investigators’ time and increases operational costs. AI platforms can now find patterns that static, rule-based thresholds do not detect, and continuously learn and adapt with new data. With fewer false positives, investigators can focus on genuine anti-money laundering work, creating a more efficient and accurate solution while reducing operational costs. Suspicious activity reports are finally living up to their name by documenting truly suspicious behavior as opposed to random red flags from a rules-based system.

Biometrics-Based Fraud Detection
Fraudulent credit card activity is one area where artificial intelligence has made great progress in detection and prevention. But there are other interesting applications that are strengthening financial services companies’ overall value proposition. Account origination fraud – where fraudsters open fake accounts using stolen or made-up information – more than doubled in 2015. That’s because there’s no way to prove with absolute certainty that the person on the mobile device is who they say they are. AI technologies are being developed to compare a variety of biometric indicators – such as facial features, iris, fingerprints, and voice – allowing banks and financial service companies to confirm a user’s identity in far more secure ways than just a PIN or password. Mastercard, for example, unveiled facial recognition “security checks” for purchases made on mobile phones. Given its potential to protect users’ identities from being stolen or abused, biometrics in the context of banking and financial services may face fewer regulatory hurdles than practices undertaken by Facebook and Google, both of which have faced class action lawsuits. This is allowing financial services to move much faster in the field of biometrics.

Beyond the Wallet
The tech giants are in an arms race to acquire as many AI and machine learning startups as possible. But the one thing they don’t have yet, and financial services companies do, is massive amounts of financial data. Until now, financial services companies required a tremendous amount of experience and human judgment to analyze this financial data and provide cost-effective, competitive products and services. However, by adopting “out-of-the-box” AI data products that can ingest huge amounts of data, banks and financial services companies are generating valuable predictions and insights in real time that drive revenue and reduce inefficiencies. The five applications above are not simply isolated use cases, but bellwethers of how intimately AI will be tied to enterprise-level financial strategy.

Source: paymentsjournal.com

IoT – Take Charge of Your Business and IT Insights Starting at the Edge

Instead of just being hype, the Internet of Things (IoT) is now becoming a reality. Gartner forecasts that 6.4 billion connected devices will be in use worldwide in 2016, with 5.5 million new devices getting connected every day. These devices range from wearables, to sensors in vehicles that can detect surrounding obstacles, to sensors in pipelines that detect their own wear and tear. Huge volumes of data are collected from these connected devices, and yet companies struggle to get optimal business and IT outcomes from them.

Why is this the case?
Rule-based data models limit insights. Industry experts have a wealth of knowledge that is manually encoded as business rules, which in turn drive the data models. Many current IoT practices simply run large volumes of data through these rule-based models, but the business insights are limited by what rule-based models allow. Machine Learning/Artificial Intelligence allows new patterns to be found within stored data without human intervention. These new patterns can be applied to data models, allowing new insights to be generated for better business results.
Analytics in the backend data center delay insights. In current IoT practice, data is collected and analyzed in the backend data center (e.g. OLAP/MPP databases, Hadoop, etc.). Data models are typically large and hard to deploy at the edge because IoT edge devices have limited computing resources. The trade-off is that large amounts of data travel long distances, unfiltered and unanalyzed, until the backend systems have the bandwidth to process them. This defeats the spirit of getting business insights with agility in real time, not to mention the high cost of data transfer and ingestion in the backend data center.
Lack of security measures at the edge reduces the accuracy of insights. Current IoT practice also secures only the backend, while security threats can be injected from edge devices. Data can be altered and viruses can be injected during the long period of data transfer. How accurate can the insights be when data integrity is not preserved?

The good news is that H2O can help with:
Pattern-based models. H2O detects patterns in the data with distributed Machine Learning algorithms, instead of depending on pre-established rules. It has been proven in many use cases that H2O’s AI engine can find dozens more patterns than humans are able to discover. Patterns can also change over time and H2O models can be continuously retrained to yield more and better insights.
Fast and easy model deployment with small footprint. The H2O Open Source Machine Learning Platform creates data models, with a minimal footprint, that can score events and make predictions in nanoseconds. The data models are Java-based and can be deployed anywhere with a JVM, or even as a web service. Models can easily be deployed at the IoT edge to yield real-time business and IT insights (a minimal scoring sketch follows this list).
Enabling security measures at the edge. AI is particularly adept at finding and establishing patterns, especially when it’s fed huge amounts of data. Security loopholes and threats take on new forms all the time. H2O models can easily adapt as data show new patterns of security threats. Deploying these adaptive models at the edge means that threats can be blocked early on, before they’re able to cause damage throughout the system.
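
To make the “deploy anywhere with a JVM” point concrete, here is a minimal sketch of scoring a single event at the edge with a model exported from H2O as a Java POJO, using the h2o-genmodel scoring library. The generated class name and the feature names below are hypothetical placeholders for whatever your own export produces:

    import hex.genmodel.GenModel
    import hex.genmodel.easy.{EasyPredictModelWrapper, RowData}

    // Instantiate the POJO class that H2O generated for the trained model
    // (the class name is a placeholder from a hypothetical export).
    val rawModel = Class.forName("gbm_edge_model").newInstance().asInstanceOf[GenModel]
    val model = new EasyPredictModelWrapper(rawModel)

    // Build one sensor event; the feature names are illustrative only.
    val row = new RowData()
    row.put("temperature", Double.box(78.4))
    row.put("vibration", Double.box(0.12))

    // Score locally on the edge device's JVM, with no round trip to the data center.
    val prediction = model.predictBinomial(row)
    println(s"label = ${prediction.label}, probabilities = ${prediction.classProbabilities.mkString(", ")}")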

There are many advantages to enabling analytics at the IoT edge, and H2O can be a crucial part of this endeavor. Many industry experts are already moving in this direction. What are you waiting for?

Spam Detection with Sparkling Water and Spark Machine Learning Pipelines

This short post presents the “ham or spam” demo, which Michal Malohlava posted earlier, now using our new API in the latest Sparkling Water for Spark 1.6 and earlier versions, which unifies Spark and H2O Machine Learning pipelines. It shows how to create a simple Spark Machine Learning pipeline and a model based on the fitted pipeline, which can later be used to predict whether a particular message is spam or not.

Before diving into the demo steps, we would like to provide some details about the new features in the upcoming Sparkling Water 2.0:

  • Support for Apache Spark 2.0 and backwards compatibility with all previous versions.
  • The ability to run Apache Spark and Scala through H2O’s Flow UI.
  • H2O feature improvements and visualizations for MLlib algorithms, including the ability to score feature importance.
  • Visual intelligence for Apache Spark.
  • The ability to build Ensembles using H2O plus MLlib algorithms.
  • The power to export MLlib models as POJOs (Plain Old Java Objects), which can be easily run on commodity hardware.
  • A toolchain for ML pipelines.
  • Debugging support for Spark pipelines.
  • Model and data governance through Steam.
  • Bringing H2O’s powerful data munging capabilities to Apache Spark.
In order to run the code below, start your Spark shell with the Sparkling Water JAR attached, or use the sparkling-shell script, which already does this for you.

    You can start the Spark shell with Sparkling Water as follows:

    $SPARK_HOME/bin/spark-submit \
    --class water.SparklingWaterDriver \
    --packages ai.h2o:sparkling-water-examples_2.10:1.6.5 \
    --executor-memory=6g \
    --driver-memory=6g /dev/null
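
    Alternatively, the sparkling-shell script mentioned above launches an interactive Spark shell with the Sparkling Water JAR already attached. A minimal sketch, assuming you run it from the Sparkling Water distribution directory (the memory settings are illustrative):

    bin/sparkling-shell \
    --conf "spark.executor.memory=6g" \
    --conf "spark.driver.memory=6g"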
    

    The preferred versions are Spark 1.6 and Sparkling Water 1.6.x.

    Prepare the coding environment

    Here we just import all required libraries.

    import org.apache.spark.SparkFiles
    import org.apache.spark.ml.PipelineModel
    import org.apache.spark.ml.feature._
    import org.apache.spark.ml.h2o.H2OPipeline
    import org.apache.spark.ml.h2o.features.{ColRemover, DatasetSplitter}
    import org.apache.spark.ml.h2o.models.H2ODeepLearning
    import org.apache.spark.sql.types.{StringType, StructField, StructType}
    import org.apache.spark.sql.{DataFrame, Row, SQLContext}
    import water.support.SparkContextSupport
    import water.fvec.H2OFrame
    

    Add our dataset to the Spark environment. The dataset consists of two columns, where the first one is the label (ham or spam) and the second one is the message itself. We don’t have to explicitly ask for a Spark context, since it’s already available via the sc variable.

    val smsDataFileName = "smsData.txt"
    val smsDataFilePath = "examples/smalldata/" + smsDataFileName
    SparkContextSupport.addFiles(sc, smsDataFilePath)
    

    Create SQL support.

    implicit val sqlContext = SQLContext.getOrCreate(sc)
    

    Start H2O services.

    import org.apache.spark.h2o._
    implicit val h2oContext = H2OContext.getOrCreate(sc)
    

    Create a helper method which loads the dataset, performs some basic filtering, and finally creates a Spark DataFrame with two columns – label and text.

    def load(dataFile: String)(implicit sqlContext: SQLContext): DataFrame = {
      val smsSchema = StructType(Array(
        StructField("label", StringType, nullable = false),
        StructField("text", StringType, nullable = false)))
      val rowRDD = sc.textFile(SparkFiles.get(dataFile)).
        map(_.split("\t")).
        filter(r => !r(0).isEmpty).
        map(p => Row(p(0), p(1)))
      sqlContext.createDataFrame(rowRDD, smsSchema)
    }
    

    Define the pipeline stages

    In Spark, a pipeline is formed from two basic types of elements – transformers and estimators. Estimators usually encapsulate an algorithm for model generation, and their output is a transformer. When the pipeline is fitted, all transformers and estimators are executed, and the estimators are converted into transformers. The model generated by the pipeline contains only transformers. More about Spark pipelines can be found in Spark’s pipeline overview.
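
    As a quick standalone sketch of this contract (separate from the demo pipeline below, and assuming a DataFrame named training that already contains the features and label columns LogisticRegression expects), fitting an estimator returns a model, and that model is itself a transformer:

    import org.apache.spark.ml.Transformer
    import org.apache.spark.ml.classification.LogisticRegression

    // An Estimator: fit() learns from the data and returns a Transformer (the fitted model).
    val lr = new LogisticRegression().setMaxIter(10)
    val lrModel: Transformer = lr.fit(training)

    // The fitted model is a plain Transformer: transform() just maps one DataFrame to another.
    val predictions = lrModel.transform(training)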

    In H2O we created a new type of pipeline stage called OneTimeTransformer. This transformer works similarly to Spark’s estimator in that it is only executed while the pipeline is being fitted. Unlike an estimator, however, it does not produce a transformer, and the model generated by the pipeline does not contain the OneTimeTransformer.
    An example of a one-time transformer is splitting the input data into a validation and a training dataset using H2O Frames. We don’t need this one-time transformer to be executed every time we make a prediction with the model; we only need it to run while we are fitting the pipeline to the data.

    This pipeline stage uses Spark’s RegexTokenizer to tokenize the messages. We just specify the input column and the output column for the tokenized messages.

    val tokenizer = new RegexTokenizer().
        setInputCol("text").
        setOutputCol("words").
        setMinTokenLength(3).
        setGaps(false).
        setPattern("[a-zA-Z]+")
    

    Remove unnecessary words using Spark’s StopWordsRemover.

    val stopWordsRemover = new StopWordsRemover().
        setInputCol(tokenizer.getOutputCol).
        setOutputCol("filtered").
        setStopWords(Array("the", "a", "", "in", "on", "at", "as", "not", "for")).
        setCaseSensitive(false)
    

    Vectorize the words using Spark’s HashingTF.

    val hashingTF = new HashingTF().
        setNumFeatures(1 << 10).
        setInputCol(tokenizer.getOutputCol).
        setOutputCol("wordToIndex")
    

    Create inverse document frequencies based on the hashed words. This produces a numerical representation of how much information a given word provides across the whole collection of messages; Spark’s IDF computes idf(t) = log((numDocs + 1) / (docFreq(t) + 1)).

    val idf = new IDF().
        setMinDocFreq(4).
        setInputCol(hashingTF.getOutputCol).
        setOutputCol("tf_idf")
    

    This pipeline stage is a one-time transformer. If setKeep(true) is called, it preserves the specified columns instead of deleting them.

    val colRemover = new ColRemover().
        setKeep(true).
        setColumns(Array[String]("label", "tf_idf"))
    

    Split the dataset and store the splits under the specified keys in H2O’s distributed key-value store (DKV). This is a one-time transformer which is executed only during the fitting stage. It determines the frame that is passed on to the output in the following order:

    1. If the train key is specified using the setTrainKey method and the key is also present in the list of keys, then the frame with this key is passed on to the output.
    2. Otherwise, if the default key “train.hex” is specified in the list of keys, then the frame with this key is passed on to the output.
    3. Otherwise, the first frame specified in the list of keys is passed on to the output.

    val splitter = new DatasetSplitter().
      setKeys(Array[String]("train.hex", "valid.hex")).
      setRatios(Array[Double](0.8)).
      setTrainKey("train.hex")
    

    Create H2O’s deep learning model.
    If the key specifying the training set is set using setTrainKey, then the frame with this key is used as the training frame; otherwise, the frame from the previous stage is used as the training frame.

    val dl = new H2ODeepLearning().
      setEpochs(10).
      setL1(0.001).
      setL2(0.0).
      setHidden(Array[Int](200, 200)).
      setValidKey(splitter.getKeys(1)).
      setResponseColumn("label")
    

    Create and fit the pipeline

    Create the pipeline using the stages we defined earlier. Like a normal Spark pipeline, it can be formed from Spark’s transformers and estimators, but it may also contain H2O’s one-time transformers.

    val pipeline = new H2OPipeline().
      setStages(Array(tokenizer, stopWordsRemover, hashingTF, idf, colRemover, splitter, dl))
    

    Train the pipeline model by fitting it to a Spark DataFrame.

    val data = load("smsData.txt")
    val model = pipeline.fit(data)
    

    Now we can optionally save the model to disk and load it again.

    model.write.overwrite().save("/tmp/hamOrSpamPipeline")
    val loadedModel = PipelineModel.load("/tmp/hamOrSpamPipeline")
    

    We can also save this unfitted pipeline to disk and load it again.

    pipeline.write.overwrite().save("/tmp/unfit-hamOrSpamPipeline")
    val loadedPipeline = H2OPipeline.load("/tmp/unfit-hamOrSpamPipeline")
    

    Train the pipeline model again on the loaded pipeline, just to show that the deserialized pipeline works as it should.

    val modelOfLoadedPipeline = loadedPipeline.fit(data)
    

    Create a helper function for predictions on unlabeled data. This method uses the model generated by the pipeline. To make a prediction, we call the transform method on the generated model with a Spark DataFrame as an argument. This call executes each transformer specified in the pipeline one after another, producing a Spark DataFrame with predictions.

    def isSpam(smsText: String,
               model: PipelineModel,
               h2oContext: H2OContext,
               hamThreshold: Double = 0.5):Boolean = {
      import h2oContext.implicits._
      val smsTextDF = sc.parallelize(Seq(smsText)).toDF("text") // convert to dataframe with one column named "text"
      val prediction: H2OFrame = model.transform(smsTextDF)
      prediction.vecs()(1).at(0) < hamThreshold
    }
    

    Try it!

    println(isSpam("Michal, h2oworld party tonight in MV?", modelOfLoadedPipeline, h2oContext))
    println(isSpam("We tried to contact you re your reply to our offer of a Video Handset? 750 anytime any networks mins? UNLIMITED TEXT?", loadedModel, h2oContext))
    

    In this article we showed how Spark pipelines and H2O algorithms work together seamlessly in the Spark environment. At H2O.ai, we strive to be consistent with the Spark API and to make the life of a developer/data scientist easier by hiding H2O internals and exposing APIs that feel natural to Spark users.