Moving from the Lab to the Mainstream - Microsoft's Machine-Learning Tool

Reinforcement Learning systems are ready to make the jump from the research bench to real-world applications.

Wondering why you are still seeing advertisements for holidays in your browser during a global pandemic? Most companies that place ads use traditional Machine Learning (ML) models that make predictions based on past behavior. When the temperature drops, these models suggest a trip to some far-flung destination, because that has worked in the past. The same ads are therefore placed even when consumers are unlikely to be going anywhere due to lockdown restrictions.

The issue arises because traditional ML systems rely heavily on past experience and cannot take changing circumstances into account; they adapt to new stimuli only if they are specifically retrained to do so. That could be about to change, however.

Microsoft is introducing Reinforcement Learning (RL) systems to developers, in which services 'learn' more like human beings, rather than being trained with a fixed body of data.

Just like a human being, an RL system needs feedback to learn. It works by trial and error, with its decision processes improved by reinforcement and rewards. The system balances exploiting what it has already learned against exploring new options that might earn greater rewards¹.
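The trade-off between exploiting and exploring can be sketched with an epsilon-greedy learner, one of the simplest strategies described in Sutton and Barto¹. This is an illustration only, not Microsoft's method: the function name, the click rates, and the parameter values are all made up for the example.

```python
import random

def run_epsilon_greedy(click_rates, epsilon=0.1, steps=5000, seed=0):
    """Simulate an epsilon-greedy learner over options with hidden click rates."""
    rng = random.Random(seed)
    counts = [0] * len(click_rates)    # how often each option was shown
    values = [0.0] * len(click_rates)  # running average reward per option
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: show a random option to gather fresh feedback.
            arm = rng.randrange(len(click_rates))
        else:
            # Exploit: show the option with the best estimate so far.
            arm = max(range(len(click_rates)), key=lambda i: values[i])
        # Feedback: reward 1 for a simulated click, 0 otherwise.
        reward = 1.0 if rng.random() < click_rates[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the average reward for this option.
        values[arm] += (reward - values[arm]) / counts[arm]
    return counts, values

# Three hypothetical ads with click rates the learner cannot see directly.
counts, values = run_epsilon_greedy([0.02, 0.05, 0.10])
print(counts, [round(v, 3) for v in values])
```

With `epsilon` set to zero the learner never explores and can lock on to whichever option happened to be rewarded first; a small non-zero value keeps it sampling the alternatives.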

One of the obstacles to adopting RL systems has always been that they require large amounts of data to function. But Microsoft says its new systems can operate on data sets smaller than those used in traditional ML.

Micro-Choices and Minecraft

The reliance on feedback to learn means that RL systems are well suited to making small choices in repeated situations, adjusting behavior and learning from the results.

The foundational work in developing RL can be traced back as far as 1992², and in recent years it has been used to teach computers how to play video games such as Minecraft that have in-built rewards. Now, techniques developed by Microsoft are emerging from the lab and moving into production.

Perhaps the most significant embodiment of this development is Personalizer, part of Azure Cognitive Services on the Azure AI Platform³. The service was originally devised in conjunction with MSN and Bing to personalize the news received by users. 

Personalizer's Application Programming Interface (API), software that allows applications to communicate with each other, prioritizes relevant content and layouts. This results in a more suitable outcome for a user every time they use an app, rather than delivering the same set of options.

This customization of options delivered an impressive boost in 'clicks' on MSN and increased engagement with products on the Microsoft homepage. It can be applied by any website that wants to adopt the API.
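The general pattern behind this kind of personalization is a loop: pick an option for the current context, then feed back a score for how the user responded. The sketch below illustrates that loop with a toy contextual bandit. It is not the Personalizer API; the class `TinyRanker`, its `rank` and `reward` methods, and the contexts and actions are hypothetical names chosen for the example.

```python
import random
from collections import defaultdict

class TinyRanker:
    """Hypothetical toy contextual bandit; NOT the Personalizer API."""

    def __init__(self, epsilon=0.2, seed=0):
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = defaultdict(int)    # feedback count per (context, action)
        self.values = defaultdict(float)  # average score per (context, action)

    def rank(self, context, actions):
        """Return the action to show for this context."""
        if self.rng.random() < self.epsilon:
            # Occasionally try something other than the current favorite.
            return self.rng.choice(actions)
        return max(actions, key=lambda a: self.values[(context, a)])

    def reward(self, context, action, score):
        """Feed back a score in [0, 1] for the action that was shown."""
        key = (context, action)
        self.counts[key] += 1
        self.values[key] += (score - self.values[key]) / self.counts[key]

# epsilon=0.0 turns exploration off so this short demo is deterministic.
ranker = TinyRanker(epsilon=0.0)
ranker.reward("morning", "news", 1.0)    # user clicked the news tile
ranker.reward("morning", "sports", 0.0)  # user ignored the sports tile
print(ranker.rank("morning", ["sports", "news"]))  # prints "news"
```

Because the estimates are kept per context, the same learner can favor different content for different situations instead of showing everyone the same set of options.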

Problem Solving and 'Nudging' User Habits

As well as providing personalized options for a user, RL systems can also make decisions in a range of other areas, solving problems on the spot. One example that the Microsoft research team is working on is a system that can assess a virtual machine (VM) encountering problems and decide whether to fix the issues or simply reboot it.

Microsoft's Anomaly Detector depends on RL and is already being used to scan Windows, Office, Bing, and over 200 other Microsoft products for spikes and dips in activity as well as unexpected changes in trends. Building upon this is the new Metrics Advisor, another RL-based system that will delve even deeper than Anomaly Detector, gathering more data and suggesting courses of action.

This could be of immense use to current applications by adding features to monitor business metrics, automate computer operations, and predict when maintenance will have to be performed. 

RL systems could also go further than making their own decisions, even influencing the choices users make. This has already been demonstrated by getting users to select a new app or game from the Microsoft or Xbox Live homepage, but it could have more positive effects too.

The Microsoft team is looking at ways that RL could 'nudge' users into adopting healthy behavior. For example, a chatbot could encourage users to walk more by displaying different messages.

Applying Reinforcement Learning to the 'Real World'

Just as feedback is crucial for learning in RL systems, it is also vital for the researchers seeking to develop them. Whilst RL is refined enough to be employed by Microsoft in certain applications, work still needs to be done when it comes to 'real world' circumstances.

The Microsoft team points out that the key to this adaptation is getting the reinforcements and 'rewards' right. An important element of this is eliminating bias, which is not an easy task in RL. In ML, bias can be seen fairly easily in the data sets 'fed' to a system by an operator, but bias in RL is much subtler.

Much of this could be related to teaching RL systems to distinguish between short-term and long-term rewards. As an example, filling Bing with a huge volume of ads could provide short-term rewards in the form of more click-throughs. Ultimately, however, the long-term effect would likely be increased use of ad-blockers or even the adoption of another search engine.
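One standard way to express this trade-off, described in Sutton and Barto¹, is the discounted return: future rewards are summed, each weighted by a discount factor raised to the number of steps into the future. The sketch below uses invented reward figures purely for illustration; they are not Bing data.

```python
def discounted_return(rewards, gamma):
    """Sum of rewards, each weighted by gamma ** (steps into the future)."""
    return sum(r * gamma ** t for t, r in enumerate(rewards))

# Made-up reward streams over five periods (illustrative numbers only).
heavy_ads = [10, 6, 3, 1, 0]  # big early click-throughs, then users drift away
light_ads = [5, 5, 5, 5, 5]   # smaller but steady engagement

for gamma in (0.0, 0.9):
    print(gamma,
          round(discounted_return(heavy_ads, gamma), 3),
          round(discounted_return(light_ads, gamma), 3))
```

With a discount factor of 0 only the immediate reward counts, so the heavy-ads strategy looks better. With a factor of 0.9 the later periods carry weight and the steady strategy comes out ahead.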

Thus, whilst it's easy to teach an RL system with immediate rewards, it is important to adapt such systems to work with delayed rewards and to comprehend how the environment they occupy operates. Doing this could prime RL systems for use in various forms of recognition software and in personalizing interfaces.

Ultimately, this could require teaching RL systems a modicum of imagination.


  1. Sutton, R. S., Barto, A. G. [2018], 'Reinforcement Learning: An Introduction,' MIT Press.
  2. Williams, R. J. [1992], 'Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning,' Machine Learning.
  3. Azure AI.

Written by

Robert Lea

Robert is a Freelance Science Journalist with a STEM BSc. He specializes in Physics, Space, Astronomy, Astrophysics, Quantum Physics, and SciComm. Robert is an ABSW member, and a WCSJ 2019 and IOP Fellow.


Please use one of the following formats to cite this article in your essay, paper or report:

  • APA

    Lea, Robert. (2021, February 02). Moving from the Lab to the Mainstream - Microsoft's Machine-Learning Tool. AZoRobotics. Retrieved on June 23, 2024 from

  • MLA

    Lea, Robert. "Moving from the Lab to the Mainstream - Microsoft's Machine-Learning Tool". AZoRobotics. 23 June 2024. <>.

  • Chicago

    Lea, Robert. "Moving from the Lab to the Mainstream - Microsoft's Machine-Learning Tool". AZoRobotics. (accessed June 23, 2024).

  • Harvard

    Lea, Robert. 2021. Moving from the Lab to the Mainstream - Microsoft's Machine-Learning Tool. AZoRobotics, viewed 23 June 2024,
