Exporting RandomForest Models to Java Source Code

This post shares a tiny toolkit to export WEKA-generated Random Forest models into light-weight, self-contained Java source code for, e.g., Android.

It came out of my need to include Random Forest models in Android apps.

Previously, I used Weka for Android. However, I did not find a way to export a Random Forest model so that my apps could load it reliably across devices. The apps therefore had to compute the model on each start, which can take minutes.

androidrf solves the problem in a simple way: a Python script parses the console output that WEKA produces when training a RandomForest model with the ‘printTrees’ option enabled. It then creates a single Java source file that implements those trees with simple if-then statements.
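To make the idea concrete, here is a minimal sketch of such a conversion. It is not the actual to_java_source.py: the sample printout is made up and only approximates the shape of a printed WEKA tree, and the names SAMPLE_TREE, NODE, tree_to_java and tree0 are assumptions chosen for illustration. The generated method assumes that the features exist as numeric fields of the surrounding Java class.

```python
import re

# Made-up printout in the rough shape of a printed WEKA tree (an assumption,
# simplified; real output contains additional header and summary lines).
SAMPLE_TREE = """\
screen_on < 0.5
|   hour < 8.5 : slow (12/1)
|   hour >= 8.5 : fast (30/4)
screen_on >= 0.5 : fast (41/2)
"""

# One node per line: optional "|   " indentation, a test, and for leaves a label.
NODE = re.compile(
    r"^(?P<bars>(?:\|\s+)*)(?P<feat>\w+) (?P<op><|>=) (?P<val>[-\d.]+)"
    r"(?: : (?P<label>\S+))?"
)


def tree_to_java(text, name="tree0"):
    """Turn one printed tree into a Java method made of nested if-statements."""
    out = [f"String {name}() {{"]
    open_depths = []  # depths of if-blocks that still need a closing brace
    for raw in text.splitlines():
        m = NODE.match(raw)
        if m is None:
            continue  # skip lines that are not tree nodes
        depth = m["bars"].count("|")
        # Close every block that is at the same depth or deeper than this node.
        while open_depths and open_depths[-1] >= depth:
            out.append("    " * (open_depths.pop() + 1) + "}")
        pad = "    " * (depth + 1)
        out.append(f"{pad}if ({m['feat']} {m['op']} {m['val']}) {{")
        open_depths.append(depth)
        if m["label"]:
            out.append(f'{pad}    return "{m["label"]}";')
    while open_depths:
        out.append("    " * (open_depths.pop() + 1) + "}")
    out.append('    throw new IllegalStateException("no leaf reached");')
    out.append("}")
    return "\n".join(out)


java_source = tree_to_java(SAMPLE_TREE)
print(java_source)
```

A real forest would produce one such method per tree, which is why limiting the tree depth keeps the generated file manageable.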

The library ships with three additional Java classes that let you run and test the generated classifiers.

The code is available on GitHub under the MIT Licence: androidrf

How to use it

(for people who are familiar with WEKA):

Load your data set into WEKA, choose RandomForest as the classifier, and enable the ‘printTrees’ option for your RandomForest classifier. Hint: limit the depth of the trees with the ‘maxDepth’ option, because otherwise the resulting source files may become huge.

Save the output of the results buffer into a .txt file. Ideally, save it in the ‘data’ folder of the androidrf project.

Open a terminal, enter the ‘data’ folder of the androidrf project, and execute the following command, giving the file name without the .txt extension:
python to_java_source.py -M filename

A class with the name FilenameRandomForest should appear in androidrf/src/org/pielot/rf.

All you need to do is to copy the Java class together with the three pre-existing Java classes (Prediction, Evaluation, RandomForest) into your project. It should compile without error.

The features have been added as fields to your classifier. Hence, to specify the features, simply populate those fields. Then, call runClassifiers(List predictions) to obtain a Prediction with the details of the prediction (predicted class, certainty, etc.).
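How a predicted class and a certainty can come out of a set of trees is sketched below in Python (the generated classifier itself is Java). This assumes the usual random-forest convention of majority voting, with certainty taken as the share of trees that voted for the winning class. Whether androidrf computes certainty exactly this way is an assumption, and combine_votes is a hypothetical helper, not part of the library.

```python
from collections import Counter


def combine_votes(votes):
    """Return (predicted_class, certainty) for a list of per-tree votes."""
    counts = Counter(votes)
    label, n_votes = counts.most_common(1)[0]
    # Assumption: certainty is the fraction of trees agreeing with the winner.
    return label, n_votes / len(votes)


# Made-up votes of five trees for one set of feature values.
votes = ["fast", "fast", "slow", "fast", "slow"]
predicted, certainty = combine_votes(votes)
print(predicted, certainty)
```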

Voila! You have a light-weight, portable, working Random Forest model.

 


Cost Explosion in the Health System

Research projects on eHealth are often motivated by the so-called cost explosion that we are supposed to face in Europe. For example, The Economist Insights writes that the “basic problem is the spiralling cost of healthcare” and that “healthcare systems [..] are facing financial ruin”.

I used to believe this prediction until a very good book, Lügen mit Zahlen: Wie wir mit Statistiken manipuliert werden (Lying with Numbers: How we are being manipulated with statistics) by Gerd Bosbach and Jens Jürgen Korff, offered a different perspective. Since the book is only available in German, I decided to re-do their calculations on my own and present the results in English.

I downloaded the total yearly expenses of the German public health system from the official source, the Gesundheitsberichterstattung des Bundes. This is how the numbers look:

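[Figure: total yearly expenses of the German public health system, plotted with a y-axis that does not start at 0]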

Phew. Certainly not an explosion, but the curve is clearly pointing skywards.

But wait! Did you notice the y-axis? It does not start at 0, a common trick, as pointed out by Bosbach and Korff, to make changes look more dramatic.

Let’s fix the y-axis:

[Figure: the same yearly expenses, plotted with the y-axis starting at 0]

Okay. That looks less worrying now. But still, the curve points upward. The cost may not be “spiraling”, but the numbers are certainly getting bigger.

But wait! Numbers are always getting bigger. This is called inflation. So let’s put these numbers into perspective by showing them as a fraction of the German GDP (source: statista.com):

[Figure: yearly health expenses as a fraction of the German GDP]

Look at that! Expenses have been hovering around 10-11% of the GDP. It seems reasonable that a society should spend a stable fraction of its wealth on health.

Admittedly, there is a slight increase. If I draw a linear trend line on this diagram, health cost will be 16.8% of the GDP in 2060. But who can really tell what will happen in the 45 years from now?

On the other hand, this is also a matter of how the data is selected. If I had only shown data from 2009 – 2013, a trend line computed from these figures would even have looked as if the relative health costs were decreasing.
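The effect of the selected time window on a trend line can be sketched in a few lines. The percentages below are placeholders that I made up to show the mechanics; they are NOT the real statistics behind the diagrams, and fit_line is just an ordinary least-squares fit written out by hand.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Placeholder values (health expenses as % of GDP), invented for illustration.
years = [2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013]
share = [10.0, 10.0, 10.1, 10.2, 11.0, 10.9, 10.8, 10.8, 10.7]

# Trend over the whole period: the slope is positive.
slope_all, intercept_all = fit_line(years, share)

# Trend over 2009 to 2013 only: the slope turns negative.
slope_recent, _ = fit_line(years[4:], share[4:])

print(f"all years:   {slope_all:+.3f} percentage points per year")
print(f"2009 - 2013: {slope_recent:+.3f} percentage points per year")
print(f"extrapolated to 2060: {intercept_all + slope_all * 2060:.1f}% of GDP")
```

The same data yields a rising or a falling trend depending on which years are included, which is why a 45-year extrapolation deserves a lot of scepticism.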

So, next time somebody pulls the “cost-explosion-in-health-system” card, hit them with facts.

How fast people expect responses to texts and messages

In February 2013, we surveyed 44 mobile phone users, asking two questions:

Think about the people you exchange the most messages with via your mobile phone:

  1. On average, how fast do they typically respond to one of your messages?
  2. On average, how fast do you typically respond to one of their messages?

The results are stunning:

64% of the respondents believe that people with whom they message the most typically respond to their messages immediately or within a few minutes. Only 9% expect responses after more than an hour.

[Figure: How fast do THEY respond]

68% of the respondents believe that they typically respond to people with whom they exchange a lot of messages immediately or within a few minutes. Only 6% typically respond after more than an hour.

[Figure: How fast do YOU respond]

These numbers are notable, because they reflect people’s expectations. If a friend typically responds immediately, it might feel strange when one day s/he doesn’t. Also, if one typically responds within minutes, one might start feeling anxious if circumstances prevent one from responding to a message for hours.

In another study, the Do Not Disturb Challenge, where people disabled notifications across all devices for a day, we actually had instances where participants did not respond fast enough and friends got angry as a consequence.

Think about how drastic these expectations are: many activities, such as meetings, driving to work, attending classes, last a lot longer than a few minutes – and they require people’s full attention. Hence, people are faced with a choice: text during meetings or from behind the wheel, or violate expectations.

 

The Do Not Disturb Challenge (CHI ’15)

Notifications are alerts intended to draw attention to new online content. Traditionally used in text messaging, email clients and desktop instant messengers, notifications are now used by all types of applications across all types of computing devices.

Today in 2015, we are still living in the ‘wild-west land-grab phase’ of notifications: more and more OSes introduce notification centers and more and more apps generate notifications. However, little is known about how the increasing number of notifications affects us.

Hence, in a collaboration between the Scientific Group of Telefonica R&D and the Human-Computer Interaction Institute at Carnegie Mellon University, Luz Rello and I envisioned the Do Not Disturb Challenge. As part of the challenge, participants disable notifications on their phones, tablets, and computers for a full day.

In December 2014, we rolled out a pilot of the Do Not Disturb Challenge with 12 participants. While participants reacted wildly differently to the lack of notifications, for many, it was a strong experience.

The biggest impact was social. People have come to expect timely responses to their messages. Without notifications, many participants no longer felt able to meet these expectations. Some informed others before the study that they would be less responsive; some kept checking the phone constantly.

At the same time, many participants noted that without the constant interruptions by notifications, they felt more focused, relaxed, and productive. Others realised that not all notifications are the same or deserve the same treatment. For example, many participants felt relieved by the absence of group-chat notifications.

Probably the main take-away so far is that people have very strong and polarized opinions towards (missing) notification alerts. The only consistent finding across the participants was that none of them would keep notifications disabled altogether. Notifications may affect people negatively, but they are essential: can’t live with them, can’t live without them.

The results will be presented at CHI ’15: the ACM Conference on Human Factors in Computing Systems (CHI) to be held from April 18 – 23 in Seoul, South Korea.

Martin Pielot and Luz Rello
The Do Not Disturb Challenge – A Day Without Notifications
CHI EA ’15: Extended abstracts on Human factors in Computing Systems, 2015.

Correlations and Causation

Why Using Your Phone Less Won’t Necessarily Make You Healthier

“There is evidence that resisting the pull of your device can lead to healthier living.”

This is the conclusion of the article Trying to Live in the Moment (and Not on the Phone), which reports that “a recent study by researchers at Kent State University found that students who were heavy cellphone users tended to report higher anxiety levels and dissatisfaction with life than their peers who used their phones less often.”

Does this mean you should throw your mobile phone out of the window right now to live a healthier life?

The answer is no.

What we are reading in this excerpt from the article is a classic confusion of correlation with causation.

Let’s assume the findings are universally true and students who use their cellphone a lot report higher anxiety levels and dissatisfaction with life. Then there are three possible explanations:

  1. As the article concludes, the use of cellphones indeed increases anxiety and dissatisfaction. In this case, cellphone use is the cause, and anxiety and dissatisfaction are the effects.
  2. However, it could as well be true that cause and effect are reversed: anxiety and dissatisfaction turn people into heavy cellphone users.
  3. Finally, there is the possibility of a tertium quid, an unknown third factor that causes both. For example, people who find it more difficult to interact with others directly may prefer to use the phone, and at the same time be more anxious and dissatisfied with life.

Thus, using the phone less may not make anxiety and dissatisfaction disappear.
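The third explanation can be illustrated with a small simulation. All numbers are synthetic: a hidden factor (here called shyness, a name made up for this sketch) drives both phone use and anxiety, while phone use has no effect on anxiety at all. The two still end up clearly correlated.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(42)  # fixed seed so the run is repeatable
n = 2000

# Hidden third factor, never observed by the hypothetical study.
shyness = [rng.gauss(0, 1) for _ in range(n)]
# Both measured variables depend on the hidden factor plus independent noise.
phone_use = [s + rng.gauss(0, 1) for s in shyness]
anxiety = [s + rng.gauss(0, 1) for s in shyness]

r = pearson(phone_use, anxiety)
print(f"correlation between phone use and anxiety: {r:.2f}")
```

In this simulated world, taking the phone away would change nothing about anxiety, because the phone was never the cause.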

 

An In-Situ Study of Mobile Phone Notifications (MobileHCI ’14)


Notifications on mobile phones alert users about new messages, emails, social network updates, and other events. However, little is understood about the nature and effect of such notifications on the daily lives of mobile users. Hence, we conducted a one-week, in-situ study involving 15 mobile phone users, where we collected real-world notifications through a smartphone logging application alongside subjective perceptions of those notifications through an online diary.

In summary, we found that mobile phone users have to deal with a large volume of notifications each day (63.5 on average), mostly from messengers and email, which they perceived as normal. Notifications were largely checked within a few minutes of arrival, regardless of whether the phone was in silent mode or not. Notifications from messengers and social networks were checked fastest.

In particular in the case of personal communication, explanations for these fast reaction times related to high social expectations and the exchange of time-critical information.
Increasing numbers of notifications, in particular from email and social networks, correlated with negative emotions, such as stress and feeling overwhelmed. Personal communication, on the other hand, also related to increased feelings of being connected with others.

These findings highlight that strategies are needed to lower negative emotions. Reviewing previously explored approaches, our findings imply that reducing interruptions and deferring notifications may work in a professional context. For a personal context, strategies around communicating (un)availability and managing expectations appear more suited.

This research is described in detail in the paper An In-Situ Study of Mobile Phone Notifications, which will be presented at the ACM SIGCHI Conference on Human-Computer Interaction with Mobile Devices and Services, held in Toronto, Canada in September 2014.

Large-Scale Evaluation of Call-Availability Prediction (UbiComp ’14)

Roughly 1/3 of all phone calls are not picked up. With this work, we explored whether the called phone can know in advance whether its user is likely to pick up a call. This would make it possible, amongst other things, to communicate (un)availability ahead of the call or to trigger intelligent muting.

This work shows that a mobile phone can predict with an accuracy of 83.2% whether its user will accept an incoming phone call or not. When personalizing those models, the accuracy can be increased to 87%. To do so, the phone needs to keep track of 15 features, such as the time since the last call, the day of the week, or the ringer mode. The 5 strongest predictors are:

(1) time since the last ringer mode change,
(2) time since the screen was last turned on or off,
(3) screen status (on/off),
(4) time since the phone was last (un)plugged, and
(5) time since the last call.

These findings show that it is possible to create an automated availability status for phone calls. Integrated into any phone call application, it could help to manage expectations by sharing the availability prediction with potential callers, and through that greatly impact the overall user experience. Further, knowing whether a user is likely to take a call might be useful to intelligently allocate resources in a multi-device messenger environment.

To obtain the necessary data, we instrumented a previously developed application called Silencer with anonymous data-collection facilities. During a two-month period, the app logged how 418 users reacted to 31311 phone calls. Alongside each call, the above-mentioned 15 features were collected. Using a Random Forest, we computed the accuracy of a generic model and of personalized models trained on different numbers of calls, and determined the predictive strength of each feature. On average, 50 or more calls are needed to outperform the generic model, so it should be very quick to generate accurate personalized models.

This research is described in detail in the paper Large-Scale Study of Call-Availability Prediction, which will be presented at the ACM International Joint Conference on Pervasive and Ubiquitous Computing, held in September 2014 in Seattle, USA.

The fallacy of WhatsApp’s “last seen” status

Last Seen = Fast Response?

When sending a message with WhatsApp, senders often check the receiver’s “last seen” status to judge whether the message will be read soon. It shows when the receiver last opened the application.

“It gives me a timeframe and allows me to estimate when my message will be read.”

Intuitively, if the receiver was online only recently, s/he is likely to be near the phone and see the message soon.

However, results from our recent study on predicting how fast people attend to message notifications indicate that “last seen” is almost as weak a predictor as a random guess.

How fast people view WhatsApp messages

We installed an app on the phones of 24 volunteers, which logged, amongst other things, each time that

  • WhatsApp is opened or closed,
  • a WhatsApp message is received, and
  • the user sees the WhatsApp message, either in the notification drawer or in the app.

For these volunteers, the median delay between receiving and seeing a WhatsApp message was 7.81 minutes, i.e. half of the messages were viewed within 7.81 minutes and the other half later.

We used this time to split the data set into two parts: fast = seen within 7.81 min, slow = seen after 7.81 min. This means that a random guess as to whether a user sees the message fast or slow has a 50% chance of being correct.
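Why a median split leads to a 50% baseline can be seen in a small sketch. The delays below are made up for illustration and are not the study data; the variable names are assumptions as well.

```python
from statistics import median

# Made-up delays (in minutes) between receiving and seeing a message.
delays_min = [0.5, 1.2, 3.0, 6.4, 9.1, 15.0, 42.0, 180.0]

# The median is the threshold that separates the two classes.
threshold = median(delays_min)
labels = ["fast" if d <= threshold else "slow" for d in delays_min]

# By construction, half of the messages fall into each class, so always
# guessing the same class (or guessing at random) is right half of the time.
share_fast = labels.count("fast") / len(labels)
print(f"threshold: {threshold:.2f} min, share of fast messages: {share_fast:.0%}")
```

Any model has to beat this 50% to be useful, which is the yardstick used in the next section.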

Not much better than a random guess

Next, we used the log data to train a state-of-the-art machine-learning model. We checked how well “last seen” allows it to predict whether the message is seen fast or slow.

It turned out that the prediction was correct in 58.8% of the cases, only 8.8 percentage points better than the random guess.

Do not overly rely on “last seen”

Of course, this study has its limitations. The 24 volunteers were in their late twenties and early thirties. Other demographics might exhibit different behavior.

However, the results indicate that we should not overly rely on “last seen” when we want to estimate the availability of our friends.

Didn’t you see my message?! (CHI ’14)

“Didn’t you see my message?!”

For the younger generations, not receiving a timely response to an SMS or message is a major source of irritation and frustration.

However, people cannot, or do not want to, attend to their phones all the time.

What if your phone could infer these situations and communicate them to your friends?

But how would the phone know?

Our research at Telefonica Research shows that these predictions can be done by simply monitoring a phone’s screen status (on/off), ringer mode, proximity sensor, the hour of the day, and when the user last visited the notification center.

In a user study, where we tried the system with 24 participants over 2 weeks, we learned that half of the messages are viewed within 6.15 minutes, and the other half after that.

A machine-learning model created on the basis of this data can predict with an accuracy of 70.6% whether a message will be viewed within 6.15 minutes or later. If the prediction is that the message is going to be viewed within those 6.15 minutes, it is even more reliable: the precision of the model is 81.2% in this case.
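The difference between accuracy and precision is easiest to see on a confusion matrix. The counts below are hypothetical and chosen for round results; they are not the numbers from the study.

```python
def accuracy(tp, fp, tn, fn):
    """Share of all predictions that are correct."""
    return (tp + tn) / (tp + fp + tn + fn)


def precision(tp, fp):
    """Share of 'fast' predictions that really were fast."""
    return tp / (tp + fp)


# Hypothetical counts, with 'viewed fast' as the positive class.
tp = 40  # predicted fast, viewed fast
fp = 10  # predicted fast, viewed slow
tn = 30  # predicted slow, viewed slow
fn = 20  # predicted slow, viewed fast

print(f"accuracy:  {accuracy(tp, fp, tn, fn):.1%}")
print(f"precision: {precision(tp, fp):.1%}")
```

Precision can exceed accuracy when the model is right more often on its 'fast' predictions than on its 'slow' ones, which is the pattern reported above.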

This research will be presented at the ACM CHI Conference on Human Factors in Computing Systems, held in Toronto, Canada in May 2014.

Martin Pielot, Rodrigo de Oliveira, Haewoon Kwak, Nuria Oliver
Didn’t You See My Message? Predicting Reactiveness in Mobile Instant Messaging
Proc. CHI ’14 Conference on Human Factors in Computing Systems, ACM, 2014.

Telefonica Research at CHI ’14

Telefonica Research will be represented with 2 full papers and 2 ToCHI articles at ACM CHI ’14, the premier international conference on Human-Computer Interaction.

Didn’t You See My Message?

Martin Pielot, Rodrigo de Oliveira, Haewoon Kwak, Nuria Oliver

We found that monitoring the phone (screen activity, notification center access, proximity sensor, ringer mode) makes it possible to predict whether a person will attend to a received message fast or not (pdf).

A brief but more detailed description can be found in a more recent blog post.

Large-scale assessment of mobile notifications

Alireza Sahami Shirazi, Niels Henze, Martin Pielot, Dominik Weber, Albrecht Schmidt

As part of the study, we published an Android app on Google Play that forwards all phone notifications to the browser (via plugin). More than 40,000 people thought this was a brilliant idea and downloaded the app. We used the app as a vehicle to log and analyze all notifications that users receive (pdf).

A Large-scale Study of Daily Information Needs

Karen Church, Mauro Cherubini, Nuria Oliver

My colleagues have conducted one of the most comprehensive studies of information needs to date. For three months, they probed information needs via experience sampling and daily diaries, to understand “the types of needs that occur from day to day, how those needs are addressed and how contextual and demographic factors impact on those needs” (details on Karen’s website).

Influence of Personality on Satisfaction with Mobile Phone Services

Rodrigo de Oliveira, Mauro Cherubini, Nuria Oliver

My colleagues connected the phone use habits of 603 volunteers with personality traits and customer satisfaction, and found that “(1) extroversion, conscientiousness, and intellect have a significant impact on customer satisfaction—positively for the first two traits and negatively for the latter; (2) extroversion positively influences mobile phone usage; and (3) extroversion and conscientiousness positively influence the users’ perceived usability of mobile services” (ACM Digital Library).
