Wednesday, September 25, 2024

Timescales

Humans are notoriously bad at predicting timescales—really bad. Think about it: how often have you started a task thinking it would take just an hour, only to find yourself working on it late into the night? Why do new projects, especially in software, almost always take longer than we anticipate? And remember those bold predictions from the 19th century? By now, we were supposed to be zipping around in flying cars.

These are just a few examples of a broader phenomenon I’ve been reflecting on lately: we consistently misjudge how long things will take.

One key reason for our poor time estimation is optimism bias. We tend to overestimate our efficiency and underestimate potential challenges, assuming things will go smoothly. This sets us up for unrealistic expectations. On a personal level, we often fail to consider distractions or unforeseen complexities. In larger, more complex projects like software development or infrastructure construction, we don’t account for technical setbacks, bureaucratic delays, or miscommunication among stakeholders. 

Optimism bias is a deeply ingrained psychological tendency. It stems from our hope that we can accomplish more in less time than is realistically possible. This bias isn’t just an individual trait—it affects collective decision-making as well.

Our inability to accurately predict timelines has far-reaching consequences. When governments, organizations, or industries miscalculate the duration of complex projects or reforms, the fallout can be severe. Resources get wasted, public frustration builds, and trust in institutions erodes. Infrastructure projects that drag on, often with budget overruns, disrupt communities and lead to economic inefficiencies. Similarly, underestimating the time required for social or political reforms can create disillusionment. Citizens expect rapid change but are often met with slow, incremental progress. This disconnect between expectation and reality can foster cynicism and weaken trust in leadership. It might even contribute to political instability, as people become frustrated with what they perceive as broken promises. In fact, I believe this growing impatience is one reason for the rising popularity of right-wing politics in many Western democracies. People are turning to alternatives that promise quicker, more decisive action.

Yet, some people seem to navigate these complexities more effectively. They appear to defy the unpredictability of timelines and achieve remarkable success. But are they really breaking the mold? I don’t think so. Instead, I believe these individuals are better at understanding the context in which they operate—and crucially, the timescales involved in their decisions. Consider an example: Let’s say you’re convinced that solar power is the future of energy. That’s a valid belief, but the more important question is: over what timescale? Widespread adoption of solar energy could take decades. If you ignore that reality, your belief remains just that—a belief. Worse, it’s not actionable in the present. And unless you’re working in long-term government policy, the present—and the near future—is what matters most for decision-making. Thirty years from now, you might have the satisfaction of saying, “I told you so,” but by then, you may have missed other opportunities because you misunderstood the timeline of your conviction.

Ultimately, success isn’t just about having the right ideas; it’s about understanding when those ideas, and the external factors needed to enable them, will play out, and aligning your decisions with that understanding. This, I am starting to believe, is the real secret sauce behind making real progress and having real impact.

Wednesday, September 18, 2024

Excellence as founder

I never believed in micromanagement, probably because I never needed it as an employee. I was always self-driven and curious, which often led me to surprise my superiors with unexpected results. I was fortunate to work in environments where everyone around me was like that, first in academia, then in my first industry role. These experiences shaped me, making me believe that everyone was like this: self-driven, curious, treating work as more than "a job", and ultimately capable of achieving great things.


After years of running my first software startup, enduring painful lessons, and ultimately failing to scale, I realize now how naïve I was. Since I never believed in micromanagement to begin with, I didn't think it was necessary in a startup. I assumed that if I just hired motivated people, things would naturally fall into place. And to an extent, I did succeed in hiring motivated, genuinely good people. But I learned that while motivation is a necessary condition for success, it's not a sufficient one. Motivated, pleasant people create a good company culture, but if they lack the ability to push boundaries, think unconventionally, and figure out solutions in a high-stakes environment, that "niceness" amounts to nothing. What you really need are people who go beyond the ordinary—who push past what 95% of equally motivated people do. You need excellence.


As Reid Hoffman (I think) famously said, you need "exceptional talent in all positions" if you want to build something truly big. I now understand why. For the future, I would never again assume that people will "grow" into excellence. You can't bet on anyone's growth trajectory. Exceptional growth can happen, but it's a bonus, not a given. As a founder, you need to hire people who already know what they’re doing and are excited by uncertainty—who are ready to figure out the unknown from day one. Otherwise, you're taking on too much risk as a founder. One of my past investors used to say, "As a startup, you have the freedom to do whatever you want, but you don’t have two things: time and money." Read that again. You don’t have the luxury of fooling around. For most first-time founders, this is challenging in many ways, but the core challenge is figuring out "what excellence looks like."


This is where it gets interesting, because I don’t think enough people talk about this: you have to figure out what excellence looks like. What does an excellent sales leader do? What does an excellent product person look like? What should you expect from an exceptional engineer? What do top-tier startup CEOs do to build partnerships and network? As a startup CEO, it’s your job to get a sense of what excellence looks like and whether you have it in your company. And to do this, you need to micromanage—not in the sense of controlling every move, but in being deeply involved with everything and everyone. You need to point out when something could be done better or faster and constantly push the boundaries - and you need to surround yourself with people who want exactly that.


Now, let me clarify: if you have a business idea that (for whatever reason) has a high likelihood of success, the approach might be different. In such cases, a team of motivated and diligent people may be enough to follow the established path and execute well. But most startups don't have that kind of certainty. Most are searching for product-market fit, which is why it's crucial to set a higher bar for talent and ensure every key position is filled by someone exceptional.


As CEO, your job is also to develop a sense of what excellence looks like and communicate that vision to your team. If your sales leader doesn't know how to handle SalesOps, you need to find a way to teach them, even if that means setting it up yourself based on what you’ve learned from others. Or, if necessary, replace them with someone who does know what excellence in that area looks like.


In my first startup, I failed to do all of this. It was the biggest, most painful lesson I've ever learned. In hindsight, it seems obvious. But at the time, I didn't see it. Even when people tried to tell me, I couldn't hear them. I was blinded by a misplaced sense of loyalty and the belief that excellence would eventually emerge if I just gave people time. That loyalty led to mediocrity and stagnation. It wasn't until late in the journey that I finally understood, flipped the switch, and embraced the responsibility of being "all over everything." It helped, but by then it was too late.


Well, now I know.

Thursday, June 13, 2024

Product first

Over the years I became, painfully and by experience, a believer in "product first". As a startup, everything you do at the beginning needs to lead to a great product. And everyone who tells you otherwise is stuck in the distant internet past. Yes, Facebook, Instagram, Airbnb, and thousands of unknown but highly successful (SaaS) software companies could get by with shitty interfaces - 15 or 20 years ago. Times change, and users now expect flawless experiences - in B2B and B2C alike, and even for early-stage products. Because the truth is, it's 10x easier than it was 15 years ago to actually build these great-looking interfaces and user experiences. There is simply no way around it.

There is one exception, though: when your technology does something so urgently needed, so unique, that users will not mind if the UX and UI are sloppy. Now, every (!) founder will assume that their product belongs in this category. However, you should not assume yours does, because only one-in-a-million products do. So accept the reality of what users expect these days and simply go build it. That's the way to success.






Tuesday, June 4, 2024

Drowning in a sea of AI-generated content

There will be companies building highly personalized AI content creators. These will be available for a $25-a-month subscription and will allow everyone who has $25 to spare to seem "active" on social media. LinkedIn especially will be plagued by a sea of generic AI-generated content. And as users, we will be drowning in it. It's going to be a huge challenge for social media companies to filter for relevant, non-generic articles.

Monday, June 3, 2024

Noise Filtering

AI seems to be more of an information technology revolution than the "invention" of a new intelligent species. And it comes at the right time, to help us make sense of the cacophony of information and voices out there on the internet. AI's immediate use case is noise filtering, and it will replace the search engine as we know it.

Sunday, September 18, 2022

Every B2B software company needs to be PLG (in some form)

I was at SaaStr last week, the world's leading enterprise SaaS festival. My main takeaway from the many conversations I had is that procurement of B2B software increasingly follows the patterns of consumer purchase behavior. This is very much in line with this HBR article from earlier this year.

While this might seem to some like a "duh" insight, it has a profound impact on how companies, and especially B2B SaaS startups, should think about customer acquisition. The often-repeated mantra of "founder-led sales" still holds for your first customers. Clearly, you want to get into conversations to understand whether you are actually tackling a sufficiently important problem. But as soon as you start seeing signs of product-market fit, you should shift gears to some form of product-led growth. Does this need to be a fully functional freemium model? Maybe not. I think the focus still has to be on building the core product in such a way that it provides value to your customers. But should potential buyers be enabled to experience your product - in some form or another - on their own, without needing to schedule a demo call? Definitely. In today's self-serve world the "aha moment" will not occur during a random Zoom call with your SDR, but during a self-guided discovery process on the buyer side.

Wednesday, June 24, 2020

You can't tell people anything

I came across this blog post today with the seemingly absurd title "You can't tell people anything". Any other day I would certainly have missed it, but today it very much resonated with me, as it captures a deep feeling and insight that has developed in me over the last few months.

At the beginning of the year we were raising a small pre-seed round for experify.io. Since I considered myself sufficiently educated about what a founder's journey takes, "being an excellent storyteller" was obviously high on my list. And given my past, I always thought I knew what that meant, and I was confident that it would be one of the easier parts of the job for me.

All my life I have spent time, so much time, explaining things to people. Being educated as a “man of reason”, I was convinced that if only I had the right arguments, presented as comprehensibly as possible, I could actually make people see and understand what I see and imagine. Within the limitations of language and logic as outlined by Ludwig Wittgenstein, of course, but still. If I could bring the pictures in my mind into the real world as well as possible, that had to make people see what I see.

Oh boy ... was I wrong.

The article above summarizes it perfectly: you cannot tell people anything. And that includes startups trying to explain their business to (early-stage) investors. It is impossible at that stage to convince anyone about your startup, unless they have some predisposition to believe in it. Because there is quite literally nothing to be convinced of at this stage. There is no way to get to a "reasonable early-stage investment decision". At this stage the whole game is unreasonable by definition. The best (and only) thing one can do as a startup founder is to tell a story that might ignite a fire in the other party. But for that to happen, the spark needs to be there already. So in the end, my task as a startup CEO is to reach out to hundreds of people, providing the right cues for that handful of potential investment partners who show a suitable predisposition for our business.

Everything else is a waste of time. I learned this the hard way. Do not try to convince people about your startup. It's impossible. Either they "feel it", or they don't. And if they don't, then move on. Do not waste your time.

Network Effects

The whole business world talks about (and chases) network effects. But there is a surprising amount of misunderstanding, also and especially among early-stage investors. When I talk about our vision for experify.io I always stress the point that our business model and (future) defensibility rely heavily on network effects. More often than not, what I hear from the other side of the table is something along the lines of:

"You don't have network effects. Network effects are something you see in social networks, leading to exponential user growth."

But that is not the network effect I am talking about. That is called virality. Clearly you can use networks to create virality / exponential growth, e.g. when every user invites more than one new user to your product, but it's not what is generally meant when talking about network effects.
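
To make the virality case concrete, here is a minimal sketch of that invite loop. It is a toy model under stated assumptions, not a description of any real product: the function name `viral_growth` and the parameter `invites_per_user` (the average number of new users each newly joined user brings in) are made up for illustration.

```python
def viral_growth(initial_users, invites_per_user, cycles):
    """Return the cumulative user count after each invite cycle (toy model)."""
    totals = [float(initial_users)]
    new_users = float(initial_users)
    for _ in range(cycles):
        # Only the users who joined in the previous cycle send invites.
        new_users *= invites_per_user
        totals.append(totals[-1] + new_users)
    return totals


print(viral_growth(100, 2.0, 3))  # every new user brings in two more
print(viral_growth(100, 0.5, 3))  # every second new user brings in one more
```

With more than one invite per user the number of new users grows every cycle; below one, growth fades out. Either way, this describes how fast users arrive, not whether the product becomes more valuable as they use it.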

Here is a link to an article by NFX that explains the several types of network effects very comprehensively. The key take-away is: when talking about network effects, keep the following definition in mind:

A company is said to show network effects if its product(s) become not less, but more valuable with usage.

(By the way: Scott Galloway calls it the Benjamin Button effect -- which I like better, as it stresses the "more valuable with time" nature more and avoids the very common misunderstandings described above.)

Saturday, November 23, 2019

Haiku #5

Autumn day, wintry
Leaves are falling, beautiful
Water clear as ice

Friday, November 22, 2019

Haiku #4

A strange day in life
Uncertainty everywhere
But good faith unbowed

Monday, January 1, 2018

Haiku #3

a stick of incense
new year, a loud gong
clarity through and through

Thursday, September 7, 2017

Academia and Social Media | WTF

I feel so outraged at the moment that I simply cannot not write about it. So here we go, this piece in Science last week:

http://science.sciencemag.org/content/357/6354/880.2#1504269787069

Summary: Scientists should get / be their own social media influencers to popularize their research.

Great idea -- I already envision future tenure requirements: "The successful candidate has at least 50k Twitter followers and maintains a vast network of social media influencers."

Seriously? Are you shitting me?

That kind of bullshit is one of the reasons that drove me away from an academic career path before I even finished my PhD. I am so disgusted by it. And believe me, many people in academia are.

So what is the problem, you might ask. Very simple: if someone starts this kind of thing it quickly becomes the norm, up to the point that scientists will be evaluated on their ability to achieve social media reach. Don't believe me? Well, we have already seen this happen in science over the last 10+ years.
I am talking about "citation metrics", especially the infamous h-index. In many places it has literally become the "gold standard" for evaluating scientists. Be it for hiring decisions, tenure decisions, or simply decisions on whether or not to fund a proposal, people will look at your publication history and judge it solely based on how well it has been received by others. This all sounds very reasonable at first, but it turns out to be fatally flawed. Why? It promotes "hype research". If the metric I have to optimise to achieve my academic career goals (i.e. get a permanent position) is reach, I will engage in research that currently resonates with as many people as possible. Let me repeat this very slowly: people - will - engage - in - research - that - is - well - received. I don't know about you, but for me this rings a very loud alarm bell. This undermines the most important pillars of academia: intellectual independence and the possibility, even the obligation, to engage in unpopular research topics: to be an independent mind; to explore the unknown, the un-hyped. However, incentive schemes like the current ones make this harder and harder -- especially for young researchers.
I know a couple of young assistant professors who bluntly told me that for the next few years they simply have "to play the game" and do the research that their peers want, and that once they get tenure they'll be able to explore more freely. This is not some far-fetched scenario. It is already reality in academia. But even worse, once you have engaged in "hype research" for six years and, let's say, built yourself a reputation, do you think you will stop doing what you are doing? The apparent fame, the visibility, the invited talks, the citations -- it's basically the opium of science. And what you end up with is a bunch of attention whores, people who take themselves way too seriously.

I know this won't resonate with everyone in academia. And it is good that it does not, as there are still academic communities where all of this is less pronounced. But still, a large portion of academia has already gone in this direction, and in our frenzy to measure success, many others will follow. Also, given that there are far too few permanent academic positions for all the aspiring PhD students and postdocs, the question of how to judge people's potential is indeed a huge challenge. And there must be some kind of objective measure. I just don't think it's citation metrics, and it is certainly not social media reach.
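
For readers who have never looked at how the h-index mentioned above is computed: it is the largest number h such that a researcher has h papers with at least h citations each. The following is a minimal sketch of that standard definition; the function name and the citation counts are made up for illustration.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


# Hypothetical citation counts per paper, for illustration only.
print(h_index([10, 8, 5, 4, 3]))
print(h_index([250, 8, 5, 3, 3]))
```

The number only goes up when many papers collect citations, which is exactly the kind of reach the metric rewards.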

Sunday, June 18, 2017

"Startup"

The word "Startup" is overused these days. Most people do not seem to know that:
A startup is a company designed to grow fast.
http://www.paulgraham.com/growth.html

Wednesday, May 24, 2017

Yoshua Bengio on human level AI




Key take-aways, in case you don't have time to watch it.

We are still very far from human-level AI.

Everyone should be aware of that before jumping into wild horror-scenario conversations about machines taking over the world and killing the human race.

There is too much hype about AI these days. Especially in public discussion / public media.

I want to add: this hype is based on a non-existent understanding among the general public of how the current "AI" methods work and what they are (not) able to do. I even think that "Artificial Intelligence" is a very misleading name for the currently available methods. And big, historically trusted players like IBM marketing their efforts as "cognitive computing" is, in my opinion, even worse. It's misleading and almost deceptive. After listening to talks (read: sales pitches) about IBM's Watson, executives have asked me whether I think this AI could help them solve *insert really tough business problem here*, because it seems to be so much smarter and more efficient than real people.

Deep networks can avoid the curse of dimensionality for compositional functions.

Which also means they can only learn tasks that can be expressed as such. Which in turn means that tasks that cannot be decomposed cannot be learned. Is creativity such a task?

Thursday, May 11, 2017

Haiku #2

Moon in the heavens
A nightingale sings by the lake
Emptiness, nothing, and calm

Wednesday, May 10, 2017

Things unsaid at #mcb17 #rp17



Forty-five minutes went by pretty fast and many things have been left unsaid. So here is a collection of thoughts and responses to yesterday's panel at #mcb17 #rp17 and the event in general.

As I said at the end of the panel, I think we all have to learn to differentiate when we talk about "data". It seems that when people say "data", they mean "user data that is used for advertising purposes". But optimising ads is only one minor part of what user data can be used for. Let me explain.

Datenkraken

How does the advertising business work? Well, in the end it's solving a matching problem: getting "the right" ads to "the right" user. This is, in principle, not bad at all, as both advertiser and user clearly benefit: the advertiser makes good use of its marketing bucks, and the user (best case) gets product information that is actually of interest. So how do advertisers know where to allocate their marketing budget? Which user might be most interested in their product? That is where big global companies and ad networks come into play, whose business model is to "get to know" the user in order to be able to predict (based on statistical arguments) which ad might be most interesting to that user. For those companies, having as much data as possible, and especially data that is as diverse ("multi-dimensional") as possible, is a clear market advantage: the more dimensions and the more complete the data, the better the statistical models that predict which ad might be interesting to the user. In short: user data is an integral part of their business model. I do not want to discuss whether this business model is legitimate or not; I just want to state what it is, in order to come to the next point.

Data as means to learn and provide service

Every shop owner, to use an offline, real-life example, observes her shop. She observes how people enter, what they look at, which corners they might never go to, and in the end what they buy. In short: she gathers data. I think it's fair to say that no one would expect the shop owner not to use that data to optimise her business: to order more of item A, to drop item B, and maybe to re-arrange the shop in such a way that her customers have a better shopping experience and, of course, her profit increases. When it was said at the panel that it's "to some extent" understandable that website providers might utilise data to provide better service, but that still(!) in the end it's about making more profit, the only thing to reply is: "yes, of course". Business is always about increasing profits. But is this bad? No, because increasing your profits means that you were apparently able to provide a product or a service that consumers want and are willing to pay for. (Clearly, if you are not into markets and business in general, you will not buy this argument.) In short: in this scenario user data is used to optimise another, underlying business model.

Data in a (news) publishing company

And this is also how we see data. We are not an ad network. Our business model is not optimising ads. Our business model is to provide excellent, high-quality journalism and to help our readers be universally informed in a world of ever-increasing complexity (other former content providers might see this differently and have actually shifted to the ad-network-based business model outlined above). Data on how our users use our products might help us transform the news media business into a "digitally native business". This is the real issue that (news) media companies face: what does news consumption in a digital-only world look like? In a world where there is an overabundance of information, mostly for free, a world where I could keep myself busy 24-7 simply reading news, what value can a (formerly traditional) newspaper-publishing company provide? We strongly believe there is value in what we do, as understanding and analysing a complex world is becoming not less, but more important. And yes, we believe that there are people willing to pay for that service. Still, data-enabled innovation is (in my personal opinion) an important key to succeeding in re-defining, or let's say extending, this very old business model into a more digital world. Specifically, I am convinced that we will see more "smart news products": news products that help us get the news and information we need in order to be up to date.

Personalisation --> Smart News Products

I just introduced the phrase "smart news products". Why? Sure, I could say "personalisation", but as the word "personalisation" seems to be emotionally charged and negatively associated with "personalised ads", let's stick to "smart news products". So what in general makes a product "smart"? Mostly it's about being adaptive to the user who uses it. Example: my smartphone is, well, smart. Because it's not a one-size-fits-all product. It has apps that customise my personal experience. Google's search is smart. From potentially millions of search results, it shows me the ones that are most likely important to me. So when I am in Berlin and I search for the phrase "bakery", what I am actually looking for (most likely) is "bakery in Berlin", and Google deduces this from my location. It's "smart", and it uses data (in this case the implicit information that I am in Berlin) to do so. In a similar way, I imagine "smart news products": news products that are to some extent adaptive to the user, using data. So now I hear you already thinking "filter bubble!". This is a tricky issue and worth a separate blog post. The only thing I want to say here is that "the filter bubble" is not an inevitable effect of using "algorithms" and data to build smart products. It can be avoided, and it has to do with the question of how I, as a data scientist, design the recommendation algorithm for the smart news product. Sure, I could go ahead and say "you always read articles tagged X, I show you everything I have on X", but that's only *one* choice (and frankly a pretty bad one). I could also go ahead and say, "you always read articles tagged X and because I have this data, I will go ahead and show you articles that are not tagged X in order to increase the diversity of information you see". That is another completely valid (and possible) design choice for the underlying algorithm. Technically it's not as simple as it sounds, but it's doable.
Clearly, for a smart news product the reality has to lie somewhere in the middle, or it could be both as two products, one named "more on X", one named "things you usually don't read". Would this be a valuable added service to our readers? I don't know, but we will try.
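
To illustrate the two design choices, here is a minimal sketch that turns the same reading history into both lists. It is a toy under stated assumptions, not the algorithm we actually use: the function name `split_feed`, the tag-counting score, and the sample articles are all made up for illustration.

```python
from collections import Counter


def split_feed(read_tags, candidates, top_n=3):
    """Split candidate articles into 'more on X' and 'things you usually don't read'.

    read_tags: tags of articles the reader has already read (one entry per read).
    candidates: list of (title, tags) pairs that could be recommended.
    """
    tag_counts = Counter(read_tags)
    # Familiarity = how often the reader has read any of the article's tags.
    scored = [(sum(tag_counts[t] for t in tags), title) for title, tags in candidates]
    more_on = [title for score, title in sorted(scored, reverse=True) if score > 0]
    unfamiliar = [title for score, title in scored if score == 0]
    return more_on[:top_n], unfamiliar[:top_n]


history = ["politics", "politics", "economy"]
articles = [
    ("Election recap", ["politics"]),
    ("Rate decision", ["economy"]),
    ("Opera premiere", ["culture"]),
    ("Marathon results", ["sports"]),
]
more_on, unfamiliar = split_feed(history, articles)
print(more_on)
print(unfamiliar)
```

The same data feeds both lists; which one the reader sees, or in what mix, is purely a design decision.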


Tuesday, April 18, 2017

Haiku #1

snow falls in april
waters bursting, birds confused
stranger things with time

Thursday, February 23, 2017

Wednesday, December 28, 2016

From "The DevOps Handbook"

In an age where competitive advantage requires fast time to market and relentless experimentation, organizations that are unable to replicate these outcomes are destined to lose in the marketplace to more nimble competitors and could potentially go out of business entirely, much like the manufacturing organizations that did not adopt Lean principles.

Monday, December 19, 2016

Apache Zeppelin .... again

This is a "what I learned about Apache Zeppelin" post.

To be honest, we have quite a few issues with Apache Zeppelin:
- With many concurrent users, it becomes unresponsive
- It accumulates zombie processes
- Once an interpreter has been started, it cannot be stopped from the Zeppelin interface and does not free up executor resources, which clutters the system
- User permissions are an issue as well

So here are a couple of learnings and/or insights.

1.) Unresponsiveness & zombie processes
The issue is the following: Zeppelin starts Spark in client mode, meaning that the Spark driver process is not distributed over the cluster but runs on the submitting machine, which in our case is hdp-master. So, clearly, how many Spark interpreters can be started depends on the resources available on hdp-master. At least according to one of the main developers, these resource restrictions should be more of a problem than the Zeppelin daemon itself: http://apache-zeppelin-users-incubating-mailing-list.75479.x6.nabble.com/Limit-on-multiple-concurrent-interpreters-isolated-notebooks-tp4732p4737.html

Concerning the zombie processes after shutdown of the daemon: this should not happen. There are even mechanisms within Zeppelin that should prevent this. However, it seems we are not the only ones experiencing this problem. A bug ticket has been filed: http://apache-zeppelin-users-incubating-mailing-list.75479.x6.nabble.com/Interpreter-zombie-processes-tp4738.html and https://issues.apache.org/jira/browse/ZEPPELIN-1832

2.) Different interpreters & cluster resources
There are several issues with this one, especially if you do not use Spark dynamic resource allocation. If you do use it (which works fine, I tried it on the new gcloud instance), an idle interpreter only consumes resources for the driver (and maybe one executor, depending on config). Only when you actually start a computation does it ask for more free resources. So that’s one option to circumvent this problem, at least a little bit. Second, a “stop” button in Zeppelin would be a very useful feature; there is actually such a feature request already, and it might come soon(ish) (I can’t find the ticket anymore).
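
For reference, dynamic allocation comes down to a handful of Spark properties. The following is a minimal sketch, not the configuration actually used on the cluster described here: the property names are standard Spark settings, but every value is an assumed placeholder, and `render_spark_defaults` is a hypothetical helper that only prints the properties as "key value" lines in the style of spark-defaults.conf.

```python
# Spark properties for dynamic resource allocation on YARN.
# Property names are standard Spark settings; all values are assumed placeholders.
DYNAMIC_ALLOCATION = {
    "spark.dynamicAllocation.enabled": "true",
    # The external shuffle service keeps shuffle files available when executors are removed.
    "spark.shuffle.service.enabled": "true",
    "spark.dynamicAllocation.minExecutors": "0",
    "spark.dynamicAllocation.maxExecutors": "8",
    "spark.dynamicAllocation.executorIdleTimeout": "60s",
}


def render_spark_defaults(properties):
    """Render properties as 'key value' lines, one per property."""
    return "\n".join(f"{key} {value}" for key, value in properties.items())


print(render_spark_defaults(DYNAMIC_ALLOCATION))
```

With a minimum of zero executors, an idle interpreter holds on to little more than its driver, which matches the behaviour described above; how low the minimum can sensibly go depends on the cluster.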

3.) Another note on unresponsiveness
We say that a Zeppelin notebook is unresponsive if the status bar does not change. This is a bad definition of “unresponsive”, as there might be many reasons why the status bar is not progressing: for example, the Spark operation might take long (e.g. many shuffles), or Zeppelin is waiting for cluster resources. In an ideal world, Zeppelin would have a more verbose status bar, telling the user what it is actually doing at the moment. The immediate reaction of restarting the interpreter, killing all other processes, and restarting Zeppelin might not always be necessary, and combined with issue 1.) it might worsen the problem globally. I’ll create a feature request for this one.

In Zeppelin 0.6, many improvements have been made. Bugs have been fixed and features have been added. For example, user authentication is now possible, as is hooking Zeppelin up with GitHub in order to version (and share) notebooks. Also, there is now the possibility to start a new interpreter for each notebook automatically.

Concerning multitenancy for Zeppelin: there is this project http://help.zeppelinhub.com/zeppelin_multitenancy/ However, it is in beta and only runs on a Spark standalone cluster. It’s not clear when Spark on YARN will be supported. We’ll have to wait.

Some general notes: Zeppelin is still incubating at the Apache Foundation, and has been for quite some time; the whole project is roughly four years old. The (dev) community is rather small, although many people seem to use it. Not sure whether it will ever gain real traction. I would not bet too much on Zeppelin for the future. Using it internally for analysis & prototyping purposes is certainly fine, if we can live with the drawbacks. At this moment, however, I would not include it in any “production” workflows (especially not for external customers).

Alternatives to Zeppelin are: https://github.com/andypetrella/spark-notebook and http://jupyter.org/ . Jupyter is great if you want to use the python interpreter. There are Scala bindings as well, but I did not dig deeper and test how it works.

Friday, December 9, 2016

Bose Hearphones

http://hearphones.bose.com

That will be very useful -- especially the "reduce World Volume" option. Looking forward to trying them.

Friday, December 2, 2016

DBeaver -- Free Universal SQL Client

If you are looking for a good, universal (meaning: cross-database) SQL (& NoSQL) client, look no further. I highly recommend DBeaver.

Note: If you want the Cassandra driver as well, you have to download the Enterprise Edition (which is also free, but not OSS).

Docker ♡

I can only repeat myself ...

Saturday, November 12, 2016

Google DNI and the "democracy-promoting element"

In reply to: https://jourtagblog.wordpress.com/2016/11/02/googles-geld-gutes-geld/

"The democracy-promoting element is lost, which can certainly have negative consequences." Urs Bühler, quoted in the blog post above.

That there will be negative consequences if journalism's "democracy-promoting element" is lost is undisputed. Whether the DNI actually causes this element to be lost, however, is very much open to debate. Urs Bühler's position is one. Mine is another. The media industry is doing badly. A few media houses are in the fortunate position of having a) liquid funds and b) smart management, and are therefore able and willing to invest in innovation. The reality is that for the vast majority of media houses, at least one of these two conditions is not met. Yet innovation is necessary in order to reach people with our content on new channels and to convince them that this content is worth paying for. Since the funds Google provides through the DNI Fund are paid out to implement projects whose goal is to achieve exactly that by means of innovative technological ideas, the DNI promotes "the democratic element" rather than abolishing it.

https://www.digitalnewsinitiative.com/

Thursday, October 6, 2016

Zeppelin Zombies ...

                                   

As I discussed earlier, we are (semi-happily) using Apache Zeppelin as a Spark notebook. However, at some point Zeppelin notebooks were responding so slowly and running into time-out errors so often that it was impossible to work with them. Restarting the Zeppelin server did not help -- and for quite some time we were clueless about what had suddenly happened. Eventually we figured out that Zeppelin has severe problems shutting down processes when errors occur -- and starts accumulating zombie processes over time. We had a couple of hundred of them cluttering our system. Killing these zombie processes and restarting the Zeppelin server did the trick -- now everything is running as smoothly as before.
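To get an overview of such leftovers, something along the following lines can help. It is only a sketch, assuming the processes show up in state Z (defunct) in `ps`; the function names are made up for illustration. Strictly speaking, a process in state Z cannot be killed itself. It disappears once its parent reaps it or exits, which is why the sketch groups zombies by parent PID.

```shell
#!/bin/bash
# Sketch: list zombie (defunct) processes together with their parent PIDs.
# Expects input in the form "PID PPID STAT COMMAND", one process per line.

# Keep only the rows whose state column starts with "Z"; print PID, PPID, command.
filter_zombies() {
    awk '$3 ~ /^Z/ { print $1, $2, $4 }'
}

# Count the zombies per parent PID; the parent is the process to look at.
zombies_per_parent() {
    filter_zombies | awk '{ n[$2]++ } END { for (p in n) print p, n[p] }' | sort -n
}

# Usage on a live Linux system:
#   ps -eo pid=,ppid=,stat=,comm= | zombies_per_parent
```

Each output line is a parent PID followed by the number of zombies it has accumulated.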

Monday, August 29, 2016

Get your sh** together Pro Tips -- vol. I

Next time you receive a newsletter/campaign email from sender x, and you haven't actually read the last five emails of sender x:

Immediately open that mail, scroll down to the bottom and click unsubscribe.

Voilà: potentially hundreds of emails fewer per year.

Friday, August 26, 2016

An Apache Spark pyspark setup script, incl. virtualenv

Here is a little script that I employed to get pyspark running on our cluster. Why is this necessary? Well, if you want to use the ML libraries within Apache Spark from the Python API, you need Python 2.7. However, in case your cluster runs on CentOS, it comes with Python 2.6 due to dependencies. DO NOT REMOVE IT. Otherwise bad things will happen.

Instead, it's best practice to have a separate Python 2.7 installation. And to be completely isolated, best practice is to create a virtualenv, which you will use to install all packages you are going to use with pyspark.

Also, if you plan to run pyspark within Zeppelin, you have to be sure that the virtualenv is accessible to user Zeppelin. This is why I install the whole thing in /etc. Also, make sure to run this on all cluster nodes, otherwise Spark executors cannot launch the local Python processes.


#!/bin/bash
# run as root

# info on python2.7 req's here: http://toomuchdata.com/2014/02/16/how-to-install-python-on-centos/
# info on installing python for spark: http://blog.cloudera.com/blog/2015/09/how-to-prepare-your-apache-hadoop-cluster-for-pyspark-jobs/
# info on python on local environment http://stackoverflow.com/questions/5506110/is-it-possible-to-install-another-version-of-python-to-virtualenv

#install needed system libraries
yum groupinstall "Development tools"
yum install zlib-devel bzip2-devel openssl-devel ncurses-devel sqlite-devel readline-devel tk-devel gdbm-devel db4-devel libpcap-devel xz-devel

#setup local python2.7 installation
mkdir -p /etc/spark-python/python
cd /etc/spark-python/python
wget http://www.python.org/ftp/python/2.7.9/Python-2.7.9.tgz
tar -zxvf Python-2.7.9.tgz
cd Python-2.7.9

# generate the Makefile, then build and install into the local prefix
./configure --prefix=/etc/spark-python/.localpython
make
make install

#setup local virtualenv installation
cd /etc/spark-python/python
wget https://pypi.python.org/packages/8b/2c/c0d3e47709d0458816167002e1aa3d64d03bdeb2a9d57c5bd18448fd24cd/virtualenv-15.0.3.tar.gz#md5=a5a061ad8a37d973d27eb197d05d99bf

tar -zxvf virtualenv-15.0.3.tar.gz
cd virtualenv-15.0.3/
/etc/spark-python/.localpython/bin/python setup.py install

cd /etc/spark-python
/etc/spark-python/.localpython/bin/virtualenv spark-venv-py2.7 --python=/etc/spark-python/.localpython/bin/python2.7

#activate venv
cd /etc/spark-python/spark-venv-py2.7/bin
source ./activate

#pip install packages of your choice
/etc/spark-python/spark-venv-py2.7/bin/pip install  --upgrade pip
/etc/spark-python/spark-venv-py2.7/bin/pip install py4j
/etc/spark-python/spark-venv-py2.7/bin/pip install numpy
/etc/spark-python/spark-venv-py2.7/bin/pip install scipy
/etc/spark-python/spark-venv-py2.7/bin/pip install scikit-learn
/etc/spark-python/spark-venv-py2.7/bin/pip install pandas

After you have done this, make sure to set the variable PYSPARK_PYTHON in /etc/spark-env.sh to the path of the new binary, in this case /etc/spark-python/spark-venv-py2.7/bin/python

Also, if you use Zeppelin, make sure to set the correct Python path in the interpreter settings. Simply alter/add the property zeppelin.pyspark.python and set its value to the Python binary as above.
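The PYSPARK_PYTHON setting could look roughly like the lines below when added to spark-env.sh. This is a sketch, not part of the original setup: the helper `python_is_version` is a made-up name, and the version check is an extra safeguard so that a node where the virtualenv is missing is noticed early.

```shell
#!/bin/bash
# Sketch: lines for spark-env.sh so that PySpark uses the virtualenv interpreter.

VENV_PYTHON=/etc/spark-python/spark-venv-py2.7/bin/python

# Succeed only if the interpreter exists and reports the expected major.minor version.
python_is_version() {
    local python_bin="$1" expected="$2"
    [ -x "$python_bin" ] || return 1
    "$python_bin" -c 'import sys; print("%d.%d" % sys.version_info[:2])' 2>/dev/null \
        | grep -qx "$expected"
}

if python_is_version "$VENV_PYTHON" 2.7; then
    export PYSPARK_PYTHON="$VENV_PYTHON"
else
    echo "warning: $VENV_PYTHON is missing or not Python 2.7" >&2
fi
```

Since the executors launch their own local Python processes, the same check is worth running on every cluster node.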

Tags: Apache Spark, Python, pyspark, Apache Zeppelin, Ambari, Hortonworks HDP

Thursday, August 18, 2016

Virtual Reality Hackdays 2015

In order for it not to be forgotten, here is the webpage of the "Virtual Reality Hackdays" I co-organized last year.


Monday, August 1, 2016


OK, now it's official. My side-project roound.io is online ... check it out, sign up for the newsletter and stay tuned. We are going to launch in beta soon!

Thursday, July 7, 2016

Apache Zeppelin autocomplete / code completion



For those of you using Apache Zeppelin as an interactive Spark notebook: if you have been wondering whether there is an autocompletion function, the answer is "yes". No, it's not "tab", it's

Ctrl + .

It's not optimal (as of now), but works fairly well.
Tags: Zeppelin, Apache Zeppelin, autocompletion, auto-completion, code completion

Saturday, June 18, 2016

DAO vulnerability -- Ethereum

Yesterday someone exploited the so-called "DAO vulnerability" to steal some 3 million Ether. This incident, of course, caused panic among many people trading Ether, which sent Ether prices plummeting. This article by The Verge was even titled "How an experimental cryptocurrency lost (and found) $53 million". So here is the catch: the author of this article, and in the same vein everyone else already proclaiming the death of the "Ethereum cryptocurrency", actually misses the point about Ethereum. Ethereum is not a cryptocurrency, but "... a decentralized platform that runs smart contracts: applications that run exactly as programmed without any possibility of downtime, censorship, fraud or third party interference." The execution of these smart contracts is fueled by Ether, but Ether is not a cryptocurrency like bitcoin. It was never meant to be yet another cryptocurrency. So don't blame the project if you lost "real" money yesterday. Ether is not there to be traded in the first place; it's a commodity, to be used in the Ethereum network. And to be sure: Buterin nicely explains that the attack is not a bug in Ethereum itself, but a mistake in the code powering the DAO project. A common bug, though, as it seems.

So, what will happen? Well, I think the Ethereum community learned a valuable lesson. The attack might foster the creation of long-awaited "best-practices" for smart-contracts, maybe even projects to "security check" your own smart-contract code. Learning the hard way is often the only way to learn. In this sense: no, the project certainly is not dead -- quite the opposite, it might never have been more alive.

Thursday, June 9, 2016

Scraping tables from websites (the easy way)

Just need to scrape one table from one website? Use Google Docs as described here.

Wednesday, May 25, 2016

Monday, April 18, 2016

You are a physicist. And you are working at a newspaper. But you don't write articles. What do you do? And why?

As I am getting this question a lot, I am trying to give an answer here.

To be quick: As a data scientist at Neue Zürcher Zeitung I am dealing with predictive analytics, statistical modeling, advanced data analysis tasks as well as everything "algorithms" (e.g. recommendation & personalization). I am mainly using R for tasks involving small data and Apache Spark for tasks dealing with not-so-small data. Python and bash are my favorite scripting languages. I just happen to have a background in theoretical physics -- could be engineering, math or computer science as well.

But why media? I have always been looking for challenges and opportunities, intellectual and societal. And, well, being in media these days one finds both. The publishing business is super exciting, because everything is changing: how news is made, the way stories are told, distribution channels, the audience, the technology, the business models, ... Indeed, many of these issues are still open and are just being explored, and the future of many publishing houses is still uncertain. So why is this?

Not so long ago, news was mainly distributed using one medium: paper. For the individual there were exactly two possibilities: either you wanted to be informed daily, in which case you had to pay for a newspaper subscription, or you didn't. If you (or your parents) happened to belong to the first group, chances are you only had one daily newspaper subscription. After all, they are not cheap (an NZZ subscription, for example, is roughly 600 CHF/year), so you chose the one that suited you best. However, fast-forward less than 10 years and you find that reality today looks quite different. If you want to be informed, you can do so mostly for "free" on the web. Also, as you have immediate access to all these manifold resources and they do not cost you a dime, you have a much wider variety at hand. No need to stick to one newspaper.
So newspapers are not only suffering from the fact that technology has been changing fast (from print to web 1.0 to web 2.0 to mobile, ...), but also from the fact that this change undermines the very basis of the (news) publishing industry: loyal customers who pay for the service you provide and, given this loyal, well-known customer base, the ability to monetize it on the advertising market.
Actually, from a balance-sheet perspective, the publishing industry has mainly been an advertising industry -- only 20%-40% of revenue has come from subscriptions -- the rest was advertising.

So what do you do, when the very basis of your business model is eroding? You innovate -- and this is where, among others, people like me come in.
Innovation comes in two parts. First you innovate in the sense that you optimize your current operations: you cut costs where possible and increase efficiency. And data, clearly, should be the basis for this: understand the numbers, then you can optimize. For example in marketing: use predictive analytics to help you decide where best to spend your marketing budget. Or use customer analytics to better understand the needs of your readership and to improve the customer experience accordingly.
The second part is what I like to call "true innovation". True innovation for me is not mere optimization, but novelty -- doing things that have not been done before. For this, on the one hand, data can be used as a decision criterion ("where to innovate"). On the other hand, data can also be the very basis for innovation. Here I am mainly thinking about data-driven / algorithmic products & services: things like smarter search, automated recommendations or personalization, in all its facets, that have the potential to greatly improve the customer experience, explore new ways of news consumption and reach a more tech-savvy audience.

I am contributing my part to this transformation at NZZ. Founded in 1780, NZZ is one of the oldest newspapers in the world that is still being published. A heritage like this comes with a lot of responsibility -- balancing tradition with the modern is a worthwhile challenge. In the end, a diverse and well-functioning media landscape is the basis for democracy. And I am glad to be part of it.

Sunday, April 3, 2016

Why you should go to college

April 2016: just found this post that I drafted ... don't know when. Certainly some time in late 2012. Don't know why I did not publish it at that time. I still agree. So here it is:


I just read the following Slashdot post

http://news.slashdot.org/story/1212/03/1317234/just-say-no-to-college

commenting on an article from the NYTimes

http://www.nytimes.com/2012/12/02/fashion/saying-no-to-college.html?pagewanted=all

And, as an academic, I simply cannot not comment on this. First of all: Kids, please, think twice! Especially about the part where he talks about not attending college at all. Let me explain why.
Often people confuse correlation and causality. Example: because Einstein played the violin, if I play the violin I will be super smart. There is no causation here (at least not in this direction). Dropping out of college, because Mark Zuckerberg dropped out of college and now is who he is, will not bring you big bucks. Not seriously attending college at all and instead traveling through India will not make you a legend.

In general I disagree with how college education is judged in the article. Maybe what follows is simply my European perspective, but anyway. Attending university is about more than just "getting a degree at the end". It is about developing your mind, in an environment where free thinking is allowed and, even more, specifically wanted. You are surrounded by smart people 24/7, somewhat isolated from reality. This enclave permits you to read and learn and work on the things you would not be able to in "the corporate world" - simply because in reality you would have to think about surviving. University life is different - and it is supposed to be. It is a period in your life to not worry about these things - because you have a scholarship, your parents can afford to pay or (if you happen to be in the USA) you got a student loan. But you won't need a lot of money anyway: you share a flat, you ride a bike, you eat noodles every day. And you are free from all worldly hassles. Free to think. Free to learn. Free to transform yourself into a beautiful and sharp mind. In classes (sure, not in all) you will be exposed to cutting-edge research or crazy theories you will never ever need in real life but that are simply fascinating and mind-boggling. You will spend nights awake discussing Darwin and Freud and Einstein with your mates, and this fu**** integral that took you the whole night to solve. College is about suffering on many levels: intellectually, financially and even physically. You will be some kind of ascetic, living only for the purpose of embedding yourself in an intellectual world and filling your head with knowledge. College will lead you to the edge of wisdom, to the edge of your mind, and will push you beyond. Sure, a hacking course will teach you how to program Angry Birds and eventually to become a millionaire. But attending university is a once-in-a-lifetime cultural experience. An experience you will only be able to appreciate at a young age. An experience of and exposure to human culture you should not miss. Sure, if during this experience you realize that you have had enough and instead are inspired to found an awesome company, then dropping out might be the right choice. But remember (and this was the case for Zuckerberg and Brin and many others): the college atmosphere most likely was the reason that you had this spark of inspiration in the first place.

Sunday, December 9, 2012

Aha-Moments and Modern Art

In retrospect, one of the awesome things in childhood was these aha-moments when you realize you just learned something really fundamental. As an adult, these moments are somewhat rare - and if they occur it is mostly in a job (science) related setting. But today I was lucky to experience a "true" aha-moment, in an area where I did not expect it.
This afternoon I went to the Kunsthaus in Zurich - a very impressive collection, basically from all epochs of European art (I highly recommend a visit). Usually I like the classics and old masters most. Maybe this is because they are often very precise, observant, almost analytic. I also very much enjoy photography, especially of nature and people, since it captures reality and has the power to directly convey feelings via empathy. I never could relate to most pieces of modern art, though. But this changed today - and is my personal aha-moment of the year.
In the Kunsthaus there was a video installation by contemporary artist Pipilotti Rist, which basically consisted of a dark, weakly lit room with a 70s-style floor lamp, velvet carpet and velvet bar seats with open women's purses on them, and a quiet, somewhat harmonic, but still unidentifiable sound coming from somewhere. When I entered the room I felt somewhat uncomfortable. Even more strangely, I felt it was not so much the room itself that made me feel like that, but rather the presence of the few other visitors. I kept on strolling around the room, observing the purses and looking into them, because the sound seemed to come out of there. And indeed, in every one of these purses was a little TV, showing a private scene: a young girl swimming in a swimming pool with her mother; a woman swimming in a pool and being filmed from below, such that the only thing you see are her breasts; a scene of a woman with what seemed to be a lot of blood all over her body; big red lips moving as if they were passionately kissing someone. All of these scenes made me feel even more uncomfortable, because it was basically some kind of "peep show" - a short glance into a most private scene of the life of someone else.
I left the room, thinking that there was no sense whatsoever in it. I could not say what the artist wanted to tell me, what she wanted me to see. Back in the main art hall, I sat down for a second and started feeling comfortable again. And then I realized: this is what the artist wanted to "show" me. Just as the classic sculptures and photographs earlier made me feel the joy of beauty or feel empathy with other people, this video installation served the exact same purpose, although in a much more extreme and intense way: to trigger an emotion. An emotion, however, that you would not expect to be exposed to or even want to be exposed to. In our everyday life, most (if not all) of the things we do take place in our comfort zone. But this video installation forced me out of that zone and triggered this very precise feeling of being uncomfortable. And that was when I realized that with a large number of the modern art pieces I have seen so far, I completely missed the point. I was much too focused on what can be seen, and not on what I feel. So next time you see a modern piece of art and you think it's shit because you cannot "understand" it: there might not be anything to "understand", but only to "feel". And rather than analyzing the art installation, you should analyze yourself in order to "see" what the artist wants you to "see".

Friday, October 12, 2012

Nobel Peace Prize 2012 - The European Union

The Nobel Peace Prize 2012 is awarded to the European Union! Since I read a lot of (sorry to say that) "bullshit" comments on a very popular social networking site about the sense and nonsense of this decision, notably the ones talking about how this decision is perfectly in line with all the previous "stupid" decisions about Nobel Peace Prize laureates (especially the prize to Barack Obama is very often quoted here), I feel I have to write something about it.

Let me begin with an observation: I read from a lot of people from countries within the EU that this is the most hilarious and ridiculous decision ever (without any reasons given). Well, the strange thing here is: it is de facto the recipients complaining about receiving the prize. What does this tell us? That a) they honestly and truly do not consider the EU a good endeavor that has brought peace to Europe and is fostering inter-cultural exchange, or b) they don't see that they are the European Union. Since I find it hard to imagine that there is anyone who really thinks in terms of a), it must be b). Now the question is: what is worse? a) implies a very hard-line world view, which is even objectively wrong, but hey - at least it is a clear standpoint. b) instead implies "unknowingness" or even ignorance and a very confined world view. Especially for a lot of Germans "the EU" is merely a huge bureaucratic apparatus that gets generously funded by their taxes but does not produce any visible and immediate output or personal benefit. Well, this is completely wrong: the immediate output is visible for everyone, every day, in every European country (and especially the more wealthy ones like Germany): we have lived in peace for over sixty years. One has to keep in mind: when we are talking about Europe, we are talking about 27 states, with 27 different national interests, 27 very different histories and 0.5 billion people living in a rather confined space (as compared e.g. to North America). It is a tremendous political and societal achievement to somewhat coordinate these interests and to build a web of international contracts, agreements, treaties, etc. that secures this stable, peaceful state. For most European citizens, especially the younger ones, this state is the state we have always known. We cannot imagine Europe being full of hate and national conflicts.
Not thinking about "the alternative", however, creates in most of those people's heads the illusion that the status quo is something "normal", "natural" and not worth reflecting on. Also, they do not see that they personally contribute to the European Union by living in a country that, as part of the EU, accepts and complies with European law. And this is exactly the problem with those comments: people take peace for granted and do not consider themselves part of the machinery that created this European state of peace.

However, here is an interesting point: in some sense, these comments, which led me to write this text, are in a very strange way the perfect, living evidence that the European Union (i.e. we all) is a role model for peaceful co-existence and deserves this prize. Because: if you reach a state where people take their peaceful lives for granted, without doubting it or fearing any severe international conflict, you really did your job 100% right.

Tuesday, September 4, 2012

Why Evernote is Awesome and Search Its Killer Feature


Do you know this situation? Let's assume you are working on some new (research) project and are doing some initial information search on the web: relevant papers, journals, conferences, people, and so on. Your learning curve will be steep - you will be retrieving information that seems relevant to you every other minute. How do you keep track of this vast amount of information?

  1. Old style: use a notebook. This is great for sketching initial ideas and noting down basic knowledge. However, it obviously lacks all the online possibilities (videos, talks, slides, ...), plus you would have to print every interesting paper ... hence you will end up keeping these resources separate somehow, which is suboptimal.
  2. You could use bookmarking. Sure, this will let you save the links to the information you found, but does this help you in organizing your new project? In my opinion, it does not. I will explain why further down.
  3. Use Evernote! If you don't know it or haven't tried it: it's totally awesome! And it's free (in the basic version)!

Evernote easily lets you save articles and links using its webclipper, a very useful tool. In addition to that, you can store typed notes, handwritten notes from your tablet, pictures taken with your phone, and so on, in one big virtual notebook. This alone is great, since it allows you to later easily "reconstruct" the line of thought you were following.

So far, so good. I think you got it: having one place to store all kinds of information is enormously useful. However, personally I think that the search functionality in Evernote is the Killer Feature! Not because of the mere fact that you can search your notes (if you were using emacs and storing notes in .txt files on your hd you could do this 20 years ago) - but because of how they implement it.
There are basically two ways of searching your Evernote: a) open Evernote (the client or online) and type something into the search field. Fine, these are the basics you would expect. The problem here: you have to have a clue that a certain piece of information is already in your collection, otherwise you wouldn't search in Evernote but in Google, right? And this is where awesome possibility b) comes in: when you install the webclipper in your browser, it will ask you whether you want your Evernote notes searched whenever you do a Google search. How awesome is this?! Think about it for a moment: by enabling this feature (and I highly recommend doing so) you basically enable searching through your "virtual memory" - on the fly - whenever you look for any information on Google. Meaning: with Evernote you are not only able to store any relevant information, but you will also be able to find this information again! Even when you are unaware that you have it! Without Evernote you would most likely spend a lot of time looking for it again, using Google.
Summa summarum: the crucial part here really is the integration of Evernote search and Google search. And the latter, most of you will agree, is exactly what we do all day: searching for information on Google, and not actually thinking about the stuff resting in our bookmark folders, delicious accounts or note files somewhere on our hd. In this sense, one could also say that Evernote is not only a tool for storing information, but for optimizing your personal information retrieval, in that it remembers what you already found. And this is truly ... awesome!

PS: Putting in my five cents - if you lost a lot of money because you were too *...* buying stocks of a social network company without a good business plan - if evernote goes IPO: invest!

Side remark:
I was waiting for such a service for quite some time. Actually, with some friends I built our own little cloud-based system to do similar things. However, it never really matured (science keeps us busy ;-) and we were merely using it on a private basis.

Sunday, September 2, 2012

Raspberry Pi & Spotify

Like other people, we have a radio in the kitchen. However, ours is so old that you don't want to touch the volume control, because the random noise produced by doing so gives you a headache. Now, since I am a proud Raspberry Pi owner as well, I had the idea to use it as a Spotify kitchen radio, meaning: have the RPi in our kitchen, connected to the LAN, some speakers plugged in and Spotify running. However, there is no ARM build of the Spotify client, so it's not that easy. There is, however, an ARM version of libspotify, which provides an API to Spotify's service (you will need a premium account, which I recommend to everyone anyway - Spotify is really awesome and it's only 10 bucks per month). So here are some comments, preliminary results and some advice on how to make the Spotify API work on your RPi (and at least be able to play your playlists).

1.) If you use Raspbian - that won't work, since it is built using the hard-float ABI capability of the ARM. If you try to install libspotify and make the examples work, everything seems fine at the beginning, but then you get an error like
libspotify.so.12.1.51: cannot open shared object file: No such file or directory
This is due to the fact that libspotify seems to have been built using the soft-float ABI. As long as Spotify doesn't release a hard-float build, you will have to go to step 2. This insight is the crucial part of the game here (and a "cannot open shared object file" error is not an obvious hint in this direction).

2.) If you want libspotify to work, you will have to run the soft-float build of Raspbian on your RPi, also available here: http://www.raspberrypi.org/downloads

3.) Once you have that, things are straightforward. Download libspotify from the Spotify developer page and follow the instructions in the readme. This is also where you will need a Spotify premium membership in order to get an appkey.

4.) In order to test it, you can use the jukebox example. Simply, after building it, run jukebox/jukebox. It will ask you for your login credentials and a playlist to run. If you don't hear anything, try another playlist. This "terminal version" of spotify seems to not tell you when a title is not available anymore, but instead simply keeps silent. 
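Regarding the ABI mismatch from step 1, a quick way to see which ABI the running system uses is to look at the Debian architecture name. The following is only a sketch: the function name is made up, and the library path in the comment is an example, not the actual layout of the libspotify download.

```shell
#!/bin/bash
# Sketch: map a Debian architecture name to the ARM float ABI it implies.
# armhf = hard-float, armel = soft-float.
float_abi_of_arch() {
    case "$1" in
        armhf) echo "hard-float" ;;
        armel) echo "soft-float" ;;
        *)     echo "unknown" ;;
    esac
}

# Usage on the Pi itself:
#   float_abi_of_arch "$(dpkg --print-architecture)"
# To inspect a library (example path), readelf lists a "Tag_ABI_VFP_args"
# attribute only for hard-float binaries:
#   readelf -A lib/libspotify.so | grep Tag_ABI_VFP_args
```

If the system reports hard-float while the library shows no such attribute, that would match the situation described in step 1.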

Advice: The jukebox example requires you to have alsa installed and *configured*. So, before testing the Spotify API and complaining that it does not play any sound, you should configure the sound card. See e.g. here or simply google for raspberry pi and alsa. Have fun!

PS: As a kitchen-radio this is still a bit uncomfortable. What I would ultimately like to have is a LAN-internal web-interface to the pi and libspotify, so that from every computer/tablet/smartphone in the LAN, I can access the local web-interface and search for/play titles, artists, albums ...

Sunday, March 25, 2012

Information ranking based on social media

It has been suggested that the importance ranking of webpages should be based more and more on "social signals", i.e. how often a webpage is shared rather than linked. But this raises questions like: will the importance given to a shared piece of information differ by the "social" status of the person who shares it? I.e. is some link shared by Barack Obama "worth more" than some link shared by me? If so, who decides who is "more trustworthy"? These questions haven't been answered. However, Google & co. have already started implementing this kind of social ranking. If you have a g+ account and you do a Google search, you will eventually find "personal search results", based on the things shared by people you have in your circles. And to be honest, this service is amazingly useless at this stage. Let's say I perform a Google search for "android tablet". Most likely I am looking for product information about Android tablets, a Wikipedia entry or some other general information. However, the "personal result" only seems to perform a full-text search over all the posts of the people I follow on g+. A full-text search...that's it? Is this supposed to be the new awesome world of social ranking? There is no useful information in the 110 personal results whatsoever, since most people mention the terms "android" and "tablet" in a rather specific context: either they are talking about an app or a special feature of some Android tablet or the success of Android tablets in general or ... In this respect the "social signals" are not used in a constructive manner - they just add more clutter to the other 530,000 search results. The challenge will be to add a useful social dimension to improve information filtering. And I feel we are far away from that. Something else is needed here.

Thursday, December 15, 2011

reply to "Peer review without peers?"

I just came upon this post by Aaron Shaw about a somewhat unusual idea for the scientific peer review process. Since I did not want to leave a lengthy text in the comments section of his post, I decided to put it here. Aaron, I would be happy to hear any comments you might have.

So here is the thing: we (here at ETH) have been thinking quite a bit lately about issues of scientific evaluation and peer review. In this vein, two questions in particular arise: 1) How can one judge the value of research performed in an interdisciplinary research environment? and 2) How can we get *good* research by *unknown* people into high-impact journals and keep *bad* research by *established people* out of them, preventing a few scientists from de facto deciding what is "hype" at the moment and what is not? But I will try to post about this another time. So let's talk about Aaron's post.

Aaron is basically talking about the idea of using wisdom-of-crowds effects for scientific peer review:
...what if you could reproduce academic peer review without any expertise, experience or credentials? What if all it took were a reasonably well-designed system for aggregating and parsing evaluations from non-experts?
And he continues:
I’m not totally confident that distributed peer review would improve existing systems in terms of precision (selecting better papers), but it might not make the precision of existing peer review systems any worse and could potentially increase the speed. If it worked at all along any of these dimensions, implementing it would definitely reduce the burden on reviewers. In my mind, that possibility – together with the fact that it would be interesting to compare the judgments of us professional experts against a bunch of amateurs – more than justifies the experiment.
First of all, I agree that it would be interesting to test whether non-expert crowds might perform as well as "experts" in a peer review process. Here is my predicted outcome: for the social sciences and qualitative economics papers, this might be the case most of the time. It will *not* work for the vast majority of papers in the quantitative sciences. But this is actually not the point I want to make here. The point is the following: what Aaron and others were thinking of is how to "speed up" the peer review process and "...reduce the burden on reviewers." Humbly, I think those are *completely wrong* incentives from an academic point of view. Reviewing is a mutual service scientists provide to their peers. If our goal is to reduce "the burden" of reviewing so many papers, we should all write less. (This might be a good idea anyway.) There is also a problem with peer review without peers: non-experts will not know the existing literature, so redundancy will increase (even more), and this is something you cannot get rid of without peers. If we nevertheless went in this direction, the "reviewing crowd" would basically be a detector of "spam papers" and nothing more. But those are not the papers that take a lot of time to review; they are often very easily identified.

What really makes peer review so time-consuming is a) the complexity of papers and b) their quantity. We should not aim at reducing a), because this is just the way scientific evolution goes: once the easy work has been done, the complicated details remain. (Einstein famously (supposedly) said that he no longer understood his general theory of relativity once mathematicians started working on it.) So I assume that, in order to reduce the number of papers to review while maintaining scientific excellence, reducing b) is the only choice. And, as I said earlier, this might not be a bad idea at all. It might also have a positive effect on the content and excellence of the published papers.
Decreasing the number of published papers is complicated, however, and would require us to *rethink* how science is done today. But this is material for another post.