http://nautil.us/blog/is-facebook-really-scarier-than-google
On Twitter, in a thread that went viral, François Chollet, an A.I. software engineer at Google DeepMind, argued, “Facebook is, in effect, in control of your political beliefs and your worldview.” Photograph by Joe Penniston / Flickr

Mark Zuckerberg, the founder and C.E.O. of Facebook, admitted recently that his company knew, in 2015, that the data firm Cambridge Analytica, which assisted with Donald Trump’s election campaign, had improperly acquired information on 50 million Facebook users. “This was a breach of trust,” Zuckerberg said, in a Facebook post. “We need to fix that.”
But that’s not the only thing Facebook needs to fix. “The problem with Facebook is not just the loss of your privacy and the fact that it can be used as a totalitarian panopticon,” said François Chollet, an artificial intelligence and machine learning software engineer at Google DeepMind, in a tweet yesterday. “The more worrying issue, in my opinion, is its use of digital information consumption as a psychological control vector.” He elaborated on this point in a thread that’s been shared thousands of times. I caught it when global-surveillance critic and The Intercept writer Glenn Greenwald quote-retweeted Chollet, calling it a “great thread” on “Facebook’s menacing use” of psychology-manipulating A.I. “But remember,” Greenwald added, “Google is also a major exploiter of artificial intelligence with very little transparency or public oversight.”
Chollet bristled at this. “This is the laziest kind of thinking—just because two things share some superficial similarity (they’re large tech [companies]) doesn’t mean they’re equivalent,” he said. Why? Look to the Newsfeed, Facebook’s signature feature, Chollet went on, in this Twitter thread. “If Facebook gets to decide, over the span of many years, which news you will see (real or fake), whose political status updates you’ll see, and who will see yours, then Facebook is, in effect, in control of your political beliefs and your worldview.”
Fil Menczer, Professor of Informatics and Computer Science at Indiana University, told Nautilus, in a discussion about his work on fake news, that the algorithms used on social media platforms, like Facebook, bias our decision-making in ways that exploit our socio-cognitive biases—but search engines like Google aren’t entirely innocent.
“The algorithmic biases feed into social and cognitive biases, like confirmation bias, which in turn feed into the algorithmic biases. Before, people were looking at the evening news on TV, or reading the local paper, for example. But the fact that the medium has changed to online social networks, where you shape the sources of information to which you are exposed, now means that you become even more vulnerable,” he said. “Search engines and social media, for example, try to predict what content may be most engaging for someone. Ranking algorithms use popularity as one of the ingredients in their formulas. That means that the more people in your group interact or engage with a piece of fake news, the more likely you are to see it. The social network can act as an amplifier because the people near you have opinions similar to you, so they are more likely to be tricked by a certain kind of fake news, which means you are more likely to see it, too.”
François Chollet. Image from Google Developers / YouTube

For Chollet, though, the sort of danger Facebook poses is unique. “There’s only one company where the product is an opaque algorithmic newsfeed, that has been running large-scale mood/opinion manipulation experiments, that is neck-deep in an election manipulation scandal, that has shown time and time again to have morally bankrupt leadership. Essentially nothing about the threat described applies to Google. Nor Amazon. Nor Apple. It could apply to Twitter, in principle, but in practice it almost entirely doesn’t,” he said. What seems to clinch it for him is that Facebook is ambitiously pursuing advances in A.I. “What do you use AI…for, when your product is a newsfeed?” he wondered. “Personally, it really scares me. If you work in A.I., please don’t help them. Don’t play their game. Don’t participate in their research ecosystem. Please show some conscience.”
He takes the decision to work for his current employer, Alphabet—former motto “Don’t be evil,” and now “Do the right thing”—as a demonstration of scrupulousness. “For me, working at Google”—an Alphabet subsidiary—“is a deliberate choice,” he said. “If I start feeling uncomfortable, I’ll definitely leave. If I had been working at [Facebook], I would have left in 2017.” For now, he’s glad to labor for a company whose products—like Gmail and Android—are “anti-Facebook,” he went on. These empower people, while Facebook’s Newsfeed “seeks to maximally waste your time.”
If Zuckerberg is morally bankrupt, he’s trying to hide it. On Wednesday, he told CNN that he’d be “happy to” testify to Congress and, on Facebook, announced several changes that will, supposedly, rectify what went wrong with the Cambridge Analytica data-breach scandal. “I started Facebook,” he wrote, “and at the end of the day I’m responsible for what happens on our platform.”
It is worth keeping in mind, though, that, also at the end of the day, “The motivation for Facebook is not to make you a better person—to improve you morally or intellectually—and it’s not even designed to improve your social group,” Simon DeDeo, an assistant professor at Carnegie Mellon University, where he runs the Laboratory for Social Minds, and external faculty at the Santa Fe Institute, told Nautilus. “It’s designed to make money, to show you things you want to see to hopefully induce you to purchase things.”
Brian Gallagher is the editor of Facts So Romantic, the Nautilus blog. Follow him on Twitter @brianga11agher.
Victor Gomes, an editorial intern at Nautilus, contributed reporting to this post.
Tencent (00700) chairman Ma Huateng said at the 2018 (Shenzhen) IT Leaders Summit that many people cannot make sense of Tencent's moves in new retail, but the company's main aim is not to do new retail itself; rather, it wants to connect WeChat users with offline brick-and-mortar shops. That connection will create many business opportunities, and connecting is what Tencent does.
He added that the future development of cloud computing needs the support of big data. Once the connections are in place, advertising revenue will follow; going forward, the company will place ads within its social ecosystem in a digital way, which means advertising revenue will grow accordingly.
Ma Huateng believes that with the spread of 4G on the mainland, the whole economy is rapidly merging online and offline, and the technology that cannot be overlooked here is the QR code. It looks unremarkable, yet it is among the most valuable, because scanning a code is the simplest and most readily accepted interaction; without code scanning, business models such as bike sharing could never have emerged.
All Respondents: JavaScript 69.8%, HTML 68.5%, CSS 65.1%, SQL 57.0%, Java 45.3%, Bash/Shell 39.8%, Python 38.8%, C# 34.4%, PHP 30.7%, C++ 25.4%, C 23.0%, TypeScript 17.4%, Ruby 10.1%, Swift 8.1%, Assembly 7.4%, Go 7.1%, Objective-C 7.0%, VB.NET 6.7%, R 6.1%, Matlab 5.8%, VBA 4.9%, Kotlin 4.5%, Scala 4.4%, Groovy 4.3%, Perl 4.2%
78,334 responses; select all that apply
For the sixth year in a row, JavaScript is the most commonly used programming language. Python has risen in the ranks, surpassing C# this year, much like it surpassed PHP last year. Python has a solid claim to being the fastest-growing major programming language.
We see close alignment in the technology choices of professional developers and the developer population overall.
All Respondents: Node.js 49.6%, Angular 36.9%, React 27.8%, .NET Core 27.2%, Spring 17.6%, Django 13.0%, Cordova 8.5%, TensorFlow 7.8%, Xamarin 7.4%, Spark 4.8%, Hadoop 4.7%, Torch/PyTorch 1.7%
51,620 responses; select all that apply
Node.js and AngularJS continue to be the most commonly used technologies in this category, with React and .NET Core also important to many developers.
All Respondents: MySQL 58.7%, SQL Server 41.2%, PostgreSQL 32.9%, MongoDB 25.9%, SQLite 19.7%, Redis 18.0%, Elasticsearch 14.1%, MariaDB 13.4%, Oracle 11.1%, Microsoft Azure (Tables, CosmosDB, SQL, etc) 7.9%, Google Cloud Storage 5.5%, Memcached 5.5%, Amazon DynamoDB 5.2%, Amazon RDS/Aurora 5.1%, Cassandra 3.7%, IBM Db2 2.5%, Neo4j 2.4%, Amazon Redshift 2.2%, Apache Hive 2.2%, Google BigQuery 2.1%, Apache HBase 1.7%
66,264 responses; select all that apply
Like last year, MySQL and SQL Server are the most commonly used databases.
All Respondents: Linux 48.3%, Windows Desktop or Server 35.4%, Android 29.0%, AWS 24.1%, Mac OS 17.9%, Raspberry Pi 15.9%, WordPress 15.9%, iOS 15.5%, Firebase 14.5%, Azure 11.0%, Arduino 10.6%, Heroku 10.5%, Google Cloud Platform/App Engine 8.0%, Serverless 4.5%, Drupal 3.0%, Amazon Echo 2.9%, Windows Phone 2.7%, SharePoint 2.7%, ESP8266 2.2%, Salesforce 2.2%, Apple Watch or Apple TV 1.9%, IBM Cloud or Watson 1.4%, Google Home 1.4%, Gaming console 1.3%, Mainframe 0.8%
65,999 responses; select all that apply
Linux and Windows Desktop or Server are the most common choices that our respondents say they have done development work for this year.
Most Loved, Dreaded, and Wanted
Loved: Rust 78.9%, Kotlin 75.1%, Python 68.0%, TypeScript 67.0%, Go 65.6%, Swift 65.1%, JavaScript 61.9%, C# 60.4%, F# 59.6%, Clojure 59.6%, Bash/Shell 59.1%, Scala 58.5%, SQL 57.5%, HTML 55.7%, CSS 55.1%, Haskell 53.6%, Julia 52.8%, Java 50.7%, R 49.4%, Ruby 47.4%, Erlang 47.2%, C++ 46.7%, Hack 42.1%, PHP 41.6%, Ocaml 41.5%
% of developers who are developing with the language or technology and have expressed interest in continuing to develop with it.
For the third year in a row, Rust is the most loved programming language among our respondents, followed close behind by Kotlin, a language we asked about for the first time on our survey this year. This means that proportionally, more developers want to continue working with these than other languages.
Also for the third year in a row, Visual Basic 6 ranks as the most dreaded programming language. Most dreaded means that a high percentage of developers who are currently using the technology express no interest in continuing to do so.
Python is the most wanted language for the second year in a row, meaning that it is the language that developers who do not yet use it most often say they want to learn.
Loved: TensorFlow 73.5%, React 69.4%, Torch/PyTorch 68.0%, Node.js 66.4%, .NET Core 66.0%, Spark 66.0%, Spring 60.0%, Django 58.3%, Angular 54.6%, Hadoop 53.9%, Xamarin 49.0%, Cordova 40.4%
% of developers who are developing with the language or technology and have expressed interest in continuing to develop with it.
TensorFlow, one of the fastest growing technologies on Stack Overflow, is most loved by developers, while Cordova is most dreaded. React is the framework developers say they most want to work with if they do not already.
Loved: Redis 64.5%, PostgreSQL 62.0%, Elasticsearch 59.9%, Amazon RDS/Aurora 58.8%, Microsoft Azure (Tables, CosmosDB, SQL, etc) 56.7%, Google Cloud Storage 56.5%, MongoDB 55.1%, MariaDB 53.3%, Google BigQuery 52.4%, SQL Server 51.6%, Amazon DynamoDB 50.9%, Neo4j 49.7%, MySQL 48.7%, SQLite 48.1%, Cassandra 46.4%, Apache Hive 46.2%, Amazon Redshift 44.8%, Apache HBase 43.6%, Memcached 42.2%, Oracle 36.9%, IBM Db2 21.8%
% of developers who are developing with the language or technology and have expressed interest in continuing to develop with it.
For the second year in a row, Redis is the most loved database, meaning that proportionally more developers wanted to continue working with it than any other database. IBM's Db2 offering ranks as the most dreaded database, and for the second year in a row, MongoDB is the most wanted database.
Loved: Linux 76.5%, Serverless 75.2%, AWS 68.6%, Raspberry Pi 67.7%, ESP8266 67.4%, iOS 64.6%, Apple Watch or Apple TV 64.0%, Mac OS 63.9%, Firebase 63.8%, Android 63.8%, Google Cloud Platform/App Engine 62.5%, Gaming console 61.3%, Windows Desktop or Server 61.2%, Azure 61.0%, Arduino 58.1%, Google Home 57.6%, Amazon Echo 53.2%, Heroku 52.2%, IBM Cloud or Watson 43.7%, Predix 39.1%, WordPress 36.8%, Windows Phone 31.2%, Mainframe 31.1%, Salesforce 30.3%, Drupal 29.6%
% of developers who are developing with the language or technology and have expressed interest in continuing to develop with it.
Linux is once again the most loved platform for development, with serverless infrastructure also loved this year. SharePoint is the most dreaded development platform for the second year in a row, and many developers say they want to start developing for the Android platform and the Raspberry Pi.
Development Environments and Tools
All Respondents: Visual Studio Code 34.9%, Visual Studio 34.3%, Notepad++ 34.2%, Sublime Text 28.9%, Vim 25.8%, IntelliJ 24.9%, Android Studio 19.3%, Eclipse 18.9%, Atom 18.0%, PyCharm 12.0%, Xcode 10.6%, PHPStorm 9.0%, NetBeans 8.2%, IPython / Jupyter 7.4%, Emacs 4.1%, RStudio 3.3%, RubyMine 1.6%, TextMate 1.1%, Coda 0.6%, Komodo 0.6%, Zend 0.4%, Light Table 0.2%
75,398 responses; select all that apply
Visual Studio Code just edged out Visual Studio as the most popular developer environment tool across the board, but there are differences in tool choices by developer type and role. Developers who write code for mobile apps are more likely to choose Android Studio and Xcode, the most popular choice by DevOps and sysadmins is Vim, and data scientists are more likely to work in IPython/Jupyter, PyCharm, and RStudio.
All Respondents: Windows 49.9%, MacOS 26.7%, Linux-based 23.2%, BSD/Unix 0.2%
76,179 responses. We asked our respondents what operating systems they use for work. About half said they mainly use Windows, and the remainder were about evenly split between MacOS and Linux.
1 monitor: 31.9%; 2 monitors: 51.1%; 3 monitors: 14.4%; 4 monitors: 1.2%; more than 4: 1.4%
76,398 responses. Over 65% of respondents use two or more monitors to get work done at their main workstation; the median number of monitors for respondents at their main workstation is two.
Top Paying Technologies
Global: F# $74,000, Ocaml $73,000, Clojure $72,000, Groovy $72,000, Perl $69,000, Rust $69,000, Erlang $67,000, Scala $67,000, Go $66,000, Ruby $64,000, Bash/Shell $63,000, CoffeeScript $60,000, Haskell $60,000, Julia $60,000, TypeScript $60,000, C# $59,000, Objective-C $58,000, R $58,000, Swift $57,000, Lua $56,000, Python $56,000, SQL $56,000, JavaScript $55,000, HTML $54,000, CSS $53,000
Median of 56,835 responses; USD. Globally, developers who use F#, Ocaml, Clojure, and Groovy earn the highest salaries, with median salaries above $70,000 USD. There are regional variations in which languages are associated with the highest pay. Erlang and Scala developers in the US are among the highest paid, while Clojure, Erlang, and Haskell developers earn the most in India.
This year, we covered a few new topics ranging from artificial intelligence to ethics in coding. Here are a few of the top takeaways from this year’s results:
DevOps and machine learning are important trends in the software industry today. Languages and frameworks associated with these kinds of work are on the rise, and developers working in these areas command the highest salaries.
Only tiny fractions of developers say that they would write unethical code or that they have no obligation to consider the ethical implications of code, but beyond that, respondents see a lot of ethical gray. Developers are not sure how they would report ethical problems, and have differing ideas about who ultimately is responsible for unethical code.
Developers are overall optimistic about the possibilities that artificial intelligence offers, but are not in agreement about what the dangers of AI are.
Python has risen in the ranks of programming languages on our survey, surpassing C# in popularity this year, much like it surpassed PHP last year.
When assessing a prospective job, different kinds of developers apply different sets of priorities. Women say their highest priorities are company culture and opportunities for professional development, while men say their highest priorities are compensation and working with specific technologies.
http://www.tedinski.com/2018/03/13/how-compilers-are-designed.html
Programming languages and compilers used to have a reputation for being one of the most complex topics in computer science. I think that reputation persists, but only because the topic is pretty much infinitely deep, especially where programming languages are concerned. The shallow waters here—the basics of how to design and implement a compiler—are pretty well solved. Compilers are still big, but we know how to break them down into easily understood pieces. I’d like to share some of those insights with you, because they have implications beyond writing compilers.
There is one key idea that makes compiler structure manageable to understand: a compiler consists of a pipeline of relatively simple transformations from one form of data to the next. The key idea here is data. We can completely segment one component of the compiler from another by having the right form of data structure in between them. To build that data structure, you don’t need to know anything that happens to it afterwards, you just need to know what it means.
Once we have those data structures, a component of the compiler is just a function from one input type to another output type. As black boxes go, this is the easiest possible thing in the world to reason about. Finding a bug in the whole compiler is just a matter of finding an input that does not produce the expected output, and then tracking that input through the pipeline to the black box that produces the first unexpected output. The internals of that box may still be complex, but by narrowing down the scope of the problem so much, we’re already off to a very good start with debugging.
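To make the pipeline idea concrete, here is a toy sketch in Python (every name is invented for illustration) for a made-up language of single-digit additions like "1 + 2 + 3": each stage is a plain function from one data structure to the next, and the whole compiler is just their composition.

def lex(source):
    # "1 + 2" -> ['1', '+', '2']
    return [c for c in source if not c.isspace()]

def parse(tokens):
    # ['1', '+', '2', '+', '3'] -> ('+', 1, ('+', 2, 3))
    if len(tokens) == 1:
        return int(tokens[0])
    return ('+', int(tokens[0]), parse(tokens[2:]))

def codegen(ast):
    # tree -> stack-machine instructions
    if isinstance(ast, int):
        return [('PUSH', ast)]
    op, lhs, rhs = ast
    return codegen(lhs) + codegen(rhs) + [('ADD',)]

def compile(source):
    return codegen(parse(lex(source)))

print(compile('1 + 2 + 3'))
# [('PUSH', 1), ('PUSH', 2), ('PUSH', 3), ('ADD',), ('ADD',)]

Each function can be tested in isolation by feeding it the data structure it expects, which is exactly the debugging story described above.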
A high-level diagram of a compiler pipeline, with emphasis on the front-end.

This diagram shows a general structure for how to design a compiler front-end. The general tasks look something like this:
A file is parsed into an abstract syntax tree (AST) that represents the program structure in tree form. (Or graph form, but that doesn't seem to stop us from calling it an AST.)
This AST is analyzed for semantic errors, before being transformed into a high-level intermediate representation (HIR).
This HIR is suitable for applying certain language-specific optimizations before translating down to a much more general purpose IR.
Many compilers these days just use LLVM, which has its own suite of IRs to perform optimizations, before ultimately generating machine code.

This kind of design is traditionally justified by the front-end/back-end split. If we tried to build a compiler for every language and every architecture, we'd end up having to build N*M compilers. But with a common IR in the middle, we only have to build N front-ends plus M back-ends. This is an obvious win.
But I think this justification sells the design short. It doesn’t explain why we’re so incredibly enamored with creating more IRs. My high-level diagram is, to some extent, misleading. One of the nice things about pipelines is the composition operator is associative: we can take a series of transformations and pretend they’re just one big transformation, and I sure took that liberty above. Real compilers can have many more IRs than the uninitiated would ever guess. GHC (the Haskell compiler) has 3 separate HIRs (CoreExpr, STG, and Cmm) and several different normalizations of those IRs, all before it ever gets to something like the LLVM IR in the back-end.
In fact, just parsing to an AST? Let’s zoom and enhance:
Just going from a file to an AST can involve several intermediate representations. In this diagram we see a common traditional compiler design:
We first handle the problem of lexing an input file into a sequence of tokens: if, (, x, ==, and so on.
Given a token stream, we can then parse this into a tree structure according to the grammar of the language.
For many languages, there is often some transformation step from the structure we construct while parsing to the structure we actually use as the AST (which is to say, the thing we do name-lookups and type checking on). I've called this "desugaring" here for lack of an all-encompassing word, but these days you usually find a step called "desugaring" after error checking (i.e. after the AST). We want to give nice error messages about the code you actually wrote, and "desugaring" often implies transforming away the code as written into something simpler.

A quick example: Haskell has user-definable operators like >>=, so during parsing a Haskell compiler likely needs to construct an intermediate list-like structure, for example [a, '+', b, '*', c]. During parsing, it may not have seen all operator precedence declarations yet, so it doesn't know how to arrange this sequence of operations into a tree. Once it has finished parsing, it can go back and figure out how to arrange those lists into trees, constructing something closer to its real AST, as sketched below. You may be unsurprised to hear there are even more intermediate representations before it gets there in GHC.
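Here is a minimal sketch of that re-association step in Python, with a fixed precedence table standing in for Haskell's fixity declarations (all names hypothetical): it takes the flat list produced during parsing and rebuilds it into a tree once precedences are known.

PRECEDENCE = {'+': 1, '*': 2}  # assumed: higher binds tighter; all left-associative

def build_tree(items):
    # items alternates operands and operators, e.g. ['a', '+', 'b', '*', 'c']
    def climb(pos, min_prec):
        lhs = items[pos]
        pos += 1
        while pos < len(items) and PRECEDENCE[items[pos]] >= min_prec:
            op = items[pos]
            # left-associative, so the right-hand side must bind strictly tighter
            rhs, pos = climb(pos + 1, PRECEDENCE[op] + 1)
            lhs = (op, lhs, rhs)
        return lhs, pos
    tree, _ = climb(0, 1)
    return tree

print(build_tree(['a', '+', 'b', '*', 'c']))  # ('+', 'a', ('*', 'b', 'c'))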
Why so many IRs?
The pipeline style of design is pretty much the distilled essence of the idea of breaking a large problem down into small pieces. We start with a big problem: a compiler is a function from an input source file to an output object file. We drop an IR in the middle, and now we have two smaller problems: build a front-end and a back-end. Drop another IR in the middle of the front-end and now we again have two smaller problems: parse an input file to an AST, then translate the AST to the IR. Rinse, repeat. Now the problem is tiny and totally self-contained. Crush it.
We see this show up in other popular approaches to programming. Shell scripting languages like Bash involve not only the obvious pipe (|) operator for composing together commands, but also composition operators like $(...). Instead of breaking problems down into ones small enough to solve, here we break problems down into ones small enough to already be solved by some tool we have just laying around, like grep. (I really have to get around to writing that post about composition someday.)
We also see this in what is probably the most eagerly and widely adopted functional programming language feature. We can write programs by operating on streams of data, using map, filter, fold, flatMap, groupBy, reduce, comprehensions, and so on. This can reduce big problems into ones small enough to solve with a little lambda, or allow us to compose together functions we already have available. And even if the function is not yet written, we've still reduced the size of the problem it has to solve.
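As a small Python illustration of this style (data and names invented), "total value of in-stock items" breaks into three already-solved pieces:

from functools import reduce

items = [{'name': 'widget', 'price': 3.0, 'stock': 4},
         {'name': 'gadget', 'price': 9.5, 'stock': 0}]

in_stock = filter(lambda i: i['stock'] > 0, items)          # keep sellable items
values = map(lambda i: i['price'] * i['stock'], in_stock)   # price each line
total = reduce(lambda a, b: a + b, values, 0.0)             # fold into one number

print(total)  # 12.0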
The traditional object-oriented design aesthetic involves a lot of emphasis on encapsulation to achieve loose coupling. The actual designs of data are to be hidden away, so that they can change. Interfaces necessarily hide data members, because you don’t know what actual implementation of that interface you might get. Hiding away data representation often gets sold as the key thing that makes OO good at breaking down large problems into smaller pieces.
But here we are, looking at how compilers are designed, and we’re achieving loose coupling between components by exposing a data schema, publicly committing to all its representational details. Nothing is encapsulated at all.
“But what if we need to change that representation?” one might ask. But this is no real objection. You can make breaking changes to interfaces, too. If it looks like you want to make a breaking change to an interface, you either make the breaking change, or you define a new version of the interface next to it. Likewise with data like this. Data can be an interface.
The fact that data can be an interface should be kind of obvious (file formats? DB schemas? protocols?) but this fact seems to get lost in some OO ideology. It's a natural tool to use between systems, but it often seems to get lost when we're designing a single program in an OO language. Instead of passing data from one part of the system to another, we often end up passing references to objects, which ends up creating a dependency between those parts.
So this ends my musing about design in general. Now let me tell you a few related war stories about compiler design.
Story time: breaking the rules poorly
The advantage of the pipeline design is that it flexibly “zooms” in or out on parts of the problem. All that goes to hell as soon as you back-feed outputs of a later stage into inputs of an earlier stage. Now you have one monolithic block of code where you’ve semi-pointlessly drawn some boxes inside of it to pretend like it’s modular like the rest of the pipeline, but it’s not. You can’t understand it without understanding the whole thing.
Breaking the design like this usually isn’t the fault of the compiler, it’s the fault of the language it’s compiling. The most famous example of this is The Lexer Hack. That’s literally the title of the Wikipedia page. Hah. Not just any lexer hack, the lexer hack.
The C language screwed up its design back in the day. Among other problems, an ambiguity was discovered in its grammar when typedef declarations were introduced. The trouble was how to parse something like x * y;. Multiplication of two variables? Or declaration of y as pointer to type x?
The official answer is: look up x in the environment while lexing and recognize it differently as a result of that lookup. Either as one token (identifier, and so it’s a multiply) or as another token (typedef-name, and so it’s a declaration). The problem: name binding analysis is something most naturally computed on the abstract syntax tree. So we’re having to back-feed information from a later stage to an earlier stage here.
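A hypothetical Python sketch makes the back-feeding visible: the lexer's classification of an identifier depends on a typedef table that only later stages can maintain.

typedef_names = set()  # mutated by name binding, a *later* pipeline stage

def classify(word):
    if word in typedef_names:
        return ('TYPEDEF_NAME', word)  # "x * y;" lexes toward a declaration
    return ('IDENTIFIER', word)        # "x * y;" lexes toward a multiplication

print(classify('x'))    # ('IDENTIFIER', 'x')
typedef_names.add('x')  # suppose we just processed "typedef int x;"
print(classify('x'))    # ('TYPEDEF_NAME', 'x')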
The way a compiler like Clang copes with this problem is to box the whole AST construction effort into a “compilation unit” object that contains a whole lot of mutable state for us. We then run the whole ball of code statefully inside this context, mutating away and transitioning back and forth between lexing, parsing, name binding, and even type checking. (We have C++ to thank for that last one, as it complicates parsing even further by making it type-dependent, and also hands us a Turing-complete type system just to make things even more interesting.)
Clang tries to keep things separate, mostly. There’s still an AST representation that’s getting constructed, it’s just one that’s already had name analysis and type checking information filled in. The code in the Parse module tries to call over to a separate Sema module that does name binding and type checking. So it’s not completely mixed up. But it is partially conflated by necessity: all the logic about where scopes open and close has to live in Parse because that’s the code that’s driving everything.
This is a pretty decent compiler design for trying to cope with a language design decision that never should have happened. If you’ve ever wondered why Java tooling is so much better than C/C++ tooling, this design mistake is part of the answer. (The preprocessor is the bigger problem by far, though.)
Story time: breaking the rules well
This is not to say that back-feeding information is always terrible. The problem with C's lexer hack is that it adds a lot of complexity: it's hard to reason about what's going on, and everything we're trying to do gets more complicated. But it's also possible to make the same mistake in a different direction.
I’ve described traditional compilers as separating lexing from parsing, but this separation is likely a bad decision, in retrospect. If you’ve ever tried to use a lexer & parser generator, you’ve probably gotten an error message like Syntax error and marveled at how it manages to not even have any clue where the syntax error was, much less what the darned problem is. What we’re suffering from here is over-abstraction making things harder for ourselves.
There’s nothing about parsing that requires separate lexing and parsing. By separating these things, we can run into problems like our parser not having any notion of where it is within the file it’s parsing. (It’s parsing an abstract token stream, it has no notion of a file!) By separating these things, we end up having to manually re-implement something like location tracking because ostensibly that token stream might have come from something other than the lexer, and these tools are trying not to couple with each other.
Indeed, if we stop and look at a high-level at what a language’s grammar is, we see that not only is a lexer an implicit part of the specification, and the parser also an implicit part, but there’s also an implicit data structure involved: the “concrete syntax tree.” (Basically, look at the grammar as a set of algebraic data type declarations.) We might start to get suspicious of our design when our single high-level description of a language’s grammar gets cut up into three artificially separated components that create more work for us.
And there’s potential benefit to back-feeding information, too. Layout and parsing context information can inform the lexer and make it possible to parse grammars you otherwise couldn’t.
So in this particular instance, perhaps we went too far in breaking things down into pipelined steps. More intermediate representations doesn’t necessarily mean better.
View PDF: https://www.bankofengland.co.uk/-/media/boe/files/quarterly-bulletin/2014/money-creation-in-the-modern-economy.pdf
On CNBC, stock-picking pundit Jim Cramer said that seven "cloud kings" have enormous potential and advised investors to buy them on pullbacks. The seven are Adobe, Salesforce, ServiceNow, Red Hat, VMware, Splunk and Workday. Never heard of them, apart from Adobe? These seven companies' products have profoundly changed how their customers operate; their shares have risen 40% to 100% over the past year, far outpacing the Nasdaq Composite, and their market capitalisations have reached tens of billions or even over a hundred billion US dollars.
Adobe is developing rapidly in artificial intelligence (AI) and cloud applications. Salesforce helps companies better understand their customers through marketing and social-media tools. ServiceNow helps companies build software that automates work and lowers labour costs. Workday targets non-technical departments such as human resources.
SaaS has huge growth potential
When people hear "cloud", most assume it just means putting data online to save on server costs, but there is of course far more to it. The cloud divides into three broad categories. The first two, IaaS (Infrastructure as a Service) and PaaS (Platform as a Service), roughly match the familiar idea of "putting data online", but the biggest room for future growth may lie in the less familiar SaaS (Software as a Service).
Research firms estimate that SaaS accounts for more than half of the cloud market; it also started later and remains extremely fragmented, so as the industry consolidates, the leaders have huge room to grow. What does SaaS actually do? Take sales software as an example. Suppose you are meeting a new client today but know nothing beyond the name, company and job title. Now imagine software that, once you enter the client's name, links to the client's social media, shows whether you share mutual friends, what the client has posted recently, and whether the client's corporate website has been updated, so that you walk in already holding the advantage. Would you pay a little for such a service?
Or consider that a modern company may have to manage a website plus four or five social platforms to handle customer enquiries. If one piece of software let you manage several online platforms from a single place, even publishing updates to all of them at once, would you pay a little for that?
Raising corporate communication efficiency
At its root, SaaS shifts people from buying and installing software to one-stop software provision: the same vendor supplies the software, maintains it and upgrades the system. There is no longer any need to buy Word or Excel; everything is rented, and companies can trim IT headcount.
Where the cloud beats traditional servers is synchronisation. Take Dropbox, now listing in the US: previously, when a colleague changed a document, you could only see the modification time and who made the change; unless the colleague told you, you might only discover days later that the numbers had been updated. Now you just download an app, and every time you open it, it tells you which company documents were updated, when, and by whom. Front-line salespeople learn of inventory and quotes immediately, colleagues preparing presentations no longer have to reply-to-all dozens of times after each revision, and corporate communication efficiency climbs sharply.
For now, though, cloud companies are mostly listed outside Hong Kong. Hong Kong is not without them, but they are smaller and the stability of their earnings remains to be seen. Hong Kong stocks may outperform US stocks in future, but the tech-heavy Nasdaq may outperform Hong Kong stocks, and the cloud may outperform the Nasdaq. The cloud is still far from ubiquitous, and SaaS is in an even earlier stage; investors should keep an eye on it.
hcl.hkej@gmail.com
http://adilmoujahid.com/posts/2018/03/intro-blockchain-bitcoin-python/
Blockchain is arguably one of the most significant and disruptive technologies that came into existence since the inception of the Internet. It's the core technology behind Bitcoin and other crypto-currencies that drew a lot of attention in the last few years.
At its core, a blockchain is a distributed database that allows direct transactions between two parties without the need of a central authority. This simple yet powerful concept has great implications for various institutions such as banks, governments and marketplaces, just to name a few. Any business or organization that relies on a centralized database as a core competitive advantage can potentially be disrupted by blockchain technology.
Putting aside all the hype around the price of Bitcoin and other cryptocurrencies, the goal of this blog post is to give you a practical introduction to blockchain technology. Sections 1 and 2 cover some core concepts behind blockchain, while section 3 shows how to implement a blockchain using Python. We will also implement 2 web applications to make it easy for end users to interact with our blockchain.
Please note that I'm using Bitcoin here as a medium for explaining the more general technology of "Blockchain", and most of the concepts described in this post are applicable to other blockchain use cases and crypto-currencies.
Below is an animated gif of the two web apps that we will build in section 3.
It all started with a white paper released in 2008 by an unknown person or entity using the name Satoshi Nakamoto. The white paper was titled “Bitcoin: A Peer-to-Peer Electronic Cash System” and it laid the foundation of what later became known as Blockchain. In the original Bitcoin white paper, Satoshi described how to build a peer-to-peer electronic cash system that allows online payments to be sent directly from one party to another without going through a centralized institution. This system solves an important problem in digital money called double-spending.
1.1. What is Double-Spending?
Suppose that Alice wants to pay Bob $1. If Alice and Bob use physical cash, then Alice will no longer have the $1 after the transaction is executed. If Alice and Bob use digital money, the problem gets more complicated. Digital money is just data and can be easily duplicated. If Alice sends a digital file worth $1 to Bob by email, for example, Bob cannot know for sure whether Alice has deleted her copy of the file. If Alice still has the $1 digital file, she can choose to send the same file to Carol. This problem is called double-spending.
One way of solving the double-spending problem is to have a trusted third party (a bank for example) between Alice, Bob and all other participants in the network. This third party is responsible for managing a centralized ledger that keeps track of and validates all the transactions in the network. The drawback of this solution is that for the system to function, it requires trust in a centralized third party.
1.2. Bitcoin: A Decentralized Solution for the Double-Spending Problem
To solve the double-spending problem, Satoshi proposed a public ledger, i.e., Bitcoin’s blockchain to keep track of all transactions in the network. Bitcoin’s blockchain has the following characteristics:
Distributed: The ledger is replicated across a number of computers, rather than being stored on a central server. Any computer with an internet connection can download a full copy of the blockchain.
Cryptographic: Cryptography is used to make sure that the sender owns the bitcoins that she's trying to send, and to decide how the transactions are added to the blockchain.
Immutable: The blockchain can be changed in an append-only fashion. In other words, transactions can only be added to the blockchain but cannot be deleted or modified.
Uses Proof of Work (PoW): A special group of participants in the network, called miners, compete on searching for the solution to a cryptographic puzzle that will allow them to add a block of transactions to Bitcoin's blockchain. This process is called Proof of Work and it allows the system to be secure (more on this later).
Sending bitcoin money goes as follows:
Step 1 (one-time effort): Create a bitcoin wallet. For a person to send or receive bitcoins, she needs to create a bitcoin wallet. A bitcoin wallet stores 2 pieces of information: a private key and a public key. The private key is a secret number that allows the owner to send bitcoins to another user, or spend bitcoins on services that accept them as a payment method. The public key is a number that is needed to receive bitcoins. The public key is also referred to as the bitcoin address (not entirely true, but for simplicity we will assume that the public key and the bitcoin address are the same). Note that the wallet doesn't store the bitcoins themselves; information about bitcoin balances is stored on Bitcoin's blockchain.
Step 2: Create a bitcoin transaction. If Alice wants to send 1 BTC to Bob, Alice needs to connect to her bitcoin wallet using her private key, and create a transaction that contains the amount of bitcoins she wants to send and the address where she wants to send them (in this case Bob's public address).
Step 3: Broadcast the transaction to Bitcoin's network. Once Alice creates the bitcoin transaction, she needs to broadcast this transaction to the entire Bitcoin network.
Step 4: Confirm the transaction. A miner listening to Bitcoin's network authenticates the transaction using Alice's public key, confirms that Alice has enough bitcoins in her wallet (in this case at least 1 BTC), and adds a new record to Bitcoin's blockchain containing the details of the transaction.
Step 5: Broadcast the blockchain change to all miners. Once the transaction is confirmed, the miner should broadcast the blockchain change to all miners to make sure that their copies of the blockchain are all in sync.

2. A Technical Deep Dive on Blockchain
The goal of this section is to go deeper into the technical building blocks that power the blockchain. We will cover public key cryptography, hashing functions, mining and security of the blockchain.
2.1. Public Key Cryptography
Public-key cryptography, or asymmetrical cryptography, is any cryptographic system that uses pairs of keys: public keys which may be disseminated widely, and private keys which are known only to the owner. This accomplishes two functions: authentication, where the public key verifies a holder of the paired private key sent the message, and encryption, where only the paired private key holder can decrypt the message encrypted with the public key. [1]
RSA and the Elliptic Curve Digital Signature Algorithm (ECDSA) are the most popular public-key cryptography algorithms.
In the case of Bitcoin, the ECDSA algorithm is used to generate Bitcoin wallets. Bitcoin uses a variety of keys and addresses, but for the sake of simplicity, we will assume in this blog post that each Bitcoin wallet has one pair of private/public keys and that a Bitcoin address is the wallet's public key. I recommend this article if you're interested in the complete technical details of Bitcoin wallets.
To send or receive BTCs, a user starts by generating a wallet which contains a pair of private and public keys. If Alice wants to send Bob some BTCs, she creates a transaction in which she enters both her and Bob's public keys, and the amount of BTCs she wants to send. She then signs the transaction using her private key. A computer on the blockchain uses Alice's public key to verify that the transaction is authentic, and adds the transaction to a block that will later be added to the blockchain.
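To make the verification step concrete, here is a sketch of the verifying side, mirroring the pycrypto-style signing code used later in this post; it assumes the transaction is the dictionary of fields (without the private key) and that keys and signatures travel as hex strings:

import binascii

from Crypto.Hash import SHA
from Crypto.PublicKey import RSA
from Crypto.Signature import PKCS1_v1_5

def verify_transaction_signature(sender_address, signature, transaction):
    # sender_address: hex-encoded DER public key; signature: hex string
    public_key = RSA.importKey(binascii.unhexlify(sender_address))
    verifier = PKCS1_v1_5.new(public_key)
    h = SHA.new(str(transaction).encode('utf8'))
    return verifier.verify(h, binascii.unhexlify(signature))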
2.2. Hashing Functions and Mining
All Bitcoin transactions are grouped in files called blocks. Bitcoin adds a new block of transactions every 10 minutes. Once a new block is added to the blockchain, it becomes immutable and can't be deleted or modified. A special group of participants in the network called miners (computers connected to the blockchain) is responsible for creating new blocks of transactions. A miner has to authenticate each transaction using the sender's public key, confirm that the sender has enough balance for the requested transaction, and add the transaction to the block. Miners are completely free to choose which transactions to include in the blocks, so senders need to include a transaction fee to incentivise miners to add their transactions to the blocks.
For a block to be accepted by the blockchain, it needs to be "mined". To mine a block, miners need to find an extremely rare solution to a cryptographic puzzle. If a mined block is accepted by the blockchain, the miner receives a reward in bitcoins, an incentive in addition to the transaction fees. The mining process is also referred to as Proof of Work (PoW), and it's the main mechanism that enables the blockchain to be trustless and secure (more on blockchain security later).
Hashing and Blockchain's Cryptographic Puzzle
To understand the blockchain's cryptographic puzzle, we need to start with hash functions. A hash function is any function that can be used to map data of arbitrary size to data of fixed size. The values returned by a hash function are called hashes. Hash functions are usually used to accelerate database lookup by detecting duplicated records, and they are also widely used in cryptography. A cryptographic hash function allows one to easily verify that some input data maps to a given hash value, but if the input data is unknown, it is deliberately difficult to reconstruct it by knowing the stored hash value. [2]
Bitcoin uses a cryptographic hash function called SHA-256. SHA-256 is applied to a combination of the block's data (bitcoin transactions) and a number called a nonce. By changing the block data or the nonce, we get completely different hashes. For a block to be considered valid or "mined", the hash of the block's data and the nonce needs to meet a certain condition. For example, the four leading digits of the hash need to be equal to "0000". We can increase the mining complexity by making the condition more complex, for example by increasing the number of 0s that the hash value needs to start with.
The cryptographic puzzle that miners need to solve is to find a nonce value that makes the hash value satisfy the mining condition. You can use the app below to simulate block mining. When you type in the "Data" text box or change the nonce value, you can notice the change in the hash value. When you click the "Mine" button, the app starts with a nonce equal to zero, computes the hash value, and checks if the leading four digits of the hash value are equal to "0000". If they are not, it increments the nonce by one and repeats the whole process until it finds a nonce value that satisfies the condition. If the block is considered mined, the background color turns green.
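For reference, the whole search that the "Mine" button performs fits in a few lines of Python; this is a minimal sketch, with the difficulty and block data chosen purely for illustration:

import hashlib

def mine(block_data, difficulty=4):
    # search for a nonce such that SHA-256(data + nonce) starts with "0000"
    prefix = '0' * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256((block_data + str(nonce)).encode()).hexdigest()
        if digest.startswith(prefix):
            return nonce, digest
        nonce += 1

nonce, digest = mine('some block data')
print(nonce, digest)  # the first nonce whose hash starts with "0000"

Each extra required leading zero multiplies the expected work by 16, which is how the difficulty can be tuned.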
2.3. From Blocks to Blockchain
As discussed in the previous section, transactions are grouped in blocks and blocks are appended to the blockchain. In order to create a chain of blocks, each new block uses the previous block’s hash as part of its data. To create a new block, a miner selects a set of transactions, adds the previous block’s hash and mines the block in a similar fashion described above.
Any change to the data in any block will affect the hash values of all the blocks that come after it, making them invalid. This gives the blockchain its immutability characteristic.
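A small sketch shows why: with hashes chained this way, editing one block silently breaks every link after it (field names here are illustrative):

import hashlib, json

def block_hash(block):
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

genesis = {'index': 0, 'data': 'genesis', 'previous_hash': '0' * 64}
block1 = {'index': 1, 'data': 'tx set 1', 'previous_hash': block_hash(genesis)}
block2 = {'index': 2, 'data': 'tx set 2', 'previous_hash': block_hash(block1)}

genesis['data'] = 'tampered'
print(block1['previous_hash'] == block_hash(genesis))  # False: the link is broken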
You can use the app below to simulate a blockchain with 3 blocks. When you type in the "Data" text box or change the nonce value, you can notice the change in the hash value and the "Prev" value (previous hash) of the next block. You can simulate the mining process by clicking on the “Mine” button of each individual block. After mining the 3 blocks, try changing the data in block 1 or 2, and you will notice that all the blocks that come after become invalid.
Both mining simulators above were adapted from Anders Brownworth's excellent Blockchain Demo.
2.4. Adding Blocks to the Blockchain
All the miners in the Bitcoin network compete with each other to find a valid block that will be added to the blockchain, earning the reward from the network. Finding a nonce that validates a block is rare, but because of the number of miners, the probability of some miner in the network validating a block is extremely high. The first miner to submit a valid block gets his block added to the blockchain and receives the reward in bitcoins. But what happens if two or more miners submit their blocks at the same time?
Resolving Conflicts
If 2 miners solve a block at almost the same time, then we will have 2 different blockchains in the network, and we need to wait for the next block to resolve the conflict. Some miners will decide to mine on top of blockchain 1 and others on top of blockchain 2. The first miner to find a new block resolves the conflict. If the new block was mined on top of blockchain 1, then blockchain 2 becomes invalid, the reward of the previous block goes to the miner from blockchain 1, and the transactions that were part of blockchain 2 and weren't added to the blockchain go back to the transactions pool and get added to the next blocks. In short, if there is a conflict on the blockchain, then the longest chain wins.
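The rule itself is short. Here is a sketch of the idea behind the resolve_conflicts() method implemented later in this post (the helper names are hypothetical):

def resolve_conflicts(my_chain, neighbour_chains, is_valid_chain):
    # adopt the longest valid chain seen anywhere on the network
    best = my_chain
    for chain in neighbour_chains:
        if len(chain) > len(best) and is_valid_chain(chain):
            best = chain
    return best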
2.5. Blockchain and Double-Spending
In this section, we will cover the most popular ways for performing double-spending attacks on the blockchain, and the measures that users should take to prevent damages from them.
Race Attack
An attacker sends the same coin in rapid succession to two different addresses. To protect against this attack, it is recommended to wait for at least one block confirmation before accepting the payment. [3]
Finney Attack
An attacker pre-mines a block with a transaction, and spends the same coins in a second transaction before releasing the block. In this scenario, the second transaction will not be validated. To protect against this attack, it is recommended to wait for at least 6 block confirmations before accepting the payment. [3]
Majority Attack (also called 51% attack)
In this attack, the attacker owns 51% of the computing power of the network. The attacker starts by making a transaction that is broadcast to the entire network, and then mines a private blockchain where he double-spends the coins of the previous transaction. Since the attacker owns the majority of the computing power, he is guaranteed that he will have at some point a longer chain than the "honest" network. He can then release his longer blockchain, which will replace the "honest" blockchain and cancel the original transaction. This attack is highly unlikely, as it's very expensive in blockchain networks like Bitcoin. [4]
3. Implementing a Blockchain in Python

In this section, we will implement a basic blockchain and a blockchain client using Python. Our blockchain will have the following features:
Possibility of adding multiple nodes to the blockchain
Proof of Work (PoW)
Simple conflict resolution between nodes
Transactions with RSA encryption

Our blockchain client will have the following features:
Wallet generation using Public/Private key encryption (based on the RSA algorithm)
Generation of transactions with RSA encryption

We will also implement 2 dashboards:
"Blockchain Frontend" for miners "Blockchain Client" for users to generate wallets and send coins The blockchain implementation is mostly based on this github project. I made a few modifications to the original code in order to add RSA encryption to the transactions. Wallet generation and transaction encryption is based on this Jupyter notebook. The 2 dashboard are implemented from scratch using HTML/CSS/JS.
You can download the complete source code from https://github.com/adilmoujahid/blockchain-python-tutorial.
Please note that this implementation is for educational purposes only and shouldn't be used in production, as it doesn't have good security, doesn't scale well, and lacks many important features.
3.1. Blockchain Client Implementation
You can start the blockchain client from the terminal by going to the blockchain_client folder, and typing python blockchain_client.py. In your browser, go to http://localhost:8080 and you'll see the dashboard below.
The dashboard has 3 tabs in the navigation bar:
Wallet Generator: To generate wallets (Public/Private key pairs) using the RSA encryption algorithm
Make Transaction: To generate transactions and send them to a blockchain node
View Transactions: To view the transactions that are on the blockchain

In order to make or view transactions, you will need at least one blockchain node running (to be covered in the next section).
Below is some explanation of the most important parts in the blockchain_client.py code.
We define a Python class named Transaction that has 4 attributes: sender_address, sender_private_key, recipient_address, and value. These are the 4 pieces of information that a sender needs to create a transaction.
The to_dict() method returns the transaction information in a Python dictionary format (without the sender's private key). The sign_transaction() method takes the transaction information (without the sender's private key) and signs it using the sender's private key.
class Transaction:

    def __init__(self, sender_address, sender_private_key, recipient_address, value):
        self.sender_address = sender_address
        self.sender_private_key = sender_private_key
        self.recipient_address = recipient_address
        self.value = value

    def __getattr__(self, attr):
        return self.data[attr]

    def to_dict(self):
        return OrderedDict({'sender_address': self.sender_address,
                            'recipient_address': self.recipient_address,
                            'value': self.value})

    def sign_transaction(self):
        """
        Sign transaction with private key
        """
        private_key = RSA.importKey(binascii.unhexlify(self.sender_private_key))
        signer = PKCS1_v1_5.new(private_key)
        h = SHA.new(str(self.to_dict()).encode('utf8'))
        return binascii.hexlify(signer.sign(h)).decode('ascii')

The line below initiates a Python Flask app that we will use to create different APIs to interact with the blockchain and its client:

app = Flask(__name__)
Below we define the 3 Flask routes that return html pages, one for each tab.
@app.route('/')
def index():
    return render_template('./index.html')

@app.route('/make/transaction')
def make_transaction():
    return render_template('./make_transaction.html')

@app.route('/view/transactions')
def view_transaction():
    return render_template('./view_transactions.html')

Below we define an API that generates wallets (Private/Public key pairs).
@app.route('/wallet/new', methods=['GET'])
def new_wallet():
    random_gen = Crypto.Random.new().read
    private_key = RSA.generate(1024, random_gen)
    public_key = private_key.publickey()
    response = {
        'private_key': binascii.hexlify(private_key.exportKey(format='DER')).decode('ascii'),
        'public_key': binascii.hexlify(public_key.exportKey(format='DER')).decode('ascii')
    }
    return jsonify(response), 200
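As a quick sanity check, here is a hypothetical snippet calling this endpoint, assuming the client is running on localhost:8080 as described above (requires the requests package):

import requests

resp = requests.get('http://localhost:8080/wallet/new')
keys = resp.json()
print(keys['public_key'][:32], '...')   # hex-encoded DER public key
print(keys['private_key'][:32], '...')  # hex-encoded DER private key; keep it secret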
Below we define an API that takes as input sender_address, sender_private_key, recipient_address, value, and returns the transaction (without private key) and the signature.
@app.route('/generate/transaction', methods=['POST'])
def generate_transaction():
    sender_address = request.form['sender_address']
    sender_private_key = request.form['sender_private_key']
    recipient_address = request.form['recipient_address']
    value = request.form['amount']

    transaction = Transaction(sender_address, sender_private_key, recipient_address, value)

    response = {'transaction': transaction.to_dict(),
                'signature': transaction.sign_transaction()}

    return jsonify(response), 200
3.2. Blockchain Implementation
You can start a blockchain node from the terminal by going to the blockchain folder, and typing python blockchain.py or python blockchain.py -p [port number]. If you don't specify a port number, it will default to port 5000. In your browser, go to http://localhost:[port number] to see the blockchain frontend dashboard.
The dashboard has 2 tabs in the navigation bar:
Mine: For viewing transactions and blockchain data, and for mining new blocks of transactions.
Configure: For configuring connections between the different blockchain nodes.

Below is some explanation of the most important parts in the blockchain.py code.
We start by defining a Blockchain class that has the following attributes:
transactions: List of transactions that will be added to the next block.
chain: The actual blockchain, which is an array of blocks.
nodes: A set containing node urls. The blockchain uses these nodes to retrieve blockchain data from other nodes and updates its blockchain if they're not in sync.
node_id: A random string to identify the blockchain node.

The Blockchain class also implements the following methods:
register_node(node_url): Adds a new blockchain node to the list of nodes.
verify_transaction_signature(sender_address, signature, transaction): Checks that the provided signature corresponds to the transaction signed by the public key (sender_address).
submit_transaction(sender_address, recipient_address, value, signature): Adds a transaction to the list of transactions if the signature is verified.
create_block(nonce, previous_hash): Adds a block of transactions to the blockchain.
hash(block): Creates a SHA-256 hash of a block.
proof_of_work(): Proof of work algorithm. Looks for a nonce that satisfies the mining condition.
valid_proof(transactions, last_hash, nonce, difficulty=MINING_DIFFICULTY): Checks if a hash value satisfies the mining conditions. This function is used within the proof_of_work function.
valid_chain(chain): Checks if a blockchain is valid.
resolve_conflicts(): Resolves conflicts between blockchain nodes by replacing a chain with the longest one in the network.

class Blockchain:

    def __init__(self):
        self.transactions = []
        self.chain = []
        self.nodes = set()
        # Generate random number to be used as node_id
        self.node_id = str(uuid4()).replace('-', '')
        # Create genesis block
        self.create_block(0, '00')

    def register_node(self, node_url):
        """
        Add a new node to the list of nodes
        """
        ...

    def verify_transaction_signature(self, sender_address, signature, transaction):
        """
        Check that the provided signature corresponds to transaction
        signed by the public key (sender_address)
        """
        ...

    def submit_transaction(self, sender_address, recipient_address, value, signature):
        """
        Add a transaction to transactions array if the signature verified
        """
        ...

    def create_block(self, nonce, previous_hash):
        """
        Add a block of transactions to the blockchain
        """
        ...

    def hash(self, block):
        """
        Create a SHA-256 hash of a block
        """
        ...

    def proof_of_work(self):
        """
        Proof of work algorithm
        """
        ...

    def valid_proof(self, transactions, last_hash, nonce, difficulty=MINING_DIFFICULTY):
        """
        Check if a hash value satisfies the mining conditions.
        This function is used within the proof_of_work function.
        """
        ...

    def valid_chain(self, chain):
        """
        Check if a blockchain is valid
        """
        ...

    def resolve_conflicts(self):
        """
        Resolve conflicts between blockchain's nodes by replacing
        our chain with the longest one in the network.
        """
        ...

The line below initiates a Python Flask app that we will use to create different APIs to interact with the blockchain.
app = Flask(__name__)
CORS(app)

Next, we initiate a Blockchain instance.
blockchain = Blockchain()

Below we define the 2 Flask routes that return the html pages for our blockchain frontend dashboard.
@app.route('/')
def index():
    return render_template('./index.html')

@app.route('/configure')
def configure():
    return render_template('./configure.html')

Below we define Flask APIs to manage transactions and mining the blockchain.
'/transactions/new': This API takes as input 'sender_address', 'recipient_address', 'amount' and 'signature', and adds the transaction to the list of transactions that will be added to the next block if the signature is valid.
'/transactions/get': This API returns all the transactions that will be added to the next block.
'/chain': This API returns all blockchain data.
'/mine': This API runs the proof of work algorithm, and adds the new block of transactions to the blockchain.

@app.route('/transactions/new', methods=['POST'])
def new_transaction():
    values = request.form
    # Check that the required fields are in the POST'ed data
    required = ['sender_address', 'recipient_address', 'amount', 'signature']
    if not all(k in values for k in required):
        return 'Missing values', 400
    # Create a new Transaction
    transaction_result = blockchain.submit_transaction(values['sender_address'],
                                                       values['recipient_address'],
                                                       values['amount'],
                                                       values['signature'])
    if transaction_result == False:
        response = {'message': 'Invalid Transaction!'}
        return jsonify(response), 406
    else:
        response = {'message': 'Transaction will be added to Block ' + str(transaction_result)}
        return jsonify(response), 201

@app.route('/transactions/get', methods=['GET'])
def get_transactions():
    # Get transactions from transactions pool
    transactions = blockchain.transactions
    response = {'transactions': transactions}
    return jsonify(response), 200

@app.route('/chain', methods=['GET'])
def full_chain():
    response = {
        'chain': blockchain.chain,
        'length': len(blockchain.chain),
    }
    return jsonify(response), 200

@app.route('/mine', methods=['GET'])
def mine():
    # We run the proof of work algorithm to get the next proof...
    last_block = blockchain.chain[-1]
    nonce = blockchain.proof_of_work()

    # We must receive a reward for finding the proof.
    blockchain.submit_transaction(sender_address=MINING_SENDER,
                                  recipient_address=blockchain.node_id,
                                  value=MINING_REWARD,
                                  signature="")

    # Forge the new Block by adding it to the chain
    previous_hash = blockchain.hash(last_block)
    block = blockchain.create_block(nonce, previous_hash)

    response = {
        'message': "New Block Forged",
        'block_number': block['block_number'],
        'transactions': block['transactions'],
        'nonce': block['nonce'],
        'previous_hash': block['previous_hash'],
    }
    return jsonify(response), 200
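To see these endpoints working together, here is a hypothetical smoke test against a node running on the default port 5000 (requires the requests package):

import requests

print(requests.get('http://localhost:5000/mine').json()['message'])
# 'New Block Forged'
print(requests.get('http://localhost:5000/chain').json()['length'])
# number of blocks in the chain so far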
Below we define Flask APIs to manage blockchain nodes.
'/nodes/register': This API takes as input a list of node urls, and adds them to the list of nodes.
'/nodes/resolve': This API resolves conflicts between blockchain nodes by replacing a local chain with the longest one available in the network.
'/nodes/get': This API returns the list of nodes.

@app.route('/nodes/register', methods=['POST'])
def register_nodes():
    values = request.form
    nodes = values.get('nodes').replace(" ", "").split(',')
    if nodes is None:
        return "Error: Please supply a valid list of nodes", 400
    for node in nodes:
        blockchain.register_node(node)
    response = {
        'message': 'New nodes have been added',
        'total_nodes': [node for node in blockchain.nodes],
    }
    return jsonify(response), 201

@app.route('/nodes/resolve', methods=['GET'])
def consensus():
    replaced = blockchain.resolve_conflicts()
    if replaced:
        response = {
            'message': 'Our chain was replaced',
            'new_chain': blockchain.chain
        }
    else:
        response = {
            'message': 'Our chain is authoritative',
            'chain': blockchain.chain
        }
    return jsonify(response), 200

@app.route('/nodes/get', methods=['GET'])
def get_nodes():
    nodes = list(blockchain.nodes)
    response = {'nodes': nodes}
    return jsonify(response), 200
Conclusion
In this blog post, we covered some core concepts behind blockchain and learned how to implement one using Python. For the sake of simplicity, I didn't cover some technical details, for example wallet addresses and Merkle trees. If you want to learn more about the subject, I recommend reading the original Bitcoin white paper, following up with the Bitcoin wiki, and reading Andreas Antonopoulos's excellent book, Mastering Bitcoin: Programming the Open Blockchain.
201 jcbrand 1 hr 49
Slack has finally decided to close down their IRC and XMPP gateways.
True to form, you can only read their announcement if you already have a Slack account and are logged in to a workspace.
Here's the gist of their announcement:
As Slack has evolved over the years, we've built features and capabilities — like Shared Channels, Threads, and emoji reactions (to name a few) — that the IRC and XMPP gateways aren't able to handle. Our priority is to provide a secure and high-quality experience across all platforms, and so the time has come to close the gateways.

They're of course being economical with the truth here.
Firstly, I've used their XMPP gateway, and emojis work fine.
Perhaps their XMPP gateway can't handle "Shared Channels" and "Threads", but that's because they purposefully stopped working on it.
Of those 3 features which they mention specifically, all of them could be handled via XMPP.
A "Shared Channel" simply means a chatroom which people from outside your workspace can participate in. If a workspace is mapped to a members-only chatroom, then making something a shared channel simply means updating the members list or making the chatroom open (so anybody can join it).
Threads can be implemented by adding a <thread/> element to the message stanza, as documented in XEP-0201.
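For illustration, here is a minimal sketch of what such a stanza looks like, built with Python's standard library; the JIDs and thread id are hypothetical placeholders (XEP-0201 treats the thread id as an opaque, unique string):

import xml.etree.ElementTree as ET

# A chat message carrying a XEP-0201 thread identifier
msg = ET.Element('message', {'from': 'alice@example.org',
                             'to': 'bob@example.org',
                             'type': 'chat'})
ET.SubElement(msg, 'body').text = 'Replying within an existing thread'
ET.SubElement(msg, 'thread').text = 'e0ffe42b28561960c6b12b944a092794b9683a38'

print(ET.tostring(msg, encoding='unicode'))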
And emoji... there's nothing in XMPP that prevents people from sending emoji.
We all know the real reason Slack has closed off their gateways. Their business model dictates that they should.
Slack's business model is to record everything said in a workspace and then to sell you access to their record of your conversations.
They're a typical walled garden, information silo, or Siren Server.
So they have to close everything off, to make sure that people can't extract their conversations out of the silo.
We saw it with Google, who built Gtalk on XMPP and even federated with other XMPP servers, only to later stop federation and XMPP support in favour of trying to herd the digital cattle into the Google+ enclosure.
Facebook, who also built their chat app on XMPP, at first allowed 3rd-party XMPP clients to connect, but later dropped interoperability.
Twitter, although not using or supporting XMPP, had a vibrant 3rd party client ecosystem which they killed off once they felt big enough.
Slack, like so many others before them, pretends to care about interoperability, opening up ever so slightly so that they can lure people in with the promise of "openness", before eventually closing the gate once they've achieved sufficient size and lock-in.
When we talk about "federation" in networks, we mean the ability to communicate between different service providers.
For example, email is federated. You can set up your own email server, and then send emails to people with their own email servers, or to people with Gmail or Yahoo! accounts.
You can email any other email address in the world, regardless of where that email address is hosted.
If email had never existed, and a company like Slack came out today with this brand new concept of "Electronic Mail", let's call it digimail, do you think they would standardise the digimail protocol and allow you to send messages to other digimail purveyors?
We all know the answer to that. They won't, and neither would Google, Microsoft or Facebook.
Heck, Facebook has been actively trying to replace email for years.
The reason email is federated, is because it was developed before surveillance capitalism was a thing and because it was established and entrenched long before these companies came around.
There's a reason why your email address is still the de facto way to sign up for any service on the web (sometimes with one or two degrees of separation), and it's because of federation.
XMPP is designed to allow federation. Think about that. Instead of having to sign up to various different chat providers, all of which try to lock you in and monetize your conversations, you could instead have one chat account and use it to chat with anybody else, regardless of which chat provider they're using.
Alas, that's the dream, but because XMPP came much later to the scene, it never developed the critical mass that email has, and here we are: with dozens of chat apps, all non-interoperable and closed off.
One of the sad things that has come out of Slack's meteoric rise to success, has been how many free and open source projects have jumped over to using it (after previously using IRC or XMPP).
In so doing, they have closed off their discussions from search engines and they prevent people from accessing their past archives.
Slack has many cool features, and they work very well, I'm not going to deny it.
However, the XMPP Standards Foundation has done a lot of work in recent years on protocol extensions that provide the features people have come to expect from modern chat applications.
Unfortunately XMPP clients have been lagging far behind in various respects.
One of the main problems is funding. The modern digital economy is largely set up around surveillance capitalism and user lock-in.
So attempts to create software that doesn't follow these precepts, often end up unfunded or underfunded.
However, our "weakness", is also our strength.
XMPP clients, and the XMPP network, can provide something that Slack never can: federation, free and open software, interoperability, extensibility, and user choice.
For the last few years I've been working in my spare time on making a JavaScript XMPP chat client, called converse.js.
Originally the idea was to make a Gtalk-like chat client that you integrate into your website, and it can still be used like that.
[Screenshot: converse.js as overlayed chatboxes]
However, in the last year I've updated it further so that it can also be used as a fullscreen application, like Slack is used.
You can try the fullscreen version at https://inverse.chat/
[Screenshot: converse.js as a fullscreen application]
If you have no-one to chat to, then come join the discuss@conference.conversejs.org chat room.
This link will take you directly there: https://inverse.chat/#converse/room?jid=discuss@conference.conversejs.org
Converse.js still lacks lots of features that Slack has, but that's not because XMPP itself can't support those features.
What converse.js does have, however, is that it's free and open source software, based on a standard protocol, and it can be extended, updated, and improved upon by anyone.
We're actively working on adding new features and more and more people are joining in.
Moreover, anybody can host it and you can integrate it into any website.
Ultimately, I believe in the power and utility of interoperability and software freedom, even though the current trend is to close off and lock down.
These information silos are as powerful as we make them. If enough projects choose standardised protocols and FOSS software, we will be able to create viable alternatives that foster freedom instead of lock-in.
148 danielam 15 hrs 86
Picture a day like this: You wake up and head to your job at a small company you own and manage together with your fellow workers, doing high-tech, advanced manufacturing that’s too specialized for bigger factories. For lunch, you swing by a restaurant owned by another worker cooperative, this one a national-scale firm that serves millions of customers each year. Back at work, you’ve got a meeting with a local agricultural co-op that’s contracted your company to help design some more efficient processing material for the food they produce and export across the world. Afterward, you meet up with your partner, who works in a social cooperative jointly owned by caregivers and the elders who live and receive care there. The two of you swing by the local grocery store—part of a national chain owned by its millions of customers—and pick up a bottle of co-op-produced wine. This is a day in the life of the cooperative economy in Northern Italy’s Emilia Romagna region.
Emilia Romagna, a region with nearly 4.5 million people whose capital is the medieval university city of Bologna, has one of the densest cooperative economies in the world. About two out of every three inhabitants are co-op members, together producing around 30 percent of the region’s GDP.
Emilia Romagna’s co-op economy is a product of organizing going back to at least the 1850s ... Doing business through co-ops is one of the clearest ways to democratize our economic institutions. But as anyone who has developed or worked in a cooperative will tell you, co-ops aren’t magic. Building institutions that go against the grain of corporate capitalism while managing to survive in the markets it creates is not easy to pull off. There’s plenty of room to fail, and even more room to do better. While cooperatives in the United States claim about 130 million memberships, these are by and large within consumer- and producer-owned co-ops, not cooperative workplaces. Only around 7,000 people nationwide are part of worker co-ops.
That’s why it’s helpful to learn from countries where the cooperative economy is more developed and more densely integrated than in the United States—not because they’re utopian, post-capitalist wonderlands, but because they’ve got the hard-won experience that can teach other co-op creators how to scale up the community-owned economy effectively and creatively.
How did Northern Italy’s complex, intertwined, and resilient cooperative network develop and grow? That’s the question Vera Zamagni, professor of economic history at the University of Bologna, has been trying to answer throughout her career as one of Emilia Romagna’s foremost cooperative scholars.
In her work, Zamagni shows that Emilia Romagna’s co-op economy is a product of organizing going back to at least the 1850s, developing in conjunction with a rich, high-value-added agricultural tradition and surviving despite a brutal historical encounter with fascism.
When an Italian dairy cooperative can raise more than $6 million in financing by issuing bonds backed by aging wheels of Parmesan cheese—as one did earlier this year when the Parmesan market proved too uncertain for banks—it’s easy to feel like we’ve fallen through the looking glass. We can’t exactly replicate what the people of Emilia Romagna have created, but there’s plenty we can learn. Here are six key lessons for building a cooperative-rich economy.
For many U.S. co-op advocates, the Basque Country’s Mondragon—which has tens of thousands of worker-owners and cooperative businesses linked into a single, giant cooperative corporation—is the go-to reference for convincing people that co-ops can scale. Not for Zamagni. “North America is fond of the Mondragon corporation because it resembles more closely the typical American corporation in size, but with different management principles,” she says. But it’s a unique case that has never been replicated elsewhere.
In Emilia Romagna, the cooperative movement is more a networked ecosystem than a single, overarching corporation. This has key advantages. If you can’t build a giant firm because the sector you are working in requires flexibility and specialization, or if the people involved are simply uninterested in being part of a giant corporation, then, Zamagni says, the network form can give you all the advantages of scale without overcentralization. In Italy, the cooperative movement is not a single company, but a whole interwoven fabric of “horizontal, vertical, [and] complementary networks” that support each other financially.
For those aspiring to build co-ops in the United States, the ecosystem pattern—in which different cooperative businesses and development efforts interact in a loose web of mutual support—is likely a much better place to start than trying to replicate the more monolithic approach of an initiative like Mondragon. A networked ecosystem—decentralized and resilient—can harness energy and interest at different levels and in different sectors to develop, grow, and thrive.
If the number of worker cooperatives in Emilia Romagna is impressive, the scale of consumer cooperatives in Italy's retail sector is awe-inspiring. Coop is the largest retail chain in Italy, with its supermarkets and "hypermarkets" claiming close to 20 percent of market share, and the whole enterprise is owned by its 7.4 million consumer members across the country. How did it get so big?
The answer is, as it turns out, crowdfunding. According to Zamagni, in the wake of a 1971 law that exempted co-ops from certain kinds of banking limitations, Coop was able to raise a lot of money in small amounts from many, many members. Coop became the Italian retail leader in part because it could tap its already sizable membership base for the loans it needed to expand. This kind of bottom-up lending covered more than half the funds Coop needed for a critical two-decade-long expansion effort in the ’80s and ’90s.
Many people worry—with good reason—that cooperatives won’t be able to compete with traditional corporations without abandoning their social mission. But focusing on cooperatives as market-driven enterprises might be a mistake. In Italy, social co-ops are on the rise, not as a way to produce goods and services for sale, but as a way to more effectively deliver social services.
Zamagni explains that bureaucratic welfare services were high-cost and low-quality, so citizens started self-organizing to deliver key care-related services themselves, which the government then helped formalize with new laws for multi-stakeholder cooperatives. This allowed caregivers and those receiving care to work together to govern the delivery of services.
The results have been impressive. In Bologna, for instance, as much as 85 percent of the city’s social services are provided through social co-ops. Some of the most interesting segments in the film WEconomics: Italy, which profiles Bologna’s co-op economy, revolve around social co-ops. The filmmakers take us inside a child care cooperative and an equally vibrant elder care cooperative. Both are workplaces built around compassion—not profit—and are designed with the interests of workers and those receiving care in mind. Here, cooperatives are community institutions that humanize social services in a way that neither state nor market mechanisms alone could.
Care work is already a key sector for the much smaller U.S. worker-co-op movement, accounting for somewhere close to a third of the 7,000 or so worker-owners in the United States. But the Italian example shows we can think a lot bigger.
The growth of Italian co-ops has been fueled by deep connections to broader sets of political commitments and values. The largest two federations, Legacoop and Confcooperative, are organized with strong historic ties to the Italian Communist Party and the Catholic Church. For Zamagni, these “strong communitarian ideologies” helped people set up businesses grounded in solidarity rather than pure profit.
Interestingly, because Italy has had two or three competing cooperative foundations with different sets of political values since the period after World War II, funding cooperatives has not been identified with one particular political camp. By law, cooperatives in Italy have to contribute a share of their profits to a cooperative federation to fund the further development of more cooperatives, but they get to pick which one. The pluralism here is worth noting—people have different reasons for wanting to democratize the economy, and it might be OK if they build parallel organizations to do so.
In a video released this February, Carrier workers in Indianapolis confront a seemingly heartless corporate functionary who explains that their jobs are being eliminated for the good of the company’s bottom line. In the United States, that’s usually the end of the story, but Italian cooperative law opens up more possibilities.
Under the Marcora Law, the money due to workers as unemployment insurance can be used as capital to cooperatize their workplace instead. With the help of the law, more than 9,000 workers who would have otherwise been out of a job have instead created 257 new worker-owned businesses in the past 30 years, like WBO Italcables in Naples, a steel factory cooperatized in December 2015 after its multinational owners shuttered the plant.
With a suite of complementary policies facilitating access to capital, cooperatives in Italy have been able to expand far more than they would have if they were playing by the same rules as non-cooperative enterprises. Zamagni highlights in particular the 1977 bill that exempted profits saved by cooperatives from corporate taxation and the law that obligated cooperatives to shift 3 percent of profits to one of the cooperative development funds managed by major co-op umbrella federations, greatly accelerating both the amount of money co-ops could reinvest in themselves and the larger movement. The Marcora Law to save jobs by turning companies into co-ops doesn’t make sense in isolation. It only works in conjunction with a robust, well-funded cooperative development ecosystem and with the policies that make cooperatives recognizable under the laws that support them with public subsidies.
With all of this cooperative energy, you might make the mistake of thinking that the Italian economy is doing amazingly well. It’s not. The Euro debt crisis is still far from over, and youth unemployment in Italy has been staggeringly high at more than 40 percent.
While the youth unemployment rate in Emilia Romagna is still high, it is nowhere near the catastrophic levels in Southern Italy, where in 2014 some regions saw nearly 60 percent of people aged 15-24 in the labor force unable to find work.
A 2013 report from the European Research Institute on Cooperative and Social Enterprises showed that “during the course of the crisis … the growth patterns of the various cooperative forms differed greatly from that of other forms of enterprise.” The analysis demonstrated an “anti-cyclical function” of cooperatives.
During the first four years of the ongoing European crisis, the report shows that cooperatives actually created a net increase in jobs. Employment in Italian cooperatives increased by 8 percent between 2007 and 2011. Furthermore, this “anti-cyclical” performance appears to be “caused primarily by the creation of new cooperatives.” In other words, as the global economy crumbled, people in Italy turned to cooperatives for a way forward.
As confidence in the current economic system continues to erode—with 70 percent of Americans believing the economy is rigged against them—we should pay close attention to the lessons Italy can teach us about how cooperatives can be a part of an alternative.
27 craigkerstiens 2 hrs 0
https://www.citusdata.com/blog/2018/02/21/three-approaches-to-postgresql-replication/
The Citus distributed database scales out PostgreSQL through sharding, replication, and query parallelization. For replication, our database as a service (by default) leverages the streaming replication logic built into Postgres.
When we talk to Citus users, we often hear questions about setting up Postgres high availability (HA) clusters and managing backups. How do you handle replication and machine failures? What challenges do you run into when setting up Postgres HA?
The PostgreSQL database follows a straightforward replication model. In this model, all writes go to a primary node. The primary node then locally applies those changes and propagates them to secondary nodes.
In the context of Postgres, the built-in replication (known as “streaming replication”) comes with several challenges:
1. Postgres replication doesn't come with built-in monitoring and failover. When the primary node fails, you need to promote a secondary to be the new primary. This promotion needs to happen in a way where clients write to only one primary node, and they don't observe data inconsistencies.
2. Many Postgres clients (written in different programming languages) talk to a single endpoint. When the primary node fails, these clients will keep retrying the same IP or DNS name. This makes failover visible to the application.
3. Postgres replicates its entire state. When you need to construct a new secondary node, the secondary needs to replay the entire history of state change from the primary node. This process is resource intensive—and makes it expensive to kill nodes in the head and bring up new ones.

The first two challenges are well understood. Since the last challenge isn't as widely recognized, we'll examine it in this blog post.
Three approaches to replication in PostgreSQL
Most people think that when you have a primary and secondary architecture, there’s only one way to set up replication and backups. In practice, Postgres deployments follow one of three approaches.
1. PostgreSQL streaming replication to replicate data from primary to secondary node. Back up to S3 / Blob storage.
2. Volume level replication to replicate at the storage layer from primary to secondary node. Back up to S3 / Blob storage.
3. Take incremental backups from the primary node to S3. Reconstruct a new secondary node from S3. When the secondary is close enough to the primary, start streaming from the primary.

There's also an easy way to identify which approach you're using. Let's say you added a new secondary node. How do you reconstruct the new secondary node's state?
Approach 1: Streaming replication in PostgreSQL (with local storage)
Figure 1 - Streaming replication in Postgres, with local storage
This first approach is the most common one. You have a primary node. The primary node has the tables’ data and write-ahead logs (WAL). (When you modify a row in Postgres, the change first gets committed to an append-only redo log. This redo log is known as a write-ahead log, or WAL.) This Postgres WAL log then gets streamed over to a secondary node.
In this first approach, when you build a new secondary node, the new secondary needs to replay the entire state from the primary node—from the beginning of time. The replay operation may then introduce a significant load on the primary node. This load becomes more important if your database’s primary node serves live traffic.
In this approach, you can use local disks or attach persistent volumes to your instances. In the diagram above, we’re using local disks because that’s the more typical setup.
Approach 2: Replicated Block Device
Figure 2 - Replicated Block Device
The second approach relies on disk mirroring (sometimes called volume replication). In this approach, changes get written to a persistent volume. This volume then gets synchronously mirrored to another volume. The nice thing about this approach is that it works for all relational databases. You can use it for MySQL, PostgreSQL, or SQL Server.
However, the disk mirroring approach to replication in Postgres also requires that you replicate both table and WAL log data. Further, each write to the database now needs to synchronously go over the network. You can’t miss a single byte because that could leave your database in a corrupt state.
Approach 3: Reconstruct from WAL (and switch to streaming replication)
Figure 3 - Reconstruct from WAL
The third approach turns the replication and disaster recovery process inside out. You write to the primary node. The primary node does a full database backup every day, and incremental backups every 60 seconds.
When you need to construct a new secondary node, the secondary reconstructs its entire state from backups. This way, you don’t introduce any load on the primary database. You can bring up new secondary nodes and reconstruct them from S3 / Blob storage. When the secondary node is close enough to the primary, you can start streaming WAL logs from the primary and catch up with it. In normal state, the secondary node follows the primary node.
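In Postgres 9.x/10-era terms, this behavior is driven by the secondary's recovery configuration: a restore_command that fetches archived WAL from blob storage, plus a primary_conninfo for streaming once caught up. A minimal sketch (the hostname, user, and wal-e tooling are assumptions, and wal-e expects its storage credentials to be available in the environment):

# recovery.conf on the new secondary node
standby_mode = 'on'

# First, replay WAL segments fetched from the archive in S3 / Blob storage
restore_command = 'wal-e wal-fetch "%f" "%p"'

# Once caught up, switch to streaming WAL directly from the primary
primary_conninfo = 'host=primary.example.com port=5432 user=replicator'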
In this approach, write-ahead logs are first class citizens. This design lends itself to a more cloud-native architecture. You can bring up or shoot down replicas at will without impacting your relational database’s performance. You can also use synchronous or asynchronous replication depending on your requirements.
How do these different approaches to Postgres replication compare?
Here’s a simple table that compares these approaches to each other. For each approach, you can think of its benefits as drawbacks for the other approaches.
Type of Postgres replication | Who does this? | Primary benefits
Simple streaming replication (local disk) | On-prem; manual EC2 | Simpler to set up; high I/O performance and large storage
Replicated block device | RDS; Azure Postgres | Works for MySQL, PostgreSQL; data durability in cloud environments
Reconstruct from WAL (and switch to streaming replication) | Heroku; Citus Cloud | Node reconstruction in background; enables fork and PITR

Simple streaming replication is the most common approach. Most on-prem deployments follow this approach. It's easy to set up. Further, when you set it up using local disks, you can store 10s of TBs of data.
Comparatively, the disk mirroring approach abstracts away the storage layer from the database. In this approach, when you lose an instance, you don’t lose your ephemeral disk with it. This approach also works across database technologies, for example with MySQL and Postgres.
In the third method, when you have a new machine, you reconstruct that machine’s state from WAL logs. Since you’re treating your WAL logs as a first class citizen, certain features become trivial. For example, let’s say you wanted to performance test your application against production data, but not against the production database. In the third method, you can “fork” your database from a particular point in time in WAL logs without impact to production, and test your app against the forked database.
Which PostgreSQL replication method is more “cloud-native”?
PostgreSQL comes with three different replication methods. As with so many things, each replication method has its pros and cons.
The third approach reconstructs a new secondary node by replaying write-ahead logs (WAL) from blob storage such as S3. As a result, reconstructing a new replica doesn’t introduce any additional load on the primary node. This enables a high-availability (HA) solution where you can easily bring up or shoot down database nodes—a property that’s beneficial in cloud-native environments.
If you’re interested in reading more posts from our team, sign up for our monthly newsletter and get the latest content delivered straight to your inbox.
589 geordilaforge 9 hrs 175
Software-as-a-service (SaaS) is a billing and delivery model for software which is so superior to the traditional method for selling software licenses that it restructures businesses around itself. This has led SaaS businesses to have a distinct body of practice. Unfortunately, many entrepreneurs discover this body of practice the hard way, by making mistakes that have been made before, rather than by spending their mistake budget on newer, better mistakes.
This shouldn’t include you, so we’ll take you through a whirlwind tour of the state of play of SaaS businesses. You should gain a better understanding of the SaaS business model, be able to anticipate whether to sell your product on a low-touch or high-touch model, and (if you’re already operating a SaaS business) be able to evaluate its health and start improving it.
If you are a software entrepreneur, and you do not sell mobile applications (which have a separate billing model, imposed by the platforms’ app stores), you should thoroughly understand the business of SaaS. This will let you make better decisions for your product (and company), allow you to see business-threatening problems months or years in advance of them being obvious, and help you in communicating with investors.
Why is SaaS taking over the world?
Customers love SaaS because it “just works.” There is typically nothing to install to access it. Hardware failures and operational errors, which are extraordinarily common among machines which are not maintained by professionals, do not result in meaningful data loss. SaaS companies achieve availability numbers (for example, percent of time where the software is accessible and operating correctly) which materially improve upon the numbers achievable by almost every IT department (and every individual, full-stop).
SaaS also generally appears less expensive than software sold on other billing models, which matters for e.g. users who are not sure which software they should adopt over long terms, or who have only a short-term need for the software.
Developers love SaaS principally because of the delivery model, not the billing model.
Most SaaS is developed continuously and run on the company’s infrastructure. (There are significant exceptions in SaaS in the enterprise, but the overwhelming majority of B2C and B2B SaaS sold outside the enterprise is accessed over the internet from servers maintained by the software company.)
Software companies historically have not controlled the environments their code executes in. This is historically a major source of both development friction and customer support cases. All software deployed on customers’ hardware suffers from differences in configurations of systems, interactions with other installed software, and operator error. This has to be both accounted for in development and dealt with as a customer services issue. Companies which sell their software on both SaaS and installable models frequently see 10+ times more support requests per customer from customers who install the software locally.
Businesses and investors love SaaS because the economics of SaaS are impossibly attractive relative to selling software licenses. Revenue from SaaS is generally recurring and predictable; this makes cash flows in SaaS businesses impressively predictable, which allows businesses to plan against them and (via investors) trade future cash flows for money in the status quo, which allows them to (generously) fund present growth. This has made SaaS companies into some of the fastest growing software companies in history.
SaaS sales models
There are, broadly speaking, two ways to sell SaaS. The selling model dictates almost everything else about the SaaS company and the product, to a degree which is shocking to first-time entrepreneurs. One of the classic mistakes in SaaS, which can take years to correct, is a mismatch between a product or market and the selected model to sell it on.
You will find that the sales model for SaaS defines much more about a product (and company) than other distinctions, like whether a company sells to customers (B2C) or businesses (B2B), whether it is bootstrapped or riding the VC rocket ship trajectory, or what technology stack it is built on.
Low-touch SaaS sales
Some products sell themselves.
Low-touch SaaS is designed for the majority of customers to purchase it without sustained one-on-one interaction with a human being. The primary sales channels are the software’s website, email marketing, and (very frequently) a free trial for the software, with the trial being aggressively optimized to be very, very low-friction to start, onboard, and successfully make sustained use of the SaaS.
Low-touch products sometimes involve sales teams, but they’re frequently structured as so-called “Customer Success” teams, which are less focused on convincing people to buy the software and more on ensuring that users of the free trial successfully onboard and convert to paying users by the end of their trials.
Customer support in low-touch products is generally handled primarily in scalable fashions, by optimizing the product to avoid incidents which would require human intervention, by creating educational resources which scale across the customer base, and by using humans as a last-resort. That said, many low-touch companies have excellent customer support teams. The economics of SaaS depend on the long-term satisfaction of customers, so even a product which expects only one ticket (a countable discrete interaction with a customer) every 20 customer-months might invest comparatively heavily in their CS team.
Low-touch SaaS is generally sold on a month-to-month subscription with price points clustering around $10 for B2C applications and in the $20 to $500 range for B2B. This corresponds to an average contract value (ACV) of approximately $100 to $5k. The term ACV isn't commonly even used by low-touch SaaS businesses, which typically describe themselves by their monthly price points, but it is important for comparisons with high-touch SaaS businesses.
If you asked a low-touch SaaS entrepreneur for their most important metric, they would say MRR—monthly recurring revenue.
Basecamp is the paradigmatic example of a low-touch SaaS business. Atlassian (which makes JIRA, Trello, Confluence, and several other products) is possibly the publicly-traded company with the most success with the model.
High-touch SaaS sales
Some customers need some help in deciding whether or how to adopt certain products.
High-touch SaaS is designed around there being a human-intensive process to convince businesses to adopt the software, successfully operationalize it, and continue using it.
The beating heart of the organization is almost always the sales teams, which are often broken down into specialized roles: sales development representatives (SDRs) who find prospects for the software, account executives (AEs) who own the sales process against particular customers, and account managers (AMs) who are responsible for the happiness and continued performance of an individualized portfolio of accounts.
The sales team is typically supported by marketing, whose primary job is generating a sufficient pipeline of qualified leads for the sales team to evaluate and close.
There are many truly excellent products sold on the high-touch model, but to a first approximation, engineering and product are generally considered less important in high-touch SaaS businesses than the sales engine is.
The organization of customer support is highly variable across high-touch SaaS companies; a commonality is that it is generally expected to be heavily utilized. The number of tickets per account per period is expected to be orders of magnitude higher than it is in low-touch SaaS.
Note that while, in principle, one can make high-touch sales to consumers (for example, insurance has historically been sold primarily through commissioned agents), in SaaS, the overwhelming majority of high-touch businesses sell to businesses (B2B). Within B2B, there is a wide range of expected customer profiles, ACVs (defined variously as average contract value or annual contract value), and deal complexity.
On the low-end, SaaS sold to “small and medium sized businesses” (SMBs) on a high-touch model generally has an ACV of $6k to $15k, though this can range higher. The exact definition of an SMB varies heavily depending on who you ask; operationally, it is “any business with sufficient sophistication to successfully adopt software which costs $10,000”, which probably excludes your local flower shop but includes a dental practice with 2 partners and 4 employees.
The high end is usually called “the enterprise” and targets extremely large businesses or governments. True enterprise deals start in the six figures; there is no ceiling. (There is a $70 million ACV customer in Inovalon’s annual report, for example.)
If you asked a high-touch SaaS entrepreneur for their most important metric, they would say ARR—annual recurring revenue. (This is essentially all of the non-churned revenue of the company, minus certain non-recurring items such as one-time setup fees, consulting services, and similar. Since the economics of SaaS are attractive because of growth over time, one-off revenue, particularly comparatively low-margin one-off revenue, is not maximally interesting to entrepreneurs or investors.)
Salesforce is the paradigmatic example of a high-touch SaaS business, and they literally wrote the book on the model. Small high-touch SaaS businesses exist in multitudes, though they're less visible than low-touch SaaS businesses, principally because visibility is a customer acquisition strategy in low-touch SaaS and not always optimal in high-touch SaaS. For example, there are many small SaaS businesses which quietly make six or seven figures a year selling services to a tightly defined vertical.
Hybrid sales approaches
There exist companies which successfully run a low-touch and high-touch business with functionally the same product. They are exceedingly rare relative to SaaS businesses. The most common result of attempting both models simultaneously is that only one of the models receives any traction, and (because these models weave themselves into all operations of the company) it typically strangles the other.
A more common form of hybridization is adopting certain elements of the other sales model. For example, many low-touch SaaS businesses have customer success teams which, if you squint at them, look almost like inside sales. High-touch companies typically borrow fewer tactics than low-touch companies; the most common one is having a product that the company does not (materially) sell which they distribute in a low-touch fashion for the purpose of lead generation for the product the company actually sells.
The fundamental equation of SaaS
The SaaS model, fundamentally, works by financializing software: instead of selling software as a product with a sticker price, it sells the software as if it were a financial instrument, with a probabilistically forecastable cash flow.
There are more sophisticated ways to model a SaaS business, but the no-MBA-required version just makes a few simplifying assumptions (like ignoring the time-value of money) and uses high-school math. If you only learn one thing about SaaS, learn this equation; it is the Rosetta Stone to understanding all material facts about a SaaS business.
The core insight is really simple: one’s revenue, over the long term, is the number of customers times the average lifetime revenue per customer.
The number of customers you get is a product of two factors: acquisition (how effective you are at attracting the attention of prospects in low-touch SaaS or identifying and getting in front of them in high-touch SaaS) times your conversion rate (the percent of prospects you convert into paying customers.)
The average lifetime revenue per customer (often called lifetime value (LTV)) is the product of how much they pay you for a particular period (such as one month) and how many periods they persist using your service.
The average revenue per user (ARPU) is simply the average revenue for an account over any particular period.
The churn is the percent of customers over a given period who do not continue paying for services. For example, if you have 200 customers pay you in January and only 190 of those pay you in February, the churn would be 5%.
The lifetime of a customer can, with a few simplifying assumptions, be calculated as the sum of an infinite geometric series; this works out to simply taking the inverse of churn. A product which loses 5% of its customers per month has an expected customer lifetime of 20 months; if it charges each customer $30 a month, it has an expected lifetime revenue of $600 per new customer signed up.
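That arithmetic is compact enough to sanity-check directly. A quick sketch using the numbers from the paragraph above:

# 5% monthly churn, $30 per customer per month
churn = 0.05
arpu = 30.0

# The expected lifetime is the sum of the geometric series
# 1 + (1 - churn) + (1 - churn)**2 + ... = 1 / churn
expected_lifetime_months = 1 / churn   # 20 months
lifetime_revenue = arpu / churn        # $600 per new customer
print(expected_lifetime_months, lifetime_revenue)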
Implications of the SaaS business model
Improvements to a SaaS business are multiplicatively effective.
A 10% improvement to acquisition (via e.g. better marketing) and a 10% improvement to conversion rate (via e.g. product improvements or more effective sales techniques) compound to a 21% improvement (1.1 × 1.1 = 1.21), not a 20% improvement.
Improvements to a SaaS business are incredibly leveraged.
Because the margins in SaaS are so high, the long-term valuation of a SaaS business is effectively tied to some multiple of its long-term revenues. Thus, a 1% improvement in conversion rates doesn't simply mean a 1% increase in revenue next month or even over the long term… it implies a 1% increase in the enterprise value of the company.
Price is the easiest lever to improve a SaaS business.
Acquisition, conversion, and churn often require major cross-functional efforts to improve. Pricing typically requires replacing a small number with a bigger one.
SaaS businesses eventually asymptote.
Given fixed acquisition, conversion, and churn, there will be a point at which one's business hits a revenue plateau. This is predictable in advance: the number of customers at the plateau is equal to acquisition times conversion divided by churn rate.
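To make that formula concrete, a small sketch with hypothetical numbers:

# Hypothetical inputs: 1,000 prospects acquired per month, 2% of them
# converting to paying customers, and 5% monthly churn
acquisition = 1000
conversion = 0.02
churn = 0.05

# At the plateau, new customers per month (1000 * 0.02 = 20) exactly
# offsets customers lost per month (400 * 0.05 = 20)
plateau_customers = acquisition * conversion / churn   # 400 customers
print(plateau_customers)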
A SaaS business which loses ability to improve acquisition, conversion, or churn will, with almost mathematical certainty, stop growing. A SaaS business which stops growing before it can cover fixed costs (like e.g. salaries for the engineering team) dies ignominiously, even if they did everything right.
SaaS businesses can be capital-intensive to grow.
SaaS businesses have large front-loaded costs to grow, particularly when growing aggressively; marketing and sales dominates the marginal cost per customer and, often, the total expenditures of the company. The marketing and sales costs attributable to a particular customer occur very early in that customer’s lifecycle; the revenue to eventually pay for those costs comes later.
This means that a SaaS company optimizing for growth will almost always spend more money in a given period than they collect from customers. The money spent has to come from somewhere. Many SaaS companies choose to fund the growth via selling equity in the companies to investors. SaaS companies are particularly attractive to investors because the model is very well-understood: create a product, achieve some measure of product-market fit, spend a lot of money on marketing and sales according to a relatively repeatable playbook, and eventually sell one’s stake in the business to someone else (the public markets, an acquirer, or another investor looking for a derisked business with good growth potential).
Margins, to a first approximation, don’t matter.
Most businesses care quite a bit about their cost-of-goods-sold (COGS), the cost to satisfy a marginal customer.
While some platform businesses (like AWS) have material COGS, at the typical SaaS company, the primary source of value is the software and it can be replicated at an extremely low COGS. SaaS companies frequently spend less than 5~10% of their marginal revenue per customer on delivering the underlying service.
This allows SaaS entrepreneurs to almost ignore every factor of their unit economics except customer acquisition cost (CAC; the marginal spending on marketing and sales per customer added). If they’re quickly growing, the company can ignore every expense that doesn’t scale directly with the number of customers (i.e. engineering costs, general and administrative expenses, etc), on the assumption that growth at a sensible CAC will outrun anything on the expenses side of the ledger.
SaaS businesses take a while to grow.
While tales of so-called “hockeystick” growth curves are common in the press, the representative experience of SaaS companies is that they take a very long time dialing in the product, marketing approaches, and sales approaches before things start to work very well. This has been referred to, memorably, as the Long Slow SaaS Ramp of Death.
Growth expectations vary widely in the SaaS industry.
Bootstrapped SaaS businesses often take 18 months before they’re profitable enough to be competitive with reasonable wages for the founding team. After achieving that point, bootstrapped businesses have a wide range of acceptable outcomes for growth rates; 10~20% year over year growth rates in revenue can produce very, very happy outcomes for all concerned.
Funded SaaS businesses are designed to trade cash for growth, which means they're designed to lose a lot of money upfront while perfecting their model; almost no funded SaaS business has ever failed at that goal.
After they perfect the model, they scale it, which generally results in losing more money, faster. That this is a successful outcome for the business is counterintuitive to many observers of the software industry. If the business can continue growing, there is no size of accumulated deficit that it cannot eventually repay. If growth does not happen, the business fails.
There exist many lower-stress businesses in life than SaaS companies being managed for aggressive growth; it’s likened to riding a rocket ship, where you burn fuel aggressively to achieve acceleration and, by the way, if anything goes wrong you explode.
The rule of thumb for growth rate expectations at a successful SaaS company being managed for aggressive growth is 3, 3, 2, 2, 2: starting from a material baseline (e.g. over $1 million in annual recurring revenue (ARR)), the business needs to triple annual revenues for two consecutive years and then double them for three consecutive years. A funded SaaS business which consistently grows by 20% per year early in its life is likely a failure in the eyes of its investors.
Benchmarks to know
One of the most popular questions for SaaS founders is “Are my numbers any good?”
This is surprisingly difficult to answer, because of the differences across industries, business models, stages of a company, and goals of founders. In general, though, experienced SaaS entrepreneurs have a few rules of thumb.
Low-touch SaaS benchmarks
Conversion rate:
Most low-touch SaaS uses a free trial, with the signup either requiring minimal information or a credit card that will be billed if the user doesn't cancel the trial. This decision dominates the character of the free trial: users who sign up for a relatively low-friction trial may not be very serious about evaluating the software and need to affirmatively decide to purchase the software later, while users who provide a credit card number generally have done more up-front research and are, essentially, committing to pay unless they affirmatively declare they are dissatisfied with the product.
This results in cosmically different conversion rates:
Conversion rates of low-touch SaaS trials with credit card not required:
substantially below 1%: generally evidence of poor product-market fit
~1%: roughly the baseline for competent execution
2%+: extremely good

Conversion rates of low-touch SaaS trials with credit card required:
substantially below 40%: generally evidence of poor product-market fit
40%: roughly the baseline for competent execution
60%: doing well!

In general, requiring a credit card upfront will, on net, increase the number of new paying customers you get (it increases the trial-to-paying-customer conversion rate by more than it decreases the number of trials started). This factor reverses as a company gets increasingly sophisticated about activating free trial users (ensuring they make meaningful use of the software), typically via better in-product experiences, lifecycle email, and customer success teams.
Conversion rate (to trials):
You should measure your conversion rate between unique page views and trials started, but it isn’t the most actionable metric in your company, and it is difficult to give a good guideline for your expectations driven from this number.
Conversion rate to the trial is incredibly sensitive to whether you are attracting high-quality visitors or not. Counterintuitively, companies which are better at marketing have lower conversion rates than companies which are worse at it.
The companies with better marketing attract many more prospects, including typically a larger percentage and absolute number of prospects who are not a good fit for the offering. Companies that are worse at marketing are only discovered by the cognoscenti of their markets, who tend to be disproportionately good customers; they’re so dissatisfied with the status quo that they’re actively searching for solutions, often intensely, and they’re willing to use a no-name company if it is possibly better than their current situation. The rest of the market might not be actively looking for a solution right now, might be satisfied with going with well-known players or only those who show up prominently on Google, and might not be incentivized to take on vendor risk for dealing with a newer provider.
Churn rates:
In low-touch SaaS, most customers are on month-to-month contracts, and churn rates are quoted monthly. (Selling annual accounts is certainly a good idea, too, both for the upfront cash collected and because they have lower churn rates. When reporting churn, though, typically the impact of them is blended in to produce a monthly number.)
2%: a very sticky product, with strong product-market fit and substantial investments in reducing involuntary churn
5%: roughly where you expect to start
7%: you likely have either low-hanging fruit for preventing voluntary churn or are selling to a difficult market
10%+: evidence of very poor product-market fit and an existential threat to the company

Some markets structurally have higher churn than others: selling to "pro-sumers" or informal businesses such as freelancers exposes oneself to their high rate of exiting the business, which materially impacts churn rates. More established businesses fail far less frequently and have far less need to optimize their cash flows to the last $50.
Since higher price points preferentially select for better customers, increasing prices is even more effective than entrepreneurs expect: increasing prices by 25% can result in “accidentally” decreasing churn by 20%, simply by changing the mix of customers who buy the product. This factor leads many, many low-touch SaaS businesses to march “upmarket” over time.
High-touch SaaS benchmarks
High-touch SaaS businesses generally have much, much more heterogeneity with regards to both how they measure their conversion rates (largely due to differences in how they define an “opportunity”) and in their realized conversion rates given similar definitions, due to differences in their industry, sales process, and so forth.
Churn rates, though, are closely clustered: roughly 10% annualized churn is reasonable for companies in their early years. 7% is an excellent churn rate. Note that mediocre high-touch SaaS businesses have materially lower churn rates than even the best low-touch SaaS businesses, structurally.
High-touch businesses often measure so-called “logo” churn (one business counts as one logo, regardless of how many units at that business use one’s software, how many seats they use, what they are paying, etc) and revenue churn. This is less important in low-touch SaaS, as those churn rates tend to be quite similar.
Because high-touch SaaS businesses typically price their offerings such that they can increase the amount of revenue over the lifetime of a customer, by selling more seats or by offering additional products or similar, many of them track net revenue churn, which is the difference in revenue per cohort per year. The gold-standard for a high-touch SaaS business is negative net revenue churn: the impact of upgrades, increases in contract size on a year-to-year basis, and cross-selling to existing customers exceeds the revenue impact of customers deciding to terminate (or reduce) their use of the software. (Virtually no low-touch SaaS business achieves net negative churn; their churn rates are too high to outrun.)
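To make the definition concrete, a minimal sketch of computing net revenue churn for a single annual cohort (the figures are hypothetical):

# Revenue from the same cohort of customers, measured one year apart
cohort_revenue_year1 = 100_000.0
cohort_revenue_year2 = 110_000.0   # upgrades and cross-sells outran cancellations

net_revenue_churn = 1 - cohort_revenue_year2 / cohort_revenue_year1
print(f"{net_revenue_churn:.0%}")   # -10%, i.e. negative net revenue churn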
Product/market fit
SaaS isn’t just about the metrics. The hardest thing to put a number on early in the lifetime of a SaaS company is called product/market fit, a term coined by Marc Andreessen, which informally means “Have you found a group of people who love the thing you have built for them?”
Products which don’t have product/market fit yet are plagued by relatively low conversion rates and high churn rates. Products which achieve product/market fit often accelerate their growth rates materially, have much higher conversion rates, and are generally more pleasant to work on.
Serial SaaS entrepreneurs often struggle to describe product/market fit other than to say “If you have it you will know that you have it, and if you have any doubt whether you have it, you do not.” It’s the difference between every sales conversation being you pushing a boulder up a hill and the customer practically pulling your hand off to get your software.
Many SaaS products with product/market fit did not launch with it; it sometimes takes months or years of iterating to get there. The most important theme while iterating is to talk to many, many more customers than feels natural. Low-touch SaaS entrepreneurs can make an excuse to attempt to speak with literally every person who signs up for a free trial; the economics of this are unsustainable at low-touch price points, but running a SaaS company without product/market fit is also unsustainable, so it's entirely justified by how much you learn.
Achieving product/market fit isn’t just a matter of listening to feature requests and building those features. It is also listening closely to the commonalities of your best customers and leaning in on them. This can result in changes to the marketing, messaging, and design of the product to more closely target the needs of the best customers.
Who are the “best” customers? Generally speaking, they’re the segments (by industry, size, user profile, or similar) where you have high conversion rates, low churn rates, and (almost always) relatively higher ACV. By far the most common change in emphasis of low-touch SaaS businesses is to launch with a product which serves a wide spectrum of users at a wide spectrum of sophistication, and then double-down on one or two niches for their most sophisticated users.
Stripe Atlas is going to be publishing further guides on finding product/market fit, interviewing users, and optimizing every facet of your online business. If you'd like to hear about them, please give us your email address. If you have any thoughts about what other guides would be useful to your online business, please write us at atlas@stripe.com.
Test
The Business of SaaS
Software-as-a-service (SaaS) is a billing and delivery model for software which is so superior to the traditional method for selling software licenses that it restructures businesses around itself. This has led SaaS businesses to have a distinct body of practice. Unfortunately, many entrepreneurs discover this body of practice the hard way, by making mistakes that have been made before, rather than by spending their mistake budget on newer, better mistakes.
This shouldn’t include you, so we’ll take you through a whirlwind tour of the state of play of SaaS businesses. You should gain a better understanding of the SaaS business model, be able to anticipate whether to sell your product on a low-touch or high-touch model, and (if you’re already operating a SaaS business) be able to evaluate its health and start improving it.
If you are a software entrepreneur, and you do not sell mobile applications (which have a separate billing model, imposed by the platforms’ app stores), you should thoroughly understand the business of SaaS. This will let you make better decisions for your product (and company), allow you to see business-threatening problems months or years in advance of them being obvious, and help you in communicating with investors.
Why is SaaS taking over the world?
Customers love SaaS because it “just works.” There is typically nothing to install to access it. Hardware failures and operational errors, which are extraordinarily common among machines which are not maintained by professionals, do not result in meaningful data loss. SaaS companies achieve availability numbers (for example, percent of time where the software is accessible and operating correctly) which materially improve upon the numbers achievable by almost every IT department (and every individual, full-stop).
SaaS also generally appears less expensive than software sold on other billing models, which matters for e.g. users who are not sure which software they should adopt over long terms, or who have only a short-term need for the software.
Developers love SaaS principally because of the delivery model, not the billing model.
Most SaaS is developed continuously and run on the company’s infrastructure. (There are significant exceptions in SaaS in the enterprise, but the overwhelming majority of B2C and B2B SaaS sold outside the enterprise is accessed over the internet from servers maintained by the software company.)
Software companies historically have not controlled the environments their code executes in, which is a major source of both development friction and customer support cases. All software deployed on customers’ hardware suffers from differences in system configuration, interactions with other installed software, and operator error. These have to be both accounted for in development and dealt with as customer service issues. Companies which sell their software on both SaaS and installable models frequently see 10+ times more support requests per customer from customers who install the software locally.
Businesses and investors love SaaS because the economics of SaaS are impossibly attractive relative to selling software licenses. Revenue from SaaS is generally recurring and predictable; this makes cash flows in SaaS businesses impressively predictable, which allows businesses to plan against them and (via investors) trade future cash flows for money in the status quo, which allows them to (generously) fund present growth. This has made SaaS companies into some of the fastest growing software companies in history.
SaaS sales models
There are, broadly speaking, two ways to sell SaaS. The selling model dictates almost everything else about the SaaS company and the product, to a degree which is shocking to first-time entrepreneurs. One of the classic mistakes in SaaS, which can take years to correct, is a mismatch between a product or market and the selected model to sell it on.
You will find that the sales model for SaaS defines much more about a product (and company) than other distinctions, like whether a company sells to customers (B2C) or businesses (B2B), whether it is bootstrapped or riding the VC rocket ship trajectory, or what technology stack it is built on.
Low-touch SaaS sales
Some products sell themselves.
Low-touch SaaS is designed for the majority of customers to purchase it without sustained one-on-one interaction with a human being. The primary sales channels are the software’s website, email marketing, and (very frequently) a free trial, with the trial aggressively optimized so that it is very, very low-friction to start, to onboard with, and to make sustained, successful use of.
Low-touch products sometimes involve sales teams, but they’re frequently structured as so-called “Customer Success” teams, which are less focused on convincing people to buy the software and more on ensuring that users of the free trial successfully onboard and convert to paying users by the end of their trials.
Customer support in low-touch products is generally handled primarily in scalable fashions: by optimizing the product to avoid incidents which would require human intervention, by creating educational resources which scale across the customer base, and by using humans as a last resort. That said, many low-touch companies have excellent customer support teams. The economics of SaaS depend on the long-term satisfaction of customers, so even a product which expects only one ticket (a countable discrete interaction with a customer) every 20 customer-months might invest comparatively heavily in its CS team.
Low-touch SaaS is generally sold on a month-to-month subscription, with price points clustering around $10 for B2C applications and in the $20 to $500 range for B2B. This corresponds to an average contract value (ACV) of approximately $100 to $5k (roughly the monthly price times twelve). The term ACV isn’t commonly even used by low-touch SaaS businesses, which typically describe themselves by their monthly price points, but it is important for making comparisons to high-touch SaaS applications.
If you asked a low-touch SaaS entrepreneur for their most important metric, they would say MRR—monthly recurring revenue.
Basecamp is the paradigmatic example of a low-touch SaaS business. Atlassian (which makes JIRA, Trello, Confluence, and several other products) is possibly the publicly-traded company with the most success with the model.
High-touch SaaS sales
Some customers need some help in deciding whether or how to adopt certain products.
High-touch SaaS is designed around there being a human-intensive process to convince businesses to adopt the software, successfully operationalize it, and continue using it.
The beating heart of the organization is almost always the sales team, which is often broken down into specialized roles: sales development representatives (SDRs) who find prospects for the software, account executives (AEs) who own the sales process against particular customers, and account managers (AMs) who are responsible for the happiness and continued performance of an individualized portfolio of accounts.
The sales team is typically supported by marketing, whose primary job is generating a sufficient pipeline of qualified leads for the sales team to evaluate and close.
There are many truly excellent products sold on the high-touch model, but to a first approximation, engineering and product are generally considered less important in high-touch SaaS businesses than the sales engine is.
The organization of customer support is highly variable across high-touch SaaS companies; a commonality is that it is generally expected to be heavily utilized. The number of tickets per account per period is expected to be orders of magnitude higher than it is in low-touch SaaS.
Note that while, in principle, one can make high-touch sales to consumers (for example, insurance has historically been sold primarily through commissioned agents), in SaaS, the overwhelming majority of high-touch businesses sell to businesses (B2B). Within B2B, there is a wide range of expected customer profiles, ACVs (defined variously as average contract value or annual contract value), and deal complexity.
On the low-end, SaaS sold to “small and medium sized businesses” (SMBs) on a high-touch model generally has an ACV of $6k to $15k, though this can range higher. The exact definition of an SMB varies heavily depending on who you ask; operationally, it is “any business with sufficient sophistication to successfully adopt software which costs $10,000”, which probably excludes your local flower shop but includes a dental practice with 2 partners and 4 employees.
The high end is usually called “the enterprise” and targets extremely large businesses or governments. True enterprise deals start in the six figures; there is no ceiling. (There is a $70 million ACV customer in Inovalon’s annual report, for example.)
If you asked a high-touch SaaS entrepreneur for their most important metric, they would say ARR—annual recurring revenue. (This is essentially all of the non-churned revenue of the company, minus certain non-recurring items such as one-time setup fees, consulting services, and similar. Since the economics of SaaS are attractive because of growth over time, one-off revenue, particularly comparatively low-margin one-off revenue, is not maximally interesting to entrepreneurs or investors.)
Salesforce is the paradigmatic example of a high-touch SaaS business, and they literally wrote the book on the model. Small high-touch SaaS businesses exist in multitudes, though they’re less visible than low-touch SaaS businesses, principally because visibility is a customer acquisition strategy in low-touch SaaS and not always optimal in high-touch SaaS. For example, there are many small SaaS businesses which quietly make six or seven figures a year selling services to a tightly defined vertical.
Hybrid sales approaches
There exist companies which successfully run a low-touch and a high-touch business with functionally the same product, but they are exceedingly rare. The most common result of attempting both models simultaneously is that only one of the models receives any traction, and (because these models weave themselves into all operations of the company) it typically strangles the other.
A more common form of hybridization is adopting certain elements of the other sales model. For example, many low-touch SaaS businesses have customer success teams which, if you squint at them, look almost like inside sales. High-touch companies typically borrow fewer tactics than low-touch companies; the most common one is having a product that the company does not (materially) sell which they distribute in a low-touch fashion for the purpose of lead generation for the product the company actually sells.
The fundamental equation of SaaS
The SaaS model, fundamentally, works by financializing software: instead of selling software as a product with a sticker price, it sells the software as if it were a financial instrument, with a probabilistically forecastable cash flow.
There are more sophisticated ways to model a SaaS business, but the no-MBA-required version just makes a few simplifying assumptions (like ignoring the time-value of money) and uses high-school math. If you only learn one thing about SaaS, learn this equation; it is the Rosetta Stone to understanding all material facts about a SaaS business.
The core insight is really simple: one’s revenue, over the long term, is the number of customers times the average lifetime revenue per customer.
The number of customers you get is a product of two factors: acquisition (how effective you are at attracting the attention of prospects in low-touch SaaS or identifying and getting in front of them in high-touch SaaS) times your conversion rate (the percent of prospects you convert into paying customers.)
The average lifetime revenue per customer (often called lifetime value (LTV)) is the product of how much they pay you for a particular period (such as one month) and how many periods they persist using your service.
The average revenue per user (ARPU) is simply the average revenue for an account over any particular period.
The churn is the percent of customers over a given period who do not continue paying for services. For example, if you have 200 customers pay you in January and only 190 of those pay you in February, the churn would be 5%.
The lifetime of a customer can, with a few simplifying assumptions, be calculated as the sum of an infinite geometric series; this works out to simply taking the inverse of churn. A product which loses 5% of its customers per month has an expected customer lifetime of 20 months; if it charges each customer $30 a month, it has an expected lifetime revenue of $600 per new customer signed up.
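To make the arithmetic concrete, here is a minimal sketch in Python; the function names are ours and every input is illustrative, not a benchmark.

def lifetime_value(arpu_per_month, monthly_churn):
    # Expected lifetime revenue per customer: ARPU times expected lifetime,
    # where lifetime = 1 / churn (the sum of the geometric retention series).
    return arpu_per_month * (1 / monthly_churn)

def long_run_revenue(prospects, conversion_rate, arpu_per_month, monthly_churn):
    # Long-run revenue = (acquisition * conversion) * lifetime value per customer.
    customers = prospects * conversion_rate
    return customers * lifetime_value(arpu_per_month, monthly_churn)

print(lifetime_value(30, 0.05))                    # 600.0 -> the $600 worked example above
print(long_run_revenue(1000, 0.02, 30, 0.05))      # 12000.0 -> 20 customers * $600 each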
Implications of the SaaS business model
Improvements to a SaaS business are multiplicatively effective.
A 10% improvement to acquisition (via e.g. better marketing) and a 10% improvement to conversion rate (via e.g. product improvements or more effective sales techniques) compound to a 21% improvement (1.1 × 1.1 = 1.21), not a 20% improvement.
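A quick check of the compounding, with hypothetical baseline numbers:

baseline = 1000 * 0.02                    # 1,000 prospects at 2% conversion = 20 customers
improved = (1000 * 1.1) * (0.02 * 1.1)    # 10% more prospects, 10% better conversion
print(improved / baseline)                # ~1.21 -> a 21% improvement, not 20%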
Improvements to a SaaS business are incredibly leveraged.
Because the margins in SaaS are so high, the long-term valuation of a SaaS business is effectively tied to a multiple of its long-term revenues. Thus, a 1% improvement in conversion rates doesn’t simply mean a 1% increase in revenue next month or even over the long term… it implies a 1% increase in the enterprise value of the company.
Price is the easiest lever to improve a SaaS business.
Acquisition, conversion, and churn often require major cross-functional efforts to improve. Pricing typically requires replacing a small number with a bigger one.
SaaS businesses eventually asymptote.
Given fixed acquisition, conversion, and churn, there is a point at which one’s business hits a revenue plateau. This is predictable in advance: the number of customers at the plateau is equal to acquisition times conversion divided by churn rate.
A SaaS business which loses the ability to improve acquisition, conversion, or churn will, with almost mathematical certainty, stop growing. A SaaS business which stops growing before it can cover fixed costs (like e.g. salaries for the engineering team) dies ignominiously, even if it did everything else right.
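A sketch of the plateau math, under assumed steady-state inputs: growth stops when customers added per month equal customers lost per month, i.e. acquisition × conversion = plateau customers × churn.

# Illustrative inputs, not benchmarks:
acquisition = 2000   # prospects reached per month
conversion = 0.02    # 2% of prospects become paying customers
churn = 0.05         # 5% of customers lost per month

plateau_customers = acquisition * conversion / churn
print(plateau_customers)  # 800.0 customers at the plateau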
SaaS businesses can be capital-intensive to grow.
SaaS businesses have large front-loaded costs to grow, particularly when growing aggressively; marketing and sales dominates the marginal cost per customer and, often, the total expenditures of the company. The marketing and sales costs attributable to a particular customer occur very early in that customer’s lifecycle; the revenue to eventually pay for those costs comes later.
This means that a SaaS company optimizing for growth will almost always spend more money in a given period than they collect from customers. The money spent has to come from somewhere. Many SaaS companies choose to fund the growth via selling equity in the companies to investors. SaaS companies are particularly attractive to investors because the model is very well-understood: create a product, achieve some measure of product-market fit, spend a lot of money on marketing and sales according to a relatively repeatable playbook, and eventually sell one’s stake in the business to someone else (the public markets, an acquirer, or another investor looking for a derisked business with good growth potential).
Margins, to a first approximation, don’t matter.
Most businesses care quite a bit about their cost-of-goods-sold (COGS), the cost to satisfy a marginal customer.
While some platform businesses (like AWS) have material COGS, at the typical SaaS company, the primary source of value is the software and it can be replicated at an extremely low COGS. SaaS companies frequently spend less than 5~10% of their marginal revenue per customer on delivering the underlying service.
This allows SaaS entrepreneurs to almost ignore every factor of their unit economics except customer acquisition cost (CAC; the marginal spending on marketing and sales per customer added). If they’re quickly growing, the company can ignore every expense that doesn’t scale directly with the number of customers (e.g. engineering costs, general and administrative expenses), on the assumption that growth at a sensible CAC will outrun anything on the expenses side of the ledger.
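A sketch of why CAC dominates the unit economics, with assumed (illustrative) numbers: at SaaS-typical COGS, delivery costs barely dent a customer’s lifetime contribution, while CAC determines how long the customer must stay before they are profitable at all.

# Illustrative unit economics for one customer (assumed numbers):
arpu = 50            # $50/month
monthly_churn = 0.03
cogs_rate = 0.08     # ~8% of revenue spent delivering the service (see above)
cac = 400            # marginal marketing + sales spend to acquire the customer

lifetime_revenue = arpu / monthly_churn             # ~$1,667
gross_profit = lifetime_revenue * (1 - cogs_rate)   # ~$1,533 -- COGS barely dents it
payback_months = cac / (arpu * (1 - cogs_rate))     # ~8.7 months to recover CAC
print(gross_profit - cac)                           # lifetime contribution after CAC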
SaaS businesses take a while to grow.
While tales of so-called “hockeystick” growth curves are common in the press, the representative experience of SaaS companies is that they take a very long time dialing in the product, marketing approaches, and sales approaches before things start to work very well. This has been referred to, memorably, as the Long Slow SaaS Ramp of Death.
Growth expectations vary widely in the SaaS industry.
Bootstrapped SaaS businesses often take 18 months before they’re profitable enough to be competitive with reasonable wages for the founding team. After achieving that point, bootstrapped businesses have a wide range of acceptable outcomes for growth rates; 10~20% year over year growth rates in revenue can produce very, very happy outcomes for all concerned.
Funded SaaS businesses are designed to trade cash for growth, which means they’re designed to lose a lot of money upfront while perfecting their model; almost no funded SaaS business has ever failed at that goal.
After they perfect the model, they scale it, which generally results in losing more money, faster. That this is a successful outcome for the business is counterintuitive to many observers of the software industry. If the business can continue growing, there is no size of accumulated deficit that it cannot eventually repay. If growth does not happen, the business fails.
There exist many lower-stress businesses in life than SaaS companies being managed for aggressive growth; it’s likened to riding a rocket ship, where you burn fuel aggressively to achieve acceleration and, by the way, if anything goes wrong you explode.
The rule of thumb for growth rate expectations at a successful SaaS company being managed for aggressive growth is 3, 3, 2, 2, 2: starting from a material baseline (e.g. over $1 million in annual recurring revenue (ARR)), the business needs to triple annual revenues for two consecutive years and then double them for three consecutive years. A funded SaaS business which consistently grows by 20% per year early in its life is likely a failure in the eyes of its investors.
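Worked out from the baseline, the 3, 3, 2, 2, 2 expectation compounds like so:

arr = 1_000_000                    # starting baseline: $1M in ARR
for multiple in [3, 3, 2, 2, 2]:   # triple twice, then double three times
    arr *= multiple
print(arr)                         # 72,000,000 -> $72M ARR after five years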
Benchmarks to know
One of the most popular questions for SaaS founders is “Are my numbers any good?”
This is surprisingly difficult to answer, because of the differences across industries, business models, stages of a company, and goals of founders. In general, though, experienced SaaS entrepreneurs have a few rules of thumb.
Low-touch SaaS benchmarks
Conversion rate:
Most low-touch SaaS uses a free trial, with the signup either requiring minimal information or a credit card that will be billed if the user doesn’t cancel the trial. This decision dominates the character of the free trial: users who sign up for a relatively low-friction trial may not be very serious about evaluating the software and need to affirmatively decide to purchase it later, while users who provide a credit card number generally have done more up-front research and are, essentially, committing to pay unless they affirmatively declare they are dissatisfied with the product.
This results in cosmically different conversion rates:
Conversion rates of low-touch SaaS trials with credit card not required:
Substantially below 1%: generally evidence of poor product/market fit.
~1%: roughly the baseline for competent execution.
2%+: extremely good.
Conversion rates of low-touch SaaS trials with credit card required:
Substantially below 40%: generally evidence of poor product/market fit.
40%: roughly the baseline for competent execution.
60%: doing well!
In general, requiring a credit card upfront will, on net, increase the number of new paying customers you get (it increases the trial-to-paying-customer conversion rate by more than it decreases the number of trials started). This factor reverses as a company gets increasingly sophisticated about activating free trial users (ensuring they make meaningful use of the software), typically via better in-product experiences, lifecycle email, and customer success teams.
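A sketch of that trade-off with hypothetical funnel numbers (the trial-to-paid rates are the baselines quoted above; the trial-start rates are assumptions):

visitors = 10_000

# Credit card not required: many trials, ~1% trial-to-paid (baseline above).
trials_no_card = visitors * 0.10          # assume 10% of visitors start a trial
paying_no_card = trials_no_card * 0.01    # 10 paying customers

# Credit card required: far fewer trials, ~40% trial-to-paid (baseline above).
trials_card = visitors * 0.02             # assume only 2% start a trial
paying_card = trials_card * 0.40          # 80 paying customers

print(paying_no_card, paying_card)  # 10.0 80.0 -> requiring a card nets more customers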
Conversion rate (to trials):
You should measure your conversion rate between unique page views and trials started, but it isn’t the most actionable metric in your company, and it is difficult to give a good guideline for what to expect from this number.
Conversion rate to the trial is incredibly sensitive to whether you are attracting high-quality visitors or not. Counterintuitively, companies which are better at marketing have lower conversion rates than companies which are worse at it.
The companies with better marketing attract many more prospects, including typically a larger percentage and absolute number of prospects who are not a good fit for the offering. Companies that are worse at marketing are only discovered by the cognoscenti of their markets, who tend to be disproportionately good customers; they’re so dissatisfied with the status quo that they’re actively searching for solutions, often intensely, and they’re willing to use a no-name company if it is possibly better than their current situation. The rest of the market might not be actively looking for a solution right now, might be satisfied with going with well-known players or only those who show up prominently on Google, and might not be incentivized to take on vendor risk for dealing with a newer provider.
Churn rates:
In low-touch SaaS, most customers are on month-to-month contracts, and churn rates are quoted monthly. (Selling annual accounts is certainly a good idea, too, both for the upfront cash collected and because they have lower churn rates. When reporting churn, though, typically the impact of them is blended in to produce a monthly number.)
2%: a very sticky product, with strong product/market fit and substantial investments in reducing involuntary churn.
5%: roughly where you expect to start.
7%: you likely have either low-hanging fruit for preventing voluntary churn or are selling to a difficult market.
10%+: evidence of very poor product/market fit and an existential threat to the company.
Some markets structurally have higher churn than others: selling to “pro-sumers” or informal businesses such as freelancers exposes oneself to their high rate of exiting the business, which materially impacts churn rates. More established businesses fail far less frequently and have far less need to optimize their cash flows to the last $50.
Since higher price points preferentially select for better customers, increasing prices is even more effective than entrepreneurs expect: increasing prices by 25% can result in “accidentally” decreasing churn by 20%, simply by changing the mix of customers who buy the product. This factor leads many, many low-touch SaaS businesses to march “upmarket” over time.
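The compounding effect on lifetime value, using LTV = ARPU / churn from the fundamental equation (the numbers are illustrative):

arpu, churn = 40.0, 0.05
ltv_before = arpu / churn                   # $800

# Raise prices by 25%; suppose churn "accidentally" falls by 20%
# as the customer mix improves, per the scenario above:
ltv_after = (arpu * 1.25) / (churn * 0.80)  # $1,250
print(ltv_after / ltv_before)               # 1.5625 -> a 56% improvement in LTV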
High-touch SaaS benchmarks
High-touch SaaS businesses generally have much, much more heterogeneity with regards to both how they measure their conversion rates (largely due to differences in how they define an “opportunity”) and in their realized conversion rates given similar definitions, due to differences in their industry, sales process, and so forth.
Churn rates, though, are closely clustered: roughly 10% annualized churn is reasonable for companies in their early years, and 7% is an excellent churn rate. Note that mediocre high-touch SaaS businesses structurally have materially lower churn rates than even the best low-touch SaaS businesses: 10% annualized works out to under 1% monthly, versus the 2% monthly of the very stickiest low-touch products.
High-touch businesses often measure so-called “logo” churn (one business counts as one logo, regardless of how many units at that business use one’s software, how many seats they use, what they are paying, etc) and revenue churn. This is less important in low-touch SaaS, as those churn rates tend to be quite similar.
Because high-touch SaaS businesses typically price their offerings such that they can increase the amount of revenue over the lifetime of a customer, by selling more seats or by offering additional products, many of them track net revenue churn, which is the year-over-year change in revenue from a fixed cohort of customers. The gold standard for a high-touch SaaS business is negative net revenue churn: the impact of upgrades, increases in contract size on a year-to-year basis, and cross-selling to existing customers exceeds the revenue impact of customers deciding to terminate (or reduce) their use of the software. (Virtually no low-touch SaaS business achieves negative net revenue churn; their churn rates are too high to outrun.)
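A sketch of the cohort arithmetic with hypothetical figures:

# Revenue from one annual cohort of customers (assumed numbers):
revenue_year_1 = 1_000_000
lost_to_churn_and_downgrades = 150_000   # customers who left or reduced seats
gained_from_upgrades = 230_000           # expansion, added seats, cross-sells

revenue_year_2 = revenue_year_1 - lost_to_churn_and_downgrades + gained_from_upgrades
net_revenue_churn = (revenue_year_1 - revenue_year_2) / revenue_year_1
print(net_revenue_churn)  # -0.08 -> negative net revenue churn: the cohort grew 8%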
Product/market fit
SaaS isn’t just about the metrics. The hardest thing to put a number on early in the lifetime of a SaaS company is called product/market fit, a term coined by Marc Andreessen, which informally means “Have you found a group of people who love the thing you have built for them?”
Products which don’t have product/market fit yet are plagued by relatively low conversion rates and high churn rates. Products which achieve product/market fit often accelerate their growth rates materially, have much higher conversion rates, and are generally more pleasant to work on.
Serial SaaS entrepreneurs often struggle to describe product/market fit other than to say “If you have it you will know that you have it, and if you have any doubt whether you have it, you do not.” It’s the difference between every sales conversation being you pushing a boulder up a hill and the customer practically pulling your hand off to get your software.
Many SaaS businesses with product/market fit did not launch with it; it sometimes takes months or years of iterating to get there. The most important theme while iterating is to talk to many, many more customers than feels natural. Low-touch SaaS entrepreneurs can make an excuse to attempt to speak with literally every person who signs up for a free trial; the economics of this are unsustainable at low-touch price points, but so is running a SaaS company without product/market fit, so the practice is entirely justified by how much you learn.
Achieving product/market fit isn’t just a matter of listening to feature requests and building those features. It is also listening closely to the commonalities of your best customers and leaning in on them. This can result in changes to the marketing, messaging, and design of the product to more closely target the needs of the best customers.
Who are the “best” customers? Generally speaking, they’re the segments (by industry, size, user profile, or similar) where you have high conversion rates, low churn rates, and (almost always) relatively higher ACV. By far the most common change in emphasis of low-touch SaaS businesses is to launch with a product which serves a wide spectrum of users at a wide spectrum of sophistication, and then double-down on one or two niches for their most sophisticated users.
Stripe Atlas is going to be publishing further guides on finding product/market fit, interviewing users, and optimizing every facet of your online business. If you’d like to hear about them, please give us your email address. If you have any thoughts about what other guides would be useful to your online business, please write us at atlas@stripe.com.