Posts by Infoman

2017/09/27
  • Yahoo open sources its search engine Vespa

    182 mkagenius 6 hrs 22

    https://www.oath.com/press/open-sourcing-vespa-yahoo-s-big-data-processing-and-serving-eng/

    http://news.ycombinator.com/item?id=15345483

    By Jon Bratseth, Distinguished Architect, Vespa


    Ever since we open sourced Hadoop in 2006, Yahoo – and now, Oath – has been committed to opening up its big data infrastructure to the larger developer community. Today, we are taking another major step in this direction by making Vespa, Yahoo's big data processing and serving engine, available as open source on GitHub.

    Vespa architecture overview

    Building applications increasingly means dealing with huge amounts of data. While developers can use the Hadoop stack to store and batch-process big data, and Storm to stream-process data, these technologies do not help with serving results to end users. Serving is challenging at large scale, especially when it is necessary to compute quickly over data while a user is waiting, as with applications that feature search, recommendation, and personalization.

    By releasing Vespa, we are making it easy for anyone to build applications that can compute responses to user requests, over large datasets, in real time and at internet scale – capabilities that, until now, have been within reach of only a few large companies.

    Serving often involves more than looking up items by ID or computing a few numbers from a model. Many applications need to compute over large datasets at serving time. Two well-known examples are search and recommendation. To deliver a search result or a list of recommended articles to a user, you need to find all the items matching the query, determine how good each item is for the particular request using a relevance/recommendation model, organize the matches to remove duplicates, add navigation aids, and then return a response to the user. As these computations depend on features of the request, such as the user's query or interests, it won't do to compute the result upfront. It must be done at serving time, and since a user is waiting, it has to be done fast. Combining speedy completion of the aforementioned operations with the ability to perform them over large amounts of data requires a lot of infrastructure – distributed algorithms, data distribution and management, efficient data structures and memory management, and more. This is what Vespa provides in a neatly-packaged and easy to use engine.

    With over 1 billion users, we currently use Vespa across many different Oath brands – including Yahoo.com, Yahoo News, Yahoo Sports, Yahoo Finance, Yahoo Gemini, Flickr, and others – to process and serve billions of daily requests over billions of documents while responding to search queries, making recommendations, and providing personalized content and advertisements, to name just a few use cases. In fact, Vespa processes and serves content and ads almost 90,000 times every second with latencies in the tens of milliseconds. On Flickr alone, Vespa performs keyword and image searches on the scale of a few hundred queries per second on tens of billions of images. Additionally, Vespa makes direct contributions to our company's revenue stream by serving over 3 billion native ad requests per day via Yahoo Gemini, at a peak of 140k requests per second (per Oath internal data).

    With Vespa, our teams build applications that:

    • Select content items using SQL-like queries and text search
    • Organize all matches to generate data-driven pages
    • Rank matches by handwritten or machine-learned relevance models
    • Serve results with response times in the low milliseconds
    • Write data in real time, thousands of times per second per node
    • Grow, shrink, and re-configure clusters while serving and writing data

    To achieve both speed and scale, Vespa distributes data and computation over many machines without any single master as a bottleneck. Where conventional applications work by pulling data into a stateless tier for processing, Vespa instead pushes computations to the data. This involves managing clusters of nodes with background redistribution of data in case of machine failures or the addition of new capacity, implementing distributed low-latency query and processing algorithms, handling distributed data consistency, and a lot more. It's a ton of hard work!
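    For a flavor of what a serving-time query looks like, here is a minimal sketch of hitting a Vespa instance's HTTP search API from Python with a YQL query. The field names and the "default" ranking profile are illustrative assumptions, not taken from this announcement:

        import requests

        # Query a local Vespa instance over its HTTP search API (a sketch; the
        # document fields and rank profile name are assumptions).
        SEARCH_ENDPOINT = "http://localhost:8080/search/"

        params = {
            "yql": 'select * from sources * where default contains "vespa";',
            "hits": 10,
            "ranking": "default",
        }

        response = requests.get(SEARCH_ENDPOINT, params=params, timeout=5)
        response.raise_for_status()

        # Results arrive as JSON; each hit carries a relevance score computed
        # at request time by the ranking model.
        for hit in response.json().get("root", {}).get("children", []):
            print(hit.get("relevance"), hit.get("fields", {}).get("title"))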

    As the team behind Vespa, we have been working on developing search and serving capabilities ever since building alltheweb.com, which was later acquired by Yahoo. Over the last couple of years we have rewritten most of the engine from scratch to incorporate our experience into a modern technology stack. Vespa is larger in scope and lines of code than any open source project we've ever released. Now that it has been battle-proven on Yahoo's largest and most critical systems, we are pleased to release it to the world.

    Vespa gives application developers the ability to feed data and models of any size to the serving system and make the final computations at request time. This often produces a better user experience at lower cost (for buying and running hardware) and complexity compared to pre-computing answers to requests. Furthermore, it allows developers to work in a more interactive way, navigating and interacting with complex calculations in real time rather than starting offline jobs and checking the results later.

    Vespa can be run on premises or in the cloud. We provide both Docker images and rpm packages for Vespa, as well as guides for running them both on your own laptop or as an AWS cluster.

    We'll follow up this initial announcement with a series of posts on our blog showing how to build a real-world application with Vespa, but you can get started right now by following the getting started guide in our comprehensive documentation.

    Managing distributed systems is not easy. We have worked hard to make it easy to develop and operate applications on Vespa so that you can focus on creating features that make use of the ability to compute over large datasets in real time, rather than the details of managing clusters and data. You should be able to get an application up and running in less than ten minutes by following the documentation.

    We can't wait to see what you build with it!

  • 2018: The Starting Point of Hong Kong's Rate-Hike Cycle

    The US Federal Reserve has just announced that it will begin shrinking its balance sheet in October, marking the gradual withdrawal of the liquidity injected into markets over the past nine years and a return of global markets to their pre-2008 normal. Hong Kong officials have rushed to warn, repeatedly, of capital flight, surging Hong Kong dollar interest rates, and knock-on effects on businesses and livelihoods; the Exchange Fund has gone further, moving pre-emptively to issue additional bills and drain liquidity, pushing up overnight and short-term interbank rates. Yet the Hong Kong dollar operates under a currency-board linked exchange rate system, so HKD and USD interest rates should move in lockstep, and hot money tends to run a step ahead. The claim that capital outflows will force rate hikes is therefore both puzzling and confusing.

    Understanding the movement of hot money

    To understand why hot money is retreating, one must first understand why it flooded in. When the global financial tsunami erupted in 2008, lenders and borrowers each pulled back to protect themselves, money markets froze, and the US Federal Reserve took the lead in launching quantitative easing, repeatedly purchasing financial debt to restart market liquidity and restore normal trading order.

    Quantitative easing, in essence, substitutes central-bank credit for the original debt credit, untangling the deadlock of chained debts and loosening markets. The Fed's assets and liabilities expand in tandem: the assets are the debt purchased, and the liabilities are banks' clearing balances, which count as deposit reserves. As bank reserves rise, the loan-and-deposit cycle expands, assets and liabilities grow together, and new blood is injected into the economy, restoring its vitality.

    At the same time, the Fed cut interest rates to zero to reinforce the effect, lowering the debt-service burden on governments and households alike and accelerating the loosening. Over the years, markets digested this new world of easy money and zero rates; now that the Fed has confirmed the end of its easy monetary policy and begun shrinking the balance sheet alongside interest-rate normalization, markets must adapt anew. After rates began rising earlier, balance-sheet reduction was widely expected, but setting things right cannot happen overnight.

    Under quantitative easing, US dollars flooded out in search of speculative and investment returns, and the Hong Kong dollar bore the brunt for several reasons. First, the HKD is pegged to the USD, carrying almost no exchange-rate risk. Second, HKD interest rates followed USD rates downward, pushing up real and financial asset prices – in effect, asset inflation. Third, after the renminbi exchange-rate reform the currency appreciated year after year, and Hong Kong's thriving offshore market attracted foreign capital to use the HKD as a conduit for speculation and rent-seeking. By official estimates, in the 15 months from the last quarter of 2008 to the end of 2009, hot-money inflows totalled as much as HK$640 billion equivalent.

    As hot money flowed in, Hong Kong banks' overseas US dollar deposits (assets) and customers' HKD deposits (liabilities) rose in tandem, the ratio of clearing balances to demand deposits (checking and savings) fell correspondingly, and banks had to convert funds with the Exchange Fund into Hong Kong dollars to replenish their balances.

    Under the objective laws of supply and demand, the HKD exchange rate rose above the official parity and eventually hit the Exchange Fund's strong-side convertibility undertaking, while the HKD-USD interest-rate spread widened to sustain the fixed exchange rate.

    The record shows that after quantitative easing began in October 2008, the HKD exchange rate strengthened from the official 7.8 to 7.52; the banks' clearing-balance ratio jumped to 10%, ten times its previous level; overnight and one-week HKD interbank rates (HIBOR) fell below 1%, and below 0.5% by year-end; the one-month rate also broke below 0.5% by year-end; and banks' posted deposit and lending rates were cut accordingly.

    The HKD interest-rate structure is distorted

    In hindsight, the influx of hot money had three hallmarks: first, the HKD exchange rate surged to the strong-side convertibility undertaking (7.75); second, liquidity in the HKD interbank market surged, with the banks' clearing-balance ratio jumping; third, HKD interest rates plunged, tracking the Base Rate (the discount-window rate), which is set at the US federal funds target rate (the interbank rate) plus 0.5%.

    With the Fed ending quantitative easing and beginning to recall funds, the retreat of hot money likewise has three hallmarks: first, the HKD weakening back to the official parity (7.8); second, liquidity in the HKD interbank market falling, with the clearing-balance ratio returning to normal; third, HKD interest rates rising in step with the Base Rate. Checking against recent market conditions, two of these hallmarks have already appeared: the HKD exchange rate has fallen back to the official parity, and the clearing-balance ratio has dropped back to 6%.

    After the 2008 global financial tsunami subsided, international regulators tightened standards in its aftermath, raising requirements for bank reserves and liquid assets, so the clearing-balance ratio can hardly return to its former 1% level; 5% may be the new normal. Hot money is "smart money" and usually moves first; with the Fed's balance-sheet reduction a settled matter, an early, pre-emptive withdrawal comes as no surprise. Yet HKD deposit and lending rates have still not normalized, leaving observers to wonder whether the hot money has actually left. If not, why has the HKD weakened while clearing balances have fallen?

    In fact, the HKD interest-rate structure has been distorted for the past nine years. Most obviously, the best lending rate (BLR, the prime rate) has never moved with the Base Rate, while the savings rate has sat near zero, all but nominal. The prime rate has historically moved in step with the savings rate, because savings deposits are the most stable source of retail funding and retail credit is priced off prime. The HKD Interest Rate Rules were fully abolished in 2001, but in practice the two remain closely linked.

    Over the 30 years from 1971 to 2001, the prime rate averaged roughly 2.5 times the savings rate. Before the global financial tsunami, prime was 5%, the savings rate about 2%, and the Base Rate 3.5%. Today the Base Rate is 1.5%; applying the same ratios, the savings rate should be about 0.85% (= 1.5 × (2/3.5)) and prime roughly 2.125%. Mortgage rates corroborate this: whether quoted as "HIBOR plus" or "BLR minus", they sit at just over 2%, matching the estimate. Setting aside the distortions and their after-effects, HKD interbank rates have in fact been drifting back toward normal, so the claim that HKD rates have yet to rise mistakes the part for the whole. All three withdrawal hallmarks should therefore be present, and the reasonable inference is that speculative hot money, reading the signs early, has already taken its profits and left.
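    A quick back-of-the-envelope check of this arithmetic (a sketch in Python; the pre-crisis rates and the 2.5× prime-to-savings ratio are the article's own figures):

        # The article's figures: pre-crisis prime 5%, savings 2%, Base Rate 3.5%;
        # prime averaged ~2.5x the savings rate over 1971-2001.
        pre_crisis_savings = 2.0    # percent
        pre_crisis_base = 3.5       # percent
        prime_to_savings = 2.5      # long-run average ratio

        current_base = 1.5          # percent, today's Base Rate

        # Scale the savings rate in proportion to the Base Rate, then apply the
        # historical prime-to-savings ratio.
        est_savings = current_base * (pre_crisis_savings / pre_crisis_base)
        est_prime = prime_to_savings * est_savings

        print(f"estimated savings rate: {est_savings:.2f}%")  # ~0.86%
        print(f"estimated prime rate:   {est_prime:.2f}%")    # ~2.14%, near the article's 2.125%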

    Issuing bills to drain funds: a new bottle for the same wine

    That said, the analysis above remains incomplete. All capital moving in or out, whether for investment or speculation, ultimately "settles" in the monetary and financial accounts on banks' asset side, namely clearing balances, Exchange Fund bills, and the note-issuance reserve. Aggregate changes there thus reveal capital flows, and abrupt short-term swings reflect hot money moving in and out. Banking statistics, however, do not distinguish onshore from offshore: converting HKD deposits into foreign-currency deposits shifts funds from the onshore book to the offshore book, which in theory is a capital outflow (and the reverse an inflow), even though no money actually moves in or out.

    Large swings in offshore foreign-currency deposits can therefore move clearing balances and create the illusion of hot money entering or leaving. In fact, changes in offshore foreign-currency deposits (including renminbi) also affect the HKD exchange rate, because such transactions are intermediated through the US dollar: if deposit expansion pushes USD demand above supply, the HKD softens; if deposit contraction pushes USD supply above demand, the HKD firms. Hence neither HKD strength or weakness nor moves in interbank rates alone can settle the question of capital flows; only by cross-checking multiple indicators can one see the full picture.

    The US began normalizing rates and liquidity long ago, yet Hong Kong banks have been slow to adjust their posted deposit and lending rates. The root cause is that those rates never followed the market over the past nine years, so where is the room to raise them? The Interest Rate Rules were abolished long ago, yet they seem to live on in all but name, which is baffling. Still, with US monetary normalization now in sight, Hong Kong's bind will also resolve in time; there is no need for counter-cyclical operations that pull on the seedlings to make them grow.

    Hot money is by nature highly mobile and highly alert; how could it lag behind events? The latest monetary indicators match the hallmarks of withdrawal. Yet the Exchange Fund, impatient to treat the symptoms, has twice issued additional bills to drain funds, apparently with other calculations in mind. In truth, Hong Kong banks have been more than cautious over the past nine years: at the end of September 2008 the reserve ratio was 37% and the HKD loan-to-deposit ratio 80%; by the end of June this year the reserve ratio had risen to 43% and the loan-to-deposit ratio had fallen to 68%.

    Issuing bills to drain funds changes the bottle but not the wine: the HKD monetary base is not actually tightened, so the move neither addresses the withdrawal of hot money nor helps normalize HKD interest rates. Instead it signals monetary tightening, making banks even more cautious on the eve of a rate-hike cycle – all harm and no benefit to the economy.

    鄭宏泰 is Assistant Director of the Hong Kong Institute of Asia-Pacific Studies at the Chinese University of Hong Kong; 陸觀豪 is a retired banker and Honorary Research Fellow at the same institute.

  • Why SQL is beating NoSQL, and what this means for the future of data

    104 nreece 7 hrs 45

    https://blog.timescale.com/why-sql-beating-nosql-what-this-means-for-future-of-data-time-series-database-348b777b847a

    http://news.ycombinator.com/item?id=15335717

    After years of being left for dead, SQL today is making a comeback. How come? And what effect will this have on the data community?

    SQL awakens to fight the dark forces of NoSQL

    Since the dawn of computing, we have been collecting exponentially growing amounts of data, constantly asking more from our data storage, processing, and analysis technology. In the past decade, this caused software developers to cast aside SQL as a relic that couldn’t scale with these growing data volumes, leading to the rise of NoSQL: MapReduce and Bigtable, Cassandra, MongoDB, and more.

    Yet today SQL is resurging. All of the major cloud providers now offer popular managed relational database services: e.g., Amazon RDS, Google Cloud SQL, Azure Database for PostgreSQL (Azure launched just this year). In Amazon’s own words, its PostgreSQL- and MySQL-compatible Aurora database product has been the “fastest growing service in the history of AWS”. SQL interfaces on top of Hadoop and Spark continue to thrive. And just last month, Kafka launched SQL support. Your humble authors themselves are developers of a new time-series database that fully embraces SQL.

    In this post we examine why the pendulum today is swinging back to SQL, and what this means for the future of the data engineering and analysis community.

    To understand why SQL is making a comeback, let’s start with why it was designed in the first place.

    Like all good stories, ours starts in the 1970s

    Our story starts at IBM Research in the early 1970s, where the relational database was born. At that time, query languages relied on complex mathematical logic and notation. Two newly minted PhDs, Donald Chamberlin and Raymond Boyce, were impressed by the relational data model but saw that the query language would be a major bottleneck to adoption. They set out to design a new query language that would be (in their own words): “more accessible to users without formal training in mathematics or computer programming.”

    Query languages before SQL (a, b) vs SQL (c) (source)

    Think about this. Way before the Internet, before the Personal Computer, when the programming language C was first being introduced to the world, two young computer scientists realized that, “much of the success of the computer industry depends on developing a class of users other than trained computer specialists.” They wanted a query language that was as easy to read as English, and that would also encompass database administration and manipulation.

    The result was SQL, first introduced to the world in 1974. Over the next few decades, SQL would prove to be immensely popular. As relational databases like System R, Ingres, DB2, Oracle, SQL Server, PostgreSQL, MySQL (and more) took over the software industry, SQL became established as the preeminent language for interacting with a database, and became the lingua franca for an increasingly crowded and competitive ecosystem.

    (Sadly, Raymond Boyce never had a chance to witness SQL’s success. He died of a brain aneurysm 1 month after giving one of the earliest SQL presentations, just 26 years of age, leaving behind a wife and young daughter.)

    For a while, it seemed like SQL had successfully fulfilled its mission. But then the Internet happened.

    While Chamberlin and Boyce were developing SQL, what they didn’t realize was that a second group of engineers in California was working on another budding project that would later proliferate widely and threaten SQL’s existence. That project was ARPANET, and on October 29, 1969, it was born.

    Some of the creators of ARPANET, which eventually evolved into today’s Internet (source)

    But SQL was actually fine until another engineer showed up and invented the World Wide Web, in 1989.

    The physicist who invented the Web (source)

    Like a weed, the Internet and Web flourished, massively disrupting our world in countless ways, but for the data community it created one particular headache: new sources generating data at much higher volumes and velocities than before.

    As the Internet continued to grow and grow, the software community found that the relational databases of that time couldn’t handle this new load. There was a disturbance in the force, as if a million databases cried out and were suddenly overloaded.

    Then two new Internet giants made breakthroughs, and developed their own distributed non-relational systems to help with this new onslaught of data: MapReduce (published 2004) and Bigtable (published 2006) by Google, and Dynamo (published 2007) by Amazon. These seminal papers led to even more non-relational databases, including Hadoop (based on the MapReduce paper, 2006), Cassandra (heavily inspired by both the Bigtable and Dynamo papers, 2008) and MongoDB (2009). Because these were new systems largely written from scratch, they also eschewed SQL, leading to the rise of the NoSQL movement.

    And boy did the software developer community eat up NoSQL, embracing it arguably much more broadly than the original Google/Amazon authors intended. It’s easy to understand why: NoSQL was new and shiny; it promised scale and power; it seemed like the fast path to engineering success. But then the problems started appearing.

    Classic software developer tempted by NoSQL. Don’t be this guy.

    Developers soon found that not having SQL was actually quite limiting. Each NoSQL database offered its own unique query language, which meant: more languages to learn (and to teach to your coworkers); increased difficulty in connecting these databases to applications, leading to tons of brittle glue code; a lack of a third party ecosystem, requiring companies to develop their own operational and visualization tools.

    These NoSQL languages, being new, were also not fully developed. For example, there had been years of work in relational databases to add necessary features to SQL (e.g., JOINs); the immaturity of NoSQL languages meant more complexity was needed at the application level. The lack of JOINs also led to denormalization, which led to data bloat and rigidity.
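    To make the JOIN point concrete, here is a small illustrative sketch using Python’s built-in sqlite3 (the two-table schema is invented for this example): with a relational JOIN, a user’s name lives in one row of one table, whereas a JOIN-less store must copy the name into every event record, bloating the data and making renames rigid.

        import sqlite3

        # Invented schema: users and their events, joined at query time.
        db = sqlite3.connect(":memory:")
        db.executescript("""
            CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT);
            CREATE TABLE events (id INTEGER PRIMARY KEY, user_id INTEGER, action TEXT);
            INSERT INTO users  VALUES (1, 'ada'), (2, 'grace');
            INSERT INTO events VALUES (1, 1, 'login'), (2, 1, 'query'), (3, 2, 'login');
        """)

        # One JOIN answers "who did what"; a denormalized store would instead
        # duplicate each user's name into all of that user's event records.
        rows = db.execute("""
            SELECT users.name, events.action
            FROM events JOIN users ON users.id = events.user_id
        """).fetchall()
        print(rows)  # [('ada', 'login'), ('ada', 'query'), ('grace', 'login')]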

    Some NoSQL databases added their own “SQL-like” query languages, like Cassandra’s CQL. But this often made the problem worse. Using an interface that is almost identical to something more common actually created more mental friction: engineers didn’t know what was supported and what wasn’t.

    SQL-like query languages are like the Star Wars Holiday Special. Accept no imitations. (And always avoid the Star Wars Holiday Special.)

    Some in the community saw the problems with NoSQL early on (e.g., DeWitt and Stonebraker in 2008). Over time, through hard-earned scars of personal experience, more and more software developers joined them.

    Initially seduced by the dark side, the software community began to see the light and come back to SQL.

    First came the SQL interfaces on top of Hadoop (and later, Spark), leading the industry to “back-cronym” NoSQL to “Not Only SQL” (yeah, nice try).

    Then came the rise of NewSQL: new scalable databases that fully embraced SQL. H-Store (published 2008) from MIT and Brown researchers was one of the first scale-out OLTP databases. Google again led the way for a geo-replicated SQL-interfaced database with their first Spanner paper (published 2012) (whose authors include the original MapReduce authors), followed by other pioneers like CockroachDB (2014).

    At the same time, the PostgreSQL community began to revive, adding critical improvements like a JSON datatype (2012), and a potpourri of new features in PostgreSQL 10: better native support for partitioning and replication, full text search support for JSON, and more (release slated for later this year). Other companies like CitusDB (2016) and yours truly (TimescaleDB, released this year) found new ways to scale PostgreSQL for specialized data workloads.

    In fact, our journey developing TimescaleDB closely mirrors the path the industry has taken. Early internal versions of TimescaleDB featured our own SQL-like query language called “ioQL.” Yes, we too were tempted by the dark side: building our own query language felt powerful. But while it seemed like the easy path, we soon realized that we’d have to do a lot more work: e.g., deciding syntax, building various connectors, educating users, etc. We also found ourselves constantly looking up the proper syntax to queries that we could already express in SQL, for a query language we had written ourselves!

    One day we realized that building our own query language made no sense. That the key was to embrace SQL. And that was one of the best design decisions we have made. Immediately a whole new world opened up. Today, even though we are just a 5-month-old database, our users can use us in production and get all kinds of wonderful things out of the box: visualization tools (Tableau), connectors to common ORMs, a variety of tooling and backup options, an abundance of tutorials and syntax explanations online, etc.

    Google has clearly been on the leading edge of data engineering and infrastructure for over a decade now. It behooves us to pay close attention to what they are doing.

    Take a look at Google’s second major Spanner paper, released just four months ago (Spanner: Becoming a SQL System, May 2017), and you’ll find that it bolsters our independent findings.

    For example, Google began building on top of Bigtable, but then found that the lack of SQL created problems (emphasis in all quotes below ours):

    “While these systems provided some of the benefits of a database system, they lacked many traditional database features that application developers often rely on. A key example is a robust query language, meaning that developers had to write complex code to process and aggregate the data in their applications. As a result, we decided to turn Spanner into a full featured SQL system, with query execution tightly integrated with the other architectural features of Spanner (such as strong consistency and global replication).”

    Later in the paper they further capture the rationale for their transition from NoSQL to SQL:

    “The original API of Spanner provided NoSQL methods for point lookups and range scans of individual and interleaved tables. While NoSQL methods provided a simple path to launching Spanner, and continue to be useful in simple retrieval scenarios, SQL has provided significant additional value in expressing more complex data access patterns and pushing computation to the data.”

    The paper also describes how the adoption of SQL doesn’t stop at Spanner, but actually extends across the rest of Google, where multiple systems today share a common SQL dialect:

    “Spanner’s SQL engine shares a common SQL dialect, called ‘Standard SQL’, with several other systems at Google including internal systems such as F1 and Dremel (among others), and external systems such as BigQuery… For users within Google, this lowers the barrier of working across the systems. A developer or data analyst who writes SQL against a Spanner database can transfer their understanding of the language to Dremel without concern over subtle differences in syntax, NULL handling, etc.”

    The success of this approach speaks for itself. Spanner is already the “source of truth” for major Google systems, including AdWords and Google Play, while “Potential Cloud customers are overwhelmingly interested in using SQL.”

    Considering that Google helped initiate the NoSQL movement in the first place, it is quite remarkable that it is embracing SQL today. (Leading some to recently wonder: “Did Google Send the Big Data Industry on a 10 Year Head Fake?”.)

    In computer networking, there is a concept called the “narrow waist.”

    This idea emerged to solve a key problem: On any given networked device, imagine a stack, with layers of hardware at the bottom and layers of software on top. There can exist a variety of networking hardware; similarly there can exist a variety of software and applications. One needs a way to ensure that no matter the hardware, the software can still connect to the network; and no matter the software, that the networking hardware knows how to handle the network requests.

    The Networking Narrow Waist (source)

    In networking, the role of the narrow waist is played by the Internet Protocol (IP), acting as a common interface between lower-level networking protocols designed for local-area networks, and higher-level application and transport protocols. (Here’s one nice explanation.) And (in a broad oversimplification), this common interface became the lingua franca for computers, enabling networks to interconnect, devices to communicate, and this “network of networks” to grow into today’s rich and varied Internet.

    We believe that SQL has become the narrow waist for data analysis.

    We live in an era where data is becoming “the world’s most valuable resource” (The Economist, May 2017). As a result, we have seen a Cambrian explosion of specialized databases (OLAP, time-series, document, graph, etc.), data processing tools (Hadoop, Spark, Flink), data buses (Kafka, RabbitMQ), etc. We also have more applications that need to rely on this data infrastructure, whether third-party data visualization tools (Tableau, Grafana, PowerBI, Superset), web frameworks (Rails, Django) or custom-built data-driven applications.

    Like networking we have a complex stack, with infrastructure on the bottom and applications on top. Typically, we end up writing a lot of glue code to make this stack work. But glue code can be brittle: it needs to be maintained and tended to.

    What we need is a common interface that allows pieces of this stack to communicate with one another. Ideally something already standardized in the industry. Something that would allow us to swap in/out various layers with minimal friction.

    That is the power of SQL. Like IP, SQL is a common interface.

    But SQL is in fact much more than IP. Because data also gets analyzed by humans. And true to the purpose that SQL’s creators initially assigned to it, SQL is readable.

    Is SQL perfect? No, but it is the language that most of us in the community know. And while there are already engineers out there working on a more natural language oriented interface, what will those systems then connect to? SQL.

    So there is another layer at the very top of the stack. And that layer is us.

    SQL is back. Not just because writing glue code to kludge together NoSQL tools is annoying. Not just because retraining workforces to learn a myriad of new languages is hard. Not just because standards can be a good thing.

    But also because the world is filled with data. It surrounds us, binds us. At first, we relied on our human senses and sensory nervous systems to process it. Now our software and hardware systems are also getting smart enough to help us. And as we collect more and more data to make better sense of our world, the complexity of our systems to store, process, analyze, and visualize that data will only continue to grow as well.

    Master Data Scientist Yoda

    Either we can live in a world of brittle systems and a million interfaces. Or we can continue to embrace SQL. And restore balance to the force.

    Like this post? Please recommend and/or share.

    And if you’d like to learn more about TimescaleDB, please check out our GitHub (stars always appreciated), and please let us know how we can help.

    Suggested reading for those who’d like to learn more about the history of databases (aka syllabus for the future TimescaleDB Intro to Databases Class):

    1. A Relational Model of Data for Large Shared Data Banks (IBM Research, 1970)
    2. SEQUEL: A Structured English Query Language (IBM Research, 1974)
    3. System R: Relational Approach to Database Management (IBM Research, 1976)
    4. MapReduce: Simplified Data Processing on Large Clusters (Google, 2004)
    5. C-Store: A Column-oriented DBMS (MIT, others, 2005)
    6. Bigtable: A Distributed Storage System for Structured Data (Google, 2006)
    7. Dynamo: Amazon’s Highly Available Key-value Store (Amazon, 2007)
    8. MapReduce: A major step backwards (DeWitt, Stonebraker, 2008)
    9. H-Store: A High-Performance, Distributed Main Memory Transaction Processing System (MIT, Brown, others, 2008)
    10. Spark: Cluster Computing with Working Sets (UC Berkeley, 2010)
    11. Spanner: Google’s Globally-Distributed Database (Google, 2012)
    12. Early History of SQL (Chamberlin, 2012)
    13. How the Internet was Born (Hines, 2015)
    14. Spanner: Becoming a SQL System (Google, 2017)

  • YC’s Essential Startup Advice

    258 craigcannon 3 hrs 98

    https://blog.ycombinator.com/ycs-essential-startup-advice/

    http://news.ycombinator.com/item?id=15331016

    A lot of the advice we give startups is tactical, meant to be helpful on a day-to-day or week-to-week basis. But some advice is more fundamental. We’ve collected here what we at YC consider the most important, most transformative advice for startups. Whether common sense or counter-intuitive, the guidance below will help most startups find their path to success.

    The first thing we always tell founders is to launch their product right away, for the simple reason that this is the only way to fully understand customers’ problems and whether the product meets their needs. Surprisingly, launching a mediocre product as soon as possible, and then talking to customers and iterating, is much better than waiting to build the “perfect” product. This is true as long as the product contains a “quantum of utility” (Do Things That Don’t Scale by Paul Graham) for customers: enough value that it overwhelms whatever warts come with it.

    Once launched, we suggest founders do things that don’t scale (Do Things That Don’t Scale by Paul Graham). Many startup advisors persuade startups to scale way too early. This will require the building of technology and processes to support that scaling, which, if premature, will be a waste of time and effort. This strategy often leads to failure and even startup death. Rather, we tell startups to get their first customer by any means necessary, even by manual work that couldn’t be managed for more than ten, much less 100 or 1000 customers. At this stage, founders are still trying to figure out what needs to be built and the best way to do that is talk directly to customers. For example, the Airbnb founders originally offered to “professionally” photograph the homes and apartments of their earliest customers in order to make their listings more attractive to renters. Then, they went and took the photographs themselves. The listings on their site improved, conversions improved, and they had amazing conversations with their customers. This was entirely unscalable, yet proved essential in learning how to build a vibrant marketplace.

    Talking to users usually yields a long, complicated list of features to build. One piece of advice that YC partner Paul Buchheit (PB) always gives in this case is to look for the “90/10 solution”. That is, look for a way in which you can accomplish 90% of what you want with only 10% of the work/effort/time. If you search hard for it, there is almost always a 90/10 solution available. Most importantly, a 90% solution to a real customer problem that is available right away is much better than a 100% solution that takes ages to build.

    As companies begin to grow there are often tons of potential distractions: conferences, dinners, meetings with venture capitalists or large company corporate development types (Don’t Talk to Corp Dev by Paul Graham), chasing after press coverage, and so on. (YC co-founder Jessica Livingston created a pretty comprehensive list of the wrong things on which to focus [How Not To Fail by Jessica Livingston].) We always remind founders not to lose sight of the fact that the most important tasks for an early stage company are to write code and talk to users. For any company, software or otherwise, this means that in order to make something people want: you must launch something, talk to your users to see if it serves their needs, and then take their feedback and iterate. These tasks should occupy almost all of your time/focus. For great companies this cycle never ends.

    Similarly, as your company evolves there will be many times when founders are forced to choose between multiple directions for their company. Sam Altman always points out that it is nearly always better to take the more ambitious path. It is actually extraordinary how often founders manage to avoid tackling these sorts of problems and focus on other things. Sam calls this “fake work”, because it tends to be more fun than real work (The Post YC Slump by Sam Altman).

    When it comes to customers, most founders don’t realize that they get to choose customers as much as customers get to choose them. We often say that a small group of customers who love you is better than a large group who kind of like you. In other words, recruiting 10 customers who have a burning problem is much better than 1000 customers who have a passing annoyance. It is easy to make mistakes when choosing your customers, so sometimes it’s also critical for startups to fire their customers. Some customers can cost way more than they provide in either revenue or learning. For example, Justin.tv/Twitch only became a breakout success when they focused their efforts toward video game broadcasters and away from people trying to stream copyrighted content (Users You Don’t Want by Michael Seibel).

    Growth is always a focus for startups, since a startup without growth is usually a failure. However, how and when to grow is often misunderstood. YC is sometimes criticised for pushing companies to grow at all costs, but in fact we push companies to talk to their users, build what they want, and iterate quickly. Growth is a natural result of doing these three things successfully. Yet, growth is not always the right choice. If you have not yet made something your customers want – in other words, found product-market fit – it makes little sense to grow (The Real Product Market Fit by Michael Seibel). Poor retention is always the result. Also, if you have an unprofitable product, growth merely drains cash from the company. As PB likes to say, it never makes sense to take 80 cents from a customer and then hand them a dollar back. The fact that unit economics really matter shouldn’t come as a surprise, but too many startups seem to forget this basic fact (Unit Economics by Sam Altman).

    Startup founders’ intuition is always to do more, whereas the best strategy is almost always to do less, really well. For example, founders are frequently tempted to chase big deals with large companies which represent amazing, company-validating relationships. However, deals between large companies and tiny startups seldom end well for the startup. They take too long, cost too much, and often fail completely. One of the hardest things about doing a startup is choosing what to do, since you will always have an infinite list of things that could be done (Startup Priorities by Geoff Ralston). It is vital that very early on a startup choose the one or two key metrics it will use to measure success; founders should then choose what to do based nearly exclusively on how the task will impact those metrics. When your early stage product isn’t working, it is often tempting to immediately build new features to solve every problem the customer seems to have, rather than talking to the customer and focusing only on their most acute problem.

    Founders often find it surprising to hear that they shouldn’t worry if their company seems badly broken. It turns out that nearly every startup has deep, fundamental issues, even those that will end up being billion dollar companies. Success is not determined by whether you are broken at the beginning, but rather what the founders do about the inevitable problems. Your job as a founder will often seem to be continuously righting a capsized ship. This is normal.

    It is very difficult as a new startup founder not to obsess about competition, actual and potential. It turns out that spending any time worrying about your competitors is nearly always a very bad idea. We like to say that startup companies always die of suicide, not murder. There will come a time when competitive dynamics are intensely important to the success or failure of your company, but it is highly unlikely to be true in the first year or two.

    A few words on fundraising (A Guide to Seed Fundraising by Geoff Ralston). The first, best bit of advice is to raise money as quickly as possible and then get back to work. You can often tell when a company is fundraising just by looking at its growth curve: when it flattens out, they are raising money. Equally important is to understand that valuation is not equal to success or even probability of success (Fundraising Rounds are not Milestones by Michael Seibel). Some of Y Combinator’s very best companies raised on tiny initial valuations (Airbnb, Dropbox, and Twitch are all good examples). By the way, it is vital to remember that the money you raise IS NOT your money. You have a fiduciary and ethical/moral duty to spend the money only to improve the prospects of your company.

    It is also important to stay sane during the inevitable craziness of startup life. So we always tell founders to make sure they take breaks, spend time with friends and family, and get enough sleep and exercise in between bouts of extraordinarily intense, focused work. Lastly, a brief word on failure. It turns out most companies fail fast because founders fall out. The relationships with your cofounders matter more than you think, and open, honest communication between founders makes future debacles much less likely. In fact, one of the best things you can do to make your startup successful – indeed, to be successful in life – is to simply be nice (Mean People Fail by Paul Graham).

    The Pocket Guide of Essential YC Advice

    • Launch now
    • Build something people want
    • Do things that don’t scale
    • Find the 90/10 solution
    • Find 10-100 customers who love your product
    • All startups are badly broken at some point
    • Write code – talk to users
    • “It’s not your money”
    • Growth is the result of a great product, not the precursor
    • Don’t scale your team/product until you have built something people want
    • Valuation is not equal to success or even probability of success
    • Avoid long negotiated deals with big customers if you can
    • Avoid big company corporate development queries – they will only waste time
    • Avoid conferences unless they are the best way to get customers
    • Pre-product market fit – do things that don’t scale: remain small/nimble
    • Startups can only solve one problem well at any given time
    • Founder relationships matter more than you think
    • Sometimes you need to fire your customers (they might be killing you)
    • Ignore your competitors; you will more likely die of suicide than murder
    • Most companies don’t die because they run out of money
    • Be nice! Or at least don’t be a jerk
    • Get sleep and exercise – take care of yourself

    References

    Do Things That Don’t Scale by Paul Graham
    Don’t Talk to Corp Dev by Paul Graham
    How Not To Fail by Jessica Livingston
    The Post YC Slump by Sam Altman
    Users You Don’t Want by Michael Seibel
    The Real Product Market Fit by Michael Seibel
    Unit Economics by Sam Altman
    Startup Priorities by Geoff Ralston
    A Guide to Seed Fundraising by Geoff Ralston
    Fundraising Rounds are not Milestones by Michael Seibel
    Mean People Fail by Paul Graham

    Recommended Reading

    1. A Fundraising Survival Guide by Paul Graham
    2. How to Raise Money by Paul Graham
    3. Taking Advice by Aaron Harris
  • PixelNN – Example-Based Image Synthesis

    393 pentestercrab 10 hrs 118

    http://www.cs.cmu.edu/~aayushb/pixelNN/

    http://news.ycombinator.com/item?id=15328356

    We present a simple nearest-neighbor (NN) approach that synthesizes high-frequency photorealistic images from an "incomplete" signal such as a low-resolution image, a surface normal map, or edges. Current state-of-the-art deep generative models designed for such conditional image synthesis lack two important things: (1) they are unable to generate a large set of diverse outputs, due to the mode collapse problem; and (2) they are not interpretable, making it difficult to control the synthesized output. We demonstrate that NN approaches potentially address such limitations, but suffer in accuracy on small datasets. We design a simple pipeline that combines the best of both worlds: the first stage uses a convolutional neural network (CNN) to map the input to an (overly-smoothed) image, and the second stage uses a pixel-wise nearest-neighbor method to map the smoothed output to multiple high-quality, high-frequency outputs in a controllable manner. We demonstrate our approach for various input modalities, and for various domains ranging from human faces to cats-and-dogs to shoes and handbags.
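    As a rough illustration of the second stage (a sketch, not the authors' code), pixel-wise nearest neighbors can be written in a few lines of NumPy: for every pixel of the smoothed CNN output, find the closest pixel by feature distance among exemplar pixels and copy that exemplar's high-frequency value.

        import numpy as np

        def pixelwise_nn(smoothed, exemplar_feats, exemplar_pixels):
            """smoothed: (H, W, D) per-pixel features of the smoothed CNN output.
            exemplar_feats: (N, D) features of candidate exemplar pixels.
            exemplar_pixels: (N, 3) RGB values of those exemplar pixels."""
            H, W, D = smoothed.shape
            flat = smoothed.reshape(-1, D)                          # (H*W, D)
            # Squared Euclidean distance from each output pixel to each exemplar.
            d2 = ((flat[:, None, :] - exemplar_feats[None, :, :]) ** 2).sum(-1)
            nearest = d2.argmin(axis=1)                             # (H*W,)
            # Copy the matched exemplars' high-frequency pixels into the result.
            return exemplar_pixels[nearest].reshape(H, W, 3)

        # Toy usage with random data, just to show the shapes involved.
        out = pixelwise_nn(np.random.rand(8, 8, 4),
                           np.random.rand(100, 4),
                           np.random.rand(100, 3))
        print(out.shape)  # (8, 8, 3)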

    Paper

    PixelNN: Example-based Image Synthesis. A. Bansal, Y. Sheikh, and D. Ramanan

    arXiv | bibtex

    Comparison with Pix-to-Pix

    Multiple Outputs

    Edges-to-Faces

    Normals-to-Faces

    Edges-to-Cats-&-Dogs

    Normals-to-Cats-&-Dogs

    Frequency Analysis

    We performed frequency analysis via FFT to understand the frequency content in the output images.

    A. Bansal, B. Russell, and A. Gupta. Marr Revisited: 2D-3D Model Alignment via Surface Normal Prediction. In CVPR, 2016

    A. Bansal, X. Chen, B. Russell, A. Gupta, and D. Ramanan. PixelNet: Representation of the pixels, by the pixels, and for the pixels. In arXiv, 2017

    Comments, questions to Aayush Bansal.

  • How Web Pages Can Extend (or Drain) Mobile Device Battery Life

    Dr. Angela Nicoara on mobile browser energy consumption and ways developers can minimize energy use through design.

    by Jenn Webb | May 23, 2013

    According to recent Global Mobile Data Traffic Forecasts (PDF), the number of mobile-connected devices will surpass the world’s population this year, and by 2015, there will be 788 million mobile-only Internet users. A recent paper, “Who Killed My Battery: Analyzing Mobile Browser Energy Consumption” (PDF), pulled together by the Deutsche Telekom Innovation Center in Silicon Valley and Stanford University researchers and published in the ACM 21st International World Wide Web Conference (WWW 2012) proceedings (PDF), takes a look at the growing popularity of mobile web browsing and its effects on energy consumption.

    I reached out to Dr. Angela Nicoara, senior research scientist at the Deutsche Telekom Innovation Center in Silicon Valley who worked on the project, to find out why mobile browser energy consumption is a growing concern and what developers need to know going forward. Our interview follows. Dr. Nicoara will present the researchers’ findings in the “Who Killed My Battery: Analyzing Mobile Browser Energy Consumption” session at the Fluent 2013 conference next week in San Francisco, CA.

    Why is browser energy consumption becoming more of an issue with the growth of smartphones and mobile browsing?

    Dr. Angela Nicoara: Despite the explosive growth of smartphones and the growing popularity of mobile web browsing, their utility has been and will remain severely limited by battery life. Smartphones’ energy constraints are here to stay, and as such, optimizing the energy consumption of the phone browser while surfing the Web is of critical importance today and will remain so for the foreseeable future.

    Our research, “Who Killed My Battery: Analyzing Mobile Browser Energy Consumption,” has focused on solving two of the most important and difficult problems pertaining to energy consumption on smartphones: developing an infrastructure for measuring the precise energy used by a mobile browser to render web pages and developing techniques to offload browser-heavy computations to the cloud.

    A fundamental challenge arises from the power inefficiency of mobile web browsers at popular websites (e.g., financial, e-commerce, email, blogging, and news sites) and from how much energy is consumed to render a particular web page. Our work is the first of its kind to show how the structure of web pages can impact battery usage in mobile web browsers. Our research in this area has influenced, and will continue to influence, the computing industry through the design and implementation of an infrastructure for measuring the precise energy used by a mobile browser to render web pages.


    What tools and methods are used to measure mobile browser energy consumption?

    Dr. Nicoara: We developed novel techniques and tools to precisely measure the energy needed to render individual web elements, such as images, JavaScript, cascading style sheets (CSS), and plug-in objects, and we designed a system that has the potential to dramatically reduce the energy consumption of smartphones.

    Our results show that for popular websites, downloading and parsing CSS and JavaScript consumes a significant fraction of the total energy needed to render the web page. We also show that by redesigning websites, the energy needed to render web pages can be reduced substantially. Another fundamental challenge stems from estimating the point at which offloading browser computations to a remote server can save energy on the phone. Given the smartphone’s limited energy, there is a strong desire to minimize its work, which can be helped by performing expensive browser computations off the phone. We explored the possibilities that arise when offloading heavy computations to a server cloud to save energy.

    While researchers often assume that the energy consumed by a mobile operation can be measured through a high-level API that reports the battery level, we pioneered another approach: obtaining very precise, fine-grained measurements of the energy a mobile browser uses to render web pages by hooking an external high-precision digital power multimeter to an open mobile phone’s battery. Given the inadequate nature of the existing tools, we then advanced our research in this field, aiming to accurately model the power draw of the Android mobile platform.

    How can developers put this information to use?

    Dr. Nicoara: We measured the energy needed to render financial, e-commerce, email, blogging, news, and social networking sites. The tools are sufficiently precise to measure the energy needed to render individual web elements, such as cascading style sheets (CSS), JavaScript, images, and plug-in objects. Using the collected data, we make concrete recommendations on how to design web pages so as to minimize the energy needed to render the page.

    Our research findings can help developers overcome the resource limitations of smartphones, one of the biggest challenges faced by today’s mobile industry. They allow developers to design energy-efficient websites by following concrete guidelines and recommendations. The improved battery life these techniques make possible dramatically enhances the usability of mobile devices and benefits consumers’ daily lives.

  • Questions

    I host production Flask apps under uWSGI. It is said that Flask-SocketIO doesn't play well with uWSGI. What are good alternatives to both?

    Miguel Grinberg says that uWSGI is not a good choice for an application server for apps that incorporate Flask-SocketIO.

    My current stack includes nginx, uwsgi, and Flask.

    What would be a good long-term, well-supported alternative to either uWSGI or Flask-SocketIO for this situation?

    6 Comments

    u/miguelgrinberg • Jul 26, 2016, 9:37 PM The problem with uwsgi is that (a) it does not support eventlet; (b) it supports gevent, but with its own async loop that is incompatible with gevent's own loop; and (c) it supports websocket, but with its own implementation, incompatible with gevent and eventlet. So there are really a lot of cons to this approach. In the same way that I have code supporting eventlet and gevent, I intend at some point to write another batch of code for uwsgi (so basically you would say async_mode='uwsgi' or something like that).

    Something that a lot of people don't realize is that while app.run() is a development server that is almost never a good choice for running your application in production, socketio.run() is in fact a production-ready web server, as long as you use it alongside eventlet or gevent, and with app.debug=False.

    So you could drop uwsgi and run a Flask+Flask-SocketIO server using eventlet or gevent, and you would have a production level stack. And if you need to scale, run more than one, and let nginx load balance your http and websocket.
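    For example, a minimal sketch of the stack Miguel describes might look like this (the event name, host, and port are placeholders, not from the thread):

        # A minimal Flask + Flask-SocketIO server backed by eventlet.
        # Requires: pip install flask flask-socketio eventlet
        from flask import Flask
        from flask_socketio import SocketIO

        app = Flask(__name__)
        app.debug = False
        socketio = SocketIO(app)  # auto-selects eventlet when it is installed

        @socketio.on('message')   # placeholder event handler
        def handle_message(data):
            print('received:', data)

        if __name__ == '__main__':
            # Production-level when backed by eventlet; put nginx in front for
            # TLS and for load balancing across several such processes.
            socketio.run(app, host='127.0.0.1', port=5000)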

    I have nothing against gunicorn, by the way, that is also a good choice.

    u/APIglue • Jul 27, 2016, 3:07 AM Is there a noticeable difference in this setup between 2.7/3.3/3.4?

    u/miguelgrinberg • Jul 27, 2016, 9:19 AM If you are going to use Py3 or want to leave the door open to switch in the future, I recommend eventlet over gevent. Both packages have been ported to Py3, but gevent does not come with native websocket support, and the gevent-websocket package does not run on Py3 yet. Also eventlet has been running on Py3 for longer than gevent.

    In terms of performance, I have found eventlet to be marginally faster than gevent+gevent-websocket, both in Py2 and Py3.

    So basically, my recommendation is to go with eventlet, unless you have a good reason to use gevent.

    u/brian15co • Jul 28, 2016, 4:21 PM I was pointing to app.run() from within my uWSGI config and managing the uWSGI emperor from supervisord and serving the uWSGI app through nginx.

    Now I am running gunicorn with supervisord and trying to point nginx to a socket that gunicorn is running through. Here I am having trouble creating the websocket (details on Stackoverflow)

    Would that be a suitable production stack?

    The app object comes from a factory in webapp/__init__.py like:

        from flask import Flask
        from flask_socketio import SocketIO

        socketio = SocketIO(logger=True)

        def create_app():
            app = Flask(__name__)
            socketio.init_app(app)
            ...
            return app

    and in deploy_app.py:

        if __name__ == '__main__':
            socketio.run(myapp, debug=False)

    If I configure nginx the way it is in the Flask-SocketIO documentation and just run (env)$ python deploy_app.py then it works. But I was under the impression that this was not as production-ideal as the setup I previously mentioned.

    u/miguelgrinberg • Jul 28, 2016, 5:16 PM The problem is that you are running multiple workers on gunicorn. This is not a configuration that is currently supported, due to the very limited load balancer in gunicorn that does not support sticky sessions. Documentation reference: https://flask-socketio.readthedocs.io/en/latest/#gunicorn-web-server.

    Instead, run several gunicorn instances, each with one worker, and then set up nginx to do the load balancing, using the ip_hash method so that sessions are sticky.

    Also, in case you are not aware, if you run multiple servers you need to also run a message queue, so that the processes can coordinate. This is also covered in the documentation link above.
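    For reference, the application side of that multi-process setup can be sketched like this (the Redis URL and port are placeholders; see the documentation link above for the supported brokers):

        # Sketch: several of these processes run behind nginx (ip_hash for sticky
        # sessions); the message queue lets the processes coordinate broadcasts.
        from flask import Flask
        from flask_socketio import SocketIO

        app = Flask(__name__)
        socketio = SocketIO(app, message_queue='redis://127.0.0.1:6379/0')

        if __name__ == '__main__':
            # Start one instance per port (5000, 5001, ...); nginx balances them.
            socketio.run(app, port=5000)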

    u/Celeodor • Jul 26, 2016, 1:22 PM Swap uwsgi for gunicorn and use an eventlet worker or use the built-in socketio.run.


  • Relicensing React, Jest, Flow, and Immutable.js

    1183 dwwoelfel 5 hrs 252

    https://code.facebook.com/posts/300798627056246

    http://news.ycombinator.com/item?id=15316175

    Next week, we are going to relicense our open source projects React, Jest, Flow, and Immutable.js under the MIT license. We're relicensing these projects because React is the foundation of a broad ecosystem of open source software for the web, and we don't want to hold back forward progress for nontechnical reasons.

    This decision comes after several weeks of disappointment and uncertainty for our community. Although we still believe our BSD + Patents license provides some benefits to users of our projects, we acknowledge that we failed to decisively convince this community.

    In the wake of uncertainty about our license, we know that many teams went through the process of selecting an alternative library to React. We're sorry for the churn. We don't expect to win these teams back by making this change, but we do want to leave the door open. Friendly cooperation and competition in this space pushes us all forward, and we want to participate fully.

    This shift naturally raises questions about the rest of Facebook's open source projects. Many of our popular projects will keep the BSD + Patents license for now. We're evaluating those projects' licenses too, but each project is different and alternative licensing options will depend on a variety of factors.

    We'll include the license updates with React 16's release next week. We've been working on React 16 for over a year, and we've completely rewritten its internals in order to unlock powerful features that will benefit everyone building user interfaces at scale. We'll share more soon about how we rewrote React, and we hope that our work will inspire developers everywhere, whether they use React or not. We're looking forward to putting this license discussion behind us and getting back to what we care about most: shipping great products.