Posts by Infoman

2017/09/27
  • Yahoo open sources its search engine Vespa

    https://www.oath.com/press/open-sourcing-vespa-yahoo-s-big-data-processing-and-serving-eng/

    http://news.ycombinator.com/item?id=15345483

    By Jon Bratseth, Distinguished Architect, Vespa

    Ever since we open sourced Hadoop in 2006, Yahoo – and now, Oath – has been committed to opening up its big data infrastructure to the larger developer community. Today, we are taking another major step in this direction by making Vespa, Yahoo's big data processing and serving engine, available as open source on GitHub.

    [Figure: Vespa architecture overview]

    Building applications increasingly means dealing with huge amounts of data. While developers can use the Hadoop stack to store and batch-process big data, and Storm to stream-process data, these technologies do not help with serving results to end users. Serving is challenging at large scale, especially when computations over the data must finish quickly while a user is waiting, as in applications that feature search, recommendation, and personalization.

    By releasing Vespa, we are making it easy for anyone to build applications that compute responses to user requests over large datasets, in real time and at internet scale: capabilities that, until now, have been within reach of only a few large companies.

    Serving often involves more than looking up items by ID or computing a few numbers from a model. Many applications need to compute over large datasets at serving time. Two well-known examples are search and recommendation. To deliver a search result or a list of recommended articles to a user, you need to find all the items matching the query, determine how good each item is for the particular request using a relevance/recommendation model, organize the matches to remove duplicates, add navigation aids, and then return a response to the user. Because these computations depend on features of the request, such as the user's query or interests, the result cannot be computed up front. It must be computed at serving time, and since a user is waiting, it has to be fast. Combining speedy completion of these operations with the ability to perform them over large amounts of data requires a lot of infrastructure: distributed algorithms, data distribution and management, efficient data structures and memory management, and more. This is what Vespa provides in a neatly packaged, easy-to-use engine.
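    The request-time steps above (match, rank, organize, respond) can be sketched in a few lines of Python. This is a toy, single-machine illustration under stated assumptions, not Vespa's API: `Item`, `serve`, the `dedup_key` field, and the scoring rule are all hypothetical. What it shows is that the ranking uses a per-request feature (`user_interest`), so the final list cannot be precomputed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    doc_id: str
    terms: frozenset[str]  # indexed terms of the document
    dedup_key: str         # hypothetical: items sharing this key count as duplicates
    quality: float         # a static feature computed ahead of time


def serve(items: list[Item], query: frozenset[str],
          user_interest: dict[str, float], k: int = 10) -> list[str]:
    """Toy request-time pipeline: match, rank, organize, respond."""
    # 1. Match: keep items that contain every query term.
    matches = [it for it in items if query <= it.terms]

    # 2. Rank: a hand-written model mixing a static feature with a
    #    request-dependent one, so scores differ from user to user.
    def score(it: Item) -> float:
        return it.quality + user_interest.get(it.doc_id, 0.0)

    ranked = sorted(matches, key=score, reverse=True)

    # 3. Organize: keep only the best-scoring item per dedup_key.
    seen: set[str] = set()
    unique = []
    for it in ranked:
        if it.dedup_key not in seen:
            seen.add(it.dedup_key)
            unique.append(it)

    # 4. Respond with the top k.
    return [it.doc_id for it in unique[:k]]


docs = [
    Item("a", frozenset({"vespa", "search"}), "url1", 0.5),
    Item("b", frozenset({"vespa", "search", "engine"}), "url1", 0.9),
    Item("c", frozenset({"vespa"}), "url2", 1.0),
    Item("d", frozenset({"vespa", "search"}), "url3", 0.1),
]
print(serve(docs, frozenset({"vespa", "search"}), {"d": 0.7}))  # ['b', 'd']
```

    Changing only the request (for example, `{"a": 1.0}` as the interest profile) changes the answer to `['a', 'd']`, which is exactly why this work has to happen while the user waits.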

    Serving an audience of over 1 billion users, we currently use Vespa across many different Oath brands, including Yahoo.com, Yahoo News, Yahoo Sports, Yahoo Finance, Yahoo Gemini, Flickr, and others, to process and serve billions of daily requests over billions of documents: responding to search queries, making recommendations, and providing personalized content and advertisements, to name just a few use cases. In fact, Vespa processes and serves content and ads almost 90,000 times every second with latencies in the tens of milliseconds. On Flickr alone, Vespa performs keyword and image searches at a few hundred queries per second over tens of billions of images. Additionally, Vespa contributes directly to our company's revenue by serving over 3 billion native ad requests per day via Yahoo Gemini, at a peak of 140k requests per second (per Oath internal data).

    With Vespa, our teams build applications that:

    • Select content items using SQL-like queries and text search
    • Organize all matches to generate data-driven pages
    • Rank matches by handwritten or machine-learned relevance models
    • Serve results with response times in the low milliseconds
    • Write data in real time, thousands of times per second per node
    • Grow, shrink, and reconfigure clusters while serving and writing data

    To achieve both speed and scale, Vespa distributes data and computation over many machines without any single master as a bottleneck. Where conventional applications work by pulling data into a stateless tier for processing, Vespa instead pushes computations to the data. This involves managing clusters of nodes with background redistribution of data in case of machine failures or the addition of new capacity, implementing distributed low-latency query and processing algorithms, handling distributed data consistency, and a lot more. It's a ton of hard work!
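    The idea of pushing computation to the data can be illustrated with a generic scatter-gather sketch: each content node scores only its own documents and returns its local top k, and a stateless tier merges these small partial lists. This is a hypothetical in-process simulation (threads stand in for nodes; `local_top_k`, `scatter_gather`, and the `boost` parameter are illustrative names), not Vespa's actual protocol.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

# A hypothetical shard: the (doc_id, feature) pairs stored on one content node.
Shard = list[tuple[str, float]]


def local_top_k(shard: Shard, boost: float, k: int) -> list[tuple[float, str]]:
    """Runs next to the data: score every local document, return only the best k."""
    return heapq.nlargest(k, ((feature * boost, doc_id) for doc_id, feature in shard))


def scatter_gather(shards: list[Shard], boost: float, k: int) -> list[str]:
    """Stateless tier: fan the query out to every node, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=max(1, len(shards))) as pool:
        partials = list(pool.map(lambda shard: local_top_k(shard, boost, k), shards))
    # The global top k is guaranteed to be among the union of the local top-k lists.
    merged = heapq.nlargest(k, (hit for part in partials for hit in part))
    return [doc_id for _, doc_id in merged]


shards = [
    [("a", 1.0), ("b", 5.0), ("c", 3.0)],
    [("d", 4.0), ("e", 2.0)],
    [("f", 6.0)],
]
print(scatter_gather(shards, boost=1.0, k=3))  # ['f', 'b', 'd']
```

    Because the global top k must be among the union of every node's local top k, only k hits per node travel back to the stateless tier, however large each shard is.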

    As the team behind Vespa, we have been developing search and serving capabilities ever since building alltheweb.com, which was later acquired by Yahoo. Over the last couple of years we have rewritten most of the engine from scratch to incorporate our experience into a modern technology stack. Vespa is larger in scope and lines of code than any open source project we've ever released. Now that it has been battle-proven on Yahoo's largest and most critical systems, we are pleased to release it to the world.

    Vespa gives application developers the ability to feed data and models of any size to the serving system and make the final computations at request time. Compared with pre-computing answers to requests, this often produces a better user experience at lower cost (for buying and running hardware) and with less complexity. Furthermore, it allows developers to work more interactively, navigating and exploring complex calculations in real time rather than starting offline jobs and checking the results later.

    Vespa can be run on premises or in the cloud. We provide both Docker images and rpm packages for Vespa, as well as guides for running them on your own laptop or as an AWS cluster.

    We'll follow up this initial announcement with a series of posts on our blog showing how to build a real-world application with Vespa, but you can get started right now by following the getting started guide in our comprehensive documentation.

    Managing distributed systems is not easy. We have worked hard to make it easy to develop and operate applications on Vespa so that you can focus on creating features that make use of the ability to compute over large datasets in real time, rather than the details of managing clusters and data. You should be able to get an application up and running in less than ten minutes by following the documentation.

    We can't wait to see what you build with it!

  • 2018: The Starting Point of Hong Kong's Rate-Hike Cycle

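    The US Federal Reserve has just announced that it will start shrinking its balance sheet in October, signalling that the liquidity pumped into markets over the past nine years will gradually be withdrawn and that global markets will return to the normal conditions that prevailed before the 2008 global financial crisis. Hong Kong officials could hardly wait: they have repeatedly warned of a wave of capital flight that would send HKD interest rates soaring and hurt businesses and livelihoods, and the Exchange Fund has gone further, pre-emptively issuing additional bills to drain funds once again and pushing up overnight and short-term interbank rates. However, the Hong Kong dollar operates under a currency board linked exchange rate system, so HKD and USD interest rates should move in step; moreover, hot money usually moves first. The argument that capital flight will drive rates up is therefore both hard to follow and confusing.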

    Understanding hot-money flows

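    To understand why hot money is leaving, one must first understand why it came in. When the global financial crisis broke out in 2008, depositors and lenders each held back to guard against losses, and the money markets seized up. The Federal Reserve led the way with quantitative easing, repeatedly buying financial debt to restart market liquidity and restore normal trading.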

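    Quantitative easing in effect replaces the original debt credit with central-bank credit, untying the knot of circular debts (chains of unpaid mutual obligations) and freeing up the market. The Fed's assets and liabilities grow together: the assets are the debt it buys, and the liabilities are banks' clearing balances, which count as reserves. As bank reserves rise, lending and deposits expand in a cycle and banks' assets and liabilities also grow together, injecting fresh blood into the economy and restoring its vitality.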
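    The double-entry effect of quantitative easing described above, and its reversal under the balance-sheet reduction the Fed has announced, can be shown with a deliberately minimal central-bank balance sheet. The class and method names are hypothetical, and the model is a simplification that ignores every item except the two the paragraph mentions (purchased debt and bank reserves).

```python
from dataclasses import dataclass


@dataclass
class CentralBank:
    securities: float = 0.0     # asset: debt purchased in the market
    bank_reserves: float = 0.0  # liability: banks' clearing balances at the central bank

    def quantitative_easing(self, amount: float) -> None:
        # The purchase is paid for by crediting the selling banks' reserve
        # accounts, so assets and liabilities rise by the same amount.
        self.securities += amount
        self.bank_reserves += amount

    def shrink_balance_sheet(self, amount: float) -> None:
        # Simplified balance-sheet reduction: maturing debt is not replaced,
        # the repayment is debited from bank reserves, and both sides shrink.
        if amount > self.securities:
            raise ValueError("cannot run off more debt than is held")
        self.securities -= amount
        self.bank_reserves -= amount


fed = CentralBank()
fed.quantitative_easing(100.0)
fed.shrink_balance_sheet(30.0)
print(fed)  # CentralBank(securities=70.0, bank_reserves=70.0)
```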

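    At the same time, the Fed cut interest rates to zero in support, lightening the debt burden on government and private borrowers alike and speeding up the untangling. Over the years the market has absorbed the new regime of easy money and zero rates; now that the Fed has confirmed the end of its easing measures and started shrinking its balance sheet alongside rate normalization, the market must adjust again. With rates already on the rise, balance-sheet reduction was expected, but putting things right cannot be done in one stroke.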

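    Under quantitative easing, US dollars flooded the market in search of speculative and investment returns, and the Hong Kong dollar bore the brunt for several reasons. First, the HKD is pegged to the USD, so there is almost no exchange-rate risk. Second, HKD interest rates followed USD rates down, pushing up the prices of real and financial assets, which amounts to asset inflation. Third, after the reform of the renminbi exchange-rate regime, the RMB appreciated year after year and Hong Kong's offshore market boomed, attracting foreign capital to use the HKD as a route for speculation and rent-seeking. By official estimates, hot-money inflows over the 15 months from the fourth quarter of 2008 to the end of 2009 totalled the equivalent of HK$640 billion.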

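    As hot money flows in, Hong Kong banks' overseas US-dollar deposits (assets) and customers' HKD deposits (liabilities) rise together, while the ratio of clearing balances to demand deposits (current and savings accounts) falls accordingly, so banks must buy HKD from the Exchange Fund to replenish their clearing balances.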

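    Under the law of supply and demand, the HKD exchange rate rises above the official rate until it eventually hits the Exchange Fund's strong-side guarantee, and the HKD-USD interest-rate spread widens so that the fixed exchange rate holds.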
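    The inflow mechanics in the two paragraphs above can be traced on a single, hypothetical bank's balance sheet. Assumptions: one bank, the investor's conversion happens at the strong-side rate of 7.75, and the bank restores its clearing balance to a target ratio by selling US dollars to the Exchange Fund. `HKBank`, `hot_money_inflow`, and `replenish` are illustrative names, not an HKMA model, and the 1% target ratio is an assumed pre-crisis level.

```python
from dataclasses import dataclass

STRONG_SIDE_RATE = 7.75  # HKD per USD at the Exchange Fund's strong-side guarantee


@dataclass
class HKBank:
    overseas_usd: float      # asset: US-dollar deposits held abroad, in USD
    clearing_balance: float  # asset: HKD clearing balance with the Exchange Fund
    demand_deposits: float   # liability: customers' HKD current and savings deposits

    def clearing_ratio(self) -> float:
        return self.clearing_balance / self.demand_deposits


def hot_money_inflow(bank: HKBank, usd: float, rate: float = STRONG_SIDE_RATE) -> None:
    # The investor's USD lands in the bank's overseas account (asset up) and is
    # credited to the investor as an HKD deposit (liability up); the clearing
    # balance is untouched, so the clearing ratio falls.
    bank.overseas_usd += usd
    bank.demand_deposits += usd * rate


def replenish(bank: HKBank, target_ratio: float, rate: float = STRONG_SIDE_RATE) -> float:
    """Sell just enough USD to the Exchange Fund for HKD to restore target_ratio.
    Returns the amount of USD sold."""
    shortfall_hkd = target_ratio * bank.demand_deposits - bank.clearing_balance
    if shortfall_hkd <= 0:
        return 0.0
    usd_sold = shortfall_hkd / rate
    bank.overseas_usd -= usd_sold
    bank.clearing_balance += shortfall_hkd
    return usd_sold


bank = HKBank(overseas_usd=100.0, clearing_balance=7.75, demand_deposits=775.0)
hot_money_inflow(bank, usd=100.0)
print(round(bank.clearing_ratio(), 4))               # 0.005, halved by the inflow
print(round(replenish(bank, target_ratio=0.01), 4))  # 1.0 (USD sold for HKD)
```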

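    The records show that after quantitative easing began in October 2008, the HKD exchange rate rose from the official rate of 7.8 to 7.52; the banks' clearing ratio jumped to 10%, ten times its previous level; overnight and one-week HIBOR fell below 1%, and below 0.5% by year-end; one-month HIBOR also fell below 0.5% by year-end; and banks' posted deposit and lending rates were cut accordingly.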

    A distorted HKD interest-rate structure

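    In hindsight, the hot-money inflow had three features: first, the HKD exchange rate suddenly rose to the strong-side guarantee (7.75); second, liquidity in the HKD interbank market surged and the banks' clearing balance ratio jumped; third, HKD interest rates fell sharply, in step with the base rate (the discount window rate), which is the US federal funds rate (the US interbank rate) target plus 0.5 percentage point.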

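    When the Fed ends quantitative easing and begins withdrawing funds, the outflow of hot money likewise has three features: first, the HKD weakens back to the official rate (7.8); second, liquidity in the HKD interbank market shrinks and the banks' clearing balance ratio falls back to a normal level; third, HKD interest rates rise back in step with the base rate. Comparing these with recent market conditions, two of the features have already appeared: the HKD has fallen back to the official rate, and the clearing balance ratio has dropped back to 6%.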
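    The three outflow features can be written down as a checklist. In this minimal sketch the base-rate formula follows the article's description (fed funds target plus 0.5 percentage point), while `outflow_signals`, the 0.25-point tolerance for "in step", the "normal" ratio threshold, and the sample readings are all hypothetical. A checklist like this is a starting point for cross-checking, not a verdict on its own.

```python
OFFICIAL_RATE = 7.80  # HKD per USD, the linked rate


def base_rate(fed_funds_target_pct: float) -> float:
    # As described in the article: the US fed funds target plus 0.5 percentage point.
    return fed_funds_target_pct + 0.5


def outflow_signals(spot: float, clearing_ratio_pct: float, hibor_pct: float,
                    fed_funds_target_pct: float, normal_ratio_pct: float,
                    tolerance_pct: float = 0.25) -> dict[str, bool]:
    """Check the three outflow features; the thresholds are illustrative assumptions."""
    return {
        "hkd_back_at_official_rate": spot >= OFFICIAL_RATE,
        "clearing_ratio_back_to_normal": clearing_ratio_pct <= normal_ratio_pct,
        "hibor_in_step_with_base_rate":
            abs(hibor_pct - base_rate(fed_funds_target_pct)) <= tolerance_pct,
    }


# Hypothetical readings: HKD at the official rate, clearing ratio back to 6%,
# HIBOR still well below a 1.5% base rate, so two of the three checks pass.
print(outflow_signals(spot=7.80, clearing_ratio_pct=6.0, hibor_pct=0.5,
                      fed_funds_target_pct=1.0, normal_ratio_pct=6.0))
```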

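    After the 2008 global financial crisis subsided, regulators worldwide tightened supervision to deal with its aftermath and raised banks' reserve and liquid-asset standards, so the clearing balance ratio can hardly return to its former 1% level; 5% may be the new normal. Hot money is "smart money" and usually moves first; with the Fed's balance-sheet reduction a foregone conclusion, it is no surprise that it pulled out early, ahead of everyone else. Yet HKD deposit and lending rates have not normalized, which raises the puzzling question of whether hot money has actually left. If it has not, why is the HKD weak, and why have clearing balances fallen?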

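    In fact, the HKD interest-rate structure has been distorted for the past nine years. The most obvious sign is that the best lending rate (BLR) has never moved with the base rate, while the savings rate has been close to zero, barely there at all. The BLR has traditionally moved in step with the savings rate, because savings have always been the most stable source of retail funding and retail credit is priced off the BLR. The HKD interest-rate agreement was fully abolished in 2001, but in practice the two rates remain closely linked.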

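    Over the 30 years from 1971 to 2001, the BLR averaged about 2.5 times the savings rate. Before the global crisis the BLR was 5%, so the savings rate should have been 2%, while the base rate was 3.5%. Today the base rate is 1.5%; using the same ratio as a benchmark, the savings rate should be about 0.85% (= 1.5 × 2/3.5 ≈ 0.857) and the BLR around 2.125%. Mortgage rates confirm this: whether quoted as "HIBOR plus" or "BLR minus", they are a little over 2%, consistent with the estimate. Setting aside the interest-rate distortion and its after-effects, HKD interbank rates are in fact gradually normalizing, and the argument that HKD rates are still waiting to rise generalizes from a partial picture. All three features of capital withdrawal should therefore be in place, and the reasonable inference is that speculative hot money saw the change coming and has already taken its profits and left.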
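    The estimate in the paragraph above can be reproduced step by step. All inputs are the figures quoted there, in percent per annum; the variable names are illustrative.

```python
# Benchmarks quoted in the article (all in % per annum).
BLR_TO_SAVINGS = 2.5        # 1971-2001 average ratio of BLR to savings rate
PRE_CRISIS_BLR = 5.0
PRE_CRISIS_BASE_RATE = 3.5
CURRENT_BASE_RATE = 1.5

pre_crisis_savings = PRE_CRISIS_BLR / BLR_TO_SAVINGS         # 2.0
savings_to_base = pre_crisis_savings / PRE_CRISIS_BASE_RATE  # about 0.571
implied_savings = CURRENT_BASE_RATE * savings_to_base        # about 0.857
implied_blr = implied_savings * BLR_TO_SAVINGS               # about 2.143

print(f"implied savings rate: {implied_savings:.3f}%")  # 0.857%
print(f"implied BLR:          {implied_blr:.3f}%")      # 2.143%
```

    Unrounded, the implied BLR is about 2.14%; the article's 2.125% comes from rounding the savings rate down to 0.85% before multiplying by 2.5. The 2.5 ratio actually cancels out of the BLR estimate (1.5 × 5/3.5), so either way the result is a little over 2%, in line with the mortgage-rate check.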

    Draining funds with bills: a change of form, not substance

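    However, the analysis above still falls short. All capital flowing in or out, whether for investment or speculation, ultimately settles in the "monetary" accounts on the asset side of banks' balance sheets: clearing balances, Exchange Fund Bills, and note-issue reserves. The net change in their total therefore shows the direction of capital flows, while sudden short-term changes better reflect hot money coming and going. Banking statistics do not separate onshore from offshore business: converting an HKD deposit into a foreign-currency deposit moves it from the onshore book to the offshore book, which in theory is a capital outflow (and the reverse a capital return), yet in practice no funds move in or out.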

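    If offshore foreign-currency deposits swing widely, they may affect clearing balances and create a false impression of hot money entering or leaving. Changes in offshore foreign-currency deposits (including renminbi) also affect the HKD exchange rate, because conversions go through the US dollar: if deposit growth pushes USD demand above supply, the HKD softens; if deposits contract, USD supply exceeds demand and the HKD firms. So a stronger or weaker HKD, or moves in interbank rates, are not enough on their own to judge the direction of capital flows; only by cross-checking several indicators can one see the whole picture.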
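    The aggregate indicator described two paragraphs up (the total of clearing balances, Exchange Fund Bills, and note-issue reserves, with sharp short-term moves hinting at hot money) can be sketched as follows. The class, function, threshold, and sample figures are hypothetical; in keeping with the caution above, the labels only mark "possible" flows, to be cross-checked against other indicators such as the exchange rate and offshore foreign-currency deposits.

```python
from dataclasses import dataclass


@dataclass
class MonetaryAccounts:
    """Banks' HKD 'monetary' holdings in one period (hypothetical HK$ billion)."""
    clearing_balance: float
    exchange_fund_bills: float
    note_issue_reserves: float

    def total(self) -> float:
        return self.clearing_balance + self.exchange_fund_bills + self.note_issue_reserves


def flag_flows(history: list[MonetaryAccounts], threshold: float) -> list[str]:
    """Label each period-on-period change in the total; threshold is an assumed cutoff."""
    labels = []
    for prev, curr in zip(history, history[1:]):
        change = curr.total() - prev.total()
        if change > threshold:
            labels.append("possible inflow")
        elif change < -threshold:
            labels.append("possible outflow")
        else:
            labels.append("no clear signal")
    return labels


history = [
    MonetaryAccounts(100.0, 50.0, 20.0),  # total 170
    MonetaryAccounts(150.0, 50.0, 20.0),  # total 220
    MonetaryAccounts(150.0, 60.0, 20.0),  # total 230
    MonetaryAccounts(120.0, 60.0, 10.0),  # total 190
]
print(flag_flows(history, threshold=20.0))
```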

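    The US began normalizing interest rates and liquidity long ago, yet Hong Kong banks have been slow to adjust their posted deposit and lending rates. At root, those rates have not tracked the market for nine years, so where is the room to raise them? The interest-rate agreement was abolished long ago yet seems to survive in practice, which is truly baffling. Still, US monetary normalization is close at hand and Hong Kong's predicament will end in due course; there is no need for counter-cyclical operations that, like pulling up seedlings to make them grow, do more harm than good.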

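    Hot money is highly mobile and highly sensitive to change, so how could it lag behind events? The latest financial and monetary indicators match the features of capital withdrawal. Yet the Exchange Fund rushed to treat the symptoms, twice issuing additional bills to drain funds, which suggests it has other motives. In fact, Hong Kong banks have been more than cautious over the past nine years: at the end of September 2008 the reserve ratio was 37% and the HKD loan-to-deposit ratio 80%; by the end of June this year the reserve ratio had risen to 43% and the HKD loan-to-deposit ratio had fallen to 68%.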

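    Issuing bills to drain funds changes the form, not the substance: HKD liquidity has not actually contracted, so the move does nothing to counter hot-money outflows or to normalize HKD interest rates. Instead, it signals monetary tightening and makes banks even more cautious. With a rate-hike cycle ahead, it does the economy plenty of harm and no good.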
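    The "change of form, not substance" claim can be checked against the article's own grouping of monetary accounts: when banks pay for new Exchange Fund Bills out of their clearing balances, one item shrinks and the other grows by the same amount. A minimal sketch; the function name and figures are hypothetical.

```python
def issue_exchange_fund_bills(clearing_balance: float, ef_bills: float,
                              amount: float) -> tuple[float, float]:
    """Hypothetical: banks pay for newly issued bills out of clearing balances.
    Returns the new (clearing_balance, ef_bills)."""
    if amount > clearing_balance:
        raise ValueError("banks cannot pay more than their clearing balance")
    return clearing_balance - amount, ef_bills + amount


# Hypothetical figures in HK$ billion.
before = (180.0, 1000.0)
after = issue_exchange_fund_bills(*before, amount=20.0)
print(after)                      # (160.0, 1020.0)
print(sum(before) == sum(after))  # True
```

    The clearing balance, and with it overnight interbank liquidity, does fall, which is why short-term interbank rates rise, but the total held in the HKD monetary accounts is unchanged.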

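    鄭宏泰 is Assistant Director of the Hong Kong Institute of Asia-Pacific Studies at The Chinese University of Hong Kong; 陸觀豪 is a retired banker and an Honorary Research Fellow of the same institute.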