名采論壇
Former Executive Council member Franklin Lam Fan-keung (林奮強), in a television interview, openly advised Hong Kong's younger generation to leave the city and seek a life in places such as Vietnam, India and Mexico. In his view, Hong Kong is "no longer a place that makes people happy": young people who work hard cannot hope to live and work in peace, and only in countries and regions less developed than Hong Kong can they support a family with their own two hands.
This is an extremely cruel situation: a generation, perhaps more than one, cannot make a living through its own effort in the place where it was born, no matter how hard it works. And yet this is a place ranked every year among the five most liveable cities in Asia. So exactly what kind of people is this city liveable for? If local people cannot, without assistance, survive on their own efforts, how can it be a liveable city?
When a place's wealth gap keeps widening and its young people can see no hope and no direction there, those who make, decide and implement its policies have a duty to examine themselves. Strangely, many senior officials and former officials of the SAR merely shrug as if nothing were amiss: can't make a living here? Simple, go somewhere else. And the "somewhere else" they suggest is usually far less developed than Hong Kong. In effect this is an admission that "Hong Kong people ruling Hong Kong" lacks competence, falling short even of the leaders and officials of the third world.
Stranger still, what other country or region has officials who publicly advise their own people to abandon it? If the premise of patriotism is cherishing one's identity as a national, are these officials and former officials not openly unpatriotic?
The inciting incident is a big hook.
The audience cannot wait to learn the answers to the story's central questions, and so they will not be distracted until the climax of the final act arrives.
The sooner the inciting incident appears, the better, but it must wait until the moment is ripe...
Every story's setting and characters are different, so every inciting incident is different too, as is the timing of its appearance.
Too early, and it may confuse the audience; too late, and it may bore them.
His extreme economy sets his plays above everyone else's. He says almost nothing.
2017-08-31 | Pramod HS
Ryan Dahl is a software engineer working at Google Brain. He is the creator of Node.js, a JavaScript runtime built on Chrome's V8 JavaScript engine. Currently, he is working on deep learning research projects. His focus is mostly on image-to-image transformations such as colorization and super resolution. He has contributed to several open source projects, including http-parser and libuv.
Pramod: Hello, people. Welcome to Mapping the Journey. When we hear about Node.js, we think of Ryan Dahl. He showed us that we were doing I/O completely wrong and taught us how to build software using a pure async programming model. Today's guest is the man himself: Ryan Dahl, hacker, brilliant programmer, and creator of Node. I am very excited and honored to have you on the show. Welcome, Ryan.
Ryan: Hello! Nice to be there... here.
Pramod: Ryan, we know you as the creator of Node. Tell us about your life before tech?
Ryan: Sure. I grew up in San Diego, my mom had got an Apple IIc when I was six years old, so I guess I've had kind of early access to computers. I'm 36, by the way. So, I kind of came of age just as the internet was coming out. I went to community college in San Diego and then went to UCSD afterward, where I studied math. Then, I went to grad school for math at the University of Rochester. There, I studied algebraic topology, which was a very abstract subject that I found very beautiful for a couple of years, but later I got bored of it because it did not seem so applicable to real life. After grad school, well... that was a PhD program, and once I realized that I did not want to be a mathematician for the rest of my life, I dropped out of that program, bought a one-way ticket to South America and went there for a year, where I was kind of in starving-student mode, and found a job doing some websites with this guy, Eric. And that's kind of how my programming career started. It was working on the Ruby on Rails website for a snowboard company.
Pramod: Nice! That must have been quite an experience, right? Dropping out of the Ph.D. program, traveling to South America and working as a web dev.
Ryan: Yeah. I mean, so... coming from grad school, you're used to dealing with very abstract problems, and working on websites was a very concrete process. But I was really trying to make it into... trying to make it into a beautiful mathematical theory like I was exposed to at grad school. And I think that got me thinking about... I guess I really liked how Ruby made development so much more, I guess, you could express your ideas more clearly in Ruby. And that was really interesting at the time. And I think Rails was really interesting in that. It gave this new structure, and probably, it wasn't totally new, but I think Rails popularized the structure of model view controller. And you know, those two things combined together, really was pretty interesting to me.
Pramod: Yes, building web applications is very interesting, and Ruby is a perfect tool. Next, you continued working as a freelance web developer in Germany. One of the projects you worked on was Node. And I think you continued working on it for the next six to eight months?
Ryan: Right. So, after South America, I moved with my girlfriend to Germany, because she was German and had to return to university. And I started going to Ruby conferences there, where people were talking about this new paradigm of model view controller. And one of the guys there was Chris Neukirchen, if I'm pronouncing that correctly. And he developed this project called Rack, which was a simplified abstraction of a web server. So, it really turned a web server into a single-function interface, where you get a request, and then you return a response. That, combined with some freelance work that I was doing on Nginx Module, for Engineyard, got me thinking about how... let me step back a second. In Nginx, everything is asynchronous. So, when you build a module for it, you have to be very careful to be non-blocking. And yeah, I think the combination of Chris Neukirchen’s rack plus how Nginx structured its web server with non-blocking IO, led me to start thinking about how those two things could be combined.
Pramod: Now you sort of had the idea with Rack and Nginx. How did you convince yourself: okay, I will spend my next six months building a framework that can run JavaScript on the server side, which may increase performance a great deal?
Ryan: So, those two pieces that kind of simplified the web server interface, which was Rack, and the asynchronous part, which was Nginx, I had been thinking about. And then Chrome was released in 2008, and along with it the V8 JavaScript interpreter. I shouldn't say interpreter; it's a jitted runtime. So, when V8 came out, I started poking around with it, and it looked really interesting and clean, and fast, and suddenly it clicked: oh! JavaScript is actually single-threaded, and everybody is already doing non-blocking. I'm using my fingers to do air quotes, but like in the web browser, people are already doing non-blocking requests when they make AJAX requests and stuff. And I thought: oh, wow! I think JavaScript plus asynchronous I/O plus some HTTP server stuff would actually be a really cool thing. And I was so excited about that idea that I just worked on it non-stop for the next four years.
Pramod: Yes, JavaScript plus async I/O worked really well. I believe developers were waiting to see a framework that did that. And just curious, during this time, was there any mentor, or did you ever consult with anyone? Or was it just you?
Ryan: It was basically just me. I had some programming friends who gave advice, and definitely... I mean, the first bit of it was really just me in my room. But later on, I ended up moving to San Francisco and working at Joyent and meeting lots of really great programming professionals. And yeah, many people mentored and gave ideas that contributed to Node after that.
Pramod: Nice. Take us through the journey you went through with the development of Node. I know it's been a long time, Ryan, since you created Node around 2009.
Ryan: I think at least for myself, there's no greater moment in my life than when I'm like, in the flow, and having an idea that I believe in. And have the time to sit down and really work on it. And I think Node was an idea waiting to happen and had I not done it, somebody else would have. But it just so happened that I was relatively unemployed and had some free time, and could work on it non-stop for months, which is what was required to kind of get an initial product out. So yeah, it was great, it was fun.
Pramod: Great, that's fantastic. You did it really well. Node is built on the idea of a "pure async" programming model. How did this idea work out for Node?
Ryan: Yeah, I think that's a really interesting question. Now, it's been several years, and I haven't worked on Node myself since like, 2012, or 2013. And Node, of course, is a big project at this point. So, yeah, I think... when it first came out, I went around and gave a bunch of talks, trying to convince people that they should... that maybe we were doing I/O wrong and that maybe if we did everything in a non-blocking way, that we would solve a lot of the difficulties with programming. Like, maybe we could forget about threads entirely and only use process abstractions and serialized communications. But within a single process we could handle many, many requests by being completely asynchronous. I believed strongly in this idea at the time, but over the past couple of years, I think that's probably not the end-all and be-all idea for programming. In particular, when Go came out. Well, I think Go came out a long time ago, but when I first started hearing about Go, which was around 2012, they actually had a very nice runtime that had proper green threads and really easy to use abstractions around that, that I think makes blocking I/O - again, blocking I/O in quotes, because it's all in green threads at the interface of... between Go and the operating system, I think it is actually all non-blocking I/O. But the interface that they present to the user is blocking, and I think that that's a nicer programming model, actually. And you can think through what you're doing in many situations more easily if it's blocking. You know, if you have a bunch of serial actions, it's nice to be able to say: do thing A, wait for a response, maybe error out. Do thing B, wait for a response, error out. And in Node, that's more difficult, because you have to jump into another function call.
Pramod: Yeah, I like the programming model of Go. Using goroutines is so easy and fun. In fact, we are using it at work to build a distributed application.
Ryan: Yeah, I think it's... for a certain class of application, which is like, if you're building a server, I can't imagine using anything other than Go. That said, I think Node's non-blocking paradigm worked out really well for JavaScript, where you don't have threads. And I think that a lot of the problems with kind of the callback-soup problem, where you have to jump into many anonymous functions in order to complete what you're doing, has been alleviated these days with the async keyword, the async feature that's in JavaScript now. So, kind of the newer versions of JavaScript have made this easier. That said, I think Node is not the best system to build a massive web server. I would definitely use Go for that. And honestly, that's basically the reason why I left Node. It was the realization that: oh, actually, this is not the best server-side system ever. Yeah. I think where Node has really shined is, weirdly, on the client side. So, doing kind of scripting around building websites. So, Browserify, for example, kind of bundles up client-side JavaScript. So, you can have all this server-side processing of client-side JavaScript. And then, you know, maybe little servers to... maybe little development servers, and here and there, maybe some real servers serving live traffic. Node can be useful, or it can be the right choice for it. But if you're building a massively distributed DNS server, I would not choose Node.
Pramod: This should be a good takeaway for all the developers around the world. Choosing the right tool for an application is so important. You are not biased at all about Node. You introduced Node.js to the world at JSConf 2009 in Berlin. Were you surprised by the success and traction it suddenly received?
Ryan: Yeah. I mean, I was basically in a continual state of surprise for four years. Because it grew very fast, and people liked it very much. So yeah, definitely.
Pramod: Thereafter you joined Joyent and worked on Node full time, and you moved to SF, right? How was the experience? Developers loved it, and you were at the center of it all.
Ryan: Definitely, it was an experience of a lifetime, and I definitely felt in the center of it all, being at conferences and whatnot. At one point, I went to Japan, and people were asking to take their photo with me, and I realized... I don't know, I felt kind of weird about that. Also online, I think around that time, I felt like whenever I commented on something, I would get like, 100 responses from people. And so, I found that I had to choose my words very carefully and how I presented myself because it seems like people were really listening, which was strange. And I didn't like that aspect of it. I mean, I'm a programmer, and I wanna write code and sometimes share my opinion without thinking too carefully about it. And so, I think I'm not one to... yeah. I didn't enjoy that aspect of it so much.
Pramod: You were what, 29, 30, when you introduced Node? And Node made such a massive impact.
Ryan: Yeah. I mean, I was definitely more of a novice developer at that point.
Pramod: OK. Ryan, there were many server-side JavaScript projects at the same time; Node was not the only one. What do you attribute the success of Node to?
Ryan: Right. So, there were several people that were trying to get the server-side JavaScript thing going. I can't even enumerate them anymore, I totally forgot what they are. Well, whatever. The thing is that they were all doing blocking I/O, and that really didn't jibe with how JavaScript is structured, because you don't have threads. And so, if you're doing blocking I/O, you literally can't serve requests. Like, you're doing one at a time, and that's just never going to work. That, plus the fact that I, like, sat down and made the HTTP server work well. So, I had a demo where you could... I had an HTTP server, and then a raw TCP server. And I made those things work well so that people could kind of sit down and build a website without going through too much hassle. And honestly, building a web server is not the easiest thing, and I think a lot of these systems kind of left it to their community to build that, and thus, nobody did. Because there is nothing to use the system with. I think it's important that when you release a software framework, or maybe any sort of software, that you have a demo that people can sit down and use immediately. And that was one of the things that went right with Node: people could download it and use the web server immediately.
Pramod: Yes. Good demos, and the fact that people could download, install and use it easily, make a big difference. Also, people already knew JavaScript, so they could start coding in no time. When I started working on Node, it was that much easier because I knew JavaScript well.
Ryan: Yeah. I think we take for granted how easy it is to switch between languages. I mean, even if you know another language, making that context switch is pretty difficult. And there's a lot of people who are very familiar with JavaScript. And kind of giving them these tools to be able to use it in other contexts excites people. You're suddenly able to do a lot more than you were able to do before.
Pramod: Yes. In 2012 Node already had a huge developer base. Why did you step away from it, turning over the reins to Joyent's Isaac Schlueter?
Ryan: Yeah. Right. So, I mean, I think it was kind of a combination of a couple of things. So, I think that the main thing was that, at that point, I had been working on Node for four years. And I kind of had gotten it to the point where I wanted it to be. I never wanted Node to be a really massive API. I wanted it to be this kind of small, compact core that people could build modules on top of. And there were a couple of key things... key features that I wanted to support. So, extension modules were added early on, we kind of got all the networking libraries worked out, HTTP, UDP, TCP, we could access all the file systems. And then, kind of a big chunk, which was maybe a year of work with like, five people, was porting it to Windows and doing that properly. We wanted to use Windows abstractions for asynchronous I/O, which is their I/O completion ports, and so that required rewriting the core library; the result of that was the libuv library. Yeah, but at some point, all of that was done, and we had released on Windows. And you know, it was kind of at the point where it's like: OK. I mean, this is what I had intended to create, and I'm very happy I got the chance to kind of follow through with it. And of course, there's going to be, you know, an infinite amount of bugs to fix for the rest of my life, but... you know, there are enough people involved in that. I don't need to necessarily do that, and I would like to go do other things. So, that plus the fact that Go came out, and I didn't see Node as being the ultimate solution to servers. And then, yeah, also just not wanting to be at the center of attention whenever I made a blog post.
Pramod: Nice. Yes, some people do not enjoy being in the limelight. When you started working on Node, you definitely had some goals. How does Node.js today stack up against them?
Ryan: I mean... Node is used by hundreds of thousands, if not millions of people at this point, and I think it's certainly exceeded any expectation of what I thought would happen with it. Yeah, it's cool.
Pramod: Ryan, after your wonderful journey with Node, what did you decide to work on?
Ryan: So, after Node, I moved... after I left Joyent and quit the Node project, I moved to New York and took some time off to work on my own projects. So, I had a bunch of projects. You know, at the time, Instagram had come out and it was new, and it seemed really simple, and everybody was saying: wow, that's so simple, I could have built that. And I couldn't help but think the same thing. So, I had a social network project, I had a build system project for C++, I had another build system project for HTML, which was kind of like Browserify, which would kind of package up your JavaScript and HTML, but in a smarter way. Yeah, I had a bunch of projects, none of which panned out, really, in my mind. Although I think some of them are currently still on the back burner, like my social network project, which I will get back to at some point. Yeah, I was doing that for a while. And then I started reading about... well, I started hearing about convolutional networks and how image classification had been solved, and that got me really interested in machine learning.
Pramod: Also you were part of Google Brain's Residency program. How was that experience?
Ryan: Yeah. So, I just spent a year out in Mountain View. So, stepping back a second: TensorFlow was released two years ago now. And with it, they announced this Google Brain residency program, where they invite like, 20 people to come to Google Brain, which is one of Google's machine learning research labs. I think the idea was to bring in people who had not necessarily studied machine learning but had some background in math and programming and were interested in machine learning, to come and kind of play around with these new ideas. Because machine learning is changing really fast and there is a large body of work that has been done, but now that the community has kind of narrowed in on neural networks as being the most useful algorithm for machine learning, maybe just bringing in a bunch of people and playing around with that, and this new ML framework called TensorFlow, would result in some interesting ideas. So yeah, I spent a year there, basically programming models and writing papers about those models. I worked mostly on image-to-image transformation problems. So, you know, if you have some input image and you want to predict some output image. I find this problem really interesting. Let me give some examples. So, the problem of colorization: you can take a black-and-white photo as input and try to predict the colors of the photo as an output. What's cool about this problem is that there's infinite training data. You can take any color photo and desaturate it, and then that's your input image, right? So, one of the problems of machine learning is that you need lots of data, and with this sort of task, that problem is not a problem. And also, there's been a lot of work in generative models recently, that is, models that output images. In particular, there's been generative adversarial networks and PixelCNN, which have demonstrated the ability to kind of learn the manifold of natural images, that is, to really understand what is a real image and what is not, what looks like a real image. So yeah, my idea was to kind of take this recent work in generative models and take this infinite training data and see if we can do some image transformation problems. So, I did some work on super-resolution, which is taking a low-resolution image and increasing the resolution. That's also an image-to-image transformation problem. And I've done two projects now on colorization.
Pramod: Nice explanation, Ryan. Yes, I have read that TensorFlow has been a great platform for many machine learning problems. Image classification, transformation... I really don't get it much, but I believe it's very interesting. Are you continuing your work with ML?
Ryan: Right. So now, I'm still at Google, as a software engineer, working on the same sort of problems. Studying generative models and trying to help the researchers build the next generation systems, next generation models.
Pramod: Nice. Generative models, that's so different from the JavaScript, Node, or web development work you did before.
Ryan: Yeah, I guess so. But I also started in Math, so I have a fairly decent foundation of math knowledge, I guess. And yeah, I guess... I think people like to kind of bend other people into certain areas, and I don't really wanna do that. I don't wanna be a Javascript person, and I don't wanna be a machine learning person. You know, I think people... it's interesting to just explore what's possible. What's exciting is building something new that hasn't been done before that could benefit humanity in some way.
Pramod: Nice. Yes, it's nice to know that machine learning requires a good math background. In one of your recent blog posts, on optimistic nihilism, you say that we will someday be able to emulate brains and build a machine that understands and thinks like humans do. How far are we from achieving that?
Ryan: Yeah. I have to be a bit careful about kind of prophesying... I mean, this is really my opinion. We are nowhere near matching human intelligence. I mean, the machine learning systems that we're using are very, very simplistic, and basically don't work at all. In fact, I have a blog post about my residency in which I kind of enumerate all the difficulties there are with developing these models. I think people who don't work in the field have this idea that you can kind of take this model and push some data through it, and it's just going to work. But that's really not the case. These things are very finicky and not well understood, and it takes many, many months of careful tweaking and experiments to get even the most modest results. So, we are nowhere near it, but that said, there's really some promising technology that has been developed recently, namely that convolutional networks seem to work, and that backpropagation seems to work. And the fact that these things are based on a model, this neural network model, that is not really brain-like, but is somehow inspired by the brain, is pretty enticing. We also have GPUs, and we've shown how we can train on these and distribute training across GPUs to some extent. So, I think the foundations of building bigger, smarter systems are happening. And personally, I'm an atheist, and I believe that there's nothing more in my brain than the chemicals and neurons that are the matter of my brain. And I think that my consciousness, all of our consciousnesses, are encoded, somehow, in interactions between those neurons. So, I don't see why we wouldn't be able to, someday, with enough research and work in this field, emulate that sort of behavior. It's too far out to predict how long that would be.
Pramod: Great. You have seen it all, Ryan. Where do you want to see tech in the next 20 years?
Ryan: I mean, I am very excited about machine learning and the possibilities that it brings. I think that even before, like, way before we get to real artificial intelligence, there are many applications of this technology. I mean, basically, any system where a smart guess would help you is going to benefit enormously from this technology. So, there are just uncountable industrial processes that could benefit from this sort of thing, you know. I think recycling centers, with sorting... sorting recycling with computer vision, and whatnot. I mean, there are just many, many systems that could benefit from simple machine learning systems. And I think that we're going to, more and more, see these systems get applied to different processes. So, I think that's going to affect the technology sector greatly, and all of humanity greatly.
Pramod: Yes, machine learning is very exciting. I get so excited when I see autonomous cars in Mountain View. Someday I would like to sit back and give them complete control. Thank you, Ryan, for providing us with this nice framework, Node, and thanks for being on the show. Also, good luck with your future projects. It was wonderful speaking to you.
Ryan: Yeah, great. Thanks for having me, it's fun to talk about it.
Pramod: Thank you. That is it, listeners. I really enjoyed speaking to Ryan, such a humble and awesome guy. He has achieved so much in his early years in tech. Such an inspiring story. Bye for now, I will meet you all in two weeks' time with another interesting journey. Shukriya.
Technology develops at a dizzying pace, and existing knowledge and skills quickly become obsolete. In recent years companies have repeatedly laid off staff. The government should help local workers upgrade their technology skills so they have a chance to join the fast-growing digital economy, and help citizens master new technology and improve themselves amid artificial intelligence and the data economy so they are not left behind. At the same time, the government should strengthen training, and attract and retain technology professionals, to ensure there is enough talent for developing innovation and technology.
On LinkedIn's list of the ten most sought-after skills worldwide, the great majority are IT-related, including cloud computing, data analytics, mobile app development and information security. Yet the ICT MOOC courses most popular with learners internationally are currently not recognised by the Continuing Education Fund.
Reform subsidies for continuing education
Although the government launched the Continuing Education Fund in 2002, its courses are limited and its procedures cumbersome, so the number of applicants has fallen year after year, and Hong Kong's rate of continuing education is below international levels. One survey of young people who had pursued further study found that 60% had not applied to the Continuing Education Fund, and nearly half of those said the courses they took were outside the subsidised categories. The Fund's scope of subsidy is outdated and rigid, failing to reflect the latest opportunities in the internet economy, ICT and the data economy; it must be updated to keep pace with the times.
Singapore launched the SkillsFuture subsidy scheme in 2016, and ICT-related courses attracted the most enrolments, with emerging fields such as data analytics proving hugely popular. Singapore runs its own distance-learning online courses (MOOCs) in data analytics and has invited STEM education platforms to offer accredited courses; in the information technology category there are courses at every level, covering network infrastructure, project management, information security, big data and more, and Udemy's online self-study courses are also eligible.
The ICT sector cannot find enough talent, and practitioners want to upgrade their skills. I have questioned the Innovation and Technology Bureau about the number of people taking government-subsidised ICT courses, and urged the government to use continuing education to help people acquire the technology skills the new-generation workplace demands and to help young people open up opportunities through ICT. Drawing up a comprehensive technology talent strategy, strengthening the training system for talent and skills, addressing Hong Kong's shortage of high-end technology talent, and studying how to train the workforce needed to accelerate the innovation and technology industry are the government's responsibilities, with the goal of helping young people move up.
Cross-sector collaboration to train talent
At the end of 2016, the HKMA's Fintech Facilitation Office and ASTRI launched a fintech talent cultivation scheme, working with banks and tertiary institutions to help the industry nurture a new generation of fintech professionals; the response was enthusiastic. Cyberport runs a similar internship scheme. Such programmes deserve to be extended to more industries.
I suggest the government encourage the business sector to work with technology companies, professional associations and higher education institutions to provide courses, internships, scholarships and professional certification, nurturing software engineers, data analysts, information security specialists and other technology professionals, and to subsidise local IT graduates and working IT practitioners to take short courses so they can master new knowledge and skills and develop more products and services that benefit the people of Hong Kong. Data scientists are among the most sought-after talent in today's IT market; the government could consider encouraging traditional industries to work with the technology sector to run more courses on data analytics and its applications, and to expand enrolment in data-science-related programmes.
In recent years the UK government has actively worked with the IT industry, taking on board its views so that computing curricula better fit industry needs, and has launched certification courses jointly with the industry to help IT people learn new skills. In 2014 the New York City government launched the Tech Talent Pipeline, with various levels of government and private companies investing nearly HK$100 million, partnering with major technology companies to offer training courses that people without an IT background can also join.
I suggest reforming the Continuing Education Fund: the government and technology companies could work together to offer localised technology MOOCs and encourage the public to study through MOOCs, focusing on adding more subsidised courses in ICT, computer science and data analytics, simplifying the enrolment and subsidy procedures, and helping working IT practitioners pursue further study and other working people learn ICT to improve themselves.
The government should also actively recruit local and overseas technology graduates and strengthen in-house technical training, helping government departments use technology to improve internal operations and e-services. Finally, the government needs to relaunch professional accreditation of IT qualifications, raising the professional standing of IT practitioners and attracting talent into the industry.
Charles Mok (莫乃光), Legislative Council member (Information Technology functional constituency)
Many of us fear death. We believe in death because we have been told we will die. We associate ourselves with the body, and we know that bodies die. But a new scientific theory suggests that death is not the terminal event we think.
One well-known aspect of quantum physics is that certain observations cannot be predicted absolutely. Instead, there is a range of possible observations each with a different probability. One mainstream explanation, the “many-worlds” interpretation, states that each of these possible observations corresponds to a different universe (the ‘multiverse’). A new scientific theory – called biocentrism – refines these ideas. There are an infinite number of universes, and everything that could possibly happen occurs in some universe. Death does not exist in any real sense in these scenarios. All possible universes exist simultaneously, regardless of what happens in any of them. Although individual bodies are destined to self-destruct, the alive feeling – the ‘Who am I?’- is just a 20-watt fountain of energy operating in the brain. But this energy doesn’t go away at death. One of the surest axioms of science is that energy never dies; it can neither be created nor destroyed. But does this energy transcend from one world to the other?
Consider an experiment that was recently published in the journal Science showing that scientists could retroactively change something that had happened in the past. Particles had to decide how to behave when they hit a beam splitter. Later on, the experimenter could turn a second switch on or off. It turns out that what the observer decided at that point, determined what the particle did in the past. Regardless of the choice you, the observer, make, it is you who will experience the outcomes that will result. The linkages between these various histories and universes transcend our ordinary classical ideas of space and time. Think of the 20-watts of energy as simply holo-projecting either this or that result onto a screen. Whether you turn the second beam splitter on or off, it’s still the same battery or agent responsible for the projection.
According to Biocentrism, space and time are not the hard objects we think. Wave your hand through the air – if you take everything away, what’s left? Nothing. The same thing applies for time. You can’t see anything through the bone that surrounds your brain. Everything you see and experience right now is a whirl of information occurring in your mind. Space and time are simply the tools for putting everything together.
Death does not exist in a timeless, spaceless world. In the end, even Einstein admitted, “Now Besso” (an old friend) “has departed from this strange world a little ahead of me. That means nothing. People like us…know that the distinction between past, present, and future is only a stubbornly persistent illusion.” Immortality doesn’t mean a perpetual existence in time without end, but rather resides outside of time altogether.
This was clear with the death of my sister Christine. After viewing her body at the hospital, I went out to speak with family members. Christine’s husband – Ed – started to sob uncontrollably. For a few moments I felt like I was transcending the provincialism of time. I thought about the 20-watts of energy, and about experiments that show a single particle can pass through two holes at the same time. I could not dismiss the conclusion: Christine was both alive and dead, outside of time.
Christine had had a hard life. She had finally found a man that she loved very much. My younger sister couldn’t make it to her wedding because she had a card game that had been scheduled for several weeks. My mother also couldn’t make the wedding due to an important engagement she had at the Elks Club. The wedding was one of the most important days in Christine’s life. Since no one else from our side of the family showed, Christine asked me to walk her down the aisle to give her away.
Soon after the wedding, Christine and Ed were driving to the dream house they had just bought when their car hit a patch of black ice. She was thrown from the car and landed in a banking of snow.
“Ed,” she said “I can’t feel my leg.”
She never knew that her liver had been ripped in half and blood was rushing into her peritoneum.
After the death of his son, Emerson wrote “Our life is not so much threatened as our perception. I grieve that grief can teach me nothing, nor carry me one step into real nature.”
Whether it’s flipping the switch for the Science experiment, or turning the driving wheel ever so slightly this way or that way on black-ice, it’s the 20-watts of energy that will experience the result. In some cases the car will swerve off the road, but in other cases the car will continue on its way to my sister’s dream house.
Christine had recently lost 100 pounds, and Ed had bought her a surprise pair of diamond earrings. It’s going to be hard to wait, but I know Christine is going to look fabulous in them the next time I see her.
“Biocentrism” and “Beyond Biocentrism” (BenBella Books) lay out Lanza’s theory of everything.
Read more at http://www.robertlanza.com/does-death-exist-new-theory-says-no-2/#f1EsplLp3CtfqOu1.99
http://www.nextscientist.com/phd-productivity-strategy-deep-work/
The worst PhD productivity advice is to work long hours.
This advice is shared by successful people in Academia, so it should be good advice, shouldn’t it?
They think they became successful thanks to hard work. I say they became successful despite their hard work.
These people propose a brute force approach to PhD productivity.
Brute force productivity is a waste of resources, in this case your time, health and happiness.
Why should you work on your PhD 24/7 till exhaustion if there is a more effective and efficient way?
I want to present to you a PhD productivity strategy that is the secret weapon to eliminate working crazy hours during your PhD.
Secret PhD productivity strategy: do deep work in less hours
Let me first tell you something that happened to me.
When I was one year into my PhD, my colleague who just started was leaving the lab at 5pm.
He walked past the fume hood where I was pipetting my hand numb and asked if I'd be staying late.
I told him that I had at least several more hours to go, to which he said: 'You're all working so hard and I get to leave already; I don't feel like a real PhD yet.'
There was a mix of regret and guilt in his voice.
I told him, ‘wait for it, you’ll be staying late soon enough’.
It was meant as a joke… but was it really?
See if any of the below sound familiar:
‘To be successful in your PhD you have to do over-hours.’ ‘It’s normal to work through the evenings and weekends.’ ‘Reading articles and writing is something you do in your free time.’ These are the unspoken PhD rules.
Whether this is the mindset with which you enter your PhD or something that grows on you while you’re doing the academic grind, you learn to accept it.
We start our PhD as an undefined trajectory of four-plus years, at the end of which we get to shape the generated data into a thesis (if we're lucky).
In the beginning of our PhD we’re fully dependent on our supervisors for guidance.
Unfortunately supervisors don’t always have our best interests in mind. This is why there are so many ugly PhD stories featuring years of hard work with nothing to show for it.
Maybe you’re expecting me to share a success story in which I graced over my PhD, not working long hours and had my defense exactly on the four year mark.
Sorry, I’ll have to disappoint you.
All of my prior working experience was in the academic environment where I was surrounded by PhDs and Postdocs who were working crazy long hours and wearing it as a badge of honor.
So I followed their lead.
In the end, I had weeks of unused holidays. Moreover, I still worked on writing my thesis for over a year after my deadline while starting on my new job.
The worst of it is that I know that I’m not an exception.
Do these overworked days mean that four years is too short for a PhD?
Would we benefit from more time?
Or are there other factors that we’re not aware of?
The Best PhD Productivity Strategy
Enter Deep Work …
When I came across Cal Newport’s book ‘Deep work’ I was in shock.
I felt like he laid out a way of working that enables you to use time most effectively in a setting that requires a lot of thinking, which makes it perfect for academia.
To give some background, Cal Newport is an Associate Professor of Computer Science, who works regular hours and yet manages to stay ahead of the game. After all, aside from driving his research and publishing consistently he also manages to write books and run a blog.
To put the practice of deep work into the PhD perspective, I’ve made a list of concepts discussed in the book that will help you to accelerate your PhD without eating away at your free time and costing you stress.
Deep work, what’s new?
According to Cal Newport, deep work is ‘Distraction-free concentration that pushes your cognitive capacities to their limit’.
Sounds good you’ll say, but what’s new about this? We’re in academia after all, isn’t this the definition of deep work?
True, academia requires more deep work than an average job, and in general the academic setting also aids the practice of deep work. As a PhD student, your day won't be broken up by many meetings, and replying to a constant flow of emails doesn't diminish your attention span.
So aren’t you already ahead of the game?
To answer this let me set up a scene.
The deadline for submitting an article, preparing a presentation for a conference, or something else significant that has been looming in the distance has finally caught up with you.
With your back up against the wall, you end up working through the night, armed with determination and a sufficient coffee supply.
Have you ever wondered how productive these several hours under pressure turn out to be?
In many cases you'll finish projects you couldn't complete in weeks and outperform yourself. These kinds of spurts usually happen in a sporadic manner, fuelled by approaching deadlines that you can't push back.
What if you could embed this kind of performance into your PhD on a more regular basis and achieve comparable results every week? And no, I don’t mean pulling weekly all-nighters with stress and caffeine overdose.
Instead, you can use the power of deep work to boost your PhD productivity.
How to apply deep work in your PhD: 8 PhD Productivity Strategies
So let’s take a look at how to master the skill of deep work and apply it to your PhD.
Deep work is a skill and it shouldn’t be confused with a habit.
For example, getting into the habit of flossing your teeth daily before bed is a matter of sticking to it as the action itself is not challenging and doesn’t require you to push your limits.
On the other hand, if you want to start doing a 10 km run every day without any prior preparation, your first attempt will probably not be very successful. You’ll find that you will need to slowly build up your condition until reaching your target.
In the case of deep work, you’re using your brain as a muscle and doing long stretches of uninterrupted deep work is like an intense work out requiring top condition.
You must train your deep work muscle to improve your PhD productivity
Our first deep work sessions may not be as productive as we would hope. The challenge is not to get demotivated and claim that this is not the right way for you, but instead consistently build up the skill until mastery.
For your PhD
A PhD requires a lot of deep work. You do ambitious research, and it is not clear how to succeed.
Think of reading scientific literature, analyzing complex data and writing articles. You’ll probably notice how this gets easier as your PhD time goes by.
We usually tend to attribute this to learning more about the field and overlook the fact that we gain more deep work experience the further we get into our PhD.
In other words, you’re unknowingly training your deep work skill due to the nature of the PhD work.
Just picture how good you’ll get doing it intentionally.
One of the practices to enhance the deep work skill mentioned in the book is ‘productive meditation’.
In productive meditation you use stretches of time when you’re physically occupied but not engaged mentally (e.g. jogging or commuting) to focus on solving a specific work-related problem which requires deep thinking.
Every time you catch your mind wandering off, bring it back to this single task.
#2 Establish routine and don’t rely on deadline pressure
Deep work means working to the limit of your cognitive capacity, which is uncomfortable just like forcing yourself to do the additional ten push-ups or run that extra mile.
Inner resistance kicks in and you need to break through the mental barriers to go through with it. The higher these barriers are, the more difficult it will be to overcome them and the more energy and mental capacity you’ll be wasting.
Setting up a routine to get you into deep work helps. You must allocate several hours for deep work in your daily schedule.
PhD productivity: have a routine
Try using one day per week for deep work, or a whole week in a month. You’ll need to find what works best for you and your project.
For your PhD
“I write when I’m inspired, and I see to it that I’m inspired at nine o’clock every morning.”
William Faulkner
Why do we put off writing articles, reading literature and other tasks important for our end goal until last minute?
You hear people talk about being ‘effective under pressure’, giving this a positive twist when actually we’re just being shameless procrastinators.
We fill our days with urgent but shallow tasks and will only start acting on the difficult tasks when we’re cornered by deadlines.
Rather than leaving your deep work sessions to be determined by random deadlines, integrate them into your days.
If you leave your deep work sessions to chance, trying to squeeze some intense thinking into the gaps in your agenda, you will not get far.
You know how it goes: you sit down to start writing an article and spend the next several hours fighting off procrastination (checking your email, cleaning your lab bench, and updating your lab journal), only to start writing by the time you need to stop.
Commit to doing deep work at specific times of the day and in specific surroundings. It will be easier to follow through as you condition yourself to get into deep work and there is less space for mental sabotage.
You’ll also quickly see that getting writing done doesn’t require waiting for inspiration.
One of the core elements of deep work is ‘distraction free concentration’.
Distractions give you escape routes from focusing on the important task, diminishing the power of deep work.
If you are all the time distracted you will make little progress and your PhD motivation will go down.
It’s not a coincidence that the times you’re most productive are in the very late or early hours when you’re left to yourself.
PhD productivity: avoid distractions
For your PhD
Very few PhDs have the luxury of an individual office. Usually the working area is more like a can of sardines, with PhDs almost sitting on each other's laps.
Finding the solitude to concentrate in such an environment will be challenging, but don't let that stop you.
Figure out if there are times when the work area is empty. Early mornings tend to be an unpopular stretch of time.
Yes, get out of bed really early to work on your thesis. I know it's not very exciting, but contemplate the benefits:
If you drag yourself to work at that time, you'll make sure it was worth the effort. With no distractions, it will be that much easier. You could also choose to practice deep work at an outside location, like the library or your home. The key is to be creative and find the option that works for you.
Once you get yourself to sit and fully focus, it helps to know what you are striving to achieve, so that your deep work is intentional.
You have to define the outcome you want to achieve and be specific if you want your deep work sessions to have the maximum effectiveness.
For your PhD
In the case of an approaching deadline, you know exactly what your end goal is. This is why last-minute work is so productive: you simply don't have the time to get sidetracked.
To reproduce this outcome without the actual deadline stress, define specifically what you will spend your deep work session on before you start.
Remember that decisions and planning take up mental energy and you want to use it all on the actual thinking process.
Let’s go further with the analogy of deep work being an intensive workout.
You know that if you want to be in good shape, you need to take care of your diet.
For deep work you also need a 'brain diet'.
Just as we crave a piece of cake, our brain craves novel stimuli.
Putting it bluntly, we don’t like being bored and in modern society we almost never have to be.
When was the last time that you were bored waiting in line?
With smartphones and Wi-Fi we rarely have those moments of downtime. Mindlessly scrolling through Facebook newsfeed, reading the recent tweets or checking out Instagram.
These are only few examples of the entertainment that lies readily in our palm.
As innocent as this may seem, these shallow habits directly deplete our capacity to practice deep work. Focusing intensely on a single task means, by definition, eliminating additional stimuli.
Think about it: if your brain is used to the constant tickling of incoming information, it will scramble around looking for more entertainment, refusing to stay focused on one task.
PhD productivity: information diet
For your PhD
Take care of your brain as athletes take care of their condition.
This holds for your free time as well as your working time.
The first step is to start noticing the bad habits.
For example, how often do you pick up your phone with no apparent reason?
If you catch yourself reaching for the phone too often try to have some periods of time without it. Same goes for other shallow but so addictive activities like web surfing, mindless TV watching, YouTube binging, etc…
This may come across as crazy talk, but you should limit working extra hours so you get regular rest from work.
You must work less to produce more.
It’s not easy because for this you’ll need to drop the belief that working extra-long directly translates into higher PhD productivity.
But it really doesn’t. How many productive hours do you really have in a day?
Not so so hours. No. Hours of work where you think “oh, if only I would always work like in this last hour”.
You can count them with one hand.
PhD productivity: work less
This is caused by the fact that at some point our mental energy gets depleted. If you stay longer in the lab or do some evening work at home, you'll not be working at your maximum capacity.
Even worse, you’ll be compromising the productivity of the next day by not allowing your mental resources to replenish.
For your PhD
This really boils down to planning your days so that you don't spend most of your waking hours at work. Don't you want to have a good work-life balance?
Of course there will be exceptions, like intensive experimental set-ups or an article submission sprint, but don't make overworking a habit.
Limiting your time at work will also force you to spend your time carefully to complete the set goals for the day.
Have you noticed how after spending a day on solving a challenging problem with no result, you would effortlessly find the solution the next day after having ‘slept on it’?
We often attribute this to being rested or looking at the problem with 'fresh eyes', when in fact our mind could have been working on it intensely while we didn't even realize it.
The collaboration between the conscious and unconscious mind is discussed in the unconscious-thought theory (UTT). One of its ideas is that the majority of our mental capacity is hidden in the subconscious depths of our mind.
Even more shocking is the idea that the subconscious part of our brain is better suited for solving complex problems.
PhD productivity: unconscious mind
Wait! Don’t knock yourself unconscious.
For your PhD
The key is to give your mind the space to work in the subconscious mode.
For this you should completely disengage your conscious mind from that particular task, which is easier said than done.
This comes back to the previous point of having down time completely free of work related thoughts.
Having a good sleep. A long shower. A walk in the woods.
All this helps you use the subconscious mind.
Less sleep means less PhD productivity
Practicing deep work will give you small incremental steps of progress during individual sessions.
You must be consistent to achieve great results in the long run.
Human nature is a sucker for instant gratification, and therefore visions of tomorrow's success are often not sufficient to get you through the drudgery of today. This is very similar to sticking to a healthy diet or working out regularly.
One day of eating healthy or going to the gym won't make much of a difference; it's the continuous effort that counts.
To stimulate yourself to stick through the hard times and not skip deep work time, keep a score of your progress.
Tally your deep work sessions in a way that lets you see the growing chain (e.g. on your desktop, on a sheet of paper on the wall, or in a mobile app). The longer the chain gets, the more effort you'll put in not to break it (a psychological fact).
This approach works wonders, and it's not surprising that it's popular among people trying to stick to a healthy diet or exercise routine.
Deep work for further career
The great thing about cultivating the deep work skill during your PhD is that it will also give you a boost in your further career.
Deep work is useful both in academia and in industry. However, in industry it will be much harder to stick to it. With open offices common in the modern workspace and the whole social media and email frenzy, it feels like our society and employers do everything to keep us from deep work.
Master the deep work skill to succeed in your PhD.
Practice it consistently to boost your PhD productivity.
You will have a massive personal advantage over the majority stuck in the shallows.
https://blog.edgemesh.com/deploy-a-global-private-cdn-on-your-lunch-break-7550e9a9ad7e
Edgemesh gets the vast majority of its network throughput via browser-based clients, which helps diversify the content distribution network while dramatically reducing page load times and the customer's server load. This organic visitor mesh is fantastic and has the beautiful property of reverse scaling (the more visitors who hit the website, the larger the mesh capacity!). Users get faster page loads transparently while site operators get increased network capacity — it's a win-win.
But early in our development a major customer asked us for a server-side version of the Edgemesh client. This server-side version would allow the customer to utilize their existing global infrastructure while simultaneously providing an always-online peer to accelerate their mesh content. Essentially, they were looking for a way to implement their own CDN services without the complexity of changing their current infrastructure.
And so, the Supernode was born.
To get started with a Supernode, you will need to have an active domain registered with Edgemesh. If you haven’t done so yet, go ahead and sign-up and add Edgemesh to your site (it only takes 5 minutes 😊 ).
Once you’re good to go, have a look at the map overview. Here we can see your site visitors as they browse your new Edgemesh enhanced website and your mesh beginning to grow.
Map overview with site visitors (in purple clusters)
Under the covers, what's happening is that visitors are accessing the website and downloading the content (if it isn't in their Edgemesh cache yet). This organic crawling of the website allows Edgemesh to know which assets (images etc.) are the most requested, and thereby the most important to replicate across the mesh network. Below is an example with debug enabled in the Chrome console showing this workflow.
You can see this for yourself by visiting the Edgemesh homepage and in your Chrome console type:
localStorage.setItem("debug","edgemesh*")
The user visits a page and downloads assets; it then stores these for other peers to obtain.
As the user stays online, it begins to replicate in other assets from other peers, effectively pre-caching the content.
This workflow, where visitors crawl your site organically and register new assets on the mesh, is important for our Supernodes.
Unlike the client-side service, Supernodes do not crawl a site. Instead, Supernodes listen for any new asset registration and attempt to replicate that asset in from a browser peer as quickly as possible. Since Supernodes have a larger storage capacity (however much disk you allow) compared to browser-based peers, they attempt to keep as many assets as possible in their cache.
Supernodes consolidate assets into larger cache points.
Getting going with a Supernode is as simple as running a Docker instance. Go ahead and pull down the Supernode image (it's rather large) and get yourself a shell:
docker run -it quay.io/edgemesh/supernode bash
Make sure you have a browser open to your Edgemesh-enabled site; this will allow the Supernode to quickly discover a peer and begin to replicate in assets. To watch the magic happen, we're going to run this Supernode in DEBUG mode. With your shell open, just do:
DEBUG=supernode* npm run docker-start && forever logs 0 -f
If everything worked as designed, you should see something like the image below:
Supernode coming online
We should also see a Supernode in the portal if we look at the map overview.
When a Supernode comes online it receives instructions on what it should replicate in and from whom (peer ID) along with a checksum for each and every asset. This checksum is used to validate the integrity of the asset. The debug console shows these 4 components right after startup, as shown below.
The assets themselves are stored in the data directory. We can ctrl+c to stop the Supernode and have a look ourselves.
It is important to note that simply placing an asset in the data directory will not make that asset available to the mesh. Supernodes can only populate their caches via another peer (either a browser peer or another Supernode). The knowledge of which assets reside in which Supernodes sits on the Edgemesh signal server backplane.
The index.json file has some data on what each asset is. You can install the wonderful jq utility and have a look for yourself.
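For example, from the Supernode's shell you might inspect it along these lines. The data/index.json path and the apt-get install step are assumptions for illustration; adjust them to wherever your Supernode's data directory actually lives and to the image's package manager:
# install jq if the image doesn't already have it (assumes a Debian/Ubuntu based image)
apt-get update && apt-get install -y jq
# pretty-print the whole asset index
jq '.' data/index.json
# or just list the top-level keys to see which assets are recorded
jq 'keys' data/index.json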
Now that our Supernode is online and has some assets, browser peers will automatically discover it and use it as an always-online, high-speed alternative peer. Unlike the browser client, Supernodes can rapidly service hundreds of browser peers simultaneously — allowing you to deploy high-capacity caches wherever you can run Docker. Best of all, there are zero configuration changes required to your infrastructure. No DNS entries, no load balancer — Supernodes come online, discover your mesh and add capacity automagically.
Now that we understand the basics, let’s roll out a global CDN cache using a single line of bash. For this example we’re going to roll out our Supernode network across Google Cloud.
Google’s network is one of the best in the world, and includes access to a number of low latency transport (private) backbones to help our Supernodes rapidly replicate content across the globe. Of course at the major internet exchanges such as Equinix, this network meets with peering partners to deliver a truly massive footprint. To get a feeling of the scale have a look at Google (ASN 15169) on Caida.
I’m assuming you have created an account on Google Cloud and created a project. For a step by step guide checkout this and go ahead an install the gcloud command line utilities.
We can start small and deploy a single Supernode in a single Google datacenter. I've shown the script below with some comments.
EM_GCE_PROJ="edgemesh-supernode"
# use the GCE driver, pass our project name, pick a datacenter,
# a small machine type (CPU is not that relevant), and a preemptible
# instance (why not! Supernodes can come on and offline)
docker-machine create --driver google \
  --google-project $EM_GCE_PROJ \
  --google-zone us-west1-b \
  --google-machine-type f1-micro \
  --google-preemptible \
  --google-machine-image https://www.googleapis.com/compute/v1/projects/ubuntu-os-cloud/global/images/family/ubuntu-1404-lts \
  em-sn-west1b   # what to name this host
That will create a docker host, and will kick off something like the below:
Docker-machine on GCE
Once the host is available, we can go ahead and run our Supernode there by evaling to that machine as our docker target and then using our docker run command:
eval "$(docker-machine env em-sn-west1b)" \ && docker run quay.io/edgemesh/supernode Assuming everything worked, we can see our new Supernode on the map.
To remove this instance (and stop billing) we can do:
docker-machine rm em-sn-west1b -y
With the trial run complete, let's now roll out a Supernode in every Google zone and call it a day. To do this we're going to use the following one-liner:
gcloud compute zones list --filter NAME:*-a|awk '{print $1}'|sed 1d | while read -r dc; do docker-machine create --driver google --google-project edgemesh-supernode --google-zone $dc --google-machine-type f1-micro --google-preemptible --google-machine-image https://www.googleapis.com/compute/v1/projects/ubuntu-os-cloud/global/images/family/ubuntu-1404-lts em-sn-$dc; eval "$(docker-machine env em-sn-$dc)"; docker run -d quay.io/edgemesh/supernode; done
Now let's break this down:
gcloud compute zones list --filter NAME:*-a
lists every Google Cloud zone whose name ends in -a (one per region); piping it through awk '{print $1}' | sed 1d keeps only the zone-name column and drops the header row. Each zone name is then fed into the loop:
| while read -r dc; do docker-machine create --driver google --google-project edgemesh-supernode --google-zone $dc --google-machine-type f1-micro --google-preemptible --google-machine-image https://www.googleapis.com/compute/v1/projects/ubuntu-os-cloud/global/images/family/ubuntu-1404-lts em-sn-$dc; eval "$(docker-machine env em-sn-$dc)"; docker run -d quay.io/edgemesh/supernode; done
which creates a docker host named em-sn-<zone> in that zone, points the local Docker client at it, and starts a detached Supernode container there.
From my house this global rollout took 48 minutes and 39 seconds. I also added a few Azure-based nodes for good measure. You can of course run the host creation in parallel if that is too long for you.
Once we’re done we can check our map and see our new Supernode network. Not too bad for under an hour!
There are more configuration options available for Supernodes; check out the docs for details on restricting origins, disk size, and more.
Also, to clean up the instances (and stop the billing), do:

docker-machine ls --filter DRIVER=google | awk '{print $1}' | sed 1d | while read -r dc; do docker-machine rm $dc -y; done

Until next time!
Like this article? Let us know @Edgemeshinc on Twitter
https://www.confluent.io/blog/ksql-open-source-streaming-sql-for-apache-kafka/
I'm really excited to announce KSQL, a streaming SQL engine for Apache Kafka™. KSQL lowers the entry bar to the world of stream processing, providing a simple and completely interactive SQL interface for processing data in Kafka. You no longer need to write code in a programming language such as Java or Python! KSQL is open-source (Apache 2.0 licensed), distributed, scalable, reliable, and real-time. It supports a wide range of powerful stream processing operations including aggregations, joins, windowing, sessionization, and much more.
A Simple Example
What does it even mean to query streaming data, and how does this compare to a SQL database?
Well, it's actually quite different from a SQL database. Most databases are used for doing on-demand lookups and modifications to stored data. KSQL doesn't do lookups (yet); what it does do is continuous transformations, that is, stream processing. For example, imagine that I have a stream of clicks from users and a table of account information about those users being continuously updated. KSQL allows me to model this stream of clicks and table of users, and join the two together, even though one of those two things is infinite.
So what KSQL runs are continuous queries (transformations that run continuously as new data passes through them) on streams of data in Kafka topics. In contrast, queries over a relational database are one-time queries, run once to completion over a data set, as in a SELECT statement on finite rows in a database.
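To make that contrast concrete, here is a small sketch (the clicks stream and its columns are illustrative, not from the demo below): the first statement is an ordinary one-time query against a relational database, and the second is a continuous KSQL query that keeps its result up to date as new events arrive.

-- One-time query against a relational database: runs once over the rows
-- that exist at that moment, returns a finite result set, and terminates.
SELECT userid, count(*) FROM clicks GROUP BY userid;

-- Continuous query in KSQL: reads the clicks stream forever and keeps the
-- per-user counts updated as new click events arrive in the Kafka topic.
CREATE TABLE clicks_per_user AS SELECT userid, count(*) FROM clicks GROUP BY userid;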
What is KSQL Good For?
Great, so you can continuously query infinite streams of data. What is that good for?
CREATE TABLE error_counts AS SELECT error_code, count(*) FROM monitoring_stream WINDOW TUMBLING (SIZE 1 MINUTE) WHERE type = 'ERROR' GROUP BY error_code;
One use of this is defining custom business-level metrics that are computed in real time and that you can monitor and alert off of, just as you do with your CPU load. Another use is to define a notion of correctness for your application in KSQL and check that it is meeting this as it runs in production. Often when we think of monitoring we think about counters and gauges tracking low-level performance statistics. These kinds of gauges can often tell you that your CPU load is high, but they can't really tell you if your application is doing what it's supposed to do. KSQL allows defining custom metrics off of streams of raw events that applications generate, whether they are logging events, database updates, or any other kind.
For example, a web app might need to check that every time a new customer signs up, a welcome email is sent, a new user record is created, and their credit card is billed. These functions might be spread over different services or applications, and you would want to monitor that each thing happened for each new customer within some SLA, like 30 seconds.
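A rough sketch of one half of such a check might look like the following (the signups and welcome_emails stream names are hypothetical, and a real check would compare the two results downstream): count each event type per customer in 30-second windows, then flag customers who appear in one table but not the other.

-- Hypothetical sketch: count signup and welcome-email events per customer in
-- 30-second windows; a downstream comparison of the two tables would flag
-- customers whose welcome email never arrived within the SLA.
CREATE TABLE signup_counts AS SELECT userid, count(*) FROM signups WINDOW TUMBLING (SIZE 30 SECONDS) GROUP BY userid;
CREATE TABLE welcome_email_counts AS SELECT userid, count(*) FROM welcome_emails WINDOW TUMBLING (SIZE 30 SECONDS) GROUP BY userid;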
CREATE STREAM possible_fraud AS SELECT card_number, count(*) FROM authorization_attempts WINDOW TUMBLING (SIZE 5 SECONDS) GROUP BY card_number HAVING count(*) > 3;
A simple version of this is what you saw in the demo above: KSQL queries that transform event streams into numerical time series aggregates that are pumped into Elastic using the Kafka-Elastic connector and visualized in a Grafana UI. Security use cases often look a lot like monitoring and analytics. Rather than monitoring application behavior or business behavior you’re looking for patterns of fraud, abuse, spam, intrusion, or other bad behavior. KSQL gives a simple, sophisticated, and real-time way of defining these patterns and querying real-time streams.
CREATE STREAM vip_users AS SELECT userid, page, action FROM clickstream c LEFT JOIN users u ON c.userid = u.user_id WHERE u.level = 'Platinum';
Much of the data processing done in companies falls in the domain of data enrichment: take data coming out of several databases, transform it, join it together, and store it into a key-value store, search index, cache, or other data serving system. For a long time, ETL — Extract, Transform, and Load — for data integration was performed as periodic batch jobs. For example, dump the raw data in real time, and then transform it every few hours to enable efficient queries. For many use cases, this delay is unacceptable. KSQL, when used with Kafka connectors, enables a move from batch data integration to online data integration. You can enrich streams of data with metadata stored in tables using stream-table joins, or do simple filtering of PII (personally identifiable information) data before loading the stream into another system.
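For instance, a minimal sketch of the PII case (the purchases stream and its columns are illustrative, not from the post): re-publish the stream with the sensitive column simply left out of the projection before the data is loaded into a downstream system.

-- Illustrative only: derive a purchases stream with the card_number column
-- dropped, so downstream consumers never see the PII field.
CREATE STREAM purchases_no_pii AS SELECT orderid, itemid, amount FROM purchases;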
Many applications transform an input stream into an output stream. For example, a process responsible for reordering products that are running low in inventory for an online store might feed off a stream of sales and shipments to compute a stream of orders to place.
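A toy version of that idea in KSQL might look like the sketch below (the sales stream, the column names, and the threshold of 100 units are all made up for illustration; a real implementation would also fold in the shipments stream).

-- Products that sold more than 100 units in the last hour surface here, so a
-- downstream process can consume this table and place replenishment orders.
CREATE TABLE products_to_reorder AS SELECT product_id, count(*) FROM sales WINDOW TUMBLING (SIZE 1 HOUR) GROUP BY product_id HAVING count(*) > 100;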
For more complex applications written in Java, Kafka's native Streams API may be just the thing. But for simple apps, or teams not interested in Java programming, a simple SQL interface may be what they're looking for.
Core Abstractions in KSQL
KSQL uses Kafka's Streams API internally and they share the same core abstractions for stream processing on Kafka. There are two core abstractions in KSQL that map to the two core abstractions in Kafka Streams and allow you to manipulate Kafka topics: the STREAM and the TABLE.
CREATE STREAM pageviews (viewtime BIGINT, userid VARCHAR, pageid VARCHAR) WITH (kafka_topic='pageviews', value_format='JSON');
CREATE TABLE users (registertime BIGINT, gender VARCHAR, regionid VARCHAR, userid VARCHAR) WITH (kafka_topic='users', value_format='DELIMITED');
KSQL simplifies streaming applications as it fully integrates the concepts of tables and streams, allowing you to join tables that represent the current state of the world with streams that represent events happening right now. A topic in Apache Kafka can be represented as either a STREAM or a TABLE in KSQL, depending on the intended semantics of the processing on the topic. For instance, if you want to read the data in a topic as a series of independent values, you would use CREATE STREAM. An example of such a stream is a topic that captures page view events, where each page view event is unrelated and independent of another. If, on the other hand, you want to read the data in a topic as an evolving collection of updatable values, you'd use CREATE TABLE. An example of a topic that should be read as a TABLE in KSQL is one that captures user metadata, where each event represents the latest metadata for a particular user id, be it the user's name, address, or preferences.
KSQL in Action: Real-time clickstream analytics and anomaly detection
Let’s get down to a real demo. This demo shows how you can use KSQL for real-time monitoring, anomaly detection, and alerting. Real-time log analytics on clickstream data can take several forms. In this example, we flag malicious user sessions that are consuming too much bandwidth on our web servers. Monitoring malicious user sessions is one of many applications of sessionization. But broadly, sessions are the building blocks of user behavior analysis. Once you’ve associated users and events to a particular session identifier, you can build out many types of analyses, ranging from simple metrics, such as count of visits, to more complex metrics, such as customer conversion funnels and event flows. We end this demo by showing how you can visualize the output of KSQL queries continuously in real-time on a Grafana dashboard backed by Elastic.
You can also follow our instructions to step through the demo yourself and see the code, too.
A Look Inside
KSQL cluster

There is a KSQL server process which executes queries. A set of KSQL processes run as a cluster. You can dynamically add more processing capacity by starting more instances of the KSQL server. These instances are fault-tolerant: if one fails, the others will take over its work. Queries are launched using the interactive KSQL command line client, which sends commands to the cluster over a REST API. The command line allows you to inspect the available streams and tables, issue new queries, and check the status of and terminate running queries. Internally, KSQL is built using Kafka's Streams API; it inherits its elastic scalability, advanced state management, fault tolerance, and support for Kafka's recently introduced exactly-once processing semantics. The KSQL server embeds this and adds on top a distributed SQL engine (including some fancy stuff like automatic byte code generation for query performance) and a REST API for queries and control.
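To give a feel for that workflow, these are the kinds of statements the KSQL command line accepts for the inspection and management tasks described above (the query id in the TERMINATE statement is a placeholder; SHOW QUERIES prints the real ids):

SHOW STREAMS;        -- list the streams registered with the cluster
SHOW TABLES;         -- list the registered tables
SHOW QUERIES;        -- list the continuous queries currently running
TERMINATE CSAS_VIP_USERS_1;   -- stop a running query by its id (placeholder id)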
Kafka + KSQL turn the database inside out
We’ve talked in the past about turning the database inside out, now we’re making it real by adding a SQL layer to our inside-out DB.
In a relational database, the table is the core abstraction and the log is an implementation detail. In an event-centric world where the database is turned inside out, the core abstraction is not the table; it is the log. The tables are merely derived from the log and updated continuously as new data arrives in the log. The central log is Kafka, and KSQL is the engine that allows you to create the desired materialized views and represent them as continuously updated tables. You can then run point-in-time queries (coming soon in KSQL) against such streaming tables to get the latest value for every key in the log, in an ongoing fashion.
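As a small illustration, reusing the pageviews stream registered earlier, a continuously maintained materialized view over the log could be as simple as the following sketch (the derived table name is illustrative):

-- A materialized view derived from the log: per-page counts are updated
-- continuously as new pageview events land in the underlying Kafka topic.
CREATE TABLE pageviews_per_page AS SELECT pageid, count(*) FROM pageviews GROUP BY pageid;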
Turning the database inside out with Kafka and KSQL has a big impact on what is now possible with all the data in a company that can naturally be represented and processed in a streaming fashion. The Kafka log is the core storage abstraction for streaming data, allowing the same data that went into your offline data warehouse to now be available for stream processing. Everything else is a streaming materialized view over the log, be it various databases, search indexes, or other data serving systems in the company. All data enrichment and ETL needed to create these derived views can now be done in a streaming fashion using KSQL. Monitoring, security, anomaly and threat detection, analytics, and response to failures can be done in real time rather than when it is too late. All this is available for just about anyone to use through a simple and familiar SQL interface to all your Kafka data: KSQL.
What’s Next for KSQL?
We are releasing KSQL as a developer preview to start building a community around it and gathering feedback. We plan to add several more capabilities as we work with the open source community to turn it into a production-ready system: from the quality, stability, and operability of KSQL, to supporting a richer SQL grammar, including further aggregation functions and point-in-time SELECT on continuous tables, i.e., enabling quick lookups against what has been computed so far in addition to the current functionality of continuously computing results off of a stream.
How Do I Get KSQL?
You can get your hands dirty by playing with the KSQL quickstart and the aforementioned demo. We'd love to hear about anything you think is missing or ways it can be improved: chime in on the #KSQL channel on the Confluent Community Slack with any thoughts or feedback, and file a GitHub issue if you spot a bug. We'd love to work closely with early adopters kicking the tires, so don't be shy. We look forward to collaborating with the rest of the open source community to evolve KSQL into something fantastic.
Join our online talk on September 21 to learn how to build real-time streaming applications with KSQL.
Finally, if you’re interested in stream processing and want to help build KSQL, Confluent is hiring 🙂