Ask HN: How’d you go from a noodler to writing large well structured programs?
96 hodder Nov 2 54
http://news.ycombinator.com/item?id=15604893
I do a lot of coding for work (Quant Analysis) mostly in Matlab, but also VBA, and occasionally python. My team has come to write something like 15k lines of code for our overnight process.
How do you learn to structure a large program? We basically have one Matlab file that calls tons of other routines in different files in a straight order. All are in the same folder. The whole thing is out of hand. We also do not have any form of source control or testing. When things break, we fix them on the fly.
Do you have a good book recommendation on project structuring?
sshrinivasan 9 hrs
I suggest you hire a software developer, preferably a senior one if you can afford it. The difference between a bunch of impressive algorithms written by specialists in the field stitched together by brittle connections, and a bunch of impressive algorithms that are part of a well structured, tested and maintainable suite is huge. Reading a book or two is not going to solve the problem in the long term, which I assume is what you want.
Hiring a software developer who can take a lead role in organizing and structuring the codebase will also make the specialists better programmers, since they learn good programming techniques from someone who actually knows them.
Source: Senior software developer, where I initially started as an "algorithm/application developer" and saw the team grow and benefit by hiring some experienced developers.
csours 7 hrs
To add to this: what you have now is likely a 20% product - something on the level of a proof of concept.
All of that connecting tissue that makes things robust can take quite a bit of work, and it's really hard to see whether it's robust or not for a while.
hodder 4 hrs
I wish learning from a pro and seeing how they'd go through it were an option, but unfortunately I don't make the hiring decisions or control the budget.
heartbreak 3 hrs
If you can make purchases you can probably hire a consultant for a few hours so that someone can give you an informed opinion.
hodder 3 hrs
I don't make purchases, is my point. I'm an analyst. The manager makes the budget, purchasing and hiring decisions, and they have determined a pro dev to not be an option. I can request that a pro be hired (and have requested that additional hires have strong skills) and I do.
inDigiNeous 7 hrs
You learn it by writing out of hand code :) But more to the point, what has helped me immensely has been writing out my thought processes before starting to code. Write out the problem, and solve it in a journal or somewhere before just starting to write out code and go along.
There's nothing wrong with just writing code as you go and solving things while you find problems, but to get structured, documenting the behaviour of the program is a major point. Draw diagrams of how the code should work. What is connected to what? Who owns what?
By doing this process, you start to see the big picture. It's not always easy, and the temptation to just "go at it" is always there, but this easily leads to code you need to re-factor or re-think more than you would like.
But of course, sometimes you just have to write the crap version first, and then move on to a more advanced model. Actually, this is the expected way, as coding is a field where you just can't know what lies in that valley before venturing into it. So, accepting the fact that re-factoring is something you do constantly is also key. If you don't refactor, the old crappy solutions hinder and slow you down.
Experience of course helps you make better decisions in new projects.
weej 7 hrs
Recommended resources to consider:
weej 7 hrs
One more I missed regarding review of architecture, code structure/design of OSS: The Architecture of Open Source Applications (Vol 1 & 2) https://www.amazon.com/Architecture-Open-Source-Applications...
testouts 9 hrs
https://bramcohen.livejournal.com/4563.html ... the founder of BitTorrent.
I'll add a personal note. Being a good professional programmer is also about understanding the context of your program. You will not always have the time to optimize for performance. You will not always have the time to refactor functionality to make it more modular for further variety. Sometimes you will need to skip tests because the deadline says it should be done yesterday. Learn to not be emotionally tied to the code. Get your job done, to the best quality you can for the time you have allotted. Don't be mad if it's not perfect. But during the train ride or in the shower, think about how to make it better. This process of just grappling with the ideas of clean code is good training for when you have to approach a similar problem on the fly.
Mizza 8 hrs
That post is more about the architecture of the BitTorrent network than the original BitTorrent client.
As somebody who has built on top of the original BT codebase, it is not a well structured large program, and there is a reason nobody uses it for anything anymore.
mynegation 4 hrs
I am a quantitative developer working on industrial size projects.
The best way to learn personally is to get hired into a team that does serious projects in the area you are interested in. If you are not looking to switch jobs, hire a few people who have done that into your team, or get them on a contract (going to be crazy expensive but totally worth it).
First, basics, start with Joel's 12 rules of better code[1]
Next, buy everyone a copy of "Clean Code" book and read it from start to finish.
Look at the open source quant code bases, how they are organized and built. I suggest quantlib[4] and opengamma[3].
If you are looking at architecting quant library specifically, read this article by Russell Goyder[2]
[1] https://www.joelonsoftware.com/2000/08/09/the-joel-test-12-s... [2] https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2311743 [3] https://github.com/OpenGamma [4] https://github.com/lballabio/QuantLib
hodder 4 hrs
These are some great suggestions, thanks.
drblast 6 hrs
Matlab is a disaster for this. Great software, but a horrible, horrible coding environment.
I came from a CS background and did an EE master's. Working in Matlab made me want to cry because of the problems you're describing; I think they're endemic to Matlab rather than anything you're doing wrong.
Source control will help, just look at some of the git tutorials. Go through them enough to where you can create a "release" branch that you can roll back to if things go wrong. You probably won't need or use the rest of the features right away if ever.
15k lines also really isn't that much. It sounds like you're managing it OK (separate files instead of one massive one, yay!) other than not having source control.
hodder 3 hrs
Interesting to know others have had similar experiences in Matlab. I'm hoping to find the time to recode the whole thing in python in the new year as well as implement source control. Thanks for sharing your experience.
mlazos 9 hrs
It seems like you have a grasp of what is wrong already. Stuff you can do right now without reading a book:
- Use source control
- Enact a policy where all new code is to be well tested (which you can enforce with some build/checkin system)
- Make a refactoring change to isolate sections of code that minimally touch other sections and separate them into projects
As for books, I would suggest looking at Head First Design Patterns by Freeman and Robson. This isn't explicitly about project design, but it is about designing reusable code modules which can really help in any language. It is aimed toward object-oriented languages, (the examples are in Java) but it has helped organize my code in other languages as well. If you're using the OO facilities of Matlab, VB and python this can be useful.
LarryMade2 3 hrs
I don't use either of those, but my suggestion is to look at well developed projects in the languages you use and check out their file structure.
For me, I have a folder for the project; within that, a library folder for shared code and libraries, one for the configuration files, changeable templates, etc., and another that contains the application folders (one is the main application, for login, user accounts, maint, etc.). The different apps I make then go into their own folders within the apps one. If you are using multiple languages you might want to have language subfolders as well.
Makes referencing the libraries and config info less crazy and segregates your apps for modularity.
Again, check some similar projects; your language(s) might have differing file structure needs.
Example:
/project
/project/config
/project/lib
/project/app
/project/app/main
/project/app/thisapp
/project/app/anotherapp
etc.
kasbah 7 hrs
I will reiterate some of what others have said:
- Use version control!
- Hire a professional software developer
- Good architecture and abstraction comes mostly from practice but if you really want to do some reading on software architecture check out "The Architecture of Open Source Applications" (vol 1 & vol 2)
http://www.aosabook.org/en/intro1.html
hodder 3 hrs
Version control will be on the list starting next year. Thanks! I'll also take a look at the link.
Unfortunately it isn't in our budget to hire a pro dev to the team, but I can attempt to increase the time we spend working with our corporate IT department, which has some excellent devs.
trengrj 9 hrs
Have a read of Clean Code and the Pragmatic Programmer.
I’ve seen this happen before in teams that code all day but don’t see themselves as developers. You need to instigate a cultural change at work where it becomes unacceptable to not apply modern development practices just because the team sees themselves as “quants” first.
purplerabbit 7 hrs
I would actually encourage OP to avoid Clean Code.
I found Code Complete to be a much better book. It encourages a deliberate, thoughtful, sustainable approach to the structure of your code, and provides good examples.
Clean Code encourages you to decouple everything as much as possible, which in my experience leads to poor abstractions. (See https://www.sandimetz.com/blog/2016/1/20/the-wrong-abstracti...)
sixothree 6 hrs
Second. When I apply a few of the simple concepts of Clean Code I end up with much better organized code.
mfrye0 4 hrs
I've found that a good rule of thumb is the "3 rule".
When you start using the same code in more than 3 places, put it in a shared lib type of file / folder.
When you start using the same code across 3 projects, put that into a "module" and manage independently in version control.
Depends on the language / framework you're using, but I've found that general rule pretty helpful in my experience.
jxramos 8 hrs
I've found that Matlab is good at prototyping but poor in structuring into large applications with complex physical structure. I'm sure things have changed over the years but the sort of folder oriented loading of modules from the working directory baked habits into people that are hard to break I imagine. Give Large Scale C++ Software Design by Lakos a read to learn about the issues involved in ramping up tall dependency chains. A lot of the book is naturally C++ oriented and thus focuses on impact to compile and link time, but the dependency concepts live on in other languages. Find a CASE tool that works for you https://en.wikipedia.org/wiki/Computer-aided_software_engine.... Visualizing codebases with auto generated diagrams from something like CppDepend or any of its cousins can enlighten a lot. But getting masterful at these subjects is really what Software Architecture is all about. There was a fella by the name of Juval Lowy peddling some architecture training/certification wizardry. Read up on Martin Fowler too, he has good subject matter in this area.
EDIT: But for your immediate Matlab concerns I remember some StackOverflow questions about structuring large matlab applications. I just learned from one that Matlab has this notion of packages I was previously unaware of: https://stackoverflow.com/questions/2748302/what-is-the-clos... https://stackoverflow.com/questions/27861304/managing-bigger... https://stackoverflow.com/questions/2326609/how-to-visualize...
You may search StackOverflow for "matlab packaging", "matlab organization", "matlab structure", etc. to get similar hits.
Adamantcheese 6 hrs
A lot of commenters have recommended reading books that lay out clean code structure and such, but if the problem is as bad as you say it is, I would recommend reading the somewhat satirical "How to write unmaintainable code". You should be able to find out what issues you do need to fix with that, and you'll be able to find the remedies in the recommended clean code books.
dyarosla 4 hrs
As the parent mentions, you're looking specifically for ways to write 'maintainable code'- that's the terminology you should be searching for, not well-structured code.
namuol 6 hrs
First and foremost: Start by using version control.
The rest is going to be up to you and your team.
Don't try to enforce a structure unless it actually reduces the likelihood of bugs and improves the experience of refactoring.
subwayclub 7 hrs
As a baseline recommendation try this: turn some of the apparently unfactorable pasta code into a large inline function and then start factoring from that point. Usually an opportunity will appear to realign the code in a new way that gets you something you didn't have before. Repeat a few times and you will end up with some original code abstractions that you never would have seen without a careful evolutionary process.
sounds 9 hrs
There are distinct schools of thought. "How to structure a program" is subjective but also divisive!
Here are the major ones (that I'm aware of), making an effort to be inclusive:
- Functional programming. It's possible I'm doing a "retcon" on several models here, but these are a fair start:
1.1. http://www.erlang.se/doc/programming_rules.shtml
1.2. https://www.seas.upenn.edu/~cis341/current/programming_style...
- C (the language) organization. Embedded, linux kernel, and many open source projects follow this pattern. O'Reilly books are a good start.
- Java. There are so many competing strategies that the "no true scotsman" fallacy applies. Here's one that seems uncontroversial: https://www.udemy.com/java-design-patterns-tutorial/
scarecrowbob 7 hrs
IMO, the two easy things:
If there are packages you're using from other systems that's often a very easy place to start with the first one... follow through all that code and see how it's organized.
With the second one, I'll say that it's really easy to put up with crappy workflows just because it is what you're used to. You have to not be willing to put up with the things you don't like in order to fix things. Use source control. Don't write functions that are 300 lines long. Think about the times that you have problems in your code and don't let those happen in the future.
hyperpallium 6 hrs
It's not technically difficult to structure a project. But it can be very difficult to well-structure a project. And it has a cost: to set up, to maintain, and an on-going cognitive cost (hopefully offset by the gains).
One way is to avoid duplication: e.g. if the same code appears in two different places, make it into a separate function, and call it from those places.
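A minimal sketch of that de-duplication step, written in Python rather than Matlab. The function names and the volatility calculation are hypothetical, chosen only to illustrate two callers sharing one routine:

```python
# Hypothetical example: names and the calculation are illustrative only.

def annualized_vol(daily_returns, trading_days=252):
    """Sample standard deviation of daily returns, scaled to a year."""
    n = len(daily_returns)
    if n < 2:
        raise ValueError("need at least two returns")
    mean = sum(daily_returns) / n
    variance = sum((r - mean) ** 2 for r in daily_returns) / (n - 1)
    return (variance * trading_days) ** 0.5


def equity_report(returns):
    # Before the refactor this function carried its own copy of the math.
    return {"asset": "equity", "vol": annualized_vol(returns)}


def bond_report(returns):
    # Both callers now share one implementation, so a fix lands in one place.
    return {"asset": "bond", "vol": annualized_vol(returns)}
```

Once the shared function exists, it is also the natural place to attach a first test.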
For general organization, unless you have enough experience to know how your code is likely to evolve in future, your organizational strategy may quickly become unsuitable. For the details of Quant Analysis, this may be particularly true. You may need a rougher, broader organization scheme (such as display as opposed to analysis), or a framework that provides the common support code needed by all analysis routines, so the analysis becomes less cluttered. Apply this even when the support code is only used by one analysis - so that its parts appear in different places.
But here there's a cognitive cost: that analysis may be harder to understand with its parts in different places.
The version control and testing seem simpler to start with. You could start using git right away; it is possible to muck things up - this can usually be sorted out, but there's a learning curve. But for such key code, it's probably another reason to hire an experienced dev, as others have said.
alook 9 hrs
Look at some good open source projects, and look at the overall project/module structure as well as the OO graph.
Re: high-level project layout:
- where are their build files?
- how are the source packages broken up?
- where are the tests stored?
Trace back from the entry point to the code and look at how things are encapsulated, where code gets re-used.
Diagramming out with something like UML is cumbersome but can be helpful to visualize the structure as you're getting started.
Looking at software design patterns (while often overkill) can give you a good sense for some common ways people try to re-use code. For example: - https://en.wikipedia.org/wiki/Design_Patterns#Patterns_by_Ty...
rhoursour 8 hrs
A good place to start would be this repo that outlines some popular folder structure conventions: https://github.com/kriasoft/Folder-Structure-Conventions
dizzystar 5 hrs
Accept one simple fact:
Your first attempt is going to suck. How bad is a shade of gray, but it'll suck.
Accept this fact:
Your second attempt is going to suck. How bad is a shade of gray, but it'll suck.
And accept another fact:
Your third attempt is going to suck. How bad is a shade of gray, but it'll suck.
At some point, your attempts start to suck less.
brandonmenc 8 hrs
SICP
I read it in my late teens, and it's what made things finally click re: structuring programs. (This was back in the 90s, when it was far more difficult to inspect other well-structured software that wasn't also low-level C code.)
https://en.wikipedia.org/wiki/Structure_and_Interpretation_o...
mooneater 8 hrs
Familiarity with design patterns in general can help with structuring systems.
Look at structural and behavioral patterns on https://en.m.wikipedia.org/wiki/Software_design_pattern
Facade, strategy , adapter, etc.
Also think about "levelization"; John Lakos explains that clearly.
mattbgates 8 hrs
I keep writing small programs and each time, I'm finding more efficient ways to write them or learning new things. As I develop new web apps, it's like my mind has evolved to say, "I can do this more efficiently and with less code."
Definitely have come a long way from the first web app I wrote. Can't say it was without a lot of writing echo 'test'; to see exactly where the error was. Of course, most frameworks tell you exactly where the error is now, but it still helps with testing and debugging.
I do a lot more brainstorming in the beginning, designing of the web app, and then actual coding. It's really just practice. Learning how to keep your code organized and understandable for yourself. Write as if someone is looking over your code and isn't as smart as you :P
hoodoof 7 hrs
Do searches that specifically address what you are trying to do and learn from the resulting blog posts....
https://www.google.com.au/search?q=how+to+structure+matlab+c...
Each time you encounter a new concept, search for that too.
Search for "beautiful matlab code"
Learn from others. Aggressively search for open source matlab code bases and read them.. look at their structure and try to understand why they are doing things the way they do... email the authors and ask them questions.
snarf21 9 hrs
It wouldn't hurt to create some diagrams describing your current process. Then try to identify what parts are dependent on other parts. Also, try to encapsulate some of the pieces so you can separate concerns. This should start you down the path...
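One way to make that dependency diagram concrete is to write it down as data and let the computer derive a valid run order. This is only a sketch in Python: the step names are hypothetical stand-ins for the routines of an overnight process, and `graphlib` is the standard-library module available from Python 3.9.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical steps: each key maps to the set of steps it depends on.
dependencies = {
    "load_prices": set(),
    "load_positions": set(),
    "compute_returns": {"load_prices"},
    "compute_risk": {"compute_returns", "load_positions"},
    "write_report": {"compute_risk"},
}

# static_order() yields every step after all of its dependencies.
# It raises graphlib.CycleError if the steps depend on each other in a loop.
order = list(TopologicalSorter(dependencies).static_order())
print(order)
```

Steps with no path between them (here the two loaders) are candidates for separate modules, since neither needs to know about the other.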
cirgue 7 hrs
I work on a machine learning team that has gone through the exact same process. As others have noted, the biggest change for us was cultural.
If you send me a message I would be happy to chat about it.
hodder 3 hrs
Thanks. I'd be interested in how you managed to change the culture? We have a 4 person team and most consider themselves quants/finance people as opposed to programmers. None of us have CS degrees, but are trying to learn and do the best we can.
bjourne 7 hrs
Practice, practice, practice, practice, practice, practice ... Reading a book to learn modularisation is like reading a book to learn how to play piano.
mabynogy 7 hrs
How do you learn to structure a large program?
You can try to organize your stuff by:
- layers (time)
- functional units (what it does)
- subsystems (parts assigned to people)
Over that, you can write a state machine to sequence the jobs, add tracing and isolate parameters.
The idea is to group, organize and classify iteratively.
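A rough Python sketch of the job sequencing idea above, with tracing and parameters kept in one place. Everything here is an assumption made for illustration: the state names, the `Params` fields, and the `run_jobs` helper do not come from any particular framework.

```python
import logging
from dataclasses import dataclass
from enum import Enum

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
log = logging.getLogger("overnight")


class State(Enum):
    PENDING = "pending"
    RUNNING = "running"
    DONE = "done"
    FAILED = "failed"


@dataclass(frozen=True)
class Params:
    # Hypothetical parameters, isolated here instead of scattered in the code.
    as_of: str = "2017-11-02"
    lookback_days: int = 250


def run_jobs(jobs, params):
    """Run (name, function) pairs in order; stop at the first failure."""
    states = {name: State.PENDING for name, _ in jobs}
    for name, func in jobs:
        states[name] = State.RUNNING
        log.info("start %s", name)
        try:
            func(params)
        except Exception:
            states[name] = State.FAILED
            log.exception("failed %s", name)  # trace includes the stack
            break
        states[name] = State.DONE
        log.info("done %s", name)
    return states
```

The returned dictionary shows at a glance which job failed and which never started, which is most of what you want to know the morning after a broken run.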
Your project looks interesting.
Feel free to reach me @gmail.
auganov 6 hrs
First, I'd try to identify how the status quo affects your bottom line. How will you determine if the rewrite/refactor is being successful?
Be wary of the glamour of "doing things the right way". Don't want to get into a new mess.
andrewf 3 hrs
A tangent: Places I've worked with no source control and "we'll fix it live" often lack reliable backups. A failed hard drive was an existential business risk.
I recommend making sure you have a disaster recovery plan in place. Essentially, do you have up-to-date and reliable copies of things off site, and a documented way (which you've tested!) to get your stuff back up and running using only those copies.
tboyd47 5 hrs
Start using Git right away. The time dimension of code is the most useful to organize.
Then learn and use some object oriented programming. Don't go too deep- just enough.
Then organize your code into files and folders based on the class relationships that start emerging. It's really that easy.
One more thing: don't read any books about structuring code or go by any blog posts. They will only teach you someone's opinion. Code structure is a team decision. Make an effort, even a small one, with clear advantages, and your work will be appreciated. Overload your team with demands based on something you read somewhere and they will resist.
EpicEng 5 hrs
It's really that easy
It's really not. Neither git, OOP, nor nice folder structures are a panacea (nothing is, of course). I've seen beautiful C in Source Safe and I've seen awful, overly engineered class happy crap in git repos.
Writing good, non-trivial systems requires experience and skills in many areas. No design pattern or source control system will save you from yourself if you're just not at that level yet.
callmeed 7 hrs
Read Clean Code and Clean Architecture
stephenbez 8 hrs
I suggest reading the book "Clean Code". I've worked with traders/business analysts who do some coding but it isn't their full time job and their code quality isn't great.
After having them read the book and then following up with code reviews from a senior developer they improved their code significantly in just a few months.
"Refactoring" is also a good book, but a lot of time it is too advanced for people who copy and paste code all over and don't use functions, etc.
turc1656 6 hrs
Sounds like you and I are in similar situations. I work in finance and am somewhat of a "DevOps" type of person, although much less so than I used to be in my previous company. That being said, I wrote 2-3 monsters in the past few years - all were projects that I worked on solely from scratch - no outside involvement. The key for me personally was twofold:
1) Owning the project solely, in its entirety, from start to finish. This gives you the sense of responsibility that motivates you when you hit a roadblock or discover a major hurdle or bug in your code.
2) Creating large blocks of time to work on it. I found that when I tried to look at the code for 30-60 minutes at a time and get something done, that was the worst work I did and frequently had bugs or needed to be restructured entirely. The only way this didn't happen was when I was working frequently on the project. If I only had, for example, 5 hours in a given week to work on it, it was better to just not work on it unless those 5 hours could be sequential. It is hard to jump in to something, then think back to how it might affect something you wrote 2 weeks ago and then have to refer to that code. You'll spend more time just getting back up to speed than actually getting anything done.
For #1, this may or may not be possible for you. Not sure what the rules are about oversight and how well defined the roles are. But I found it extraordinarily helpful to involve absolutely no one. That also gave me a bird's eye view of everything and when you can recall how you structured a function/method/library that is being called, it's easy to see the pitfalls of how you will encounter a bug or hit your error capture section of your code. If you can't do #1 fully, then section off whatever you can. Clearly delineate what you are responsible for and what someone else is responsible for. For me, I hit limits with other systems that I needed to interact with within the company. So the maintainers of those systems were responsible for their system and only their system and I did my end separately.
For #2, this is usually very difficult to do. For the extremely large hurdles I needed absolute silence and huge blocks of time. This meant there were several weekends that I spent in the office, from 9 AM to 10 PM doing nothing but cranking out code with only food breaks. It worked for me, though. But it sucks big time for obvious reasons. Avoid that scenario at all costs if you can. Try to get those dedicated blocks at the office for complex coding tasks.
I would add a third critical element - that being to plan everything out in advance so that you can make sure what you have in mind actually works well. When you step through everything logically and see it all at once, you can identify problems, unforeseen situations, edge cases, and bottlenecks that will make you pull your hair out later on. Since this code already exists, I would instead modify this to be that you should plan out whatever solution you are going to pursue in full before you write even a line of code. Wait until you have a full game plan.
hodder 3 hrs
Thanks for the advice.
1) We can likely section off large components of code within the group and start refactoring/rewriting from there.
2) This can be tough on a trading floor but I do prefer to code in large time blocks as well.
Thanks again.
wavefunction 9 hrs
Lots of mistakes, and working with smart people who were kind enough to point out the deficiencies in my early and crude approaches, reading many articles and blog posts on every sort of programming topic, and finally and most directly important was reading through codebases at places I worked or open-source projects in my early years and learning from what other people were doing more correctly.
kafkaesq 8 hrs
Working with smart people who were kind enough to point out the deficiencies in my early and crude approaches
This point cannot be overemphasized.
There's no end of stuff you can read online about better practices these days. But those experiences I had working with people who, without so much taking "pity" on me, took me seriously, despite my obvious naivete in certain areas - I'm not sure whether I would have gotten anywhere at all without them.
TheAceOfHearts 7 hrs
As other commenters have said, I think you should consider hiring a software engineer. It's easy to get overwhelmed by tools and minor details.
It's difficult to give you advice, since you've asked such a general question. I've written some thoughts on the matter, but take it all with a grain of salt. At best, my comments are spherical cows [0].
There's nothing wrong with having everything in a single folder; it really depends on the project. If you find it difficult to navigate your way around the source, I'd say there's two high-level approaches for organizing application code: you either group things by feature or by functionality. For example, many traditional web frameworks group source by functionality, so you have a folder for all models, controllers, views, configs, etc.
When structuring application components, you'll typically want to separate things into pure modules and anything which produces side-effects. Pure modules are typically easier to test and debug. If a module depends on anything, consider passing in any parameters during startup or when calling the module, rather than having it look things up on its own. That's the basic idea behind dependency injection.
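A small Python sketch of that separation, under assumed names: `weighted_average` is the pure part, `read_rows` does the I/O, and the input stream is passed in rather than opened inside the function. The CSV column names are hypothetical.

```python
import csv
import io


def weighted_average(values, weights):
    """Pure: the result depends only on the arguments."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


def read_rows(stream):
    """Side effect (I/O) kept at the edge; the caller supplies the stream."""
    return [(float(r["value"]), float(r["weight"])) for r in csv.DictReader(stream)]


def report(stream):
    rows = read_rows(stream)
    values = [v for v, _ in rows]
    weights = [w for _, w in rows]
    return weighted_average(values, weights)


# Production code might pass an open file; a test can pass an in-memory
# stream instead, because the dependency is injected rather than looked up.
fake_file = io.StringIO("value,weight\n10,1\n20,3\n")
print(report(fake_file))  # prints 17.5
```

The pure function needs no files, databases, or setup to test, which is why pushing the side effects outward tends to pay off quickly.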
Git is pretty much the standard for version control. You can get pretty far by just memorizing a few commands [1] and throwing up your code on Bitbucket, GitHub, or GitLab. To make life easier on yourself and your teammates, I'd suggest using a GUI tool like SourceTree. It makes it much harder to shoot yourself in the foot, while providing a far more approachable and discoverable interface. I'm not the biggest fan of git, but since it's what everyone else uses, you pretty much have to suck it up.
Since I'm not familiar with the ecosystem, I can't give much advice with regards to testing. The closest thing to a standard testing interface that I'm aware of is TAP [2]. I'd highly suggest checking out "An introduction to property based testing" [3]; it's one of the most insightful testing-related talks I've seen.
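To give a flavour of property-based testing, here is a hand-rolled Python sketch that uses only the standard library. The `normalize` function and the chosen properties are hypothetical; the point is that the test states rules that must hold for any input and then checks them against many random inputs.

```python
import math
import random


def normalize(weights):
    """Scale positive weights so they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]


def check_normalize_properties(trials=200, seed=0):
    rng = random.Random(seed)  # fixed seed so a failure can be reproduced
    for _ in range(trials):
        weights = [rng.uniform(0.01, 100.0) for _ in range(rng.randint(1, 20))]
        result = normalize(weights)
        # Properties that should hold for every valid input:
        assert len(result) == len(weights)
        assert math.isclose(sum(result), 1.0, rel_tol=1e-9)
        assert all(r > 0 for r in result)
    return trials
```

In Python, the Hypothesis library automates the input generation and shrinks failing cases to small examples; the loop above is only meant to show the idea.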
[0] https://en.wikipedia.org/wiki/Spherical_cow
[1] https://xkcd.com/1597/
[2] https://en.wikipedia.org/wiki/Test_Anything_Protocol
[3] https://fsharpforfunandprofit.com/pbt/
pg_bot 8 hrs
IMO poor structure of a codebase is a sign that there is a culture problem. You acknowledge that there is an issue but have dedicated no resources to solve it. The way to fix sloppiness is to break your bad habits for new code, and then fix the issues with the old code over time.
My recommendation would be to begin adding tests to your code, keep file sizes manageable (under 120 lines), and keep functions under 7 lines. In general just be a professional instead of running around trying to fight fires all the time. Luckily 15k lines of code is a pretty small project so you aren't completely screwed.