Kyle Evans is a Senior Site Reliability Engineer (SRE) for Lumin Digital. He works with Payam Nael, an SRE team manager. The job they do is both demanding and rewarding, though largely invisible to users. Payam is quick to remind us that when they do their best work, the users don’t know they exist. He is proud of the 100% retention rate on his mostly remote team, a feat that requires close attention to building a sense of community.
Kyle spends much of his time innovating and reinforcing reliability so the user environment is always up, always available to serve Lumin Digital users. He is inspired by working with such a high-functioning team and takes pride in finding solutions to pain points.
Lumin Digital offers a cloud-native platform that levels the playing field for banks and credit unions in the digital era. We sat down with Evans and Nael to understand more about the people behind the curtain serving Lumin Digital’s financial institutions.
The Life of SREs
Q: How would you summarize what SREs do?
A: Payam: Our team is made up of the Lumin Digital site reliability engineers. I could say that we are the heartbeat of the organization and that we keep everything running. And I’m not saying that to be boastful, I’m saying that because that’s essentially what we do. We make sure that all of our systems are reliable and scalable, meaning that we meet our user demands in terms of availability and we usually exceed their expectations.
Kyle: Yeah, I’d say that’s all very accurate. Lumin creates online banking software. If you think of the phrase “software as a service,” our SRE team does everything related to the “as a service” part of the engineering.
We also create tools not just for the user but also for other teams at Lumin to configure the product and set up options for the way that it works, as well as some tooling to help with our internal processes.
Q: What do you look forward to at work each day?
A: Kyle: Well, for me, I frequently have the opportunity to work on a process or a problem that’s causing pain for either the user or the organization itself. Once we’ve developed a solution to that, I like it when the end-user of that solution gets to try it out. You can see relief from whatever pains they were experiencing before and delight at being able to use better tooling or a better process.
Payam: I think everything we work on is very impactful. We make a big difference for our users as well as the people who work with us. Every day, the problems are new and different. But probably what I look forward to the most is working on our team.
I’ve never worked on a team that is so highly skilled and high-functioning. I say that to Kyle all the time. They attack problems in a way that I’ve never seen before. Everyone on the team is very capable and things always seem to come to resolution in a creative and innovative way with this team.
The Lumin Digital Difference
Q: What differentiates Lumin’s products and services?
A: Payam: All of our employees at the company are empowered to take charge of their work and make it the best it can be. And that has translated to us having the best product. The employees at Lumin have direct knowledge of what it takes to build a good product and they do what they do very effectively.
On my team, we take bold, innovative steps, and we don’t punish people for making mistakes or breaking something, because you can’t advance without breaking things sometimes. We treat it as a learning experience.
Kyle: Our users love the user experience of our digital banking product. It’s beautiful and intuitive to use. Lumin has very intentionally tried to take advantage of advances in software engineering by using a modern software and infrastructure stack. As a result, we’re able to develop new features and improvements and deploy them very quickly versus our competitors, where it sometimes takes months.
Q: What does all of this mean for credit unions?
A: Kyle: Our CEO talks about the value-added for credit unions a lot. For example, JPMorgan Chase will spend billions on their digital banking platform. Credit unions are much smaller and typically don’t have deep pockets. That’s why they turn to us.
The existential crisis in the credit union space is that it’s easy for their users to turn to a big bank that has so many more development resources. They must be able to compete with that.
Q: I would imagine that the same is true for some smaller banks, as well?
A: Kyle: Absolutely. They face the same types of competitive pressures and resource constraints.
Fixing the 520 Error
Q: How did the team discover the 520 problem?
A: Kyle: Essentially, we have an engineer on our team who leads a weekly load testing meeting.
Payam: Just to clarify, the weekly load test checks to ensure that the system can handle, say, 100,000 users logging into the system all at once.
Kyle: And from that meeting, they found that a portion of our user traffic was receiving a 520 error response, which is essentially a web page with a cryptic message. So we started monitoring to determine what was triggering those error messages.
What we found was that there were two main sources of the error. When the user first enters our site, they go through a service called Cloudflare, a vendor we use to protect against bot and DDoS attacks. After that, a load balancer distributes the user traffic to a bunch of different servers on the back end. And that’s how we’re able to handle so many users.
During times of increased user activity, our servers scale up, and during times of decreased activity, they scale down. So the question was, why wasn’t it working? We found that the 520 error primarily occurred when servers were being added or removed in response to a change in the number of users. To solve the problem, we wrote a program that runs on any server being added or removed to make the migration of user sessions seamless. We were able to reduce the errors that users saw without any impact on performance.
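The interview does not describe how that program works internally. As a rough illustration of the general idea only, the sketch below shows one common way to make a scale-down less disruptive: a server that is about to be removed stops accepting new requests and waits for in-flight ones to finish before it shuts down. The class `DrainingServer` and its methods are hypothetical names invented for this example and are not Lumin Digital’s code.

```python
import threading


class DrainingServer:
    """Toy model of a backend server that drains before removal.

    Hypothetical illustration only; not the program described in the interview.
    """

    def __init__(self, name):
        self.name = name
        self.accepting = True   # False once the server starts draining
        self.active = 0         # number of in-flight requests
        self._cond = threading.Condition()

    def start_request(self):
        """Admit a request unless the server is draining."""
        with self._cond:
            if not self.accepting:
                return False    # caller should route to another server
            self.active += 1
            return True

    def finish_request(self):
        """Mark one in-flight request as done and wake any waiting drain."""
        with self._cond:
            self.active -= 1
            if self.active == 0:
                self._cond.notify_all()

    def drain(self, timeout=5.0):
        """Stop taking new work, then wait for in-flight requests to finish.

        Returns True if the server went idle before the timeout.
        """
        with self._cond:
            self.accepting = False
            return self._cond.wait_for(lambda: self.active == 0, timeout=timeout)
```

In this sketch, a scaling hook would call `drain()` before removing the server, so requests already in progress complete normally while new ones go elsewhere.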
Payam: The biggest thing I’d add is that if my team of 13 is doing a good job, the user doesn’t know we exist. We have achieved an average of 99.98% uptime for our users. So they rarely notice our activities. In the recent past, we’ve done several upgrades, and an upgrade can turn out to be a huge downtime event. I’ve seen it in other places, and it’s very disruptive, as in an “I can’t get to my bank account” situation, sometimes for weeks. We did those upgrades without a hitch.
This is one of those situations that just stuck out and I was impressed that Kyle was able to resolve the 520 issue and not have users stuck on an error page not knowing what’s happening or why.
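To put the 99.98% uptime figure Payam mentions in perspective, it can be converted into a downtime allowance. The interview does not say over what period the average is measured, so the 365-day and 30-day windows below are assumptions, and the function name `downtime_budget_minutes` is made up for this example.

```python
def downtime_budget_minutes(uptime_percent, period_days):
    """Return the minutes of downtime implied by an uptime percentage."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (100 - uptime_percent) / 100


# Assumed measurement windows; the interview does not specify one.
yearly = downtime_budget_minutes(99.98, 365)   # about 105 minutes
monthly = downtime_budget_minutes(99.98, 30)   # about 8.6 minutes
print(f"{yearly:.1f} minutes per year, {monthly:.1f} minutes per 30 days")
```

Under those assumptions, 99.98% uptime works out to under two hours of unavailability across an entire year.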
The Culture at Lumin Digital
Q: You both talk about what you do with so much passion. What keeps you engaged?
A: Kyle: It’s just really amazing how competent everyone is at Lumin, how willing they are to pitch in. Everyone is excellent at troubleshooting, handling pressure, and dealing with ambiguity. When we come together and work on a problem, we’re more than the sum of our parts.
Payam: Yeah, for me, I guess I’ll say it this way: if my team went away and did nothing more, our users would be completely happy with what they have right now. But every day my team comes up with something new (a new feature, a new gadget, a new improvement) that makes the experience even better.
The Lumin Digital SREs work behind the scenes to ensure that the company’s digital platform provides the best experience for users. With access 24/7, 365 days a year, financial institutions of any size can keep up with the big banks. Learn how today!
Pamela Michaels Fay is a business, financial, technology, legal and lifestyle writer, whose work is informed by over 20 years of strategy, leadership and organizational development consulting for Fortune 500 companies.