Oh, how foolish, yet direly uninspired I was but just 10 days ago.
A boy wandering his way through online bazaars of components, not knowing what in the world would do him right. Dreams of Epyc, the legacy of Xeon. A war of strangely-named Lakes, editions, and socket compatibility in a whirlwind of availability and compromise.
Never was anything straight-forward. That is the way of the home built server, and it’s a journey. If you’re anything like me - selecting the parts alone is a saga of disappointment and anguish, exhilaration and anger. I can research specific information with the best of them, dig deep into product sheets… but there are so many different variables to bring together. RDIMM vs LRDIMM? Oh you BETCHA we’re gonna learn about that! Selecting storage capacity - who really are you, what do you care about, do you care about things, how much will you pay for them? Let’s find out!
And why not toss in learning all about ZFS and Proxmox from scratch, all on my own! I have learned a completely new philosophy of managing a hypervisor going into this. I had a lot of insane and idiotic ideas at the start. But look at me now - a dumby with his very own glorious pile of iron and silicon to torture!
And I’m very proud of what I have built. In this post I’m gonna cover my purpose for building a server, shopping guide, and the physical build of the server itself. It’s taken me a while to get around to this point because even this much has changed drastically right up until this very day.
A home server. Why do I want it? What do I hope to accomplish?
I love simulation. I am not satisfied to just read about or demo something, I want to BE it. The ultimate entertainment and learning experience for me is doing. If I want to learn how to secure a particular type of environment, I want to walk its grounds with my own 2 feet. Since those opportunities are limited, the next best thing is building simulations of those environments to explore.
The mission of my home laboratory is emulating enterprise environments. I want purpose, definition, and a philosophy for what exists and what it does. That means scale, maturity, and breadth. I want to do it real and right, the same as if I was building something professionally for work. I write documentation, build procedures, and consider projects end-to-end. I don’t just think about throwing together a Windows red team environment with a domain controller, a couple victim machines, and an attacker VM to run through exercises with.
I want a small business environment with processes and work happening. I want to simulate an attack, but I can’t do that before I’ve built and maintained a complete IT environment with the complex and dynamic needs that really drive decision making in the real world. I want to have the Excel spreadsheets of passwords, machines that slipped from asset management and are out-of-date, and a mile-long to-do list of security enhancements we all have that we hope to get around to. That’s the experience in real life, and what I’m simulating here.
To that end, I want to really understand this technology. I want to feel confident and competent as a Linux, network, and virtualization engineer so that when I’m working with real experts in these domains I have the appropriate level of awareness, and empathy with the difficulties and challenges of properly building and securing complex environments.
Beyond simulating laboratory environments, this is going to be a true production server. I’ve built it with extreme performance needs in mind that I intend to fully tweak to and take advantage of. It has redundancy, and enterprise-grade parts intended for real work. I’m going to properly manage my home network as if it was an enterprise-level network with advanced capabilities that reflect that.
It’s going to be my core home infrastructure, as well as my laboratory. It will manage all of my networking, and run any of my home services as I migrate away from the cloud. I want it to stream media, act as a storage server, be my router, security monitoring stack, and also cover home automation.
This is a direct extension of my previous home lab environment that I called DumbyLand which I built on a Supermicro micro appliance server - which I still have and love! I learned everything I know about KVM with that, and am going into this project with a much more mature mindset of how to architect this system thanks to that experience.
To expand my home serving capability, storage was the biggest blocker I had. Building a NAS was a necessity, especially if I wanted to migrate off of cloud services and start to centralize and own all of my data. By the time you spec out a NAS, you may realize it’s only about 25% more of the cost to build an amazing server… so that’s what led me down the route of building an “all-in-one” - storage, virtualization, network appliance, and home server.
I am a cloud cynic. I think it unlocks wonderful capabilities for organizations, and is a necessary part of the modern enterprise toolkit. It is no panacea to IT needs, however. If you intend to fully utilize the cloud, it is a true dedicated discipline that you need to fully engrain into the fold of your IT. Half-assing it only leads to heartburn and sad pandas.
Scalability, and distribution are massively beneficial traits to have for a distributed mobile application, or a service that has extremely volatile use trends. But for your standard IT infrastructure the management headache of learning to properly scope, architect, and configure a cloud with all of its individual services is a massive undertaking. In my view, if this is your goal and you haven’t hired explicit experts in the particular cloud environment you’re onboarding to… you’re doomed.
There is an endless number of unique scenarios and gotchas in every IT environment, and cloud is no different. Knowing how to manage AWS costs is a complete discipline. AWS IAM is unique compared to Azure IAM, and especially to Active Directory and other privilege management.
Cloud is expensive. While you may have a ton of flexibility over what to spin up, and can microtune to your needs, knowing how to do so effectively has been a headache in my (admittedly limited) exposure to cloud environments. I haven’t found a single cloud training/development program that gives you a decent amount of flexibility in spinning up resources to experiment with. And god forbid you forget, or accidentally screw up the way in which you suspend/terminate something and end up with unintended costs.
Many CTFers have stories of forgetting a cloud GPU instance only to remember when they receive a bill with a comma a couple months later!
While network bandwidth is a huge benefit of cloud, my primary focus is serving myself and my home. My laboratory environment will be almost entirely self-contained and won’t have many external WAN requirements or concerns. If I wanted to host a public service that many people would consume, I’d have some problems that cloud would certainly help solve. I have gigabit at home that works very reliably, so I’ll learn how to live with that.
Bandwidth is also expensive as hell! Media streaming HD content through AWS would cost a ton. Transferring data to/from devices, especially for experimentation with large datasets of log/security data or whatever as you experiment might suddenly add up to costing a couple hundred bucks a month!
I calculated the cost of renting a dedicated server with IBM for similar specs - 256GB of RAM, decent CPU, etc. and it would have cost something like $1500 a month WITHOUT the storage!
Total, I ended up spending just over $4,000 on this build with taxes/shipping. For some context, $2400 of that alone is just memory/storage, which you would still end up spending if you only got a basic NAS. A nice chassis on top of that would be $400, and if you wanted one capable of running a few containers as well, more like $800. I was also looking at doing a dedicated physical router that would have run me about $400 - but I’ve replicated that and more with my virtual network infrastructure. So, realistically I ended up splurging about $1000 more than a NAS to give myself a massive playground of opportunity, and ability to do damn near anything I want. For another $1200 I can max out the memory to 512GB if I decide that I want to, and that will drop in time as DDR4 becomes cheaper.
Eventually, I also see myself getting a dedicated NAS purely for a massive lake of storage and backup purposes, but I’ll only need that when I get to the point of really pushing this build to its limits. Until then it will have PLENTY of storage to get me started on that self-hosted life.
Final word here on the purpose of this build is self-hosted. The opportunity to build an amazing amount of infrastructure using enterprise-grade open-source software is truly mind-blowing. You can replicate damn near any of the best cloud services with competing projects, and assuming you don’t need huge diversity/availability like serving a product to a customer base, a home server is just fine for all of that!
Have a website or online service I find myself using, or even paying for? May as well host it myself! Want to unlock capabilities in more legitimately managing my network? There’s a ton of tools for that! Security monitoring? You bet. Want to get rid of Google Drive? Done. URL shortener, pasteboard, note sharing, privacy VPN, home automation? So much good stuff, and no more temptation to buy a Raspberry Pi for it!
Why name it Matrix? Because it is my own self-contained virtual universe. It contains the ability to run a full reality of IT services within itself, with no dependence on any external factors besides Internet.
Here is where the adventure got real. I’ve desired a beefy home server for a while, but never really dug into the dream of actually putting a full build plan together. It was only after I had done this for about a week, with multiple days of research, that I actually became dedicated to building the server.
It’s really not an easy task in the modern day to do so. There are many compromises you have to make in market availability of parts, what is hot, what is soon coming. I’ve looked at Epyc 2 CPUs quite a few times and was lightly familiar with the product line, but when I started looking, I chatted with a few folks on Twitter after realizing that availability of the newest stuff was non-existent - like quite literally, you can’t realistically find modern-gen server CPUs at a price affordable to a regular ol’ consumer like me.
What ultimately guided my build falling into place was Amazon, which ended up being my primary spot for components. As much as I want to cut off my support of them, you’ll find how critical it was to this build. I ended up finding a pretty killer deal on at least a contemporary CPU, which narrowed where I was looking elsewhere.
Let’s go a bit over CPUs. There’s many aspects you have to take into consideration when selecting one. First is the basics - core count, hyperthreading support, core frequency. The amount and types of memory that it will support. What socket it is will determine what motherboard it can fit into. Cache size is a huge consideration for performance, I think especially when it comes to virtualization when the CPU is constantly juggling a huge variety of different tasks - the more cache, the better. Going into this I had no clue what would be like “OMG that’s a lot of cache” compared to “oh wow that’s barely anything”. Let’s go over the information I covered in my learning and selection process.
The Intel Xeon E5 v4 family of CPUs is mid-range enterprise-level hardware with support for advanced features, and solid specs on the Broadwell architecture, which came out in 2016. Modern-gen architectures are in the range of Kaby Lake, Skylake, Coffee Lake, etc. and while it would have been cool to experiment with the more cutting-edge stuff, Broadwell is tried and true, and easily available. I originally got an E5-2620 v4 because I have never really been limited on CPU usage, but I wanted a half-decent amount of L3 cache and such. That had 20MB of cache which would have been pretty solid.
To get a better idea of the market, we’ll take for example a Xeon Silver 4210 available from Newegg. This is a $540 Cascade Lake (released in 2019) architecture server CPU. The MSRP is $501.00 - $511.00 so there’s already inflation there. Taking a closer look, it has a 13.75MB L3 Cache, and 2.2GHz frequency. It uses an LGA 3647 socket.
Maybe $540 is a bit rich for your blood. A more modern-gen Xeon E3 v6 is a Kaby Lake CPU released in 2017 that costs $306. It is 4-core with hyperthreading so 8 threads (4c/8t). Alright so maybe you don’t really need a lot of thread support for your build. It has 8MB of cache, and that doesn’t sound terrible if you’re building a server that has a far lighter load. It has DDR4 support, and supports ECC. Cool! Maybe you’re just building a storage server with a ZFS pool that will serve a few VMs and containers… but then you look at the datasheet and realize it supports a max of 64GB RAM, which pretty quickly kills many modern server builds where I say 128GB is the minimum I would want to be able to scale up to. We’ll cover more of this info later, but between ZFS and the flexibility of being able to freely allocate memory to VMs and containers, the extra headroom is so worth it.
Now let’s take a closer look at the Xeon E5-2620 v4 I initially ended up with. It cost $289 refurbished on Amazon, with Prime delivery. It has 8 physical cores, and hyperthreading which results in 16 threads. This is quite a bit of headroom for spreading out processing tasks, as well as perhaps some opportunity for dedicating cores to specific tasks later on. It has 20MB of L3 cache which is far more than a lot of the other server CPUs I looked at. It has a 2.10 GHz base frequency, and can turbo up to 3.0 GHz, which I think is plenty of runway in processing power. I’ve never really pushed CPU utilization high with my server loads before so I wouldn’t expect frequency to matter a whole lot as long as I have threads to spread load on, with lots of cache and memory available. It supports enterprise-grade levels of memory with 1.5TB max of DDR4. Initially, I would have been just fine with a quad-core so getting 8 cores was awesome. It uses an LGA2011-3 socket which was extremely widely used, and should be well-supported… but we’ll find that isn’t really the case.
Another huge consideration here is availability. I won’t lie - once I got my heart set on this, I was anxious as hell to get it in my hands, and was willing to compromise on a lot just to make sure I could get it soon. Once everything started to look like I would very lightly compromise, and get it all perfectly timed to arrive within 7 days, it all fell into place. After landing on a CPU, I could shop for motherboard and memory. After those 3, the rest is pretty straight-forward and doesn’t really impact the other components much.
I felt like I had done a pretty good job shopping around by this point, and was stoked on the midway range of things I had ended up with.
However! Fantastic budget power landed in my lap. When chatting about the specs, a friend, who is also an asshole, was like “why wouldn’t you get an actually cool CPU, you loser?”. And so I landed back on Amazon, just by chance taking a poke around. Then I found an E5-2650 v4 that would arrive in time with all the other parts… and it was WAY cheaper at $235! Wow! 12-core, 24 threads with 2.2GHz frequency. It has an MSRP of $1100… this is no lightweight chip! Because I was shopping in the same v4 family of the Xeon E5 CPUs, they were all LGA2011-3 socket so the same motherboard would work regardless.
As luck would have it, that CPU was marked as delivered the same day I got the rest of my parts. Except I never got it. I still have a claim open with FedEx that hasn’t been responded to yet because the package was marked “left at dock”, and signed by someone whose name I certainly don’t know. At my apartment complex, we have an office manager that is there through the whole day, there are package lockers that delivery people can use, and they’re allowed direct access to floors to drop off packages. It’s not consistent what they choose to do, but I’ve never lost a package here, and it’s been secure so far. I’m 100% certain that it was stolen/lost by the delivery person, so I’ve gotta figure that out.
However! ULTIMATE budget power then landed in my lap. This extended my final hardware plans out quite a while, but it was 100% worth it - I found a Xeon E5-2683 v4 CPU on Amazon for $230! WHAT! This chip’s MSRP is $1850. It is 16-core, with 32 threads at 2.10 GHz. It has 40 (forty!) MB of L3 cache.
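To put those three E5 v4 chips side by side, here’s a minimal Python sketch of the price-per-thread arithmetic. It only uses the prices and thread counts quoted above; the dictionary layout and the price_per_thread name are just made up for illustration.

```python
# Prices and thread counts as quoted in this post (Amazon listings at the time).
cpus = {
    "E5-2620 v4": {"price": 289, "threads": 16},
    "E5-2650 v4": {"price": 235, "threads": 24},
    "E5-2683 v4": {"price": 230, "threads": 32},
}


def price_per_thread(price, threads):
    """Dollars paid per hardware thread."""
    return price / threads


for name, spec in cpus.items():
    cost = price_per_thread(spec["price"], spec["threads"])
    print(f"{name}: ${cost:.2f} per thread")
```

It’s a crude metric that ignores cache and clock speed, but it shows why the cheaper listings were so hard to pass up.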
And that’s where we are today! I received it in the mail a couple days ago, and dropped it into my server this morning, and it’s working perfectly.
With one anxiety attack completed in choosing a CPU architecture to go with, the next big decision was a motherboard. I knew I needed an LGA2011-3 motherboard. Again this is Broadwell, which to my limited recollection, was huge back in the day for on-prem environments. Maybe I’m wrong, and it’s dying out… or maybe it’s still popular and that’s why it’s so hard to find quality parts? I don’t know, but the market for server motherboards was extremely limited. I did have a solid option, but it would have been great to at least have 3-4 options to pick from.
You really want a server motherboard as the first criterion. The number of RAM slots alone is a huge consideration, and I really wanted 8. You really spend more for high-density memory, and desktop-style motherboards don’t offer any other real benefits. There were a few options for dual-socket motherboards which was a really interesting consideration, but definitely doesn’t fit my use case, so those were out of the equation. Looking at Newegg, I found a very solid Supermicro board. It would ship in time, had good features, and so I went with it.
Supermicro MBD-X10SRA-O LGA2011-3
It has tons of PCIe expansion slots which ended up being super important in my build, so I’m extremely happy I did a bit of a splurge on getting a higher-end board. It has 10 SATA ports which ended up being perfect for how I currently have things configured, and that’s another huge consideration when selecting a proper motherboard. If you need more than that, you’ll need to use a PCIe card to expand that out. I already had a pretty good idea that I was going to use 8 HDD because I had a case in mind which would support that, and that seemed like a really solid number of drives for the use case I have in mind.
Not much personality in the choice for this mobo, but it’s been awesome! It’s an ATX-format board, which is a considerable factor on one of the next big decisions.
Oh boy, RAM is a complex formula. With CPU selected, you narrow RAM to a DDR generation, and a max amount. The Xeon E5 supports up to 1.5TB of DDR4, but that doesn’t do me a ton of good. With the motherboard selected, I have these constraints to work with:
8x slots for RAM total
ECC (Error-Correcting Code) memory can detect and correct data corruption
Non-ECC pretty much isn’t even going to be an option in the server market, and I’m reading most workstations will even be going to it.
Basically, no high-capacity of RAM without ECC
RDIMM vs LRDIMM
Okay, so what is RDIMM vs LRDIMM? To achieve higher density memory, LRDIMM (Load-reduced DIMM) is used as compared to RDIMM (Registered DIMM). To get max utilization of 512GB RAM with this motherboard, I would have to use LRDIMM. Technically LRDIMM is lower performance. In my research, I decided it’s probably 5% in efficiency difference, and rather negligible.
The next significant difference is cost. To max out RDIMM with 256GB RAM across all 8 slots would cost me ~$900 ($450x2). To get 256GB of RAM with LRDIMM costs $1200, so it’s a fairly significant difference in price. The reason I went with LRDIMM anyway is because I really would hate to have RAM end up being my bottleneck in getting full utilization of this hardware. A few extra bucks upfront to guarantee that I can expand that to a full 512GB later with few other ramifications seemed like a very solid tradeoff. If you wouldn’t need more than 256GB, RDIMM would be better performing and cheaper.
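Here’s a rough Python sketch of that tradeoff. The prices, the 8 slots, and the 256GB/512GB totals come from above, but the module sizes are my inference (256GB over 8 slots means 32GB RDIMM, 512GB over 8 slots means 64GB LRDIMM), and the describe helper is a made-up name for illustration.

```python
SLOTS = 8  # DIMM slots on the motherboard


def describe(kind, module_gb, installed_gb, cost):
    """Summarize slot usage, expansion ceiling, and cost per GB for one option."""
    modules = installed_gb // module_gb
    return {
        "kind": kind,
        "slots_used": modules,
        "slots_free": SLOTS - modules,
        "max_gb": SLOTS * module_gb,
        "cost_per_gb": cost / installed_gb,
    }


# Module sizes are assumptions inferred from the totals, not from a parts list.
rdimm = describe("RDIMM", module_gb=32, installed_gb=256, cost=900)
lrdimm = describe("LRDIMM", module_gb=64, installed_gb=256, cost=1200)

for option in (rdimm, lrdimm):
    print(option)
```

Under those assumptions RDIMM fills every slot at 256GB, while LRDIMM reaches the same 256GB with slots to spare for a later jump to 512GB.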
With all of those other big decisions out of the way, the rest is far more isolated and doesn’t really impact decisions on other components. Because I am going with the all-in-one route with this build, storage is a huge consideration. Many moons ago a DC801 friend recommended this Silverstone case for a NAS, and it’s always stood out to me.
The immediate draw is that it has 8 hot-swap 3.5” bays. Finding that many bays alone is pretty rare, let alone in a capable and customizable format. I was sold almost immediately after getting back to it. My motherboard is a standard ATX format, so I knew I needed a full-sized case - but this is certainly smaller than any full-tower! It has solid cable management, USB 3.0, and 5.25” expansion bays up top.
Initially in my build, I thought I had 2x2.5” SATA SSD that I was going to use, so I currently have an expansion bay in one of the top slots to give me 4x 2.5” hotswap bays. This was a bit expensive ($60), and as I’ll outline later, I found out those 2 drives actually weren’t compatible with SATA, so I have no purpose for those bays other than my boot SSD. I can easily shove that anywhere else in my case, really, so I’m gonna have to decide if this still serves any purpose or if I can return it.
There are other interesting chassis options to go with from Silverstone if you can do micro-ATX motherboards. Otherwise, rack-mounted is probably the only way to go. If I was going to be doing a server + NAS, I likely would have bought a small rack to put in my closet.
Even though I pretty much had my heart set, it really took committing to the Silverstone NAS case for me to decide for sure that I was going to use 8 drives. It must be a simple equation from there, right?
HAH. HAAAHAHAHAHA. This is where you start to question who you are. What makes up the consistency of your soul. What happens if a HDD fails? Can you live with this data evaporating? Would it make you mildly upset, or would a part of you die? How much of a hoarder are you, really? Is it your data, or is your data you?
I’m learning about myself every time I think about storage. Reviewing how much I’ve utilized cloud services in the past to store stuff also makes me think back on all the random HDD I have had over the years full of documents, music, and other memories that would have been awesome to keep… but how much space would it be, really? I’m almost maxed out on Google Drive, but that’s only like 17GB… how much of that is compressed, how much have I NOT saved over the years because of that? Quite a lot, I know for sure. I’m sick of it. Archiving that legacy is becoming so important to me that I desperately miss my collections of old data backups, screenshots, chat logs, photos, and especially music.
I love HD content, so media streaming is going to be huge. I’d love to have a collection of FLAC for any music I really care about, but also get away from Spotify entirely which would require downloading a truly massive amount of diverse music because I rely on radio functionality so much to find music for my mood. I’ve never run a Plex server before, so getting familiar with how big file sizes are, the diversity of them… I really have no clue how to properly scope for that.
Plus we have all the unrest lately, and crazy shit of the last couple years. Archiving websites, social media feeds, and entire platforms is a reality! How cool would it be to have space to volunteer to ArchiveTeam for when they’re faced with immediately needing 50TB of space, from out of nowhere. It’s probably also not great to entirely rely on torrents retaining archives of videos from protests and such, and better to make sure at least a few people have those datasets backed up on hard storage. I want to participate in those activism efforts, and find my own niche interests to archive and preserve.
How much am I willing to spend? The biggest cost of these builds by far is storage, because the more that you get, and the more of it you actually access, the more you need it redundant and highly available.
How redundant? Because I don’t have any other dedicated storage to rely on, I wanted to do Raidz2 on this build. This gives me 2 lost drives worth of redundancy before I lose any data. If I get a dedicated NAS, I would probably duplicate important data across both. That would allow a lighter amount of redundancy and restore some of my total capacity for use.
Backup plans? This is my core infrastructure, I have a huge need for availability, reliability, and recovery. My VMs need to be redundant. Their storage needs to be redundant. Their configs, backups, snapshots, etc. all start to add up.
Simulation environment is another major component. I’m a security guy, and monitoring/analysis is a huge component of that. That requires data - and monitoring data tends to be quite verbose. To monitor my home network and do a lot of cool security analysis stuff could be up to a couple TB of space if I go all-out with full PCAP, long-term datasets (1-3 years), etc.
So I have redundancy, simulation needs, archive, backup, service needs, overhead for stuff I’ve forgotten, a cost concern, and then runway to expand because data size only balloons exponentially from here. Simple!
There’s really only 2 choices when it comes to buying HDD straight for NAS purposes. Seagate IronWolf, or Western Digital Red. WD were significantly more expensive, and I honestly don’t know of any difference in benefits between the 2 of them. The IronWolf have plenty fine reviews.
4TB - $26/TB ($105)
6TB - $26/TB ($155)
8TB - $25/TB ($196)
10TB - $28/TB ($280)
8TB - $30/TB ($240)
10TB - $28/TB ($280)
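For anyone who wants to redo this math with current prices, here’s a minimal Python sketch. The listings are the capacities and prices from the list above, in the same order, and they work out as dollars per TB; the dollars_per_tb name is just for illustration.

```python
# (capacity in TB, price in USD), in the order listed above.
listings = [(4, 105), (6, 155), (8, 196), (10, 280), (8, 240), (10, 280)]


def dollars_per_tb(capacity_tb, price_usd):
    """Price of a drive divided by its capacity in TB."""
    return price_usd / capacity_tb


for capacity_tb, price_usd in listings:
    rate = dollars_per_tb(capacity_tb, price_usd)
    print(f"{capacity_tb}TB at ${price_usd}: ${rate:.2f}/TB")
```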
The cost difference between different capacities is negligible up to 10TB, where you then have to start to decide whether the extra capacity is worth the increase in the base cost. I did a back of napkin calculation that pointed to me realistically using probably 30TB of storage on this server. Here’s a few of the allocations that I sketched out with an extremely surface level investigation… so it will be extremely interesting to see how it all works out.
Active Data - 2TB
Active VMs - 1TB
Test Windows Environment
Streaming 4K video - 35GB files (~20GB / hour)
Streaming 1080p video - 8GB files (~6GB/hour)
Resting Data - 20TB
Photos / Videos
Digital Media (Plex)
VM Disks- 5TB
Log Backup - 1TB
Backups / Snapshots
Internet Archive Shit
Going with Raidz2 for proper redundancy, and about 30TB of space sounding right, that leaves me with 6TB drives. Just doing a simple calculation for the striping, 48TB of total space (8 x 6TB) becomes 36TB capacity in Raid. With further overhead of ZFS, and actual sizes of disks considered, I ended up with a total of just over 30TB available to my Proxmox host - that sounds just about perfect! Any less I’d skimp on allocation to resources. More than that and I’d be concerned about having thrown money into storage I don’t really need.
Another consideration with ZFS is that the more space you have, the more RAM it uses. An additional component is that the more free space you have, the better it performs. That’s a significant tradeoff to consider when you’re scoping out the pool size, and I think I landed on a perfect middle ground.
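That napkin math can be sketched in a few lines of Python. This is only the naive calculation (data drives times drive size) plus the decimal TB to binary TiB conversion, since drives are sold in decimal units. It does not model real ZFS overhead like padding, metadata, or reserved space, and the function names are made up for illustration.

```python
def raidz_usable_tb(drives, drive_tb, parity):
    """Rough usable capacity: data drives times drive size, ignoring ZFS overhead."""
    if drives <= parity:
        raise ValueError("need more drives than parity drives")
    return (drives - parity) * drive_tb


def tb_to_tib(tb):
    """Convert decimal terabytes (10**12 bytes) to binary tebibytes (2**40 bytes)."""
    return tb * 10**12 / 2**40


raw_tb = 8 * 6                                  # eight 6TB drives
usable_tb = raidz_usable_tb(8, 6, parity=2)     # Raidz2 spends two drives on parity
print(f"raw: {raw_tb} TB")
print(f"usable before overhead: {usable_tb} TB ({tb_to_tib(usable_tb):.1f} TiB)")
```

The unit conversion alone eats a few TB on paper, which goes a long way toward explaining how 36TB turns into just over 30TB once ZFS takes its share.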
Caching is another topic I had to consider in this. ZFS is amazing technology I’ve only barely scratched the surface of. There are so many concepts to learn, but few that are necessary to really know in order to hit the ground running with it. Proxmox has great ZFS support, so I kind of leaned on it for a lot of it. However, I did learn quite a bit of the fundamentals. One of which is caching!
ZFS has a philosophy of multiple performance tiers when it comes to data availability. The reason that it is a RAM hog is that it uses ARC (Adaptive Replacement Cache). When you’re writing data to storage, it goes first into ARC. It’s also a low-latency source for reading from a ZFS pool. The system first looks to ARC when retrieving data - obviously super fast because it’s RAM!
ARC is balanced between MRU (Most Recently Used) and MFU (Most Frequently Used), which is complex as hell. There’s all sorts of algorithms and logic that goes into how ZFS optimizes the caching in this.
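To make the MRU/MFU idea a bit more concrete, here’s a toy Python sketch. To be loud about it: this is NOT the real ARC algorithm. The real thing also tracks recently evicted entries and adapts how much space each side gets, and none of that is modeled here. The class and method names are invented for illustration.

```python
from collections import OrderedDict


class ToyTwoListCache:
    """Toy cache with a 'recent' list and a 'frequent' list. Not the real ARC."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.recent = OrderedDict()    # entries seen once (the MRU side)
        self.frequent = OrderedDict()  # entries seen again (the MFU side)

    def get(self, key):
        if key in self.frequent:
            self.frequent.move_to_end(key)   # refresh its position
            return self.frequent[key]
        if key in self.recent:
            value = self.recent.pop(key)     # second hit: promote it
            self.frequent[key] = value
            return value
        return None                          # cache miss

    def put(self, key, value):
        if key in self.frequent:
            self.frequent[key] = value
            self.frequent.move_to_end(key)
            return
        if key in self.recent:
            self.recent[key] = value
            self.recent.move_to_end(key)
            return
        if len(self.recent) + len(self.frequent) >= self.capacity:
            # Evict the oldest one-hit entry first, else the oldest frequent one.
            victim = self.recent if self.recent else self.frequent
            victim.popitem(last=False)
        self.recent[key] = value


cache = ToyTwoListCache(capacity=3)
for key, value in (("a", 1), ("b", 2), ("c", 3)):
    cache.put(key, value)
cache.get("a")        # "a" is read again, so it moves to the frequent side
cache.put("d", 4)     # cache is full: the oldest one-hit entry ("b") is evicted
print(sorted(cache.recent), sorted(cache.frequent))
```

The point of the two lists is that a one-off scan through a lot of data only churns the recent side, while the stuff that keeps getting hit stays cached.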
I honestly have no idea how to properly scope what ZFS use is going to look like. The workload for this server is quite unknown at this point, but I expect to really push it to the limits in experimentation. However, because it’s a virtualization host above anything else, RAM availability is important!
To help balance that, and to experiment and learn ZFS more deeply, there’s also the L2ARC - Level 2 ARC! It accelerates random read performance on datasets that may be bigger than what the ARC can support - when I’m thinking data archiving, security datasets, data capture/processing, I’m thinking in the TB level so some additional caching would certainly be useful!
In addition to a cache for L2ARC, ZFS also has the ZIL (ZFS Intent Log), or simply known as ‘log’. A synchronous write is when an application asks the OS to confirm its data is on stable storage before moving on to the next system call. The data is first cached in RAM, but ZFS also has to record it in the ZIL before it can acknowledge the write. If that’s relying on a spinning HDD, that could be a lot of latency. Having a dedicated SSD for the ZIL (often called a SLOG) means the write can be acknowledged as soon as it hits the SSD, and ZFS then flushes the data from RAM to the hard disks at its own pace. The log itself is only read back if the system goes down before that flush happens.
You can specify particular drives in the system for each of those tasks - cache, and ZIL. To reiterate my philosophy with the build - a huge component in going with these advanced capabilities is to really push myself, learn these technologies better, and experiment with performance. I am NOT certain I’ll actually utilize these optimizations to their fullest, but it was a worthwhile investment to try and learn from it, so I’m super happy with what I ended up with.
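As a rough sketch of what assigning those roles looks like, here’s a small Python helper that builds (but does not run) the zpool add commands for a cache device and a log device. The pool name tank and the device paths are hypothetical placeholders, not my actual configuration, and the helper name is made up for illustration.

```python
import shlex


def zpool_add_command(pool, role, device):
    """Build (but do not run) a `zpool add` command for a cache or log device."""
    if role not in ("cache", "log"):
        raise ValueError("role must be 'cache' or 'log'")
    return ["zpool", "add", pool, role, device]


# Hypothetical pool name and device paths, for illustration only.
l2arc_cmd = zpool_add_command("tank", "cache", "/dev/disk/by-id/nvme-EXAMPLE-1")
slog_cmd = zpool_add_command("tank", "log", "/dev/disk/by-id/nvme-EXAMPLE-2")

print(shlex.join(l2arc_cmd))
print(shlex.join(slog_cmd))
```

Printing the commands instead of executing them keeps this safe to play with on a machine that has no pool at all. On a real host you would double-check the device paths before running anything.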
There was really only one major brand to consider when shopping for new, datacentre-grade SSD. The other best option I found was refurbished - Intel P3600 datacentre PCIe NVMe SSD, and those were recommended specifically in a few blogs.
Kingston Data Centre SATA SSD
480GB - $0.25/GB ($120)
960GB - $0.18/GB ($178.50)
1960GB - $0.17/GB ($335)
1200GB - $0.16/GB ($200)
In my research it was recommended to use SSD for those purposes, obviously. However! Standard consumer SSD are notorious for not standing up to heavy load very well. It really costs to get enterprise-quality, datacenter-grade SSD which can deal with heavy IOPS load. But they exist! And they aren’t insanely out-of-touch. They certainly weren’t cheap, but $200 for datacenter-grade equipment from Intel is tempting, and I figured 1TB each of NVMe would be the perfect amount of runway to use for either caching/log because my “active” data calculations came out to be probably 1-3TB max at any particular moment. I can adapt this over time as I find out my needs change.
The P3600 are AWESOME… but presented one unique problem for me. That whole “enterprise-grade” thing comes with its own learning experience. When I was researching the P3600, and not being knowledgeable in the hardware world… or smart… I foolishly looked past the obvious signs that I needed to take a closer look. The entire P3600 line is pretty diverse between capacities, and connections. Some are 2.5” drives, others are ½ height PCIe.
I abandoned cognitive dissonance, and convinced myself that these were just very durable 2.5” SATA drives and there were also PCIe versions that would slot into an expansion slot and for some reason they just didn’t list them on the website. Yup, that’s definitely it.
I kept seeing they were PCIe drives, but ignored it. It doesn’t help that they fit perfectly into a standard 2.5” hotswap cage, and at a glance, their connector looks like SATA… but it’s NOT! After I put them in, and my system didn’t detect the drives, it took me a pretty disturbing amount of time to figure out why they weren’t working. I was at the point of looking up extremely niche issues in BIOS such as VT-x compatibility with IOMMU and NVMe drives or something. It turns out they just weren’t even connected to the system. Mega derp moment.
I then had to research what type of drives they actually were - even reaching out to Twitter at one point - to identify what the hell this port was. Eventually in the full datasheet, I noticed this little detail “8639-compatible connector”. Googling that, I found it’s more commonly known as U.2 (SFF-8639), and there are PCIe adapters for them! I found 2 of them on Amazon Prime, and had them delivered the next day… insane. Again it was WEIRD as hell to me that Amazon was the best distributor for individual, niche server components.
I’m not alone in my idiocy! I figured that this was going to end up being some rather uncommon enterprise crap… and sure enough, on Wikipedia it’s described as “developed for the enterprise market and designed to be used with new PCI Express drives along with SAS and SATA drives”. That day I learned! Then I put them in the adapters, slotted them in, and was up and going.
The last thing to figure out was a boot drive for the host. This will contain my hypervisor’s core system - Proxmox installation, core OS maintenance, ISO images, scripts, etc. It’s generally recommended that you go with a simple thumb drive because it doesn’t need to be super fast storage, just reliable and large enough to install an OS to. However, I wanted a nice, quick, reliable drive, and the Samsung 860 EVO is the standard pick. I figured I may as well get a 1TB to give me enough room for its own logging and maintenance resources such as scripts and backup data. I could have saved a few bucks going simpler here.
My motherboard has tons of expansion capability, which is a damn good thing because I’ve used almost all of it!
Here’s what I currently have in it:
Gigabit NIC Ethernet Expansion
The motherboard I got does have dual Gigabit NICs on it, so I have 2 Ethernet ports I can use… but if this is gonna be my network infrastructure as well, that just won’t do! I need a minimum of 3 Ethernet ports for a straightforward deployment.
I installed a 4-port Gigabit Ethernet card. Later on it very well could make sense to upgrade this to 10G, especially as I grow my home network. It will be interesting to see how this evolves over time, but for now 6x Gigabit ports are very useful.
U.2 to PCIe x4 adapter
As I covered before, my 2.5” SSD were actually U.2 interface drives and needed these PCIe adapters to connect up to the system. Luckily I have plenty of PCIe slots!
This motherboard doesn’t have video out, or IPMI, so I needed a GPU to get the core OS installed and to do any network troubleshooting and stuff before I could get an SSH connection back in - or if I lose it.
I also figure that there’s plenty of fun stuff to experiment with, like PCI passthrough of the GPU to VMs for cracking or calculations or what have you.
There was nothing special about my decision here, really. I wanted semi-modular because I really only would be connecting CPU, mobo, and SATA. I was originally only aiming for about a 500W, but the market was pretty limited. I wanted as efficient as possible because it would be on 24/7 and probably hit a pretty high use threshold. I could go with a full-sized ATX which was nice. Overall, between calculating the difference in efficiency and difference in price between nicer units and their power rating, Bronze-certified seemed like a fair tradeoff.
I ended up just searching for semi-modular, high-efficiency, and quiet units, looking for what had decent reviews. I ended up choosing mine because it had a green fan, even though it’ll never ever be seen /shrug. It was 700W, which given that I’m currently planning on leaving my 970 in and I got that higher-end CPU, might work out better in the end.
I think I’m over watercooling. I went with it for my desktop build because I had great aspirations of doing a super tiny gaming PC. That all went out the window and I ended up with a smallish glass mid-tower case. Pretty much the antithesis of what I wanted, but finding a case without extreme compromise was god damn impossible so I went with what was solid and made me pretty happy.
Air cooling is so efficient, and cheap. I can get basically the highest tier CPU cooler for $60. It’s from Noctua, extremely well-engineered heatsink, beautiful, adjustable for RAM clearance, and they just have hands-down the best fans on the market - silent and huge air circulation.
I went Noctua both for my CPU heatsink, as well as replacing all 3 case fans with high-performance, low-noise fans. Because this will live in my closet 6 feet from where I sleep, noise is a huge consideration! My old lab server had tiny 60mm fans that were jet turbine volume levels…. I really needed to also replace those with higher-grade Noctua fans.
All put together, this is what cooling looks like! 2x on HDD, CPU cooler, and rear exhaust fan.
This wasn’t anything I planned on. Luckily, I had built a new gaming PC last year, and had my original one in the closet. I yanked the EVGA GTX 970 from it because I needed graphics output to install the core OS. Sitting on the ground with a keyboard and USB-powered HDMI screen, I got the core OS installed, and the server put on the network.
If you’re planning to build a server, I would strongly recommend making sure you have an old GPU on hand, or buying one on eBay or something for extremely cheap just to make this backup management simple.
There’s nothing quite like getting to yell at your own computer for your own mistakes.
Scientists say it is 1500% more effective in getting a response from your machine than just yelling at a monitor to get through to the cloud.
Welcome to my first blog post in too damn long! I have been working on my home lab for a few months now. I previously had things built up on an ESXi environment, but then there was disaster. As this blog is evidence of, I’m apt to drop out of things for a while at a time. And I did the same for my server. I boot it up one day… and everything was terribly broken. I spent an entire 2 days wiping it clean and rebuilding a KVM box from scratch, with some good Graylog and Suricata and SDN virtualization fun!
I learned a whole lot and am very happy with how things have worked out. I’d love to build out a full-scale, crazy powerful server based on this design. Before some recent news, that was going to be a reality soon, but savings will have to be reallocated for the moment. I hope you enjoy a brief overview of what my home lab consists of. I will be releasing more specifically technical guides here in the near future. I’ve built PLENTY on this server given the specs I’m working with, and I’ve been able to build out some cool stuff with Graylog that I’ll be writing up, as well as my existing simulation lab with log collection details and all that good stuff.
I’ve got a whole lot of detail about my build below. Thanks for stopping by!
This has served as an awesome little testbed server. I definitely need to bump up its RAM a bit more, especially now that DDR4 has gotten a lot cheaper since I first installed what it’s got. It’s a small little guy and I’ve had it running for a couple months in my living room and I don’t even realize it’s there.
Perfect mobile or test bed server. I’m excited to be able to build a full-size rig for home and use this as a rapid development box, eventually using all sorts of fun virtualization things in BSD and what not.
Physical NIC / IP Addresses
eno1: Virtual Tap - OpnSense [Gateway - DHCP]
eno2: Virtual Tap - OpnSense [GuestDevices - 192.168.180.1 / 192.168.80.1]
eno3: 192.168.180.11 - DHCP (Virtual IP - Connected to local switch -> GuestDevices)
eno4: 192.168.1.138 - DHCP (Home LAN)
Primary Network Areas
[HLAN/ELAN] Home LAN / External WAN Gateway
Connected to home LAN as ISP
[PLAN] Physical LAN
Physical Guest Devices
Virtual Machine internal networks
Home LAN Backup
This is a really interesting setup, and I really like it so far. The primary network for even the host OS is based on the virtual OS. I have a virtual router connected to a physical NIC, and that is attached to my home network. It gets assigned a DHCP address by my home router, and that becomes the WAN gateway for my virtual LAN - this is the HLAN. If it was mobile, and I attached it to a business or other external network I’d consider it an ELAN connection.
I have a 2nd physical NIC on my server that is attached to my virtual router as well. This acts as my “Guest Devices” network - a PLAN. One intent of this server is to have multiple people connect to it and/or to use it from a laptop, so I have a physical network that has a switch and an AP. I connect my laptop to this AP, which is connected to the physical switch. That switch then connects to the virtual router on a “Guest Devices” network described in the next section. This is my primary physical bridge into the virtualized network environment.
IPMI was a lifesaver. I’m a complete noob when I do this stuff and messed up a bunch of things while figuring out networking and some niche Linux networking screwups. Locked myself out many times and struggled and panicked to find cables to hook up my server to a monitor… such a silly situation to have in 2019, but there I was. That was when I learned how IPMI worked and it was incredibly useful!
Software Defined Network
2 Virtual Bridges
Connected to PLAN
Connected to HLAN
The real magic was in how simple Open vSwitch was to set up. It sure required a lot of tinkering, but I got there. I had initially been playing with Linux bridging but it felt limited and/or overly complex to do things like port mirroring for network monitoring, which is a primary purpose of my monitoring setup.
I switched to OVS, and I transferred a good bit of knowledge from Linux bridging, but it felt a lot more true to physical switching. You create a bridge, and attach ports to those bridges. Those ports are then very simple to attach to virtual machines. It’s simple to manage all of this, get detailed information and match information up since you really cross a lot of transparent bridges when it comes to virtual networking like this.
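To make the bridge-and-port idea a bit more concrete, here’s a minimal Python sketch that only builds the `ovs-vsctl` command lines instead of running them. The bridge name `server0` is one I actually use; the port names `vnet0` and `vnet1` are made up for illustration, and this is not my real provisioning script.

```python
import shlex


def ovs_bridge_commands(bridge, ports):
    """Return the ovs-vsctl invocations that create a bridge and attach ports.

    Nothing is executed here. The commands come back as argument lists so
    they can be reviewed first, or handed to subprocess.run() as root.
    """
    # --may-exist makes each call safe to repeat if the bridge/port already exists.
    commands = [["ovs-vsctl", "--may-exist", "add-br", bridge]]
    for port in ports:
        commands.append(["ovs-vsctl", "--may-exist", "add-port", bridge, port])
    return commands


if __name__ == "__main__":
    # "vnet0"/"vnet1" are hypothetical port names, not taken from my setup.
    for cmd in ovs_bridge_commands("server0", ["vnet0", "vnet1"]):
        print(shlex.join(cmd))
```

Returning lists rather than shelling out directly keeps the sketch safe to run anywhere, and `ovs-vsctl show` is a handy way to compare the result against what actually exists on the host.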
The router VM is also a pretty fun setup. I decided to run an OPNSense router which would manage a few primary networks
My network concept was very basic. I mainly wanted 3 realms – servers, clients, and guest devices. I created SDN LAN networks which are managed by my OPNSense router VM. I’ll go over all of these and also recap a little of what we covered earlier.
Windows clients, attacker VM, administration machines. Anything that an actual user would sit at and use.
I also had a very useful need to access the server remotely. I created an OpenVPN server via OPNSense and assigned access to a few clients.
This is the network for physical devices that I want to connect into the virtual environment. I also loop the hypervisor host back into this so it gets an IP address into the virtual environment. While everything else has been near perfect, it’s been super unstable for some strange reason so I don’t like to rely on it.
Network to house all appliances, network services, Windows servers.
This is connected to my home network, gets a DHCP address, and is the primary Internet gateway for all of the virtual machines, the hypervisor host, and the guest devices
There’s definitely some wonky stuff when it comes to routing for my hypervisor host, but overall this setup works wonderfully and reliably without any significant issues I can remember.
I can do some serious firewalling and virtual routing stuff with this setup, and have played around with it a lot. I’ll cover more of that in my DumbyLand lab setup since it’s more directly related to how my VMs are setup and used.
Debian Linux auto-provision
I previously used ESXi on this box and while it was good experience, I’m so much happier with how KVM has been working out so far. Native CLI access is fantastic, and being able to throw up a simple Linux VM for some of the graphical stuff is simple enough with a minimal BunsenLabs build.
Libvirt tools felt very organic for management and creation. The scriptability of it was great, and even though it’s babby’s first scripts I was able to make some cool ones that made my life a whole lot easier.
virt-manager is the GUI interface for remote console and everything good like that
CLI control is my favorite. It’s super convenient to use virsh to control all aspects of the VM operation. My install scripts are CLI-based. For Windows, I start CLI and finish in the GUI Console but that’s just the nature of Windows.
I have a fully-automated script that, starting from a Debian base, walks you through a simple menu and creates a fully-automated install.
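My actual script isn’t reproduced here, so treat the following as a rough Python sketch of the general technique: build a `virt-install` command that boots the Debian installer with a preseed file injected, plus a tiny `virsh` helper. The VM name, sizes, the libvirt network name `server0`, the installer URL, and the `preseed.cfg` filename are all assumptions for illustration.

```python
import shlex


def virsh_cmd(*args, uri="qemu:///system"):
    """Build a virsh invocation against the given libvirt URI."""
    return ["virsh", "--connect", uri, *args]


def debian_install_cmd(name, memory_mib, vcpus, disk_gib, network,
                       preseed_path, installer_url):
    """Build a virt-install call for an unattended, preseeded Debian install.

    The preseed file is copied into the installer initrd; the Debian
    installer looks for /preseed.cfg there, so the file should use that name.
    """
    return [
        "virt-install",
        "--name", name,
        "--memory", str(memory_mib),
        "--vcpus", str(vcpus),
        "--disk", f"size={disk_gib}",
        "--network", f"network={network}",
        "--location", installer_url,
        "--initrd-inject", preseed_path,
        "--extra-args", "auto=true priority=critical console=ttyS0",
        "--graphics", "none",
        "--noautoconsole",
    ]


if __name__ == "__main__":
    # Every value below is hypothetical; adjust to your own environment.
    cmd = debian_install_cmd(
        name="debian-test",
        memory_mib=2048,
        vcpus=2,
        disk_gib=20,
        network="server0",
        preseed_path="preseed.cfg",
        installer_url="http://deb.debian.org/debian/dists/stable/main/installer-amd64/",
    )
    print(shlex.join(cmd))
    print(shlex.join(virsh_cmd("list", "--all")))
```

A menu on top of this would just collect the name, RAM, CPU, and disk values and pass them in, which is roughly the shape of what I described above.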
As mentioned before, my core virtual router is OPNSense. It’s been awesome and I am excited to continue to use it and learn more about it. Not much to say about it aside from the networking aspect, it’s just fine as a virtual machine.
Log Collection / SIEM - Graylog
I am doing full IDS and log collection across this environment. One of the most significant resources I have assigned is for my Graylog instance with 8GB RAM. I’ve tweaked this machine a lot, but so far this has been pretty space effective for the sparse storage I’ve dedicated so far. There’s a lot of JVM Heap Space stuff that has required some tuning.
I pump a bit of network traffic to it from all the machines sending logs and so far it seems like I haven’t had any problems with reliability of communications or storage.
IDS - Suricata / OpenVSwitch Port Mirroring
It’s very straight forward to mirror traffic on an OVS bridge to another vport on that same bridge. I created a VM that has 3 virtual interfaces, and attached 2 of them to the server0 bridge and the other to the client0 bridge. 1 of the server0 interfaces is a management port and provides standard LAN access. The other 2 get a port mirror from their respective bridges. Suricata is configured to listen on those 2 interfaces, and sniffs them for IDS detection. This all works great!
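For reference, here’s a hedged Python sketch of what a mirror like that looks like as an `ovs-vsctl` call. It only builds the command. The bridge names `server0` and `client0` are mine, but the port names `ids-server0` and `ids-client0` and the mirror names are hypothetical stand-ins for whatever the IDS VM’s sniffing interfaces are called on the host.

```python
import shlex


def ovs_mirror_command(bridge, output_port, mirror_name):
    """Build one ovs-vsctl call that mirrors all traffic on a bridge to a port.

    The three sub-commands run as a single transaction: look up the output
    port, create a Mirror record pointing at it, and attach that record to
    the bridge.
    """
    return [
        "ovs-vsctl",
        "--", "set", "Bridge", bridge, "mirrors=@m",
        "--", "--id=@out", "get", "Port", output_port,
        "--", "--id=@m", "create", "Mirror",
        f"name={mirror_name}", "select-all=true", "output-port=@out",
    ]


if __name__ == "__main__":
    # One mirror per bridge, each feeding its own sniffing interface.
    for bridge, port in [("server0", "ids-server0"), ("client0", "ids-client0")]:
        print(shlex.join(ovs_mirror_command(bridge, port, f"mirror-{bridge}")))
```

On the IDS VM itself, Suricata is then pointed at the two sniffing interfaces, and the management interface is left alone.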
I have no honest clue how to really do Windows. While I have worked extensively with Windows in my life, I’ve never been a true systems administrator for it. I haven’t built active directory from nothing. This is going to be me documenting my stumbling journey through the wide and wonderful world of Windows environments! Let’s just jump straight in.
A Server and a Workstation
Let’s start with about as basic as you can get. A standard Windows Server 2016 machine, and then a Windows 7 workstation machine.
We’re gonna get rolling right with the domain controller.
Followed this quick guide to get RDP enabled. Need to work on restricting this access later.
Set the server to have static IP address as documented above
Connected back via RDP to that IP once I saved settings. Yay!
Created a new forest
Set DSRM password - “dumbyDSRM1”
NetBIOS Name - DUMBYLAND
ALL DONE! REBOOT –
And we have a domain controller!
Next let’s get a PC up and going. This is going to be my primary interface into the environment. I am going to use it as if it were my client machine for administering the Windows servers and everything else.
This is just a plain ol’ Windows 7 computer at the moment. Let’s get it hooked up into the domain, and create myself a user and domain admin, shall we? For now, I have to do a console authentication directly to the server until I can get this client machine hooked up to the domain
Now, let’s pull up the Active Directory Administrative Center
Cmd.exe > dsac.exe
On the left side, I select dumbyland (local) > Users > New > User
And they’re created! I feel like we should next get going on the client machine. Here’s a vanilla machine, and I have no clue how to do this right… but let’s figure some stuff out.
In the System Properties window, there is the option to “Change” domain settings. Select that. It will prompt you to enter credentials – I used the dumby-user account I just created to simulate a new, regular user!
Okay this just isn’t working. I have no clue what else I need to do. I update my DHCP settings to ensure that 192.168.2.10 (domain controller) is the primary DNS since at one point this was giving me a DNS error of some kind. Now I’m getting a logon failure error.
I figured that perhaps the issue is that I was using an account that I hadn’t yet logged in with before. To get around this, I used the primary administrator account to sign-in for the first time. My prediction seems correct… and I signed in for the first time! This message was a beautiful sight.
After this, I reboot the PC, and RDP back in to it. I still can’t sign-in as the normal user, but we’ll get that figured out soon enough. I replaced this dumby-user account with the administrator, and successfully signed back in.
Well, this is at least giving me hope!
I can’t get a sign-in to the regular user to work, but I’ve got a canvas to start from. My main goal right now was direct administration of the domain controller since I’m doing everything through the ESXi web console – not exactly speedy and reliable! There are also some issues with RDPing directly into the DC, so I’m gonna be using my Windows 7 VM as a ‘jumpbox’ into my domain. (Laptop RDP > Win 7 VM, RDP > Server 2016 VM)
RDPception! Here is my laptop RDP into the Windows 7 VM. From that, I RDP into the domain controller VM. Hooray!
For some reason now that I have direct access, password change and logon for the user worked. I opened up a new RDP session, signed in, and was prompted to change the password.
To get away from the default administrator account, I’m going to start using this new administrative account. First things first, it’s complaining about having Remote Desktop Users permissions – let’s get that going right now.
I opened up my existing admin RDP access through the built-in Administrator account, and right-click my dumby-admin account, and added it to the ‘Remote Desktop Users’ group as well as the ‘Domain Admins’ group. I then signed back in with my dumby-admin account, and kicked out the other RDP session. Hooray!
And now I RDP back into the domain controller with my dumby-admin account from my client PC. WHOOO! I absolutely love hitting these milestone points in little projects like this. Another quick update was to add my user account to the ‘Remote Desktop Users’ group so that I can access it from my laptop directly using my client account.
For some reason… this isn’t working. I’m not sure if there’s a delay in it taking effect or what, but doing a domain sign-in to my ‘dumby-user’ account via RDP is still giving me the permissions error, whomp.
What I’m reading right now is that this is because of permissions on the local computer I’m trying to access, since I need to update its own members of the “Remote Desktop Users” group. It’s worth a try! Let’s first make the change directly, and then if that works, I’ll get some experience with Group Policy and distribute that change to all future computers as well.
Okay, so first things first. Search ‘users’, and open ‘Edit local users and groups’. Go to ‘Groups’ > ‘Remote Desktop Users’. Select ‘Add’, and then I searched ‘dumby-user’, which then automatically discovered the domain user. This all makes perfect sense now! You must give individual permissions on each endpoint of which users can access it via RDP, but that user can be a domain user. Good to know!
It doesn’t seem like I needed to do these permissions for my admin account… perhaps because it’s a domain admin? An interesting note, and if that’s true, it prevents me from needing to do any GPO stuff for now, it seems, since I’m only worried about my own user account being able to access this VM. Hooray!
Now, I have a domain authenticated user machine! From it, I can use it to access the DC via my domain admin account – dumby-admin. This is how I was hoping for it all to work! Very happy with all of this so far.
Performance of my server during all this….
Everything is super snappy. RDP is working fantastically – console works a LOT better since I did the driver optimization fix for storage.
A 2nd Windows Client
To start having some real fun, I want a 2nd client that can act as a vulnerable machine. Let’s see what trouble we can get up to! To make things easier, I figure it’s worth learning how to clone VMs with ESXi. I found this guide to do so:
And that’s basically it! I repeat my instructions above to get this new machine joined to the domain, and now I have a nice little set up going! Next up is to introduce some logging, and then I think I’ll have some fun setting up attacks and reviewing that information.
My primary purpose for DumbyLand is to build up a sandbox for simulating different types of IT environments. A significant part of this is collecting logs for analysis to build up my blue team skills further. Architecting, engineering, and deploying SIEM and some basic controls are a huge part of that.
I want to be able to play around with different technologies, but to get things rolling, it’s best to just choose 1 thing at a time that sounds fun and interesting to set up. Graylog is something I’ve been hearing a lot about, it’s definitely something worth checking out and learning!
Instructions to get going for Graylog seem simple enough! Download the OVA, import it, start the VM, and wait.
It starts out with a simple setup of 4GB RAM, 20GB disk, and 2 CPU. I dig this, much preferred to SIEMonster and others that demanded 5 separate systems to get started, most requiring 8GB RAM each! Not something I can do until I set my server up with more RAM for a few hundred more bucks… which I’m hoping to hold off as long as possible to see if this price war lets up.
Enough of that, though. The system got a DHCP lease, and I hit it from the web interface.
Boom! Simple as that! Login with default admin/admin, and we’re rolling.
Send in first log messages
Let’s do it! We’ve got a basic Windows environment already going in the lab, so let’s pull it in.
From reading that community page, I understand that nxlog is going to be the best option for forwarding logs, and I’m always down to learn something new. Unfortunately, Graylog just leaves us with a “they’ll pick it up from here”.
Over on the nxlog website, there are downloads for the community edition for Windows machines. Documentation seems solid so I gave it a pass through reading about features and the basic functionality of it. This will be great to review later in-depth for a good understanding, but I think right now it’s most important to just get it up and going to see what we’ve got.
We’re going to go with a basic agent-based collection (Section 3.3.1) which should require something running nxlog on the Graylog server but we’ll get to that when we need to.
RDP in to my Windows desktop VM, and then to my DC.
Since I don’t want to abandon all good security practice, I’m not gonna go around simply disabling these controls until I have good reason to. To allow this download, I had to add nxlog.co to my trusted sites. Search for ‘Internet Options’, then to Security tab, Trusted Sites, and then Sites to add the URL for the agent. Phew! But the rest was pretty straight forward. Downloaded the .msi, ran it, all good!
After the install finished I should have expected the readme to pop up, but it didn’t. Back to the documentation! The default installation location for me was “C:\Program Files (x86)\nxlog” which contains the conf folder.
In the documentation (Chapter 33. Microsoft Windows) there is a section for advice/recommendations/setup for Windows logging. For now, I really only care about Windows EventLog which there’s a jump to (Chapter 80. Windows Event Log). From there, I jump to (80.1. Local Collection with im_msvistalog) which is used for any system running 2008/Vista or later.
They provide an example of collecting all Windows EventLog data in JSON, written to a local file. I dunno exactly where we’ll go from here to get it to Graylog, but it’s at least the right direction! So let’s do that. Back on my DC, I’m gonna leave basically all of the conf alone, and just add the snippets that are included in the docs here.
I forgot to open Notepad with admin privs, but also realized that even though the basic config file is super basic I should create a conf backup… good habits and all. Alright!
Now that we’ve got some basic configuration which we can revisit later, let’s get it on (12.1.1. Installing Interactively) and we’re up to step 5, verifying the configuration, which passed. Next I searched for ‘Services’, opened it, and started the nxlog service.
The test config points the json output to “C:\test\sysmon.json” which I hadn’t created the dir for before starting the service. Maybe that’s an issue in getting it to write, but not sure. I double-checked config compared to what was in the example, and then also pulled up the logs for nxlog itself (C:\Program Files (x86)\nxlog\data\nxlog.txt) and saw that I’m getting a message of “WARNING not starting unused module eventlog” which makes me think that even though I defined an input, it doesn’t know what to do with the information.
Sure enough, there’s also an additional error message “WARNING no routes defined!”. I took a quick look at an example from Loggly for nxlog https://www.loggly.com/docs/logging-from-windows/ and found an example route that I used to copy and make one for myself. They have a section “internal” that I’m not sure what it does, but is an additional data source to investigate later.
This is my full conf for a test setup working. Hooray!! It’s very straightforward and simple. Eventlog module pulls down the information, and then uses the defined output ‘file’ to write it to a file all in the magic of json. We’ll see if I have any troubles but at least this is on the road to progress!
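The conf itself isn’t reproduced in the text, so here’s a hedged reconstruction of what a minimal config of that shape could look like, wrapped in a small Python sketch that mimics the two warnings I hit: an Input or Output that no Route mentions is “unused”. The instance names `eventlog` and `file` and the output path come from my notes above; the route name `eventlog_to_file` is made up, and the checker is my own toy logic, not how nxlog validates configs internally.

```python
import re

# Assumed minimal nxlog.conf: Windows EventLog in, JSON lines out to a file.
SAMPLE_CONF = r'''
<Extension json>
    Module  xm_json
</Extension>

<Input eventlog>
    Module  im_msvistalog
</Input>

<Output file>
    Module  om_file
    File    "C:\\test\\sysmon.json"
    Exec    to_json();
</Output>

<Route eventlog_to_file>
    Path    eventlog => file
</Route>
'''


def unused_modules(conf_text):
    """Return Input/Output instance names that no Route's Path line mentions."""
    defined = set(re.findall(r"<(?:Input|Output)\s+(\w+)>", conf_text))
    routed = set()
    for path in re.findall(r"^\s*Path\s+(.+)$", conf_text, flags=re.MULTILINE):
        routed.update(re.findall(r"\w+", path))
    return sorted(defined - routed)


if __name__ == "__main__":
    # With the Route block present, every module is wired up.
    print(unused_modules(SAMPLE_CONF))
    # Chop the Route block off to mimic my first, broken attempt.
    print(unused_modules(SAMPLE_CONF.split("<Route")[0]))
```

The takeaway is the same one the warnings were hinting at: defining an input isn’t enough, a Route has to connect it to an output before nxlog will start it.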
I need to configure my Graylog server with a static IP so that I can start to forward stuff to it, then I’ll configure nxlog to forward over the network. Back in the Graylog webconfig, I poked around the options, but it doesn’t seem like it will be that easy, which is understandable.
They instruct you to go about it the standard way of editing /etc/network/interfaces and configuring there, as I’ll do now. Since you’re modifying the network connection, directly console in via ESXi and login with ubuntu/ubuntu, and you can follow the instructions below but it’s a standard configuration of the interface.
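As a rough idea of what that interface config ends up looking like, here’s a small Python sketch that renders a static stanza for /etc/network/interfaces and sanity-checks that the gateway sits inside the subnet. The address 192.168.2.20 and the DNS server 192.168.2.10 (my DC) are from this lab; the interface name `eth0`, the /24 mask, and the gateway 192.168.2.1 are assumptions, so swap in your own values.

```python
import ipaddress


def static_stanza(iface, cidr, gateway, dns_servers):
    """Render an ifupdown 'inet static' stanza for /etc/network/interfaces."""
    net = ipaddress.ip_interface(cidr)
    # Catch the classic typo where the gateway is not on the same subnet.
    if ipaddress.ip_address(gateway) not in net.network:
        raise ValueError("gateway is outside the interface's subnet")
    lines = [
        f"auto {iface}",
        f"iface {iface} inet static",
        f"    address {net.ip}",
        f"    netmask {net.netmask}",
        f"    gateway {gateway}",
        f"    dns-nameservers {' '.join(dns_servers)}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # eth0, /24 and the .1 gateway are hypothetical; .20 and .10 are from my lab.
    print(static_stanza("eth0", "192.168.2.20/24", "192.168.2.1", ["192.168.2.10"]))
```

After editing the file, the interface has to be bounced (or the VM rebooted) for the new address to take effect, which is why doing this from the console rather than over the network is the safer route.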
Logged back into the web config at 192.168.2.20, woohoo! This is all set. Now just to update the configuration on my DC to push towards this IP address.
Slight pivot on configuration. I’m having trouble with Graylog inputs. I get the general gist, but get these “failures” on trying to start the inputs. Not exactly sure what’s wrong, but going to attempt a new thing here.
This is a “content pack” which is a bunch of pre-configured stuff for Graylog, this one specifically for AD security auditing. It seems pretty damn nifty, and the Github includes a nxlog configuration to use. Import the content pack via Graylog webUI, and it will create the input and some dashboards and a couple other things.
Still having trouble getting this new input to work, I did what works best for anyone… I did a nice little restart on the Graylog host. Running ‘netstat -tulnp’ showed me that the designated port 5414 was running, but only on udp6. Perhaps this is just me being misinformed on how things should look, but I’m now the proud recipient of AD logs!
This is really getting my server to spin. It seems like Graylog is starting to choke up, so I’m gonna give it a minute since there was a sudden spike in activity overall and see if things normalize some.
And overall, my server is feeling like this with that
After giving it a while and still not getting a response when searching for messages in Graylog, I figure it’s time to look for more help, perhaps bumping up its CPU availability some.
I ended up not having to do this… yet, at least. I went back and reviewed the configuration steps, re-performed the basic graylog-ctl stuff, and then did the reconfigure command. I went back to the web UI, did a search against the stream from my Windows log input, and…
We have liftoff! Whoooooooooooooooooooooooooo!!! Now that we have a SIEM up and going, I can work piecemeal to have them ingested into Graylog as I bring services and additional machines online. This is so very exciting for me!
Next up I’d really like to do IDS, but I’m really not sure how I’m gonna get traffic mirroring set up on my box since I don’t have vCenter stuff rolling yet. I COULD just set that up, but that seems like a lot of effort for a single (important!) feature of this lab.
Also interestingly enough… now that I got everything working and am running queries against the data, my CPU usage dropped dramatically! It really seems like something got screwed up in the configuration process and the reconfiguration kicked everything back into gear.
For a minute I was really concerned that I was already making my box choke! With just 1 input into the SIEM and no heavy queries or anything, already maxing it at a sustained 90% leaving my box at just shy of 50% utilization had me scared. Glad to see things are REALLY starting to work out! Even at the most desperate of times, my server hit a maximum load of 57.4% and an average of 37.44% over the last hour where the vast majority of the time it was in that CPU % funk.
I think that’s pretty god damn impressive, if I may say so myself. To review, this is currently what we’ve got running:
Windows 7 - Client Machine / Admin Machine
Windows 7 - Victim Machine
And let’s keep going! Since things are working well right now, I grabbed a snapshot of Graylog as well as my domain controller. My progress so far has been quite pleasing!
Reviewing OPNsense configuration again real quick, I was reminded that it has built-in IDS features! Doing light digging, I discover it’s based on Suricata, cool!
Let’s try it out! To make the most use of the work I just completed, and to have a bit more fun, let’s also get OPNSense set up with sending its logs to Graylog. This is considered “remote logging” in OPNSense terminology and to be honest was a pain in the ass to find proper documentation for… I never found any.
To get started, navigate to “System > Settings > Logging”
Seems like it should be straight forward, but let’s see!
I have configured Graylog with the following content pack for pfSense since it should work just fine for OPNsense.
Naturally… things aren’t straightforward. Now my Graylog isn’t getting Windows logs, what the fuck! Back to basics with troubleshooting this. Everything seems good with nxlog configuration, and the DC can ping my Graylog server. Given my inconsistency earlier, I’m gonna try to just restart Graylog and see what happens.
Back to inputs, pull back all messages for Windows, and sure enough… there they are! There’s definitely some inconsistency going on with my setup. I believe it’s likely Elasticsearch since all the data going back over an hour is suddenly available in the UI to search for… bah! Oh well, the joys of simulating and labs!
Guess what else we’re getting now! That’s right…
Woot! That content pack has a Pfsense-Logs input with extractors for a few different things. It pulled out the authentication logs in opnsense with a good source, so I think that should definitely satisfy my basic needs!
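If you want to sanity-check a syslog input like this without involving the firewall at all, a throwaway test message is handy. Here’s a minimal Python sketch using the standard library’s `SysLogHandler` over UDP. The logger name is arbitrary, and the host and port you pass in are whatever your Graylog input actually listens on; I’m not claiming any particular port here.

```python
import logging
import logging.handlers
import socket


def send_test_syslog(host, port, message):
    """Send a single syslog message over UDP to the given host and port."""
    handler = logging.handlers.SysLogHandler(
        address=(host, port), socktype=socket.SOCK_DGRAM
    )
    logger = logging.getLogger("graylog-smoke-test")
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    try:
        logger.info(message)
    finally:
        # Detach and close so repeated calls don't stack up handlers.
        logger.removeHandler(handler)
        handler.close()
```

Calling something like `send_test_syslog("192.168.2.20", 514, "hello from the lab")` (the port there is a guess, use your input’s port) and then searching for the message in Graylog tells you whether the input side works. Keep in mind UDP is fire-and-forget, so no error on the sending side doesn’t prove anything arrived.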
This screen makes me happy. Very happy indeed! Quite satisfied with how it’s handled the load so far, even if some things have been a bit funky. Maybe ECC RAM is gonna be a help? I’ll spend a day on stability and load testing in the future once more things are running.
Now that I’ve got a SIEM and IDS features to play around with, I can start to bring on additional services and expand capability and scope! I have plans to bring on representations of development and production servers. In the meantime I also really want to bring up some vulnerable servers to play around with pentesting, maybe try out some brute forcing fun, throw up Mimikatz as well as some Icebreaker (https://github.com/DanMcInerney/icebreaker) and other MITM goodies. DNS poisoning, RAT, some general C2, data exfil… all that’s good stuff I’ll be getting to as I build this out!
The last few weeks have been extremely difficult. Let alone the last year… I haven’t done nearly as good a job as I would have liked of keeping up with my blog, but I’m not short on topics to discuss or work to share! There was a stimulus that kicked me into high-gear project mode. Given a choice, I wouldn’t ever choose to go through it again. With the passing of our hacker brother d3c4f, I hit that high-gear mode. I gotta step up, I gotta do more, and I gotta contribute back to the community I hold so dear.
To this end… I introduce…
A home lab is a crucial component of the modern IT professional, and especially so in the security field. By nature, security is cross-discipline and requires a diverse skillset. Unless you’ve got 10 years of IT experience, it’s highly unlikely that you can cover all the ones directly important to your specific position or specialty. This becomes a fundamental issue for both those experienced in the field looking to expand their teams, as well as those new to the field who are finding their way in.
I know this first-hand. I’ve got a solid 8 years of professional experience, and I’d say about 6 of those were strictly IT. In that time, I never got to administer or experience an enterprise environment. I had little exposure to hardware, or complex configurations. I hadn’t had a reason to understand a full-blown virtualized environment. Beyond a home network, some small IT shop deployments, and personal projects my systems administration skills are quite weak! I’ve read Windows logs, but I don’t know exactly what stimulus leads to which logs – being able to directly simulate a user failing to login, or brute forcing a Windows admin account is something I am in dire need of at my current stage of career.
So let’s get to the meat!
I needed a lab. I know that with my personality and skill level, I would only ever fully utilize such a lab if I were able to take it around and share my experience, working with others to accomplish and review the work. Portability was key! Naturally, the market solution to that is the NUC. Extremely capable, extremely transportable.
Recon Infosec created a mobile DFIR lab, and went this route. 3x Skull Canyon NUC, and 1 more machine I imagine is their primary network interface. The NUC are sleek packages with passive cooling, but i7 and 32GB DDR4 RAM. Quite a powerhouse, and you can slap 3 into a nice little case ready to go.
And doing what I do best, I leaned on others for further information.
And, naturally, some jerk got me salivating for something I had no clue even existed! Thanks @M3atShi3ld you jabroni. He casually threw this out there… and I instantly went into a downspin thinking of all the possibilities – of every possible pro/con I could wrap my head around.
Come 05/02, and I’m speeding home from work after receiving delivery confirmation of a very sexy box.
There’s an E300-8D in that box. So, why did I make the choice for this box?
There’s a few things that led me to choosing it.
High-price, RAM limited. The Skull Canyon NUC are interesting, but max at 32GB RAM. While they have an i7, I’ve heard for the vast majority of people simple labs like this hardly touch the CPU, so that’s essentially a wasted resource
Given this criticism, it means that to scale up hugely, you need multiple boxes. The cost on this would add up quite quick
With scope of boxes, comes scale of management. Recon Infosec was already up to 3 of these NUC for their lab. That’s a lot of hefty management which is probably cool to implement and good learning, but not something I wanted the hassle or cost of at the moment
Passively cooled. Again, I have no clue what the real resource impact will be on these boxes, but if it does end up chewing CPU I imagine they get quite hot
I heard from multiple people that reliability on NUC was not great
Supermicro Embedded E300-8D
Truly enterprise-grade gear, which I’ve never had the pleasure of interacting with
Xeon processor. I figure for the kinds of loads it will see, it’s a better fit than the i7. I have no technical backing to support this claim.
There are other Supermicro embedded server options which I probably should have done a little more research on. This was a slight snap judgement, but mainly because I was excited to get something significant and hefty
6 onboard NIC. Again, I can’t really justify this, and with fancy enough configuration and automation I imagine it would be no big deal to utilize a 4-port server, but I really like the scope and vision that having this capability brings to me
128GB RAM max on this box. I can go to 64GB RAM without dipping into extreme ECC price territory like we’re seeing with DDR4 at the moment
Honestly, biggest downside right now is RAM price. I only opted for 32 now hoping that by the time I scale to 64 it will be better, and by the time I’m going to 128GB there won’t be a whole lot of financial constraints on my needs
Has m.2, mSATA, and a 2.5” drive for storage. I’m not exactly sure what options are available on the NUC, but I figure they may max out with just an m.2
Fans are surprisingly quiet. I’ve dealt with some blade servers and R410 and what not. This is NO jet engine like those other ones! It spins up loud on boot, but soon rounds down to a very reasonable volume. I keep it in my living room while hacking away at it and it hasn’t bothered me so far.
SFP+ ports! Tons of gigabit for teaming, huge huge network capacity ability if I ever need to share lots of data, or can connect to big Internet pipes
The Skull Canyon NUC is $520 on Amazon at time of writing
E300-8D is $650 on Amazon
RAM is exactly the same for both (up to 64GB)
So, assuming that CPU doesn’t bottleneck and I scale up to 64GB RAM (which I assuredly will), I’ll have effectively saved myself $400 or so from buying the NUCs on base hardware alone.
That’s not to mention savings on the physical networking infrastructure, since I have so much on-board networking capability
Potential to be able to scale up to 128GB RAM if I don’t hit any other bottlenecks
Uses DDR4, so if I do go to ECC it will be modern gen RAM that is re-usable or tradeable elsewhere
It’s smol. Quite smol.
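To sanity-check the "$400 or so" figure, here is a quick back-of-the-envelope sketch. It uses only the prices and RAM ceilings quoted above, and it assumes (as stated above) that RAM costs the same either way, so RAM is left out entirely. The function names are just for illustration.

```python
import math

# Figures quoted in this post (Amazon prices at time of writing).
NUC_PRICE = 520          # Skull Canyon NUC
NUC_MAX_RAM_GB = 32
E300_PRICE = 650         # Supermicro E300-8D
E300_MAX_RAM_GB = 128


def boxes_needed(target_ram_gb: int, max_ram_per_box_gb: int) -> int:
    """How many boxes it takes to reach a total RAM target."""
    return math.ceil(target_ram_gb / max_ram_per_box_gb)


def base_cost(target_ram_gb: int, price: int, max_ram_per_box_gb: int) -> int:
    """Base hardware cost only; RAM is assumed equal for both options and excluded."""
    return boxes_needed(target_ram_gb, max_ram_per_box_gb) * price


for target in (32, 64, 128):
    nuc = base_cost(target, NUC_PRICE, NUC_MAX_RAM_GB)
    e300 = base_cost(target, E300_PRICE, E300_MAX_RAM_GB)
    print(f"{target:>3} GB: NUCs ${nuc}, E300-8D ${e300}, difference ${nuc - e300}")
```

At 32GB the single NUC is the cheaper box. The E300-8D only pulls ahead once you need a second NUC to get to 64GB, and that crossover is where the roughly $400 comes from.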
Breaking Into the Chassis
Initial access was quite straight-forward. RTFM. Remove 2 screws, slide cover plate back. Expose all the juicy inners!
Start with RAM since I know you can’t screw it up, take the little wins before you end up in wars of attrition over stupid things like moving the location of a single stand-off so you can use the mSATA drive.
So about that. Out-of-the-box, I’m not sure what they intend for you to be able to use the stand-off for. It’s right in the path of the mSATA PCI slot, and manual indicates you should move the stand-off to secure the mSATA drive. Okay, whatever.
No. Not just whatever. I’m going to save anyone else who builds this a lot of time. Tear it down – all apart. You will be removing the motherboard to move this standoff. Just heed my lessons.
Like any good hacker, I had it physically disassembled and in pieces within an hour of being hands-on with it.
Next up was the M.2 drive, and the 2.5” drive. I wanted at least one drive with a good amount of cheap, long-term storage. Originally I had read there was a 4 TB drive, and ordered it… but it was a 3.5” drive. Ugh. WD Black was a straightforward, super cheap option, but you could go with another SSD or maybe find a bigger 2.5” drive.
First Boot and Hypervisor
I decided that since I had a free license thanks to .edu hookups, I’d go with “real boy” virtualization and straight to ESXi 6.5. So let’s get it going! Now this part should be really easy if you’re not a complete idiot like me. I struggled for way too long, through too many USB drives and ISO images, before I finally got the right god damn ESXi installer image on a USB. Plugged it in, and waited for the first satisfactory boot!
First ESXi boot!
My first connect up to it was me hooking Ethernet cable up to a router I had laying around, and then my PC up to the router. This was my first method of connecting up to the management port on the server, and worked this way for a little while until I discovered direct connect thanks to a little magic called Automatic MDI Crossover – more on that later!
For some reason, I had a deep-seated need to change the hostname first thing. So I went and did that!
I set it by going to the following setting:
Networking > Default TCP/IP stack > Edit Settings
At first, I did something really dumb by screwing around with hardware passthru, which was due to me just being really misled in my thinking. I eventually backed out of that rabbit hole. It took a full system reset, but I was able to get back in and go back to screwing around with configuring it.
Set a static IP on the primary management NIC – vmnic0. I chose 192.168.88.50
Set a static IP on the Ethernet interface on your computer (a USB adapter in the case of my laptop) – 192.168.88.100
Connect Ethernet cable from computer to vmnic0
Access the web interface from computer – https://192.168.88.50
Appreciate the great success!
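The one thing that makes direct connect work (besides the cable) is that both static IPs sit in the same subnet. Here is a small sketch for checking that before you start swapping cables. The /24 mask is my assumption, since the steps above only give the addresses.

```python
import ipaddress


def same_subnet(ip_a: str, ip_b: str, prefix_len: int = 24) -> bool:
    """Return True if both addresses fall in the same network for the given prefix.

    prefix_len=24 (255.255.255.0) is an assumption; adjust it to your actual mask.
    """
    net_a = ipaddress.ip_interface(f"{ip_a}/{prefix_len}").network
    net_b = ipaddress.ip_interface(f"{ip_b}/{prefix_len}").network
    return net_a == net_b


# ESXi management NIC (vmnic0) and the laptop's USB Ethernet adapter.
print(same_subnet("192.168.88.50", "192.168.88.100"))
```

If this comes back False for your own pair of addresses, the web interface will not load over a direct cable no matter how many adapters you rotate through.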
With all this set up, management is that much better. Now all you need is any regular old Ethernet cable, plugged straight from your computer into the server!
But this only covers management networking. Given that my ultimate purpose is to travel this thing around, and share in the experience… I need a way to connect up to the internal networks, whatever they may become. There’s also questions here of how exactly I’ll bridge those connections, and how I’ll hook people up to it on the outside. Lots to learn!
I also have some dreams of it being used in situations where I can utilize it as a drop-in IR box, or something fun like that. The box has a lot of physical networking capability… and let’s cover all that!
So, out of the box it has 6 physical NIC on the motherboard.
vmnic0 - Already assigned to management for ESXi
vmnic1 - I wanted to use this port to provide an Internet uplink for the box
vmnic2 - And this one for an external, physical LAN
vmnic3 - Gaming (No plans yet, figure I could eventually use it for mobile LAN server)
vmnic4 - IDS
vmnic5 - Unallocated
At this point, I’m not precisely sure what I wanna do with the rest of this, but have some general ideas. Let’s start to implement at least some of those ideas on the box itself! We’ll now dig into vSwitch networking within ESXi.
I have a general idea what to do, and this is my playground. Let’s go for it!
We’re going to create a series of vSwitch to represent different networks… obviously. 1 for VMs, 1 for physical access, 1 for Internet/”Gateway”, etc. When you create a vSwitch, you can also designate a physical uplink port.
There is a default switch – vSwitch0 that gets created with vmnic0 set as the physical uplink – that’s how the management network works! So let’s create a vSwitch called “VM Access”, because I need an interface into the server for physical devices to get in. I wanted to save the 2nd NIC for gateway since it’s more of a “management” type service, so we’ll use the 3rd physical NIC for that switch.
Name: “VM Access”
Port Group: “Access VMs”
And that’s that! Now, you’ve assigned a physical NIC to a virtual switch… but you still need to add VMs to those virtual networks! VMs attach to a vSwitch through what ESXi calls “Port Groups”. You create a new Port Group, and link it directly to a vSwitch. Let’s create a couple more vSwitches, and also some port groups to use for the VMs. I did not have the best naming conventions here by any means, do better than me!
Port Group: “Internet Access”
Name: “VM Network”
Port Group: “VM Connection”
Port Group: “Management Network”
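I did all of this by clicking through the ESXi web client, but the same layout could in principle be scripted from the ESXi shell with esxcli. Below is a hedged sketch that only generates the command strings for the two vSwitches whose names are given above. Treating “VM Network” as having no physical uplink is my assumption, and the gateway vSwitch is left out because its name isn’t stated here.

```python
import shlex

# vSwitch layout as described in this post. Only switches whose names are stated
# are listed. uplink=None is an assumption meaning "no physical NIC attached".
LAYOUT = [
    {"vswitch": "VM Access", "uplink": "vmnic2", "port_groups": ["Access VMs"]},
    {"vswitch": "VM Network", "uplink": None, "port_groups": ["VM Connection"]},
]


def esxcli_commands(layout):
    """Build esxcli command strings that would create each vSwitch and port group."""
    base = "esxcli network vswitch standard"
    commands = []
    for entry in layout:
        name = shlex.quote(entry["vswitch"])  # names contain spaces, so quote them
        commands.append(f"{base} add --vswitch-name={name}")
        if entry["uplink"]:
            commands.append(
                f"{base} uplink add --uplink-name={entry['uplink']} --vswitch-name={name}"
            )
        for port_group in entry["port_groups"]:
            commands.append(
                f"{base} portgroup add --portgroup-name={shlex.quote(port_group)} "
                f"--vswitch-name={name}"
            )
    return commands


for command in esxcli_commands(LAYOUT):
    print(command)
```

Nothing here touches the host; it just prints commands you can read over first. Keeping the layout in one small structure like this is also a cheap way to force yourself into better naming conventions than mine.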
We’ve got the physical NICs as figured out as we can so far, let’s get the virtual side up and going!
Now, as I mentioned before, I wasn’t entirely sure how this was all going to work. I had figured that a router VM of some kind would be needed, and that sounded fun enough to play with anyways! I downloaded a copy of OPNSense, and set up a VM for it.
We’re going to initially have 3 interfaces for this VM:
Internet Access - Provide Internet gateway for the router
VM Connection - Provide VMs with a connection to the router
Access VMs - Provide the physical LAN access to the router
This next bit was a lot of fun. It took me 2 laptops, a few afternoons, and some freetime at work to completely figure this all out… but boy did I learn and do a lot to get there!
With console access via ESXi web client, I was connected to the opnsense VM. There are very limited options from this point, and I figured that it would be straight-forward enough to figure out a few interfaces. Boy was I so, so wrong with my ideas and it caused me a lot of grief. To be fair… the problem I was experiencing was extremely basic, and nobody that I asked for help had any for me, so HAH!
I went ahead and configured the 3 interfaces as I saw them. During basic configuration, you set up 2 interfaces – WAN and LAN. There is also a 3rd, simply called OPT1 (Which I later re-named to VM1, as indicated above). Neat little tidbit about OPT1 and LAN – LAN gets rules by default to allow traffic. OPT1 has zero rules to allow traffic by default, and you can’t do jack shit from it until you configure firewall rules on it. The router was configured to provide DHCP for both interfaces.
I did not know about the firewall rules.
I also did not realize how painful it is to swap back and forth from static IP on 1 physical NIC, over to DHCP on the other physical NIC in vain attempts to figure out what in the fuck you screwed up on the configuration.
My laptop connected up, got DHCP – everything looked right. IP was good, subnet good, gateway good. Could not ping the router. Could not access the webconfig. WHAT THE FUCK! I know it’s pretty common that firewall/routers won’t have ping enabled by default, that made sense. But no SSH, and no webconfig… annoying and weird!
Here is my Saturday, hacking away with primary laptop on my left knee, and my Chromebook on the right – both without Ethernet ports, and me struggling to rotate through my stock of 4 sort-of misbehaving USB-Ethernet adapters. It was a great time! I ended up using a LANTurtle, and a simple Monoprice adapter I had laying around. The LANTurtle was great and worked reliably, except it has its own layer 3 abstraction that made problems really hard to pin down with any confidence.
So, here I am. Swapping cables in/out, adapters in/out, swapping back and forth between ESXi management and the VM Access network. Using tcpdump from the router console, I can see pings and web requests coming from my laptop, so I know the traffic is actually reaching the router, but SOMETHING IS BLOCKING SHIT. I’m going nuts, absolutely convinced that it’s a VMware firewalling issue, searching everywhere and asking people for advice.
I finally broke down and went with a new tactical route – from the virtual side. During a free moment I had uploaded a few Windows ISO and created a Win7 VM. I boot that up, and it connected to the VM Network vSwitch. Access it via the webconsole. DHCP connects, I get an IP address on the 192.168.1.x range (the LAN interface on the router) and… AND… IT PINGS THE FUCKING ROUTER. AND CONNECTS TO THE WEB CONFIG OHMAHGAWDTHEFUQ.
Now another REALLY fun bit… This was Win7, with old ass IE being the only available browser. OPNSense webconfig doesn’t work for a god damn thing AT ALLLLLLLLLLLL in that version of IE, naturally.
Another REALLY fun thing about ESXi is that there is no good way to upload files to the VMs. No drag’n drop like I’m so used to with type 2 hypervisors. The best method I found was to create a data ISO, upload that to ESXi, and then connect that up to the VM as a disk drive. Access the ISO, and pull off the data.
After a few flip flops back and forth with insane ideas of what was wrong, I did the work and signed up for a trial with Daemon Tools Lite for their data ISO creation app, loaded one up with Chrome (remember to use the OFFLINE installer if you haven’t yet figured out WAN, like me at this point), and got that onto my desktop VM.
Chrome opened up to the OPNSense webconfig beautifully! Now I’m in business. Oh yes. First thing I did was switch around LAN and OPT assignments for the interfaces.
Back on my laptop… I CAN PING THE GATEWAY AHHHHHHH. And I can access the webconfiAAAAAAAAAAAAAAAAAAAAHHHHHHHHHHHHHHHHHHHHHHHHH
Now like any good systems administrator, first thing I did was go into the firewall, and add ANY*ANY rules to all the interfaces. Back at the Win7 VM, I can ping the gateway. I can ping my laptop. My laptop can ping the gateway, it can ping my VM. Everybody loves everybody. Everything is good in the world.
*Note I did make notes of stupid security things I did. I honestly kinda like this as a retrospective of how certain mistakes are made!
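For anyone who gets bitten by the same thing I did, here is a toy model of why LAN worked out of the box and OPT1 did not. It is a simplified sketch (first matching rule wins, no match means deny) and not how OPNSense actually implements its firewall. The class names are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Rule:
    action: str                 # "pass" or "block"
    source: str = "any"
    destination: str = "any"

    def matches(self, src: str, dst: str) -> bool:
        return self.source in ("any", src) and self.destination in ("any", dst)


@dataclass
class Interface:
    name: str
    rules: List[Rule] = field(default_factory=list)

    def allows(self, src: str, dst: str) -> bool:
        """First matching rule decides; with no match, traffic is denied."""
        for rule in self.rules:
            if rule.matches(src, dst):
                return rule.action == "pass"
        return False  # implicit default deny


lan = Interface("LAN", [Rule("pass")])   # stand-in for the default allow rule on LAN
opt1 = Interface("OPT1")                 # no rules at all out of the box

print(lan.allows("laptop", "router"))    # LAN passes traffic by default
print(opt1.allows("laptop", "router"))   # OPT1 drops everything until rules exist

opt1.rules.append(Rule("pass"))          # the lazy ANY/ANY rule described above
print(opt1.allows("laptop", "router"))
```

This is why tcpdump showed my packets arriving while nothing ever answered: the interface was up and receiving, the empty ruleset was just quietly dropping everything.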
Here we’ll set up some good portability for this mighty fella. I already had a nice, padded case from Harbor Freight that I was using for my poorly abandoned SDR gear. Now repurposed for real use, this package is coming along smoothly!
Another big part of that is getting the hell away from those damned USB adapters. I made a decision to go extremely basic with my initial rollout of network gear.
I’m not entirely sold on either bit of this gear, but so far they have served me well! Connect the first and third Ethernet ports on the server up to the switch, and then the AP to the switch. Cabling is a mess, and ideally I’ll do PoE to an extremely small AP, but I couldn’t find any of the Mikrotik MiniAP available, and didn’t want to fully invest in a PoE switch as I wasn’t sure what feature set and expandability I’d need.
Who knows, maybe I’ll get an itch or financial support to go crazy and use the onboard SFP+ ports! As 10-gigabit uplinks, they make this box a candidate for being dropped into all sorts of environments! Just the gear for them isn’t quite… consumer/hobbyist researcher friendly.
AP configuration was pretty straight forward. Set an SSID, connect up to SSID, it gets passed through to the switch. Switch passes it to the 3rd physical NIC. That gets passed to the core router VM. Now I just connect it up, make sure the VM is running, and I’m ready to roll via a simple Wifi connection!!! This should support quite a few clients with plenty of speed to access VMs, hassle-free. Hooray!
One final convenience – accessing the ESXi management from Wifi. Another entry into the “security todo” list, I created a new port group - “Management - From Wifi” and attached it to the “VM Access” vSwitch.
The ESXi host itself gets its network interfaces for management via “VMkernel NIC”. The default one is attached to the default vSwitch. The new one we’re going to attach to the “VM Access” vSwitch, through the port group just mentioned. Now the ESXi host is reachable for management through the vmnic0 physical NIC at 192.168.88.50, and over the Wifi/physical LAN at 192.168.1.3. If I’m ever concerned about locking things down – a red vs blue situation, or just being around more untrusted/unsavory people – that interface can just be disabled or hardened. Now I can access VMs and the ESXi host from the comfort of my laptop, simply connected over Wifi.
Let The Gateway… OPEN!
HOME STRETCH OF ALL THE FUNDAMENTAL BULLSHIT!
What a journey, am I right? The final goal of mine to have really “figured it all out” as far as the basic infrastructure goes was an Internet gateway connection. The reason this has been so difficult is all the networks around me are cobbled together with scotch tape and hope. I’m basically all Wifi at home due to a poor location of coaxial drops.
I found a way to deal with this via my handy dandy Pineapple Mk 5! It’s a very capable and convenient device to keep around as an awesome wireless bridge or general router/gateway. I connected up to it, hooked it up to my home wifi. Then I connected a cable up from its Ethernet port to the server’s 2nd physical NIC – vmnic1, Gateway! IT’S FINALLY HAPPENING!
Checked out the opnsense webconfig, and what do you know… WAN IP ACHIEVED!
Connected to the “DumbyLand - VM Access” SSID, I can ping google.com, and I can access the ESXi web client, and I can ping my Win7 VM! I can access my online Docs! My VM can access the Internet! IT’S ALL WOOOOORKKKIIIINNNNNGGGGG!!!!
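Since I will be setting this up and tearing it down in new places a lot, a repeatable version of that victory lap seems worth having. This is a rough sketch that checks TCP reachability instead of ping (ICMP needs raw sockets and extra privileges). The targets are the ones mentioned in this post, and port 443 for each is my assumption.

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, and DNS failures
        return False


# Targets taken from this post; port 443 is assumed for both.
CHECKS = [
    ("google.com", 443),     # Internet through the router VM's WAN
    ("192.168.1.3", 443),    # ESXi web client on the Wifi/LAN side
]


def run_checks(checks=CHECKS):
    """Map each 'host:port' target to whether it answered."""
    return {f"{host}:{port}": tcp_reachable(host, port) for host, port in checks}
```

Running run_checks() from a laptop on the Wifi gives a quick pass/fail view of the whole chain: AP, switch, vSwitch, router VM, and the uplink behind it.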
I’ve already messed around with a couple ideas for labs. With all of this figured out, the rest should be figuring out the VM environments themselves! I can bridge the worlds, and provide the wealth of the Internet to them.
For now, I think that’s a fucking great start.
I love ya, d3c4f. We all do! And we’ll sure miss you. You’ve inspired many, and will always continue to do so.