Official WebOSINT THM Write-up
This write-up guides users through the TryHackMe WebOSINT room, created by yours truly. It is a beginner-level room that introduces basic website research tools. The write-up provides completion instructions as well as background information and context that may be helpful in open source intelligence investigations on websites.
Let’s dive in!
Task 1: When A Website Does Not Exist (Intro)
We will do a deep dive on search engines shortly, but for now here are a few points that were out of scope for the room and that I still felt were important:
- Sometimes plugging a website into the search bar will send you directly to the site. Avoid this by putting the site in quote marks. Also note that this will only return results where the full domain name is written out on the website.
- Contrary to conventional wisdom in the open source intelligence world, doing research through search engines is not passive. Search engine companies are constantly trying to improve their algorithms. Every single search is tracked and every link that you click from the results pages is recorded. In practice this means that the act of clicking a link at the top of the second page of results increases the chances that, for the next person, it will show up at the bottom of page one.
- Related to the previous point, search engines are always crawling the internet for new connections between sites. It is a good practice to be cognizant of the websites you link to. You may notice that the THM room, as well as this write-up, does not directly link to certain sites. This is very intentional. Some of these sites I am happy to help with search results, and others I am not (for a variety of reasons).
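To make the quote-mark tip from the first point concrete, here is a minimal sketch that builds an exact-match search URL for a domain. The function name is my own invention for illustration, and the Google search endpoint is only one example; swap in whichever engine you prefer.

```python
from urllib.parse import quote_plus


def quoted_search_url(domain: str, engine: str = "https://www.google.com/search") -> str:
    """Build a search URL that looks for the exact domain string.

    Wrapping the domain in quote marks asks the engine for pages where the
    full domain name is written out, instead of jumping to the site itself.
    """
    query = f'"{domain}"'
    return f"{engine}?q={quote_plus(query)}"


# Example with one of the domains from the room.
print(quoted_search_url("republicofkoffee.com"))
```

Keep the earlier caveat in mind: opening that URL is still a search, so it is not a passive action.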
There is no flag for this task.
Task 2: Whois Registration
It’s not really THAT big of a deal if you want to use a different service for this task. I have just personally had horrible experiences with certain ones so as a matter of principle I steer people clear of them when possible.
It’s true that completing these flags with the ICANN lookup may be confusing at first as you have to interpret the code. Below are the answers I was going for, but I recommend at least going back to your whois lookup to see where I got them from before proceeding.
What is the name of the company the domain was registered with?
This is one of the first things that shows in any Whois lookup.
What phone number is listed for the registration company?
Located very close to the name of the registration company.
What is the first Nameserver listed for the site?
This one you might have to look around for a bit, but there are usually two or three of them and they tend to have a format similar to XX1.XXXXXXXXXXXX.com
What is listed for the name of the registrant?
It’s pretty obvious this is not the name of a real person or company.
redacted for privacy
What country is listed for the registrant?
Not a country known for transparent record keeping.
Reading between the lines of the flags for this task you have probably figured out that the registration information can’t always be trusted. These days domain name providers offer privacy protection for a negligible fee (or even free).
This wasn’t always the case though, and websites that have been around for a while may still list a legit phone number, address, and person or business name.
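If reading raw whois output by eye gets tedious, a few lines of Python can pull out the fields the flags ask about. This is only a sketch: the sample record below is entirely made up, and the field labels ("Registrar", "Name Server", and so on) are assumptions based on commonly seen output, since labels vary between registries and lookup tools.

```python
from collections import defaultdict

# Made-up record for illustration only; real output will differ.
SAMPLE_WHOIS = """\
Domain Name: EXAMPLE.COM
Registrar: Example Registrar, Inc.
Registrar Abuse Contact Phone: +1.5555550100
Name Server: NS1.EXAMPLE-HOST.COM
Name Server: NS2.EXAMPLE-HOST.COM
Registrant Name: Redacted for Privacy
Registrant Country: XX
"""


def parse_whois(text: str) -> dict:
    """Group 'Label: value' lines by label, keeping repeated labels in order."""
    fields = defaultdict(list)
    for line in text.splitlines():
        # Split on the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        if sep and value.strip():
            fields[key.strip()].append(value.strip())
    return dict(fields)


record = parse_whois(SAMPLE_WHOIS)
print(record["Registrar"][0])       # registration company
print(record["Name Server"][0])     # first nameserver listed
print(record["Registrant Name"][0]) # often a privacy placeholder
```

Labels that repeat, such as nameservers, come back as a list in the order they were listed, which is handy for questions about the "first" or "second" nameserver.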
Task 3: Ghosts of Websites Past
The Internet Archive is an amazing resource for OSINT investigators, but most of its features are beyond the scope of this room. We are focusing today on the “Wayback Machine”, which provides frozen-in-time snapshots of websites.
The Wayback Machine won’t have every little website that ever existed archived, but it does seem to have most of them. It can also be used to check and see if a domain has any kind of previous history before you buy it.
You see, in the SEO world it is widely believed that domains with a longer history are bestowed with more ‘juice’ by the algorithms. Conversely, domains that were previously used for spamming or pornographic content are believed to be penalized.
The search engines are not very forthcoming with inside details of how their algorithms work, but it doesn’t take that long to do a bit of research on a domain before purchasing it. This may be an important piece of the puzzle for the next domain we are going to look at in this room.
For this task you don’t have to read into the site very deeply. You just need to click into the blog posts and give a quick scan of the content.
On to the flags:
What is the first name of the blog’s author?
You can open any of the blog posts from the main page to see the author’s name towards the top of the page.
What city and country was the author writing from?
The city can be viewed in the very first blog post that mentions Cafe Zorba. If you wanted you could take the additional step of doing a search for the city name with the name of the university that is mentioned to really lock it down.
Gwangju, South Korea
[Research] What is the name (in English) of the temple inside the National Park the author frequently visits?
You don’t have to go any deeper than the main page to see a blog snippet that mentions a national park. Just plug that into Google maps. Make sure to look for the temple that is actually inside the park, not just near the subway station (which is outside the park). That being said, the subway station itself is actually named after the temple we are looking for.
Task 4: Digging Into DNS
The flags for this task make extensive use of the IP History and Reverse IP Lookup searches on ViewDNS.Info.
What was RepublicOfKoffee.com’s IP address as of October 2016?
The IP History search should do the trick here.
Based on the other domains hosted on the same IP address, what kind of hosting service can we safely assume our target uses?
First we have to do a Reverse IP Lookup on the domain on ViewDNS.Info. This will give us a laundry list of domain names all associated with the same IP address. While it’s not totally out of the realm of possibility that one person or company might be running hundreds (or thousands) of websites off of one IP address, if they were we would expect some kind of pattern connecting them. In this case though, the websites are really quite random. We can assume that the owner of our target site is most likely paying for a minimal hosting plan around $10 per month or less. These are called __________ plans.
How many times has the IP address owner changed in the history of the domain?
Back to the IP History search for this one.
All this means is that the company that owned the IP address the domain pointed to (in other words, the hosting provider) changed three times. This doesn’t necessarily mean that the domain completely changed ownership three times. It is certainly possible, though. It might just mean that the owner is a bargain hunter and moves to different hosting providers frequently.
It’s not that easy to transfer management of your domains between companies though, so this typically doesn’t happen too often.
Task 5: Taking Off The Training Wheels
This task is a bit of a test. It doesn’t hold you by the hand and tell you exactly where to go for each flag. However, you have used the needed tool for each one of these already.
What is the second nameserver listed for the domain?
Whichever Whois Lookup method you are most familiar with by now should do the trick here too.
What IP address was the domain listed on as of December 2011?
Get over to ViewDNS.Info’s IP History Lookup to find this one.
Based on domains that share the same IP, what kind of hosting service is the domain owner using?
Again, we’re using ViewDNS.Info’s Reverse IP Lookup. If it’s just one site connected to one IP address, it would be a ‘dedicated’ hosting plan. However if there’s a random mashup of disparate domain names associated, it would be a shared plan.
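The rule of thumb above can be written down as a tiny function. This is a rough sketch rather than a real detector: the function name and the sample domains are hypothetical, and you would paste in whatever list the Reverse IP Lookup gives you. It also cannot tell whether a long list of domains shares a common owner, so you still have to eyeball the list for a pattern.

```python
def guess_hosting_plan(domains_on_ip: list[str], target: str) -> str:
    """Guess the hosting plan from the domains that share the target's IP.

    'dedicated' if the target is alone on the IP, otherwise 'shared'.
    """
    # Normalise case and ignore the target itself when counting neighbours.
    neighbours = {d.lower() for d in domains_on_ip} - {target.lower()}
    return "shared" if neighbours else "dedicated"


# Hypothetical lookup results for illustration.
print(guess_hosting_plan(["example.com"], "example.com"))
print(guess_hosting_plan(["example.com", "cat-photos.example", "plumber.example"], "example.com"))
```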
On what date was the site first captured by the Internet Archive?
For this one, head over to Archive.org and plug the site into the Wayback Machine. Find the date farthest to the left of the timeline, click, and find the first snapshot of that year.
June 1st, 1997
What is the first sentence of the first body paragraph from the final capture of 2001?
Pull up the snapshots for 2001, choose the last one. Then just copy and paste.
After years of great online gaming, it’s time to say good-bye.
Using your search engine skills, what was the name of the company that was responsible for the original version of the site?
There are a number of websites with this info, but good ol’ reliable Wikipedia is what I chose.
What does the first header on the site on the last capture of 2010 say?
This seems to mark a new owner for the website; most likely someone that simply found an expired domain name with lots of SEO juice already pointed at it.
Heat.net — Heating and Cooling
Task 6: Taking A Peek Under The Hood
The room itself goes into plenty of detail on how to view the source code of any given web page in Chrome on macOS (but it should be pretty similar in any browser). If you need a reminder, here is what you need:
View > Developer > View Source
Now on to the flags.
How many internal links are in the text of the article?
How many external links are in the text of the article?
Website in the article’s only external link (that isn’t an ad)
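If counting links by eye in the page source gets fiddly, the idea behind these three flags can be sketched with Python's standard library. The HTML snippet and host name below are made up, and the helper names are my own. Note that this sketch looks at every link in whatever HTML you feed it, so you would paste in just the article text, not the whole page, and it treats links without a host (relative paths, but also things like mailto: links) as internal.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def split_links(html: str, site_host: str):
    """Return (internal, external) link lists relative to site_host."""
    parser = LinkCollector()
    parser.feed(html)
    internal, external = [], []
    for href in parser.hrefs:
        host = urlparse(href).netloc.lower()
        # No host means a relative link; same host or a subdomain is internal.
        if not host or host == site_host or host.endswith("." + site_host):
            internal.append(href)
        else:
            external.append(href)
    return internal, external


# Made-up article snippet for illustration.
sample = (
    '<p><a href="/about">About</a> '
    '<a href="https://www.example.com/post">Post</a> '
    '<a href="https://other.example.org/">Other</a></p>'
)
internal, external = split_links(sample, "example.com")
print(len(internal), len(external))
```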
Try to find the Google Analytics code linked to the site
Is the Google Analytics code in use on another website? Yay or nay
You could simply Google it inside of quote marks, but some of the results there may be a bit confusing. Try running it through nerdydata.com. It should be pretty clear (as of this writing anyway).
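Before you can search for the code, you have to find it in the page source. Here is a small sketch that scans HTML for classic Google Analytics property IDs of the form UA-XXXXXXXX-Y. The regular expression is my own approximation of that format, not an official definition, and the sample snippet and ID are made up.

```python
import re

# Approximate pattern for classic "UA-<account>-<property>" IDs (an assumption).
GA_ID_PATTERN = re.compile(r"\bUA-\d{4,10}-\d{1,4}\b")


def find_ga_ids(page_source: str) -> list[str]:
    """Return the unique analytics IDs found in the source, sorted."""
    return sorted(set(GA_ID_PATTERN.findall(page_source)))


# Made-up snippet for illustration.
sample_source = "<script>ga('create', 'UA-12345678-1', 'auto');</script>"
print(find_ga_ids(sample_source))
```

Whatever ID comes back is what you would then search for in quote marks, or run through a source-code search service, to see whether other sites use it.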
Does the link to this website have any obvious affiliate codes embedded with it? Yay or Nay
There are a lot of affiliate programs out there aside from Amazon and Google, so going through all of the possible iterations of this is not feasible. It could be helpful to consider the basic HTML structure of a link though. It would typically look like this:
<a href="http://www.tryhackme.com">TryHackMe is AWESOME!</a>
We are really looking at the code in the first < > set. It could have some formatting information like target="new", but if there is an affiliate code embedded in the link, it will most likely have extra info tacked onto the end of the website address.
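That "extra info" usually shows up as query parameters after a ? in the address. Here is a rough sketch that pulls them apart and flags any that look affiliate-related. The list of suspect parameter names is my own guess and far from complete (Amazon's tag= is the best-known one), so treat a hit as a hint and an empty result as "nothing obvious".

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical, incomplete list of parameter names that often carry affiliate IDs.
SUSPECT_PARAMS = {"tag", "ref", "aff", "affid", "aff_id", "affiliate", "partner"}


def affiliate_hints(url: str) -> dict:
    """Return the query parameters in the URL that look affiliate-related."""
    query = parse_qs(urlparse(url).query)
    return {name: values for name, values in query.items() if name.lower() in SUSPECT_PARAMS}


# The plain link from the example above has nothing tacked on.
print(affiliate_hints("http://www.tryhackme.com"))
# A made-up link with an Amazon-style tag parameter.
print(affiliate_hints("https://www.example.com/item?id=42&tag=somestore-20"))
```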
Task 7: Final Exam: Connect The Dots
This final task is really pretty straightforward, although it hadn’t been previously mentioned.
Use the tools in Task 4 to confirm the link between the two sites.
You’re going to need to look at the IP History search on ViewDNS.Info for this one. The two websites share an early hosting company:
Liquid Web, L.L.C
Note that there is a typo in the company name. For the flag, copy and paste it exactly as is from the IP History search and into the flag field.
It may seem odd that both of these websites were, at least at one point, hosted by the same company, but on different IP addresses. In reality, search engine algorithms are smart enough to recognize if two websites are linking to each other from the same IP address. For a PBN (private blog network) to work as intended, it would need to be on a completely separate IP address from the money site.
In fact, these days, they should be as siloed off from each other as possible. That may not have been true back when these websites were first stood up. SEO strategies are always changing and evolving, so it is completely expected to see shifting approaches over time.
Debriefing and Wrap-Up (Tasks 8/9)
Thank you for trying my THM box and for checking out this Write-up!
Thinking outside the box here, what if you wanted to do the opposite of boosting a website?
What if you wanted to push it down in the rankings and banish it to the second, third, or tenth page of the SERPs?
Well, in theory, you could just take all of the tactics we’ve already discussed, and do the opposite. This is what’s called negative SEO. There are highly specialized online ‘Reputation Management’ agencies that focus on this. It is a whole fascinating area that I first learned about from the book So You’ve Been Publicly Shamed (Wikipedia link) by Jon Ronson.
At any rate, I hope you have enjoyed my room and this write-up!