Kaduu creates Spoofguard – a Domain Variation Analysis Engine to Detect and Mitigate Typosquatting Threats

What is the cyber-risk?

Typosquatting, also known as URL hijacking, involves registering domain names that closely resemble legitimate domains of reputable brands but include small typographical errors. These deceptive domains are often leveraged by attackers in phishing and malware dissemination campaigns. By exploiting common typos made by internet users, attackers can lure victims into visiting malicious websites that mimic the look and feel of legitimate ones. These typosquatted domains can be used to steal sensitive information, distribute malware, or manipulate traffic for financial gain. Sometimes, attackers reserve such domains without immediate use, holding them dormant until deploying them strategically during specific attacks, often timed with real-world events to maximize impact. For example, attackers may register domains that mimic major financial institutions to steal credentials from unsuspecting customers, or use typosquatted domains to distribute malware disguised as legitimate software.

Examples of Typosquatting Techniques:

  • Character Omission: Dropping a letter from a well-known domain (e.g., example.cm instead of example.com).
  • Character Swap: Reversing two adjacent characters (e.g., exmaple.com).
  • Homoglyph Replacement: Using visually similar characters, including characters from different scripts (e.g., replacing ‘l’ with ‘1’ to form examp1e.com).
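
The three techniques above can be sketched in a few lines of Python. This is a minimal illustration, not Spoofguard's actual implementation: the function names and the small HOMOGLYPHS table are assumptions made for the example, and the functions operate on the domain label only (the part before the TLD).

```python
# Illustrative sketch only; names and the homoglyph table are hypothetical.
HOMOGLYPHS = {"l": "1", "o": "0", "i": "1"}  # tiny sample, far from complete


def omissions(label: str) -> set[str]:
    """Drop one character at a time."""
    return {label[:i] + label[i + 1:] for i in range(len(label))}


def adjacent_swaps(label: str) -> set[str]:
    """Swap each pair of adjacent characters that differ."""
    return {
        label[:i] + label[i + 1] + label[i] + label[i + 2:]
        for i in range(len(label) - 1)
        if label[i] != label[i + 1]
    }


def homoglyphs(label: str) -> set[str]:
    """Replace one character at a time with a look-alike."""
    return {
        label[:i] + HOMOGLYPHS[ch] + label[i + 1:]
        for i, ch in enumerate(label)
        if ch in HOMOGLYPHS
    }


def variants(domain: str) -> set[str]:
    """Apply all three techniques to the label of a domain such as example.com."""
    label, _, tld = domain.partition(".")
    candidates = omissions(label) | adjacent_swaps(label) | homoglyphs(label)
    candidates.discard("")      # a one-letter label would otherwise yield ".tld"
    candidates.discard(label)   # never report the original domain
    return {f"{c}.{tld}" for c in candidates}
```

Calling variants("example.com") yields names such as exmple.com, exmaple.com, and examp1e.com.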

Why create a new tool?

While there are existing tools such as URLCrazy, Urlinsane, or DNStwister, they often lack the flexibility, accuracy, and contextual integration needed for comprehensive protection:

  • Flexibility: This engine allows for seamless integration and adjustment of modules, settings, and external data sources, providing tailored defenses against typosquatting.
  • Reduced False Positives: Our approach reduces the common issue of false positives, which is prevalent in other tools.
  • Enhanced Detection: Algorithms are specifically developed based on observed phishing attacks, enhancing detection capabilities.
  • Advanced Filtering: Enables pre-testing filtering of domains, focusing efforts on high-risk targets.
  • Integration of External Feeds: Unlike static methods, this engine utilizes certificate transparency logs and daily domain registration updates to spot potential threats dynamically. This includes detecting client-specific keywords in SSL certificates.
  • Automation: Manual investigation of flagged domains is time-consuming. By integrating with platforms like https://urlscore.ai, the engine automatically assesses the risk level of active websites, identifying potential misuse of logos, brands, or keywords.

Why not just download daily newly registered domains and analyze them?

A significant challenge in typosquatting detection is the absence of a global, accessible database of newly registered domains for most TLDs (Top-Level Domains). There are commercial vendors, but they don't cover real-time detection of newly registered domains. However, some countries do provide access to domain registration data, either as part of open data initiatives or through specific agreement terms. Here are a few examples:

  • Switzerland and Liechtenstein (.ch & .li): https://portal.switch.ch/pub/open-data/
  • Netherlands (.nl): SIDN, the registry for .nl domains, offers some level of data through their SIDN Labs, although direct access to newly registered domain data might be restricted or aggregated for privacy reasons.
  • Denmark (.dk): DK Hostmaster, which manages the .dk domain, provides access to some data for legitimate purposes, although they also prioritize privacy and data protection.
  • Canada (.ca): The Canadian Internet Registration Authority (CIRA) occasionally releases datasets for academic and non-commercial research purposes, which might include domain registration data.

This gap hinders the ability to proactively scan and flag new domains as they are registered, making real-time detection and response more difficult.

Overview of Spoofguard.io Components

General Overview

Spoofguard is composed of several key components, each designed to enhance domain security by detecting potential typosquatting threats. The core of Spoofguard is its domain permutation modules. These modules are configurable through the config.json file, allowing users to activate or customize their settings according to specific security needs. Here are two examples of modules:

  • module_levenshtein: This module employs the Levenshtein distance algorithm to generate domain typos by making single-character edits.
  • module_api_domain_search_ssl.py: This module connects to Kaduu.io, which maintains a constantly updated database of new certificate transparency logs. It searches for records containing the client’s domain, helping identify potentially malicious duplicates.
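
The schema of config.json is not described here, so the following is only a guess at how module activation might look. The "modules" key, the "enabled" flag, and the "max_distance" option are all hypothetical and chosen purely for illustration.

```python
import json

# Hypothetical config.json content; the real schema may differ entirely.
EXAMPLE_CONFIG = """
{
  "modules": {
    "module_levenshtein": {"enabled": true, "max_distance": 1},
    "module_api_domain_search_ssl": {"enabled": false}
  }
}
"""


def enabled_modules(config_text: str) -> list[str]:
    """Return the names of modules whose (assumed) 'enabled' flag is true."""
    config = json.loads(config_text)
    return [
        name
        for name, options in config.get("modules", {}).items()
        if options.get("enabled", False)
    ]
```

With the sample above, only module_levenshtein would be selected for the run.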

Workflow: The process begins when a user inputs a domain in the format “example.com”. Spoofguard then initiates a series of modules:

  • Domain Permutations: Thousands of domain permutations are generated or similar domains are retrieved from external feeds.
  • DNS Records: Each candidate domain is first checked for an NS record. If one exists, Spoofguard looks up its A and MX records, and if an A record is found, the analysis progresses to the next step.
  • Port Scanning: Open ports (HTTP & HTTPS) are checked to see if the web server is active.
  • Website Investigation: Using urlscore.ai, each active website is assessed to generate a risk score based on its content and activity.
  • Output Handling: Spoofguard provides versatile options for managing the output of its analyses. It supports storing results in a database, saving to local files, parsing externally, and generating detailed reports and exports. This flexibility ensures that organizations can integrate Spoofguard into their existing security frameworks and respond proactively to identified threats.

Details: Domain Permutation Modules

The engine generates domain permutations using a variety of methods, each implemented as a module. Here’s a list of all the modules included in the system:

  • module_tld: Manipulates the top-level domain (TLD) part of the input domain.
  • module_prepend: Adds predefined characters to the beginning of the domain.
  • module_common_word: Incorporates common words into the domain to create variations.
  • module_prepend_number: Adds numbers to the beginning of the domain.
  • module_letter_swap: Swaps adjacent characters in the domain.
  • module_hyphenated_domains: Introduces hyphens into the domain.
  • module_bit_squatting: Generates variations by flipping a single bit in a character of the domain name, producing valid domain names that differ from the original by one character.
  • module_homograph_word_variations: Creates homographic variations of the domain which may involve replacing characters with visually similar counterparts from different scripts.
  • module_append_tld: Appends additional TLDs to the domain.
  • module_append_tld_variations: Adds variations of TLDs to the domain.
  • module_double_characters: Doubles certain characters in the domain.
  • module_add_common_numbers: Incorporates commonly used numbers into the domain.
  • module_subdomain_add: Adds or manipulates subdomains.
  • module_missing_dashes: Removes dashes from the domain.
  • module_levenshtein_typos: Applies single-character edits to create typos based on the Levenshtein distance algorithm.
  • module_vowel_swap: Swaps vowels in the domain except for the first character.
  • module_misspellings: Introduces common misspellings into the domain.
  • module_punycode_variations: Replaces characters with special-character look-alikes and encodes the result as Punycode (e.g., a → ä, so ä.com becomes xn--4ca.com).
  • module_api_domain_search_ssl.py: Connects to Kaduu, which maintains a daily updated database of new certificate transparency logs, to find records containing the client domain (for example, revealing that an SSL certificate for clientdomain.maliciousdomain.com exists), and applies filters.
  • module_api_domain_search_.py: Connects to Kaduu, which maintains a daily updated database of new domain registrations, to find records containing the client domain (a search for “domain” would return results such as domain123.com), and applies filters.
  • …and many more
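
To make two of the less obvious modules concrete, here is a minimal sketch of bitsquatting and Punycode variation. It is written for illustration under stated assumptions and is not the code of module_bit_squatting or module_punycode_variations: the function names and the three-entry umlaut table are hypothetical. The Punycode step relies on Python's built-in "idna" codec.

```python
import string

# Characters allowed in a DNS label (lowercase, since DNS is case-insensitive).
ALLOWED = set(string.ascii_lowercase + string.digits + "-")

# Hypothetical sample of special-character look-alikes: a→ä, o→ö, u→ü.
UMLAUTS = {"a": "\u00e4", "o": "\u00f6", "u": "\u00fc"}


def bitsquat(label: str) -> set[str]:
    """Flip one bit of one character and keep results that are valid labels."""
    out = set()
    for i, ch in enumerate(label):
        for bit in range(8):
            flipped = chr(ord(ch) ^ (1 << bit))
            if flipped in ALLOWED:
                candidate = label[:i] + flipped + label[i + 1:]
                # Labels must not start or end with a hyphen.
                if not candidate.startswith("-") and not candidate.endswith("-"):
                    out.add(candidate)
    return out


def to_punycode(domain: str) -> str:
    """Encode an internationalized domain into its ASCII (xn--) form."""
    return domain.encode("idna").decode("ascii")


def punycode_variations(domain: str) -> set[str]:
    """Replace one character at a time with a look-alike, then encode."""
    label, _, tld = domain.partition(".")
    return {
        to_punycode(f"{label[:i]}{UMLAUTS[ch]}{label[i + 1:]}.{tld}")
        for i, ch in enumerate(label)
        if ch in UMLAUTS
    }
```

For the label example, bitsquat returns names such as dxample (the lowest bit of "e" flipped) and eyample (the lowest bit of "x" flipped). Every result of punycode_variations starts with the xn-- prefix, which is what a DNS lookup or registration feed would actually show.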

For every domain that is discovered, the main.py script performs some basic DNS tests:

  1. NS Record Test: Checks if a Name Server (NS) record exists for the domain. If no NS record exists, the domain is unlikely to be active.
  2. A Record Test: If an NS record exists, it then checks for an A record, which indicates the IP address associated with the domain.
  3. MX Record Test: Also, if an NS record exists, it checks for Mail Exchange (MX) records, determining if the domain is set up to receive emails.
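
The gating logic of these three tests can be sketched as follows. This is an assumption about structure, not an excerpt from main.py: the check_domain function and the lookup callable are hypothetical. The lookup is injected so the logic can be shown without network access; in practice it could wrap a resolver library such as dnspython and return an empty list when no record exists.

```python
from typing import Callable, Dict, List

# A lookup takes (domain, record_type) and returns the record values found,
# or an empty list if there are none. Hypothetical interface for this sketch.
Lookup = Callable[[str, str], List[str]]


def check_domain(domain: str, lookup: Lookup) -> Dict[str, List[str]]:
    """Query NS first; only query A and MX when an NS record exists."""
    result: Dict[str, List[str]] = {"NS": lookup(domain, "NS"), "A": [], "MX": []}
    if not result["NS"]:
        # No name server: the domain is unlikely to be active, so stop here.
        return result
    result["A"] = lookup(domain, "A")
    result["MX"] = lookup(domain, "MX")
    return result


# Stand-in data for demonstration; 192.0.2.0/24 is reserved for documentation.
FAKE_ZONE = {
    ("exmaple.com", "NS"): ["ns1.parking.example"],
    ("exmaple.com", "A"): ["192.0.2.10"],
}


def fake_lookup(domain: str, record_type: str) -> List[str]:
    return FAKE_ZONE.get((domain, record_type), [])
```

Skipping the A and MX queries for domains without an NS record saves two lookups for every unregistered permutation, which matters when thousands of candidates are generated per run.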

The system writes logs to document the results of its analysis, including:

  • DNS Query Log: Records each DNS query made along with the results, including NS, A, and MX records found. This log helps track which domains are active and potentially problematic.
  • Domain Variation Log: Lists all generated domain variations for each run, providing a reference for all tested variations. This includes variations that may not have been registered or active. Before performing DNS lookups, the system removes duplicate domain variations to optimize the network operations and processing time.
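
A minimal sketch of the de-duplication step and a DNS query log might look like this. The normalization rules (lowercasing, stripping a trailing dot) and the JSON Lines log format are assumptions made for the example; the article does not specify either.

```python
import io
import json
from typing import Dict, Iterable, List, TextIO


def dedupe(domains: Iterable[str]) -> List[str]:
    """Normalize and remove duplicates while keeping first-seen order."""
    normalized = (d.strip().lower().rstrip(".") for d in domains)
    return list(dict.fromkeys(d for d in normalized if d))


def write_dns_log(results: Dict[str, Dict[str, List[str]]], stream: TextIO) -> None:
    """Write one JSON object per line: the domain plus its NS, A, and MX records."""
    for domain, records in results.items():
        stream.write(json.dumps({"domain": domain, **records}) + "\n")


# Example: three raw candidates collapse to two unique lookups.
unique = dedupe(["Example.com", "example.com.", " exmaple.com "])
log = io.StringIO()
write_dns_log({d: {"NS": [], "A": [], "MX": []} for d in unique}, log)
```

Several permutation modules can produce the same candidate (a letter swap and a Levenshtein edit may coincide, for instance), so de-duplicating before the DNS stage avoids repeated queries.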

Details: Analysis Module “Webservices and Risk Analysis”

The URLScore Risk Analysis Tool is designed to assess the risk associated with domains by analyzing their security features and known risk factors. It automates two steps: checking whether each domain found by the previous script is reachable on the HTTP and HTTPS ports, and then scanning each domain with the URLScore API to retrieve a risk assessment.

Features

  • Domain Port Check: Verifies the availability of HTTP (80) and HTTPS (443) services for each domain.
  • Risk Scoring: Utilizes the URLScore API to generate a risk score based on several security checks.
  • Result Logging: Outputs the results of each scan into individual JSON files, organizing them into directories based on the domain name for easy access and analysis.
  • Rate Limit Handling: Includes mechanisms to handle API rate limiting by pausing requests and retrying.
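
The port check, retry behavior, and per-domain result files could be sketched as below. The URLScore API itself is not documented in this article, so the sketch does not call it: the RateLimited exception, the call_with_retry helper, the exponential backoff schedule, and the result.json file name are all assumptions for illustration. The actual API call would be passed in as fn.

```python
import json
import socket
import time
from pathlib import Path
from typing import Any, Callable


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


class RateLimited(Exception):
    """Hypothetical signal that the API asked the client to slow down."""


def call_with_retry(
    fn: Callable[[], Any],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call fn, pausing with exponential backoff whenever it is rate limited."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)


def save_result(base_dir: str, domain: str, result: dict) -> Path:
    """Write the scan result to <base_dir>/<domain>/result.json."""
    target = Path(base_dir) / domain
    target.mkdir(parents=True, exist_ok=True)
    path = target / "result.json"
    path.write_text(json.dumps(result, indent=2), encoding="utf-8")
    return path
```

A scan loop would call port_open(domain, 80) and port_open(domain, 443) first, and only wrap the risk-scoring request in call_with_retry for domains where at least one port answers.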
