Data mining PII via optical character recognition on publicly hosted image sites pt. 2

Introduction
On June 14, 2016, Ellen Nakashima of The Washington Post reported that the Democratic National Committee (DNC) had been infiltrated by two teams of state-sponsored Russian hackers (Greenberg, 2019). According to Nakashima, one of the groups, named Cozy Bear*, gained access to the DNC’s email and chat communications and had been monitoring those channels for over a year. The other group, Fancy Bear, gained access to DNC servers in April 2016 and exfiltrated opposition research documents.
*Some cybersecurity professionals use a naming system that assigns an animal to each hacking group based on its country of origin. “Bear” indicates that the hackers are Russian.
Fancy Bear was ultimately linked to the GRU, Russia’s military intelligence agency. Cozy Bear was determined to be part of Russia’s Foreign Intelligence Service (SVR), a successor to the KGB. The goal of these two groups was to influence the outcome of the 2016 United States presidential election.
State-sponsored cyber attacks are nearly impossible to defend against, even among other state powers. Major intelligence agencies such as the National Security Agency (NSA) in the United States, the Government Communications Headquarters (GCHQ) in the UK, and Russia’s SVR spend millions of dollars each year on cybersecurity. Agencies such as these have funding to discover and purchase zero day vulnerabilities or exploits* to perform cyber espionage or cyberattacks.
*A zero day vulnerability is a software flaw that the manufacturer does not yet know exists or has not had a chance to patch; a zero day exploit is an attack that takes advantage of it. Zero days are extremely valuable to hackers, especially when they are in a common software application or operating system.
It would seem safe to assume that the infiltration of the DNC was the result of a complex and unstoppable state-sponsored attack, and to be fair, the code that was used to stay in the network while avoiding detection and exfiltrating documents did involve advanced hacking techniques (Buratowski, 2016). The groups gained access to the network, however, through a series of phishing campaigns from which they acquired a variety of officials’ passwords.
According to the Federal Bureau of Investigation, the three most frequently reported complaints in 2019 were phishing or phishing-adjacent. Phishing is email fraud, but similar attacks exist for other technologies. “Smishing” is phishing via text message. “Pharming” is a technique that sets up a legitimate-looking replica of a trusted website and tricks users into entering their credentials there.
On their website, the FBI describes the five most common methods that cybercriminals use to manipulate people:
- phishing campaigns to acquire credentials,
- phishing campaigns to implant malicious code on target devices,
- spear phishing,
- romance or confidence fraud,
- business email compromise.
Phishing
There are two varieties and six components of a standard phishing attack (Rastenis et al., 2020). Scammers will use either social engineering or a technical code execution in order to “hook” their target. Social engineering varieties evoke an emotional response from the victim, causing them to reply. The classic “Nigerian Prince” scam is an example of a phishing attack conducted via social engineering. Technical execution means that the scammer has included a malicious attachment that will compromise a user’s computer or network.
Figure 1. Taxonomy of a phishing attack
Once a scammer has determined the variety of phishing scam they want to conduct*, they enter the first stage of conducting the attack: selecting email addresses. Typical phishing attacks target massive lists of email addresses acquired via data leaks, obtained through database compromise, or purchased on the Dark Web. Larger volumes of email addresses increase the odds of a successful attack.
*A scammer might also decide which variety of phishing attack they want to conduct based on a collection of email addresses they have acquired. For instance, if they acquired personal email addresses they may choose to perform a social engineering scam, while a corporate attack might include a malicious attachment posing as a work document.
Content creation is an important component of a successful phishing campaign. Victims must be convinced that they need to interact with the message, whether by replying to a social engineering email or opening an attachment containing malicious code. The more urgent or realistic the attacker can make their email content, the more likely they are to get a hit.
After the content is created to maximize the potential for a hit, the spam is sent and the attacker simply needs to wait. Victims will reply to help a Nigerian prince in exchange for a reward, or log into a bogus Amazon storefront to change a password that has been “compromised”, or open an attachment with the latest details from the HR department containing the company’s policy on COVID-19. The urgency of the message causes the user to act without inspecting the email closely, perhaps failing to notice that an “n” was replaced by a “π” in the sender’s email address.
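The lookalike-character trick mentioned above can be caught mechanically. What follows is a minimal sketch, assuming a simple heuristic: flag any sender address that contains a character outside plain ASCII. The function names and the example addresses are hypothetical, and legitimate internationalized addresses would also be flagged, so this is a prompt for closer inspection rather than proof of fraud.

```python
import unicodedata


def suspicious_characters(address: str) -> list[tuple[str, str]]:
    """Return (character, Unicode name) pairs for every non-ASCII character."""
    return [
        (ch, unicodedata.name(ch, "UNKNOWN"))
        for ch in address
        if ord(ch) > 127
    ]


def looks_spoofed(address: str) -> bool:
    """Heuristic only: treat any non-ASCII character as worth a second look."""
    return bool(suspicious_characters(address))


# Hypothetical sender addresses; the second swaps "n" for the Greek letter pi.
for sender in ("support@amazon.com", "support@amazoπ.com"):
    print(sender, looks_spoofed(sender), suspicious_characters(sender))
```

Printing the Unicode name of each flagged character makes the substitution obvious even when the glyph itself is easy to miss at a glance.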
Once the user has become a victim, the attacker has achieved their goal. Credentials have been harvested; a keylogger has been installed on the user’s PC; the user or network has been compromised.
In his article Scammers Target Legacy Tech, Dan Lohrmann emphasizes that we are getting much better at technical security protections but that the human factor remains the greatest weakness (Lohrmann, 2017):
“What I find most troubling is that these sophisticated online attacks are rarely high tech. Rather, social engineering of human weaknesses is evolving. After fraudsters conduct surveillance and learn the details of your office business process, they use a mix of compromised email accounts, impersonation, legitimate communication channels and real business contacts to trick staff into high-dollar fraud.”
Attackers long ago realized that the weakest link in cybersecurity was the people using the technologies. Humans are gullible and emotional; if we get an email that says an account is compromised we will often act first and then think later. Research into whether the skill of the spear phisher or the vulnerability of the user contributes more to the success of an attack has been inconclusive (Nicho et al., 2018).
It is safe to say that the combination of the two makes for an exceptionally effective attack vector. Humans are social creatures who want to trust others, and that leaves us vulnerable to social engineering tactics. We have trouble remembering one password that is so complex that a computer cannot crack it, let alone many that are complex and unique to each account that is created.
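To put rough numbers on what “so complex that a computer cannot crack it” means, here is a back-of-the-envelope sketch. It assumes the password is chosen uniformly at random from a known alphabet and that the attacker can test ten billion guesses per second; both are assumptions made for illustration, and passwords chosen by people are usually far weaker than this model suggests.

```python
import math
import string


def search_space_bits(length: int, alphabet_size: int) -> float:
    """Bits of entropy for a password drawn uniformly at random."""
    return length * math.log2(alphabet_size)


def years_to_exhaust(bits: float, guesses_per_second: float) -> float:
    """Worst-case years needed to try every candidate at the given rate."""
    seconds = 2 ** bits / guesses_per_second
    return seconds / (365 * 24 * 3600)


LOWERCASE = len(string.ascii_lowercase)  # 26 symbols
PRINTABLE = len(string.ascii_letters + string.digits + string.punctuation)  # 94 symbols
GUESS_RATE = 1e10  # assumed attacker speed, not a measured figure

for label, length, size in (("8 lowercase letters", 8, LOWERCASE),
                            ("16 mixed characters", 16, PRINTABLE)):
    bits = search_space_bits(length, size)
    print(f"{label}: {bits:.1f} bits, "
          f"{years_to_exhaust(bits, GUESS_RATE):.3g} years to exhaust")
```

Under these assumptions the short lowercase password falls in well under a minute, while the long mixed one is out of reach of brute force, which is exactly the kind of password people cannot memorize for every account.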
Humans also develop habits; on autopilot, we may open an attachment without thinking about it or let our guard down when distracted. You and I are the easiest means by which hackers can compromise our data, and regardless of whether an attacker is a government or just a kid in a basement, that adversary is after information to obtain money or power.
Phishing campaigns can be a very direct path to acquiring user credentials. Often a user is simply tricked into entering their credentials on a site that the attacker owns, and that data is captured.
A phishing attack that implants malicious code on a target is slightly less direct, but serves the same goal. When discussing general online safety tips, one that often comes up is “do not click on attachments from unknown/untrusted sources”. Attachments can contain malware that embeds itself on a target’s computer and will wreak any sort of havoc the developer coded into it.
A very popular tool used in malware is a keylogger. Keyloggers are programs that record every keystroke that a user makes on a device. Often the keylogger sends those details to a remote server so whoever installed it can see what the user is typing, such as passwords or credit card details.
Most keyloggers are not detected by antivirus or anti-malware scanners and it can be difficult to identify when one exists on a device (Bhardwaj & Goundar, 2020). One tactic that security teams use to protect against keylogging is a random keyboard generator. This tool generates random keystroke data while a user is typing in an effort to obfuscate the true data that is being typed. However, Lee and Yim demonstrated that this protection can be circumvented using machine learning with greater than 96% accuracy (Lee & Yim, 2020).
The game of cat and mouse between defense and offense constantly escalates.
Spear Phishing
Spear phishing is a phishing attack that is tailored to a specific individual. These attacks can be used to directly gain access to a high-profile target’s accounts, but may also be used as a stepping stone to conduct espionage and additional attacks. If a CEO’s account is compromised, for example, then the attackers have the keys to the kingdom, since high-ranking officials usually have extensive access within a corporate network. Seagate Technology experienced this in 2016 when their HR department sent personal information of more than 10,000 employees to hackers, thinking that the request was made by the CEO (Taylor, 2017).
The GRU compromised the email account of Hillary Clinton’s campaign chairman with a spear phishing attack: a password reset email that looked like it originated from Google (Sanger, 2018). The link to reset the password redirected to a fake site, and when the credentials were entered, they were collected by the attackers. Once the attackers had the campaign chairman’s inbox under their control, the intelligence unit immediately gained access to 60,000 private emails.
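A password reset lure like this one depends on a link whose real destination is not the site it claims to be. The sketch below shows one way a mail filter or a cautious user might test that, assuming a small allowlist of trusted domains. The allowlist, the function name, and every URL are hypothetical examples, not details of the actual attack.

```python
from urllib.parse import urlsplit

# Illustrative allowlist; a real deployment would maintain its own.
TRUSTED_DOMAINS = frozenset({"google.com"})


def is_trusted_link(url: str, trusted=TRUSTED_DOMAINS) -> bool:
    """True only for HTTPS links whose host is a trusted domain or a subdomain of one."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if parts.scheme != "https" or not host:
        return False
    return any(host == domain or host.endswith("." + domain) for domain in trusted)


# Hypothetical links: only the first actually points at the trusted domain.
for link in (
    "https://accounts.google.com/signin",
    "https://accounts.google.com.example.net/reset",
    "https://google.com@login.example.net/reset",
    "http://google.com/reset",
):
    print(link, is_trusted_link(link))
```

Comparing the parsed hostname, rather than searching the URL for a familiar name, is the design choice that matters here: the second and third links both contain “google.com” yet resolve to a different host.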
Thus, one of the catalysts of the most impactful political event of the last decade was a simple act of credentials harvesting. Verizon’s 2017 Data Breach Investigations Report found that 81% of hacking-related breaches used stolen, default, or weak credentials (George, 2018).
Cybercriminals generally use credentials to acquire two things: money or information. Information may be used to gain more money, but often it is used to gain power. Attackers have several means by which they gain money or information using stolen credentials:
- Take over accounts that have associated funding, such as bank accounts.
- Sell the credentials on the Darknet*.
- Use the credentials in subsequent attacks against an individual, especially if the individual recycles passwords.
- Use the credentials in subsequent attacks against a system or a network.
*Credentials for a single user are generally not very valuable, selling for only a few cents. Hackers are more likely to sell large lists of credentials on the Darknet, such as a dump from a compromised database.
Confidence Fraud
Romance or confidence fraud is fundamentally the same as a phishing attack in its components. An attacker pretends to be someone they are not, feigns importance, and tricks the victim into giving them money or information. Whereas in a phishing attack an attacker pretends to be a company that the victim would otherwise trust, in confidence fraud they might build a friendship and earn the victim’s trust as a companion.
The relationship is important to the victim, who is often targeted because they are in a particularly susceptible position, such as the elderly or people who have recently lost a loved one. Romance scammers especially will use social media sites to study their victims and tailor their behavior to fill a hole in the victim’s life. They socially engineer their way into the victim’s life and then ask for money.
That request for money can be a very large sum. In one case shared by the FBI with the victim’s permission, the victim wired her new close online friend $30,000 after he promised to pay her back within 48 hours. He did not, and instead continued to ask for more. In total she was scammed out of $2,000,000 by a man with whom she was in love, but who did not actually exist.
Business Email Compromise
Finally, business email compromise (BEC) attacks use social engineering to convince employees that attackers are a trusted partner. Attacks usually target employees that have access to business finances and trick the employee into wiring money to a fraudulent bank account. The employee believes they are sending the money to an account owned by that trusted partner, but the wire transfer actually terminates in an account owned by the attacker.
Over the course of two years in 2015 and 2016, a Lithuanian man carried out BEC schemes against Facebook and Google. Evaldas Rimasauskas sent the tech giants fake invoices from Quanta Computer, one of the top suppliers of parts for US tech firms, and received over $100 million in payments before being caught (“Social engineering scams”, 2017).
According to the FBI’s Internet Crime Complaint Center, there has been a 1300% increase in losses incurred by BEC attacks since 2015 and total losses have surpassed $3 billion (Federal Bureau of Investigation, 2020).
To maximize the effectiveness of a business email compromise, attackers will often target the CEO of a company with a spear phishing email to acquire their credentials. With that level of access to the network, malicious actors will monitor the company and wait for an opportune time to strike. The FBI used the example of a CEO being away from the office as a prime opportunity. Attackers then impersonate the CEO and send a request to the finance department to make an immediate wire transfer of thousands of dollars to a trusted partner.
Conclusion
The top five activities performed by cybercriminals are all in pursuit of the same thing: information to gain money or power. Their most common means is credentials harvesting: simply obtaining a username and password. If the victim has not been conscientious about password management, the credentials for one account can often be pivoted into access to another account. Human error, including that triggered via manipulation, is responsible for as much as 95% of security incidents (Diaz et al., 2020).
Cybercriminals have developed methods of varying sophistication, from simply sending an email with a password reset request to state-sponsored attacks that exploit zero day vulnerabilities in major applications and operating systems. Millions of dollars are spent by governments each year researching ways to exploit software for offensive and defensive purposes, and countless hours are spent by hackers developing new ways to steal from us. It is next to impossible to guard against a state-sponsored attack, and luckily the average citizen is unlikely to be a target of such an expensive and sophisticated operation. We are, however, constantly under attack from many malicious actors that want access to our data.
Yet we often cannot be bothered to use a password manager to create a complex and unique password for each online account we make, or perhaps in some instances we post too much information on public-facing websites. These are dangerous practices, and we put ourselves at risk by not taking basic precautions.
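Generating a complex and unique password for each account is the part a password manager automates. As a minimal sketch of that idea, the code below uses Python’s standard `secrets` module; the function names, the 20-character default, and the account names are illustrative assumptions, and a real password manager also handles encrypted storage, which is not shown.

```python
import secrets
import string

ALPHABET = string.ascii_letters + string.digits + string.punctuation


def generate_password(length: int = 20) -> str:
    """Build a password from a cryptographically secure random source."""
    if length < 12:
        raise ValueError("use at least 12 characters")
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def passwords_for(accounts: list[str], length: int = 20) -> dict[str, str]:
    """Give every account its own independently generated password."""
    return {account: generate_password(length) for account in accounts}


# Hypothetical account names; each receives a different password.
vault = passwords_for(["email", "bank", "shopping"])
for account, password in vault.items():
    print(account, password)
```

Because each password is generated independently, credentials stolen from one account give an attacker nothing to reuse against the others.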
References
Bhardwaj, A., & Goundar, S. (2020). Keyloggers: silent cyber security weapons. Network Security, 2020(2), 14–19. https://doi-org.proxy.uwec.edu/10.1016/S1353-4858(20)30021-0
Buratowski, M. (2016). The DNC server breach: who did it and what does it mean? Network Security, 2016(10), 5–7. https://doi-org.proxy.uwec.edu/10.1016/S1353-4858(16)30095-2
Diaz, A., Sherman, A. T., & Joshi, A. (2020). Phishing in an academic community: A study of user susceptibility and behavior. Cryptologia, 44(1), 53–67. https://doi-org.proxy.uwec.edu/10.1080/01611194.2019.1623343
Federal Bureau of Investigation. (2020). https://www.fbi.gov/news/stories/2019-internet-crime-report-released-021120
George, T. (2018). Security Week. https://www.securityweek.com/foundation-cyber-attacks-credential-harvesting
Greenberg, A. (2019). Sandworm. Doubleday.
Lee, K., & Yim, K. (2020). Cybersecurity Threats Based on Machine Learning-Based Offensive Technique for Password Authentication. Applied Sciences (2076-3417), 10(4), 1286. https://doi-org.proxy.uwec.edu/10.3390/app10041286
Lohrmann, D. (2017). Scammers Target Legacy Tech: Three ways to stop business email compromise. Government Technology, 30(5), 48.
Lord, N. (2017). A Timeline of the Ashley Madison Hack. https://digitalguardian.com/blog/timeline-ashley-madison-hack
Nicho, M., Fakhry, H., & Egbue, U. (2018). Evaluating User Vulnerabilities Vs Phisher Skills in Spear Phishing. IADIS International Journal on Computer Science & Information Systems, 13(2), 93–108.
Rastenis, J., Ramanauskaite, S., Janulevicius, J., Cenys, A., Slotkiene, A., & Pakrijauskas, K. (2020). E-mail-Based Phishing Attack Taxonomy. Applied Sciences (2076-3417), 10(7), 2363. https://doi-org.proxy.uwec.edu/10.3390/app10072363
Sanger, D. (2018). The Perfect Weapon. Broadway Books.
Social engineering scams ensnare Google, Facebook and their users. (2017). Network Security, 2017(5), 1–2. https://doi-org.proxy.uwec.edu/10.1016/S1353-4858(17)30043-0
Taylor, C. (2017). A Cautionary Phishing Tale. ISSA Journal, 15(10), 8.