I have a document called file.txt that holds the following information:
[{"84.15.160.174:4145":"178.208.17.195:9999"},
{"84.15.160.174:4145":"58.253.154.117:9999"},
{"84.15.160.174:4145":"112.87.71.194:9999"},
{"84.15.160.174:4145":"185.103.88.103:38692"}]
My goal is to extract the IP addresses and their ports from file.txt and write them to a text file named ip.txt. Each IP address and port is enclosed in quotation marks, and I would like to extract them one by one, line by line. My example contains 8 IP addresses and ports, but the actual number may range from 50 to 100.
Regardless of the number, I want every quoted IP address and port written to ip.txt, one per line.
3 Answers
Introduction
Batch filtering text or numbers between symbols is a common task in data processing. It involves extracting specific information from a text file that is enclosed within specific symbols, such as quotation marks or brackets. In this post, we will explore how to batch filter IP addresses and ports enclosed within quotation marks from a text file using various tools and techniques.
Method 1: Using grep
One of the most popular tools for searching and filtering text is grep. We can use grep to filter IP addresses and ports enclosed within quotation marks from a text file. Here is the command to do that:
grep -Eo '([0-9]{1,3}\.){3}[0-9]{1,3}:[0-9]{1,5}' file.txt > ip.txt
This command uses the -E option to enable extended regular expressions and the -o option to print only the matching text, one match per line. The pattern matches an IP address and port (e.g., 84.15.160.174:4145); the quotation marks are left out of the pattern so that they do not end up in ip.txt. The output is redirected to the ip.txt file.
Explanation
The -E option enables extended regular expressions, which let us write repetition such as {1,3} without extra backslashes. The -o option tells grep to print only the matching text instead of the whole line, so each IP address and port is written on its own line even when a line contains two of them.
The regular expression matches the IP address and port format, which consists of four sets of numbers separated by periods (e.g., 84.15.160.174) followed by a colon and a number (e.g., :4145). The period must be escaped as \. because an unescaped . matches any character. The {1,3} and {1,5} specify that each set of numbers can have one to three digits and each port number can have one to five digits.
The output is redirected to the ip.txt file using the > operator. This creates a new file or overwrites the existing one with the same name.
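A quick way to see why the escaped period matters is to try both versions of the pattern in Python's re module, whose syntax for these particular constructs is the same as grep's extended regular expressions. The string 1a2b3c4:80 is a made-up non-address used only for illustration.

```python
import re

# IP:port pattern with the period escaped, so it only matches a literal "."
ESCAPED = r'([0-9]{1,3}\.){3}[0-9]{1,3}:[0-9]{1,5}'
# The same pattern with a bare ".", which matches any single character
UNESCAPED = r'([0-9]{1,3}.){3}[0-9]{1,3}:[0-9]{1,5}'

sample = '1a2b3c4:80'  # hypothetical string that is not an IP address

print(bool(re.fullmatch(ESCAPED, sample)))    # False
print(bool(re.fullmatch(UNESCAPED, sample)))  # True: the letters satisfy "."
```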
Limitations
While grep is an excellent tool for filtering text, this pattern only checks the shape of each address, not its values: it also accepts 999.999.999.999:99999, which is not a valid IPv4 address and port. It does not check that a match is actually enclosed in quotation marks, and it will miss entries written in any other format, such as IPv6 addresses or host names.
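If out-of-range values such as 999.1.1.1 are a concern, one option is to post-check each candidate. The sketch below does this in Python with the standard ipaddress module. The helper name valid_endpoints, the sample text, and the choice to accept only ports 1 to 65535 are assumptions of this sketch, not part of the grep method.

```python
import ipaddress
import re

# Same shape as the grep pattern: it checks digit counts, not value ranges
CANDIDATE = re.compile(r'((?:[0-9]{1,3}\.){3}[0-9]{1,3}):([0-9]{1,5})')

def valid_endpoints(text):
    """Yield 'ip:port' matches whose octets are 0-255 and whose port is 1-65535."""
    for m in CANDIDATE.finditer(text):
        ip, port = m.group(1), int(m.group(2))
        try:
            ipaddress.IPv4Address(ip)  # raises ValueError for e.g. 999.1.1.1
        except ValueError:
            continue
        if 1 <= port <= 65535:  # assumption: port 0 is not wanted
            yield m.group(0)    # keep the text exactly as it appeared

# Hypothetical input mixing one valid entry with two invalid ones
text = '"84.15.160.174:4145" "999.1.1.1:80" "10.0.0.1:70000"'
print(list(valid_endpoints(text)))  # ['84.15.160.174:4145']
```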
Method 2: Using awk
Another powerful tool for text processing is awk. We can use awk to filter IP addresses and ports enclosed within quotation marks from a text file. Here is the command to do that:
awk -F'"' '{for(i=2;i<=NF;i+=2) print $i}' file.txt > ip.txt
This command uses the -F option to set the field separator to a quotation mark, and the for loop prints every even-numbered field, which is the text between each pair of quotation marks. The output is redirected to the ip.txt file.
Explanation
The -F option sets the field separator to a quotation mark, so a line such as [{"84.15.160.174:4145":"178.208.17.195:9999"}, is split into the fields [{, 84.15.160.174:4145, :, 178.208.17.195:9999, and },. The for loop starts at the second field (i=2) and increments by two (i+=2), so it prints only the fields that sit between quotation marks.
The output is redirected to the ip.txt file using the > operator. This creates a new file or overwrites the existing one with the same name.
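For comparison, here is the same split-on-quotes idea in Python: str.split('"') produces the same fields as awk -F'"', and slicing from index 1 with a step of 2 corresponds to awk's $2, $4, and so on. The function name quoted_fields is only for illustration.

```python
def quoted_fields(line):
    """Mimic awk -F'"' '{for(i=2;i<=NF;i+=2) print $i}' for a single line."""
    fields = line.split('"')
    # awk fields are 1-based, so $2, $4, ... are Python indices 1, 3, ...
    return fields[1::2]

line = '[{"84.15.160.174:4145":"178.208.17.195:9999"},'
print(quoted_fields(line))  # ['84.15.160.174:4145', '178.208.17.195:9999']
```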
Limitations
Like grep, awk has some limitations, but they are different ones. awk does not check the format of what it prints: any quoted text, whether or not it is an IP address and port, ends up in the output. It also relies on the quotation marks being balanced on each line.
Method 3: Using sed
Sed is another tool that we can use to filter IP addresses and ports enclosed within quotation marks from a text file. Here is the command to do that:
sed -nE 's/^[^"]*"([^"]*)"[^"]*"([^"]*)".*$/\1\n\2/p' file.txt > ip.txt
This command uses the -E option to enable extended regular expressions, and -n together with the p flag so that only lines where the substitution succeeded are printed. The s command replaces each line with its two quoted values, separated by a newline. The output is redirected to the ip.txt file.
Explanation
In the pattern, ^[^"]* skips everything before the first quotation mark, "([^"]*)" captures the first quoted value (e.g., 84.15.160.174:4145), [^"]* skips the colon between the two values, and the second "([^"]*)" captures the second value (e.g., 178.208.17.195:9999). The replacement \1\n\2 writes the two captured values on separate lines.
The output is redirected to the ip.txt file using the > operator. This creates a new file or overwrites the existing one with the same name.
Limitations
This command assumes exactly two quoted values per line, as in the sample file, and does not check that they are IP addresses and ports. Treating \n in the replacement as a newline is a GNU sed feature; BSD and macOS sed need a backslash followed by a literal newline instead.
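Why capture both values explicitly instead of using a single group such as .*"([^"]*)".*? The leading .* is greedy, so only the last quoted value on each line can survive. This can be checked with Python's re module, which behaves the same way for these constructs; the line is taken from file.txt.

```python
import re

line = '[{"84.15.160.174:4145":"178.208.17.195:9999"},'

# One group after a greedy ".*": the .* swallows as much as it can,
# so only the LAST quoted value on the line is captured
greedy = re.sub(r'.*"([^"]*)".*', r'\1', line)

# Two explicit groups: both quoted values are kept, one per line
both = re.sub(r'^[^"]*"([^"]*)"[^"]*"([^"]*)".*$', r'\1\n\2', line)

print(greedy)  # 178.208.17.195:9999
print(both)
```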
Method 4: Using Python
Python is a powerful programming language that we can use to filter IP addresses and ports enclosed within quotation marks from a text file. Here is the Python code to do that:
import re

with open('file.txt', 'r') as f:
    data = f.read()

# A quoted IPv4 address followed by a colon and a port; \. matches a literal period
pattern = r'"(([0-9]{1,3}\.){3}[0-9]{1,3}:[0-9]{1,5})"'
matches = re.findall(pattern, data)

with open('ip.txt', 'w') as f:
    for match in matches:
        # findall returns (outer group, inner group) tuples; match[0] is the full IP:port
        f.write(match[0] + '\n')
This code reads the contents of file.txt into a string, uses a regular expression to find every IP address and port enclosed in quotation marks, and writes each match to ip.txt on its own line.
Explanation
The code reads file.txt into the string variable data using with open('file.txt', 'r') as f: data = f.read(). The regular expression pattern matches the IP address and port format, which consists of four sets of numbers separated by periods (e.g., 84.15.160.174) followed by a colon and a number (e.g., :4145).
The matches are written to ip.txt in a loop, one per line, using f.write(match[0] + '\n'). Because the pattern contains two capturing groups (the whole IP:port and the repeated octet group), re.findall() returns a tuple for each match, and match[0] holds the full IP address and port.
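The tuple behavior of re.findall() is easy to see by comparing the pattern with a variant whose inner group is non-capturing, (?:...). With the non-capturing version, findall returns plain strings and the match[0] indexing is no longer needed. The one-line sample is taken from file.txt.

```python
import re

data = '{"84.15.160.174:4145":"178.208.17.195:9999"}'

# Two capturing groups (outer, plus the repeated octet group) -> list of tuples;
# a repeated group keeps only its last repetition, e.g. '160.'
nested = re.findall(r'"(([0-9]{1,3}\.){3}[0-9]{1,3}:[0-9]{1,5})"', data)

# Inner group made non-capturing -> list of plain strings
flat = re.findall(r'"((?:[0-9]{1,3}\.){3}[0-9]{1,3}:[0-9]{1,5})"', data)

print(nested)  # [('84.15.160.174:4145', '160.'), ('178.208.17.195:9999', '17.')]
print(flat)    # ['84.15.160.174:4145', '178.208.17.195:9999']
```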
Limitations
While Python is a powerful tool for text processing, it requires more coding than the previous methods. Additionally, if the IP address and port format is different from what we specified in the regular expression pattern, the code will not match those lines.
Method 5: Using PowerShell
PowerShell is a command-line shell and scripting language that we can use to filter IP addresses and ports enclosed within quotation marks from a text file. Here is the PowerShell command to do that:
Get-Content file.txt | Select-String -Pattern '([0-9]{1,3}\.){3}[0-9]{1,3}:[0-9]{1,5}' -AllMatches | ForEach-Object { $_.Matches.Value } | Out-File ip.txt
This command uses the Get-Content cmdlet to read the contents of file.txt, the Select-String cmdlet with -AllMatches to find every IP address and port on each line, the ForEach-Object cmdlet to extract the matching text, and the Out-File cmdlet to write the matches to ip.txt.
Explanation
The Get-Content cmdlet reads file.txt line by line, and the Select-String cmdlet searches each line with the regular expression pattern. Without -AllMatches, Select-String records only the first match on each line, so the second address on every line would be lost. The ForEach-Object cmdlet extracts the matching text using the $_.Matches.Value syntax. The quotation marks are not part of the pattern, so they are not written to the output.
The matches are written to the ip.txt file using the Out-File cmdlet. This creates a new file or overwrites the existing one with the same name.
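Select-String reports only the first match on each line unless -AllMatches is given, which matters here because every line of file.txt holds two addresses. Python's re.search versus re.findall shows the same distinction; this is only an analogy, not how PowerShell works internally.

```python
import re

PATTERN = r'(?:[0-9]{1,3}\.){3}[0-9]{1,3}:[0-9]{1,5}'
line = '{"84.15.160.174:4145":"58.253.154.117:9999"},'

first_only = re.search(PATTERN, line).group(0)  # like Select-String without -AllMatches
every_match = re.findall(PATTERN, line)          # like Select-String -AllMatches

print(first_only)   # 84.15.160.174:4145
print(every_match)  # ['84.15.160.174:4145', '58.253.154.117:9999']
```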
Limitations
PowerShell is built into Windows, while on Linux and macOS PowerShell 7 has to be installed separately. In Windows PowerShell 5.1, Out-File writes UTF-16 by default, so add -Encoding ascii if other tools need to read ip.txt. Additionally, if the IP address and port format is different from what we specified in the regular expression pattern, the command will not match those entries.
Conclusion
Batch filtering text or numbers between symbols is a common task in data processing. In this post, we explored five different tools and techniques for filtering IP addresses and ports enclosed within quotation marks from a text file. While each method has its advantages and limitations, they all provide a way to extract specific information from a text file efficiently.
To batch filter the IP addresses and port numbers between the " " symbols in a text file, you can use the following Python code:
import re

# Open the input file
with open('file.txt', 'r') as f:
    # Read the contents of the file into a string
    contents = f.read()

# Use a regular expression to extract the IP addresses and port numbers
ips_and_ports = re.findall(r'"([^"]*)"', contents)

# Open the output file
with open('ip.txt', 'w') as f:
    # Write each IP address and port number to a separate line in the output file
    for ip_and_port in ips_and_ports:
        f.write(ip_and_port + '\n')
This code will open the input file file.txt, read its contents into a string, use a regular expression to extract the IP addresses and port numbers between the " " symbols, and then write each IP address and port number to a separate line in the output file ip.txt.
The regular expression r'"([^"]*)"' matches a double quote ", followed by zero or more characters that are not a double quote [^"]*, followed by another double quote ". The parentheses ( ) around [^"]* capture the matched text as a group, so re.findall() returns a list containing the captured text of each match.
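To see what "([^"]*)" captures, here is a short check on one line from file.txt followed by a hypothetical {"":"x"} object (not in the question's data) that shows the effect of * allowing zero characters.

```python
import re

# One real line from file.txt plus a hypothetical object with an empty key
sample = '[{"84.15.160.174:4145":"178.208.17.195:9999"}, {"":"x"}]'

# "([^"]*)" returns the text between every pair of double quotes
captured = re.findall(r'"([^"]*)"', sample)
print(captured)  # ['84.15.160.174:4145', '178.208.17.195:9999', '', 'x']

# Because * allows zero characters, an empty pair "" yields ''; drop those
non_empty = [c for c in captured if c]

# Switching to + instead misaligns the quote pairs on the same input
plus = re.findall(r'"([^"]+)"', '{"":"x"}')
print(plus)  # [':']
```

Filtering out empty strings afterwards is safer than switching to [^"]+, because the + variant pairs up the wrong quotation marks and captures the colon.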
I hope this helps! Let me know if you have any questions.
[test.bat]
@echo off
cls
FOR /F tokens^=2^,4^ delims^=^" %%a in (file.txt) do (
echo %%a
echo %%b
)
[file.txt]
[{"84.15.160.174:4145":"178.208.17.195:9999"},
{"84.15.160.174:4145":"58.253.154.117:9999"},
{"84.15.160.174:4145":"112.87.71.194:9999"},
{"84.15.160.174:4145":"185.103.88.103:38692"}]
Output:
84.15.160.174:4145
178.208.17.195:9999
84.15.160.174:4145
58.253.154.117:9999
84.15.160.174:4145
112.87.71.194:9999
84.15.160.174:4145
185.103.88.103:38692
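The tokens/delims options in test.bat can be hard to read. The sketch below is a rough Python model of what FOR /F tokens=2,4 delims=" does with each line: split on the quotation mark, skip empty tokens (FOR /F treats consecutive delimiters as one), and keep tokens 2 and 4. The function name for_f_tokens is hypothetical; it models a single delimiter character only and ignores other FOR /F behavior such as the default eol=; handling, so treat it as an approximation.

```python
def for_f_tokens(line, delim='"', tokens=(2, 4)):
    """Approximate FOR /F tokens=2,4 delims=" for one line (single delimiter only)."""
    # Empty strings come from leading or consecutive delimiters; FOR /F skips them
    parts = [p for p in line.split(delim) if p]
    # FOR /F token numbers are 1-based; missing tokens are simply not returned
    return [parts[t - 1] for t in tokens if t <= len(parts)]

# The four lines of file.txt from the question
lines = [
    '[{"84.15.160.174:4145":"178.208.17.195:9999"},',
    '{"84.15.160.174:4145":"58.253.154.117:9999"},',
    '{"84.15.160.174:4145":"112.87.71.194:9999"},',
    '{"84.15.160.174:4145":"185.103.88.103:38692"}]',
]

for line in lines:
    for token in for_f_tokens(line):
        print(token)
```

Running it over the four sample lines prints the same eight values as the batch output shown above.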