4 votes

Remove all characters that do not match a pattern from a string in Java

I need to clean up a String in Java that I get when parsing. The string contains all kinds of characters, usually letters and numbers, but also symbols or letters from other languages. I want to remove everything that is not a digit, a period or a comma, except for the word "Free". For example:

String cadena1 = "AU$26.95 с уч. GST"; // should return 26.95
String cadena2 = "Free с уч. GST"; // should return Free

So far I was fixing it with .replaceAll() in this way:

cadena1.replaceAll("с уч. GST|AU$","");

But the code is getting longer and longer, and there are still cases where I can't get it replaced.

0 votes

Hi Jetlagfox, were you able to test our answers? Best regards

0 votes

Good morning @lois6b, your answer was the closest to what I was looking for, and it is also very well explained, so I am accepting it. Thanks!

7 votes

lois6b Points 6889

With the regex \\d+(?:[.,]\\d+)?|Free you can search the string for the numbers or for the word Free.

Explanation of the regex:

\\d+(?:[.,]\\d+)?|Free
  • \\d+ - one or more digits
  • (?:[.,]\\d+)?
    • (?: ...) - non-capturing group
    • [.,] - a period or a comma
    • \\d+ - one or more digits
    • (...)? - at the end, indicates that it is an optional group. It may or may not appear.
  • ... | ... - OR operator
  • Free - exact match with "Free"

This way you extract only what the regex finds and can store it in a variable, leaving out everything you are not interested in.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {

    public static void main(String[] args) {
        String str1 = "AU$26.95 2.25 Freeс уч. GST";
        String str2 = "AU$26,95  уч. GST";
        String str3 = "AU$Freeс уч. GST";

        String resultado1 = comprobar(str1);
        String resultado2 = comprobar(str2);
        String resultado3 = comprobar(str3);
    }

    // Concatenates every number (with an optional decimal part) and every "Free" found in elem
    public static String comprobar(String elem) {

        StringBuilder all = new StringBuilder();
        System.out.println("Cadena: " + elem);

        Pattern pat = Pattern.compile("\\d+(?:[.,]\\d+)?|Free");
        Matcher m = pat.matcher(elem);

        while (m.find()) {
            // group(0) is the whole text matched in this iteration
            System.out.println(" -  Coincidencia: " + m.group(0));
            all.append(m.group(0));
        }

        System.out.println("Resultado: " + all);
        System.out.println("");
        return all.toString();
    }
}

Output:

Cadena: AU$26.95 2.25 Freeс уч. GST
 -  Coincidencia: 26.95
 -  Coincidencia: 2.25
 -  Coincidencia: Free
Resultado: 26.952.25Free

Cadena: AU$26,95  уч. GST
 -  Coincidencia: 26,95
Resultado: 26,95

Cadena: AU$Freeс уч. GST
 -  Coincidencia: Free
Resultado: Free
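
If only the first value is wanted (the question asks for 26.95 or Free, not a concatenation of every match), a minimal sketch could stop at the first match. The class name FirstMatch and the method primero are made up for this illustration; the regex is the same one explained above.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
class FirstMatch {

    // Same regex as above, compiled once and reused.
    private static final Pattern PRICE_OR_FREE =
            Pattern.compile("\\d+(?:[.,]\\d+)?|Free");

    // Returns the first number or "Free" found in elem, or empty if there is none.
    static Optional<String> primero(String elem) {
        Matcher m = PRICE_OR_FREE.matcher(elem);
        return m.find() ? Optional.of(m.group()) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(primero("AU$26.95 2.25 Freeс уч. GST").orElse("")); // 26.95
        System.out.println(primero("Free с уч. GST").orElse(""));            // Free
        System.out.println(primero("с уч. GST").isPresent());                // false
    }
}
```

Returning an Optional makes the "nothing found" case explicit instead of returning an empty string.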

The same example in JavaScript:

const regex = /\d+(?:[.,]\d+)?|Free/g;
var str1 = "AU\$26.95 2.25 Freeс уч. GST"
var str2 = "AU\$26,95  уч. GST";
var str3 = "AU\$Freeс уч. GST";
let m;

var resultado1 = comprobar(str1);
var resultado2 = comprobar(str2);
var resultado3 = comprobar(str3);

function comprobar(elem) {
  var all = "";
  console.log("Cadena: " +elem);
  while ((m = regex.exec(elem)) !== null) {
    // This is necessary to avoid infinite loops with zero-width matches
    if (m.index === regex.lastIndex) {
      regex.lastIndex++;
    }

    m.forEach(function(match) {
      console.log(` -  Coincidencia: ${match}`);
      all += match;
    });

  }
  console.log("Resultado: " + all);
  console.log("");
  return all;
}

3 votes

Mariano Points 21056

The best way is what lois6b answered (it should be the accepted answer). Just for fun, here is something much less efficient, in one line of code.

The idea is to consume characters one at a time for as long as the pattern does not start at the current position, and then optionally capture the text that matches the pattern so it can be reused in the replacement.

Remove text that does not match a pattern:

r = texto.replaceAll( "(?:(?!patrón)(?s:.))*(patrón)?", "$1");

Description:

  • (?:(?!patrón)(?s:.))* - A loop: at each position, if the text ahead does not match patrón, it consumes 1 character.
    (?: )* is a non-capturing group that repeats its contents (0 to unlimited times).

    • (?!patrón) - a negative lookahead, which succeeds if the current position is not followed by patrón.
    • (?s:.) - Any character.
      A period matches any character except line breaks. With the s modifier (single-line, or DOTALL) it also matches \n. Written this way, the modifier applies only to this part of the pattern.
  • (patrón)? - Optionally, we capture the text that matches the pattern.
    It is then available in the replacement as $1.

In your case, we simplify it to:

resultado = cadena1.replaceAll(
                        "(?:(?!\\d|Free)(?s:.))*(\\d+(?:[.,]\\d+)*|Free)?",
                        "$1"
                    );

Examples:

AU$26.95 с уч. GST                       --> 26.95
Free с уч. GST                           --> Free
#$%&/=123,456,789.01xxxFree Free!!! 3:)  --> 123,456,789.01FreeFree3

Demo: https://ideone.com/HEYyV1
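
The generic form described above could be wrapped in a small helper, sketched here under one assumption: the pattern passed in contains no capturing groups of its own, so that $1 still refers to the outer group. The class KeepOnly and the method conservar are hypothetical names used only for this illustration.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original answer.
class KeepOnly {

    // Removes everything in texto that does not match patron.
    // Assumption: patron has no capturing groups, so $1 is the outer group added here.
    static String conservar(String texto, String patron) {
        // patron is wrapped in (?: ) so an alternation such as a|b stays self-contained.
        String regex = "(?:(?!(?:" + patron + "))(?s:.))*((?:" + patron + "))?";
        return Pattern.compile(regex).matcher(texto).replaceAll("$1");
    }

    public static void main(String[] args) {
        String patron = "\\d+(?:[.,]\\d+)*|Free";
        System.out.println(conservar("AU$26.95 с уч. GST", patron)); // 26.95
        System.out.println(conservar("Free с уч. GST", patron));     // Free
    }
}
```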

But again, it's a lazy solution, not the best or most intuitive.

0 votes

Thank you as well. I have it working for now, but this is good to have in case I need to match a pattern. For the moment, all I have had to do is add a conditional that pulls out every case containing the word "Free", and then strip the rest at the end in case any stray character is left. I have tried it with several cases and it works, although that does not rule out cases where it does not behave as it should. Thanks anyway!

1 vote

Here is some code that keeps periods, digits and Free. It also works when the same string contains Free together with periods and digits.

String s = "";  // Global variable where the result is accumulated

public void comprobar(String aux) {
    if (aux.contains("Free")) {
        int posF = aux.indexOf("Free");
        comprobar(aux.substring(0, posF));   // Process the text before "Free"
        s = s + "Free";
        comprobar(aux.substring(posF + 4));  // Process the text after "Free", through the last character
    } else {
        compruebaResto(aux);
    }
}

public void compruebaResto(String aux) {
    char[] cs = aux.toCharArray();

    for (char c : cs) {  // Walk through the string
        if (c == '.' || Character.isDigit(c)) {  // Keep periods and digits, drop everything else
            s = s + c;
        }
    }
}

To call this method from main, you only need to do the following:

String cadena1 = "AU$26.95 с уч. GST"; // should return 26.95
String cadena2 = "Free с уч. GST"; // should return Free

comprobar(cadena1);
// Do whatever is needed with the result for cadena1

s = "";  // Reset s so the results of cadena1 and cadena2 are not concatenated
comprobar(cadena2);
0 votes

replaceAll() with a regular expression does the same thing...

0 votes

I wrote this code because the user said, "So far I was fixing it with .replaceAll() in this way: cadena1.replaceAll("с уч. GST|AU$",""); But the code is getting longer and longer, and there are still cases where I can't get it replaced." As you say, that approach gets long and sometimes doesn't work; this code does.

0 votes

I'm sure your code works, but I recommend refining the regular expression instead; you will save a lot of code when traversing an array and passing its elements through replaceAll().

0 votes

Orz Points 408

In your case you are replacing exactly those characters with "", but you can also use a regex in replaceAll().

For example, [^\\dA-Za-z]

  • ^ Negation
  • \\d Digits [0-9]
  • A-Za-z Uppercase and lowercase letters

Any character that does NOT match the character class is replaced with "".

In cadena1 you want to remove any character that is not a digit or a period. Inside a character class the period is literal, so it goes inside the brackets:

  • cadena1.replaceAll("[^0-9.]", "");
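
A minimal sketch of this character-class approach, extended to the rule from the question (digits, periods and commas, plus the word "Free"), might look as follows. The class SoloNumeros and the method limpiar are hypothetical names, and collapsing any string that contains "Free" to just "Free" is an assumption about the desired behaviour.

```java
// Hypothetical helper, not part of the original answer.
class SoloNumeros {

    static String limpiar(String cadena) {
        // Assumption: a string containing "Free" should collapse to just "Free".
        if (cadena.contains("Free")) {
            return "Free";
        }
        // Remove every character that is not an ASCII digit, a period or a comma.
        return cadena.replaceAll("[^0-9.,]", "");
    }

    public static void main(String[] args) {
        System.out.println(limpiar("AU$26,95 GST"));        // 26,95
        System.out.println(limpiar("Free с уч. GST"));      // Free
        System.out.println(limpiar("AU$26.95 с уч. GST"));  // 26.95. (the period of "уч." is kept)
    }
}
```

The last call shows the limit of a pure character class: it cannot tell a decimal separator from any other period in the text.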

1 vote

Don't forget the "Free".

0 votes

I am not sure whether adding it like this would work: "[^0-9]|.|^Free", ""

0 votes

The only problem is that if you have a string with several periods in a row, it will keep them.
