sed get everything between two characters

**quo** · 05-30-2012

hello

I'm writing a bash script and I have the following line:

word5 word1 (word3) `word2' word4 [string2]

I want to save everything that is between [ ] in one variable,
and everything that is between ` and ' in a second variable.

How can I do that?
Thanks in advance

**anduril462** · 05-30-2012

Try something like this (where foo.txt contains your line). If the line is in a variable instead, you can echo it and pipe it into sed.

Code:

sed 's/.*(\([a-zA-Z0-9_]*\)).*/\1/' < foo.txt

That is sed's substitute command, which replaces text using regular expressions. The part between the first two / characters is the pattern to match. The part between the second and third / is what the matched text is replaced with.
.* (at each end of the pattern) matches zero or more of any character. These are needed so sed matches all the other text on the line and gets rid of it in the replacement phase.
( and ) without backslashes are literal parentheses: the ones surrounding word3 in your example.
\( and \) create a subexpression, something that sed will match and remember for the replacement part.
[a-zA-Z0-9_] is a character set; it should contain all the possible characters in your word. In this case, I have upper and lower case letters, digits and underscore. Add more if you want. The * following it means "zero or more occurrences", so the word may also be an empty string.
\1 is what all the matched text gets replaced by. It refers to the first subexpression (everything between \( and \)), i.e. the word inside the literal parentheses.

You may need to tweak it if you want different characters in your word (change the character set), and you will need to create a similar command to handle the `' word (change the literal delimiters around the subexpression).
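To make the pieces above concrete, here is a minimal sketch that pipes the sample line from the first post through the same command. The variable name line is only illustrative, and the line is double-quoted with the backtick escaped (\`) so the shell does not try to run it as a command.

```shell
# Sample line from the first post. Inside double quotes the backtick must be
# escaped for the shell; the single quote needs no escaping there.
line="word5 word1 (word3) \`word2' word4 [string2]"

# Pipe the line into the sed command discussed above.
# .*( and ).* consume the rest of the line, \1 keeps only the captured word.
printf '%s\n' "$line" | sed 's/.*(\([a-zA-Z0-9_]*\)).*/\1/'
# prints: word3
```

If the pattern does not match at all (for example, no parentheses on the line), sed prints the line unchanged, so it may be worth checking the result before relying on it.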

Now, to get that into a bash variable, you need to use command substitution.

Code:

var=$(some command)

That simply substitutes the $(some command) with the output of some command. That output is stored in the shell variable var. You would use your sed commands in there.
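As a rough illustration of putting the two together, the sketch below captures the sed output in a variable. The names line and paren_word are made up for this example; the sed command is the one for ( ) shown earlier.

```shell
# Sample line from the first post (backtick escaped for the shell).
line="word5 word1 (word3) \`word2' word4 [string2]"

# $(...) runs the pipeline and substitutes its output, minus any
# trailing newlines, into the assignment.
paren_word=$(printf '%s\n' "$line" | sed 's/.*(\([a-zA-Z0-9_]*\)).*/\1/')

# Quote the variable when using it so the shell does not split or glob it.
printf 'Between parentheses: %s\n' "$paren_word"
# prints: Between parentheses: word3
```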

EDIT: I just realized you wanted the thing in [ ], not the thing in ( ). Oh well, consider it "an exercise for the reader". You may need to escape certain characters with a backslash.

**quo** · 05-30-2012

Thank you so much for your answer. It was very helpful!
For the [] I did:

Code:

sed 's/.*\[\([a-zA-Z0-9_]*\)\].*/\1/'

and it worked just fine. But for the `' I did:

Code:

sed 's/.*\`\([a-zA-Z0-9_]*\)\'.*/\1/'

and it returned:

Unmatched '.

Why is that, since I escaped the character?

**anduril462** · 05-30-2012

That has to do with how bash handles quoting. You didn't actually escape the character. Inside single quotes, virtually nothing has special meaning, and nothing can be escaped. Not even a single quote can be escaped, since the escape character (\) has no special meaning there -- it's just a literal backslash. So the \' in your command ends the single-quoted string, and the final ' opens a new one that is never closed. You need to get out of single quote mode temporarily. Read this: BASH: Single-quotes inside of single-quoted strings by Stuart Colville for an example.
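For illustration, here are two ways this could look for the `' word; treat it as a sketch rather than the only fix. The variable names are made up. Note that the backtick is passed to sed without a backslash: it is not special in a basic regular expression, and GNU sed treats \` and \' as anchors for the start and end of the pattern space, so escaping them for sed would change the meaning.

```shell
# Sample line from the first post (backtick escaped for the shell).
line="word5 word1 (word3) \`word2' word4 [string2]"

# Option 1: leave single-quote mode, add an escaped quote (\'), re-enter.
# The shell glues '...' + \' + '...' together into one argument for sed.
quoted1=$(printf '%s\n' "$line" | sed 's/.*`\([a-zA-Z0-9_]*\)'\''.*/\1/')

# Option 2: double-quote the sed script instead. The single quote is then
# literal, but the backtick must be escaped so the shell leaves it alone.
quoted2=$(printf '%s\n' "$line" | sed "s/.*\`\([a-zA-Z0-9_]*\)'.*/\1/")

printf '%s\n' "$quoted1" "$quoted2"
# prints word2 twice, one per line
```

Both options hand sed the same script, s/.*`\([a-zA-Z0-9_]*\)'.*/\1/; they differ only in how the shell is told to leave the quote characters alone.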

**quo** · 05-30-2012

Thank you! Solved.
