How to convert a string to a DataFrame with spaces in the same column… here is a solution to the problem.
How to convert a string to a DataFrame with spaces in the same column
The following is an example string. How do I convert this string to a pandas Dataframe?
str1 =
"""
Feature Id & Feature Desc Status Failed Total
--------------------------------------------------- -------- ------ -----
RKSPACE (RackSpace Test In) Passed 0 1
D1 (Drum 1 Test) Passed 0 1
D2 (Drum 2 Test) Passed 0 1
D3 (Drum 3 Test) Passed 0 1
PRIMUS (PRIMUS Ink Test) Not-run 0 0
RGB (RGB Color Test) Passed 0 1
YONO (App Test) Not-run 0 0
PSENSE (Paper Sensor Test) Not-run 0 0
TFlag (Flag Test) Not-run 0 0
MEMT (Memory Test) Passed 0 1
CRG (CARRIAGE Test) Not-run 0 0
"""
I tried the code below
import pandas as pd
from StringIO import StringIO
def get_dataframe(str1):
test_data = StringIO(str1)
df = pd.read_csv(test_data, sep=r'\s+', comment='--', engine='python')
return df
The results I got were ugly and incorrect.
Result Image
I checked other posts but didn’t find any issues dealing with spaces in strings.
Normally, this will easily get a DataFrame if there are no spaces in the first column, but how do you convert it to a DataFrame that retains the same format as str1?
Any help would be greatly appreciated. Thanks
Solution
You can use >read_fwf
:
str1 = """
Feature Id & Feature Desc Status Failed Total
--------------------------------------------------- -------- ------ -----
RKSPACE (RackSpace Test In) Passed 0 1
D1 (Drum 1 Test) Passed 0 1
D2 (Drum 2 Test) Passed 0 1
D3 (Drum 3 Test) Passed 0 1
PRIMUS (PRIMUS Ink Test) Not-run 0 0
RGB (RGB Color Test) Passed 0 1
YONO (App Test) Not-run 0 0
PSENSE (Paper Sensor Test) Not-run 0 0
TFlag (Flag Test) Not-run 0 0
MEMT (Memory Test) Passed 0 1
CRG (CARRIAGE Test) Not-run 0 0
"""
df = pd.read_fwf(pd.compat.StringIO(str1),
colspecs=[(0, 50), (51, 62), (63, 69), (70, 76)],
skiprows=[2],
header=[1])
print (df)
Feature Id & Feature Desc Status Failed Total
0 RKSPACE (RackSpace Test In) Passed 0 1
1 D1 (Drum 1 Test) Passed 0 1
2 D2 (Drum 2 Test) Passed 0 1
3 D3 (Drum 3 Test) Passed 0 1
4 PRIMUS (PRIMUS Ink Test) Not-run 0 0
5 RGB (RGB Color Test) Passed 0 1
6 YONO (App Test) Not-run 0 0
7 PSENSE (Paper Sensor Test) Not-run 0 0
8 TFlag (Flag Test) Not-run 0 0
9 MEMT (Memory Test) Passed 0 1
10 CRG (CARRIAGE Test) Not-run 0 0
Thanks @gyoza Simplified Solution:
df = pd.read_fwf(pd.compat.StringIO(str1),
skiprows=[2],
header=[1])
print (df)
Feature Id & Feature Desc Status Failed Total
0 RKSPACE (RackSpace Test In) Passed 0 1
1 D1 (Drum 1 Test) Passed 0 1
2 D2 (Drum 2 Test) Passed 0 1
3 D3 (Drum 3 Test) Passed 0 1
4 PRIMUS (PRIMUS Ink Test) Not-run 0 0
5 RGB (RGB Color Test) Passed 0 1
6 YONO (App Test) Not-run 0 0
7 PSENSE (Paper Sensor Test) Not-run 0 0
8 TFlag (Flag Test) Not-run 0 0
9 MEMT (Memory Test) Passed 0 1
10 CRG (CARRIAGE Test) Not-run 0 0