I want to find states whose awarded wines have features that stand out compared with other states.
First, I made a new column, “duration”, that captures the aging period.
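As a rough sketch of how such a column could be derived: the snippet below assumes the aging period is the number of years between the vintage and the award. The column names `vintage` and `award_year` and the sample values are hypothetical, not taken from the original data.

```python
import pandas as pd

# Hypothetical sample; the column names "vintage" and "award_year" are assumptions.
wines = pd.DataFrame({
    "vintage": [2015, 2018, 2012],
    "award_year": [2019, 2019, 2019],
})

# Assumed definition: aging period in years between vintage and award.
wines["duration"] = wines["award_year"] - wines["vintage"]
print(wines)
```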
After that, I converted the award into a numeric type, where a greater value means a better award. I did this because more than 95% of the data falls into “Double gold”, “Gold”, “Silver”, or “Bronze”, and these categories have a natural order.
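One way this ordinal encoding might look in pandas is sketched here. The scores 1 to 4, the column names, and the choice to drop the remaining labels are assumptions for illustration; “Best of Class” is a made-up stand-in for the rare labels outside the four categories.

```python
import pandas as pd

# Hypothetical sample of award labels.
wines = pd.DataFrame(
    {"award": ["Gold", "Bronze", "Double Gold", "Silver", "Best of Class"]}
)

# Assumed scores: a higher number means a better award.
award_rank = {"bronze": 1, "silver": 2, "gold": 3, "double gold": 4}

# Normalize case and whitespace before mapping; unmapped labels become NaN.
wines["award_score"] = wines["award"].str.strip().str.lower().map(award_rank)

# Keep only the rows that fall into the four ordered categories.
ranked = wines.dropna(subset=["award_score"]).copy()
ranked["award_score"] = ranked["award_score"].astype(int)
print(ranked)
```

Lowercasing first means “Double gold” and “Double Gold” map to the same score.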
In addition, looking at the state information, I found that for every award level most of the data comes from the top 5 states. So I decided to group all the other states together and to make my target more concrete: “Are there any outstanding features among the 5 most frequently awarded states?”
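A minimal sketch of this grouping, assuming a `state` column and using made-up counts: the five most frequent states keep their label and everything else is collapsed into a hypothetical “Other” group.

```python
import pandas as pd

# Hypothetical sample; the counts are made up for illustration.
wines = pd.DataFrame({
    "state": ["CA"] * 6 + ["NY"] * 5 + ["WA"] * 4 + ["OR"] * 3 + ["TX"] * 2 + ["VA", "MI"]
})

# The five states with the most awarded wines.
top5 = wines["state"].value_counts().nlargest(5).index

# Keep the top-5 labels and replace every other state with "Other".
wines["state_group"] = wines["state"].where(wines["state"].isin(top5), "Other")
print(wines["state_group"].value_counts())
```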
Having finished wrangling, I first ran PCA on the data with the following columns.
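For readers who want to see what the PCA step involves, here is a small sketch using NumPy's SVD. It is not necessarily the tool used for this analysis, and the three features (price, duration, award score) and the random data are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical numeric features: price, duration, award_score (assumed columns).
X = rng.normal(size=(200, 3))
X[:, 1] += 0.8 * X[:, 0]  # make "duration" correlate with "price" for illustration

# Standardize each column so that differences in scale do not dominate.
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# PCA via singular value decomposition of the standardized matrix.
U, S, Vt = np.linalg.svd(Z, full_matrices=False)

explained_ratio = S**2 / np.sum(S**2)  # share of variance per component
scores = Z @ Vt.T                      # each wine's coordinates on the components
loadings = Vt                          # rows: components, columns: original features

print(explained_ratio)
```

The first two columns of `scores` are what would typically go on a scatter plot, and `loadings` shows how strongly each original column contributes to each component.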
From the output, the following points can be considered.
The same thing can be confirmed with the heatmap.
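The numbers behind such a heatmap are usually a correlation matrix. The sketch below computes one with pandas on made-up data; the column names and the built-in relationship between price and duration are assumptions.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Hypothetical features; "duration" is built to correlate with "price".
price = rng.normal(30, 10, size=300)
features = pd.DataFrame({
    "price": price,
    "duration": 0.1 * price + rng.normal(0, 1, size=300),
    "award_score": rng.integers(1, 5, size=300),
})

# Pairwise Pearson correlations between the numeric columns.
corr = features.corr()
print(corr.round(2))
```

The resulting `corr` table can then be passed to any heatmap function, for example matplotlib's `imshow` or seaborn's `heatmap`.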
Anyway, returning to the original objective, I want to find outstanding states, so I will use color to check each state's features.
Overall
OK, too much CA.
Let’s check each state.
NY
Very interesting. NY’s awarded wines seem to be on the cheap side. At the same time, they have a shorter aging period.
OR
Nothing special to note.
TX
There seems to be a similarity with NY.
WA
Nothing special.
OK, let’s visualize what we expect for NY and TX.
Looking at the violin plot, it could be said that US awarded wine leans more strongly toward low prices and shorter aging periods.
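A violin plot of this kind could be drawn roughly as follows with matplotlib. The prices are randomly generated purely to make the snippet runnable and are not results from the real data; the grouping into NY, TX, and “Other” is also an assumption.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; not needed inside a notebook
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical prices per state group; the numbers are made up for illustration.
prices = {
    "NY": rng.normal(20, 5, size=100),
    "TX": rng.normal(22, 5, size=100),
    "Other": rng.normal(35, 10, size=100),
}

fig, ax = plt.subplots()

# One violin per group, with the median marked.
parts = ax.violinplot(list(prices.values()), showmedians=True)

# Violins are placed at positions 1, 2, 3 by default.
ax.set_xticks(range(1, len(prices) + 1))
ax.set_xticklabels(list(prices.keys()))
ax.set_ylabel("price")
```

The same call with the “duration” column in place of price would give the matching view of the aging period.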