

For the sake of this article, let’s say you have a brand new craft whiskey that you would like to sell. Your territory includes Iowa and there just happens to be an open data set that shows all of the liquor sales in the state. This is a good opportunity for you to use your analysis skills to see who the biggest accounts are in the state. With that data, you can plan your sales process for each of the accounts.

Excited about the opportunity, you download the data and realize it’s pretty large. The data set for this case is a 565MB CSV file with 24 columns and 2.3M rows. This is not big data by any means but it is big enough that it can make Excel crawl. It is also big enough that some of the pandas approaches will be relatively slow on your laptop.

For this article, I’ll be using data that includes all of 2019 sales. Due to its size, you can download the file yourself from the state site, for a different time period if you prefer.

Let’s get started by importing our modules and reading the data. Not everything I import is required for the cleaning, but I wanted to highlight how useful it can be for these data exploration scenarios.

The store name patterns are written as (condition, normalized name) pairs, where each condition is a case-insensitive `contains` match against the store name:

    # Assumptions: the 'Store_Name' column and the store_patterns name are
    # placeholders; adjust them to match your own DataFrame.
    store_patterns = [
        (df['Store_Name'].str.contains('Hy-Vee', case=False, regex=False), 'Hy-Vee'),
        (df['Store_Name'].str.contains('Central City', case=False, regex=False), 'Central City'),
        (df['Store_Name'].str.contains("Smokin' Joe's", case=False, regex=False), "Smokin' Joe's"),
        # The only regex pattern: matches either spelling
        (df['Store_Name'].str.contains('Walmart|Wal-Mart', case=False), 'Wal-Mart'),
        (df['Store_Name'].str.contains('Fareway Stores', case=False, regex=False), 'Fareway Stores'),
        (df['Store_Name'].str.contains("Casey's", case=False, regex=False), "Casey's General Store"),
        (df['Store_Name'].str.contains("Sam's Club", case=False, regex=False), "Sam's Club"),
        (df['Store_Name'].str.contains('Kum & Go', regex=False, case=False), 'Kum & Go'),
    ]
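As a rough sketch of how the loading and pattern-matching steps could fit together, the example below reads a tiny made-up sample from memory in place of the 565MB file, then applies a few pattern pairs with NumPy's `np.select`. The column names `Store_Name` and `Sale_Dollars`, the sample rows, the `Store_Group` column and the `normalize_store_names` helper are all assumptions for illustration. `np.select` is one reasonable way to apply such pairs, not necessarily the exact approach used in this article.

```python
import io

import numpy as np
import pandas as pd

# Made-up sample rows standing in for the real CSV download (assumption).
SAMPLE_CSV = """Store_Name,Sale_Dollars
Hy-Vee Food Store #3 / Des Moines,120.50
HY-VEE WINE AND SPIRITS / Ankeny,80.00
Wal-Mart 1496 / Waterloo,45.25
WALMART SUPERCENTER 0886,60.00
Casey's General Store #2521,15.75
Corner Liquor,22.00
"""


def normalize_store_names(df, column="Store_Name"):
    """Map raw store names to a normalized name using the first matching pattern."""
    names = df[column]
    patterns = [
        (names.str.contains("Hy-Vee", case=False, regex=False, na=False), "Hy-Vee"),
        (names.str.contains("Walmart|Wal-Mart", case=False, na=False), "Wal-Mart"),
        (names.str.contains("Casey's", case=False, regex=False, na=False),
         "Casey's General Store"),
    ]
    # Split the pairs into parallel lists, as np.select expects.
    conditions = [condition for condition, _ in patterns]
    choices = [choice for _, choice in patterns]
    # Rows that match no pattern fall back to the string 'other'.
    return np.select(conditions, choices, default="other")


# For the real data, pass the path of the downloaded CSV instead of StringIO.
df = pd.read_csv(io.StringIO(SAMPLE_CSV))
df["Store_Group"] = normalize_store_names(df)

# Total sales per normalized store name, largest first.
totals = df.groupby("Store_Group")["Sale_Dollars"].sum().sort_values(ascending=False)
print(totals)
```

Because `np.select` takes the first condition that is true for each row, the order of the pairs matters whenever a store name could match more than one pattern.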
