PySpark is the Python interface to Apache Spark. It is used for analyzing data and writing Spark applications with Python APIs. PySpark is worth learning because it combines the Python language with Apache Spark, and it is well suited to large-scale data analysis, such as building machine learning pipelines and ETL jobs for many data platforms.
ittutoria.net – Tutorial & Platform for Developers
ittutoria.net – Tutorial & Platform for Developers is a website on which you can learn the tutorials of many software. But mainly coding software tutorials are common on this website because as the name IT suggests information technology which is the study of software for developers. That is why the software on ittutoria.net – Tutorial & Platform for Developers is coding software.
Mainly python and pysparks tutorials are mostly available on ittutoria.net – Tutorial & Platform for Developers. ittutoria.net – Tutorial & Platform for Developers gives you tutorials of many different things in python and pyspark. One of them is how to convert strings to date in pyspark.
Convert String to Date in PySpark
According to ittutoria.net, there are two methods to convert a string to a date in PySpark. The first uses the to_date() function and has two steps: the general formula step and the sample coding step.
The second method converts strings to dates with Spark SQL. It also has two steps: the general formula step and the sample coding step.
In the first method, you use PySpark's to_date function to convert the strings into date format. The format is described with Spark's date pattern letters and looks something like yyyy-dd-MM or yyyy-MM-dd (note the lowercase yyyy and dd; uppercase letters mean something different in Spark's pattern syntax).
The to_date function parses the string values in a column: it takes the column of date strings as its first argument and the pattern of those strings as its second. The syntax looks like this:
to_date(col("string_column_name"), "yyyy-MM-dd")
To apply the above to_date function to a real DataFrame, we first have to import the functions used to convert the strings into dates.
from pyspark.sql.functions import *
df2 = df1.select(col("column_name"), to_date(col("column_name"), "yyyy-MM-dd").alias("to_date"))
You can then display the converted DataFrame with:
df2.show()
Here, df1 is the DataFrame that contains the string column to convert, df2 is the new DataFrame created by the conversion, and to_date is the function that performs the string-to-date conversion.
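If you prefer to keep all of the original columns and simply add the parsed date, a withColumn call gives an equivalent result. This is only a sketch, using the same placeholder names ("column_name", "yyyy-MM-dd") as above:
from pyspark.sql.functions import col, to_date

# Equivalent sketch: append the parsed date as a new column instead of
# selecting only two columns ("column_name" is a placeholder, as above)
df2 = df1.withColumn("to_date", to_date(col("column_name"), "yyyy-MM-dd"))
df2.show()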
To see the sample code in action, let's first try this formula on a simple DataFrame, as given below.
df1 = spark.createDataFrame(
    data=[("1", "Angela", "2018-18-07 14:01:23.000"),
          ("2", "Amandy", "2018-21-07 13:04:29.000"),
          ("3", "Michalle", "2018-24-07 06:03:13.009")],
    schema=["Id", "Customer Name", "timestamp"])
df1.printSchema()
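Running printSchema() at this point confirms that every column, including timestamp, was created as a plain string (output reproduced for illustration):
root
 |-- Id: string (nullable = true)
 |-- Customer Name: string (nullable = true)
 |-- timestamp: string (nullable = true)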
As you can see from the above example, the strings in the timestamp column follow the yyyy-dd-MM pattern, followed by a time portion. Now we will convert the timestamp column into a proper date column, aliased as to_date.
from pyspark.sql.functions import *
# the pattern must describe the whole string, including the time part
df2 = df1.select(col("timestamp"), to_date(col("timestamp"), "yyyy-dd-MM HH:mm:ss.SSS").alias("to_date"))
df2.show()
The sample code has converted the string column into dates and returned a new DataFrame, which looks like this:
+--------------------+----------+
|           timestamp|   to_date|
+--------------------+----------+
|2018-18-07 14:01:...|2018-07-18|
|2018-21-07 13:04:...|2018-07-21|
|2018-24-07 06:03:...|2018-07-24|
+--------------------+----------+
The SQL method works in much the same way, but runs to_date inside a Spark SQL query instead of through the DataFrame API. I still suggest using the DataFrame method above rather than the SQL method.
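For reference, here is a minimal sketch of the SQL method, assuming the df1 DataFrame from the example above; the view name "customers" is only illustrative:
# Register the DataFrame as a temporary view so it can be queried with SQL
df1.createOrReplaceTempView("customers")

# The same to_date conversion, expressed as a Spark SQL query
df3 = spark.sql(
    "SELECT `timestamp`, "
    "to_date(`timestamp`, 'yyyy-dd-MM HH:mm:ss.SSS') AS to_date "
    "FROM customers"
)
df3.show()
This produces the same to_date column as the DataFrame version shown earlier.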